The compositional data are proportionals of mutually exclusive groups that would be summed up to the unity. Statistical models for compositional data have been applicable in a number of areas, e.g. the product or channel mix in the marketing research and asset allocations of a investment portfolio.
In the example below, I will show how to model compositional outcomes with a simple LogRatio regression. The underlying idea is very simple. With the D-dimension outcome [p_1, p_2…p_D], we can derive a [D-1]-dimension outcome [log(p_2 / p_1)…log(p_D / p_1)] and then estimate a multivariate regression based on the new outcome.
df = get("ArcticLake", envir = asNamespace('DirichletReg')) # sand silt clay depth #1 0.775 0.195 0.030 10.4 #2 0.719 0.249 0.032 11.7 #3 0.507 0.361 0.132 12.8 lm(cbind(log(silt / sand), log(clay / sand)) ~ depth, data = df) #Response log(silt/sand): #Coefficients: # Estimate Std. Error t value Pr(>|t|) #(Intercept) -0.649656 0.236733 -2.744 0.0093 ** #depth 0.037522 0.004269 8.790 1.36e-10 *** # #Response log(clay/sand) : #Coefficients: # Estimate Std. Error t value Pr(>|t|) #(Intercept) -2.614897 0.421383 -6.206 3.31e-07 *** #depth 0.062181 0.007598 8.184 8.00e-10 ***
Since log(x / y) = log(x) – log(y), we can also estimate the model with log(sand) as an offset term.
lm(cbind(log(silt), log(clay)) ~ depth + offset(log(sand)), data = df) #Response log(silt) : #Coefficients: # Estimate Std. Error t value Pr(>|t|) #(Intercept) -0.649656 0.236733 -2.744 0.0093 ** #depth 0.037522 0.004269 8.790 1.36e-10 *** # #Response log(clay) : #Coefficients: # Estimate Std. Error t value Pr(>|t|) #(Intercept) -2.614897 0.421383 -6.206 3.31e-07 *** #depth 0.062181 0.007598 8.184 8.00e-10 ***
Alternatively, we can also use the comp.reg function in the Compositional package.
Compositional::comp.reg(as.matrix(df[, 1:3]), df[, 4]) #$be # [,1] [,2] #(Intercept) -0.64965598 -2.61489731 #x 0.03752186 0.06218069 # #$seb # [,1] [,2] #(Intercept) 0.236733203 0.421382652 #x 0.004268588 0.007598043var vglnk = { key: '949efb41171ac6ec1bf7f206d57e90b8' }; (function(d, t) { var s = d.createElement(t); s.type = 'text/javascript'; s.async = true; s.src = '//cdn.viglink.com/api/vglnk.js'; var r = d.getElementsByTagName(t)[0]; r.parentNode.insertBefore(s, r); }(document, 'script'));
R-bloggers.com offers daily e-mail updates about R news and tutorials on topics such as: Data science, Big Data, R jobs, visualization (ggplot2, Boxplots, maps, animation), programming (RStudio, Sweave, LaTeX, SQL, Eclipse, git, hadoop, Web Scraping) statistics (regression, PCA, time series, trading) and more...