clr.transform {GCDkit} | R Documentation |
Implementation of centred-log-ratio (clr) transformation for compositional data.
clr.trans(comp.data=NULL, GUI=FALSE)
pr.comp.clr(comp.data=NULL, use.cov=FALSE, scale=TRUE, GUI=FALSE)
lda.clr(comp.data=NULL, grouping=groups, GUI=FALSE)
comp.data | a numeric matrix; the data to be normalized. Alternatively, just the names of variables in the data matrix.
use.cov | logical; should the covariance matrix be used instead of the correlation matrix?
scale | logical; the scalings applied to each variable.
GUI | logical; is the function called from a menu (GUI)?
grouping | character or factor; grouping information for each of the samples.
Compositional data - i.e., multivariate data in which all the components sum up to some constant (e.g. 1, or 100 for percentages) - are widespread in the geosciences. A typical example is major-element analyses of whole-rock samples.
Numerous workers have argued that much of the correlation in such closed datasets is spurious, due to the so-called constant-sum or closure effect (e.g., Chayes 1960; Rock 1988; Rollinson 1992, 1993).
This effect arises because the components of a compositional dataset cannot vary independently. If one oxide increases in abundance, for instance SiO2, which dominates the whole-rock analyses of many igneous rocks, the sum of all the other oxides must decrease. As a result, most oxides tend to show a negative correlation with silica.
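The closure effect can be illustrated with a small simulation. The sketch below uses made-up, mutually independent values (the column names and ranges are purely illustrative, not real analyses) and plain base R rather than any GCDkit function. It rescales each sample to sum to 100 and compares the correlation before and after closure.

```r
set.seed(1)  # arbitrary seed, only so that the run is reproducible
n <- 200

# Simulated, mutually independent "raw" amounts (hypothetical values)
raw <- cbind(
  SiO2 = runif(n, 40, 80),
  MgO  = runif(n, 5, 15),
  CaO  = runif(n, 5, 15)
)

# Closure: rescale every sample (row) so that it sums to 100
closed <- 100 * raw / rowSums(raw)

# Independent columns: correlation expected to be close to zero
cor_raw <- cor(raw[, "SiO2"], raw[, "MgO"])

# After closure the same two columns become negatively correlated
cor_closed <- cor(closed[, "SiO2"], closed[, "MgO"])

print(c(raw = cor_raw, closed = cor_closed))
```

The negative correlation in the closed data is produced by the rescaling alone, since the simulated raw variables were generated independently of each other.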
For their correct statistical treatment, compositional data have to be transformed, or 'opened'. A classic remedy for the closure effect is the family of log-ratio transformations (Aitchison 1986; Buccianti et al. eds 2006).
The functions 'clr.trans', 'pr.comp.clr' and 'lda.clr' implement the so-called centred-log-ratio (clr) transformation. Data opening in this case is done by dividing each value of a variable by the geometric mean of all the variables for that sample and then taking logarithms. It is critical, of course, that all the variables are expressed in the same measurement unit.
For instance, for MgO, the centred-log-ratio transformed version is given as:
MgO_clr = ln(C_MgO / geom.mean)
where 'ln' is the natural logarithm, 'C_MgO' is the concentration of the selected variable (oxide) in wt. %, and 'geom.mean' is the geometric mean of the concentrations of all variables being transformed in that sample (e.g., Pawlowsky-Glahn & Egozcue 2006).
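As an illustration of the definition above, here is a minimal base-R sketch of the clr transformation. The helper 'clr_sketch' and the two analyses are hypothetical and only serve as an example; this is not the code of 'clr.trans' itself. It relies on the identity ln(x / geom.mean) = ln(x) - mean(ln(x)), applied to each sample (row).

```r
# Hypothetical helper, NOT GCDkit's clr.trans(): a base-R sketch of clr
clr_sketch <- function(x) {
  x <- as.matrix(x)
  # Logarithms require strictly positive values
  stopifnot(is.numeric(x), all(x > 0, na.rm = TRUE))
  log_x <- log(x)
  # ln(x / geometric mean) = ln(x) - mean(ln(x)), computed row by row
  out <- log_x - rowMeans(log_x)
  # Mimic the '_clr' suffix described for the returned matrix
  if (!is.null(colnames(x))) {
    colnames(out) <- paste0(colnames(x), "_clr")
  }
  out
}

# Made-up analyses (wt. %), for illustration only
wr <- rbind(
  s1 = c(SiO2 = 60, Al2O3 = 16, MgO = 3),
  s2 = c(SiO2 = 50, Al2O3 = 15, MgO = 8)
)

wr_clr <- clr_sketch(wr)
print(wr_clr)
print(rowSums(wr_clr))  # each row sums to zero, up to rounding error
```

Because every sample is divided by its own geometric mean, the clr values of a sample always sum to zero. The transformed variables are therefore still linearly dependent.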
The function 'pr.comp.clr' performs principal components analysis and plots a biplot (Gabriel, 1971; Buccianti & Peccerillo, 1999). The function 'lda.clr' serves for linear discriminant analysis.
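The general idea of a principal components analysis on clr-transformed data can be sketched with the base-R function 'prcomp'. This is an assumption-laden illustration on simulated values, not the internals of 'pr.comp.clr'; in particular, how the arguments 'use.cov' and 'scale' map onto 'prcomp' options is not documented here. The sketch uses a covariance-based decomposition (scale. = FALSE).

```r
set.seed(42)  # arbitrary seed, only so that the run is reproducible
n <- 50

# Simulated amounts (hypothetical values), closed to 100 %
raw <- cbind(
  SiO2  = runif(n, 45, 75),
  Al2O3 = runif(n, 12, 18),
  MgO   = runif(n, 1, 10),
  CaO   = runif(n, 1, 12)
)
comp <- 100 * raw / rowSums(raw)

# clr transformation: ln(x) minus the row mean of ln(x)
clr <- log(comp) - rowMeans(log(comp))
colnames(clr) <- paste0(colnames(comp), "_clr")

# Covariance-based PCA of the clr-transformed data
pca <- prcomp(clr, center = TRUE, scale. = FALSE)
print(summary(pca))

# Draw the biplot only in an interactive session
if (interactive()) biplot(pca)
```

Since the clr values of each sample sum to zero, the last principal component of a covariance-based decomposition carries essentially no variance. With D variables, at most D - 1 components are informative.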
For 'clr.trans', a numeric matrix 'results'. The names of components are preserved, and supplemented by a suffix '_clr'.
disclosure.r
Vojtěch Janoušek, vojtech.janousek@geology.cz
Vladimír Kusbach, kusbach@gmail.com
Aitchison J (1986) The Statistical Analysis of Compositional Data. Methuen, New York, pp 1-416
Buccianti A, Mateu-Figueras G, Pawlowsky-Glahn V (eds) (2006) Compositional Data Analysis in the Geosciences. Geological Society London Special Publications 264: pp 1-212
Chayes F (1960) On correlation between variables of constant sum. J Geophys Res 65: 4185-4193 doi: 10.1029/JZ065i012p04185
Gabriel KR (1971) The biplot graphical display of matrices with application to principal component analysis. Biometrika 58: 453-467 doi: 10.1093/biomet/58.3.453
Greenacre MJ (2010) Biplots in Practice. Fundación BBVA, Bilbao
Pawlowsky-Glahn V, Egozcue JJ (2006) Compositional data and their analysis: an introduction. In: Buccianti A, Mateu-Figueras G, Pawlowsky-Glahn V (eds) Compositional Data Analysis in the Geosciences. Geological Society London Special Publications 264: pp 1-10 doi: 10.1144/GSL.SP.2006.264.01.01
Reimann C, Filzmoser P, Garrett R, Dutter R (2008) Statistical Data Analysis Explained: Applied Environmental Statistics with R. John Wiley & Sons, Chichester, pp 1-362
Rock NMS (1988) Numerical geology. A Source Guide, Glossary and Selective Bibliography to Geological Uses of Computers and Statistics. Lecture Notes in Earth Sciences 18, Springer, Berlin, pp 1-427 doi: 10.1007/BFb0045143
Rollinson HR (1992) Another look at the constant sum problem in geochemistry. Mineral Mag 56: 469-475 doi: 10.1180/minmag.1992.056.385.03
Rollinson HR (1993) Using Geochemical Data: Evaluation, Presentation, Interpretation. Longman, London, pp 1-352 doi: 10.4324/9781315845548
van den Boogaart KG, Tolosana-Delgado R (2008) "compositions": a unified R package to analyze compositional data. Comput Geosci 34: 320-338 doi: 10.1016/j.cageo.2006.11.017
van den Boogaart KG, Tolosana-Delgado R (2013) Analyzing Compositional Data with R. Springer, Berlin, pp 1-258
Venables WN, Ripley BD (1999) Modern Applied Statistics with S-Plus. Springer, Berlin. doi: 10.1007/978-1-4757-3121-7
See Reimann et al. (2008) and van den Boogaart and Tolosana-Delgado (2013) for further details, and van den Boogaart and Tolosana-Delgado (2008) for the implementation of a comprehensive R library dealing with compositional data.
sampleDataset("sazava")

# Centred-log-ratio transformation
ox <- c("SiO2","Al2O3","FeOt","MgO","CaO")
clr.trans(ox)
addResults() # Needed to append the clr-transformed data to the matrix 'WR'
multiple(x="SiO2_clr", y="Al2O3_clr,FeOt_clr,MgO_clr,CaO_clr")
plateCex(2)
plateCexLab(1.3)

# Principal components on basis of clr-transformed data
pr.comp.clr()
pr.comp.clr("SiO2,TiO2,Al2O3,MgO,CaO")