Chapter 10: Relevant metabolites selection strategies
By Jos Hageman
Abstract
Statistical modelling is an inherent part of any metabolomics study. At their core, statistical models assess the association between metabolites and the trait(s) of interest. Two factors complicate this: not all metabolites are connected to the trait of interest, and modelling is hampered by the large number of metabolites, especially relative to the number of samples. To remedy this situation, several variable selection strategies that operate at different levels are discussed. Low-level variable selection focuses on removing non-informative or redundant metabolites. Medium-level variable selection involves methods that explicitly select a subset of the most predictive metabolites. Lastly, high-level variable selection entails statistical techniques that either select metabolites as part of their inner workings or indicate metabolite importance through an auxiliary criterion. Selecting metabolites at these different levels reduces the complexity of the problem, which aids statistical modelling and the subsequent interpretation of the results. It also helps researchers focus on the most important metabolites that have a clear association with the trait under investigation.
Jos Hageman

Biometris, Applied Statistics, Wageningen University & Research, P.O. Box 16, 6700 AA, Wageningen, the Netherlands
Jos Hageman (Haarlem, 1974) obtained an M.Sc. (with merit) in medicinal chemistry, specializing in organic synthesis and cheminformatics, from the Vrije Universiteit Amsterdam. He obtained his Ph.D. in chemometrics in 2004 at the Radboud University in Nijmegen, with a focus on global optimization. Since 2007, he has been an Assistant Professor of Statistics in the Mathematical & Statistical Methods group of Wageningen University & Research. In 2015 he received a UTQ registration, and in 2017 he became a registered biostatistician.
His primary research interests are in biostatistics and chemometrics. His research centres on the development and validation of statistical models for the prediction of complex quantitative traits. Common application domains are (i) Quantitative Structure Activity Relationship (QSAR) models for the prediction of, e.g., microbial activity of molecules and bitterness of peptides, and (ii) the prediction of sensory and physicochemical traits of foodstuffs from, e.g., metabolic or transcriptome profiles. Other research interests include the development of methods for fusing data sets measured on multiple instruments and for closing the gap between univariate and multivariate statistical methods. He teaches several intermediate and advanced level statistics courses as well as specialized PhD courses on chemometrics/multivariate analysis.