Alessandra Biancolillo defended her PhD thesis at the University of Copenhagen on November 18th. Alessandra has been associated with Nofima’s strategic program “Multiblock Methods for prediction and interpretation”, developing methodology for analyzing and interpreting several blocks of data sequentially (so-called SO-PLS). The thesis is titled “Method development in the area of multi-block analysis focused on food analysis”. Supervisors have been Tormod Næs (Nofima), Ingrid Måge (Nofima) and Rasmus Bro (University of Copenhagen).
Alessandra did a very good job at the defense. Opponents Thomas Skov (University of Copenhagen), El Mostafa Qannari (Oniris, Nantes) and Lars Nørgaard (Roskilde University) were well prepared and contributed to an interesting and fruitful discussion.
The HotPLS toolbox for MATLAB is now available. It contains a set of functions for performing classification in a fixed hierarchy. For more information see:
Kristian Hovde Liland has authored a paper on classification in a fixed hierarchy using PLS-based methods together with Achim Kohler and Volha Shapaval. The paper is titled: “Hot PLS—a framework for hierarchically ordered taxonomic classification by partial least squares” and was recently published in the journal Chemometrics and Intelligent Laboratory Systems.
- Classification in a fixed hierarchy
- Utilization of replicate measurements for improved robustness through majority voting
- Automatic model building and complexity estimation from taxonomic information
- Detection of outliers and samples representing new classes absent in calibration
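The last highlight, flagging samples from classes absent in calibration, can be illustrated with a small sketch. This is not the HotPLS toolbox code: it assumes a deliberately simple rule in which each class is summarised by its centroid, and the cut-off is the largest centroid distance seen among that class's own calibration samples. The function names `fit_class_cutoffs` and `classify_or_flag`, and the toy data, are hypothetical.

```python
import math

def fit_class_cutoffs(calibration):
    """Return {label: (centroid, cutoff)} from calibration points per class.

    Assumption for illustration: the cut-off is the largest Euclidean
    distance from the class centroid to any of its calibration samples.
    """
    model = {}
    for label, points in calibration.items():
        dim = len(points[0])
        centroid = tuple(sum(p[i] for p in points) / len(points) for i in range(dim))
        cutoff = max(math.dist(p, centroid) for p in points)
        model[label] = (centroid, cutoff)
    return model

def classify_or_flag(model, point):
    """Assign the nearest class, or None if the point lies beyond its cut-off."""
    label, (centroid, cutoff) = min(
        model.items(), key=lambda item: math.dist(point, item[1][0])
    )
    return label if math.dist(point, centroid) <= cutoff else None

# Toy two-class calibration set (hypothetical data).
calibration = {
    "A": [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)],
    "B": [(5.0, 5.0), (6.0, 5.0), (5.0, 6.0)],
}
model = fit_class_cutoffs(calibration)
known = classify_or_flag(model, (0.5, 0.5))   # close to class A
novel = classify_or_flag(model, (3.0, 3.0))   # far from both classes
```

A sample that falls outside every class cut-off is returned as `None`, which is where a possible new class or an outlier would be reported.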
A novel framework for classification by partial least squares in a fixed hierarchy is presented. The hierarchical approach ensures flexible local modelling with varying complexity. It results in an intuitive classification path from the highest taxonomic levels down to species and beyond. Results are presented as phylogenetic trees with local diagnostic information to gain maximum information about the classification and help the researcher to focus on interesting phenomena.
Information on sample replicates is included in the classification to increase performance and avoid misclassifications due to low quality measurements. Detection of samples coming from previously unobserved classes is enabled by estimating cut-off distances from the calibration data classes. To further increase flexibility and improve customization the canonical powered partial least squares algorithm is used for modelling and classification together with linear discriminant analysis. This opens up for additional sample response information and forced sharpening of focus on important variables. The presented framework is not limited to biological taxonomy, but was first developed for this purpose.
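As a rough illustration of how replicate information can stabilise a classification, the sketch below combines per-replicate predictions into one label per sample by majority voting. It is a minimal example under stated assumptions, not the method as implemented in the paper or toolbox: the function name `majority_vote`, the sample identifiers and the labels are all hypothetical.

```python
from collections import Counter, defaultdict

def majority_vote(sample_ids, predicted_labels):
    """Combine per-replicate predictions into one consensus label per sample."""
    votes = defaultdict(list)
    for sample_id, label in zip(sample_ids, predicted_labels):
        votes[sample_id].append(label)
    # most_common(1) returns the label with the highest count; ties are
    # resolved in favour of the label encountered first.
    return {
        sample_id: Counter(labels).most_common(1)[0][0]
        for sample_id, labels in votes.items()
    }

# Three replicate measurements per sample; one replicate of s1 is misclassified.
sample_ids = ["s1", "s1", "s1", "s2", "s2", "s2"]
replicate_predictions = ["genus_X", "genus_X", "genus_Y",
                         "genus_Y", "genus_Y", "genus_Y"]
consensus = majority_vote(sample_ids, replicate_predictions)
```

Here the single deviating replicate of `s1` is outvoted, so a low-quality measurement does not change the final assignment.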
- Partial least squares
- Fixed hierarchy
- Local modelling
- Replicate measurements
Kristian Hovde Liland, Achim Kohler, Volha Shapaval, Hot PLS—a framework for hierarchically ordered taxonomic classification by partial least squares. Chemometrics and Intelligent Laboratory Systems, Volume 138, 15 November 2014, Pages 41–47.
Tormod Næs has co-authored a review on handling data structures from multiple instruments/platforms in food science. The paper is titled “Chemometrics in foodomics: Handling data structures from multiple analytical platforms” and was recently published in the journal Trends in Analytical Chemistry.
- Multiple analytical platforms provide complementary and synergistic information.
- Simple correlation studies ensure sound foodomics data handling and interpretation.
- Correlation studies can provide foodomics researchers with new ways of looking into data.
- Multi-block methods provide additional tools in foodomics.
Foodomics studies are normally concerned with multifactorial problems and it makes good sense to explore and to measure the same samples on complementary, synergistic analytical platforms that comprise multifactorial sensors and separation methods. However, the challenge of exploring, extracting and describing the data increases exponentially. Moreover, the risk of becoming flooded with non-informative data increases concomitantly.
Acquisition of data from different analytical platforms provides opportunities for checking the validity of the data, comparing analytical platforms and ensuring proper data (pre)processing – all in the context of correlation studies. We provide practical and pragmatic tools to validate and to deal advantageously with data from more than one analytical platform. We emphasize the need for complementary correlation studies within and between blocks of data to ensure proper data handling, interpretation and dissemination. Correlation studies are a preliminary step prior to multivariate data analysis or as an introduction to more advanced multi-block methods.
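A between-block correlation study of the kind described above can be sketched in a few lines. The example assumes two blocks measured on the same samples, each stored as a mapping from variable name to values, and computes the Pearson correlation for every pair of variables across the blocks. The helper names and the toy numbers are hypothetical and are not taken from the paper.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def between_block_correlations(block_a, block_b):
    """Correlate every variable in block_a with every variable in block_b."""
    return {
        (name_a, name_b): pearson(values_a, values_b)
        for name_a, values_a in block_a.items()
        for name_b, values_b in block_b.items()
    }

# Two hypothetical platforms measured on the same four samples.
platform_a = {"peak_1": [1.0, 2.0, 3.0, 4.0], "peak_2": [2.0, 1.0, 4.0, 3.0]}
platform_b = {"band_1": [2.0, 4.0, 6.0, 8.0], "band_2": [4.0, 3.0, 2.0, 1.0]}

correlations = between_block_correlations(platform_a, platform_b)
```

Variable pairs with correlations near +1 or -1 suggest that the two platforms carry overlapping information, while unexpectedly weak correlations between variables that should agree can point to a preprocessing or validity problem worth checking before any multi-block modelling.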
- Correlation studies;
- Data processing;
- Data validity;
- Multi-block chemometrics;
- Multivariate data analysis;
- Pearson correlation
Skov, T.; Honoré, A. H.; Jensen, H. M.; Næs, T.; Engelsen, S. B., Chemometrics in foodomics: Handling data structures from multiple analytical platforms. TrAC Trends in Analytical Chemistry 2014, 60, (0), 71-79
PO-PLS and SO-PLS are a collection of methods for multiblock regression (data fusion) developed at Nofima. A toolbox for fitting, validating and plotting such models is now freely available for download.
For more information, go to Software & downloads.
The Open EMSC toolbox is now available. For more information see:
Kristian Hovde Liland has co-authored a paper on variable selection in aligned 16S rRNA sequences together with Hilde Vinje, Trygve Almøy and Lars Snipen. The paper is titled: “A systematic search for discriminating sites in the 16S ribosomal RNA gene” and was recently published in the journal Microbial Informatics and Experimentation. This was the last paper ever to be accepted in the journal. Whether the journal editors felt that their careers had peaked with this paper and therefore decided to close the journal has not been confirmed.
The 16S rRNA is by far the most common genomic marker used for prokaryotic classification, and has been used extensively in metagenomic studies over recent years. Along the 16S gene there are regions with more or less variation across the kingdom of bacteria. Nine variable regions have been identified, flanked by more conserved parts of the sequence. It has been stated that the discriminatory power of the 16S marker lies in these variable regions. In the present study we wanted to examine this more closely, and used a supervised learning method to search systematically for sites that contribute to correct classification at either the phylum or genus level.
- Partial least squares
- Selectivity ratio
- Sequence alignment
Hilde Vinje, Trygve Almøy, Kristian Hovde Liland, Lars Snipen, A systematic search for discriminating sites in the 16S ribosomal RNA gene. Microbial Informatics and Experimentation 4(2) (2014).