Pattern recognition methods to relate time profiles of gene expression with phenotypic data: a comparative study

DM Hendrickx, DGJ Jennen, JJ Briede, R Cavill, TM de Kok, JCS Kleinjans
Bioinformatics, 2015 - academic.oup.com
Abstract
Motivation: Comparing time courses of gene expression with time courses of phenotypic data may provide new insights into cellular mechanisms. In this study, we compared the performance of five pattern recognition methods with respect to their ability to relate genes and phenotypic data: one classical method (k-means) and four methods developed specifically for time series [Short Time-series Expression Miner (STEM), Linear Mixed Model mixtures, Dynamic Time Warping for -Omics and linear modeling with the R/Bioconductor limma package]. The methods were evaluated using data from toxicological studies that aimed to relate gene expression to phenotypic endpoints (i.e. to develop biomarkers for adverse outcomes). Additionally, technical aspects (the influence of noise, the number of time points and the number of replicates) were evaluated on simulated data.
Results: None of the methods outperforms the others in terms of biology. Linear modeling with limma is the method most influenced by noise. STEM is the method most influenced by the number of biological replicates in the dataset, whereas k-means and linear modeling with limma are the most influenced by the number of time points. In most cases, the results of the methods complement each other. We therefore provide recommendations for integrating the five methods.
Availability: The Matlab code for the simulations performed in this research is available in the Supplementary Data (Word file). The microarray data analysed in this paper are available at ArrayExpress (E-TOXM-22 and E-TOXM-23) and Gene Expression Omnibus (GSE39291). The phenotypic data are available in the Supplementary Data (Excel file). Links to the pattern recognition tools compared in this paper are provided in the main text.
Contact: d.hendrickx@maastrichtuniversity.nl
Supplementary information: Supplementary data are available at Bioinformatics online.
Oxford University Press