Marcelo R.P. Ferreira, Ricardo B.C. Prudencio, Wolfgang Wagner and Ivan G. Costa
Human aging is a complex process, and the prediction of biological age from biomarkers has important practical applications in fields such as forensics, disease treatment and geriatrics. Changes in DNA methylation (DNAm) have previously been observed to correlate with biological aging and cancer, and linear regression models have been widely used for age prediction from DNA methylation data. It is also known that the accuracy of age prediction from DNA methylation data is not the same across distinct tissues. Most previous works use curated data to derive DNAm signatures from single tissues. Alternatively, multi-tissue predictors have been applied to large compendia spanning distinct tissues without taking tissue source information into account. In this work, we evaluate four strategies for identifying single-tissue or multi-tissue DNA methylation signatures based on penalized regression models. Moreover, we introduce tissue-aware modelling, either by including dummy variables representing the tissue types or through a modified Sparse Group Lasso approach that is able to blend tissue-specific and non-tissue-specific signatures. We evaluate the strategies on a benchmark comprising data sets from 12 different tissue types. In most cases, tissue-aware modelling outperformed the strategies that do not use tissue information. This was particularly the case in tissues with a low number of samples. Our experimental results indicate that taking tissue source information into account improves age prediction accuracy.
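The dummy-variable form of tissue-aware modelling mentioned above can be illustrated with a minimal sketch. It is not the code used in this work: it fits a plain lasso by proximal gradient descent (ISTA) rather than the modified Sparse Group Lasso, and the data, tissue names, penalty value and helper names (soft_threshold, lasso_ista, tissue_dummies) are all assumptions made for illustration. The tissue dummies are left unpenalized, so they act as tissue-specific intercepts.

```python
import numpy as np

def soft_threshold(x, t):
    # Proximal operator of the L1 norm.
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def lasso_ista(X, y, lam, penalized, n_iter=2000):
    # Minimise (1/2n)||y - Xb||^2 + lam * sum of |b_j| over penalized columns.
    n, p = X.shape
    step = n / (np.linalg.norm(X, 2) ** 2)  # 1 / Lipschitz constant of the gradient
    b = np.zeros(p)
    for _ in range(n_iter):
        z = b - step * (X.T @ (X @ b - y) / n)
        b = np.where(penalized, soft_threshold(z, step * lam), z)
    return b

def tissue_dummies(tissues, labels):
    # One 0/1 indicator column per tissue label.
    return np.array([[1.0 if t == lab else 0.0 for lab in labels] for t in tissues])

# Synthetic data (assumed): 20 CpG sites, 3 hypothetical tissues with different offsets.
rng = np.random.default_rng(0)
n, p = 120, 20
labels = ["blood", "brain", "saliva"]
tissues = np.array(labels)[np.arange(n) % 3]
offsets = {"blood": 0.0, "brain": 10.0, "saliva": -10.0}
X = rng.uniform(0.0, 1.0, size=(n, p))  # methylation beta values
age = (50.0 + 40.0 * (X[:, 0] - 0.5) - 30.0 * (X[:, 1] - 0.5)
       + np.array([offsets[t] for t in tissues]) + rng.normal(0.0, 1.0, n))

train, test = np.arange(90), np.arange(90, n)
Xc = X - X[train].mean(axis=0)  # centre CpG columns on the training mean

# Tissue-agnostic design: CpGs plus a single unpenalized intercept.
X_agn = np.hstack([Xc, np.ones((n, 1))])
pen_agn = np.array([True] * p + [False])
b_agn = lasso_ista(X_agn[train], age[train], lam=0.1, penalized=pen_agn)

# Tissue-aware design: CpGs plus one unpenalized dummy per tissue.
X_aware = np.hstack([Xc, tissue_dummies(tissues, labels)])
pen_aware = np.array([True] * p + [False] * len(labels))
b_aware = lasso_ista(X_aware[train], age[train], lam=0.1, penalized=pen_aware)

mae_agnostic = np.mean(np.abs(X_agn[test] @ b_agn - age[test]))
mae_aware = np.mean(np.abs(X_aware[test] @ b_aware - age[test]))
print(f"MAE without tissue information: {mae_agnostic:.2f}")
print(f"MAE with tissue dummies:        {mae_aware:.2f}")
```

In this toy setting the tissue offsets are large relative to the noise, so the dummy-variable model should show a lower held-out error. Real data sets would need cross-validation to choose the penalty.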
Datasets and Code
Pre-processed data and scripts are available here. Original data is available on request.
to come …