Machine learning algorithms, deep learning approach, multi-OMICS data, biostatistical analysis, epidemiology, secure computing environment, population-based research on T2D and related risk factors
Application of machine learning models to the KORA cohort: prediction and progression of Type 2 Diabetes and related traits
The Research Unit of Molecular Epidemiology (AME) conducts cross-sectional, longitudinal, and case-control studies of diseases and intermediate phenotypes using genomics, epigenomics, transcriptomics, proteomics, metabolomics, and functional analyses. It has access to large population-based cohorts such as KORA (n~18,000) and, in the near future, the German National Cohort (NAKO; n~200,000). The challenge in finding useful predictive biomarkers is that simple clinical factors such as BMI, ethnicity, and family history, together with the easily measured biomarkers glucose and HbA1c, already predict the risk of developing Type 2 Diabetes (T2D) well. Our goal is to extend this set of predictors with further clinical and molecular phenotypes such as fasting glucose, HDL-cholesterol, serum triacylglycerol levels, estimates of β-cell function, blood pressure, diet, living environment, physical activity, and lifestyle. Ultimately, we will use high-dimensional patterns derived from multi-OMICS data to identify new subgroups of patients. The project requires the application of machine learning approaches to cross-sectional and longitudinal data in order to characterize T2D progression in the KORA cohort. The pipelines and algorithms developed will subsequently be applied, in a deep learning approach, to the German National Cohort (NAKO). The big data component arises from the large sample size of up to 200,000 persons, from the large number of genetic markers (e.g. several million genetic variants) and molecular markers (e.g. several hundred thousand epigenetic, proteomic, and metabolomic traits), and from the interactions among them. Special emphasis will be placed on implementing the algorithms in a secure computing environment that meets data protection requirements. Findings from this study will advance our understanding of how OMICS biomarkers of disease state can support more precise phenotypic characterization.