About Omics Data

The GS cohort is a rich, world-leading multi-omics resource. Genome-wide genotyping data are available for 83% of participants, DNA methylation data for 79%, and proteomic data for 82%.

DNA from over 20,000 participants has been analysed by high-density genome-wide chip genotyping, using the Illumina OmniExpress SNP GWAS chip (700K) and exome chip (250K), with low failure and high call rates. Genetic profiles have been imputed using three different reference panels: 1000 Genomes, the Haplotype Reference Consortium and Trans-Omics for Precision Medicine. Quality control analyses were performed, and data were cleaned using quality scores and the proportions of genotypes successfully typed. Population stratification was assessed by principal component analysis and by imputing all data to the 1000 Genomes data set.
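Cleaning by "proportion typed" is usually a call-rate filter. The following is a minimal sketch of that idea, not the GS pipeline: the function names and the 0.98 thresholds are hypothetical, and the source does not state which cut-offs were applied.

```python
from typing import List, Optional, Sequence, Tuple

# 0, 1 or 2 copies of the alternate allele; None means the genotype was not called.
Genotype = Optional[int]


def call_rate(calls: Sequence[Genotype]) -> float:
    """Return the fraction of non-missing genotype calls."""
    if not calls:
        return 0.0
    return sum(c is not None for c in calls) / len(calls)


def filter_by_call_rate(
    matrix: List[List[Genotype]],
    min_sample_rate: float = 0.98,  # hypothetical threshold, not from the source
    min_snp_rate: float = 0.98,     # hypothetical threshold, not from the source
) -> Tuple[List[int], List[int]]:
    """Drop low-call-rate samples first, then low-call-rate SNPs.

    matrix has one row per sample and one column per SNP.
    Returns the indices of the samples and SNPs that pass.
    """
    kept_samples = [
        i for i, row in enumerate(matrix) if call_rate(row) >= min_sample_rate
    ]
    n_snps = len(matrix[0]) if matrix else 0
    kept_snps = [
        j
        for j in range(n_snps)
        if call_rate([matrix[i][j] for i in kept_samples]) >= min_snp_rate
    ]
    return kept_samples, kept_snps
```

Samples are filtered before SNPs so that a single failed sample does not drag down the per-SNP call rates; real pipelines often iterate or use different orderings.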

Pedigrees were constructed using relationship information provided by study participants and validated with genetic kinship information following genotyping. The cohort contains 1,361 singletons (with no relatives in the study) and 5,501 families of at least two people, with a mean size of 4.1 family members. Sample identity was verified against recorded gender and pedigree, and data were checked for unknown relationships based on estimated identity-by-descent.
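Checking reported pedigrees against estimated identity-by-descent (IBD) can be sketched as follows. This is an illustration under stated assumptions, not the GS procedure: the input is assumed to be a pairwise IBD proportion (often called pi-hat), the function names are hypothetical, and the cut-offs are the geometric midpoints between the expected values for each degree of relatedness (1, 0.5, 0.25, 0.125), which the source does not specify.

```python
from typing import Iterable, List, Tuple


def degree_from_pi_hat(pi_hat: float) -> str:
    """Map an estimated IBD proportion to a coarse relationship class.

    Cut-offs are assumptions (geometric midpoints between expected values).
    """
    if pi_hat > 0.708:
        return "duplicate/MZ twin"
    if pi_hat > 0.354:
        return "first-degree"
    if pi_hat > 0.177:
        return "second-degree"
    if pi_hat > 0.0884:
        return "third-degree"
    return "unrelated"


def flag_discrepancies(
    pairs: Iterable[Tuple[str, str, str, float]],
) -> List[Tuple[str, str, str, str]]:
    """Return pairs whose reported relationship disagrees with the genetic one.

    Each input tuple is (id_a, id_b, reported_degree, pi_hat).
    Each output tuple is (id_a, id_b, reported_degree, inferred_degree).
    """
    flagged = []
    for id_a, id_b, reported, pi_hat in pairs:
        inferred = degree_from_pi_hat(pi_hat)
        if inferred != reported:
            flagged.append((id_a, id_b, reported, inferred))
    return flagged
```

A pair reported as unrelated but inferred as second-degree would be flagged as a previously unknown relationship and reviewed against the pedigree.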

DNA methylation (DNAm) data have been generated from blood samples using the Illumina HumanMethylationEPICv1 BeadChip array, which measures more than 850,000 CpG sites.

Protein levels have been quantified in plasma samples from 1,065 participants using the SOMAscan V.4 array from SomaLogic. Tandem mass spectrometry has been performed on peripheral blood mononuclear cells from 860 participants. Liquid chromatography mass spectrometry (LC-MS) proteomics data, with measures for 325 proteins, are available for 18,826 participants.

Quantification of 54 urinary metabolite biomarkers in 2,743 GS participants has been conducted by Nightingale Health using nuclear magnetic resonance.

GS has contributed to multiple genome-wide association study (GWAS) meta-analyses. Summary statistics and polygenic scores with GS data excluded ("Leave-One-Out") have been calculated for select phenotypes (major depression and schizophrenia from the Psychiatric Genomics Consortium). 
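The "Leave-One-Out" idea can be illustrated with a small meta-analysis sketch. It assumes a fixed-effect, inverse-variance-weighted meta-analysis of per-cohort effect estimates for a single variant; the source does not state which method the consortia used, and the cohort names and numbers below are invented for illustration only.

```python
import math
from typing import Dict, Optional, Tuple


def ivw_meta(
    estimates: Dict[str, Tuple[float, float]],
    exclude: Optional[str] = None,
) -> Tuple[float, float]:
    """Fixed-effect inverse-variance-weighted meta-analysis.

    estimates maps a cohort name to (beta, standard_error).
    If exclude is given, that cohort is left out before pooling.
    Returns the pooled (beta, standard_error).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for cohort, (beta, se) in estimates.items():
        if cohort == exclude:
            continue
        weight = 1.0 / (se * se)  # inverse-variance weight
        weighted_sum += weight * beta
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("no cohorts left to pool")
    return weighted_sum / weight_total, math.sqrt(1.0 / weight_total)


# Invented per-cohort estimates for one variant (illustration only).
example = {
    "GS": (0.10, 0.05),
    "cohort_A": (0.20, 0.10),
    "cohort_B": (0.00, 0.10),
}
all_beta, all_se = ivw_meta(example)
loo_beta, loo_se = ivw_meta(example, exclude="GS")
```

Summary statistics computed without GS avoid sample overlap when polygenic scores derived from them are then tested in GS participants.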

| Omics data | N (%) | Analysis tool |
|---|---|---|
| Genotype data | 20,026 (83%) | Illumina HumanOmniExpressExome8V.1-2_A; HumanOmniExpressExome-8V.1_A; Beadstudio-Gencall V.3 |
| Methylation | 19,062 (79%) | Illumina HumanMethylationEPIC BeadChip array |
| Proteomics | 1,065 (4%) | SOMAscan V.4 array |
| Proteomics | 18,826 (78%) | Liquid chromatography mass spectrometry |
| Metabolomics | 2,743 (11%) | Nuclear magnetic resonance |