export const faq = `These genetic score models can be used to predict levels of biomolecular traits in genotyped cohorts. The predicted levels can be tested for association with complex phenotypes, which provides a useful tool for investigating the molecular underpinnings of these phenotypes. The predicted levels also allow integrative analyses with other biomolecular traits available in the cohort.


You can find a download link (named 'Download model files') on the Score page of each platform. Following the link takes you to a cloud drive page that hosts the model files of all the traits considered for the platform. The "Download" button at the top-right corner of the page lets you download all the model files in bulk. Alternatively, you can select the model file of a trait you are interested in and download that file in the same way. Please note that model files of gene expression traits can only be downloaded in bulk, as we have compressed them into a single file.

Bayesian ridge (BR), a machine learning method that works on individual-level genotype data, was used to construct the genetic scores of biomolecular traits in the Atlas. BR was selected based on the results of one of our previous studies, which benchmarked a variety of representative genetic scoring methods for the construction of numerous continuous molecular traits and demonstrated that BR was the top-performing method in terms of both efficacy and efficiency.

To ensure the generalizability of the genetic score models when applied to other cohorts, a variant filtering step was first performed for all the traits considered. This step applied a minor allele frequency (MAF) threshold of 0.5% and excluded all multi-allelic variants as well as strand-ambiguous variants (i.e. A/T and G/C). A follow-up LD thinning step was then carried out on all the variants at an r2 threshold of 0.8, to remove a certain level of LD dependency among variants and to reduce the computational burden of the genetic scoring method. For each trait, the remaining variants were then filtered at the genome-wide significance threshold of 5e-8, based on summary statistics from a GWAS conducted on the INTERVAL training samples.

The INTERVAL training samples of a trait were randomly partitioned into five equal portions. Any four of the five portions were used to learn a genetic score model of the trait with Bayesian ridge regression, and the model’s performance was then tested on the remaining portion (20% of the INTERVAL training samples) by calculating the R2 score and Spearman correlation coefficient between the predicted genetic scores and the actual trait levels of these samples.

The genetic score model trained on the INTERVAL training samples for a trait was used to calculate genetic scores for the validation samples (external cohorts or withheld INTERVAL samples). The R2 score and Spearman correlation coefficient were then calculated between the predicted scores of these samples and their actual trait levels.`