Linear Mixed Models in Statistical Genetics



One of the goals of statistical genetics is to elucidate the genetic architecture of phenotypes (i.e., observable individual characteristics) that are affected by many genetic variants (e.g., single-nucleotide polymorphisms; SNPs). A particular aim is to identify specific SNPs that are robustly associated with a given phenotype using a so-called genome-wide association study (GWAS).

Although GWAS sample sizes have increased in recent years, the number of SNPs still tends to vastly exceed the number of individuals in a sample. Hence, ordinary multiple regression cannot be used to estimate the effects of all SNPs on a phenotype jointly, as there are more parameters than observations. Instead, the linear mixed model (LMM) has become a popular tool in statistical genetics. By placing a reasonable prior on SNP effects, LMMs can be used to jointly estimate SNP effects and to infer their contribution to phenotypic variance.
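
The following minimal sketch illustrates this point on simulated data. It is not taken from the dissertation: the dimensions, the heritability value h2, the assumption of independent normally distributed SNP effects, and all variable names are hypothetical choices made for illustration. Under that normal prior, the LMM predictor of the SNP effects coincides with a ridge estimator whose penalty equals the ratio of the residual variance to the per-SNP effect variance.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 500   # hypothetical sizes: far fewer individuals than SNPs
h2 = 0.5         # assumed share of phenotypic variance explained by SNPs

# Simulated standardized genotypes and small, normally distributed SNP effects
X = rng.standard_normal((n, p))
beta = rng.normal(0.0, np.sqrt(h2 / p), size=p)
y = X @ beta + rng.normal(0.0, np.sqrt(1.0 - h2), size=n)

# X has rank at most n < p, so X'X is singular and the least-squares
# normal equations do not have a unique solution.
rank = np.linalg.matrix_rank(X)

# A normal prior on the effects adds a penalty that makes the system solvable:
# lam = (residual variance) / (per-SNP effect variance)
lam = (1.0 - h2) / (h2 / p)
beta_hat = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Algebraically equivalent form that only requires solving an n x n system
beta_hat_dual = X.T @ np.linalg.solve(X @ X.T + lam * np.eye(n), y)

print(rank, beta_hat.shape, np.allclose(beta_hat, beta_hat_dual))
```

The second form is the one that matters when the number of SNPs is large, since it only involves an n-by-n matrix. In practice the variance components would be estimated from the data rather than fixed in advance as they are here.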

In this dissertation, I investigate several aspects of LMMs and related methods, such as ridge regression and LD-score regression. In addition, I use an LMM to develop an online tool, called MetaGAP, which quantifies the statistical power of a GWAS when there is heterogeneity across the underlying subsamples. Using MetaGAP, I show that ongoing GWAS efforts are well-powered even for considerably heterogeneous phenotypes. This prediction is bolstered by a GWAS of reproductive choices, reported here, that finds twelve robustly associated SNPs.

I conclude that current GWAS sample sizes enable researchers to uncover parts of the genetic architecture of complex social-scientific outcomes and posit that GWAS efforts will soon attain sufficient predictive accuracy for useful applications throughout the social sciences.