In the past 10 years, over 21,000 genetic variants have been linked to complex human traits through genome-wide association studies (GWAS). However, the predictive power of many of these variants remains limited, and it is still unclear how best to use the wealth of information generated by GWAS to impact personal health and clinical practice. For nearly 10 years, 23andMe has not only been a driving force in direct-to-consumer genetic testing but has also established an innovative crowd-sourced genetics research platform. This platform has yielded a compelling data resource and many genetic discoveries. In this proposal, we will address the next phase of 23andMe human genetics research: the development of highly scalable and accurate disease risk estimation. Two of the key challenges in human genetics research are (1) to determine how to use the results of GWAS to paint an accurate picture of an individual's disease risk, and (2) to determine how these estimates can provide information of personal and clinical utility. These challenges are difficult due to many factors, including the wide spectrum of disease classes, the paucity of genetic and phenotypic data, and significant methodological and computational challenges. In this proposal, we present a plan to utilize the genetic and phenotypic data stores at 23andMe to "develop validated risk estimation algorithms". In Phase I, we will build a computational pipeline for developing predictive algorithms that estimate disease risk (Aim #1) and use this pipeline to evaluate the predictive ability of different estimation approaches across a broad class of complex human traits (Aim #2). In Phase II, we will validate these algorithms in external cohorts and build customer-facing reports that we will test for user comprehension. We believe that the development of accurate risk estimation capability will have a major impact on both the consumer genetics and clinical genetics markets.

Public Health Relevance

The promise of genetics-based estimation of disease risk has yet to be realized. In this project, 23andMe will build risk estimation algorithms using its database of genetic and phenotypic information from over 1,000,000 research participants, who have contributed more than 285,000,000 phenotypic data points on a wide spectrum of diseases. This project will enable 23andMe to produce the first validated risk estimation algorithms that provide both personal and clinical utility.

National Institutes of Health (NIH)
National Human Genome Research Institute (NHGRI)
Small Business Innovation Research Grants (SBIR) - Phase I (R43)
Study Section
Special Emphasis Panel (ZRG1-IMST-K (14)B)
Program Officer
Wiley, Kenneth L
23andMe, Inc.
Domestic for-Profits
Mountain View
United States