Many people are interested in their ancestry. Traditionally, historical records have been the main source of information about one's ancestry. In the age of genomics, analyzing one's genome is becoming the most popular form of ancestry testing, and companies now offer such tests to millions of customers. The spread of DNA ancestry tests lets people learn not only about themselves but also about their ancestors. Interesting questions have been raised about how much one can learn about one's recent ancestors from one's own DNA. Recently, the investigator has been working with a population geneticist on developing computational methods for inferring recent ancestors from an individual's genome. Imagine that a little girl, Alice, has both European and Native American heritage and wants to know something about her recent ancestors. Existing commercial ancestry tests provide estimates of the genetic composition of Alice's genome (e.g., the percentage of Alice's DNA, or the long genomic segments, that can be traced to Native American origin). The problem considered by the investigator and his collaborators, on the other hand, concerns the inference of the genetic composition of Alice's recent ancestors from her genome alone. That is, the investigator aims to answer questions such as "I only have my own genome, but I want to know about my recent ancestors. Are my parents each 50%-50% European and Native American? Or is one unadmixed European and the other unadmixed Native American? What about my grandparents?" Such questions have not been rigorously addressed in the literature before, even though they may be of interest to both geneticists and consumers of genetic tests.
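As a minimal illustration of why Alice's question goes beyond an overall composition estimate, assume that each parent contributes half of a child's autosomal genome, so that the child's expected fraction from a source population is the average of the two parents' fractions. Under that assumption, the two parental configurations in the question give the same expected genome-wide fraction. The function name and scenario labels below are hypothetical and are not part of PedMix.

```python
from fractions import Fraction


def expected_child_fraction(mother, father):
    """Expected share of the child's autosomal genome from one source
    population, given each parent's share (assumption: each parent
    contributes half of the child's genome in expectation)."""
    return (Fraction(mother) + Fraction(father)) / 2


# Hypothetical scenarios: each value is a parent's Native American fraction.
scenarios = {
    "both parents 50%-50% admixed": (Fraction(1, 2), Fraction(1, 2)),
    "one unadmixed European, one unadmixed Native American": (Fraction(0), Fraction(1)),
}

for label, (mother, father) in scenarios.items():
    share = expected_child_fraction(mother, father)
    print(f"{label}: expected child fraction = {float(share)}")
```

Both scenarios yield the same expected value, so a single genome-wide percentage cannot, in expectation, separate them; telling them apart requires information beyond that percentage.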
To address these questions, the investigator will develop new computational methods for DNA-based ancestry inference. The investigator plans to build on his recent research on this subject, the PedMix approach, which can infer the ancestry of recent ancestors (e.g., parents and grandparents) from a focal individual's genome. At present, PedMix is the only publicly available method for inferring recent ancestors from a single individual's genome. In this project, the investigator plans to conduct research on the general subject of ancestry inference. The first objective is to improve the performance of PedMix so that it produces more accurate inference results, which would make PedMix more applicable and practical for real genetic tests. The second objective is to develop ancestry inference methods that can learn ancestry information for more distant ancestors. At present, due to computational difficulty, PedMix can infer ancestry only as far back as great-grandparents. Finally, this project also aims to study new ancestry inference formulations that have not been rigorously studied before. The key technical aspect of this project is computational efficiency. Successful completion of the research in this project will produce new efficient and accurate algorithms that are implemented in practical software tools and enable new kinds of ancestry inference from large-scale genomic data. The software tools developed will be made freely available to the multidisciplinary research community and are expected to enable novel biological applications in DNA-based ancestry inference.
This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.