This project will build a geometric mathematical framework for analyzing tree-shaped data, which arise in the study of disease, cancer, and medical imaging. The evolutionary histories of viruses, the relationships among tumor mutations over time, and images of lung airways and arteries are all tree-shaped. The geometry of trees must be taken into account during analysis to avoid errors and bias, but traditional statistical methods are of limited use for such data. Furthermore, improvements in technologies such as genetic sequencing are producing increasingly large and complex datasets of such trees. This project will develop new mathematical tools to understand and derive insights from tree-shaped data. An educational component will create training programs and extracurricular opportunities for undergraduates to participate in research and learn valuable statistical and technical skills. Many of the students are expected to be from low-income and underrepresented groups, thus broadening the participation of these groups in STEM industry and research.

This research will develop methods for analyzing tree data that use both the shape of the tree and the lengths of its edges in a mathematically integrated way. The majority of existing tree analysis methods focus only on the tree shape, whereas the proposed methods will be based on the underlying non-Euclidean geometric space in which the data lie. This research will contribute to the growing field of geometric statistics by providing a mathematical formulation and method of computation for bias in tree-shaped data, as well as two-sample tests validated on biologically realistic data. In scaling these statistical methods to meet the Big Data challenge, this research will continue to expand the field of computational geometry to piecewise Euclidean, non-positively curved spaces. Finally, this research will lead to a characterization of the first continuous geometric spaces for phylogenetic networks.

This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.

National Science Foundation (NSF)
Division of Mathematical Sciences (DMS)
Program Officer: Junping Wang
Research Foundation of the City University of New York (Lehman)
United States