This proposal is for research on developing population-level understanding of, and insight into, collections of tree-structured objects. That is, the goal is to analyze the variation, including variation in branching structure, in a population of data points that are trees. While this goal is statistical in nature, it is far beyond the reach of existing statistical methods. Thus an entire new area of statistical research is opened up by the work proposed here. This work is driven by a particular example data set of human brain artery trees, collected by a neurosurgeon collaborator, who has informed, and will continue to inform, the research directions chosen and the steps taken. While this motivating example is the vasculature of the human brain, the new methods developed here will have an impact in many other contexts, discussed below. The closest statistical area to the proposed research is the currently active area of Functional Data Analysis, in which the atoms of the statistical analysis are curves (instead of the more typical numbers or vectors). This abstract concept was extended to Object Oriented Data Analysis (OODA) by Wang and Marron (2007), where the atoms become more complicated objects of various types, including tree-structured objects. OODA presents a number of major new data analytic challenges. Addressing these challenges will require the development of entirely new types of statistical methods. Even simple statistical concepts, such as the population mean, are not straightforward to develop. Deeper properties, in particular the quantification of variation about the mean, are far more challenging. The results of Wang and Marron (2007) were a first pass at formulating statistical concepts in terms of optimization problems. A major limitation was that it was unclear how to compute useful solutions of these problems for realistic data sets.
Aydin et al. (2008) achieved a major breakthrough in this direction by inventing linear-time solutions to some of these apparently intractable optimization problems, bringing practical OODA of the artery tree data set within reach of modern computational facilities. The proposed work is on much deeper analyses, which require the invention of powerful new approaches to understanding variation. In particular, the current topology-only analyses will be extended to full nodal attribute data types that will enable simultaneous study of other types of variation as well (e.g. in branch thickness and location), the entirely new area of discrimination for populations of tree-structured objects will be explored, and innovations in the visualization of complex tree data objects will be made. These deeper analyses are expected to yield new anatomical results, involving symmetry and dependence on covariates such as age, that are unavailable from the simple summaries currently being used to analyze tree data.
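To make concrete the idea of a population mean defined through an optimization problem, the following is a minimal sketch under stated assumptions and not necessarily the formulation used in the cited works. It assumes a topology-only encoding in which each binary tree is a set of heap-style node labels (root 1, children of node k labeled 2k and 2k+1), and it assumes the distance between two trees is the number of nodes present in exactly one of them. Under that distance, the set of nodes appearing in more than half of the trees minimizes the total distance to the sample. The names `tree_distance`, `majority_tree` and `total_distance` are hypothetical, and the three sample trees are toy data.

```python
from collections import Counter


def tree_distance(a, b):
    """Assumed metric: number of nodes present in exactly one of the two trees."""
    return len(a ^ b)


def majority_tree(trees):
    """Nodes present in strictly more than half of the trees.

    If every input set contains the parent of each of its nodes, the
    result does too, so it is again a rooted tree.
    """
    counts = Counter(node for tree in trees for node in tree)
    return frozenset(n for n, c in counts.items() if 2 * c > len(trees))


def total_distance(candidate, trees):
    """Objective being minimized: summed distance from candidate to the sample."""
    return sum(tree_distance(candidate, tree) for tree in trees)


# Toy sample (illustrative only): heap-style labels, root is 1.
sample = [
    frozenset({1, 2, 3}),
    frozenset({1, 2, 4}),
    frozenset({1, 3, 6, 7}),
]

center = majority_tree(sample)
print(sorted(center), total_distance(center, sample))
```

The single counting pass is linear in the total number of nodes in the sample, which illustrates in a very simple case how an optimization problem over trees can admit a fast solution.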
The driving data set for this research is a collection of over 100 patients' magnetic resonance angiographic (MRA) brain artery trees. This proposed project will develop new statistical methods for extracting useful information from this collection of trees. Major new population-level insights on human brain anatomy will be targeted. The big-picture goal of this research is the development of methods for characterizing normal brain artery structure. As well as being an important scientific anatomical goal in itself, this has potential for major medical applications. For example, this work provides the potential for a vascular-based diagnosis of brain tumors (which have aberrant arterial trees). Arterial tree analysis should improve the success of current cancer treatments through earlier diagnosis than is available using current techniques. Future medical applications are expected to extend well beyond the driving problem of human brain arteries to many other types of widely studied anatomical structures, such as airways in the lung, the nervous system and various types of collection duct systems. In addition, this body of work is expected to drive new ideas in many other areas which naturally encounter trees as data objects, such as text mining (where a standard technique is the representation of grammatical structures as trees), the study of phylogenetic trees in genetics, and the analysis of social and computer networks. Finally, this work is expected to have an impact on mathematics by stimulating the development of new ideas there, e.g. in optimization and graph theory.