The project will continue an ongoing study of deep connections between the theory of combinatorial structures on a large finite set, such as trees, forests, mappings, and partitions, and the theory of stochastic processes such as Brownian excursion and Levy processes. Specific topics to be studied include exchangeable coalescent processes, random Gibbs partitions and fragmentations, coherent combinatorial structures, partial exchangeability, consistent systems of fragmentations, continuum tree limits, occupancy problems and power laws for random discrete distributions, and Bayesian inference. There has been a rich interplay of ideas between these subjects in the last few years, and it is expected that much more remains to be discovered as present models for the asymptotics of various combinatorial structures are challenged by problems arising in various application areas. One general theme is the rich variety of models of random discrete distributions which arise when unit of mass continuously distributed in the leaves of a continuum random tree is decomposed in various ways, such as by projection onto a branch of the tree, or by cutting with a Poisson point process of cuts along branches of the tree. Such constructions lead to models for random discrete distributions with natural interpretations in application contexts, for instance the Poisson cuts may represent mutations in a phylogenetic tree.
Results of the project should provide deeper understanding of models for the evolution of random partitions and partition-valued processes, particularly processes of fragmentation and coagulation. Such results should be of value in the numerous fields where such processes have been applied before, including physics, astronomy, genetics, phylogeny, ecology, and document analysis. In particular, the application of random discrete distributions with power law tails to model the distribution of topics among scientific documents may provide improved methods for classification and navigation of large bodies of scientific literature.
This NSF award supported research into the construction and asymptotics of various models of random trees, including continuum random tree limits, as well as path transformations of stochastic processes and models for random clustering and partitioning. Pitman and Chistopher Haulk obtained a characterization of exchangeable random hierarchies by sampling from a random real tree, which may be viewed as the analog for hierarchies of Kingman's representation of exchangeable random partitions. Pitman, Douglas Rizzolo and Matthias Winkel obtained a number of new results on the asymptotic behavior of various random fragmentation processes and associated random trees. A systematic account was provided of residual mass processes of a tagged particle in an exchangeable fragmentation process, with attention to mechanisms of selection of particles which have regenerative properties but are not fully exchangeable. This lead to a novel study of binary tree growth processes by a transition mechanism called bead-splitting which involves random discrete distributions. It was shown for a large family of bead-splitting processes that there is almost sure convergence of rescaled trees to to a self-similar continuum random tree, in the Gromov-Hausdorff-Prohorov sense. Regenerative tree growth processes were further characterized as consistent families of random trees with labelled leaves, with a regenerative property at branch points. This framework includes growth processes for exchangeably labelled Markov branching trees, as well as non-exchangeable models such as the alpha-theta model, the alpha-gamma model and all restricted exchangeable models previously studied. The main structural result is a representation of the growth rule by a sigma-finite dislocation measure kappa on the set of partitions of the natural numbers extending Bertoin's notion of exchangeable dislocation measures from the setting of homogeneous fragmentations. This representation was applied to establish necessary and sufficient conditions on the growth rule under which results by Haas and Miermont apply to establish self-similar random trees and residual mass processes as scaling limits. While previous studies exploited some form of exchangeability, these scaling limit results only require a regularity condition on the convergence of asymptotic frequencies under kappa, in addition to a regular variation condition. In a series of papers by Pitman with Nathan Ross, Josh Abramson and Geronimo Uribe Bravo, definitive results were obtained describing the convex minorants of random walks and Levy processes. These results include point process descriptions of the convex minorant of random walks and Levy processes on a fixed finite interval, up to an independent exponential time, and in the infinite horizon case. These descriptions follow from the invariance of these processes under an adequate path transformation. In the case of Brownian motion, further special properties of this process, including time-inversion, imply a sequential description for the convex minorant of the Brownian meander. A new path transform on 1-dimensional simple random walks and Brownian motion, the quantile transform, was introduced and its properties were exposed in joint work by Pitman, Noah Forman and Sami Assaf. This transformation relates to identities in fluctuation theory due to Wendel, Port, Dassios and others, and to discrete and Brownian versions of Tanaka's formula. For an n-step random walk, the quantile transform reorders increments according to the value of the walk at the start of each increment. The distribution of the quantile transform of a simple random walk of n steps was described using a bijection to characterize the number of pre-images of each possible transformed path. A surprising consquence is that both for simple random walks and for Brownian motion, that the quantile transform has the same distribution as Vervaat's transform. For Brownian motion, the quantile transforms of the embedded simple random walks converge to a time change of the local time profile, leading to a novel derivation of Jeulin's description of the local time profile of a Brownian bridge or excursion. Pitman worked with Tamara Broderick and Michael Jordan on applications of random partitions to problems in machine learning. A theory was developed of which generalizes Bayesian cluster analysis. In this generalization, called feature allocation, each data point may belong to an arbitrary, non-negative integer number of groups, now called features or topics. A notion of an "exchangeable feature probability function" (EFPF)---analogous to the exchangeable partition probability function (EPPF) in the clustering setting--- was developed for certain types of feature models. A "feature paintbox" characterization---analogous to the Kingman paintbox for clustering---was provided for a very general class of exchangeable feature models, along with a further characterization of the subclass of feature allocations that have EFPF representations.