Overview: We will extend and develop implementations of foundational methods for analyzing populations of attributed connectomes. Our toolbox will enable brain scientists to (1) infer latent structure from individual connectomes, (2) identify meaningful clusters among populations of connectomes, and (3) detect relationships between connectomes and multivariate phenotypes. The methods we develop and extend will naturally overcome the challenges inherent in connectomics: high-dimensional non-Euclidean data with multi-level nonlinear interactions. Our implementations will comply with the highest open-source standards by: providing extensive online documentation and extended tutorials, hosting workshops to demonstrate our tools on an annual basis, and merging our implementations into commonly used packages such as scikit-learn [1], scipy [2], and networkx [3]. All of the code we develop is open source. We strive to ensure that our code is shared in accordance with the strictest guiding principles. We chose to implement these algorithms in Python due to its wide adoption in the neuroscience and data science fields. In particular, many other neuroscience tools applicable to connectomics, including NetworkX DiPy, mindboggle, nilearn, and nipy, are also implemented in Python. This will enable researchers to chain our analysis tools onto pre-existing pipelines for data preprocessing and visualization. Nonetheless, we feel that sharing our code in our own public repositories is insufficient for global reach. We have also begun reaching out to developers of the leading data science packages in python, including scipy, sklearn, networkx, scikit-image, and DiPy. For each of those packages, we have informal approval to begin integrating algorithms that we have developed. Those packages are collectively used by >220,000 other packages, so merging our algorithms into those packages will significantly extend our global reach. All researchers investigating connectomics, including all the authors of the 24,000 papers that mention the word ?connectome?, will be able to apply state-of-the-art statistical theory and methods to their data. Currently, we have about 150 open source software projects on our NeuroData GitHub organization. Collectively, these projects get about 2,000 downloads and >11,000 views per month. As we incorporate additional functionality as described in this proposal, we expect far more researchers across disciplines and sectors will utilize our software. 20 ? ?? ? ??
Connectomes are an increasingly important modality for characterizing the structure of the brain, to complement behavior, genetics, and physiology. We and others have developed foundational statistical theory and methods over the last decade for the analysis of networks, networks with edge, vertex, and other attributes, and populations thereof, with preliminary implementations of those tools that we leverage in our laboratory for various application papers. In this project, we will extend our package, called graspy, to be of professional quality, implementing key functionality to include (1) estimating latent structure from attributed connectomes, (2) identifying meaningful clusters among populations of connectomes, and (3) detecting relationships between connectomes and multivariate phenotypes, such as behavior, genetics, and physiology. 18