The Data Analysis Unit will establish a comprehensive software environment to manage all aspects of data acquisition, interpretation and visualization for the PATCH Center and will develop and release novel algorithms for carrying out these functions. In its preliminary form, the Atlas interface is envisioned to resemble Google Maps, presenting zoomable, multi-layer images with mouse-hover actions for interactive multi-view inspection of data and results from upstream analysis (prototype at http://cycif.org/PCA2018). Example actions include enumeration of similar or related cell types and spatial descriptive statistics; recovery of features and whole images by searching over clinical, measured or computed properties; examination of interactions between tumor and immune cells; and export of user selections for further offline analysis. All data will be machine readable and accessible via an API. Proposed computational models of melanoma and clonal hematopoiesis data focus on two use cases: (i) subdivision of pre-malignant lesions into subdomains and identification of the molecular features significantly associated with each, and (ii) prediction of the risk that a lesion in an individual patient will progress. Success with such models would set the stage for precision approaches to prevention or early use of drugs active against MNs in otherwise asymptomatic individuals.
Aim 1 will implement widely accepted approaches for judging statistical significance, based on the probability of detecting a true event given noisy and incomplete data, a limited number of samples and an incomplete understanding of confounders, and will apply corrections for multi-hypothesis testing.
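As one illustration of a multi-hypothesis correction, the Benjamini-Hochberg step-up procedure for controlling the false discovery rate can be sketched as follows. The proposal does not name a specific correction, so the choice of this procedure, the function name `benjamini_hochberg` and the default `alpha` are assumptions made for the example only.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return one boolean per p-value: True where the null is rejected.

    Illustrative sketch of the Benjamini-Hochberg step-up procedure;
    the name and defaults are hypothetical, not part of the proposal.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis ranked at or below max_k (step-up rule).
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected
```

Because the rule is step-up, a hypothesis can be rejected even when its own p-value exceeds its rank threshold, provided a larger-ranked p-value meets its threshold.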
Aim 2 will link existing resources, and create new ones as necessary, to ensure the integrity of Atlas data and compliance with FAIR data standards. The Unit will record metadata compliant with Open Biological and Biomedical Ontologies (OBO) and NCI Common Data Elements and manage data to FAIR standards via existing and newly developed pipelines in Python, R and MATLAB; unit testing and benchmarking will ensure data and metadata integrity and monitor the progress of specimens through all functional units in the PATCH Center.
Aim 3 will establish pipelines for data integration and reconstruction of latent properties involving cellular, morphological or disease states using complementary and interoperable tools for feature classification, discovery of significant molecular associations and predictors. The computational tasks involved include: (i) image processing, segmentation and cell type identification, (ii) discovery of multifactorial cell states using deep learning methods, (iii) reconstruction of multidimensional relationships in single-cell data using dimensionality-reducing neural networks with additional validation/inspection via t-SNE, and (iv) modeling of disease progression from temporal inferences made via Granger Causality with Hidden Markov Models.
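The Hidden Markov Model component of task (iv) can be illustrated with Viterbi decoding, which recovers the most likely sequence of hidden disease states from a series of observations. This is a minimal sketch only: it does not cover the Granger causality step, and the state names, observation labels and all probabilities below are hypothetical values chosen for the example, not parameters from the proposal.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return the most likely hidden-state path for the observations."""
    if not observations:
        return []
    # Log-probability of the best path ending in each state at time 0.
    best = {
        s: math.log(start_p[s]) + math.log(emit_p[s][observations[0]])
        for s in states
    }
    back = []  # One dict of back-pointers per time step after the first.
    for obs in observations[1:]:
        new_best, pointers = {}, {}
        for s in states:
            # Best predecessor state for arriving in state s.
            prev = max(states, key=lambda r: best[r] + math.log(trans_p[r][s]))
            new_best[s] = (
                best[prev] + math.log(trans_p[prev][s]) + math.log(emit_p[s][obs])
            )
            pointers[s] = prev
        best = new_best
        back.append(pointers)
    # Trace back from the most likely final state.
    state = max(states, key=lambda s: best[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Hypothetical two-state model of lesion status (illustrative numbers only).
STATES = ["indolent", "progressing"]
START_P = {"indolent": 0.8, "progressing": 0.2}
TRANS_P = {
    "indolent": {"indolent": 0.9, "progressing": 0.1},
    "progressing": {"indolent": 0.05, "progressing": 0.95},
}
EMIT_P = {
    "indolent": {"low": 0.8, "high": 0.2},
    "progressing": {"low": 0.3, "high": 0.7},
}

path = viterbi(["low", "low", "high", "high"], STATES, START_P, TRANS_P, EMIT_P)
```

Working in log space avoids numerical underflow on long observation series; all probabilities in this sketch are nonzero, so the logarithms are well defined.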
Aim 4 will construct Atlases by linking these tools together to create feature sets, predictors and image maps for interactive and offline visualization. Development of visualization methods will be undertaken collaboratively.