Driving scientific questions that will be addressed by engaging with the CFDE and why it has not yet been feasible
The Library of Integrated Network-based Cellular Signatures (LINCS) program (1) collected large-scale data from human cells perturbed by thousands of single small molecules, as well as by knockouts, knockdowns, and over-expression of single genes. Diverse collections of human cells (n>50) were profiled before and after perturbation with an array of omics assays, including transcriptomics, proteomics, epigenomics, cell viability, and imaging, at different time points and with the small molecules applied at different concentrations. Altogether, over 2 million signatures are expected to be produced and provided as a community resource for query and reuse by the time the LINCS program officially ends (6/2020). Such a resource supports a wide range of applications, for example: studying molecular mechanisms of disease, repurposing existing drugs, predicting side effects and indications for pre-clinical small molecules, associating small molecules with the targets they likely affect directly and indirectly, reconstructing cell signaling and gene regulatory networks, and understanding the global space of all possible cellular states in response to all possible perturbations of all human cells. This use of LINCS resources is already happening, but it can be significantly enhanced through continued efforts led by the LINCS Data Coordination and Integration Center (DCIC) via interactions with the CFDE and other CF DCCs over the next 3 years. So far, the ~400 publications produced by the LINCS consortium have been cited by ~6,000 other papers, demonstrating the high impact of the program on the research community. In particular, the computational resources developed by the LINCS DCIC have been very successful.
These tools and databases have already been visited by >1 million unique users, currently with ~30,000 unique users per month (based on Google Analytics). These usage statistics demonstrate the value of LINCS resources and their potential for long-lasting impact on drug discovery and on the biomedical research community in general. The LINCS DCIC developed web-based resources to enable federated access, intuitive querying, and integrative analysis and visualization of the LINCS data combined with other relevant data. To achieve this, the LINCS DCIC also processed many additional external data types from other relevant resources for integration with LINCS data, including data from other Common Fund programs such as GTEx, Epigenomics Roadmap, and IMPC. However, these data integration efforts were carried out with little consideration of the community standards that ensure long-term findability, accessibility, interoperability, and reusability (FAIR) (2). Our involvement with the NIH Data Commons Pilot Project Consortium (DCPPC) and the Common Fund Data Ecosystem (CFDE) taught us many lessons on how to better achieve data harmonization by adopting community standards, and thereby the long-term sustainability of LINCS resources. Hence, by interacting with the CFDE and adhering to the requirements that the CFDE will establish, we will be able to reprocess the LINCS data, and the other data we integrate with LINCS, with transformations that improve FAIRness and enable more complex use cases. In addition, by interacting directly with other CF DCCs, we will enable the direct integration of LINCS data with other CF-generated resources. Our plan is to develop an interactive web-based data visualization component that will enable users to project RNA-seq samples (patients, single cells, or signatures) into a lower-dimensional space based on their transcriptomic profiles.
Such a visualization will be linked to the metadata describing each sample, as well as to automatically identified clusters, enrichment analysis results for each sample or cluster, and predictions of drugs and small molecules from the LINCS resource. This interactive web-based data visualization component will, for example, assist KidsFirst portal users, including physicians, in prescribing the most appropriate therapeutics to the right subtype of patients, and in tracing patients over time to monitor their response to treatment and to enable decision support for changing the treatment course early, if necessary. Finally, by moving all LINCS resources into a cloud environment through STRIDES, we will ensure that LINCS resources are archived for the long term, ensuring maximal reuse and enabling applications that are currently not even imagined or possible.

Agency
National Institutes of Health (NIH)
Institute
Office of The Director, National Institutes of Health (OD)
Project #
1OT2OD030160-01
Application #
10128125
Study Section
Special Emphasis Panel (ZOD1)
Program Officer
Resat, Haluk
Project Start
2020-09-23
Project End
2021-09-22
Budget Start
2020-09-23
Budget End
2021-09-22
Support Year
1
Fiscal Year
2020
Total Cost
Indirect Cost
Name
Icahn School of Medicine at Mount Sinai
Department
Pharmacology
Type
Schools of Medicine
DUNS #
078861598
City
New York
State
NY
Country
United States
Zip Code
10029