The broad aim of this proposal is to facilitate the structural and functional genomics of cancer.
The specific aims are to develop and apply computational tools for (i) identifying and annotating cancer-related protein sequences; (ii) prioritizing target proteins for the structural genomics of cancer; and (iii) maximizing structural information about cancer-related proteins.
The first aim will be achieved by collecting cancer-related protein sequences from The Cancer Genome Anatomy Project at NCI and by identifying additional sequences of this kind in metabolic and signaling pathway databases and in primary sequence databases. Proteins that occur in the same pathway as cancer proteins or share their regulatory patterns, proteins that interact with cancer proteins, and proteins whose expression shares features with that of cancer proteins will also be considered cancer-related. Queryable, up-to-date annotations of cancer-related proteins will be obtained by sensitive comparisons against all known protein sequences and structures. The annotations will include comparative protein structure models for all cancer-related proteins with assigned folds.
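The expansion step of the first aim can be sketched as a one-hop search outward from a seed set of known cancer proteins. This is a minimal Python illustration, not the project's actual pipeline: the function name `expand_cancer_related`, the input layouts (pathway membership as sets, interaction and co-expression data as protein pairs), and the restriction to direct neighbours of the seeds are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Pair = Tuple[str, str]


def _add_partners(pairs: Iterable[Pair], seeds: Set[str], label: str,
                  reasons: Dict[str, Set[str]]) -> None:
    # Treat each pair as undirected: a non-seed partner of a seed is added.
    for a, b in pairs:
        if a in seeds and b not in seeds:
            reasons[b].add(f"{label}:{a}")
        if b in seeds and a not in seeds:
            reasons[a].add(f"{label}:{b}")


def expand_cancer_related(seeds: Set[str],
                          pathways: Dict[str, Set[str]],
                          interactions: Iterable[Pair],
                          coexpressed: Iterable[Pair]) -> Dict[str, Set[str]]:
    """Map each cancer-related protein to the reasons it was included.

    Hypothetical one-hop expansion: only direct neighbours of the seed
    proteins are added; the procedure is not iterated to a fixed point.
    """
    reasons: Dict[str, Set[str]] = defaultdict(set)
    for p in seeds:
        reasons[p].add("seed")
    # Proteins sharing a pathway with at least one seed protein.
    for name, members in pathways.items():
        if members & seeds:
            for m in members - seeds:
                reasons[m].add(f"pathway:{name}")
    # Direct interaction partners and co-expressed proteins.
    _add_partners(interactions, seeds, "interacts", reasons)
    _add_partners(coexpressed, seeds, "coexpressed", reasons)
    return dict(reasons)


if __name__ == "__main__":
    # Toy identifiers for illustration only.
    seeds = {"P1"}
    pathways = {"pathway_A": {"P1", "P2", "P3"}, "pathway_B": {"P7", "P8"}}
    interactions = [("P1", "P4"), ("P7", "P8")]
    coexpressed = [("P5", "P1")]
    result = expand_cancer_related(seeds, pathways, interactions, coexpressed)
    for protein, why in sorted(result.items()):
        print(protein, sorted(why))
```

Recording why each protein was added keeps the expanded set queryable, which matters when a protein qualifies under more than one criterion.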
The second aim is to identify and prioritize target protein domains for the AECOM/Brookhaven/Rockefeller Structural Genomics Research Consortium (SGRC), which will focus on developing high-throughput technology for determining the structures of cancer-related proteins by X-ray crystallography and NMR spectroscopy. The target domains will correspond primarily to yeast homologs of cancer-related proteins of unknown structure. The target list will be updated dynamically to maximize the information gained from each structure determination.
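One way to read the prioritization in the second aim is as a ranking by family coverage: a new structure is most valuable when it falls in a family that has many cancer-related members and no known structure, because it can then serve as a modeling template for all of them. The sketch below assumes that scoring rule, a hypothetical `Candidate` record and `prioritize` function, and a one-domain-per-family choice (the first listed domain with a yeast homolog); none of these details are specified in the proposal.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Set


@dataclass(frozen=True)
class Candidate:
    domain: str                    # hypothetical domain identifier
    family: str                    # sequence/structure family label
    yeast_homolog: Optional[str]   # None if no yeast homolog was found


def prioritize(candidates: Iterable[Candidate],
               family_size: Dict[str, int],
               solved_families: Set[str]) -> List[str]:
    """Rank one yeast-homolog domain per family without a known structure.

    Assumed score: the number of cancer-related proteins in the family,
    i.e. how many proteins a new structure could help model.
    """
    representative: Dict[str, Candidate] = {}
    for c in candidates:
        # Skip domains lacking a yeast homolog or in already-solved families.
        if c.yeast_homolog is None or c.family in solved_families:
            continue
        representative.setdefault(c.family, c)  # keep the first listed domain
    ranked = sorted(representative.values(),
                    key=lambda cand: (-family_size.get(cand.family, 0), cand.domain))
    return [cand.domain for cand in ranked]


if __name__ == "__main__":
    # Toy data for illustration only.
    cands = [
        Candidate("d1", "fam_A", "Y1"),
        Candidate("d2", "fam_A", "Y2"),
        Candidate("d3", "fam_B", "Y3"),
        Candidate("d4", "fam_C", None),
        Candidate("d5", "fam_D", "Y5"),
    ]
    sizes = {"fam_A": 40, "fam_B": 12, "fam_C": 3, "fam_D": 8}
    solved: Set[str] = set()
    print(prioritize(cands, sizes, solved))
    solved.add("fam_A")  # suppose a structure in fam_A has now been determined
    print(prioritize(cands, sizes, solved))
```

Re-running `prioritize` after each newly solved family is added to `solved_families` gives a simple version of a dynamically updated target list.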
The third aim is to analyze the structures determined by the SGRC and use them for comparative structure modeling and comparative analysis of as many cancer-related proteins as possible. The annotation, modeling, and analysis tools will build on the MAGPIE system for automated genome annotation and on the MODELLER pipeline for large-scale comparative modeling. The annotations will be defined in the logic programming language Prolog as logical rules and relational facts, including rules that capture computed alignment data, domain definitions, and user preferences about the properties of target domains. The ability to refer simultaneously to the sequence, structure, and function of cancer-related proteins, organized into sequence and structure families, will allow cancer researchers to address questions that cannot currently be answered easily. This project will significantly increase the amount of protein structure information available to cancer biologists. The set of cancer-related proteins, together with their annotations, family memberships, and structural models, will be efficiently accessible over the web.
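The proposal defines annotations in Prolog as relational facts plus rules. To illustrate the idea, the sketch below mirrors it in Python: facts are tuples, and each rule is a function whose comment shows a corresponding Prolog clause (`\+` is Prolog's negation as failure, modelled here as set exclusion). The predicate names (`domain/3`, `cancer_related/1`, `aligned_to_pdb/3`, `has_template/1`, `target/1`), the identity cutoff, and the length preference are hypothetical, chosen only to show how computed alignment data and user preferences can combine in a target-selection rule; they are not MAGPIE's actual schema.

```python
from typing import Dict, Iterable, List, Set, Tuple

Domain = Tuple[str, str, int]        # domain(DomainId, ProteinId, Length)
Alignment = Tuple[str, str, float]   # aligned_to_pdb(DomainId, PdbId, Identity)


def has_template(alignments: Iterable[Alignment], min_identity: float) -> Set[str]:
    # has_template(D) :- aligned_to_pdb(D, _, Id), Id >= MinIdentity.
    return {d for d, _pdb, ident in alignments if ident >= min_identity}


def targets(domains: Iterable[Domain], cancer_related: Set[str],
            alignments: Iterable[Alignment], prefs: Dict[str, float]) -> List[str]:
    # target(D) :- domain(D, P, L), cancer_related(P),
    #              \+ has_template(D), L =< MaxLength.
    templated = has_template(alignments, prefs["min_identity"])
    return sorted(
        d for d, p, length in domains
        if p in cancer_related and d not in templated and length <= prefs["max_length"]
    )


if __name__ == "__main__":
    # Toy facts and assumed preference values, for illustration only.
    domains = [("d1", "p1", 120), ("d2", "p1", 450), ("d3", "p2", 200),
               ("d4", "p3", 100), ("d5", "p2", 250)]
    cancer = {"p1", "p2"}
    aligns = [("d3", "pdb_a", 0.45), ("d5", "pdb_b", 0.15)]
    print(targets(domains, cancer, aligns, {"min_identity": 0.30, "max_length": 300}))
    print(targets(domains, cancer, aligns, {"min_identity": 0.10, "max_length": 300}))
```

Changing a preference, for example lowering `min_identity`, changes which domains count as already covered by a template, and the target list follows without any edit to the rule itself.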

Agency
National Institutes of Health (NIH)
Institute
National Cancer Institute (NCI)
Type
Exploratory/Developmental Grants Phase II (R33)
Project #
1R33CA084699-01
Application #
6062384
Study Section
Special Emphasis Panel (ZCA1-SRRB-C (O2))
Program Officer
Gallahan, Daniel L
Project Start
2000-02-01
Project End
2003-01-31
Budget Start
2000-02-01
Budget End
2001-01-31
Support Year
1
Fiscal Year
2000
Total Cost
$370,060
Indirect Cost
Name
Rockefeller University
Department
Genetics
Type
Other Domestic Higher Education
DUNS #
071037113
City
New York
State
NY
Country
United States
Zip Code
10065
Pieper, Ursula; Eswar, Narayanan; Davis, Fred P et al. (2006) MODBASE: a database of annotated comparative protein structure models and associated resources. Nucleic Acids Res 34:D291-5
Chance, Mark R; Fiser, Andras; Sali, Andrej et al. (2004) High-throughput computational and experimental techniques in structural genomics. Genome Res 14:2145-54
Pieper, Ursula; Eswar, Narayanan; Braberg, Hannes et al. (2004) MODBASE, a database of annotated comparative protein structure models, and associated resources. Nucleic Acids Res 32:D217-22
Zavolan, Mihaela; Socci, Nicholas D; Rajewsky, Nikolaus et al. (2003) SMASHing regulatory sites in DNA by human-mouse sequence comparisons. Proc IEEE Comput Soc Bioinform Conf 2:277-86
Eswar, Narayanan; John, Bino; Mirkovic, Nebojsa et al. (2003) Tools for comparative protein structure modeling and analysis. Nucleic Acids Res 31:3375-80
Zavolan, Mihaela; Kondo, Shinji; Schonbach, Christian et al. (2003) Impact of alternative initiation, splicing, and termination on the diversity of the mRNA transcripts encoded by the mouse transcriptome. Genome Res 13:1290-300
Zavolan, Mihaela; van Nimwegen, Erik; Gaasterland, Terry (2002) Splice variation in mouse full-length cDNAs identified by mapping to the mouse genome. Genome Res 12:1377-85
Gunther, C S; Gaasterland, T (2001) Characterizing the relationship between protein-fusion and gene co-expression. Genome Inform 12:34-43
Gopal, S; Schroeder, M; Pieper, U et al. (2001) Homology-based annotation yields 1,042 new candidate genes in the Drosophila melanogaster genome. Nat Genet 27:337-40