This project will develop a software suite, Genetrix, composed of tools for data management, visualization, machine learning, statistical analysis, and biological interpretation of data from large-scale biological platforms (gene expression, SNP, and proteomics), together with ancillary clinical, demographic, epidemiological, laboratory, and outcome data. The bioinformatics challenges posed by the new large-scale biotechnologies are formidable: efficiently mining biologically and clinically relevant information requires coordinated contributions from computer scientists, statisticians, mathematicians, biologists, and clinicians. The potential benefits, however, are also substantial, as evidenced by the rapidly growing use of gene expression microarrays. Both the complexities and the payoffs will increase dramatically as scientists begin to integrate SNP and proteomic data with gene expression data, and there will be demand for a new generation of software to meet this challenge. Genetrix will include algorithms to pre-process and normalize raw data to reduce noise; will provide a flexible, interactive, and intuitive graphical interface; will support unsupervised and supervised methods for classification and for prediction of dichotomous or survival outcomes, using appropriate statistical methods as well as proven machine learning heuristics; and will integrate extensive biological information, both within the software and directly from Web resources. The features implemented under this SBIR include input and management of SNP and protein data, haplotype block inference, tests of association of SNPs with disease in unrelated individuals, linkage analysis using genome-wide SNP arrays, and analysis of proteomic data using modified versions of the gene expression tools.
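The abstract does not say which association test Genetrix uses for SNPs in unrelated individuals. As an illustration only, a common baseline for case-control data is the allelic chi-square test on a 2x2 table of allele counts. The sketch below assumes a biallelic SNP with genotypes coded 0/1/2 (copies of allele A); the function names `allele_counts` and `allelic_chi_square` are hypothetical and not part of any Genetrix interface.

```python
import math


def allele_counts(genotypes):
    """Hypothetical helper: convert 0/1/2 genotype codes (copies of allele A)
    into (count of A, count of a); each individual contributes two alleles."""
    a_count = sum(genotypes)
    return a_count, 2 * len(genotypes) - a_count


def allelic_chi_square(case_counts, control_counts):
    """Hypothetical sketch of a Pearson chi-square test (1 df, no continuity
    correction) on a 2x2 table: rows are cases/controls, columns are alleles."""
    table = [list(case_counts), list(control_counts)]
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    if 0 in row_totals or 0 in col_totals:
        # A monomorphic SNP or an empty group gives zero expected counts.
        raise ValueError("table has an empty row or column")
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


if __name__ == "__main__":
    # Illustrative counts only: 100 alleles per group.
    stat, p = allelic_chi_square((60, 40), (40, 60))
    print(f"chi2 = {stat:.2f}, p = {p:.4g}")
```

Treating each allele as an independent observation assumes Hardy-Weinberg equilibrium in the combined sample; when that is doubtful, a genotype-based test such as the Cochran-Armitage trend test is a common alternative.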

Agency: National Institutes of Health (NIH)
Institute: National Human Genome Research Institute (NHGRI)
Type: Small Business Innovation Research Grants (SBIR) - Phase II (R44)
Project #: 2R44HG002696-02
Application #: 6835032
Study Section: Special Emphasis Panel (ZRG1-SSS-Y (10))
Program Officer: Bonazzi, Vivien
Project Start: 2002-11-01
Project End: 2006-08-31
Budget Start: 2004-09-03
Budget End: 2005-08-31
Support Year: 2
Fiscal Year: 2004
Total Cost: $586,927
Organization: Epicenter Software
DUNS #: 198000077
City: Pasadena
State: CA
Country: United States
Zip Code: 91106