This project will develop a software suite (Genetrix) comprising tools for data management, visualization, machine learning, statistical analysis and biological interpretation of data from large-scale biological platforms (gene expression, SNP and proteomics), in conjunction with ancillary clinical, demographic, epidemiological, laboratory and outcome data. The bioinformatics challenges of the new large-scale biotechnologies are formidable: efficient mining of biologically and clinically relevant information requires coordinated contributions from computer scientists, statisticians, mathematicians, biologists and clinicians. The potential benefits, however, are also substantial, as evidenced by the rapidly growing use of gene expression microarrays. The complexities, and the payoffs, will increase dramatically as scientists begin to integrate SNP/proteomic data with gene expression data, and there will be demand for a new generation of software to meet this challenge. Genetrix will include algorithms to pre-process and normalize raw data to reduce noise; will provide a flexible, interactive and intuitive graphical interface; will support unsupervised and supervised methods for classification and for prediction of dichotomous or survival outcomes, using appropriate statistical methods as well as proven machine learning heuristics; and will have extensive biological information integrated into the software and available directly from Web resources. The features implemented under this SBIR include input and management of SNP and protein data, haplotype block inference, tests of association of SNPs with disease in unrelated individuals, linkage analysis using genome-wide SNP arrays, and analysis of proteomics data using modified versions of the gene expression tools.