Biological and biomedical sciences are undergoing a major revolution as new experimental approaches, such as high-throughput DNA sequencing, DNA microarray, whole genome location, and single nucleotide polymorphism (SNP) techniques, are yielding unprecedented amounts of genetic data. The exploration of this information is critically dependent upon the development of advanced computational methods for data analysis. Since the recent completion of the Human Genome Project, the focus of Computational Biology (or Bioinformatics) has shifted to topics that are more directly related to biological functions, i.e. computational problems that arise in functional genomics and proteomics.
In this project, the PI will study some new algorithmic problems that aim at addressing three important questions in computational biology: (i) how to infer haplotype configurations from genotype data based on the Mendelian law of inheritance for a given pedigree, (ii) how to resolve missing values in cluster analysis for oligonucleotide fingerprinting, and (iii) how to assign NMR peaks to individual amino acids. The first question is fundamental to the fine-mapping of genetic diseases using markers such as microsatellites and SNPs. Question (ii) arises in the analysis of discretized oligonucleotide fingerprints from DNA array experiments and has important applications in the classification of microbial communities. Question (iii) represents a crucial step in NMR-based protein structure determination.
Although the above algorithmic problems are relatively new in the literature (in fact, some of them were recently introduced by the PI in collaboration with experimentalists), they have strong ties to well-known combinatorial optimization problems such as bipartite matching, interval scheduling, graph clique partition (or coloring), and set cover. Therefore, their solutions could also be of interest to the general algorithms and combinatorial optimization communities.