New, emerging, and re-emerging infectious diseases pose an ever-increasing threat to public health, with an attendant escalation of health care costs. In response, the Centers for Disease Control and Prevention (CDC) recommended a front-line strategy involving expanded use of molecular epidemiology. Yet before this strategy can be successfully implemented, multiple analytic issues in the statistical handling of the resulting data must be resolved. This study proposes to address concerns surrounding the analysis of fragment-data band patterns (DNA fingerprints) that arise from numerous genotyping techniques applied to infectious organisms. These concerns differ substantially from those in the far more developed forensic use of DNA fingerprints in humans. Although the proposed approaches focus on tuberculosis, the resulting statistical methodology will generalize across organisms and genotyping techniques. Such analytic tools are crucial to fully realizing the potential of the rich molecular epidemiologic data that are of such vital importance and, accordingly, are being widely collected.
The Specific Aims of this application are: 1) to evaluate methods for comparing microbial DNA fingerprint patterns, which includes accommodating sources of measurement error, developing and comparing similarity/distance measures, extending these measures to multiple genotyping systems and to systems where band intensity is consequential, and assessing the significance of matches between individual fingerprints and large fingerprint databases; 2) to examine the properties of statistical techniques for representing these data, including clustering and phylogenetic algorithms; and 3) to integrate these analyses with epidemiologic and clinical data in order to identify interpersonal transmission of pathogens and bacterial clones that have distinct pathogenic properties. Fragment data are currently used extensively because of their many advantages over DNA sequence data. These include technical simplicity and relatively low cost, which permit use in epidemiologic studies with large sample sizes. A disadvantage, however, is the lack of a comparable variety of statistical approaches for analyzing this type of data. This application seeks to redress this deficiency, thereby enhancing the utility of the associated molecular genotyping techniques in the collection of data for combating infectious disease.
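As an illustration of the kind of similarity measure referred to in Aim 1, the following minimal sketch computes a Dice coefficient between two band patterns while allowing for measurement error in band position. It is not the methodology of this application. The choice of the Dice coefficient, the representation of a fingerprint as a list of fragment sizes in base pairs, the greedy matching rule, the 2% relative tolerance, and the function names are all assumptions made for the example.

```python
from typing import Sequence


def count_matched_bands(a: Sequence[float], b: Sequence[float],
                        tolerance: float = 0.02) -> int:
    """Count one-to-one band matches between two fingerprints.

    Bands are fragment sizes (assumed here to be in base pairs). Two bands
    match when they differ by at most `tolerance` times the larger size.
    Matching is greedy over the sorted sizes, so each band is used once.
    """
    a_sorted, b_sorted = sorted(a), sorted(b)
    i = j = matches = 0
    while i < len(a_sorted) and j < len(b_sorted):
        x, y = a_sorted[i], b_sorted[j]
        if abs(x - y) <= tolerance * max(x, y):
            matches += 1
            i += 1
            j += 1
        elif x < y:
            i += 1  # band x has no partner in b; skip it
        else:
            j += 1  # band y has no partner in a; skip it
    return matches


def dice_similarity(a: Sequence[float], b: Sequence[float],
                    tolerance: float = 0.02) -> float:
    """Dice coefficient: 2 * (matched bands) / (total bands in both patterns)."""
    if not a and not b:
        return 1.0  # convention assumed here: two empty patterns are identical
    return 2 * count_matched_bands(a, b, tolerance) / (len(a) + len(b))


# Hypothetical fragment sizes for two isolates (illustrative values only).
isolate_1 = [1000, 2000, 3500, 5000]
isolate_2 = [1010, 2000, 4200, 5000]
print(dice_similarity(isolate_1, isolate_2))        # tolerant matching
print(dice_similarity(isolate_1, isolate_2, 0.0))   # exact matching only
```

In this example, the bands at 1000 and 1010 count as a match under the 2% tolerance but not under exact matching, which shows how the handling of measurement error changes the resulting similarity.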