Mega2, the Manipulation Environment for Genetic Analysis, is a unique open-source computer program that facilitates the creation of analysis-ready datasets from data gathered as part of a genetic study. Mega2 allows users to process genetic data for family-based or case/control studies transparently, accurately, and efficiently. In addition to data validation checks, Mega2 provides analysis setup capabilities for a broad choice of commonly used genetic analysis programs. It has played an important role in the genetics research community worldwide. Under this proposal, we plan to enhance the capabilities of Mega2 while applying best practices in software engineering to meet these critical needs of an open-source project: portability across OS platforms; the ability to evolve with the needs of genetics research; extensive documentation; maximal code reuse; and systematic testing, debugging, and maintenance. Specifically, we plan to make Mega2 much easier to extend, repair, and maintain by streamlining it and restructuring it as an object-oriented application, implementing an application programming interface and a graphical user interface, and creating a dynamic mechanism for adding new input formats, filters, and output formats. We plan to improve Mega2's large-scale data handling to meet the needs of the research community by adding a fast interactive mode, implementing adjustment layers for modifying and filtering original data, and supporting an intermediate pre-validated data format. We plan to extend Mega2's interoperability by supporting more input formats and analysis programs, improving connections to existing databases, and integrating Mega2 within the Galaxy workflow pipeline. We plan to continue to maintain Mega2 as a public resource by constantly revising it to support the latest versions of the supported analysis programs, carrying out rigorous and extensive quality-control testing, and maintaining and improving the documentation.
Accomplishing these aims will significantly improve Mega2's usability, data input handling, ability to set up target analyses, extensibility, maintainability, and interoperability, thus ensuring the continuing usefulness and availability of this widely used software. Mega2 has been used to accelerate gene discovery studies for a long list of complex human diseases. Improving this crucial resource will speed up the gene-discovery process, which in turn will ease the healthcare burden of complex genetic disease in the US and worldwide.

Public Health Relevance

Mega2 is used in many different studies of the genetics of complex human diseases. It enables researchers to swiftly change file formats, prepare files for statistical analysis, and process genetic data transparently, accurately, and efficiently. As such, Mega2 has markedly accelerated, and will continue to accelerate, the mapping and identification of genetic risk factors for complex human diseases. Researchers have already used Mega2 to facilitate and accelerate statistical analyses in genetic studies of a wide range of important human diseases, including Alzheimer disease, schizophrenia, coronary artery disease, type 1 diabetes, type 2 diabetes, lung cancer, prostate cancer, bipolar affective disorder, major depressive disorder, osteoarthritis, epilepsy, obesity, pain, rheumatoid arthritis, and many more.

National Institutes of Health (NIH)
National Institute of General Medical Sciences (NIGMS)
Research Project (R01)
Study Section: Special Emphasis Panel (ZRG1-BST-H (50))
Program Officer: Krasnewich, Donna M
University of Pittsburgh
Schools of Public Health
United States
Bui, Diem K; Jiang, Yingda; Wei, Xin et al. (2015) Genetic ME-a visualization application for merging and editing pedigrees for genetic studies. BMC Res Notes 8:241
Baron, Robert V; Conley, Yvette P; Gorin, Michael B et al. (2015) dbVOR: a database system for importing pedigree, phenotype and genotype data and exporting selected subsets. BMC Bioinformatics 16:91
Baron, Robert V; Kollar, Charles; Mukhopadhyay, Nandita et al. (2014) Mega2: validated data-reformatting for linkage and association analyses. Source Code Biol Med 9:26
Weeks, Daniel E; Tang, Xinyu; Kwon, Amy M (2009) Casares' map function: no need for a 'corrected' Haldane's map function. Genetica 135:305-7
Bhattacharjee, Samsiddhi; Kuo, Chia-Ling; Mukhopadhyay, Nandita et al. (2008) Robust score statistics for QTL linkage analysis. Am J Hum Genet 82:567-82
Davis, A O; O'Leary, J O; Muthaiyan, A et al. (2005) Characterization of Staphylococcus aureus mutants expressing reduced susceptibility to common house-cleaners. J Appl Microbiol 98:364-72
O'Brien, Frances G; Lim, Tien Tze; Winnett, David C et al. (2005) Survey of methicillin-resistant Staphylococcus aureus strains from two hospitals in El Paso, Texas. J Clin Microbiol 43:2969-72
O'Leary, Jessica O; Langevin, Mark J; Price, Christopher T D et al. (2004) Effects of sarA inactivation on the intrinsic multidrug resistance mechanism of Staphylococcus aureus. FEMS Microbiol Lett 237:297-302