Peptide hormones regulate embryonic development and most physiological processes by acting as endocrine or paracrine signals. They are also a rich source of relatively safe medicines to treat both common and rare diseases. Yet finding peptide-coding genes below ~300 base pairs is inherently difficult because they lie within the noise of the genome. Recent multidisciplinary, proteophylogenomic studies in lower species, such as yeast and flies, have uncovered hundreds of new small protein-coding genes called "smORFs". In humans, recent work on the mitochondrial genome has also uncovered dozens of small peptide hormone genes called MDPs. Based on these and other studies, it is estimated that about 5% of the proteins encoded by the human nuclear genome have not yet been discovered, particularly small peptides below 100 amino acids. It is a well-documented but rarely challenged practice to discard large quantities of sequencing and proteomic data because they do not match the annotated human genome. My overarching goal is to discover the human "secretome" and make practical use of it to improve the human condition. Over the past few years, we have developed a unique pipeline of technologies that combines breakthroughs in math, computer hardware and software, proteomics, mass spectrometry, and high-throughput screening (HTS), each of which has been optimized and integrated. Our GeneFinder software modules, based on machine learning, can process data 100 times faster than traditional methods and rapidly validate small human genes using public and in-house databases of genetic and proteomic data. Using the prototype version of the platform, which finds conservation among human, chimpanzee, and macaque, we have discovered thousands of putative peptide-coding genes and validated hundreds of them.
We aim to (1) further improve the algorithm to increase its speed and accuracy, (2) improve the genome annotation for thousands of small novel genes, (3) determine their expression profiles in normal and diseased tissues, (4) explore their genetic association with disease loci, and (5) screen the first secretomic library to find hormones with novel biological and therapeutically relevant activities. The data, the software package, and libraries will be made available to the research community. In doing so, we will shed light on the dark matter of the human genome, the parts with the greatest therapeutic potential, thereby helping to steer and accelerate the pace of research and drug development for generations to come.

Public Health Relevance

There has been a rapid expansion in the use of peptide hormones as drugs over the last decade, yet new research indicates that more than 90% of all hormones in the body (encoded by an additional 5% of the human genome) remain to be discovered. Terabytes of data are discarded each week and innumerable opportunities for biological discovery are missed because, according to our findings, the majority of genes below ~300 base pairs are missing from the annotated human genome. We propose an integrated, multidisciplinary approach to find, validate, and characterize an estimated 4000-5000 new peptide-coding genes using a pioneering technology platform that combines breakthroughs in math, custom-built computer hardware and software, and wet-lab approaches, providing a far more complete roadmap for biology and medicine in the 21st century.

National Institutes of Health (NIH)
National Institute on Aging (NIA)
NIH Director’s Pioneer Award (NDPA) (DP1)
Study Section: Special Emphasis Panel (ZRG1)
Program Officer: Guo, Max
Harvard Medical School
Schools of Medicine
United States