Insertion of transposable elements (TEs, sometimes referred to as ?jumping genes?) into the human genome can be pathogenic.
Our aim i n this project is to use sophisticated computational approaches to characterize TE insertions in the whole-genome sequencing data generated in the Gabriella Miller Kids First Pediatric Research Program and identify any insertional mutations that may disrupt gene function. The large scale of the Kids First program provides an unprecedented opportunity to investigate the role of TE insertions in childhood cancers and structural birth defects, as well as to create a resource of reference TE maps that will be important for all other TE studies. We will first modify our existing algorithm called xTEA for the trio design of the Kids First studies and increase the accuracy and efficiency of the algorithm. Then, we will apply it to the thousands of trios that have been profiled in the Kids First program, using a pipeline optimized for the cloud environment. The resulting set of TE insertions (especially L1, Alu, SVA, and HERV insertions) will be curated with all relevant features and be made into a database for the community. We will also apply machine learning methods to improve the calls once a sufficient amount of training data have been obtained. To investigate the potential pathogenicity of the mutation, we will first focus on insertions within genes, but we will also explore those in regulatory elements inferred from epigenetic profiling data.
Transposable elements, or ?jumping genes?, are genetic elements that can alter the DNA of an individual. We aim to utilize a computational method to identify such elements in the genome sequencing data generated in the Gabriella Miller Kids First Pediatric Research Program. Our analysis will identify transposable elements that may be causal for a disease phenotype.