The unprecedented progress in technologies for generating genomic data has created an imbalance: analyzing these data, rather than generating them, is now becoming the bottleneck. Common methods in the statistician's toolbox often falter on these datasets, which are massive not only in the number of data points but also in the dimension of the parameters to be estimated. Each of the four projects will face these challenges, and it will be the responsibility of Core C to collaborate with project researchers in developing novel computational methods and tools that scale well. For example, Project 1 will rely heavily on MCMC and high-dimensional regression. Fitting these statistical models entails a massive number of iterations, so developing innovative approaches such as data-parallel algorithms for Graphics Processing Units (GPUs) will be a critical activity of the core. Other projects involve extensive simulations that explore a constellation of model parameterizations, assumptions about disease effects, false discovery rates, and other settings. To this end, we will streamline such processes with reusable code that can be easily tailored to specific simulation experiments.
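As an illustration of what "reusable code that can be easily tailored" might look like, the sketch below separates a generic parameter-grid driver (`run_grid`) from a single-replicate model (`simulate_once`), so a new experiment only needs to swap in its own replicate function. All names here are hypothetical. The generative model (independent z-statistics, a fraction `pi1` of tests carrying a true effect of size `effect`) and the choice of the Benjamini-Hochberg procedure for controlling the false discovery rate are assumptions made for illustration, not the Core's actual methods.

```python
import random
from dataclasses import dataclass
from itertools import product
from statistics import NormalDist


@dataclass(frozen=True)
class Scenario:
    """One hypothetical model parameterization (hashable, so usable as a dict key)."""
    n_tests: int   # number of hypotheses tested per replicate (e.g., variants)
    pi1: float     # assumed fraction of tests with a true (disease) effect
    effect: float  # assumed mean shift of the z-statistic for true effects


def benjamini_hochberg(pvals, alpha):
    """Return the set of indices rejected by the BH step-up procedure at level alpha."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0  # largest rank whose p-value falls under its BH threshold
    for rank, i in enumerate(order, start=1):
        if pvals[i] <= alpha * rank / m:
            k = rank
    return set(order[:k])


def simulate_once(s, alpha, rng):
    """One replicate under an assumed z-test model; returns the false discovery proportion."""
    norm = NormalDist()
    is_true = [rng.random() < s.pi1 for _ in range(s.n_tests)]
    z = [rng.gauss(s.effect if t else 0.0, 1.0) for t in is_true]
    p = [2.0 * (1.0 - norm.cdf(abs(zi))) for zi in z]  # two-sided p-values
    rejected = benjamini_hochberg(p, alpha)
    if not rejected:
        return 0.0  # by convention, FDP is 0 when nothing is rejected
    false_hits = sum(1 for i in rejected if not is_true[i])
    return false_hits / len(rejected)


def run_grid(n_tests_list, pi1_list, effect_list, alpha=0.05, reps=200, seed=1):
    """Sweep every parameter combination and report the empirical FDR (mean FDP)."""
    rng = random.Random(seed)  # single seeded RNG for reproducible sweeps
    results = {}
    for n, pi1, eff in product(n_tests_list, pi1_list, effect_list):
        s = Scenario(n, pi1, eff)
        fdps = [simulate_once(s, alpha, rng) for _ in range(reps)]
        results[s] = sum(fdps) / reps
    return results


if __name__ == "__main__":
    for scenario, fdr in run_grid([200, 1000], [0.0, 0.1], [2.0, 3.0]).items():
        print(scenario, round(fdr, 3))
```

Tailoring the sketch to a different experiment would mean replacing `simulate_once` (for example, with a model of genotype-phenotype association) while keeping the grid driver and reporting unchanged. Because replicates are independent, the inner loop is also a natural candidate for the data-parallel or multi-process execution the Core intends to develop.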
The High Performance Computing and Simulations Core (Core C) will create simulation pipelines and high-performance software libraries, and will assist project investigators with implementation. In addition, the Core will develop new user-friendly web applications that allow users to quickly deploy and test new simulations.