This project studies methods for analyzing large datasets using L1 and related regularization. Coordinate descent algorithms are developed to provide entire families of solutions for regression problems with L1 penalties and with more aggressive concave penalties. Applications include generalized linear regression models for very wide datasets and structure-finding algorithms for undirected graphical models. L1 regularization is also used to develop efficient convex algorithms for finding low-rank approximations (SVDs) to extremely large, sparsely populated matrices.
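As a rough illustration of what computing an entire family of L1-penalized solutions by coordinate descent can look like, the sketch below fits a lasso path for squared-error loss. It is not the investigators' implementation. The function names `lasso_path` and `soft_threshold`, the objective scaling (1/2n)||y - Xb||^2 + lam * ||b||_1, the log-spaced grid of penalty values, and the warm-start strategy are all assumptions chosen for illustration. It does not cover generalized linear models, concave penalties, or graphical models.

```python
import numpy as np


def soft_threshold(z, gamma):
    """Soft-thresholding operator: sign(z) * max(|z| - gamma, 0)."""
    return np.sign(z) * np.maximum(np.abs(z) - gamma, 0.0)


def lasso_path(X, y, n_lambdas=20, eps=1e-3, max_iter=1000, tol=1e-8):
    """Cyclic coordinate descent for (1/2n)||y - Xb||^2 + lam * ||b||_1.

    Solves over a decreasing grid of lam values, reusing each solution as
    the starting point for the next (warm starts). Assumes X and y have
    already been centered, so no intercept is fitted (illustrative choice).
    """
    n, p = X.shape
    col_sq = (X ** 2).sum(axis=0) / n            # (1/n) * ||x_j||^2 per column
    lam_max = np.max(np.abs(X.T @ y)) / n        # smallest lam with all-zero solution
    lambdas = lam_max * np.logspace(0, np.log10(eps), n_lambdas)

    beta = np.zeros(p)
    resid = y.astype(float).copy()               # residual y - X @ beta, kept up to date
    path = np.zeros((n_lambdas, p))

    for k, lam in enumerate(lambdas):
        for _ in range(max_iter):
            max_change = 0.0
            for j in range(p):
                if col_sq[j] == 0.0:
                    continue                     # skip all-zero columns
                old = beta[j]
                # Correlation of x_j with the partial residual (coordinate j removed)
                rho = X[:, j] @ resid / n + col_sq[j] * old
                new = soft_threshold(rho, lam) / col_sq[j]
                if new != old:
                    resid -= X[:, j] * (new - old)
                    beta[j] = new
                    max_change = max(max_change, abs(new - old))
            if max_change < tol:
                break                            # converged at this lam
        path[k] = beta
    return lambdas, path
```

Each row of the returned `path` is the coefficient vector at the corresponding entry of `lambdas`, so the rows together trace how coefficients enter the model as the penalty is relaxed. Warm starts are what make the whole family cheap to compute in this sketch: each problem starts from the solution of a nearby one.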
This project develops tools with a wide variety of applications, illustrated here in medicine and merchandising. Modern genomic technologies can measure, in a few hours, half a million or more genotypes at particular locations (SNPs) along an individual's genome. Armed with such measurements on a few thousand individuals, some sick and some healthy, this project develops powerful statistical tools for identifying groups of SNPs associated with diseases such as Alzheimer's or breast cancer. Online movie renters and book buyers are often asked to rate their purchases. Although each individual sees only a minuscule fraction of the selections available, the investigators are able to develop recommender systems that exploit the overlap in ratings to learn genres of movies, assign viewers to like-minded cliques, and recommend products not yet seen.