In the past 10-15 years there have been significant developments in applied regression and classification. Much of the impetus originally came from outside the field of statistics, from areas such as computer science, machine learning and neural networks. In some cases, statisticians have helped to synthesize these innovations, often by reinterpreting them from the point of view of standard statistical models. As a result, we now have at our disposal a very powerful collection of techniques for adaptive regression and classification. These are now being applied to medical diagnosis, bioinformatics and genetic modeling, chemical process control, shape, handwriting, speech and face recognition, financial modeling, and a wide range of other important practical problems. In this proposal, the investigator develops new methods for adaptive regression and classification, with particular application to problems in bioinformatics and genomics. One such method is the "fused lasso", a technique that seeks solutions that are both sparse and smooth. It is designed for problems in which the features have a natural ordering, e.g. in time or space. Protein mass spectrometry represents one such area of application.
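To make "sparse and smooth" concrete, the following minimal sketch evaluates a fused-lasso-style objective in its commonly written penalized form: a squared-error loss, plus an L1 penalty on the coefficients (sparsity), plus an L1 penalty on the differences between neighbouring coefficients (smoothness along the feature ordering). The function name `fused_lasso_objective` and the parameter names `lam_sparse` and `lam_smooth` are illustrative assumptions, not taken from the proposal, and the sketch only scores a candidate solution; it does not solve the optimization problem.

```python
def fused_lasso_objective(X, y, beta, lam_sparse, lam_smooth):
    """Evaluate a penalized fused-lasso-style objective (illustrative only).

    X          : list of rows, each a list of ordered feature values
    y          : list of responses, one per row of X
    beta       : candidate coefficients, in the same order as the features
    lam_sparse : weight on sum |beta_j|              (hypothetical name)
    lam_smooth : weight on sum |beta_j - beta_{j-1}| (hypothetical name)
    """
    # Squared-error loss of the linear fit.
    rss = 0.0
    for row, target in zip(X, y):
        prediction = sum(x * b for x, b in zip(row, beta))
        rss += (target - prediction) ** 2

    # L1 penalty on the coefficients: favours many exact zeros.
    sparsity = sum(abs(b) for b in beta)

    # L1 penalty on successive differences: favours neighbouring
    # coefficients being equal, i.e. piecewise-constant solutions.
    smoothness = sum(abs(b1 - b0) for b0, b1 in zip(beta, beta[1:]))

    return 0.5 * rss + lam_sparse * sparsity + lam_smooth * smoothness


# Toy comparison: both candidates have two non-zero coefficients,
# but the first places them next to each other.
X = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
y = [1, 1, 0]
blocky = fused_lasso_objective(X, y, [1, 1, 0], 1.0, 2.0)
jagged = fused_lasso_objective(X, y, [1, 0, 1], 1.0, 2.0)
```

In this toy case the difference penalty is what separates the two candidates on the penalty side: the coefficient vector whose non-zero entries are adjacent pays for one jump, while the alternating one pays for two. That is the sense in which the ordering of the features (for example, positions along a mass spectrum) matters.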
In this research, the investigator develops new statistical algorithms for analyzing data from biological and human experiments. This work is part of the exploding field of "bioinformatics", which seeks to make sense of the huge volume of data produced by new technologies in biology and medicine. This kind of work represents the next step after the sequencing of the human genome and has enormous potential for societal impact through disease prevention and treatment.