Perhaps the most prevalent scenario in science in general is reconstructing a phenomenon that is not directly observable from the data that can be measured. In biology, for example, this occurs when computing an evolutionary alignment of homologous proteins, predicting the secondary structure of a folded RNA molecule, inferring a phylogeny for a collection of taxa, recovering the regulatory network for a set of genes, or assembling the DNA sequence for a genome. These tasks are almost always modeled as optimization problems, where the optimal solution is intended to correspond to the correct reconstruction. A crucial ingredient in any such model is the objective function, whose role is to single out the correct solution as the one that maximizes or minimizes it. This objective function usually comes from a parameterized family of functions, and the correctness of the model can depend critically on the choice of parameter values. In practice, the question of how to determine the right values for a model's parameters is both difficult and ubiquitous: improved models that better reflect the underlying biology have many parameters, yet yield worse results unless those parameters are set to correct values, while painstakingly exploring the high-dimensional parameter space to find a correct setting quickly becomes infeasible. The team is looking to implement a newly discovered algorithm in the area of inverse parametric optimization that can efficiently learn correct parameter values for any problem with a linear objective function, such as the biological problems noted above. The approach readily yields efficient software for inverse shortest paths, inverse spanning trees, inverse maximum flow, inverse maximum matching, and inverse maximum branching, all of which have objective functions that are linear in their parameters, enabling efficient model learning for a multitude of applications in computer science.
The proposed work on inverse parametric optimization has extremely broad scientific impact in computer science and computational biology, as our techniques efficiently solve inverse optimization for any problem with a linear objective function. The PI will also create a new combined course that is an integral part of a new interdisciplinary degree program.
This exploratory grant developed a prototype C++ software system for efficient, fully general inverse parametric optimization of any model that is linear in its parameters. Inverse parametric optimization is the problem of finding values for the parameters of a model's objective function, given a set of example input-output pairs for the model, such that each output in the set is an optimal solution for its corresponding input under the objective function. To contrast this with standard optimization: the forward optimization problem is, given an input to the model and the model's objective function, to find an output that is optimal under the objective function; the inverse optimization problem is, given both an input and an output, to find the objective function (in the sense of specifying parameter values for a parameterized family of objective functions) that makes the output an optimal solution for its input. In the language of machine learning, this learns the parameters of the optimization model from examples of solutions to the model. The C++ system implements the algorithm of Kececioglu and Kim for general inverse parametric optimization, which is guaranteed to efficiently find optimal parameter values for any optimization model whose objective function is linear in its parameters. The algorithm uses a reduction to the optimization problem known as linear programming, and the prototype system has interfaces to the COIN-OR and CPLEX packages for solving the underlying linear programming instances. The prototype currently handles so-called complete examples, and extensions are underway to handle partial examples. One graduate research assistant, Lisa Peairs, was supported on the award.
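As a concrete illustration of the reduction to linear programming, the following is a minimal sketch of inverse shortest paths on a tiny graph, written in Python with scipy rather than the project's C++/COIN-OR/CPLEX stack. Each edge cost is linear in unknown parameters theta, and one complete example (an input graph paired with its observed shortest path) yields one linear constraint per alternative path: the example path must be no more expensive than the alternative under the learned parameters. The graph, feature vectors, and simplex normalization are illustrative assumptions, not details of the actual prototype, which does not enumerate paths explicitly.

```python
import numpy as np
from scipy.optimize import linprog

# Edge feature vectors: cost(e) = theta . feat[e], where theta are the
# unknown model parameters (features here are made up for illustration).
feat = {
    ('s', 'a'): np.array([1.0, 0.0]),
    ('a', 't'): np.array([1.0, 0.0]),
    ('s', 'b'): np.array([0.0, 1.0]),
    ('b', 't'): np.array([0.0, 1.0]),
    ('s', 't'): np.array([1.0, 1.0]),
}
adj = {}
for (u, v) in feat:
    adj.setdefault(u, []).append(v)

def st_paths(u, t, visited):
    """Enumerate all simple u-t paths as edge lists (tiny graphs only)."""
    if u == t:
        yield []
        return
    for v in adj.get(u, []):
        if v not in visited:
            for rest in st_paths(v, t, visited | {v}):
                yield [(u, v)] + rest

def path_feat(p):
    return sum(feat[e] for e in p)

# The complete example: the "correct" output path observed for this input.
example = [('s', 'a'), ('a', 't')]

# One constraint per alternative path Q:  theta . (f(P) - f(Q)) <= 0,
# i.e. the example path P is at least as cheap as Q.
A_ub, b_ub = [], []
for q in st_paths('s', 't', {'s'}):
    if q != example:
        A_ub.append(path_feat(example) - path_feat(q))
        b_ub.append(0.0)

# Normalize theta onto the simplex to rule out the trivial theta = 0;
# any feasible point makes the example path optimal for its input.
res = linprog(c=[0.0, 0.0], A_ub=A_ub, b_ub=b_ub,
              A_eq=[[1.0, 1.0]], b_eq=[1.0], bounds=[(0.0, 1.0)] * 2)
print(res.success, res.x)
```

The full algorithm avoids the exponential path enumeration used here, handling the implicit constraint set efficiently, but the learned parameters satisfy the same kind of optimality constraints shown in this sketch.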