DMS 9703812: Research on Adaptive Estimation and Control of Dynamical Systems
Michael N. Katehakis and Herbert Robbins, Rutgers University

Abstract

This research involves work on adaptive control of dynamic systems. The basic dynamic model is known as the "Markov decision process with incomplete information" (MDP) problem, where the transition law and/or the expected one-period rewards may depend on unknown parameters. The most notable results in this area are based on ideas utilizing either a separation principle and the related certainty-equivalence rule, or uniformly efficient rules for the model of sequential allocation known as the multi-armed bandit (MAB) problem.

Limitations of the certainty-equivalence rule are: i) there is no claim on the rate of convergence, and ii) there are cases for which, with positive probability, this rule can prematurely converge to a wrong parameter value, so that it eventually uses only a non-optimal policy. The typical approach in the latter studies has been to fit the larger MDP model into the smaller MAB one by considering each deterministic policy as a reward-generating population (bandit). A consequence of this is that the resulting statistically efficient procedures involve sampling from all deterministic policies and do not otherwise utilize the optimization aspect of the problem. They are therefore limited in scope by data collection complexity, since in practice the state spaces of MDP models tend to be very large and the set of deterministic policies is immense.

In recent work the investigators have obtained adaptive procedures with data collection requirements that are proportional to the number of state-action pairs of the MDP, under a minimal irreducibility condition. A major direction of the proposed research involves the development of solutions for important more general problems such as i) multi-chain MDPs, ii) the case in which there are side constraints, and iii) discounted streams of rewards.
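To make the MAB model concrete, the following is a minimal sketch of a uniformly efficient index rule of the UCB type, in which each arm's sample mean is inflated by an exploration bonus that shrinks as the arm accumulates observations. It is a generic textbook illustration, not the investigators' procedure; the function names and the Bernoulli arms in the usage example are illustrative assumptions.

```python
import math
import random

def ucb1(reward_fns, horizon):
    """UCB1 index rule for a multi-armed bandit.

    reward_fns: list of zero-argument callables, each returning a
    reward in [0, 1] when its arm is pulled (means unknown to the rule).
    Returns the number of pulls of each arm after `horizon` steps.
    """
    k = len(reward_fns)
    counts = [0] * k
    sums = [0.0] * k
    # Pull each arm once so every sample mean is defined.
    for a in range(k):
        sums[a] += reward_fns[a]()
        counts[a] = 1
    for t in range(k + 1, horizon + 1):
        # Index = sample mean + exploration bonus; the bonus forces
        # every arm to keep being sampled, but only at a logarithmic rate.
        a = max(range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]))
        sums[a] += reward_fns[a]()
        counts[a] += 1
    return counts

# Usage: three hypothetical Bernoulli arms with success probabilities
# 0.2, 0.5, 0.8; the rule concentrates its pulls on the best arm.
rng = random.Random(1)
arms = [lambda p=p: 1.0 if rng.random() < p else 0.0
        for p in (0.2, 0.5, 0.8)]
pulls = ucb1(arms, horizon=5000)
```

Treating each deterministic policy of an MDP as one such arm is exactly the embedding criticized above: the number of arms equals the number of policies, which grows exponentially in the number of states.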
A second important goal is the development of new adaptive statistical methods that possess practically useful implementation and optimality properties for the related problems of error detection and change-point detection.

The main idea of adaptive control is to compute strategies (policies, or control rules) for the operation of a system that estimate the unknown parameters of the system and, in doing so, converge to a strategy that is optimal for the true values of the unknown parameters. Applications arise in many areas of modern engineering, finance, and operations research, such as reliability, maintenance, quality control, scheduling, inventory, and production planning. Consequently, this type of problem has been widely studied in the literature. However, effective procedures that take into account and optimize the speed of convergence have been obtained only recently, for specific models, and often with prohibitive data collection complexity.

A primary objective of the proposed research is the development of relatively simple adaptive control procedures with reasonable computational and memory requirements for on-line implementation, for a wide class of problems, utilizing ideas from recent work of the investigators. Another important goal is the development of new methods for specific models useful in such areas as software reliability (error detection) and quality control (change points). This research relates to the following strategic areas of national concern: high performance computing, communications, and manufacturing.
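The certainty-equivalence idea described above can be sketched as a simple loop: re-estimate the unknown transition law from the data collected so far, solve the estimated model as if the estimates were the truth, and act greedily. The following is a minimal illustration on a hypothetical 2-state, 2-action MDP with unknown transition probabilities and known rewards; it is an assumption-laden sketch of the generic rule, not the procedures proposed in this project.

```python
import random

def ce_adaptive_control(true_p, rewards, steps, gamma=0.95, seed=0):
    """Certainty-equivalence adaptive control of a 2-state, 2-action MDP.

    true_p[s][a] = unknown probability of moving to state 1 from state s
    under action a; rewards[s][a] is the known one-period reward.
    Each period: estimate the transition law from counts, solve the
    estimated discounted MDP by value iteration, and act greedily.
    Returns the visit counts of each state-action pair.
    """
    rng = random.Random(seed)
    # Start counts at (1, 2) so early estimates are well defined.
    n1 = [[1, 1], [1, 1]]   # observed transitions into state 1
    n = [[2, 2], [2, 2]]    # total observations of each (s, a)
    s = 0
    for _ in range(steps):
        # Plug-in estimate of the transition law.
        p = [[n1[si][a] / n[si][a] for a in range(2)] for si in range(2)]
        # Value iteration on the estimated model.
        v = [0.0, 0.0]
        for _ in range(200):
            v = [max(rewards[si][a]
                     + gamma * (p[si][a] * v[1] + (1 - p[si][a]) * v[0])
                     for a in range(2)) for si in range(2)]
        # Greedy action under the estimated model. Note: because the
        # rule never forces exploration, a poorly estimated action may
        # never be tried again -- the premature-convergence limitation
        # noted in the abstract.
        a = max(range(2),
                key=lambda ai: rewards[s][ai]
                + gamma * (p[s][ai] * v[1] + (1 - p[s][ai]) * v[0]))
        s_next = 1 if rng.random() < true_p[s][a] else 0
        n1[s][a] += s_next
        n[s][a] += 1
        s = s_next
    return n

# Usage with hypothetical parameters.
counts = ce_adaptive_control(
    true_p=[[0.3, 0.7], [0.6, 0.2]],
    rewards=[[1.0, 0.0], [0.0, 1.0]],
    steps=500)
```

The memory needed here grows with the number of state-action pairs, not with the number of deterministic policies, which is the kind of data collection requirement the abstract highlights.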

Agency: National Science Foundation (NSF)
Institute: Division of Mathematical Sciences (DMS)
Application #: 9703812
Program Officer: Dean M Evasius
Project Start:
Project End:
Budget Start: 1997-08-01
Budget End: 2000-07-31
Support Year:
Fiscal Year: 1997
Total Cost: $100,000
Indirect Cost:
Name: Rutgers University
Department:
Type:
DUNS #:
City: New Brunswick
State: NJ
Country: United States
Zip Code: 08901