Operations Research
HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS
 QUICK SEARCH:   [advanced]


     


OPERATIONS RESEARCH
Vol. 53, No. 5, September-October 2005, pp. 780-798
DOI: 10.1287/opre.1050.0216
This Article
Right arrow Full Text (PDF)
Right arrow References
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Alert me to new issues of the journal
Right arrow Download to citation manager
Right arrow reprints & permissions
Citing Articles
Right arrow Citing Articles via HighWire
Right arrow Citing Articles via Google Scholar
Google Scholar
Right arrow Articles by Nilim, A.
Right arrow Articles by El Ghaoui, L.
Right arrow Search for Related Content

Robust Control of Markov Decision Processes with Uncertain Transition Matrices

Arnab Nilim, Laurent El Ghaoui

Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, California 94720
Department of Electrical Engineering and Computer Sciences, University of California, Berkeley, California 94720

nilim{at}eecs.berkeley.edu
elghaoui{at}eecs.berkeley.edu

Optimal solutions to Markov decision problems may be very sensitive with respect to the state transition probabilities. In many practical problems, the estimation of these probabilities is far from accurate. Hence, estimation errors are limiting factors in applying Markov decision processes to real-world problems.

We consider a robust control problem for a finite-state, finite-action Markov decision process, where uncertainty on the transition matrices is described in terms of possibly nonconvex sets. We show that perfect duality holds for this problem, and that as a consequence, it can be solved with a variant of the classical dynamic programming algorithm, the "robust dynamic programming" algorithm. We show that a particular choice of the uncertainty sets, involving likelihood regions or entropy bounds, leads to both a statistically accurate representation of uncertainty, and a complexity of the robust recursion that is almost the same as that of the classical recursion. Hence, robustness can be added at practically no extra computing cost. We derive similar results for other uncertainty sets, including one with a finite number of possible values for the transition matrices.

We describe in a practical path planning example the benefits of using a robust strategy instead of the classical optimal strategy; even if the uncertainty level is only crudely guessed, the robust strategy yields a much better worst-case expected travel time.

Subject classifications: dynamic programming: Markov, finite state; game theory; programming: convex, uncertainty; robustness; statistics: estimation.
History: Received January 2003; revision received January 2004; revision received May 2004; accepted September 2004.




This article has been cited by other articles:


Home page
Operations ResearchHome page
A. E. B. Lim and J. G. Shanthikumar
Relative Entropy, Exponential Utility, and Robust Dynamic Pricing
Operations Research, March 1, 2007; 55(2): 198 - 214.
[Abstract] [PDF]




HOME HELP FEEDBACK SUBSCRIPTIONS ARCHIVE SEARCH TABLE OF CONTENTS
Copyright © 2005 by INFORMS.