SIAM News Blog

# 2013 Reid Prize Lecture

#### Some Solvable Stochastic Control Problems

Tyrone Duncan, a professor of mathematics at the University of Kansas, received the 2013 W.T. and Idalia Reid Prize for “fundamental contributions to nonlinear filtering, stochastic control, and the relation between probability and geometry.” The prize was awarded at the SIAM Annual Meeting and control conference in San Diego, where Duncan gave the prize lecture, “Some Solvable Stochastic Control Problems.” For readers who missed the talk, James Case considers highlights.

Optimal control theory began to diverge from the calculus of variations in the aftermath of World War II, when it was found that problems of the form

$\tag{P} \underset{u(\cdot)\in\mathfrak{U}}{\mathrm{minimize}}~J=K(T,x(T)) + \int^{T}_{0} L(t,x(t),u(t))\,dt\\ \text{subject to}~\dot{x}(t)=f(t,x(t),u(t)),~u(t) \in U$

offered modeling flexibility unmatched by the older theory. If $$f(t,x(t),u(t)) \equiv u(t)$$, P reduces to a problem in the calculus of variations. Here $$T$$ is the first instant at which $$x(t)$$ belongs to a specific “terminal manifold” $$\mathfrak M$$ in space–time, $$U$$ is a (frequently compact) subset of Euclidean space, and $$\mathfrak U$$ is a class of admissible control histories $$u(t);0 \leq t \leq T$$ taking values in $$U$$. Early milestones include Donald Bushaw’s Princeton thesis (1953), in which he devised a method for solving a narrow class of two-dimensional “bang-bang” control problems, L.S. Pontryagin’s multidimensional maximum principle (published in the West in 1962), and R.E. Kalman’s solution of the (also multidimensional) linear-quadratic “tracking” problem (1960). Initially, the problems considered were all deterministic. Only later was it realized that quite similar methods could be used to treat stochastic problems, in which a random process—most commonly Gaussian white noise $$W(t)$$—is included among the arguments of $$f$$.

Kalman’s linear-quadratic problems form the most exhaustively (and fruitfully) studied class of optimal control problems. In them one seeks—for a fixed termination time $$T$$—to keep the output “history” $$y(t); 0 \leq t \leq T$$ of a specific system from wandering uncomfortably far from a predetermined path $$z(t); 0 \leq t \leq T$$ without recourse to unduly costly control histories $$u(t); 0 \leq t \leq T$$. The instantaneous output $$y(t)$$ is related to the instantaneous state $$x(t)$$ of a linear system $$\dot{x}(t) = A(t)x(t) + B(t)u(t)$$ by the relation $$y(t) = C(t)x(t)$$, where $$x$$, $$y$$, and $$u$$ are column vectors of (constant) dimension $$N$$, $$n$$, and $$m$$, respectively; for each $$t \in [0,T]$$, $$A(t)$$, $$B(t)$$, and $$C(t)$$ are conformable real matrices. In this case

$\tag{1} L(t,x(t),u(t))=\\ ½ < [y(t)-z(t)], Q(t)[y(t)-z(t)]>+\\ ½ <u(t),R(t)u(t)>$

while

$\tag{2} K(T,x(T))=\\ ½<[y(T)-z(T)],M[y(T)-z(T)]>.$

The problem takes the form (P) when $$y(t)$$ is replaced by $$C(t)x(t)$$ in $$K$$ and $$L$$. The cost $$J$$ evidently grows with the vectors multiplied by the symmetric matrices $$Q(t)$$, $$R(t)$$, and $$M$$, of which $$R(t)$$ is assumed to be positive-definite for each $$t \in [0,T]$$, while $$M$$ and $$Q(t)$$ need only be positive-semi-definite. Thus, $$J$$ is invariably non-negative, vanishing only in contrived circumstances. Following Kalman, Duncan confined his remarks to the special case $$z(t) \equiv 0; C(t) \equiv I$$, so that the controller’s conflicting aims are to make both $$x(t)$$ and $$u(t)$$ small.

To obtain the complete solution of such an optimal control problem, it suffices to solve the first-order (typically nonlinear) Hamilton–Jacobi partial differential equation

$\tag{3} V_{t}+\min_{u \in \mathbb{R}^{m}}H(t,x,u,\nabla V(t,x))=0$

satisfied by the value function $$V(t,x) = \mathrm{inf}_{u(\cdot)\in\mathfrak{U}}J$$. Ordinarily, the class $$\mathfrak{U}$$ of admissible control histories $$u(t);0 \leq t \leq T$$ must include an abundance of discontinuous functions to transform the foregoing inf into an attained minimum. Here

$\tag{4} H(t,x,u,p) = L(t,x,u)+ < p, f(t,x,u)>,$

and the HJE is to be solved subject to the boundary condition

$\tag{5} V(t,x)=K(t,x)$

for all $$(t,x) \in \mathfrak{M}$$. These linear-quadratic tracking problems are by far the largest class of optimal control problems for which the foregoing program can be carried to completion. For less tractable problems, it is usually necessary either to “synthesize” the complete solution from partial solutions obtained by satisfying, for example, Euler equations or the maximum principle, or to approximate.

As Kalman demonstrated, the solution of the HJE for the linear-quadratic tracking problem is $$V(t,x) = ½ < x,P(t)x >$$, where $$P(t)$$ is the unique positive-definite symmetric solution of the matrix Riccati initial value problem (IVP)

$\tag{6} \dot{P}=-PA-A^{T}P+PBR^{-1}B^{T}P-Q\\ P(T)=M.$
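A brief sketch of where (6) comes from may be helpful. The intermediate steps below are supplied for illustration rather than taken from the lecture, and they assume the special case $$z(t) \equiv 0$$, $$C(t) \equiv I$$, with the time arguments of $$A$$, $$B$$, $$Q$$, and $$R$$ suppressed. With $$f = Ax + Bu$$, the Hamiltonian (4) is

$H(t,x,u,p) = \tfrac{1}{2}\langle x, Qx\rangle + \tfrac{1}{2}\langle u, Ru\rangle + \langle p, Ax + Bu\rangle .$

Because $$R$$ is positive-definite, the minimum over $$u$$ is attained where $$Ru + B^{T}p = 0$$, so that

$u = -R^{-1}B^{T}p, \qquad \min_{u}H = \tfrac{1}{2}\langle x, Qx\rangle + \langle p, Ax\rangle - \tfrac{1}{2}\langle p, BR^{-1}B^{T}p\rangle .$

Trying $$V(t,x) = \tfrac{1}{2}\langle x, P(t)x\rangle$$ with $$P$$ symmetric gives $$V_{t} = \tfrac{1}{2}\langle x, \dot{P}x\rangle$$ and $$p = \nabla V = Px$$. Since $$\langle Px, Ax\rangle = \tfrac{1}{2}\langle x, (PA + A^{T}P)x\rangle$$, equation (3) becomes

$\tfrac{1}{2}\langle x, (\dot{P} + PA + A^{T}P - PBR^{-1}B^{T}P + Q)x\rangle = 0 \quad \text{for all } x,$

which holds whenever $$P$$ satisfies the differential equation in (6). The boundary condition (5) with $$K(T,x) = \tfrac{1}{2}\langle x, Mx\rangle$$ supplies $$P(T) = M$$.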

Accordingly, the optimal feedback control is $$u^{*}(t) = -R^{-1}(t)B^{T}(t)P(t)x(t)$$, and, once calculated, the solution $$P(t); 0 \leq t \leq T$$ can be used repeatedly to compute any number of optimal pairs $$(x^{*}(t),u^{*}(t)); 0 \leq t \leq T$$ associated with initial states $$x(0) = x_{0}$$ of interest. Should the matrices $$A$$, $$B$$, $$C$$, $$Q$$, $$R$$, and $$M$$ be composed entirely of constants, one can set $$T = \infty$$ and employ $$u^{*}(t) = -R^{-1}B^{T}Px(t)$$, where $$P$$ is the unique positive-definite symmetric solution of the algebraic Riccati equation obtained from (6) by setting $$\dot{P} = 0$$. In that case it turns out that, under suitable conditions, the eigenvalues of the matrix $$\hat{A} = A - BR^{-1}B^{T}P$$ all have negative real parts, so that the “controlled system” $$\dot{x} = \hat{A}x$$ is asymptotically stable.
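The recipe just described can be tried numerically. The following is a minimal sketch, not anything shown in the lecture: it integrates the Riccati equation (6) backward from $$P(T) = M$$ with a classical Runge-Kutta scheme and then inspects the closed-loop eigenvalues. The double-integrator matrices, the horizon $$T = 20$$, the step count, and the function names are all assumptions chosen for illustration.

```python
import numpy as np

def riccati_rhs(P, A, B, Q, R_inv):
    """Right-hand side of (6): dP/dt = -PA - A^T P + P B R^{-1} B^T P - Q."""
    return -P @ A - A.T @ P + P @ B @ R_inv @ B.T @ P - Q

def solve_riccati_backward(A, B, Q, R, M, T, steps=4000):
    """Integrate (6) from P(T) = M back to t = 0 with classical RK4."""
    R_inv = np.linalg.inv(R)
    h = T / steps
    P = M.astype(float).copy()

    # In reversed time tau = T - t the sign of the right-hand side flips.
    def f(P):
        return -riccati_rhs(P, A, B, Q, R_inv)

    for _ in range(steps):
        k1 = f(P)
        k2 = f(P + 0.5 * h * k1)
        k3 = f(P + 0.5 * h * k2)
        k4 = f(P + h * k3)
        P = P + (h / 6.0) * (k1 + 2 * k2 + 2 * k3 + k4)
    return P

# Hypothetical example system (a double integrator), chosen for illustration.
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
M = np.zeros((2, 2))

P0 = solve_riccati_backward(A, B, Q, R, M, T=20.0)
gain = np.linalg.inv(R) @ B.T @ P0          # feedback law: u*(t) = -gain @ x(t)
A_hat = A - B @ gain                        # closed-loop matrix
closed_loop_eigs = np.linalg.eigvals(A_hat)
```

For this particular example the algebraic Riccati equation can be solved by hand, giving $$P$$ with diagonal entries $$\sqrt{3}$$ and off-diagonal entries 1, so `P0` can be compared against that matrix once the horizon is long enough for the backward solution to settle.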

Linear-quadratic models are better known for their tractability than for their ability to mimic real-world situations. Nevertheless, they are frequently employed in practice and have been generalized in countless directions. Among the most interesting generalizations are those in which the (now random) state variable $$X(t)$$ evolves according to the stochastic relation

$\tag{7} dX(t)=(AX(t)+BU(t))dt+dW(t),$

while the “objective functional” $$J$$ is replaced by its expected value $$\hat{J} = E(J)$$. It is a remarkable fact, for which Duncan provided a partial explanation, that as long as $$W$$ remains Markovian, the previous feedback control $$u^{*}(t) = -R^{-1}(t)B^{T}(t)P(t)x(t)$$ is again optimal! If, however, the noise process $$W(t)$$ is less well-behaved, the optimal control becomes

$\tag{8} u^{*}(t)=-R^{-1}(t)B^{T}(t)P(t)x(t)+\Sigma(t),$

where $$\Sigma(t)$$ is a certain linear functional (stochastic integral) of $$W(t)$$ representing the best least-squares prediction of the system’s response to future noise. Duncan has developed explicit constructions for $$\Sigma(t)$$ under various assumptions on the noise process $$W(t)$$. If the noise is standard (Gaussian) Brownian motion, for instance, the value function $$V(t,x)$$ is again well defined and satisfies the (now second-order) Hamilton–Jacobi–Bellman PDE

$\tag{9} V_t+\tfrac{1}{2}\mathrm{tr}(\nabla^{2}V(t,x))+\\ \min_{u \in \mathbb{R}^{m}}H(t,x,u,\nabla V(t,x))=0,$

along with the boundary condition $$V(t,x) = K(t,x)$$ for all $$(t,x) \in \mathfrak{M}$$. Solving for the unknown value function $$V(\cdot,\cdot)$$ is then feasible. But if $$W$$ is an arbitrary square-integrable process with continuous sample paths and a “filtration” $$\mathcal{F}(t);0 \leq t \leq T$$, other techniques are called for.
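For the Brownian case, the usefulness of the linear feedback law can be probed with a small simulation. The sketch below is illustrative only: it takes a scalar system with hypothetical coefficients $$a = 0$$, $$b = q = r = 1$$, discretizes (7) with the Euler-Maruyama scheme, and, as a simplifying assumption, uses the stationary gain from the algebraic Riccati equation rather than the time-varying $$P(t)$$. It compares the sample-average cost under feedback with the cost of applying no control at all on the same noise.

```python
import numpy as np

def simulate_cost(gain, a=0.0, b=1.0, q=1.0, r=1.0, T=5.0,
                  steps=500, paths=400, seed=0):
    """Sample average of J = int_0^T 1/2 (q x^2 + r u^2) dt along paths of
    dX = (a X + b U) dt + dW under the linear feedback U = -gain * X."""
    rng = np.random.default_rng(seed)
    dt = T / steps
    x = np.zeros(paths)        # every path starts at x(0) = 0
    cost = np.zeros(paths)
    for _ in range(steps):
        u = -gain * x
        cost += 0.5 * (q * x**2 + r * u**2) * dt
        dW = rng.normal(0.0, np.sqrt(dt), size=paths)   # Brownian increments
        x = x + (a * x + b * u) * dt + dW               # Euler-Maruyama step
    return cost.mean()

# Hypothetical scalar coefficients, chosen for illustration.
a, b, q, r = 0.0, 1.0, 1.0, 1.0

# Positive root of the scalar algebraic Riccati equation
# 0 = -2 a P + (b^2 / r) P^2 - q.
P = r * (a + np.sqrt(a**2 + b**2 * q / r)) / b**2
lqr_gain = b * P / r

cost_feedback = simulate_cost(lqr_gain)    # same seed, hence the same noise
cost_uncontrolled = simulate_cost(0.0)
```

Reusing the seed makes the two runs see identical noise increments, so the difference between the two averages reflects the control law rather than sampling luck.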

A few words on filtration: The study of stochastic differential equations, and thus of stochastic optimal control, ordinarily takes place in a measurable space $$(\Omega, \mathcal{F})$$ in which $$\mathcal{F}$$ denotes a $$\sigma$$-algebra of measurable subsets of $$\Omega$$, to be thought of as “events” that might or might not occur. A filtration $$\mathcal{F}(t);0 \leq t \leq T$$ on $$\mathcal{F}$$ is then a collection of $$\sigma$$-subalgebras of $$\mathcal{F}$$ indexed such that $$\mathcal{F}(s) \subset \mathcal{F}(t)$$ whenever $$s < t$$. $$\mathcal{F}(t)$$ is thus a collection of events, growing with the passage of time, that occur at or before time $$t$$. The requirement that $$\mathfrak{U}$$ consist of control histories $$u(t); 0 \leq t \leq T$$ that are $$\mathcal{F}(t)$$-measurable for each $$0 \leq t \leq T$$ then ensures that the elements of $$\mathfrak{U}$$ are “non-anticipative” in the sense that they do not employ as yet unavailable information.

Rather than solve more HJB equations, Duncan resorts to a variant of the age-old “complete the square” argument to establish the optimality of the feedback control  $$u^{*}(t)$$ mentioned earlier, provided that

$\tag{10} \Sigma(t)=\int^T_t \Phi_P(s,t)P(s)dW(s),$

$$\Phi_{P}$$ being the solution to the (linear) matrix IVP

$\tag{11} \dot{\Phi}_P(s,t)=-(A^{T}-P(t)BR^{-1}B^{T})\Phi_P(s,t)\\ \Phi_P(s,s)=I.$

Such solutions are often called “fundamental” solutions of (11).

Perhaps the most interesting special case of the foregoing result is one that Duncan has explored since 2006 in a series of papers with B. Pasik-Duncan, in which $$W(t)$$ is a “standard fractional Brownian motion.” A real-valued process $$(\mathbb{B}(t);t \geq 0)$$ is a standard fractional Brownian motion with Hurst parameter $$H \in (0,1)$$ if it is a Gaussian process with continuous sample paths satisfying $$E[\mathbb{B}(t)] = 0$$ and $$E[\mathbb{B}(s)\mathbb{B}(t)] = ½(t^{2H} + s^{2H} - |t - s|^{2H})$$ for all $$s,t > 0$$. The formal derivative $$d\mathbb{B}/dt$$ is then called “fractional” Gaussian noise. Fractional Brownian motion differs from standard Brownian motion in that, for $$r < s < t$$, successive increments $$\mathbb{B}(s) - \mathbb{B}(r)$$ and $$\mathbb{B}(t) - \mathbb{B}(s)$$ are no longer independent. Increments are positively correlated when the Hurst parameter satisfies $$H > ½$$ and negatively correlated when $$H < ½$$. Duncan pointed out that time series with correlated increments occur in the empirical study of turbulence, hydrology, earthquakes, telecommunications, and economic data, to name but a few, and displayed a formula analogous to the above for $$\Sigma(t)$$ involving a fractional integral if $$H > ½$$ and a fractional derivative if $$H < ½$$. Stock market crashes seem to occur when the successive increments in key financial time series cease to be independent and become positively correlated.
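The sign of the increment correlation follows directly from the covariance formula, and a short computation makes this concrete. The sketch below is an illustration with hypothetical function names: it evaluates the covariance on a grid, computes the correlation between the unit increments $$\mathbb{B}(1) - \mathbb{B}(0)$$ and $$\mathbb{B}(2) - \mathbb{B}(1)$$, which works out to $$2^{2H-1} - 1$$, and draws a sample path by Cholesky factorization of the covariance matrix. The Cholesky approach is one simple choice among several and is practical only for modest grid sizes.

```python
import numpy as np

def fbm_covariance(times, H):
    """Matrix of E[B(s)B(t)] = 1/2 (t^{2H} + s^{2H} - |t - s|^{2H})."""
    t = np.asarray(times, dtype=float)
    s = t[:, None]
    return 0.5 * (t[None, :]**(2 * H) + s**(2 * H)
                  - np.abs(t[None, :] - s)**(2 * H))

def increment_correlation(H):
    """Correlation of B(1) - B(0) and B(2) - B(1), using B(0) = 0."""
    C = fbm_covariance([1.0, 2.0], H)
    cov = C[0, 1] - C[0, 0]                   # E[B(1) (B(2) - B(1))]
    var_first = C[0, 0]                       # Var[B(1)]
    var_second = C[1, 1] - 2 * C[0, 1] + C[0, 0]   # Var[B(2) - B(1)]
    return cov / np.sqrt(var_first * var_second)

def sample_fbm(times, H, seed=0):
    """One sample path on a grid of strictly positive, distinct times."""
    C = fbm_covariance(times, H)
    L = np.linalg.cholesky(C)                 # C = L L^T
    rng = np.random.default_rng(seed)
    return L @ rng.standard_normal(len(times))

corr_rough = increment_correlation(0.25)      # H < 1/2
corr_brownian = increment_correlation(0.5)    # ordinary Brownian motion
corr_smooth = increment_correlation(0.75)     # H > 1/2

grid = np.linspace(0.02, 1.0, 50)
path = sample_fbm(grid, H=0.75)
```

Setting $$H = ½$$ recovers ordinary Brownian motion, for which the computed correlation vanishes.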

Duncan also exhibited a few extensions of linear-quadratic regulator theory to infinite-dimensional state spaces, to cases in which the objective functional is $$\hat{J} = \mathrm{exp}(\int^{T}_{0}L(t,x,u)dt)$$, and to two-player, zero-sum linear-quadratic stochastic differential games. (He did not mention many-player differential game theory, either stochastic or deterministic, as developed during the 1960s and 1970s.) Finally, he considered the control of Brownian motions in rank-one symmetric spaces, such as the (compact) two-sphere and the (noncompact) real hyperbolic space of dimension 2. In the latter, the optimal feedback control involves the hyperbolic tangent of the state variable. In each case, he was able to give specific formulas for the main objects of interest, and argued by example that, though much is known, much remains to be learned about the control of even quite simple stochastic processes.

James Case writes from Baltimore, Maryland.