Solving LQ regulation (LQR) via dynamic programming
In the previous section we have used dynamic programming as a numerical algorithm for solving a general discrete-time optimal control problem. We now show how to use dynamic programming to solve the discrete-time LQR problem. We consider a linear discrete-time system modelled by \bm x_{k+1} = \mathbf A\bm x_k + \mathbf B\bm u_k, for which we want to minimize the quadratic cost given by J_0(\bm x_0, \bm u_0, \bm u_1, \ldots, \bm u_{N-1}) = \frac{1}{2}\bm x_N^\top \mathbf S_N \bm x_N + \frac{1}{2}\sum_{k=0}^{N-1}\left(\bm x_k^\top \mathbf Q \bm x_k + \bm u_k^\top \mathbf R \bm u_k\right), with \mathbf S_N\succeq 0, \mathbf Q\succeq 0, \mathbf R\succ 0, as usual.
We now invoke the principle of optimality, that is, we start at the end of the time interval and find the optimal cost J_N^\star(\bm x_N) = \frac{1}{2}\bm x_N^\top \mathbf S_N \bm x_N.
Actually, we have nothing to minimize here, we just evaluate the cost. We then proceed backwards in time, that is, we decrease the time to k=N-1. Here we do have something to minimize: J^\star_{N-1}(\bm x_{N-1}) = \min_{\bm u_{N-1}\in\mathbb R^m} \left[\underbrace{L(\bm x_{N-1},\bm u_{N-1}) + J^\star_{N}(\bm x_{N})}_{Q_{N-1}^\star(\bm x_k, \bm u_k)} \right], in which we recall our previously introduced notation for the function to be minimized as Q_{N-1}^\star, also called a Q-function.
The notational clash between the function Q^\star and the weighting matrix \mathbf Q is unfortunate, but the use of both symbols is so much established in this area that we will have to live with it. We have to rely on the font and the symbol of \star to distinguish between the two.
We now expand the expression for the Q-function, with the obvious motivation to solve the optimization with respect to \bm u_{N-1} analytically:
\begin{aligned}
Q_{N-1}^\star(\bm x_{N-1},\bm u_{N-1}) &= \frac{1}{2} \left(\bm x_{N-1}^\top \mathbf Q \bm x_{N-1} + \bm u_{N-1}^\top \mathbf R \bm u_{N-1} \right) + J^\star_{N}(\bm x_{N}) \\
&= \frac{1}{2} \left(\bm x_{N-1}^\top \mathbf Q \bm x_{N-1} + \bm u_{N-1}^\top \mathbf R \bm u_{N-1} \right) + \frac{1}{2}\mathbf x_N^\top \mathbf S_N \mathbf x_N\\
&= \frac{1}{2} \left( \bm x_{N-1}^\top \mathbf Q \bm x_{N-1} + \bm u_{N-1}^\top \mathbf R \bm u_{N-1} + \mathbf x_N^\top \mathbf S_N \mathbf x_N \right)\\
&= \frac{1}{2} \left[ \bm x_{N-1}^\top \mathbf Q \bm x_{N-1} + \bm u_{N-1}^\top \mathbf R \bm u_{N-1} + (\bm x_{N-1}^\top \mathbf A^\top + \bm u_{N-1}^\top \mathbf B^\top) \mathbf S_N (\mathbf A\bm x_{N-1} + \mathbf B\bm u_{N-1}) \right]\\
&= \frac{1}{2} \left[\bm x_{N-1}^\top (\mathbf Q + \mathbf A^\top\mathbf S_N \mathbf A)\bm x_{N-1} + 2\mathbf x^\top_{N-1}\mathbf A ^\top \mathbf S_N \mathbf B \bm u_{N-1} + \mathbf u^\top_{N-1}(\mathbf R + \mathbf B^\top \mathbf S_n \mathbf B)\bm u_{N-1} \right].
\end{aligned}
Since we assumed no constraint on \bm u_{N-1}, finding the minimum of Q_{N-1}^\star(\bm x_{N-1},\bm u_{N-1}) with respect to \bm u_{N-1} (while regarding \bm x_{N-1} fixed) is as easy as setting its gradient to zero \nabla_{\bm u_{N-1}} Q_{N-1}^\star(\bm x_{N-1},\bm u_{N-1}) = (\mathbf R + \mathbf B^\top \mathbf S_N \mathbf B)\bm u_{N-1} + \mathbf B^\top \mathbf S_N\mathbf A\bm x_{N-1} = \mathbf 0, which leads to \bm u_{N-1}^\star = -\underbrace{(\mathbf B^\top \mathbf S_N\mathbf B + \mathbf R)^{-1}\mathbf B^\top \mathbf S_N \mathbf A}_{\mathbf K_{N-1}} \bm x_{N-1}, which amounts to solving a system of linear equations. We can also recognize the Kalman gain matrix \mathbf K_{N-1}, which we derived using the indirect approach in the previous chapter.
The optimal cost J^\star_{N-1} can be obtained by substituting \bm u_{N-1}^\star into Q_{N-1}^\star J_{N-1}^\star = \frac{1}{2}\bm x_{N-1}^\top \underbrace{\left[(\mathbf A-\mathbf B\mathbf K_{N-1})^\top \mathbf S_N(\mathbf A-\mathbf B\mathbf K_{N-1}) + \mathbf K_{N-1}^\top \mathbf R \mathbf K_{N-1} + \mathbf Q\right]}_{\mathbf S_{N-1}} \bm x_{N-1}.
Note that the optimal cost J^\star_{N-1} is also a quadratic function of the state as is the cost J^\star_{N}. We denote the matrix that defines this quadratic function as \mathbf S_{N-1}. We do this in anticipation of continuation of this procedure to k = N-2, N-3, \ldots, which will give \mathbf S_{N-2}, \mathbf S_{N-3}, \ldots. The rest of the story is quite predictable, isn’t it? Applying the Bellman’s principle of optimality we (re)discovered the discrete-time Riccati equation in the Joseph stabilized form \boxed{ \mathbf S_k = (\mathbf A-\mathbf B\mathbf K_{k})^\top \mathbf S_{k+1}(\mathbf A-\mathbf B\mathbf K_{k}) + \mathbf K_{k}^\top \mathbf R \mathbf K_{k} + \mathbf Q,} together with the prescription for the state feedback (Kalman) gain \boxed{ \mathbf K_{k} = (\mathbf B^\top \mathbf S_{k+1}\mathbf B + \mathbf R)^{-1}\mathbf B^\top \mathbf S_{k+1} \mathbf A,} and with the expression for the optimal cost as a (quadratic) function of the initial state \boxed{ J_k^\star(\bm x_{k}) = \frac{1}{2}\bm x_k^\top \mathbf S_k \bm x_k.}
Back to top