Solving LQ tracking (LQT) via dynamic programming

Warning: LQT is not a standard abbreviation

We use the abbreviation LQT for LQ tracking. Admittedly, in contrast to LQR, the abbreviation LQT is not commonly used and widely accepted. But it does not seem to be reserved for anything else yet, so why not use it?

As usual, we consider a discrete-time LTI system modelled by \bm x_{k+1} = \mathbf A\bm x_k + \mathbf B\bm u_k, \tag{1} but this time we want to minimize the quadratic cost given by \begin{aligned} J_0(\bm x_0, \bm u_0, \bm u_1, \ldots, \bm u_{N-1}) &= \frac{1}{2}(\bm x_N-\mathbf x_N^\mathrm{ref})^\top \mathbf S_N (\bm x_N-\mathbf x_N^\mathrm{ref})\\ &\qquad + \frac{1}{2}\sum_{k=0}^{N-1}\left[(\bm x_k-\mathbf x_k^\mathrm{ref})^\top \mathbf Q (\bm x_k-\mathbf x_k^\mathrm{ref}) + (\bm u_k-\mathbf u_k^\mathrm{ref})^\top \mathbf R (\bm u_k-\mathbf u_k^\mathrm{ref})\right], \end{aligned} \tag{2} where \{\mathbf x_k^\mathrm{ref}\}_{k=0}^N and \{\mathbf u_k^\mathrm{ref}\}_{k=0}^{N-1} are prescribed reference state and control trajectories, respectively, and we impose the usual assumptions on the symmetric weighting matrices: \mathbf S_N\succeq 0, \mathbf Q\succeq 0, \mathbf R\succ 0.
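To make the problem data concrete, here is a minimal Python/NumPy sketch that evaluates the cost in Eq. 2 for given state and control trajectories. All numerical values and names (A, B, Q, R, SN, N, lqt_cost) are illustrative assumptions, not taken from the text.

```python
import numpy as np

# Illustrative (hypothetical) data: a double-integrator-like system and weights.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.005],
              [0.1]])
Q  = np.diag([10.0, 1.0])   # state weighting, positive semidefinite
R  = np.array([[0.1]])      # input weighting, positive definite
SN = np.diag([20.0, 2.0])   # terminal weighting, positive semidefinite
N  = 50                     # horizon length


def lqt_cost(x, u, x_ref, u_ref):
    """Evaluate the tracking cost (2) for a state trajectory x (N+1 samples)
    and an input trajectory u (N samples)."""
    dxN = x[N] - x_ref[N]
    J = 0.5 * dxN @ SN @ dxN
    for k in range(N):
        dx = x[k] - x_ref[k]
        du = u[k] - u_ref[k]
        J += 0.5 * (dx @ Q @ dx + du @ R @ du)
    return J
```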

Here we assume that the reference state and control trajectories are compatible with the system, that is, that they satisfy \mathbf x_{k+1}^\mathrm{ref} = \mathbf A\mathbf x_k^\mathrm{ref} + \mathbf B\mathbf u_k^\mathrm{ref}. \tag{3}

This will naturally be the case when such trajectories are obtained as a solution to an optimal control problem, perhaps by using direct methods. Since such a pair of control and state trajectories only constitutes an open-loop control strategy, we are now after a feedback control strategy that will track the reference trajectories.
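As a sketch of this compatible case, one can simply pick an arbitrary input reference and roll the model forward according to Eq. 3; the function below (hypothetical, continuing the NumPy setup above) does exactly that.

```python
def compatible_reference(x0_ref, u_ref):
    """Roll the model (3) forward so that the state reference is, by
    construction, compatible with the chosen input reference."""
    x_ref = [np.asarray(x0_ref, dtype=float)]
    for k in range(len(u_ref)):
        x_ref.append(A @ x_ref[k] + B @ u_ref[k])
    return np.array(x_ref)

# Example: an arbitrary open-loop input reference and the induced state reference.
u_ref = [np.array([np.sin(0.1 * k)]) for k in range(N)]
x_ref = compatible_reference([0.0, 0.0], u_ref)

print(lqt_cost(x_ref, u_ref, x_ref, u_ref))  # exactly zero when tracking perfectly
```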

ImportantWhat if the reference trajectory is unrelated to the system dynamics?

It is also fairly common that the reference trajectory \{\mathbf x_k^\mathrm{ref}\}_{k=0}^N was not generated by the model of the system but was obtained from some other source. In this case, the procedure that we are about to describe has to be modified.

Thanks to the fact that the state-control pair of trajectories satisfies Eq. 3, we can subtract \mathbf x_{k+1}^\mathrm{ref} from both sides of Eq. 1 to get \underbrace{\bm x_{k+1} - \mathbf x_{k+1}^\mathrm{ref}}_{\delta \bm x_{k+1}} = \mathbf A (\underbrace{\bm x_k- \mathbf x_{k}^\mathrm{ref}}_{\delta \bm x_k}) + \mathbf B(\underbrace{\bm u_k - \mathbf u_{k}^\mathrm{ref}}_{\delta \bm u_k}), which we can interpret – and we have already done so in the above equation – as a linear model in deviation (from the reference) variables.
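A quick numerical sanity check of this deviation model (reusing the hypothetical data and reference from the sketches above): whatever input we apply, the deviation predicted by the deviation model matches the true deviation of the state exactly.

```python
# If x_{k+1} = A x_k + B u_k and the reference satisfies (3), then the
# deviations obey the same dynamics: dx_{k+1} = A dx_k + B du_k.
rng = np.random.default_rng(0)
x = np.array([0.2, -0.1])                          # arbitrary initial state, off the reference
for k in range(N):
    u = u_ref[k] + rng.normal(scale=0.1, size=1)   # some arbitrary input
    x_next = A @ x + B @ u
    dx_next_pred = A @ (x - x_ref[k]) + B @ (u - u_ref[k])
    assert np.allclose(x_next - x_ref[k + 1], dx_next_pred)
    x = x_next
```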

We now try to mimic the derivation of the LQR algorithm using DP. We start at the end of the time interval and find the optimal cost J_N^\star(\bm x_N) = \frac{1}{2}(\bm x_N-\mathbf x_N^\mathrm{ref})^\top \mathbf S_N (\bm x_N-\mathbf x_N^\mathrm{ref}).

Let’s multiply out the brackets: \begin{aligned} J_N^\star(\bm x_N) &= \frac{1}{2}(\bm x_N-\mathbf x_N^\mathrm{ref})^\top \mathbf S_N (\bm x_N-\mathbf x_N^\mathrm{ref})\\ &= \frac{1}{2}(\bm x_N^\top \mathbf S_N \bm x_N - 2\bm x_N^\top \mathbf S_N \mathbf x_N^\mathrm{ref} + \mathbf x_N^{\mathrm{ref}\top}\mathbf S_N \mathbf x_N^\mathrm{ref})\\ &= \frac{1}{2}\bm x_N^\top \mathbf S_N \bm x_N - \bm x_N^\top \underbrace{\mathbf S_N \mathbf x_N^\mathrm{ref}}_{\mathbf c_N} + \underbrace{\frac{1}{2}\mathbf x_N^{\mathrm{ref}\top}\mathbf S_N \mathbf x_N^\mathrm{ref}}_{d_N}. \end{aligned} \tag{4}

There is a major difference from the final cost of the LQR problem, because now we also have a term linear in \bm x_N and a constant term, for which we identified the vector coefficient \mathbf c_N=\mathbf S_N \mathbf x_N^\mathrm{ref} and the scalar coefficient d_N=\frac{1}{2}\mathbf x_N^{\mathrm{ref}\top}\mathbf S_N \mathbf x_N^\mathrm{ref}.
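The expansion in Eq. 4 can be checked numerically; the snippet below (again reusing the illustrative data) confirms that the quadratic term, the linear term with \mathbf c_N, and the constant d_N add up to the original terminal cost.

```python
# Check of the expansion (4): the terminal cost equals a quadratic term
# plus a linear term plus a constant, with c_N = S_N x_N^ref and
# d_N = 1/2 (x_N^ref)' S_N x_N^ref.
xN = np.array([1.0, -0.5])           # an arbitrary terminal state
c_N = SN @ x_ref[N]
d_N = 0.5 * x_ref[N] @ SN @ x_ref[N]
lhs = 0.5 * (xN - x_ref[N]) @ SN @ (xN - x_ref[N])
rhs = 0.5 * xN @ SN @ xN - xN @ c_N + d_N
assert np.isclose(lhs, rhs)
```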

Decreasing the discrete time to N-1, we can invoke the principle of optimality to write J^\star_{N-1}(\bm x_{N-1}) = \min_{\bm u_{N-1}\in\mathbb R^m} \underbrace{\left[L(\bm x_{N-1},\bm u_{N-1}) + J^\star_{N}(\bm x_{N}) \right]}_{J_{N-1}(\bm x_{N-1},\bm u_{N-1})}, where L is the stage cost and \bm x_N = \mathbf A\bm x_{N-1} + \mathbf B\bm u_{N-1}. Substituting the optimal cost Eq. 4 obtained in the previous step, we get \begin{aligned} J_{N-1}(\bm x_{N-1},\bm u_{N-1}) &= \frac{1}{2} \left[(\bm x_{N-1}-\mathbf x_{N-1}^\mathrm{ref})^\top \mathbf Q (\bm x_{N-1}-\mathbf x_{N-1}^\mathrm{ref}) + (\bm u_{N-1}-\mathbf u_{N-1}^\mathrm{ref})^\top \mathbf R (\bm u_{N-1}-\mathbf u_{N-1}^\mathrm{ref}) \right] + J^\star_{N}(\bm x_{N})\\ &= \frac{1}{2} \left[ (\bm x_{N-1}-\mathbf x_{N-1}^\mathrm{ref})^\top \mathbf Q (\bm x_{N-1}-\mathbf x_{N-1}^\mathrm{ref}) + (\bm u_{N-1}-\mathbf u_{N-1}^\mathrm{ref})^\top \mathbf R (\bm u_{N-1}-\mathbf u_{N-1}^\mathrm{ref}) + \bm x_N^\top \mathbf S_N \bm x_N \right] - \bm x_N^\top \mathbf c_N + d_N\\ &= \frac{1}{2} \left[ \bm x_{N-1}^\top \mathbf Q \bm x_{N-1} - 2\bm x_{N-1}^\top \mathbf Q \mathbf x_{N-1}^\mathrm{ref} + \mathbf x_{N-1}^{\mathrm{ref}\top}\mathbf Q \mathbf x_{N-1}^\mathrm{ref} + \bm u_{N-1}^\top \mathbf R \bm u_{N-1} - 2\bm u_{N-1}^\top \mathbf R \mathbf u_{N-1}^\mathrm{ref} + \mathbf u_{N-1}^{\mathrm{ref}\top}\mathbf R \mathbf u_{N-1}^\mathrm{ref} + \bm x_N^\top \mathbf S_N \bm x_N \right] - \bm x_N^\top \mathbf c_N + d_N\\ &= \frac{1}{2} \left[ \bm x_{N-1}^\top \mathbf Q \bm x_{N-1} + \bm u_{N-1}^\top \mathbf R \bm u_{N-1} + \bm x_N^\top \mathbf S_N \bm x_N - 2\bm x_{N-1}^\top \mathbf Q \mathbf x_{N-1}^\mathrm{ref} - 2\bm u_{N-1}^\top \mathbf R \mathbf u_{N-1}^\mathrm{ref} + \mathbf x_{N-1}^{\mathrm{ref}\top}\mathbf Q \mathbf x_{N-1}^\mathrm{ref} + \mathbf u_{N-1}^{\mathrm{ref}\top}\mathbf R \mathbf u_{N-1}^\mathrm{ref} \right] - \bm x_N^\top \mathbf c_N + d_N. \end{aligned}
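Before proceeding with the minimization over \bm u_{N-1}, it may be worth verifying the regrouped expression numerically; the following check (with the illustrative data and the c_N, d_N defined above) confirms that the last line indeed equals the sum of the stage cost and J_N^\star.

```python
# Check of the regrouped expression for J_{N-1}: both forms below must agree
# for arbitrary x_{N-1}, u_{N-1} (reusing A, B, Q, R, SN, x_ref, u_ref, c_N, d_N).
xNm1 = np.array([0.3, 0.7])
uNm1 = np.array([-0.2])
xN = A @ xNm1 + B @ uNm1
dx, du = xNm1 - x_ref[N - 1], uNm1 - u_ref[N - 1]
original = 0.5 * (dx @ Q @ dx + du @ R @ du) \
    + 0.5 * (xN - x_ref[N]) @ SN @ (xN - x_ref[N])
regrouped = 0.5 * (xNm1 @ Q @ xNm1 + uNm1 @ R @ uNm1 + xN @ SN @ xN
                   - 2 * xNm1 @ Q @ x_ref[N - 1] - 2 * uNm1 @ R @ u_ref[N - 1]
                   + x_ref[N - 1] @ Q @ x_ref[N - 1]
                   + u_ref[N - 1] @ R @ u_ref[N - 1]) - xN @ c_N + d_N
assert np.isclose(original, regrouped)
```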
