Solving LQ tracking (LQT) via dynamic programming

Warning: LQT is not a standard abbreviation

We use the abbreviation LQT for LQ tracking. Admittedly, in contrast to LQR, the abbreviation LQT is not commonly used and widely accepted. But it does not seem to be reserved for anything else yet, so why not use it?

As usual, we consider a discrete-time LTI system modelled by \bm x_{k+1} = \mathbf A\bm x_k + \mathbf B\bm u_k, \tag{1} but this time we want to minimize the quadratic cost given by \begin{aligned} J_0(\bm x_0, \bm u_0, \bm u_1, \ldots, \bm u_{N-1}) &= \frac{1}{2}(\bm x_N-\mathbf x_N^\mathrm{ref})^\top \mathbf S_N (\bm x_N-\mathbf x_N^\mathrm{ref})\\ &\qquad + \frac{1}{2}\sum_{k=0}^{N-1}\left[(\bm x_k-\mathbf x_k^\mathrm{ref})^\top \mathbf Q (\bm x_k-\mathbf x_k^\mathrm{ref}) + (\bm u_k-\mathbf u_k^\mathrm{ref})^\top \mathbf R (\bm u_k-\mathbf u_k^\mathrm{ref})\right], \end{aligned} \tag{2} where \{\mathbf x_k^\mathrm{ref}\}_{k=0}^N and \{\mathbf u_k^\mathrm{ref}\}_{k=0}^{N-1} are prescribed reference state and control trajectories, respectively, and we impose the usual assumptions on the symmetric weighting matrices: \mathbf S_N\succeq 0, \mathbf Q\succeq 0, \mathbf R\succ 0.
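To make the problem data concrete, here is a minimal Python/NumPy sketch that evaluates the cost in Eq. 2 for given state and control trajectories. All numerical values and names (A, B, Q, R, SN, N, lqt_cost) are illustrative assumptions, not taken from the text.

```python
import numpy as np

# Illustrative (hypothetical) data: a double-integrator-like system and weights.
A = np.array([[1.0, 0.1],
              [0.0, 1.0]])
B = np.array([[0.005],
              [0.1]])
Q  = np.diag([10.0, 1.0])   # state weighting, positive semidefinite
R  = np.array([[0.1]])      # input weighting, positive definite
SN = np.diag([20.0, 2.0])   # terminal weighting, positive semidefinite
N  = 50                     # horizon length


def lqt_cost(x, u, x_ref, u_ref):
    """Evaluate the tracking cost (2) for a state trajectory x (N+1 samples)
    and an input trajectory u (N samples)."""
    dxN = x[N] - x_ref[N]
    J = 0.5 * dxN @ SN @ dxN
    for k in range(N):
        dx = x[k] - x_ref[k]
        du = u[k] - u_ref[k]
        J += 0.5 * (dx @ Q @ dx + du @ R @ du)
    return J
```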

Here we assume that the reference state and control trajectories are compatible with the system, that is, that they satisfy \mathbf x_{k+1}^\mathrm{ref} = \mathbf A\mathbf x_k^\mathrm{ref} + \mathbf B\mathbf u_k^\mathrm{ref}. \tag{3}

This will naturally be the case when such trajectories are obtained as a solution to an optimal control problem, perhaps by using direct methods. Since such a pair of control and state trajectories only constitutes an open-loop control strategy, we are now after a feedback control strategy that will track the reference trajectories.
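As a sketch of this compatible case, one can simply pick an arbitrary input reference and roll the model forward according to Eq. 3; the function below (hypothetical, continuing the NumPy setup above) does exactly that.

```python
def compatible_reference(x0_ref, u_ref):
    """Roll the model (3) forward so that the state reference is, by
    construction, compatible with the chosen input reference."""
    x_ref = [np.asarray(x0_ref, dtype=float)]
    for k in range(len(u_ref)):
        x_ref.append(A @ x_ref[k] + B @ u_ref[k])
    return np.array(x_ref)

# Example: an arbitrary open-loop input reference and the induced state reference.
u_ref = [np.array([np.sin(0.1 * k)]) for k in range(N)]
x_ref = compatible_reference([0.0, 0.0], u_ref)

print(lqt_cost(x_ref, u_ref, x_ref, u_ref))  # exactly zero when tracking perfectly
```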

ImportantWhat if the reference trajectory is unrelated to the system dynamics?

It is also fairly common that the reference trajectory \{\mathbf x_k^\mathrm{ref}\}_{k=0}^N was not generated by the model of the system but was obtained from some other source. In this case, the procedure that we are about to describe has to be modified.

Thanks to the fact that the state-control pair of trajectories satisfies Eq. 3, we can subtract \mathbf x_{k+1}^\mathrm{ref} from both sides of Eq. 1 to get \underbrace{\bm x_{k+1} - \mathbf x_{k+1}^\mathrm{ref}}_{\delta \bm x_{k+1}} = \mathbf A (\underbrace{\bm x_k- \mathbf x_{k}^\mathrm{ref}}_{\delta \bm x_k}) + \mathbf B(\underbrace{\bm u_k - \mathbf u_{k}^\mathrm{ref}}_{\delta \bm u_k}), which we can interpret – and we have already done so in the above equation – as a linear model in deviation (from the reference) variables.
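A quick numerical sanity check of this deviation model (reusing the hypothetical data and reference from the sketches above): whatever input we apply, the deviation predicted by the deviation model matches the true deviation of the state exactly.

```python
# If x_{k+1} = A x_k + B u_k and the reference satisfies (3), then the
# deviations obey the same dynamics: dx_{k+1} = A dx_k + B du_k.
rng = np.random.default_rng(0)
x = np.array([0.2, -0.1])                          # arbitrary initial state, off the reference
for k in range(N):
    u = u_ref[k] + rng.normal(scale=0.1, size=1)   # some arbitrary input
    x_next = A @ x + B @ u
    dx_next_pred = A @ (x - x_ref[k]) + B @ (u - u_ref[k])
    assert np.allclose(x_next - x_ref[k + 1], dx_next_pred)
    x = x_next
```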

We now try to mimic the derivation of the LQR algorithm using DP. We start at the end of the time interval and find the optimal cost J_N^\star(\bm x_N) = \frac{1}{2}(\bm x_N-\mathbf x_N^\mathrm{ref})^\top \mathbf S_N (\bm x_N-\mathbf x_N^\mathrm{ref}).

Let’s multiply out the brackets: \begin{aligned} J_N^\star(\bm x_N) &= \frac{1}{2}(\bm x_N-\mathbf x_N^\mathrm{ref})^\top \mathbf S_N (\bm x_N-\mathbf x_N^\mathrm{ref})\\ &= \frac{1}{2}(\bm x_N^\top \mathbf S_N \bm x_N - 2\bm x_N^\top \mathbf S_N \mathbf x_N^\mathrm{ref} + \mathbf x_N^{\mathrm{ref}\top}\mathbf S_N \mathbf x_N^\mathrm{ref})\\ &= \frac{1}{2}\bm x_N^\top \mathbf S_N \bm x_N - \bm x_N^\top \underbrace{\mathbf S_N \mathbf x_N^\mathrm{ref}}_{\mathbf c_N} + \underbrace{\frac{1}{2}\mathbf x_N^{\mathrm{ref}\top}\mathbf S_N \mathbf x_N^\mathrm{ref}}_{d_N}. \end{aligned} \tag{4}

There is a major difference from the final cost of the LQR problem, because now we also have a term linear in \bm x_N and a constant term, for which we identified the vector coefficient \mathbf c_N=\mathbf S_N \mathbf x_N^\mathrm{ref} and the scalar coefficient d_N=\frac{1}{2}\mathbf x_N^{\mathrm{ref}\top}\mathbf S_N \mathbf x_N^\mathrm{ref}.
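The expansion in Eq. 4 can be checked numerically; the snippet below (again reusing the illustrative data) confirms that the quadratic term, the linear term with \mathbf c_N, and the constant d_N add up to the original terminal cost.

```python
# Check of the expansion (4): the terminal cost equals a quadratic term
# plus a linear term plus a constant, with c_N = S_N x_N^ref and
# d_N = 1/2 (x_N^ref)' S_N x_N^ref.
xN = np.array([1.0, -0.5])           # an arbitrary terminal state
c_N = SN @ x_ref[N]
d_N = 0.5 * x_ref[N] @ SN @ x_ref[N]
lhs = 0.5 * (xN - x_ref[N]) @ SN @ (xN - x_ref[N])
rhs = 0.5 * xN @ SN @ xN - xN @ c_N + d_N
assert np.isclose(lhs, rhs)
```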

Decreasing the discrete time to N-1, we can invoke the principle of optimality to write J^\star_{N-1}(\bm x_{N-1}) = \min_{\bm u_{N-1}\in\mathbb R^m} \underbrace{\left[L(\bm x_{N-1},\bm u_{N-1}) + J^\star_{N}(\bm x_{N}) \right]}_{J_{N-1}(\bm x_{N-1},\bm u_{N-1})}, where L is the stage cost and \bm x_N = \mathbf A\bm x_{N-1} + \mathbf B\bm u_{N-1}. Substituting the optimal cost Eq. 4 obtained in the previous step, we get \begin{aligned} J_{N-1}(\bm x_{N-1},\bm u_{N-1}) &= \frac{1}{2} \left[(\bm x_{N-1}-\mathbf x_{N-1}^\mathrm{ref})^\top \mathbf Q (\bm x_{N-1}-\mathbf x_{N-1}^\mathrm{ref}) + (\bm u_{N-1}-\mathbf u_{N-1}^\mathrm{ref})^\top \mathbf R (\bm u_{N-1}-\mathbf u_{N-1}^\mathrm{ref}) \right] + J^\star_{N}(\bm x_{N})\\ &= \frac{1}{2} \left[ (\bm x_{N-1}-\mathbf x_{N-1}^\mathrm{ref})^\top \mathbf Q (\bm x_{N-1}-\mathbf x_{N-1}^\mathrm{ref}) + (\bm u_{N-1}-\mathbf u_{N-1}^\mathrm{ref})^\top \mathbf R (\bm u_{N-1}-\mathbf u_{N-1}^\mathrm{ref}) + \bm x_N^\top \mathbf S_N \bm x_N \right] - \bm x_N^\top \mathbf c_N + d_N\\ &= \frac{1}{2} \left[ \bm x_{N-1}^\top \mathbf Q \bm x_{N-1} - 2\bm x_{N-1}^\top \mathbf Q \mathbf x_{N-1}^\mathrm{ref} + \mathbf x_{N-1}^{\mathrm{ref}\top}\mathbf Q \mathbf x_{N-1}^\mathrm{ref} + \bm u_{N-1}^\top \mathbf R \bm u_{N-1} - 2\bm u_{N-1}^\top \mathbf R \mathbf u_{N-1}^\mathrm{ref} + \mathbf u_{N-1}^{\mathrm{ref}\top}\mathbf R \mathbf u_{N-1}^\mathrm{ref} + \bm x_N^\top \mathbf S_N \bm x_N \right] - \bm x_N^\top \mathbf c_N + d_N\\ &= \frac{1}{2} \left[ \bm x_{N-1}^\top \mathbf Q \bm x_{N-1} + \bm u_{N-1}^\top \mathbf R \bm u_{N-1} + \bm x_N^\top \mathbf S_N \bm x_N - 2\bm x_{N-1}^\top \mathbf Q \mathbf x_{N-1}^\mathrm{ref} - 2\bm u_{N-1}^\top \mathbf R \mathbf u_{N-1}^\mathrm{ref} + \mathbf x_{N-1}^{\mathrm{ref}\top}\mathbf Q \mathbf x_{N-1}^\mathrm{ref} + \mathbf u_{N-1}^{\mathrm{ref}\top}\mathbf R \mathbf u_{N-1}^\mathrm{ref} \right] - \bm x_N^\top \mathbf c_N + d_N. \end{aligned}
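Before proceeding with the minimization over \bm u_{N-1}, it may be worth verifying the regrouped expression numerically; the following check (with the illustrative data and the c_N, d_N defined above) confirms that the last line indeed equals the sum of the stage cost and J_N^\star.

```python
# Check of the regrouped expression for J_{N-1}: both forms below must agree
# for arbitrary x_{N-1}, u_{N-1} (reusing A, B, Q, R, SN, x_ref, u_ref, c_N, d_N).
xNm1 = np.array([0.3, 0.7])
uNm1 = np.array([-0.2])
xN = A @ xNm1 + B @ uNm1
dx, du = xNm1 - x_ref[N - 1], uNm1 - u_ref[N - 1]
original = 0.5 * (dx @ Q @ dx + du @ R @ du) \
    + 0.5 * (xN - x_ref[N]) @ SN @ (xN - x_ref[N])
regrouped = 0.5 * (xNm1 @ Q @ xNm1 + uNm1 @ R @ uNm1 + xN @ SN @ xN
                   - 2 * xNm1 @ Q @ x_ref[N - 1] - 2 * uNm1 @ R @ u_ref[N - 1]
                   + x_ref[N - 1] @ Q @ x_ref[N - 1]
                   + u_ref[N - 1] @ R @ u_ref[N - 1]) - xN @ c_N + d_N
assert np.isclose(original, regrouped)
```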
