General nonlinear discrete-time optimal control

While in the previous chapter we formulated an optimal control problem (OCP) directly as a mathematical programming problem (a general NLP or even a QP) over the control (and possibly state) trajectories, in this chapter we introduce an alternative – indirect – approach. The essence of the approach is that we first formulate first-order necessary conditions of optimality for the OCP in the form of equations, and then solve these. Although less straightforward to extend with additional constraints than the direct approach, the indirect approach also exhibits some advantages. In particular, in some cases (such as a quadratic cost and a linear system) it yields a feedback controller and not just a control trajectory.

Optimization constraints given only by the state equations

As in the chapter on the direct approach, here we also start by considering a general nonlinear and possibly time-varying discrete-time dynamical system characterized by the state vector \bm x_k\in\mathbb R^n whose evolution in discrete time k is uniquely determined by the state equation \bm x_{k+1} = \mathbf f_k(\bm x_k,\bm u_k), accompanied by the initial state (vector) \bm x_i\in\mathbb R^n and a sequence of control inputs \bm u_i, \bm u_{i+1}, \ldots, \bm u_{k-1}, where the control variable can also be a vector, that is, \bm u_k \in \mathbb R^m.
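To make the setting concrete, a forward simulation of such a state equation is just a loop. Below is a minimal Python sketch; the function name `simulate` and the toy cubic dynamics are our own illustrative choices, not anything prescribed by the theory.

```python
import numpy as np

def simulate(f, x_init, u_seq):
    """Roll the state equation x_{k+1} = f_k(x_k, u_k) forward in time.

    f      : callable f(k, x, u) returning the next state
    x_init : initial state x_i
    u_seq  : controls u_i, ..., u_{N-1}
    Returns the state trajectory x_i, ..., x_N as a list.
    """
    x = [np.asarray(x_init, dtype=float)]
    for k, u in enumerate(u_seq):
        x.append(f(k, x[-1], u))
    return x

# Example: a scalar nonlinear system x_{k+1} = x_k + 0.1*(u_k - x_k**3)
f = lambda k, x, u: x + 0.1 * (u - x**3)
x_traj = simulate(f, x_init=1.0, u_seq=[0.0] * 10)
```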

These state equations will constitute the only constraints of the optimization problem. Unlike in the direct approach, here in our introductory treatment we do not impose any inequality constraints such as bounds on the control inputs, because the theory to be presented is not able to handle them.

General additive cost function

For the above described dynamical system we want to find a control sequence \bm u_k that minimizes a suitable optimization criterion over a finite horizon k\in[i,N]. Namely, we will look for a control that minimizes a criterion of the following kind J_i^N(\underbrace{\bm x_{i+1}, \bm x_{i+2}, \ldots, \bm x_{N}}_{\bar{\bm x}}, \underbrace{\bm u_{i}, \ldots, \bm u_{N-1}}_{\bar{\bm u}};\bm x_i) = \phi(\bm x_N,N) + \sum_{k=i}^{N-1} L_k(\bm x_k,\bm u_k). \tag{1}
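Evaluating the cost (1) for given trajectories is then mechanical. A minimal Python sketch, continuing the illustrative conventions from the snippet above (none of these names are standard):

```python
def additive_cost(phi, L, x_traj, u_seq, i=0):
    """Evaluate J_i^N = phi(x_N, N) + sum_{k=i}^{N-1} L_k(x_k, u_k).

    x_traj : states x_i, ..., x_N (length N - i + 1)
    u_seq  : controls u_i, ..., u_{N-1} (length N - i)
    """
    N = i + len(u_seq)
    J = phi(x_traj[-1], N)                           # terminal cost
    for j, (x, u) in enumerate(zip(x_traj, u_seq)):  # zip stops at u_{N-1}
        J += L(i + j, x, u)                          # stage costs
    return J
```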

Note

Regarding the notation J_i^N(\cdot) for the cost, if the initial and final times are understood from the context, they do not have to be displayed. But we will soon need to indicate the initial time explicitly in our derivations.

The property of the presented cost function that will turn out crucial in our subsequent work is that it is additive over the time horizon. Although this restricts the class of cost functions a bit, it is still general enough to encompass a wide range of problems, such as minimizing the total (financial) cost to be paid, the total energy to be used, the total distance to be travelled, the cumulative error to be minimized, etc.

Here is a list of a few popular cost functions.

Minimum-time (or time-optimal) problem
Setting \phi=0 and L_k=1 gives J=N-i, that is, the length of the time horizon, the duration of control. Although in this course we do not introduce concepts and tools for optimization over integer variables, in this simple case of just a single integer variable even a simple search over the length of the control interval will be computationally tractable. Furthermore, as we will see in one of the next chapters once we switch from discrete-time to continuous-time systems, this time-optimal control design problem will turn out tractable using the tools presented in this course.
Minimum-fuel problem
Setting \phi=0 and L_k=|u_k| gives J=\sum_{k=i}^{N-1}|u_k|.
Minimum-energy problem
Setting \phi=0 and L_k=\frac{1}{2} u_k^2 gives J=\frac{1}{2} \sum_{k=i}^{N-1} u_k^2. It is fair to admit that this sum of squared inputs cannot always be interpreted as energy – for instance, what if the control input is the degree of opening of a valve? A sum of angles over time can hardly be interpreted as energy. Instead, it should be interpreted in the mathematical way as the (squared) norm, that is, a “size” of the input. Note that the same objection can be raised against the previous case of a minimum-fuel problem.
Mixed quadratic problem (also LQ-optimal control problem)
Setting \phi=\frac{1}{2}s_N x_N^2 and L_k=\frac{1}{2} (qx_k^2+ru_k^2),\, q,r\geq 0, gives J=\frac{1}{2}s_Nx_N^2+\frac{1}{2} \sum_{k=i}^{N-1} (q x_k^2+r u_k^2). Or in the case of vector state and control variables J=\frac{1}{2}\bm x_N^\top \mathbf S_N \bm x_N+\frac{1}{2} \sum_{k=i}^{N-1} (\bm x_k^\top \mathbf Q \bm x_k + \bm u_k^\top \mathbf R \bm u_k), \, \mathbf Q, \mathbf R \succeq 0. This type of optimization cost is particularly popular, both for mathematical reasons (we all by now appreciate the nice properties of quadratic functions) and for practical engineering reasons, as it allows us to capture a trade-off between the control performance (penalty on \bm x_k) and the control effort (penalty on \bm u_k). Note also that the state at the terminal time N is penalized separately just in order to allow another trade-off between the transient and terminal behavior. The cost function can also be modified to penalize the deviation of the state from some nonzero desired (aka reference) state trajectory, that is, J=\frac{1}{2}(\bm x_N - \bm x_N^\text{ref})^\top \mathbf S_N (\bm x_N - \bm x_N^\text{ref}) +\frac{1}{2} \sum_{k=i}^{N-1} \left((\bm x_k - \bm x_k^\text{ref})^\top \mathbf Q (\bm x_k - \bm x_k^\text{ref}) + \bm u_k^\top \mathbf R \bm u_k\right).
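For the vector version of the quadratic cost just introduced, the evaluation is a few lines of numpy. A minimal sketch, again with our own illustrative names:

```python
import numpy as np

def lq_cost(x_traj, u_seq, Q, R, S_N):
    """0.5*x_N'S_N x_N + 0.5*sum_k (x_k'Q x_k + u_k'R u_k), k = i..N-1."""
    J = 0.5 * x_traj[-1] @ S_N @ x_traj[-1]  # terminal penalty
    for x, u in zip(x_traj[:-1], u_seq):     # stage penalties
        J += 0.5 * (x @ Q @ x + u @ R @ u)
    return J
```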

Note that in none of these cost functions did we include \bm u_{N} as an optimization variable, as it has no influence on the cost over the interval [i,N].

It is perhaps needless to emphasize that while in some other applications maximization may seem more appropriate (such as maximizing the yield, bandwidth or robustness), we can always reformulate the maximization as a minimization. Therefore, in our course we always formulate optimal control problems as minimization problems.

Derivation of the first-order necessary conditions of optimality

Having formulated a finite-dimensional constrained nonlinear optimization problem, we avoid the temptation to call an NLP solver to solve it numerically and proceed instead with our own analysis of the problem. Let’s see how far we can get.

By introducing Lagrange multipliers {\color{blue}\bm\lambda_k} we turn the constrained problem into an unconstrained one. The new cost function (we use the prime to distinguish it from the original cost) is \begin{aligned} & {J'}_i^N(\bm x_i, \ldots, \bm x_N, \bm u_i, \ldots, \bm u_{N-1},{\color{blue}\bm \lambda_i, \ldots, \bm \lambda_{N-1}}) \\ &\qquad\qquad\qquad = \phi(\bm x_N,N) + \sum_{k=i}^{N-1}\left[L_{k}(\bm x_k,\bm u_k)+\bm {\color{blue}\lambda^\top_{k}}\;\left[\mathbf f_k(\bm x_k,\bm u_k)-\bm x_{k+1}\right]\right]. \end{aligned}

From now on, in principle, we do not need any guidance here, do we? We are given an unconstrained optimization problem and its solution is just a few steps away. In particular, stationary point(s) must be found (and then we are going to argue whether these qualify as minimizers or not). This calls for differentiating the above expression with respect to all the variables and setting these derivatives equal to zero.

Although the principles are clear, some advice might be shared here if compact formulas are to be found. The first such advice is to rename the variable(s) {\color{blue}\boldsymbol \lambda_k} to {\color{red}\boldsymbol \lambda_{k+1}} \begin{aligned} & {J'}_i^N(\bm x_i, \ldots, \bm x_N, \bm u_i, \ldots, \bm u_{N-1},{\color{red}\bm \lambda_{i+1}, \ldots, \bm \lambda_{N}}) \\ & \qquad\qquad\qquad = \phi(\bm x_N,N) + \sum_{k=i}^{N-1}\left[L_{k}(\bm x_k,\bm u_k)+{\color{red}\boldsymbol \lambda^\top_{k+1}}\; \left[\mathbf f_k(\bm x_k,\bm u_k)-\bm x_{k+1}\right]\right]. \end{aligned}

This is really just a notational decision but thanks to it our resulting formulas will enjoy some symmetry.

Note

Maybe it would be more didactic to leave you to go on without this notational advice and only then to nudge you to figure out this remedy on your own. But admittedly this is not the kind of competence that we aim at in this course. Let’s spend time on more rewarding things.

Another piece of notational advice – this one more systematic and fundamental – is to make the above expression a bit shorter by introducing a new variable defined as \boxed{H_k(\bm x_k,\bm u_k,\boldsymbol\lambda_{k+1}) = L_{k}(\bm x_k,\bm u_k)+\boldsymbol \lambda_{k+1}^\top \; \mathbf f_k(\bm x_k,\bm u_k).}

We will call this new function the Hamiltonian. Indeed, the choice of this name is motivated by the analogy with the identically named concept used in physics and theoretical mechanics, but we will only make more references to this analogy later in the course, once we transition to continuous-time systems modelled by differential equations.
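In code, the Hamiltonian is a direct transcription of the boxed definition. A one-line sketch using the conventions of the earlier snippets:

```python
import numpy as np

def hamiltonian(L, f, k, x, u, lam_next):
    """H_k(x_k, u_k, lambda_{k+1}) = L_k(x_k, u_k) + lambda_{k+1}^T f_k(x_k, u_k)."""
    return L(k, x, u) + np.dot(lam_next, f(k, x, u))
```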

Introducing the Hamiltonian reformulates the cost function (and we omit the explicit dependence on all its input arguments) as {J'}_i^N = \phi(\bm x_N,N) + \sum_{k=i}^{N-1}\left[H_{k}(\bm x_k,\bm u_k,\boldsymbol\lambda_{k+1})-\boldsymbol\lambda^\top_{k+1}\;\bm x_{k+1}\right].

The final polishing of the expression before starting to compute the derivatives consists in bringing together the terms that contain related variables: the state \bm x_N at the final time, the state \bm x_i at the initial time, and the states, controls and Lagrange multipliers in the transient period
{J'}_i^N = \underbrace{\phi(\bm x_N,N) -\boldsymbol\lambda^\top_{N}\;\bm x_{N}}_\text{at terminal time} + \underbrace{H_i(\bm x_i,\bm u_i,\boldsymbol\lambda_{i+1})}_\text{at initial time} + \sum_{k=i+1}^{N-1}\left[H_{k}(\bm x_k,\bm u_k,\boldsymbol\lambda_{k+1})-\boldsymbol\lambda^\top_{k}\;\bm x_{k}\right].

Although this step was not necessary, it will make things a bit more convenient once we start looking for the derivatives. And the time for it has just come.

Recall now the recommended procedure for finding derivatives of functions of vectors – find the differential instead and identify the derivative in the result. The gradient is then (by convention) obtained as the transpose of the derivative. Following this derivative-identification procedure, we anticipate the differential of the augmented cost function in the following form \begin{split} \text{d}{J'}_i^N &= (\qquad)^\top \; \text{d}\bm x_N + (\qquad)^\top \; \text{d}\bm x_i \\&+ \sum_{k=i+1}^{N-1}(\qquad)^\top \; \text{d}\bm x_k + \sum_{k=i}^{N-1}(\qquad)^\top \; \text{d}\bm u_k + \sum_{k=i+1}^{N}(\qquad)^\top \; \text{d}\boldsymbol \lambda_k. \end{split}

Identifying the gradients amounts to filling in the empty brackets. It is straightforward, if tedious (in particular, the lower and upper bounds on the summation indices must be carefully checked). The solution is \begin{split} \text{d}{J'}_i^N &= \left(\nabla_{\bm x_N}\phi-\boldsymbol\lambda_N\right)^\top \; \text{d}\bm x_N + \left(\nabla_{\bm x_i}H_i\right)^\top \; \text{d}\bm x_i \\&+ \sum_{k=i+1}^{N-1}\left(\nabla_{\bm x_k}H_k-\boldsymbol\lambda_k\right)^\top \; \text{d}\bm x_k + \sum_{k=i}^{N-1}\left(\nabla_{\bm u_k}H_k\right)^\top \; \text{d}\bm u_k + \sum_{k=i+1}^{N}\left(\nabla_{\boldsymbol \lambda_k}H_{k-1}-\bm x_k\right)^\top \; \text{d}\boldsymbol \lambda_k. \end{split}
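If you want to double-check the identification, it can be done mechanically, for instance symbolically. The following sympy sketch does so for a scalar toy problem with i=0 and N=2; the concrete f, L and \phi below are arbitrary choices made just for this check:

```python
import sympy as sp

x0, x1, x2, u0, u1, lam1, lam2 = sp.symbols('x0 x1 x2 u0 u1 lam1 lam2')

f = lambda x, u: x + u              # toy state equation f_k(x, u)
L = lambda x, u: (x**2 + u**2) / 2  # toy stage cost L_k(x, u)
phi = x2**2                         # toy terminal cost phi(x_N)

H0 = L(x0, u0) + lam1 * f(x0, u0)   # Hamiltonians
H1 = L(x1, u1) + lam2 * f(x1, u1)
Jp = phi - lam2 * x2 + H0 + (H1 - lam1 * x1)  # regrouped augmented cost

# Each difference below must simplify to zero if the brackets were
# identified correctly:
checks = [
    sp.diff(Jp, x2) - (sp.diff(phi, x2) - lam2),
    sp.diff(Jp, x1) - (sp.diff(H1, x1) - lam1),
    sp.diff(Jp, u0) - sp.diff(H0, u0),
    sp.diff(Jp, u1) - sp.diff(H1, u1),
    sp.diff(Jp, lam1) - (f(x0, u0) - x1),
    sp.diff(Jp, lam2) - (f(x1, u1) - x2),
]
assert all(sp.simplify(c) == 0 for c in checks)
```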

The ultimate goal of this derivation was to find stationary points of the augmented cost function, that is, to find conditions under which \text{d}{J'}_i^N=0. In typical optimization problems, the optimization is conducted with respect to all the participating variables, which means that the corresponding differentials may be arbitrary and the only way to guarantee that the total differential of {J'}_i^N is zero is to make the associated gradients (the contents of the brackets) equal to zero. There are two exceptions to this rule in our case, though:

  • The state at the initial time is typically fixed and not available for optimization. Then \text{d}\bm x_i=0 and the corresponding necessary condition is replaced by the statement that \bm x_i is equal to some particular value, say, \bm x_i = \mathbf x^\text{init}. We have already discussed this before. In fact, in these situations we might even prefer to reflect it by the notation J_i^N(\ldots;\bm x_i), which emphasizes that \bm x_i is a parameter and not a variable. But in the solution below we do allow for the possibility that \bm x_i is a variable too (hence \text{d}\bm x_i\neq 0) for completeness.

  • The state at the final time may also be given/fixed, in which case the corresponding condition is replaced by the statement that \bm x_N is equal to some particular value, say, \bm x_N = \mathbf x^\text{ref}. But if that is not the case, then the final state is also subject to optimization and the corresponding necessary condition of optimality is obtained by setting the content of the corresponding brackets to zero.

Necessary conditions of optimality as two-point boundary value problem (TP-BVP)

The ultimate form of the first-order necessary conditions of optimality, which incorporates the special cases discussed above, is given by these equations \boxed{ \begin{aligned} \bm x_{k+1} &= \nabla_{\boldsymbol\lambda_{k+1}}H_k, \;\;\; \color{gray}{k=i,\ldots, N-1},\\ \boldsymbol\lambda_k &= \nabla_{\bm x_k}H_k, \;\;\; \color{gray}{k=i+1,\ldots, N-1},\\ 0 &= \nabla_{\bm u_k}H_k, \;\;\; \color{gray}{k=i,\ldots, N-1},\\ \color{blue}{0} &= \color{blue}{\left(\nabla_{\bm x_N}\phi-\boldsymbol\lambda_N\right)^\top \mathrm{d}\bm x_N},\\ \color{blue}{0} &= \color{blue}{\left(\nabla_{\bm x_i}H_i\right)^\top \mathrm{d}\bm x_i}, \end{aligned} } or more explicitly \boxed{ \begin{aligned} \bm x_{k+1} &= \mathbf f_k(\bm x_k,\bm u_k), \;\;\; \color{gray}{k=i,\ldots, N-1},\\ \boldsymbol\lambda_k &= \nabla_{\bm x_k}\mathbf f_k\;\; \boldsymbol\lambda_{k+1}+\nabla_{\bm x_k}L_k, \;\;\; \color{gray}{k=i+1,\ldots, N-1},\\ 0 &= \nabla_{\bm u_k}\mathbf f_k\;\; \boldsymbol\lambda_{k+1}+\nabla_{\bm u_k}L_k, \;\;\; \color{gray}{k=i,\ldots, N-1},\\ \color{blue}{0} &= \color{blue}{\left(\nabla_{\bm x_N}\phi-\boldsymbol\lambda_N\right)^\top \mathrm{d}\bm x_N},\\ \color{blue}{0} &= \color{blue}{\left(\nabla_{\bm x_i}H_i\right)^\top \mathrm{d}\bm x_i}. \end{aligned} }

Recall that since \mathbf f is a vector function, \nabla \mathbf f is not just a gradient but rather a matrix whose columns are gradients of the individual components of the vector \mathbf f – it is the transpose of the Jacobian.
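In code, the co-state and stationarity conditions amount to a single sweep backward in time along candidate trajectories. A minimal numpy sketch, with user-supplied Jacobians and gradients (all names here are our own):

```python
import numpy as np

def costate_and_stationarity(fx, fu, Lx, Lu, phix, x_traj, u_seq):
    """Run the co-state equation backward and collect the stationarity
    residuals nabla_u H_k, which must vanish at an optimum.

    fx(k,x,u), fu(k,x,u) : Jacobians of f_k w.r.t. x and u
    Lx(k,x,u), Lu(k,x,u) : gradients of L_k w.r.t. x and u
    phix(x)              : gradient of the terminal cost phi
    """
    N = len(u_seq)
    lam = [None] * (N + 1)
    lam[N] = phix(x_traj[N])            # lambda_N = grad phi(x_N)
    residuals = [None] * N
    for k in range(N - 1, -1, -1):      # backward in time!
        x, u = x_traj[k], u_seq[k]
        residuals[k] = fu(k, x, u).T @ lam[k + 1] + Lu(k, x, u)
        lam[k] = fx(k, x, u).T @ lam[k + 1] + Lx(k, x, u)  # co-state eq.
    return lam, residuals
```

The sweep also produces \boldsymbol\lambda_i, the harmless extra variable mentioned in the note below.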

Note

The first three necessary conditions above can be made completely “symmetric” by running the second one from k=i because the \boldsymbol\lambda_i introduced this way does not influence the rest of the problem and we could easily live with one useless variable.

We have just derived the (necessary) conditions of optimality in the form of five sets of (vector) equations:

  • The first two are recursive (or recurrent or also just discrete-time) equations, which means that they introduce coupling between the variables evaluated at consecutive times. In fact, the former is just the standard state equation that gives the state at one time as a function of the state (and the control) at the previous time. The latter gives a prescription for the variable \bm \lambda_k as a function of (among others) the same variable evaluated at the next (!) time, that is, \bm \lambda_{k+1}. Although from the optimization perspective these variables play the role of Lagrange multipliers, we call them co-state variables in optimal control theory because of the way they relate to the state equations. The corresponding vector equation is called a co-state equation.
Important

It is a crucial property of the co-state equation that it dictates the evolution of the co-state variable backward in time.

  • The third set of equations are just algebraic equations that relate the control inputs to the state and co-state variables. Sometimes it is called a stationarity equation.

  • The last two are just single (vector) equations related to the end and the beginning of the time horizon. They are both stated in a form general enough to allow the corresponding states to be treated as either fixed or subject to optimization. In particular, if the final state is to be treated as free (subject to optimization), then \mathrm{d}\bm x_N can be arbitrary and the only way the corresponding equation can be satisfied is \nabla_{\bm x_N}\phi=\boldsymbol\lambda_N; for instance, with the quadratic terminal cost \phi=\frac{1}{2}\bm x_N^\top \mathbf S_N \bm x_N this reads \boldsymbol\lambda_N = \mathbf S_N \bm x_N. If, on the other hand, the final state is to be treated as fixed, then the corresponding equation is just replaced by \bm x_N = \mathbf x^\text{ref}. Similarly for the initial state. But as we have hinted a few times, more often than not the initial state will be regarded as fixed and not subject to optimization, in which case the corresponding equation is replaced by \bm x_i = \mathbf x^\text{init}.

To summarize, the equations that give the necessary conditions of optimality for a general nonlinear discrete-time optimal control problem form a two-point boundary value problem (TP-BVP). Values of some variables are specified at the initial time, values of some (maybe the same, maybe other) variables are specified at the final time. The equations prescribe the evolution of some variables forward in time, while for other variables the evolution backward in time is dictated.

Note

This is in contrast with the initial value problem (IVP) for state equations, for which we only specify the state at one end of the time horizon — the initial state — and then the state equation dictates the evolution of the (state) variable forward in time.

Boundary value problems are notoriously difficult to solve. Typically we can only solve them numerically, in which case it is appropriate to ask if anything has been gained by this indirect procedure compared with the direct one. After all, we did not even incorporate the inequality constraints in the problem, which was a piece of cake in the direct approach. But we will see that in some special cases the TP-BVP can be solved analytically, and the outcome is particularly useful and would never have been discovered if only the direct approach had been followed. We elaborate on this in the next section.
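That said, for small instances a general-purpose root finder handles the TP-BVP directly: stack all the unknowns into one vector and hand the residuals of the necessary conditions to the solver. A self-contained scipy sketch for a toy scalar problem (the dynamics, cost and all names are our own illustrative choices; a fixed initial state and a free final state are assumed):

```python
import numpy as np
from scipy.optimize import fsolve

# Toy problem: f(x,u) = x + 0.1*(u - x^3), L = (x^2 + u^2)/2, phi = s*x_N^2/2
N, x_init, s = 10, 1.0, 5.0
f  = lambda x, u: x + 0.1 * (u - x**3)
fx = lambda x, u: 1.0 - 0.3 * x**2   # df/dx
fu = lambda x, u: 0.1                # df/du

def tpbvp_residual(z):
    x = np.concatenate(([x_init], z[:N]))  # x_0 fixed; x_1..x_N unknown
    u = z[N:2*N]                           # u_0 .. u_{N-1}
    lam = z[2*N:]                          # lam[k-1] stores lambda_k
    r = [f(x[k], u[k]) - x[k+1] for k in range(N)]                         # state eq.
    r += [fx(x[k], u[k]) * lam[k] + x[k] - lam[k-1] for k in range(1, N)]  # co-state
    r += [fu(x[k], u[k]) * lam[k] + u[k] for k in range(N)]                # stationarity
    r.append(lam[N-1] - s * x[N])          # free final state: lambda_N = s*x_N
    return r

z = fsolve(tpbvp_residual, np.zeros(3 * N))
x_opt = np.concatenate(([x_init], z[:N]))
u_opt = z[N:2*N]
```

Note that \nabla_{x_k}L = x_k and \nabla_{u_k}L = u_k only for this particular quadratic stage cost; those terms would change with the cost.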
