General finite-horizon nonlinear discrete-time optimal control as a nonlinear program

In this section we formulate a finite-horizon optimal control problem (OCP) for a discrete-time dynamical system as a mathematical optimization (also mathematical programming) problem, which can then be solved numerically by a suitable solver for nonlinear programming (NLP) or, in special cases, quadratic programming (QP). The outcome of such numerical optimization is an optimal control trajectory (a sequence of controls), which is why this approach is called direct – we optimize directly over the trajectories.

In the following chapter we then present an alternative – indirect – approach, wherein the conditions of optimality are formulated first. These come in the form of a set of equations, some of them recurrent/recursive, some just algebraic. The indirect approach amounts to solving such equations.

And then in another chapter we present the third approach – dynamic programming.

The three approaches form the backbone of the theory of optimal control for discrete-time systems, but later we are going to recognize the same triplet in the context of continuous-time systems.

Figure 1: Three approaches to discrete-time optimal control

But now back to the direct approach. We will start with a general nonlinear discrete-time optimal control problem in this section and then specialize to the linear quadratic regulation (LQR) problem in the next section. Finally, since the computed control trajectory constitutes an open-loop control scheme, something must be done about it if a feedback scheme is preferred – we introduce the concept of receding horizon control (RHC), perhaps better known as model predictive control (MPC), which turns the direct approach into a feedback control scheme.

We start by considering a nonlinear discrete-time system modelled by the state equation \bm x_{k+1} = \mathbf f_k(\bm x_k,\bm u_k), where \bm x_k \in \mathbb R^n is the state vector, \bm u_k \in \mathbb R^m is the control vector, and \mathbf f_k: \mathbb R^n \times \mathbb R^m \rightarrow \mathbb R^n is the (possibly time-varying) state transition function.
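For concreteness, here is a minimal sketch of how such a state equation can be rolled forward in code. The `simulate` helper and the pendulum model are illustrative assumptions, not part of the text; any \mathbf f_k with the same calling convention would do.

```python
import numpy as np

def simulate(f, x_init, us):
    """Roll the state equation x_{k+1} = f_k(x_k, u_k) forward from x_init
    under a given control sequence us = (u_0, ..., u_{N-1})."""
    xs = [np.asarray(x_init, dtype=float)]
    for k, u in enumerate(us):
        xs.append(f(k, xs[-1], u))
    return np.array(xs)           # states x_0, x_1, ..., x_N stacked row-wise

# A hypothetical f_k: forward-Euler discretization of a damped pendulum.
dt = 0.1
def f(k, x, u):
    theta, omega = x
    return np.array([theta + dt * omega,
                     omega + dt * (-np.sin(theta) - 0.1 * omega + u)])

xs = simulate(f, [np.pi / 2, 0.0], np.zeros(20))   # free response, N = 20 steps
```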

A general nonlinear discrete-time optimal control problem (OCP) is then formulated as \begin{aligned} \operatorname*{minimize}_{\bm u_i,\ldots, \bm u_{N-1}, \bm x_{i},\ldots, \bm x_N}&\quad \left(\phi(\bm x_N,N) + \sum_{k=i}^{N-1} L_k(\bm x_k,\bm u_k) \right)\\ \text{subject to} &\quad \bm x_{k+1} = \mathbf f_k(\bm x_k,\bm u_k),\quad k=i, \ldots, N-1,\\ &\quad \bm u_k \in \mathcal U_k,\quad k=i, \ldots, N-1,\\ &\quad \bm x_k \in \mathcal X_k,\quad k=i, \ldots, N, \end{aligned} where i and N are the initial and final discrete times, respectively, \phi is a terminal cost function penalizing the state at the final time, L_k is a running (stage) cost function, and \mathcal U_k and \mathcal X_k are the sets of admissible controls and states at time k.

Oftentimes it is convenient to handle the constraints on the initial and final states separately: \begin{aligned} \operatorname*{minimize}_{\bm u_i,\ldots, \bm u_{N-1}, \bm x_{i},\ldots, \bm x_N}&\quad \left(\phi(\bm x_N,N) + \sum_{k=i}^{N-1} L_k(\bm x_k,\bm u_k) \right)\\ \text{subject to} &\quad \bm x_{k+1} = \mathbf f_k(\bm x_k,\bm u_k),\quad k=i, \ldots, N-1,\\ &\quad \bm u_k \in \mathcal U_k,\quad k=i, \ldots, N-1,\\ &\quad \bm x_k \in \mathcal X_k,\quad k=i+1, \ldots, N-1,\\ &\quad \bm x_i \in \mathcal X_\mathrm{init},\\ &\quad \bm x_N \in \mathcal X_\mathrm{final}. \end{aligned}

In particular, at the initial time just one particular state is often considered. At the final time, the state might be required to equal some given value, it might be required to lie in some set defined through equations or inequalities, or it might be left unconstrained. Finally, the constraints on the controls and states typically (but not always) come in the form of lower and upper bounds. The optimal control problem then specializes to \begin{aligned} \operatorname*{minimize}_{\bm u_i,\ldots, \bm u_{N-1}, \bm x_{i},\ldots, \bm x_N}&\quad \left(\phi(\bm x_N,N) + \sum_{k=i}^{N-1} L_k(\bm x_k,\bm u_k) \right)\\ \text{subject to} &\quad \bm x_{k+1} = \mathbf f_k(\bm x_k,\bm u_k),\quad k=i, \ldots, N-1,\\ &\quad \bm u_{\min} \leq \bm u_k \leq \bm u_{\max},\quad k=i, \ldots, N-1,\\ &\quad \bm x_{\min} \leq \bm x_k \leq \bm x_{\max},\quad k=i+1, \ldots, N-1,\\ &\quad\bm x_i = \mathbf x^\text{init},\\ &\quad \left(\bm x_N = \mathbf x^\text{ref}, \; \text{or} \; \mathbf h_\text{final}(\bm x_N) = \mathbf 0, \; \text{or} \; \mathbf g_\text{final}(\bm x_N) \leq \mathbf 0\right), \end{aligned} where the vector inequalities are understood componentwise, \bm u_{\min} and \bm u_{\max} are lower and upper bounds on the controls, \bm x_{\min} and \bm x_{\max} are lower and upper bounds on the states, \mathbf x^\text{init} is the given initial state, \mathbf x^\text{ref} is a required final state, and \mathbf h_\text{final} and \mathbf g_\text{final} define the set of admissible final states through equations and inequalities, respectively.
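As a concrete (purely illustrative, assumed) instance of this template, consider driving a double integrator to the origin in N steps with minimum control effort and bounded controls: \begin{aligned} \operatorname*{minimize}_{u_0,\ldots, u_{N-1}, \bm x_1,\ldots, \bm x_N}&\quad \sum_{k=0}^{N-1} u_k^2\\ \text{subject to} &\quad \bm x_{k+1} = \begin{bmatrix}1 & 1\\ 0 & 1\end{bmatrix}\bm x_k + \begin{bmatrix}0\\ 1\end{bmatrix} u_k,\quad k=0,\ldots,N-1,\\ &\quad -1 \leq u_k \leq 1,\\ &\quad \bm x_0 = \begin{bmatrix}1\\ 0\end{bmatrix},\quad \bm x_N = \mathbf 0. \end{aligned} Here L_k(\bm x_k,\bm u_k) = u_k^2, the terminal cost \phi vanishes, and the final state is fixed; since the state equation is linear and the cost quadratic, this particular instance is not just an NLP but even a QP.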

Whatever the particular format of the constraints, the optimal control problem above is an instance of a general nonlinear programming (NLP) problem \begin{aligned} \operatorname*{minimize}_{\bar{\bm x}\in\mathbb{R}^{n(N-i)},\bar{\bm u}\in\mathbb{R}^{m(N-i)}} &\quad J(\bar{\bm x},\bar{\bm u})\\ \text{subject to} &\quad \mathbf h(\bar{\bm x},\bar{\bm u}) =0,\\ &\quad \mathbf g(\bar{\bm x},\bar{\bm u}) \leq \mathbf 0, \end{aligned} where \bar{\bm u} and \bar{\bm x} are vectors obtained by stacking the control and state vectors at individual times:

\begin{aligned} \bar{\bm u} &= \operatorname*{vec}(\bm u_i,\ldots, \bm u_{N-1}) = \begin{bmatrix}\bm u_i\\ \bm u_{i+1}\\ \vdots \\ \bm u_{N-1} \end{bmatrix},\\ \bar{\bm x} &= \operatorname*{vec}(\bm x_{i+1},\ldots, \bm x_N) = \begin{bmatrix}\bm x_{i+1}\\ \bm x_{i+2}\\ \vdots \\ \bm x_{N} \end{bmatrix}. \end{aligned}
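The following is a minimal sketch of this transcription in code, using SciPy's SLSQP solver; the pendulum model (a time-invariant variant of the earlier assumed example), the cost weights, and the control bounds are all assumptions made for illustration only, with i = 0.

```python
import numpy as np
from scipy.optimize import Bounds, minimize

n, m, N, dt = 2, 1, 20, 0.1
x_init = np.array([np.pi / 2, 0.0])      # fixed initial state x_0
Q, R = np.eye(n), 0.1 * np.eye(m)        # assumed quadratic cost weights

def f(x, u):
    """Assumed dynamics: forward-Euler discretization of a damped pendulum."""
    theta, omega = x
    return np.array([theta + dt * omega,
                     omega + dt * (-np.sin(theta) - 0.1 * omega + u[0])])

def unpack(z):
    """Split the decision vector z = vec(x_1, ..., x_N, u_0, ..., u_{N-1})."""
    return z[:n * N].reshape(N, n), z[n * N:].reshape(N, m)

def J(z):
    xs, us = unpack(z)
    states = np.vstack([x_init, xs])     # x_0, x_1, ..., x_N
    stage = sum(states[k] @ Q @ states[k] + us[k] @ R @ us[k]
                for k in range(N))       # running costs L_k, k = 0, ..., N-1
    return stage + states[N] @ Q @ states[N]   # plus terminal cost phi(x_N)

def dynamics(z):
    """Residuals x_{k+1} - f(x_k, u_k), k = 0, ..., N-1; all must vanish."""
    xs, us = unpack(z)
    states = np.vstack([x_init, xs])
    return np.concatenate([xs[k] - f(states[k], us[k]) for k in range(N)])

# Box constraints: states free, controls bounded by |u_k| <= 2 (assumed).
lb = np.r_[np.full(n * N, -np.inf), np.full(m * N, -2.0)]
ub = np.r_[np.full(n * N, np.inf), np.full(m * N, 2.0)]

sol = minimize(J, np.zeros((n + m) * N), method="SLSQP",
               bounds=Bounds(lb, ub),
               constraints=[{"type": "eq", "fun": dynamics}])
xs_opt, us_opt = unpack(sol.x)           # optimal state and control trajectories
```

Keeping both the states and the controls as decision variables and imposing the state equation as equality constraints, as done here, is commonly called the simultaneous approach; eliminating the states by forward simulation would give a sequential (single-shooting) variant with only \bar{\bm u} as the decision vector.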

Note
  • Although there may be applications where it is desirable to optimize over the initial state \bm x_i as well, in most cases the initial state \bm x_i is fixed, and it does not have to be considered an optimization variable. This can even be emphasized through the notation J(\bar{\bm x},\bar{\bm u}; \bm x_i), where the semicolon separates the variables from the (fixed) parameters.
  • The last control that affects the state trajectory on the interval [i,N] is \bm u_{N-1}.