General finite-horizon nonlinear discrete-time optimal control as a nonlinear program
In this section we formulate a finite-horizon optimal control problem (OCP) for a discrete-time dynamical system as a mathematical optimization (also mathematical programming) problem, which can then be solved numerically by a suitable solver for nonlinear programming (NLP), or possibly quadratic programming (QP). The outcome of such numerical optimization is an optimal control trajectory (a sequence of controls), which is why this approach is called direct – we optimize directly over the trajectories.
In the following chapter we then present an alternative – indirect – approach, wherein the conditions of optimality are formulated first. These come in the form of a set of equations, some of them recurrent/recursive, some just algebraic. The indirect approach amounts to solving such equations.
And then in another chapter we present the third approach – dynamic programming.
The three approaches form the backbone of the theory of optimal control for discrete-time systems, but later we are going to recognize the same triplet in the context of continuous-time systems.
But now back to the direct approaches. We will start with a general nonlinear discrete-time optimal control problem in this section and then specialize to the linear quadratic regulation (LQR) problem in the next section. Finally, since the computed control trajectory constitutes an open-loop control scheme, something must be done about it if a feedback scheme is preferred – we introduce the concept of receding horizon control (RHC), perhaps better known as model predictive control (MPC), which turns the direct approach into a feedback control scheme.
We start by considering a nonlinear discrete-time system modelled by the state equation \bm x_{k+1} = \mathbf f_k(\bm x_k,\bm u_k), where
- \bm x_k\in \mathbb R^n is the state at the discrete time k\in \mathbb Z,
- \bm u_k\in \mathbb R^m is the control at the discrete time k,
- \mathbf f_k: \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^n is a state transition function (in general not only nonlinear but also time-varying, with the convention that the dependence on the discrete time k is expressed through the lower index).
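To fix ideas, below is a minimal Python sketch of such a state equation – a pendulum discretized by the forward Euler method. The model, the sampling period `h`, and the function name `f` are illustrative choices of ours, not part of the general formulation.

```python
import numpy as np

def f(k, x, u):
    """State transition x_{k+1} = f_k(x_k, u_k): forward-Euler discretization
    of a pendulum (illustrative model; time-invariant, so k is unused)."""
    h = 0.1                      # sampling period (assumed value)
    theta, omega = x
    return np.array([theta + h * omega,
                     omega + h * (-np.sin(theta) + u[0])])
```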
A general nonlinear discrete-time optimal control problem (OCP) is then formulated as \begin{aligned} \operatorname*{minimize}_{\bm u_i,\ldots, \bm u_{N-1}, \bm x_{i},\ldots, \bm x_N}&\quad \left(\phi(\bm x_N,N) + \sum_{k=i}^{N-1} L_k(\bm x_k,\bm u_k) \right)\\ \text{subject to} &\quad \bm x_{k+1} = \mathbf f_k(\bm x_k,\bm u_k),\quad k=i, \ldots, N-1,\\ &\quad \bm u_k \in \mathcal U_k,\quad k=i, \ldots, N-1,\\ &\quad \bm x_k \in \mathcal X_k,\quad k=i, \ldots, N, \end{aligned} where
- i is the initial discrete time,
- N is the final discrete time,
- \phi() is a terminal cost function that penalizes the state at the final time,
- L_k() is a running (also stage) cost function,
- and \mathcal U_k and \mathcal X_k are sets of feasible controls and states – these sets are typically expressed using equations and inequalities. Should they be constant, the notation is just \mathcal U and \mathcal X.
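The terminal and running costs are often chosen quadratic, which is also what the LQR specialization in the next section builds on. A minimal sketch continuing the pendulum example above, with all weight matrices and the function names assumed for illustration:

```python
Q = np.diag([1.0, 0.1])      # state weight in the running cost (assumed)
R = np.array([[0.01]])       # control weight in the running cost (assumed)
S = 10.0 * np.eye(2)         # terminal weight (assumed)

def stage_cost(k, x, u):
    """Running (stage) cost L_k(x_k, u_k); here a time-invariant quadratic form."""
    return x @ Q @ x + u @ R @ u

def terminal_cost(x_N):
    """Terminal cost phi(x_N) penalizing the state at the final time."""
    return x_N @ S @ x_N
```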
Oftentimes it is convenient to handle the constraints of the initial and final states separately: \begin{aligned} \operatorname*{minimize}_{\bm u_i,\ldots, \bm u_{N-1}, \bm x_{i},\ldots, \bm x_N}&\quad \left(\phi(\bm x_N,N) + \sum_{k=i}^{N-1} L_k(\bm x_k,\bm u_k) \right)\\ \text{subject to} &\quad \bm x_{k+1} = \mathbf f_k(\bm x_k,\bm u_k),\quad k=i, \ldots, N-1,\\ &\quad \bm u_k \in \mathcal U_k,\quad k=i, \ldots, N-1,\\ &\quad \bm x_k \in \mathcal X_k,\quad k=i+1, \ldots, N-1,\\ &\quad \bm x_i \in \mathcal X_\mathrm{init},\\ &\quad \bm x_N \in \mathcal X_\mathrm{final}. \end{aligned}
In particular, at the initial time a single fixed state is often prescribed. At the final time, the state may be required to equal a given value, it may be required to lie in a set defined through equations or inequalities, or it may be left unconstrained. Finally, the constraints on the controls and states typically (but not always) come in the form of lower and upper bounds. The optimal control problem then specializes to \begin{aligned} \operatorname*{minimize}_{\bm u_i,\ldots, \bm u_{N-1}, \bm x_{i},\ldots, \bm x_N}&\quad \left(\phi(\bm x_N,N) + \sum_{k=i}^{N-1} L_k(\bm x_k,\bm u_k) \right)\\ \text{subject to} &\quad \bm x_{k+1} = \mathbf f_k(\bm x_k,\bm u_k),\quad k=i, \ldots, N-1,\\ &\quad \bm u_{\min} \leq \bm u_k \leq \bm u_{\max},\quad k=i, \ldots, N-1,\\ &\quad \bm x_{\min} \leq \bm x_k \leq \bm x_{\max},\quad k=i+1, \ldots, N-1,\\ &\quad\bm x_i = \mathbf x^\text{init},\\ &\quad \left(\bm x_N = \mathbf x^\text{ref}, \; \text{or} \; \mathbf h_\text{final}(\bm x_N) = \mathbf 0, \; \text{or} \; \mathbf g_\text{final}(\bm x_N) \leq \mathbf 0\right), \end{aligned} where
- the inequalities should be interpreted componentwise,
- \bm u_{\min} and \bm u_{\max} are lower and upper bounds on the control, respectively,
- \bm x_{\min} and \bm x_{\max} are lower and upper bounds on the state, respectively,
- \mathbf x^\text{init} is a fixed initial state,
- \mathbf x^\text{ref} is a required (reference) final state,
- and the functions \mathbf g_\text{final}() and \mathbf h_\text{final}() can be used to define the constraint set for the final state.
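Continuing the running example, the data of this specialized problem amount to a handful of vectors. All numerical values below are assumed for illustration, with the initial time taken as i = 0:

```python
n, m, N = 2, 1, 50                               # dimensions and horizon (assumed)
u_min, u_max = np.array([-1.0]), np.array([1.0])
x_min, x_max = np.array([-np.pi, -5.0]), np.array([np.pi, 5.0])
x_init = np.array([np.pi / 4, 0.0])              # fixed initial state
x_ref  = np.zeros(n)                             # required final state
```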
This optimal control problem is an instance of a general nonlinear programming (NLP) problem \begin{aligned} \operatorname*{minimize}_{\bar{\bm x}\in\mathbb{R}^{n(N-i)},\bar{\bm u}\in\mathbb{R}^{m(N-i)}} &\quad J(\bar{\bm x},\bar{\bm u})\\ \text{subject to} &\quad \mathbf h(\bar{\bm x},\bar{\bm u}) =0,\\ &\quad \mathbf g(\bar{\bm x},\bar{\bm u}) \leq \mathbf 0, \end{aligned} where \bar{\bm u} and \bar{\bm x} are vectors obtained by stacking control and state vectors for individual times
\begin{aligned} \bar{\bm u} &= \operatorname*{vec}(\bm u_i,\ldots, \bm u_{N-1}) = \begin{bmatrix}\bm u_i\\ \bm u_{i+1}\\ \vdots \\ \bm u_{N-1} \end{bmatrix},\\ \bar{\bm x} &= \operatorname*{vec}(\bm x_{i+1},\ldots, \bm x_N) = \begin{bmatrix}\bm x_{i+1}\\ \bm x_{i+2}\\ \vdots \\ \bm x_{N} \end{bmatrix}. \end{aligned}
- Although there may be applications where it is desirable to optimize over the initial state \bm x_i as well, in most cases the initial state \bm x_i is fixed and does not have to be considered an optimization variable. This can even be emphasized through the notation J(\bar{\bm x},\bar{\bm u}; \bm x_i), where the semicolon separates the variables from the (fixed) parameters.
- The last control that affects the state trajectory on the interval [i,N] is \bm u_{N-1}.
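Putting the pieces together, below is a minimal sketch of the full transcription into an NLP, solved with a general-purpose solver (`scipy.optimize.minimize` with the SLSQP method). It continues the code fragments above; the initial state enters only as a fixed parameter, in line with the remark above, and the helper names (`unstack`, `J`, `h`) are our own. For anything but small instances, a dedicated NLP solver (e.g., IPOPT, typically accessed through a modelling tool such as CasADi) is the more practical choice.

```python
from scipy.optimize import minimize

def unstack(z):
    """Split the stacked decision vector z = (u_bar, x_bar); since x_i is a
    fixed parameter, only x_{i+1}, ..., x_N are decision variables."""
    u_bar = z[: m * N].reshape(N, m)
    x_bar = z[m * N :].reshape(N, n)
    return u_bar, x_bar

def J(z):
    """Total cost: terminal cost plus the sum of stage costs over the horizon."""
    u_bar, x_bar = unstack(z)
    xs = np.vstack([x_init, x_bar])      # prepend the fixed initial state
    cost = terminal_cost(xs[-1])
    for k in range(N):
        cost += stage_cost(k, xs[k], u_bar[k])
    return cost

def h(z):
    """Equality constraints: dynamics residuals x_{k+1} - f_k(x_k, u_k) = 0
    for k = 0, ..., N-1, plus the terminal condition x_N = x_ref."""
    u_bar, x_bar = unstack(z)
    xs = np.vstack([x_init, x_bar])
    res = [xs[k + 1] - f(k, xs[k], u_bar[k]) for k in range(N)]
    res.append(xs[-1] - x_ref)
    return np.concatenate(res)

# Box constraints on the stacked vector: control bounds, then state bounds.
bounds = [(lo, hi) for lo, hi in zip(u_min, u_max)] * N \
       + [(lo, hi) for lo, hi in zip(x_min, x_max)] * N

z0 = np.zeros(m * N + n * N)             # naive initial guess
sol = minimize(J, z0, method="SLSQP", bounds=bounds,
               constraints={"type": "eq", "fun": h})
u_opt, x_opt = unstack(sol.x)            # optimal control and state trajectories
```

This is the so-called simultaneous (full) transcription: both controls and states are decision variables and the dynamics appear as equality constraints. An alternative is to eliminate the states by simulating the dynamics inside the cost function and optimize over \bar{\bm u} only, at the price of a denser and often more nonlinear problem.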