General finite-horizon nonlinear discrete-time optimal control as a nonlinear program
In this section we formulate a finite-horizon optimal control problem (OCP) for a discrete-time dynamical system as a mathematical optimization problem (also a mathematical program), which can then be solved numerically by a suitable solver for nonlinear programming (NLP), or possibly quadratic programming (QP). The outcome of such numerical optimization is an optimal control trajectory (a sequence of controls), which is why this approach is called direct – we optimize directly over the trajectories.
In the following chapter we then present an alternative – indirect – approach, wherein the conditions of optimality are formulated first. These come in the form of a set of equations, some of them recurrent/recursive, some just algebraic. The indirect approach thus amounts to solving such equations.
And then in another chapter we present the third approach – dynamic programming.
The three approaches form the backbone of the theory of optimal control for discrete-time systems, but later we are going to recognize the same triplet in the context of continuous-time systems.
Figure 1: Three approaches to discrete-time optimal control
But now back to the direct approach. We will start with a general nonlinear discrete-time optimal control problem in this section, and then specialize to the linear quadratic regulation (LQR) problem in the next section. Finally, since the computed control trajectory constitutes an open-loop control scheme, something must be done about it if a feedback scheme is preferred – we introduce the concept of receding horizon control (RHC), perhaps better known as model predictive control (MPC), which turns the direct approach into a feedback control scheme.
We start by considering a nonlinear discrete-time system modelled by the state equation $x_{k+1} = f_k(x_k, u_k)$, where
$x_k \in \mathbb{R}^n$ is the state at the discrete time $k \in \mathbb{Z}$,
$u_k \in \mathbb{R}^m$ is the control at the discrete time $k$,
$f: \mathbb{R}^n \times \mathbb{R}^m \times \mathbb{Z} \to \mathbb{R}^n$ is a state transition function (in general not only nonlinear but also time-varying, with the convention that the dependence on $k$ is expressed through the lower index, $f_k(x_k, u_k) = f(x_k, u_k, k)$).
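To make the notation concrete, here is a minimal Python sketch of one possible state transition function – a pendulum discretized by the forward Euler method. The system, the sampling period `T`, and all names are our illustrative assumptions, not part of the development above:

```python
import numpy as np

T = 0.05  # sampling period used by the Euler discretization (assumed value)

def f(x, u, k=None):
    """State transition x_{k+1} = f_k(x_k, u_k) for a pendulum with
    state x = (angle, angular rate) and scalar control torque u,
    discretized by forward Euler. A genuinely time-varying system
    would also make use of the time index k."""
    theta, omega = x
    return np.array([theta + T * omega,
                     omega + T * (-np.sin(theta) + u)])
```

Here $n = 2$ and $m = 1$; the function is nonlinear through the $\sin$ term but happens to be time-invariant, so the index $k$ is unused.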
A general nonlinear discrete-time optimal control problem (OCP) is then formulated as
$$
\begin{aligned}
\operatorname*{minimize}_{u_i,\ldots,u_{N-1},\,x_i,\ldots,x_N} \quad & \phi(x_N, N) + \sum_{k=i}^{N-1} L_k(x_k, u_k) \\
\text{subject to} \quad & x_{k+1} = f_k(x_k, u_k), \quad k = i, \ldots, N-1, \\
& u_k \in \mathcal{U}_k, \quad k = i, \ldots, N-1, \\
& x_k \in \mathcal{X}_k, \quad k = i, \ldots, N,
\end{aligned}
$$
where
$i$ is the initial discrete time,
$N$ is the final discrete time,
$\phi(\cdot)$ is a terminal cost function that penalizes the state at the final time,
$L_k(\cdot)$ is a running (also stage) cost function,
and $\mathcal{U}_k$ and $\mathcal{X}_k$ are sets of feasible controls and states – these sets are typically expressed using equations and inequalities. Should they be constant, the notation is just $\mathcal{U}$ and $\mathcal{X}$.
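A common concrete choice of the cost functions – previewing the LQR problem of the next section – is quadratic:
$$
\phi(x_N, N) = \frac{1}{2} x_N^\top S x_N, \qquad L_k(x_k, u_k) = \frac{1}{2}\left(x_k^\top Q x_k + u_k^\top R u_k\right),
$$
with weighting matrices $S \succeq 0$, $Q \succeq 0$, and $R \succ 0$.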
Oftentimes it is convenient to handle the constraints on the initial and final states separately:
$$
\begin{aligned}
\operatorname*{minimize}_{u_i,\ldots,u_{N-1},\,x_i,\ldots,x_N} \quad & \phi(x_N, N) + \sum_{k=i}^{N-1} L_k(x_k, u_k) \\
\text{subject to} \quad & x_{k+1} = f_k(x_k, u_k), \quad k = i, \ldots, N-1, \\
& u_k \in \mathcal{U}_k, \quad k = i, \ldots, N-1, \\
& x_k \in \mathcal{X}_k, \quad k = i+1, \ldots, N-1, \\
& x_i \in \mathcal{X}_{\text{init}}, \\
& x_N \in \mathcal{X}_{\text{final}}.
\end{aligned}
$$
In particular, the initial state is often fixed to a single given value. At the final time, the state might be required to equal some given value, to lie in a set defined through equations or inequalities, or it might be left unconstrained. Finally, the constraints on the controls and states typically (but not always) come in the form of lower and upper bounds. The optimal control problem then specializes to
$$
\begin{aligned}
\operatorname*{minimize}_{u_i,\ldots,u_{N-1},\,x_i,\ldots,x_N} \quad & \phi(x_N, N) + \sum_{k=i}^{N-1} L_k(x_k, u_k) \\
\text{subject to} \quad & x_{k+1} = f_k(x_k, u_k), \quad k = i, \ldots, N-1, \\
& u_{\min} \le u_k \le u_{\max}, \\
& x_{\min} \le x_k \le x_{\max}, \\
& x_i = x_{\text{init}}, \\
& \left(x_N = x_{\text{ref}}, \ \text{or} \ h_{\text{final}}(x_N) = 0, \ \text{or} \ g_{\text{final}}(x_N) \le 0\right),
\end{aligned}
$$
where
the inequalities should be interpreted componentwise,
$u_{\min}$ and $u_{\max}$ are lower and upper bounds on the control, respectively,
$x_{\min}$ and $x_{\max}$ are lower and upper bounds on the state, respectively,
$x_{\text{init}}$ is a fixed initial state,
$x_{\text{ref}}$ is a required (reference) final state,
and the functions $h_{\text{final}}(\cdot)$ and $g_{\text{final}}(\cdot)$ can be used to define the constraint set for the final state.
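For example, the exact terminal constraint $x_N = x_{\text{ref}}$ can equivalently be encoded as $h_{\text{final}}(x_N) = x_N - x_{\text{ref}} = 0$, while a relaxed requirement that the final state only lie within some ball of radius $r$ around $x_{\text{ref}}$ reads $g_{\text{final}}(x_N) = \|x_N - x_{\text{ref}}\|_2^2 - r^2 \le 0$.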
This optimal control problem is an instance of a general nonlinear programming (NLP) problem
$$
\begin{aligned}
\operatorname*{minimize}_{\bar{x} \in \mathbb{R}^{n(N-i)},\, \bar{u} \in \mathbb{R}^{m(N-i)}} \quad & J(\bar{x}, \bar{u}) \\
\text{subject to} \quad & h(\bar{x}, \bar{u}) = 0, \\
& g(\bar{x}, \bar{u}) \le 0,
\end{aligned}
$$
where $\bar{u}$ and $\bar{x}$ are vectors obtained by stacking the control and state vectors for the individual times.
Although there may be applications in which it is desirable to optimize over the initial state $x_i$ as well, in most cases the initial state $x_i$ is fixed, and it then does not have to be considered an optimization variable (note that the dimension $n(N-i)$ of $\bar{x}$ above already reflects this). This can even be emphasized through the notation $J(\bar{x}, \bar{u}; x_i)$, where the semicolon separates the variables from the (fixed) parameters.
The last control that affects the state trajectory on the interval $[i, N]$ is $u_{N-1}$.
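To illustrate how this transcription to an NLP looks in code, below is a minimal sketch for a hypothetical scalar system $x_{k+1} = x_k + T(-x_k^3 + u_k)$ with quadratic costs, box bounds on the control, a fixed initial state, and an exact terminal constraint. All names and numerical values are our illustrative assumptions; the problem is handed to `scipy.optimize.minimize` with the SLSQP method, whereas realistic OCPs would typically be passed to a dedicated NLP solver such as IPOPT:

```python
import numpy as np
from scipy.optimize import minimize

N = 20                       # final time (initial time i = 0)
T = 0.1                      # sampling period
x_init, x_ref = 1.0, 0.0     # fixed initial state and required final state
u_min, u_max = -1.0, 1.0     # box bounds on the control

def f(x, u):                 # state transition x_{k+1} = f(x_k, u_k)
    return x + T * (-x**3 + u)

def unpack(z):               # z stacks [x_1, ..., x_N, u_0, ..., u_{N-1}]
    return z[:N], z[N:]

def J(z):                    # terminal cost plus quadratic running cost
    x, u = unpack(z)
    xs = np.concatenate(([x_init], x))          # x_0, ..., x_N
    return xs[-1]**2 + np.sum(xs[:-1]**2 + u**2)

def h(z):                    # dynamics rewritten as equality constraints h(z) = 0
    x, u = unpack(z)
    xs = np.concatenate(([x_init], x))
    return np.array([xs[k+1] - f(xs[k], u[k]) for k in range(N)])

bounds = [(None, None)] * N + [(u_min, u_max)] * N   # states free, controls boxed
constraints = [{"type": "eq", "fun": h},
               {"type": "eq", "fun": lambda z: unpack(z)[0][-1] - x_ref}]

sol = minimize(J, np.zeros(2 * N), method="SLSQP",
               bounds=bounds, constraints=constraints)
x_opt, u_opt = unpack(sol.x)
```

Note that the fixed $x_0 = x_{\text{init}}$ is eliminated from the decision vector, in line with the remark above about treating the initial state as a parameter rather than an optimization variable, and that there are indeed only $N$ controls $u_0, \ldots, u_{N-1}$.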