Indirect approach to optimal control of a nonlinear system on a finite time horizon

Now we are finally ready to solve the optimal control problems stated at the beginning of the lecture/chapter. Equipped with the solution to the fixed-ends basic problem of the calculus of variations, we start with the finite-horizon, fixed-final-state version of the optimal control problem. We will extend the result to a free final state in due course.

Continuous-time optimal control problem

The problem to be solved is

\begin{aligned} \operatorname*{minimize}_{\bm x(\cdot),\bm u(\cdot)} &\quad \int_{t_\mathrm{i}}^{t_\mathrm{f}}L(\bm x(t),\bm u(t),t)\text{d}t\\ \text{subject to} &\quad \dot{\bm x}(t)= \mathbf f(\bm x(t),\bm u(t),t),\\ &\quad \bm x(t_\mathrm{i}) = \mathbf x_\mathrm{i},\\ &\quad \bm x(t_\mathrm{f}) = \mathbf x_\mathrm{f}. \end{aligned}

There is, of course, no term in the cost function penalizing the state at the final time, since the system is required to reach a prespecified final state anyway.
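
For concreteness, here is one simple instance of such a problem (our own illustrative choice, not needed in what follows): steer a double integrator from rest at the origin to rest at a unit distance in one second while penalizing the control effort, that is, \begin{aligned} \operatorname*{minimize}_{\bm x(\cdot), u(\cdot)} &\quad \int_{0}^{1}\frac{1}{2}u^2(t)\,\text{d}t\\ \text{subject to} &\quad \dot x_1(t) = x_2(t),\quad \dot x_2(t) = u(t),\\ &\quad \bm x(0) = \begin{bmatrix}0\\ 0\end{bmatrix},\quad \bm x(1) = \begin{bmatrix}1\\ 0\end{bmatrix}, \end{aligned} in which L(\bm x,u,t) = \frac{1}{2}u^2 and \mathbf f(\bm x,u,t) = \begin{bmatrix} x_2\\ u \end{bmatrix}.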

First-order necessary conditions of optimality

The augmented cost function and the augmented Lagrangian are J^\mathrm{aug}(t,\bm x(\cdot),\dot{\bm x}(\cdot),\bm u(\cdot),\bm\lambda(\cdot)) = \int_{t_\mathrm{i}}^{t_\mathrm{f}}\left[\underbrace{ L(\bm x,\bm u,t)+\bm\lambda^\top\left( \dot {\bm x}-\mathbf f(\bm x,\bm u,t)\right)}_{L^\text{aug}}\right]\text{d}t. \tag{1}

Note also that compared to the original unconstrained calculus of variations setting, here we made a notational shift from x to t as the independent variable, and from y to (\bm x,\bm u,\bm \lambda) as the triplet of dependent variables (possibly vector-valued ones). Finally, note that the only derivative appearing in the augmented Lagrangian is \dot{\bm x}.

Boundary value problem (BVP) as the necessary conditions of optimality

Applying the Euler-Lagrange equation to this augmented Lagrangian, we obtain three equations (or three systems of equations), one for each dependent variable:

\begin{aligned} L_{\bm{x}}^\text{aug} &= \frac{\text{d}}{\text{d}t} L^\text{aug}_{\dot{\bm{x}}},\\ L_{\bm{u}}^\text{aug} &= 0,\\ L_{\bm \lambda}^\text{aug} &= 0. \end{aligned} \tag{2}

These can be expanded in terms of the original (unconstrained) Lagrangian. For notational simplicity, we first assume that all the functions are scalar: \begin{aligned} \frac{\partial L}{\partial x} - \lambda \frac{\partial f}{\partial x}&= \dot{\lambda},\\ \frac{\partial L}{\partial u} - \lambda \frac{\partial f}{\partial u} &= 0,\\ \dot {x} - f(x,u,t) &= 0. \end{aligned}
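
To see these three equations at work, consider (purely as an illustration of ours) the scalar linear system f(x,u,t) = ax + bu with the quadratic Lagrangian L(x,u,t) = \frac{1}{2}(qx^2 + ru^2). The conditions read \begin{aligned} qx - a\lambda &= \dot{\lambda},\\ ru - b\lambda &= 0,\\ \dot x - ax - bu &= 0, \end{aligned} and the middle equation immediately yields u = \frac{b}{r}\lambda, through which the control can be eliminated from the remaining two equations.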

In the vector case (in which \bm x, and hence \mathbf f(\cdot), and/or \bm u are vectors): \begin{aligned} \nabla_{\bm{x}} L - \sum_{i=1}^n \lambda_i \nabla_{\bm{x}} f_i(\bm x, \bm u, t) &= \dot{\bm{\lambda}},\\ \nabla_{\bm{u}} L - \sum_{i=1}^n \lambda_i \nabla_{\bm{u}} f_i(\bm x, \bm u, t) &= \mathbf 0,\\ \dot{\bm{x}} - \mathbf{f}(\bm x,\bm u,t) &= \mathbf 0. \end{aligned}

We can also write the same result in compact vector form. Recall that we agreed in this course to regard gradients as column vectors and that \nabla \mathbf f for a vector function \mathbf{f} is a matrix whose columns are the gradients \nabla_\mathbf{x} f_i of the individual elements of the vector function. We can then write the first-order conditions compactly as \begin{aligned} \nabla_\mathbf{x}L - \nabla_\mathbf{x} \mathbf{f} \; \bm\lambda &= \dot{\bm\lambda},\\ \nabla_\mathbf{u}L - \nabla_\mathbf{u} \mathbf{f} \; \bm\lambda &= \mathbf 0,\\ \dot {\mathbf{x}} - \mathbf{f}(\bm x,\bm u,t) &= \mathbf 0. \end{aligned}

After reordering the equations and shuffling the terms within the equations, we get \boxed{ \begin{aligned} \dot {\mathbf{x}}(t) &= \mathbf{f}(\bm x(t),\bm u(t),t),\\ \dot{\bm\lambda}(t) &= \nabla_\mathbf{x}L(\bm x(t),\bm u(t),t) - \nabla_\mathbf{x} \mathbf{f}(\bm x(t),\bm u(t),t) \, \bm\lambda(t),\\ \mathbf 0 &= \nabla_\mathbf{u}L(\bm x(t),\bm u(t),t) - \nabla_\mathbf{u} \mathbf{f}(\bm x(t),\bm u(t),t) \, \bm\lambda(t). \end{aligned}} \tag{3}

These three (systems of) equations give the necessary conditions of optimality that we were looking for. We can immediately recognize the first one – the original state equation describing how the state vector variable \bm{x} evolves in time.

The other two equations are new, though. The second one is called the costate equation, because the variable \bm\lambda, originally introduced as a Lagrange multiplier, now also evolves according to a first-order differential equation; we call it the costate variable.

The last equation is called the equation of stationarity. Unlike the previous two, it is not a differential equation but a (generally nonlinear) algebraic equation. With the exception of some singular cases that we are going to mention soon, it can be solved for the control vector \bm u as a function of the state \bm x and the costate \bm \lambda, in which case \bm u can be eliminated from the two differential equations and we end up with differential equations in \bm{x} and \bm\lambda only.
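
As an example (a common special case that we state here just for illustration), let the dynamics be affine in the control, \mathbf f(\bm x,\bm u,t) = \mathbf a(\bm x,t) + \mathbf B(\bm x,t)\,\bm u, and let the control enter the cost quadratically, L(\bm x,\bm u,t) = \ell(\bm x,t) + \frac{1}{2}\bm u^\top\mathbf R\,\bm u with \mathbf R \succ 0. The equation of stationarity then reads \mathbf 0 = \mathbf R\bm u - \mathbf B^\top(\bm x,t)\,\bm\lambda, hence \bm u = \mathbf R^{-1}\mathbf B^\top(\bm x,t)\,\bm\lambda, which can be substituted into the state and costate equations.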

Boundary conditions

Recall that for differential equations we always need a sufficient number of boundary conditions to determine the solution uniquely. In particular, for a state vector of dimension n, the costate is also of dimension n, hence we need in total 2n boundary conditions. In our current setup these are given by the n specified values of the state vector at the beginning and n values at the end: \boxed{ \begin{aligned} &\quad \bm x(t_\mathrm{i}) = \mathbf x_\mathrm{i},\\ &\quad \bm x(t_\mathrm{f}) = \mathbf x_\mathrm{f}. \end{aligned}} \tag{4}

Only after these boundary conditions are added to the above DAE system do we have a full set of necessary conditions of optimality.

Problems of this kind are called two-point boundary value problems (BVPs), and generally they can only be solved numerically (dedicated solvers exist; see the section on software).
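
As a side note, even deriving the expressions that enter the BVP can be automated. Below is a minimal sketch using the Symbolics.jl package (the tooling choice is ours, purely for illustration; any computer algebra system would do), applied to the cart-pendulum system of Example 1 below and following the convention of Eq. 3:

using Symbolics

@variables x[1:2] u λ[1:2]
x, λ = collect(x), collect(λ)                  # Materialize the symbolic arrays as plain vectors.

g, l, q, r = 9.81, 1.0, [1.0, 1.0], 100.0      # Parameters matching Example 1 below.
f = [x[2], g/l*sin(x[1]) + 1/l*cos(x[1])*u]    # Right-hand side of the state equation.
L = 1/2*(q[1]*x[1]^2 + q[2]*x[2]^2 + r*u^2)    # Running cost (Lagrangian), our choice of weights.

# Per our convention, ∇ₓf has the gradients of the fᵢ as its columns, i.e., it is the transposed Jacobian.
∇ₓf = transpose(Symbolics.jacobian(f, x))
∇ᵤf = transpose(Symbolics.jacobian(f, [u]))

costate_rhs  = Symbolics.gradient(L, x) - ∇ₓf*λ    # Right-hand side of the costate equation in Eq. 3.
stationarity = Symbolics.gradient(L, [u]) - ∇ᵤf*λ  # The stationarity expression; must equal zero.

Solving the stationarity expression for u here gives u = \frac{\lambda_2\cos x_1}{rl}; the code in Example 1 below arrives at the opposite sign because it uses the alternative Hamiltonian convention discussed later, under which the costate flips its sign.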

First-order necessary conditions of optimality using the Hamiltonian function

We now introduce an auxiliary function called the Hamiltonian, which not only makes the conditions of optimality even more compact, but also – and even more importantly – strengthens the link with the physics-motivated calculus of variations. Recall that the Hamiltonian function in the calculus of variations is defined as H(x,y,y') = py'-L. In the optimal control setting, since we consider constraints in the form of the state equation, we need to use the augmented Lagrangian, which we defined as

L^\mathrm{aug}(\bm x,\bm u,\bm \lambda,t) = L(\bm x,\bm u,t)+\bm \lambda^\top\left( \dot {\bm x}-\mathbf f(\bm x,\bm u,t)\right).

We now define the optimal-control-related Hamiltonian in the same way – as the product of the auxiliary variable and the derivative of the state variable, minus the (augmented) Lagrangian: \begin{aligned} H(\bm x,\bm u,\dot{\bm{x}},\bm\lambda,t) &= \bm\lambda^\top \, \dot{\bm x} - L^\text{aug},\nonumber\\ &= \bm\lambda^\top \,\dot{\bm{x}}-L(\bm x,\bm u,t) - \bm\lambda^\top \, \dot{\bm x}+\bm\lambda^\top \, \mathbf f(\bm x,\bm u, t),\nonumber\\ &=\bm\lambda^\top \, \mathbf f(\bm x,\bm u, t)-L(\bm x,\bm u,t). \end{aligned}

Realizing that through the state equation \dot{\bm x}=\mathbf f(\bm x,\bm u, t) the derivative \dot{\bm x} at a given state \bm x (and time t) is uniquely determined by \bm u, we can consider the control-related Hamiltonian as a function of t, \bm x, \bm u and \bm\lambda (and discard the \dot{\bm x} argument): \boxed{ H(\bm x,\bm u,\bm\lambda,t) = \bm\lambda^\top \mathbf f(\bm x,\bm u,t) - L(\bm x,\bm u,t).} \tag{5}

The necessary conditions of optimality in Eq. 3 can be rewritten in a more compact form: \boxed{ \begin{aligned} \dot {\mathbf{x}} &= \nabla_{\bm \lambda} H(\bm x,\bm u,\bm \lambda, t),\\ \dot{\bm\lambda} &= - \nabla_{\bm x} H(\bm x,\bm u,\bm \lambda,t),\\ \mathbf 0 &= \nabla_\mathbf{u}H(\bm x,\bm u,\bm \lambda,t). \end{aligned}} \tag{6}
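
As a quick sanity check (the linear-quadratic special case, worked out here for illustration), take \mathbf f(\bm x,\bm u,t) = \mathbf A\bm x + \mathbf B\bm u and L(\bm x,\bm u,t) = \frac{1}{2}\left(\bm x^\top\mathbf Q\bm x + \bm u^\top\mathbf R\bm u\right). The Hamiltonian is H = \bm\lambda^\top(\mathbf A\bm x + \mathbf B\bm u) - \frac{1}{2}\left(\bm x^\top\mathbf Q\bm x + \bm u^\top\mathbf R\bm u\right), and Eq. 6 gives \begin{aligned} \dot{\bm x} &= \mathbf A\bm x + \mathbf B\bm u,\\ \dot{\bm\lambda} &= \mathbf Q\bm x - \mathbf A^\top\bm\lambda,\\ \mathbf 0 &= \mathbf B^\top\bm\lambda - \mathbf R\bm u. \end{aligned} The last equation yields \bm u = \mathbf R^{-1}\mathbf B^\top\bm\lambda, and after eliminating the control, the state and costate obey the linear system \begin{bmatrix}\dot{\bm x}\\ \dot{\bm\lambda}\end{bmatrix} = \begin{bmatrix}\mathbf A & \mathbf B\mathbf R^{-1}\mathbf B^\top\\ \mathbf Q & -\mathbf A^\top\end{bmatrix}\begin{bmatrix}\bm x\\ \bm\lambda\end{bmatrix}.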

Of course, we must not forget to add the boundary conditions Eq. 4 to have a full set of necessary conditions of optimality.

Two different conventions for defining the Hamiltonian in optimal control

Our choice of the Hamiltonian function was determined by our somewhat arbitrary choice of formulating the constraint function (that is, the function that should be equal to zero) as \dot {\bm x}-\mathbf f(\bm x,\bm u,t) when defining the augmented Lagrangian L^\mathrm{aug}(\bm x,\bm u,\bm\lambda,t) = L(\bm x,\bm u,t)+\bm\lambda^\top\left( \dot {\bm x}-\mathbf f(\bm x,\bm u,t)\right). Had we (equally arbitrarily) chosen to define the constraint function as \mathbf f(\bm x,\bm u,t)-\dot {\bm x}, we would have ended up with a different augmented Lagrangian, for which the more appropriate definition of the Hamiltonian function would be H(\bm x,\bm u,\bm\lambda,t) = L(\bm x,\bm u,t)+ \bm\lambda^\top \mathbf f(\bm x,\bm u,t). More on the implications of this in the dedicated section.
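
It is worth verifying (a short check of our own) that this alternative convention leaves the canonical equations of Eq. 6 formally unchanged. With L^\mathrm{aug} = L + \bm\lambda^\top(\mathbf f - \dot{\bm x}) and H = L + \bm\lambda^\top\mathbf f, the Euler-Lagrange equations give \begin{aligned} \nabla_{\bm x}L + \nabla_{\bm x}\mathbf f\,\bm\lambda = \frac{\text{d}}{\text{d}t}(-\bm\lambda) \quad&\Rightarrow\quad \dot{\bm\lambda} = -\nabla_{\bm x}H,\\ \nabla_{\bm u}L + \nabla_{\bm u}\mathbf f\,\bm\lambda = \mathbf 0 \quad&\Rightarrow\quad \mathbf 0 = \nabla_{\bm u}H,\\ \mathbf f - \dot{\bm x} = \mathbf 0 \quad&\Rightarrow\quad \dot{\bm x} = \nabla_{\bm\lambda}H. \end{aligned} The two conventions thus yield optimality conditions of the same form and differ only in the sign of the costate variable.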

Example 1 (Pendulum swingup problem approached as an optimal control problem and solved as a two-point boundary value problem) We consider a pendulum on a horizontally moving cart. The only control input is the reference acceleration of the cart – it is assumed that a feedback controller is already implemented that tracks this reference acceleration. The resulting model of the dynamics is \ddot \theta = \frac{\mathrm g}{l}\sin\theta + \frac{1}{l}\cos\theta \;u, where u is the reference acceleration of the cart, l is the length of the pendulum, \mathrm g is the gravitational acceleration, and \theta(t) is the angular deviation from the upright vertical, positive in the counterclockwise direction.

The pendulum is initially at rest in the downward vertical position, that is, \bm x(0) = \begin{bmatrix}\pi\\ 0\end{bmatrix}. The desired final state is \bm x(t_\mathrm{f}) = \begin{bmatrix}0\\ 0\end{bmatrix}, that is, the pendulum should be upright and at rest. The final time t_\mathrm{f} is fixed; we must specify it before solving the optimal control problem (in the next chapter we will learn how to treat it as an optimization variable too).
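
Before handing the problem over to a numerical solver, let us spell out the optimality conditions that the code below implements. With the state \bm x = \begin{bmatrix}\theta\\ \dot\theta\end{bmatrix}, we choose the running cost L = \frac{1}{2}\left(q_1x_1^2 + q_2x_2^2 + ru^2\right) (this choice of weights is ours and matches the code) and work with the alternative convention H = L + \bm\lambda^\top\mathbf f from the previous section: \begin{aligned} H &= \frac{1}{2}\left(q_1x_1^2 + q_2x_2^2 + ru^2\right) + \lambda_1x_2 + \lambda_2\left(\frac{\mathrm g}{l}\sin x_1 + \frac{1}{l}\cos x_1\,u\right),\\ 0 = \frac{\partial H}{\partial u} &= ru + \frac{\lambda_2}{l}\cos x_1 \quad\Rightarrow\quad u = -\frac{\lambda_2\cos x_1}{rl},\\ \dot\lambda_1 = -\frac{\partial H}{\partial x_1} &= -q_1x_1 - \lambda_2\left(\frac{\mathrm g}{l}\cos x_1 - \frac{1}{l}\sin x_1\,u\right),\\ \dot\lambda_2 = -\frac{\partial H}{\partial x_2} &= -q_2x_2 - \lambda_1. \end{aligned} Together with the state equation and the four boundary conditions on \bm x(0) and \bm x(t_\mathrm{f}), these constitute the two-point BVP solved below.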

using DifferentialEquations
function pendulum_swingup_bvp()
    g = 9.81                                                        # Gravitational acceleration.
    l = 1.0                                                         # Length of the pendulum.
    q = [1.0, 1.0]                                                  # State weights.
    r = 100.0                                                       # Control weight.  
    # For reference, the model, cost, and Hamiltonian (convention H = L + λᵀf):
    # f(x,u) = [x[2]; g/l*sin(x[1]) + 1/l*cos(x[1])*u]
    # L(x,u) = 1/2*(q[1]*x[1]^2 + q[2]*x[2]^2 + r*u^2)
    # H(x,u,λ) = L(x,u) + λ[1]*f(x,u)[1] + λ[2]*f(x,u)[2]
    tinit = 0.0                                                     # Initial time.
    tfinal = 1.0                                                    # Final time.
    xinit = [pi, 0.0]                                               # Initial state.
    xfinal = [0.0, 0.0]                                             # Final state.
    function statecostateeq!(dw, w, p, t)
        x = w[1:2]                                                  # State vector [θ, ω].
        λ = w[3:4]                                                  # Costate vector.
        u = -1/(r*l)*cos(x[1])*λ[2]                                 # Optimal control from the stationarity equation.
        dw[1] = x[2]                                                # State equation 1.
        dw[2] = g/l*sin(x[1]) + 1/l*cos(x[1])*u                     # State equation 2 (add -b/(m*l^2)*x[2] here to model viscous damping).
        dw[3] = -q[1]*x[1] -( g/l*cos(x[1]) - 1/l*sin(x[1])*u)*λ[2] # Costate equation 1.
        dw[4] = -q[2]*x[2] - λ[1]                                   # Costate equation 2.
    end
    function bc!(res, w, p, t)                                      # w is the solution object, callable at any time.
        res[1:2] = w(tinit)[1:2] - xinit                            # The initial state must match.
        res[3:4] = w(tfinal)[1:2] - xfinal                          # The final state must match.
    end
    w0 = [xinit[1], xinit[2], 0.0, 0.0]                             # Guess at the initial state and costate.
    tspan = (tinit, tfinal)                                         # Initial and final time.
    bvprob = BVProblem(statecostateeq!, bc!, w0, tspan)
    sol = solve(bvprob, MIRK4(), dt=0.05)                           # Solve the BVP.
    u = -1/(r*l)*cos.(sol[1,:]).*sol[4,:]
    return sol.t, sol[1:2,:], sol[3:4,:], u
end
t, x, λ, u = pendulum_swingup_bvp()
using Plots
p1 = plot(t, x', ylabel="States", label=["θ" "ω"], lw=2)
p2 = plot(t, λ', ylabel="Costates", label=["λ₁" "λ₂"], lw=2)
p3 = plot(t, u, ylabel="Control", label="u", lw=2)
plot(p1, p2, p3, layout=(3,1), size=(600, 600))
Figure 1: Optimal trajectories of state, costate and control for the pendulum swingup problem.