Balancing the Inverted Pendulum with LQR

Keywords: #ros2 #kiwi #lqr #inverted pendulum

I’m back, and as promised we’re now doing a deep dive into LQR. I had to do a ton of scope adjustments for this post, as it was difficult trying to estimate what the “stopping point” should be. The initial vision for this segment also included an introduction to Control Barrier Functions (Control Lyapunov Functions, to be specific). However, this would have required a primer not only on stability, but also on convergence and initial guesses. I decided that this was maybe a bit much. Don’t worry, we’re still going to cover a ton of ground today; there is a lot to go over.

I often introduce LQR as “babby’s first optimal control”, which in most cases is likely true. However, I’m fairly confident that for a lot of people, LQR is “babby’s first” many things. While learning about gain values at university, one may have learned about pole placement: choosing feedback gains so that the closed-loop eigenvalues of the state space end up with negative real parts. However, out there in the wasteland (the real world), most engineers simply opt to “guess and check” their gain values. State estimation is strongly non-trivial (understatement) after all.

If one wants to go beyond just applying PID, there is a pretty large step change (pun intended) that needs to be overcome. You’re going to need to evaluate state spaces, determine stability, figure out cost functions; the list goes on. It’s no wonder that so many people prefer the ol’ faithful. In this post, I go over my process for applying LQR to KIWI, and we see if it’s really that difficult… (It’s not)

Revisiting our state space equations

In the introductory post for KIWI, we derived the state space equations. This can be considered required reading if you’re wanting to understand how we did it. For brevity, I’ll only summarise the result here:

Defining $\dot\omega$ as our wheel velocity, and $\theta$ as our torso angle, omitting $\omega$ as we (for now) aren’t interested in wheel position… $$x \triangleq \begin{bmatrix} \theta \\ \dot\omega \\ \dot\theta \end{bmatrix}$$ $$ M(x) = \begin{bmatrix} 1 & 0 & 0 \\ 0 & (m_t + m_w)r^2 + I_w & m_tr\frac{L}{2}\cos(\theta) \\ 0 & m_tr\frac{L}{2}\cos(\theta) & m_t\left(\frac{L}{2}\right)^2 + I_t \end{bmatrix}$$ $$ F(x) = \begin{bmatrix} \dot\theta \\ -b_{ground} \dot\omega + b_{drive}(\dot\theta - \dot\omega)+m_tr\frac{L}{2} \dot\theta^2\sin(\theta) \\ -b_{drive}(\dot\theta - \dot\omega) + m_tg\frac{L}{2}\sin(\theta) \end{bmatrix}$$ $$u = \begin{bmatrix}0 & \tau & -\tau\end{bmatrix}^T$$ Which, when combined, gives us our state space representation. $$\fcolorbox{#fe6601}{none}{$\dot x = M(x)^{-1}\left(F(x)+ u\right) $}$$

I won’t evaluate this here, as it gets ugly; you’re just going to have to take my word for it. The end result is a nonlinear state space equation, which we’re going to need to do some work on before we can start applying feedback control to it.
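If you’d like to poke at these dynamics numerically before we linearise anything, here’s a minimal Eigen sketch of the boxed equation above. The parameter struct, names, and function are mine (placeholders, not pulled from the KIWI codebase), and the state ordering matches $x = [\theta, \dot\omega, \dot\theta]^T$:

#include <Eigen/Dense>
#include <cmath>

// Hypothetical physical parameters; the names mirror the symbols in the equations above.
struct Params
{
    double m_t, m_w, r, L, I_w, I_t, b_ground, b_drive, g;
};

// Evaluate the nonlinear dynamics x_dot = M(x)^-1 * (F(x) + u)
// for the state x = [theta, omega_dot, theta_dot] and a single wheel torque tau.
Eigen::Vector3d dynamics(const Eigen::Vector3d & x, double tau, const Params & p)
{
    const double theta = x(0), omega_dot = x(1), theta_dot = x(2);
    const double l = p.L / 2.0;

    Eigen::Matrix3d M;
    M << 1.0, 0.0, 0.0,
         0.0, (p.m_t + p.m_w) * p.r * p.r + p.I_w, p.m_t * p.r * l * std::cos(theta),
         0.0, p.m_t * p.r * l * std::cos(theta), p.m_t * l * l + p.I_t;

    Eigen::Vector3d F;
    F << theta_dot,
         -p.b_ground * omega_dot + p.b_drive * (theta_dot - omega_dot)
             + p.m_t * p.r * l * theta_dot * theta_dot * std::sin(theta),
         -p.b_drive * (theta_dot - omega_dot) + p.m_t * p.g * l * std::sin(theta);

    Eigen::Vector3d u;
    u << 0.0, tau, -tau;

    return M.inverse() * (F + u);  // M is small, so a direct inverse is fine here
}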

Linearisation

The first step towards glorious LQR is to obtain a linearised representation of our model. I’ve no doubt the reader has heard of the requirement for linearisation before, but I have found that a lot of the explanations pull a “just trust me, we need to do this”. So, I’d like to quickly discuss why it is that we need to find a linear representation if we find ourselves working with a nonlinear system (and if you’re doing anything novel, it WILL be nonlinear).

When determining whether a system is non-linear, one should evaluate the system against the principle of superposition, which can be reduced down to two conditions. If our system fails either of these two conditions, it’s non-linear. Admittedly, for a beginner these can be a little abstract in isolation, so I’ll relate them to robotics as best I can:

  • Additivity: The condition of how the system responds to multiple simultaneous inputs. More precisely, we expect the response to the sum of the inputs to be identical to the sum of outputs that would result from each input acting individually. Expressed mathematically: $T(u + v) = T(u) + T(v)$. We require the response (output) to be a linear mapping of the input, and non-additive functions do not have this property. The textbook examples of this are the trigonometric functions. Using $\sin$ as an example: $\sin(0.5) + \sin(0.5) \neq \sin(1)$, which is why if we see trigonometric functions in our state space, we can immediately identify the system as non-linear.

  • Homogeneity: The condition of whether scaling the input results in a proportional scaling of the output. Similar to above, but tested against multiplication. Expressed mathematically: $T(au) = a T(u)$, dictating that if we scale the input by $a$, we require the output to be similarly scaled by $a$. A classic example of this is aerodynamic drag, the force of which increases with the square of the velocity, and is described by ${\displaystyle F_{\rm {d}}={\tfrac {1}{2}}\rho u^{2}c_{\rm {d}}A}$ (Wikipedia lmao), where $F_d$ is the drag force, $\rho$ is the mass density of the fluid, $u$ is the flow speed of the fluid relative to the object, $c_d$ is the drag coefficient, and $A$ is the reference area of the object (frontal cross section). The $u^2$ is the killer here, as a $2\times$ increase in velocity corresponds to a $4\times$ increase in drag.

I will note, it was surprisingly difficult to find (reasonable) examples of non-homogeneity in practice. The example above was almost a Conical Spring, but it felt forced (ha). I also thought of kinetic energy: $e=\frac{ m }{ 2 }v^2$ (No source required), but who would be controlling for kinetic energy as a state? You’re doing something really interesting if you find yourself in that position (And if you are, please contact me, I’d love to chat). My point is that you don’t really notice just how good we have gotten at linear approximations until you start hunting for an exception…
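If those two conditions still feel abstract, a throwaway numerical check (mine, nothing to do with the KIWI code) makes them concrete: $\sin$ fails additivity, and quadratic drag fails homogeneity.

#include <cmath>
#include <cstdio>

int main()
{
    // Additivity: T(u + v) should equal T(u) + T(v). It doesn't for sin.
    std::printf("sin(0.5) + sin(0.5) = %.4f, but sin(1.0) = %.4f\n",
                std::sin(0.5) + std::sin(0.5), std::sin(1.0));

    // Homogeneity: T(a * u) should equal a * T(u). Quadratic drag scales with a^2 instead.
    // rho, c_d and A below are placeholder values.
    auto drag = [](double u) { return 0.5 * 1.225 * u * u * 0.8 * 0.1; };
    std::printf("drag(2 * 1.0) = %.4f, but 2 * drag(1.0) = %.4f\n", drag(2.0), 2.0 * drag(1.0));
    return 0;
}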

Why do we care?

If the system is non-linear, you’re going to be dealing with system responses that vary in magnitude (!!) as you enter different regions. What may be a stabilising feedback gain $k$ in one region will yield a completely different response in another. This is obvious in the case of our inverted pendulum, where the system response when the robot is upright will be wildly different from when our torso angle $\theta$ is closer to horizontal. This is a textbook case of non-linearity: depending on how vertical the torso is, the robot responds differently to the same input. To make things better, we even have a singularity! If (God forbid) the torso angle reaches horizontal, our required input force to drive back to vertical shoots to $\infty$… But obviously we won’t let that happen.

Establishing how we linearise

KIWI is the manifestation of the quintessential control problem we see in all of our textbooks. If you’ve seen an inverted pendulum example before, you know exactly where this is going. Our first step towards linearising the system is to observe our equilibrium points. This works particularly well for the inverted pendulum for two reasons: the first is that we have two equilibrium points, both perfectly vertical ($\theta \in \{0, \pi\}$); the second is that our chosen equilibrium point (upright) conveniently makes for a great target state.

If you’re a long time reader, you may remember THIS monstrosity:

Well, looks like it DID come in handy. In this diagram, there are 5 (five) equilibrium points! Starting at $-2\pi$ and incrementing by $\pi$, the pendulum is in equilibrium at the straight up and straight down positions. You can tell which equilibrium points are upright by the potential energy; the points with a peak in potential energy are torso-upright. These upright equilibrium points are unstable, the inverse being true for the downward positions. The arrows represent the kinetic energy of the pendulum, increasing in size proportionally to the velocity. Of note is that at high velocities the pendulum state will sweep across the phase plot, traversing into other equilibrium orbits. At low energies, however, the pendulum will orbit a single stable equilibrium point (located at $\pi$ and $-\pi$ on the graph). I want you to keep this idea of orbiting an equilibrium point in mind as we continue.

Hypothetically, if our system begins within some bound $\delta$ of our target state, then with the right feedback control we can assert that it will remain within bound $\epsilon$ of that target state for all time $t$ (Universe heat death is outside of the scope). This is a fundamental concept, illustrated below:



(Lyapunov stability diagram concept stolen from This paper)


This is typically expressed mathematically as $||x_0 - x_e|| < \delta \rightarrow ||x(t, t_0) - x_e|| < \epsilon, \enspace \forall t \geq t_0$, where $\delta$ is a starting bound for the system, and $\epsilon$ is an arbitrary bound on the state. If your system satisfies these conditions, we can declare it to be “Stable in the sense of Lyapunov”, or “I.S.L” if lazy. Better people than me will verify this mathematically as part of a stability condition, using it to ensure the chosen feedback control law will in fact stabilise the system (to within bound $\epsilon$). For this post, I’m going to gloss over it, and just ask that you take my word for it.

With the above, we have introduced two interesting concepts. The first is that we can linearise the system around an equilibrium point, which conveniently serves as a target state. The second is that if we choose a suitable feedback control law, we can ensure our system remains close to our target state, within a margin $\epsilon$, provided we start within bound $\delta$. Combining the two, we realise that we might be able to stay close enough to the region we linearised around for the approximation to remain valid for our application!

So why the Jacobian?

Typically, when linearising a system, we take the Jacobian of our state space. The “why” is that the Jacobian provides an excellent linear approximation near equilibrium points, which is exactly what we’re after for applying a linear feedback law. The caveat is that this approximation is only valid around the operating point we linearise about. Thankfully, as we covered above, we can ensure our system stays near an equilibrium point. It’s worth noting that the linearisation is only guaranteed to reproduce the local behaviour around hyperbolic equilibrium points (ones with no eigenvalues sitting on the imaginary axis), but that’s a rabbit hole we don’t need to go down here.

So how the Jacobian?

Deep down, the Jacobian is closely related to the Taylor series, which is also a method for creating approximations of nonlinear differential equations.

The Taylor expansion of function $f(x)$ around point $a$ can be written as: $f(x) = f(a) + f'(a)(x - a) + \frac{f''(a)}{2!}(x - a)^2 + \frac{f'''(a)}{3!}(x - a)^3 + \cdots$. The Taylor series represents a given function as an infinite sum of terms, with each term calculated from the function’s derivatives at a given point ($a$ in this case). This allows one to represent a function as a polynomial, describing behaviour local to the chosen point.

Although there are an infinite number of terms, we are (thankfully) only interested in those up to the first derivative. Because we expand around an equilibrium point, $f(a) = 0$, so the $f'(a)(x - a)$ term is all that survives, giving us a first-order approximation of the function; good enough for us. The Jacobian is this first-order approximation applied to our state space, and looks like the following:

$$A = \left. \frac{\partial f}{\partial x} \right|_{\theta=0} = \left[ \def\arraystretch{1.5}\begin{array}{ccc} \frac{ \partial f_1 }{\partial \theta} & \frac{ \partial f_1 }{\partial \dot\omega} & \frac{ \partial f_1 }{\partial \dot\theta} \\ \frac{ \partial f_2 }{\partial \theta} & \frac{ \partial f_2 }{\partial \dot\omega} & \frac{ \partial f_2 }{\partial \dot\theta} \\ \frac{ \partial f_3 }{\partial \theta} & \frac{ \partial f_3 }{\partial \dot\omega} & \frac{ \partial f_3 }{\partial \dot\theta}\end{array} \right] = \fcolorbox{#fe6601}{none}{$ \left[\def\arraystretch{1.5}\begin{array}{ccc} 0 & 0 & 1.0 \\ -\frac{g l^{2} m_{t}^{2} r}{I_{b} m_{t} r^{2} + I_{b} m_{w} r^{2} + l^{2} m_{t} m_{w} r^{2}} & 0 & 0 \\ \frac{g l m_{t} \left(m_{t} r^{2} + m_{w} r^{2}\right)}{I_{b} m_{t} r^{2} + I_{b} m_{w} r^{2} + l^{2} m_{t} m_{w} r^{2}} & 0 & 0\end{array}\right] $} $$
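One extra bit of notation (mine, not from the original derivation) worth making explicit: the same first-order recipe, applied to the input instead of the state, gives us the input matrix we’ll need for discretisation and LQR. Writing the nonlinear dynamics as $\dot x = f(x, \tau) = M(x)^{-1}\left(F(x) + u\right)$, the pair we carry forward is:

$$ A = \left. \frac{\partial f}{\partial x} \right|_{\theta=0}, \qquad B = \left. \frac{\partial f}{\partial \tau} \right|_{\theta=0} $$

where $A$ is the Jacobian we just evaluated, and $B$ is a $3\times1$ column, since our only input is the wheel torque $\tau$.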

This gives us a linearised representation around our equilibrium point, which means we can move onto…

Discretisation

Something that seems to be omitted frequently from online tutorials is the requirement for discretisation of the state space. When deriving the state space equations, it’s typically as a continuous system. Our robots, although they exist in an entirely continuous environment, operate in a discretised fashion, in that we have a sequence that looks like:

What this figure (that I spent WAY too long making look pretty) is implying is that we sample our continuous environment at a certain frequency, 100 Hz in our case. It’s in discussions around this sort of thing that you’ll hear the term Nyquist rate thrown around: for a given signal, it is twice the highest frequency component of that signal, and it is the minimum rate at which you can sample. This is noteworthy because if your sample rate is below the Nyquist rate, you’ll experience Aliasing.

Our chosen rate of 100 Hz is more than fast enough for our application. In more complicated systems, ensuring that we’re consistent about maintaining that 100 Hz sample rate can be critical. There are some environments where sampling late can result in critical failure; we call systems with these constraints Hard real time.
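In ROS 2 terms, that sample-and-respond loop is usually nothing more exotic than a wall timer. A bare-bones sketch (node and callback names are mine, not KIWI’s actual node):

#include <chrono>
#include <memory>
#include <rclcpp/rclcpp.hpp>

using namespace std::chrono_literals;

// Hypothetical balance-loop node: read the state estimate, compute a control
// effort, and command the wheels, all at a fixed 100 Hz.
class BalanceLoop : public rclcpp::Node
{
public:
  BalanceLoop() : Node("balance_loop")
  {
    // 10 ms period -> 100 Hz control loop.
    timer_ = create_wall_timer(10ms, [this]() { step(); });
  }

private:
  void step()
  {
    // sample sensors, evaluate the feedback law, publish wheel efforts
  }

  rclcpp::TimerBase::SharedPtr timer_;
};

int main(int argc, char ** argv)
{
  rclcpp::init(argc, argv);
  rclcpp::spin(std::make_shared<BalanceLoop>());
  rclcpp::shutdown();
  return 0;
}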

All of this to say that we have a rate at which we sample our environment, and choose how to respond to those samples. In order to correctly approximate the state space of the system, we need to incorporate our sample period dt into our model. Fortunately, this doesn’t take much, and only needs to be done once, assuming we have a consistent dt. In order to achieve glorious discretisation, we can apply a bilinear transform:

$$ A_d = \left(I - \tfrac{dt}{2}A\right)^{-1}\left(I + \tfrac{dt}{2}A\right)$$ $$ B_d = \left(I - \tfrac{dt}{2}A\right)^{-1} B \, dt $$ Where $A_d$ and $B_d$ are our discretised state space matrices.
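In code, the above is only a couple of lines of Eigen (the function name is mine; A and B are the continuous-time matrices from the linearisation step, and dt is 0.01 for our 100 Hz loop):

#include <Eigen/Dense>

// Discretise a continuous-time state space (A, B) with the bilinear (Tustin) transform.
void discretise(const Eigen::MatrixXd & A, const Eigen::MatrixXd & B, double dt,
                Eigen::MatrixXd & Ad, Eigen::MatrixXd & Bd)
{
    const Eigen::MatrixXd I = Eigen::MatrixXd::Identity(A.rows(), A.cols());
    const Eigen::MatrixXd inv = (I - 0.5 * dt * A).inverse();
    Ad = inv * (I + 0.5 * dt * A);
    Bd = inv * B * dt;
}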

Finally, LQR

After all of this, we can finally get to the linear quadratic portion of our controller. Somewhat disappointingly, this is (imo) the simplest part of the control problem; state estimation is the genuinely open problem, and it ain’t easy after all. The LQR solution consists of two steps: first, solve for an optimal gain based on the cost function provided; second, apply the resulting feedback gain matrix to the system.

The set-up

The defining part of LQR, and what truly separates it from the PID-only approach of the previous post, is the inclusion of a cost function. Typically, for a Linear system, you’ll see a Quadratic cost function defined as: $$ J = \sum^\infty_{k=0} \left(x_k^TQx_k + u^T_kRu_k\right) $$ Where $Q$ is our state cost matrix, giving a cost for how far each element is from its target state. $R$ is our control effort cost; you can think of it as a penalty for each control input. Typically, and this matches my experience, one will want to set $R$ to identity, and tweak elements of $Q$ by orders of magnitude to prioritise the particular states we care about. In our case, we care a huge amount more about our torso angle than we do our linear velocity, and our Q matrix reflects this. With this function, we establish a sum-of-squares optimisation for our control problem, allowing the robot to determine the optimal control response.
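As a concrete example of what that tuning ends up looking like, here is a hypothetical set of weights (placeholder values in the spirit described above, not KIWI’s actual numbers):

#include <Eigen/Dense>

// State ordering: [theta, omega_dot, theta_dot].
// Penalise torso angle error heavily, care far less about wheel velocity.
Eigen::Matrix3d Q = Eigen::Vector3d(1000.0, 1.0, 10.0).asDiagonal();

// Single input (wheel torque), so R is 1x1; leave it at identity and tune Q instead.
Eigen::Matrix<double, 1, 1> R = Eigen::Matrix<double, 1, 1>::Identity();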

To use this cost function, we aim to establish a feedback gain that will Regulate our system according to the cost of error. To achieve this, we use the Discrete Algebraic Riccati Equation:

$$P = Q + A_d^{\text{T}} P A_d - A_d^{\text{T}} P B_d (R + B_d^{\text{T}} P B_d)^{-1} B_d^{\text{T}} P A_d$$ Where our resulting gain is: $$K = (R + B_d^T P B_d)^{-1} B_d^T P A_d$$ which is a convenient $1\times3$ row of gains, allowing us to assign the resulting gain values for torso angle and velocity errors, summed as effort commands to our wheels.

The pay-off

Our codeblock for the above looks like (You can view the full source code here):

void solve()
{
    // Iterate the discrete algebraic Riccati equation to a fixed point, starting from P = Q.
    Eigen::MatrixXd P = Q_;
    Eigen::MatrixXd P_1;
    Eigen::MatrixXd P_err;
    uint iteration_num = 0;
    double curr_err = 10000.0;

    while (curr_err > desired_solver_error_ && iteration_num < max_solver_iterations_)
    {
        P_1 = Q_ + Ad.transpose() * P * Ad -
            Ad.transpose() * P * Bd * (R_ + Bd.transpose() * P * Bd).inverse() * Bd.transpose() * P * Ad;
        // Convergence check: largest absolute change in P between iterations.
        P_err = P_1 - P;
        curr_err = P_err.cwiseAbs().maxCoeff();
        P = P_1;
        iteration_num++;
    }
    // Once P has converged, the optimal feedback gain drops straight out.
    K_ = (R_ + Bd.transpose() * P * Bd).inverse() * Bd.transpose() * P * Ad;
}
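With the gain in hand, using it each control tick is just a matrix multiply. This is a sketch of the idea rather than KIWI’s exact code; the variable names are mine, x is the current state estimate $[\theta, \dot\omega, \dot\theta]^T$, and x_ref is the target state (upright, at the desired velocity):

// Standard LQR feedback law: u = -K (x - x_ref).
// With a single torque input and three states, K_ is 1x3, so the product is one value.
Eigen::Vector3d x_err = x - x_ref;
Eigen::VectorXd u = -K_ * x_err;
double wheel_effort_cmd = u(0);  // the same balancing command goes to both wheels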

Remarkably simple, considering the effort it took to get here! A video of it driving (compare it to the PID video):

With this, I feel like I have checked off a TON of entries from my “Robotics blog bingo card”. I got heaps: “Jacobians”, “Lyapunov stability”, I even managed to sneak in a “Nyquist”! I haven’t decided where I will go next with KIWI. Part of me wants to apply feedback linearisation, the other wants to finally start mentioning Machine Learning based approaches to control. I’ll probably just flip a coin tbh…