Robotics AI Techniques

Kalman Filters


Kalman Filters

Tracking Intro Quiz

In the last lesson, we talked about localization, the process by which we could use sensor data to provide an estimation of a robot's location in the world.

In this lesson, we concern ourselves with finding other moving objects in the world. We need to understand how to interpret sensor data to understand not only where these other objects are but also how fast they are moving so that we can move in a way that avoids collisions with them in the future. The technique that we are going to use to solve this problem, known as tracking, is the Kalman filter.

This method is very similar to the Monte Carlo localization method we discussed in the previous lesson. The primary difference between the two is that Kalman filters provide estimations over a continuous state, whereas Monte Carlo localization required us to chop the world into discrete locations. As a result, the Kalman filter only produces unimodal probability distributions, while the Monte Carlo method can produce multimodal distributions.

Let's start with an example. Consider a car driving along in the world. Let's assume this car senses an object, represented by an "*" below, in the depicted locations at times $t=0$, $t=1$, $t=2$, and $t=3$. Where is the object going to be at time $t=4$?

Tracking Intro Quiz Solution

From the positional observations thus far, as well as the corresponding timestamps, we can intuitively infer the velocity of the object "*". At time $t=4$, we expect the object, barring any drastic change in velocity, to move roughly the same distance, in roughly the same direction, as it has in the past.

Kalman filters allow us to solve precisely these types of problems: estimating future locations and velocities based on positional data like this.

Gaussian Intro Quiz

When we talked about localization in the last lesson, we sliced the world into a finite number of grid cells and assigned a probability to each grid cell.

Such a representation, one that divides a continuous space into a set of discrete locations, is called a histogram. Since the world is a continuous space, our histogram distribution is only an approximation of the underlying continuous distribution.

In Kalman filters, the probability distribution is given by a Gaussian. A Gaussian is a continuous function over a space of inputs - locations, in our case. Like all probability distributions, the area underneath a Gaussian equals one. Furthermore, a Gaussian is characterized by two parameters: a mean, $\mu$, and a variance, $\sigma^2$.

Since a Gaussian is a continuous function, there is no histogram to maintain as we sense and move. Instead, our task is to maintain a $\mu$ and $\sigma^2$ that parameterize the Gaussian that serves as our best estimate of the location of the object that we are trying to localize.

The formula for a Gaussian function, $f(x)$, is an exponential of a quadratic function:

$$f(x) = \exp\left(-\frac{1}{2}\frac{(x-\mu)^2}{\sigma^2}\right)$$

where $\exp(x) = e^x$.

If we look at the expression inside the exponential, we can see that we are taking the squared difference of our query point, $x$, and our mean, $\mu$, and dividing this squared difference by the variance, $\sigma^2$.

If $x = \mu$, the numerator in this expression becomes zero, so we have $\exp(0) = 1$. This result makes intuitive sense, as the probability should be maximal when $x$ equals the mean of the distribution.

This observation, that $f(\mu) = 1$ regardless of the variance, might seem puzzling. If $f(x)$ is to be a valid probability distribution, the area underneath it must equal one, which the expression above does not guarantee on its own. To reconcile this issue, the complete formula for the Gaussian function actually includes a normalization factor:

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2}\frac{(x-\mu)^2}{\sigma^2}\right)$$

However, for everything we are going to talk about regarding Kalman filters, this constant doesn't matter, so we can ignore it.

Let's look at a quiz now. Given the following functions, which ones do you think are Gaussians?

Gaussian Intro Quiz Solution

One prominent characteristic of Gaussian functions is that they are unimodal; that is, they have a single "peak". The third, fifth, and sixth functions are all multimodal, so they cannot be Gaussians.

Variance Comparison Quiz

Given the following graphs of three different Gaussians, which one has the largest variance, which has the medium variance, and which has the smallest variance?

Variance Comparison Quiz Solution

The difference between $x$ and $\mu$ is normalized by $\sigma^2$. Larger values of $\sigma^2$ penalize large differences between $x$ and $\mu$ less than smaller values. As a result, Gaussians with large variances produce larger values of $f(x)$ when $x$ is far from the mean than do Gaussians with smaller variances.

Another way to think about it is that variance is a measure of uncertainty. For example, the third Gaussian has the highest variance and the lowest probability associated with its mean value. On the other hand, the second Gaussian has the smallest variance and, therefore, the highest probability associated with its mean value.

Gaussians parameterized by larger variances spread probability over a larger number of possibilities, while those with smaller variances concentrate probability over a smaller number of possibilities.

Preferred Gaussian Quiz

If we are tracking another car from within our car, which Gaussian would we prefer to see as an estimate of the other car's location?

Preferred Gaussian Quiz Solution

Because the third Gaussian has the smallest variance, it represents the most certain estimate of the other car's location. The more certain we can be about where another car is located, the more confidently we can maneuver to avoid hitting it.

Evaluate Gaussian Quiz

Given a Gaussian, $f(x)$, parameterized by $\mu = 10$ and $\sigma^2 = 4$, what is $f(8)$? As a reminder, here is the formula for the Gaussian we use:

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{1}{2}\frac{(x-\mu)^2}{\sigma^2}\right)$$

Evaluate Gaussian Quiz Solution

$$f(8) = \frac{1}{\sqrt{8\pi}} \exp\left(-\frac{1}{2}\frac{(8-10)^2}{4}\right)$$
$$f(8) = \frac{1}{\sqrt{8\pi}} \exp\left(-\frac{(-2)^2}{8}\right)$$
$$f(8) = \frac{1}{\sqrt{8\pi}} \exp\left(-\frac{1}{2}\right)$$
$$f(8) = \frac{1}{\sqrt{8e\pi}} \approx 0.12$$

Maximize Gaussian Quiz

We have the following python function, f, which takes as input mu, sigma2 and x, and returns the output of the Gaussian function, parameterized by mu and sigma2, for x.
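The snippet itself isn't reproduced here; a minimal sketch of what f might look like, assuming the parameter order (mu, sigma2, x) described above, is:

from math import sqrt, pi, exp

def f(mu, sigma2, x):
    # Evaluate the Gaussian with mean mu and variance sigma2 at the point x,
    # including the normalization factor 1 / sqrt(2 * pi * sigma2).
    return 1 / sqrt(2. * pi * sigma2) * exp(-0.5 * (x - mu) ** 2 / sigma2)

print(f(10., 4., 8.))  # roughly 0.12, matching the previous quiz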

Given mu = 10 and sigma2 = 4, what value of x can we pass into f to maximize the returned value?

Maximize Gaussian Quiz Solution

If we set x equal to mu, then we maximize the exponential expression in the Gaussian function, as exp(-0.5 * 0) = exp(0) = 1. As a result, we maximize the function overall, since the normalization factor is a constant.

Measurement and Motion 1 Quiz

When we talked about localization, we talked about manipulating our probability distributions via measurement updates and motion updates. We can apply this exact iterative adjustment when working with Kalman filters as well.

Let's see if we remember from the localization lectures. Does the measurement update involve a convolution or a product? What about the motion update?

Measurement and Motion 1 Quiz Solution
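The measurement update involves a product: we multiply the prior distribution by the measurement probability and then normalize. The motion update involves a convolution: we shift the prior according to the motion, summing over all the places we could have come from.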

Measurement and Motion 2 Quiz

Let's see if we remember from the localization lectures. Does the measurement update involve Bayes' rule or the theorem of total probability? What about the motion update?

Measurement and Motion 2 Quiz Solution
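The measurement update uses Bayes' rule, while the motion update uses the theorem of total probability.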

Shifting the Mean Quiz

When using Kalman filters, we iterate between measurement and motion. The measurement update uses Bayes' rule, which produces a new posterior distribution by taking the product of the prior distribution and the information we gain from our measurement. The motion update - also known as the prediction - uses the theorem of total probability, which produces a new posterior by adding the motion to the prior.

Suppose we are localizing another vehicle, and we have the following prior distribution, shown in black below. Note that this Gaussian is very wide, indicating a high level of uncertainty about the position of the vehicle.

Suppose now we take a measurement which gives us an updated distribution on the position of the vehicle, shown in blue below. Note that this Gaussian is quite narrow, indicating a large increase in certainty regarding the position of the vehicle.

We have to combine the two distributions to produce our posterior distribution. Which of the three red lines below corresponds to the mean of our posterior distribution?

Shifting the Mean Quiz Solution

The mean of the resulting posterior distribution lies in between the mean of the prior and the mean of the distribution as given by the measurement. Note that the new mean lies slightly closer to the mean of the measurement-based distribution since that distribution is more certain of the location of the vehicle than was the prior. The more certain we are, the more we pull the mean in the direction of the certain answer.

Predicting the Peak Quiz

We've found the location of the mean of the posterior distribution. Now let's focus on the "peakiness" of this distribution.

We can think about "peakiness" in two different ways. First, peakiness is inversely related to variance. The measurement-based distribution, which has a smaller variance, has a higher peak than the prior.

We can also think about peakiness as a visual representation of certainty since we know that we can use variance as a measure of certainty. Indeed, the prior is less certain about the vehicle location than is the measurement-based distribution and, therefore, has a smaller peak.

Given this information, how high should we expect the peak of the resulting distribution to be? Below the peak of the prior, above the peak of the measurement-based distribution, or somewhere in the middle?

Put another way, is our posterior: more confident than the constituent distributions, less confident, or somewhere in between?

Predicting the Peak Quiz Solution

Very surprisingly, the posterior distribution is more certain - has a higher peak - than either of the two component distributions. The variance of this distribution is smaller than both that of the prior distribution and the measurement-based distribution.

Parameter Update Quiz

Suppose we have two Gaussians - the prior distribution and the measurement probability - and we want to multiply them using Bayes' rule to produce a posterior distribution. The prior has a mean $\mu$ and a variance $\sigma^2$, while the measurement probability has a mean $\nu$ and a variance $r^2$.

The new mean, $\mu^\prime$, is the weighted sum of the old means, where $\mu$ is weighted by $r^2$ and $\nu$ is weighted by $\sigma^2$. This sum is normalized by the sum of the weighting factors: $r^2 + \sigma^2$.

$$\mu^\prime = \frac{r^2\mu + \sigma^2\nu}{r^2 + \sigma^2}$$

The new variance term, $\sigma^{2\prime}$, is given by the following equation.

$$\sigma^{2\prime} = \frac{1}{\frac{1}{r^2} + \frac{1}{\sigma^2}}$$
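Note that this is equivalent to $\sigma^{2\prime} = \frac{r^2\sigma^2}{r^2 + \sigma^2}$, which makes it easy to see that the result is smaller than both $r^2$ and $\sigma^2$.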

Let's put this into action using the prior distribution and measurement probability we have been examining.

Clearly, the prior distribution has a much larger uncertainty than the measurement probability. Thus, $\sigma^2 > r^2$, which means that $\nu$ is weighted more heavily than $\mu$ in the calculation of $\mu^\prime$. As a result, $\mu^\prime$ sits closer to $\nu$ than to $\mu$, which is consistent with what we have seen.

Interestingly, the variance calculation is unaffected by the previous means and only considers the previous variances. The resulting variance is always smaller than both $r^2$ and $\sigma^2$.

Suppose we have two Gaussians: one parameterized by $\mu = 10$ and $\sigma^2 = 4$, and another parameterized by $\nu = 12$ and $r^2 = 4$. What are the values of $\mu^\prime$ and $\sigma^{2\prime}$ as a result of the measurement update?

Parameter Update Quiz Solution

We can calculate the new mean, $\mu^\prime$, as follows.

$$\mu^\prime = \frac{r^2\mu + \sigma^2\nu}{r^2 + \sigma^2}$$
$$\mu^\prime = \frac{4 \times 10 + 4 \times 12}{4 + 4}$$
$$\mu^\prime = \frac{40 + 48}{8}$$
$$\mu^\prime = \frac{88}{8} = 11$$

Note that because $r^2 = \sigma^2$, the normalized weighted average of $\mu$ and $\nu$ is equivalent to the arithmetic average: $\frac{\mu + \nu}{2}$.

We can calculate the new variance, $\sigma^{2\prime}$, as follows.

$$\sigma^{2\prime} = \frac{1}{\frac{1}{r^2} + \frac{1}{\sigma^2}}$$
$$\sigma^{2\prime} = \frac{1}{\frac{1}{4} + \frac{1}{4}}$$
$$\sigma^{2\prime} = \frac{1}{\frac{1}{2}} = 2$$

We may have been surprised earlier that the variance of the posterior distribution was smaller than the variance of both the prior and the measurement probability. Here we have demonstrated that fact mathematically.

Parameter Update 2 Quiz

Suppose we have two Gaussians: one parameterized by $\mu = 10$ and $\sigma^2 = 8$, and another parameterized by $\nu = 13$ and $r^2 = 2$. What are the values of $\mu^\prime$ and $\sigma^{2\prime}$ as a result of the measurement update?

Parameter Update 2 Quiz Solution

We can calculate the new mean, $\mu^\prime$, as follows.

$$\mu^\prime = \frac{r^2\mu + \sigma^2\nu}{r^2 + \sigma^2}$$
$$\mu^\prime = \frac{2 \times 10 + 8 \times 13}{8 + 2}$$
$$\mu^\prime = \frac{20 + 104}{10}$$
$$\mu^\prime = \frac{124}{10} = 12.4$$

As we expect, $\mu^\prime$ is much closer to $\nu$ than to $\mu$, since $\sigma^2$ is much larger than $r^2$.

We can calculate the new variance, $\sigma^{2\prime}$, as follows.

$$\sigma^{2\prime} = \frac{1}{\frac{1}{r^2} + \frac{1}{\sigma^2}}$$
$$\sigma^{2\prime} = \frac{1}{\frac{1}{8} + \frac{1}{2}}$$
$$\sigma^{2\prime} = \frac{1}{\frac{5}{8}} = \frac{8}{5} = 1.6$$

Again, we can see that the resulting variance is less than that of either of the component distributions.

Separated Gaussians Quiz

Suppose we have the following prior distribution and measurement probability, each with the same variance. Which of the following positions - marked in red below - corresponds to the mean of the posterior distribution produced by combining these two distributions?

Separated Gaussians Quiz Solution

Since the two variances are the same, the new mean is simply the arithmetic average of the means of the two component distributions.

Separated Gaussians 2 Quiz

Suppose we have the following prior distribution and measurement probability, each with the same variance. Which of the following Gaussians represents the combination of the two? There are three choices: a Gaussian with a larger variance than the two component distributions, a Gaussian with an equivalent variance, and a Gaussian with a smaller variance.

Separated Gaussians 2 Quiz Solution

As we have seen from the math, the resulting Gaussian has a smaller variance - that is, it is more peaky - than the component distributions. We can again demonstrate this fact using the equation for the new variance, $\sigma^{2\prime}$. Remember that both component Gaussians have the same variance.

$$\sigma^{2\prime} = \frac{1}{\frac{1}{r^2} + \frac{1}{\sigma^2}}$$
$$\sigma^{2\prime} = \frac{1}{\frac{1}{\sigma^2} + \frac{1}{\sigma^2}}$$
$$\sigma^{2\prime} = \frac{1}{\frac{2}{\sigma^2}} = \frac{\sigma^2}{2}$$

New Mean and Variance Quiz

Given the following python function, update, which takes as input the means and variances of two Gaussians - mean1, var1, and mean2, var2, respectively - our task is to write some code that computes new_mean and new_var according to the update rules we have been working with so far.

New Mean and Variance Quiz Solution
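A minimal implementation of update, consistent with the parameter-update equations above, might look like this (the course's own solution may differ in small details such as the return type).

def update(mean1, var1, mean2, var2):
    # The new mean is the variance-weighted average of the two means.
    new_mean = (var2 * mean1 + var1 * mean2) / (var1 + var2)
    # The new variance combines the two variances and is smaller than either.
    new_var = 1. / (1. / var1 + 1. / var2)
    return [new_mean, new_var]

print(update(10., 8., 13., 2.))  # [12.4, 1.6], matching the earlier quiz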

Gaussian Motion Quiz

Suppose we live in a world where the following Gaussian represents our best estimate as to our current location.

Now suppose we issue a motion command to move to the right a certain distance. We can think about the motion as being represented by its own Gaussian, having an expected value, $u$, and an uncertainty, $r^2$.

We can combine the motion with the prior to produce our prediction, parameterized by $\mu^\prime$ and $\sigma^{2\prime}$, as follows.

$$\mu^\prime = \mu + u$$
$$\sigma^{2\prime} = \sigma^2 + r^2$$

Intuitively, this makes sense. If we expect to move to the right by $u$, then it makes sense that we shift our mean by exactly $u$. Additionally, since we have uncertainty in the prior and uncertainty in the motion, we can expect that our prediction has even more uncertainty than either of the two component Gaussians.

Let's take a quiz now. Suppose we have a Gaussian before the movement, parameterized by $\mu = 8$ and $\sigma^2 = 4$. We issue a motion command with an expected value $u = 10$ and uncertainty $r^2$. What are the parameters, $\mu^\prime$ and $\sigma^{2\prime}$, of the predicted Gaussian?

Gaussian Motion Quiz Solution

Remember the following equations.

$$\mu^\prime = \mu + u$$
$$\sigma^{2\prime} = \sigma^2 + r^2$$
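Plugging in the given values, $\mu^\prime = 8 + 10 = 18$, and $\sigma^{2\prime} = 4 + r^2$.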

Predict Function Quiz

Given the following python function, predict, which takes as input the means and variances of two Gaussians - mean1, var1, and mean2, var2, respectively - our task is to write some code that computes new_mean and new_var according to the prediction rules we have been working with so far.

Predict Function Quiz Solution
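A sketch of predict, following the motion-update rules - add the means, add the variances - might look like this.

def predict(mean1, var1, mean2, var2):
    # Motion update (prediction): shift the mean by the motion
    # and add the motion uncertainty to the current uncertainty.
    new_mean = mean1 + mean2
    new_var = var1 + var2
    return [new_mean, new_var]

print(predict(10., 4., 12., 4.))  # [22.0, 8.0]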

Kalman Filter Code Quiz

Let's write a main program that takes these two functions, update and predict, and feeds into them an alternating sequence of measurements and motions.

The measurements we are going to use are measurements = [5,6,7,9,10] and the motions are motions = [1,1,2,1,1]. Our initial belief as to our position is a Gaussian parameterized by mu = 0 and a very large uncertainty sig = 10000.

Furthermore, the measurement uncertainty, measurement_sig, is constant and equal to 4, while the motion uncertainty, motion_sig, is constant and equal to 2.

Kalman Filter Code Quiz Solution

We iterate through the measurements and motions updating mu and sig first according to the rules of update - passing in the corresponding measurement and measurement uncertainty - and then according to predict - passing in the corresponding motion command and motion uncertainty.
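A sketch of that main loop, assuming the update and predict sketches from the previous quizzes (each returning a [mean, variance] pair), might look like this.

measurements = [5., 6., 7., 9., 10.]
motions = [1., 1., 2., 1., 1.]
measurement_sig = 4.
motion_sig = 2.
mu = 0.
sig = 10000.

for n in range(len(measurements)):
    # Measurement update: fold in the new measurement.
    mu, sig = update(mu, sig, measurements[n], measurement_sig)
    print('update: ', [mu, sig])
    # Motion update (prediction): apply the motion command.
    mu, sig = predict(mu, sig, motions[n], motion_sig)
    print('predict: ', [mu, sig])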

When we execute this code, we see that the value of mu after our first measurement update is roughly 4.998, which is very close to the measurement of 5. This update makes sense because we had a huge initial uncertainty, sig = 10000, combined with a relatively small measurement uncertainty, measurement_sig = 4. The resulting sigma is roughly 3.998, which is much better than our initial uncertainty and slightly better than the measurement uncertainty.

We now apply a motion of 1, and our resulting mu becomes 5.998: 4.998 + 1. Our uncertainty increases by exactly 2 - corresponding to the motion_sig - from 3.998 to 5.998.

Next, we measure 6. This measurement increases mu slightly from 5.998 to 5.999, while sharply decreasing sig from 5.998 to 2.399. We then apply another motion of 1, which increases mu by precisely 1, from 5.999 to 6.999, and increases sig by motion_sig, from 2.399 to 4.399.

We continue this process until we measure 10 and move right one, at which point our final mu is equal to 10.999, and our final sigma is equal to 4.005.

Kalman Prediction Quiz

Now that we understand how to implement a Kalman filter in a single dimension, let's look at higher dimensions. To start, let's suppose we have a two-dimensional state space in $x$ and $y$. Suppose we observe an object in this plane at the locations below for times $t=0$, $t=1$, and $t=2$.

Where is the object going to be at time $t=3$?

Kalman Prediction Quiz Solution

Here we have a sensor that only reads in positional data - $(x,y)$ coordinates - at different timestamps. The power of a two-dimensional Kalman filter operating over this space lies in its ability to infer the object's velocity from this positional information and the corresponding time deltas of the measurements. Using this velocity estimate, the Kalman filter can provide a reliable prediction about the object's position at some new time in the future.

Kalman Filter Land

To explain how this velocity inference works, we have to first look at higher-dimensional Gaussians, often called multivariate Gaussians. For such Gaussians of dimension $D$, the mean, $\mu$, is a vector of length $D$ that contains a scalar value for every dimension.

$$\mu = \begin{pmatrix} \mu_0 \\ \vdots \\ \mu_D \end{pmatrix}$$

The variance, $\sigma^2$, is replaced by a covariance matrix, which contains $D$ rows and $D$ columns.

$$\Sigma = \begin{pmatrix} \Sigma_{0,0} & \cdots & \Sigma_{0,D} \\ \vdots & \ddots & \vdots \\ \Sigma_{D,0} & \cdots & \Sigma_{D,D} \end{pmatrix}$$

Let's visualize a two-dimensional Gaussian over $(x, y)$ space.

The mean of this Gaussian is the $(x_0, y_0)$ pair. The covariance defines the "spread" of the Gaussian as indicated by the contour lines.

A two-dimensional Gaussian with a fairly small uncertainty would have much more tightly packed contour lines. Additionally, a 2D Gaussian might be very certain in one dimension and very uncertain in another. Both types are shown below.

When the Gaussian is "tilted", as shown in the original graph, then $x$ and $y$ are correlated, which means that if we get information about the true value of $x$, we can make a corresponding inference about the true value of $y$.

For example, if we learn that the true value of $x_0$ is to the right of where we initially thought, we can adjust our estimate of $y_0$ a similar distance upwards.
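To make the correlation idea concrete, here is a small NumPy sketch (not from the lecture) that samples from a "tilted" two-dimensional Gaussian and confirms that $x$ and $y$ move together.

import numpy as np

mean = np.array([0.0, 0.0])                  # (x_0, y_0)
cov = np.array([[1.0, 0.8],                  # positive off-diagonal entries tilt the Gaussian,
                [0.8, 1.0]])                 # correlating x and y

samples = np.random.multivariate_normal(mean, cov, size=10000)
x, y = samples[:, 0], samples[:, 1]
print(np.corrcoef(x, y)[0, 1])               # close to 0.8: larger x tends to come with larger y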

Kalman Filter Prediction Quiz

Let's look at a simple one-dimensional motion example. Assume we observe the following positions at the following timestamps.

Given this information, we assume that, at time $t=4$, the object sits at position $x=4$. Even though we can only see the object's discrete locations, we can infer a velocity driving the object to the right. How does a Kalman filter address this?

Let's build a two-dimensional estimate, where one dimension corresponds to the position of the object, and the other dimension corresponds to the velocity of the object, denoted $\dot{x}$, which can be either negative, zero, or positive.

Initially, we know our location but not our velocity. We can represent this belief with a Gaussian that is narrow along the positional axis and elongated along the velocity axis.

Now let's look at the prediction step. Since we have a very low certainty for our velocity estimate, we cannot use it to predict a new position accurately.

Let's pause for a second and just assume a value for velocity: $\dot{x} = 0$. Given this assumption, along with the knowledge of the current position, $x = 0$, our prediction is the point $(0, 0)$, corresponding to a new position of 0 and a new velocity of 0. In other words, if we start somewhere and don't move, we can predict that we end up where we started.

If we assume instead that the velocity $\dot{x} = 1$, where would the prediction be one time step later, given that we start at position $x = 1$?

Kalman Filter Prediction Quiz Solution

If we start with a velocity, $\dot{x} = 1$, and position, $x = 1$, then one time step in the future we predict our position to be updated by our velocity - $x = x + \dot{x} = 1 + 1 = 2$ - and our velocity to remain unchanged. This prediction corresponds with the point $(2, 1)$.

Another Prediction Quiz

If we assume instead that the velocity $\dot{x} = 2$, which of the following points corresponds to the most plausible prediction one time step later, given that we start at position $x = 1$?

Another Prediction Quiz Solution

If we start with a velocity, $\dot{x} = 2$, and position, $x = 1$, then one time step in the future we predict our position to be updated by our velocity - $x = x + \dot{x} = 1 + 2 = 3$ - and our velocity to remain unchanged. This prediction corresponds with the point $(3, 2)$.

More Kalman Filters

After we take our initial measurement, $x = 0$, we represent our current belief about our position and velocity with a Gaussian that is skinny along the position axis and elongated along the velocity axis.

The skinniness about the position axis indicates our relative certainty about our current position, while the elongation about the velocity axis indicates our total uncertainty about our current velocity.

Even though we are completely uncertain about our current velocity, we do know that our location and velocity are correlated. For example, given an initial measurement, $x = 0$, we expect to see a subsequent measurement, $x = 1$, if our velocity $\dot{x} = 1$. If our velocity $\dot{x} = 2$, we expect a subsequent measurement $x = 2$.

Given our initial measurement, we can generate a prediction Gaussian that essentially hugs the line mapping all possible subsequent positions to the velocities that would put us in those positions.

Even though we still haven't figured out how fast we are moving, we have pinned down the relationship between our position and our velocity, and we can illuminate the value of the latter by taking more measurements of the former.

Let's now fold in a second observation, $x = 2$, at time $t = 2$. Again, this observation tells us nothing about the velocity - only something about the location - which we can represent with the Gaussian in green below.

If we multiply the prior from the prediction step (in red above) with this measurement probability (green), then we get a Gaussian centered right about the point $(2, 1)$. This Gaussian represents our current belief that our position $x = 2$ and our velocity $\dot{x} = 1$.

If we take this posterior Gaussian, in black above, and predict one step forward, we see a new Gaussian, centered at $(3, 1)$, which corresponds to applying a constant velocity $\dot{x} = 1$ to a position $x = 2$.

Note: This graphic is wrong. It represents increasing velocity: acceleration. All of the posterior distributions, shown in black, should be centered around $(x, 1)$.

What's so powerful about this process is that we've been able to infer the value of a variable we couldn't directly measure using a variable we could measure. This inference is made possible by a physical equation that relates $x^\prime$, $x$, and $\dot{x}$.

$$x^\prime = x + \dot{x}\Delta{t}$$

This equation has been able to propagate constraints from subsequent measurements back to this unobservable variable, $\dot{x}$, so that we can estimate it. In other words, this equation defines the relationship between position and velocity, and, given two consecutive positional measurements, allows us to pin down a value of velocity, even though we cannot measure it directly.

This is the big lesson that Kalman filters teach us. The variables of a Kalman filter, often called states because they reflect states of the physical world, are separated into two subsets: the observables, like momentary location, and the hidden, like velocity.

Because these two variables are correlated, subsequent observations of the observable variables give us information about the hidden variables so that we can estimate the values of those hidden variables. In our case, we can estimate how fast an object is moving from multiple observations of its location.

Kalman Filter Design

When we design a Kalman filter, we need two functions: a state transition function and a measurement function. Let's look at both of these in the context of our 1D motion example.

Our state transition provides the following update rule for $x$: $x^\prime = x + \dot{x}\Delta{t}$. This function signifies that the resulting position is the sum of the current position and the product of the current velocity and the time delta. Additionally, our state transition provides the following update rule for $\dot{x}$: $\dot{x}^\prime = \dot{x}$. In other words, our velocity is constant.

We can express these two update rules simultaneously using linear algebra.

$$\begin{pmatrix} x^\prime \\ \dot{x}^\prime \end{pmatrix} \leftarrow \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ \dot{x} \end{pmatrix}$$

Our measurement function only observes the current position and not the velocity, and so we represent it like this.

$$z \leftarrow \begin{pmatrix} 1 & 0 \end{pmatrix} \begin{pmatrix} x \\ \dot{x} \end{pmatrix}$$

We refer to the state transition matrix as $F$ and the measurement function matrix as $H$.

The actual update equations for Kalman filters, which generalize the specific example we have been using, are more involved.

In the prediction step, we update the current best estimate, $x$, by multiplying it with the state transition matrix, $F$, and adding the motion vector, $u$.

$$x^\prime = Fx + u$$

Our best estimate is accompanied by a covariance matrix, $P$, which characterizes the uncertainty of our estimate in each dimension of $x$. $P$ gets updated according to the following equation.

$$P^\prime = F \times P \times F^T$$

In the measurement update step, we first compute the error, $y$. This error is the difference between the measurement, $z$, and the product of the current estimate, $x$, and $H$, the matrix that maps the space of $x$ into the space of $z$.

$$y = z - Hx$$

The error is mapped into a matrix, $S$, which is obtained by projecting the system uncertainty, $P$, into the measurement space using the measurement function matrix, $H$, plus a matrix that characterizes the measurement noise, $R$.

$$S = H \times P \times H^T + R$$

We then compute a variable, $K$, which is often called the Kalman gain, where we invert $S$.

$$K = PH^TS^{-1}$$

Finally, we update our estimate, $x$, and uncertainty, $P$, as follows.

$$x^\prime = x + (K \times y)$$
$$P^\prime = (I - K \times H) \times P$$

Kalman Matrices Quiz

Let's complete one last challenging programming assignment. We are going to implement a multidimensional Kalman filter for the 1D motion/velocity example we have been investigating.

We are going to use a matrix class to help us manipulate matrices. This class can:

  • initialize matrices from python lists
  • zero out existing matrices
  • compute an identity matrix
  • print a matrix
  • add, subtract, and multiply matrices
  • compute the transpose of a matrix
  • compute the inverse of a matrix

We can create a matrix as follows:

a = matrix([[10.], [10.]])

We can print the matrix with the show method:

a.show()
# [10.0]
# [10.0]

We can compute the transpose of a matrix using the transpose method:

b = a.transpose()
b.show()
# [10.0, 10.0]

We can multiply two matrices simply using the multiplication operator, which the matrix class overloads.

a = matrix([[10.], [10.]])
F = matrix([[12., 8.], [6., 2.]])
b = F * a
b.show()
# [200.0]
# [80.0]

Using this library, let's set an initial state, x. This is a 1D tracking problem, so the state contains fields for the position and velocity.

x = matrix([[0.], [0.]])

We initialize an uncertainty matrix, P, which is characterized by a high uncertainty regarding both position and velocity.

P = matrix([[1000., 0.], [0., 1000.]])

We can specify an external motion u, but we set it to zero, so it has no effect.

u = matrix([[0.], [0.]])

We can specify the next state function, F, which takes on values according to the update rules we specified for each dimension in x.

F = matrix([[1., 1.], [0., 1.]])

We can specify a measurement function, H, which extracts the first of the two values. In other words, it observes the location but not the velocity.

H = matrix([[1., 0.]])

We have a measurement uncertainty, R, which reflects the uncertainty in our positional measurements.

R = matrix([[1.]])

Finally, we have an identity matrix, I.

I = matrix([[1., 0.],[0., 1.]])

Additionally, we have a list of measurements, measurements = [1,2,3]. We want to implement a method, filter, which takes in the estimate, x, and uncertainty, P, and returns an estimate of velocity derived from those measurements.

When implementing filter, we should program the measurement update first and then the motion update. Here is the shell of filter.

def filter(x, P):
    for n in range(len(measurements)):
        # measurement update
        # prediction
        print('x = ')
        x.show()
        print('P = ')
        P.show()

Once we run this function, we can see the following progress.

After the first measurement, we can see that we have essentially solved the localization problem, since x[0] correctly equals 1, but we haven't learned anything about velocity.

After the second run, we see that we have solved the localization problem and the velocity problem.

After the third measurement, we see that the estimate for both position and velocity remain correct.

Can you implement the method filter?

Kalman Matrices Quiz Solution

We can implement, in code, the measurement update and motion update rules we learned previously.
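A sketch of filter, assuming the matrix class exposes the transpose() method shown earlier and an inverse() method for matrix inversion, might look like this.

def filter(x, P):
    for n in range(len(measurements)):
        # measurement update
        Z = matrix([[measurements[n]]])       # wrap the scalar measurement in a matrix
        y = Z - (H * x)                       # error between measurement and predicted measurement
        S = H * P * H.transpose() + R         # project the uncertainty into measurement space
        K = P * H.transpose() * S.inverse()   # Kalman gain
        x = x + (K * y)
        P = (I - (K * H)) * P

        # prediction
        x = (F * x) + u
        P = F * P * F.transpose()

        print('x = ')
        x.show()
        print('P = ')
        P.show()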

Note: We must convert the measurement into a matrix before subtracting H * x from it. Why? From the implementation of the overloaded multiplication operator (look for the __mul__ method), we see that the resulting product is an instance of the matrix class. As a result, if you try to compute measurements[n] - H * x, you will see the following error:

TypeError: unsupported operand type(s) for -: 'int' and 'instance'

Basically, you can't subtract an object from an integer. If you wrap the measurement in a matrix - matrix([[measurements[n]]]) - before subtracting, this error goes away.
