# Introduction to Kalman Filtering

We humans have been filtering things for virtually our entire
history. Water filtering is a simple example. We can filter
impurities from water as simply as using our hands to skim dirt
and leaves off the top of the water. Another example is
filtering out noise from our surroundings. If we paid attention
to all the little noises around us we would go crazy. We learn
to ignore superfluous sounds (traffic, appliances, and so on)
and focus on important sounds, like the voice of the person
we're speaking with.

There are also many examples in engineering where filtering
is desirable. Radio communications signals are often corrupted
with noise. A good filtering algorithm can remove the noise
from electromagnetic signals while still retaining the useful
information. Another example is voltages. Many countries
require in-home filtering of line voltages in order to power
personal computers and peripherals. Without filtering, the
power fluctuations would drastically shorten the useful
lifespan of the devices.

Kalman filtering is a relatively recent (1960) development
in filtering, although it has its roots as far back as Gauss
(1795). Kalman filtering has been applied in areas as diverse
as aerospace, marine navigation, nuclear power plant
instrumentation, demographic modeling, manufacturing, and many
others. This paper attempts to explain Kalman filtering.

## Mathematics

Consider the problem of estimating the variables of a system.
In dynamic systems (that is, systems which vary with time) the
system variables are often denoted by the term “state
variables”. Assume that the system variables, represented by
the vector $x$, are governed by the equation
$x_{k+1} = A x_k + w_k$, where $w_k$ is random
process noise, and the subscripts on the vectors represent the
time step. For instance, if our dynamic system consists of a
spacecraft which is accelerating with random bursts of gas from
its Reaction Control System thrusters, the vector $x$ might
consist of position $p$ and velocity $v$. Then the system equation
would be

$$\begin{bmatrix} p_{k+1} \\ v_{k+1} \end{bmatrix} = \begin{bmatrix} 1 & T \\ 0 & 1 \end{bmatrix} \begin{bmatrix} p_k \\ v_k \end{bmatrix} + \begin{bmatrix} T^2/2 \\ T \end{bmatrix} a_k \tag{1}$$

where $a_k$ is the random, time-varying acceleration
and $T$ is the time between step $k$ and step $k+1$. Now suppose we
can measure the position $p$. Then our measurement at time $k$ can
be denoted $z_k = p_k + v_k$, where
$v_k$ is random measurement noise (in this expression $v_k$
denotes the noise, not the velocity of Equation 1).
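To make the model concrete, here is a minimal Python sketch that steps the position and velocity forward with a random acceleration and records a noisy position measurement at each step. The function name `simulate`, its parameters, and the default initial state are assumptions made for illustration; they do not come from the text.

```python
import random


def simulate(steps, T, accel_std, meas_std, p0=0.0, v0=0.0, seed=0):
    """Simulate Equation 1 together with a noisy position measurement.

    Returns three lists (positions, velocities, measurements), one entry
    per time step.
    """
    rng = random.Random(seed)
    p, v = p0, v0  # assumed initial state
    positions, velocities, measurements = [], [], []
    for _ in range(steps):
        # Measurement: true position plus Gaussian measurement noise.
        z = p + rng.gauss(0.0, meas_std)
        positions.append(p)
        velocities.append(v)
        measurements.append(z)
        # Equation 1: constant acceleration a over one step of length T.
        a = rng.gauss(0.0, accel_std)
        p, v = p + T * v + 0.5 * T * T * a, v + T * a
    return positions, velocities, measurements


positions, velocities, measurements = simulate(
    steps=60, T=0.5, accel_std=0.5, meas_std=10.0
)
print(f"first measurement: {measurements[0]:.2f}, true position: {positions[0]:.2f}")
```

Setting both standard deviations to zero reduces the simulation to plain constant-velocity motion, which is a quick way to check the update rule.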

The question which is addressed by the Kalman filter is
this: Given our knowledge of the behavior of the system, and
given our measurements, what is the best estimate of position
and velocity? We know how the system behaves according to the
system equation, and we have measurements of the position, so
how can we determine the best estimate of the system variables?
Surely we can do better than just take each measurement at its
face value, especially if we suspect that we have a lot of
measurement noise.

The Kalman filter is formulated as follows. Suppose we
assume that the process noise $w_k$ is white Gaussian
noise with a covariance matrix $Q$. Further assume that the
measurement noise is white Gaussian noise with a covariance
matrix $R$, and that it is not correlated with the process noise.
We might want to formulate an estimation algorithm such that
the following statistical conditions hold:

1. The expected value of our estimate is equal to the
expected value of the state. That is, “on average,” our
estimate of the state will equal the true state.

2. We want an estimation algorithm such that of all
possible estimation algorithms, our algorithm minimizes the
expected value of the square of the estimation error. That
is, “on average,” our algorithm gives the “smallest” possible
estimation error.
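Written out in symbols, with $\hat{x}_k$ denoting the estimate of the true state $x_k$ (notation assumed here for illustration), the two criteria read roughly as follows:

$$E[\hat{x}_k] = E[x_k]$$

$$\text{minimize } E\left[(x_k - \hat{x}_k)^T (x_k - \hat{x}_k)\right]$$

The first is the unbiasedness condition. The second is the minimum mean squared error condition, where the "square" of a vector error is taken here to be the sum of the squared errors of its components.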

It so happens that the Kalman filter is the estimation
algorithm which satisfies these criteria. There are many
alternative ways to formulate the Kalman filter
equations. One of the formulations is given in the
following equations:

$$S_k = P_k + R \tag{2}$$

$$K_k = A P_k S_k^{-1} \tag{3}$$

$$P_{k+1} = A P_k A^T + Q - A P_k S_k^{-1} P_k A^T \tag{4}$$

$$\hat{x}_{k+1} = A \hat{x}_k + K_k (z_{k+1} - A \hat{x}_k) \tag{5}$$

In the above equations, the superscript -1 indicates
matrix inversion and the superscript T indicates matrix
transposition. S is called the covariance of the
innovation, K is called the gain matrix, and P is called
the covariance of the prediction error.

Equation 5 is fairly intuitive. The first term used to
derive the state estimate at time k+1 is just A times the
state estimate at time k. This would be the state
estimate if we didn't have a measurement. In other words,
the state estimate propagates in time just like the state
vector (see Equation 1). The second term in Equation 5 is
called the corrector term, and it represents how much to
correct the propagated estimate due to our measurement.
Inspection of Equations 2 to 4 indicates that if the
measurement noise is much greater than the process noise,
K will be small (that is, we won't give much credence to
the measurement); if the measurement noise is much
smaller than the process noise, K will be large (that is,
we will give a lot of credence to the measurement).
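The dependence of the gain on the two noise levels can be checked numerically. The sketch below iterates Equations 2 to 4 for a scalar system with A = 1 until the gain settles. The scalar setting, the function name `steady_state_gain`, and the particular values of Q and R are assumptions chosen for illustration.

```python
def steady_state_gain(q, r, a=1.0, p0=1.0, iterations=200):
    """Iterate Equations 2 to 4 for a scalar system and return the final gain K."""
    p = p0
    k = 0.0
    for _ in range(iterations):
        s = p + r                              # Equation 2: innovation covariance
        k = a * p / s                          # Equation 3: gain
        p = a * p * a + q - a * p * p * a / s  # Equation 4: covariance update
    return k


noisy_sensor = steady_state_gain(q=0.01, r=100.0)  # measurement noise dominates
clean_sensor = steady_state_gain(q=100.0, r=0.01)  # process noise dominates
print(f"K with R >> Q: {noisy_sensor:.4f}")
print(f"K with R << Q: {clean_sensor:.4f}")
```

In this scalar case the gain stays between 0 and 1, so "small" means close to 0 (the measurement is mostly ignored) and "large" means close to 1 (the measurement is mostly trusted).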

## Example

Let's look at an example. The system represented by
Equation 1 was simulated on a computer with random bursts
of acceleration which had a standard deviation of 0.5
feet/sec². The position was measured with an error of 10
feet (one standard deviation). Figure 1 shows how
well the Kalman filter was able to estimate the position,
in spite of the large measurement noise.

Figure 1: The graph demonstrates the Kalman filter's ability to estimate position despite large measurement noise.

I used a MATLAB program to generate the above results. Don't
worry if you don't know MATLAB; it's an easy-to-read language, almost
like pseudo code. The parameters I used to generate the
above results were a = 1, dt = 0.5, and duration = 30. If you use MATLAB to run the
program you will get different results every time because
of the random noise that is simulated, but the results
should be similar.
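For readers without MATLAB, the following Python sketch runs a comparable experiment. It is not the author's program, and it is not a line-by-line transcription of Equations 2 to 5: because only position is measured, it assumes a measurement matrix H = [1 0] and uses the common predict/update arrangement of the filter. The function names, the initial estimate, the initial covariance, and the random seed are all assumptions; the noise levels, time step, and duration follow the values quoted in the text, with seconds assumed as the time unit.

```python
import random


def simulate(steps, T, accel_std, meas_std, rng):
    """Simulate Equation 1; return true positions and noisy position measurements."""
    p, v = 0.0, 0.0
    truth, measurements = [], []
    for _ in range(steps):
        truth.append(p)
        measurements.append(p + rng.gauss(0.0, meas_std))
        a = rng.gauss(0.0, accel_std)
        p, v = p + T * v + 0.5 * T * T * a, v + T * a
    return truth, measurements


def kalman_track(measurements, T, accel_std, meas_std, init_var=100.0):
    """Predict/update Kalman filter for the position/velocity model.

    Only position is measured (H = [1, 0]). Returns a list of
    (position, velocity) estimates, one per measurement.
    """
    q = accel_std ** 2
    # Process noise covariance Q = q * G G^T with G = [T^2/2, T].
    q00, q01, q11 = q * T ** 4 / 4, q * T ** 3 / 2, q * T ** 2
    r = meas_std ** 2
    p_est, v_est = 0.0, 0.0                    # assumed initial estimate
    c00, c01, c11 = init_var, 0.0, init_var    # assumed initial covariance
    estimates = []
    for z in measurements:
        # Update: blend the predicted position with the measurement.
        s = c00 + r                            # innovation covariance
        k0, k1 = c00 / s, c01 / s              # gains for position and velocity
        y = z - p_est                          # innovation
        p_est, v_est = p_est + k0 * y, v_est + k1 * y
        c00, c01, c11 = c00 - k0 * c00, c01 - k0 * c01, c11 - k1 * c01
        estimates.append((p_est, v_est))
        # Predict: propagate estimate and covariance with A = [[1, T], [0, 1]].
        p_est = p_est + T * v_est
        c00, c01, c11 = (
            c00 + 2 * T * c01 + T * T * c11 + q00,
            c01 + T * c11 + q01,
            c11 + q11,
        )
    return estimates


def rms(errors):
    """Root-mean-square of a list of errors."""
    return (sum(e * e for e in errors) / len(errors)) ** 0.5


rng = random.Random(1)
T, duration = 0.5, 30.0
truth, measurements = simulate(int(duration / T), T, 0.5, 10.0, rng)
estimates = kalman_track(measurements, T, accel_std=0.5, meas_std=10.0)
raw_rms = rms([z - p for z, p in zip(measurements, truth)])
kf_rms = rms([e[0] - p for e, p in zip(estimates, truth)])
print(f"RMS position error, raw measurements: {raw_rms:.2f} ft")
print(f"RMS position error, Kalman estimate:  {kf_rms:.2f} ft")
```

Changing the seed gives a different run each time, as in the MATLAB experiment. Over a sufficiently long run the filtered position error should come out well below the raw measurement error.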

Kalman filtering is a huge field whose depths we
cannot hope to begin to plumb in such a brief paper as
this. Thousands of papers and dozens of textbooks have
been written on this subject since its inception in 1960.
Some issues which complicate the application of the
Kalman filter are as follows.

1. We have assumed that the system equation is linear
(see Equation 1). What if the equation is
nonlinear?

2. What if the measurement noise and process noise are
not Gaussian, not white, and not independent of each
other?

3. What if the statistics (for example, the covariance
matrix) of the noise are not known?

4. What if, rather than estimating the state of a
system as the measurements arrive, we already have all
the measurements and we want to reconstruct a time
history of the state? Can we do better than a Kalman
filter? It would seem that we could since we have more
information available (that is, we have future
measurements) to estimate the state at a given time.
This is called the smoothing problem.

5. Equations 2 to 5 are matrix equations, and as such
can impose a large computational burden for
high-dimensional systems. Is there a way to approximate
the Kalman filter for large systems, reducing the
computational load while still approaching the
theoretical optimum of the Kalman filter?

6. What if the noise characteristics change with time?
Can we somehow formulate a Kalman filter that adapts
over time to changes in the noise
characteristics?

7. What if, rather than desiring to minimize the
“average” estimation error, we desire to minimize the
“worst case” estimation error? This is known as the
minimax or H-infinity estimation problem.