Aadish Verma
Kalman filters
May 11, 2026 · 9 min read
For the final project of the math course I am taking this school year, I wrote a brief explanatory paper aiming to provide some intuition for the standard equations used for the Kalman filter and Extended Kalman filter. Since I have a track record of writing about control theory, I thought I would publish it here in case it helps anyone.
First, I present the standard Kalman filter equations. Then, I derive the Extended Kalman filter equations.
(Note: my notation is quite sloppy; in particular, I don't distinguish between the a priori and a posteriori state variables, which makes some equations confusing to read.)
Standard Kalman filter
The goal of any filter is, given a time series of measurements from a noisy sensor (i.e., one that adds some predictable noise to an otherwise accurate measurement), to predict the state of a system. For the standard Kalman filter, the system must be linear: expressible using a state transition matrix and a process noise matrix (which represents inherent noise in the system, as opposed to actual inaccuracies in the sensor). For example, a robot could be trying to estimate its position using a noisy visual odometry sensor (i.e., one that uses a floor-facing camera to track position, like a computer mouse does). The Kalman filter assumes a few key properties (illustrated in the sketch after this list):
• The system can be modeled using a state $n$-vector $x$, with sensors producing $m$-dimensional vector measurements $z$.
• The system can be modeled discretely from time step $t-1$ to step $t$ using a state transition matrix $F$ and an $n \times n$ process covariance matrix $Q$:
‣ $x_t = F x_{t-1} + \mathcal{N}(0, Q)$
‣ (where $\mathcal{N}(0, Q)$ represents a random sample from a multivariate Gaussian centered at $0$ with covariance $Q$; covariance is the multidimensional analog of variance)
• Measurements can be modeled based on an $m \times n$ observation matrix $H$ and an $m \times m$ measurement covariance matrix $R$:
‣ $z_t = H x_t + \mathcal{N}(0, R)$
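To make these assumptions concrete, here is a minimal simulation sketch in Python (numpy). The specific matrices $F$, $Q$, $H$, and $R$ below are made-up values for a hypothetical constant-velocity system, not taken from any particular robot.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1D constant-velocity model: state x = [position, velocity]
dt = 0.1
F = np.array([[1.0, dt],
              [0.0, 1.0]])      # state transition matrix (n x n)
Q = np.diag([1e-4, 1e-3])       # process covariance (n x n), assumed values
H = np.array([[1.0, 0.0]])      # observation matrix (m x n): we only measure position
R = np.array([[0.05]])          # measurement covariance (m x m), assumed sensor noise

# One step of the assumed model: x_t = F x_{t-1} + N(0, Q), z_t = H x_t + N(0, R)
x_prev = np.array([0.0, 1.0])
x_t = F @ x_prev + rng.multivariate_normal(np.zeros(2), Q)
z_t = H @ x_t + rng.multivariate_normal(np.zeros(1), R)
print(x_t, z_t)
```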
The Kalman filter then tracks two variables: its predicted state $x_t$, and the estimated uncertainty in that state, represented as a covariance matrix $P$. Together, these form a Gaussian distribution over the state space which tracks both the mean (predicted state) and the variance (covariance matrix) of the system.
The actual Kalman filter then consists of two steps to update the variables.
Predict step
In this step, we update the model based on our priors (we don't yet use the latest sensor measurements). This has two parts: we update the predicted state using the state transition matrix, then update the uncertainty covariance matrix using the state transition and process noise matrices.
$x_t = F x_{t-1}$.
This is the exact application of the state transition matrix we discussed above.
$P_t = F P_{t-1} F^T + Q$
Doing $F P F^T$ is the standard way to apply a transformation to a covariance matrix; it will be heavily used in the update step as well. We also directly add the process noise covariance to account for drift in the system since the last time step (not to be confused with drift in the measurements, which we account for in the update step).
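A minimal sketch of the predict step, assuming numpy and matrices $F$ and $Q$ defined as above (the function name is illustrative):

```python
import numpy as np

def kalman_predict(x, P, F, Q):
    """Predict step: propagate the state estimate and its covariance one time step."""
    x_pred = F @ x               # x_t = F x_{t-1}
    P_pred = F @ P @ F.T + Q     # P_t = F P_{t-1} F^T + Q
    return x_pred, P_pred
```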
Update step
In this step, we use the latest sensor measurements to update the predicted state and covariance matrix. First, we compute the Kalman gain $K$. The Kalman gain matrix is essentially a measure of how much “trust” we have in the sensor measurements compared to our current state. If the Kalman gain is higher, then we correct our model to match the sensor measurements more.
Constructing the Kalman gain is a multistep process that, importantly, does not involve the measurement itself. We first apply the observation matrix to the uncertainty and add the measurement noise covariance; this quantity, $H P_t H^T + R$, represents the covariance of the sensor measurements (due in part to the sensor noise and in part to the uncertainty in our model). Taking the inverse of the covariance results in the “precision” matrix, representing how accurate we believe each component of the sensor prediction is at any given time.
Next, we multiply the precision matrix
(
𝐻
𝑃
𝑡
𝐻
𝑇
+
𝑅
)
−
1
by the observation matrix
𝐻
to bring the
precision matrix back into state space (it was previously in sensor/observation space). Finally, we
multiply by the state covariance
𝑃
𝑡
(again) to determine how the precision in measurements should
be weighted. This results in the standard Kalman gain matrix
𝐾
=
𝑃
𝑡
𝐻
𝑇
(
𝐻
𝑃
𝑡
𝐻
𝑇
+
𝑅
)
−
1
.
In summary, the Kalman gain matrix is computed by
• finding the state uncertainty in sensor space
• finding the measurement precision in sensor space
• finding the measurement precision in state space
• weighting by the state covariance to determine how strongly the measurements should correct the state
The rest of the update step is simply applying the Kalman gain. We first multiply the innovation, $z_t - H x_t$ (the error between the measurement and the predicted measurement from our current model), by the Kalman gain and add the result to the state estimate: $x_t = x_t + K(z_t - H x_t)$.
To update our state uncertainty matrix, we again project the Kalman gain back into state space using the observation matrix, producing a matrix representing how much of the state is correctable by the measurement. Subtracting this from the identity matrix then produces a matrix representing which parts of the state were not corrected by the measurement, by which we multiply the previous state covariance matrix to get the updated uncertainty: $P_t = (I - K H) P_t$.
At the end of the update step, we have successfully updated our model to correct the state estimate
and uncertainty matrix to account for the measurement.
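Putting the Kalman gain and its application together, a sketch of the whole update step under the same assumptions might look like this:

```python
import numpy as np

def kalman_update(x, P, z, H, R):
    """Update step: correct the predicted state and covariance with a measurement z."""
    S = H @ P @ H.T + R                          # measurement covariance: H P_t H^T + R
    K = P @ H.T @ np.linalg.inv(S)               # Kalman gain: K = P_t H^T (H P_t H^T + R)^-1
    innovation = z - H @ x                       # z_t - H x_t
    x_new = x + K @ innovation                   # x_t = x_t + K (z_t - H x_t)
    P_new = (np.eye(P.shape[0]) - K @ H) @ P     # P_t = (I - K H) P_t
    return x_new, P_new
```

In practice, the filter alternates these two functions: one predict per time step, then one update per available measurement.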
Derivation of Extended Kalman filter
The vast majority of control theory applications do not have a linear model. As an example that actually shows up regularly in VEX robots, imagine a square robot inside a 2D square field, with a distance sensor on each of its four sides, trying to estimate its $x$ and $y$ position. If the robot doesn't turn, the sensors simply report the distance to the nearest wall. The distances to each wall have a clear linear relation to the $x$ and $y$ coordinates, so we can model this as a linear system. However, once we start to account for turns made by the robot, the problem quickly devolves; if we track $\theta$, the observation model becomes nonlinear, as raycasting is needed to predict distance sensor readings. The Extended Kalman filter is the most feasible approach here, as it allows Kalman filters to be used for nonlinear systems (which all but the most contrived systems are).
The Extended Kalman filter uses two differentiable functions to replace the state transition and observation matrices:
$g(u_t, x_{t-1})$ represents the state transition for a previous state $x_{t-1}$ and a control $u_t$. Controls are factors influencing the state that we can control and can thus use to update our priors about the state in the predict step; continuing the above example, the control for the robot may be the voltages sent to the drivetrain motors.
$h(x_t)$ represents the observation or sensor reading for a state $x_t$, replacing the observation matrix. Continuing the above example, $h(x_t)$ would, given a predicted pose ($x$/$y$/$\theta$) of the robot, simulate the distance sensors and return the predicted distance sensor readings.
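To make $h$ concrete, below is a rough sketch of what such an observation function could look like. The field size, the sensors sitting at the robot's center, and the straight-raycast-to-wall model are all simplifying assumptions made for illustration.

```python
import numpy as np

FIELD = 3.6  # assumed square field side length in meters (illustrative value)

def h(x):
    """Predicted readings of four distance sensors (front, left, back, right of the
    robot) for a pose x = [px, py, theta], assuming each sensor raycasts straight out
    from the robot's center until it hits a field wall."""
    px, py, theta = x
    readings = []
    for offset in (0.0, np.pi / 2, np.pi, 3 * np.pi / 2):  # each sensor's facing direction
        dx, dy = np.cos(theta + offset), np.sin(theta + offset)
        hits = []  # distances along this ray to each wall it could reach
        if dx > 1e-9:
            hits.append((FIELD - px) / dx)   # right wall (x = FIELD)
        elif dx < -1e-9:
            hits.append(-px / dx)            # left wall (x = 0)
        if dy > 1e-9:
            hits.append((FIELD - py) / dy)   # far wall (y = FIELD)
        elif dy < -1e-9:
            hits.append(-py / dy)            # near wall (y = 0)
        readings.append(min(hits))           # the ray stops at the nearest wall
    return np.array(readings)
```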
We can then rewrite our previous definitions of the state transition and observation calculation as
follows:
$x_t \Rightarrow g(u_t, x_{t-1}) + \mathcal{N}(0, Q)$
$z_t \Rightarrow h(x_t) + \mathcal{N}(0, R)$
The Extended Kalman filter takes advantage of the linear approximation of $g$ and $h$ (often referred to in the literature as the first-order Taylor expansion). Taylor expansions in multiple dimensions are very similar to the one-dimensional Taylor series covered in single-variable calculus. The Taylor expansion for a vector-valued, vector-input function $f(x)$ is given by:
$f(x) = f(x_0) + (\mathrm{D}f)(x_0)\,(x - x_0) + \frac{1}{2}(x - x_0)^T\,(\mathrm{H}f)(x_0)\,(x - x_0) + \ldots$
(It gets quite ugly from the third term onward.) Of course, for the Extended Kalman filter, we only need the first two terms, the multivariate linear approximation, which uses the function's Jacobian/derivative matrix (the multivariate equivalent of slope).
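The filter only needs the Jacobians $(\mathrm{D}g)$ and $(\mathrm{D}h)$. If analytic derivatives are inconvenient to derive, one common fallback (my suggestion, not part of the derivation above) is a central finite-difference approximation:

```python
import numpy as np

def numerical_jacobian(f, x, eps=1e-6):
    """Approximate the Jacobian (Df)(x) of a vector-valued function f by
    central finite differences; column j holds the partial derivatives df/dx_j."""
    x = np.asarray(x, dtype=float)
    f0 = np.asarray(f(x))
    J = np.zeros((f0.size, x.size))
    for j in range(x.size):
        dx = np.zeros_like(x)
        dx[j] = eps
        J[:, j] = (np.asarray(f(x + dx)) - np.asarray(f(x - dx))) / (2 * eps)
    return J
```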
The Extended Kalman filter is then a simple modification of the standard Kalman filter. The changes made are as follows:
1. In the predict step, instead of using a linear state transition model ($x_t = F x_{t-1}$), we use the nonlinear state transition function $g$: $x_t = g(u_t, x_{t-1})$.
2. In the update step, when computing the innovation, instead of using the observation matrix $H$, we use the nonlinear observation function $h$: $x_t = x_t + K(z_t - h(x_t))$.
3. In general, throughout the filter, instead of using the observation matrix $H$ to propagate covariances, we use the observation function's Jacobian $(\mathrm{D}h)(x_t)$, and instead of using the state transition matrix $F$, we use the state transition function's Jacobian $(\mathrm{D}g)(u_t, x_{t-1})$.
To summarize, to convert a Kalman filter to an Extended Kalman filter, we replace
• uses of the state transition/observation matrices applied to vectors (to update the timestep or to move between state- and sensor-space) ⇒ use the nonlinear functions $g$ and $h$
• uses of the state transition/observation matrices applied to matrices (to propagate covariances) ⇒ use the Jacobians $(\mathrm{D}g)(u_t, x_{t-1})$ and $(\mathrm{D}h)(x_t)$
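Combining these replacements, a sketch of one full Extended Kalman filter step might look like the following; $g$, $h$, and the two Jacobian functions are assumed to be supplied by the user (for example, the h and numerical_jacobian sketches above).

```python
import numpy as np

def ekf_step(x, P, u, z, g, h, G_jac, H_jac, Q, R):
    """One predict + update cycle of an Extended Kalman filter.
    g(u, x): nonlinear state transition     G_jac(u, x): its Jacobian w.r.t. x
    h(x):    nonlinear observation          H_jac(x):    its Jacobian w.r.t. x"""
    # Predict: propagate the state with g, the covariance with the Jacobian of g
    G = G_jac(u, x)
    x_pred = g(u, x)
    P_pred = G @ P @ G.T + Q

    # Update: correct with the measurement, using h and the Jacobian of h
    Hm = H_jac(x_pred)
    S = Hm @ P_pred @ Hm.T + R
    K = P_pred @ Hm.T @ np.linalg.inv(S)
    x_new = x_pred + K @ (z - h(x_pred))
    P_new = (np.eye(P_pred.shape[0]) - K @ Hm) @ P_pred
    return x_new, P_new
```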
Both standard and extended Kalman filters are found throughout robotics and other applications.
Sources
Thrun, S., Burgard, W., & Fox, D. (2005). Probabilistic Robotics. The MIT Press.