Aadish Verma

Kalman filters


For the final project of the math course I am taking this school year, I wrote a brief explanatory paper aiming to provide some intuition for the standard equations used for the Kalman filter and Extended Kalman filter. Since I have a track record of writing about control theory, I thought I would publish it here in case it helps anyone.

First, I give the formula for the standard Kalman filter. Then, I derive the Extended Kalman filter formula.

(Note: my notation is quite sloppy; in particular, I don’t distinguish between the a priori and a posteriori state variables, which makes some equations confusing to read.)

Standard Kalman filter

The goal of any filter is, given a time series of measurements from a noisy sensor (i.e., one that adds some predictable noise to an otherwise accurate measurement), to estimate the state of a linear system. In this case, the linear system must be expressible using a state transition matrix and a process noise matrix (which represents inherent noise in the system itself, as opposed to actual inaccuracies in the sensor). For example, a robot could be trying to estimate its position using a noisy visual odometry sensor (i.e., one that uses a floor-facing camera to track position, the way a computer mouse does). The filter assumes a few key properties:

- The system can be modeled using a state n-vector x, with sensors producing m-dimensional vector measurements z.
- The system can be modeled discretely from time step t-1 to step t using an n×n state transition matrix F and an n×n process covariance matrix Q:

  x_t = F x_{t-1} + N(0, Q)

  (where N(0, Q) represents a random sample from a multivariate Gaussian centered at 0 with covariance Q; covariance is the multidimensional analog of variance).
- Measurements can be modeled using an m×n observation matrix H and an m×m measurement covariance matrix R:

  z_t = H x_t + N(0, R)

The Kalman filter then tracks two variables: its predicted state x_t, and the estimated uncertainty in that state, represented as an n×n covariance matrix P.
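To make these pieces concrete, here is a minimal numpy sketch (all numbers are made up) of a 1D robot tracking its position and velocity with a noisy position-only sensor:

```python
import numpy as np

dt = 0.1  # time step in seconds (made-up value)

# State x = [position, velocity]; the sensor measures position only.
F = np.array([[1.0, dt],
              [0.0, 1.0]])   # state transition: position += velocity * dt
Q = np.diag([1e-4, 1e-3])    # process noise covariance (inherent model drift)
H = np.array([[1.0, 0.0]])   # observation matrix: we observe position only
R = np.array([[0.05]])       # measurement noise covariance

x = np.array([0.0, 0.0])     # initial state estimate
P = np.eye(2)                # initial state uncertainty covariance
```

Here n = 2 and m = 1, so F and Q are 2×2, H is 1×2, and R is 1×1, matching the dimensions above.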
Together, the predicted state and the covariance form a Gaussian distribution over the state space, tracking both the mean (predicted state) and the variance (covariance matrix) of the system. The Kalman filter itself then consists of two steps to update these variables.

Predict step

In this step, we update the model based on our priors (we don’t yet use the latest sensor measurements). This has two parts: we update the predicted state using the state transition matrix, then update the uncertainty covariance matrix using the state transition and process noise matrices.

x_t = F x_{t-1}

This is the exact application of the state transition matrix we discussed above.

P_t = F P_{t-1} F^T + Q

Computing F P F^T is the standard way to apply a linear transformation to a covariance matrix; it will be heavily used in the update step as well. We also directly add the process noise covariance Q to account for drift in the system since the last time step (not to be confused with drift in the measurements, which we account for in the update step).

Update step

In this step, we use the latest sensor measurements to update the predicted state and covariance matrix. First, we compute the Kalman gain K. The Kalman gain matrix is essentially a measure of how much “trust” we place in the sensor measurements relative to our current state estimate. The higher the Kalman gain, the more we correct our model toward the sensor measurements.

Constructing the Kalman gain is a multistep process that, importantly, does not involve the measurement itself. First, we project the state uncertainty into sensor space using the observation matrix and add the measurement noise covariance; this quantity, H P_t H^T + R, represents the covariance of the sensor measurements (due in part to sensor noise and in part to the uncertainty in our model).
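In numpy (a self-contained sketch with made-up numbers, for a position/velocity state with a position-only sensor), the predict step and this sensor-space covariance are each one line:

```python
import numpy as np

dt = 0.1
F = np.array([[1.0, dt], [0.0, 1.0]])
Q = np.diag([1e-4, 1e-3])
H = np.array([[1.0, 0.0]])
R = np.array([[0.05]])

x = np.array([0.0, 1.0])   # believe we are at 0, moving at 1 unit/s
P = np.eye(2)

# Predict step: advance the state, then propagate the uncertainty.
x = F @ x                  # x_t = F x_{t-1}
P = F @ P @ F.T + Q        # P_t = F P_{t-1} F^T + Q

# Covariance of the sensor measurements (model uncertainty + sensor noise).
S = H @ P @ H.T + R        # H P_t H^T + R
```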
Taking the inverse of this covariance yields the “precision” matrix, representing how accurate we believe each component of the sensor prediction to be at any given time. Next, we multiply the precision matrix (H P_t H^T + R)^{-1} by the transposed observation matrix H^T to bring the precision matrix back into state space (it was previously in sensor/observation space). Finally, we multiply by the state covariance P_t (again) to determine how the precision in measurements should be weighted. This results in the standard Kalman gain matrix

K = P_t H^T (H P_t H^T + R)^{-1}.

In summary, the Kalman gain matrix is computed by:

- finding the state uncertainty in sensor space
- finding the measurement precision in sensor space
- bringing the measurement precision into state space
- weighting by the state covariance to determine how much the measurements should correct the state

The rest of the update step is simply applying the Kalman gain. We first multiply the innovation z_t - H x_t (the error between the actual measurement and the measurement predicted by our current model) by the Kalman gain and add the result to the state estimate:

x_t = x_t + K (z_t - H x_t).

To update our state uncertainty matrix, we again project the Kalman gain fully into state space using the observation matrix, producing a matrix K H representing how much of the state is correctable by the measurement. Subtracting from the identity matrix then produces a matrix representing which parts of the state were not corrected by the measurement, which we multiply the previous state covariance matrix by to get the updated uncertainty:

P_t = (I - K H) P_t.

At the end of the update step, we have successfully corrected the state estimate and uncertainty matrix to account for the measurement.

Derivation of the Extended Kalman filter

The vast majority of control theory applications are not linear models.
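As a quick recap of the linear filter before extending it, here is the whole update step as a self-contained numpy sketch (made-up numbers, position/velocity state, position-only sensor):

```python
import numpy as np

H = np.array([[1.0, 0.0]])
R = np.array([[0.05]])

# Predicted state and covariance after a predict step (made-up values).
x = np.array([0.1, 1.0])
P = np.array([[1.0101, 0.1], [0.1, 1.001]])

z = np.array([0.3])  # noisy position measurement

# Kalman gain: K = P H^T (H P H^T + R)^{-1}
S = H @ P @ H.T + R               # sensor-space covariance
K = P @ H.T @ np.linalg.inv(S)    # weight measurement precision by P

# Correct the state with the innovation, then shrink the uncertainty.
x = x + K @ (z - H @ x)           # x_t = x_t + K (z_t - H x_t)
P = (np.eye(2) - K @ H) @ P       # P_t = (I - K H) P_t
```

Note how the corrected position lands between the prediction (0.1) and the measurement (0.3), and the position uncertainty shrinks.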
As an example that actually shows up regularly on VEX robots, imagine a square robot inside a square 2D field, with a distance sensor on each of its four sides, trying to estimate its x and y position. If the robot never turns, each sensor simply reports the distance to the wall it faces. Those distances have a clear linear relation to the x and y coordinates, so we can model this as a linear system. However, once we account for turns made by the robot, the problem quickly devolves: if we track the heading θ, the observation model becomes nonlinear, as raycasting is needed to predict the distance sensor readings. The Extended Kalman filter is the most feasible approach here, as it allows Kalman filters to be used for nonlinear systems (which all but the most contrived systems are).

The Extended Kalman filter uses two differentiable functions to replace the state transition and observation matrices:

- g(u_t, x_{t-1}) represents the state transition for a previous state x_{t-1} and a control u_t. Controls are factors influencing the state that we can control and can thus use to update our priors about the state in the predict step; continuing the above example, the control for the robot might be the voltages sent to the drivetrain motors.
- h(x_t) represents the observation or sensor reading for a state x_t, replacing the observation matrix. Continuing the above example, h(x_t) would, given a predicted pose (x, y, θ) of the robot, simulate the distance sensors and return the predicted distance sensor readings.

We can then rewrite our previous definitions of the state transition and observation calculation as follows:

x_t = g(u_t, x_{t-1}) + N(0, Q)
z_t = h(x_t) + N(0, R)

The Extended Kalman filter takes advantage of the linear approximation of g and h (often referred to in the literature as the first-order Taylor expansion). Taylor expansions in multiple dimensions are very similar to the one-dimensional Taylor series covered in single-variable calculus.
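As a hypothetical sketch of what g and h might look like for the square-field robot (ignoring θ so that h needs no raycasting; the field size, the velocity motion model, and the helper names are all made up), together with a numerical way to obtain the linearization the filter relies on:

```python
import numpy as np

FIELD = 3.6  # field side length in meters (made-up value)

def g(u, x_prev, dt=0.02):
    """State transition: u is a commanded (vx, vy) velocity, x_prev is (x, y)."""
    return x_prev + dt * np.asarray(u)

def h(x):
    """Predicted readings of the four distance sensors for pose (x, y),
    assuming the robot never turns: left, bottom, right, and top walls."""
    px, py = x
    return np.array([px, py, FIELD - px, FIELD - py])

def jacobian(f, x, eps=1e-6):
    """Numerical Jacobian (Df)(x) via central differences; with an analytic
    model you would typically derive this matrix by hand instead."""
    x = np.asarray(x, dtype=float)
    cols = []
    for i in range(x.size):
        d = np.zeros_like(x)
        d[i] = eps
        cols.append((f(x + d) - f(x - d)) / (2 * eps))
    return np.stack(cols, axis=1)
```

In this simplified no-turn case h is still linear (its Jacobian is constant), but the same code shape carries over once h raycasts against the walls using θ.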
The Taylor expansion for a vector-valued, vector-input function f(x) is given by:

f(x) = f(x_0) + (Df)(x_0) (x - x_0) + (1/2) (x - x_0)^T (Hf)(x_0) (x - x_0) + ...

(It gets quite ugly from the third term onward.) Of course, for the Extended Kalman filter, we only need the first two terms, the multivariate linear approximation, which uses the function’s Jacobian/derivative matrix Df (the equivalent of slope in multivariate calculus).

The Extended Kalman filter is then a simple modification of the standard Kalman filter. The changes made are as follows:

1. In the predict step, instead of using the linear state transition model (x_t = F x_{t-1}), we use the nonlinear state transition function g: x_t = g(u_t, x_{t-1}).
2. In the update step, when computing the innovation, instead of using the observation matrix H, we use the nonlinear observation function h: x_t = x_t + K (z_t - h(x_t)).
3. In general, throughout the filter, instead of using the observation matrix H to propagate covariances, we use the observation function’s Jacobian (Dh)(x_t), and instead of using the state transition matrix F, we use the state transition function’s Jacobian (Dg)(u_t, x_{t-1}).

To summarize, to convert a Kalman filter to an Extended Kalman filter, we replace:

- uses of the state transition/observation matrices applied to vectors (to advance the timestep or to move between state- and sensor-space) with the nonlinear functions g and h
- uses of the state transition/observation matrices applied to matrices (to propagate covariances) with the Jacobians (Dg)(u_t, x_{t-1}) and (Dh)(x_t)

Both standard and Extended Kalman filters are found throughout robotics and other applications.

Sources

Thrun, S., Burgard, W., & Fox, D. (2005). Probabilistic Robotics. The MIT Press.