Outlier-robust estimation of state-space models using a penalized approach


Rajan Shankar¹, Garth Tarr¹, Ines Wilms², Jakob Raymaekers³

¹University of Sydney, ²Maastricht University, ³University of Antwerp

Blue whale data


Goal: Recover the true path taken by the whale

Why? Animal scientists need accurate movement paths to study animal behaviour


Modelling considerations:

  • Randomness of the whale
  • Measurement error of the satellite


State-space model

\[\begin{align} \mathbf x_t &= \Phi \mathbf x_{t-1} + \mathbf w_t \\ \mathbf y_t &= A \mathbf x_t + \mathbf v_t \end{align}\]

  • \(\mathbf x_t\): state vector
    • true position of object
  • \(\Phi\): state transition matrix
  • \(\mathbf w_t \sim N(\mathbf 0, \Sigma_\mathbf{w})\): state (process) noise
  • \(\mathbf y_t\): observation vector
    • measured position of object
  • \(A\): observation / measurement matrix
  • \(\mathbf v_t \sim N(\mathbf 0, \Sigma_\mathbf{v})\): observation (measurement) noise

Usually, \(\Phi\), \(\Sigma_\mathbf{w}\) and \(\Sigma_\mathbf{v}\) depend on model parameters \(\boldsymbol \theta\).
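To make the two equations concrete, here is a minimal simulation sketch. It assumes a scalar model in which \(\Phi\), \(A\), \(\Sigma_\mathbf{w}\) and \(\Sigma_\mathbf{v}\) reduce to plain numbers; the function name `simulate_ssm` and all default values are illustrative choices, not taken from the slides.

```python
import random

def simulate_ssm(n, phi=1.0, a=1.0, sigma_w=0.5, sigma_v=1.0, x0=0.0, seed=0):
    """Simulate n steps of a scalar state-space model (illustrative helper).

    State equation:        x_t = phi * x_{t-1} + w_t,  w_t ~ N(0, sigma_w^2)
    Observation equation:  y_t = a * x_t + v_t,        v_t ~ N(0, sigma_v^2)
    """
    rng = random.Random(seed)
    states, observations = [], []
    x = x0
    for _ in range(n):
        x = phi * x + rng.gauss(0.0, sigma_w)  # unobserved true position
        y = a * x + rng.gauss(0.0, sigma_v)    # noisy measured position
        states.append(x)
        observations.append(y)
    return states, observations

states, observations = simulate_ssm(n=200)
```

Only `observations` would be available in practice; `states` plays the role of the true path that the analysis tries to recover.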


Where are state-space models used?

  • Econometrics & Finance
    • Unobserved components of GDP, inflation, interest rates
  • Engineering & Control
    • Signal processing and target tracking (radar, sonar, GPS)
  • Environmental & Ecological Science
    • Animal movement from telemetry data
  • Biomedicine
    • EEG/ECG signal analysis

Key idea: State-space models (SSMs) apply whenever a time series is driven by an unobserved dynamic state.

Classical estimation

Given a parameter vector \(\boldsymbol \theta\):

  • \(\mathbf {\hat y}_{t|t-1}(\boldsymbol\theta)\) is the prediction for observation \(t\) given data up to time \(t-1\)
  • \(\mathbf r_t(\boldsymbol\theta) := \mathbf y_{t} - \mathbf {\hat y}_{t|t-1}(\boldsymbol\theta)\) is the residual
  • \(S_{t|t-1}(\boldsymbol\theta)\) is the prediction variance, i.e. the covariance matrix of the residual \(\mathbf r_t(\boldsymbol\theta)\)

\(\mathbf {\hat y}_{t|t-1}(\boldsymbol\theta)\) and \(S_{t|t-1}(\boldsymbol\theta)\) are computed using the Kalman filter.
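
For reference, the standard Kalman filter recursions behind these quantities can be sketched as follows. The symbols \(\mathbf{\hat x}_{t|s}\) and \(P_{t|s}\) (mean and covariance of \(\mathbf x_t\) given data up to time \(s\)) and the gain \(K_t\) are notation assumed here, not used elsewhere in the slides; initial values \(\mathbf{\hat x}_{0|0}\) and \(P_{0|0}\) are taken as given, and the dependence on \(\boldsymbol\theta\) is suppressed.

\[\begin{align} \mathbf{\hat x}_{t|t-1} &= \Phi \mathbf{\hat x}_{t-1|t-1}, & P_{t|t-1} &= \Phi P_{t-1|t-1} \Phi^\top + \Sigma_\mathbf{w} \\ \mathbf{\hat y}_{t|t-1} &= A \mathbf{\hat x}_{t|t-1}, & S_{t|t-1} &= A P_{t|t-1} A^\top + \Sigma_\mathbf{v} \\ K_t &= P_{t|t-1} A^\top S_{t|t-1}^{-1}, & \mathbf r_t &= \mathbf y_t - \mathbf{\hat y}_{t|t-1} \\ \mathbf{\hat x}_{t|t} &= \mathbf{\hat x}_{t|t-1} + K_t \mathbf r_t, & P_{t|t} &= (I - K_t A) P_{t|t-1} \end{align}\]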

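Under the Gaussian assumptions on \(\mathbf w_t\) and \(\mathbf v_t\), maximum likelihood estimation of \(\boldsymbol\theta\) amounts to minimising minus twice the log-likelihood (up to an additive constant):

\[ \min_{\boldsymbol\theta}\sum_{t=1}^n \left\{\log \left|S_{t|t-1}(\boldsymbol\theta)\right| + \mathbf r_t(\boldsymbol\theta)^\top S_{t|t-1}^{-1}(\boldsymbol\theta) \mathbf r_t(\boldsymbol\theta)\right\} \]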

The estimate for \(\boldsymbol \theta\) can be found using standard optimisation routines.
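
As a rough illustration of this estimation step, the sketch below evaluates the objective with a scalar Kalman filter and minimises it over a coarse grid. Everything specific here is an assumption made for the example: the scalar model with \(\Phi = A = 1\), the parameter vector \(\boldsymbol\theta\) taken to be the two noise variances, the large initial variance `p0`, the simulated data, and the grid search standing in for a proper optimisation routine.

```python
import math
import random

def objective(theta, y, phi=1.0, a=1.0, x0=0.0, p0=1e4):
    """Sum over t of log(S_t) + r_t^2 / S_t for a scalar state-space model.

    theta = (var_w, var_v): state and observation noise variances (assumed
    parameterisation). x0 and p0 are an assumed vague initial state.
    """
    var_w, var_v = theta
    x, p = x0, p0                       # filtered state mean and variance
    total = 0.0
    for y_t in y:
        # Prediction step
        x_pred = phi * x
        p_pred = phi * p * phi + var_w
        y_pred = a * x_pred             # predicted observation
        s = a * p_pred * a + var_v      # prediction variance S_{t|t-1}
        r = y_t - y_pred                # residual r_t
        total += math.log(s) + r * r / s
        # Update step
        k = p_pred * a / s              # Kalman gain
        x = x_pred + k * r
        p = (1.0 - k * a) * p_pred
    return total

# Simulated data from a random walk plus noise (illustrative values only).
rng = random.Random(1)
x, y = 0.0, []
for _ in range(500):
    x += rng.gauss(0.0, 0.5)
    y.append(x + rng.gauss(0.0, 1.0))

# Coarse grid search over both variances, from 0.1 to 3.0 in steps of 0.1.
grid = [0.1 * k for k in range(1, 31)]
theta_hat = min(((vw, vv) for vw in grid for vv in grid),
                key=lambda th: objective(th, y))
```

The grid search keeps the example free of dependencies; with more parameters, a numerical optimiser applied to the same `objective` function would be the more practical choice.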

Outliers