
How we detect heel strike from a phone video

Peaks, prominences, and why ankle Y-position turns out to be a surprisingly honest signal.

Eamon · 2 min read
[Figure: ankle Y-position over time, with local maxima marked as heel strikes]

Detecting heel strike sounds easy until you stare at the data. We use ankle Y-position maxima — the frame where the foot is at its lowest point in the image — as our heel-strike signal. A few approaches we tried and rejected first:

What didn’t work

  • Ankle separation peaks. The hypothesis was that maximum separation between the feet equals heel strike. It doesn’t: peak separation happens before the foot lands, while the leg is still swinging through.
  • Acceleration peaks. Too noisy with MediaPipe data. We got too many false positives — every twitch in the limb showed up as a contact.

What worked

MediaPipe normalises landmarks differently per axis: X by video width, Y by video height. Higher Y means lower in the image, which means closer to the ground. So the local maxima of the ankle’s Y-coordinate are a very clean signal for “the foot is on the ground right now.”

# In detection.py
from scipy.signal import find_peaks

peaks, _ = find_peaks(
    ankle_y_smoothed,
    prominence=PROMINENCE_THRESHOLD,
    distance=MIN_FRAMES_BETWEEN_STRIKES,
)
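The snippet above assumes `ankle_y_smoothed` already exists. A minimal end-to-end sketch with a moving-average smoother and synthetic data (the window size, prominence, and distance values here are illustrative, not our tuned ones):

```python
import numpy as np
from scipy.signal import find_peaks

def smooth(signal, window=5):
    # Simple moving average; the window size is a hypothetical choice.
    kernel = np.ones(window) / window
    return np.convolve(signal, kernel, mode="same")

# Synthetic ankle-Y trace: ~30-frame stride cycle plus landmark jitter.
frames = np.arange(100)
rng = np.random.default_rng(0)
ankle_y = (0.8
           + 0.05 * np.sin(2 * np.pi * frames / 30)
           + 0.005 * rng.normal(size=frames.size))

ankle_y_smoothed = smooth(ankle_y)
# Local maxima of Y = frames where the foot is lowest in the image.
peaks, _ = find_peaks(ankle_y_smoothed, prominence=0.03, distance=15)
print(peaks)  # roughly one peak per 30-frame stride cycle
```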

The trick is tuning prominence and distance per signal — what works for ankle Y doesn’t work for acceleration. Our values came from running the detector against a corpus of self-filmed videos and comparing to manually labelled ground truth.
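Scoring a candidate parameter pair against labelled data can be as simple as greedy matching within a frame tolerance. A sketch, with the tolerance and the frame numbers made up for illustration:

```python
def match_strikes(detected, labelled, tolerance=3):
    # Greedily pair each detected strike frame with an unused labelled
    # strike no more than `tolerance` frames away.
    unmatched = list(labelled)
    true_pos = 0
    for frame in detected:
        hit = next((g for g in unmatched if abs(g - frame) <= tolerance), None)
        if hit is not None:
            unmatched.remove(hit)
            true_pos += 1
    precision = true_pos / len(detected) if detected else 0.0
    recall = true_pos / len(labelled) if labelled else 0.0
    return precision, recall

p, r = match_strikes(detected=[12, 41, 70, 95], labelled=[11, 40, 71, 99, 128])
# 12↔11, 41↔40, 70↔71 match; 95 misses 99 by 4 frames → p=0.75, r=0.6
```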

A wrinkle: aspect-ratio correction

Once you have heel strikes, you can compute stride length as the X-distance between successive same-foot contacts. Divide by body height (estimated as the nose-to-ankle distance scaled by 1/0.85, since that span covers roughly 85% of stature) and you have a normalised stride.
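In code, that computation might look like this. The function name and the numbers are hypothetical, and note that it naively divides an X-distance by a Y-distance:

```python
def normalised_stride(strike_xs, nose_y, ankle_y):
    # Stride in body-heights: X-gap between successive same-foot strikes,
    # divided by estimated body height. Nose-to-ankle spans roughly 85%
    # of stature, hence the 1/0.85 scale-up.
    body_height = abs(ankle_y - nose_y) / 0.85
    strides = [b - a for a, b in zip(strike_xs, strike_xs[1:])]
    return [s / body_height for s in strides]

# e.g. same-foot strikes at x = 0.10 and 0.95, nose at y = 0.20, ankle at y = 0.88
print(normalised_stride([0.10, 0.95], nose_y=0.20, ankle_y=0.88))  # ≈ [1.06]
```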

But X is normalised to video width and Y is normalised to video height. You can’t just compare them. Portrait 9:16 video inflates an uncorrected ratio by ~1.78x. The fix:

physical_ratio = (x_distance / y_distance) * (video_width / video_height)

Without that correction, anyone filming on a phone gets a stride length score that flatters them by 78%. With it, the numbers line up across landscape and portrait clips of the same runner.
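A toy check of the correction: the same physical stride should produce the same corrected ratio whether filmed in landscape or portrait. The scene numbers, and the assumption that both clips share identical framing, are purely illustrative:

```python
def physical_ratio(x_norm, y_norm, video_width, video_height):
    # Undo the per-axis normalisation so X and Y distances are comparable.
    return (x_norm / y_norm) * (video_width / video_height)

# One physical scene: 120 cm stride, 170 cm body height, 4 px per cm.
stride_cm, height_cm, px_per_cm = 120.0, 170.0, 4.0

ratios = []
for w, h in [(1920, 1080), (1080, 1920)]:  # landscape, then portrait
    x_norm = stride_cm * px_per_cm / w   # X is normalised by video width
    y_norm = height_cm * px_per_cm / h   # Y is normalised by video height
    ratios.append(physical_ratio(x_norm, y_norm, w, h))

print(ratios)  # both ≈ 0.706, i.e. 120 / 170, regardless of orientation
```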

More posts on the rest of the pipeline are coming. If you want the gory detail, the source for the pose analysis layer is open in the repo.

