PoseNet - Camera Feed Demo

Pose Estimation using PoseNet

PoseNet allows anyone with a webcam-equipped desktop or phone to detect body parts (eyes, ears, nose, shoulders, elbows, wrists, hips, knees, and ankles) within a web browser. The algorithm estimates where key body joints are located using advanced machine learning techniques (convolutional neural networks).

This algorithm and its initial development were created by the Google Creative Team. The BRS team extends the algorithm to support business applications for our clients in the Augmented Reality (AR) space. For example, in sports it could be extended to detect the fundamentals of a baseball or golf swing.

Tech Notes

At a high level, pose estimation happens in two phases:

1. An input RGB image is fed through a convolutional neural network.
2. Either a single-pose or multi-pose decoding algorithm is used to decode poses, pose confidence scores, keypoint positions, and keypoint confidence scores from the model outputs.
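
The decoding phase can be illustrated with a minimal sketch of single-pose decoding. It is not the library's implementation. It assumes the network outputs a heatmap of per-cell keypoint scores (already in the range 0 to 1) and a matching grid of offset vectors, with the y offsets stored before the x offsets. The names decodeSinglePose, Keypoint, and Pose, and the array layout, are hypothetical and chosen for illustration only.

```typescript
// Hypothetical types for illustration; not the PoseNet library's API.
interface Keypoint {
  part: number;  // index of the keypoint in the model output
  x: number;     // horizontal position in input-image pixels
  y: number;     // vertical position in input-image pixels
  score: number; // keypoint confidence score
}

interface Pose {
  keypoints: Keypoint[];
  score: number; // pose confidence: mean of the keypoint scores
}

// Assumed layout:
//   heatmaps[y][x][k]                 score that keypoint k lies in grid cell (y, x)
//   offsets[y][x][k]                  y offset (pixels) for keypoint k
//   offsets[y][x][k + numKeypoints]   x offset (pixels) for keypoint k
function decodeSinglePose(
  heatmaps: number[][][],
  offsets: number[][][],
  outputStride: number
): Pose {
  const height = heatmaps.length;
  const width = heatmaps[0].length;
  const numKeypoints = heatmaps[0][0].length;
  const keypoints: Keypoint[] = [];

  for (let k = 0; k < numKeypoints; k++) {
    // Find the grid cell with the highest score for this keypoint.
    let bestY = 0;
    let bestX = 0;
    let bestScore = -Infinity;
    for (let y = 0; y < height; y++) {
      for (let x = 0; x < width; x++) {
        const s = heatmaps[y][x][k];
        if (s > bestScore) {
          bestScore = s;
          bestY = y;
          bestX = x;
        }
      }
    }
    // Map the cell back to image coordinates and refine with the offset vector.
    keypoints.push({
      part: k,
      y: bestY * outputStride + offsets[bestY][bestX][k],
      x: bestX * outputStride + offsets[bestY][bestX][k + numKeypoints],
      score: bestScore,
    });
  }

  const total = keypoints.reduce((sum, kp) => sum + kp.score, 0);
  return { keypoints, score: total / keypoints.length };
}
```

Multi-pose decoding is more involved because keypoints must also be grouped by person; the sketch above covers only the single-subject case.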

PoseNet runs with either a single-pose or a multi-pose detection algorithm. The single-pose detector is faster and more accurate, but it requires that only one subject be present in the image.

The output stride and the image scale factor have the largest effects on accuracy and speed. A higher output stride produces a coarser output grid, which results in lower accuracy but higher speed. A higher image scale factor feeds a larger image to the network, which results in higher accuracy but lower speed.
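
The trade-off can be made concrete with a small sketch. It assumes the commonly cited relation for this kind of network, output resolution = ((input size - 1) / output stride) + 1, and assumes the image scale factor simply multiplies the source image size before it reaches the network. The function names scaledInputSize and outputResolution are hypothetical helpers, not part of any library.

```typescript
// Hypothetical helper: size of the image fed to the network after scaling.
function scaledInputSize(sourceSize: number, imageScaleFactor: number): number {
  return Math.round(sourceSize * imageScaleFactor);
}

// Hypothetical helper: side length of the output grid for a given input size,
// using the assumed relation ((inputSize - 1) / outputStride) + 1.
function outputResolution(inputSize: number, outputStride: number): number {
  return Math.floor((inputSize - 1) / outputStride) + 1;
}

// A 225 px input at different output strides:
const fine = outputResolution(225, 8);    // 29 cells per side
const medium = outputResolution(225, 16); // 15 cells per side
const coarse = outputResolution(225, 32); // 8 cells per side

// A 450 px source image at different scale factors, with an output stride of 16:
const halfScale = outputResolution(scaledInputSize(450, 0.5), 16); // 15 cells per side
const fullScale = outputResolution(scaledInputSize(450, 1.0), 16); // 29 cells per side
```

Fewer grid cells mean less computation per frame but coarser keypoint localization, which is why a higher output stride or a lower image scale factor trades accuracy for speed.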