PoseNet allows anyone with a webcam-equipped desktop or phone to detect body parts (eyes, ears, nose, shoulders, elbows, wrists, hips, knees, and ankles) within a web browser. The algorithm estimates where key body joints are located using advanced machine learning techniques (convolutional neural networks).
The algorithm and its initial implementation were created by the Google Creative Team. The BRS team extends this algorithm to a range of business applications for our clients in the Augmented Reality (AR) space. For example, in sports, we can extend it to detect the fundamentals of a baseball or golf swing.
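At a high level, pose estimation happens in two phases:
1. An input RGB image is fed through a convolutional neural network.
2. A decoding algorithm turns the network outputs into poses, pose confidence scores, keypoint positions, and keypoint confidence scores.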
PoseNet can run either a single-pose or a multi-pose detection algorithm. The single-person pose detector is faster and more accurate, but it requires that only one subject be present in the image.
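To show why the single-pose detector assumes one subject, here is a minimal sketch of single-pose decoding, assuming the network outputs a heatmap grid with one channel per keypoint plus an offset grid. For each keypoint it simply takes the strongest heatmap cell, so with two people in frame it would mix their keypoints. The nested-list inputs, the offset layout (y offsets in the first K channels, x offsets in the next K), and applying a sigmoid to raw heatmap values are all assumptions for illustration, not PoseNet's exact API.

```python
import math
from typing import List, Tuple

# (score, y, x), with y and x in input-image pixels
Keypoint = Tuple[float, float, float]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def decode_single_pose(
    heatmaps: List[List[List[float]]],  # heatmaps[y][x][k]: raw score (assumed logit)
    offsets: List[List[List[float]]],   # offsets[y][x][k] = dy, offsets[y][x][k + K] = dx (assumed layout)
    output_stride: int,
) -> Tuple[float, List[Keypoint]]:
    rows, cols = len(heatmaps), len(heatmaps[0])
    num_keypoints = len(heatmaps[0][0])
    pose: List[Keypoint] = []
    for k in range(num_keypoints):
        # Single-pose assumption: the strongest cell for keypoint k belongs to the one subject.
        best_y, best_x = max(
            ((y, x) for y in range(rows) for x in range(cols)),
            key=lambda cell: heatmaps[cell[0]][cell[1]][k],
        )
        dy = offsets[best_y][best_x][k]
        dx = offsets[best_y][best_x][k + num_keypoints]
        # Map the coarse grid cell back to image pixels, then refine with the offset.
        pose.append((
            sigmoid(heatmaps[best_y][best_x][k]),
            best_y * output_stride + dy,
            best_x * output_stride + dx,
        ))
    # Pose confidence as the mean keypoint confidence.
    pose_score = sum(kp[0] for kp in pose) / num_keypoints
    return pose_score, pose
```

The multi-pose decoder cannot rely on a single argmax per keypoint; it has to search for several local maxima and group them into separate people, which is part of why it is slower.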
The output stride and the image scale factor have the largest effect on the trade-off between accuracy and speed. A higher output stride yields lower accuracy but higher speed, while a higher image scale factor yields higher accuracy but lower speed.
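The trade-off can be made concrete with a little arithmetic. The sketch below assumes the image is first resized by the scale factor and snapped to a size of the form n × output_stride + 1, then reduced by the stride to a heatmap grid. The snapping rule is modeled on the earlier TensorFlow.js PoseNet release as we understand it, and the function names are hypothetical.

```python
def valid_input_resolution(input_dim: int, image_scale_factor: float, output_stride: int) -> int:
    """Scale the input dimension, then snap it down to n * output_stride + 1 (assumed rule)."""
    scaled = int(input_dim * image_scale_factor) - 1
    return scaled - scaled % output_stride + 1


def heatmap_size(resolution: int, output_stride: int) -> int:
    """Number of heatmap cells per side: one cell per output_stride pixels."""
    return (resolution - 1) // output_stride + 1


if __name__ == "__main__":
    for stride in (8, 16, 32):
        for scale in (0.5, 1.0):
            res = valid_input_resolution(513, scale, stride)
            cells = heatmap_size(res, stride)
            print(f"stride={stride:2d} scale={scale}: input {res}px -> {cells}x{cells} heatmap")
```

For a 513-pixel input at scale 1.0, strides 8, 16, and 32 give 65, 33, and 17 heatmap cells per side: a larger stride means fewer cells to compute and decode (faster), but each cell covers more pixels (coarser). Lowering the scale factor to 0.5 shrinks the network input itself, which cuts computation further at the cost of detail.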