Back in 2001, Viola and Jones published a paper on a real-time face detection framework capable of rapid image processing and high detection rates, one of the seminal works in computer vision. There were three main contributions. The first is an image representation called the “Integral Image”, which allows features to be computed quickly. The second is a simple and efficient classifier, built with the AdaBoost learning algorithm, that selects a small number of critical visual features from a large set of potential features. The third is a method for combining classifiers in a “cascade”, which allows background regions of the image to be quickly discarded so that more computation can be spent on promising face-like regions. Here’s the demo on the Jetson:
A complete description of the algorithm is a little beyond what I can cover here. If you’re interested, you can read one of their papers: Robust Real-Time Face Detection.
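For a rough feel of the first contribution, here is a minimal sketch of an integral image in plain Python. It is not the code from the demo or from OpenCV; the function names (`integral_image`, `rect_sum`, `two_rect_feature`) and the tiny 4×3 test image are made up for illustration. The point is that once the table is built, the sum over any rectangle costs four lookups, no matter how big the rectangle is.

```python
def integral_image(img):
    """Build a (h+1) x (w+1) summed-area table with a zero top row and left column."""
    h = len(img)
    w = len(img[0]) if h else 0
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0  # running sum of the current row
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in the rectangle at top-left (x, y) with size w x h."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Illustrative two-rectangle feature: left half minus right half."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


# Hypothetical 4x3 "image" used only to exercise the functions.
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
]
ii = integral_image(image)
print(rect_sum(ii, 1, 1, 2, 2))          # 6 + 7 + 10 + 11 = 34
print(two_rect_feature(ii, 0, 0, 4, 3))  # 33 - 45 = -12
```

The rectangle features in the paper are differences of sums like this, which is why they can be evaluated so cheaply at any position and scale.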
Note here that I haven’t done optimization of any sort; it’s just basically “thrown together” in the simplest manner. You’ll notice that the frame rate is only around 18fps when the detector is not running, and around 12fps when it is. If we got serious and built this for reals I’m sure we could do much better. However, for a quick demo of me making funny faces and wearing sunglasses and a baseball cap, I think it’s a good tradeoff! As an aside, when Viola and Jones originally implemented this in 2001, they operated on a 384×288 image at 15fps on a 700 MHz Intel Pentium III PC. Oh, how times change, and in a good way this time.
Another part of the algorithm is a “training mode”, in which a set of face and non-face training images is used. The face training set typically consists of several thousand hand-labeled faces, scaled and aligned, along with a larger set of non-face images. For the demo, a relatively small 1MB of this distilled type of training information is used. For hardcore usage of face detection, several hundred megabytes of this information may be employed.
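What training ultimately produces is the set of stages and thresholds for the cascade mentioned earlier. Here is a toy sketch of how such a cascade is evaluated on one image window. The two stages below are invented stand-ins (a contrast check and a brightness check), not real trained data; in the actual detector each stage is a boosted combination of rectangle features. The `run_cascade` name and the thresholds are assumptions for illustration only.

```python
def run_cascade(stages, window):
    """Return (accepted, stages_evaluated) for one window.

    Each stage is a (score_function, threshold) pair. The window is rejected
    as soon as one stage scores below its threshold, so later (more
    expensive) stages only run on windows that still look promising.
    """
    evaluated = 0
    for score_fn, threshold in stages:
        evaluated += 1
        if score_fn(window) < threshold:
            return False, evaluated  # early exit: most background ends here
    return True, evaluated


# Hypothetical stages, cheapest first. A window is a flat list of pixel values.
stages = [
    (lambda w: max(w) - min(w), 50),  # stage 1: enough contrast?
    (lambda w: sum(w) / len(w), 80),  # stage 2: bright enough on average?
]

flat_wall = [100, 100, 100, 100]  # no contrast, rejected by stage 1
dark_blob = [0, 60, 10, 20]       # passes stage 1, rejected by stage 2
candidate = [20, 200, 90, 130]    # passes both toy stages

print(run_cascade(stages, flat_wall))  # (False, 1)
print(run_cascade(stages, dark_blob))  # (False, 2)
print(run_cascade(stages, candidate))  # (True, 2)
```

Since the vast majority of windows in a typical frame are background, getting rid of them after one or two cheap stages is where most of the speed comes from.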
Fortunately for mankind, this is the enabling technology which allows everyone to wear mustaches, beards, nose glasses, and all sorts of silly things on their webcams. I guess it might have other applications, but I can’t think of any right off the bat.
The demo itself is a modified version of an example from Kyle McDonald’s ofxCv add-on for openFrameworks. ofxCv acts as a wrapper to interface with the NVIDIA accelerated OpenCV implementation on the Jetson TK1. openFrameworks has built-in GStreamer support, which I used to grab the video from a Logitech C920 webcam.