Back in February, we installed Caffe on the TX1. At the time, the TX1 was running a 32-bit version of L4T 23.1. With the advent of the 64-bit L4T 24.2, this seems like a good time to do a performance comparison of the two. The TX1 can now do an image recognition in about 8 ms! For the install and test, Looky Here:
As you recall, Caffe is a deep learning framework developed with cleanliness, readability, and speed in mind. It was created by Yangqing Jia during his PhD at UC Berkeley, and is in active development by the Berkeley Vision and Learning Center (BVLC) and by community contributors.
The L4T 23.1 Operating System release was a 64-bit kernel supporting a 32-bit user space. For the L4T 24.2 release, both the kernel and the user space are 64-bit.
A script is available in the JetsonHack Github repository which will install the dependencies for Caffe, downloads the source files, configures the build system, compiles Caffe, and then runs a suite of tests. Passing the tests indicates that Caffe is installed correctly.
This installation demonstration is for a NVIDIA Jetson TX1 running L4T 24.2, an Ubuntu 16.04 variant. The installation of L4T 24.2 was done using JetPack 2.3, and includes installation of OpenCV4Tegra, CUDA 8.0, and cuDNN 5.1.
Before starting the installation, you may want to set the CPU and GPU clocks to maximum by running the script:
$ sudo ./jetson_clocks.sh
The script is in the home directory, and is also included in the installCaffeJTX1 repository for convenience.
In order to install Caffe:
$ git clone https://github.com/jetsonhacks/installCaffeJTX1.git
$ cd installCaffeJTX1
Installation should not require intervention, in the video installation of dependencies and compilation took about 10 minutes. Running the unit tests takes about 45 minutes. While not strictly necessary, running the unit tests makes sure that the installation is correct.
At the end of the video, there are a couple of timed tests which can be compared with the Jetson TK1, and the previous installation:
|Jetson TK1 vs. Jetson TX1 Caffe GPU Example Comparison
10 iterations, times in milliseconds
|Machine||Average FWD||Average BACK||Average FWD-BACK|
|Jetson TK1 (32-bit OS)||234||243||478|
|Jetson TX1 (32-bit OS)||179||144||324|
with cuDNN support (32-bit OS)
|Jetson TX1 (64-bit OS)||110||122||233|
with cuDNN support (64-bit)
There is definitely a performance improvement between the 32-bit and 64-bit releases. There are a couple of factors for the performance improvement. One is the change from a 32-bit to 64-bit operating system. Another factor is the improvement of the deep learning libraries, CUDA and cuDNN, between the releases. Considering that the tests are running on the exact same hardware, the performance boost is impressive. Using cuDNN provides a huge gain in the forward pass tests.
The tests are running 50 iterations of the recognition pipeline, and each one is analyzing 10 different crops of the input image, so look at the ‘Average Forward pass’ time and divide by 10 to get the timing per recognition result. For the 64-bit version, that means that an image recognition takes about 8 ms.
It is worth mentioning that NVCaffe is a special branch of Caffe used on the TX1 which includes support for FP16. The above tests use FP32. In many cases, FP32 and FP16 give very similar results; FP16 is faster. For example, in the above tests, the Average Forward Pass test finishes in about 60ms, a result of 6 ms per image recognition!
Deep learning is in its infancy and as people explore its potential, the Jetson TX1 seems well positioned to take the lessons learned and deploy them in the embedded computing ecosystem. There are several different deep learning platforms being developed, the improvement in Caffe on the Jetson Dev Kits over the last couple of years is quite impressive.
The installation in this video was done directly after flashing L4T 24.2 on to the Jetson TX1 with CUDA 8.0, cuDNN r5.1 and OpenCV4Tegra. Git was then installed:
$ sudo apt-get install git
The latest Caffe commit used in the video is: 80f44100e19fd371ff55beb3ec2ad5919fb6ac43