Back in September, we installed the Caffe Deep Learning Framework on a Jetson TX1 Development Kit. With the advent of the Jetson TX2, now is the time to install Caffe and compare the performance difference between the two. Looky here:
Background
As you recall, Caffe is a deep learning framework developed with cleanliness, readability, and speed in mind. It was created by Yangqing Jia during his PhD at UC Berkeley, and is in active development by the Berkeley Vision and Learning Center (BVLC) and by community contributors.
Over the last couple of years, a great deal of progress has been made in speeding up the underlying software stack. In particular, the cuDNN library has been tightly integrated with Caffe, giving a nice bump in performance.
Caffe Installation
A script is available in the JetsonHacks GitHub repository which will install the dependencies for Caffe, download the source files, configure the build system, compile Caffe, and then run a suite of tests. Passing the tests indicates that Caffe is installed correctly.
This installation demonstration is for an NVIDIA Jetson TX2 running L4T 27.1, a 64-bit Ubuntu 16.04 variant. The installation of L4T 27.1 was done using JetPack 3.0, and includes installation of OpenCV4Tegra, CUDA 8.0, and cuDNN 5.1.
Before starting the installation, you may want to set the CPU and GPU clocks to maximum by running the script:
$ sudo ./jetson_clocks.sh
The script is in the home directory.
In order to install Caffe:
$ git clone https://github.com/jetsonhacks/installCaffeJTX2.git
$ cd installCaffeJTX2
$ ./installCaffe.sh
Installation should not require intervention. In the video, installation of dependencies and compilation took about 14 minutes, and running the unit tests took about 19 minutes. While not strictly necessary, running the unit tests confirms that the installation is correct.
Test Results
At the end of the video, there are a couple of timed tests which can be compared with the Jetson TX1. The following table adds some more information:
Jetson TK1 vs. Jetson TX1 vs. Jetson TX2 Caffe GPU Example Comparison (10 iterations, times in milliseconds)

Machine | Average FWD (ms) | Average BACK (ms) | Average FWD-BACK (ms) |
---|---|---|---|
Jetson TK1 (32-bit OS) | 234 | 243 | 478 |
Jetson TX1 (64-bit OS) | 80 | 119 | 200 |
Jetson TX2 (Mode Max-Q) | 78 | 97 | 175 |
Jetson TX2 (Mode Max-P) | 65 | 85 | 149 |
Jetson TX2 (Mode Max-N) | 56 | 75 | 132 |
The tests run 50 iterations of the recognition pipeline, and each iteration analyzes 10 different crops of the input image. To get the time per recognition result, take the ‘Average FWD’ (forward pass) time and divide by 10. For the Jetson TX2 in Max-N mode, that means one image recognition takes about 5.6 ms.
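The divide-by-10 arithmetic can be checked for every row of the table. This is a minimal sketch, assuming the Average FWD values above and that awk is available; the helper name `per_crop_ms` is made up for illustration and is not part of Caffe.

```shell
#!/usr/bin/env bash
# Per-recognition time = average forward pass (ms) / crops per pass.
# per_crop_ms is a hypothetical helper name, not part of Caffe.
per_crop_ms() {
  awk -v fwd="$1" -v crops="${2:-10}" 'BEGIN { printf "%.1f\n", fwd / crops }'
}

# Average FWD values (ms) copied from the table above.
for entry in "TK1:234" "TX1:80" "TX2 Max-Q:78" "TX2 Max-P:65" "TX2 Max-N:56"; do
  name="${entry%%:*}"   # text before the colon
  fwd="${entry##*:}"    # number after the colon
  echo "$name: $(per_crop_ms "$fwd") ms per recognition"
done
```

The second argument defaults to 10 crops, so a different crop count can be passed if the test is changed.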
The Jetson TX2 introduces the concept of performance modes. The Jetson TX1 has 4 ARM Cortex-A57 CPU cores. In comparison, there are 6 CPU cores in the Tegra X2 SoC of the Jetson TX2: four are ARM Cortex-A57, the other two are NVIDIA Denver 2. Depending on performance and power requirements, the cores can be brought online or taken offline, and their clock frequencies can be set independently. There are five predefined modes available through the nvpmodel CLI tool. The three used here are:
- sudo nvpmodel -m 1 (Max-Q)
- sudo nvpmodel -m 2 (Max-P)
- sudo nvpmodel -m 0 (Max-N)
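Switching between the modes can be wrapped in a small helper. This is a minimal sketch, assuming only the three mode IDs listed above; `mode_id` and `set_mode` are hypothetical names, and `nvpmodel -q` is used to query the active mode after switching.

```shell
#!/usr/bin/env bash
# Map the mode names used in this article to nvpmodel IDs.
# mode_id and set_mode are hypothetical helper names.
mode_id() {
  case "$1" in
    max-n|Max-N) echo 0 ;;
    max-q|Max-Q) echo 1 ;;
    max-p|Max-P) echo 2 ;;
    *) echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

# Switch to a named mode, then query the active mode.
# Needs a Jetson TX2 with nvpmodel installed; it is only defined here, not called.
set_mode() {
  local id
  id="$(mode_id "$1")" || return 1
  sudo nvpmodel -m "$id" && sudo nvpmodel -q
}
```

On the Jetson, `set_mode Max-Q` would then switch to the 7.5W profile and print the mode that is now active.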
Max-Q uses only the 4 ARM A57 cores at a minimal clock frequency. Note that from the table, this gives performance equivalent to the Jetson TX1. Max-Q sets the power profile to be 7.5W, so this represents Jetson TX1 performance while only using half the amount of power of a TX1 at full speed!
Max-P also uses only the 4 ARM A57 cores, but at a faster clock frequency. From the table, we can see that the Average Forward Pass drops from the Max-Q value of 78 to the Max-P value of 65. My understanding is that Max-P limits power usage to 15W.
Finally, we can see that in Max-N mode the Jetson TX2 performs best of all. (Note: This wasn’t shown in the video, it’s a special bonus for our readers here!) In addition to the 4 ARM A57 cores, the Denver 2 cores come online, and the clocks on the CPU and the GPU are put to their maximum values. To put it in perspective, the Jetson TX1 at max clock runs the test in about 10000 ms, while the Jetson TX2 at Max-N runs the same test in about 6600 ms. Quite a bit of giddy-up.
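Those totals line up with 50 iterations multiplied by the Average FWD-BACK column of the table. That reading is an inference from the numbers rather than something stated in the video. A small sketch of the arithmetic, with hypothetical helper names `total_ms` and `speedup`:

```shell
#!/usr/bin/env bash
# Total test time = average forward-backward time (ms) x iterations.
# Both helper names are made up for illustration.
total_ms() {
  awk -v avg="$1" -v iters="${2:-50}" 'BEGIN { printf "%d\n", avg * iters }'
}

# Ratio of a baseline time to a new time, to two decimal places.
speedup() {
  awk -v base="$1" -v new="$2" 'BEGIN { printf "%.2f\n", base / new }'
}

echo "TX1 total:       $(total_ms 200) ms"   # 200 ms x 50 = 10000 ms
echo "TX2 Max-N total: $(total_ms 132) ms"   # 132 ms x 50 = 6600 ms
echo "Speedup:         $(speedup 200 132)x"  # 200 / 132 = 1.52
```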
Conclusion
Deep learning is in its infancy, and as people explore its potential, the Jetson TX2 seems well positioned to take the lessons learned and deploy them in the embedded computing ecosystem. There are several different deep learning platforms being developed, and the improvement in Caffe on the Jetson Dev Kits over the last couple of years is way impressive.
Notes
The installation in this video was done directly after flashing L4T 27.1 onto the Jetson TX2 with CUDA 8.0, cuDNN r5.1 and OpenCV4Tegra.
The latest Caffe commit used in the video is: 317d162acbe420c4b2d1faa77b5c18a3841c444c
21 Responses
Appreciate the video, just purchased the TX2 a day ago and expect it soon. Before I do the install, I was looking for some references on installing caffe and came upon your blog. Does this installation include the installation of pycaffe? Thanks
This particular installation does not include pycaffe, though the source is all part of the BVLC Caffe git repository. Thanks for reading!
Does it work for the TX1?
Yes. https://jetsonhacks.com/2016/09/18/caffe-deep-learning-framework-64-bit-nvidia-jetson-tx1/
Thanks for reading!
Does this installation build Caffe with GPU support?
Yes. Thanks for reading!
Thanks a lot!
I needed to do just one extra step at the end:
sudo pip install protobuf
I could not understand why, though, because installCaffe.sh already installs it.
I am glad you got it to work.
I’m using a Caffe-trained model on my Jetson TX2. When I use OpenCV (which uses only the CPU) to read the network and run the forward pass, I get about 1.7 sec for one image (which is divided into 64 patches), approximately 26 ms per patch. The problem is that when I use Caffe (with the GPU) to read the network and run the forward pass, the time increases to 2.1 sec, which is about 32 ms per patch. My conclusion is that I’m not using the GPU at all! Here is part of my code. Would you please help me run it on the GPU?
///
void Network::useGPU(const bool _useGPU)
{
useGPU_ = _useGPU;
caffe::Caffe::set_mode(useGPU_ ? caffe::Caffe::GPU : caffe::Caffe::CPU);
}
const Result Network::classifier(
const cv::Mat& image,
const size_t imageSize
) throw (utility::Exception)
{
caffe::Blob<float>* caffeInput;
caffe::Blob<float>* caffeOutput;
cv::Mat blob;
cv::Mat caffeInputMatrix;
cv::Mat probabilities;
Class _class;
Result result;
caffe::Timer forwardTimer;
// Convert image to batch of images
blob = cv::dnn::blobFromImage(
image,
1.0f,
cv::Size(imageSize, imageSize),
cv::Scalar(meanPixels[0], meanPixels[1], meanPixels[2]),
false
);
if (useGPU_) {
// Run Caffe model using Caffe
caffeInput = caffeNetwork_->input_blobs()[0];
// Wrap Caffe’s input blob to cv::Mat
caffeInputMatrix = cv::Mat(
caffeInput->shape(),
CV_32F,
(char*) caffeInput->cpu_data()
);
blob.copyTo(caffeInputMatrix);
// forwardTimer.Start();
caffeOutput = caffeNetwork_->Forward()[0];
// std::cout << "Forward Time: " << forwardTimer.MilliSeconds() << std::endl;
probabilities = cv::Mat(
caffeOutput->shape(),
CV_32F,
(char*) caffeOutput->cpu_data()
);
} else {
network_.setInput(blob, "data");
probabilities = network_.forward("softmax");
}
_class = getClass(probabilities);
result.label(labels[_class.first]);
result.probability(_class.second);
return result;
}
How are you determining if it is running on the GPU or not?
I check my CPU usage with CPUSTAT, which shows the CPU working the same amount in both GPU and CPU mode. Besides, I expect the time to decrease, not increase, when running in GPU mode 😐
I do not think those are correct assumptions. Because the CPU and GPU share the same memory on the Jetson and there can be different cache coherency issues, the only way to determine if the GPU is being utilized is to actually check GPU usage. It is possible that GPU code executes more slowly than CPU code under certain circumstances.
One more test I did after your comment: I used only Caffe, once with setmode(GPU) and once with setmode(CPU), and the time in both cases is exactly the same!
By the way, how can I check GPU usage on the Jetson TX2? I tried GPUSTAT but could not run it; it gives the following error:
Error on querying NVIDIA devices. Use --debug flag for details
and the debug result is:
Error on querying NVIDIA devices. Use --debug flag for details
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/gpustat/__main__.py", line 16, in print_gpustat
    gpu_stats = GPUStatCollection.new_query()
  File "/usr/local/lib/python2.7/dist-packages/gpustat/core.py", line 261, in new_query
    N.nvmlInit()
  File "/usr/local/lib/python2.7/dist-packages/pynvml.py", line 747, in nvmlInit
    _LoadNvmlLibrary()
  File "/usr/local/lib/python2.7/dist-packages/pynvml.py", line 785, in _LoadNvmlLibrary
    _nvmlCheckReturn(NVML_ERROR_LIBRARY_NOT_FOUND)
  File "/usr/local/lib/python2.7/dist-packages/pynvml.py", line 405, in _nvmlCheckReturn
    raise NVMLError(ret)
NVMLError_LibraryNotFound: NVML Shared Library Not Found
Is it possible for you to give me Caffe code that runs on the GPU, which I could compare with your table?
I am not familiar with GPUSTAT, but I believe that it is intended for desktop GPU environments. Typically people use tegrastats to examine GPU on the Jetson. Typically I use the gpuGraph utility: https://jetsonhacks.com/2018/05/29/gpu-activity-monitor-nvidia-jetson-tx-dev-kit/
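For readers who prefer a number over a graph, the GPU load can be pulled out of a tegrastats line with a short filter. This is only a sketch: the sample line is illustrative rather than captured output, the field is assumed to be named `GR3D` or `GR3D_FREQ` depending on the L4T release, and `gpu_load` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Print the GPU load percentage found in each tegrastats line on stdin.
# Assumes the GPU field looks like "GR3D 37%@1300" or "GR3D_FREQ 37%".
gpu_load() {
  sed -nE 's/.*GR3D(_FREQ)? ([0-9]+)%.*/\2/p'
}

# Illustrative sample line, not real captured output.
sample='RAM 1467/7851MB (lfb 1301x4MB) cpu [0%@345,off,off,0%@345,0%@345,0%@345] EMC 0%@40 APE 150 GR3D 37%@1300'
echo "$sample" | gpu_load
```

On the Jetson itself, the live output of tegrastats could be piped through `gpu_load` while the Caffe code runs. The location of the tegrastats binary varies between L4T releases. A value that stays at 0 during the forward pass would suggest the GPU is not being used.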
If you followed the instructions in this tutorial, you have a version which has GPU enabled. You can check your CMAKE settings if in doubt.
Thanks, I used GPU-graph and was able to monitor GPU usage. I trained a simple CNN and the GPU was working!!! But when I run my code the GPU is not working 😐 Is there something wrong with my code?!
Did you try to run the test?
$ tools/caffe time --model=models/bvlc_alexnet/deploy.prototxt --gpu=0
Hello. I’ve installed Caffe on a Jetson TX2 following this article, and also built the pycaffe interface. In a terminal, I can import the caffe package normally.
But when I ran my Python script (which works on other computers), an error was raised: Check failed: status == CUDNN_STATUS_SUCCESS (4 vs. 0) CUDNN_STATUS_INTERNAL_ERROR. I then tried running it with ‘sudo’, but a different error occurred: ImportError: No module named caffe.
I have been stuck on these problems for several days, and sincerely hope someone can help. Thank you.
To install Caffe with OpenCV 4+ download the zip (not the clone link) from here: https://github.com/BVLC/caffe/tree/7f503bd9a19758a173064e299ab9d4cac65ed60f
It includes fixes from this commit for OpenCV4: https://github.com/BVLC/caffe/pull/6625
Hello. I got the error below during installation. Could you advise?
[ 5%] Building CXX object src/caffe/CMakeFiles/caffe.dir/layers/cudnn_relu_layer.cpp.o
/home/nvidia/caffe/src/caffe/layers/window_data_layer.cpp: In member function ‘virtual void caffe::WindowDataLayer::load_batch(caffe::Batch*)’:
/home/nvidia/caffe/src/caffe/layers/window_data_layer.cpp:293:42: error: ‘CV_LOAD_IMAGE_COLOR’ was not declared in this scope
cv_img = cv::imread(image.first, CV_LOAD_IMAGE_COLOR);
^
In file included from /usr/local/include/opencv4/opencv2/core.hpp:60:0,
from /usr/local/include/opencv4/opencv2/core/types_c.h:124,
from /usr/local/include/opencv4/opencv2/core/core_c.h:48,
from /usr/local/include/opencv4/opencv2/highgui/highgui_c.h:45,
from /home/nvidia/caffe/src/caffe/layers/window_data_layer.cpp:2:
/usr/local/include/opencv4/opencv2/core/persistence.hpp: In instantiation of ‘void cv::read(const cv::FileNode&, cv::Point_&, const cv::Point_&) [with _Tp = int]’:
/usr/local/include/opencv4/opencv2/core/persistence.hpp:778:34: required from here
/usr/local/include/opencv4/opencv2/core/persistence.hpp:722:11: error: ambiguous overload for ‘operator=’ (operand types are ‘cv::Point_’ and ‘const cv::Point_’)
value = temp.size() != 2 ? default_value : Point_(saturate_cast(temp[0]), saturate_cast(temp[1]));
^
In file included from /usr/local/include/opencv4/opencv2/core.hpp:58:0,
from /usr/local/include/opencv4/opencv2/core/types_c.h:124,
from /usr/local/include/opencv4/opencv2/core/core_c.h:48,
from /usr/local/include/opencv4/opencv2/highgui/highgui_c.h:45,
from /home/nvidia/caffe/src/caffe/layers/window_data_layer.cpp:2:
/usr/local/include/opencv4/opencv2/core/types.hpp:1184:14: note: candidate: cv::Point_& cv::Point_::operator=(const cv::Point_&) [with _Tp = int]
Point_& Point_::operator = (const Point_& pt)
^
/usr/local/include/opencv4/opencv2/core/types.hpp:1191:14: note: candidate: cv::Point_& cv::Point_::operator=(cv::Point_) [with _Tp = int; cv::Point_ = cv::Point_]
Point_& Point_::operator = (Point_&& pt) CV_NOEXCEPT
^
src/caffe/CMakeFiles/caffe.dir/build.make:31083: recipe for target ‘src/caffe/CMakeFiles/caffe.dir/layers/image_data_layer.cpp.o’ failed
make[3]: *** [src/caffe/CMakeFiles/caffe.dir/layers/image_data_layer.cpp.o] Error 1
make[3]: *** Waiting for unfinished jobs….
/home/nvidia/caffe/src/caffe/layers/window_data_layer.cpp: In instantiation of ‘void caffe::WindowDataLayer::load_batch(caffe::Batch*) [with Dtype = float]’:
/home/nvidia/caffe/src/caffe/layers/window_data_layer.cpp:472:1: required from here
/home/nvidia/caffe/src/caffe/layers/window_data_layer.cpp:291:16: error: ambiguous overload for ‘operator=’ (operand types are ‘cv::Mat’ and ‘cv::Mat’)
cv_img = DecodeDatumToCVMat(image_cached.second, true);
^
In file included from /usr/local/include/opencv4/opencv2/core/mat.hpp:3721:0,
from /usr/local/include/opencv4/opencv2/core.hpp:59,
from /usr/local/include/opencv4/opencv2/core/types_c.h:124,
from /usr/local/include/opencv4/opencv2/core/core_c.h:48,
from /usr/local/include/opencv4/opencv2/highgui/highgui_c.h:45,
from /home/nvidia/caffe/src/caffe/layers/window_data_layer.cpp:2:
/usr/local/include/opencv4/opencv2/core/mat.inl.hpp:743:6: note: candidate: cv::Mat& cv::Mat::operator=(const cv::Mat&)
Mat& Mat::operator = (const Mat& m)
^
/usr/local/include/opencv4/opencv2/core/mat.inl.hpp:1405:6: note: candidate: cv::Mat& cv::Mat::operator=(cv::Mat)
Mat& Mat::operator = (Mat&& m)
^
/home/nvidia/caffe/src/caffe/layers/window_data_layer.cpp: In instantiation of ‘void caffe::WindowDataLayer::load_batch(caffe::Batch*) [with Dtype = double]’:
/home/nvidia/caffe/src/caffe/layers/window_data_layer.cpp:472:1: required from here
/home/nvidia/caffe/src/caffe/layers/window_data_layer.cpp:291:16: error: ambiguous overload for ‘operator=’ (operand types are ‘cv::Mat’ and ‘cv::Mat’)
cv_img = DecodeDatumToCVMat(image_cached.second, true);
^
In file included from /usr/local/include/opencv4/opencv2/core/mat.hpp:3721:0,
from /usr/local/include/opencv4/opencv2/core.hpp:59,
from /usr/local/include/opencv4/opencv2/core/types_c.h:124,
from /usr/local/include/opencv4/opencv2/core/core_c.h:48,
from /usr/local/include/opencv4/opencv2/highgui/highgui_c.h:45,
from /home/nvidia/caffe/src/caffe/layers/window_data_layer.cpp:2:
/usr/local/include/opencv4/opencv2/core/mat.inl.hpp:743:6: note: candidate: cv::Mat& cv::Mat::operator=(const cv::Mat&)
Mat& Mat::operator = (const Mat& m)
^
/usr/local/include/opencv4/opencv2/core/mat.inl.hpp:1405:6: note: candidate: cv::Mat& cv::Mat::operator=(cv::Mat)
Mat& Mat::operator = (Mat&& m)
^
In file included from /usr/local/include/opencv4/opencv2/core.hpp:58:0,
from /usr/local/include/opencv4/opencv2/core/types_c.h:124,
from /usr/local/include/opencv4/opencv2/core/core_c.h:48,
from /usr/local/include/opencv4/opencv2/highgui/highgui_c.h:45,
from /home/nvidia/caffe/src/caffe/layers/window_data_layer.cpp:2:
/usr/local/include/opencv4/opencv2/core/types.hpp: In instantiation of ‘static _OI std::__copy_move::__copy_m(_II, _II, _OI) [with _II = const cv::KeyPoint*; _OI = cv::KeyPoint*]’:
/usr/include/c++/5/bits/stl_algobase.h:402:44: required from ‘_OI std::__copy_move_a(_II, _II, _OI) [with bool _IsMove = false; _II = const cv::KeyPoint*; _OI = cv::KeyPoint*]’
/usr/include/c++/5/bits/stl_algobase.h:438:45: required from ‘_OI std::__copy_move_a2(_II, _II, _OI) [with bool _IsMove = false; _II = __gnu_cxx::__normal_iterator<const cv::KeyPoint*, std::vector >; _OI = __gnu_cxx::__normal_iterator<cv::KeyPoint*, std::vector >]’
/usr/include/c++/5/bits/stl_algobase.h:471:8: required from ‘_OI std::copy(_II, _II, _OI) [with _II = __gnu_cxx::__normal_iterator<const cv::KeyPoint*, std::vector >; _OI = __gnu_cxx::__normal_iterator<cv::KeyPoint*, std::vector >]’
/usr/include/c++/5/bits/vector.tcc:206:31: required from ‘std::vector& std::vector::operator=(const std::vector&) [with _Tp = cv::KeyPoint; _Alloc = std::allocator]’
/usr/local/include/opencv4/opencv2/core/persistence.hpp:1180:13: required from here
/usr/local/include/opencv4/opencv2/core/types.hpp:711:27: error: ambiguous overload for ‘operator=’ (operand types are ‘cv::Point2f {aka cv::Point_}’ and ‘const Point2f {aka const cv::Point_}’)
class CV_EXPORTS_W_SIMPLE KeyPoint
^
In file included from /usr/local/include/opencv4/opencv2/core.hpp:58:0,
from /usr/local/include/opencv4/opencv2/core/types_c.h:124,
from /usr/local/include/opencv4/opencv2/core/core_c.h:48,
from /usr/local/include/opencv4/opencv2/highgui/highgui_c.h:45,
from /home/nvidia/caffe/src/caffe/layers/window_data_layer.cpp:2:
/usr/local/include/opencv4/opencv2/core/types.hpp:170:13: note: candidate: cv::Point_& cv::Point_::operator=(const cv::Point_&) [with _Tp = float]
Point_& operator = (const Point_& pt);
^
/usr/local/include/opencv4/opencv2/core/types.hpp:171:13: note: candidate: cv::Point_& cv::Point_::operator=(cv::Point_) [with _Tp = float; cv::Point_ = cv::Point_]
Point_& operator = (Point_&& pt) CV_NOEXCEPT;
^
In file included from /usr/include/c++/5/algorithm:61:0,
from /usr/local/include/opencv4/opencv2/core/base.hpp:55,
from /usr/local/include/opencv4/opencv2/core.hpp:54,
from /usr/local/include/opencv4/opencv2/core/types_c.h:124,
from /usr/local/include/opencv4/opencv2/core/core_c.h:48,
from /usr/local/include/opencv4/opencv2/highgui/highgui_c.h:45,
from /home/nvidia/caffe/src/caffe/layers/window_data_layer.cpp:2:
/usr/include/c++/5/bits/stl_algobase.h:340:18: note: synthesized method ‘cv::KeyPoint& cv::KeyPoint::operator=(const cv::KeyPoint&)’ first required here
*__result = *__first;
^
src/caffe/CMakeFiles/caffe.dir/build.make:31203: recipe for target ‘src/caffe/CMakeFiles/caffe.dir/layers/window_data_layer.cpp.o’ failed
make[3]: *** [src/caffe/CMakeFiles/caffe.dir/layers/window_data_layer.cpp.o] Error 1
CMakeFiles/Makefile2:304: recipe for target ‘src/caffe/CMakeFiles/caffe.dir/all’ failed
make[2]: *** [src/caffe/CMakeFiles/caffe.dir/all] Error 2
make[2]: *** Waiting for unfinished jobs….
[ 5%] Linking CXX static library ../../lib/libgtest.a
[ 5%] Built target gtest
CMakeFiles/Makefile2:367: recipe for target ‘src/caffe/test/CMakeFiles/runtest.dir/rule’ failed
make[1]: *** [src/caffe/test/CMakeFiles/runtest.dir/rule] Error 2
Makefile:253: recipe for target ‘runtest’ failed
make: *** [runtest] Error 2
You appear to be trying to compile against OpenCV 4. The script and installer expect OpenCV 3.
Hello. I got an error message during installation. Please advise.
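A quick pre-flight check could catch this before a long build. The sketch below assumes OpenCV is registered with pkg-config under the package name `opencv` or `opencv4`, which depends on how it was installed. The function names are hypothetical and not part of installCaffe.sh.

```shell
#!/usr/bin/env bash
# Extract the major version from a version string such as "3.4.1".
opencv_major() {
  echo "${1%%.*}"
}

# Ask pkg-config for the installed OpenCV version. The package names
# "opencv" and "opencv4" are assumptions about the local installation.
detect_opencv_version() {
  pkg-config --modversion opencv 2>/dev/null ||
    pkg-config --modversion opencv4 2>/dev/null
}

# Return 0 if the given version is older than 4, 1 if it is 4 or newer,
# and 2 if no version string was supplied.
check_opencv() {
  local version="$1"
  if [ -z "$version" ]; then
    echo "OpenCV not found via pkg-config"
    return 2
  fi
  if [ "$(opencv_major "$version")" -ge 4 ]; then
    echo "OpenCV $version found: expect the build errors shown above"
    return 1
  fi
  echo "OpenCV $version found: OK for this install script"
}
```

Running `check_opencv "$(detect_opencv_version)"` before `./installCaffe.sh` would then report which case applies.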
Thank you in advance.
In file included from /usr/include/c++/5/algorithm:61:0,
from /usr/local/include/opencv4/opencv2/core/base.hpp:55,
from /usr/local/include/opencv4/opencv2/core.hpp:54,
from /usr/local/include/opencv4/opencv2/core/types_c.h:124,
from /usr/local/include/opencv4/opencv2/core/core_c.h:48,
from /usr/local/include/opencv4/opencv2/highgui/highgui_c.h:45,
from /home/nvidia/caffe/src/caffe/layers/window_data_layer.cpp:2:
/usr/include/c++/5/bits/stl_algobase.h:340:18: note: synthesized method ‘cv::KeyPoint& cv::KeyPoint::operator=(const cv::KeyPoint&)’ first required here
*__result = *__first;
^
src/caffe/CMakeFiles/caffe.dir/build.make:31203: recipe for target ‘src/caffe/CMakeFiles/caffe.dir/layers/window_data_layer.cpp.o’ failed
make[3]: *** [src/caffe/CMakeFiles/caffe.dir/layers/window_data_layer.cpp.o] Error 1
CMakeFiles/Makefile2:304: recipe for target ‘src/caffe/CMakeFiles/caffe.dir/all’ failed
make[2]: *** [src/caffe/CMakeFiles/caffe.dir/all] Error 2
make[2]: *** Waiting for unfinished jobs….
[ 5%] Linking CXX static library ../../lib/libgtest.a
[ 5%] Built target gtest
CMakeFiles/Makefile2:367: recipe for target ‘src/caffe/test/CMakeFiles/runtest.dir/rule’ failed
make[1]: *** [src/caffe/test/CMakeFiles/runtest.dir/rule] Error 2
Makefile:253: recipe for target ‘runtest’ failed
make: *** [runtest] Error 2