Build TensorFlow on NVIDIA Jetson TX2 Development Kit

In this article, we will build and install TensorFlow v1.3.0 on the Jetson TX2 running L4T 28.1 from source. Looky here:

Background

TensorFlow is one of the major deep learning systems. Created at Google, it is an open-source software library for machine intelligence. The Jetson TX2 ships with TensorRT. TensorRT is what is called an “Inference Engine”, the idea being that large machine learning systems can train models which are then transferred over and “run” on the Jetson.

Note: We built TensorFlow back in April, 2017 for the Jetson TX2 running L4T 27.1 (JetPack 3.0). This article is the update to build TensorFlow for L4T 28.1 (JetPack 3.1).

Some people would like to use the entire TensorFlow system on a Jetson. In this article, we’ll go over the steps to build TensorFlow v1.3.0 on the Jetson TX2 from source. This should take about two hours to build.

Note: Please read through this article before starting the installation. This is not a simple installation; you may want to tailor it to your needs.

Preparation

This article assumes that JetPack 3.1 is used to flash the Jetson TX2. At a minimum, install:

  • L4T 28.1, an Ubuntu 16.04 64-bit variant (aarch64)
  • CUDA 8.0
  • cuDNN 6.0.21

TensorFlow will use CUDA and cuDNN in this build.

It may be helpful to enable all of the CPU cores for the build:

$ sudo nvpmodel -m 0
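
To confirm which power mode is active before starting the build, nvpmodel can also report the current setting:

$ sudo nvpmodel -q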

There is a repository on the JetsonHacks account on GitHub named installTensorFlowTX2. Clone the repository and switch over to that directory:

$ git clone https://github.com/jetsonhacks/installTensorFlowTX2
$ cd installTensorFlowTX2
$ git checkout vL4T28.1TF1.3V3

Prerequisites

There is a convenience script which will install the required prerequisites such as Java and Bazel. The script also patches the source files appropriately for ARM 64.

Before installing TensorFlow, a swap file should be created (a minimum of 8GB is recommended); the Jetson TX2 does not have enough physical memory to compile TensorFlow on its own. The swap file may be located on the internal eMMC and may be removed after the build. Note that placing the swap file on a SATA drive, if available, will be faster.

If you do not already have one available on the Jetson, there is a convenience script for building a swap file. For example, to build an 8GB swap file on the eMMC in the home directory:

$ ./createSwapfile.sh -d ~/ -s 8
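
If you prefer not to use the script, a swap file can also be created by hand with standard Linux tools; the following is a minimal sketch for an 8GB file in the home directory (it does not cover the script's AUTOMOUNT option):

$ fallocate -l 8G ~/swapfile
$ chmod 600 ~/swapfile
$ mkswap ~/swapfile
$ sudo swapon ~/swapfile

Running free -m afterwards should show the additional swap space.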

After TensorFlow has finished building, the swap file is no longer needed and may be removed (See below).

Scripts in this repository will build TensorFlow with either Python 2.7 support or Python 3.5 support.

For Python 2.7

$ ./installPrerequisites.sh

In the video, installation of the prerequisites takes a little over 30 minutes, but this will depend on your internet connection speed.
Next, clone the TensorFlow repository and patch it for ARM 64 operation:

$ ./cloneTensorFlow.sh

Then set up the TensorFlow environment variables. This is a semi-automated way to run the TensorFlow configure script. You should look through this script and change it according to your needs. Note that most of the library locations are configured in this script; the library locations are determined by the JetPack installation.

$ ./setTensorFlowEV.sh
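
For a rough idea of what this configuration covers, the TensorFlow configure script is driven by environment variables along these lines (the values here reflect a default JetPack 3.1 install and the Python 2.7 path; treat them as an illustration, not a drop-in replacement for the script):

TF_NEED_CUDA=1
TF_CUDA_VERSION=8.0
TF_CUDA_COMPUTE_CAPABILITIES=6.2
TF_CUDNN_VERSION=6.0.21
CUDA_TOOLKIT_PATH=/usr/local/cuda
CUDNN_INSTALL_PATH=/usr/lib/aarch64-linux-gnu
PYTHON_BIN_PATH=/usr/bin/python
TF_NEED_OPENCL=0

TF_CUDA_COMPUTE_CAPABILITIES is 6.2 because that is the compute capability of the TX2's integrated GPU.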

Continue to Build TensorFlow and Install below.

For Python 3.5

$ ./installPrerequisitesPy3.sh

As with Python 2.7, installation of the prerequisites takes a little over 30 minutes in the video, but this will depend on your internet connection speed.
Next, clone the TensorFlow repository and patch it for ARM 64 operation:

$ ./cloneTensorFlow.sh

Then set up the TensorFlow environment variables for Python 3.5. As above, this is a semi-automated way to run the TensorFlow configure script; look through it and change it according to your needs, keeping in mind that the library locations it configures are determined by the JetPack installation.

$ ./setTensorFlowEVPy3.sh

Build TensorFlow and Install

We’re now ready to build TensorFlow:

$ ./buildTensorFlow.sh
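
Under the hood the script drives a Bazel build of TensorFlow's pip-package target; the invocation is roughly of this form (the exact options are set in the script):

$ bazel build -c opt --config=cuda //tensorflow/tools/pip_package:build_pip_package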

This should take less than two hours. After TensorFlow is finished building, we package it into a ‘wheel’ file:

$ ./packageTensorFlow.sh

The wheel file will be placed in the $HOME directory.
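
The filename encodes the TensorFlow version and the Python version it was built against; if you are unsure what the file is called, list the wheels in the home directory:

$ ls $HOME/*.whl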

Install wheel file

For Python 2.X

$ pip install $HOME/<wheel file>

For Python 3.X

$ pip3 install $HOME/<wheel file>

Validation

You can go through the procedure on the TensorFlow installation page: TensorFlow: Validate your installation

Validate your TensorFlow installation by doing the following:

  • Start a Terminal.
  • Change directory (cd) to any directory on your system other than the tensorflow subdirectory from which you invoked the configure command.
  • Invoke python or python3 as appropriate; for Python 3.x, for example:

$ python3

Enter the following short program inside the python interactive shell:

>>> import tensorflow as tf
>>> hello = tf.constant('Hello, TensorFlow!')
>>> sess = tf.Session()
>>> print(sess.run(hello))

If the Python program outputs the following, then the installation is successful and you can begin writing TensorFlow programs.

Hello, TensorFlow!
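
For a little more assurance that the CUDA-enabled build is actually using the GPU, one option (using the standard TensorFlow 1.x session API) is to turn on device placement logging and check that operations are assigned to a GPU device:

>>> import tensorflow as tf
>>> # With log_device_placement enabled, the session reports which device
>>> # each operation runs on; the MatMul should land on the GPU.
>>> a = tf.constant([[1.0, 2.0], [3.0, 4.0]])
>>> b = tf.constant([[1.0, 1.0], [0.0, 1.0]])
>>> sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
>>> print(sess.run(tf.matmul(a, b)))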

Conclusion

So there you have it. Building TensorFlow is quite a demanding task, but hopefully some of these scripts may make the job a little bit simpler.

Notes

  • The install in the video was performed directly after flashing the Jetson TX2 with JetPack 3.1.
  • The install is lengthy; however, it should take much less than 4 hours once all the files are downloaded. If it takes that long, something is wrong.
  • TensorFlow 1.3.0 is installed.
  • GitHub recently upgraded their operating system and regenerated the checksums for some of their archives. The TensorFlow project relies on those older checksums in some of its code, which can lead to dependency files not being downloaded. Here a patch, applied after TensorFlow is cloned, updates the file workspace.bzl to ignore the old checksums (see the sketch after this list), but there may be other instances of this issue as time goes on. Beware.
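
As a point of reference, the dependency entries in workspace.bzl are Bazel archive rules pinned to specific sha256 checksums; the patch adjusts entries of roughly this shape (the name, URL, and hash below are placeholders, not values from the actual file):

  native.new_http_archive(
      name = "some_dependency",
      urls = ["https://github.com/example/some_dependency/archive/v1.0.tar.gz"],
      sha256 = "0000000000000000000000000000000000000000000000000000000000000000",
      strip_prefix = "some_dependency-1.0",
  )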

Removing the Swap File

If you created a swap file for this installation, you may wish to remove it after building TensorFlow. There are several ways to do this. First, if you did not use the AUTOMOUNT option for the swap file, you may reboot the machine and then delete the file either through the Terminal or through the GUI.

If you wish to delete the swap file without rebooting the machine, turn the swap off and then remove the file. For example, for a swap file located in the home directory:

$ sudo swapoff ~/swapfile
$ rm ~/swapfile

If you used the AUTOMOUNT option, you will probably also need to remove the swap file entry from /etc/fstab.
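
The entry to remove typically looks something like the following line (the exact path and options depend on how the swap file was set up; this is an illustrative example rather than what the script necessarily writes):

/home/nvidia/swapfile none swap sw 0 0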

Comments

  1. Hi, I tried to follow your instructions on my TX2 with R28.1 and Python 2.7.
    When I ran "./setTensorFlowEV.sh", it showed the following error. Please help!!

    nvidia@tegra-ubuntu:~/installTensorFlowTX2$ ./setTensorFlowEV.sh
    mkdir: cannot create directory ‘/usr/lib/aarch64-linux-gnu/include/’: File exists
    You have bazel 0.5.2- installed.
    Found possible Python library paths:
    /opt/ros/kinetic/lib/python2.7/dist-packages
    /usr/local/lib/python2.7/dist-packages
    /usr/lib/python2.7/dist-packages
    Please input the desired Python library path to use. Default is [/opt/ros/kinetic/lib/python2.7/dist-packages]
    /usr/local/lib/python2.7/dist-packages
    ./configure: line 669: /usr/local/cuda/extras/demo_suite/deviceQuery: No such file or directory
    Configuration finished
    nvidia@tegra-ubuntu:~/installTensorFlowTX2$ python --version
    Python 2.7.12
    nvidia@tegra-ubuntu:~/installTensorFlowTX2$

  2. I have a Jetson TX2 board and installed TensorFlow 1.3 with GPU support, following the same steps as the tutorial. Everything seems OK when I run the code, but the problem is the execution time.
    In fact, I tested TensorFlow 0.8 with CUDA 7.5 and cuDNN 4 (Maxwell Quadro K620) and the execution time was 0.075 s; when I test the same code with CUDA 8 and cuDNN 5, the execution time is 0.75 s.
    I tested the same code with TensorFlow 1.3, CUDA 8, cuDNN 6, and JetPack 3.1 (Jetson TX2), and the execution time was 0.6 s. When I tried to capture a timeline with chrome://tracing and added "from tensorflow.python.client import timeline" (which uses libcupti), the execution time became 0.3 s. The code does run on the GPU, because when I run it on the CPU the execution time becomes 0.9 s.

    The code does face detection with a CNN.
    I don't understand why the code is faster with CUDA 7.5 and cuDNN 4 than with CUDA 8 and cuDNN 5 or 6. Any help please?

    • Thanks for the reply. I tested the Quadro K620 with two configurations, CUDA 7.5 (cuDNN 4, execution time 0.075 s) and CUDA 8 (cuDNN 5, execution time 0.75 s, about 10x slower), and found that the first configuration has the lower execution time. Then I tested the Jetson TX2 with CUDA 8 and cuDNN 6 and found that the execution time is 0.3 to 0.5 s. I wonder why CUDA 7.5 with cuDNN 4 is faster? I think I have an error in my configuration.

  3. When I run ./buildTensorFlow.sh, something goes wrong:
    WARNING: /home/nvidia/tensorflow/tensorflow/contrib/learn/BUILD:15:1: in py_library rule //tensorflow/contrib/learn:learn: target ‘//tensorflow/contrib/learn:learn’ depends on deprecated target ‘//tensorflow/contrib/session_bundle:exporter’: No longer supported. Switch to SavedModel immediately.
    WARNING: /home/nvidia/tensorflow/tensorflow/contrib/learn/BUILD:15:1: in py_library rule //tensorflow/contrib/learn:learn: target ‘//tensorflow/contrib/learn:learn’ depends on deprecated target ‘//tensorflow/contrib/session_bundle:gc’: No longer supported. Switch to SavedModel immediately.
    INFO: Found 1 target…
    ERROR: /home/nvidia/tensorflow/tensorflow/core/kernels/BUILD:2183:1: C++ compilation of rule ‘//tensorflow/core/kernels:svd_op’ failed: crosstool_wrapper_driver_is_not_gcc failed: error executing command
    (cd /home/nvidia/.cache/bazel/_bazel_nvidia/d2751a49dacf4cb14a513ec663770624/execroot/org_tensorflow && \
    exec env – \
    CUDA_TOOLKIT_PATH=/usr/local/cuda \
    CUDNN_INSTALL_PATH=/usr/lib/aarch64-linux-gnu \
    GCC_HOST_COMPILER_PATH=/usr/bin/gcc \
    LD_LIBRARY_PATH=/opt/ros/kinetic/lib:/usr/local/cuda-8.0/lib64: \
    PATH=/opt/ros/kinetic/bin:/usr/local/cuda-8.0/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin \
    PWD=/proc/self/cwd \
    PYTHON_BIN_PATH=/usr/bin/python3 \
    PYTHON_LIB_PATH=/usr/lib/python3/dist-packages \
    TF_CUDA_CLANG=0 \
    TF_CUDA_COMPUTE_CAPABILITIES=6.2 \
    TF_CUDA_VERSION=8.0 \
    TF_CUDNN_VERSION=6.0.21 \
    TF_NEED_CUDA=1 \
    TF_NEED_OPENCL=0 \
    external/local_config_cuda/crosstool/clang/bin/crosstool_wrapper_driver_is_not_gcc -U_FORTIFY_SOURCE ‘-D_FORTIFY_SOURCE=1’ -fstack-protector -fPIE -Wall -Wunused-but-set-parameter -Wno-free-nonheap-object -fno-omit-frame-pointer -g0 -O2 -DNDEBUG -ffunction-sections -fdata-sections ‘-std=c++11’ -MD -MF bazel-out/local_linux-py3-opt/bin/tensorflow/core/kernels/_objs/svd_op/tensorflow/core/kernels/svd_op_complex64.pic.d ‘-frandom-seed=bazel-out/local_linux-py3-opt/bin/tensorflow/core/kernels/_objs/svd_op/tensorflow/core/kernels/svd_op_complex64.pic.o’ -fPIC -DEIGEN_MPL2_ONLY -DSNAPPY -iquote . -iquote bazel-out/local_linux-py3-opt/genfiles -iquote external/bazel_tools -iquote bazel-out/local_linux-py3-opt/genfiles/external/bazel_tools -iquote external/protobuf -iquote bazel-out/local_linux-py3-opt/genfiles/external/protobuf -iquote external/eigen_archive -iquote bazel-out/local_linux-py3-opt/genfiles/external/eigen_archive -iquote external/local_config_sycl -iquote bazel-out/local_linux-py3-opt/genfiles/external/local_config_sycl -iquote external/gif_archive -iquote bazel-out/local_linux-py3-opt/genfiles/external/gif_archive -iquote external/jpeg -iquote bazel-out/local_linux-py3-opt/genfiles/external/jpeg -iquote external/com_googlesource_code_re2 -iquote bazel-out/local_linux-py3-opt/genfiles/external/com_googlesource_code_re2 -iquote external/farmhash_archive -iquote bazel-out/local_linux-py3-opt/genfiles/external/farmhash_archive -iquote external/fft2d -iquote bazel-out/local_linux-py3-opt/genfiles/external/fft2d -iquote external/highwayhash -iquote bazel-out/local_linux-py3-opt/genfiles/external/highwayhash -iquote external/png_archive -iquote bazel-out/local_linux-py3-opt/genfiles/external/png_archive -iquote external/zlib_archive -iquote bazel-out/local_linux-py3-opt/genfiles/external/zlib_archive -iquote external/snappy -iquote bazel-out/local_linux-py3-opt/genfiles/external/snappy -iquote external/local_config_cuda -iquote bazel-out/local_linux-py3-opt/genfiles/external/local_config_cuda -isystem external/bazel_tools/tools/cpp/gcc3 -isystem external/protobuf/src -isystem bazel-out/local_linux-py3-opt/genfiles/external/protobuf/src -isystem external/eigen_archive -isystem bazel-out/local_linux-py3-opt/genfiles/external/eigen_archive -isystem external/gif_archive/lib -isystem bazel-out/local_linux-py3-opt/genfiles/external/gif_archive/lib -isystem external/farmhash_archive/src -isystem bazel-out/local_linux-py3-opt/genfiles/external/farmhash_archive/src -isystem external/png_archive -isystem bazel-out/local_linux-py3-opt/genfiles/external/png_archive -isystem external/zlib_archive -isystem bazel-out/local_linux-py3-opt/genfiles/external/zlib_archive -isystem external/local_config_cuda/cuda -isystem bazel-out/local_linux-py3-opt/genfiles/external/local_config_cuda/cuda -isystem external/local_config_cuda/cuda/cuda/include -isystem bazel-out/local_linux-py3-opt/genfiles/external/local_config_cuda/cuda/cuda/include -DEIGEN_AVOID_STL_ARRAY -Iexternal/gemmlowp -Wno-sign-compare -fno-exceptions ‘-DGOOGLE_CUDA=1’ -pthread ‘-DGOOGLE_CUDA=1’ -no-canonical-prefixes -Wno-builtin-macro-redefined ‘-D__DATE__=”redacted”‘ ‘-D__TIMESTAMP__=”redacted”‘ ‘-D__TIME__=”redacted”‘ -fno-canonical-system-headers -c tensorflow/core/kernels/svd_op_complex64.cc -o bazel-out/local_linux-py3-opt/bin/tensorflow/core/kernels/_objs/svd_op/tensorflow/core/kernels/svd_op_complex64.pic.o): com.google.devtools.build.lib.shell.BadExitStatusException: Process exited with status 1.
    In file included from external/eigen_archive/Eigen/Core:498:0,
    from external/eigen_archive/Eigen/QR:11,
    from external/eigen_archive/Eigen/SVD:11,
    from ./third_party/eigen3/Eigen/SVD:1,
    from ./tensorflow/core/kernels/svd_op_impl.h:23,
    from tensorflow/core/kernels/svd_op_complex64.cc:16:
    external/eigen_archive/Eigen/src/Core/products/GeneralMatrixVector.h: In static member function ‘static void Eigen::internal::general_matrix_vector_product::run(Index, Index, const LhsMapper&, const RhsMapper&, Eigen::internal::general_matrix_vector_product::ResScalar*, Index, RhsScalar) [with Index = long int; LhsScalar = std::complex; LhsMapper = Eigen::internal::const_blas_data_mapper<std::complex, long int, 0>; bool ConjugateLhs = false; RhsScalar = std::complex; RhsMapper = Eigen::internal::const_blas_data_mapper<std::complex, long int, 1>; bool ConjugateRhs = false; int Version = 0; Eigen::internal::general_matrix_vector_product::ResScalar = std::complex]’:
    external/eigen_archive/Eigen/src/Core/products/GeneralMatrixVector.h:156:49: internal compiler error: Segmentation fault
    RhsPacket b0 = pset1(rhs(j,0));
    ^
    Please submit a full bug report,
    with preprocessed source if appropriate.
    See for instructions.
    Target //tensorflow/tools/pip_package:build_pip_package failed to build
    INFO: Elapsed time: 377.503s, Critical Path: 259.42s

    • I don't know if you fixed this issue. I initially got the same (or a very similar) error about crosstool_wrapper_driver_is_not_gcc.
      Then I realized the following:
      1) The directory for CUDA 8.0 was /usr/local/cuda-8.0 and not /usr/local/cuda, as initialized in the default configuration files above.
      2) My default cuDNN files were not in the CUDA directories.

      I did a bazel clean, pointed the configuration files to the correct CUDA path, put the cuDNN files where they needed to be (following the directions from NVIDIA's cuDNN installation instructions), and rebuilt the .whl file for Python 3 successfully.

      Thank you everyone at jetsonhacks for making this a smooth experience!
