Deep Learning Machine Setup: Ubuntu17.04+Nvidia GTX 1080+CUDA 9.0+cuDNN 7.0+TensorFlow 1.3

Last year, I got a deep learning machine with a GTX 1080 and wrote an article about configuring the deep learning environment: Dive Into TensorFlow, Part III: GTX 1080+Ubuntu16.04+CUDA8.0+cuDNN5.0+TensorFlow. Recently, I ran into some problems with the deep learning server and reinstalled it with Ubuntu 17.04, CUDA 9, cuDNN 7 and TensorFlow 1.3. The following is a step-by-step record, just for reference.

1. Install the Nvidia Driver

First, install the driver for the Nvidia GTX 1080. I followed the article How to install Nvidia Drivers on Ubuntu 17.04 & below, Linux Mint and installed the 381.09 driver:

sudo apt-get purge 'nvidia*'
sudo add-apt-repository ppa:graphics-drivers/ppa
sudo apt-get update && sudo apt-get install nvidia-381 nvidia-settings

I’m not sure this step is necessary, because installing CUDA installs a newer driver version anyway.


2. Install CUDA

On the official Nvidia CUDA website, you can log in and download CUDA 9.0: CUDA Toolkit 9.0 Release Candidate Downloads

After downloading the deb file, install CUDA 9 by following the official guide:

sudo dpkg -i cuda-repo-ubuntu1704-9-0-local-rc_9.0.103-1_amd64.deb
sudo apt-key add /var/cuda-repo-9-0-local-rc/7fa2af80.pub
sudo apt-get update
sudo apt-get install cuda

The CUDA 9 apt-get installation also installed the NVIDIA 384.69 driver. Restart the deep learning machine and test the driver with the 'nvidia-smi' command.
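As a rough sketch, the same driver check can be scripted. The snippet below assumes nvidia-smi is on the PATH and relies on its --query-gpu and --format=csv,noheader options; the helper names parse_gpu_query and query_gpus are made up for this illustration and are not part of any NVIDIA tool.

```python
import subprocess


def parse_gpu_query(text):
    """Parse 'name, driver_version' CSV lines into (name, driver) tuples."""
    gpus = []
    for line in text.strip().splitlines():
        if not line.strip():
            continue
        # Split on the last comma so a comma inside the GPU name cannot break parsing.
        name, driver = line.rsplit(",", 1)
        gpus.append((name.strip(), driver.strip()))
    return gpus


def query_gpus():
    """Run nvidia-smi and parse its output (needs a working NVIDIA driver)."""
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"]
    )
    return parse_gpu_query(out.decode("utf-8"))
```

Calling query_gpus() after the reboot should return one tuple per GPU; if nvidia-smi is missing or the driver is not loaded, the call raises an exception instead, which is itself a useful signal.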

Let’s test some CUDA samples:

cp -r /usr/local/cuda-9.0/samples/ .
cd samples/

Run a few of the samples (each one has to be built with make first):

textminer@textminer:~/cuda_sample/samples/3_Imaging/convolutionSeparable$ ./convolutionSeparable

[./convolutionSeparable] - Starting...
GPU Device 0: "GeForce GTX 1080" with compute capability 6.1

Image Width x Height = 3072 x 3072

Allocating and initializing host arrays...
Allocating and initializing CUDA arrays...
Running GPU convolution (16 identical iterations)...

convolutionSeparable, Throughput = 14344.9500 MPixels/sec, Time = 0.00066 s, Size = 9437184 Pixels, NumDevsUsed = 1, Workgroup = 0

Reading back GPU results...

Checking the results...
...running convolutionRowCPU()
...running convolutionColumnCPU()
...comparing the results
...Relative L2 norm: 0.000000E+00

Shutting down...
Test passed

textminer@textminer:~/cuda_sample/samples/4_Finance/MonteCarloMultiGPU$ ./MonteCarloMultiGPU
./MonteCarloMultiGPU Starting...

Using single CPU thread for multiple GPUs
Parallelization method = streamed
Problem scaling = weak
Number of GPUs = 1
Total number of options = 8192
Number of paths = 262144
main(): generating input data...
main(): starting 1 host threads...
main(): GPU statistics, streamed
GPU Device #0: GeForce GTX 1080
Options : 8192
Simulation paths: 262144

Total time (ms.): 26.597000
Note: This is elapsed time for all to compute.
Options per sec.: 308004.660766
main(): comparing Monte Carlo and Black-Scholes results...
Shutting down...
Test Summary...
L1 norm : 4.825160E-04
Average reserve: 11.741779

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

Test passed

Finally, set the CUDA paths in ~/.bashrc:

export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
export CUDA_HOME=/usr/local/cuda

Then run 'source ~/.bashrc' to apply the changes to the current shell.
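The ${PATH:+:${PATH}} form in the exports above adds a colon and the old value only when the variable is already set and non-empty. That avoids a dangling ':' at the end, which can be treated as an empty entry meaning the current directory. The following sketch mimics that expansion in Python purely for illustration; prepend_dir is a hypothetical helper, not something the shell or CUDA provides.

```python
def prepend_dir(new_dir, existing):
    """Mimic the shell expansion  new_dir${VAR:+:${VAR}}.

    If the variable is unset or empty, the result is just new_dir;
    otherwise new_dir is placed in front, separated by a colon.
    """
    if existing:
        return new_dir + ":" + existing
    return new_dir


# LD_LIBRARY_PATH is often unset on a fresh install: no stray ':' is added.
print(prepend_dir("/usr/local/cuda/lib64", ""))
# PATH normally has a value: the CUDA directory goes first.
print(prepend_dir("/usr/local/cuda/bin", "/usr/local/bin:/usr/bin"))
```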

3. Install cuDNN

Installing cuDNN is very simple, but you still need to download it from the official NVIDIA website: https://developer.nvidia.com/cudnn. Here is NVIDIA's summary of cuDNN 7:

What’s New in cuDNN 7?
Deep learning frameworks using cuDNN 7 can leverage new features and performance of the Volta architecture to deliver up to 3x faster training performance compared to Pascal GPUs. cuDNN 7 is now available as a free download to the members of the NVIDIA Developer Program. Highlights include:

Up to 2.5x faster training of ResNet50 and 3x faster training of NMT language translation LSTM RNNs on Tesla V100 vs. Tesla P100
Accelerated convolutions using mixed-precision Tensor Cores operations on Volta GPUs
Grouped Convolutions for models such as ResNeXt and Xception and CTC (Connectionist Temporal Classification) loss layer for temporal classification

I selected this version: cuDNN v7.0 (August 3, 2017), for CUDA 9.0 RC — cuDNN v7.0 Library for Linux

After downloading cuDNN 7, extract it and copy the relevant files into the CUDA install directory:

tar -zxvf cudnn-9.0-linux-x64-v7.tgz
sudo cp cuda/include/cudnn.h /usr/local/cuda/include/
sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64/
sudo chmod a+r /usr/local/cuda/include/cudnn.h
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*
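One way to confirm that the copied header is the expected release is to read the version macros from cudnn.h. This is only a sketch: it assumes the header defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL (as the cuDNN 7 header does) and that it lives at the path used above. The function names are invented for this example.

```python
import re


def parse_cudnn_version(header_text):
    """Return (major, minor, patch) from the text of cudnn.h, or None."""
    parts = []
    for key in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(r"^\s*#define\s+%s\s+(\d+)" % key, header_text, re.MULTILINE)
        if match is None:
            return None
        parts.append(int(match.group(1)))
    return tuple(parts)


def installed_cudnn_version(path="/usr/local/cuda/include/cudnn.h"):
    """Read the header copied in the step above and parse its version."""
    with open(path) as handle:
        return parse_cudnn_version(handle.read())
```

On a machine set up as described, installed_cudnn_version() should report a major version of 7; a None result would suggest the header is missing the macros or is not the file that was just copied.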

4. Install TensorFlow 1.3

Before installing TensorFlow, first install libcupti-dev, as described in the official TensorFlow installation document:

The libcupti-dev library, which is the NVIDIA CUDA Profile Tools Interface. This library provides advanced profiling support. To install this library, issue the following command:

$ sudo apt-get install libcupti-dev

Then install the TensorFlow 1.3 GPU version with virtualenv. Note that this installation is under Python 2.7:

sudo apt-get install python-pip python-dev python-virtualenv
virtualenv --system-site-packages tensorflow1.3
source tensorflow1.3/bin/activate
(tensorflow1.3) textminer@textminer:~/tensorflow/tensorflow1.3$ pip install --upgrade tensorflow-gpu

Collecting tensorflow-gpu
Installing collected packages: backports.weakref, protobuf, funcsigs, pbr, mock, numpy, markdown, html5lib, bleach, werkzeug, tensorflow-tensorboard, tensorflow-gpu
Successfully installed backports.weakref-1.0rc1 bleach-1.5.0 funcsigs-1.0.2 html5lib-0.9999999 markdown-2.6.9 mock-2.0.0 numpy-1.13.1 pbr-3.1.1 protobuf-3.4.0 tensorflow-gpu-1.3.0 tensorflow-tensorboard-0.1.5 werkzeug-0.12.2

When everything works, this pip-based route is always my first choice for installing the tensorflow-gpu version. Test it by importing TensorFlow:

(tensorflow1.3) textminer@textminer:~/tensorflow/tensorflow1.3$ python
Python 2.7.13 (default, Jan 19 2017, 14:48:08)
[GCC 6.3.0 20170118] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import tensorflow as tf

But I got an error:

File "/home/textminer/tensorflow/tensorflow1.3/local/lib/python2.7/site-packages/tensorflow/python/pywrap_tensorflow_internal.py", line 24, in swig_import_helper
_mod = imp.load_module('_pywrap_tensorflow_internal', fp, pathname, description)
ImportError: libcusolver.so.8.0: cannot open shared object file: No such file or directory

Failed to load the native TensorFlow runtime.

See https://www.tensorflow.org/install/install_sources#common_installation_problems

I found a libcusolver.so.9.0 under /usr/local/cuda/lib64, but no libcusolver.so.8.0. After googling some related questions, I concluded that the official TensorFlow release does not support CUDA 9 yet, only CUDA 8, so the official pip package looks for libcusolver.so.8.0 rather than the 9.0 version.
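The mismatch can be made visible by listing which versions of a CUDA library are actually present. The sketch below assumes the libraries follow the libNAME.so.X.Y naming seen above and live in /usr/local/cuda/lib64; library_versions and installed_versions are illustrative helpers, not part of CUDA or TensorFlow.

```python
import glob
import os
import re


def library_versions(stem, filenames):
    """Return the 'X.Y' versions found for names like libcusolver.so.X.Y."""
    pattern = re.compile(r"^%s\.so\.(\d+\.\d+)$" % re.escape(stem))
    found = set()
    for name in filenames:
        match = pattern.match(os.path.basename(name))
        if match:
            found.add(match.group(1))
    return sorted(found)


def installed_versions(stem, lib_dir="/usr/local/cuda/lib64"):
    """List the versions of a CUDA library present in lib_dir."""
    return library_versions(stem, glob.glob(os.path.join(lib_dir, stem + ".so.*")))
```

With only CUDA 9.0 installed, installed_versions("libcusolver") would be expected to contain "9.0" but not "8.0", which is consistent with the ImportError shown above.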

There is always a way out. Although material about CUDA 9, cuDNN 7 and TensorFlow is still scarce, I found two GitHub issues via Google: "Upgrade to CuDNN 7 and CUDA 9" and "CUDA 9RC + cuDNN7". The first is a request for the official TensorFlow release to support CUDA 9 and cuDNN 7: "Please upgrade TensorFlow to support CUDA 9 and CuDNN 7. Nvidia claims this will provide a 2x performance boost on Pascal GPUs." The second issue gives an unofficial tutorial for installing TensorFlow with CUDA 9 and cuDNN 7: "This is unofficial and very not supported patch to make it possible to compile TensorFlow with CUDA9RC and cuDNN 7 or CUDA8 + cuDNN 7."

5. Install TensorFlow via Source

The GitHub issue was posted about ten days ago, and if you follow it step by step, installing TensorFlow with CUDA 9 and cuDNN 7 goes almost without problems:

git clone https://github.com/tensorflow/tensorflow.git
wget https://storage.googleapis.com/tf-performance/public/cuda9rc_patch/0001-CUDA-9.0-and-cuDNN-7.0-support.patch
wget https://storage.googleapis.com/tf-performance/public/cuda9rc_patch/eigen.f3a22f35b044.cuda9.diff
cd tensorflow/
git status
git checkout db596594b5653b43fcb558a4753b39904bb62cbd~
git apply ../0001-CUDA-9.0-and-cuDNN-7.0-support.patch
./configure
bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

But I ran into a problem with bazel after running configure:

ERROR: Skipping '//tensorflow/tools/pip_package:build_pip_package': error loading package 'tensorflow/tools/pip_package': Encountered error while reading extension file 'cuda/build_defs.bzl': no such package '@local_config_cuda//cuda

After googling, I found the cause: I was using the newest bazel version, 0.5.4. Reverting bazel to 0.5.2 solved the problem. The following is the configure process, just for reference:

Please specify the location of python. [Default is /usr/bin/python]:
Found possible Python library paths:
Please input the desired Python library path to use. Default is /usr/local/lib/python2.7/dist-packages
Do you wish to build TensorFlow with jemalloc as malloc support? [Y/n]: Y
jemalloc as malloc support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Google Cloud Platform support? [y/N]: N
No Google Cloud Platform support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Hadoop File System support? [y/N]: N
No Hadoop File System support will be enabled for TensorFlow.

Do you wish to build TensorFlow with XLA JIT support? [y/N]:
No XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with VERBS support? [y/N]:
No VERBS support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL support? [y/N]:
No OpenCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.

Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 8.0]: 9.0
Please specify the location where CUDA 9.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 6.0]: 7
Please specify the location where cuDNN 7 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:
Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 6.1]
Do you want to use clang as CUDA compiler? [y/N]: N
nvcc will be used as CUDA compiler.

Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:
Do you wish to build TensorFlow with MPI support? [y/N]:
No MPI support will be enabled for TensorFlow.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:
Add "--config=mkl" to your bazel command to build with MKL support.
Please note that MKL on MacOS or windows is still not supported.
If you would like to use a local MKL instead of downloading, please set the environment variable "TF_MKL_ROOT" every time before build.
Configuration finished
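Since the bazel version mattered for this build, it can help to check it before running configure. The sketch below assumes the output of 'bazel version' contains a line of the form 'Build label: X.Y.Z'; the helper names and the idea of pinning to exactly 0.5.2 are assumptions based only on what worked in this write-up.

```python
import re


def parse_bazel_label(version_output):
    """Extract the version tuple from the 'Build label: X.Y.Z' line."""
    match = re.search(r"^Build label:\s*(\d+)\.(\d+)\.(\d+)", version_output, re.MULTILINE)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def is_known_good(version, known_good=(0, 5, 2)):
    """True if the version equals the one that worked in this write-up."""
    return version == known_good
```

The text to parse could be obtained with subprocess.check_output(["bazel", "version"]); other bazel versions may work as well, this check only flags a difference from the setup recorded here.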

Even if there are no problems with bazel and configure, you will still hit a problem during the bazel build:

bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

But as the issue describes, an Eigen patch is provided for this:

Attempt to build TensorFlow, so that Eigen is downloaded. This build will fail if building for CUDA9RC but will succeed for CUDA8
bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

Apply the Eigen patch:

cd -P bazel-out/../../../external/eigen_archive
patch -p1 < ~/Downloads/eigen.f3a22f35b044.cuda9.diff

Build TensorFlow successfully:

cd -
bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package

This time TensorFlow builds successfully, and finally we can create a tensorflow-gpu pip package:

bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
ls /tmp/tensorflow_pkg/
sudo pip install /tmp/tensorflow_pkg/tensorflow-1.3.0rc1-cp27-cp27mu-linux_x86_64.whl
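The name of the generated wheel encodes which interpreter it was built for: cp27 is CPython 2.7 and cp27mu is the matching ABI tag, so pip will only install it into a compatible Python. As a small illustration, the filename can be split into its parts following the wheel naming convention from PEP 427. parse_wheel_filename is a hypothetical helper written for this post, not a pip API.

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its PEP 427 components."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel: %r" % filename)
    parts = filename[:-len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build_tag = None
    elif len(parts) == 6:
        # The optional build tag sits between the version and the python tag.
        name, version, build_tag, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError("unexpected wheel filename: %r" % filename)
    return {
        "name": name,
        "version": version,
        "build": build_tag,
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


info = parse_wheel_filename("tensorflow-1.3.0rc1-cp27-cp27mu-linux_x86_64.whl")
print(info["python"], info["abi"], info["platform"])
```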

We can test TensorFlow in IPython. The following example is from Andrew Ng's Deep Learning course:

Python 2.7.13 (default, Jan 19 2017, 14:48:08) 
Type "copyright", "credits" or "license" for more information.
IPython 5.1.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.
In [1]: import numpy as np
In [2]: import tensorflow as tf
In [3]: coefficients = np.array([[1.], [-10.], [25]])
In [4]: w = tf.Variable(0, dtype=tf.float32)
In [5]: x = tf.placeholder(tf.float32, [3, 1])
In [6]: cost = x[0][0]*w**2 + x[1][0]*w + x[2][0]
In [7]: train = tf.train.GradientDescentOptimizer(0.01).minimize(cost)
In [8]: init = tf.global_variables_initializer()
In [9]: session = tf.Session()
2017-09-04 13:27:34.813855: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties: 
name: GeForce GTX 1080
major: 6 minor: 1 memoryClockRate (GHz) 1.835
pciBusID 0000:01:00.0
Total memory: 7.92GiB
Free memory: 235.69MiB
2017-09-04 13:27:34.813887: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0 
2017-09-04 13:27:34.813892: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0:   Y 
2017-09-04 13:27:34.813899: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1080, pci bus id: 0000:01:00.0)
In [10]: session.run(init)
In [11]: print(session.run(w))
In [12]: for i in range(1000):
    ...:     session.run(train, feed_dict={x:coefficients})
In [13]: print(session.run(w))
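The session above minimizes the cost w**2 - 10*w + 25 = (w - 5)**2, so w should start at 0 and end up close to 5 after the 1000 training steps. For comparison, here is a minimal sketch of the same gradient descent in plain Python, without TensorFlow. It uses a hand-written derivative instead of TensorFlow's automatic differentiation, and the function names are made up for this illustration.

```python
def cost_gradient(w, coefficients):
    """Derivative of a*w**2 + b*w + c with respect to w."""
    a, b, _c = coefficients
    return 2.0 * a * w + b


def minimize(coefficients, learning_rate=0.01, steps=1000, w=0.0):
    """Plain gradient descent: w <- w - learning_rate * d(cost)/dw."""
    for _ in range(steps):
        w -= learning_rate * cost_gradient(w, coefficients)
    return w


w_final = minimize([1.0, -10.0, 25.0])
print(w_final)  # close to 5, the minimum of (w - 5)**2
```

The learning rate, step count and coefficients match the TensorFlow session, so the two results can be compared directly.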

Just enjoy it.

Reference (Chinese version): 深度学习服务器环境配置 (Deep Learning Server Environment Configuration): Ubuntu17.04+Nvidia GTX 1080+CUDA 9.0+cuDNN 7.0+TensorFlow 1.3

Posted by TextMiner


Deep Learning Machine Setup: Ubuntu17.04+Nvidia GTX 1080+CUDA 9.0+cuDNN 7.0+TensorFlow 1.3 — 4 Comments

  1. I’m considering moving to this from Ubuntu 16.04 / CUDA 8.0 / CuDNN 6.0 (on a 1080 Ti + Skylake CPU). Do you have any feel for the performance benefit I’d get?