2017/12/20

Real-Time Image Detection and Classification with TensorFlow on a NAS (Ubuntu 16.04)

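With the arrival of freakishly powerful tools such as TensorFlow and CNNs in recent years, image detection and classification have become far easier than with traditional methods (and more brainless?). GPU acceleration plays a decisive role: in the environment used in this post, the FPS on the GPU and on the CPU differ by as much as 7x. On top of that, a NAS has plenty of storage, which makes it a good place to keep training data; if the computation can also run directly on it, that saves a good deal of time and money.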

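This post uses a QNAP NAS TS-1685 running Ubuntu 16.04 in Linux Station, paired with an NVIDIA GeForce GTX 1060, and runs darkflow with a pre-trained YOLO model for real-time image detection. The result is shown in the video below: the webcam feed is read in and analyzed, and a person, a pair of scissors, and a remote control appear one after another, then all three appear together.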



Getting started

Entering the Ubuntu shell

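I prefer to do everything in a terminal, without any graphical interface, but getting from the NAS into the Ubuntu terminal takes a little extra work. After SSHing into the NAS, run the following command: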

[~] # lxc-attach -n ubuntu_1604 -P `getcfg ubuntu-hd Install_Path -f /etc/config/qpkg.conf`/lxc -- sudo -u admin -i
To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.

admin@ubuntu_1604:~$

You should now see the Ubuntu 16.04 prompt, logged in as admin. From here on, $ stands in for the full prompt.

Update the system
$ sudo apt update
$ sudo apt upgrade

Install the Python environment

$ sudo apt install -y git
$ git clone https://github.com/pyenv/pyenv.git ~/.pyenv
$ echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bashrc
$ echo 'export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bashrc
$ echo 'eval "$(pyenv init -)"' >> ~/.bashrc

Restart the shell so that the new ~/.bashrc settings take effect.

$ git clone https://github.com/pyenv/pyenv-virtualenv.git $(pyenv root)/plugins/pyenv-virtualenv
$ echo 'eval "$(pyenv virtualenv-init -)"' >> ~/.bashrc

Restart the shell again so that pyenv-virtualenv is loaded.

Next, install Python 3.6.3 and make the darkflow virtualenv the default:
$ sudo apt-get install -y libbz2-dev libreadline-dev libssl-dev libsqlite3-dev
$ pyenv install 3.6.3
$ pyenv shell 3.6.3
$ pyenv virtualenv darkflow
$ pyenv global darkflow

After restarting the shell once more, the prompt looks like this:

(darkflow) admin@ubuntu_1604:~$

Install darkflow

$ git clone https://github.com/thtrieu/darkflow
$ cd darkflow
$ pip install Cython numpy opencv-python tensorflow
$ pip install -e .

Build OpenCV

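In a normal Python environment there is no need to rebuild OpenCV, but we are using a special build of Python, and it fails when loading cv2.so without any clear error message. Just do this step and don't ask too many questions.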

$ mkdir ~/opencv
$ curl -L https://github.com/opencv/opencv/archive/3.3.1.zip > ~/opencv/3.3.1.zip
$ cd ~/opencv/
$ unzip 3.3.1.zip
$ cd opencv*
$ mkdir build
$ cd build
$ sudo apt install -y cmake libgtk2.0-dev
$ export PREFIX_MAIN=`pyenv virtualenv-prefix` PREFIX=`pyenv prefix`
$ cmake -D CMAKE_BUILD_TYPE=RELEASE \
-D CMAKE_INSTALL_PREFIX="$PREFIX" \
-D PYTHON3_LIBRARY="$PREFIX_MAIN"/lib/libpython*m.a \
-D PYTHON3_INCLUDE_DIRS="$PREFIX_MAIN"/include/python*m \
-D PYTHON3_EXECUTABLE="$PREFIX"/bin/python3 \
-D PYTHON3_PACKAGES_PATH="$PREFIX"/lib/python*/site-packages/ \
-D PYTHON3_NUMPY_INCLUDE_DIRS="$PREFIX"/lib/python3.6/site-packages/numpy/core/include \
-D BUILD_opencv_python3=ON \
..
$ make -j`grep processor /proc/cpuinfo| wc -l`
$ make install
$ cp lib/python3/cv2.cpython-*-linux-gnu.so $PREFIX/lib/python*/site-packages/cv2/
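
The `make -j` line above counts processors by grepping /proc/cpuinfo. As a minimal sketch, the same number can be obtained with `nproc` from GNU coreutils, falling back to the grep approach when `nproc` is not available. The function name `build_jobs` is made up for this example and is not part of OpenCV or pyenv.

```shell
# build_jobs: print the number of parallel jobs to pass to make -j.
# Hypothetical helper for illustration only.
build_jobs() {
    if command -v nproc >/dev/null 2>&1; then
        nproc
    else
        # Same idea as the original command: count "processor" entries.
        grep -c '^processor' /proc/cpuinfo
    fi
}

echo "would run: make -j$(build_jobs)"
```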

Test on the CPU

Download the pre-trained weights

$ cd ~/darkflow
$ mkdir bin
$ curl https://pjreddie.com/media/files/yolo.weights > ~/darkflow/bin/yolo.weights

Run the test
$ flow --model cfg/yolo.cfg --load bin/yolo.weights --demo camera --saveVideo

Building TensorFlow from source

TensorFlow is rebuilt so that it can use the CPU instruction sets this NAS supports (SSE4) and the GPU. First, make sure the NVIDIA driver is installed in QTS.


Enter the Ubuntu shell and install the same NVIDIA driver version as on the NAS
$ sudo add-apt-repository ppa:graphics-drivers/ppa
$ sudo apt update
$ sudo apt install -y nvidia-381

Install CUDA

First download the CUDA installer (the .run file) from the linked download page.

During installation, be sure to choose not to install the NVIDIA driver. The whole session looks like this:
$ sudo sh cuda_*_linux.run

>>>
Do you accept the previously read EULA?
accept/decline/quit: accept

Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 375.26?
(y)es/(n)o/(q)uit: n

Install the CUDA 8.0 Toolkit?
(y)es/(n)o/(q)uit: y

Enter Toolkit Location
[ default is /usr/local/cuda-8.0 ]:

Do you want to install a symbolic link at /usr/local/cuda?
(y)es/(n)o/(q)uit: y

Install the CUDA 8.0 Samples?
(y)es/(n)o/(q)uit: y

Enter CUDA Samples Location
[ default is /home/admin ]:

Add the CUDA binaries to PATH

$ echo 'export PATH=/usr/local/cuda-8.0/bin:$PATH' >> ~/.bashrc
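
Several steps in this post append a line to ~/.bashrc with `echo ... >>`, so running a step twice leaves duplicate lines. The following is a small sketch of an append that skips lines already present. The helper name `append_once` and the scratch file are assumptions for this example; point it at ~/.bashrc only once you are happy with it.

```shell
# append_once LINE FILE: append LINE to FILE unless an identical line exists.
# Hypothetical helper for illustration only.
append_once() {
    line=$1
    file=$2
    touch "$file"
    # -q quiet, -x whole-line match, -F fixed string (no regex)
    if ! grep -qxF -e "$line" "$file"; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Demonstrated on a scratch file so that ~/.bashrc stays untouched.
demo_rc=$(mktemp)
append_once 'export PATH=/usr/local/cuda-8.0/bin:$PATH' "$demo_rc"
append_once 'export PATH=/usr/local/cuda-8.0/bin:$PATH' "$demo_rc"
# The second call is a no-op, so the file still holds a single line.
cat "$demo_rc"
```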

Install cuDNN

Download cuDNN first (registration required): https://developer.nvidia.com/rdp/cudnn-download

$ sudo apt install -y libcupti-dev openjdk-8-jdk
$ mkdir cudnn
$ tar xvf cudnn*.tgz -C cudnn/
$ cd cudnn
$ sudo cp cuda/include/cudnn.h /usr/local/cuda/include
$ sudo cp cuda/lib64/libcudnn* /usr/local/cuda/lib64
$ sudo chmod a+r /usr/local/cuda/include/cudnn.h /usr/local/cuda/lib64/libcudnn*
$ echo /usr/local/cuda-8.0/lib64/ | sudo tee /etc/ld.so.conf.d/cudnn.conf
$ sudo ldconfig
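
The TensorFlow configure step later asks for the cuDNN version, so it helps to read it back from the header that was just copied. This is a rough sketch, assuming the header defines CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL on separate `#define` lines, as cudnn.h does for the cuDNN 7 release used here. The function name `cudnn_version` is made up for this example.

```shell
# cudnn_version HEADER: print MAJOR.MINOR.PATCHLEVEL parsed from a cudnn.h file.
# Hypothetical helper; relies on the three #define lines described above.
cudnn_version() {
    awk '
        $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
        $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
        $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
        END { print major "." minor "." patch }
    ' "$1"
}

header=/usr/local/cuda/include/cudnn.h
if [ -r "$header" ]; then
    cudnn_version "$header"
else
    echo "cudnn.h not found at $header" >&2
fi
```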

Install Bazel

$ echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
$ curl https://bazel.build/bazel-release.pub.gpg | sudo apt-key add -
$ sudo apt-get update && sudo apt-get install bazel

Download TensorFlow

$ git clone https://github.com/tensorflow/tensorflow
$ cd tensorflow
$ git checkout -b r1.4 origin/r1.4

Configure the TensorFlow build

There are many Y/N questions to answer. Pay particular attention to the CUDA question: answer Y, because the default is N.

$ ./configure

Extracting Bazel installation...
You have bazel 0.7.0 installed.
Please specify the location of python. [Default is /home/admin/.pyenv/versions/darkflow/bin/python]:


Found possible Python library paths:
 /home/admin/.pyenv/versions/darkflow/lib/python3.6/site-packages
Please input the desired Python library path to use.  Default is [/home/admin/.pyenv/versions/darkflow/lib/python3.6/site-packages]

Do you wish to build TensorFlow with jemalloc as malloc support? [Y/n]:
jemalloc as malloc support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Google Cloud Platform support? [Y/n]:
Google Cloud Platform support will be enabled for TensorFlow.

Do you wish to build TensorFlow with Hadoop File System support? [Y/n]:
Hadoop File System support will be enabled for TensorFlow.


Do you wish to build TensorFlow with Amazon S3 File System support? [Y/n]: n
No Amazon S3 File System support will be enabled for TensorFlow.

Do you wish to build TensorFlow with XLA JIT support? [y/N]: n
No XLA JIT support will be enabled for TensorFlow.

Do you wish to build TensorFlow with GDR support? [y/N]: n
No GDR support will be enabled for TensorFlow.

Do you wish to build TensorFlow with VERBS support? [y/N]: n
No VERBS support will be enabled for TensorFlow.

Do you wish to build TensorFlow with OpenCL SYCL support? [y/N]: n
No OpenCL SYCL support will be enabled for TensorFlow.

Do you wish to build TensorFlow with CUDA support? [y/N]: y
CUDA support will be enabled for TensorFlow.

Please specify the CUDA SDK version you want to use, e.g. 7.0. [Leave empty to default to CUDA 8.0]:


Please specify the location where CUDA 8.0 toolkit is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:


Please specify the cuDNN version you want to use. [Leave empty to default to cuDNN 6.0]: 7.0.4


Please specify the location where cuDNN 7.0.4 library is installed. Refer to README.md for more details. [Default is /usr/local/cuda]:


Please specify a list of comma-separated Cuda compute capabilities you want to build with.
You can find the compute capability of your device at: https://developer.nvidia.com/cuda-gpus.
Please note that each additional compute capability significantly increases your build time and binary size. [Default is: 3.5,5.2]


Do you want to use clang as CUDA compiler? [y/N]:
nvcc will be used as CUDA compiler.

Please specify which gcc should be used by nvcc as the host compiler. [Default is /usr/bin/gcc]:


Do you wish to build TensorFlow with MPI support? [y/N]:
No MPI support will be enabled for TensorFlow.

Please specify optimization flags to use during compilation when bazel option "--config=opt" is specified [Default is -march=native]:


Add "--config=mkl" to your bazel command to build with MKL support.
Please note that MKL on MacOS or windows is still not supported.
If you would like to use a local MKL instead of downloading, please set the environment variable "TF_MKL_ROOT" every time before build.
Configuration finished

Compile TensorFlow

This takes a long time...
$ bazel build --config=cuda -c opt --copt=-mavx --copt=-mavx2 --copt=-mfma --copt=-mfpmath=both --copt=-msse4.2 //tensorflow/tools/pip_package:build_pip_package
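
The build command above turns on AVX, AVX2, FMA and SSE4.2 explicitly, and a binary built with an instruction set the CPU lacks will typically crash with an illegal instruction. As a hedged sketch, the flags can be checked against /proc/cpuinfo before starting the long build. The helper name `cpu_has` is an assumption for this example.

```shell
# cpu_has FLAG [CPUINFO]: succeed if FLAG appears in the first "flags" line.
# Hypothetical helper; CPUINFO defaults to /proc/cpuinfo.
cpu_has() {
    grep -m1 '^flags' "${2:-/proc/cpuinfo}" | tr ' \t' '\n\n' | grep -qx -e "$1"
}

# /proc/cpuinfo spells SSE4.2 as sse4_2.
for flag in avx avx2 fma sse4_2; do
    if cpu_has "$flag"; then
        echo "$flag: supported"
    else
        echo "$flag: NOT supported, drop the matching --copt"
    fi
done
```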

Build and install the Python module

$ ./bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
$ pip uninstall tensorflow
$ pip install /tmp/tensorflow_pkg/tensorflow-1.4.*-linux_x86_64.whl

Test on the GPU

$ flow --model cfg/yolo.cfg --load bin/yolo.weights --demo camera --saveVideo --gpu 0.5 --threshold 0.5

See the video at the top of this post.

Note: on the first test run, CUDA init failed. On closer inspection, /dev/nvidia-uvm was missing inside the container even though it existed on the host. Restarting Ubuntu fixed it.
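
A quick way to catch this before launching flow is to check for the device nodes inside the container. The sketch below is only an illustration: the helper name `missing_nodes` is made up, and the list of nodes is an assumption (the node named in the note plus the usual NVIDIA control node and the node for GPU 0).

```shell
# missing_nodes PATH...: print each given path that does not exist.
# Hypothetical helper for illustration only.
missing_nodes() {
    for node in "$@"; do
        [ -e "$node" ] || echo "$node"
    done
}

# Assumed device list; adjust it to match the host.
missing=$(missing_nodes /dev/nvidiactl /dev/nvidia0 /dev/nvidia-uvm)
if [ -n "$missing" ]; then
    echo "missing inside the container: $missing" >&2
    echo "restarting the Ubuntu container may help" >&2
fi
```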

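If you want to experiment, adjust the --gpu and --threshold parameters above. If the GPU runs out of memory (OOM), lower the --gpu value or restart Ubuntu; if detection is too inaccurate, adjust --threshold. There are other tunable parameters as well; see the project's official site.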

Afterword

I always end up writing these things up in the middle of the night. So, so tired...