
[Deep Learning Series] Image Classification with PaddlePaddle and TensorFlow

2018.01.09

Author: Charlotte77, a data-mining worker from the math department
Blog: http://www.cnblogs.com/charlotte77/
WeChat official account: Charlotte Data Mining (ID: CharlotteDataMining)

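Previous posts:

[Deep Learning Series] Handwritten Digit Recognition with PaddlePaddle

[Deep Learning Series] Convolutional Neural Networks (CNN) Explained (1): Basic Principles

[Deep Learning Series] Data Preprocessing with PaddlePaddle

[Deep Learning Series] Convolutional Neural Networks Explained (2): Writing a CNN by Hand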

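Last month I published four articles covering the "hello world" of deep learning, MNIST image recognition, together with a detailed look at convolutional neural networks: the basic principles, writing a CNN by hand, and a walk through the PaddlePaddle source code. This post shows how to do image classification with PaddlePaddle and TensorFlow. All the programs are on my GitHub and can be downloaded and trained freely.

Convolutional neural networks have five classic models: LeNet-5, AlexNet, GoogLeNet, VGG and ResNet. This post first designs a small CNN to classify images, then looks at how the LeNet-5 architecture handles the same task, implements LeNet-5 with the popular TensorFlow framework and with Baidu's PaddlePaddle, and compares the results.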

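What is image classification

Image classification distinguishes images of different categories according to their semantic content. It is a fundamental problem in computer vision and the basis of higher-level vision tasks such as object detection, image segmentation, object tracking and behavior analysis. It is widely used in many fields, including face recognition and intelligent video analysis in security, traffic scene recognition in transportation, content-based image retrieval and automatic photo album organization on the internet, and image recognition in medicine (quoted from the official PaddlePaddle site).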

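The cifar-10 dataset

CIFAR-10 classification is a common benchmark in machine learning. The dataset consists of 60,000 32×32 RGB color images in 10 classes, with 50,000 images in the training set and 10,000 in the test set. The task is to classify each 32×32 RGB image into one of 10 classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship and truck. For more information see the CIFAR-10 page and Alex Krizhevsky's tech report. Other common datasets are cifar-100, with 100 object classes, and the ILSVRC competition data, with 1000 classes.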
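To make the data layout concrete, here is a minimal sketch of turning CIFAR-10 style rows into image arrays. It assumes the layout of the Python version of the dataset, where each image is a flat row of 3072 bytes (1024 red, then 1024 green, then 1024 blue values, each in row-major order). The function name `rows_to_images` and the synthetic batch are made up for illustration; no real dataset file is read.

```python
import numpy as np

def rows_to_images(rows):
    """Convert flat CIFAR-10 style rows (N, 3072) into images (N, 32, 32, 3)."""
    rows = np.asarray(rows, dtype=np.uint8)
    # Each row is laid out channel by channel: R plane, G plane, B plane.
    chw = rows.reshape(-1, 3, 32, 32)
    # Move the channel axis last: NCHW -> NHWC, the layout most image tools expect.
    return chw.transpose(0, 2, 3, 1)

# Synthetic stand-in for a real batch: two images, the first one pure red.
fake_batch = np.zeros((2, 3072), dtype=np.uint8)
fake_batch[0, :1024] = 255

images = rows_to_images(fake_batch)
print(images.shape)
print(images[0, 0, 0].tolist())
```

The flattened length 3072 is the same `datadim = 3 * 32 * 32` that the training scripts pass to the input layer.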

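Designing our own CNN

Now that we know the basic structure of a CNN, we first design a simple CNN of our own to classify the cifar-10 data.

Network structure

The network has two convolution + pooling blocks (5×5 kernels with 20 and 50 filters, each followed by 2×2 pooling) and a 512-unit fully connected layer.

Code

1. Network structure: simple_cnn.py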

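#coding:utf-8
'''
Created by huxiaoman 2017.11.27
simple_cnn.py: a simple CNN architecture of my own design
'''

import os
from PIL import Image
import numpy as np
import paddle.v2 as paddle
from paddle.trainer_config_helpers import *

# Use the GPU only when WITH_GPU is set to something other than '0' (e.g. WITH_GPU=1)
with_gpu = os.getenv('WITH_GPU', '0') != '0'

def simple_cnn(img):
    # Block 1: 5x5 convolution with 20 filters on the 3-channel input, then 2x2 pooling
    conv_pool_1 = paddle.networks.simple_img_conv_pool(
        input=img,
        filter_size=5,
        num_filters=20,
        num_channel=3,
        pool_size=2,
        pool_stride=2,
        act=paddle.activation.Relu())
    # Block 2: 5x5 convolution with 50 filters, then 2x2 pooling
    conv_pool_2 = paddle.networks.simple_img_conv_pool(
        input=conv_pool_1,
        filter_size=5,
        num_filters=50,
        num_channel=20,
        pool_size=2,
        pool_stride=2,
        act=paddle.activation.Relu())
    # 512-unit fully connected layer. Softmax on a hidden layer is unusual
    # (ReLU is the more common choice); it is kept here as in the original script.
    fc = paddle.layer.fc(
        input=conv_pool_2, size=512, act=paddle.activation.Softmax())
    # Return the layer so that the training script can stack the classifier on top
    return fc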
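The feature-map sizes produced by this configuration can be checked by hand. The sketch below is plain Python, not PaddlePaddle code; it assumes stride-1 convolutions without padding and non-overlapping 2×2 pooling, and the helper names are mine.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output width/height of a convolution along one dimension."""
    return (size + 2 * padding - kernel) // stride + 1

def pool_out(size, pool, stride):
    """Output width/height of a pooling layer along one dimension."""
    return (size - pool) // stride + 1

def simple_cnn_shapes(size=32):
    """Return (layer name, channels, height, width) for each conv/pool layer."""
    shapes = []
    for name, filters in (("conv_pool_1", 20), ("conv_pool_2", 50)):
        size = conv_out(size, kernel=5)
        shapes.append((name + " conv", filters, size, size))
        size = pool_out(size, pool=2, stride=2)
        shapes.append((name + " pool", filters, size, size))
    return shapes

for name, c, h, w in simple_cnn_shapes():
    print("%-17s c=%d h=%d w=%d size=%d" % (name, c, h, w, c * h * w))
```

For a 32×32 input this gives 28×28, 14×14, 10×10 and 5×5 feature maps, which is what PaddlePaddle reports in the training log when the network is built.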

2. Training program: train_simple_cnn.py

#coding:utf-8
'''
Created by huxiaoman 2017.11.27
train_simple_cnn.py: train simple_cnn to classify the cifar10 dataset
'''
import sys, os

import paddle.v2 as paddle
from simple_cnn import simple_cnn

# Train on the GPU only when WITH_GPU is set to something other than '0' (e.g. WITH_GPU=1)
with_gpu = os.getenv('WITH_GPU', '0') != '0'

def main():
    datadim = 3 * 32 * 32
    classdim = 10

    # PaddlePaddle init
    paddle.init(use_gpu=with_gpu, trainer_count=7)

    image = paddle.layer.data(
        name='image', type=paddle.data_type.dense_vector(datadim))

    # Add neural network config: the simple CNN defined in simple_cnn.py
    net = simple_cnn(image)

    # Classifier: one fully connected softmax layer with 10 outputs
    out = paddle.layer.fc(
        input=net, size=classdim, act=paddle.activation.Softmax())

    lbl = paddle.layer.data(
        name='label', type=paddle.data_type.integer_value(classdim))
    cost = paddle.layer.classification_cost(input=out, label=lbl)

    # Create parameters
    parameters = paddle.parameters.create(cost)

    # Create optimizer
    momentum_optimizer = paddle.optimizer.Momentum(
        momentum=0.9,
        regularization=paddle.optimizer.L2Regularization(rate=0.0002 * 128),
        learning_rate=0.1 / 128.0,
        learning_rate_decay_a=0.1,
        learning_rate_decay_b=50000 * 100,
        learning_rate_schedule='discexp')

    # End batch and end pass event handler
    def event_handler(event):
        if isinstance(event, paddle.event.EndIteration):
            if event.batch_id % 100 == 0:
                print('\nPass %d, Batch %d, Cost %f, %s' % (
                    event.pass_id, event.batch_id, event.cost, event.metrics))
            else:
                sys.stdout.write('.')
                sys.stdout.flush()
        if isinstance(event, paddle.event.EndPass):
            # save parameters
            with open('params_pass_%d.tar' % event.pass_id, 'wb') as f:
                parameters.to_tar(f)

            # evaluate on the test set at the end of every pass
            result = trainer.test(
                reader=paddle.batch(
                    paddle.dataset.cifar.test10(), batch_size=128),
                feeding={'image': 0,
                         'label': 1})
            print('\nTest with Pass %d, %s' % (event.pass_id, result.metrics))

    # Create trainer
    trainer = paddle.trainer.SGD(
        cost=cost, parameters=parameters, update_equation=momentum_optimizer)

    # Save the inference topology to protobuf.
    inference_topology = paddle.topology.Topology(layers=out)
    with open('inference_topology.pkl', 'wb') as f:
        inference_topology.serialize_for_inference(f)

    trainer.train(
        reader=paddle.batch(
            paddle.reader.shuffle(
                paddle.dataset.cifar.train10(), buf_size=50000),
            batch_size=128),
        num_passes=200,
        event_handler=event_handler,
        feeding={'image': 0,
                 'label': 1})

    # inference
    from PIL import Image
    import numpy as np

    def load_image(file):
        im = Image.open(file)
        im = im.resize((32, 32), Image.ANTIALIAS)
        im = np.array(im).astype(np.float32)
        # The storage order of the loaded image is H(height),
        # W(width), C(channel). PaddlePaddle requires
        # the CHW order, so transpose them.
        im = im.transpose((2, 0, 1))  # CHW
        # In the training phase, the channel order of CIFAR
        # image is B(Blue), G(green), R(Red). But PIL open
        # image in RGB mode. It must swap the channel order.
        im = im[(2, 1, 0), :, :]  # BGR
        im = im.flatten()
        im = im / 255.0
        return im

    test_data = []
    cur_dir = os.path.dirname(os.path.realpath(__file__))
    test_data.append((load_image(cur_dir + '/image/dog.png'), ))

    # users can remove the comments and change the model name
    # with open('params_pass_50.tar', 'r') as f:
    #     parameters = paddle.parameters.Parameters.from_tar(f)

    probs = paddle.infer(
        output_layer=out, parameters=parameters, input=test_data)
    lab = np.argsort(-probs)  # probs and lab are the results of one batch data
    print('Label of image/dog.png is: %d' % lab[0][0])

if __name__ == '__main__':
    main()
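The optimizer settings in the script decide how the learning rate changes over the 200 passes. The sketch below assumes that the 'discexp' schedule is a discrete exponential decay, lr = learning_rate * decay_a ^ floor(samples_seen / decay_b); that reading of the schedule is my assumption and worth checking against the PaddlePaddle documentation. The function `discexp_lr` is illustrative and not part of PaddlePaddle.

```python
def discexp_lr(base_lr, decay_a, decay_b, samples_seen):
    """Discrete exponential decay: multiply by decay_a once per decay_b samples."""
    return base_lr * decay_a ** (samples_seen // decay_b)

BASE_LR = 0.1 / 128.0      # learning_rate in the training script
DECAY_A = 0.1              # learning_rate_decay_a
DECAY_B = 50000 * 100      # learning_rate_decay_b: 100 passes over 50,000 images
TRAIN_IMAGES = 50000       # size of the cifar-10 training set

for passes_done in (0, 99, 100, 199):
    samples = passes_done * TRAIN_IMAGES
    lr = discexp_lr(BASE_LR, DECAY_A, DECAY_B, samples)
    print("after %3d passes: lr = %.8f" % (passes_done, lr))
```

Under that assumption the learning rate stays at 0.1/128 for the first 100 passes and is divided by 10 for the remaining 100.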

3. Output

I1128 21:44:30.218085 14733 Util.cpp:166] commandline: --use_gpu=True --trainer_count=7
[INFO 2017-11-28 21:44:35,874 layers.py:2539] output for __conv_pool_0___conv: c = 20, h = 28, w = 28, size = 15680
[INFO 2017-11-28 21:44:35,874 layers.py:2667] output for __conv_pool_0___pool: c = 20, h = 14, w = 14, size = 3920
[INFO 2017-11-28 21:44:35,875 layers.py:2539] output for __conv_pool_1___conv: c = 50, h = 10, w = 10, size = 5000
[INFO 2017-11-28 21:44:35,876 layers.py:2667] output for __conv_pool_1___pool: c = 50, h = 5, w = 5, size = 1250
I1128 21:44:35.881502 14733 MultiGradientMachine.cpp:99] numLogicalDevices=1 numThreads=7 numDevices=8
I1128 21:44:35.928449 14733 GradientMachine.cpp:85] Initing parameters..
I1128 21:44:36.056259 14733 GradientMachine.cpp:92] Init parameters done.

Pass 0, Batch 0, Cost 2.302628, {'classification_error_evaluator': 0.9296875}
................................................................................
......
Pass 199, Batch 200, Cost 0.869726, {'classification_error_evaluator': 0.3671875}
...................................................................................................
Pass 199, Batch 300, Cost 0.801396, {'classification_error_evaluator': 0.3046875}
..........................................................................................I1128 23:21:39.443141 14733 MultiGradientMachine.cpp:99] numLogicalDevices=1 numThreads=7 numDevices=8

Test with Pass 199, {'classification_error_evaluator': 0.5248000025749207}
Label of image/dog.png is: 9

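I used 7 threads and 8 Tesla K80 GPUs, with batch_size = 128 and 200 passes. Training took 1h37min and the classification error on the test set was 0.5248. That result is, hmm, not great, but we can use it as a baseline and tune it later.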

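The LeNet-5 architecture

The LeNet-5 architecture was proposed by Yann LeCun in the paper "Gradient-based learning applied to document recognition", which validates it on MNIST handwritten digits as input data (32 * 32). Let's look at the network structure.

LeNet-5 has 8 layers in total: 1 input layer + 3 convolution layers (C1, C3, C5) + 2 subsampling layers (S2, S4) + 1 fully connected layer (F6) + 1 output layer. Each layer has multiple feature maps (groups of automatically extracted features).

Input layer

The cifar10 dataset, with each image of size 32 * 32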

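C1 convolution layer

6 feature maps, kernel size 5 * 5, feature map size: 28 * 28
Parameters per convolution neuron: 5 * 5 = 25 weights plus one bias (counting a single input channel, as in the paper)
Number of connections: (5*5+1) * 6 * (28*28) = 122,304
Parameter sharing: parameters are shared within each feature map, so there are (5*5+1)*6 = 156 parameters in total

S2 subsampling layer (pooling layer)

6 feature maps of size 14*14, pooling size 2 * 2
Each unit is connected to one non-overlapping 2*2 sliding window in the previous layer's feature map, so each S2 feature map is 1/4 the size of a C1 feature map
Number of connections: (2*2+1)*1*14*14*6 = 5880
Parameter sharing: parameters are shared within each feature map, giving 2 * 6 = 12 trainable parameters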

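C3 convolution layer

This layer is slightly more complex: S2 and C3 neurons are in a many-to-many relationship. The simplest scheme fully connects all S2 feature maps to all C3 feature maps (one can also connect only a sampled subset of S2 feature maps to a given C3 feature map). With full connection, the 6 S2 feature maps are convolved with 6 independent 5×5 kernels to produce one C3 feature map (each output feature map has one bias). C3 has 16 feature maps, so the layer has (5×5×6+1)×16 = 2416 parameters to learn. Each C3 feature map is 10×10 for a 32×32 input, so the number of connections is 2416×10×10 = 241,600.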

S4 subsampling layer

Same as S2. With max pooling or mean pooling the layer has 0 parameters to learn. Each S4 feature map is 5×5, so the number of connections is (2×2+1)×16×5×5 = 2000.

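C5 convolution layer

Similar to C3: all S4 feature maps are fully connected to all C5 feature maps. Here the 16 S4 feature maps are convolved with 16 independent 1×1 kernels to produce one C5 feature map (each output feature map has one bias). C5 has 120 feature maps, so the layer has (1×1×16+1)×120 = 2040 parameters to learn. With 1×1 kernels each C5 feature map keeps the 5×5 size of S4, so the number of connections is 2040×5×5 = 51,000. (The original paper uses 5×5 kernels in C5, which reduces each feature map to 1×1.)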
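The parameter counts quoted for the convolution layers all come from one formula: (kernel × kernel × input maps + 1 bias) × output maps. The sketch below recomputes them in plain Python. The helper name `conv_params` is mine, and the last line, the count for a 3-channel RGB input such as cifar-10, is an addition for comparison rather than a figure from the walk-through.

```python
def conv_params(kernel, in_maps, out_maps):
    """Trainable parameters of a fully connected convolution layer, one bias per output map."""
    return (kernel * kernel * in_maps + 1) * out_maps

c1 = conv_params(kernel=5, in_maps=1, out_maps=6)      # single-channel input, as in the paper
c3 = conv_params(kernel=5, in_maps=6, out_maps=16)     # every S2 map feeds every C3 map
c5 = conv_params(kernel=1, in_maps=16, out_maps=120)   # 1x1 kernels, as used in this post

# Every output position of a feature map reuses the same parameters,
# so connections = parameters * positions per feature map.
c1_connections = c1 * 28 * 28

print("C1 params:", c1)
print("C3 params:", c3)
print("C5 params:", c5)
print("C1 connections:", c1_connections)
print("C1 params with RGB input:", conv_params(kernel=5, in_maps=3, out_maps=6))
```

Note how weight sharing keeps the parameter count small: C1 has over 120,000 connections but only 156 parameters.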

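F6 fully connected layer

Flattening the C5 layer gives 5×5×120 = 3000 nodes, which feed a fully connected layer. Counting the bias, the number of parameters (and connections) to learn is (3000+1)×84 = 252,084.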

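Output layer

This is a 10-class problem, so there are 10 output units, normalized into probabilities with softmax. Each class's output unit is connected to 84 inputs.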
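As a small illustration of the softmax normalization used in the output layer, here is a plain Python sketch. The ten scores are made-up values and the function is not taken from PaddlePaddle or TensorFlow.

```python
import math

def softmax(scores):
    """Turn a list of raw scores into probabilities that sum to 1."""
    largest = max(scores)
    # Subtracting the maximum first avoids overflow in exp() without changing the result.
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical raw outputs of the 10 output units for one image.
scores = [2.0, 1.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
probs = softmax(scores)

predicted_class = probs.index(max(probs))
print("probabilities sum to %.6f" % sum(probs))
print("predicted class index:", predicted_class)
```

The predicted label is simply the index of the largest probability, which is what the `np.argsort(-probs)` line in the training scripts picks out.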

LeNet-5 in PaddlePaddle

1. Network structure: lenet.py

#coding:utf-8
'''
Created by huxiaoman 2017.11.27
lenet.py:LeNet-5
'''

import os
from PIL import Image
import numpy as np
import paddle.v2 as paddle
from paddle.trainer_config_helpers import *

# Use the GPU only when WITH_GPU is set to something other than '0' (e.g. WITH_GPU=1)
with_gpu = os.getenv('WITH_GPU', '0') != '0'

def lenet(img):
    # C1 + S2: 6 feature maps from 5x5 kernels, then 2x2 pooling
    conv_pool_1 = paddle.networks.simple_img_conv_pool(
        input=img,
        filter_size=5,
        num_filters=6,
        num_channel=3,
        pool_size=2,
        pool_stride=2,
        act=paddle.activation.Relu())
    # C3 + S4: 16 feature maps from 5x5 kernels, then 2x2 pooling
    conv_pool_2 = paddle.networks.simple_img_conv_pool(
        input=conv_pool_1,
        filter_size=5,
        num_filters=16,
        pool_size=2,
        pool_stride=2,
        act=paddle.activation.Relu())
    # C5: 120 feature maps from 1x1 kernels
    conv_3 = img_conv_layer(
        input=conv_pool_2,
        filter_size=1,
        num_filters=120,
        stride=1)
    # F6: fully connected layer with 84 units
    fc = paddle.layer.fc(
        input=conv_3, size=84, act=paddle.activation.Sigmoid())
    return fc

2. Training code: train_lenet.py

#coding:utf-8
'''
Created by huxiaoman 2017.11.27
train_lenet.py: train LeNet-5 to classify the cifar10 dataset
'''

import sys, os

import paddle.v2 as paddle
from lenet import lenet

# Train on the GPU only when WITH_GPU is set to something other than '0' (e.g. WITH_GPU=1)
with_gpu = os.getenv('WITH_GPU', '0') != '0'

def main():
    datadim = 3 * 32 * 32
    classdim = 10

    # PaddlePaddle init
    paddle.init(use_gpu=with_gpu, trainer_count=7)

    image = paddle.layer.data(
        name='image', type=paddle.data_type.dense_vector(datadim))

    # Add neural network config: LeNet-5 as defined in lenet.py
    net = lenet(image)

    # Output layer: fully connected softmax layer with 10 outputs
    out = paddle.layer.fc(
        input=net, size=classdim, act=paddle.activation.Softmax())

    lbl = paddle.layer.data(
        name='label', type=paddle.data_type.integer_value(classdim))
    cost = paddle.layer.classification_cost(input=out, label=lbl)

    # Create parameters
    parameters = paddle.parameters.create(cost)

    # Create optimizer
    momentum_optimizer = paddle.optimizer.Momentum(
        momentum=0.9,
        regularization=paddle.optimizer.L2Regularization(rate=0.0002 * 128),
        learning_rate=0.1 / 128.0,
        learning_rate_decay_a=0.1,
        learning_rate_decay_b=50000 * 100,
        learning_rate_schedule='discexp')

    # End batch and end pass event handler
    def event_handler(event):
        if isinstance(event, paddle.event.EndIteration):
            if event.batch_id % 100 == 0:
                print('\nPass %d, Batch %d, Cost %f, %s' % (
                    event.pass_id, event.batch_id, event.cost, event.metrics))
            else:
                sys.stdout.write('.')
                sys.stdout.flush()
        if isinstance(event, paddle.event.EndPass):
            # save parameters
            with open('params_pass_%d.tar' % event.pass_id, 'wb') as f:
                parameters.to_tar(f)

            # evaluate on the test set at the end of every pass
            result = trainer.test(
                reader=paddle.batch(
                    paddle.dataset.cifar.test10(), batch_size=128),
                feeding={'image': 0,
                         'label': 1})
            print('\nTest with Pass %d, %s' % (event.pass_id, result.metrics))

    # Create trainer
    trainer = paddle.trainer.SGD(
        cost=cost, parameters=parameters, update_equation=momentum_optimizer)

    # Save the inference topology to protobuf.
    inference_topology = paddle.topology.Topology(layers=out)
    with open('inference_topology.pkl', 'wb') as f:
        inference_topology.serialize_for_inference(f)

    trainer.train(
        reader=paddle.batch(
            paddle.reader.shuffle(
                paddle.dataset.cifar.train10(), buf_size=50000),
            batch_size=128),
        num_passes=200,
        event_handler=event_handler,
        feeding={'image': 0,
                 'label': 1})

    # inference
    from PIL import Image
    import numpy as np

    def load_image(file):
        im = Image.open(file)
        im = im.resize((32, 32), Image.ANTIALIAS)
        im = np.array(im).astype(np.float32)
        # The storage order of the loaded image is H(height),
        # W(width), C(channel). PaddlePaddle requires
        # the CHW order, so transpose them.
        im = im.transpose((2, 0, 1))  # CHW
        # In the training phase, the channel order of CIFAR
        # image is B(Blue), G(green), R(Red). But PIL open
        # image in RGB mode. It must swap the channel order.
        im = im[(2, 1, 0), :, :]  # BGR
        im = im.flatten()
        im = im / 255.0
        return im

    test_data = []
    cur_dir = os.path.dirname(os.path.realpath(__file__))
    test_data.append((load_image(cur_dir + '/image/dog.png'), ))

    # users can remove the comments and change the model name
    # with open('params_pass_50.tar', 'r') as f:
    #     parameters = paddle.parameters.Parameters.from_tar(f)

    probs = paddle.infer(
        output_layer=out, parameters=parameters, input=test_data)
    lab = np.argsort(-probs)  # probs and lab are the results of one batch data
    print('Label of image/dog.png is: %d' % lab[0][0])

if __name__ == '__main__':
    main()

3. Output

I1129 14:52:44.314946 15153 Util.cpp:166] commandline: --use_gpu=True --trainer_count=7
[INFO 2017-11-29 14:52:50,490 layers.py:2539] output for __conv_pool_0___conv: c = 6, h = 28, w = 28, size = 4704
[INFO 2017-11-29 14:52:50,491 layers.py:2667] output for __conv_pool_0___pool: c = 6, h = 14, w = 14, size = 1176
[INFO 2017-11-29 14:52:50,491 layers.py:2539] output for __conv_pool_1___conv: c = 16, h = 10, w = 10, size = 1600
[INFO 2017-11-29 14:52:50,492 layers.py:2667] output for __conv_pool_1___pool: c = 16, h = 5, w = 5, size = 400
[INFO 2017-11-29 14:52:50,493 layers.py:2539] output for __conv_0__: c = 120, h = 5, w = 5, size = 3000
I1129 14:52:50.498749 15153 MultiGradientMachine.cpp:99] numLogicalDevices=1 numThreads=7 numDevices=8
I1129 14:52:50.545882 15153 GradientMachine.cpp:85] Initing parameters..
I1129 14:52:50.651103 15153 GradientMachine.cpp:92] Init parameters done.

Pass 0, Batch 0, Cost 2.331898, {'classification_error_evaluator': 0.9609375}
......
Pass 199, Batch 300, Cost 0.004373, {'classification_error_evaluator': 0.0}
..........................................................................................I1129 16:17:08.678097 15153 MultiGradientMachine.cpp:99] numLogicalDevices=1 numThreads=7 numDevices=8

Test with Pass 199, {'classification_error_evaluator': 0.39579999446868896}
Label of image/dog.png is: 7

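Again with 7 threads, 8 Tesla K80 GPUs, batch_size = 128 and 200 passes, training took 1h25min and the classification error was 0.3957, an improvement of 12.91 percentage points over simple_cnn's 0.5248. Of course this result is still not very good. If you print the detailed log, you can see that during training the loss first falls and then rises, which indicates a certain degree of overfitting. How to prevent overfitting will be covered in detail later.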
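One simple way to spot this pattern is to track the test error after every pass and note where it is lowest. The sketch below is plain Python; the per-pass error list is hypothetical, while the final train and test errors are the ones printed in the log (0.0 on the last training batch, about 0.3958 on the test set).

```python
def best_pass(test_errors):
    """Return (pass_id, error) for the pass with the lowest test error."""
    best_id = min(range(len(test_errors)), key=lambda i: test_errors[i])
    return best_id, test_errors[best_id]

def generalization_gap(train_error, test_error):
    """A large positive gap means the model fits the training data much better than unseen data."""
    return test_error - train_error

# Hypothetical test errors over six passes: falling at first, then rising again.
errors_per_pass = [0.62, 0.48, 0.41, 0.38, 0.39, 0.40]
pass_id, error = best_pass(errors_per_pass)
print("lowest test error %.2f at pass %d" % (error, pass_id))

# Values from the LeNet-5 log at pass 199.
gap = generalization_gap(train_error=0.0, test_error=0.3958)
print("generalization gap at pass 199: %.4f" % gap)
```

If the lowest test error occurs well before the last pass, keeping the parameters saved at that pass (the script writes one params_pass_N.tar per pass) is a cheap form of early stopping.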

There is a CNN visualization website that can visualize network structures for MNIST and cifar10 classification, for example the cifar-10 BaseCNN network.

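LeNet-5 in TensorFlow

For a TensorFlow version of LeNet-5, you can follow the steps in:

models/tutorials/image/cifar10/ (https://github.com/tensorflow/models/tree/master/tutorials/image/cifar10) to train. Note that the code there includes a lot of data processing, weight decay and regularization to prevent overfitting. According to the official description, with batch_size=128, 100k iterations on a Tesla K40 take 4 hours and reach 86% accuracy. Running it directly without processing the data would probably not do as well. It is worth studying the idea in the distorted_inputs function of cifar10_input.py, which enlarges the dataset through preprocessing, as well as the decay settings for weights and biases in cifar10.py. My run is currently at about 10k iterations, with cost 0.98 and accuracy 78.4%.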
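The idea of enlarging the dataset through preprocessing can be sketched in a few lines of NumPy. This is not the TensorFlow tutorial's code: it only imitates two of the distortions I believe that tutorial applies (a random 24×24 crop and a random left-right flip). The function name, the crop size default and the random test image are assumptions for illustration.

```python
import numpy as np

def random_crop_flip(image, crop=24, rng=None):
    """Return a randomly cropped, randomly mirrored copy of an HWC image."""
    if rng is None:
        rng = np.random.RandomState()
    height, width = image.shape[:2]
    # Pick the top-left corner so that the crop stays inside the image.
    top = rng.randint(0, height - crop + 1)
    left = rng.randint(0, width - crop + 1)
    patch = image[top:top + crop, left:left + crop]
    # Mirror left-right with probability 0.5.
    if rng.rand() < 0.5:
        patch = patch[:, ::-1]
    return patch.copy()

rng = np.random.RandomState(0)
image = rng.randint(0, 256, size=(32, 32, 3)).astype(np.uint8)  # stand-in for one CIFAR-10 image

augmented = [random_crop_flip(image, rng=rng) for _ in range(4)]
print([a.shape for a in augmented])
```

Each call produces a slightly different view of the same picture, so the network rarely sees exactly the same input twice, which is one reason the tutorial version overfits less.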

I also plan to run cifar10 without any data processing to see how it does and compare it with the paddle results, but that will have to wait until the weekend = =

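Summary

This post used the standard cifar-10 dataset for image classification with three implementations: a simple CNN of my own design, LeNet-5 in PaddlePaddle, and LeNet-5 in TensorFlow. The comparison so far:

Model                        Training time    Test error
simple_cnn (PaddlePaddle)    1h37min          0.5248
LeNet-5 (PaddlePaddle)       1h25min          0.3957
LeNet-5 (TensorFlow)         not finished     not finished

LeNet-5 improves on the original simple_cnn in both accuracy and speed. Once the TensorFlow version finishes running, its results can be added for comparison. Still, although LeNet-5 gives some improvement, the result is not good enough. From the loss values in the log we can largely conclude that the network is overfitting. The next post will focus on how to avoid overfitting when training neural networks. It will also introduce AlexNet and run a classification experiment to compare its performance.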

References

1. LeNet-5 paper: "Gradient-based learning applied to document recognition"

2. CNN visualization: http://shixialiu.com/publications/cnnvis/demo/
