Author: 薛坤軍
Editor: 陳人和
Preface
- A summary of SSD theory (SSD: Single Shot MultiBox Detector)
- Key source code analysis: https://github.com/balancap/SSD-Tensorflow
Model
The SSD model uses VGG16 as its base network and adds extra network structure after the base network, as shown in the figure below:
[Figure: SSD network architecture]
After the base network (the first five blocks of VGG16), extra convolutional layers are added. Specifically, the atrous (dilated) convolution algorithm is used to convert the fc6 and fc7 layers into two convolutional layers, and three more convolutional blocks (Conv:1*1 + Conv:3*3) plus an average pooling layer are appended (in the paper this last stage is another Conv:1*1 + Conv:3*3, which serves the same purpose);
A 3*3 convolution is then applied on each of these feature maps to make predictions. Predictions from the lower layers help handle smaller objects, because lower-layer feature maps have smaller receptive fields. Objects of different sizes can therefore be handled by feature maps whose receptive fields roughly match the object size, which is exactly the goal of multi-scale feature map detection;
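As a quick illustration of the atrous idea (a minimal sketch, not code from the repository): a 3*3 convolution with dilation rate 6 keeps the spatial size of the feature map while enlarging the effective kernel to 13*13, which is how the fc6 layer can be replaced by a convolution:
import tensorflow as tf

x = tf.placeholder(tf.float32, [1, 19, 19, 512])  # block5-like input
w = tf.get_variable('w', [3, 3, 512, 1024])       # 3x3 kernel
# rate=6 inserts 5 zeros between kernel taps: effective kernel 13x13,
# spatial size unchanged with 'SAME' padding -> output (1, 19, 19, 1024).
y = tf.nn.atrous_conv2d(x, w, rate=6, padding='SAME')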
Key code analysis:
# Default initialization parameters.
class SSDNet(object):
    '''Implementation of the SSD VGG-based 300 network.
    The default features layers with 300x300 image input are
    (the multi-scale feature map locations used for detection):
    conv4 ==> 38 x 38
    conv7 ==> 19 x 19
    conv8 ==> 10 x 10
    conv9 ==> 5 x 5
    conv10 ==> 3 x 3
    conv11 ==> 1 x 1
    The default image size used to train this network is 300x300.
    '''
    default_params = SSDParams(
        img_shape=(300, 300),  # input image size
        num_classes=21,  # number of classes, 20 + 1 (background)
        no_annotation_label=21,
        # feature layers used for multi-scale detection
        feat_layers=['block4', 'block7', 'block8', 'block9', 'block10', 'block11'],
        # feature map sizes
        feat_shapes=[(38, 38), (19, 19), (10, 10), (5, 5), (3, 3), (1, 1)],
        # default box scales of the lowest and highest layers; adjust as needed
        anchor_size_bounds=[0.15, 0.90],
        # anchor_size_bounds=[0.20, 0.90],  # (values used in the paper)
        # default box sizes
        anchor_sizes=[(21., 45.),
                      (45., 99.),
                      (99., 153.),
                      (153., 207.),
                      (207., 261.),
                      (261., 315.)],
        # anchor_sizes=[(30., 60.),
        #               (60., 111.),
        #               (111., 162.),
        #               (162., 213.),
        #               (213., 264.),
        #               (264., 315.)],
        # default box aspect ratios
        anchor_ratios=[[2, .5],
                       [2, .5, 3, 1./3],
                       [2, .5, 3, 1./3],
                       [2, .5, 3, 1./3],
                       [2, .5],
                       [2, .5]],
        # spacing between default box centers
        anchor_steps=[8, 16, 32, 64, 100, 300],
        anchor_offset=0.5,  # offset of the box centers within each grid cell
        # whether to L2-normalize the feature map; values > 0 enable it
        normalizations=[20, -1, -1, -1, -1, -1],
        # scaling applied to the encoded coordinates
        prior_scaling=[0.1, 0.1, 0.2, 0.2]
        )
# Define the SSD network structure.
def ssd_net(inputs,
            num_classes=SSDNet.default_params.num_classes,
            feat_layers=SSDNet.default_params.feat_layers,
            anchor_sizes=SSDNet.default_params.anchor_sizes,
            anchor_ratios=SSDNet.default_params.anchor_ratios,
            normalizations=SSDNet.default_params.normalizations,
            is_training=True,
            dropout_keep_prob=0.5,
            prediction_fn=slim.softmax,
            reuse=None,
            scope='ssd_300_vgg'):
    '''SSD net definition.'''
    # End_points collect relevant activations for external use.
    # Stores the output feature map of every block.
    end_points = {}
    with tf.variable_scope(scope, 'ssd_300_vgg', [inputs], reuse=reuse):
        # ======== Original VGG-16 blocks ========
        net = slim.repeat(inputs, 2, slim.conv2d, 64, [3, 3], scope='conv1')
        end_points['block1'] = net
        net = slim.max_pool2d(net, [2, 2], scope='pool1', padding='SAME')
        # Block 2.
        net = slim.repeat(net, 2, slim.conv2d, 128, [3, 3], scope='conv2')
        end_points['block2'] = net
        net = slim.max_pool2d(net, [2, 2], scope='pool2', padding='SAME')
        # Block 3.
        net = slim.repeat(net, 3, slim.conv2d, 256, [3, 3], scope='conv3')
        end_points['block3'] = net
        net = slim.max_pool2d(net, [2, 2], scope='pool3', padding='SAME')
        # Block 4.
        net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], scope='conv4')
        # First feature map used for prediction, shape (batch_size, 38, 38, 512).
        end_points['block4'] = net
        net = slim.max_pool2d(net, [2, 2], scope='pool4', padding='SAME')
        # Block 5.
        net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], scope='conv5')
        end_points['block5'] = net
        net = slim.max_pool2d(net, [3, 3], stride=1, scope='pool5', padding='SAME')
        # Additional SSD blocks.
        # Block 6: 3x3 atrous (dilated) convolution, rate 6.
        net = slim.conv2d(net, 1024, [3, 3], rate=6, scope='conv6')
        end_points['block6'] = net
        # Note: tf.layers.dropout expects the drop rate, not the keep probability.
        net = tf.layers.dropout(net, rate=1. - dropout_keep_prob, training=is_training)
        # Block 7: 1x1 convolution.
        net = slim.conv2d(net, 1024, [1, 1], scope='conv7')
        # Second feature map used for prediction, shape (batch_size, 19, 19, 1024).
        end_points['block7'] = net
        net = tf.layers.dropout(net, rate=1. - dropout_keep_prob, training=is_training)
        # Block 8/9/10/11: 1x1 and 3x3 convolutions, stride 2 (except the last ones).
        end_point = 'block8'
        with tf.variable_scope(end_point):
            net = slim.conv2d(net, 256, [1, 1], scope='conv1x1')
            net = custom_layers.pad2d(net, pad=(1, 1))
            net = slim.conv2d(net, 512, [3, 3], stride=2, scope='conv3x3', padding='VALID')
        # Third feature map used for prediction, shape (batch_size, 10, 10, 512).
        end_points[end_point] = net
        end_point = 'block9'
        with tf.variable_scope(end_point):
            net = slim.conv2d(net, 128, [1, 1], scope='conv1x1')
            net = custom_layers.pad2d(net, pad=(1, 1))
            net = slim.conv2d(net, 256, [3, 3], stride=2, scope='conv3x3', padding='VALID')
        # Fourth feature map used for prediction, shape (batch_size, 5, 5, 256).
        end_points[end_point] = net
        end_point = 'block10'
        with tf.variable_scope(end_point):
            net = slim.conv2d(net, 128, [1, 1], scope='conv1x1')
            net = slim.conv2d(net, 256, [3, 3], scope='conv3x3', padding='VALID')
        # Fifth feature map used for prediction, shape (batch_size, 3, 3, 256).
        end_points[end_point] = net
        end_point = 'block11'
        with tf.variable_scope(end_point):
            net = slim.conv2d(net, 128, [1, 1], scope='conv1x1')
            net = slim.conv2d(net, 256, [3, 3], scope='conv3x3', padding='VALID')
        # Sixth feature map used for prediction, shape (batch_size, 1, 1, 256).
        end_points[end_point] = net

        # Prediction and localisations layers.
        predictions = []
        logits = []
        localisations = []
        for i, layer in enumerate(feat_layers):
            with tf.variable_scope(layer + '_box'):
                # Predict the bbox locations (offsets relative to the default
                # boxes) and the class scores.
                p, l = ssd_multibox_layer(end_points[layer],
                                          num_classes,
                                          anchor_sizes[i],
                                          anchor_ratios[i],
                                          normalizations[i])
            # softmax
            predictions.append(prediction_fn(p))
            # class scores (logits)
            logits.append(p)
            # bbox offsets relative to the default boxes
            localisations.append(l)
        return predictions, localisations, logits, end_points
ssd_net.default_image_size = 300
Testing was done with tf-1.1.0; with a 300*300 input the feature map shapes did not match the expected values, so the source was modified by passing padding='SAME' to the max_pool layers.
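The mismatch comes from the pooling arithmetic: padding='VALID' computes floor((n - k)/s) + 1 while padding='SAME' computes ceil(n/s), and only the latter yields the 38*38 map at block4. A small sketch of the size progression (plain Python, with a hypothetical same_pool helper mirroring the pooling layers above):
import math

def same_pool(n, stride=2):
    # output size of a max_pool2d with padding='SAME'
    return int(math.ceil(n / float(stride)))

n = 300
for name in ['pool1', 'pool2', 'pool3']:
    n = same_pool(n)
print(n)  # 38: 300 -> 150 -> 75 -> 38 ('VALID' would give 37 at the last step)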
Each feature layer used for prediction (the feature maps after the base network) applies a set of convolutional filters to produce a fixed set of predictions (i.e., the scale predicted by each feature map is fixed). For an m×n feature map with p channels, the convolutional filters are 3×3 kernels, which predict the class and the location offset of each default box;
YOLO, by contrast, uses a fully connected layer in place of these convolutional predictors, and the fully connected layer forces the input size to be fixed;
Key code analysis:
## Make predictions (location offsets, class probabilities) on a feature map.
'''
inputs: one of ['block4', 'block7', 'block8', 'block9', 'block10', 'block11']
num_classes: 21
sizes: [(21.,45.),(45.,99.),(99.,153.), (153.,207.),(207.,261.),(261.,315.)]
ratios:
[[2, .5],[2, .5, 3, 1./3],[2, .5, 3, 1./3],[2, .5, 3, 1./3],[2, .5],[2,.5]]
The parameters correspond one-to-one across layers.
'''
def ssd_multibox_layer(inputs,
                       num_classes,
                       sizes,
                       ratios=[1],
                       normalization=-1,
                       bn_normalization=False):
    '''Construct a multibox layer, return a class and localization predictions.
    '''
    net = inputs
    # L2 normalization
    if normalization > 0:
        net = custom_layers.l2_normalization(net, scaling=True)
    # Number of anchors.
    # Number of default boxes at each location of this feature map:
    # len(sizes) counts the boxes with aspect ratio 1,
    # len(ratios) counts the boxes with the other aspect ratios.
    num_anchors = len(sizes) + len(ratios)
    # Location.
    num_loc_pred = num_anchors * 4
    # Convolutional predictor for the location of each bbox.
    # Outputs:
    #   (batch_size, 38, 38, num_loc_pred)
    #   (batch_size, 19, 19, num_loc_pred)
    #   (batch_size, 10, 10, num_loc_pred)
    #   (batch_size, 5, 5, num_loc_pred)
    #   (batch_size, 3, 3, num_loc_pred)
    #   (batch_size, 1, 1, num_loc_pred)
    loc_pred = slim.conv2d(net, num_loc_pred, [3, 3], activation_fn=None,
                           scope='conv_loc')
    loc_pred = custom_layers.channel_to_last(loc_pred)
    loc_pred = tf.reshape(loc_pred,
                          tensor_shape(loc_pred, 4)[:-1] + [num_anchors, 4])
    # Class prediction.
    # Convolutional predictor for the class of each bbox.
    num_cls_pred = num_anchors * num_classes
    cls_pred = slim.conv2d(net,
                           num_cls_pred,
                           [3, 3],
                           activation_fn=None,
                           scope='conv_cls')
    cls_pred = custom_layers.channel_to_last(cls_pred)
    cls_pred = tf.reshape(cls_pred,
                          tensor_shape(cls_pred, 4)[:-1] + [num_anchors, num_classes])
    return cls_pred, loc_pred
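With the default parameters, num_anchors per location is 4 or 6 depending on the layer, giving the well-known 8732 default boxes of SSD300. A quick check (a sketch using the feat_shapes and anchor configuration above):
feat_shapes = [(38, 38), (19, 19), (10, 10), (5, 5), (3, 3), (1, 1)]
# len(sizes) + len(ratios) per layer: 2+2, 2+4, 2+4, 2+4, 2+2, 2+2
anchors_per_loc = [4, 6, 6, 6, 4, 4]
total = sum(h * w * k for (h, w), k in zip(feat_shapes, anchors_per_loc))
print(total)  # 8732 default boxes in total for SSD300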
Default boxes are generated on every feature map used for prediction; their number, scales, and aspect ratios are fixed once the network structure is fixed;
Key code analysis:
# Generate the fixed default boxes for one feature map.
def ssd_anchor_one_layer(img_shape,
                         feat_shape,
                         sizes,
                         ratios,
                         step,
                         offset=0.5,
                         dtype=np.float32):
    '''Compute SSD default anchor boxes for one feature layer.
    Determine the relative position grid of the centers, and the relative
    width and height.
    Arguments:
      feat_shape: Feature shape, used for computing relative position grids;
      size: Absolute reference sizes;
      ratios: Ratios to use on these features;
      img_shape: Image shape, used for computing height, width relatively to the
        former;
      offset: Grid offset.
    Return:
      y, x, h, w: Relative x and y grids, and height and width.
    '''
    # Compute the position grid: simple way.
    # y, x = np.mgrid[0:feat_shape[0], 0:feat_shape[1]]
    # y = (y.astype(dtype) + offset) / feat_shape[0]
    # x = (x.astype(dtype) + offset) / feat_shape[1]
    # Weird SSD-Caffe computation using steps values...
    # Take the (38, 38) feature map as the running example.
    # y and x can be read as the row and column indices of the feature map:
    # y has shape (38, 38), with values:
    #   np.array([[0, 0, 0, ..., 0, 0, 0],
    #             [1, 1, 1, ..., 1, 1, 1],
    #             ......
    #             [37, 37, 37, ..., 37, 37, 37]])
    # x has shape (38, 38), with values:
    #   np.array([[0, 1, 2, ..., 35, 36, 37],
    #             [0, 1, 2, ..., 35, 36, 37],
    #             ......
    #             [0, 1, 2, ..., 35, 36, 37]])
    y, x = np.mgrid[0:feat_shape[0], 0:feat_shape[1]]
    # Map the feature map points back onto the original image, normalized to
    # [0, 1]; for block4 this amounts to:
    #   y = (y + 0.5) * 8 / 300
    #   x = (x + 0.5) * 8 / 300
    # x, y are the default box centers in the original image, in [0, 1].
    y = (y.astype(dtype) + offset) * step / img_shape[0]
    x = (x.astype(dtype) + offset) * step / img_shape[1]
    # Expand dims to support easy broadcasting; shape becomes (38, 38, 1).
    y = np.expand_dims(y, axis=-1)
    x = np.expand_dims(x, axis=-1)
    # Compute relative height and width.
    # Tries to follow the original implementation of SSD for the order.
    # Number of default boxes at each feature map location.
    num_anchors = len(sizes) + len(ratios)
    # Heights and widths of the default boxes.
    h = np.zeros((num_anchors, ), dtype=dtype)
    w = np.zeros((num_anchors, ), dtype=dtype)
    # Add first anchor boxes with ratio=1.
    # The aspect-ratio-1 default box has height and width 21/300.
    h[0] = sizes[0] / img_shape[0]
    w[0] = sizes[0] / img_shape[1]
    di = 1
    # For aspect ratio 1, add an extra default box of size sqrt(Sk * Sk+1).
    if len(sizes) > 1:
        # Height and width are both sqrt(21 * 45).
        h[1] = math.sqrt(sizes[0] * sizes[1]) / img_shape[0]
        w[1] = math.sqrt(sizes[0] * sizes[1]) / img_shape[1]
        di += 1
    # Default boxes for the remaining aspect ratios.
    for i, r in enumerate(ratios):
        h[i+di] = sizes[0] / img_shape[0] / math.sqrt(r)
        w[i+di] = sizes[0] / img_shape[1] * math.sqrt(r)
    # Return the default box centers plus their widths and heights:
    # y, x have shape (38, 38, 1); h, w have shape (4,).
    return y, x, h, w
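A usage sketch for the block4 layer (assuming the function above is in scope); the printed numbers follow directly from the code:
y, x, h, w = ssd_anchor_one_layer(img_shape=(300, 300),
                                  feat_shape=(38, 38),
                                  sizes=(21., 45.),
                                  ratios=[2, .5],
                                  step=8)
print(y.shape, x.shape)  # (38, 38, 1) (38, 38, 1)
print(h)  # [0.07, 0.1025, 0.0495, 0.0990] = 21/300, sqrt(21*45)/300, 0.07/sqrt(2), 0.07*sqrt(2)
print(w)  # [0.07, 0.1025, 0.0990, 0.0495]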
def ssd_anchors_all_layers(img_shape,      # original image shape
                           layers_shape,   # feature map shapes
                           anchor_sizes,   # default box sizes
                           anchor_ratios,  # aspect ratios
                           anchor_steps,
                           offset=0.5,
                           dtype=np.float32):
    '''Compute anchor boxes for all feature layers.'''
    '''
    params:
      img_shape: (300,300)
      layers_shape: [(38,38),(19,19),(10,10),(5,5),(3,3),(1,1)]
      anchor_sizes: [(21,45),(45,99),(99,153),(153,207),(207,261),(261,315)]
      anchor_ratios: [[2,.5],[2,.5,3,1./3],[2,.5,3,1./3],[2,.5,3,1./3],[2,.5],[2,.5]]
      anchor_steps: [8,16,32,64,100,300]
      offset: 0.5
    '''
    layers_anchors = []
    # enumerate is a Python built-in that yields (index, item) pairs, i.e.:
    #   0, (38, 38)
    #   1, (19, 19)
    #   2, (10, 10)
    #   3, (5, 5)
    #   4, (3, 3)
    #   5, (1, 1)
    for i, s in enumerate(layers_shape):
        anchor_bboxes = ssd_anchor_one_layer(img_shape, s,
                                             anchor_sizes[i],
                                             anchor_ratios[i],
                                             anchor_steps[i],
                                             offset=offset,
                                             dtype=dtype)
        layers_anchors.append(anchor_bboxes)
    return layers_anchors
Training
1. Generating default boxes
For each feature map scale, a fixed number of default boxes is generated at every location, with the corresponding scale and aspect ratios. In other words, SSD's default boxes are fixed once the network structure is fixed. In the figure below (for illustration only), the red dots represent a 5*5 feature map; each location predicts 3 default boxes with scale 168 and aspect ratios 1, 1/2 and 2, so (following w = s*sqrt(r), h = s/sqrt(r), as in the code above) the default box sizes (w x h) are 168 x 168, 118.8 x 237.6 and 237.6 x 118.8 respectively.
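The three boxes can be checked with a couple of lines (a sketch; the 5*5 grid and the scale 168 are purely illustrative):
import math

s = 168.
for r in [1, 2, 1. / 2]:
    w, h = s * math.sqrt(r), s / math.sqrt(r)
    print('ratio %-4s -> w x h = %.1f x %.1f' % (r, w, h))
# ratio 1    -> w x h = 168.0 x 168.0
# ratio 2    -> w x h = 237.6 x 118.8
# ratio 0.5  -> w x h = 118.8 x 237.6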
2. Generating training data
Training data is generated from the image's ground truth and the default boxes; key code analysis follows:
# Ground-truth encoding function.
# labels: gt classes
# bboxes: gt locations
# anchors: default box locations
# num_classes: number of classes
# no_annotation_label: 21
# ignore_threshold=0.5, matching threshold
# prior_scaling=[0.1, 0.1, 0.2, 0.2], coordinate scaling
def tf_ssd_bboxes_encode(labels, bboxes, anchors, num_classes,
                         no_annotation_label, ignore_threshold=0.5,
                         prior_scaling=[0.1, 0.1, 0.2, 0.2],
                         dtype=tf.float32, scope='ssd_bboxes_encode'):
    '''Encode groundtruth labels and bounding boxes using SSD net anchors.
    Encoding boxes for all feature layers.
    Arguments:
      labels: 1D Tensor(int64) containing groundtruth labels;
      bboxes: Nx4 Tensor(float) with bboxes relative coordinates;
      anchors: List of Numpy array with layer anchors;
      matching_threshold: Threshold for positive match with groundtruth bboxes;
      prior_scaling: Scaling of encoded coordinates.
    Return:
      (target_labels, target_localizations, target_scores):
        Each element is a list of target Tensors.
    '''
    with tf.name_scope(scope):
        target_labels = []
        target_localizations = []
        target_scores = []
        for i, anchors_layer in enumerate(anchors):
            with tf.name_scope('bboxes_encode_block_%i' % i):
                # Process the default boxes of one scale (one feature map
                # layer) and generate the training targets.
                t_labels, t_loc, t_scores = \
                    tf_ssd_bboxes_encode_layer(labels, bboxes,
                                               anchors_layer,
                                               num_classes,
                                               no_annotation_label,
                                               ignore_threshold,
                                               prior_scaling, dtype)
                target_labels.append(t_labels)
                target_localizations.append(t_loc)
                target_scores.append(t_scores)
        return target_labels, target_localizations, target_scores
The function below processes the default boxes of one scale (corresponding to one feature map layer) and generates the training targets; key code analysis, using the (38,38) feature map as the example:
In this code block, the overlap between every anchor and all ground truths is computed; an anchor's class is that of the gt with which it overlaps most, and its offset target is computed relative to that same gt;
Given an input image and the ground truth of each object, first the default box with the greatest overlap with each gt is taken as a positive sample (the match for that ground truth box). Then, among the remaining default boxes, every box whose IOU with any ground truth box exceeds 0.5 also becomes a positive sample (matched to that gt). All remaining default boxes are negative samples;
Each anchor is assigned to at most one gt, while one gt may be matched to several anchors; a minimal sketch of this matching strategy follows.
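A numpy sketch of the two matching rules (not the repository code; boxes are [ymin, xmin, ymax, xmax], normalized; iou and match are hypothetical helpers):
import numpy as np

def iou(anchors, gt):
    # Jaccard overlap between every anchor and one gt box.
    ymin = np.maximum(anchors[:, 0], gt[0])
    xmin = np.maximum(anchors[:, 1], gt[1])
    ymax = np.minimum(anchors[:, 2], gt[2])
    xmax = np.minimum(anchors[:, 3], gt[3])
    inter = np.maximum(ymax - ymin, 0.) * np.maximum(xmax - xmin, 0.)
    area_a = (anchors[:, 2] - anchors[:, 0]) * (anchors[:, 3] - anchors[:, 1])
    area_g = (gt[2] - gt[0]) * (gt[3] - gt[1])
    return inter / (area_a + area_g - inter)

def match(anchors, gts, threshold=0.5):
    overlaps = np.stack([iou(anchors, g) for g in gts], axis=1)  # (A, G)
    assign = overlaps.argmax(axis=1)             # best gt for each anchor
    positive = overlaps.max(axis=1) > threshold  # rule 2: IOU > 0.5
    best_anchor = overlaps.argmax(axis=0)        # rule 1: best anchor per gt
    assign[best_anchor] = np.arange(overlaps.shape[1])
    positive[best_anchor] = True
    return assign, positive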
# labels: gt classes
# bboxes: gt locations
# anchors_layer: default box locations of one feature map
# num_classes: number of classes
# no_annotation_label: 21
# ignore_threshold=0.5, matching threshold
# prior_scaling=[0.1, 0.1, 0.2, 0.2], coordinate scaling
def tf_ssd_bboxes_encode_layer(labels,
                               bboxes,
                               anchors_layer,
                               num_classes,
                               no_annotation_label,
                               ignore_threshold=0.5,
                               prior_scaling=[0.1, 0.1, 0.2, 0.2],
                               dtype=tf.float32):
    '''Encode groundtruth labels and bounding boxes using SSD anchors from
    one layer.
    Arguments:
      labels: 1D Tensor(int64) containing groundtruth labels;
      bboxes: Nx4 Tensor(float) with bboxes relative coordinates;
      anchors_layer: Numpy array with layer anchors;
      matching_threshold: Threshold for positive match with groundtruth bboxes;
      prior_scaling: Scaling of encoded coordinates.
    Return:
      (target_labels, target_localizations, target_scores): Target Tensors.
    '''
    # Anchors coordinates and volume.
    # Anchor centers and widths/heights;
    # shapes: (38, 38, 1), (38, 38, 1), (4,), (4,).
    yref, xref, href, wref = anchors_layer
    ymin = yref - href / 2.  # bottom edge of the anchors, (38, 38, 4)
    xmin = xref - wref / 2.  # left edge of the anchors, (38, 38, 4)
    ymax = yref + href / 2.  # top edge of the anchors, (38, 38, 4)
    xmax = xref + wref / 2.  # right edge of the anchors, (38, 38, 4)
    vol_anchors = (xmax - xmin) * (ymax - ymin)  # anchor areas, (38, 38, 4)
    # Initialize tensors... shape (38, 38, 4)
    shape = (yref.shape[0], yref.shape[1], href.size)
    feat_labels = tf.zeros(shape, dtype=tf.int64)
    feat_scores = tf.zeros(shape, dtype=dtype)
    feat_ymin = tf.zeros(shape, dtype=dtype)
    feat_xmin = tf.zeros(shape, dtype=dtype)
    feat_ymax = tf.ones(shape, dtype=dtype)
    feat_xmax = tf.ones(shape, dtype=dtype)
    # Compute the jaccard overlap.
    # bbox holds the four (normalized) edges of one gt box.
    def jaccard_with_anchors(bbox):
        '''Compute jaccard score between a box and the anchors.
        '''
        # Intersection of the gt box and the anchors.
        int_ymin = tf.maximum(ymin, bbox[0])
        int_xmin = tf.maximum(xmin, bbox[1])
        int_ymax = tf.minimum(ymax, bbox[2])
        int_xmax = tf.minimum(xmax, bbox[3])
        h = tf.maximum(int_ymax - int_ymin, 0.)
        w = tf.maximum(int_xmax - int_xmin, 0.)
        # Volumes.
        inter_vol = h * w  # intersection area
        union_vol = vol_anchors - inter_vol \
            + (bbox[2] - bbox[0]) * (bbox[3] - bbox[1])
        jaccard = tf.div(inter_vol, union_vol)
        return jaccard  # overlap ratio
    # Fraction of the anchor area covered by the intersection.
    def intersection_with_anchors(bbox):
        '''Compute intersection between score a box and the anchors.
        '''
        int_ymin = tf.maximum(ymin, bbox[0])
        int_xmin = tf.maximum(xmin, bbox[1])
        int_ymax = tf.minimum(ymax, bbox[2])
        int_xmax = tf.minimum(xmax, bbox[3])
        h = tf.maximum(int_ymax - int_ymin, 0.)
        w = tf.maximum(int_xmax - int_xmin, 0.)
        inter_vol = h * w
        scores = tf.div(inter_vol, vol_anchors)
        return scores
    # Loop condition for tf.while_loop.
    def condition(i, feat_labels, feat_scores,
                  feat_ymin, feat_xmin, feat_ymax, feat_xmax):
        '''Condition: check label index.
        '''
        # True while i < number of ground truth labels.
        r = tf.less(i, tf.shape(labels))
        return r[0]
    # Loop body for tf.while_loop.
    def body(i, feat_labels, feat_scores,
             feat_ymin, feat_xmin, feat_ymax, feat_xmax):
        '''Body: update feature labels, scores and bboxes.
        Follow the original SSD paper for that purpose:
          - assign values when jaccard > 0.5;
          - only update if beat the score of other bboxes.
        '''
        # Jaccard score.
        # Class and location of the i-th gt.
        label = labels[i]
        bbox = bboxes[i]
        # Overlap between this gt and every anchor.
        jaccard = jaccard_with_anchors(bbox)
        # Mask: check threshold + scores + no annotations + num_classes.
        # Element-wise comparison, True where jaccard is greater; shape (38, 38, 4).
        # feat_scores holds the best overlap found so far for each anchor.
        mask = tf.greater(jaccard, feat_scores)
        # mask = tf.logical_and(mask, tf.greater(jaccard, matching_threshold))
        # Logical and.
        mask = tf.logical_and(mask, feat_scores > -0.5)
        mask = tf.logical_and(mask, label < num_classes)
        imask = tf.cast(mask, tf.int64)
        fmask = tf.cast(mask, dtype)
        # Update values using mask.
        # imask marks the anchors whose best match becomes the current gt,
        # while (1 - imask) keeps the previous results.
        # Update the anchor class labels.
        feat_labels = imask * label + (1 - imask) * feat_labels
        # tf.where picks jaccard where mask is True, feat_scores elsewhere:
        # keep the highest overlap found so far for each anchor.
        feat_scores = tf.where(mask, jaccard, feat_scores)
        # Update the gt assigned to each anchor (the one with maximum overlap).
        feat_ymin = fmask * bbox[0] + (1 - fmask) * feat_ymin
        feat_xmin = fmask * bbox[1] + (1 - fmask) * feat_xmin
        feat_ymax = fmask * bbox[2] + (1 - fmask) * feat_ymax
        feat_xmax = fmask * bbox[3] + (1 - fmask) * feat_xmax
        # Check no annotation label: ignore these anchors...
        # interscts = intersection_with_anchors(bbox)
        # mask = tf.logical_and(interscts > ignore_threshold,
        #                       label == no_annotation_label)
        # # Replace scores by -1.
        # feat_scores = tf.where(mask, -tf.cast(mask, dtype), feat_scores)
        return [i+1, feat_labels, feat_scores,
                feat_ymin, feat_xmin, feat_ymax, feat_xmax]
    # Main loop definition.
    i = 0
    [i, feat_labels, feat_scores,
     feat_ymin, feat_xmin,
     feat_ymax, feat_xmax] = tf.while_loop(condition, body,
                                           [i, feat_labels, feat_scores,
                                            feat_ymin, feat_xmin,
                                            feat_ymax, feat_xmax])
    # Transform to center / size.
    # Center, width and height of the gt assigned to each anchor.
    feat_cy = (feat_ymax + feat_ymin) / 2.
    feat_cx = (feat_xmax + feat_xmin) / 2.
    feat_h = feat_ymax - feat_ymin
    feat_w = feat_xmax - feat_xmin
    # Encode features.
    # Offsets between each anchor and its assigned gt.
    feat_cy = (feat_cy - yref) / href / prior_scaling[0]
    feat_cx = (feat_cx - xref) / wref / prior_scaling[1]
    feat_h = tf.log(feat_h / href) / prior_scaling[2]
    feat_w = tf.log(feat_w / wref) / prior_scaling[3]
    # Use SSD ordering: x / y / w / h instead of ours.
    feat_localizations = tf.stack([feat_cx, feat_cy, feat_w, feat_h], axis=-1)
    # Return the class label of each anchor, the offsets between each anchor
    # and its assigned gt, and their overlap score.
    return feat_labels, feat_localizations, feat_scores
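At inference time the predicted offsets are decoded by inverting the encoding above; a minimal numpy sketch (assuming the same prior_scaling and the (yref, xref, href, wref) anchor layout; decode is a hypothetical helper):
import numpy as np

def decode(loc, yref, xref, href, wref, prior_scaling=(0.1, 0.1, 0.2, 0.2)):
    # loc is ordered (cx, cy, w, h), matching the tf.stack above.
    cx = loc[..., 0] * wref * prior_scaling[1] + xref
    cy = loc[..., 1] * href * prior_scaling[0] + yref
    w = wref * np.exp(loc[..., 2] * prior_scaling[3])
    h = href * np.exp(loc[..., 3] * prior_scaling[2])
    # Back to corner coordinates [ymin, xmin, ymax, xmax].
    return np.stack([cy - h / 2., cx - w / 2., cy + h / 2., cx + w / 2.], axis=-1)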
3. Loss function
The SSD loss is the weighted sum of two parts (written out below):
- localization loss (loc)
- confidence loss (conf)
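As in the SSD paper, the overall objective averages these two terms over the number N of matched default boxes:
L(x, c, l, g) = \frac{1}{N}\Big(L_{conf}(x, c) + \alpha\, L_{loc}(x, l, g)\Big)
where L_loc is a Smooth L1 loss between the predicted offsets l and the encoded ground truth g (positives only), L_conf is a softmax cross-entropy over the class confidences c, and the weight \alpha corresponds to alpha=1. in the code below.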
Key code analysis:
# SSD loss function definition.
# logits: predicted classes
# localisations: predicted offsets
# gclasses: target class of each default box (from its matched gt)
# glocalisations: target offset of each default box relative to its gt
# gscores: overlap between each default box and its gt
def ssd_losses(logits, localisations,
               gclasses, glocalisations, gscores,
               match_threshold=0.5,
               negative_ratio=3.,
               alpha=1.,
               label_smoothing=0.,
               device='/cpu:0',
               scope=None):
    with tf.name_scope(scope, 'ssd_losses'):
        lshape = tfe.get_shape(logits[0], 5)
        # Number of classes.
        num_classes = lshape[-1]
        batch_size = lshape[0]
        # Flatten out all vectors!
        flogits = []
        fgclasses = []
        fgscores = []
        flocalisations = []
        fglocalisations = []
        # Process the predictions of every feature map scale:
        # (38,38), (19,19), (10,10), (5,5), (3,3), (1,1)
        for i in range(len(logits)):
            # predicted classes, (38*38*4, 21)
            flogits.append(tf.reshape(logits[i], [-1, num_classes]))
            # target classes, (38*38*4,)
            fgclasses.append(tf.reshape(gclasses[i], [-1]))
            # overlap scores, (38*38*4,)
            fgscores.append(tf.reshape(gscores[i], [-1]))
            # predicted offsets, (38*38*4, 4)
            flocalisations.append(tf.reshape(localisations[i], [-1, 4]))
            # target offsets, (38*38*4, 4)
            fglocalisations.append(tf.reshape(glocalisations[i], [-1, 4]))
        # And concat everything!
        logits = tf.concat(flogits, axis=0)
        gclasses = tf.concat(fgclasses, axis=0)
        gscores = tf.concat(fgscores, axis=0)
        localisations = tf.concat(flocalisations, axis=0)
        glocalisations = tf.concat(fglocalisations, axis=0)
        dtype = logits.dtype
        # Compute positive matching mask...
        # Default boxes with overlap > 0.5 are positives; their count is the
        # N of the loss function.
        pmask = gscores > match_threshold
        fpmask = tf.cast(pmask, dtype)
        n_positives = tf.reduce_sum(fpmask)
        # Hard negative mining...
        no_classes = tf.cast(pmask, tf.int32)
        # softmax over the predicted class scores
        predictions = slim.softmax(logits)
        # Logical and: positions of the negative samples.
        nmask = tf.logical_and(tf.logical_not(pmask),
                               gscores > -0.5)
        fnmask = tf.cast(nmask, dtype)
        # Background probability predicted for each negative sample.
        nvalues = tf.where(nmask,
                           predictions[:, 0],
                           1. - fnmask)
        nvalues_flat = tf.reshape(nvalues, [-1])
        # Number of negative entries to select.
        # Keep the positive:negative ratio at 1:3.
        max_neg_entries = tf.cast(tf.reduce_sum(fnmask), tf.int32)
        n_neg = tf.cast(negative_ratio * n_positives, tf.int32) + batch_size
        n_neg = tf.minimum(n_neg, max_neg_entries)
        val, idxes = tf.nn.top_k(-nvalues_flat, k=n_neg)
        max_hard_pred = -val[-1]
        # Final negative mask.
        nmask = tf.logical_and(nmask, nvalues < max_hard_pred)
        fnmask = tf.cast(nmask, dtype)
        # Add cross-entropy loss.
        # Confidence loss of the positive samples.
        with tf.name_scope('cross_entropy_pos'):
            loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
                logits=logits,
                labels=gclasses)
            loss = tf.div(tf.reduce_sum(loss * fpmask),
                          batch_size, name='value')
            tf.losses.add_loss(loss)
        # Confidence loss of the negative samples.
        with tf.name_scope('cross_entropy_neg'):
            loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
                logits=logits,
                labels=no_classes)
            loss = tf.div(tf.reduce_sum(loss * fnmask),
                          batch_size, name='value')
            tf.losses.add_loss(loss)
        # Add localization loss: smooth L1, L2, ...
        with tf.name_scope('localization'):
            # Weights Tensor: positive mask + random negative.
            weights = tf.expand_dims(alpha * fpmask, axis=-1)
            loss = custom_layers.abs_smooth(localisations - glocalisations)
            loss = tf.div(tf.reduce_sum(loss * weights),
                          batch_size,
                          name='value')
            tf.losses.add_loss(loss)
4. Hard Negative Mining
The vast majority of default boxes are negative samples, which leaves the positives and negatives badly imbalanced. Training therefore uses a Hard Negative Mining strategy (keeping the positive:negative ratio at 1:3): the negatives are ranked by confidence loss and only the hardest ones are kept, as sketched below.
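A minimal numpy sketch of the idea (not the repository code; hard_negatives is a hypothetical helper that mirrors the top_k selection in ssd_losses above):
import numpy as np

def hard_negatives(background_prob, positive_mask, ratio=3):
    # Negatives with the LOWEST predicted background probability are the
    # hardest (highest confidence loss); keep ratio * num_positives of them.
    neg_idx = np.where(~positive_mask)[0]
    n_keep = min(len(neg_idx), ratio * int(positive_mask.sum()))
    order = np.argsort(background_prob[neg_idx])  # ascending: hardest first
    return neg_idx[order[:n_keep]]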
END