SSD adds several extra convolutional layers on top of VGGNet, then applies 3x3 convolutions on feature maps of different scales for classification and box regression. SSD's key ideas: data augmentation, the VGGNet backbone plus extra convolutional blocks, PriorBoxes over multi-layer feature maps, and the design of positive/negative sample selection and the loss function.
SSD's strengths:
- Detection over multiple feature maps can rival Faster R-CNN in some scenarios.
- Detection speed surpassed the contemporary Faster R-CNN and YOLO.
- The network is straightforward to optimize.
SSD's weaknesses:
1. The PriorBoxes must be configured manually.
2. Detection accuracy is limited.
1 Data Processing
1.1 Dataset split: voc2ssd.py
This script splits the data into training, validation, and test sets based on the image names in the Annotations folder, and writes the names belonging to each split into '.txt' files.
Steps:
- Set the XML directory and the save directory for the split lists: xmlFilePath, saveBasePath.
- Determine each split's size from the trainval/train ratios and the total sample count: trainval_percent, train_percent, total_xml.
- Randomly sample the index set of each split: tv, tr, trainval, train.
- Write each split to its file: ftrainval, ftest, ftrain, fval.
Code
'''
xmlFilePath, saveBasePath + trainval_percent, train_percent, total_xml + tv, tr, trainval, train + ftrainval, ftest, ftrain, fval + loop
'''
import os
import random
xmlFilePath = r'/Users/liushuang/Desktop/LearnGit/Bubbliiiing资料/Keras/目标检测/ssd-keras-master/VOCdevkit/VOC2007/Annotations'
saveBasePath = r'/Users/liushuang/Desktop/LearnGit/Bubbliiiing资料/Keras/目标检测/ssd-keras-master/VOCdevkit/VOC2007/ImageSets/Main'
trainval_percent = 0.9
train_percent = 0.9
temp = os.listdir(xmlFilePath)
total_xml = []
for i in temp:
if i.endswith('.xml'):
total_xml.append(i)
num = len(total_xml)
tv = int(trainval_percent*num)
tr = int(tv*train_percent)
indices = range(num)
trainval = random.sample(indices,tv)
train = random.sample(trainval,tr)
ftrainval = open(os.path.join(saveBasePath,'LStrainval.txt'),'w')
ftest = open(os.path.join(saveBasePath,'LStest.txt'),'w')
ftrain = open(os.path.join(saveBasePath,'LStrain.txt'),'w')
fval = open(os.path.join(saveBasePath,'LSval.txt'),'w')
for i in indices:
name = total_xml[i][:-4]+'\n'
if i in trainval:
ftrainval.write(name)
if i in train:
ftrain.write(name)
else:
fval.write(name)
else:
ftest.write(name)
ftrainval.close()
ftest.close()
ftrain.close()
fval.close()
1.2 Reading the data: voc_annotation.py
This script writes out the image path, bounding boxes, and class labels for each split.
1.2.1 Steps
- Write a function that reads the boxes and classes from one XML annotation.
- Iterate over each split, writing the image path followed by its boxes and classes.
1.2.2 Code
'''
convert_annotation(year, image_id, list_file): difficult, cls + cls_id, xmlbox + b ;
year, image_set + image_ids, list_file + list_file.write(wd, year, image_id) + convert_annotation(year, image_id, list_file)
'''
import os
import xml.etree.ElementTree as ET
sets = [('2007','train'),('2007','val'),('2007','test')]
classes = ["aeroplane", "bicycle", "bird", "boat", "bottle", "bus", "car", "cat", "chair", "cow", "diningtable", "dog", "horse", "motorbike", "person", "pottedplant", "sheep", "sofa", "train", "tvmonitor"]
def convert_annotation(year,image_id,list_file):
in_file = open('VOCdevkit/VOC%s/Annotations/%s.xml'%(year,image_id))
tree = ET.parse(in_file)
root = tree.getroot()
for obj in root.iter('object'):
difficult = obj.find('difficult').text
cls = obj.find('name').text
if cls not in classes or int(difficult)==1:
continue
cls_id = classes.index(cls)
xmlbox = obj.find('bndbox')
b = (int(xmlbox.find('xmin').text),int(xmlbox.find('ymin').text),int(xmlbox.find('xmax').text),int(xmlbox.find('ymax').text))
list_file.write(' '+','.join([str(a) for a in b])+','+str(cls_id))
wd = os.getcwd()
for year,image_set in sets:
image_ids = open('VOCdevkit/VOC%s/ImageSets/Main/%s.txt'%(year,image_set)).read().strip().split()
list_file = open('%s_%s.txt'%(year,image_set),'w')
for image_id in image_ids:
list_file.write('%s/VOCdevkit/VOC%s/JPEGImages/%s.jpg'%(wd,year,image_id ))
        convert_annotation(year,image_id,list_file)
list_file.write('\n')
list_file.close()
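Each line of the resulting 2007_train.txt (and the val/test files) holds one absolute image path followed by space-separated boxes, each encoded as xmin,ymin,xmax,ymax,cls_id. A hypothetical example line:
/your/path/VOCdevkit/VOC2007/JPEGImages/000005.jpg 263,211,324,339,8 165,264,253,372,8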
2 Backbone Network
The backbone runs the input image through a series of convolutions, pooling, and activations to extract features, then classifies and regresses boxes from those features.
self.ssd_model = ssd.SSD300(self.model_image_size, self.num_classes)
2.1 Backbone flow
- Feed the [300,300,3] input into VGG16 to build net, then extract net['conv4_3'], net['fc7'], net['conv6_2'], net['conv7_2'], net['conv8_2'], and net['conv9_2'].
- Run classification and regression heads on each extracted feature map.
2.1.1 Steps to build the VGG16 model
- Block 1 (300,300,3 -> 150,150,64): input --> Conv2D*2 + MaxPooling2D --> net['conv1_1'] + net['conv1_2'] + net['pool1']
- Block 2 (150,150,64 -> 75,75,128): net['pool1'] --> Conv2D*2 + MaxPooling2D --> net['conv2_1'] + net['conv2_2'] + net['pool2']
- Block 3 (75,75,128 -> 38,38,256): net['pool2'] --> Conv2D*3 + MaxPooling2D --> net['conv3_1'] + net['conv3_2'] + net['conv3_3'] + net['pool3']
- Block 4 (38,38,256 -> 19,19,512): net['pool3'] --> Conv2D*3 + MaxPooling2D --> net['conv4_1'] + net['conv4_2'] + net['conv4_3'] + net['pool4']
- Block 5 (19,19,512 -> 19,19,512): net['pool4'] --> Conv2D*3 + MaxPooling2D(3x3, stride 1) --> net['conv5_1'] + net['conv5_2'] + net['conv5_3'] + net['pool5']
- FC6 (19,19,512 -> 19,19,1024): net['pool5'] --> dilated Conv2D --> net['fc6']
- FC7 (19,19,1024 -> 19,19,1024): net['fc6'] --> 1x1 Conv2D --> net['fc7']
- Block 6 (19,19,1024 -> 10,10,512): net['fc7'] --> Conv2D*2 --> net['conv6_1'] + net['conv6_2']
- Block 7 (10,10,512 -> 5,5,256): net['conv6_2'] --> Conv2D*2 --> net['conv7_1'] + net['conv7_2']
- Block 8 (5,5,256 -> 3,3,256): net['conv7_2'] --> Conv2D*2 --> net['conv8_1'] + net['conv8_2']
- Block 9 (3,3,256 -> 1,1,256): net['conv8_2'] --> Conv2D*2 --> net['conv9_1'] + net['conv9_2']
2.1.2 VGG16 code
import keras.backend as K
from keras.layers import Activation
from keras.layers import Conv2D
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers import GlobalAveragePooling2D
from keras.layers import Input
from keras.layers import MaxPooling2D
from keras.layers import merge, concatenate
from keras.layers import Reshape
from keras.layers import ZeroPadding2D
from keras.models import Model
def VGG16(input_tensor):
    #---------------------------- backbone feature extractor: start ---------------------------#
    # the SSD graph is stored in the net dict
net = {}
# Block 1
net['input'] = input_tensor
# 300,300,3 -> 150,150,64
net['conv1_1'] = Conv2D(64, kernel_size=(3,3),
activation='relu',
padding='same',
name='conv1_1')(net['input'])
net['conv1_2'] = Conv2D(64, kernel_size=(3,3),
activation='relu',
padding='same',
name='conv1_2')(net['conv1_1'])
net['pool1'] = MaxPooling2D((2, 2), strides=(2, 2), padding='same',
name='pool1')(net['conv1_2'])
# Block 2
# 150,150,64 -> 75,75,128
net['conv2_1'] = Conv2D(128, kernel_size=(3,3),
activation='relu',
padding='same',
name='conv2_1')(net['pool1'])
net['conv2_2'] = Conv2D(128, kernel_size=(3,3),
activation='relu',
padding='same',
name='conv2_2')(net['conv2_1'])
net['pool2'] = MaxPooling2D((2, 2), strides=(2, 2), padding='same',
name='pool2')(net['conv2_2'])
# Block 3
# 75,75,128 -> 38,38,256
net['conv3_1'] = Conv2D(256, kernel_size=(3,3),
activation='relu',
padding='same',
name='conv3_1')(net['pool2'])
net['conv3_2'] = Conv2D(256, kernel_size=(3,3),
activation='relu',
padding='same',
name='conv3_2')(net['conv3_1'])
net['conv3_3'] = Conv2D(256, kernel_size=(3,3),
activation='relu',
padding='same',
name='conv3_3')(net['conv3_2'])
net['pool3'] = MaxPooling2D((2, 2), strides=(2, 2), padding='same',
name='pool3')(net['conv3_3'])
# Block 4
# 38,38,256 -> 19,19,512
net['conv4_1'] = Conv2D(512, kernel_size=(3,3),
activation='relu',
padding='same',
name='conv4_1')(net['pool3'])
net['conv4_2'] = Conv2D(512, kernel_size=(3,3),
activation='relu',
padding='same',
name='conv4_2')(net['conv4_1'])
net['conv4_3'] = Conv2D(512, kernel_size=(3,3),
activation='relu',
padding='same',
name='conv4_3')(net['conv4_2'])
net['pool4'] = MaxPooling2D((2, 2), strides=(2, 2), padding='same',
name='pool4')(net['conv4_3'])
# Block 5
# 19,19,512 -> 19,19,512
net['conv5_1'] = Conv2D(512, kernel_size=(3,3),
activation='relu',
padding='same',
name='conv5_1')(net['pool4'])
net['conv5_2'] = Conv2D(512, kernel_size=(3,3),
activation='relu',
padding='same',
name='conv5_2')(net['conv5_1'])
net['conv5_3'] = Conv2D(512, kernel_size=(3,3),
activation='relu',
padding='same',
name='conv5_3')(net['conv5_2'])
net['pool5'] = MaxPooling2D((3, 3), strides=(1, 1), padding='same',
name='pool5')(net['conv5_3'])
# FC6
# 19,19,512 -> 19,19,1024
net['fc6'] = Conv2D(1024, kernel_size=(3,3), dilation_rate=(6, 6),
activation='relu', padding='same',
name='fc6')(net['pool5'])
# x = Dropout(0.5, name='drop6')(x)
# FC7
# 19,19,1024 -> 19,19,1024
net['fc7'] = Conv2D(1024, kernel_size=(1,1), activation='relu',
padding='same', name='fc7')(net['fc6'])
# x = Dropout(0.5, name='drop7')(x)
# Block 6
# 19,19,512 -> 10,10,512
net['conv6_1'] = Conv2D(256, kernel_size=(1,1), activation='relu',
padding='same',
name='conv6_1')(net['fc7'])
net['conv6_2'] = ZeroPadding2D(padding=((1, 1), (1, 1)), name='conv6_padding')(net['conv6_1'])
net['conv6_2'] = Conv2D(512, kernel_size=(3,3), strides=(2, 2),
activation='relu',
name='conv6_2')(net['conv6_2'])
# Block 7
# 10,10,512 -> 5,5,256
net['conv7_1'] = Conv2D(128, kernel_size=(1,1), activation='relu',
padding='same',
name='conv7_1')(net['conv6_2'])
net['conv7_2'] = ZeroPadding2D(padding=((1, 1), (1, 1)), name='conv7_padding')(net['conv7_1'])
net['conv7_2'] = Conv2D(256, kernel_size=(3,3), strides=(2, 2),
activation='relu', padding='valid',
name='conv7_2')(net['conv7_2'])
# Block 8
# 5,5,256 -> 3,3,256
net['conv8_1'] = Conv2D(128, kernel_size=(1,1), activation='relu',
padding='same',
name='conv8_1')(net['conv7_2'])
net['conv8_2'] = Conv2D(256, kernel_size=(3,3), strides=(1, 1),
activation='relu', padding='valid',
name='conv8_2')(net['conv8_1'])
# Block 9
# 3,3,256 -> 1,1,256
net['conv9_1'] = Conv2D(128, kernel_size=(1,1), activation='relu',
padding='same',
name='conv9_1')(net['conv8_2'])
net['conv9_2'] = Conv2D(256, kernel_size=(3,3), strides=(1, 1),
activation='relu', padding='valid',
name='conv9_2')(net['conv9_1'])
    #---------------------------- backbone feature extractor: end ---------------------------#
return net
if __name__ == "__main__":
from keras.layers import Input
input_tensor = Input(shape = [300,300,3])
net = VGG16(input_tensor)
for i in net:
print(net[i])
# print('\n')
'''
Tensor("input_1:0", shape=(?, 300, 300, 3), dtype=float32)
Tensor("conv1_1/Relu:0", shape=(?, 300, 300, 64), dtype=float32)
Tensor("conv1_2/Relu:0", shape=(?, 300, 300, 64), dtype=float32)
Tensor("pool1/MaxPool:0", shape=(?, 150, 150, 64), dtype=float32)
Tensor("conv2_1/Relu:0", shape=(?, 150, 150, 128), dtype=float32)
Tensor("conv2_2/Relu:0", shape=(?, 150, 150, 128), dtype=float32)
Tensor("pool2/MaxPool:0", shape=(?, 75, 75, 128), dtype=float32)
Tensor("conv3_1/Relu:0", shape=(?, 75, 75, 256), dtype=float32)
Tensor("conv3_2/Relu:0", shape=(?, 75, 75, 256), dtype=float32)
Tensor("conv3_3/Relu:0", shape=(?, 75, 75, 256), dtype=float32)
Tensor("pool3/MaxPool:0", shape=(?, 38, 38, 256), dtype=float32)
Tensor("conv4_1/Relu:0", shape=(?, 38, 38, 512), dtype=float32)
Tensor("conv4_2/Relu:0", shape=(?, 38, 38, 512), dtype=float32)
Tensor("conv4_3/Relu:0", shape=(?, 38, 38, 512), dtype=float32)
Tensor("pool4/MaxPool:0", shape=(?, 19, 19, 512), dtype=float32)
Tensor("conv5_1/Relu:0", shape=(?, 19, 19, 512), dtype=float32)
Tensor("conv5_2/Relu:0", shape=(?, 19, 19, 512), dtype=float32)
Tensor("conv5_3/Relu:0", shape=(?, 19, 19, 512), dtype=float32)
Tensor("pool5/MaxPool:0", shape=(?, 19, 19, 512), dtype=float32)
Tensor("fc6/Relu:0", shape=(?, 19, 19, 1024), dtype=float32)
Tensor("fc7/Relu:0", shape=(?, 19, 19, 1024), dtype=float32)
Tensor("conv6_1/Relu:0", shape=(?, 19, 19, 256), dtype=float32)
Tensor("conv6_2/Relu:0", shape=(?, 10, 10, 512), dtype=float32)
Tensor("conv7_1/Relu:0", shape=(?, 10, 10, 128), dtype=float32)
Tensor("conv7_2/Relu:0", shape=(?, 5, 5, 256), dtype=float32)
Tensor("conv8_1/Relu:0", shape=(?, 5, 5, 128), dtype=float32)
Tensor("conv8_2/Relu:0", shape=(?, 3, 3, 256), dtype=float32)
Tensor("conv9_1/Relu:0", shape=(?, 3, 3, 128), dtype=float32)
Tensor("conv9_2/Relu:0", shape=(?, 1, 1, 256), dtype=float32)
'''
2.1.3 SSD300 code
import keras.backend as K
from keras.layers import Activation
#from keras.layers import AtrousConvolution2D
from keras.layers import Conv2D
from keras.layers import Dense
from keras.layers import Flatten
from keras.layers import GlobalAveragePooling2D
from keras.layers import Input
from keras.layers import MaxPooling2D
from keras.layers import merge, concatenate
from keras.layers import Reshape
from keras.layers import ZeroPadding2D
from keras.models import Model
from nets.VGG16 import VGG16
from nets.ssd_layers import Normalize
from nets.ssd_layers import PriorBox
def SSD300(input_shape, num_classes=21):
# 300,300,3
input_tensor = Input(shape=input_shape)
img_size = (input_shape[1], input_shape[0])
    # the SSD graph is stored in the net dict
net = VGG16(input_tensor)
    #----------------------- process the extracted backbone features ---------------------------#
    # process conv4_3: 38,38,512
net['conv4_3_norm'] = Normalize(20, name='conv4_3_norm')(net['conv4_3'])
num_priors = 4
    # localization head
    # num_priors is the number of priors per grid cell; 4 channels each for the x, y, w, h adjustments
net['conv4_3_norm_mbox_loc'] = Conv2D(num_priors * 4, kernel_size=(3,3), padding='same', name='conv4_3_norm_mbox_loc')(net['conv4_3_norm'])
net['conv4_3_norm_mbox_loc_flat'] = Flatten(name='conv4_3_norm_mbox_loc_flat')(net['conv4_3_norm_mbox_loc'])
    # num_priors priors per grid cell, each with num_classes class scores
net['conv4_3_norm_mbox_conf'] = Conv2D(num_priors * num_classes, kernel_size=(3,3), padding='same',name='conv4_3_norm_mbox_conf')(net['conv4_3_norm'])
net['conv4_3_norm_mbox_conf_flat'] = Flatten(name='conv4_3_norm_mbox_conf_flat')(net['conv4_3_norm_mbox_conf'])
priorbox = PriorBox(img_size, 30.0,max_size = 60.0, aspect_ratios=[2],
variances=[0.1, 0.1, 0.2, 0.2],
name='conv4_3_norm_mbox_priorbox')
    net['conv4_3_norm_mbox_priorbox'] = priorbox(net['conv4_3_norm'])  # prior_boxes_tensor shape: (?, 5776, 8)
    # process fc7: 19,19,1024
num_priors = 6
    # localization head
    # num_priors is the number of priors per grid cell; 4 channels each for the x, y, w, h adjustments
net['fc7_mbox_loc'] = Conv2D(num_priors * 4, kernel_size=(3,3),padding='same',name='fc7_mbox_loc')(net['fc7'])
net['fc7_mbox_loc_flat'] = Flatten(name='fc7_mbox_loc_flat')(net['fc7_mbox_loc'])
    # num_priors priors per grid cell, each with num_classes class scores
net['fc7_mbox_conf'] = Conv2D(num_priors * num_classes, kernel_size=(3,3),padding='same',name='fc7_mbox_conf')(net['fc7'])
net['fc7_mbox_conf_flat'] = Flatten(name='fc7_mbox_conf_flat')(net['fc7_mbox_conf'])
priorbox = PriorBox(img_size, 60.0, max_size=111.0, aspect_ratios=[2, 3],
variances=[0.1, 0.1, 0.2, 0.2],
name='fc7_mbox_priorbox')
net['fc7_mbox_priorbox'] = priorbox(net['fc7'])
    # process conv6_2: 10,10,512
num_priors = 6
    # localization head
    # num_priors is the number of priors per grid cell; 4 channels each for the x, y, w, h adjustments
x = Conv2D(num_priors * 4, kernel_size=(3,3), padding='same',name='conv6_2_mbox_loc')(net['conv6_2'])
net['conv6_2_mbox_loc'] = x
net['conv6_2_mbox_loc_flat'] = Flatten(name='conv6_2_mbox_loc_flat')(net['conv6_2_mbox_loc'])
    # num_priors priors per grid cell, each with num_classes class scores
x = Conv2D(num_priors * num_classes, kernel_size=(3,3), padding='same',name='conv6_2_mbox_conf')(net['conv6_2'])
net['conv6_2_mbox_conf'] = x
net['conv6_2_mbox_conf_flat'] = Flatten(name='conv6_2_mbox_conf_flat')(net['conv6_2_mbox_conf'])
priorbox = PriorBox(img_size, 111.0, max_size=162.0, aspect_ratios=[2, 3],
variances=[0.1, 0.1, 0.2, 0.2],
name='conv6_2_mbox_priorbox')
net['conv6_2_mbox_priorbox'] = priorbox(net['conv6_2'])
    # process conv7_2: 5,5,256
num_priors = 6
    # localization head
    # num_priors is the number of priors per grid cell; 4 channels each for the x, y, w, h adjustments
x = Conv2D(num_priors * 4, kernel_size=(3,3), padding='same',name='conv7_2_mbox_loc')(net['conv7_2'])
net['conv7_2_mbox_loc'] = x
net['conv7_2_mbox_loc_flat'] = Flatten(name='conv7_2_mbox_loc_flat')(net['conv7_2_mbox_loc'])
    # num_priors priors per grid cell, each with num_classes class scores
x = Conv2D(num_priors * num_classes, kernel_size=(3,3), padding='same',name='conv7_2_mbox_conf')(net['conv7_2'])
net['conv7_2_mbox_conf'] = x
net['conv7_2_mbox_conf_flat'] = Flatten(name='conv7_2_mbox_conf_flat')(net['conv7_2_mbox_conf'])
priorbox = PriorBox(img_size, 162.0, max_size=213.0, aspect_ratios=[2, 3],
variances=[0.1, 0.1, 0.2, 0.2],
name='conv7_2_mbox_priorbox')
net['conv7_2_mbox_priorbox'] = priorbox(net['conv7_2'])
    # process conv8_2: 3,3,256
num_priors = 4
    # localization head
    # num_priors is the number of priors per grid cell; 4 channels each for the x, y, w, h adjustments
x = Conv2D(num_priors * 4, kernel_size=(3,3), padding='same',name='conv8_2_mbox_loc')(net['conv8_2'])
net['conv8_2_mbox_loc'] = x
net['conv8_2_mbox_loc_flat'] = Flatten(name='conv8_2_mbox_loc_flat')(net['conv8_2_mbox_loc'])
    # num_priors priors per grid cell, each with num_classes class scores
x = Conv2D(num_priors * num_classes, kernel_size=(3,3), padding='same',name='conv8_2_mbox_conf')(net['conv8_2'])
net['conv8_2_mbox_conf'] = x
net['conv8_2_mbox_conf_flat'] = Flatten(name='conv8_2_mbox_conf_flat')(net['conv8_2_mbox_conf'])
priorbox = PriorBox(img_size, 213.0, max_size=264.0, aspect_ratios=[2],
variances=[0.1, 0.1, 0.2, 0.2],
name='conv8_2_mbox_priorbox')
net['conv8_2_mbox_priorbox'] = priorbox(net['conv8_2'])
    # process conv9_2: 1,1,256
num_priors = 4
    # localization head
    # num_priors is the number of priors per grid cell; 4 channels each for the x, y, w, h adjustments
x = Conv2D(num_priors * 4, kernel_size=(3,3), padding='same',name='conv9_2_mbox_loc')(net['conv9_2'])
net['conv9_2_mbox_loc'] = x
net['conv9_2_mbox_loc_flat'] = Flatten(name='conv9_2_mbox_loc_flat')(net['conv9_2_mbox_loc'])
    # num_priors priors per grid cell, each with num_classes class scores
x = Conv2D(num_priors * num_classes, kernel_size=(3,3), padding='same',name='conv9_2_mbox_conf')(net['conv9_2'])
net['conv9_2_mbox_conf'] = x
net['conv9_2_mbox_conf_flat'] = Flatten(name='conv9_2_mbox_conf_flat')(net['conv9_2_mbox_conf'])
priorbox = PriorBox(img_size, 264.0, max_size=315.0, aspect_ratios=[2],
variances=[0.1, 0.1, 0.2, 0.2],
name='conv9_2_mbox_priorbox')
net['conv9_2_mbox_priorbox'] = priorbox(net['conv9_2'])
    # concatenate the outputs of all feature layers
net['mbox_loc'] = concatenate([net['conv4_3_norm_mbox_loc_flat'],
net['fc7_mbox_loc_flat'],
net['conv6_2_mbox_loc_flat'],
net['conv7_2_mbox_loc_flat'],
net['conv8_2_mbox_loc_flat'],
net['conv9_2_mbox_loc_flat']],
axis=1, name='mbox_loc')
net['mbox_conf'] = concatenate([net['conv4_3_norm_mbox_conf_flat'],
net['fc7_mbox_conf_flat'],
net['conv6_2_mbox_conf_flat'],
net['conv7_2_mbox_conf_flat'],
net['conv8_2_mbox_conf_flat'],
net['conv9_2_mbox_conf_flat']],
axis=1, name='mbox_conf')
net['mbox_priorbox'] = concatenate([net['conv4_3_norm_mbox_priorbox'],
net['fc7_mbox_priorbox'],
net['conv6_2_mbox_priorbox'],
net['conv7_2_mbox_priorbox'],
net['conv8_2_mbox_priorbox'],
net['conv9_2_mbox_priorbox']],
axis=1, name='mbox_priorbox')
if hasattr(net['mbox_loc'], '_keras_shape'):
num_boxes = net['mbox_loc']._keras_shape[-1] // 4
elif hasattr(net['mbox_loc'], 'int_shape'):
num_boxes = K.int_shape(net['mbox_loc'])[-1] // 4 # 8732
# 8732,4
net['mbox_loc'] = Reshape((num_boxes, 4),name='mbox_loc_final')(net['mbox_loc'])
# 8732,21
net['mbox_conf'] = Reshape((num_boxes, num_classes),name='mbox_conf_logits')(net['mbox_conf'])
net['mbox_conf'] = Activation('softmax',name='mbox_conf_final')(net['mbox_conf'])
net['predictions'] = concatenate([net['mbox_loc'],
net['mbox_conf'],
net['mbox_priorbox']],
axis=2, name='predictions')
    # predictions (Concatenate): (None, 8732, 33); 8732 = 38^2*4 + 19^2*6 + 10^2*6 + 5^2*6 + 3^2*4 + 1^2*4
    # 33 = 4 + 21 + 8: loc offsets + (background + 20 classes) + prior box x1,y1,x2,y2 + variances
    # print(net['predictions'].shape) : (None, 8732, 33)
z=0
for i ,j in net.items():
print('{} {}: {}'.format(z,i,j.shape))
z+=1
model = Model(net['input'], net['predictions'])
return model
if __name__=='__main__':
model = SSD300((300,300,3), num_classes=21)
model.summary()
'''
0 input: (?, 300, 300, 3)
1 conv1_1: (?, 300, 300, 64)
2 conv1_2: (?, 300, 300, 64)
3 pool1: (?, 150, 150, 64)
4 conv2_1: (?, 150, 150, 128)
5 conv2_2: (?, 150, 150, 128)
6 pool2: (?, 75, 75, 128)
7 conv3_1: (?, 75, 75, 256)
8 conv3_2: (?, 75, 75, 256)
9 conv3_3: (?, 75, 75, 256)
10 pool3: (?, 38, 38, 256)
11 conv4_1: (?, 38, 38, 512)
12 conv4_2: (?, 38, 38, 512)
13 conv4_3: (?, 38, 38, 512)
14 pool4: (?, 19, 19, 512)
15 conv5_1: (?, 19, 19, 512)
16 conv5_2: (?, 19, 19, 512)
17 conv5_3: (?, 19, 19, 512)
18 pool5: (?, 19, 19, 512)
19 fc6: (?, 19, 19, 1024)
20 fc7: (?, 19, 19, 1024)
21 conv6_1: (?, 19, 19, 256)
22 conv6_2: (?, 10, 10, 512)
23 conv7_1: (?, 10, 10, 128)
24 conv7_2: (?, 5, 5, 256)
25 conv8_1: (?, 5, 5, 128)
26 conv8_2: (?, 3, 3, 256)
27 conv9_1: (?, 3, 3, 128)
28 conv9_2: (?, 1, 1, 256)
29 conv4_3_norm: (?, 38, 38, 512)
30 conv4_3_norm_mbox_loc: (?, 38, 38, 16)
31 conv4_3_norm_mbox_loc_flat: (?, ?)
32 conv4_3_norm_mbox_conf: (?, 38, 38, 84)
33 conv4_3_norm_mbox_conf_flat: (?, ?)
34 conv4_3_norm_mbox_priorbox: (?, 5776, 8)
35 fc7_mbox_loc: (?, 19, 19, 24)
36 fc7_mbox_loc_flat: (?, ?)
37 fc7_mbox_conf: (?, 19, 19, 126)
38 fc7_mbox_conf_flat: (?, ?)
39 fc7_mbox_priorbox: (?, 2166, 8)
40 conv6_2_mbox_loc: (?, 10, 10, 24)
41 conv6_2_mbox_loc_flat: (?, ?)
42 conv6_2_mbox_conf: (?, 10, 10, 126)
43 conv6_2_mbox_conf_flat: (?, ?)
44 conv6_2_mbox_priorbox: (?, 600, 8)
45 conv7_2_mbox_loc: (?, 5, 5, 24)
46 conv7_2_mbox_loc_flat: (?, ?)
47 conv7_2_mbox_conf: (?, 5, 5, 126)
48 conv7_2_mbox_conf_flat: (?, ?)
49 conv7_2_mbox_priorbox: (?, 150, 8)
50 conv8_2_mbox_loc: (?, 3, 3, 16)
51 conv8_2_mbox_loc_flat: (?, ?)
52 conv8_2_mbox_conf: (?, 3, 3, 84)
53 conv8_2_mbox_conf_flat: (?, ?)
54 conv8_2_mbox_priorbox: (?, 36, 8)
55 conv9_2_mbox_loc: (?, 1, 1, 16)
56 conv9_2_mbox_loc_flat: (?, ?)
57 conv9_2_mbox_conf: (?, 1, 1, 84)
58 conv9_2_mbox_conf_flat: (?, ?)
59 conv9_2_mbox_priorbox: (?, 4, 8)
60 mbox_loc: (?, 8732, 4)
61 mbox_conf: (?, 8732, 21)
62 mbox_priorbox: (?, 8732, 8)
63 predictions: (?, 8732, 33)
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
input_1 (InputLayer) (None, 300, 300, 3) 0
__________________________________________________________________________________________________
conv1_1 (Conv2D) (None, 300, 300, 64) 1792 input_1[0][0]
__________________________________________________________________________________________________
conv1_2 (Conv2D) (None, 300, 300, 64) 36928 conv1_1[0][0]
__________________________________________________________________________________________________
pool1 (MaxPooling2D) (None, 150, 150, 64) 0 conv1_2[0][0]
__________________________________________________________________________________________________
conv2_1 (Conv2D) (None, 150, 150, 128 73856 pool1[0][0]
__________________________________________________________________________________________________
conv2_2 (Conv2D) (None, 150, 150, 128 147584 conv2_1[0][0]
__________________________________________________________________________________________________
pool2 (MaxPooling2D) (None, 75, 75, 128) 0 conv2_2[0][0]
__________________________________________________________________________________________________
conv3_1 (Conv2D) (None, 75, 75, 256) 295168 pool2[0][0]
__________________________________________________________________________________________________
conv3_2 (Conv2D) (None, 75, 75, 256) 590080 conv3_1[0][0]
__________________________________________________________________________________________________
conv3_3 (Conv2D) (None, 75, 75, 256) 590080 conv3_2[0][0]
__________________________________________________________________________________________________
pool3 (MaxPooling2D) (None, 38, 38, 256) 0 conv3_3[0][0]
__________________________________________________________________________________________________
conv4_1 (Conv2D) (None, 38, 38, 512) 1180160 pool3[0][0]
__________________________________________________________________________________________________
conv4_2 (Conv2D) (None, 38, 38, 512) 2359808 conv4_1[0][0]
__________________________________________________________________________________________________
conv4_3 (Conv2D) (None, 38, 38, 512) 2359808 conv4_2[0][0]
__________________________________________________________________________________________________
pool4 (MaxPooling2D) (None, 19, 19, 512) 0 conv4_3[0][0]
__________________________________________________________________________________________________
conv5_1 (Conv2D) (None, 19, 19, 512) 2359808 pool4[0][0]
__________________________________________________________________________________________________
conv5_2 (Conv2D) (None, 19, 19, 512) 2359808 conv5_1[0][0]
__________________________________________________________________________________________________
conv5_3 (Conv2D) (None, 19, 19, 512) 2359808 conv5_2[0][0]
__________________________________________________________________________________________________
pool5 (MaxPooling2D) (None, 19, 19, 512) 0 conv5_3[0][0]
__________________________________________________________________________________________________
fc6 (Conv2D) (None, 19, 19, 1024) 4719616 pool5[0][0]
__________________________________________________________________________________________________
fc7 (Conv2D) (None, 19, 19, 1024) 1049600 fc6[0][0]
__________________________________________________________________________________________________
conv6_1 (Conv2D) (None, 19, 19, 256) 262400 fc7[0][0]
__________________________________________________________________________________________________
conv6_padding (ZeroPadding2D) (None, 21, 21, 256) 0 conv6_1[0][0]
__________________________________________________________________________________________________
conv6_2 (Conv2D) (None, 10, 10, 512) 1180160 conv6_padding[0][0]
__________________________________________________________________________________________________
conv7_1 (Conv2D) (None, 10, 10, 128) 65664 conv6_2[0][0]
__________________________________________________________________________________________________
conv7_padding (ZeroPadding2D) (None, 12, 12, 128) 0 conv7_1[0][0]
__________________________________________________________________________________________________
conv7_2 (Conv2D) (None, 5, 5, 256) 295168 conv7_padding[0][0]
__________________________________________________________________________________________________
conv8_1 (Conv2D) (None, 5, 5, 128) 32896 conv7_2[0][0]
__________________________________________________________________________________________________
conv8_2 (Conv2D) (None, 3, 3, 256) 295168 conv8_1[0][0]
__________________________________________________________________________________________________
conv9_1 (Conv2D) (None, 3, 3, 128) 32896 conv8_2[0][0]
__________________________________________________________________________________________________
conv4_3_norm (Normalize) (None, 38, 38, 512) 512 conv4_3[0][0]
__________________________________________________________________________________________________
conv9_2 (Conv2D) (None, 1, 1, 256) 295168 conv9_1[0][0]
__________________________________________________________________________________________________
conv4_3_norm_mbox_conf (Conv2D) (None, 38, 38, 84) 387156 conv4_3_norm[0][0]
__________________________________________________________________________________________________
fc7_mbox_conf (Conv2D) (None, 19, 19, 126) 1161342 fc7[0][0]
__________________________________________________________________________________________________
conv6_2_mbox_conf (Conv2D) (None, 10, 10, 126) 580734 conv6_2[0][0]
__________________________________________________________________________________________________
conv7_2_mbox_conf (Conv2D) (None, 5, 5, 126) 290430 conv7_2[0][0]
__________________________________________________________________________________________________
conv8_2_mbox_conf (Conv2D) (None, 3, 3, 84) 193620 conv8_2[0][0]
__________________________________________________________________________________________________
conv9_2_mbox_conf (Conv2D) (None, 1, 1, 84) 193620 conv9_2[0][0]
__________________________________________________________________________________________________
conv4_3_norm_mbox_loc (Conv2D) (None, 38, 38, 16) 73744 conv4_3_norm[0][0]
__________________________________________________________________________________________________
fc7_mbox_loc (Conv2D) (None, 19, 19, 24) 221208 fc7[0][0]
__________________________________________________________________________________________________
conv6_2_mbox_loc (Conv2D) (None, 10, 10, 24) 110616 conv6_2[0][0]
__________________________________________________________________________________________________
conv7_2_mbox_loc (Conv2D) (None, 5, 5, 24) 55320 conv7_2[0][0]
__________________________________________________________________________________________________
conv8_2_mbox_loc (Conv2D) (None, 3, 3, 16) 36880 conv8_2[0][0]
__________________________________________________________________________________________________
conv9_2_mbox_loc (Conv2D) (None, 1, 1, 16) 36880 conv9_2[0][0]
__________________________________________________________________________________________________
conv4_3_norm_mbox_conf_flat (Fl (None, 121296) 0 conv4_3_norm_mbox_conf[0][0]
__________________________________________________________________________________________________
fc7_mbox_conf_flat (Flatten) (None, 45486) 0 fc7_mbox_conf[0][0]
__________________________________________________________________________________________________
conv6_2_mbox_conf_flat (Flatten (None, 12600) 0 conv6_2_mbox_conf[0][0]
__________________________________________________________________________________________________
conv7_2_mbox_conf_flat (Flatten (None, 3150) 0 conv7_2_mbox_conf[0][0]
__________________________________________________________________________________________________
conv8_2_mbox_conf_flat (Flatten (None, 756) 0 conv8_2_mbox_conf[0][0]
__________________________________________________________________________________________________
conv9_2_mbox_conf_flat (Flatten (None, 84) 0 conv9_2_mbox_conf[0][0]
__________________________________________________________________________________________________
conv4_3_norm_mbox_loc_flat (Fla (None, 23104) 0 conv4_3_norm_mbox_loc[0][0]
__________________________________________________________________________________________________
fc7_mbox_loc_flat (Flatten) (None, 8664) 0 fc7_mbox_loc[0][0]
__________________________________________________________________________________________________
conv6_2_mbox_loc_flat (Flatten) (None, 2400) 0 conv6_2_mbox_loc[0][0]
__________________________________________________________________________________________________
conv7_2_mbox_loc_flat (Flatten) (None, 600) 0 conv7_2_mbox_loc[0][0]
__________________________________________________________________________________________________
conv8_2_mbox_loc_flat (Flatten) (None, 144) 0 conv8_2_mbox_loc[0][0]
__________________________________________________________________________________________________
conv9_2_mbox_loc_flat (Flatten) (None, 16) 0 conv9_2_mbox_loc[0][0]
__________________________________________________________________________________________________
mbox_conf (Concatenate) (None, 183372) 0 conv4_3_norm_mbox_conf_flat[0][0]
fc7_mbox_conf_flat[0][0]
conv6_2_mbox_conf_flat[0][0]
conv7_2_mbox_conf_flat[0][0]
conv8_2_mbox_conf_flat[0][0]
conv9_2_mbox_conf_flat[0][0]
__________________________________________________________________________________________________
mbox_loc (Concatenate) (None, 34928) 0 conv4_3_norm_mbox_loc_flat[0][0]
fc7_mbox_loc_flat[0][0]
conv6_2_mbox_loc_flat[0][0]
conv7_2_mbox_loc_flat[0][0]
conv8_2_mbox_loc_flat[0][0]
conv9_2_mbox_loc_flat[0][0]
__________________________________________________________________________________________________
mbox_conf_logits (Reshape) (None, 8732, 21) 0 mbox_conf[0][0]
__________________________________________________________________________________________________
conv4_3_norm_mbox_priorbox (Pri (None, 5776, 8) 0 conv4_3_norm[0][0]
__________________________________________________________________________________________________
fc7_mbox_priorbox (PriorBox) (None, 2166, 8) 0 fc7[0][0]
__________________________________________________________________________________________________
conv6_2_mbox_priorbox (PriorBox (None, 600, 8) 0 conv6_2[0][0]
__________________________________________________________________________________________________
conv7_2_mbox_priorbox (PriorBox (None, 150, 8) 0 conv7_2[0][0]
__________________________________________________________________________________________________
conv8_2_mbox_priorbox (PriorBox (None, 36, 8) 0 conv8_2[0][0]
__________________________________________________________________________________________________
conv9_2_mbox_priorbox (PriorBox (None, 4, 8) 0 conv9_2[0][0]
__________________________________________________________________________________________________
mbox_loc_final (Reshape) (None, 8732, 4) 0 mbox_loc[0][0]
__________________________________________________________________________________________________
mbox_conf_final (Activation) (None, 8732, 21) 0 mbox_conf_logits[0][0]
__________________________________________________________________________________________________
mbox_priorbox (Concatenate) (None, 8732, 8) 0 conv4_3_norm_mbox_priorbox[0][0]
fc7_mbox_priorbox[0][0]
conv6_2_mbox_priorbox[0][0]
conv7_2_mbox_priorbox[0][0]
conv8_2_mbox_priorbox[0][0]
conv9_2_mbox_priorbox[0][0]
__________________________________________________________________________________________________
predictions (Concatenate) (None, 8732, 33) 0 mbox_loc_final[0][0]
mbox_conf_final[0][0]
mbox_priorbox[0][0]
==================================================================================================
Total params: 26,285,486
Trainable params: 26,285,486
Non-trainable params: 0
__________________________________________________________________________________________________
'''
2.2 Classification, regression, and prior boxes per feature layer
2.2.1 Outputs generated for each feature layer
- box offsets (regression)
- class scores (classification)
- prior boxes
2.2.2 Classification and regression code
Regression: a 3x3 convolution producing (num_priors * 4) channels.
# process conv9_2
num_priors = 4
# localization head
# num_priors is the number of priors per grid cell; 4 channels each for the x, y, w, h adjustments
x = Conv2D(num_priors * 4, kernel_size=(3,3), padding='same',name='conv9_2_mbox_loc')(net['conv9_2'])
net['conv9_2_mbox_loc'] = x
net['conv9_2_mbox_loc_flat'] = Flatten(name='conv9_2_mbox_loc_flat')(net['conv9_2_mbox_loc'])
Classification: a 3x3 convolution producing (num_priors * num_classes) channels.
# num_priors priors per grid cell, each with num_classes class scores
x = Conv2D(num_priors * num_classes, kernel_size=(3,3), padding='same',name='conv9_2_mbox_conf')(net['conv9_2'])
net['conv9_2_mbox_conf'] = x
net['conv9_2_mbox_conf_flat'] = Flatten(name='conv9_2_mbox_conf_flat')(net['conv9_2_mbox_conf'])
Prior boxes:
- compute each prior's width and height
- compute each prior's center
- derive the top-left and bottom-right corners
priorbox = PriorBox(img_size, 264.0, max_size=315.0, aspect_ratios=[2],
variances=[0.1, 0.1, 0.2, 0.2],
name='conv9_2_mbox_priorbox')
net['conv9_2_mbox_priorbox'] = priorbox(net['conv9_2'])
# PriorBox(img_size, min_size, max_size=None, aspect_ratios=None,
#          flip=True, variances=[0.1], clip=True, **kwargs) explained below
def call(self, x, mask=None):
if hasattr(x, '_keras_shape'):
input_shape = x._keras_shape
elif hasattr(K, 'int_shape'):
input_shape = K.int_shape(x)
    # ------------------ #
    #  get width & height
    # ------------------ #
layer_width = input_shape[self.waxis]
layer_height = input_shape[self.haxis]
img_width = self.img_size[0]
img_height = self.img_size[1]
box_widths = []
box_heights = []
for ar in self.aspect_ratios:
if ar == 1 and len(box_widths) == 0:
box_widths.append(self.min_size)
box_heights.append(self.min_size)
elif ar == 1 and len(box_widths) > 0:
box_widths.append(np.sqrt(self.min_size * self.max_size))
box_heights.append(np.sqrt(self.min_size * self.max_size))
elif ar != 1:
box_widths.append(self.min_size * np.sqrt(ar))
box_heights.append(self.min_size / np.sqrt(ar))
box_widths = 0.5 * np.array(box_widths)
box_heights = 0.5 * np.array(box_heights)
step_x = img_width / layer_width
step_y = img_height / layer_height
linx = np.linspace(0.5 * step_x, img_width - 0.5 * step_x,
layer_width)
liny = np.linspace(0.5 * step_y, img_height - 0.5 * step_y,
layer_height)
centers_x, centers_y = np.meshgrid(linx, liny)
centers_x = centers_x.reshape(-1, 1)
centers_y = centers_y.reshape(-1, 1)
num_priors_ = len(self.aspect_ratios)
    # each prior uses two (centers_x, centers_y) pairs: the first for the top-left corner, the second for the bottom-right
prior_boxes = np.concatenate((centers_x, centers_y), axis=1)
prior_boxes = np.tile(prior_boxes, (1, 2 * num_priors_))
    # top-left and bottom-right corners of each prior
prior_boxes[:, ::4] -= box_widths
prior_boxes[:, 1::4] -= box_heights
prior_boxes[:, 2::4] += box_widths
prior_boxes[:, 3::4] += box_heights
    # convert to fractions of the image size
prior_boxes[:, ::2] /= img_width
prior_boxes[:, 1::2] /= img_height
prior_boxes = prior_boxes.reshape(-1, 4)
prior_boxes = np.minimum(np.maximum(prior_boxes, 0.0), 1.0)
num_boxes = len(prior_boxes)
if len(self.variances) == 1:
variances = np.ones((num_boxes, 4)) * self.variances[0]
elif len(self.variances) == 4:
variances = np.tile(self.variances, (num_boxes, 1))
else:
raise Exception('Must provide one or four variances.')
prior_boxes = np.concatenate((prior_boxes, variances), axis=1)
prior_boxes_tensor = K.expand_dims(K.variable(prior_boxes), 0)
pattern = [tf.shape(x)[0], 1, 1]
prior_boxes_tensor = tf.tile(prior_boxes_tensor, pattern)
return prior_boxes_tensor
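A quick check of the shapes above: with max_size given and flip=True, the PriorBox layer expands aspect_ratios=[2] internally to [1, 1', 2, 1/2] (num_priors = 4, where 1' is the extra sqrt(min_size*max_size) box) and aspect_ratios=[2, 3] to six ratios (num_priors = 6). So conv4_3's 38x38 map contributes 38*38*4 = 5776 priors, matching conv4_3_norm_mbox_priorbox: (?, 5776, 8), and the six layers together give 38^2*4 + 19^2*6 + 10^2*6 + 5^2*6 + 3^2*4 + 1^2*4 = 5776 + 2166 + 600 + 150 + 36 + 4 = 8732.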
3 Building the Labels
y = bbox_util.assign_boxes(y): [box_num, 4+cls] --> [8732, 4+1+cls+8]
3.1 Flow
An image carries several ground-truth boxes; each ground-truth box must be matched to prior boxes that will learn to predict it.
3.2 Steps
- Encode every ground-truth box against all priors by IoU, giving encoded_box of shape [box_num, 8732, 4+1(iou)].
- Filter encoded_box: take the max IoU and its argmax along axis 0, keep the priors whose best IoU > 0, then read off each kept prior's encoding of its best-matching ground-truth box.
3.3 Code
def assign_boxes(self, boxes):
    # boxes.shape = [-1, 4 + num_classes]; builds y_true: y = self.bbox_util.assign_boxes(y)
    assignment = np.zeros((self.num_priors, 4 + self.num_classes + 8))  # assignment.shape = (8732, 33); y.shape = (7, 24)
    assignment[:, 4] = 1.0  # background probability
    if len(boxes) == 0:
        return assignment
    # compute the IoU-based encoding of every ground-truth box against all priors
    # e.g. with 7 ground-truth boxes: encoded_boxes.shape = (7, 43660), 43660 = 8732 * 5
    encoded_boxes = np.apply_along_axis(self.encode_box, 1, boxes[:, :4])  # per box: [num_priors, 4 + 1]
    encoded_boxes = encoded_boxes.reshape(-1, self.num_priors, 5)  # (7, 8732, 5): encoded offsets + iou
    # a prior can overlap several ground-truth boxes but may only regress one,
    # so for each prior find the ground-truth box it matches best
    best_iou = encoded_boxes[:, :, -1].max(axis=0)         # (8732,) best iou per prior
    best_iou_idx = encoded_boxes[:, :, -1].argmax(axis=0)  # (8732,) index of the best ground-truth box per prior
    best_iou_mask = best_iou > 0  # keep priors with iou > 0
    best_iou_idx = best_iou_idx[best_iou_mask]  # e.g. best_iou_idx.shape = (64,)
    assign_num = len(best_iou_idx)  # number of priors used as positives, e.g. 64
    # keep each positive prior's encoding of its best-matching ground-truth box
    encoded_boxes = encoded_boxes[:, best_iou_mask, :]  # encoded_boxes.shape = (7, 64, 5)
    assignment[:, :4][best_iou_mask] = encoded_boxes[best_iou_idx, np.arange(assign_num), :4]  # offsets
    # column 4 is the background probability: 0 for positives
    assignment[:, 4][best_iou_mask] = 0
    assignment[:, 5:-8][best_iou_mask] = boxes[best_iou_idx, 4:]  # class one-hot
    assignment[:, -8][best_iou_mask] = 1  # objectness flag; the trailing 8 columns keep y_true aligned with y_pred's prior + variance slots
    # assign_boxes thus produces the target the network should predict for this image
    return assignment  # assignment.shape = (8732, 33), 33 = 4 + 21 + 8
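assign_boxes delegates to self.encode_box, which is not reproduced in this post. Below is a minimal sketch of what it does, reconstructed as the inverse of decode_boxes in section 4.2.4; treat the body as an assumption (including the self.priors layout and the overlap_threshold attribute), and note the real method also has a fallback when no prior passes the threshold:
def encode_box(self, box, return_iou=True):
    # box: one ground-truth box [x1, y1, x2, y2] in fractional coordinates
    priors = self.priors[:, :4]  # (num_priors, 4) priors as x1,y1,x2,y2
    # IoU between this box and every prior
    inter_ul = np.maximum(priors[:, :2], box[:2])
    inter_br = np.minimum(priors[:, 2:4], box[2:])
    inter_wh = np.maximum(inter_br - inter_ul, 0)
    inter = inter_wh[:, 0] * inter_wh[:, 1]
    area_gt = (box[2] - box[0]) * (box[3] - box[1])
    area_priors = (priors[:, 2] - priors[:, 0]) * (priors[:, 3] - priors[:, 1])
    iou = inter / (area_gt + area_priors - inter)
    encoded_box = np.zeros((len(priors), 4 + return_iou))
    assign_mask = iou > self.overlap_threshold  # only encode sufficiently overlapping priors
    # centers and sizes of the ground-truth box and of the masked priors
    box_center = 0.5 * (box[:2] + box[2:])
    box_wh = box[2:] - box[:2]
    prior_center = 0.5 * (priors[assign_mask, :2] + priors[assign_mask, 2:4])
    prior_wh = priors[assign_mask, 2:4] - priors[assign_mask, :2]
    # offsets are the inverse of decode_boxes, divided by the variances
    variances = self.priors[assign_mask, -4:]
    encoded_box[assign_mask, :2] = (box_center - prior_center) / prior_wh / variances[:, :2]
    encoded_box[assign_mask, 2:4] = np.log(box_wh / prior_wh) / variances[:, 2:]
    if return_iou:
        encoded_box[:, -1][assign_mask] = iou[assign_mask]
    return encoded_box.ravel()  # flattened to num_priors * 5 for np.apply_along_axis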
4 Prediction
4.1 Prediction flow
Read an image path, detect, display.
4.1.1 Steps
- Instantiate the model
- Read the image
- Detect the objects in the image and display them
4.1.2 Code
predict.py
from ssd import SSD
from PIL import Image
ssd = SSD()
while True:
img = input('Picture path:')
try:
image = Image.open(img)
except:
print('Open Error! Try again!')
continue
else:
r_image = ssd.detect_image(image)
r_image.show()
ssd.close_session()
4.2 Detecting objects
r_image = ssd.detect_image(image)
4.2.1 Steps
- Pad the image with gray bars so all inputs share the same size.
- Preprocess (normalize) the image and run the forward pass.
- Decode the predictions --> filter --> NMS --> keep the top_k.
- Keep the boxes scoring above confidence; results columns are [label, conf, det_xmin, det_ymin, det_xmax, det_ymax].
- Remove the gray bars.
- Draw each box and label it with its class.
4.2.2 Detection code
def detect_image(self, image):
    image_shape = np.array(np.shape(image)[0:2])  # original image size
    crop_img, x_offset, y_offset = letterbox_image(image, (self.model_image_size[0], self.model_image_size[1]))  # add gray bars
    photo = np.array(crop_img, dtype=np.float64)  # photo.shape = (300,300,3)
    # preprocess (normalize) the image and predict
    photo = preprocess_input(np.reshape(photo, [1, self.model_image_size[0], self.model_image_size[1], 3]))
    preds = self.ssd_model.predict(photo)  # predictions (Concatenate): (None, 8732, 33), 4+21+8 = 33
    # decode the predictions --> filter --> NMS --> keep the top_k
    results = self.bbox_util.detection_out(preds, confidence_threshold=self.confidence)
    if len(results[0]) <= 0:
        return image
    # keep boxes scoring above confidence; results columns: [label, conf, det_xmin, det_ymin, det_xmax, det_ymax]
    det_label = results[0][:, 0]
    det_conf = results[0][:, 1]
    det_xmin, det_ymin, det_xmax, det_ymax = results[0][:, 2], results[0][:, 3], results[0][:, 4], results[0][:, 5]
    top_indices = [i for i, conf in enumerate(det_conf) if conf >= self.confidence]
    top_conf = det_conf[top_indices]
    top_label_indices = det_label[top_indices].tolist()
    top_xmin, top_ymin, top_xmax, top_ymax = np.expand_dims(det_xmin[top_indices], -1), np.expand_dims(
        det_ymin[top_indices], -1), np.expand_dims(det_xmax[top_indices], -1), np.expand_dims(det_ymax[top_indices],
                                                                                              -1)
    # remove the gray bars
    boxes = ssd_correct_boxes(top_ymin, top_xmin, top_ymax, top_xmax,  # [200,4]
                              np.array([self.model_image_size[0], self.model_image_size[1]]), image_shape)
    font = ImageFont.truetype(font='model_data/simhei.ttf',
                              size=np.floor(3e-2 * np.shape(image)[1] + 0.5).astype('int32'))
    thickness = (np.shape(image)[0] + np.shape(image)[1]) // self.model_image_size[0]
    for i, c in enumerate(top_label_indices):  # e.g. [2.0, 15.0, 15.0, 15.0, 7.0]
        predicted_class = self.class_names[int(c) - 1]
        score = top_conf[i]
        top, left, bottom, right = boxes[i]  # np.shape(image) = (1330, 1330, 3)
        top = top - 5
        left = left - 5
        bottom = bottom + 5
        right = right + 5
        top = max(0, np.floor(top + 0.5).astype('int32'))  # round and clip so the box stays inside the image
        left = max(0, np.floor(left + 0.5).astype('int32'))
        bottom = min(np.shape(image)[0], np.floor(bottom + 0.5).astype('int32'))
        right = min(np.shape(image)[1], np.floor(right + 0.5).astype('int32'))
        # draw the box
        label = '{} {:.2f}'.format(predicted_class, score)
        draw = ImageDraw.Draw(image)
        label_size = draw.textsize(label, font)
        label = label.encode('utf-8')
        print(label)
        # position of the label relative to the box
        if top - label_size[1] >= 0:
            text_origin = np.array([left, top - label_size[1]])  # xy
        else:
            text_origin = np.array([left, top + 1])
        for i in range(thickness):
            draw.rectangle(
                [left + i, top + i, right - i, bottom - i],
                outline=self.colors[int(c) - 1])  # bounding box
        draw.rectangle(
            [tuple(text_origin), tuple(text_origin + label_size)],
            fill=self.colors[int(c) - 1])  # label background
        draw.text(text_origin, str(label, 'UTF-8'), fill=(0, 0, 0), font=font)  # label text
        del draw
    return image
4.2.3 Padding the image with gray bars
Letterboxing pads the image with gray bars so every input reaches the same size while the original aspect ratio, and therefore the content, is preserved without distortion.
- Compute the ratio between the target size and the original size.
- Resize the image by the smaller ratio.
- Paste the resized image onto a gray canvas.
'''
crop_img, x_offset, y_offset = letterbox_image(image, (self.model_image_size[0], self.model_image_size[1]))
'''
import numpy as np
from PIL import Image
def letterbox_image(image, size):
    iw, ih = image.size  # PIL size is (width, height)
    w, h = size
    scale = min(w/iw, h/ih)
    nw = int(iw*scale)
    nh = int(ih*scale)
    image = image.resize((nw, nh), Image.BICUBIC)
    new_img = Image.new('RGB', size, (128, 128, 128))
    new_img.paste(image, ((w-nw)//2, (h-nh)//2))
    x_offset, y_offset = (w-nw)//2/300, (h-nh)//2/300
    return new_img, x_offset, y_offset
if __name__ == '__main__':
    # quick check: a 20x10 test image letterboxed to 15x15
    image = Image.fromarray(np.random.randint(0, 256, [10, 20], dtype=np.uint8)).convert('RGB')
    new_img, x_offset, y_offset = letterbox_image(image, size=(15, 15))
    print(new_img.size)
    print(x_offset, y_offset)
4.2.4 Decode --> filter --> NMS --> top_k
Steps:
- decode
- filter by score
- NMS
- sort by confidence and keep the top_k
# results = self.bbox_util.detection_out(preds, confidence_threshold=self.confidence)
def detection_out(self, predictions, background_label_id=0, keep_top_k=200,
                  confidence_threshold=0.5):
    # network output layout [4+1+20+4+4]: loc offsets + confidences (background + classes) + prior box + variances [0.1,0.1,0.2,0.2]
    mbox_loc = predictions[:, :, :4]          # (1, 8732, 4)
    variances = predictions[:, :, -4:]        # (1, 8732, 4), i.e. 0.1,0.1,0.2,0.2
    mbox_priorbox = predictions[:, :, -8:-4]  # (1, 8732, 4) prior boxes
    mbox_conf = predictions[:, :, 4:-8]       # (1, 8732, 21) confidences
    results = []
    # process every image in the batch
    for i in range(len(mbox_loc)):
        results.append([])
        ### 1. decode
        decode_bbox = self.decode_boxes(mbox_loc[i], mbox_priorbox[i], variances[i])  # decode_bbox.shape: (8732, 4)
        for c in range(self.num_classes):
            if c == background_label_id:  # index 0 is the background class
                continue
            c_confs = mbox_conf[i, :, c]
            c_confs_m = c_confs > confidence_threshold
            if len(c_confs[c_confs_m]) > 0:
                # keep boxes scoring above confidence_threshold
                boxes_to_process = decode_bbox[c_confs_m]
                confs_to_process = c_confs[c_confs_m]
                # IoU-based non-maximum suppression
                feed_dict = {self.boxes: boxes_to_process,
                             self.scores: confs_to_process}
                idx = self.sess.run(self.nms, feed_dict=feed_dict)
                # keep the boxes that survive NMS
                good_boxes = boxes_to_process[idx]
                confs = confs_to_process[idx][:, None]  # reshape into a column
                # stack label, confidence, and box coordinates
                labels = c * np.ones((len(idx), 1))  # c is the class index
                c_pred = np.concatenate((labels, confs, good_boxes),
                                        axis=1)
                # append to the results
                results[-1].extend(c_pred)
        if len(results[-1]) > 0:
            # sort by confidence
            results[-1] = np.array(results[-1])
            argsort = np.argsort(results[-1][:, 1])[::-1]
            results[-1] = results[-1][argsort]
            # keep the keep_top_k most confident detections
            results[-1] = results[-1][:keep_top_k]
    return results
(1) Decode:
- recover each prior's width, height, and center
- compute the predicted center, width, and height
- convert to top-left and bottom-right corners
'''
decode step
decode_bbox = self.decode_boxes(mbox_loc[i], mbox_priorbox[i], variances[i])  # decode_bbox.shape: (8732, 4)
'''
def decode_boxes(self, mbox_loc, mbox_priorbox, variances):
    # 1. prior box width and height: x1,y1,x2,y2 --> cx,cy,w,h
    prior_width = mbox_priorbox[:, 2] - mbox_priorbox[:, 0]
    prior_height = mbox_priorbox[:, 3] - mbox_priorbox[:, 1]
    # prior box center
    prior_center_x = 0.5 * (mbox_priorbox[:, 2] + mbox_priorbox[:, 0])
    prior_center_y = 0.5 * (mbox_priorbox[:, 3] + mbox_priorbox[:, 1])
    # 2. offset of the predicted center from the prior center
    decode_bbox_center_x = mbox_loc[:, 0] * prior_width * variances[:, 0]
    decode_bbox_center_x += prior_center_x
    decode_bbox_center_y = mbox_loc[:, 1] * prior_height * variances[:, 1]
    decode_bbox_center_y += prior_center_y
    # predicted width and height
    decode_bbox_width = np.exp(mbox_loc[:, 2] * variances[:, 2])
    decode_bbox_width *= prior_width
    decode_bbox_height = np.exp(mbox_loc[:, 3] * variances[:, 3])
    decode_bbox_height *= prior_height
    # 3. top-left and bottom-right corners of the predicted box
    decode_bbox_xmin = decode_bbox_center_x - 0.5 * decode_bbox_width
    decode_bbox_ymin = decode_bbox_center_y - 0.5 * decode_bbox_height
    decode_bbox_xmax = decode_bbox_center_x + 0.5 * decode_bbox_width
    decode_bbox_ymax = decode_bbox_center_y + 0.5 * decode_bbox_height
    # stack the corners
    decode_bbox = np.concatenate((decode_bbox_xmin[:, None],
                                  decode_bbox_ymin[:, None],
                                  decode_bbox_xmax[:, None],
                                  decode_bbox_ymax[:, None]), axis=-1)
    # clip to [0, 1]
    decode_bbox = np.minimum(np.maximum(decode_bbox, 0.0), 1.0)
    return decode_bbox
(2) Filter
'''
c_confs_m = c_confs > confidence_threshold
if len(c_confs[c_confs_m]) > 0:
    # keep boxes scoring above confidence_threshold
    boxes_to_process = decode_bbox[c_confs_m]
    confs_to_process = c_confs[c_confs_m]
'''
(3) NMS
'''
feed_dict = {self.boxes: boxes_to_process,
self.scores: confs_to_process}
idx = self.sess.run(self.nms, feed_dict=feed_dict)
'''
(4) Sort by confidence and keep the top_k
'''
results[-1].extend(c_pred)
if len(results[-1]) > 0:
    # sort by confidence
    results[-1] = np.array(results[-1])
    argsort = np.argsort(results[-1][:, 1])[::-1]
    results[-1] = results[-1][argsort]
    # keep the keep_top_k most confident detections
    results[-1] = results[-1][:keep_top_k]
'''
4.2.5 Keeping the boxes scoring above confidence
det_label = results[0][:, 0]
det_conf = results[0][:, 1]
det_xmin, det_ymin, det_xmax, det_ymax = results[0][:, 2], results[0][:, 3], results[0][:, 4], results[0][:, 5]
top_indices = [i for i, conf in enumerate(det_conf) if conf >= self.confidence]
top_conf = det_conf[top_indices]
top_label_indices = det_label[top_indices].tolist()
top_xmin, top_ymin, top_xmax, top_ymax = np.expand_dims(det_xmin[top_indices], -1), np.expand_dims(
det_ymin[top_indices], -1), np.expand_dims(det_xmax[top_indices], -1), np.expand_dims(det_ymax[top_indices], -1)
4.2.6 Removing the gray bars
- Compute offset and scale; convert the box corners to center + height/width.
- Apply box_yx = (box_yx - offset) * scale and box_hw *= scale.
- Convert center + height/width back to corners.
- Map the boxes back onto the original image.
'''
boxes = ssd_correct_boxes(top_ymin, top_xmin, top_ymax, top_xmax, # [200,4]
np.array([self.model_image_size[0], self.model_image_size[1]]), image_shape)
'''
def ssd_correct_boxes(top, left, bottom, right, input_shape, image_shape):
new_shape = image_shape*np.min(input_shape/image_shape)
offset = (input_shape-new_shape)/2./input_shape
scale = input_shape/new_shape
box_yx = np.concatenate(((top+bottom)/2,(left+right)/2),axis=-1)
box_hw = np.concatenate((bottom-top,right-left),axis=-1)
box_yx = (box_yx - offset) * scale
box_hw *= scale
box_mins = box_yx - (box_hw / 2.)
box_maxes = box_yx + (box_hw / 2.)
boxes = np.concatenate([
box_mins[:, 0:1],
box_mins[:, 1:2],
box_maxes[:, 0:1],
box_maxes[:, 1:2]
],axis=-1)
print(np.shape(boxes))
boxes *= np.concatenate([image_shape, image_shape],axis=-1)
return boxes
4.2.7 Drawing the boxes
for i, c in enumerate(top_label_indices):  # e.g. [2.0, 15.0, 15.0, 15.0, 7.0]
    predicted_class = self.class_names[int(c) - 1]
    score = top_conf[i]
    top, left, bottom, right = boxes[i]  # np.shape(image) = (1330, 1330, 3)
    top = top - 5
    left = left - 5
    bottom = bottom + 5
    right = right + 5
    top = max(0, np.floor(top + 0.5).astype('int32'))  # round and clip so the box stays inside the image
    left = max(0, np.floor(left + 0.5).astype('int32'))
    bottom = min(np.shape(image)[0], np.floor(bottom + 0.5).astype('int32'))
    right = min(np.shape(image)[1], np.floor(right + 0.5).astype('int32'))
    # draw the box
    label = '{} {:.2f}'.format(predicted_class, score)
    draw = ImageDraw.Draw(image)
    label_size = draw.textsize(label, font)
    label = label.encode('utf-8')
    print(label)
    # position of the label relative to the box
    if top - label_size[1] >= 0:
        text_origin = np.array([left, top - label_size[1]])  # xy
    else:
        text_origin = np.array([left, top + 1])
    for i in range(thickness):
        draw.rectangle(
            [left + i, top + i, right - i, bottom - i],
            outline=self.colors[int(c) - 1])  # bounding box
    draw.rectangle(
        [tuple(text_origin), tuple(text_origin + label_size)],
        fill=self.colors[int(c) - 1])  # label background
    draw.text(text_origin, str(label, 'UTF-8'), fill=(0, 0, 0), font=font)  # label text
    del draw
return image
5 Training
5.1 Flow
- Build the labels: y = bbox_util.assign_boxes(y)  # y[boxes + cls]
- Load the model: model = SSD300(input_shape, num_classes=NUM_CLASSES)
- Set the training callbacks: logging + checkpoint + reduce_lr + early_stopping
- Build the data generator: gen.generate(True)
- Train: model.fit_generator() with loss = MultiboxLoss() (a minimal script sketch follows this list)
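For orientation, a minimal sketch of a train.py implementing the flow above. The module paths (nets.ssd, nets.ssd_training, utils.utils), the prior-box pickle, and all hyper-parameters are assumptions based on the imports used elsewhere in this post, not the repo's exact script:
import pickle
from keras.callbacks import TensorBoard, ModelCheckpoint, ReduceLROnPlateau, EarlyStopping
from keras.optimizers import Adam
from nets.ssd import SSD300                             # assumed module layout
from nets.ssd_training import MultiboxLoss, Generator   # assumed module layout
from utils.utils import BBoxUtility

NUM_CLASSES = 21  # 20 VOC classes + background
input_shape = (300, 300, 3)
BATCH_SIZE = 16

# priors used for label assignment (the pickle path is an assumption)
priors = pickle.load(open('model_data/prior_boxes_ssd300.pkl', 'rb'))
bbox_util = BBoxUtility(NUM_CLASSES, priors)

# annotation lines produced by voc_annotation.py, split 9:1 into train/val
with open('2007_train.txt') as f:
    lines = f.readlines()
num_val = int(len(lines) * 0.1)
train_lines, val_lines = lines[:-num_val], lines[-num_val:]

# load the model
model = SSD300(input_shape, num_classes=NUM_CLASSES)

# training callbacks: logging + checkpoint + reduce_lr + early_stopping
logging = TensorBoard(log_dir='logs')
checkpoint = ModelCheckpoint('logs/ep{epoch:03d}-val_loss{val_loss:.3f}.h5',
                             monitor='val_loss', save_weights_only=True, save_best_only=True)
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=2)
early_stopping = EarlyStopping(monitor='val_loss', min_delta=0, patience=6)

# data generator (section 5.2) and the multibox loss (section 5.1.3)
gen = Generator(bbox_util, BATCH_SIZE, train_lines, val_lines,
                (input_shape[0], input_shape[1]), NUM_CLASSES)
model.compile(optimizer=Adam(lr=5e-4),
              loss=MultiboxLoss(NUM_CLASSES, neg_pos_ratio=3.0).compute_loss)
model.fit_generator(gen.generate(True),
                    steps_per_epoch=len(train_lines) // BATCH_SIZE,
                    validation_data=gen.generate(False),
                    validation_steps=len(val_lines) // BATCH_SIZE,
                    epochs=30,
                    callbacks=[logging, checkpoint, reduce_lr, early_stopping])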
5.1.1 bbox_util steps
See section 3, Building the Labels.
5.1.2 Loss steps
- Smooth L1 loss
- softmax cross-entropy loss
- regression loss on the positives plus classification loss on positives and hard-mined negatives
5.1.3 Loss code
def compute_loss(self, y_true, y_pred):
    batch_size = tf.shape(y_true)[0]              # number of images in the batch
    num_boxes = tf.to_float(tf.shape(y_true)[1])  # number of priors per image: 8732
    # compute all the losses
    # classification loss
    # batch_size,8732,4(gt)+1(bg)+21(cls)+4(anchor)+4(variance) -> batch_size,8732
    conf_loss = self._softmax_loss(y_true[:, :, 4:-8],
                                   y_pred[:, :, 4:-8])
    # localization loss
    # batch_size,8732,4 -> batch_size,8732
    loc_loss = self._l1_smooth_loss(y_true[:, :, :4],
                                    y_pred[:, :, :4])
    # collect the losses of the positive priors
    # number of positives per image, e.g. num_pos = array([5., 3., 1., 2., 2., 4.])
    num_pos = tf.reduce_sum(y_true[:, :, -8], axis=-1)
    # positive localization loss per image
    pos_loc_loss = tf.reduce_sum(loc_loss * y_true[:, :, -8],
                                 axis=1)
    # positive classification loss per image
    pos_conf_loss = tf.reduce_sum(conf_loss * y_true[:, :, -8],
                                  axis=1)
    # number of negatives to mine per image: neg_pos_ratio * num_pos, capped by the priors left over
    num_neg = tf.minimum(self.neg_pos_ratio * num_pos,
                         num_boxes - num_pos)
    # which images actually have negatives to mine
    pos_num_neg_mask = tf.greater(num_neg, 0)
    # 1.0 if at least one image has negatives
    has_min = tf.to_float(tf.reduce_any(pos_num_neg_mask))
    # if no image has negatives, fall back to negatives_for_hard
    num_neg = tf.concat(axis=0, values=[num_neg,
                        [(1 - has_min) * self.negatives_for_hard]])
    # average number of negatives per image
    num_neg_batch = tf.reduce_mean(tf.boolean_mask(num_neg,
                                                   tf.greater(num_neg, 0)))
    num_neg_batch = tf.to_int32(num_neg_batch)
    # class confidences start at index 5 (after the 4 loc values and the background column)
    confs_start = 4 + self.background_label_id + 1  # confs_start = 5
    confs_end = confs_start + self.num_classes - 1  # confs_end = 25
    # for priors that should NOT predict an object, take their highest class confidence
    # and keep the top_k of them as hard negatives
    max_confs = tf.reduce_max(y_pred[:, :, confs_start:confs_end],
                              axis=2)
    _, indices = tf.nn.top_k(max_confs * (1 - y_true[:, :, -8]),
                             k=num_neg_batch)
    # flatten the (batch, prior) indices into one dimension
    batch_idx = tf.expand_dims(tf.range(0, batch_size), 1)
    batch_idx = tf.tile(batch_idx, (1, num_neg_batch))
    full_indices = (tf.reshape(batch_idx, [-1]) * tf.to_int32(num_boxes) +
                    tf.reshape(indices, [-1]))
    # full_indices = tf.concat(2, [tf.expand_dims(batch_idx, 2),
    #                              tf.expand_dims(indices, 2)])
    # neg_conf_loss = tf.gather_nd(conf_loss, full_indices)
    neg_conf_loss = tf.gather(tf.reshape(conf_loss, [-1]),
                              full_indices)
    neg_conf_loss = tf.reshape(neg_conf_loss,
                               [batch_size, num_neg_batch])
    neg_conf_loss = tf.reduce_sum(neg_conf_loss, axis=1)
    # loss is the sum over positives and mined negatives
    num_pos = tf.where(tf.not_equal(num_pos, 0), num_pos,
                       tf.ones_like(num_pos))  # avoid dividing by zero
    total_loss = tf.reduce_sum(pos_conf_loss) + tf.reduce_sum(neg_conf_loss)
    total_loss /= tf.reduce_sum(num_pos)
    total_loss += tf.reduce_sum(self.alpha * pos_loc_loss) / tf.reduce_sum(num_pos)
    return total_loss
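compute_loss calls two helpers that are not reproduced above. A minimal sketch of what they typically look like in this kind of codebase (treat the exact bodies as assumptions):
def _l1_smooth_loss(self, y_true, y_pred):
    # Smooth L1: quadratic below 1, linear above, summed over the 4 box coordinates
    abs_loss = tf.abs(y_true - y_pred)
    sq_loss = 0.5 * (y_true - y_pred) ** 2
    l1_loss = tf.where(tf.less(abs_loss, 1.0), sq_loss, abs_loss - 0.5)
    return tf.reduce_sum(l1_loss, -1)

def _softmax_loss(self, y_true, y_pred):
    # cross-entropy on the already-softmaxed confidences, clipped for numerical stability
    y_pred = tf.maximum(tf.minimum(y_pred, 1 - 1e-15), 1e-15)
    softmax_loss = -tf.reduce_sum(y_true * tf.log(y_pred), axis=-1)
    return softmax_loss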
5.2.1 gen.generate(True) steps
- augment the data
- match priors to the ground-truth boxes
5.2.2 gen.generate(True) code
def generate(self, train=True):
    while True:
        if train:
            # shuffle the training data
            shuffle(self.train_lines)
            lines = self.train_lines
        else:
            shuffle(self.val_lines)
            lines = self.val_lines
        inputs = []
        targets = []
        for annotation_line in lines:
            # data augmentation; y has shape [n, 4 + cls]
            img, y = self.get_random_data(annotation_line, self.image_size[0:2])
            if len(y) != 0:
                boxes = np.array(y[:, :4], dtype=np.float32)
                boxes[:, 0] = boxes[:, 0] / self.image_size[1]
                boxes[:, 1] = boxes[:, 1] / self.image_size[0]
                boxes[:, 2] = boxes[:, 2] / self.image_size[1]
                boxes[:, 3] = boxes[:, 3] / self.image_size[0]
                one_hot_label = np.eye(self.num_classes)[np.array(y[:, 4], np.int32)]  # background not included
                if ((boxes[:, 3] - boxes[:, 1]) <= 0).any() and ((boxes[:, 2] - boxes[:, 0]) <= 0).any():
                    continue
                y = np.concatenate([boxes, one_hot_label], axis=-1)  # y[4 + 20(one_hot)]
            # encoding step: y[boxes + cls]
            y = self.bbox_util.assign_boxes(y)  # match priors by IoU, encode, and build y_true[4+1+cls+8]
            inputs.append(img)
            targets.append(y)
            if len(targets) == self.batch_size:  # a full batch is ready
                tmp_inp = np.array(inputs)
                tmp_targets = np.array(targets)
                inputs = []
                targets = []
                yield preprocess_input(tmp_inp), tmp_targets
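get_random_data is where the data augmentation named among SSD's innovations happens; it is not reproduced in this post. Below is a deliberately simplified, hypothetical stand-in (letterbox resize plus a random horizontal flip) just to show the expected input/output contract; the real repo version additionally jitters scale, aspect ratio, and HSV:
import numpy as np
from PIL import Image

def get_random_data(annotation_line, input_shape):
    # annotation_line: "path x1,y1,x2,y2,cls x1,y1,x2,y2,cls ..."
    line = annotation_line.split()
    image = Image.open(line[0]).convert('RGB')
    iw, ih = image.size
    h, w = input_shape
    box = np.array([list(map(int, b.split(','))) for b in line[1:]], dtype=np.float32)
    # letterbox resize onto a gray canvas (cf. letterbox_image in 4.2.3)
    scale = min(w / iw, h / ih)
    nw, nh = int(iw * scale), int(ih * scale)
    dx, dy = (w - nw) // 2, (h - nh) // 2
    canvas = Image.new('RGB', (w, h), (128, 128, 128))
    canvas.paste(image.resize((nw, nh), Image.BICUBIC), (dx, dy))
    image_data = np.array(canvas, np.float32)
    # map the boxes into the resized image
    if len(box) > 0:
        box[:, [0, 2]] = box[:, [0, 2]] * scale + dx
        box[:, [1, 3]] = box[:, [1, 3]] * scale + dy
    # random horizontal flip
    if np.random.rand() < 0.5:
        image_data = image_data[:, ::-1, :]
        if len(box) > 0:
            box[:, [0, 2]] = w - box[:, [2, 0]]
    return image_data, box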
6 Model Evaluation
6.1 Flow
- Collect the predicted boxes
- Collect the ground-truth boxes
- Compute the mAP
6.1.1 Collecting the predicted boxes
- Preprocess (normalize) the image
- Decode the predictions
- Keep the boxes scoring above confidence
- Remove the gray bars
6.1.2 Collecting the ground-truth boxes
- Read the ground-truth boxes from the XML annotations
6.1.3 Computing the mAP
- Compute precision and recall, using IoU to decide which detections match a ground truth
- Compute the area under the P-R curve (a sketch of this step follows)
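The last step is implemented inside the cloned mAP repository (get_map.py, shown below). For reference, a minimal sketch of the VOC-style area-under-P-R computation it performs, assuming rec and prec are the cumulative recall/precision lists ordered by descending confidence:
def voc_ap(rec, prec):
    # pad the curve so it starts at recall 0 and ends at recall 1
    mrec = [0.0] + list(rec) + [1.0]
    mpre = [0.0] + list(prec) + [0.0]
    # make precision monotonically non-increasing from right to left
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # sum rectangle areas wherever recall increases
    ap = 0.0
    for i in range(1, len(mrec)):
        ap += (mrec[i] - mrec[i - 1]) * mpre[i]
    return ap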
6.1.4 get_dr_txt.py code
#----------------------------------------------------#
#----------------------------------------------------#
from keras.layers import Input
from ssd import SSD
from PIL import Image
from keras.applications.imagenet_utils import preprocess_input
from utils_.utils import BBoxUtility,letterbox_image,ssd_correct_boxes
import numpy as np
import os
class mAP_SSD(SSD):
    #---------------------------------------------------#
    #   detect a single image
    #---------------------------------------------------#
def detect_image(self,image_id,image):
self.confidence = 0.05
f = open("./input/detection-results/"+image_id+".txt","w")
image_shape = np.array(np.shape(image)[0:2])
crop_img,x_offset,y_offset = letterbox_image(image, (self.model_image_size[0],self.model_image_size[1]))
photo = np.array(crop_img,dtype = np.float64)
        # preprocess (normalize) the image
photo = preprocess_input(np.reshape(photo,[1,self.model_image_size[0],self.model_image_size[1],3]))
preds = self.ssd_model.predict(photo)
        # decode the predictions
results = self.bbox_util.detection_out(preds, confidence_threshold=self.confidence)
if len(results[0])<=0:
f.close()
return
        # keep the boxes scoring above confidence
det_label = results[0][:, 0]
det_conf = results[0][:, 1]
det_xmin, det_ymin, det_xmax, det_ymax = results[0][:, 2], results[0][:, 3], results[0][:, 4], results[0][:, 5]
top_indices = [i for i, conf in enumerate(det_conf) if conf >= self.confidence]
top_conf = det_conf[top_indices]
top_label_indices = det_label[top_indices].tolist()
top_xmin, top_ymin, top_xmax, top_ymax = np.expand_dims(det_xmin[top_indices],-1),np.expand_dims(det_ymin[top_indices],-1),np.expand_dims(det_xmax[top_indices],-1),np.expand_dims(det_ymax[top_indices],-1)
        # remove the gray bars
boxes = ssd_correct_boxes(top_ymin,top_xmin,top_ymax,top_xmax,np.array([self.model_image_size[0],self.model_image_size[1]]),image_shape)
for i, c in enumerate(top_label_indices):
predicted_class = self.class_names[int(c)-1]
score = str(top_conf[i])
top, left, bottom, right = boxes[i]
f.write("%s %s %s %s %s %s\n" % (predicted_class, score[:6], str(int(left)), str(int(top)), str(int(right)),str(int(bottom))))
f.close()
return
ssd = mAP_SSD()
image_ids = open('VOCdevkit/VOC2007/ImageSets/Main/train.txt').read().strip().split() # image_ids = open('VOCdevkit/VOC2007/ImageSets/Main/test.txt').read().strip().split()
if not os.path.exists("./input"):
os.makedirs("./input")
if not os.path.exists("./input/detection-results"):
os.makedirs("./input/detection-results")
if not os.path.exists("./input/images-optional"):
os.makedirs("./input/images-optional")
for image_id in image_ids:
image_path = "./VOCdevkit/VOC2007/JPEGImages/"+image_id+".jpg"
image = Image.open(image_path)
image.save("./input/images-optional/"+image_id+".jpg")
ssd.detect_image(image_id,image)
print(image_id," done!")
print("Conversion completed!")
6.1.5 get_gt_txt.py code
#----------------------------------------------------#
#----------------------------------------------------#
import sys
import os
import glob
import xml.etree.ElementTree as ET
image_ids = open('VOCdevkit/VOC2007/ImageSets/Main/train.txt').read().strip().split() # image_ids = open('VOCdevkit/VOC2007/ImageSets/Main/test.txt').read().strip().split()
if not os.path.exists("./input"):
    os.makedirs("./input")
if not os.path.exists("./input/ground-truth"):
    os.makedirs("./input/ground-truth")
for image_id in image_ids:
    with open("./input/ground-truth/"+image_id+".txt", "w") as new_f:
        root = ET.parse("VOCdevkit/VOC2007/Annotations/"+image_id+".xml").getroot()
        for obj in root.findall('object'):
            if obj.find('difficult') is not None:
                difficult = obj.find('difficult').text
                if int(difficult) == 1:
                    continue
            obj_name = obj.find('name').text
            bndbox = obj.find('bndbox')
            left = bndbox.find('xmin').text
            top = bndbox.find('ymin').text
            right = bndbox.find('xmax').text
            bottom = bndbox.find('ymax').text
            new_f.write("%s %s %s %s %s\n" % (obj_name, left, top, right, bottom))
print("Conversion completed!")
6.1.6 get_map.py code
import glob
import json
import os
import shutil
import operator
import sys
import argparse
import math
import numpy as np
#----------------------------------------------------#
#   Computes the mAP
#   Code cloned from https://github.com/Cartucho/mAP
#----------------------------------------------------#
MINOVERLAP = 0.5 # default value (defined in the PASCAL VOC2012 challenge)
parser = argparse.ArgumentParser()
parser.add_argument('-na', '--no-animation', help="no animation is shown.", action="store_true")
parser.add_argument('-np', '--no-plot', help="no plot is shown.", action="store_true")
parser.add_argument('-q', '--quiet', help="minimalistic console output.", action="store_true")
# argparse receiving list of classes to be ignored
parser.add_argument('-i', '--ignore', nargs='+', type=str, help="ignore a list of classes.")
# argparse receiving list of classes with specific IoU (e.g., python main.py --set-class-iou person 0.7)
parser.add_argument('--set-class-iou', nargs='+', type=str, help="set IoU for a specific class.")
args = parser.parse_args()
'''
 0,0 ------> x (width)
  |
  |  (Left,Top) (x1,y1)
  |      *_________
  |      |         |
  |      |         |
  y      |_________|
(height)           *
            (Right,Bottom) (x2,y2)
'''
# if there are no classes to ignore then replace None by empty list
if args.ignore is None:
args.ignore = []
specific_iou_flagged = False
if args.set_class_iou is not None:
specific_iou_flagged = True
# make sure that the cwd() is the location of the python script (so that every path makes sense)
os.chdir(os.path.dirname(os.path.abspath(__file__)))
GT_PATH = os.path.join(os.getcwd(), 'input', 'ground-truth')
DR_PATH = os.path.join(os.getcwd(), 'input', 'detection-results')
# if there are no images then no animation can be shown
IMG_PATH = os.path.join(os.getcwd(), 'input', 'images-optional')
if os.path.exists(IMG_PATH):
for dirpath, dirnames, files in os.walk(IMG_PATH):
if not files:
# no image files found
args.no_animation = True
else:
args.no_animation = True
# try to import OpenCV if the user didn't choose the option --no-animation
show_animation = False
if not args.no_animation:
try:
import cv2
show_animation = True
except ImportError:
print("\"opencv-python\" not found, please install to visualize the results.")
args.no_animation = True
# try to import Matplotlib if the user didn't choose the option --no-plot
draw_plot = False
if not args.no_plot:
try:
import matplotlib.pyplot as plt
draw_plot = True
except ImportError:
print("\"matplotlib\" not found, please install it to get the resulting plots.")
args.no_plot = True
def log_average_miss_rate(precision, fp_cumsum, num_images):
"""
log-average miss rate:
Calculated by averaging miss rates at 9 evenly spaced FPPI points
between 1e-2 and 1e0 (0.01 and 1 false positives per image), in log-space.
output:
lamr | log-average miss rate
mr | miss rate
fppi | false positives per image
references:
[1] Dollar, Piotr, et al. "Pedestrian Detection: An Evaluation of the
State of the Art." Pattern Analysis and Machine Intelligence, IEEE
Transactions on 34.4 (2012): 743 - 761.
"""
# if there were no detections of that class
if precision.size == 0:
lamr = 0
mr = 1
fppi = 0
return lamr, mr, fppi
fppi = fp_cumsum / float(num_images)
mr = (1 - precision)
fppi_tmp = np.insert(fppi, 0, -1.0)
mr_tmp = np.insert(mr, 0, 1.0)
# Use 9 evenly spaced reference points in log-space
ref = np.logspace(-2.0, 0.0, num = 9)
for i, ref_i in enumerate(ref):
# np.where() will always find at least 1 index, since min(ref) = 0.01 and min(fppi_tmp) = -1.0
j = np.where(fppi_tmp <= ref_i)[-1][-1]
ref[i] = mr_tmp[j]
# log(0) is undefined, so we use np.maximum(1e-10, ref)
lamr = math.exp(np.mean(np.log(np.maximum(1e-10, ref))))
return lamr, mr, fppi
"""
throw error and exit
"""
def error(msg):
print(msg)
sys.exit(0)
"""
check if the number is a float between 0.0 and 1.0
"""
def is_float_between_0_and_1(value):
try:
val = float(value)
if val > 0.0 and val < 1.0:
return True
else:
return False
except ValueError:
return False
"""
Calculate the AP given the recall and precision array
1st) We compute a version of the measured precision/recall curve with
precision monotonically decreasing
2nd) We compute the AP as the area under this curve by numerical integration.
"""
def voc_ap(rec, prec):
"""
--- Official matlab code VOC2012---
mrec=[0 ; rec ; 1];
mpre=[0 ; prec ; 0];
for i=numel(mpre)-1:-1:1
mpre(i)=max(mpre(i),mpre(i+1));
end
i=find(mrec(2:end)~=mrec(1:end-1))+1;
ap=sum((mrec(i)-mrec(i-1)).*mpre(i));
"""
rec.insert(0, 0.0) # insert 0.0 at beginning of list
rec.append(1.0) # insert 1.0 at end of list
mrec = rec[:]
prec.insert(0, 0.0) # insert 0.0 at beginning of list
prec.append(0.0) # insert 0.0 at end of list
mpre = prec[:]
"""
This part makes the precision monotonically decreasing
(goes from the end to the beginning)
matlab: for i=numel(mpre)-1:-1:1
mpre(i)=max(mpre(i),mpre(i+1));
"""
# matlab indexes start in 1 but python in 0, so I have to do:
# range(start=(len(mpre) - 2), end=0, step=-1)
# also the python function range excludes the end, resulting in:
# range(start=(len(mpre) - 2), end=-1, step=-1)
for i in range(len(mpre)-2, -1, -1):
mpre[i] = max(mpre[i], mpre[i+1])
"""
This part creates a list of indexes where the recall changes
matlab: i=find(mrec(2:end)~=mrec(1:end-1))+1;
"""
i_list = []
for i in range(1, len(mrec)):
if mrec[i] != mrec[i-1]:
i_list.append(i) # if it was matlab would be i + 1
"""
The Average Precision (AP) is the area under the curve
(numerical integration)
matlab: ap=sum((mrec(i)-mrec(i-1)).*mpre(i));
"""
ap = 0.0
for i in i_list:
ap += ((mrec[i]-mrec[i-1])*mpre[i])
return ap, mrec, mpre
"""
Convert the lines of a file to a list
"""
def file_lines_to_list(path):
# open txt file lines to a list
with open(path) as f:
content = f.readlines()
# remove whitespace characters like `\n` at the end of each line
content = [x.strip() for x in content]
return content
"""
Draws text in image
"""
def draw_text_in_image(img, text, pos, color, line_width):
font = cv2.FONT_HERSHEY_PLAIN
fontScale = 1
lineType = 1
bottomLeftCornerOfText = pos
cv2.putText(img, text,
bottomLeftCornerOfText,
font,
fontScale,
color,
lineType)
text_width, _ = cv2.getTextSize(text, font, fontScale, lineType)[0]
return img, (line_width + text_width)
"""
Plot - adjust axes
"""
def adjust_axes(r, t, fig, axes):
# get text width for re-scaling
bb = t.get_window_extent(renderer=r)
text_width_inches = bb.width / fig.dpi
# get axis width in inches
current_fig_width = fig.get_figwidth()
new_fig_width = current_fig_width + text_width_inches
proportion = new_fig_width / current_fig_width
# get axis limit
x_lim = axes.get_xlim()
axes.set_xlim([x_lim[0], x_lim[1]*proportion])
"""
Draw plot using Matplotlib
"""
def draw_plot_func(dictionary, n_classes, window_title, plot_title, x_label, output_path, to_show, plot_color, true_p_bar):
# sort the dictionary by decreasing value, into a list of tuples
sorted_dic_by_value = sorted(dictionary.items(), key=operator.itemgetter(1))
# unpacking the list of tuples into two lists
sorted_keys, sorted_values = zip(*sorted_dic_by_value)
#
if true_p_bar != "":
"""
Special case to draw in:
- green -> TP: True Positives (object detected and matches ground-truth)
- red -> FP: False Positives (object detected but does not match ground-truth)
- orange -> FN: False Negatives (object not detected but present in the ground-truth)
"""
fp_sorted = []
tp_sorted = []
for key in sorted_keys:
fp_sorted.append(dictionary[key] - true_p_bar[key])
tp_sorted.append(true_p_bar[key])
plt.barh(range(n_classes), fp_sorted, align='center', color='crimson', label='False Positive')
plt.barh(range(n_classes), tp_sorted, align='center', color='forestgreen', label='True Positive', left=fp_sorted)
# add legend
plt.legend(loc='lower right')
"""
Write number on side of bar
"""
fig = plt.gcf() # gcf - get current figure
axes = plt.gca()
r = fig.canvas.get_renderer()
for i, val in enumerate(sorted_values):
fp_val = fp_sorted[i]
tp_val = tp_sorted[i]
fp_str_val = " " + str(fp_val)
tp_str_val = fp_str_val + " " + str(tp_val)
# trick to paint multicolor with offset:
# first paint everything and then repaint the first number
t = plt.text(val, i, tp_str_val, color='forestgreen', va='center', fontweight='bold')
plt.text(val, i, fp_str_val, color='crimson', va='center', fontweight='bold')
if i == (len(sorted_values)-1): # largest bar
adjust_axes(r, t, fig, axes)
else:
plt.barh(range(n_classes), sorted_values, color=plot_color)
"""
Write number on side of bar
"""
fig = plt.gcf() # gcf - get current figure
axes = plt.gca()
r = fig.canvas.get_renderer()
for i, val in enumerate(sorted_values):
str_val = " " + str(val) # add a space before
if val < 1.0:
str_val = " {0:.2f}".format(val)
t = plt.text(val, i, str_val, color=plot_color, va='center', fontweight='bold')
# re-set axes to show number inside the figure
if i == (len(sorted_values)-1): # largest bar
adjust_axes(r, t, fig, axes)
# set window title
fig.canvas.set_window_title(window_title)
# write classes in y axis
tick_font_size = 12
plt.yticks(range(n_classes), sorted_keys, fontsize=tick_font_size)
"""
Re-scale height accordingly
"""
init_height = fig.get_figheight()
# compute the matrix height in points and inches
dpi = fig.dpi
height_pt = n_classes * (tick_font_size * 1.4) # 1.4 (some spacing)
height_in = height_pt / dpi
# compute the required figure height
top_margin = 0.15 # in percentage of the figure height
bottom_margin = 0.05 # in percentage of the figure height
figure_height = height_in / (1 - top_margin - bottom_margin)
# set new height
if figure_height > init_height:
fig.set_figheight(figure_height)
# set plot title
plt.title(plot_title, fontsize=14)
# set axis titles
# plt.xlabel('classes')
plt.xlabel(x_label, fontsize='large')
# adjust size of window
fig.tight_layout()
# save the plot
fig.savefig(output_path)
# show image
if to_show:
plt.show()
# close the plot
plt.close()
"""
Create a ".temp_files/" and "results/" directory
"""
TEMP_FILES_PATH = ".temp_files"
if not os.path.exists(TEMP_FILES_PATH): # if it doesn't exist already
os.makedirs(TEMP_FILES_PATH)
results_files_path = "results"
if os.path.exists(results_files_path): # if it exist already
# reset the results directory
shutil.rmtree(results_files_path)
os.makedirs(results_files_path)
if draw_plot:
os.makedirs(os.path.join(results_files_path, "classes"))
if show_animation:
os.makedirs(os.path.join(results_files_path, "images", "detections_one_by_one"))
"""
ground-truth
Load each of the ground-truth files into a temporary ".json" file.
Create a list of all the class names present in the ground-truth (gt_classes).
"""
# get a list with the ground-truth files
ground_truth_files_list = glob.glob(GT_PATH + '/*.txt')
if len(ground_truth_files_list) == 0:
error("Error: No ground-truth files found!")
ground_truth_files_list.sort()
# dictionary with counter per class
gt_counter_per_class = {}
counter_images_per_class = {}
for txt_file in ground_truth_files_list:
#print(txt_file)
file_id = txt_file.split(".txt", 1)[0]
file_id = os.path.basename(os.path.normpath(file_id))
# check if there is a correspondent detection-results file
temp_path = os.path.join(DR_PATH, (file_id + ".txt"))
if not os.path.exists(temp_path):
error_msg = "Error. File not found: {}\n".format(temp_path)
error_msg += "(You can avoid this error message by running extra/intersect-gt-and-dr.py)"
error(error_msg)
lines_list = file_lines_to_list(txt_file)
# create ground-truth dictionary
bounding_boxes = []
is_difficult = False
already_seen_classes = []
for line in lines_list:
try:
if "difficult" in line:
class_name, left, top, right, bottom, _difficult = line.split()
is_difficult = True
else:
class_name, left, top, right, bottom = line.split()
except ValueError:
error_msg = "Error: File " + txt_file + " in the wrong format.\n"
error_msg += " Expected: <class_name> <left> <top> <right> <bottom> ['difficult']\n"
error_msg += " Received: " + line
error_msg += "\n\nIf you have a <class_name> with spaces between words you should remove them\n"
error_msg += "by running the script \"remove_space.py\" or \"rename_class.py\" in the \"extra/\" folder."
error(error_msg)
# check if class is in the ignore list, if yes skip
if class_name in args.ignore:
continue
bbox = left + " " + top + " " + right + " " +bottom
if is_difficult:
bounding_boxes.append({"class_name":class_name, "bbox":bbox, "used":False, "difficult":True})
is_difficult = False
else:
bounding_boxes.append({"class_name":class_name, "bbox":bbox, "used":False})
# count that object
if class_name in gt_counter_per_class:
gt_counter_per_class[class_name] += 1
else:
# if class didn't exist yet
gt_counter_per_class[class_name] = 1
if class_name not in already_seen_classes:
if class_name in counter_images_per_class:
counter_images_per_class[class_name] += 1
else:
# if class didn't exist yet
counter_images_per_class[class_name] = 1
already_seen_classes.append(class_name)
# dump bounding_boxes into a ".json" file
with open(TEMP_FILES_PATH + "/" + file_id + "_ground_truth.json", 'w') as outfile:
json.dump(bounding_boxes, outfile)
gt_classes = list(gt_counter_per_class.keys())
# let's sort the classes alphabetically
gt_classes = sorted(gt_classes)
n_classes = len(gt_classes)
#print(gt_classes)
#print(gt_counter_per_class)
"""
Check format of the flag --set-class-iou (if used)
e.g. check if class exists
"""
if specific_iou_flagged:
n_args = len(args.set_class_iou)
error_msg = \
'\n --set-class-iou [class_1] [IoU_1] [class_2] [IoU_2] [...]'
if n_args % 2 != 0:
error('Error, missing arguments. Flag usage:' + error_msg)
# [class_1] [IoU_1] [class_2] [IoU_2]
# specific_iou_classes = ['class_1', 'class_2']
specific_iou_classes = args.set_class_iou[::2] # even
# iou_list = ['IoU_1', 'IoU_2']
iou_list = args.set_class_iou[1::2] # odd
if len(specific_iou_classes) != len(iou_list):
error('Error, missing arguments. Flag usage:' + error_msg)
for tmp_class in specific_iou_classes:
if tmp_class not in gt_classes:
error('Error, unknown class \"' + tmp_class + '\". Flag usage:' + error_msg)
for num in iou_list:
if not is_float_between_0_and_1(num):
error('Error, IoU must be between 0.0 and 1.0. Flag usage:' + error_msg)
"""
detection-results
Load each of the detection-results files into a temporary ".json" file.
"""
# get a list with the detection-results files
dr_files_list = glob.glob(DR_PATH + '/*.txt')
dr_files_list.sort()
for class_index, class_name in enumerate(gt_classes):
bounding_boxes = []
for txt_file in dr_files_list:
#print(txt_file)
# the first time it checks if all the corresponding ground-truth files exist
file_id = txt_file.split(".txt",1)[0]
file_id = os.path.basename(os.path.normpath(file_id))
temp_path = os.path.join(GT_PATH, (file_id + ".txt"))
if class_index == 0:
if not os.path.exists(temp_path):
error_msg = "Error. File not found: {}\n".format(temp_path)
error_msg += "(You can avoid this error message by running extra/intersect-gt-and-dr.py)"
error(error_msg)
lines = file_lines_to_list(txt_file)
for line in lines:
try:
tmp_class_name, confidence, left, top, right, bottom = line.split()
except ValueError:
error_msg = "Error: File " + txt_file + " in the wrong format.\n"
error_msg += " Expected: <class_name> <confidence> <left> <top> <right> <bottom>\n"
error_msg += " Received: " + line
error(error_msg)
if tmp_class_name == class_name:
#print("match")
bbox = left + " " + top + " " + right + " " +bottom
bounding_boxes.append({"confidence":confidence, "file_id":file_id, "bbox":bbox})
#print(bounding_boxes)
# sort detection-results by decreasing confidence
bounding_boxes.sort(key=lambda x:float(x['confidence']), reverse=True)
with open(TEMP_FILES_PATH + "/" + class_name + "_dr.json", 'w') as outfile:
json.dump(bounding_boxes, outfile)
"""
Calculate the AP for each class
"""
sum_AP = 0.0
ap_dictionary = {}
lamr_dictionary = {}
# open file to store the results
with open(results_files_path + "/results.txt", 'w') as results_file:
results_file.write("# AP and precision/recall per class\n")
count_true_positives = {}
for class_index, class_name in enumerate(gt_classes):
count_true_positives[class_name] = 0
"""
Load detection-results of that class
"""
dr_file = TEMP_FILES_PATH + "/" + class_name + "_dr.json"
dr_data = json.load(open(dr_file))
"""
Assign detection-results to ground-truth objects
"""
nd = len(dr_data)
tp = [0] * nd # creates an array of zeros of size nd
fp = [0] * nd
for idx, detection in enumerate(dr_data):
file_id = detection["file_id"]
if show_animation:
# find ground truth image
ground_truth_img = glob.glob1(IMG_PATH, file_id + ".*")
#tifCounter = len(glob.glob1(myPath,"*.tif"))
if len(ground_truth_img) == 0:
error("Error. Image not found with id: " + file_id)
elif len(ground_truth_img) > 1:
error("Error. Multiple image with id: " + file_id)
else: # found image
#print(IMG_PATH + "/" + ground_truth_img[0])
# Load image
img = cv2.imread(IMG_PATH + "/" + ground_truth_img[0])
# load image with draws of multiple detections
img_cumulative_path = results_files_path + "/images/" + ground_truth_img[0]
if os.path.isfile(img_cumulative_path):
img_cumulative = cv2.imread(img_cumulative_path)
else:
img_cumulative = img.copy()
# Add bottom border to image
bottom_border = 60
BLACK = [0, 0, 0]
img = cv2.copyMakeBorder(img, 0, bottom_border, 0, 0, cv2.BORDER_CONSTANT, value=BLACK)
# assign detection-results to ground truth object if any
# open ground-truth with that file_id
gt_file = TEMP_FILES_PATH + "/" + file_id + "_ground_truth.json"
ground_truth_data = json.load(open(gt_file))
ovmax = -1
gt_match = -1
# load detected object bounding-box
bb = [ float(x) for x in detection["bbox"].split() ]
for obj in ground_truth_data:
# look for a class_name match
if obj["class_name"] == class_name:
bbgt = [ float(x) for x in obj["bbox"].split() ]
bi = [max(bb[0],bbgt[0]), max(bb[1],bbgt[1]), min(bb[2],bbgt[2]), min(bb[3],bbgt[3])]
iw = bi[2] - bi[0] + 1
ih = bi[3] - bi[1] + 1
if iw > 0 and ih > 0:
# compute overlap (IoU) = area of intersection / area of union
ua = (bb[2] - bb[0] + 1) * (bb[3] - bb[1] + 1) + (bbgt[2] - bbgt[0]
+ 1) * (bbgt[3] - bbgt[1] + 1) - iw * ih
ov = iw * ih / ua
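# Note: the "+ 1" terms above follow the PASCAL VOC convention of inclusive
# pixel coordinates, so a box spanning x1..x2 is (x2 - x1 + 1) pixels wide.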
if ov > ovmax:
ovmax = ov
gt_match = obj
# assign detection as true positive/don't care/false positive
if show_animation:
status = "NO MATCH FOUND!" # status is only used in the animation
# set minimum overlap
min_overlap = MINOVERLAP
if specific_iou_flagged:
if class_name in specific_iou_classes:
index = specific_iou_classes.index(class_name)
min_overlap = float(iou_list[index])
if ovmax >= min_overlap:
if "difficult" not in gt_match:
if not bool(gt_match["used"]):
# true positive
tp[idx] = 1
gt_match["used"] = True
count_true_positives[class_name] += 1
# update the ".json" file
with open(gt_file, 'w') as f:
f.write(json.dumps(ground_truth_data))
if show_animation:
status = "MATCH!"
else:
# false positive (multiple detection)
fp[idx] = 1
if show_animation:
status = "REPEATED MATCH!"
else:
# false positive
fp[idx] = 1
if ovmax > 0:
status = "INSUFFICIENT OVERLAP"
"""
Draw image to show animation
"""
if show_animation:
height, width = img.shape[:2]
# colors (OpenCV works with BGR)
white = (255,255,255)
light_blue = (255,200,100)
green = (0,255,0)
light_red = (30,30,255)
# 1st line
margin = 10
v_pos = int(height - margin - (bottom_border / 2.0))
text = "Image: " + ground_truth_img[0] + " "
img, line_width = draw_text_in_image(img, text, (margin, v_pos), white, 0)
text = "Class [" + str(class_index) + "/" + str(n_classes) + "]: " + class_name + " "
img, line_width = draw_text_in_image(img, text, (margin + line_width, v_pos), light_blue, line_width)
if ovmax != -1:
color = light_red
if status == "INSUFFICIENT OVERLAP":
text = "IoU: {0:.2f}% ".format(ovmax*100) + "< {0:.2f}% ".format(min_overlap*100)
else:
text = "IoU: {0:.2f}% ".format(ovmax*100) + ">= {0:.2f}% ".format(min_overlap*100)
color = green
img, _ = draw_text_in_image(img, text, (margin + line_width, v_pos), color, line_width)
# 2nd line
v_pos += int(bottom_border / 2.0)
rank_pos = str(idx+1) # rank position (idx starts at 0)
text = "Detection #rank: " + rank_pos + " confidence: {0:.2f}% ".format(float(detection["confidence"])*100)
img, line_width = draw_text_in_image(img, text, (margin, v_pos), white, 0)
color = light_red
if status == "MATCH!":
color = green
text = "Result: " + status + " "
img, line_width = draw_text_in_image(img, text, (margin + line_width, v_pos), color, line_width)
font = cv2.FONT_HERSHEY_SIMPLEX
if ovmax > 0: # if the bounding boxes intersect
bbgt = [ int(round(float(x))) for x in gt_match["bbox"].split() ]
cv2.rectangle(img,(bbgt[0],bbgt[1]),(bbgt[2],bbgt[3]),light_blue,2)
cv2.rectangle(img_cumulative,(bbgt[0],bbgt[1]),(bbgt[2],bbgt[3]),light_blue,2)
cv2.putText(img_cumulative, class_name, (bbgt[0],bbgt[1] - 5), font, 0.6, light_blue, 1, cv2.LINE_AA)
bb = [int(i) for i in bb]
cv2.rectangle(img,(bb[0],bb[1]),(bb[2],bb[3]),color,2)
cv2.rectangle(img_cumulative,(bb[0],bb[1]),(bb[2],bb[3]),color,2)
cv2.putText(img_cumulative, class_name, (bb[0],bb[1] - 5), font, 0.6, color, 1, cv2.LINE_AA)
# show image
cv2.imshow("Animation", img)
cv2.waitKey(20) # show for 20 ms
# save image to results
output_img_path = results_files_path + "/images/detections_one_by_one/" + class_name + "_detection" + str(idx) + ".jpg"
cv2.imwrite(output_img_path, img)
# save the image with all the objects drawn to it
cv2.imwrite(img_cumulative_path, img_cumulative)
#print(tp)
# compute precision/recall
cumsum = 0
for idx, val in enumerate(fp):
fp[idx] += cumsum
cumsum += val
cumsum = 0
for idx, val in enumerate(tp):
tp[idx] += cumsum
cumsum += val
#print(tp)
rec = tp[:]
for idx, val in enumerate(tp):
rec[idx] = float(tp[idx]) / gt_counter_per_class[class_name]
#print(rec)
prec = tp[:]
for idx, val in enumerate(tp):
prec[idx] = float(tp[idx]) / (fp[idx] + tp[idx])
#print(prec)
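# Example: matches [TP, FP, TP] give cumulative tp = [1, 1, 2], fp = [0, 1, 1],
# hence rec = [0.5, 0.5, 1.0] and prec = [1.0, 0.5, 0.67] for 2 ground-truth boxes.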
ap, mrec, mprec = voc_ap(rec[:], prec[:])
sum_AP += ap
text = "{0:.2f}%".format(ap*100) + " = " + class_name + " AP " #class_name + " AP = {0:.2f}%".format(ap*100)
"""
Write to results.txt
"""
rounded_prec = [ '%.2f' % elem for elem in prec ]
rounded_rec = [ '%.2f' % elem for elem in rec ]
results_file.write(text + "\n Precision: " + str(rounded_prec) + "\n Recall :" + str(rounded_rec) + "\n\n")
if not args.quiet:
print(text)
ap_dictionary[class_name] = ap
n_images = counter_images_per_class[class_name]
lamr, mr, fppi = log_average_miss_rate(np.array(rec), np.array(fp), n_images)
lamr_dictionary[class_name] = lamr
"""
Draw plot
"""
if draw_plot:
plt.plot(rec, prec, '-o')
# add a new penultimate point to the list (mrec[-2], 0.0)
# since the last line segment (and respective area) do not affect the AP value
area_under_curve_x = mrec[:-1] + [mrec[-2]] + [mrec[-1]]
area_under_curve_y = mprec[:-1] + [0.0] + [mprec[-1]]
plt.fill_between(area_under_curve_x, 0, area_under_curve_y, alpha=0.2, edgecolor='r')
# set window title
fig = plt.gcf() # gcf - get current figure
fig.canvas.set_window_title('AP ' + class_name)
# set plot title
plt.title('class: ' + text)
#plt.suptitle('This is a somewhat long figure title', fontsize=16)
# set axis titles
plt.xlabel('Recall')
plt.ylabel('Precision')
# optional - set axes
axes = plt.gca() # gca - get current axes
axes.set_xlim([0.0,1.0])
axes.set_ylim([0.0,1.05]) # .05 to give some extra space
# Alternative option -> wait for button to be pressed
#while not plt.waitforbuttonpress(): pass # wait for key display
# Alternative option -> normal display
#plt.show()
# save the plot
fig.savefig(results_files_path + "/classes/" + class_name + ".png")
plt.cla() # clear axes for next plot
if show_animation:
cv2.destroyAllWindows()
results_file.write("\n# mAP of all classes\n")
mAP = sum_AP / n_classes
text = "mAP = {0:.2f}%".format(mAP*100)
results_file.write(text + "\n")
print(text)
# remove the temp_files directory
shutil.rmtree(TEMP_FILES_PATH)
"""
Count total of detection-results
"""
# iterate through all the files
det_counter_per_class = {}
for txt_file in dr_files_list:
# get lines to list
lines_list = file_lines_to_list(txt_file)
for line in lines_list:
class_name = line.split()[0]
# check if class is in the ignore list, if yes skip
if class_name in args.ignore:
continue
# count that object
if class_name in det_counter_per_class:
det_counter_per_class[class_name] += 1
else:
# if class didn't exist yet
det_counter_per_class[class_name] = 1
#print(det_counter_per_class)
dr_classes = list(det_counter_per_class.keys())
"""
Plot the total number of occurrences of each class in the ground-truth
"""
if draw_plot:
window_title = "ground-truth-info"
plot_title = "ground-truth\n"
plot_title += "(" + str(len(ground_truth_files_list)) + " files and " + str(n_classes) + " classes)"
x_label = "Number of objects per class"
output_path = results_files_path + "/ground-truth-info.png"
to_show = False
plot_color = 'forestgreen'
draw_plot_func(
gt_counter_per_class,
n_classes,
window_title,
plot_title,
x_label,
output_path,
to_show,
plot_color,
'',
)
"""
Write number of ground-truth objects per class to results.txt
"""
with open(results_files_path + "/results.txt", 'a') as results_file:
results_file.write("\n# Number of ground-truth objects per class\n")
for class_name in sorted(gt_counter_per_class):
results_file.write(class_name + ": " + str(gt_counter_per_class[class_name]) + "\n")
"""
Finish counting true positives
"""
for class_name in dr_classes:
# if class exists in detection-result but not in ground-truth then there are no true positives in that class
if class_name not in gt_classes:
count_true_positives[class_name] = 0
#print(count_true_positives)
"""
Plot the total number of occurrences of each class in the "detection-results" folder
"""
if draw_plot:
window_title = "detection-results-info"
# Plot title
plot_title = "detection-results\n"
plot_title += "(" + str(len(dr_files_list)) + " files and "
count_non_zero_values_in_dictionary = sum(int(x) > 0 for x in list(det_counter_per_class.values()))
plot_title += str(count_non_zero_values_in_dictionary) + " detected classes)"
# end Plot title
x_label = "Number of objects per class"
output_path = results_files_path + "/detection-results-info.png"
to_show = False
plot_color = 'forestgreen'
true_p_bar = count_true_positives
draw_plot_func(
det_counter_per_class,
len(det_counter_per_class),
window_title,
plot_title,
x_label,
output_path,
to_show,
plot_color,
true_p_bar
)
"""
Write number of detected objects per class to results.txt
"""
with open(results_files_path + "/results.txt", 'a') as results_file:
results_file.write("\n# Number of detected objects per class\n")
for class_name in sorted(dr_classes):
n_det = det_counter_per_class[class_name]
text = class_name + ": " + str(n_det)
text += " (tp:" + str(count_true_positives[class_name]) + ""
text += ", fp:" + str(n_det - count_true_positives[class_name]) + ")\n"
results_file.write(text)
"""
Draw log-average miss rate plot (Show lamr of all classes in decreasing order)
"""
if draw_plot:
window_title = "lamr"
plot_title = "log-average miss rate"
x_label = "log-average miss rate"
output_path = results_files_path + "/lamr.png"
to_show = False
plot_color = 'royalblue'
draw_plot_func(
lamr_dictionary,
n_classes,
window_title,
plot_title,
x_label,
output_path,
to_show,
plot_color,
""
)
"""
Draw mAP plot (Show AP's of all classes in decreasing order)
"""
if draw_plot:
window_title = "mAP"
plot_title = "mAP = {0:.2f}%".format(mAP*100)
x_label = "Average Precision"
output_path = results_files_path + "/mAP.png"
to_show = True
plot_color = 'royalblue'
draw_plot_func(
ap_dictionary,
n_classes,
window_title,
plot_title,
x_label,
output_path,
to_show,
plot_color,
""
)
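To run the evaluation end to end: execute get_dr_txt.py and get_gt_txt.py first to populate input/detection-results and input/ground-truth, then run get_map.py; the per-class P-R plots and the final mAP are written to the results/ directory. For example:
python get_dr_txt.py
python get_gt_txt.py
python get_map.py --no-animation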
7 Training on Your Own Dataset
7.1 Workflow
- Draw the ground-truth boxes with a labeling tool such as labelImg, then put the images into VOCdevkit/VOC2007/JPEGImages and the annotation files into VOCdevkit/VOC2007/Annotations.
- Modify "model_path" and "classes_path" in ssd.py (a sketch follows this list).
- Modify model.load_weights in train.py.
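A minimal sketch of those two edits; the paths are illustrative, and the _defaults dictionary name is an assumption about how ssd.py stores its configuration:
# ssd.py (assumed config dictionary; adjust to match the actual file)
_defaults = {
    "model_path": 'logs/my_trained_weights.h5',    # weights produced by train.py
    "classes_path": 'model_data/my_classes.txt',   # one class name per line
}
# train.py: load pretrained weights, skipping layers whose shape changed
model.load_weights('model_data/ssd_weights.h5', by_name=True, skip_mismatch=True)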
Copyright notice: this article is the original work of CSDN blogger 「山居秋暝LS」, released under the CC 4.0 BY-SA license; please attach the original source link and this notice when reposting.
Original link: https://blog.csdn.net/qq_35732321/article/details/122324735