1. Environment
After finally getting everything to work, I found that version mismatches are a real trap.
My setup: Python 3.6 + CUDA 10.2 + PyTorch 1.7.1 + NumPy 1.15.1 + RTX 2060
(Suggestion: supposedly downgrading PyTorch to version 1.2 or below avoids most of the problems. But my CUDA is 10.2, for which no GPU build of PyTorch 1.2 or below exists, and reinstalling was too much trouble, so I instead fixed the issues one by one, as described later.)
2. Downloading the project
- Configure it yourself
This uses the SSD-Pytorch git project. Since it requires access to an overseas network, git clone sometimes fails; in that case download the zip archive directly and extract it locally:
Project page: https://github.com/amdegroot/ssd.pytorch
git clone https://gitcode.net/mirrors/amdegroot/ssd.pytorch.git
My first download failed and the git page would not even load; it only worked after switching to a VPN.
- Download the already-configured project (my fork, after I got everything working)
Page: https://github.com/625135449/SSD-Pytorch, which can be cloned directly:
git clone https://github.com/625135449/SSD-Pytorch
3. Preparing the dataset
My dataset is in darknet's YOLO format, so here I convert it to a VOC dataset (you can convert to a COCO dataset instead if you prefer).
3.1 Data structure
darknet format: an images directory holding the pictures and a labels directory holding one .txt annotation file per image.
VOC format: put all the xml labels in Annotations, all the images in JPEGImages, and put train.txt, trainval.txt, val.txt, and test.txt (each containing only image names, one per line) in ImageSets/Main. A small sketch for creating this skeleton follows.
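A minimal sketch (my addition, not part of the original project) that creates the empty VOC2021 skeleton; the root path matches the ones used in the scripts below:

```python
import os

# Assumed dataset root, matching the paths in the conversion scripts below
voc_root = '/home/ssd.pytorch/data/VOCdevkit/VOC2021'

for sub in ('Annotations', 'JPEGImages', 'ImageSets/Main'):
    os.makedirs(os.path.join(voc_root, sub), exist_ok=True)
```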
3.2 Converting darknet txt files to VOC xml files
Fill in the file paths and class names for your data:
```python
import os
import glob
from PIL import Image
from tqdm import tqdm

voc_annotations = '/home/ssd.pytorch/data/VOCdevkit/VOC2021/Annotations/'  # where the xml files will be written
yolo_txt = '/home/darknet/Helmet/labels/'   # darknet label (txt) directory
img_path = '/home/darknet/Helmet/images/'   # darknet image directory
labels = ['no helmet', 'wear helmet']       # darknet class names

src_img_dir = img_path   # image directory
src_txt_dir = yolo_txt   # txt label directory
src_xml_dir = voc_annotations

img_Lists = glob.glob(src_img_dir + '/*.jpg')
img_basenames = []
for item in img_Lists:
    img_basenames.append(os.path.basename(item))

img_names = []
for item in img_basenames:
    temp1, temp2 = os.path.splitext(item)
    img_names.append(temp1)

for img in tqdm(img_names):
    im = Image.open(src_img_dir + '/' + img + '.jpg')
    width, height = im.size
    # read the txt label file
    gt = open(src_txt_dir + '/' + img + '.txt').read().splitlines()
    if gt:
        # write the header of the xml file
        xml_file = open(src_xml_dir + '/' + img + '.xml', 'w')
        xml_file.write('<annotation>\n')
        xml_file.write('    <folder>VOC2007</folder>\n')
        xml_file.write('    <filename>' + str(img) + '.jpg' + '</filename>\n')
        xml_file.write('    <size>\n')
        xml_file.write('        <width>' + str(width) + '</width>\n')
        xml_file.write('        <height>' + str(height) + '</height>\n')
        xml_file.write('        <depth>3</depth>\n')
        xml_file.write('    </size>\n')
        # write one <object> block per line of the txt file
        for img_each_label in gt:
            spt = img_each_label.split(' ')  # if the txt is comma-separated, use img_each_label.split(',') instead
            xml_file.write('    <object>\n')
            xml_file.write('        <name>' + str(labels[int(spt[0])]) + '</name>\n')
            xml_file.write('        <pose>Unspecified</pose>\n')
            xml_file.write('        <truncated>0</truncated>\n')
            xml_file.write('        <difficult>0</difficult>\n')
            xml_file.write('        <bndbox>\n')
            # YOLO stores normalized (center_x, center_y, w, h); convert to pixel corners
            center_x = round(float(spt[1].strip()) * width)
            center_y = round(float(spt[2].strip()) * height)
            bbox_width = round(float(spt[3].strip()) * width)
            bbox_height = round(float(spt[4].strip()) * height)
            xmin = str(int(center_x - bbox_width / 2))
            ymin = str(int(center_y - bbox_height / 2))
            xmax = str(int(center_x + bbox_width / 2))
            ymax = str(int(center_y + bbox_height / 2))
            xml_file.write('            <xmin>' + xmin + '</xmin>\n')
            xml_file.write('            <ymin>' + ymin + '</ymin>\n')
            xml_file.write('            <xmax>' + xmax + '</xmax>\n')
            xml_file.write('            <ymax>' + ymax + '</ymax>\n')
            xml_file.write('        </bndbox>\n')
            xml_file.write('    </object>\n')
        xml_file.write('</annotation>')
        xml_file.close()  # the original script never closed the file; close it explicitly
```
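To sanity-check the coordinate math above: YOLO stores a box as normalized (center_x, center_y, width, height), and the script converts it to pixel-space corners. A worked example (my addition) with a hypothetical 640x480 image and the label line "0 0.5 0.5 0.25 0.5":

```python
width, height = 640, 480               # hypothetical image size
cx, cy, bw, bh = 0.5, 0.5, 0.25, 0.5   # hypothetical YOLO label "0 0.5 0.5 0.25 0.5"
center_x, center_y = cx * width, cy * height  # 320.0, 240.0
box_w, box_h = bw * width, bh * height        # 160.0, 240.0
print(int(center_x - box_w / 2), int(center_y - box_h / 2))  # 240 120 -> xmin, ymin
print(int(center_x + box_w / 2), int(center_y + box_h / 2))  # 400 360 -> xmax, ymax
```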
3.3 Automatically generating test.txt, train.txt, trainval.txt, and val.txt
Fill in the file paths for your data:
```python
import os
import random

trainval_percent = 0.66  # fraction of all images that goes to trainval
train_percent = 0.5      # fraction of trainval that goes to train

xmlfilepath = '/home/ssd.pytorch/data/VOCdevkit/VOC2021/Annotations'
txtsavepath = '/home/ssd.pytorch/data/VOCdevkit/VOC2021/ImageSets/Main'

total_xml = os.listdir(xmlfilepath)
num = len(total_xml)              # number of xml files
indices = range(num)              # renamed from `list` to avoid shadowing the builtin
tv = int(num * trainval_percent)  # 66% of the total
tr = int(tv * train_percent)
trainval = random.sample(indices, tv)
train = random.sample(trainval, tr)

ftrainval = open(os.path.join(txtsavepath, 'trainval.txt'), 'w')
ftest = open(os.path.join(txtsavepath, 'test.txt'), 'w')
ftrain = open(os.path.join(txtsavepath, 'train.txt'), 'w')
fval = open(os.path.join(txtsavepath, 'val.txt'), 'w')

for i in indices:
    name = total_xml[i][:-4] + '\n'  # strip the .xml extension
    if i in trainval:
        ftrainval.write(name)
        if i in train:
            ftrain.write(name)
        else:
            fval.write(name)
    else:
        ftest.write(name)

ftrainval.close()
ftrain.close()
fval.close()
ftest.close()
```
4. Working with the ssd.pytorch project
4.1 Creating the dataset
- This walkthrough uses a VOC dataset.
- If you have no dataset of your own, you can fetch the VOC and COCO datasets with the scripts bundled with the code (under ./data/scripts).
- If you have your own dataset:
- Create a VOCdevkit folder under the data folder.
- Copy the VOC2021 dataset converted above into the VOCdevkit folder, with the structure described in section 3.1 (if you use my fork, just run darknet_to_voc.py and split_txt.py under ./data/VOCdevkit/VOC2021).
4.2 Modifying the configuration files
The following uses my data as the example:
- Environment setup:
- Download the pretrained weights vgg16_reducedfc.pth and put them in ssd.pytorch/weights (create the weights folder if it does not exist); the download link is in the upstream project's README.
- Install pillow, opencv-python, and tqdm.
- Install numpy: 1.15.1 is recommended; newer versions raise an error (see section 5).
- Install pytorch: get the Torch build matching your CUDA version from the official site; this blogger maintains a compatibility table: https://blog.csdn.net/llm765800916/article/details/118146146
My install command:
pip install torch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2
- The voc dict in ./data/config.py (a sketch of my modified dict follows this list):
- HOME = os.path.expanduser("~"): set it to the absolute path where the ssd.pytorch project lives (mine became HOME = os.path.expanduser("/home/ssd.pytorch")).
- 'num_classes': number of classes + 1 (background counts as a class); I have 2 classes, so it is 3.
- 'max_iter': the total number of training iterations; I set it to 1000 for a quick test (choose it according to your machine and needs).
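For reference, a sketch of what the edited voc dict might look like; the field names and unchanged values here are the defaults from the upstream config.py, and only num_classes and max_iter are modified for my 2-class test run:

```python
voc = {
    'num_classes': 3,                     # 2 classes + 1 for background
    'lr_steps': (80000, 100000, 120000),  # upstream default; never reached in a 1000-iter test
    'max_iter': 1000,                     # short test run; raise this for real training
    'feature_maps': [38, 19, 10, 5, 3, 1],
    'min_dim': 300,
    'steps': [8, 16, 32, 64, 100, 300],
    'min_sizes': [30, 60, 111, 162, 213, 264],
    'max_sizes': [60, 111, 162, 213, 264, 315],
    'aspect_ratios': [[2], [2, 3], [2, 3], [2, 3], [2], [2]],
    'variance': [0.1, 0.2],
    'clip': True,
    'name': 'VOC',
}
```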
- ./data/coco.py
- Change COCO_ROOT = osp.join(HOME, 'data/coco/') on line 11 to COCO_ROOT = osp.join(HOME, 'data/').
- ./data/voc0712.py (a sketch of these edits follows this list)
- Change VOC_CLASSES on line 20 to your own class names.
- Change image_sets=[('2007', 'trainval'), ('2012', 'trainval')] on line 93 to your own dataset name and split files (my dataset is VOC2021 and I train on train.txt and trainval.txt under ImageSets/Main), so mine became image_sets=[('2021', 'train'), ('2021', 'trainval')].
- Change dataset_name='VOC0712' on line 95 to dataset_name='voc0712'.
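A sketch of the two voc0712.py edits for my helmet dataset; the class names match the labels list used in the conversion script of section 3.2:

```python
# ./data/voc0712.py, line 20: replace the 20 PASCAL VOC class names
VOC_CLASSES = ('no helmet', 'wear helmet')

# line 93, in VOCDetection.__init__'s default arguments:
image_sets = [('2021', 'train'), ('2021', 'trainval')]
```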
- ./train.py (a sketch of both edits follows this list)
- batch_size on line 32 defaults to 32; I suggest lowering it, e.g. to 8. (The batch size is the number of samples used per training step; it affects how well and how fast the model optimizes, and it directly determines GPU memory usage, so if your GPU memory is small, keep this value small.)
- iteration % 5000 == 0 on line 194 controls how many iterations pass between checkpoint saves; pick a value that matches the max_iter you set in config.py.
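A sketch of the two train.py edits, assuming the upstream argparse flag and save condition; note that with max_iter = 1000 the stock interval of 5000 would never save an intermediate checkpoint, so I shrink it as well:

```python
# ./train.py, line 32: smaller batch to fit a modest GPU
parser.add_argument('--batch_size', default=8, type=int,
                    help='Batch size for training')

# line 194: save a checkpoint every 500 iterations instead of every 5000
if iteration != 0 and iteration % 500 == 0:
    print('Saving state, iter:', iteration)
```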
- ./ssd.py
- In self.cfg = (coco, voc)[num_classes == 21] on line 32, change 21 to your own class count (3 in my case).
- In def build_ssd(phase, size=300, num_classes=21) on line 198, change 21 to your own class count.
5. Fixing errors and warnings during training
The line numbers given below may be slightly off; look a few lines above or below the stated location.
- **error**
IndexError: The shape of the mask [8, 8732] at index 0 does not match the shape of the indexed tensor [69856, 1] at index 0
(raised at loss_c[pos] = 0  # filter out pos boxes for now)
**solved**
In ./layers/modules/multibox_loss.py, swap lines 97 and 98 so the tensor is reshaped before the mask is applied:

```python
# before (lines 97-98)
loss_c[pos] = 0  # filter out pos boxes for now
loss_c = loss_c.view(num, -1)

# after: reshape first, then mask
loss_c = loss_c.view(num, -1)
loss_c[pos] = 0  # filter out pos boxes for now
```

Then at line 114, change:

```python
N = num_pos.data.sum()
```

to:

```python
N = num_pos.data.sum().double()
loss_l = loss_l.double()
loss_c = loss_c.double()
```
- **error**
RuntimeError: Expected a 'cuda' device type for generator but found 'cpu'
**solved**
Install the PyTorch build that matches your CUDA version:
pip install torch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2
- **error**
loc_loss += loss_l.data[0]
IndexError: invalid index of a 0-dim tensor. Use tensor.item() in Python or tensor.item<T>() in C++ to convert a 0-dim tensor to a number
**solved**
In ./train.py, change every .data[0] from line 183 onward to .data (or, equivalently, to .item() as the error message suggests).
- **error**
StopIteration
**solved**
At ./train.py line 165, change images, targets = next(batch_iterator) to re-create the iterator once it is exhausted:

```python
try:
    images, targets = next(batch_iterator)
except StopIteration:
    # the DataLoader iterator is exhausted after one pass; restart it
    batch_iterator = iter(data_loader)
    images, targets = next(batch_iterator)
```
- **warning**
VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
**solved**
pip install numpy==1.15.1
- **warning**
UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.
init.xavier_uniform(param)
**solved**
At train.py line 218, change init.xavier_uniform to init.xavier_uniform_.
- **warning**
UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
targets = [Variable(ann.cuda(), volatile=True) for ann in targets]
**solved**
At train.py lines 173 and 176, delete 'volatile=True'; for example, change targets = [Variable(ann.cuda(), volatile=True) for ann in targets] to targets = [Variable(ann.cuda()) for ann in targets].
- **training loss becomes nan**
**solved**: at ./train.py line 42, parser.add_argument('--lr', '--learning-rate', default=1e-3, type=float, help='initial learning rate') defaults the learning rate to 1e-3 (0.001); lowering it fixes the nan. A sketch follows.
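A minimal sketch of the change, keeping the upstream flag quoted above; 1e-4 is one reasonable lower value (tune it for your data). Alternatively, pass --lr 1e-4 on the command line without editing the file:

```python
# ./train.py, line 42: lower the default learning rate to avoid nan losses
parser.add_argument('--lr', '--learning-rate', default=1e-4, type=float,
                    help='initial learning rate')
```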
6. Validation after training
6.1 Configuring eval.py
- Change the trained model on line 38 (a successful train.py run saves checkpoints into the weights folder automatically; I picked the one with the lowest loss):
parser.add_argument('--trained_model', default='weights/ssd_VOC_500.pth'...)
- On line 54, change args = parser.parse_args() to args, unknown = parser.parse_known_args() (a sketch follows this list).
- On lines 69, 70, 71, and 73, change annopath, imgpath, imgsetpath, and YEAR (the project author used VOC2007; I built VOC2021, so these need updating).
For example: annopath = os.path.join(args.voc_root, 'VOC2007', 'Annotations', '%s.xml')
becomes: annopath = os.path.join(args.voc_root, 'VOC2021', 'Annotations', '%s.xml')
- On line 429, change dataset = VOCDetection(args.voc_root, [('2007', set_type)]...)
to dataset = VOCDetection(args.voc_root, [('2021', set_type)]...)
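A sketch of the parse_known_args edit; unlike parse_args, it tolerates unrecognized command-line arguments instead of exiting with an error:

```python
# ./eval.py, line 54
# before: args = parser.parse_args()
# after: collect unknown CLI arguments instead of erroring out on them
args, unknown = parser.parse_known_args()
```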
Running eval.py then prints the per-class AP and the overall mAP.
6.2 Configuring test.py
- Change the trained model on line 17.
- On line 87, in testset = VOCDetection(args.voc_root, [('2007', 'test')]...), change 2007 to 2021.
6.3 Detecting images and visualizing the results
Drop these scripts into the project:
./demo/live_img.py draws detection boxes on images: https://github.com/625135449/SSD-Pytorch/blob/main/demo/live_img.py
./demo/live_score.py draws detection boxes with confidence scores: https://github.com/625135449/SSD-Pytorch/blob/main/demo/live_score.py
6.4 Errors and warnings while running eval.py
- **error**
RuntimeError: Legacy autograd function with non-static forward method is deprecated. Please use new-style autograd function with static forward method.
**solved**
Reportedly this does not occur on PyTorch below 1.2, so downgrading is an option; the fix below keeps the current version instead, following another blogger's solution.
- Locate ./ssd.py line 98 (the commented-out lines are the original code; the modified version follows them):
```python
if self.phase == "test":
    # output = self.detect(
    #     loc.view(loc.size(0), -1, 4),             # loc preds
    #     self.softmax(conf.view(conf.size(0), -1,
    #                  self.num_classes)),           # conf preds
    #     self.priors.type(type(x.data))             # default boxes
    # )
    output = self.detect.forward(
        loc.view(loc.size(0), -1, 4),                # loc preds
        self.softmax(conf.view(conf.size(0), -1,
                     self.num_classes)),             # conf preds
        self.priors.type(type(x.data))               # default boxes
    )
```
- Locate def nms(boxes, scores, overlap=0.5, top_k=200) in ./layers/box_utils.py and replace it with the following function:
```python
def nms(boxes, scores, overlap=0.5, top_k=200):
    # args: box coordinates, per-box class scores, nms threshold, keep at most the top 200 boxes
    '''(1) Build the keep tensor: zeros, one slot per candidate box (the candidates are
       this class's boxes whose class confidence exceeded the threshold)'''
    keep = scores.new(scores.size(0)).zero_().long()
    if boxes.numel() == 0:
        return keep
    '''(2) Compute the area of each candidate box'''
    x1 = boxes[:, 0]
    y1 = boxes[:, 1]
    x2 = boxes[:, 2]
    y2 = boxes[:, 3]
    area = torch.mul(x2 - x1, y2 - y1)
    '''(3) Get the indices of the top_k highest-scoring candidate boxes'''
    v, idx = scores.sort(0)  # ascending sort of class confidence; returns the sorted box indices
    # I = I[v >= 0.01]
    idx = idx[-top_k:]  # indices of the top-k largest vals
    xx1 = boxes.new()
    yy1 = boxes.new()
    xx2 = boxes.new()
    yy2 = boxes.new()
    w = boxes.new()
    h = boxes.new()
    '''(4) Write the indices of the boxes that survive nms into keep'''
    count = 0
    while idx.numel() > 0:
        '''#1. Highest-scoring remaining box: write its index into keep'''
        i = idx[-1]  # index of current largest val
        # keep.append(i)
        keep[count] = i
        count += 1
        if idx.size(0) == 1:
            break
        '''#2. Indices of the remaining boxes'''
        idx = idx[:-1]  # remove kept element from view
        '''#3. Compute the IoU between each remaining box and the highest-scoring box'''
        ##################### added code #####################
        # Otherwise: RuntimeError: index_select(): functions with out=... arguments don't
        # support automatic differentiation, but one of the arguments requires grad.
        idx = torch.autograd.Variable(idx, requires_grad=False)
        idx = idx.data
        x1 = torch.autograd.Variable(x1, requires_grad=False)
        x1 = x1.data
        y1 = torch.autograd.Variable(y1, requires_grad=False)
        y1 = y1.data
        x2 = torch.autograd.Variable(x2, requires_grad=False)
        x2 = x2.data
        y2 = torch.autograd.Variable(y2, requires_grad=False)
        y2 = y2.data
        ##################### added code #####################
        torch.index_select(x1, 0, idx, out=xx1)
        torch.index_select(y1, 0, idx, out=yy1)
        torch.index_select(x2, 0, idx, out=xx2)
        torch.index_select(y2, 0, idx, out=yy2)
        # store element-wise max with next highest score
        xx1 = torch.clamp(xx1, min=x1[i])
        yy1 = torch.clamp(yy1, min=y1[i])
        xx2 = torch.clamp(xx2, max=x2[i])
        yy2 = torch.clamp(yy2, max=y2[i])
        w.resize_as_(xx2)
        h.resize_as_(yy2)
        w = xx2 - xx1
        h = yy2 - yy1
        # check sizes of xx1 and xx2.. after each iteration
        w = torch.clamp(w, min=0.0)
        h = torch.clamp(h, min=0.0)
        inter = w * h
        # IoU = i / (area(a) + area(b) - i)
        ##################### added code #####################
        # same index_select/grad workaround as above
        area = torch.autograd.Variable(area, requires_grad=False)
        area = area.data
        idx = torch.autograd.Variable(idx, requires_grad=False)
        idx = idx.data
        ##################### added code #####################
        rem_areas = torch.index_select(area, 0, idx)  # load remaining areas
        union = (rem_areas - inter) + area[i]
        IoU = inter / union  # store result in iou
        '''#4. Keep only the boxes whose IoU with the kept box is at most the nms threshold'''
        idx = idx[IoU.le(overlap)]
    return keep, count
```
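As an aside (my addition, not from the original post): on the PyTorch 1.7.1 / torchvision 0.8.2 stack pinned above, torchvision ships a built-in NMS that avoids these autograd workarounds entirely. Swapping it in would require adapting the callers, since the hand-rolled function returns (keep, count):

```python
import torch
from torchvision.ops import nms  # available in the torchvision 0.8.2 pinned above

boxes = torch.tensor([[0., 0., 10., 10.], [1., 1., 11., 11.], [50., 50., 60., 60.]])
scores = torch.tensor([0.9, 0.8, 0.7])
keep = nms(boxes, scores, iou_threshold=0.5)  # indices of kept boxes, sorted by score
print(keep)  # tensor([0, 2]): the second box overlaps the first above the threshold
```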
- **warning**
UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
self.priors = Variable(self.priorbox.forward(), volatile=True)
**solved**
At ./ssd.py line 34, change self.priors = Variable(self.priorbox.forward(), volatile=True)
to self.priors = Variable(self.priorbox.forward()).
My overall workflow followed this blogger's post: https://blog.csdn.net/weixin_42447868/article/details/105675158#comments_19145022
Copyright notice: this is an original article by CSDN blogger 「卖strawberry的小女孩」, released under the CC 4.0 BY-SA license; include the original link and this notice when reposting.
Original link: https://blog.csdn.net/baidu_41906969/article/details/121835265