SSD-Pytorch: converting a Darknet dataset to VOC, training, detecting images, and running the whole pipeline end to end

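1. Environment

Getting everything to work took a lot of effort, and version mismatches turned out to be the biggest trap.

My versions: python3.6 + cuda 10.2 + pytorch1.7.1 + numpy1.15.1 + RTX2060

(Suggestion: reportedly, downgrading PyTorch to 1.2 or earlier avoids many of these problems. My CUDA is 10.2, though, and no GPU build of PyTorch 1.2 or earlier supports it. Reinstalling everything was too much trouble, so I fixed the problems one by one instead, as described later.)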
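A quick way to compare your environment with the versions above is to print them. This is a minimal sketch that only reads version strings and never installs anything. `collect_versions` is a hypothetical helper name. `torch.version.cuda` is the CUDA version the installed PyTorch build was compiled against, or None for a CPU-only build, so it is the number to compare with your CUDA version.

```python
import importlib
import sys


def collect_versions():
    """Return the Python, NumPy, PyTorch and CUDA versions (None for anything missing)."""
    versions = {"python": sys.version.split()[0]}
    for name in ("numpy", "torch"):
        try:
            module = importlib.import_module(name)
            versions[name] = module.__version__
        except ImportError:
            versions[name] = None  # package not installed
    versions["cuda"] = None
    versions["gpu"] = None
    if versions["torch"] is not None:
        import torch
        versions["cuda"] = torch.version.cuda  # CUDA version PyTorch was built with
        if torch.cuda.is_available():
            versions["gpu"] = torch.cuda.get_device_name(0)
    return versions


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}: {value}")
```

For the setup described in this post, the values to compare against are python 3.6, numpy 1.15.1, torch 1.7.1 and cuda 10.2.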

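2. Download the project

  • Set it up yourself
    I used the SSD-Pytorch git project. It is hosted outside China, so git clone sometimes fails. In that case, download the zip package directly and extract it locally, or clone the mirror below:
    Project URL: https://github.com/amdegroot/ssd.pytorch
git clone https://gitcode.net/mirrors/amdegroot/ssd.pytorch.git

My first download failed because I could not open the GitHub page. It only worked after I switched on a VPN: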

git clone https://github.com/625135449/SSD-Pytorch

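3. Prepare the dataset

My dataset is in Darknet's YOLO format. Here it is converted to a VOC dataset (you can also convert it to a COCO dataset yourself).

3.1 Data structure

Darknet format:
[Figure 1: Darknet dataset layout]
VOC format:
  • Annotations holds all the xml labels.
  • JPEGImages holds all the images.
  • ImageSets/Main holds train.txt, trainval.txt, val.txt and test.txt. These files contain only image names, without the file extension.
[Figure 2: VOC dataset layout]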
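You can create the VOC layout described above before you run the conversion scripts. This is a minimal sketch. It assumes the VOC2021 name and the ssd.pytorch/data/VOCdevkit location used throughout this post. `make_voc_skeleton` is a hypothetical helper.

```python
import os


def make_voc_skeleton(devkit_root, year="2021"):
    """Create VOCdevkit/VOC<year>/{Annotations, JPEGImages, ImageSets/Main}; return the VOC<year> path."""
    voc_root = os.path.join(devkit_root, "VOC" + year)
    for sub in ("Annotations", "JPEGImages", os.path.join("ImageSets", "Main")):
        # exist_ok=True: calling this again on an existing dataset is harmless
        os.makedirs(os.path.join(voc_root, sub), exist_ok=True)
    return voc_root
```

For example, make_voc_skeleton('/home/ssd.pytorch/data/VOCdevkit') would create .../VOCdevkit/VOC2021 with the three sub-folders.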

3.2 Code to convert Darknet txt files to VOC xml files

Fill in the relevant file paths and the class names:

import os
import glob
from PIL import Image
from tqdm import tqdm

voc_annotations = '/home/ssd.pytorch/data/VOCdevkit/VOC2021/Annotations/'  # output directory for the xml files
yolo_txt = '/home/darknet/Helmet/labels/'  # Darknet label (txt) directory
img_path = '/home/darknet/Helmet/images/'  # Darknet image directory
labels = ['no helmet', 'wear helmet']  # Darknet class names, in class-id order

# Image directory
src_img_dir = img_path
# Directory with one txt label file per image
src_txt_dir = yolo_txt
src_xml_dir = voc_annotations
os.makedirs(src_xml_dir, exist_ok=True)

# Image names without directory or extension, e.g. '000001'
img_Lists = glob.glob(os.path.join(src_img_dir, '*.jpg'))
img_names = [os.path.splitext(os.path.basename(item))[0] for item in img_Lists]

for img in tqdm(img_names):
    with Image.open(os.path.join(src_img_dir, img + '.jpg')) as im:
        width, height = im.size

    # Read the txt label file (skip images without one) and drop blank lines
    txt_path = os.path.join(src_txt_dir, img + '.txt')
    if not os.path.exists(txt_path):
        continue
    with open(txt_path) as f:
        gt = [line for line in f.read().splitlines() if line.strip()]

    if gt:
        # Write the header of the xml file; the with-block closes the file afterwards
        with open(os.path.join(src_xml_dir, img + '.xml'), 'w') as xml_file:
            xml_file.write('<annotation>\n')
            xml_file.write('    <folder>VOC2007</folder>\n')
            xml_file.write('    <filename>' + str(img) + '.jpg' + '</filename>\n')
            xml_file.write('    <size>\n')
            xml_file.write('        <width>' + str(width) + '</width>\n')
            xml_file.write('        <height>' + str(height) + '</height>\n')
            xml_file.write('        <depth>3</depth>\n')
            xml_file.write('    </size>\n')

            # One <object> per line of the txt file: class_id cx cy w h (normalized to [0, 1])
            for img_each_label in gt:
                spt = img_each_label.split()  # whitespace-separated; if your txt uses commas, use img_each_label.split(',')
                xml_file.write('    <object>\n')
                xml_file.write('        <name>' + str(labels[int(spt[0])]) + '</name>\n')
                xml_file.write('        <pose>Unspecified</pose>\n')
                xml_file.write('        <truncated>0</truncated>\n')
                xml_file.write('        <difficult>0</difficult>\n')
                xml_file.write('        <bndbox>\n')

                # Normalized center/size -> pixel corners, clamped to the image
                center_x = round(float(spt[1].strip()) * width)
                center_y = round(float(spt[2].strip()) * height)
                bbox_width = round(float(spt[3].strip()) * width)
                bbox_height = round(float(spt[4].strip()) * height)
                xmin = str(max(0, int(center_x - bbox_width / 2)))
                ymin = str(max(0, int(center_y - bbox_height / 2)))
                xmax = str(min(width, int(center_x + bbox_width / 2)))
                ymax = str(min(height, int(center_y + bbox_height / 2)))

                xml_file.write('            <xmin>' + xmin + '</xmin>\n')
                xml_file.write('            <ymin>' + ymin + '</ymin>\n')
                xml_file.write('            <xmax>' + xmax + '</xmax>\n')
                xml_file.write('            <ymax>' + ymax + '</ymax>\n')
                xml_file.write('        </bndbox>\n')
                xml_file.write('    </object>\n')

            xml_file.write('</annotation>')
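The coordinate arithmetic is the core of the conversion, so it helps to check it on a single line. The sketch below does two things:

  • It pulls that arithmetic into a function, with the same clamping to the image.
  • It adds a reader that turns the XML back into targets the way ssd.pytorch's VOCAnnotationTransform appears to: the class name is lower-cased and mapped to an index, and each corner is shifted by -1 and divided by the image width or height.

Both function names are hypothetical. For simplicity, the reader takes width and height from <size>, whereas ssd.pytorch uses the shape of the loaded image. It also ignores the difficult flag.

```python
import xml.etree.ElementTree as ET


def yolo_to_voc_box(cx, cy, w, h, img_w, img_h):
    """Normalized YOLO (center x, center y, width, height) -> VOC pixel corners, clamped to the image."""
    center_x = round(cx * img_w)
    center_y = round(cy * img_h)
    box_w = round(w * img_w)
    box_h = round(h * img_h)
    xmin = max(0, int(center_x - box_w / 2))
    ymin = max(0, int(center_y - box_h / 2))
    xmax = min(img_w, int(center_x + box_w / 2))
    ymax = min(img_h, int(center_y + box_h / 2))
    return xmin, ymin, xmax, ymax


def parse_voc_objects(xml_text, class_names):
    """Return [[xmin, ymin, xmax, ymax, class_index], ...] with coordinates normalized to [0, 1]."""
    root = ET.fromstring(xml_text)
    width = int(root.find("size/width").text)
    height = int(root.find("size/height").text)
    class_to_ind = {name: i for i, name in enumerate(class_names)}
    targets = []
    for obj in root.iter("object"):
        name = obj.find("name").text.lower().strip()  # names are compared in lower case
        bbox = obj.find("bndbox")
        coords = []
        for i, tag in enumerate(("xmin", "ymin", "xmax", "ymax")):
            value = int(bbox.find(tag).text) - 1  # VOC pixel coordinates are treated as 1-based
            coords.append(value / width if i % 2 == 0 else value / height)
        targets.append(coords + [class_to_ind[name]])
    return targets
```

Here is a worked example for a 640x480 image. The YOLO line `1 0.5 0.5 0.25 0.5` gives center (320, 240) and size 160x240, so the corners are (240, 120, 400, 360).

The -1 shift has a side effect. A box written with xmin = 0 loads as a slightly negative coordinate, and class names in VOC_CLASSES need to be lower case to match.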

3.3 Code to generate test.txt, train.txt, trainval.txt and val.txt automatically

Fill in the relevant file paths:

import os
import random

trainval_percent = 0.66  # share of all images used for trainval; the rest becomes test
train_percent = 0.5      # share of trainval used for train; the rest becomes val
xmlfilepath = '/home/ssd.pytorch/data/VOCdevkit/VOC2021/Annotations'
txtsavepath = '/home/ssd.pytorch/data/VOCdevkit/VOC2021/ImageSets/Main'
os.makedirs(txtsavepath, exist_ok=True)
total_xml = [f for f in os.listdir(xmlfilepath) if f.endswith('.xml')]

num = len(total_xml)  # number of xml files
indices = range(num)  # renamed from `list`, which shadowed the built-in
tv = int(num * trainval_percent)  # 66% of the total
tr = int(tv * train_percent)      # half of trainval
trainval = random.sample(indices, tv)
train = set(random.sample(trainval, tr))
trainval = set(trainval)  # sets make the membership tests below fast

with open(os.path.join(txtsavepath, 'trainval.txt'), 'w') as ftrainval, \
        open(os.path.join(txtsavepath, 'test.txt'), 'w') as ftest, \
        open(os.path.join(txtsavepath, 'train.txt'), 'w') as ftrain, \
        open(os.path.join(txtsavepath, 'val.txt'), 'w') as fval:
    for i in indices:
        name = total_xml[i][:-4] + '\n'  # file name without '.xml'
        if i in trainval:
            ftrainval.write(name)
            if i in train:
                ftrain.write(name)
            else:
                fval.write(name)
        else:
            ftest.write(name)
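With trainval_percent = 0.66 and train_percent = 0.5, the split works out roughly as follows:

  • about 33% of the images go to train;
  • about 33% go to val;
  • about 34% go to test;
  • train and val together make up trainval.

This minimal sketch checks those relationships on the generated files. `read_ids` and `check_splits` are hypothetical helpers.

```python
import os


def read_ids(path):
    """Return the image ids listed one per line in an ImageSets/Main/*.txt file."""
    with open(path) as f:
        return [line.strip() for line in f if line.strip()]


def check_splits(main_dir):
    """Verify the train/val/trainval/test relationships and return the size of each split."""
    ids = {name: set(read_ids(os.path.join(main_dir, name + ".txt")))
           for name in ("train", "val", "trainval", "test")}
    problems = []
    if ids["train"] & ids["val"]:
        problems.append("train and val overlap")
    if ids["train"] | ids["val"] != ids["trainval"]:
        problems.append("train + val does not equal trainval")
    if ids["trainval"] & ids["test"]:
        problems.append("trainval and test overlap")
    if problems:
        raise ValueError("; ".join(problems))
    return {name: len(s) for name, s in ids.items()}
```

For example, check_splits('/home/ssd.pytorch/data/VOCdevkit/VOC2021/ImageSets/Main') returns the four split sizes, or raises ValueError listing what is inconsistent.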

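4. Working in the ssd.pytorch project

4.1 Create the dataset

  • The dataset is in VOC format

  • If you have no dataset of your own, download the VOC and COCO datasets with the scripts that ship with the code (in ./data/scripts)

  • If you have your own dataset, do the following:

    • Create a VOCdevkit folder under the data folder
    • Copy the VOC2021 dataset converted above into the VOCdevkit folder. The structure is shown below. If you use my project, run darknet_to_voc.py and split_txt.py under ./data/VOCdevkit/VOC2021.
      [Figure: directory structure of data/VOCdevkit/VOC2021]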
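Before training, check that every id listed in ImageSets/Main has both of its files. As I read voc0712.py, VOCDetection builds each sample from VOC<year>/Annotations/<id>.xml and VOC<year>/JPEGImages/<id>.jpg. The conversion script above only writes the xml files, so the images still have to be copied into JPEGImages. This is a minimal sketch; `find_missing_files` is a hypothetical helper.

```python
import os


def find_missing_files(voc_root, image_set="trainval"):
    """Return paths of .xml/.jpg files that ImageSets/Main/<image_set>.txt refers to but that do not exist."""
    set_file = os.path.join(voc_root, "ImageSets", "Main", image_set + ".txt")
    with open(set_file) as f:
        ids = [line.strip() for line in f if line.strip()]
    missing = []
    for img_id in ids:
        for path in (os.path.join(voc_root, "Annotations", img_id + ".xml"),
                     os.path.join(voc_root, "JPEGImages", img_id + ".jpg")):
            if not os.path.isfile(path):
                missing.append(path)
    return missing
```

An empty list from find_missing_files('/home/ssd.pytorch/data/VOCdevkit/VOC2021', 'train') means every training sample can be loaded.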

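4.2 Modify the configuration files

The following uses my data as an example:

  • Set up the environment
    • Download the pretrained weights vgg16_reducedfc.pth and put them in ssd.pytorch/weights (create the weights folder if it does not exist). The download link is in the ssd.pytorch README.
    • Install pillow, opencv-python and tqdm
    • Install numpy: version 1.15.1 is recommended, because newer versions report errors
    • Install PyTorch: download the torch build that matches your CUDA version from the official website. This blogger lists the matching versions: https://blog.csdn.net/llm765800916/article/details/118146146
      My install command:
pip install torch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2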
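  • voc in ./data/config.py

    • HOME = os.path.expanduser("~"): set it to the absolute path of the ssd.pytorch project (I changed it to HOME = os.path.expanduser("/home/ssd.pytorch"))
    • 'num_classes': number of classes + 1, because the background counts as a class. I have 2 classes, so it is 3
    • 'max_iter', the number of training iterations: this was a test run, so I set it to 1000 for now (choose it according to your hardware and needs)
      [Figure: modified voc settings in config.py]
  • ./data/coco.py

    • On line 11, change COCO_ROOT = osp.join(HOME, 'data/coco/') to COCO_ROOT = osp.join(HOME, 'data/')
  • ./data/voc0712.py

    • On line 20, change VOC_CLASSES to your own class names
    • On line 93, change image_sets=[('2007', 'trainval'), ('2012', 'trainval')] to your own dataset year and split files. My dataset is VOC2021 and uses train.txt and trainval.txt under ImageSets/Main, so after the change it reads: image_sets=[('2021', 'train'), ('2021', 'trainval')]
    • On line 95, change dataset_name='VOC0712' to dataset_name='voc0712'
  • ./train.py

    • On line 32, batch_size defaults to 32. Use a smaller value, such as 8. The batch size is the number of samples used in one training step: it affects how well and how fast the model is optimized, and it directly determines GPU memory use. If your GPU has little memory, keep it small.
    • On line 194, iteration % 5000 == 0 sets how many iterations pass between saved checkpoints; choose the interval based on max_iter in config.py.
  • ./ssd.py

    • On line 32, self.cfg = (coco, voc)[num_classes == 21]: change 21 to your number of classes, 3
    • On line 198, def build_ssd(phase, size=300, num_classes=21): change 21 to your number of classes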
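The edits above spread the class count and the iteration settings across four files, so the values can easily drift apart. This is a minimal plain-Python cross-check, assuming you copy the values in by hand. `select_cfg` and `check_config` are hypothetical helpers, not part of ssd.pytorch.

`lr_steps` means the 'lr_steps' entry of the voc dict in config.py. As far as I know, upstream sets it to (80000, 100000, 120000), far above a max_iter of 1000.

```python
def select_cfg(num_classes, voc_num_classes):
    """Mimic ssd.py's `self.cfg = (coco, voc)[num_classes == N]`.

    A bool is an int subclass, so False indexes element 0 (coco) and True element 1 (voc).
    """
    return ("coco", "voc")[num_classes == voc_num_classes]


def check_config(class_names, num_classes, build_ssd_classes, max_iter, lr_steps, save_every):
    """Cross-check values edited in voc0712.py, config.py, ssd.py and train.py.

    Returns a list of problems; an empty list means the values are consistent.
    """
    problems = []
    if num_classes != len(class_names) + 1:  # +1 for the background class
        problems.append("num_classes should be len(VOC_CLASSES) + 1 = %d" % (len(class_names) + 1))
    if build_ssd_classes != num_classes:
        problems.append("build_ssd(num_classes=...) should equal voc['num_classes']")
    if all(step >= max_iter for step in lr_steps):
        problems.append("every lr_steps entry is >= max_iter, so the learning rate is never decayed")
    if save_every >= max_iter:
        problems.append("iteration % save_every == 0 is never reached, so no intermediate checkpoint is saved")
    return problems
```

For the setup in this post, the class names are ('no helmet', 'wear helmet'), num_classes is 3 and max_iter is 1000. The save interval and lr_steps are whatever you chose.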

5. Fixing errors and warnings during training

Line numbers may be slightly off; check a few lines above and below the given line.

  • error
    loss_c[pos] = 0 # filter out pos boxes for now
    IndexError: The shape of the mask [8, 8732] at index 0 does not match the shape of the indexed tensor [69856, 1] at index 0

    solved
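    Go to ./layers/modules/multibox_loss.py

    • Swap lines 97 and 98. loss_c has one row per prior of the whole batch ([69856, 1] = 8 x 8732 rows), while the mask pos is [batch, num_priors] = [8, 8732], so loss_c has to be reshaped before it is indexed:
      loss_c[pos] = 0 # filter out pos boxes for now
      loss_c = loss_c.view(num, -1)
      change to:
      loss_c = loss_c.view(num, -1)
      loss_c[pos] = 0 # filter out pos boxes for now
    • line 114
      N = num_pos.data.sum()
      change to: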
      N = num_pos.data.sum().double()
      loss_l = loss_l.double()
      loss_c = loss_c.double()
  • error
    RuntimeError: Expected a 'cuda' device type for generator but found 'cpu'
    solved
    Install the PyTorch build that matches your CUDA version:

pip install torch==1.7.1 torchvision==0.8.2 torchaudio==0.7.2
  • error
    loc_loss += loss_l.data[0]
    IndexError: invalid index of a 0-dim tensor. Use tensor.item() in Python or tensor.item<T>() in C++ to convert a 0-dim tensor to a number

    solved
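    Go to ./train.py and, from line 183 on, change every .data[0] to .data. The error message itself suggests .item(), which returns a plain Python number.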

  • error
    StopIteration
    solved
    Go to ./train.py line 165 and change
    images, targets = next(batch_iterator)
    to:
    try:
        images, targets = next(batch_iterator)
    except StopIteration:
        # the DataLoader is exhausted after one epoch; start a new pass over the data
        batch_iterator = iter(data_loader)
        images, targets = next(batch_iterator)

  • warning
    VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray

    solved
    pip install numpy==1.15.1

  • warning

    • UserWarning: nn.init.xavier_uniform is now deprecated in favor of nn.init.xavier_uniform_.
      init.xavier_uniform(param)

      solved
      Go to train.py line 218: change init.xavier_uniform to init.xavier_uniform_
    • UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
      targets = [Variable(ann.cuda(), volatile=True) for ann in targets]

      solved
      Go to train.py lines 173 and 176 and delete volatile=True. For example, targets = [Variable(ann.cuda(), volatile=True) for ann in targets]
      becomes: targets = [Variable(ann.cuda()) for ann in targets]
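  • -nan loss during training
    [Figure: training log showing -nan loss]

solved: go to ./train.py line 42: parser.add_argument('--lr', '--learning-rate', default=1e-3, type=float, help='initial learning rate'). The default learning rate is 1e-3 (0.001); lowering it fixes the problem.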
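Two of the train.py fixes above can be tried without a GPU: restarting the batch iterator on StopIteration, and overriding the learning rate. The sketch below is torch-free. A plain list stands in for the DataLoader, the loop bound plays the role of max_iter, and `build_parser` and `iterate_batches` are hypothetical names.

Because --lr is already a command-line option in train.py, running something like `python train.py --lr 1e-4` should lower the learning rate without editing the default. The value 1e-4 is only an example.

```python
import argparse


def build_parser():
    # Same --lr option as train.py (default 1e-3); all other train.py options are omitted
    parser = argparse.ArgumentParser(description="train.py option sketch")
    parser.add_argument('--lr', '--learning-rate', default=1e-3, type=float,
                        help='initial learning rate')
    return parser


def iterate_batches(data_loader, max_iter, start_iter=0):
    """Yield (iteration, batch) for start_iter..max_iter-1, restarting the loader when it runs out."""
    batch_iterator = iter(data_loader)
    for iteration in range(start_iter, max_iter):
        try:
            batch = next(batch_iterator)
        except StopIteration:
            # One epoch finished: begin a new pass over the data
            batch_iterator = iter(data_loader)
            batch = next(batch_iterator)
        yield iteration, batch
```

With three batches and max_iter = 7, the loop sees b0, b1, b2, b0, b1, b2, b0: it wraps around instead of stopping after the first epoch.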

6. Validation after training

6.1 Configure eval.py

  • Change the trained model on line 38. After train.py runs successfully, models are saved to the weights folder automatically; I picked the one with the lowest loss:
    parser.add_argument('--trained_model', default='weights/ssd_VOC_500.pth'…)

  • Change line 54: args = parser.parse_args() -> args, unknown = parser.parse_known_args()

  • Change annopath, imgpath, imgsetpath and YEAR on lines 69, 70, 71 and 73. The project author used VOC2007, but my dataset is VOC2021, so these need changing.
    For example: annopath = os.path.join(args.voc_root, 'VOC2007', 'Annotations', '%s.xml')
    becomes: annopath = os.path.join(args.voc_root, 'VOC2021', 'Annotations', '%s.xml')

  • Change line 429: dataset = VOCDetection(args.voc_root, [('2007', set_type)]…)
    to: dataset = VOCDetection(args.voc_root, [('2021', set_type)]…)
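The four path edits in eval.py all repeat the dataset year. An alternative is to derive the paths from one variable. This is a sketch of that idea. `voc_paths` and `parse_eval_args` are hypothetical helpers, and the templates follow the style of the annopath example above.

It also shows what the parse_known_args change does. Unrecognised flags are returned instead of aborting the script. A common reason for this change is running the script inside Jupyter, which passes its own `-f <connection file>` argument.

```python
import argparse
import os

YEAR = '2021'  # the only place the dataset year appears in this sketch


def voc_paths(voc_root, year=YEAR):
    """Build eval.py-style path templates from a single year value."""
    base = os.path.join(voc_root, 'VOC' + year)
    annopath = os.path.join(base, 'Annotations', '%s.xml')            # use as annopath % image_id
    imgpath = os.path.join(base, 'JPEGImages', '%s.jpg')              # use as imgpath % image_id
    imgsetpath = os.path.join(base, 'ImageSets', 'Main', '{:s}.txt')  # use as imgsetpath.format('test')
    return annopath, imgpath, imgsetpath


def parse_eval_args(argv=None):
    """Parse only the options defined here; anything else is returned in `unknown`."""
    parser = argparse.ArgumentParser(description='eval.py option sketch')
    parser.add_argument('--trained_model', default='weights/ssd_VOC_500.pth')
    parser.add_argument('--voc_root', default='data/VOCdevkit/')  # placeholder default for this sketch
    # parse_args() would exit on an unrecognised flag; parse_known_args() hands it back instead
    args, unknown = parser.parse_known_args(argv)
    return args, unknown
```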

Resulting mAP:
[Figure: mAP results from eval.py]

6.3 Configure test.py

  • Change the trained model on line 17

  • On line 87, testset = VOCDetection(args.voc_root, [('2007', 'test')]…): change 2007 to 2021

[Figure: test.py output]

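6.4 Detect images and visualize the results

Add ./demo/live_img.py to the project to draw detection boxes on images: https://github.com/625135449/SSD-Pytorch/blob/main/demo/live_img.py
[Figure: image with detection boxes]

Add ./demo/live_score.py to the project to draw detection boxes with confidence scores: https://github.com/625135449/SSD-Pytorch/blob/main/demo/live_score.py
[Figure: image with detection boxes and confidence scores]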
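The two demo scripts are not reproduced here. The function below is an independent sketch of the post-processing step such scripts need. It assumes the output layout of ssd.pytorch's Detect layer in the test phase:

  • one array of shape [num_classes, top_k, 5] per image;
  • each row is [score, xmin, ymin, xmax, ymax], with coordinates normalized to [0, 1];
  • class 0 is the background.

`decode_detections` and the 0.6 threshold are assumptions for illustration.

```python
import numpy as np


def decode_detections(detections, class_names, img_w, img_h, threshold=0.6):
    """Turn one image's SSD output into (label, score, (x1, y1, x2, y2)) tuples in pixels.

    detections: array of shape [num_classes, top_k, 5]; class i maps to class_names[i - 1].
    """
    scale = np.array([img_w, img_h, img_w, img_h], dtype=np.float64)
    results = []
    for cls in range(1, detections.shape[0]):  # skip class 0 (background)
        for row in detections[cls]:
            score = float(row[0])
            if score < threshold:
                continue  # check every row instead of assuming rows are sorted by score
            x1, y1, x2, y2 = (row[1:] * scale).round().astype(int).tolist()
            results.append((class_names[cls - 1], score, (x1, y1, x2, y2)))
    return results
```

OpenCV can then draw the returned pixel boxes with cv2.rectangle and the label and score with cv2.putText.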

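6.5 Errors and warnings when running eval.py

  • error
    RuntimeError: Legacy autograd function with non-static forward method is deprecated. Please use new-style autograd function with static forward method.
    solved
    Reportedly this error does not occur with PyTorch versions below 1.2, so downgrading is one option. The fix below keeps the current version and follows this blogger's solution: [link]. Detect is a subclass of torch.autograd.Function, so calling self.detect.forward(...) directly runs it as an ordinary method. This bypasses the legacy Function call path that raises the error.

    • Go to ./ssd.py line 98 (the commented-out lines are the original code; the lines after them are the modified version)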
        if self.phase == "test":
            # output = self.detect(
            #     loc.view(loc.size(0), -1, 4),                   # loc preds
            #     self.softmax(conf.view(conf.size(0), -1,
            #                  self.num_classes)),                # conf preds
            #     self.priors.type(type(x.data))                  # default boxes
            # )
            output = self.detect.forward(
                loc.view(loc.size(0), -1, 4),  # loc preds
                self.softmax(conf.view(conf.size(0), -1,
                                       self.num_classes)),  # conf preds
                self.priors.type(type(x.data))  # default boxes
            )
  • In ./layers/box_utils.py, replace the function def nms(boxes, scores, overlap=0.5, top_k=200) with the following:
def nms(boxes, scores, overlap=0.5, top_k=200):  # args: box coordinates, class scores, NMS IoU threshold, number of top-scoring boxes to consider
    # Work on detached copies. Once Detect is called through .forward() directly, boxes/scores
    # may require grad, and index_select then fails with
    # "RuntimeError: index_select(): functions with out=... arguments don't support automatic
    # differentiation, but one of the arguments requires grad."
    # Detaching once here replaces the repeated Variable(..., requires_grad=False).data wrappers.
    boxes = boxes.detach()
    scores = scores.detach()

    # (1) keep: zero-initialised index tensor with one slot per candidate box of this class
    #     (candidates = boxes whose class score is above the confidence threshold)
    keep = scores.new(scores.size(0)).zero_().long()
    if boxes.numel() == 0:
        return keep, 0  # same (keep, count) pair as the normal return below

    # (2) area of every candidate box
    x1 = boxes[:, 0]
    y1 = boxes[:, 1]
    x2 = boxes[:, 2]
    y2 = boxes[:, 3]
    area = torch.mul(x2 - x1, y2 - y1)

    # (3) indices of the top_k highest-scoring boxes (sort is ascending, so take the tail)
    _, idx = scores.sort(0)
    idx = idx[-top_k:]

    # (4) greedy NMS: write the indices of the surviving boxes into keep
    count = 0
    while idx.numel() > 0:
        # 4.1 the highest-scoring remaining box is always kept
        i = idx[-1]
        keep[count] = i
        count += 1
        if idx.size(0) == 1:
            break
        # 4.2 remove it from the remaining indices
        idx = idx[:-1]
        # 4.3 IoU between every remaining box and the kept box
        xx1 = torch.clamp(torch.index_select(x1, 0, idx), min=x1[i])
        yy1 = torch.clamp(torch.index_select(y1, 0, idx), min=y1[i])
        xx2 = torch.clamp(torch.index_select(x2, 0, idx), max=x2[i])
        yy2 = torch.clamp(torch.index_select(y2, 0, idx), max=y2[i])
        w = torch.clamp(xx2 - xx1, min=0.0)
        h = torch.clamp(yy2 - yy1, min=0.0)
        inter = w * h
        rem_areas = torch.index_select(area, 0, idx)  # areas of the remaining boxes
        union = (rem_areas - inter) + area[i]
        IoU = inter / union
        # 4.4 keep only boxes whose IoU with the kept box is <= overlap
        idx = idx[IoU.le(overlap)]
    return keep, count

  • warning
    UserWarning: volatile was removed and now has no effect. Use with torch.no_grad(): instead.
    self.priors = Variable(self.priorbox.forward(), volatile=True)
    solved
    Go to ./ssd.py line 34: self.priors = Variable(self.priorbox.forward(), volatile=True)
    change it to: self.priors = Variable(self.priorbox.forward())
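To sanity-check the modified nms() in box_utils.py without PyTorch, you can compare a NumPy version of the same greedy algorithm on small inputs. This is an independent re-implementation for testing, not code from ssd.pytorch. Like the torch version, it considers at most top_k boxes, computes areas without a +1 term, and drops boxes with IoU > overlap. Unlike the torch version, it returns only the kept indices rather than a (keep, count) pair.

```python
import numpy as np


def nms_numpy(boxes, scores, overlap=0.5, top_k=200):
    """Greedy NMS. boxes: [N, 4] of (x1, y1, x2, y2); scores: [N]. Returns kept indices, best first."""
    if boxes.size == 0:
        return np.zeros(0, dtype=np.int64)
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    area = (x2 - x1) * (y2 - y1)
    order = np.argsort(scores)[-top_k:][::-1]  # highest score first, at most top_k candidates
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        # intersection of the kept box with every remaining box
        w = np.clip(np.minimum(x2[i], x2[rest]) - np.maximum(x1[i], x1[rest]), 0.0, None)
        h = np.clip(np.minimum(y2[i], y2[rest]) - np.maximum(y1[i], y1[rest]), 0.0, None)
        inter = w * h
        iou = inter / (area[rest] - inter + area[i])
        order = rest[iou <= overlap]  # same "IoU.le(overlap)" rule as the torch version
    return np.array(keep, dtype=np.int64)
```

Take two 10x10 boxes offset by one pixel (IoU = 81 / 119, about 0.68) and a third box far away. With overlap = 0.5, the lower-scoring box of the overlapping pair is suppressed. With overlap = 0.7, all three boxes survive.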

I followed this blogger's workflow: https://blog.csdn.net/weixin_42447868/article/details/105675158#comments_19145022

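Copyright notice: this is an original article by CSDN blogger 「卖strawberry的小女孩」, licensed under CC 4.0 BY-SA. When reposting, please include a link to the original and this notice.
Original link: https://blog.csdn.net/baidu_41906969/article/details/121835265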
