yolov5训练自己的数据集并计算mAP

本人前段时间用yolov5进行目标检测研究，记录一下流程方便查看，也希望能帮助刚入门的人快速利用yolov5进行研究。

参考文献

（1）https://blog.csdn.net/oJiWuXuan/article/details/107558286?utm_medium=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-3.channel_param&depth_1-utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-3.channel_param
（2）https://blog.csdn.net/sihaiyinan/article/details/89417963?ops_request_misc=%257B%2522request%255Fid%2522%253A%2522159149151419724846405391%2522%252C%2522scm%2522%253A%252220140713.130102334…%2522%257D&request_id=159149151419724846405391&biz_id=0&utm_medium=distribute.pc_search_result.none-task-blog-2_alltop_click~default-1-89417963.ecpm_v1_rank_ctr_v3&utm_term=voc_eval

代码和权重下载

为了方便，可以直接下载本人调试过的代码，这些代码在原有代码上有增加，如mAP的计算。
百度网盘：
链接：https://pan.baidu.com/s/1sbqoA5-xY3z5bZIItwwO5g
提取码：0ved

也可以下载GitHub上的原始代码：
代码下载： https://github.com/ultralytics/yolov5
在这里插入图片描述
下载权重：
我下载的是yolov5s.pt，并将该权重保存到 yolov5\weights中。

准备工作

data中新建几个文件夹

在yolov5\data目录下新建Annotations, ImageSets, labels 三个文件夹。
在这里插入图片描述
先将images文件夹清空，然后将用于训练的图片放入images中，将对应的xml文件放入Annotations中，如下图所示。

makeTxt.py

在yolov5根目录下创建makeTxt.py，代码如下：

import os
import random

trainval_percent = 0.03
train_percent = 1.0
xmlfilepath = './data/Annotations'
total_xml = os.listdir(xmlfilepath)
num = len(total_xml)
list = range(num)
tv = int(num * trainval_percent)
trainval = random.sample(list, tv)

txt_train='./data/ImageSets/train.txt'
if os.path.exists(txt_train):
    os.remove(txt_train)
else:
    open(txt_train,'w')

txt_val='./data/ImageSets/val.txt'
if os.path.exists(txt_val):
    os.remove(txt_val)
else:
    open(txt_val,'w')

ftrain = open(txt_train, 'w')
fval = open(txt_val, 'w')

for i in list:
    name = total_xml[i][:-4] + '\n'
    ftrain.write(name)
    if i in trainval:
        fval.write(name)

ftrain.close()
fval.close()

makeTxt.py主要是将数据集分类成训练数据集和验证数据集，运行后ImagesSets文件夹中会生成2个文件，用于记录训练数据集和验证数据集的图片名称，如下图所示。
在这里插入图片描述因为本人研究需要，对makeTxt.py作了修改。原来的makeTxt.py代码请参考
https://blog.csdn.net/oJiWuXuan/article/details/107558286?utm_medium=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-3.channel_param&depth_1-utm_source=distribute.pc_relevant.none-task-blog-BlogCommendFromMachineLearnPai2-3.channel_param

voc_label.py

接着再新建另一个文件voc_label.py，切记，classes=[……] 中填入的一定要是自己在数据集中所标注的类别名称，标记了几个类别就填写几个类别名，填写错误的话会造成读取不出xml文件里的标注信息。代码如下：

# -*- coding: utf-8 -*-
# xml解析包
import xml.etree.ElementTree as ET
import os
from os import getcwd
import shutil

sets = ['train', 'val']
classes = ['combustion_lining', 'fan', 'fan_stator_casing_and_support', 'hp_core_casing', 'hpc_spool',
           'hpc_stage_5', 'mixer', 'nozzle', 'nozzle_cone', 'stand']

style = '.png'

# 进行归一化操作
def convert(size, box):  # size:(原图w,原图h) , box:(xmin,xmax,ymin,ymax)
    dw = 1. / size[0]  # 1/w
    dh = 1. / size[1]  # 1/h
    x = (box[0] + box[1]) / 2.0  # 物体在图中的中心点x坐标
    y = (box[2] + box[3]) / 2.0  # 物体在图中的中心点y坐标
    w = box[1] - box[0]  # 物体实际像素宽度
    h = box[3] - box[2]  # 物体实际像素高度
    x = x * dw  # 物体中心点x的坐标比(相当于 x/原图w)
    w = w * dw  # 物体宽度的宽度比(相当于 w/原图w)
    y = y * dh  # 物体中心点y的坐标比(相当于 y/原图h)
    h = h * dh  # 物体宽度的宽度比(相当于 h/原图h)
    return (x, y, w, h)  # 返回 相对于原图的物体中心点的x坐标比,y坐标比,宽度比,高度比,取值范围[0-1]


# year ='2012', 对应图片的id（文件名）
def convert_annotation(image_id):
    '''
    将对应文件名的xml文件转化为label文件，xml文件包含了对应的bunding框以及图片长款大小等信息，
    通过对其解析，然后进行归一化最终读到label文件中去，也就是说
    一张图片文件对应一个xml文件，然后通过解析和归一化，能够将对应的信息保存到唯一一个label文件中去
    labal文件中的格式：calss x y w h　　同时，一张图片对应的类别有多个，所以对应的ｂｕｎｄｉｎｇ的信息也有多个
    '''
    # 对应的通过year 找到相应的文件夹，并且打开相应image_id的xml文件，其对应bund文件
    in_file = open('./data/Annotations/%s.xml' % (image_id), encoding='utf-8')
    # 准备在对应的image_id 中写入对应的label，分别为
    # <object-class> <x> <y> <width> <height>
    out_file = open('./data/labels/%s.txt' % (image_id), 'w', encoding='utf-8')
    # 解析xml文件
    tree = ET.parse(in_file)
    # 获得对应的键值对
    root = tree.getroot()
    # 获得图片的尺寸大小
    size = root.find('size')
    # 如果xml内的标记为空，增加判断条件
    if size != None:
        # 获得宽
        w = int(size.find('width').text)
        # 获得高
        h = int(size.find('height').text)
        # 遍历目标obj
        for obj in root.iter('object'):
            # 获得difficult ？？
            difficult = obj.find('difficult').text
            # 获得类别 =string 类型
            cls = obj.find('name').text
            # 如果类别不是对应在我们预定好的class文件中，或difficult==1则跳过
            if cls not in classes or int(difficult) == 1:
                continue
            # 通过类别名称找到id
            cls_id = classes.index(cls)
            # 找到bndbox 对象
            xmlbox = obj.find('bndbox')
            # 获取对应的bndbox的数组 = ['xmin','xmax','ymin','ymax']
            b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text), float(xmlbox.find('ymin').text),
                 float(xmlbox.find('ymax').text))
            print(image_id, cls, b)
            # 带入进行归一化操作
            # w = 宽, h = 高， b= bndbox的数组 = ['xmin','xmax','ymin','ymax']
            bb = convert((w, h), b)
            # bb 对应的是归一化后的(x,y,w,h)
            # 生成 calss x y w h 在label文件中
            out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')


# 返回当前工作目录
wd = getcwd()
print(wd)

# 先找labels文件夹如果不存在则创建
labels = './data/labels'
if os.path.exists(labels):
    shutil.rmtree(labels)  # delete output folder
os.makedirs(labels)  # make new output folder


for image_set in sets:
    '''
    对所有的文件数据集进行遍历
    做了两个工作：
　　　　１．将所有图片文件都遍历一遍，并且将其所有的全路径都写在对应的txt文件中去，方便定位
　　　　２．同时对所有的图片文件进行解析和转化，将其对应的bundingbox 以及类别的信息全部解析写到label 文件中去
    　　　　　最后再通过直接读取文件，就能找到对应的label 信息
    '''
    # 读取在ImageSets/Main 中的train、test..等文件的内容
    # 包含对应的文件名称
    image_ids = open('./data/ImageSets/%s.txt' % (image_set)).read().strip().split()
    # 打开对应的2012_train.txt 文件对其进行写入准备

    txt_name = './data/%s.txt' % (image_set)
    if os.path.exists(txt_name):
        os.remove(txt_name)
    else:
        open(txt_name, 'w')

    list_file = open(txt_name, 'w')
    # 将对应的文件_id以及全路径写进去并换行
    for image_id in image_ids:
        list_file.write('data/images/%s%s\n' % (image_id, style))
        # 调用  year = 年份  image_id = 对应的文件名_id
        convert_annotation(image_id)
    # 关闭文件
    list_file.close()

voc_label.py主要是将图片数据集标注后的xml文件中的标注信息读取出来并写入txt文件，运行后在labels文件夹中生成所有图片数据集的标注信息，如下图：
在这里插入图片描述
同时在data文件夹下生成train和val两个txt文件。
到此，本次训练所需的数据集已经全部准备好了。

文件修改

数据集方面的yaml文件修改

首先在data目录下，新建object.yaml，并对object.yaml中的参数进行配置。其中train，val后面分别为训练集和验证集图片的路径， nc为数据集的类别数（我的为10类），names: 换成自己的类别名称。
在这里插入图片描述
代码如下：

# COCO 2017 dataset http://cocodataset.org
# Download command: bash yolov5/data/get_coco2017.sh
# Train command: python train.py --data ./data/coco.yaml
# Dataset should be placed next to yolov5 folder:
#   /parent_folder
#     /coco
#     /yolov5


# train and val datasets (image directory or *.txt file with image paths)
train: data/train.txt  # 118k images
val: data/val.txt  # 5k images
#test: data/test.txt  # 20k images for submission to https://competitions.codalab.org/competitions/20794

# number of classes
nc: 10

# class names
names: ['combustion_lining', 'fan', 'fan_stator_casing_and_support', 'hp_core_casing', 'hpc_spool', 'hpc_stage_5', 'mixer', 'nozzle', 'nozzle_cone', 'stand']

# Print classes
# with open('data/coco.yaml') as f:
#   d = yaml.load(f, Loader=yaml.FullLoader)  # dict
#   for i, x in enumerate(d['names']):
#     print(i, x)

网络参数方面的yaml文件修改

在yolov5\models目录下，选择一个模型，我用的是yolov5s.yaml文件，修改该文件，只需要修改nc。我的是10类。

# parameters
nc: 10  # number of classes
depth_multiple: 0.33  # model depth multiple
width_multiple: 0.50  # layer channel multiple

train.py中的一些参数修改

最后，在根目录中对train.py中的部分参数进行修改，batch-size和workers根据自己电脑的性能进行设置，如下所示：

parser = argparse.ArgumentParser()
    parser.add_argument('--weights', type=str, default='weights/yolov5s.pt', help='initial weights path')
    parser.add_argument('--cfg', type=str, default='models/yolov5s.yaml', help='model.yaml path')
    parser.add_argument('--data', type=str, default='data/object.yaml', help='data.yaml path')
    parser.add_argument('--hyp', type=str, default='data/hyp.scratch.yaml', help='hyperparameters path')
    parser.add_argument('--epochs', type=int, default=300)
    parser.add_argument('--batch-size', type=int, default=4, help='total batch size for all GPUs')
    parser.add_argument('--workers', type=int, default=2, help='maximum number of dataloader workers')

训练

全部配置好后，直接执行train.py文件开始训练。训练好后会在yolov5\runs\train\exp文件夹得到如下文件：
在这里插入图片描述
其中best.pt是epoch次训练中得到的最好的一个权重，last.pt是最后一次训练所得的权重。

在训练过程中，运行yolov5\loss_line文件夹中的criterion_line.py，可以实时查看loss等曲线：
loss_line文件夹需要自己创建。

criterion_line.py代码：

import matplotlib.pyplot as plt
import numpy as np

with open('../runs/train/exp/results.txt', 'r') as out_data:
    text = out_data.readlines()  # 结果为str类型
loss = []
for ss in text:
    ss = ss.strip()
    ss = ss.split()
    strr = ss[2:6] + ss[8:12]
    numbers = list(map(float, strr))
    loss.append(numbers)

# 0-GIoU, 1-obj, 2-cls, 3-total, 4-P, 5-R, 6-mAP@.5, 7-mAP@.5:.95
loss = np.array(loss)

epoch_n = len(loss)
x = np.linspace(1, epoch_n, epoch_n)

GIoU = loss[:, 0]
obj = loss[:, 1]
cls = loss[:, 2]
total = loss[:, 3]
P = loss[:, 4]
R = loss[:, 5]
mAP_5 = loss[:, 6]
mAP_5_95 = loss[:, 7]

plt.figure(num=1, figsize=(16, 10), )
plt.subplot(4, 2, 1)
plt.plot(x, GIoU, color='red', linewidth=1.0, linestyle='--', label='GIoU')
plt.legend(loc='upper right')

plt.subplot(4, 2, 2)
plt.plot(x, obj, color='red', linewidth=1.0, linestyle='--', label='obj')
plt.legend(loc='upper right')

plt.subplot(4, 2, 3)
plt.plot(x, cls, color='red', linewidth=1.0, linestyle='--', label='cls')
plt.legend(loc='upper right')

plt.subplot(4, 2, 4)
plt.plot(x, total, color='red', linewidth=1.0, linestyle='--', label='total')
plt.legend(loc='upper right')

plt.subplot(4, 2, 5)
plt.plot(x, P, color='red', linewidth=1.0, linestyle='--', label='P')
plt.legend(loc='upper right')

plt.subplot(4, 2, 6)
plt.plot(x, R, color='red', linewidth=1.0, linestyle='--', label='R')
plt.legend(loc='upper right')

plt.subplot(4, 2, 7)
plt.plot(x, mAP_5, color='red', linewidth=1.0, linestyle='--', label='mAP_5')
plt.legend(loc='upper right')

plt.subplot(4, 2, 8)
plt.plot(x, mAP_5_95, color='red', linewidth=1.0, linestyle='--', label='mAP_5_95')
plt.legend(loc='upper right')

plt.show()

结果图片：
在这里插入图片描述

测试

新建data_test

在yolov5中新建data_test文件夹，在该文件夹中新建5个文件夹和一个txt文件，如下所示。
并将测试集中图片放在JPEGImages_manual文件夹中，将对应的xml文件放在Annotations_manual中。
在这里插入图片描述

新建几个py文件

在yolov5中新建mAP文件夹，并新建cfg_mAP.py，detect_eval_class_txt.py，compute_mAP.py，mAP_line.py，utils_mAP.py和yolov5_eval.py
代码分别如下：

cfg_mAP.py：

# -*- coding: utf-8 -*-

import os
from easydict import EasyDict

Cfg = EasyDict()

Cfg.names = ['combustion_lining', 'fan', 'fan_stator_casing_and_support', 'hp_core_casing', 'hpc_spool', 'hpc_stage_5',
             'mixer', 'nozzle', 'nozzle_cone', 'stand']
# 由于原对象的名字太长，绘制在图片上显得很杂乱，所以将名字简写。
Cfg.textnames = ['combustion', 'fan', 'stator', 'core', 'spool', 'stage', 'mixer', 'nozzle', 'cone', 'stand']

Cfg.device = '0,1'

# manual
Cfg.origimgs_filepath = '../data_test/JPEGImages_manual'
Cfg.testimgs_filepath = '../data_test/JPEGImages_manual'
Cfg.eval_classtxt_path = '../data_test/class_txt_manual/'
Cfg.eval_Annotations_path = '../data_test/Annotations_manual'
Cfg.eval_imgs_name_txt = '../data_test/imgs_name_manual.txt'
Cfg.cachedir = '../data_test/cachedir_manual/'
Cfg.prediction_path = '../data_test/predictions_manual'

# mAP_line cachedir
Cfg.systhesis_valid_cachedir = '../data_test/cachedir_systhesis_valid/'
Cfg.manual_cachedir = '../data_test/cachedir_manual/'

detect_eval_class_txt.py :

import argparse
import os
import platform
import shutil
import time
from pathlib import Path

import cv2
import torch
import torch.backends.cudnn as cudnn
from numpy import random

from models.experimental import attempt_load
from utils.datasets import LoadStreams, LoadImages
from utils.general import (
    check_img_size, non_max_suppression, apply_classifier, scale_coords,
    xyxy2xywh, plot_one_box, strip_optimizer, set_logging)
from utils.torch_utils import select_device, load_classifier, time_synchronized
from cfg_mAP import Cfg

cfg = Cfg


def detect(save_img=False):
    out, source, weights, view_img, save_txt, imgsz = \
        opt.output, opt.source, opt.weights, opt.view_img, opt.save_txt, opt.img_size
    webcam = source == '0' or source.startswith('rtsp') or source.startswith('http') or source.endswith('.txt')

    # Initialize
    set_logging()
    device = select_device(opt.device)
    if os.path.exists(out):
        shutil.rmtree(out)  # delete output folder
    os.makedirs(out)  # make new output folder
    half = device.type != 'cpu'  # half precision only supported on CUDA

    # Load model
    model = attempt_load(weights, map_location=device)  # load FP32 model
    imgsz = check_img_size(imgsz, s=model.stride.max())  # check img_size
    if half:
        model.half()  # to FP16

    # Second-stage classifier
    classify = False
    if classify:
        modelc = load_classifier(name='resnet101', n=2)  # initialize
        modelc.load_state_dict(torch.load('weights/resnet101.pt', map_location=device)['model'])  # load weights
        modelc.to(device).eval()

    # Set Dataloader
    vid_path, vid_writer = None, None
    if webcam:
        view_img = True
        cudnn.benchmark = True  # set True to speed up constant image size inference
        dataset = LoadStreams(source, img_size=imgsz)
    else:
        save_img = True
        dataset = LoadImages(source, img_size=imgsz)

    # Get names and colors
    names = model.module.names if hasattr(model, 'module') else model.names
    colors = [[random.randint(0, 255) for _ in range(3)] for _ in range(len(names))]

    # Run inference
    t0 = time.time()
    img = torch.zeros((1, 3, imgsz, imgsz), device=device)  # init img
    _ = model(img.half() if half else img) if device.type != 'cpu' else None  # run once
    test_time=[]
    for path, img, im0s, vid_cap in dataset:

        # Inference
        t1 = time_synchronized()

        img = torch.from_numpy(img).to(device)
        img = img.half() if half else img.float()  # uint8 to fp16/32
        img /= 255.0  # 0 - 255 to 0.0 - 1.0
        if img.ndimension() == 3:
            img = img.unsqueeze(0)

        # # Inference
        # t1 = time_synchronized()
        pred = model(img, augment=opt.augment)[0]

        # Apply NMS
        pred = non_max_suppression(pred, opt.conf_thres, opt.iou_thres, classes=opt.classes, agnostic=opt.agnostic_nms)
        t2 = time_synchronized()

        # Apply Classifier
        if classify:
            pred = apply_classifier(pred, modelc, img, im0s)

        # Process detections
        for i, det in enumerate(pred):  # detections per image
            if webcam:  # batch_size >= 1
                p, s, im0 = path[i], '%g: ' % i, im0s[i].copy()
            else:
                p, s, im0 = path, '', im0s

            img_name = Path(p).name

            txt = open(opt.eval_imgs_name_txt, 'a')
            txt.write(img_name[:-4])
            txt.write('\n')
            txt.close()

            save_path = str(Path(out) / Path(p).name)
            txt_path = str(Path(out) / Path(p).stem) + ('_%g' % dataset.frame if dataset.mode == 'video' else '')
            s += '%gx%g ' % img.shape[2:]  # print string
            gn = torch.tensor(im0.shape)[[1, 0, 1, 0]]  # normalization gain whwh
            if det is not None and len(det):
                # Rescale boxes from img_size to im0 size
                det[:, :4] = scale_coords(img.shape[2:], det[:, :4], im0.shape).round()

                # Print results
                for c in det[:, -1].unique():
                    n = (det[:, -1] == c).sum()  # detections per class
                    s += '%g %ss, ' % (n, names[int(c)])  # add to string

                # Write results
                for *xyxy, conf, cls in reversed(det):
                    txt = open(opt.eval_classtxt_path + '/%s' % names[int(cls)], 'a')
                    obj_conf = conf.cpu().numpy()

                    xyxy = torch.tensor(xyxy).numpy()
                    x1 = xyxy[0]
                    y1 = xyxy[1]
                    x2 = xyxy[2]
                    y2 = xyxy[3]

                    new_box = [img_name[:-4], obj_conf, x1, y1, x2, y2]

                    txt.write(" ".join([str(a) for a in new_box]))
                    txt.write('\n')
                    txt.close()

                    if save_txt:  # Write to file
                        xywh = (xyxy2xywh(torch.tensor(xyxy).view(1, 4)) / gn).view(-1).tolist()  # normalized xywh
                        with open(txt_path + '.txt', 'a') as f:
                            f.write(('%g ' * 5 + '\n') % (cls, *xywh))  # label format

                    if save_img or view_img:  # Add bbox to image
                        label = '%s %.2f' % (cfg.textnames[int(cls)], conf)
                        plot_one_box(xyxy, im0, label=label, color=colors[int(cls)], line_thickness=3)

            test_time.append(t2 - t1)

            # Print time (inference + NMS)
            print('%sDone. (%.3fs)' % (s, t2 - t1))


            # Stream results
            if view_img:
                cv2.imshow(p, im0)
                if cv2.waitKey(1) == ord('q'):  # q to quit
                    raise StopIteration

            # Save results (image with detections)
            if save_img:
                if dataset.mode == 'images':
                    cv2.imwrite(save_path, im0)
                else:
                    if vid_path != save_path:  # new video
                        vid_path = save_path
                        if isinstance(vid_writer, cv2.VideoWriter):
                            vid_writer.release()  # release previous video writer

                        fourcc = 'mp4v'  # output video codec
                        fps = vid_cap.get(cv2.CAP_PROP_FPS)
                        w = int(vid_cap.get(cv2.CAP_PROP_FRAME_WIDTH))
                        h = int(vid_cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
                        vid_writer = cv2.VideoWriter(save_path, cv2.VideoWriter_fourcc(*fourcc), fps, (w, h))
                    vid_writer.write(im0)

    if save_txt or save_img:
        print('Results saved to %s' % Path(out))
        if platform.system() == 'Darwin' and not opt.update:  # MacOS
            os.system('open ' + save_path)

    print('Done. (%.3fs)' % (time.time() - t0))
    mean_time=sum(test_time)/len(test_time)
    print('mean time:', mean_time)
    print('frame: ', 1/mean_time)


if __name__ == '__main__':

    dir = '../data_test/imgs_name_manual.txt'
    if os.path.exists(dir):
        os.remove(dir)
    else:
        open(dir, 'w')

    predictions_manual='../data_test/predictions_manual'
    class_txt_manual='../data_test/class_txt_manual'
    cachedir_manual='../data_test/cachedir_manual'

    if os.path.exists(predictions_manual):
        shutil.rmtree(predictions_manual)  # delete output folder
    os.makedirs(predictions_manual)  # make new output folder

    if os.path.exists(class_txt_manual):
        shutil.rmtree(class_txt_manual)  # delete output folder
    os.makedirs(class_txt_manual)  # make new output folder

    if os.path.exists(cachedir_manual):
        shutil.rmtree(cachedir_manual)  # delete output folder
    os.makedirs(cachedir_manual)  # make new output folder

    parser = argparse.ArgumentParser()
    parser.add_argument('--weights', nargs='+', type=str, default='../runs/train/exp/weights/last.pt', help='model.pt path(s)')
    parser.add_argument('--source', type=str, default='../data_test/JPEGImages_manual',
                        help='source')  # file/folder, 0 for webcam
    parser.add_argument('--output', type=str, default='../data_test/predictions_manual',
                        help='output folder')  # output folder
    parser.add_argument('--eval_imgs_name_txt', type=str, default='../data_test/imgs_name_manual.txt',
                        help='output folder')  # output folder
    parser.add_argument('--eval_classtxt_path', type=str, default='../data_test/class_txt_manual',
                        help='output folder')  # output folder
    parser.add_argument('--img-size', type=int, default=640, help='inference size (pixels)')
    parser.add_argument('--conf-thres', type=float, default=0.4, help='object confidence threshold')
    parser.add_argument('--iou-thres', type=float, default=0.5, help='IOU threshold for NMS')
    parser.add_argument('--device', default='', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
    parser.add_argument('--view-img', action='store_true', help='display results')
    parser.add_argument('--save-txt', action='store_true', help='save results to *.txt')
    parser.add_argument('--classes', nargs='+', type=int, help='filter by class: --class 0, or --class 0 2 3')
    parser.add_argument('--agnostic-nms', action='store_true', help='class-agnostic NMS')
    parser.add_argument('--augment', action='store_true', help='augmented inference')
    parser.add_argument('--update', action='store_true', help='update all models')
    opt = parser.parse_args()
    print(opt)

    with torch.no_grad():
        if opt.update:  # update all models (to fix SourceChangeWarning)
            for opt.weights in ['yolov5s.pt', 'yolov5m.pt', 'yolov5l.pt', 'yolov5x.pt']:
                detect()
                strip_optimizer(opt.weights)
        else:
            detect()

此外需要将plot_one_box函数放入yolov5\utils\general.py中。

def plot_one_box(x, img, color=None, label=None, line_thickness=None):
    # Plots one bounding box on image img
    tl = line_thickness or round(0.002 * (img.shape[0] + img.shape[1]) / 2) + 1  # line/font thickness
    color = color or [random.randint(0, 255) for _ in range(3)]
    c1, c2 = (int(x[0]), int(x[1])), (int(x[2]), int(x[3]))
    cv2.rectangle(img, c1, c2, color, thickness=tl, lineType=cv2.LINE_AA)
    if label:
        tf = max(tl - 1, 1)  # font thickness
        t_size = cv2.getTextSize(label, 0, fontScale=tl / 3, thickness=tf)[0]
        c2 = c1[0] + t_size[0], c1[1] - t_size[1] - 3
        cv2.rectangle(img, c1, c2, color, -1, cv2.LINE_AA)  # filled
        cv2.putText(img, label, (c1[0], c1[1] - 2), 0, tl / 3, [225, 255, 255], thickness=tf, lineType=cv2.LINE_AA)

compute_mAP.py：

# -*- coding: utf-8 -*-
import os
import numpy as np
from yolov5_eval import yolov5_eval  # 注意将yolov4_eval.py和compute_mAP.py放在同一级目录下
from cfg_mAP import Cfg
import pickle
import shutil

cfg = Cfg
eval_classtxt_path = cfg.eval_classtxt_path  # 各类txt文件路径
eval_classtxt_files = os.listdir(eval_classtxt_path)

classes = cfg.names  # ['combustion_lining', 'fan', 'fan_stator_casing_and_support', 'hp_core_casing', 'hpc_spool', 'hpc_stage_5','mixer', 'nozzle', 'nozzle_cone', 'stand']

aps = []  # 保存各类ap
cls_rec = {}  # 保存recall
cls_prec = {}  # 保存精度
cls_ap = {}

annopath = cfg.eval_Annotations_path + '/{:s}.xml'  # annotations的路径，{:s}.xml方便后面根据图像名字读取对应的xml文件
imagesetfile = cfg.eval_imgs_name_txt  # 读取图像名字列表文件
cachedir = cfg.cachedir

if os.path.exists(cachedir):
    shutil.rmtree(cachedir)  # delete output folder
os.makedirs(cachedir)  # make new output folder

for cls in eval_classtxt_files:  # 读取cls类对应的txt文件
    filename = eval_classtxt_path + cls

    rec, prec, ap = yolov5_eval(  # yolov4_eval.py计算cls类的recall precision ap
        filename, annopath, imagesetfile, cls, cachedir, ovthresh=0.5,
        use_07_metric=False)

    aps += [ap]
    cls_ap[cls] = ap
    cls_rec[cls] = rec[-1]
    cls_prec[cls] = prec[-1]

    print('AP for {} = {:.4f}'.format(cls, ap))
    print('recall for {} = {:.4f}'.format(cls, rec[-1]))
    print('precision for {} = {:.4f}'.format(cls, prec[-1]))

with open(os.path.join(cfg.cachedir, 'cls_ap.pkl'), 'wb') as in_data:
    pickle.dump(cls_ap, in_data, pickle.HIGHEST_PROTOCOL)

with open(os.path.join(cfg.cachedir, 'cls_rec.pkl'), 'wb') as in_data:
    pickle.dump(cls_rec, in_data, pickle.HIGHEST_PROTOCOL)

with open(os.path.join(cfg.cachedir, 'cls_prec.pkl'), 'wb') as in_data:
    pickle.dump(cls_prec, in_data, pickle.HIGHEST_PROTOCOL)

print('Mean AP = {:.4f}'.format(np.mean(aps)))
print('~~~~~~~~')

print('Results:')
for ap in aps:
    print('{:.3f}'.format(ap))
print('~~~~~~~~')
print('{:.3f}'.format(np.mean(aps)))
print('~~~~~~~~')

mAP_line.py :

import os
import matplotlib.pyplot as plt
import numpy as np
import pickle
from cfg_mAP import Cfg

cfg = Cfg

x = np.linspace(1, 10, 10)
ap_systhesis_valid = []
ap_manual = []
plt.figure(num=1, figsize=(8, 5), )

with open(os.path.join(cfg.manual_cachedir, 'cls_ap.pkl'), 'rb') as out_data:
    # 按保存变量的顺序加载变量
    manual_cls_ap = pickle.load(out_data)
    print(manual_cls_ap)  # dataList
    print(len(manual_cls_ap))  # dataList

for cls in cfg.names:
    if cls in manual_cls_ap.keys():
        ap_manual.append(manual_cls_ap[cls])
    else:
        ap_manual.append(0.0)
print('ap_manual: ', ap_manual)
manual_mAP = np.mean(ap_manual)
l2, = plt.plot(x, ap_manual, color='k', linewidth=1.0, linestyle='-.', label='manual_AP')
plt.scatter(x, ap_manual, s=10, color='k')
for x1, y1 in zip(x, ap_manual):
    plt.text(x1, y1, '%s' % str('{0:.3f}'.format(y1)), fontdict={'fontsize': 14}, verticalalignment="bottom",
             horizontalalignment="center")

plt.annotate(r'manual_mAP=%s' % str('{0:.3f}'.format(manual_mAP)), xy=(5, manual_mAP), xycoords='data',
             xytext=(0.0, 0.0),
             textcoords='offset points', fontsize=13, )

plt.xticks(np.linspace(1, 10, 10),
           [r'combustion_lining', r'fan', r'fan_support', r'hp_core_casing', r'hpc_spool',
            r'hpc_stage5', r'mixer', r'nozzle', r'nozzle_cone', r'stand'])

plt.legend(handles=[l2], loc='best')

plt.show()

utils_mAP.py:

import sys
import os
import time
import math
import torch
import numpy as np
from PIL import Image, ImageDraw, ImageFont
from torch.autograd import Variable

import itertools
import struct  # get_image_size
import imghdr  # get_image_size


def sigmoid(x):
    return 1.0 / (np.exp(-x) + 1.)


def softmax(x):
    x = np.exp(x - np.expand_dims(np.max(x, axis=1), axis=1))
    x = x / np.expand_dims(x.sum(axis=1), axis=1)
    return x


def bbox_iou(box1, box2, x1y1x2y2=True):
    if x1y1x2y2:
        mx = min(box1[0], box2[0])
        Mx = max(box1[2], box2[2])
        my = min(box1[1], box2[1])
        My = max(box1[3], box2[3])
        w1 = box1[2] - box1[0]
        h1 = box1[3] - box1[1]
        w2 = box2[2] - box2[0]
        h2 = box2[3] - box2[1]
    else:
        mx = min(box1[0] - box1[2] / 2.0, box2[0] - box2[2] / 2.0)
        Mx = max(box1[0] + box1[2] / 2.0, box2[0] + box2[2] / 2.0)
        my = min(box1[1] - box1[3] / 2.0, box2[1] - box2[3] / 2.0)
        My = max(box1[1] + box1[3] / 2.0, box2[1] + box2[3] / 2.0)
        w1 = box1[2]
        h1 = box1[3]
        w2 = box2[2]
        h2 = box2[3]
    uw = Mx - mx
    uh = My - my
    cw = w1 + w2 - uw
    ch = h1 + h2 - uh
    carea = 0
    if cw <= 0 or ch <= 0:
        return 0.0

    area1 = w1 * h1
    area2 = w2 * h2
    carea = cw * ch
    uarea = area1 + area2 - carea
    return carea / uarea


def bbox_ious(boxes1, boxes2, x1y1x2y2=True):
    if x1y1x2y2:
        mx = torch.min(boxes1[0], boxes2[0])
        Mx = torch.max(boxes1[2], boxes2[2])
        my = torch.min(boxes1[1], boxes2[1])
        My = torch.max(boxes1[3], boxes2[3])
        w1 = boxes1[2] - boxes1[0]
        h1 = boxes1[3] - boxes1[1]
        w2 = boxes2[2] - boxes2[0]
        h2 = boxes2[3] - boxes2[1]
    else:
        mx = torch.min(boxes1[0] - boxes1[2] / 2.0, boxes2[0] - boxes2[2] / 2.0)
        Mx = torch.max(boxes1[0] + boxes1[2] / 2.0, boxes2[0] + boxes2[2] / 2.0)
        my = torch.min(boxes1[1] - boxes1[3] / 2.0, boxes2[1] - boxes2[3] / 2.0)
        My = torch.max(boxes1[1] + boxes1[3] / 2.0, boxes2[1] + boxes2[3] / 2.0)
        w1 = boxes1[2]
        h1 = boxes1[3]
        w2 = boxes2[2]
        h2 = boxes2[3]
    uw = Mx - mx
    uh = My - my
    cw = w1 + w2 - uw
    ch = h1 + h2 - uh
    mask = ((cw <= 0) + (ch <= 0) > 0)
    area1 = w1 * h1
    area2 = w2 * h2
    carea = cw * ch
    carea[mask] = 0
    uarea = area1 + area2 - carea
    return carea / uarea


def nms(_boxes, _nms_thresh):
    if len(_boxes) == 0:
        return _boxes

    det_confs = torch.zeros(len(_boxes))
    for i in range(len(_boxes)):
        det_confs[i] = 1 - _boxes[i][4]

    _, sortIds = torch.sort(det_confs)
    out_boxes = []
    for i in range(len(_boxes)):
        box_i = _boxes[sortIds[i]]
        if box_i[4] > 0:
            out_boxes.append(box_i)
            for j in range(i + 1, len(_boxes)):
                box_j = _boxes[sortIds[j]]
                if bbox_iou(box_i, box_j, x1y1x2y2=False) > _nms_thresh:
                    # print(box_i, box_j, bbox_iou(box_i, box_j, x1y1x2y2=False))
                    box_j[4] = 0
    return out_boxes


def convert2cpu(gpu_matrix):
    return torch.FloatTensor(gpu_matrix.size()).copy_(gpu_matrix)


def convert2cpu_long(gpu_matrix):
    return torch.LongTensor(gpu_matrix.size()).copy_(gpu_matrix)


def get_region_boxes_in_model(output, conf_thresh, num_classes, anchors, num_anchors, only_objectness=1,
                              validation=False):
    anchor_step = len(anchors) // num_anchors
    if output.dim() == 3:
        output = output.unsqueeze(0)
    batch = output.size(0)
    assert (output.size(1) == (5 + num_classes) * num_anchors)
    h = output.size(2)
    w = output.size(3)

    t0 = time.time()
    all_boxes = []
    output = output.view(batch * num_anchors, 5 + num_classes, h * w).transpose(0, 1).contiguous().view(5 + num_classes,
                                                                                                        batch * num_anchors * h * w)

    grid_x = torch.linspace(0, w - 1, w).repeat(h, 1).repeat(batch * num_anchors, 1, 1).view(
        batch * num_anchors * h * w).type_as(output)  # cuda()
    grid_y = torch.linspace(0, h - 1, h).repeat(w, 1).t().repeat(batch * num_anchors, 1, 1).view(
        batch * num_anchors * h * w).type_as(output)  # cuda()
    xs = torch.sigmoid(output[0]) + grid_x
    ys = torch.sigmoid(output[1]) + grid_y

    anchor_w = torch.Tensor(anchors).view(num_anchors, anchor_step).index_select(1, torch.LongTensor([0]))
    anchor_h = torch.Tensor(anchors).view(num_anchors, anchor_step).index_select(1, torch.LongTensor([1]))
    anchor_w = anchor_w.repeat(batch, 1).repeat(1, 1, h * w).view(batch * num_anchors * h * w).type_as(output)  # cuda()
    anchor_h = anchor_h.repeat(batch, 1).repeat(1, 1, h * w).view(batch * num_anchors * h * w).type_as(output)  # cuda()
    ws = torch.exp(output[2]) * anchor_w
    hs = torch.exp(output[3]) * anchor_h

    det_confs = torch.sigmoid(output[4])

    cls_confs = torch.nn.Softmax()(Variable(output[5:5 + num_classes].transpose(0, 1))).data
    cls_max_confs, cls_max_ids = torch.max(cls_confs, 1)
    cls_max_confs = cls_max_confs.view(-1)
    cls_max_ids = cls_max_ids.view(-1)
    t1 = time.time()

    sz_hw = h * w
    sz_hwa = sz_hw * num_anchors
    det_confs = convert2cpu(det_confs)
    cls_max_confs = convert2cpu(cls_max_confs)
    cls_max_ids = convert2cpu_long(cls_max_ids)
    xs = convert2cpu(xs)
    ys = convert2cpu(ys)
    ws = convert2cpu(ws)
    hs = convert2cpu(hs)
    if validation:
        cls_confs = convert2cpu(cls_confs.view(-1, num_classes))
    t2 = time.time()
    for b in range(batch):
        boxes = []
        for cy in range(h):
            for cx in range(w):
                for i in range(num_anchors):
                    ind = b * sz_hwa + i * sz_hw + cy * w + cx
                    det_conf = det_confs[ind]
                    if only_objectness:
                        conf = det_confs[ind]
                    else:
                        conf = det_confs[ind] * cls_max_confs[ind]

                    if conf > conf_thresh:
                        bcx = xs[ind]
                        bcy = ys[ind]
                        bw = ws[ind]
                        bh = hs[ind]
                        cls_max_conf = cls_max_confs[ind]
                        cls_max_id = cls_max_ids[ind]
                        box = [bcx / w, bcy / h, bw / w, bh / h, det_conf, cls_max_conf, cls_max_id]
                        if (not only_objectness) and validation:
                            for c in range(num_classes):
                                tmp_conf = cls_confs[ind][c]
                                if c != cls_max_id and det_confs[ind] * tmp_conf > conf_thresh:
                                    box.append(tmp_conf)
                                    box.append(c)
                        boxes.append(box)
        all_boxes.append(boxes)
    t3 = time.time()
    if False:
        print('---------------------------------')
        print('matrix computation : %f' % (t1 - t0))
        print('        gpu to cpu : %f' % (t2 - t1))
        print('      tpz filter : %f' % (t3 - t2))
        print('---------------------------------')
    return all_boxes


def get_region_boxes_out_model(_output, _cfg, _anchors, _num_anchors, _only_objectness=1, _validation=False):
    anchor_step = len(_anchors) // _num_anchors
    if len(_output.shape) == 3:
        _output = np.expand_dims(_output, axis=0)
    batch = _output.shape[0]
    assert (_output.shape[1] == (5 + _cfg.classes) * _num_anchors)
    h = _output.shape[2]
    w = _output.shape[3]

    t0 = time.time()
    all_boxes = []
    _output = _output.reshape(batch * _num_anchors, 5 + _cfg.classes, h * w).transpose((1, 0, 2)).reshape(
        5 + _cfg.classes,
        batch * _num_anchors * h * w)

    grid_x = np.expand_dims(np.expand_dims(np.linspace(0, w - 1, w), axis=0).repeat(h, 0), axis=0).repeat(
        batch * _num_anchors, axis=0).reshape(
        batch * _num_anchors * h * w)
    grid_y = np.expand_dims(np.expand_dims(np.linspace(0, h - 1, h), axis=0).repeat(w, 0).T, axis=0).repeat(
        batch * _num_anchors, axis=0).reshape(
        batch * _num_anchors * h * w)

    xs = sigmoid(_output[0]) + grid_x
    ys = sigmoid(_output[1]) + grid_y

    anchor_w = np.array(_anchors).reshape((_num_anchors, anchor_step))[:, 0]
    anchor_h = np.array(_anchors).reshape((_num_anchors, anchor_step))[:, 1]
    anchor_w = np.expand_dims(np.expand_dims(anchor_w, axis=1).repeat(batch, 1), axis=2) \
        .repeat(h * w, axis=2).transpose(1, 0, 2).reshape(batch * _num_anchors * h * w)
    anchor_h = np.expand_dims(np.expand_dims(anchor_h, axis=1).repeat(batch, 1), axis=2) \
        .repeat(h * w, axis=2).transpose(1, 0, 2).reshape(batch * _num_anchors * h * w)
    ws = np.exp(_output[2]) * anchor_w
    hs = np.exp(_output[3]) * anchor_h

    det_confs = sigmoid(_output[4])

    cls_confs = softmax(_output[5:5 + _cfg.classes].transpose(1, 0))
    cls_max_confs = np.max(cls_confs, 1)
    cls_max_ids = np.argmax(cls_confs, 1)
    t1 = time.time()

    sz_hw = h * w
    sz_hwa = sz_hw * _num_anchors
    t2 = time.time()
    for b in range(batch):
        boxes = []
        for cy in range(h):
            for cx in range(w):
                for i in range(_num_anchors):
                    ind = b * sz_hwa + i * sz_hw + cy * w + cx
                    det_conf = det_confs[ind]
                    if _only_objectness:
                        conf = det_confs[ind]
                    else:
                        conf = det_confs[ind] * cls_max_confs[ind]

                    if conf > _cfg.conf_thresh:
                        bcx = xs[ind]
                        bcy = ys[ind]
                        bw = ws[ind]
                        bh = hs[ind]
                        cls_max_conf = cls_max_confs[ind]
                        cls_max_id = cls_max_ids[ind]
                        box = [bcx / w, bcy / h, bw / w, bh / h, det_conf, cls_max_conf, cls_max_id]
                        if (not _only_objectness) and _validation:
                            for c in range(_cfg.classes):
                                tmp_conf = cls_confs[ind][c]
                                if c != cls_max_id and det_confs[ind] * tmp_conf > _cfg.conf_thresh:
                                    box.append(tmp_conf)
                                    box.append(c)
                        boxes.append(box)
        all_boxes.append(boxes)
    t3 = time.time()
    if False:
        print('---------------------------------')
        print('matrix computation : %f' % (t1 - t0))
        print('        gpu to cpu : %f' % (t2 - t1))
        print('      tpz filter : %f' % (t3 - t2))
        print('---------------------------------')
    return all_boxes


def get_classtxt_out_model(_output, _cfg, _anchors, _num_anchors, _only_objectness=1, _validation=False):
    anchor_step = len(_anchors) // _num_anchors
    if len(_output.shape) == 3:
        _output = np.expand_dims(_output, axis=0)
    batch = _output.shape[0]
    assert (_output.shape[1] == (5 + _cfg.n_classes) * _num_anchors)
    h = _output.shape[2]
    w = _output.shape[3]

    t0 = time.time()
    all_boxes = []
    _output = _output.reshape(batch * _num_anchors, 5 + _cfg.n_classes, h * w).transpose((1, 0, 2)).reshape(
        5 + _cfg.n_classes,
        batch * _num_anchors * h * w)

    grid_x = np.expand_dims(np.expand_dims(np.linspace(0, w - 1, w), axis=0).repeat(h, 0), axis=0).repeat(
        batch * _num_anchors, axis=0).reshape(
        batch * _num_anchors * h * w)
    grid_y = np.expand_dims(np.expand_dims(np.linspace(0, h - 1, h), axis=0).repeat(w, 0).T, axis=0).repeat(
        batch * _num_anchors, axis=0).reshape(
        batch * _num_anchors * h * w)

    xs = sigmoid(_output[0]) + grid_x
    ys = sigmoid(_output[1]) + grid_y

    anchor_w = np.array(_anchors).reshape((_num_anchors, anchor_step))[:, 0]
    anchor_h = np.array(_anchors).reshape((_num_anchors, anchor_step))[:, 1]
    anchor_w = np.expand_dims(np.expand_dims(anchor_w, axis=1).repeat(batch, 1), axis=2) \
        .repeat(h * w, axis=2).transpose(1, 0, 2).reshape(batch * _num_anchors * h * w)
    anchor_h = np.expand_dims(np.expand_dims(anchor_h, axis=1).repeat(batch, 1), axis=2) \
        .repeat(h * w, axis=2).transpose(1, 0, 2).reshape(batch * _num_anchors * h * w)
    ws = np.exp(_output[2]) * anchor_w
    hs = np.exp(_output[3]) * anchor_h

    det_confs = sigmoid(_output[4])

    cls_confs = softmax(_output[5:5 + _cfg.n_classes].transpose(1, 0))
    cls_max_confs = np.max(cls_confs, 1)
    cls_max_ids = np.argmax(cls_confs, 1)
    t1 = time.time()

    sz_hw = h * w
    sz_hwa = sz_hw * _num_anchors
    t2 = time.time()
    for b in range(batch):
        boxes = []
        for cy in range(h):
            for cx in range(w):
                for i in range(_num_anchors):
                    ind = b * sz_hwa + i * sz_hw + cy * w + cx
                    det_conf = det_confs[ind]
                    if _only_objectness:
                        conf = det_confs[ind]
                    else:
                        conf = det_confs[ind] * cls_max_confs[ind]

                    if conf > _cfg.conf_thresh:
                        bcx = xs[ind]
                        bcy = ys[ind]
                        bw = ws[ind]
                        bh = hs[ind]
                        cls_max_conf = cls_max_confs[ind]
                        cls_max_id = cls_max_ids[ind]
                        box = [bcx / w, bcy / h, bw / w, bh / h, det_conf, cls_max_conf, cls_max_id]
                        if (not _only_objectness) and _validation:
                            for c in range(_cfg.classes):
                                tmp_conf = cls_confs[ind][c]
                                if c != cls_max_id and det_confs[ind] * tmp_conf > _cfg.conf_thresh:
                                    box.append(tmp_conf)
                                    box.append(c)
                        boxes.append(box)
        all_boxes.append(boxes)
    t3 = time.time()
    if False:
        print('---------------------------------')
        print('matrix computation : %f' % (t1 - t0))
        print('        gpu to cpu : %f' % (t2 - t1))
        print('      tpz filter : %f' % (t3 - t2))
        print('---------------------------------')
    return all_boxes


def plot_boxes_cv2(img, boxes, savename=None, class_names=None, color=None):
    import cv2
    colors = torch.FloatTensor([[1, 0, 1], [0, 0, 1], [0, 1, 1], [0, 1, 0], [1, 1, 0], [1, 0, 0]]);

    def get_color(c, x, max_val):
        ratio = float(x) / max_val * 5
        i = int(math.floor(ratio))
        j = int(math.ceil(ratio))
        ratio = ratio - i
        r = (1 - ratio) * colors[i][c] + ratio * colors[j][c]
        return int(r * 255)

    width = img.shape[1]
    height = img.shape[0]
    for i in range(len(boxes)):
        box = boxes[i]
        x1 = int((box[0] - box[2] / 2.0) * width)
        y1 = int((box[1] - box[3] / 2.0) * height)
        x2 = int((box[0] + box[2] / 2.0) * width)
        y2 = int((box[1] + box[3] / 2.0) * height)

        if color:
            rgb = color
        else:
            rgb = (255, 0, 0)
        if len(box) >= 7 and class_names:
            cls_conf = box[5]
            cls_id = box[6]
            print('%s: %f' % (class_names[cls_id], cls_conf))
            classes = len(class_names)
            offset = cls_id * 123457 % classes
            red = get_color(2, offset, classes)
            green = get_color(1, offset, classes)
            blue = get_color(0, offset, classes)
            if color is None:
                rgb = (red, green, blue)
            img = cv2.putText(img, class_names[cls_id], (x1, y1), cv2.FONT_HERSHEY_SIMPLEX, 1.2, rgb, 1)
        img = cv2.rectangle(img, (x1, y1), (x2, y2), rgb, 1)
    if savename:
        print("save plot results to %s" % savename)
        cv2.imwrite(savename, img)
    return img


def plot_boxes(_img,  _boxes, _savename=None, _class_names=None):
    font = ImageFont.truetype("consola.ttf", 40, encoding="unic")  # 设置字体

    colors = torch.FloatTensor([[1, 0, 1], [0, 0, 1], [0, 1, 1], [0, 1, 0], [1, 1, 0], [1, 0, 0]]);

    def get_color(c, x, max_val):
        ratio = float(x) / max_val * 5
        i = int(math.floor(ratio))
        j = int(math.ceil(ratio))
        ratio = ratio - i
        r = (1 - ratio) * colors[i][c] + ratio * colors[j][c]
        return int(r * 255)

    # width = _img.shape[1]
    # height = _img.shape[0]
    draw = ImageDraw.Draw(_img)
    for i in range(len(_boxes)):
        box = _boxes[i]
        x1 = box[0]
        y1 = box[1]
        x2 = box[2]
        y2 = box[3]

        rgb = (255, 0, 0)
        if len(box) >= 7 and _class_names:
            cls_conf = box[5]
            cls_id = box[6]
            print('%s: %f' % (_class_names[cls_id], cls_conf))
            classes = len(_class_names)
            offset = cls_id * 123457 % classes
            red = get_color(2, offset, classes)
            green = get_color(1, offset, classes)
            blue = get_color(0, offset, classes)
            rgb = (red, green, blue)
            # draw.text((x1, y1), _class_names[cls_id], fill=rgb, font=font)
            draw.text((x1, y1), _class_names[cls_id], fill=rgb, font=font)
        draw.rectangle([x1, y1, x2, y2], outline=rgb, width=5)
    if _savename:
        print("save plot results to %s" % _savename)
        _img.save(_savename)
    return _img


def read_truths(lab_path):
    if not os.path.exists(lab_path):
        return np.array([])
    if os.path.getsize(lab_path):
        truths = np.loadtxt(lab_path)
        truths = truths.reshape(truths.size / 5, 5)  # to avoid single truth problem
        return truths
    else:
        return np.array([])


def load_class_names(_namesfile):
    class_names = []
    with open(_namesfile, 'r') as fp:
        lines = fp.readlines()
    for line in lines:
        line = line.rstrip()
        class_names.append(line)
    return class_names


def do_detect(_model, _img, _cfg, _use_cuda=1):
    _model.eval()
    t0 = time.time()

    if isinstance(_img, Image.Image):
        width = _img.width
        height = _img.height
        img = torch.ByteTensor(torch.ByteStorage.from_buffer(_img.tobytes()))
        img = img.view(height, width, 3).transpose(0, 1).transpose(0, 2).contiguous()
        img = img.view(1, 3, height, width)
        img = img.float().div(255.0)
    elif type(_img) == np.ndarray and len(_img.shape) == 3:  # cv2 image
        img = torch.from_numpy(_img.transpose(2, 0, 1)).float().div(255.0).unsqueeze(0)
    elif type(_img) == np.ndarray and len(_img.shape) == 4:
        img = torch.from_numpy(_img.transpose(0, 3, 1, 2)).float().div(255.0)
    else:
        print("unknow image type")
        exit(-1)

    t1 = time.time()

    if _use_cuda:
        img = img.cuda()
    img = torch.autograd.Variable(img)
    t2 = time.time()

    list_features = _model(img)

    list_features_numpy = []
    for feature in list_features:
        list_features_numpy.append(feature.data.cup().numpy())

    return post_processing(_img=img, _cfg=_cfg, _list_features_numpy=list_features_numpy, _t0=t0, _t1=t1, _t2=t2)


def post_processing(_img, _cfg, _list_features_numpy, _t0, _t1, _t2):
    anchor_step = len(_cfg.anchors) // _cfg.num_anchors
    boxes = []
    for i in range(3):
        masked_anchors = []
        for m in _cfg.anchor_masks[i]:
            masked_anchors += _cfg.anchors[m * anchor_step:(m + 1) * anchor_step]
        masked_anchors = [anchor / _cfg.strides[i] for anchor in masked_anchors]
        boxes.append(get_region_boxes_out_model(_output=_list_features_numpy[i], _cfg=_cfg, _anchors=masked_anchors,
                                                _num_anchors=len(_cfg.anchor_masks[i])))
    if _img.shape[0] > 1:
        bboxs_for_imgs = [
            boxes[0][index] + boxes[1][index] + boxes[2][index]
            for index in range(_img.shape[0])]
        # 分别对每一张图片的结果进行nms
        t3 = time.time()
        boxes = [nms(_boxes=bboxs, _nms_thresh=_cfg.nms_thresh) for bboxs in bboxs_for_imgs]
    else:
        boxes = boxes[0][0] + boxes[1][0] + boxes[2][0]
        t3 = time.time()
        boxes = nms(boxes, _cfg.nms_thresh)
    t4 = time.time()

    if True:
        print('-----------------------------------')
        print(' image to tensor : %f' % (_t1 - _t0))
        print('  tensor to cuda : %f' % (_t2 - _t1))
        print('         predict : %f' % (t3 - _t2))
        print('             nms : %f' % (t4 - t3))
        print('           total : %f' % (t4 - _t0))
        print('-----------------------------------')
    return boxes


def classtxt_processing(_img, _cfg, _list_features_numpy, _t0, _t1, _t2):
    anchor_step = len(_cfg.anchors) // _cfg.num_anchors
    boxes = []
    for i in range(3):
        masked_anchors = []
        for m in _cfg.anchor_masks[i]:
            masked_anchors += _cfg.anchors[m * anchor_step:(m + 1) * anchor_step]
        masked_anchors = [anchor / _cfg.strides[i] for anchor in masked_anchors]
        boxes.append(get_classtxt_out_model(_output=_list_features_numpy[i], _cfg=_cfg, _anchors=masked_anchors,
                                            _num_anchors=len(_cfg.anchor_masks[i])))
    if _img.shape[0] > 1:
        bboxs_for_imgs = [
            boxes[0][index] + boxes[1][index] + boxes[2][index]
            for index in range(_img.shape[0])]
        # 分别对每一张图片的结果进行nms
        t3 = time.time()
        boxes = [nms(_boxes=bboxs, _nms_thresh=_cfg.nms_thresh) for bboxs in bboxs_for_imgs]
    else:
        boxes = boxes[0][0] + boxes[1][0] + boxes[2][0]
        t3 = time.time()
        boxes = nms(boxes, _cfg.nms_thresh)
    t4 = time.time()

    if True:
        print('-----------------------------------')
        print(' image to tensor : %f' % (_t1 - _t0))
        print('  tensor to cuda : %f' % (_t2 - _t1))
        print('         predict : %f' % (t3 - _t2))
        print('             nms : %f' % (t4 - t3))
        print('           total : %f' % (t4 - _t0))
        print('-----------------------------------')
    return boxes


def gen_cls_txt(_model, _img, _cfg, _use_cuda):
    _model.eval()
    t0 = time.time()

    if isinstance(_img, Image.Image):
        width = _img.width
        height = _img.height
        img = torch.ByteTensor(torch.ByteStorage.from_buffer(_img.tobytes()))
        img = img.view(height, width, 3).transpose(0, 1).transpose(0, 2).contiguous()
        img = img.view(1, 3, height, width)
        img = img.float().div(255.0)
    elif type(_img) == np.ndarray and len(_img.shape) == 3:  # cv2 image
        img = torch.from_numpy(_img.transpose(2, 0, 1)).float().div(255.0).unsqueeze(0)
    elif type(_img) == np.ndarray and len(_img.shape) == 4:
        img = torch.from_numpy(_img.transpose(0, 3, 1, 2)).float().div(255.0)
    else:
        print("unknow image type")
        exit(-1)

    t1 = time.time()

    if _use_cuda:
        img = img.cuda()
    img = torch.autograd.Variable(img)
    t2 = time.time()

    list_features = _model(img)

    list_features_numpy = []
    for feature in list_features:
        list_features_numpy.append(feature.data.cpu().numpy())

    return classtxt_processing(_img=img, _cfg=_cfg, _list_features_numpy=list_features_numpy, _t0=t0, _t1=t1, _t2=t2)

yolov5_eval.py :

# -*- coding: utf-8 -*-
# --------------------------------------------------------
# Fast/er R-CNN
# Licensed under The MIT License [see LICENSE for details]
# Written by Bharath Hariharan
# --------------------------------------------------------

import xml.etree.ElementTree as ET
import os
import pickle
import numpy as np


def parse_rec(filename):
    """ Parse a PASCAL VOC xml file """
    tree = ET.parse(filename)
    objects = []
    for obj in tree.findall('object'):
        obj_struct = {}
        obj_struct['name'] = (obj.find('name').text).replace(" ", "")
        obj_struct['pose'] = obj.find('pose').text
        obj_struct['truncated'] = int(obj.find('truncated').text)
        obj_struct['difficult'] = int(obj.find('difficult').text)
        bbox = obj.find('bndbox')
        obj_struct['bbox'] = [int(bbox.find('xmin').text),
                              int(bbox.find('ymin').text),
                              int(bbox.find('xmax').text),
                              int(bbox.find('ymax').text)]
        objects.append(obj_struct)

    return objects


def voc_ap(rec, prec, use_07_metric=False):  # voc2007的计算方式和voc2012的计算方式不同，目前一般采用第二种
    """ ap = voc_ap(rec, prec, [use_07_metric])
    Compute VOC AP given precision and recall.
    If use_07_metric is true, uses the
    VOC 07 11 point method (default:False).
    """
    if use_07_metric:
        # 11 point metric
        ap = 0.
        for t in np.arange(0., 1.1, 0.1):
            if np.sum(rec >= t) == 0:
                p = 0
            else:
                p = np.max(prec[rec >= t])
            ap = ap + p / 11.
    else:
        # correct AP calculation
        # first append sentinel values at the end
        mrec = np.concatenate(([0.], rec, [1.]))
        mpre = np.concatenate(([0.], prec, [0.]))

        # compute the precision envelope
        for i in range(mpre.size - 1, 0, -1):
            mpre[i - 1] = np.maximum(mpre[i - 1], mpre[i])

        # to calculate area under PR curve, look for points
        # where X axis (recall) changes value
        i = np.where(mrec[1:] != mrec[:-1])[0]

        # and sum (\Delta recall) * prec
        ap = np.sum((mrec[i + 1] - mrec[i]) * mpre[i + 1])
    return ap


## 程序入口

def yolov5_eval(detpath,  # 保存检测到的目标框的文件路径，每一类的目标框单独保存在一个文件
                annopath,  # Annotations的路径
                imagesetfile,  # 测试图片名字列表
                classname,  # 类别名称
                cachedir,  # 缓存文件夹
                ovthresh=0.5,  # IoU阈值
                use_07_metric=False):  # mAP计算方法
    """rec, prec, ap = voc_eval(eval_classtxt_path,
                                annopath,
                                imagesetfile,
                                classname,
                                [ovthresh],
                                [use_07_metric])
    Top level function that does the PASCAL VOC evaluation.
    eval_classtxt_path: Path to detections
        eval_classtxt_path.format(classname) should produce the detection results file.
    annopath: Path to annotations
        annopath.format(imagename) should be the xml annotations file.
    imagesetfile: Text file containing the list of images, one image per line.
    classname: Category name (duh)
    cachedir: Directory for caching the annotations
    [ovthresh]: Overlap threshold (default = 0.5)
    [use_07_metric]: Whether to use VOC07's 11 point AP computation
        (default False)
    """
    # assumes detections are in eval_classtxt_path.format(classname)
    # assumes annotations are in annopath.format(imagename)
    # assumes imagesetfile is a text file with each line an image name
    # cachedir caches the annotations in a pickle file

    # first load gt   获取真实目标框
    # 当程序第一次运行时，会读取Annotations下的xml文件获取每张图片中真实的目标框
    # 然后把获取的结果保存在annotations_cache文件夹中
    # 以后再次运行时直接从缓存文件夹中读取真实目标

    if not os.path.isdir(cachedir):
        os.mkdir(cachedir)
    cachefile = os.path.join(cachedir, 'annots.pkl')
    # read list of images
    with open(imagesetfile, 'r') as f:
        lines = f.readlines()
    imagenames = [x.strip() for x in lines]

    if not os.path.isfile(cachefile):
        # load annots
        recs = {}
        for i, imagename in enumerate(imagenames):
            recs[imagename] = parse_rec(annopath.format(imagename))
            if i % 100 == 0:
                print('Reading annotation for {:d}/{:d}'.format(i + 1, len(imagenames)))

        # save
        print('Saving cached annotations to {:s}'.format(cachefile))

        # with open(cachefile, 'w') as cls:
        #     pickle.dump(recs, cls)
        with open(cachefile, 'wb') as f:
            pickle.dump(recs, f)
    else:
        # load
        with open(cachefile, 'rb') as f:
            recs = pickle.load(f)

    # extract gt objects for this class 提取该类的真实目标
    class_recs = {}
    npos = 0  # 保存该类一共有多少真实目标
    for imagename in imagenames:
        R = [obj for obj in recs[imagename] if obj['name'] == classname]  # 保存名字为imagename的图片中，类别为classname的目标框的信息
        bbox = np.array([x['bbox'] for x in R])  # 目标框的坐标
        difficult = np.array([x['difficult'] for x in R]).astype(np.bool)  # 是否是难以识别的目标
        det = [False] * len(R)  # 每一个目标框对应一个det[i]，用来判断该目标框是否已经处理过
        npos = npos + sum(~difficult)  # 计算总的目标个数
        class_recs[imagename] = {'bbox': bbox,  # 把每一张图像中的目标框信息放到class_recs中
                                 'difficult': difficult,
                                 'det': det}

    # read dets
    detfile = detpath.format(classname)  # 打开classname类别检测到的目标框文件
    with open(detfile, 'r') as f:
        lines = f.readlines()

    splitlines = [x.strip().split(' ') for x in lines]
    image_ids = [x[0] for x in splitlines]  # 图像名字
    confidence = np.array([float(x[1]) for x in splitlines])  # 置信度
    BB = np.array([[float(z) for z in x[2:]] for x in splitlines])  # 目标框坐标

    # sort by confidence  按照置信度排序
    sorted_ind = np.argsort(-confidence)
    sorted_scores = np.sort(-confidence)
    BB = BB[sorted_ind, :]
    image_ids = [image_ids[x] for x in sorted_ind]

    # go down dets and mark TPs and FPs
    nd = len(image_ids)  # 统计检测到的目标框个数
    tp = np.zeros(nd)  # 创建tp列表，列表长度为目标框个数
    fp = np.zeros(nd)  # 创建fp列表，列表长度为目标框个数

    for d in range(nd):
        R = class_recs[image_ids[d]]  # 得到图像名字为image_ids[d]真实的目标框信息
        bb = BB[d, :].astype(float)  # 得到图像名字为image_ids[d]检测的目标框坐标
        ovmax = -np.inf
        BBGT = R['bbox'].astype(float)  # 得到图像名字为image_ids[d]真实的目标框坐标

        if BBGT.size > 0:
            # compute overlaps  计算IoU
            # intersection
            ixmin = np.maximum(BBGT[:, 0], bb[0])
            iymin = np.maximum(BBGT[:, 1], bb[1])
            ixmax = np.minimum(BBGT[:, 2], bb[2])
            iymax = np.minimum(BBGT[:, 3], bb[3])
            iw = np.maximum(ixmax - ixmin + 1., 0.)
            ih = np.maximum(iymax - iymin + 1., 0.)
            inters = iw * ih

            # union
            uni = ((bb[2] - bb[0] + 1.) * (bb[3] - bb[1] + 1.) +
                   (BBGT[:, 2] - BBGT[:, 0] + 1.) *
                   (BBGT[:, 3] - BBGT[:, 1] + 1.) - inters)

            overlaps = inters / uni
            ovmax = np.max(overlaps)  # 检测到的目标框可能预若干个真实目标框都有交集，选择其中交集最大的
            jmax = np.argmax(overlaps)

        if ovmax > ovthresh:  # IoU是否大于阈值
            if not R['difficult'][jmax]:  # 真实目标框是否难以识别
                if not R['det'][jmax]:  # 该真实目标框是否已经统计过
                    tp[d] = 1.  # 将tp对应第d个位置变成1
                    R['det'][jmax] = 1  # 将该真实目标框做标记
                else:
                    fp[d] = 1.  # 否则将fp对应的位置变为1
        else:
            fp[d] = 1.  # 否则将fp对应的位置变为1

    # compute precision recall
    fp = np.cumsum(fp)  # 按列累加，最大值即为tp数量
    tp = np.cumsum(tp)  # 按列累加，最大值即为fp数量
    rec = tp / float(npos)  # 计算recall
    # avoid divide by zero in case the first detection matches a difficult
    # ground truth
    prec = tp / np.maximum(tp + fp, np.finfo(np.float64).eps)  # 计算精度
    ap = voc_ap(rec, prec, use_07_metric)  # 计算ap

    return rec, prec, ap

首先运行detect_eval_class_txt.py
在yolov5\data_test\predictions_manual文件夹中保存了测试结果图片，在imgs_name_manual.txt文件中记录测试图片名称，在yolov5\data_test\class_txt_manual文件夹中保存了每一类的结果信息。

然后运行compute_mAP.py
输出各类的AP、recall、precision 以及 mAP.

"D:\Program Files\Python38\python.exe" E:/tpz/yolov5/mAP/compute_mAP.py
Reading annotation for 1/712
Reading annotation for 101/712
Reading annotation for 201/712
Reading annotation for 301/712
Reading annotation for 401/712
Reading annotation for 501/712
Reading annotation for 601/712
Reading annotation for 701/712
Saving cached annotations to ../data_test/cachedir_manual/annots.pkl
AP for combustion_lining = 0.9992
recall for combustion_lining = 1.0000
precision for combustion_lining = 0.9951
AP for fan = 0.9968
recall for fan = 0.9968
precision for fan = 1.0000
AP for fan_stator_casing_and_support = 0.9995
recall for fan_stator_casing_and_support = 1.0000
precision for fan_stator_casing_and_support = 0.9918
AP for hpc_spool = 1.0000
recall for hpc_spool = 1.0000
precision for hpc_spool = 0.9950
AP for hpc_stage_5 = 0.9967
recall for hpc_stage_5 = 0.9967
precision for hpc_stage_5 = 0.9918
AP for hp_core_casing = 0.9951
recall for hp_core_casing = 0.9967
precision for hp_core_casing = 0.9870
AP for mixer = 0.9992
recall for mixer = 1.0000
precision for mixer = 0.9967
AP for nozzle = 0.9953
recall for nozzle = 0.9953
precision for nozzle = 0.9953
AP for nozzle_cone = 0.9984
recall for nozzle_cone = 0.9984
precision for nozzle_cone = 0.9967
AP for stand = 1.0000
recall for stand = 1.0000
precision for stand = 0.9985
Mean AP = 0.9980
~~~~~~~~
Results:
0.999
0.997
0.999
1.000
0.997
0.995
0.999
0.995
0.998
1.000
~~~~~~~~
0.998
~~~~~~~~

运行mAP_line.py，可以绘制AP曲线。绘制其他曲线的代码类似。
在这里插入图片描述

版权声明：本文为CSDN博主「tpz789」的原创文章，遵循CC 4.0 BY-SA版权协议，转载请附上原文出处链接及本声明。
原文链接：https://blog.csdn.net/tpz789/article/details/110675268

参考文献

代码和权重下载

准备工作

data中新建几个文件夹

makeTxt.py

voc_label.py

文件修改

数据集方面的yaml文件修改

网络参数方面的yaml文件修改

train.py中的一些参数修改

训练

测试

新建data_test

新建几个py文件

cfg_mAP.py：

detect_eval_class_txt.py :

compute_mAP.py：

mAP_line.py :

utils_mAP.py:

yolov5_eval.py :

【学习记录】win10搭建YOLOX训练自己的VOC数据集

机器视觉需求20230529

tpz789

暂无评论

发表评论取消回复

参考文献

代码和权重下载

准备工作

data中新建几个文件夹

makeTxt.py

voc_label.py

文件修改

数据集方面的yaml文件修改

网络参数方面的yaml文件修改

train.py中的一些参数修改

训练

测试

新建data_test

新建几个py文件

cfg_mAP.py：

detect_eval_class_txt.py :

compute_mAP.py：

mAP_line.py :

utils_mAP.py:

yolov5_eval.py :

【学习记录】win10搭建YOLOX训练自己的VOC数据集

机器视觉需求20230529

tpz789

暂无评论

发表评论 取消回复

相关推荐

发表评论取消回复