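Object detection is one of the most challenging tasks in machine vision. In recent years, rapid progress in deep learning theory and practice has driven major advances in deep-learning-based object detection, and both real-time performance and accuracy have improved substantially. Accuracy is not the only concern, though: computational complexity is also an important metric, because an overly complex network can be very slow, and mobile devices need small models that are both accurate and fast. Lightweight network models are therefore worth studying. Building on earlier work and an existing project, I replaced YOLOv4's CSPDarknet-53 backbone with the lightweight network ShuffleNetv2, applied some hyperparameter tuning, and trained on VOC2007 and VOC2012. The experiments show that this lightweight YOLOv4 model performs acceptably but still trails other lightweight variants. Even so, I think it was a worthwhile exploration.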

1.1 Notes

This project borrows from a project by the CSDN blogger Bubbliiiing (blog: https://blog.csdn.net/weixin_44791964?spm=1001.2014.3001.5509), who writes about neural networks, object detection, and data structures and algorithms:

bubbliiiing/mobilenet-yolov4-pytorch (https://github.com/bubbliiiing/mobilenet-yolov4-pytorch), a mobilenet-yolov4 library that replaces the YOLOv4 backbone with MobileNet and modifies the convolutions in PANet, which greatly reduces the parameter count.

I do not have a formal computer science background, so mistakes are inevitable. If you borrow from this work, please read it critically, keeping what is useful and discarding what is not. (There is no need to message me privately: I may not have time to read it, and even if I do I may not be able to answer.)

1.2 The finished project with the replaced backbone is on Gitee

yolov4-lite-pytorch

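2. Network architecture basics

YOLOv4 builds on the original YOLO detection architecture and improves it in several respects: stronger feature extraction, more nonlinearity in the network, and protection against overfitting. It draws together the best ideas from many CNNs. To discuss YOLOv4, it helps to start from YOLOv3.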

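2.1 YOLOv3

YOLOv3 is the third generation of the YOLO family and uses Darknet-53 as its feature-extraction backbone. The Darknet-53 backbone consists of five residual stages (Res), each formed by cascading zero padding, a CBL component, and several residual units (Res Unit). The figure shows the composition of the residual stage, the residual unit, and the CBL component. The number of residual units differs from stage to stage (1, 2, 8, 8, and 4 in Darknet-53). Inside a Res Unit the input takes two paths. On the main path, a 1 × 1 convolution first adjusts the channel dimension and a 3 × 3 convolution then extracts higher-level features. On the shortcut path, the input passes through unchanged, which preserves the original features. The two paths are then added element-wise to give the output. CBL is the basic feature-extraction component in the YOLO family and replaces the plain convolution. It consists of a convolution, batch normalization [47], and a Leaky ReLU activation. Compared with a bare convolution, CBL fits features better.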
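As a rough illustration of the two components just described, the PyTorch sketch below builds a CBL block and a residual unit. The class names `CBL` and `ResUnit`, the halved channel width on the main path, and the Leaky ReLU slope of 0.1 are assumptions made for this example; this is not code from the project.

```python
import torch
import torch.nn as nn


class CBL(nn.Module):
    """Convolution + batch normalization + Leaky ReLU."""

    def __init__(self, in_channels, out_channels, kernel_size):
        super().__init__()
        padding = (kernel_size - 1) // 2  # keeps the spatial size at stride 1
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size,
                              stride=1, padding=padding, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)
        self.act = nn.LeakyReLU(0.1)  # slope 0.1 is an assumption

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))


class ResUnit(nn.Module):
    """Main path (1x1 CBL, then 3x3 CBL) added to an identity shortcut."""

    def __init__(self, channels):
        super().__init__()
        self.reduce = CBL(channels, channels // 2, 1)   # adjust the channel dimension
        self.extract = CBL(channels // 2, channels, 3)  # extract features

    def forward(self, x):
        return x + self.extract(self.reduce(x))


x = torch.randn(1, 64, 52, 52)
y = ResUnit(64)(x)
print(y.shape)  # torch.Size([1, 64, 52, 52])
```

Because the shortcut is an element-wise addition, the main path has to return a tensor with the same shape as its input, which is why the 3 × 3 CBL restores the original channel count.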

For details, see Bubbliiiing's CSDN article "Smart Object Detection 26: Building a YOLOv3 Object Detection Platform with PyTorch".

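2.2 The YOLOv4 algorithm

Figure 1 shows the YOLOv4 network structure. Its feature-extraction backbone, CSPDarkNet53, combines the strengths of CSPNet and DarkNet53: it improves inference speed and accuracy while reducing the backbone's computation. Compared with YOLOv3, the YOLOv4 neck uses SPP (Spatial Pyramid Pooling) plus PANet (Path Aggregation Network). SPP pools the feature map with several kernel sizes and stacks the results before they reach the YOLOv4 prediction module (Head). The original motivation for SPP was to produce a fixed-size output without cropping or rescaling the image, since cropping and rescaling lose information and make the predictions less reliable. For multi-scale features, YOLOv4 replaces the Feature Pyramid Network (FPN) of the previous generation with PANet, a structure that combines a pyramid with an inverted pyramid.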

For a detailed analysis, see Bubbliiiing's CSDN article "Smart Object Detection 30: Building a YOLOv4 Object Detection Platform with PyTorch".

2.3 ShuffleNetv2

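ShuffleNetv2 is the successor to ShuffleNetv1 and is more accurate than both ShuffleNetv1 and MobileNetv2 at the same complexity. Its authors argue that the number of floating-point operations (FLOPs) alone does not accurately measure a model's complexity and speed, and from their experiments they derive four practical guidelines: (1) equal input and output channel widths minimize memory access cost; (2) excessive group convolution increases memory access cost; (3) network fragmentation reduces parallelism; (4) the cost of element-wise operations is not negligible. ShuffleNetv2 borrows the feature-reuse idea from DenseNet: it joins branches with Concat rather than the element-wise Add used in ShuffleNetv1's basic unit. Unlike DenseNet, ShuffleNetv2 does not concatenate densely, and after each Concat a Channel Shuffle layer mixes the features. This is an important reason why ShuffleNetv2 beats ShuffleNetv1 in both speed and accuracy.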
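Guideline (1) can be checked with a little arithmetic. For a 1 × 1 convolution on an h × w feature map with c1 input and c2 output channels, the ShuffleNetv2 paper counts FLOPs as B = h·w·c1·c2 and memory access cost as MAC = h·w·(c1 + c2) + c1·c2. The sketch below holds B fixed and varies the channel ratio. The function names and the sample sizes are chosen for this example only.

```python
def conv1x1_flops(h, w, c1, c2):
    """FLOPs of a 1x1 convolution: one multiply-add per (pixel, in, out) triple."""
    return h * w * c1 * c2


def conv1x1_mac(h, w, c1, c2):
    """Memory access cost: read the input map, write the output map, read the weights."""
    return h * w * (c1 + c2) + c1 * c2


h = w = 56
# All three configurations have the same product c1 * c2, hence the same FLOPs.
configs = [(128, 128), (64, 256), (32, 512)]
for c1, c2 in configs:
    print(c1, c2, conv1x1_flops(h, w, c1, c2), conv1x1_mac(h, w, c1, c2))
```

With FLOPs held equal, the balanced 128:128 configuration has the lowest MAC, and the cost grows as the ratio becomes more lopsided.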

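ShuffleNetv2 has two basic building blocks. The first, shown in part (a) of Figure 2, begins with a channel split that divides the c input channels into two branches of c − c′ and c′ channels. The left branch does nothing. The right branch contains three convolutions, and its two 1 × 1 convolutions are ordinary convolutions rather than the group convolutions of ShuffleNetv1. The two branches are then merged with Concat followed by Channel Shuffle. This keeps the block's input and output channel counts equal and avoids the Add operation, which speeds up inference. The second block, shown in part (b), has no channel split, so its output has twice as many channels as its input. Its two branches otherwise work much as in (a) and are not described again here.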
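To make the channel shuffle step concrete, the sketch below applies the same reshape, transpose, and flatten pattern to a plain list of channel indices instead of a tensor. The function name `shuffle_channel_order` is hypothetical, and the example only tracks where each channel ends up.

```python
def shuffle_channel_order(num_channels, groups=2):
    """Return the channel order produced by a channel shuffle with `groups` groups."""
    assert num_channels % groups == 0
    per_group = num_channels // groups
    # "Reshape": row g holds the channels that belong to group g.
    grid = [list(range(g * per_group, (g + 1) * per_group)) for g in range(groups)]
    # "Transpose + flatten": read the grid column by column.
    return [grid[g][i] for i in range(per_group) for g in range(groups)]


order = shuffle_channel_order(6, groups=2)
print(order)  # [0, 3, 1, 4, 2, 5]
```

After Concat, channels 0 to 2 come from one branch and channels 3 to 5 from the other. After the shuffle they alternate, so the channel split at the start of the next block sees features from both branches.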

For details, see Bubbliiiing's CSDN article "Neural Network Study Notes 47: A Detailed Reproduction of the ShuffleNetV2 Model".

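2.4 Network structure after the replacement

The ShuffleNetV2 family is designed for classification, and the job of its backbone is feature extraction. We can therefore use a ShuffleNetV2 network in place of CSPDarknet53 in YOLOv4: take three preliminary effective feature layers with the same spatial shapes as those CSPDarknet53 provides, pass them to the enhanced feature-extraction stage, and ShuffleNetV2 is integrated into YOLOv4.

For YOLOv4 we need the backbone's last three effective feature layers, one per spatial shape. In the code in Section 4 these are out3, out4, and out5.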
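The shapes of these three layers follow from the downsampling factors. In the Section 4 code the backbone halves the resolution in conv1, in maxpool, and at the start of each of stage2, stage3, and stage4, so the three outputs sit at strides 8, 16, and 32. The sketch below works out the sizes for a 416 × 416 input. The helper name `feature_map_sizes` is hypothetical, the channel counts are the `in_filters` values used for the ShuffleNetv2 backbone in Section 4.2, and the names out3, out4, and out5 are the variables used in the Section 4 code.

```python
def feature_map_sizes(input_size, strides=(8, 16, 32)):
    """Spatial size of each effective feature layer for a square input."""
    if input_size % max(strides) != 0:
        raise ValueError("input_size must be a multiple of the largest stride")
    return [input_size // s for s in strides]


channels = [116, 232, 1024]  # ShuffleNetv2 1x: stage2, stage3, conv5
sizes = feature_map_sizes(416)
shapes = list(zip(["out3", "out4", "out5"], sizes, channels))
for name, size, ch in shapes:
    print(f"{name}: {size} x {size} x {ch}")
```

For a 416 × 416 input this gives 52 × 52 × 116, 26 × 26 × 232, and 13 × 13 × 1024, the same spatial sizes as the three layers CSPDarknet53 provides.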

For a tutorial on the replacement, see Bubbliiiing's CSDN article "Smart Object Detection 49: Building a YOLOv4 Object Detection Platform with the MobileNet Family (v1, v2, v3) in PyTorch".

3. Experimental results

3.1 Environment and dataset

3.1.1 Dataset

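PASCAL VOC [82] (The PASCAL Visual Object Classes Challenge) was once a world-class competition in computer vision. It advanced tasks such as object detection and semantic segmentation and gave rise to many outstanding models.

The PASCAL VOC dataset has 20 classes covering common everyday objects; birds, bottles, and potted plants are among the smaller ones. The two most widely used releases are PASCAL VOC 2007 and PASCAL VOC 2012. VOC 2007 contains 5,011 trainval images and 4,952 test images; VOC 2012 contains 5,717 training images and 5,823 validation images.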

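Here VOC2007 and VOC2012 are merged for training, giving 21,503 images in total. In every training epoch, 90% of the images are used for training and the remaining 10% are used to monitor training progress.

That is, training set : test set = 9:1, and within the training set, training : validation = 9:1.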
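The split sizes implied by these ratios can be worked out as below. This is a minimal sketch: the helper name `split_counts` is hypothetical, and rounding down to a whole image is an assumption about how the split is taken.

```python
def split_counts(total, numerator=9, denominator=10):
    """Split `total` items into a larger and a smaller part, 9:1 by default."""
    larger = total * numerator // denominator  # integer arithmetic, rounds down
    return larger, total - larger


total_images = 5011 + 4952 + 5717 + 5823  # VOC2007 + VOC2012
trainval, test = split_counts(total_images)
train, val = split_counts(trainval)
print(total_images, trainval, test, train, val)  # 21503 19352 2151 17416 1936
```

Under these assumptions, about 17.4 thousand images are used for weight updates, about 1.9 thousand for validation, and about 2.2 thousand are held out for testing.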

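3.1.2 Experimental environment

Experimental platform

Because of limited resources, training was done on a laptop: an RTX 3060 (6 GB) GPU, 16 GB of RAM, and Windows 10. Compared with a CPU, a GPU has a parallel architecture and can therefore compute more efficiently and faster, which suits image data with its large parameter counts and heavy computation.

Framework

The environment was set up with Anaconda3 (Python 3.8), PyTorch 1.8, and CUDA 11.2.

PyTorch grew out of the Torch framework, exposes a Python interface, and supports dynamically adjusting the network. Compared with TensorFlow, PyTorch is simpler and clearer in structure and easier to develop and debug with. Weighing the learning curve, code complexity, and development flexibility, this article uses PyTorch as the experimental framework.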

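3.2 Experimental plan

The experiment is organized in three parts and comprises one group of runs. It covers five models: ShuffleNetv2-YOLOv4, MobileNetv1-YOLOv4, MobileNetv2-YOLOv4, MobileNetv3-YOLOv4, and YOLOv4-Tiny. Each is trained and validated on VOC2007+VOC2012, and the models are then compared side by side on that same dataset.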

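Training procedure

The training parameters are as follows:

First, load ShuffleNetv2 with the official PyTorch pretrained weights, freeze the ShuffleNetv2 backbone, and train PANet and the YOLO head. The number of epochs is 50, the initial learning rate is 1e-3, the label-smoothing value is 0.005, and the batch size is 16, so each iteration feeds 16 images.

Then unfreeze the ShuffleNetv2 backbone and train the whole network. The number of epochs is 100, the initial learning rate is 1e-4, the label-smoothing value is 0.005, and the batch size is 8, so each iteration feeds 8 images.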
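The freeze-then-unfreeze schedule can be sketched in PyTorch as follows. `TinyDetector` is a hypothetical stand-in with a `backbone` and a `head`, not the real network, and the choice of Adam as the optimizer is an assumption. Only the learning rates come from the settings above.

```python
import torch
import torch.nn as nn


class TinyDetector(nn.Module):
    """Hypothetical stand-in: a small 'backbone' followed by a 1x1 'head'."""

    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(8, 75, 1)

    def forward(self, x):
        return self.head(self.backbone(x))


def set_backbone_trainable(model, trainable):
    """Freeze (False) or unfreeze (True) every backbone parameter."""
    for param in model.backbone.parameters():
        param.requires_grad = trainable


def count_trainable(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


def make_optimizer(model, lr):
    # Only parameters that still require gradients are handed to the optimizer.
    params = [p for p in model.parameters() if p.requires_grad]
    return torch.optim.Adam(params, lr=lr)  # Adam is an assumption


model = TinyDetector()

# Stage 1: backbone frozen, initial learning rate 1e-3.
set_backbone_trainable(model, False)
stage1_optimizer = make_optimizer(model, lr=1e-3)
stage1_trainable = count_trainable(model)

# Stage 2: backbone unfrozen, initial learning rate 1e-4.
set_backbone_trainable(model, True)
stage2_optimizer = make_optimizer(model, lr=1e-4)
stage2_trainable = count_trainable(model)

print(stage1_trainable, stage2_trainable)
```

Freezing the backbone first lets the randomly initialized neck and head settle before gradients reach the pretrained weights. It also lowers memory use, which is one reason the frozen stage can run with the larger batch size.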

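Evaluation metrics

mAP

The main aim of this work is to cut training time and the model's memory footprint while preserving accuracy and speed. Test performance is evaluated with mAP (mean average precision) and FPS (frames per second). The expressions are:

$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}$$

$$AP = \int_0^1 P(R)\,dR, \qquad mAP = \frac{1}{N}\sum_{i=1}^{N} AP_i$$

where P is precision, R is recall, TP is the number of true positives, FP is the number of false positives, FN is the number of false negatives, N is the number of classes, and AP_i is the AP of class i. AP is the area under the P-R curve; it accounts for both precision and recall and reflects how well the model recognizes a given class. mAP (mean average precision) is the mean of the AP values over all classes and is the usual measure of detection accuracy in object detection.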
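The sketch below shows one common way to turn these definitions into numbers. It assumes the detections of one class are already sorted by descending confidence and marked as true or false positives, and it takes the area under the precision envelope (all-point interpolation). The function names are hypothetical, and the project's own evaluation script may differ in detail.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true-positive, false-positive, false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def average_precision(is_true_positive, num_ground_truth):
    """Area under the P-R curve for one class.

    is_true_positive: one bool per detection, sorted by descending confidence.
    num_ground_truth: number of ground-truth boxes of this class.
    """
    recalls, precisions = [0.0], [0.0]
    tp = fp = 0
    for hit in is_true_positive:
        if hit:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_ground_truth)
    recalls.append(1.0)
    precisions.append(0.0)
    # Replace each precision by the maximum precision at any higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum the rectangles between consecutive recall values.
    return sum((recalls[i] - recalls[i - 1]) * precisions[i]
               for i in range(1, len(recalls)))


def mean_average_precision(ap_per_class):
    """Mean of the per-class AP values."""
    return sum(ap_per_class.values()) / len(ap_per_class)


ap = average_precision([True, False, True], num_ground_truth=2)
print(round(ap, 4))  # 0.8333
```

In the example, the first and third detections are correct and there are two ground-truth boxes, so precision is 1 up to recall 0.5 and 2/3 from there to recall 1, giving AP = 0.5 + 0.5 × 2/3.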

FPS

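Speed is the other key metric for a detection model: only a fast detector can run in real time, which matters greatly in settings such as autonomous driving. Speed is usually measured in frames per second (FPS), the number of images the detector processes per second. If a trained model takes time t to process one image at test time, it can process 1/t images per second.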
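A minimal timing sketch of the 1/t calculation follows. `measure_fps` and `fake_infer` are hypothetical names: `fake_infer` only sleeps for 10 ms to stand in for a real forward pass, and the warm-up and repeat counts are arbitrary.

```python
import time


def measure_fps(infer, image, warmup=2, repeats=20):
    """Average the per-image time t over several runs and return 1 / t."""
    for _ in range(warmup):  # warm-up runs are not timed
        infer(image)
    start = time.perf_counter()
    for _ in range(repeats):
        infer(image)
    t = (time.perf_counter() - start) / repeats
    return 1.0 / t


def fake_infer(image):
    """Hypothetical stand-in for a detector: takes about 10 ms per image."""
    time.sleep(0.01)
    return image


fps = measure_fps(fake_infer, image=None)
print(f"{fps:.1f} FPS")
```

Averaging over several runs smooths out timer noise. When the model runs on a GPU, CUDA work is launched asynchronously, so a call such as `torch.cuda.synchronize()` is usually needed before each clock reading for the measured time to be meaningful.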

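3.3 Comparison of experimental results

3.3.1 ShuffleNetv2-YOLOv4 training results

Figure 7. ShuffleNetv2-YOLOv4 training results: mAP

Figure 7. ShuffleNetv2-YOLOv4 training results: log-average miss rate

Table 1. ShuffleNetv2-YOLOv4 training results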

Model	mAP (%)	Model size	FPS	Parameters	FLOPs
ShuffleNetv2-YOLOv4	75.04	44789K	51.59	10.64M	3.71GFlops

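(These results may not be entirely accurate: the data split differs from Bubbliiiing's, and there are some problems with how a few of the figures were computed.)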

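3.3.2 Sample detection results

Figure 8. Sample ShuffleNetv2-YOLOv4 predictions

3.3.3 Comparison of the models (many of the figures are wrong, so treat them only as a rough guide; some data, such as the mAP values, were copied directly from Bubbliiiing's GitHub)

See Bubbliiiing's project for details:

bubbliiiing/mobilenet-yolov4-pytorch on GitHub (https://github.com/bubbliiiing/mobilenet-yolov4-pytorch), a mobilenet-yolov4 library that replaces the YOLOv4 backbone with MobileNet and modifies the convolutions in PANet, which greatly reduces the parameter count.

Table 2. Comparison of the models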

Model	mAP (%)	Model size	FPS	Parameters	FLOPs
ShuffleNetv2-YOLOv4	75.04	44.8M	51.59	10.64M	3.71GFlops
Mobilenetv1-YOLOv4	79.72	53.5M	65.72	12.69M	5.3GFlops
Mobilenetv2-YOLOv4	80.12	47.6M	52.83	10.80M	（*）
Mobilenetv3-YOLOv4	79.01	55.53M	47.20	11.2M	3.82GFlops
GhostNet-YOLOv4	78.69	43.76M	38.92	11.2M	3.82GFlops

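ShuffleNetv2-YOLOv4, MobileNetv1-YOLOv4, MobileNetv2-YOLOv4, MobileNetv3-YOLOv4, GhostNet-YOLOv4, and YOLOv4-Tiny were run on the VOC2007+2012 object detection task under the same software, hardware, and dataset, with the input image fixed at 416 × 416 pixels. The models are compared on accuracy, detection speed, and model size, and the results are shown in the table.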

4. Selected code

4.1 shufflenetv2 (from torchvision)


import torch
import torch as t
import torch.nn as nn

__all__ = ['shufflenet2']

#### The model below is defined by myself


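def channel_shuffle(x, groups=2):
  # x has layout (batch, channels, height, width)
  bat_size, channels, h, w = x.shape
  group_c = channels // groups
  # split the channels into (groups, group_c), swap those two axes, flatten back
  x = x.view(bat_size, groups, group_c, h, w)
  x = t.transpose(x, 1, 2).contiguous()
  x = x.view(bat_size, -1, h, w)
  return x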

# used in the block
def conv_1x1_bn(in_c, out_c, stride=1):
  return nn.Sequential(
    nn.Conv2d(in_c, out_c, 1, stride, 0, bias=False),
    nn.BatchNorm2d(out_c),
    nn.ReLU(True)
  )

def conv_bn(in_c, out_c, stride=2):
  return nn.Sequential(
    nn.Conv2d(in_c, out_c, 3, stride, 1, bias=False),
    nn.BatchNorm2d(out_c),
    nn.ReLU(True)
  )


class ShuffleBlock(nn.Module):
  def __init__(self, in_c, out_c, downsample=False):
    super(ShuffleBlock, self).__init__()
    self.downsample = downsample
    half_c = out_c // 2
    if downsample:
      self.branch1 = nn.Sequential(
          # 3*3 dw conv, stride = 2
          nn.Conv2d(in_c, in_c, 3, 2, 1, groups=in_c, bias=False),
          nn.BatchNorm2d(in_c),
          # 1*1 pw conv
          nn.Conv2d(in_c, half_c, 1, 1, 0, bias=False),
          nn.BatchNorm2d(half_c),
          nn.ReLU(True)
      )
      
      self.branch2 = nn.Sequential(
          # 1*1 pw conv
          nn.Conv2d(in_c, half_c, 1, 1, 0, bias=False),
          nn.BatchNorm2d(half_c),
          nn.ReLU(True),
          # 3*3 dw conv, stride = 2
          nn.Conv2d(half_c, half_c, 3, 2, 1, groups=half_c, bias=False),
          nn.BatchNorm2d(half_c),
          # 1*1 pw conv
          nn.Conv2d(half_c, half_c, 1, 1, 0, bias=False),
          nn.BatchNorm2d(half_c),
          nn.ReLU(True)
      )
    else:
      # in_c = out_c
      assert in_c == out_c
        
      self.branch2 = nn.Sequential(
          # 1*1 pw conv
          nn.Conv2d(half_c, half_c, 1, 1, 0, bias=False),
          nn.BatchNorm2d(half_c),
          nn.ReLU(True),
          # 3*3 dw conv, stride = 1
          nn.Conv2d(half_c, half_c, 3, 1, 1, groups=half_c, bias=False),
          nn.BatchNorm2d(half_c),
          # 1*1 pw conv
          nn.Conv2d(half_c, half_c, 1, 1, 0, bias=False),
          nn.BatchNorm2d(half_c),
          nn.ReLU(True)
      )
      
      
  def forward(self, x):
    out = None
    if self.downsample:
      # if it is downsampling, we don't need to do channel split
      out = t.cat((self.branch1(x), self.branch2(x)), 1)
    else:
      # channel split
      channels = x.shape[1]
      c = channels // 2
      x1 = x[:, :c, :, :]
      x2 = x[:, c:, :, :]
      out = t.cat((x1, self.branch2(x2)), 1)
    return channel_shuffle(out, 2)
    

class ShuffleNet2(nn.Module):
  def __init__(self, input_size=416, net_type=1):
    super(ShuffleNet2, self).__init__()
    assert input_size % 32 == 0 # the network downsamples by a factor of 32 in total
    self.layers_out_filters = [24, 116, 232, 1024] # used for shufflenet v2
    
    self.stage_repeat_num = [4, 8, 4]
    if net_type == 0.5:
      self.out_channels = [3, 24, 48, 96, 192, 1024]
    elif net_type == 1:
      self.out_channels = [3, 24, 116, 232, 464, 1024]
    elif net_type == 1.5:
      self.out_channels = [3, 24, 176, 352, 704, 1024]
    elif net_type == 2:
      self.out_channels = [3, 24, 244, 488, 976, 2048]
    elif net_type == -1:
      self.out_channels = [3, 24, 128, 256, 512, 1024]
    else:
      # fail early instead of printing and crashing later on a missing attribute
      raise ValueError("unsupported net_type, you should choose 0.5, 1, 1.5 or 2")
      
    # let's start building layers
    self.conv1 = nn.Conv2d(3, self.out_channels[1], 3, 2, 1)
    self.maxpool = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
    in_c = self.out_channels[1]
    
    self.stage2 = []
    self.stage3 = []
    self.stage4 = []
    for stage_idx in range(len(self.stage_repeat_num)):
      out_c = self.out_channels[2+stage_idx]
      repeat_num = self.stage_repeat_num[stage_idx]
      stage = []
      for i in range(repeat_num):
        if i == 0:
          stage.append(ShuffleBlock(in_c, out_c, downsample=True))
        else:
          stage.append(ShuffleBlock(in_c, in_c, downsample=False))
        in_c = out_c
      if stage_idx == 0:
        self.stage2 = stage
      elif stage_idx == 1:
        self.stage3 = stage
      elif stage_idx == 2:
        self.stage4 = stage
      else:
        print("error")
    # self.stages = nn.Sequential(*self.stages)
    self.stage2 = nn.Sequential(*self.stage2) # 52 * 52 * 116 for a 416 input
    self.stage3 = nn.Sequential(*self.stage3) # 26 * 26 * 232
    self.stage4 = nn.Sequential(*self.stage4)
    in_c = self.out_channels[-2]
    out_c = self.out_channels[-1]
    self.conv5 = conv_1x1_bn(in_c, out_c, 1) # 13 * 13 * 1024
    # self.g_avg_pool = nn.AvgPool2d(kernel_size=(int)(input_size/32)) # this is 7 when the input is 224
    
    # # fc layer
    # self.fc = nn.Linear(out_c, num_classes)
    

  def forward(self, x):
    x = self.conv1(x)
    x = self.maxpool(x)
    out3 = self.stage2(x)
    out4 = self.stage3(out3)
    out5 = self.stage4(out4)
    out5 = self.conv5(out5)
    # x = self.g_avg_pool(x)
    # x = x.view(-1, self.out_channels[-1])
    # x = self.fc(x)
    return out3, out4, out5

def shufflenet2(pretrained, **kwargs):
    """Constructs a ShuffleNetV2 (1x) backbone.
    """
    model = ShuffleNet2()
    if pretrained:
        state_dict = torch.load('./shufflenetv2_x1-5666bf0f80.pth')
        # model.load_state_dict(t.load(pretrained)) 
        model.load_state_dict(state_dict, strict=True)
        
    return model


if __name__ == "__main__":
    from torchsummary import summary

    # pick the device (GPU or CPU) that the network runs on
    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
    model = shufflenet2(pretrained=False).to(device)
    summary(model, input_size=(3,416,416))

4.2 The YOLO body

from collections import OrderedDict

import torch
import torch.nn as nn
import torch.nn.functional as F

from nets.densenet import _Transition, densenet121, densenet169, densenet201
from nets.ghostnet import ghostnet
from nets.mobilenet_v1 import mobilenet_v1
from nets.mobilenet_v2 import mobilenet_v2
from nets.mobilenet_v3 import mobilenet_v3
from nets.shufflenet import shufflenet2
from nets.shufflenetv2 import shufflenet_v2_x1_0

class MobileNetV1(nn.Module):
    def __init__(self, pretrained = False):
        super(MobileNetV1, self).__init__()
        self.model = mobilenet_v1(pretrained=pretrained)

    def forward(self, x):
        out3 = self.model.stage1(x)
        out4 = self.model.stage2(out3)
        out5 = self.model.stage3(out4)
        return out3, out4, out5

class MobileNetV2(nn.Module):
    def __init__(self, pretrained = False):
        super(MobileNetV2, self).__init__()
        self.model = mobilenet_v2(pretrained=pretrained)

    def forward(self, x):
        out3 = self.model.features[:7](x)
        out4 = self.model.features[7:14](out3)
        out5 = self.model.features[14:18](out4)
        return out3, out4, out5

class MobileNetV3(nn.Module):
    def __init__(self, pretrained = False):
        super(MobileNetV3, self).__init__()
        self.model = mobilenet_v3(pretrained=pretrained)

    def forward(self, x):
        out3 = self.model.features[:7](x)
        out4 = self.model.features[7:13](out3)
        out5 = self.model.features[13:16](out4)
        return out3, out4, out5

class GhostNet(nn.Module):
    def __init__(self, pretrained=True):
        super(GhostNet, self).__init__()
        model = ghostnet()
        if pretrained:
            state_dict = torch.load("model_data/ghostnet_weights.pth")
            model.load_state_dict(state_dict)
        del model.global_pool
        del model.conv_head
        del model.act2
        del model.classifier
        del model.blocks[9]
        self.model = model

    def forward(self, x):
        x = self.model.conv_stem(x)
        x = self.model.bn1(x)
        x = self.model.act1(x)
        feature_maps = []

        for idx, block in enumerate(self.model.blocks):
            x = block(x)
            if idx in [2,4,6,8]:
                feature_maps.append(x)
        return feature_maps[1:]
    
class ShufflenetV2(nn.Module):
    def __init__(self, pretrained = False):
        super(ShufflenetV2, self).__init__()
        # self.model = shufflenet2(pretrained=pretrained)
        self.model = shufflenet_v2_x1_0(pretrained=pretrained)
    def forward(self, x):
        # out3, out4, out5 = self.model(x)
        
        # return out3, out4, out5
        x = self.model.conv1(x)
        x = self.model.maxpool(x)
        out3 = self.model.stage2(x)
        out4 = self.model.stage3(out3)
        out5 = self.model.stage4(out4)
        out5 = self.model.conv5(out5)
        return out3, out4, out5
    

class Densenet(nn.Module):
    def __init__(self, backbone, pretrained=False):
        super(Densenet, self).__init__()
        densenet = {
            "densenet121" : densenet121, 
            "densenet169" : densenet169, 
            "densenet201" : densenet201
        }[backbone]
        model = densenet(pretrained)
        del model.classifier
        self.model = model

    def forward(self, x):
        feature_maps = []
        for block in self.model.features:
            if type(block)==_Transition:
                for _, subblock in enumerate(block):
                    x = subblock(x)
                    if type(subblock)==nn.Conv2d:
                        feature_maps.append(x)
            else:
                x = block(x)
        x = F.relu(x, inplace=True)
        feature_maps.append(x)
        return feature_maps[1:]

def conv2d(filter_in, filter_out, kernel_size, groups=1, stride=1):
    pad = (kernel_size - 1) // 2 if kernel_size else 0
    return nn.Sequential(OrderedDict([
        ("conv", nn.Conv2d(filter_in, filter_out, kernel_size=kernel_size, stride=stride, padding=pad, groups=groups, bias=False)),
        ("bn", nn.BatchNorm2d(filter_out)),
        ("relu", nn.ReLU6(inplace=True)),
    ]))

def conv_dw(filter_in, filter_out, stride = 1):
    return nn.Sequential(
        nn.Conv2d(filter_in, filter_in, 3, stride, 1, groups=filter_in, bias=False),
        nn.BatchNorm2d(filter_in),
        nn.ReLU6(inplace=True),

        nn.Conv2d(filter_in, filter_out, 1, 1, 0, bias=False),
        nn.BatchNorm2d(filter_out),
        nn.ReLU6(inplace=True),
    )

#---------------------------------------------------#
#   SPP block: max-pool with several kernel sizes,
#   then stack the pooled maps along the channel axis
#---------------------------------------------------#
class SpatialPyramidPooling(nn.Module):
    def __init__(self, pool_sizes=[5, 9, 13]):
        super(SpatialPyramidPooling, self).__init__()

        self.maxpools = nn.ModuleList([nn.MaxPool2d(pool_size, 1, pool_size//2) for pool_size in pool_sizes])

    def forward(self, x):
        features = [maxpool(x) for maxpool in self.maxpools[::-1]]
        features = torch.cat(features + [x], dim=1)

        return features

#---------------------------------------------------#
#   convolution + upsampling
#---------------------------------------------------#
class Upsample(nn.Module):
    def __init__(self, in_channels, out_channels):
        super(Upsample, self).__init__()

        self.upsample = nn.Sequential(
            conv2d(in_channels, out_channels, 1),
            nn.Upsample(scale_factor=2, mode='nearest')
        )

    def forward(self, x,):
        x = self.upsample(x)
        return x

#---------------------------------------------------#
#   three-convolution block
#---------------------------------------------------#
def make_three_conv(filters_list, in_filters):
    m = nn.Sequential(
        conv2d(in_filters, filters_list[0], 1),
        conv_dw(filters_list[0], filters_list[1]),
        conv2d(filters_list[1], filters_list[0], 1),
    )
    return m

#---------------------------------------------------#
#   five-convolution block
#---------------------------------------------------#
def make_five_conv(filters_list, in_filters):
    m = nn.Sequential(
        conv2d(in_filters, filters_list[0], 1),
        conv_dw(filters_list[0], filters_list[1]),
        conv2d(filters_list[1], filters_list[0], 1),
        conv_dw(filters_list[0], filters_list[1]),
        conv2d(filters_list[1], filters_list[0], 1),
    )
    return m

#---------------------------------------------------#
#   produce the final YOLOv4 outputs
#---------------------------------------------------#
def yolo_head(filters_list, in_filters):
    m = nn.Sequential(
        conv_dw(in_filters, filters_list[0]),
        
        nn.Conv2d(filters_list[0], filters_list[1], 1),
    )
    return m

    
#---------------------------------------------------#
#   yolo_body
#---------------------------------------------------#
class YoloBody(nn.Module):
    def __init__(self, anchors_mask, num_classes, backbone="mobilenetv2", pretrained=False):
        super(YoloBody, self).__init__()
        #---------------------------------------------------#   
        #   build the backbone model and obtain three effective feature layers
        #---------------------------------------------------#
        if backbone == "mobilenetv1":
            #---------------------------------------------------#   
            #   52,52,256；26,26,512；13,13,1024
            #---------------------------------------------------#
            self.backbone   = MobileNetV1(pretrained=pretrained)
            in_filters      = [256, 512, 1024]
        elif backbone == "mobilenetv2":
            #---------------------------------------------------#   
            #   52,52,32；26,26,96；13,13,320
            #---------------------------------------------------#
            self.backbone   = MobileNetV2(pretrained=pretrained)
            in_filters      = [32, 96, 320]
        elif backbone == "mobilenetv3":
            #---------------------------------------------------#   
            #   52,52,40；26,26,112；13,13,160
            #---------------------------------------------------#
            self.backbone   = MobileNetV3(pretrained=pretrained)
            in_filters      = [40, 112, 160]
        elif backbone == "ghostnet":
            #---------------------------------------------------#   
            #   52,52,40;26,26,112；13,13,160
            #---------------------------------------------------#
            self.backbone   = GhostNet(pretrained=pretrained)
            in_filters      = [40, 112, 160]
            
        elif backbone == "shufflenet2":
            #---------------------------------------------------#   
            #   52 * 52 * 116; 26 * 26 * 232; 13 * 13 * 1024
            #---------------------------------------------------#
            self.backbone   = ShufflenetV2(pretrained=pretrained)
            
            
            in_filters      = [116, 232, 1024]  
            
            
            
        elif backbone in ["densenet121", "densenet169", "densenet201"]:
            #---------------------------------------------------#   
            #   52,52,256；26,26,512；13,13,1024
            #---------------------------------------------------#
            self.backbone   = Densenet(backbone, pretrained=pretrained)
            in_filters = {
                "densenet121" : [256, 512, 1024], 
                "densenet169" : [256, 640, 1664], 
                "densenet201" : [256, 896, 1920]
            }[backbone]
        else:
            raise ValueError('Unsupported backbone - `{}`, Use mobilenetv1, mobilenetv2, mobilenetv3, ghostnet, shufflenet2, densenet121, densenet169, densenet201.'.format(backbone))

        self.conv1           = make_three_conv([512, 1024], in_filters[2])
        self.SPP             = SpatialPyramidPooling()
        self.conv2           = make_three_conv([512, 1024], 2048)

        self.upsample1       = Upsample(512, 256)
        self.conv_for_P4     = conv2d(in_filters[1], 256,1)
        self.make_five_conv1 = make_five_conv([256, 512], 512)

        self.upsample2       = Upsample(256, 128)
        self.conv_for_P3     = conv2d(in_filters[0], 128,1)
        self.make_five_conv2 = make_five_conv([128, 256], 256)

        # 3*(5+num_classes) = 3*(5+20) = 3*(4+1+20)=75
        self.yolo_head3      = yolo_head([256, len(anchors_mask[0]) * (5 + num_classes)], 128)

        self.down_sample1    = conv_dw(128, 256, stride = 2)
        self.make_five_conv3 = make_five_conv([256, 512], 512)

        # 3*(5+num_classes) = 3*(5+20) = 3*(4+1+20)=75
        self.yolo_head2      = yolo_head([512, len(anchors_mask[1]) * (5 + num_classes)], 256)

        self.down_sample2    = conv_dw(256, 512, stride = 2)
        self.make_five_conv4 = make_five_conv([512, 1024], 1024)

        # 3*(5+num_classes)=3*(5+20)=3*(4+1+20)=75
        self.yolo_head1      = yolo_head([1024, len(anchors_mask[2]) * (5 + num_classes)], 512)


    def forward(self, x):
        #  backbone
        x2, x1, x0 = self.backbone(x)

        # 13,13,1024 -> 13,13,512 -> 13,13,1024 -> 13,13,512 -> 13,13,2048 
        P5 = self.conv1(x0)
        P5 = self.SPP(P5)
        # 13,13,2048 -> 13,13,512 -> 13,13,1024 -> 13,13,512
        P5 = self.conv2(P5)

        # 13,13,512 -> 13,13,256 -> 26,26,256
        P5_upsample = self.upsample1(P5)
        # 26,26,512 -> 26,26,256
        P4 = self.conv_for_P4(x1)
        # 26,26,256 + 26,26,256 -> 26,26,512
        P4 = torch.cat([P4,P5_upsample],dim=1)
        # 26,26,512 -> 26,26,256 -> 26,26,512 -> 26,26,256 -> 26,26,512 -> 26,26,256
        P4 = self.make_five_conv1(P4)

        # 26,26,256 -> 26,26,128 -> 52,52,128
        P4_upsample = self.upsample2(P4)
        # 52,52,256 -> 52,52,128
        P3 = self.conv_for_P3(x2)
        # 52,52,128 + 52,52,128 -> 52,52,256
        P3 = torch.cat([P3,P4_upsample],dim=1)
        # 52,52,256 -> 52,52,128 -> 52,52,256 -> 52,52,128 -> 52,52,256 -> 52,52,128
        P3 = self.make_five_conv2(P3)

        # 52,52,128 -> 26,26,256
        P3_downsample = self.down_sample1(P3)
        # 26,26,256 + 26,26,256 -> 26,26,512
        P4 = torch.cat([P3_downsample,P4],dim=1)
        # 26,26,512 -> 26,26,256 -> 26,26,512 -> 26,26,256 -> 26,26,512 -> 26,26,256
        P4 = self.make_five_conv3(P4)

        # 26,26,256 -> 13,13,512
        P4_downsample = self.down_sample2(P4)
        # 13,13,512 + 13,13,512 -> 13,13,1024
        P5 = torch.cat([P4_downsample,P5],dim=1)
        # 13,13,1024 -> 13,13,512 -> 13,13,1024 -> 13,13,512 -> 13,13,1024 -> 13,13,512
        P5 = self.make_five_conv4(P5)

        #---------------------------------------------------#
        #   third feature layer
        #   y3=(batch_size,75,52,52)
        #---------------------------------------------------#
        out2 = self.yolo_head3(P3)
        #---------------------------------------------------#
        #   second feature layer
        #   y2=(batch_size,75,26,26)
        #---------------------------------------------------#
        out1 = self.yolo_head2(P4)
        #---------------------------------------------------#
        #   first feature layer
        #   y1=(batch_size,75,13,13)
        #---------------------------------------------------#
        out0 = self.yolo_head1(P5)

        return out0, out1, out2

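Copyright notice: this is an original article by the CSDN blogger "龙晨天", released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.
Original link: https://blog.csdn.net/qq_30759585/article/details/122520471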

Object Detection Based on the ShuffleNetv2-YOLOv4 Model

1. Introduction (Abstract)