
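I. Setting the learning rate

The learning rate is set in the config file, configs/.py.

The default lr was chosen for training on 8 GPUs, so here it should be divided by 8.
MMDetection sets the learning rate to 0.02 for 8 GPUs with samples_per_gpu=2, so your own learning rate should be lr = 0.02 / (8 * 2) * samples_per_gpu * gpus.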
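
The scaling rule above can be written as a small helper. This is only a sketch: the function name `scaled_lr` and its parameter names are made up for illustration, and the defaults (0.02 for a total batch size of 8 * 2 = 16) are the values quoted above.

```python
def scaled_lr(num_gpus, samples_per_gpu, base_lr=0.02, base_batch_size=16):
    """Linear scaling: keep lr proportional to the total batch size."""
    total_batch_size = num_gpus * samples_per_gpu
    return base_lr * total_batch_size / base_batch_size


# 1 GPU with 2 images per GPU gives 0.02 / 8
print(scaled_lr(num_gpus=1, samples_per_gpu=2))
# 8 GPUs with 2 images per GPU reproduces the default setting
print(scaled_lr(num_gpus=8, samples_per_gpu=2))
```

The result is the value to put into `optimizer = dict(..., lr=...)` in the config file.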

II. Training on your own dataset

References: 1. https://zhuanlan.zhihu.com/p/76191492
2. https://github.com/open-mmlab/mmdetection/blob/master/demo/MMDet_Tutorial.ipynb
To train a new detector, there are usually three things to do:

Support a new dataset
Modify the config
Train a new detector

In short: to train a new detector, create your own dataset, modify the config file, and then train.

There are three ways to support a new dataset in MMDetection:

reorganize the dataset into COCO format.
reorganize the dataset into a middle format.
implement a new dataset.

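In more detail, the three ways to create a new dataset are:

Convert the new dataset to COCO format and place it in the expected directory layout. You then need to modify CLASSES in mmdetection/mmdet/datasets/coco.py, and in the config file change num_classes in the model dict, img_scale in the data dict (the long-edge and short-edge limits of the input image) and lr in optimizer. Finally, edit coco_classes in mmdetection/mmdet/core/evaluation/class_names.py; this determines the class names drawn on the result images at test time.
Convert the new dataset to the middle format.
Implement a new Dataset class that inherits from CustomDataset.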

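If you take the second approach, the annotations must be converted into the middle format that MMDetection accepts, as shown in the figure below:
[Figure: the middle annotation format accepted by MMDetection]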
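
As a rough illustration of that conversion, the sketch below builds a list of per-image dicts with `filename`, `width`, `height` and an `ann` dict holding `bboxes` and `labels`, which is the layout the MMDetection 2.x documentation describes for the middle format. The helper name `to_middle_format` and the input record layout are hypothetical, and plain Python lists are used so the example runs without extra packages; in real use the boxes and labels would be numpy arrays.

```python
def to_middle_format(records):
    """Convert simple per-image records into middle-format style dicts.

    Each input record is a hypothetical dict with keys 'filename', 'width',
    'height' and 'objects', where every object is (x, y, w, h, label).
    """
    data_infos = []
    for rec in records:
        bboxes, labels = [], []
        for x, y, w, h, label in rec['objects']:
            # The middle format stores boxes as [x1, y1, x2, y2].
            bboxes.append([float(x), float(y), float(x + w), float(y + h)])
            labels.append(int(label))
        data_infos.append(dict(
            filename=rec['filename'],
            width=rec['width'],
            height=rec['height'],
            # In real use these lists would be numpy arrays:
            # bboxes as float32 of shape (n, 4), labels as int64 of shape (n,).
            ann=dict(bboxes=bboxes, labels=labels)))
    return data_infos


records = [dict(filename='a.jpg', width=1280, height=720,
                objects=[(10, 20, 100, 50, 0)])]
infos = to_middle_format(records)
print(infos[0]['ann'])
```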
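With this format you can use the existing load_annotations function, but you need to write your own Dataset class that inherits from CustomDataset, as shown below (the file goes in mmdet/datasets/*.py). In this case you no longer need to modify CLASSES in mmdetection/mmdet/datasets/coco.py, because you are using your own new Dataset class.
[Figure: example Dataset class inheriting from CustomDataset]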

Then modify the config file. You can write a config file of your own; take care to update the file paths.

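Verified: training and prediction on my own dataset worked as follows.
First, create a .py file for your dataset under mmdet/datasets/; you can copy coco.py and adapt it. Rename the class to your dataset's name and change CLASSES to your own classes. With a single class, keep the trailing comma so that CLASSES is still a tuple. Next, in mmdet/datasets/__init__.py, add your dataset name to __all__ and add from .MyDataset import MyDataset (the first MyDataset is the name of your dataset's .py file, the second is the dataset class name). Finally, remember to reinstall mmdet so that your dataset is registered in DATASETS: run pip install -v -e . (do not forget the trailing dot). Also update num_classes and classes in the config file. Contrary to what some online guides say, there is no need to modify mmdet/core/evaluation/class_names.py; predictions still show your own class names.
On Kaggle I also added my dataset successfully with the method below, which registers a subclass of CocoDataset and overrides CLASSES. Elsewhere, however, the same approach reported the dataset as not registered.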

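# MMDetection 2.x import paths
from mmdet.datasets.builder import DATASETS
from mmdet.datasets.coco import CocoDataset

@DATASETS.register_module()
class UnderwaterDataset(CocoDataset):
    CLASSES = ('starfish',)  # a tuple keeps the class order fixed; note the trailing comma for a single class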

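III. Error log

1. ValueError: need at least one array to concatenate
When this error appears, you can use the code in configs/cascade_rcnn/underwater.py to check whether any data is being read. If nothing is loaded, the problem is in the dataset. In my case the cause was that the name fields under categories in the dataset JSON file did not match the CLASSES in the config file and in the Dataset class.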
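
A quick way to check for this kind of mismatch is to compare the category names in the annotation JSON with CLASSES before training. The sketch below uses only the standard library; the helper name `check_category_names` and the demo file are made up for illustration, and the JSON is assumed to follow the COCO layout with a top-level `categories` list.

```python
import json
import os
import tempfile


def check_category_names(ann_file, classes):
    """Return (missing, extra) class names between CLASSES and the JSON."""
    with open(ann_file, encoding='utf-8') as f:
        categories = json.load(f)['categories']
    json_names = {cat['name'] for cat in categories}
    missing = sorted(set(classes) - json_names)  # in CLASSES, not in the JSON
    extra = sorted(json_names - set(classes))    # in the JSON, not in CLASSES
    return missing, extra


# Tiny self-contained demo with a deliberately mismatched name.
demo = {'images': [], 'annotations': [],
        'categories': [{'id': 1, 'name': 'Starfish'}]}
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'annotation_demo.json')
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(demo, f)
    missing, extra = check_category_names(path, ('starfish',))

print('in CLASSES but not in the JSON:', missing)
print('in the JSON but not in CLASSES:', extra)
```

Both lists should be empty for a consistent dataset; the comparison is case-sensitive, so 'Starfish' and 'starfish' count as different names.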

2. AttributeError: 'ConfigDict' object has no attribute 'train_cfg' during custom dataset training
Link: https://github.com/open-mmlab/mmdetection/issues/4603
Replace model = build_detector(cfg.model, train_cfg=cfg.train_cfg, test_cfg=cfg.test_cfg) with model = build_detector(cfg.model, train_cfg=cfg.get('train_cfg'), test_cfg=cfg.get('test_cfg')).
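
Why the replacement helps can be seen with a plain-Python stand-in. `AttrDict` below is a hypothetical, much simplified imitation of a config dict with attribute access, not the real ConfigDict; it only illustrates that attribute access on a missing key raises, while `.get` returns None. The config mirrors the layout used later in this post, where train_cfg sits inside the model dict rather than at the top level.

```python
class AttrDict(dict):
    """Hypothetical stand-in for a config dict with attribute access."""

    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(
                f"'AttrDict' object has no attribute '{name}'") from None


# train_cfg lives inside cfg.model, so there is no top-level cfg.train_cfg.
cfg = AttrDict(model=AttrDict(type='CascadeRCNN', train_cfg=AttrDict()))

try:
    cfg.train_cfg  # attribute access on a missing key raises
    attr_ok = True
except AttributeError as err:
    attr_ok = False
    print('attribute access failed:', err)

# dict.get returns None for a missing key instead of raising.
print('cfg.get result:', cfg.get('train_cfg'))
```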

3. KeyError: 'xxx is not in the dataset registry'
Link: https://github.com/open-mmlab/mmdetection/issues/3751
Create a .py file for your dataset under mmdet/datasets/, modelled on the COCO dataset, and decorate the class:
@DATASETS.register_module()
class MyDataset xxxx
Then, in the __init__.py file, add
from .my_dataset import MyDataset
add your dataset to __all__, and reinstall:
pip install -v -e .

4. has no key: init_cfg
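
The reason the import in __init__.py matters is that a registration decorator only runs when the module defining the class is actually imported. The toy registry below is a simplified stand-in written for this note, not the real registry implementation; it reproduces the same kind of KeyError before the class is defined and succeeds afterwards.

```python
class Registry:
    """Toy name-to-class registry (simplified stand-in for illustration)."""

    def __init__(self, name):
        self.name = name
        self._modules = {}

    def register_module(self):
        def decorator(cls):
            # Runs at class-definition time, i.e. when the module is imported.
            self._modules[cls.__name__] = cls
            return cls
        return decorator

    def get(self, key):
        if key not in self._modules:
            raise KeyError(f'{key} is not in the {self.name} registry')
        return self._modules[key]


DATASETS = Registry('dataset')

try:
    DATASETS.get('MyDataset')
    missing_before = False
except KeyError as err:
    missing_before = True
    print('before definition:', err)


@DATASETS.register_module()
class MyDataset:
    CLASSES = ('starfish',)


print('after definition:', DATASETS.get('MyDataset').__name__)
```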

5. TypeError: CascadeRCNN: ResNet: __init__() got an unexpected keyword argument 'groups'
The cause is that ResNet has no groups parameter: I had written ResNet where ResNeXt was intended.

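IV. Multiscale training

The term “multiscale training” is adopted in many papers and refers to resizing images to different scales at each iteration. In the challenge, we use the setting [(400, 1600), (1400, 1600)], which means the short edge is randomly sampled from 400~1400 and the long edge is fixed at 1600.
The source code shows two modes, range and value. range mode takes exactly two scales: the short edge is sampled from the range spanned by the two scales' short edges, and the long edge is a random integer from the range spanned by their long edges. value mode accepts any number of scales and randomly picks one of them as the resize scale.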
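
The two modes can be sketched in plain Python as follows. This is an illustration of the behaviour described above, not the library's code: the function name `sample_scale` and the `rng` parameter are made up here, and both range bounds are treated as inclusive.

```python
import random


def sample_scale(img_scales, mode='range', rng=random):
    """Pick one (long_edge, short_edge) resize scale from candidate scales."""
    if mode == 'value':
        # Any number of candidates: choose one of them as-is.
        return rng.choice(img_scales)
    if mode == 'range':
        # Exactly two candidates: sample each edge between their bounds.
        assert len(img_scales) == 2
        long_edges = [max(s) for s in img_scales]
        short_edges = [min(s) for s in img_scales]
        long_edge = rng.randint(min(long_edges), max(long_edges))
        short_edge = rng.randint(min(short_edges), max(short_edges))
        return (long_edge, short_edge)
    raise ValueError(f'unknown mode: {mode}')


scales = [(400, 1600), (1400, 1600)]
rng = random.Random(0)
print(sample_scale(scales, mode='range', rng=rng))
print(sample_scale(scales, mode='value', rng=rng))
```

With the challenge setting, range mode always yields a long edge of 1600 and a short edge somewhere in 400~1400, while value mode returns one of the two listed scales unchanged.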

V. GC

https://zhuanlan.zhihu.com/p/102817180

VI. Code walkthrough

Official documentation:

https://mmdetection.readthedocs.io/zh_CN/latest/tutorials/config.html

1. The config file

References: Tricks for object detection competitions: https://zhuanlan.zhihu.com/p/102817180
Common algorithms in MMDetection explained: Faster R-CNN: https://zhuanlan.zhihu.com/p/422456194

# model settings
model = dict(
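    type='CascadeRCNN',  # detector name
    backbone=dict(
        type='ResNeXt',  # backbone (feature extractor) type
        depth=101,  # network depth
        num_stages=4,  # number of ResNet stages, usually 4; ResNet consists of a stem plus 4 stages
        out_indices=(0, 1, 2, 3),  # indices of the stage outputs to return; here all four stages
        frozen_stages=1,  # number of frozen stages: -1 freezes nothing, 0 freezes stage 0 (the stem), 1 freezes stages 0 and 1, and so on
        norm_cfg=dict(type='BN', requires_grad=True),  # normalization layer (usually BN or GN); BN here, with trainable parameters
        norm_eval=True,  # all BN layers in the backbone run in eval mode: the mean and variance keep their pretrained values and are not updated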
        style='pytorch',
        init_cfg=dict(type='Pretrained', checkpoint='open-mmlab://resnext101_64x4d'),
        groups=64,
        base_width=4),
    neck=dict(
        type='FPN',
        in_channels=[256, 512, 1024, 2048],  # channels of the feature maps output by the four ResNet stages
        out_channels=256,  # channels of every FPN output level
        num_outs=5),  # number of FPN output levels
    rpn_head=dict(
        type='RPNHead',  # RPN head type
        in_channels=256,  # input channels of the RPN
        feat_channels=256,  # channels of the intermediate feature map
        anchor_generator=dict(
            type='AnchorGenerator',
            scales=[8],  # base anchor scale; at each feature-map location the anchor side length is scale * base_size
            ratios=[0.5, 1.0, 2.0],  # aspect ratios of the anchors generated on each feature map; can be tuned
            strides=[4, 8, 16, 32, 64]),  # anchor stride of each feature-map level relative to the input image, matching the FPN feature strides; if base_sizes is not set, the strides are used as base_sizes
        bbox_coder=dict(
            type='DeltaXYWHBBoxCoder',
            target_means=[.0, .0, .0, .0],  # the four regression targets are normalized: subtract the mean, divide by the std
            target_stds=[1.0, 1.0, 1.0, 1.0]),
        loss_cls=dict(
            type='CrossEntropyLoss', use_sigmoid=True, loss_weight=1.0),
        loss_bbox=dict(type='SmoothL1Loss', beta=1.0 / 9.0, loss_weight=1.0)),
    roi_head=dict(
        type='CascadeRoIHead',
        num_stages=3,
        stage_loss_weights=[1, 0.5, 0.25],
        bbox_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),  # RoI layer type is RoIAlign with output size 7; it connects the RPN and the R-CNN head
            out_channels=256,  # input shape is (batch, nms_post, 4), output shape is (batch, nms_post, 256, roi_feat_size, roi_feat_size); 256 is the FPN output channel count
            featmap_strides=[4, 8, 16, 32]),
        bbox_head=[
            dict(
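                type='Shared2FCBBoxHead',  # bbox head with two shared FC layers
                in_channels=256,  # input channels, equal to the FPN output channels
                fc_out_channels=1024,  # FC output channels; after the two shared FC layers the output shape is (batch * nms_post, 1024)
                roi_feat_size=7,  # size of the feature map output by RoIAlign or RoIPool
                num_classes=1,  # number of classes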
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    target_means=[0., 0., 0., 0.],
                    target_stds=[0.1, 0.1, 0.2, 0.2]),
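                # Whether to regress boxes class-agnostically. If True, box regression only
                # considers whether a box is foreground, and the class is decided afterwards
                # from the box's classification scores, so one box can correspond to several
                # classes. This changes the channel count of the bbox branch:
                # True gives 4 output channels, False gives 4 * num_classes.
                reg_class_agnostic=True,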
                loss_cls=dict(
                    type='CrossEntropyLoss',
                    use_sigmoid=False,
                    loss_weight=1.0),
                loss_bbox=dict(type='SmoothL1Loss', beta=1.0,
                               loss_weight=1.0)),
            dict(
                type='Shared2FCBBoxHead',
                in_channels=256,
                fc_out_channels=1024,
                roi_feat_size=7,
                num_classes=1,
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    target_means=[0., 0., 0., 0.],
                    target_stds=[0.05, 0.05, 0.1, 0.1]),
                reg_class_agnostic=True,
                loss_cls=dict(
                    type='CrossEntropyLoss',
                    use_sigmoid=False,
                    loss_weight=1.0),
                loss_bbox=dict(type='SmoothL1Loss', beta=1.0,
                               loss_weight=1.0)),
            dict(
                type='Shared2FCBBoxHead',
                in_channels=256,
                fc_out_channels=1024,
                roi_feat_size=7,
                num_classes=1,
                bbox_coder=dict(
                    type='DeltaXYWHBBoxCoder',
                    target_means=[0., 0., 0., 0.],
                    target_stds=[0.033, 0.033, 0.067, 0.067]),
                reg_class_agnostic=True,
                loss_cls=dict(
                    type='CrossEntropyLoss',
                    use_sigmoid=False,
                    loss_weight=1.0),
                loss_bbox=dict(type='SmoothL1Loss', beta=1.0, loss_weight=1.0))
        ]),
    # model training and testing settings
    train_cfg=dict(
        rpn=dict(
            assigner=dict(
                type='MaxIoUAssigner',
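                pos_iou_thr=0.7,  # IoU is computed between each anchor and all gt boxes; above 0.7 is a positive sample
                neg_iou_thr=0.3,  # below 0.3 is a negative sample
                min_pos_iou=0.3,  # for each gt, IoU with all anchors is computed; the best-matching anchor is kept if its IoU exceeds 0.3
                match_low_quality=True,
                ignore_iof_thr=-1),
            sampler=dict(
                type='RandomSampler',
                num=256,  # total number of positive and negative samples to draw
                pos_fraction=0.5,  # fraction of positive samples
                neg_pos_ub=-1,  # upper bound on the negative-to-positive ratio, used to cap the number of sampled negatives; -1 means negatives fill up to 256 when there are too few positives
                add_gt_as_proposals=False),  # whether to add gt boxes as proposals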
            allowed_border=0,
            pos_weight=-1,
            debug=False),
        rpn_proposal=dict(
            nms_pre=2000,
            max_per_img=2000,
            nms=dict(type='nms', iou_threshold=0.7),
            min_bbox_size=0),
        rcnn=[
            dict(
                assigner=dict(
                    type='MaxIoUAssigner',
                    pos_iou_thr=0.5,
                    neg_iou_thr=0.5,
                    min_pos_iou=0.5,
                    match_low_quality=False,
                    ignore_iof_thr=-1),
                sampler=dict(
                    type='RandomSampler',
                    num=512,
                    pos_fraction=0.25,
                    neg_pos_ub=-1,
                    add_gt_as_proposals=True),
                pos_weight=-1,
                debug=False),
            dict(
                assigner=dict(
                    type='MaxIoUAssigner',
                    pos_iou_thr=0.6,
                    neg_iou_thr=0.6,
                    min_pos_iou=0.6,
                    match_low_quality=False,
                    ignore_iof_thr=-1),
                sampler=dict(
                    type='RandomSampler',
                    num=512,
                    pos_fraction=0.25,
                    neg_pos_ub=-1,
                    add_gt_as_proposals=True),
                pos_weight=-1,
                debug=False),
            dict(
                assigner=dict(
                    type='MaxIoUAssigner',
                    pos_iou_thr=0.7,
                    neg_iou_thr=0.7,
                    min_pos_iou=0.7,
                    match_low_quality=False,
                    ignore_iof_thr=-1),
                sampler=dict(
                    type='RandomSampler',
                    num=512,
                    pos_fraction=0.25,
                    neg_pos_ub=-1,
                    add_gt_as_proposals=True),
                pos_weight=-1,
                debug=False)
        ]),
    test_cfg=dict(
        rpn=dict(
            nms_pre=1000,
            max_per_img=1000,
            nms=dict(type='nms', iou_threshold=0.7),
            min_bbox_size=0),
        rcnn=dict(
            score_thr=0.05,
            nms=dict(type='nms', iou_threshold=0.5),
            max_per_img=100)))
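# The shared part outputs classification and regression predictions for batch * nms_post candidate boxes.
# All predictions are split along the batch dimension and post-processed per image:
# decode the boxes and rescale them to the original image size, drop predictions
# scoring below score_thr, run NMS, and keep at most max_per_img results.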
# dataset settings
dataset_type = 'UnderwaterDataset'
data_root = '/kaggle/working/'
img_norm_cfg = dict(
    mean=[123.675, 116.28, 103.53], std=[58.395, 57.12, 57.375], to_rgb=True)
train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True),
    dict(type='Resize', img_scale=(1333, 800), keep_ratio=True),
    dict(type='RandomFlip', flip_ratio=0.5),
    dict(type='Normalize', **img_norm_cfg),
    dict(type='Pad', size_divisor=32),
    dict(type='DefaultFormatBundle'),
    dict(type='Collect', keys=['img', 'gt_bboxes', 'gt_labels']),
]
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(
        type='MultiScaleFlipAug',
        img_scale=(1333, 800),
        flip=False,
        transforms=[
            dict(type='Resize', keep_ratio=True),
            dict(type='RandomFlip'),
            dict(type='Normalize', **img_norm_cfg),
            dict(type='Pad', size_divisor=32),
            dict(type='ImageToTensor', keys=['img']),
            dict(type='Collect', keys=['img']),
        ])
]
data = dict(
    samples_per_gpu=2,
    workers_per_gpu=2,
    train=dict(
        type=dataset_type,
        ann_file=data_root + 'anotation_train.json',
        img_prefix=data_root + 'image/',
        pipeline=train_pipeline),
    val=dict(
        type=dataset_type,
        ann_file=data_root + 'annotation_valid.json',
        img_prefix=data_root + 'image/',
        pipeline=test_pipeline),
    test=dict(
        type=dataset_type,
        ann_file=data_root + 'annotation_valid.json',
        img_prefix=data_root + 'image/',
        pipeline=test_pipeline))
evaluation = dict(interval=1, metric='bbox')

# optimizer
optimizer = dict(type='SGD', lr=0.02/8, momentum=0.9, weight_decay=0.0001)
optimizer_config = dict(grad_clip=None)
# learning policy
lr_config = dict(
    policy='step',
    warmup='linear',
    warmup_iters=500,
    warmup_ratio=0.001,
    step=[8, 11])
#runner = dict(type='EpochBasedRunner', max_epochs=12)
total_epochs = 12

checkpoint_config = dict(interval=1)
# yapf:disable
log_config = dict(
    interval=50,
    hooks=[
        dict(type='TextLoggerHook'),
        # dict(type='TensorboardLoggerHook')
    ])
# yapf:enable
custom_hooks = [dict(type='NumClassCheckHook')]

dist_params = dict(backend='nccl')
log_level = 'INFO'
load_from = None
resume_from = None
workflow = [('train', 1)]
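
To make the anchor_generator settings in the config above more concrete, the sketch below computes the base anchor widths and heights per level. It assumes that base_size equals the stride (as the comment on strides says), that each ratio is height divided by width, and that a ratio changes the shape while keeping the area roughly constant; the function name `base_anchor_sizes` is made up for this example.

```python
import math


def base_anchor_sizes(strides, scales, ratios):
    """Return {stride: [(w, h), ...]} for every ratio/scale combination."""
    sizes = {}
    for stride in strides:
        base_size = stride  # assumption: base_sizes defaults to the strides
        level = []
        for ratio in ratios:  # assumption: ratio = h / w
            h_ratio = math.sqrt(ratio)
            w_ratio = 1.0 / h_ratio
            for scale in scales:
                w = base_size * scale * w_ratio
                h = base_size * scale * h_ratio
                level.append((round(w, 1), round(h, 1)))
        sizes[stride] = level
    return sizes


sizes = base_anchor_sizes(strides=[4, 8, 16, 32, 64], scales=[8],
                          ratios=[0.5, 1.0, 2.0])
for stride, level in sizes.items():
    print(stride, level)
```

With scales=[8], the square anchor on the stride-4 level is 32 x 32 pixels and the one on the stride-64 level is 512 x 512, which is why small objects are mostly matched on the finer levels.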

2. roi_extractor

Reference: https://zhuanlan.zhihu.com/p/137454940
config code:

bbox_roi_extractor=dict(
            type='SingleRoIExtractor',
            roi_layer=dict(type='RoIAlign', output_size=7, sampling_ratio=0),
            out_channels=256,
            featmap_strides=[4, 8, 16, 32],
            gc_context=True),

Source: mmdet/models/roi_heads/roi_extractors/single_level_roi_extractor.py

    def forward(self, feats, rois, roi_scale_factor=None):
        """Forward function."""
        #print("feats[0].shape:",feats[0].shape) # torch.Size([1, 256, 96, 168])
        #print("feats[1].shape:", feats[1].shape)# torch.Size([1, 256, 48, 84])
        #print("feats[2].shape:", feats[2].shape)# torch.Size([1, 256, 24, 42])
        #print("feats[3].shape:", feats[3].shape)# torch.Size([1, 256, 12, 21])
        #print("rois.shape",rois.shape) # torch.Size([512, 5])
        # feats holds the feature maps of the four scales; rois are the 512 proposals generated by the RPN, shape (512, 5)
        # self.roi_layers is the collection of the four RoI layers, one per level
        out_size = self.roi_layers[0].output_size
        print("self.roi_layers[0].output_size:",self.roi_layers[0].output_size)
        print("self.roi_layers:",self.roi_layers)# see below
        num_levels = len(feats)
        expand_dims = (-1, self.out_channels * out_size[0] * out_size[1])  # (-1, 256*7*7)
        if torch.onnx.is_in_onnx_export():
            # Work around to export mask-rcnn to onnx
            roi_feats = rois[:, :1].clone().detach()
            roi_feats = roi_feats.expand(*expand_dims)
            roi_feats = roi_feats.reshape(-1, self.out_channels, *out_size)
            roi_feats = roi_feats * 0
        else:
            roi_feats = feats[0].new_zeros(
                rois.size(0), self.out_channels, *out_size)  # roi_feats is pre-allocated with shape (512, 256, 7, 7)
        # TODO: remove this when parrots supports
        if torch.__version__ == 'parrots':
            roi_feats.requires_grad = True

        if num_levels == 1:
            if len(rois) == 0:
                return roi_feats
            return self.roi_layers[0](feats[0], rois)

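        if self.gc_context:
            context = []
            for feat in feats:
                # context becomes a list of length 4; each element has shape (1, 256, 7, 7),
                # where 1 is the batch size and 256 is the feature channel count
                context.append(self.pool(feat))
            # context only exists when gc_context is enabled, so the debug prints sit inside this branch
            print("context[0].shape:",context[0].shape)# torch.Size([1, 256, 7, 7])
            print("context[1].shape:", context[1].shape)# torch.Size([1, 256, 7, 7])
        # target_lvls has shape (512,): it assigns each of the 512 proposals to one feature-map level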
        target_lvls = self.map_roi_levels(rois, num_levels)
        batch_size = feats[0].shape[0] # 7
   
        if roi_scale_factor is not None:
            rois = self.roi_rescale(rois, roi_scale_factor)

        for i in range(num_levels):  # iterate over the feature maps of the different scales
            mask = target_lvls == i
            if torch.onnx.is_in_onnx_export():
                # To keep all roi_align nodes exported to onnx
                # and skip nonzero op
                mask = mask.float().unsqueeze(-1)
                # select target level rois and reset the rest rois to zero.
                rois_i = rois.clone().detach()
                rois_i *= mask
                mask_exp = mask.expand(*expand_dims).reshape(roi_feats.shape)
                roi_feats_t = self.roi_layers[i](feats[i], rois_i)
                roi_feats_t *= mask_exp
                roi_feats += roi_feats_t
                continue
            inds = mask.nonzero(as_tuple=False).squeeze(1)  # (478,): indices of the proposals assigned to the current feature-map level
            if inds.numel() > 0:
                rois_ = rois[inds]  # (478, 5): here 478 proposals are assigned to feature map 0
                roi_feats_t = self.roi_layers[i](feats[i], rois_)  # roi_feats_t: (478, 256, 7, 7)
                if self.gc_context:
                    for j in range(batch_size):
                        roi_feats_t[rois_[:, 0] == j] += context[i][j]
                roi_feats[inds] = roi_feats_t
            else:
                # Sometimes some pyramid levels will not be used for RoI
                # feature extraction and this will cause an incomplete
                # computation graph in one GPU, which is different from those
                # in other GPUs and will cause a hanging error.
                # Therefore, we add it to ensure each feature pyramid is
                # included in the computation graph to avoid runtime bugs.
                roi_feats += sum(
                    x.view(-1)[0]
                    for x in self.parameters()) * 0. + feats[i].sum() * 0.
        print("roi_feats:",roi_feats.shape) # torch.Size([512, 256, 7, 7])
        return roi_feats

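So the overall flow is: roi_extractor takes the 512 proposals generated by the RPN, with shape (512, 5), and the four feature maps of different scales produced by the FPN (256 channels each). It assigns the 512 proposals to the feature maps according to the level-mapping rule, then applies RoI pooling on each level to produce a tensor of shape (n, 256, 7, 7), where n is the number of proposals assigned to that feature map. After iterating over all levels, the output is a (512, 256, 7, 7) tensor that is passed to the RoI head.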
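
The step that assigns proposals to feature maps (map_roi_levels in the code above) can be sketched in plain Python. This follows the rule as I understand it from the upstream extractor: the level grows with the log2 of the box scale relative to a finest_scale, assumed here to be 56, and is clamped to the valid range. The function name `map_roi_level` and the sample boxes are made up for illustration.

```python
import math


def map_roi_level(roi, num_levels=4, finest_scale=56):
    """Map one RoI (batch_idx, x1, y1, x2, y2) to a feature-map level."""
    _, x1, y1, x2, y2 = roi
    scale = math.sqrt((x2 - x1) * (y2 - y1))  # square root of the box area
    level = math.floor(math.log2(scale / finest_scale + 1e-6))
    return min(max(level, 0), num_levels - 1)  # clamp to [0, num_levels - 1]


rois = [
    (0, 0, 0, 50, 50),      # small box
    (0, 0, 0, 150, 150),    # medium box
    (0, 0, 0, 300, 300),    # larger box
    (0, 0, 0, 900, 900),    # very large box
]
print([map_roi_level(r) for r in rois])  # [0, 1, 2, 3]
```

Small boxes land on the finest feature map (stride 4) and large boxes on the coarsest one (stride 32), which matches the count of 478 proposals on level 0 seen in the comments above when most proposals are small.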