Fine-tuning a torchvision 0.3 object detection model (official PyTorch tutorial)

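Contents

1. Defining the dataset
2. Writing a custom dataset for PennFudan
- 2.1 Downloading the dataset
- 2.2 Writing a class for the dataset
3. Defining the model
- 3.1 An instance segmentation model for the PennFudan dataset
4. Putting everything together
- 4.1 Writing helper functions for data augmentation/transformation
- 4.2 Writing the main function that performs training and validation
5. Summary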

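In this tutorial, we will fine-tune a pre-trained Mask R-CNN model on the Penn-Fudan Database for Pedestrian Detection and Segmentation. The dataset contains 170 images with 345 pedestrian instances, and we will use it to illustrate how to use the new features in torchvision to train an instance segmentation model on a custom dataset.

1. Defining the dataset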

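The reference scripts for training object detection, instance segmentation, and person keypoint detection make it easy to add new custom datasets. The dataset should inherit from the standard torch.utils.data.Dataset class and implement __len__ and __getitem__.

The only specific requirement is that the dataset's __getitem__ should return:
- image: a PIL image of size (H, W)
- target: a dict containing the following fields:
<1> boxes (FloatTensor[N, 4]): the coordinates of the N bounding boxes in [x0, y0, x1, y1] format, ranging from 0 to W and 0 to H.
<2> labels (Int64Tensor[N]): the label for each bounding box.
<3> image_id (Int64Tensor[1]): an image identifier. It should be unique across all images in the dataset and is used during evaluation.
<4> area (Tensor[N]): the area of each bounding box. COCO-metric evaluation uses it to separate the metric scores for small, medium, and large boxes.
<5> iscrowd (UInt8Tensor[N]): instances with iscrowd=True are ignored during evaluation.
<6> (optional) masks (UInt8Tensor[N, H, W]): the segmentation mask for each object.
<7> (optional) keypoints (FloatTensor[N, K, 3]): for each of the N objects, the K keypoints in [x, y, visibility] format that define the object. visibility=0 means the keypoint is not visible. Note that for data augmentation, the notion of flipping a keypoint depends on the data representation, and you should probably adapt references/detection/transforms.py for your new keypoint representation.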

If your dataset returns the fields above, it will work for both training and evaluation, and evaluation will use the scripts from pycocotools.
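Because the box format is easy to get wrong, it can help to sanity-check boxes before training. The following is a minimal plain-Python sketch, not part of torchvision: check_boxes is a hypothetical helper that assumes boxes are given as [x0, y0, x1, y1] lists and returns the value the area field would hold for each box.

```python
def check_boxes(boxes, width, height):
    """Return the area of each [x0, y0, x1, y1] box, raising on malformed boxes.

    check_boxes is a hypothetical helper, not part of torchvision.
    """
    areas = []
    for x0, y0, x1, y1 in boxes:
        # Coordinates must lie inside the image: x in [0, W], y in [0, H],
        # with the top-left corner listed before the bottom-right corner.
        if not (0 <= x0 <= x1 <= width and 0 <= y0 <= y1 <= height):
            raise ValueError(
                f"box {[x0, y0, x1, y1]} is not valid for a {width}x{height} image"
            )
        areas.append((x1 - x0) * (y1 - y0))
    return areas


# Two pedestrians in an image that is 500 pixels wide and 400 pixels high.
print(check_boxes([[10, 20, 110, 220], [300, 50, 380, 390]], width=500, height=400))
```

A box written in the wrong order, such as [x0, x1, y0, y1], will often fail the ordering check, which makes the mistake visible early.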

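Additionally, if you want to use aspect ratio grouping during training (so that each batch only contains images with similar aspect ratios), it is recommended to also implement a get_height_and_width method, which returns the height and the width of the image. If this method is not provided, all elements of the dataset are queried via __getitem__, which loads each image into memory and is slower than providing a custom method.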
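One inexpensive way to back such a method for PNG files, as used by PennFudan, is to read the size straight from the file header instead of decoding the image. The sketch below uses only the standard library. png_height_and_width is a hypothetical helper name, and it assumes a well-formed PNG whose first chunk is IHDR.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_height_and_width(path):
    """Read (height, width) from a PNG header without decoding pixel data."""
    with open(path, "rb") as f:
        header = f.read(24)
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError(f"{path} does not look like a PNG file")
    # IHDR stores width first, then height, as big-endian 32-bit integers.
    width, height = struct.unpack(">II", header[16:24])
    return height, width
```

A dataset's get_height_and_width(idx) could then return png_height_and_width applied to the path of the idx-th image. Opening the file with PIL and reading its size attribute is another common option.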

2. Writing a custom dataset for PennFudan

2.1 Downloading the dataset

After downloading and extracting the zip file, we have the following folder structure:

PennFudanPed/
	PedMasks/
		FudanPed00001_mask.png
		FudanPed00002_mask.png
		FudanPed00003_mask.png
		FudanPed00004_mask.png
		...
	PNGImages/
		FudanPed00001.png
		FudanPed00002.png
		FudanPed00003.png
		FudanPed00004.png

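Here is an example of an image and its segmentation mask:
[Figure: a sample image and its segmentation mask]

So each image has a corresponding segmentation mask, where each color corresponds to a different instance. Let's write a torch.utils.data.Dataset class for this dataset.

2.2 Writing a class for the dataset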

import os
import numpy as np
import torch
from PIL import Image


class PennFudanDataset(torch.utils.data.Dataset):
    def __init__(self, root, transforms):
        self.root = root
        self.transforms = transforms
        # load all image files, sorting them to
        # ensure that they are aligned
        self.imgs = list(sorted(os.listdir(os.path.join(root, "PNGImages"))))
        self.masks = list(sorted(os.listdir(os.path.join(root, "PedMasks"))))

    def __getitem__(self, idx):
        # load images and masks
        img_path = os.path.join(self.root, "PNGImages", self.imgs[idx])
        mask_path = os.path.join(self.root, "PedMasks", self.masks[idx])
        img = Image.open(img_path).convert("RGB")
        # note that we haven't converted the mask to RGB,
        # because each color corresponds to a different instance
        # with 0 being background
        mask = Image.open(mask_path)
        # convert the PIL Image into a numpy array
        mask = np.array(mask)
        # instances are encoded as different colors
        obj_ids = np.unique(mask)
        # first id is the background, so remove it
        obj_ids = obj_ids[1:]

        # split the color-encoded mask into a set
        # of binary masks
        masks = mask == obj_ids[:, None, None]

        # get bounding box coordinates for each mask
        num_objs = len(obj_ids)
        boxes = []
        for i in range(num_objs):
            pos = np.where(masks[i])
            xmin = np.min(pos[1])
            xmax = np.max(pos[1])
            ymin = np.min(pos[0])
            ymax = np.max(pos[0])
            boxes.append([xmin, ymin, xmax, ymax])

        # convert everything into a torch.Tensor
        boxes = torch.as_tensor(boxes, dtype=torch.float32)
        # there is only one class
        labels = torch.ones((num_objs,), dtype=torch.int64)
        masks = torch.as_tensor(masks, dtype=torch.uint8)

        image_id = torch.tensor([idx])
        area = (boxes[:, 3] - boxes[:, 1]) * (boxes[:, 2] - boxes[:, 0])
        # suppose all instances are not crowd
        iscrowd = torch.zeros((num_objs,), dtype=torch.int64)

        target = {}
        target["boxes"] = boxes
        target["labels"] = labels
        target["masks"] = masks
        target["image_id"] = image_id
        target["area"] = area
        target["iscrowd"] = iscrowd

        if self.transforms is not None:
            img, target = self.transforms(img, target)

        return img, target

    def __len__(self):
        return len(self.imgs)
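To make the mask handling in __getitem__ concrete, here is a plain-Python sketch of what the broadcasting comparison and the np.where loop compute, run on a toy mask. It is an illustration only: instance_boxes is a hypothetical helper, and the real class uses NumPy arrays rather than nested lists.

```python
def instance_boxes(mask):
    """Return {instance_id: [xmin, ymin, xmax, ymax]} for a 2D integer mask.

    Pixel value 0 is background; every other value marks one instance.
    """
    boxes = {}
    for y, row in enumerate(mask):
        for x, obj_id in enumerate(row):
            if obj_id == 0:
                continue
            if obj_id not in boxes:
                # First pixel seen for this instance.
                boxes[obj_id] = [x, y, x, y]
            else:
                box = boxes[obj_id]
                box[0] = min(box[0], x)
                box[1] = min(box[1], y)
                box[2] = max(box[2], x)
                box[3] = max(box[3], y)
    return boxes


toy_mask = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 2, 2],
    [0, 0, 0, 0, 2, 2],
    [0, 0, 0, 0, 2, 2],
]
print(instance_boxes(toy_mask))  # {1: [1, 1, 2, 2], 2: [4, 2, 5, 4]}
```

As in the dataset class, the box corners are the minimum and maximum pixel coordinates of each instance, listed as x first and y second.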

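3. Defining the model

Now we need to define a model that can perform predictions on the dataset above. In this tutorial, we will use Mask R-CNN, which is based on Faster R-CNN. Faster R-CNN is a model that predicts both bounding boxes and class scores for potential objects in the image.
[Figure: Faster R-CNN]
Mask R-CNN adds an extra branch to Faster R-CNN, which also predicts a segmentation mask for each instance.

There are two common situations where you might want to modify one of the available models in the torchvision model zoo. The first is when we want to start from a pre-trained model and fine-tune only the last layer. The other is when we want to replace the backbone of the model with a different one (for faster predictions, for example).

Let's look at how to handle each of these two cases.

1 Fine-tuning from a pre-trained model

Suppose you want to start from a model pre-trained on COCO and fine-tune it for your particular classes. Here is one possible way of doing it: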

import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor

# load a model pre-trained on COCO
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(pretrained=True)

# replace the classifier with a new one, that has
# num_classes which is user-defined
num_classes = 2  # 1 class (person) + background
# get number of input features for the classifier
in_features = model.roi_heads.box_predictor.cls_score.in_features
# replace the pre-trained head with a new one
model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

2 Modifying the model to add a different backbone

import torchvision
from torchvision.models.detection import FasterRCNN
from torchvision.models.detection.rpn import AnchorGenerator

# load a pre-trained model for classification and return
# only the features
backbone = torchvision.models.mobilenet_v2(pretrained=True).features
# FasterRCNN needs to know the number of
# output channels in a backbone. For mobilenet_v2, it's 1280
# so we need to add it here
backbone.out_channels = 1280

# let's make the RPN generate 5 x 3 anchors per spatial
# location, with 5 different sizes and 3 different aspect
# ratios. We have a Tuple[Tuple[int]] because each feature
# map could potentially have different sizes and
# aspect ratios
anchor_generator = AnchorGenerator(sizes=((32, 64, 128, 256, 512),),
                                   aspect_ratios=((0.5, 1.0, 2.0),))

# let's define what are the feature maps that we will
# use to perform the region of interest cropping, as well as
# the size of the crop after rescaling.
# if your backbone returns a Tensor, featmap_names is expected to
# be ['0']. More generally, the backbone should return an
# OrderedDict[Tensor], and in featmap_names you can choose which
# feature maps to use.
roi_pooler = torchvision.ops.MultiScaleRoIAlign(featmap_names=['0'],
                                                output_size=7,
                                                sampling_ratio=2)

# put the pieces together inside a FasterRCNN model
model = FasterRCNN(backbone,
                   num_classes=2,
                   rpn_anchor_generator=anchor_generator,
                   box_roi_pool=roi_pooler)
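The "5 x 3 anchors per spatial location" in the comments above can be checked with a little arithmetic. The sketch below is an illustration, not torchvision's implementation: anchor_shapes is a hypothetical helper, and it assumes that an aspect ratio means height divided by width and that each anchor keeps an area of size squared.

```python
import math


def anchor_shapes(sizes, aspect_ratios):
    """Return one (width, height) pair per size/aspect-ratio combination."""
    shapes = []
    for ratio in aspect_ratios:
        # Assumption: ratio = height / width, with the area fixed at size**2.
        h_scale = math.sqrt(ratio)
        w_scale = 1.0 / h_scale
        for size in sizes:
            shapes.append((w_scale * size, h_scale * size))
    return shapes


shapes = anchor_shapes((32, 64, 128, 256, 512), (0.5, 1.0, 2.0))
print(len(shapes))  # 15 anchors per spatial location
```

Adding a size or an aspect ratio therefore increases the number of anchors at every location of the feature map, which raises the cost of the RPN accordingly.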

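3.1 An instance segmentation model for the PennFudan dataset

In our case, we want to fine-tune from a pre-trained model because our dataset is very small, so we will follow the first approach above.

Here we also want to compute the instance segmentation masks, so we will use Mask R-CNN: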

import torchvision
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor


def get_model_instance_segmentation(num_classes):
    # load an instance segmentation model pre-trained on COCO
    model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)

    # get number of input features for the classifier
    in_features = model.roi_heads.box_predictor.cls_score.in_features
    # replace the pre-trained head with a new one
    model.roi_heads.box_predictor = FastRCNNPredictor(in_features, num_classes)

    # now get the number of input features for the mask classifier
    in_features_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
    hidden_layer = 256
    # and replace the mask predictor with a new one
    model.roi_heads.mask_predictor = MaskRCNNPredictor(in_features_mask,
                                                       hidden_layer,
                                                       num_classes)

    return model

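That's it. This makes the model ready to be trained and evaluated on your custom dataset.

4. Putting everything together

In references/detection/, there are a number of helper functions that simplify training and evaluating detection models. Here we will use references/detection/engine.py, references/detection/utils.py, and references/detection/transforms.py. Just copy them to your folder and use them here.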

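Note: you need to download these three .py files yourself, along with two additional files. All of the code has been collected on GitHub, and the address is given at the end of the article.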
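One of the helpers used from utils.py is collate_fn, which the training script passes to its data loaders. Detection images can differ in size and in number of boxes, so a batch cannot simply be stacked into one tensor. To my understanding the helper just regroups the samples, roughly as in this sketch, which uses strings and dicts as stand-ins for real images and targets:

```python
def collate_fn(batch):
    """Turn [(img, target), ...] into ((img, ...), (target, ...))."""
    return tuple(zip(*batch))


# Stand-in samples; in the tutorial these are image tensors and target dicts.
batch = [("img_a", {"boxes": [[0, 0, 5, 5]]}), ("img_b", {"boxes": []})]
images, targets = collate_fn(batch)
print(images)  # ('img_a', 'img_b')
```

The model then receives a sequence of images and a matching sequence of targets instead of a single batched tensor.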

4.1 Writing helper functions for data augmentation/transformation:

import transforms as T

def get_transform(train):
    transforms = []
    transforms.append(T.ToTensor())
    if train:
        transforms.append(T.RandomHorizontalFlip(0.5))
    return T.Compose(transforms)
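The transforms here come from the detection references rather than from torchvision.transforms because a detection transform has to update the target along with the image. For a horizontal flip, for example, the boxes must be mirrored as well. A minimal sketch of that box update follows, where hflip_boxes is a hypothetical helper working on plain lists:

```python
def hflip_boxes(boxes, width):
    """Mirror [x0, y0, x1, y1] boxes around the vertical centre line."""
    # x0 and x1 swap roles after mirroring, so the new x0 is width - old x1.
    return [[width - x1, y0, width - x0, y1] for x0, y0, x1, y1 in boxes]


print(hflip_boxes([[10, 20, 110, 220]], width=500))  # [[390, 20, 490, 220]]
```

Masks and keypoints need the corresponding treatment, which is why a custom keypoint representation may require changes to transforms.py.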

4.2 Writing the main function that performs training and validation

from engine import train_one_epoch, evaluate
import utils


def main():
    # train on the GPU or on the CPU, if a GPU is not available
    device = torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

    # our dataset has two classes only - background and person
    num_classes = 2
    # use our dataset and defined transformations
    dataset = PennFudanDataset('PennFudanPed', get_transform(train=True))
    dataset_test = PennFudanDataset('PennFudanPed', get_transform(train=False))

    # split the dataset in train and test set
    indices = torch.randperm(len(dataset)).tolist()
    dataset = torch.utils.data.Subset(dataset, indices[:-50])
    dataset_test = torch.utils.data.Subset(dataset_test, indices[-50:])

    # define training and validation data loaders
    data_loader = torch.utils.data.DataLoader(
        dataset, batch_size=2, shuffle=True, num_workers=4,
        collate_fn=utils.collate_fn)

    data_loader_test = torch.utils.data.DataLoader(
        dataset_test, batch_size=1, shuffle=False, num_workers=4,
        collate_fn=utils.collate_fn)

    # get the model using our helper function
    model = get_model_instance_segmentation(num_classes)

    # move model to the right device
    model.to(device)

    # construct an optimizer
    params = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.SGD(params, lr=0.005,
                                momentum=0.9, weight_decay=0.0005)
    # and a learning rate scheduler
    lr_scheduler = torch.optim.lr_scheduler.StepLR(optimizer,
                                                   step_size=3,
                                                   gamma=0.1)

    # let's train it for 10 epochs
    num_epochs = 10

    for epoch in range(num_epochs):
        # train for one epoch, printing every 10 iterations
        train_one_epoch(model, optimizer, data_loader, device, epoch, print_freq=10)
        # update the learning rate
        lr_scheduler.step()
        # evaluate on the test dataset
        evaluate(model, data_loader_test, device=device)

    print("That's it!")
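The learning rate schedule in main() can be worked out by hand. With step_size=3 and gamma=0.1, and with lr_scheduler.step() called once after each epoch, the rate drops by a factor of ten every three epochs. The sketch below reproduces that arithmetic. step_lr is a hypothetical helper, and it ignores any warm-up that the train_one_epoch helper may apply.

```python
def step_lr(base_lr, epoch, step_size=3, gamma=0.1):
    """Learning rate in effect during `epoch` (0-based) under step decay."""
    return base_lr * gamma ** (epoch // step_size)


for epoch in range(10):
    # Epochs 0-2 use the base rate, 3-5 a tenth of it, and so on.
    print(epoch, step_lr(0.005, epoch))
```

The split works out similarly: of the 170 images, the last 50 shuffled indices form the test set and the remaining 120 are used for training, which gives 60 iterations per epoch at batch_size=2.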

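I was working on my own computer, which has no GPU, and it took more than an hour to get through 60 iterations, i.e. a single epoch (120 training images at batch size 2). I therefore stopped the run and studied the result analysis given on the official site instead.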

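5. Summary

In this tutorial, you learned how to create your own training pipeline for instance segmentation models on a custom dataset. To do so, we wrote a torch.utils.data.Dataset class that returns the images together with the ground-truth boxes and segmentation masks. We also leveraged a Mask R-CNN model pre-trained on COCO train2017 to perform transfer learning on this new dataset.

For a more complete example that includes multi-machine / multi-GPU training, see references/detection/train.py in the torchvision repository.

The full source file for this tutorial can be downloaded here.

Copyright notice: this is an original article by CSDN blogger 「小Aer」, released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.
Original link: https://blog.csdn.net/qq_41542989/article/details/122913418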
