3D Object Detection (5): How to deploy a PyTorch model in a C++ project, and common problems when converting a PyTorch model to a libtorch model

How to convert a Python model in PyTorch into a C++ model in libtorch

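1. Why do this?
We write our neural networks and run forward inference in PyTorch, so deploying a Python network in a C++ project takes some extra work.
ROS is one option, but the commonly used ROS releases support Python 2.7, while most open-source PyTorch code today is written in Python 3. Going through ROS also does nothing about the underlying problem that the Python part runs slowly.
So I looked for a C++ counterpart to PyTorch, and there is one: libtorch. PyTorch also provides an interface for converting a Python model into a form that C++ can load.
2. TorchScript
TorchScript is a way to create serializable and optimizable models from PyTorch code. Any TorchScript program can be saved from a Python process and loaded in a process that has no Python dependency.
PyTorch provides tools to incrementally convert a model from a pure Python program into a TorchScript program that runs independently of Python, for example inside a standalone C++ program. This makes it possible to train a model in PyTorch with familiar Python tools and then export it through TorchScript to a production environment where Python may be unsuitable for performance or multithreading reasons.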

How to use the interface

## 1. For relatively simple networks (no branches, loops, etc.)

torch.jit.trace(Module,input)

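This approach only works well for very simple architectures. It traces the path that one example input takes through the network and derives the network structure from that path. Only the operations executed for that input are recorded, so it breaks down for more complex networks, for example ones with conditional branches.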

import torch
import torch.nn as nn

class MyModule(nn.Module):
    def __init__(self):
        super(MyModule, self).__init__()
        self.conv1 = nn.Conv2d(1, 3, 3)

    def forward(self, x):
        x = self.conv1(x)
        return x

model = MyModule()  # instantiate the model
trace_module = torch.jit.trace(model, torch.rand(1, 1, 224, 224))
print(trace_module.code)  # inspect the traced model structure
output = trace_module(torch.ones(1, 1, 224, 224))  # test; conv1 expects 1 input channel
print(output)
trace_module.save('model.pt')  # save the model
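
To make the limitation of tracing concrete, here is a minimal sketch that traces and scripts the same module. The `Gate` module is a hypothetical example, not part of the original network. Tracing records only the branch taken by the example input (PyTorch emits a TracerWarning about this), while scripting keeps the `if`.

```python
import torch
import torch.nn as nn

class Gate(nn.Module):
    """Hypothetical module whose output depends on a data-dependent branch."""
    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if x.sum() > 0:
            return x + 10
        return x - 10

model = Gate()
# The example input is positive, so the trace only records "x + 10".
traced = torch.jit.trace(model, torch.ones(3))
# Scripting compiles the source and preserves both branches.
scripted = torch.jit.script(model)

neg = -torch.ones(3)
traced_out = traced(neg)      # follows the recorded branch: -1 + 10
scripted_out = scripted(neg)  # takes the else branch: -1 - 10
print(traced_out, scripted_out)
```

The traced module silently gives a different answer from the original model for negative input, which is why models with control flow should be scripted instead.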

## 2. For more complex networks
When the model contains control flow, this is the only option.

script_module = torch.jit.script(model)

import torch
import os
from pointnet2_cls_won_model import *
# directory used to locate the checkpoint and store the output
BASE_DIR = os.path.dirname(os.path.abspath(__file__))
ROOT_DIR = BASE_DIR
# load the weights file
checkpoint = torch.load(ROOT_DIR + '/checkpoints/best_model.pth')
# build the model
classifier = get_model(num_class=10, normal_channel=False)
# load the parameters into the model
classifier.load_state_dict(checkpoint['model_state_dict'])
# export as a .pt file
scripted_gate = torch.jit.script(classifier)
# os.path.join adds the path separator that plain string concatenation omitted
scripted_gate.save(os.path.join(ROOT_DIR, "script_model_1.pt"))
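
Before moving to the C++ side, it can help to check that the exported file loads and reproduces the original outputs. The sketch below does a save and load round trip in Python. `TinyClassifier` is a hypothetical stand-in for the real PointNet++ classifier, and the file name is arbitrary.

```python
import os
import tempfile

import torch
import torch.nn as nn

class TinyClassifier(nn.Module):
    """Hypothetical stand-in for the real classifier."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(3, 10)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.fc(x)

model = TinyClassifier().eval()
scripted = torch.jit.script(model)

x = torch.rand(4, 3)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "script_model.pt")
    scripted.save(path)            # serialize the TorchScript module
    loaded = torch.jit.load(path)  # reload it without the Python class
    same = torch.allclose(loaded(x), model(x))

print(same)
```

The same .pt file is what libtorch reads in C++ through torch::jit::load.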

Common pitfalls

1.All inputs of range must be ints, found Tensor in argument 0:

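TorchScript is statically typed, much stricter than Python, and assumes that any unannotated function argument is a Tensor. Every function argument in the network that is not a Tensor therefore needs its type declared with a : annotation.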

def sample_and_group(npoint: int, radius: float, nsample: int, xyz, points, returnfps: bool = False):
    """
    Input:
        npoint:
        radius:
        nsample:
        xyz: input points position data, [B, N, 3]
        points: input points data, [B, N, D]
    Return:
        new_xyz: sampled points position data, [B, npoint, nsample, 3]
        new_points: sampled points data, [B, npoint, nsample, 3+D]
    """

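As a small illustration of the annotation rule, the sketch below scripts a function whose loop bound is an `int`. The function `repeat_rows` is a hypothetical example. Without the `n: int` annotation, TorchScript would treat `n` as a Tensor and reject `range(n)`.

```python
from typing import List

import torch

@torch.jit.script
def repeat_rows(x: torch.Tensor, n: int) -> torch.Tensor:
    # n is annotated as int, so range(n) is accepted by the compiler.
    rows: List[torch.Tensor] = []
    for _ in range(n):
        rows.append(x)
    return torch.stack(rows)

out = repeat_rows(torch.zeros(2), 3)
print(out.shape)
```
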
2.Sliced expression not yet supported for subscripted assignment. File a bug if you want this:

Assigning to a slice is a Python-specific operation that TorchScript does not support, so replace the slice assignment with a loop.

# Original code:
view_shape[1:] = [1] * (len(view_shape) - 1)
# After the change:
for i in range(1, len(view_shape)):
    view_shape[i] = 1
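
The loop version can be checked in isolation. This is a minimal sketch in which `make_view_shape` is a hypothetical helper that wraps the same loop so it can be compiled by itself.

```python
from typing import List

import torch

@torch.jit.script
def make_view_shape(shape: List[int]) -> List[int]:
    # Copy the input list, then set every entry after the first to 1.
    view_shape = [s for s in shape]
    for i in range(1, len(view_shape)):
        view_shape[i] = 1
    return view_shape

result = make_view_shape([4, 5, 6])
print(result)
```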

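3.Tried to access nonexistent attribute or method ‘__len__’ of type ‘__torch__.torch.nn.modules.container.ModuleList’. Did you forget to initialize an attribute in __init__()?

Inside forward, len(nn.ModuleList()) and subscript access are not supported. For a single ModuleList, iterate over it with enumerate; for several ModuleLists of the same length, iterate over them together with zip.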

# Original code
for i, conv in enumerate(self.mlp_convs):
    bn = self.mlp_bns[i]
    new_points = F.relu(bn(conv(new_points)))
# Modified code
for conv, bn in zip(self.mlp_convs, self.mlp_bns):
    new_points = F.relu(bn(conv(new_points)))
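
The zip pattern can be tried on a small self-contained module. In this sketch the `SmallMLP` class and its layer sizes are assumptions chosen for illustration. Only the attribute names mirror the snippet above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallMLP(nn.Module):
    """Hypothetical module with paired conv and batch-norm lists."""
    def __init__(self):
        super().__init__()
        self.mlp_convs = nn.ModuleList([nn.Conv1d(3, 8, 1), nn.Conv1d(8, 16, 1)])
        self.mlp_bns = nn.ModuleList([nn.BatchNorm1d(8), nn.BatchNorm1d(16)])

    def forward(self, new_points: torch.Tensor) -> torch.Tensor:
        # Iterate over both lists together instead of indexing with i.
        for conv, bn in zip(self.mlp_convs, self.mlp_bns):
            new_points = F.relu(bn(conv(new_points)))
        return new_points

model = SmallMLP().eval()
scripted = torch.jit.script(model)

x = torch.rand(2, 3, 10)
out = scripted(x)
print(out.shape)
```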

4.Previous return statement returned a value of type Tuple[Tensor, Tensor, Tensor, Tensor] but this return statement returns a value of type Tuple[Tensor, Tensor]:

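This happens because a Python function can return a different number of values depending on a condition, which is not allowed on the C++ side: every return statement in a TorchScript function must return the same type.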

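# Original code
    if returnfps:
        return new_xyz, new_points, grouped_points, idx
    else:
        return new_xyz, new_points
# Either split this into two functions according to the condition, or, since the condition defaults to False in my case, simply drop the four-value return
    if returnfps:
        return new_xyz, new_points
    else:
        return new_xyz, new_points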
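
The other option mentioned in the comment, splitting the function in two, might look like the sketch below. The functions `sample` and `sample_with_idx` and their toy bodies are hypothetical. The point is only that each function has a single fixed return type.

```python
from typing import Tuple

import torch

@torch.jit.script
def sample(xyz: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor]:
    # Toy computation standing in for the real sampling and grouping.
    new_xyz = xyz[:, :2]
    new_points = xyz * 2.0
    return new_xyz, new_points

@torch.jit.script
def sample_with_idx(xyz: torch.Tensor) -> Tuple[torch.Tensor, torch.Tensor, torch.Tensor]:
    # Same work plus one extra output, declared in its own signature.
    new_xyz, new_points = sample(xyz)
    idx = torch.arange(xyz.shape[0])
    return new_xyz, new_points, idx

x = torch.rand(4, 3)
two = sample(x)
three = sample_with_idx(x)
print(len(two), len(three))
```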

5.Expected a value of type ‘int’ for argument ‘npoint’ but instead found type ‘None’.

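This happens because Python allows None to be passed where a value is expected, while C++ does not, so an argument declared as int cannot be None. The None case has to be handled by a separate function or class chosen according to the condition.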

# For example, here I pass None to the constructor, to be checked later
self.sa2 = PointNetSetAbstraction(npoint=128, radius=0.4, nsample=64, in_channel=128 + 3, mlp=[128, 128, 256], group_all=False)
self.sa3 = PointNetSetAbstraction(npoint=None, radius=None, nsample=None, in_channel=256 + 3, mlp=[256, 512, 1024], group_all=True)
# This is where the None value is used
    if points is not None:
        grouped_points = index_points(points, idx)
        new_points = torch.cat([grouped_xyz_norm, grouped_points], dim=-1) # [B, npoint, nsample, C+D]
    else:
        new_points = grouped_xyz_norm
# I can add a flag that replaces points in this condition, or create another class dedicated to the points=None case
self.sa3 = PointNetSetAbstraction(npoint=100, radius=100, nsample=100, in_channel=256 + 3, mlp=[256, 512, 1024], group_all=True, if_points=False)
# or
self.sa3 = PointNetSetAbstractionNone(in_channel=256 + 3, mlp=[256, 512, 1024], group_all=True)
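
A rough sketch of the flag idea follows, under the assumption that the caller passes a placeholder tensor when there are no point features. The `GroupFeatures` class and the `use_points` name are hypothetical and only illustrate replacing the None check with a bool set in the constructor.

```python
import torch
import torch.nn as nn

class GroupFeatures(nn.Module):
    """Hypothetical layer that uses a bool flag instead of a None check."""
    def __init__(self, use_points: bool):
        super().__init__()
        self.use_points = use_points

    def forward(self, grouped_xyz_norm: torch.Tensor, points: torch.Tensor) -> torch.Tensor:
        # The flag decides whether the extra features are concatenated.
        if self.use_points:
            return torch.cat([grouped_xyz_norm, points], dim=-1)
        return grouped_xyz_norm

with_points = torch.jit.script(GroupFeatures(True))
without_points = torch.jit.script(GroupFeatures(False))

xyz = torch.rand(2, 4, 3)
feats = torch.rand(2, 4, 5)
placeholder = torch.empty(0)  # ignored when use_points is False

a = with_points(xyz, feats)
b = without_points(xyz, placeholder)
print(a.shape, b.shape)
```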

6.Expected integer literal for index

The index must be an integer. Handle this the same way as issue 3.

7.Arguments for call are not valid. The following variants are available

The assigned value has the wrong type: a tensor is required but an int was given. Replace the int N with torch.tensor(N).

# Original code
mask = sqrdists > radius ** 2
group_idx[mask] = N
# Modified code
mask = sqrdists > radius ** 2
group_idx[mask] = torch.tensor(N)
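
The modified assignment can be exercised in a small scripted function. In this sketch `mark_out_of_radius` is a hypothetical wrapper and the input values are made up for illustration.

```python
import torch

@torch.jit.script
def mark_out_of_radius(group_idx: torch.Tensor, sqrdists: torch.Tensor,
                       radius: float, N: int) -> torch.Tensor:
    # Entries farther than the radius are overwritten with N (in place).
    mask = sqrdists > radius ** 2
    group_idx[mask] = torch.tensor(N)
    return group_idx

group_idx = torch.arange(4)                   # int64 indices 0..3
sqrdists = torch.tensor([0.1, 5.0, 0.2, 9.0])
marked = mark_out_of_radius(group_idx, sqrdists, 1.0, 4)
print(marked)
```

Note that `torch.tensor(N)` is created on the CPU with an integer dtype. If `group_idx` lives on another device, the value tensor may need to be created there as well.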

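8.Expected a value of type ‘Tensor (inferred)’ for argument ‘points’ but instead found type ‘Optional[Tensor]’.

TorchScript has an Optional type. For example, if a function can return either None or a tensor depending on an if branch, its return value is typed as Optional[Tensor], and tensor methods and attributes such as tensor.shape or tensor.size() cannot be used on that value directly.
The fix is to avoid using None as the condition: define a flag instead, or define a separate function or class.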

# Original code
    def forward(self, xyz):
        B, _, _ = xyz.shape
        if self.normal_channel:
            norm = xyz[:, 3:, :]
            xyz = xyz[:, :3, :]
        else:
            norm = None
        l1_xyz, l1_points = self.sa1(xyz, norm)
# Changed definitions
self.sa1 = PointNetSetAbstractionFirst(npoint=512, radius=0.2, nsample=32, in_channel=in_channel, mlp=[64, 64, 128])
self.sa2 = PointNetSetAbstraction(npoint=128, radius=0.4, nsample=64, in_channel=128 + 3, mlp=[128, 128, 256], group_all=False)
self.sa3 = PointNetSetAbstractionLast(in_channel=256 + 3, mlp=[256, 512, 1024])
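
As an alternative to the approach above (not the one used in this post), TorchScript can also accept None if the argument is annotated as Optional[Tensor] and checked with `is not None` before use. The compiler then treats the value as a plain Tensor inside that branch. The sketch below uses a hypothetical function `concat_features` to show the idea.

```python
from typing import Optional

import torch

@torch.jit.script
def concat_features(xyz: torch.Tensor, points: Optional[torch.Tensor]) -> torch.Tensor:
    # Inside this branch the compiler refines points to Tensor.
    if points is not None:
        return torch.cat([xyz, points], dim=-1)
    return xyz

xyz = torch.zeros(2, 4, 3)
no_feats = concat_features(xyz, None)
with_feats = concat_features(xyz, torch.zeros(2, 4, 5))
print(no_feats.shape, with_feats.shape)
```

Whether this is preferable to separate classes depends on how many call sites pass None. The explicit classes keep each forward signature free of Optional types.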