《智能计算系统》 (AI Computing Systems) Lab 7-1: YOLOv3

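Contents

I. Setting Up the Environment
II. Implementing nms_detection.h
III. Operator Integration
IV. Building the Framework
V. Online Inference
VI. Possible Problems
VII. Miscellaneous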

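While working on comprehensive Lab 7-1 (YOLOv3) from 《智能计算系统》, I ran into many problems because the lab manual leaves steps out. Below is the complete workflow, in the hope that it helps other readers: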

I. Setting Up the Environment

Create a new v7 container (not v7-update1).

II. Implementing nms_detection.h

1. Complete nms_detection.h by implementing the function:

#define T half // ./plugin_yolov3_detection_helper.h

__mlu_func__ void nms_detection(
        int& output_box_num,
        T* output_data,
        Addr dst,
        T* input_data_score,
        T* input_data_box,
        Addr src,
        T *buffer,
        int buffer_size,
        T* sram,
        SplitMode split_mode,
        int input_box_num,
        int input_stride,
        int output_stride,
        int keepNum,
        T thresh_iou,
        T thresh_score,
        int save_method){...}

Only the following case needs to be handled:

src == NRAM
split_mode == NMS_BLOCK
save_method == 1
MODE == 1

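In this case the kernel runs in single-core mode, the input data resides in NRAM and satisfies the vector alignment requirements, and there is ample working space. The data is stored field by field as score---, x1---, y1---, x2---, y2---, i.e. all scores first, then all x1 values, and so on.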
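To make the field-by-field layout concrete, here is a minimal plain-C sketch (not BANG C, and using float instead of half) of how box i is read from such a buffer and how the IoU used for suppression could be computed. It assumes the coordinate arrays follow each other in the order x1, y1, x2, y2, each occupying `stride` consecutive elements (with the scores in a separate array, as the separate input_data_score pointer suggests), and that boxes are in corner form. `box_field`, `box_iou` and the `FIELD_*` constants are hypothetical names, not part of the lab code.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed order of the coordinate arrays in the box buffer: x1---, y1---, x2---, y2--- */
enum { FIELD_X1 = 0, FIELD_Y1 = 1, FIELD_X2 = 2, FIELD_Y2 = 3 };

static float max_f(float a, float b) { return a > b ? a : b; }
static float min_f(float a, float b) { return a < b ? a : b; }

/* Hypothetical helper: field f of box i, where every field is a contiguous
 * run of `stride` elements (structure-of-arrays layout). */
static float box_field(const float *box, size_t stride, int f, size_t i) {
    assert(i < stride);
    return box[(size_t)f * stride + i];
}

/* Hypothetical helper: intersection-over-union of boxes i and j,
 * both given in corner form (x1, y1, x2, y2). */
static float box_iou(const float *box, size_t stride, size_t i, size_t j) {
    float xi1 = box_field(box, stride, FIELD_X1, i), yi1 = box_field(box, stride, FIELD_Y1, i);
    float xi2 = box_field(box, stride, FIELD_X2, i), yi2 = box_field(box, stride, FIELD_Y2, i);
    float xj1 = box_field(box, stride, FIELD_X1, j), yj1 = box_field(box, stride, FIELD_Y1, j);
    float xj2 = box_field(box, stride, FIELD_X2, j), yj2 = box_field(box, stride, FIELD_Y2, j);

    /* Overlap rectangle, clamped to zero when the boxes do not intersect. */
    float inter_w = max_f(min_f(xi2, xj2) - max_f(xi1, xj1), 0.0f);
    float inter_h = max_f(min_f(yi2, yj2) - max_f(yi1, yj1), 0.0f);
    float inter = inter_w * inter_h;

    float area_i = (xi2 - xi1) * (yi2 - yi1);
    float area_j = (xj2 - xj1) * (yj2 - yj1);
    float uni = area_i + area_j - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}
```

With stride = 2 and the buffer {0, 1, 0, 1, 2, 3, 2, 3}, box 0 is (0, 0, 2, 2) and box 1 is (1, 1, 3, 3), so their IoU is 1/7. In a greedy NMS loop, a box j would typically be suppressed when its IoU with the currently selected box exceeds thresh_iou.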

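Under these conditions, during the box-saving stage each box found by the search is first stored in the NRAM_save buffer. Once the number of stored boxes M reaches N (N = 0 if output_data_AddrType == NRAM, otherwise N = 256), the boxes can be copied to their destination in one batch. When the current max_box_Score <= thresh_score, copy the remaining NRAM_save data to the destination and break if the destination is SRAM/GDRAM; if it is NRAM, simply break.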

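if (output_data_AddrType != NRAM && output_box_num != 0)
{
    ...
    // Destination is SRAM/GDRAM: copy NRAM_save out when it holds N boxes
    // or when the search is about to stop
    if ((M == N) || (max_box_Score <= thresh_score))
    {
        __memcpy(...);
        ...
    }
}

// Stop searching once the best remaining score is not above thresh_score
if (max_box_Score <= thresh_score)     break;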
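The control flow of the saving stage can be modelled on the host in plain C. This is only a sketch under stated assumptions, not the kernel: `Box`, `save_boxes` and `SAVE_MAX` are hypothetical names, standard `memcpy` stands in for the BANG `__memcpy` between memory spaces, `picked` is assumed to already hold the boxes in the order the NMS loop selects them (suppression is done elsewhere), and `n` plays the role of N (0 for an NRAM destination, 256 otherwise).

```c
#include <assert.h>
#include <string.h>

#define SAVE_MAX 256 /* capacity of the staging buffer (N for SRAM/GDRAM) */

typedef struct {
    float score, x1, y1, x2, y2;
} Box;

/* Hypothetical host-side model of the box-saving stage.
 * picked:     boxes in the order the NMS loop selects them (descending score)
 * n:          batch size N; 0 means write straight to dst (NRAM destination)
 * dst:        destination buffer (stand-in for the SRAM/GDRAM output)
 * num_copies: receives the number of batched copies issued
 * Returns the number of boxes written to dst. */
static int save_boxes(const Box *picked, int num_picked, float thresh_score,
                      int n, Box *dst, int *num_copies) {
    Box save[SAVE_MAX]; /* stand-in for the NRAM_save buffer */
    int m = 0;          /* M: boxes currently staged */
    int written = 0;    /* boxes already copied to dst */

    assert(n >= 0 && n <= SAVE_MAX);
    *num_copies = 0;

    for (int k = 0; k < num_picked; ++k) {
        if (picked[k].score <= thresh_score)
            break;                      /* best remaining score too low: stop */
        if (n == 0) {                   /* NRAM destination: no staging */
            dst[written++] = picked[k];
            continue;
        }
        save[m++] = picked[k];
        if (m == n) {                   /* M == N: copy the full batch out */
            memcpy(dst + written, save, (size_t)m * sizeof(Box));
            written += m;
            m = 0;
            (*num_copies)++;
        }
    }

    if (m > 0) {                        /* flush the partial batch before leaving */
        memcpy(dst + written, save, (size_t)m * sizeof(Box));
        written += m;
        (*num_copies)++;
    }
    return written;
}
```

For example, with n = 2, thresh_score = 0.5 and picked scores 0.9, 0.8, 0.7, 0.6, 0.1, four boxes are written using two batched copies. Copying only when the buffer is full or the search ends keeps the number of transfers out of NRAM small.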

2. Copy /opt/code_chap_7_student/yolov3/bangc/PluginYolov3DetectionOutputOp into /opt/code_chap_7_student/env/Cambricon-CNPlugin-MLU270/pluginops

3. Initialize the environment
Every time you enter the system, go to the env directory and run source env.sh:

cd /opt/code_chap_7_student/env
source env.sh

4. Build

cd /opt/code_chap_7_student/env/Cambricon-CNPlugin-MLU270
./build_cnplugin.sh

5. Copy ./build/libcnplugin.so to ../neuware/lib64

cp ./build/libcnplugin.so /opt/code_chap_7_student/env/neuware/lib64

6. Copy ./pluginops/PluginYolov3DetectionOutputOp/cnplugin.h to ../neuware/include

cp ./pluginops/PluginYolov3DetectionOutputOp/cnplugin.h /opt/code_chap_7_student/env/neuware/include

III. Operator Integration

Complete the files in the /opt/code_chap_7_student/yolov3/tf-implementation/tf-1.14-detectionoutput directory.

For a detailed explanation of this part, see TensorFlow's guide to implementing custom operators.

1. MLULib wrapper

/*
    mlu_lib_ops.h //line 924
    mlu_lib_ops.cc //line 1918
*/
tensorflow::Status CreateYolov3DetectionOutputOp(
    MLUBaseOp** op,
    MLUTensor** input_tensors,
    MLUTensor** output_tensors,
    cnmlPluginYolov3DetectionOutputOpParam_t param){...}

tensorflow::Status ComputeYolov3DetectionOutputOp(
    MLUBaseOp* op,
    MLUCnrtQueue* queue,
    void* inputs[],
    int input_num,
    void* outputs[],
    int output_num){...}

2. MLUOp wrapper

/*
    mlu_ops.h //line 530
*/
struct MLUYolov3DetectionOutputOpParam{};
DECLARE_OP_CLASS(MLUYolov3DetectionOutput);

/*
    yolov3detectionoutput.cc //line 12
*/
Status MLUYolov3DetectionOutput::CreateMLUOp(std::vector<MLUTensor*> &inputs, std::vector<MLUTensor*> &outputs, void *param){...}

Status MLUYolov3DetectionOutput::Compute(const std::vector<void *> &inputs,
                                         const std::vector<void *> &outputs, cnrtQueue_t queue){...}

3. MLUStream wrapper

/*
    mlu_stream.h //line 141
*/
Status Yolov3DetectionOutput(
     OpKernelContext* ctx,
     Tensor* tensor_input0,
     Tensor* tensor_input1,
     Tensor* tensor_input2,
     int batchNum,
     int inputNum,
     int classNum,
     int maskGroupNum,
     int maxBoxNum,
     int netw,
     int neth,
     float confidence_thresh,
     float nms_thresh,
     int* inputWs,
     int* inputHs,
     float* biases,
     Tensor* output1,
     Tensor* output2){...}

4. MLUOpKernel wrapper

/*
    yolov3_detection_output_op_mlu.h //line 49
*/
void ComputeOnMLU(OpKernelContext* context) override{...}

/*
    yolov3_detection_output_op.cc //line 23
*/
namespace tensorflow{...}

5. Operator registration

/*
    image_ops.cc //line 1007
*/
REGISTER_OP("Yolov3DetectionOutput"){...}

IV. Building the Framework

1. Modify BUILD (already done)

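2. Copy each file under /opt/code_chap_7_student/yolov3/tf-implementation/tf-1.14-detectionoutput into its corresponding directory. With cp, run the following from /opt/code_chap_7_student/yolov3 so that the relative paths resolve: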

cp ./tf-implementation/tf-1.14-detectionoutput/BUILD ../env/tensorflow-v1.10/tensorflow/core/kernels/BUILD
cp ./tf-implementation/tf-1.14-detectionoutput/image_ops.cc ../env/tensorflow-v1.10/tensorflow/core/ops/image_ops.cc
cp ./tf-implementation/tf-1.14-detectionoutput/yolov3_detection_output_op.cc ../env/tensorflow-v1.10/tensorflow/core/kernels/yolov3_detection_output_op.cc
cp ./tf-implementation/tf-1.14-detectionoutput/yolov3_detection_output_op_mlu.h ../env/tensorflow-v1.10/tensorflow/core/kernels/yolov3_detection_output_op_mlu.h
cp ./tf-implementation/tf-1.14-detectionoutput/mlu_lib_ops.cc ../env/tensorflow-v1.10/tensorflow/stream_executor/mlu/mlu_api/lib_ops/mlu_lib_ops.cc
cp ./tf-implementation/tf-1.14-detectionoutput/mlu_lib_ops.h ../env/tensorflow-v1.10/tensorflow/stream_executor/mlu/mlu_api/lib_ops/mlu_lib_ops.h
cp ./tf-implementation/tf-1.14-detectionoutput/mlu_ops.h ../env/tensorflow-v1.10/tensorflow/stream_executor/mlu/mlu_api/ops/mlu_ops.h
cp ./tf-implementation/tf-1.14-detectionoutput/yolov3detectionoutput.cc ../env/tensorflow-v1.10/tensorflow/stream_executor/mlu/mlu_api/ops/yolov3detectionoutput.cc
cp ./tf-implementation/tf-1.14-detectionoutput/mlu_stream.h ../env/tensorflow-v1.10/tensorflow/stream_executor/mlu/mlu_stream.h

3. Build the framework

rm -rf /root/.cache/bazel/_bazel_root
cd /opt/code_chap_7_student/env/tensorflow-v1.10
./build_tensorflow-v1.10_mlu.sh

Delete the /root/.cache/bazel/_bazel_root folder before building; if an error is reported, delete /root/.cache/bazel/_bazel_root/* and retry.

V. Online Inference

1. pb->pbtxt

1) Copy yolov3_int8_bang_shape_new.pb from /opt/Cambricon-Test/models/yolov3/ to /opt/code_chap_7_student/yolov3/yolov3-bcl/demo

cd /opt/code_chap_7_student/yolov3/yolov3-bcl/demo
cp /opt/Cambricon-Test/models/yolov3/yolov3_int8_bang_shape_new.pb ./

2) Convert the .pb to .pbtxt

python /opt/code_chap_7_student/tools/pb_to_pbtxt/pb_to_pbtxt.py yolov3_int8_bang_shape_new.pb yolov3_int8_bang_shape_new.pbtxt

2. Edit yolov3_int8_bang_shape_new.pbtxt to add node{...}

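The .pbtxt file is so large that opening it in an editor can drop the connection, so add the content with shell commands instead. For example, put the new node{...} (together with library{...}) in a file pb_node_append.txt, similar to: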

//pb_node_append.txt
node {...}
library {...}

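Then run the following. Because sed matches one line at a time, the end pattern ^38\n}$ never matches, so the first command deletes everything from the first line starting with library through the end of the file; the second command appends the prepared node{...} and library{...} in its place: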

sed -i "/^library/,/^38\n}$/d" yolov3_int8_bang_shape_new.pbtxt
cat ./pb_node_append.txt >> yolov3_int8_bang_shape_new.pbtxt

3. pbtxt->pb

python /opt/code_chap_7_student/tools/pbtxt_to_pb/pbtxt_to_pb.py ./yolov3_int8_bang_shape_new.pbtxt yolov3_int8.pb

4. In ./run_evaluate.sh, set MODEL_PATH="./yolov3_int8.pb"

5. Run ./run_aicse.sh

VI. Possible Problems

1. If fetching fails while running ./build_tensorflow-v1.10_mlu.sh to build the framework, delete /root/.cache/bazel/_bazel_root/* and retry.

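2. ***cannot find: the environment variables have not been initialized. Run source env.sh in the ./env directory. Note that you must run source env.sh every time you log in and enter the development container.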

3. Errors like the following during the framework build:

Executing genrule //tensorflow/python/keras/api:keras_python_api_gen_compat_v2 failed (Exit 1): bash failed: error executing command

/opt/code_chap_7_student/env/tensorflow-v1.10/tensorflow/python/keras/api/BUILD:28:1: Executing genrule //tensorflow/python/keras/api:keras_python_api_gen_compat_v1 failed (Exit 1): bash failed: error executing command

Check that, after building CNPlugin, you copied ./build/libcnplugin.so to /opt/code_chap_7_student/env/neuware/lib64.

VII. Miscellaneous

1. Developer manual downloads: Documentation Center – Cambricon Developer Community (文档中心 – 寒武纪开发者社区)

2. BUILD file syntax: Getting started with Bazel C++ syntax (bazel C++语法入门) - Jianshu

3. PB file format: TensorFlow model persistence and restoration (Tensorflow模型持久化与恢复), jinying2224's blog - CSDN

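Copyright notice: this article is an original work by CSDN blogger 「继明照于四方」, licensed under CC 4.0 BY-SA. When reposting, please include the original source link and this notice.
Original link: https://blog.csdn.net/weixin_40943865/article/details/122059436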