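Problems encountered while debugging YOLOv3/YOLOv5

Various errors encountered in the code

This continues my previous post, a record of learning the PyTorch version of YOLOv3 and training it on my own data.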


Encountered in the .cfg-file version

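1. OSError: The paging file is too small for this operation to complete; BrokenPipeError; Error loading caffe2_detectron_ops_gpu.dll

OSError: The paging file is too small for this operation to complete.
BrokenPipeError: [Errno 32] Broken pipe
Error loading "D:\Anaconda3\envs\py36\lib\site-packages\torch\lib\caffe2_detectron_ops_gpu.dll" or one of its dependencies.

Set num_workers to 0.
Change it where train.py defines its command-line arguments; if there is no such argument, change it where the DataLoader is created earlier in the code.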
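
As a minimal sketch of that change, assuming the data goes through torch.utils.data.DataLoader: the tiny TensorDataset and the batch size below are placeholders for illustration, not the repository's real dataset. With num_workers=0 every batch is loaded in the main process, so no worker subprocesses are started.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder dataset: 8 fake "images" with 8 fake labels (illustration only).
dataset = TensorDataset(torch.zeros(8, 3, 4, 4), torch.zeros(8))

# num_workers=0 loads every batch in the main process instead of worker subprocesses.
loader = DataLoader(dataset, batch_size=4, shuffle=False, num_workers=0)

num_batches = sum(1 for _ in loader)
print(num_batches)
```

Loading in the main process is usually slower, so this trades speed for stability on a machine with little memory.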

2. RuntimeError: CUDA out of memory.

It looks like this: RuntimeError: CUDA out of memory. Tried to allocate 1.04 GiB (GPU 0; 4.00 GiB total capacity; 86.63 MiB already allocated; 2.52 GiB free; 94.00 MiB reserved in total by PyTorch)
The GPU does not have enough memory. Reduce the training batch-size, close some other processes, or restart the computer.

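3. Still unsolved: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

Training works on the CPU, but passing --device 0 raises this error. I searched around and could not find a fix. Luckily the yaml version works for me.
Leaving this here for now.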
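
This is not a fix for this repository, only a rough diagnostic sketch that might help narrow the problem down. The helper devices_in and the small placeholder model are hypothetical names introduced for illustration. It lists the devices that a model's parameters and buffers live on, so a stray CPU tensor among CUDA tensors becomes visible.

```python
import torch
import torch.nn as nn

def devices_in(model: nn.Module) -> set:
    """Return the set of device strings used by the model's parameters and buffers."""
    tensors = list(model.parameters()) + list(model.buffers())
    return {str(t.device) for t in tensors}

# Placeholder model standing in for the real network (illustration only).
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.BatchNorm2d(8))
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model.to(device)
x = torch.zeros(1, 3, 8, 8, device=device)  # create the input on the same device

found = devices_in(model) | {str(x.device)}
print(found)  # more than one entry would mean mixed devices
```

The check only covers registered parameters and buffers. A tensor created inside forward() without a device argument would not show up here and would have to be found from the traceback.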

Encountered in the .yaml-file version

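1. The yaml file raises AttributeError: 'str' object has no attribute 'get'

In my case the error came from the .yaml file of my own dataset. After I corrected it and confirmed that the paths in it were right, the error went away.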
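
A small pre-flight check along these lines can catch the problem before training starts. This is a sketch under assumptions: check_dataset_config is a hypothetical helper, the keys train and val are assumed to hold one path each, and cfg is whatever the YAML loader returned. A line written without a space after the colon, for example, can make the whole file parse as a plain string instead of a mapping, and a string has no .get().

```python
import os

def check_dataset_config(cfg, keys=("train", "val")):
    """Return a list of problems found in a parsed dataset config."""
    if not isinstance(cfg, dict):
        # A malformed file can parse to a plain string, which has no .get().
        return [f"config parsed as {type(cfg).__name__}, expected a mapping"]
    problems = []
    for key in keys:
        path = cfg.get(key)
        if not isinstance(path, str) or not os.path.exists(path):
            problems.append(f"{key}: path not found: {path!r}")
    return problems

# A string instead of a mapping, then a mapping with one bad path.
print(check_dataset_config("train:data/images"))
print(check_dataset_config({"train": os.getcwd(), "val": "no/such/dir"}))
```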

2. UnicodeDecodeError: 'gbk' codec can't decode byte 0xae in position -: illegal multibyte sequence

UnicodeDecodeError: 'gbk' codec can't decode byte 0xae in position 42: illegal multibyte sequence
Go to the location the error points to and check whether there is a with open() call; if so, add encoding='utf-8' to it.
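
A minimal sketch of the fix, using a temporary file with placeholder content rather than the real dataset file. Passing encoding explicitly stops open() from falling back to the locale default, which is gbk on a Chinese Windows system.

```python
import os
import tempfile

# Write a small UTF-8 file containing non-ASCII text (placeholder content).
path = os.path.join(tempfile.mkdtemp(), "data.yaml")
with open(path, "w", encoding="utf-8") as f:
    f.write("names: ['猫', '狗']\n")

# Read it back with the encoding stated explicitly instead of the locale default.
with open(path, encoding="utf-8") as f:
    text = f.read()

print(text)
```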

3. TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

File "D:\Anaconda3\envs\py38\lib\site-packages\torch\tensor.py", line 621, in __array__
return self.numpy()
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
At the line that raises the error, add .cpu() in front of .numpy().
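
A short sketch of the pattern, with a placeholder tensor; it falls back to the CPU when no GPU is available, so it runs either way.

```python
import torch

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
t = torch.arange(6, dtype=torch.float32, device=device).reshape(2, 3)

# .cpu() copies a CUDA tensor to host memory (a CPU tensor is returned unchanged);
# .detach() is only needed when the tensor requires grad.
arr = t.detach().cpu().numpy()
print(arr.shape)
```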

4. RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

File "D:\cxy\PyTorch_YOLOv3-master\PyTorch_YOLOv3-master\models\yolo_layer.py", line 103, in forward
return pred.view(batchsize, -1, n_ch).data
RuntimeError: view size is not compatible with input tensor's size and stride (at least one dimension spans across two contiguous subspaces). Use .reshape(...) instead.

From someone else's blog: this happens because view() requires the tensor's elements to be laid out contiguously in memory, and the tensor may not be contiguous. Calling .contiguous() first makes the memory layout contiguous.

Add .contiguous() in front of .view(), so the line above becomes return pred.contiguous().view(batchsize, -1, n_ch).data
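
The behaviour can be reproduced on a small placeholder tensor. In this sketch permute() produces a non-contiguous tensor, view() on it fails with a RuntimeError, and .contiguous().view() works and gives the same result as .reshape(), which the error message suggests.

```python
import torch

x = torch.arange(24).reshape(2, 3, 4)
y = x.permute(0, 2, 1)  # shape (2, 4, 3); the memory layout is no longer contiguous
print(y.is_contiguous())

try:
    y.view(2, -1)  # the last two dimensions cannot be merged without a copy
except RuntimeError:
    failed = True
else:
    failed = False

flat = y.contiguous().view(2, -1)            # copy into contiguous memory, then view
same = torch.equal(flat, y.reshape(2, -1))   # reshape() copies only when it has to
print(failed, flat.shape, same)
```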

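5. detect.py finds no objects at all in the images, and the stock yolov3.pt gives no results either

Like me, you may need to change one place in detect.py.
Add the line I marked in the original screenshot, which is the same as the four lines above it.

Recorded 2021/3/24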

6. OMP: Error #15: Initializing libiomp5md.dll, but found libiomp5md.dll already initialized.

video 1/1 (2/129) d:\testvideo\test01.mp4:
OMP: Error #15: Initializing libiomp5md.dll, but found libiomp5md.dll already initialized.
OMP: Hint This means that multiple copies of the OpenMP runtime have been linked into the program. That is dangerous, since it can degrade performance or cause incorrect results. The best thing to do is to ensure that only a single OpenMP runtime is linked into the process, e.g. by avoiding static linking of the OpenMP runtime in any library. As an unsafe, unsupported, undocumented workaround you can set the environment variable KMP_DUPLICATE_LIB_OK=TRUE to allow the program to continue to execute, but that may cause crashes or silently produce incorrect results. For more information, please see http://www.intel.com/software/products/support/.

import os
os.environ['KMP_DUPLICATE_LIB_OK'] = 'True'  # with this line added, the error no longer appears

Recorded 2021/4/1

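Copyright notice: this is an original article by CSDN blogger "clnnnnn", released under the CC 4.0 BY-SA license. When reposting, please include a link to the original source and this notice.
Original link: https://blog.csdn.net/weixin_45033788/article/details/115160288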
