Object Detection and Text Detection Algorithms: Faster R-CNN, CTPN

Faster R-CNN Object Detection Algorithm

Towards Real-Time Object Detection with Region Proposal Networks

R-CNN: Regions with CNN features

  1. Input image
  2. Extract region proposals (~2k)
  3. Compute CNN features
  4. Classify regions

IoU Intersection over Union

A standard metric for measuring how accurately the corresponding objects are detected on a given dataset.

Predicted regions: bounding boxes

ground-truth bounding boxes (the approximate extent of the objects to detect, annotated manually in the training images)

IoU = \frac{Area\ of\ Overlap}{Area\ of\ Union}
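As a quick illustration, here is a minimal Python sketch of this IoU computation for two axis-aligned boxes; the (x1, y1, x2, y2) corner format and the function name are assumptions made for the example, not code from the original post.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2) corner coordinates."""
    # Intersection rectangle
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    # Union = area A + area B - intersection
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# e.g. iou((0, 0, 10, 10), (5, 5, 15, 15)) -> 25 / 175 ≈ 0.143
```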


NMS (Non-Maximum Suppression)
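NMS keeps the highest-scoring box and discards lower-scoring boxes that overlap it too strongly. A greedy sketch, reusing the iou helper above; the 0.5 threshold is only an illustrative default.

```python
def nms(boxes, scores, iou_thresh=0.5):
    """Greedy NMS: boxes is a list of (x1, y1, x2, y2), scores a parallel list."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept box too much
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep
```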

Fast R-CNN


Selective Search

Anchors, sliding window, feature extraction

RPN Loss

Cls label: binary classification (object vs. background), assigned by the IoU between each anchor box and the ground-truth bounding boxes

Loc label

t_x^* = (x^*-x_a)/w_a, \quad t_y^* = (y^*-y_a)/h_a, \quad t_w^* = \log(w^*/w_a), \quad t_h^* = \log(h^*/h_a)

t_x = (x-x_a)/w_a, \quad t_y = (y-y_a)/h_a, \quad t_w = \log(w/w_a), \quad t_h = \log(h/h_a)
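These targets can be computed directly from an anchor and its matched ground-truth box. A small sketch in center/size (cx, cy, w, h) form; the function names are illustrative, not from any specific implementation.

```python
import math

def encode(anchor, gt):
    """Regression targets t* for one anchor; both boxes as (cx, cy, w, h)."""
    xa, ya, wa, ha = anchor
    xg, yg, wg, hg = gt
    tx = (xg - xa) / wa
    ty = (yg - ya) / ha
    tw = math.log(wg / wa)
    th = math.log(hg / ha)
    return tx, ty, tw, th

def decode(anchor, t):
    """Invert the encoding: recover the predicted box from (tx, ty, tw, th)."""
    xa, ya, wa, ha = anchor
    tx, ty, tw, th = t
    return (tx * wa + xa, ty * ha + ya, wa * math.exp(tw), ha * math.exp(th))
```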

Cls loss

Cross-entropy

Loc Loss

z_i = 0.5(x_i-y_i)^2/\beta, \quad \text{if } |x_i-y_i| < \beta
z_i = |x_i-y_i| - 0.5\beta, \quad \text{otherwise}
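The Loc loss applies this smooth L1 (Huber-style) function to each regression target. A direct transcription of the formula, with beta = 1.0 as an illustrative default:

```python
def smooth_l1(x, y, beta=1.0):
    """Smooth L1 between one prediction x and one target y (scalars)."""
    diff = abs(x - y)
    if diff < beta:
        return 0.5 * diff * diff / beta   # quadratic near zero
    return diff - 0.5 * beta              # linear for large errors
```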

RoI Head (Region of Interest)

Mask R-CNN

L = L_{cls} + L_{box} + L_{mask}

To this we apply a per-pixel sigmoid, and define L_{mask} as the average binary cross-entropy loss. For an RoI associated with ground-truth class k, L_{mask} is only defined on the k-th mask (other mask outputs do not contribute to the loss).
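A rough sketch of that per-pixel sigmoid plus binary cross-entropy, keeping only the k-th mask; the tensor shapes and the NumPy formulation are assumptions made for illustration, not the Mask R-CNN implementation itself.

```python
import numpy as np

def mask_loss(mask_logits, gt_mask, k):
    """mask_logits: (K, H, W) per-class mask logits for one RoI;
    gt_mask: (H, W) binary ground-truth mask; k: ground-truth class index."""
    logits = mask_logits[k]                      # only the k-th mask contributes
    p = 1.0 / (1.0 + np.exp(-logits))            # per-pixel sigmoid
    eps = 1e-7
    bce = -(gt_mask * np.log(p + eps) + (1 - gt_mask) * np.log(1 - p + eps))
    return bce.mean()                            # average binary cross-entropy
```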

RoI Align: no quantization (so no misalignment), floating-point coordinates are kept, and each region is further subdivided into sampling points.
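The key operation behind RoI Align is sampling the feature map at floating-point positions via bilinear interpolation instead of rounding to integer cells. A minimal sketch for one sampling point, assuming a 2-D NumPy-style feature map; the helper name is illustrative.

```python
def bilinear_sample(feat, y, x):
    """Bilinearly interpolate feat (H, W) at a floating-point location (y, x)."""
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, feat.shape[0] - 1), min(x0 + 1, feat.shape[1] - 1)
    dy, dx = y - y0, x - x0
    top = feat[y0, x0] * (1 - dx) + feat[y0, x1] * dx
    bottom = feat[y1, x0] * (1 - dx) + feat[y1, x1] * dx
    return top * (1 - dy) + bottom * dy
```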

CTPN Text Detection Algorithm

Detecting Text in Natural Image with Connectionist Text Proposal Network

  • Detecting text in fine-scale proposals
  • Recurrent connectionist text proposals
  • Side-refinement

v_c = (c_y - c_y^a)/h^a, \quad v_c^* = (c_y^* - c_y^a)/h^a
v_h = \log(h/h^a), \quad v_h^* = \log(h^*/h^a)
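CTPN regresses only the vertical center and height of each fixed-width proposal. A small sketch of encoding and decoding these relative vertical coordinates for one anchor; the helper names are illustrative.

```python
import math

def encode_vertical(cy_gt, h_gt, cy_a, h_a):
    """Relative vertical targets (v_c*, v_h*) for one anchor."""
    return (cy_gt - cy_a) / h_a, math.log(h_gt / h_a)

def decode_vertical(v_c, v_h, cy_a, h_a):
    """Recover the predicted center y and height from (v_c, v_h)."""
    return v_c * h_a + cy_a, h_a * math.exp(v_h)
```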

Text line construction

o^* = (x^*_{side} - c^a_x)/w^a
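The side-refinement offset is measured from the anchor center and normalized by the anchor width. A minimal sketch of encoding the target and applying a predicted offset; names are illustrative.

```python
def encode_side(x_side_gt, cx_a, w_a):
    """Side-refinement target o* for the horizontal text-line boundary."""
    return (x_side_gt - cx_a) / w_a

def refine_side(o_pred, cx_a, w_a):
    """Refined x coordinate of the text-line side from the predicted offset."""
    return o_pred * w_a + cx_a
```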

Code

bounding box

CRNN Text Recognition Algorithm

An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition

  • CRNN

  • Code

  • CTC

  • lexicon-based

  • lexicon-free

feature sequence — receptive field

CRNN — CTC

\pi = \text{--hh-e-l-ll-oo--}, \quad B(\pi) = \text{hello}
p(l|y) = \sum_{\pi:B(\pi)=l} p(\pi|y), \quad p(\text{'hello'}|y) = \sum_{\pi:B(\pi)=\text{'hello'}} p(\pi|y)
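The mapping B first merges consecutive repeated characters and then removes blanks; greedy CTC decoding simply applies B to the per-frame argmax sequence. A sketch with '-' standing in for the blank symbol:

```python
def ctc_collapse(path, blank='-'):
    """B: merge consecutive repeats, then drop blanks."""
    out = []
    prev = None
    for ch in path:
        if ch != prev:          # merge repeated characters
            out.append(ch)
        prev = ch
    return ''.join(c for c in out if c != blank)   # remove blanks

# ctc_collapse('--hh-e-l-ll-oo--') -> 'hello'
```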

CTC Theory

p(l|x) = \sum_{\pi \in B^{-1}(l)} p(\pi|x)
h(x) = \arg\max_{l \in L^{\leq T}} p(l|x)
O^{ML}(S, N_w) = -\sum_{(x,z)\in S} \ln p(z|x) = -\sum_{(x,z)\in S} \ln \Big( \sum_{\pi \in B^{-1}(z)} p(\pi|x) \Big)

So that every path has a unique and valid representation in the graph, node transitions obey the following constraints:

  1. Transitions may only move to the lower right; other directions are not allowed
  2. There must be at least one blank between two identical characters
  3. Non-blank characters must not be skipped
  4. The path must start at one of the first two symbols
  5. The path must end at one of the last two symbols

forward-backward

Define the forward probability \alpha_t(s) as the sum of the probabilities of all prefix sub-paths that pass through node s at time t. For the label "ap" expanded with blanks, for example:

\alpha_3(4) = p(\text{-ap}) + p(\text{aap}) + p(\text{a-p}) + p(\text{app})

  • Case 1: the s-th symbol is the blank symbol

    \alpha_t(s) = (\alpha_{t-1}(s) + \alpha_{t-1}(s-1)) \cdot y^t_{seq(s)}

  • Case 2: the s-th symbol is the same as the (s-2)-th symbol

    \alpha_t(s) = (\alpha_{t-1}(s) + \alpha_{t-1}(s-1)) \cdot y^t_{seq(s)}

  • Case 3: neither Case 1 nor Case 2 applies (a combined forward-recursion sketch follows this list)

    \alpha_t(s) = (\alpha_{t-1}(s) + \alpha_{t-1}(s-1) + \alpha_{t-1}(s-2)) \cdot y^t_{seq(s)}
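Combining the three cases above gives the forward recursion; summing the forward variables of the last two nodes at the final frame yields p(l|x). A sketch under the assumption that y is a T × C matrix of per-frame symbol probabilities and seq is the label expanded with blanks (e.g. blank, a, blank, p, blank); all names are illustrative.

```python
def ctc_forward(y, seq, blank=0):
    """Forward probabilities alpha[t][s] for the blank-expanded label seq.
    y: T x C matrix, y[t][c] = probability of symbol c at frame t."""
    T, S = len(y), len(seq)
    alpha = [[0.0] * S for _ in range(T)]
    # Initialization: a path may start at the first blank or the first character.
    alpha[0][0] = y[0][seq[0]]
    if S > 1:
        alpha[0][1] = y[0][seq[1]]
    for t in range(1, T):
        for s in range(S):
            a = alpha[t - 1][s]                       # stay on the same node
            if s >= 1:
                a += alpha[t - 1][s - 1]              # come from the previous node
            # Case 3 only: skip a blank when seq[s] is non-blank and differs from seq[s-2]
            if s >= 2 and seq[s] != blank and seq[s] != seq[s - 2]:
                a += alpha[t - 1][s - 2]
            alpha[t][s] = a * y[t][seq[s]]
    # p(l|x): paths must end on the last character or the trailing blank.
    return alpha[T - 1][S - 1] + (alpha[T - 1][S - 2] if S > 1 else 0.0)
```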


