Faster R-CNN: Object Detection Algorithm
Towards Real-Time Object Detection with Region Proposal Networks
R-CNN: Regions with CNN features
- Input image
- Extract region proposals (~2k)
- Compute CNN features
- Classify regions
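IoU (Intersection over Union)
A standard measure of how accurately objects are detected on a given dataset.
Predicted region: bounding boxes
ground-truth bounding boxes (the approximate extent of each target object, hand-labeled in the training images)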
$$
IoU = \frac{\text{Area of Overlap}}{\text{Area of Union}}
$$
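The IoU ratio can be sketched in a few lines of Python. This is a minimal illustration, assuming boxes are given as `(x1, y1, x2, y2)` corner coordinates; the function name `iou` and the box layout are choices made here, not something fixed by these notes.

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2) with x1 < x2 and y1 < y2."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp to zero when the boxes do not overlap.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

For example, two 2x2 boxes offset by one unit in each direction overlap in a 1x1 square, so their IoU is 1 / (4 + 4 - 1) = 1/7.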
NMS (Non-Maximum Suppression)
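The notes only name NMS, so the following is a sketch of the usual greedy variant rather than the exact procedure of any particular implementation: visit boxes in descending score order and keep a box only if its IoU with every already-kept box stays at or below a threshold. The `(x1, y1, x2, y2)` box layout, the helper `box_iou`, and the default threshold of 0.5 are assumptions made for illustration.

```python
def box_iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS; returns the indices of the kept boxes, best score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if no higher-scoring kept box overlaps it too much.
        if all(box_iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep
```

With boxes `[(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]` and scores `[0.9, 0.8, 0.7]`, the second box overlaps the first with IoU 81/119 and is suppressed, while the third is kept.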
Fast R-CNN
Selective search
Anchor, sliding window, feature extraction
RPN Loss
Cls label: binary classification (object or no object), assigned using the IoU between the ground-truth bounding box and the anchor box
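One way the IoU-based anchor labeling could look is sketched below. The thresholds 0.7 (positive) and 0.3 (negative) are assumptions borrowed from common Faster R-CNN practice and are not stated in these notes; anchors in between get the label -1 and would be ignored during training. The extra rule that the highest-IoU anchor for each ground-truth box is also positive is left out for brevity. `label_anchors` and `box_iou` are hypothetical names.

```python
def box_iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def label_anchors(anchors, gt_boxes, pos_thresh=0.7, neg_thresh=0.3):
    """Return 1 (object), 0 (background) or -1 (ignored) for each anchor."""
    labels = []
    for anchor in anchors:
        # Best overlap of this anchor with any ground-truth box.
        best = max((box_iou(anchor, gt) for gt in gt_boxes), default=0.0)
        if best >= pos_thresh:
            labels.append(1)
        elif best < neg_thresh:
            labels.append(0)
        else:
            labels.append(-1)  # assumed convention: neither positive nor negative
    return labels
```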
Loc label
$$
t_x^* = (x^*-x_a)/w_a, \quad t_y^* = (y^*-y_a)/h_a,\\
t_w^* = \log(w^*/w_a), \quad t_h^* = \log(h^*/h_a)
$$
$$
t_x = (x-x_a)/w_a, \quad t_y = (y-y_a)/h_a,\\
t_w = \log(w/w_a), \quad t_h = \log(h/h_a)
$$
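The box parameterization relative to an anchor can be sketched as a pair of encode/decode helpers. This assumes boxes and anchors are given as `(x_center, y_center, w, h)`; the names `encode` and `decode` are illustrative only.

```python
import math


def encode(box, anchor):
    """Regression targets (t_x, t_y, t_w, t_h) of a box relative to an anchor."""
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    return ((x - xa) / wa, (y - ya) / ha, math.log(w / wa), math.log(h / ha))


def decode(t, anchor):
    """Invert encode(): recover (x, y, w, h) from targets and the anchor."""
    tx, ty, tw, th = t
    xa, ya, wa, ha = anchor
    return (tx * wa + xa, ty * ha + ya, wa * math.exp(tw), ha * math.exp(th))
```

Applying `encode` to a ground-truth box gives the starred targets, and applying it to a predicted box gives the unstarred ones; `decode` turns network outputs back into a box.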
Cls loss
Cross entropy
Loc Loss
$$
z_i = 0.5(x_i-y_i)^2/\beta, \quad \text{if } |x_i-y_i|<\beta\\
z_i = |x_i-y_i|-0.5\beta, \quad \text{otherwise}
$$
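The piecewise loss above (smooth L1) can be written element by element as follows. This sketch returns the per-element values z_i and leaves the reduction (sum or mean) to the caller, since the notes do not specify one; `beta=1.0` is an assumed default.

```python
def smooth_l1(x, y, beta=1.0):
    """Per-element smooth L1 values z_i for predictions x and targets y."""
    z = []
    for xi, yi in zip(x, y):
        d = abs(xi - yi)
        if d < beta:
            z.append(0.5 * d * d / beta)  # quadratic near zero
        else:
            z.append(d - 0.5 * beta)      # linear for large errors
    return z
```

The two branches meet at |x_i - y_i| = beta, where both evaluate to 0.5 * beta, so the loss is continuous.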
RoI Head (Region of Interest)
Mask R-CNN
$$
L = L_{cls}+L_{box}+L_{mask}
$$
To this we apply a per-pixel sigmoid, and define $L_{mask}$ as the average binary cross-entropy loss. For an RoI associated with ground-truth class k, $L_{mask}$ is only defined on the k-th mask (other mask outputs do not contribute to the loss).
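A minimal sketch of the mask loss for a single RoI, assuming the mask branch outputs K logit maps stored as nested lists and the ground-truth mask is a same-sized 0/1 grid. The function name and data layout are hypothetical. The sigmoid and binary cross-entropy are fused into the numerically stable form max(z, 0) - z*t + log(1 + exp(-|z|)).

```python
import math


def mask_loss(mask_logits, gt_mask, k):
    """Average per-pixel binary cross-entropy on the k-th mask only.

    mask_logits: list of K masks, each an m x m nested list of logits.
    gt_mask:     m x m nested list of 0/1 targets.
    k:           ground-truth class index of this RoI.
    """
    total, count = 0.0, 0
    # Only the k-th output map is read; the other K-1 maps are ignored.
    for logit_row, target_row in zip(mask_logits[k], gt_mask):
        for z, t in zip(logit_row, target_row):
            # Equals -[t*log(sigmoid(z)) + (1-t)*log(1-sigmoid(z))].
            total += max(z, 0.0) - z * t + math.log1p(math.exp(-abs(z)))
            count += 1
    return total / count
```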
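RoI Align: coordinates are not rounded to the grid; floating-point values are kept, and each small region (bin) is further subdivided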
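The idea can be illustrated with a small pure-Python sketch: RoI coordinates stay as floats, each output bin is subdivided into a few sample points, and each sample is read from the feature map by bilinear interpolation. The single-channel nested-list feature map, the `(y1, x1, y2, x2)` RoI layout, and the defaults `out_size=2`, `samples=2` are all assumptions for illustration, not details given in these notes.

```python
import math


def bilinear(feat, y, x):
    """Bilinearly interpolate a 2D nested-list feature map at float (y, x)."""
    h, w = len(feat), len(feat[0])
    y = min(max(y, 0.0), h - 1)
    x = min(max(x, 0.0), w - 1)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    dy, dx = y - y0, x - x0
    return (feat[y0][x0] * (1 - dy) * (1 - dx) + feat[y0][x1] * (1 - dy) * dx
            + feat[y1][x0] * dy * (1 - dx) + feat[y1][x1] * dy * dx)


def roi_align(feat, roi, out_size=2, samples=2):
    """Average-pool an RoI given in float coordinates (y1, x1, y2, x2)."""
    y1, x1, y2, x2 = roi
    bin_h = (y2 - y1) / out_size  # bin sizes stay fractional
    bin_w = (x2 - x1) / out_size
    out = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            acc = 0.0
            # Regularly spaced sample points inside bin (i, j).
            for a in range(samples):
                for b in range(samples):
                    sy = y1 + (i + (a + 0.5) / samples) * bin_h
                    sx = x1 + (j + (b + 0.5) / samples) * bin_w
                    acc += bilinear(feat, sy, sx)
            row.append(acc / (samples * samples))
        out.append(row)
    return out
```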
CTPN: Text Detection Algorithm
Detecting Text in Natural Image with Connectionist Text Proposal Network
- Detecting text in fine-scale proposals
- Recurrent connectionist text proposals
- Side-refinement
$$
v_c = (c_y-c_y^a)/h^a\\
v_c^* = (c_y^*-c_y^a)/h^a\\
v_h = \log(h/h^a)\\
v_h^* = \log(h^*/h^a)
$$
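Because CTPN proposals have a fixed width, only the vertical center and height are regressed. The helpers below sketch that parameterization; the function names are hypothetical, with `cy_a` and `h_a` standing for the anchor's center y-coordinate and height.

```python
import math


def encode_vertical(cy, h, cy_a, h_a):
    """Return (v_c, v_h) for a box with center cy and height h."""
    return (cy - cy_a) / h_a, math.log(h / h_a)


def decode_vertical(v_c, v_h, cy_a, h_a):
    """Invert encode_vertical(): recover (cy, h) from (v_c, v_h)."""
    return v_c * h_a + cy_a, math.exp(v_h) * h_a
```

Feeding a ground-truth box to `encode_vertical` yields the starred targets; feeding a predicted box yields the unstarred values.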
Text line construction
$$
o^* = (x^*_{side} - c^a_x)/w^a
$$
Code
bounding box
CRNN: Text Recognition Algorithm
An End-to-End Trainable Neural Network for Image-based Sequence Recognition and Its Application to Scene Text Recognition
- CRNN
- Code
- CTC
- lexicon-based
- lexicon-free
feature sequence and receptive field
CRNN with CTC
$$
\pi = --hh-e-l-ll-oo--\\
B(\pi) = hello\\
p(l|y) = \sum_{\pi:B(\pi)=l} p(\pi|y), \quad p(\text{'hello'}|y) = \sum_{\pi:B(\pi)=\text{'hello'}} p(\pi|y)
$$
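The mapping B and the sum over paths can be sketched directly. `collapse` merges repeated symbols and then drops blanks; `label_prob` enumerates every path by brute force, which is only feasible for tiny inputs and is meant purely as an illustration of the definition. The blank character `"-"` and the layout `y[t][k]` (probability of `alphabet[k]` at time `t`) are assumptions.

```python
from itertools import groupby, product

BLANK = "-"


def collapse(path):
    """B(pi): merge consecutive repeats, then remove blanks."""
    return "".join(sym for sym, _ in groupby(path) if sym != BLANK)


def label_prob(y, alphabet, label):
    """p(label | y) as the sum of p(pi | y) over all paths with B(pi) == label."""
    total = 0.0
    for idx in product(range(len(alphabet)), repeat=len(y)):
        path = "".join(alphabet[k] for k in idx)
        if collapse(path) == label:
            p = 1.0
            for t, k in enumerate(idx):
                p *= y[t][k]  # path probability is the product over time steps
            total += p
    return total
```

For instance, `collapse("--hh-e-l-ll-oo--")` gives `"hello"`: the blank between the two `l` runs is what keeps the double letter.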
CTC Theory
$$
p(l|x) = \sum_{\pi \in B^{-1}(l)} p(\pi|x).\\
h(x) = \arg\max_{l \in L^{\leq T}} p(l|x).\\
O^{ML}(S,N_w) = -\sum_{(x,z)\in S} \ln(p(z|x)) = -\sum_{(x,z) \in S} \ln\Big(\sum_{\pi \in B^{-1}(z)} p(\pi|x)\Big)
$$
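Computing h(x) exactly requires searching over all labelings, so a common approximation is best-path (greedy) decoding: take the most probable symbol at each time step and apply B to the result. Note that this is an approximation and may differ from the true argmax. The sketch below assumes `y[t][k]` is the probability of `alphabet[k]` at time `t` and that `"-"` is the blank; both are illustrative choices.

```python
from itertools import groupby


def best_path_decode(y, alphabet, blank="-"):
    """Greedy CTC decoding: per-step argmax, merge repeats, drop blanks."""
    path = [alphabet[max(range(len(row)), key=row.__getitem__)] for row in y]
    return "".join(sym for sym, _ in groupby(path) if sym != blank)
```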
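So that every path has a unique, valid representation in the graph, node transitions obey the following constraints:
- Transitions may only go toward the lower right; no other direction is allowed
- There must be at least one blank between identical characters
- Non-blank characters cannot be skipped
- A path must start at one of the first two symbols
- A path must end at one of the last two symbols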
forward-backward
The forward probability $\alpha_t(s)$ is defined as the total probability of all prefix sub-paths that pass through node s at time t.
$$
\alpha_3(4) = p(\_ap)+p(aap)+p(a\_p)+p(app)
$$
- Case 1: the s-th symbol is the blank symbol
$$
\alpha_t(s) = (\alpha_{t-1}(s)+\alpha_{t-1}(s-1)) \cdot y^t_{seq(s)}
$$
- Case 2: the s-th symbol equals the (s-2)-th symbol
$$
\alpha_t(s) = (\alpha_{t-1}(s)+\alpha_{t-1}(s-1)) \cdot y^t_{seq(s)}
$$
- Case 3: neither case 1 nor case 2 applies
$$
\alpha_t(s) = (\alpha_{t-1}(s)+\alpha_{t-1}(s-1)+\alpha_{t-1}(s-2)) \cdot y^t_{seq(s)}
$$
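The three cases translate into a short dynamic program. The sketch below uses 0-based indices (so the notes' $\alpha_3(4)$ is `alpha[2][3]`), assumes `y[t][k]` is the probability of symbol index `k` at time `t`, and takes the label as a list of non-blank symbol indices; these conventions and the function name are illustrative. It works in plain probabilities, whereas a practical implementation would typically use log-space or rescaling to avoid underflow.

```python
def ctc_forward(y, label, blank=0):
    """Return (alpha, p(label | y)) using the CTC forward recursion."""
    # Extended sequence: a blank before, between and after the label symbols.
    seq = [blank]
    for c in label:
        seq += [c, blank]
    T, S = len(y), len(seq)
    alpha = [[0.0] * S for _ in range(T)]
    # A path must start at one of the first two symbols.
    alpha[0][0] = y[0][seq[0]]
    if S > 1:
        alpha[0][1] = y[0][seq[1]]
    for t in range(1, T):
        for s in range(S):
            total = alpha[t - 1][s]
            if s >= 1:
                total += alpha[t - 1][s - 1]
            # Case 3 only: not blank and different from the symbol two back.
            if s >= 2 and seq[s] != blank and seq[s] != seq[s - 2]:
                total += alpha[t - 1][s - 2]
            alpha[t][s] = total * y[t][seq[s]]
    # A path must end at one of the last two symbols.
    p = alpha[T - 1][S - 1] + (alpha[T - 1][S - 2] if S > 1 else 0.0)
    return alpha, p
```

With three symbols (blank, a, p), uniform probabilities of 1/3 at each of three time steps, and a label starting with "ap", `alpha[2][3]` comes out as 4/27, one term of 1/27 for each of the four prefix paths _ap, aap, a_p and app.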
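Copyright notice: This is an original article by CSDN blogger "Cachel wood", licensed under CC 4.0 BY-SA. When reposting, please include the original source link and this notice.
Original link: https://blog.csdn.net/weixin_46530492/article/details/121885721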