DeepBlue Technology | Twenty Years of Object Detection

(3) Fast RCNN

In 2015, R. Girshick proposed the Fast RCNN detector [19], a further improvement on R-CNN and SPPNet. Fast RCNN makes it possible to train the detector and the bounding-box regressor simultaneously under the same network configuration. On the VOC07 dataset, Fast RCNN raised mAP from 58.5% (R-CNN) to 70.0%, with a detection speed more than 200 times that of R-CNN.
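
To make the joint-training point concrete, below is a minimal sketch of a Fast RCNN-style multi-task loss in PyTorch. It is not the paper's code: the tensor shapes, the function name, and the weighting factor lam are illustrative assumptions. The idea is that one softmax classification term and one smooth-L1 box-regression term are computed from the same pooled RoI features and optimized together:

```python
import torch
import torch.nn.functional as F

def fast_rcnn_multitask_loss(cls_scores, bbox_deltas, labels, bbox_targets, lam=1.0):
    """Hypothetical Fast RCNN-style multi-task loss over a batch of RoIs."""
    # Classification branch: softmax cross-entropy over all RoIs.
    loss_cls = F.cross_entropy(cls_scores, labels)
    # Localization branch: only foreground RoIs (label > 0) contribute, and
    # only the 4 offsets predicted for the ground-truth class are penalized.
    fg = labels > 0
    loss_loc = cls_scores.new_zeros(())
    if fg.any():
        deltas = bbox_deltas.view(bbox_deltas.shape[0], -1, 4)  # (N, K, 4)
        picked = deltas[fg, labels[fg] - 1]                     # (N_fg, 4)
        loss_loc = F.smooth_l1_loss(picked, bbox_targets[fg])
    return loss_cls + lam * loss_loc

N, K = 8, 20  # 8 RoIs, 20 object classes (label 0 is background)
loss = fast_rcnn_multitask_loss(
    torch.randn(N, K + 1),          # class logits (K classes + background)
    torch.randn(N, K * 4),          # per-class box offsets
    torch.randint(0, K + 1, (N,)),  # ground-truth label per RoI
    torch.randn(N, 4))              # box-regression targets
print(loss.item())
```

Because both terms share one backbone and are minimized jointly, there is no separate SVM or post-hoc regressor stage as in R-CNN.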

Although Fast RCNN successfully combines the advantages of R-CNN and SPPNet, its detection speed is still limited by proposal detection. A question then naturally arises: "Can we generate object proposals with a CNN model?" Faster R-CNN later answered this question.

(4) Faster RCNN

In 2015, shortly after Fast RCNN, S. Ren et al. proposed the Faster RCNN detector [20]. Faster RCNN was the first end-to-end and the first near-real-time deep learning detector (COCO mAP@.5=42.7%, COCO mAP@[.5,.95]=21.9%, VOC07 mAP=73.2%, VOC12 mAP=70.4%). Its main contribution is the introduction of the Region Proposal Network (RPN), which makes region proposals nearly cost-free. From RCNN to Faster RCNN, most of the individual blocks of an object detection system, such as proposal detection, feature extraction, and bounding-box regression, were gradually integrated into a unified, end-to-end learning framework.
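
The RPN is easiest to grasp as a tiny convolutional head. The sketch below (PyTorch; the channel and anchor counts are illustrative assumptions, not the released implementation) slides a 3x3 convolution over the shared feature map, and two sibling 1x1 convolutions then predict an objectness score and four box offsets for each of the reference boxes ("anchors") at every location:

```python
import torch
import torch.nn as nn

class RPNHead(nn.Module):
    """Sketch of a Region Proposal Network head."""
    def __init__(self, in_channels=512, num_anchors=9):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, in_channels, 3, padding=1)
        self.cls_logits = nn.Conv2d(in_channels, num_anchors, 1)       # object vs. background
        self.bbox_deltas = nn.Conv2d(in_channels, num_anchors * 4, 1)  # (dx, dy, dw, dh)

    def forward(self, feat):
        t = torch.relu(self.conv(feat))
        return self.cls_logits(t), self.bbox_deltas(t)

# A 512-channel backbone feature map yields, at every spatial position,
# objectness and offsets for all 9 anchors in a single forward pass.
obj, deltas = RPNHead()(torch.randn(1, 512, 38, 50))
print(obj.shape, deltas.shape)  # (1, 9, 38, 50) (1, 36, 38, 50)
```

Since the head reuses the detector's own convolutional features, proposals come almost for free compared with an external algorithm such as selective search.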

Although Faster RCNN broke through the speed bottleneck of Fast RCNN, computational redundancy remained in its subsequent detection stage. A variety of improvements were later proposed, including R-FCN and Light-Head RCNN.

(5) Feature Pyramid Networks (FPN)

In 2017, T.-Y. Lin et al. proposed Feature Pyramid Networks (FPN) [21] on the basis of Faster RCNN. Before FPN, most deep-learning-based detectors ran detection only on the top layer of the network. Although the features in a CNN's deeper layers are beneficial for category recognition, they are not conducive to localizing objects. To this end, FPN develops a top-down architecture with lateral connections to build high-level semantics at all scales. Since a CNN naturally forms a feature pyramid through its forward propagation, FPN shows great progress in detecting objects of various scales. Using an FPN backbone in a basic Faster RCNN system achieved state-of-the-art single-model results on the MS-COCO dataset without bells and whistles (COCO mAP@.5=59.1%, COCO mAP@[.5,.95]=36.2%). FPN has since become a basic building block of many of the latest detectors.
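
The top-down pathway with lateral connections fits in a few lines. The sketch below (PyTorch; the stage channel counts and spatial sizes are assumptions chosen to mimic a ResNet-style backbone) projects each backbone stage to a common width with a 1x1 lateral convolution, upsamples the coarser map and adds it in, then smooths each sum with a 3x3 convolution:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleFPN(nn.Module):
    """Sketch of the FPN top-down pathway with lateral connections."""
    def __init__(self, in_channels=(256, 512, 1024, 2048), out_channels=256):
        super().__init__()
        self.lateral = nn.ModuleList(nn.Conv2d(c, out_channels, 1) for c in in_channels)
        self.smooth = nn.ModuleList(nn.Conv2d(out_channels, out_channels, 3, padding=1)
                                    for _ in in_channels)

    def forward(self, feats):  # feats: [C2, C3, C4, C5], finest map first
        laterals = [l(f) for l, f in zip(self.lateral, feats)]
        for i in range(len(laterals) - 2, -1, -1):  # top-down: merge coarse into fine
            laterals[i] = laterals[i] + F.interpolate(
                laterals[i + 1], size=laterals[i].shape[-2:], mode="nearest")
        return [s(l) for s, l in zip(self.smooth, laterals)]  # [P2, P3, P4, P5]

feats = [torch.randn(1, c, s, s) for c, s in zip((256, 512, 1024, 2048), (64, 32, 16, 8))]
for p in SimpleFPN()(feats):
    print(p.shape)  # four 256-channel maps, one per scale, all carrying strong semantics
```

Detection heads can then run on every output level, so each object is matched to the scale that represents it best.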

CNN-Based One-Stage Detectors

Figure: the development of one-stage detection and the structures of the various detectors [2]

(1) You Only Look Once (YOLO)

YOLO was proposed by J. Redmon et al. in 2015 [22]. It was the first one-stage detector of the deep learning era. YOLO is extremely fast: a fast version of YOLO runs at 155 fps with VOC07 mAP=52.7%, while its enhanced version runs at 45 fps with VOC07 mAP=63.4% and VOC12 mAP=57.9%. YOLO stands for "You Only Look Once". As the name suggests, the authors completely abandoned the earlier "proposal detection + verification" paradigm. Instead, YOLO follows an entirely different design philosophy: apply a single neural network to the full image. The network divides the image into regions and simultaneously predicts bounding boxes and probabilities for each region. A series of improvements were later built on YOLO, including replacing FPN with a Path Aggregation Network (PAN) and defining new loss functions, yielding the v2 and v3 versions from Redmon and a subsequent v4 version from other researchers (as of July 2020, when this article was written, Ultralytics had also released a "YOLO v5", which has not been officially recognized); these further improved detection accuracy while maintaining a high detection speed.
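
The grid-based design is easiest to see in the shape of the output tensor. The sketch below (PyTorch; S=7, B=2, C=20 match the configuration described for YOLO v1, while the decoding itself is simplified for illustration) interprets each grid cell's predictions as B candidate boxes plus C conditional class probabilities:

```python
import torch

def decode_yolo_output(pred, S=7, B=2, C=20):
    """Simplified decoding of a YOLO-v1-style output tensor of shape (S, S, B*5 + C)."""
    boxes = []
    for i in range(S):          # grid row
        for j in range(S):      # grid column
            cell = pred[i, j]
            class_probs = cell[B * 5:]                    # shared by the cell's boxes
            for b in range(B):
                x, y, w, h, conf = cell[b * 5: b * 5 + 5]
                # (x, y) are offsets inside the cell; map them to image coordinates.
                cx, cy = (j + x.item()) / S, (i + y.item()) / S
                scores = conf * class_probs               # class-specific confidence
                boxes.append((cx, cy, w.item(), h.item(), scores))
    return boxes

boxes = decode_yolo_output(torch.rand(7, 7, 2 * 5 + 20))
print(len(boxes))  # 7 * 7 * 2 = 98 candidate boxes from one forward pass
```

One forward pass produces every candidate box at once, which is exactly why YOLO is so much faster than the proposal-then-verify pipelines.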

It must be pointed out that, although YOLO greatly improves detection speed compared with two-stage detectors, its localization accuracy drops, especially for small objects. YOLO's subsequent versions, and the SSD proposed after it, pay more attention to this problem.

(2) Single Shot MultiBox Detector (SSD)

SSD was proposed by W. Liu et al. in 2015 [23]. It was the second one-stage detector of the deep learning era. The main contribution of SSD is the introduction of multi-reference and multi-resolution detection techniques, which significantly improved the detection accuracy of one-stage detectors, especially for small objects. SSD has advantages in both detection speed and accuracy (VOC07 mAP=76.8%, VOC12 mAP=74.9%, COCO mAP@.5=46.5%, mAP@[.5,.95]=26.8%; a fast version runs at 59 fps). The main difference between SSD and earlier detectors is that SSD detects objects of different scales on different layers of the network, whereas the earlier detectors ran detection only on their top layer.
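
A minimal sketch of the multi-resolution idea follows (PyTorch; the channel counts, map sizes, anchor counts, and class count are illustrative assumptions, not the exact SSD300 configuration). A small convolutional head is attached to several feature maps of decreasing resolution; each location predicts class scores and box offsets for a few default "reference" boxes, so fine maps handle small objects and coarse maps handle large ones:

```python
import torch
import torch.nn as nn

class MultiScaleHeads(nn.Module):
    """Sketch of SSD-style detection heads over multiple feature maps."""
    def __init__(self, channels=(512, 1024, 512, 256), anchors=(4, 6, 6, 4), num_classes=21):
        super().__init__()
        self.cls = nn.ModuleList(nn.Conv2d(c, a * num_classes, 3, padding=1)
                                 for c, a in zip(channels, anchors))
        self.loc = nn.ModuleList(nn.Conv2d(c, a * 4, 3, padding=1)
                                 for c, a in zip(channels, anchors))

    def forward(self, feats):
        # One (scores, offsets) pair per feature map, i.e. per detection scale.
        return [(c(f), l(f)) for c, l, f in zip(self.cls, self.loc, feats)]

feats = [torch.randn(1, c, s, s) for c, s in zip((512, 1024, 512, 256), (38, 19, 10, 5))]
for scores, offsets in MultiScaleHeads()(feats):
    print(scores.shape, offsets.shape)
```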

(3) RetinaNet

One-stage detectors are fast and structurally simple, but for years their accuracy lagged behind that of two-stage detectors. T.-Y. Lin et al. identified the underlying reason and proposed RetinaNet in 2017 [24]. In their view, the accuracy gap is caused by the extreme foreground-background class imbalance encountered while training dense detectors. To address it, they introduced a new loss function in RetinaNet, the "focal loss", which reshapes the standard cross-entropy loss so that the detector puts more focus on hard, misclassified examples during training. The focal loss enables one-stage detectors to reach accuracy comparable to two-stage detectors while maintaining a very high detection speed (COCO mAP@.5=59.1%, mAP@[.5,.95]=39.1%).
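
The focal loss itself is compact: FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), where p_t is the predicted probability of the true class. Below is a minimal PyTorch sketch of its binary form (alpha=0.25 and gamma=2 are the defaults reported in the paper; the function name and shapes are illustrative). The (1 - p_t)^gamma factor shrinks the loss of easy, well-classified examples so that the abundant background anchors no longer dominate training:

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)**gamma * log(p_t)."""
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = p * targets + (1 - p) * (1 - targets)              # prob. of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)  # class-balancing weight
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()

# An easy, confidently-correct example contributes far less than a hard one:
easy = focal_loss(torch.tensor([4.0]), torch.tensor([1.0]))
hard = focal_loss(torch.tensor([-2.0]), torch.tensor([1.0]))
print(easy.item() < hard.item())  # True
```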

References:

[1]Zhengxia Zou, Zhenwei Shi, Yuhong Guo, and Jieping Ye, "Object detection in 20 years: A survey," arXiv preprint arXiv:1905.05055, 2019.

[2]Xiongwei Wu, Doyen Sahoo, and Steven C.H. Hoi, "Recent advances in deep learning for object detection," arXiv preprint arXiv:1908.03673v1, 2019.

[3]K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in CVPR, 2016.

[4]R. Girshick, J. Donahue, T. Darrell, and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in CVPR, 2014.

[5]K. He, G. Gkioxari, P. Dollar, and R. Girshick, "Mask R-CNN," in ICCV, 2017.

[6]L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, and A. L. Yuille, "Semantic image segmentation with deep convolutional nets and fully connected CRFs," arXiv preprint arXiv:1412.7062, 2014.

[7]Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” nature, vol. 521, no. 7553, p. 436, 2015.

[8]P. Viola and M. Jones, “Rapid object detection using a boosted cascade of simple features,” in Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on, vol. 1. IEEE, 2001, pp. I–I.

[9]P. Viola and M. J. Jones, “Robust real-time face detection,” International journal of computer vision, vol. 57, no. 2, pp. 137–154, 2004.

[10]C. Papageorgiou and T. Poggio, “A trainable system for object detection,” International journal of computer vision, vol. 38, no. 1, pp. 15–33, 2000.

[11]N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in Computer Vision and Pattern Recognition, 2005. CVPR 2005. IEEE Computer Society Conference on, vol. 1. IEEE, 2005, pp. 886–893.

[12]P. Felzenszwalb, D. McAllester, and D. Ramanan, “A discriminatively trained, multiscale, deformable part model,” in Computer Vision and Pattern Recognition, 2008. CVPR 2008. IEEE Conference on. IEEE, 2008, pp. 1–8.

[13]P. F. Felzenszwalb, R. B. Girshick, and D. McAllester, “Cascade object detection with deformable part models,” in Computer vision and pattern recognition (CVPR), 2010 IEEE conference on. IEEE, 2010, pp. 2241–2248.

[14]P. F. Felzenszwalb, R. B. Girshick, D. McAllester, and D. Ramanan, “Object detection with discriminatively trained part-based models,” IEEE transactions on pattern analysis and machine intelligence, vol. 32, no. 9, pp. 1627– 1645, 2010.

[15]A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in neural information processing systems, 2012, pp. 1097–1105.

[16]R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Regionbased convolutional networks for accurate object detection and segmentation,” IEEE transactions on pattern analysis and machine intelligence, vol. 38, no. 1, pp. 142– 158, 2016.

[17]K. E. Van de Sande, J. R. Uijlings, T. Gevers, and A. W. Smeulders, “Segmentation as selective search for object recognition,” in Computer Vision (ICCV), 2011 IEEE International Conference on. IEEE, 2011, pp. 1879–1886.

[18]K. He, X. Zhang, S. Ren, and J. Sun, “Spatial pyramid pooling in deep convolutional networks for visual recognition,” in European conference on computer vision. Springer, 2014, pp. 346–361.

[19]R. Girshick, “Fast r-cnn,” in Proceedings of the IEEE international conference on computer vision, 2015, pp. 1440–1448.

[20]S. Ren, K. He, R. Girshick, and J. Sun, “Faster r-cnn: Towards real-time object detection with region proposal networks,” in Advances in neural information processing systems, 2015, pp. 91–99.

[21]T.-Y. Lin, P. Dollar, R. B. Girshick, K. He, B. Hariharan, and S. J. Belongie, “Feature pyramid networks for object detection.” in CVPR, vol. 1, no. 2, 2017, p. 4.

[22]J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You only look once: Unified, real-time object detection,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2016, pp. 779–788.

[23]W. Liu, D. Anguelov, D. Erhan, C. Szegedy, S. Reed, C.-Y. Fu, and A. C. Berg, “Ssd: Single shot multibox detector,” in European conference on computer vision. Springer, 2016, pp. 21–37.

[24]T.-Y. Lin, P. Goyal, R. Girshick, K. He, and P. Dollar, “Focal loss for dense object detection,” IEEE transactions on pattern analysis and machine intelligence, 2018.
