AlphaTree: Object Detection


Object Detection

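R-CNN, Fast R-CNN, and Faster R-CNN form a single line of development. The other two directions are YOLO and SSD. YOLO has iterated up to YOLOv3, and SSD's design has since been applied in many other directions.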

Christian Szegedy at Google also made an attempt at object detection using AlexNet.

[1] Szegedy, Christian, Alexander Toshev, and Dumitru Erhan. "Deep neural networks for object detection." Advances in Neural Information Processing Systems. 2013.

However, the work that achieved the real breakthrough and set off the wave of deep-learning-based object detection was R-CNN.

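If region localization is treated as a regression problem, the model predicts the four parameters (x, y, w, h) that give the position of the box. Training a regression problem, however, takes much longer to converge, so the regression problem is converted into a classification problem, solved in two steps:

Step 1: Generate boxes of different sizes over the image.
Step 2: Extract features from the data inside each box, score them with a classifier, and select the highest-scoring box as the object's localization box.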
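The two steps above can be sketched roughly as follows. This is a minimal illustration, not any particular detector: the window sizes, the stride, and the `score_fn` classifier are hypothetical placeholders standing in for hand-crafted features plus a trained classifier.

```python
import numpy as np

def sliding_windows(img_h, img_w, sizes=((32, 32), (64, 64)), stride=16):
    """Step 1: yield candidate boxes (x1, y1, x2, y2) of several sizes."""
    for win_h, win_w in sizes:
        for y in range(0, img_h - win_h + 1, stride):
            for x in range(0, img_w - win_w + 1, stride):
                yield (x, y, x + win_w, y + win_h)

def best_window(image, score_fn, **window_kwargs):
    """Step 2: score the crop inside every box and keep the best one.

    score_fn is a hypothetical stand-in for "feature extraction + classifier";
    it receives the cropped pixels and returns a single score.
    """
    h, w = image.shape[:2]
    return max(
        sliding_windows(h, w, **window_kwargs),
        key=lambda box: score_fn(image[box[1]:box[3], box[0]:box[2]]),
    )

# Toy usage: a bright square on a dark image, scored by mean brightness.
image = np.zeros((64, 64))
image[16:48, 32:64] = 1.0
print(best_window(image, score_fn=np.mean))
```

Even in this toy form the cost is visible: every box is cropped and scored independently, which is what the later methods try to avoid.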


Evaluation metrics: IoU (Intersection over Union) and mAP (mean Average Precision). Speed: frame rate (FPS).
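As a small illustration of the IoU metric, here is a sketch that assumes boxes are given as corner coordinates `(x1, y1, x2, y2)`; the function name and the box format are choices made for this example only.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes in (x1, y1, x2, y2) format."""
    # Corners of the overlapping rectangle (may be empty).
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap 1, union 7
```

A predicted box is usually counted as correct when its IoU with a ground-truth box exceeds a chosen threshold; that threshold is a benchmark setting, not something fixed by the formula.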


  • Method
    • SPPNet
  • Two-Stage Object Detection
    • R-CNN
    • Fast R-CNN
    • Faster R-CNN
  • Single-Shot Object Detection
    • YOLO
    • YOLOv2
    • YOLOv3
    • SSD
    • RetinaNet
  • Great improvement
    • R-FCN
    • Feature Pyramid Network (FPN)

Method

SPPNet (Kaiming He / MSRA)

  • SPPNet: Spatial Pyramid Pooling
    [3] He, Kaiming, et al. "Spatial pyramid pooling in deep convolutional networks for visual recognition." European Conference on Computer Vision. Springer International Publishing, 2014.

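A CNN is usually followed by fully connected layers or a classifier, both of which require a fixed input size, so the input has to be cropped or warped. This preprocessing loses data or distorts geometry. SPPNet brings the pyramid idea into the CNN and makes multi-scale input possible. The network input can then be of any size: in the SPP layer, each pooling filter adjusts its size to the input, while the size of the SPP output stays fixed.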
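The fixed-size output can be illustrated with a short NumPy sketch. The pyramid levels `(1, 2, 4)` and the function name are assumptions for this example; the idea is only that bin sizes follow the input while the number of bins does not.

```python
import numpy as np

def spatial_pyramid_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool a (C, H, W) feature map into a fixed-length vector.

    At pyramid level n the map is split into an n x n grid whose bin sizes
    depend on H and W, so the output length is always C * sum(n * n).
    """
    c, h, w = feature_map.shape
    pooled = []
    for n in levels:
        for i in range(n):
            # Bin start is floor(i*h/n), bin end is ceil((i+1)*h/n).
            y0, y1 = (i * h) // n, ((i + 1) * h + n - 1) // n
            for j in range(n):
                x0, x1 = (j * w) // n, ((j + 1) * w + n - 1) // n
                pooled.append(feature_map[:, y0:y1, x0:x1].max(axis=(1, 2)))
    return np.concatenate(pooled)

# Two inputs of different spatial size give vectors of the same length.
a = spatial_pyramid_pool(np.random.rand(8, 13, 17))
b = spatial_pyramid_pool(np.random.rand(8, 30, 9))
print(a.shape, b.shape)
```

With 8 channels and levels 1, 2, and 4 the output has 8 * (1 + 4 + 16) = 168 values for any input height and width.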


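This broke with the earlier pattern, in which detection boxes were proposed first, resized to a fixed size, and then passed through the CNN. Instead, the image passes through the CNN once to obtain features, and features for the different detection boxes are then extracted on the feature map and pooled to the same size for the subsequent steps. This speeds up object detection.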

Two-Stage Object Detection

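  • R-CNN: The R-CNN framework replaces the sliding window plus hand-crafted features used in traditional object detection with CNN-based feature extraction. It is an application of deep neural networks to detection.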

Traditional region proposal methods + CNN classifier

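In other words, the second step is replaced by feature extraction with a deep neural network.
A linear SVM classifier then identifies the object class, and a regression model tightens the bounding box.
Contribution: applying a CNN to object detection, which improved detection accuracy.
Drawback: selective search extracts about 2,000 candidate regions per image, and the CNN extracts features for each region separately, so computation is repeated and detection is slow, taking 40-50 seconds per image.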
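The box-tightening regression step can be sketched as below, using the offset parameterization described in the R-CNN paper: centre shifts scaled by the proposal size, and log-scale changes in width and height. The function names and the `(cx, cy, w, h)` box format are assumptions for this example; the regressor that predicts the offsets is not shown.

```python
import math

def encode_box(proposal, target):
    """Regression targets that map a proposal onto a ground-truth box.

    Both boxes are (cx, cy, w, h): centre coordinates, width, height.
    """
    px, py, pw, ph = proposal
    gx, gy, gw, gh = target
    return ((gx - px) / pw, (gy - py) / ph, math.log(gw / pw), math.log(gh / ph))

def decode_box(proposal, deltas):
    """Apply predicted offsets (tx, ty, tw, th) to a proposal."""
    px, py, pw, ph = proposal
    tx, ty, tw, th = deltas
    return (px + tx * pw, py + ty * ph, pw * math.exp(tw), ph * math.exp(th))

proposal = (50.0, 50.0, 20.0, 40.0)
ground_truth = (55.0, 46.0, 30.0, 30.0)
deltas = encode_box(proposal, ground_truth)
print(deltas)
print(decode_box(proposal, deltas))  # recovers the ground-truth box
```

Normalizing by the proposal size keeps the targets comparable across small and large boxes, which is one reason this form is commonly used.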

R-CNN raised the detection result on PASCAL VOC 2007 to 66% mAP.


[2] Girshick, Ross, et al. "Rich feature hierarchies for accurate object detection and semantic segmentation." Proceedings of the IEEE conference on computer vision and pattern recognition. 2014.

github: https://github.com/rbgirshick/rcnn

intro: R-CNN
arxiv: http://arxiv.org/abs/1311.2524
supp: http://people.eecs.berkeley.edu/~rbg/papers/r-cnn-cvpr-supp.pdf
slides: http://www.image-net.org/challenges/LSVRC/2013/slides/r-cnn-ilsvrc2013-workshop.pdf
slides: http://www.cs.berkeley.edu/~rbg/slides/rcnn-cvpr14-slides.pdf
notes: http://zhangliliang.com/2014/07/23/paper-note-rcnn/
caffe-pr(“Make R-CNN the Caffe detection example”): https://github.com/BVLC/caffe/pull/482

Fast RCNN Ross B. Girshick

  • Fast RCNN
    [4] Girshick, Ross. "Fast r-cnn." Proceedings of the IEEE International Conference on Computer Vision. 2015.

If R-CNN's convolutional computation only had to be performed once per image, the running time would drop sharply.

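Ross Girshick applied the SPPNet approach to R-CNN and proposed a network layer, called RoI pooling, that can be seen as a single-level SPPNet. This layer maps inputs of different sizes to a fixed-size feature vector. The image is fed into the CNN to generate a convolutional feature map. Candidate regions are extracted by combining these feature maps with a region proposal algorithm. The RoI pooling layer then reshapes every candidate region to a fixed size so that it can be fed into a fully connected network.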

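1. Take an image as input.
2. Pass the image through a convolutional neural network to compute the convolutional features.
3. Extract RoIs with the same proposal method as before, apply the RoI pooling layer to every region of interest to bring the regions to a common size, and pass each region to the fully connected layers.
4. A softmax layer on top of the fully connected network outputs the class. In parallel with the softmax layer, a linear regression layer outputs the bounding-box coordinates for the predicted class.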
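The RoI pooling in step 3 can be sketched in NumPy as follows. This is a simplified illustration: it assumes the RoI is already expressed in integer feature-map cells with exclusive end coordinates, and the function name is made up for the example. The 7 x 7 default follows the setting reported for VGG16 in the Fast R-CNN paper.

```python
import numpy as np

def roi_max_pool(feature_map, roi, out_size=(7, 7)):
    """Max-pool one region of a (C, H, W) feature map to a fixed size.

    roi is (x1, y1, x2, y2) in feature-map cells, end-exclusive.
    """
    x1, y1, x2, y2 = roi
    region = feature_map[:, y1:y2, x1:x2]
    c, h, w = region.shape
    out_h, out_w = out_size
    out = np.empty((c, out_h, out_w), dtype=feature_map.dtype)
    for i in range(out_h):
        # Bin edges scale with the region size, like a one-level SPP layer.
        ys, ye = (i * h) // out_h, ((i + 1) * h + out_h - 1) // out_h
        for j in range(out_w):
            xs, xe = (j * w) // out_w, ((j + 1) * w + out_w - 1) // out_w
            out[:, i, j] = region[:, ys:ye, xs:xe].max(axis=(1, 2))
    return out

fmap = np.arange(36).reshape(1, 6, 6)
print(roi_max_pool(fmap, (0, 0, 4, 4), out_size=(2, 2))[0])
print(roi_max_pool(fmap, (1, 2, 6, 5), out_size=(2, 2)).shape)
```

Because the feature map is computed once and only this cheap pooling runs per region, the per-region CNN pass of R-CNN is no longer needed.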


Fast R-CNN

arxiv: http://arxiv.org/abs/1504.08083
slides: http://tutorial.caffe.berkeleyvision.org/caffe-cvpr15-detection.pdf
github: https://github.com/rbgirshick/fast-rcnn
github(COCO-branch): https://github.com/rbgirshick/fast-rcnn/tree/coco
webcam demo: https://github.com/rbgirshick/fast-rcnn/pull/29
notes: http://zhangliliang.com/2015/05/17/paper-note-fast-rcnn/
notes: http://blog.csdn.net/linj_m/article/details/48930179
github(“Fast R-CNN in MXNet”): https://github.com/precedenceguo/mx-rcnn
github: https://github.com/mahyarnajibi/fast-rcnn-torch
github: https://github.com/apple2373/chainer-simple-fast-rnn
github: https://github.com/zplizzi/tensorflow-fast-rcnn

A-Fast-RCNN: Hard Positive Generation via Adversary for Object Detection

intro: CVPR 2017
arxiv: https://arxiv.org/abs/1704.03414
paper: http://abhinavsh.info/papers/pdfs/adversarial_object_detection.pdf
github(Caffe): https://github.com/xiaolonw/adversarial-frcnn

Faster RCNN (Kaiming He)

  • Faster RCNN
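    Fast R-CNN still relies on a traditional method for region extraction, whereas Faster R-CNN unifies the Region Proposal Network, feature extraction, object classification, and bounding-box regression in a single framework.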

Faster R-CNN = Region Proposal Network +Fast R-CNN


Region extraction is performed by a CNN called the Region Proposal Network (RPN). With the RPN, the extra cost of region proposal is only a two-layer network.
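The text above does not go into how the RPN expresses its proposals. In the Faster R-CNN paper, the RPN scores and refines a fixed set of reference boxes ("anchors") at every feature-map position, using 3 scales and 3 aspect ratios by default. The sketch below only generates such anchors; the function name, the stride of 16, and the half-cell centre offset are assumptions for illustration, and the scoring network itself is not shown.

```python
import numpy as np

def make_anchors(feat_h, feat_w, stride=16,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return anchors as an (N, 4) array of (x1, y1, x2, y2) in image pixels.

    N = feat_h * feat_w * len(scales) * len(ratios).
    """
    shapes = []
    for s in scales:
        for r in ratios:
            # Keep the area at s * s while setting height / width = r.
            shapes.append((s / np.sqrt(r), s * np.sqrt(r)))
    half = np.array(shapes) / 2.0                       # (A, 2) half sizes

    # Anchor centres: one per feature-map cell, mapped back to image pixels.
    cx = (np.arange(feat_w) + 0.5) * stride
    cy = (np.arange(feat_h) + 0.5) * stride
    cx, cy = np.meshgrid(cx, cy)
    centers = np.stack([cx.ravel(), cy.ravel()], axis=1)  # (K, 2)

    top_left = centers[:, None, :] - half[None, :, :]
    bottom_right = centers[:, None, :] + half[None, :, :]
    return np.concatenate([top_left, bottom_right], axis=2).reshape(-1, 4)

anchors = make_anchors(2, 3)
print(anchors.shape)  # 2 * 3 positions, 9 anchors each
```

The RPN then predicts, for each anchor, an objectness score and box offsets, so the proposals come from the same convolutional features used for detection.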


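Faster R-CNN designs the RPN, a network for extracting candidate regions, to replace the time-consuming Selective Search, which greatly increases detection speed. The table below compares the detection speed of R-CNN, Fast R-CNN, and Faster R-CNN:

[Table: speed comparison of R-CNN, Fast R-CNN, and Faster R-CNN]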

[5] Ren, Shaoqing, et al. "Faster R-CNN: Towards real-time object detection with region proposal networks." Advances in neural information processing systems. 2015.

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

R-CNN minus R

Faster R-CNN in MXNet with distributed implementation and data parallelization

Contextual Priming and Feedback for Faster R-CNN

An Implementation of Faster RCNN with Study for Region Sampling

Interpretable R-CNN

Light-Head R-CNN: In Defense of Two-Stage Object Detector

Cascade R-CNN: Delving into High Quality Object Detection

Cascade R-CNN: High Quality Object Detection and Instance Segmentation

SMC Faster R-CNN: Toward a scene-specialized multi-object detector

Domain Adaptive Faster R-CNN for Object Detection in the Wild

Robust Physical Adversarial Attack on Faster R-CNN Object Detector

Auto-Context R-CNN

Grid R-CNN

Grid R-CNN Plus: Faster and Better

Few-shot Adaptive Faster R-CNN

Libra R-CNN: Towards Balanced Learning for Object Detection

Rethinking Classification and Localization in R-CNN

Reprojection R-CNN: A Fast and Accurate Object Detector for 360° Images

Single-Shot Object Detection

Yolo

  • YOLO (You Only Look Once)


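    YOLO's approach to detection differs from that of the R-CNN family: it treats object detection as a regression task. The core idea of YOLO is to take the whole image as the network input and regress the bounding-box positions and their classes directly at the output layer.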


    [6] Redmon, Joseph, et al. "You only look once: Unified, real-time object detection." arXiv preprint arXiv:1506.02640 (2015). YOLO: outstanding work, really practical.
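To make "regressing boxes and classes directly at the output layer" more concrete, here is a sketch of decoding a YOLO-style output grid. The YOLO paper encodes predictions as an S x S x (B * 5 + C) tensor; the ordering inside each cell used here (B boxes as `x, y, w, h, confidence`, followed by C class probabilities), the function name, and the threshold are assumptions for this example and may differ from the official implementation.

```python
import numpy as np

def decode_grid(pred, num_boxes=2, conf_thresh=0.5):
    """Turn an (S, S, num_boxes * 5 + C) prediction into a list of detections.

    Assumed layout per cell: x, y are offsets inside the cell in [0, 1];
    w, h are fractions of the whole image. Returns tuples
    (x1, y1, x2, y2, score, class_id) in image-relative coordinates.
    """
    s = pred.shape[0]
    class_probs = pred[..., num_boxes * 5:]
    detections = []
    for row in range(s):
        for col in range(s):
            cls = int(np.argmax(class_probs[row, col]))
            for b in range(num_boxes):
                x, y, w, h, conf = pred[row, col, b * 5:(b + 1) * 5]
                score = float(conf * class_probs[row, col, cls])
                if score < conf_thresh:
                    continue
                cx, cy = (col + x) / s, (row + y) / s   # cell offset -> image
                detections.append((float(cx - w / 2), float(cy - h / 2),
                                   float(cx + w / 2), float(cy + h / 2),
                                   score, cls))
    return detections

# Toy usage: S=7, B=2, C=20 with a single confident box in cell (row 3, col 2).
pred = np.zeros((7, 7, 30))
pred[3, 2, 0:5] = [0.5, 0.5, 0.2, 0.4, 0.9]
pred[3, 2, 10 + 7] = 1.0
print(decode_grid(pred))
```

A single forward pass produces the whole grid, so there is no separate proposal stage; overlapping detections would still need to be filtered afterwards, which this sketch leaves out.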

Official C implementation: https://pjreddie.com/darknet/yolo/ (v3)
https://pjreddie.com/darknet/yolov2/ (v2)
https://pjreddie.com/darknet/yolov1/ (v1)

PyTorch (Tencent) v1, v2, v3: https://github.com/TencentYoutuResearch/ObjectDetection-OneStageDet

For an introduction to YOLO, see the referenced introduction.

darkflow - translate darknet to tensorflow. Load trained weights, retrain/fine-tune them using tensorflow, export constant graph def to C++

Start Training YOLO with Our Own Data

YOLO: Core ML versus MPSNNGraph

TensorFlow YOLO object detection on Android

Computer Vision in iOS – Object Detection

YOLOv2

YOLO9000: Better, Faster, Stronger

darknet_scripts

Yolo_mark: GUI for marking bounded boxes of objects in images for training Yolo v2

LightNet: Bringing pjreddie’s DarkNet out of the shadows

YOLO v2 Bounding Box Tool

YOLOv3

YOLOv3: An Incremental Improvement

Gaussian YOLOv3: An Accurate and Fast Object Detector Using Localization Uncertainty for Autonomous Driving

YOLO-LITE: A Real-Time Object Detection Algorithm Optimized for Non-GPU Computers

Spiking-YOLO: Spiking Neural Network for Real-time Object Detection

SSD (The Single Shot Detector): detailed explanation

What’s the difference in performance between this new code you pushed and the previous code?

DSSD : Deconvolutional Single Shot Detector

Enhancement of SSD by concatenating feature maps for object detection

Context-aware Single-Shot Detector

Feature-Fused SSD: Fast Detection for Small Objects

FSSD: Feature Fusion Single Shot Multibox Detector

Weaving Multi-scale Context for Single Shot Detector

Extend the shallow part of Single Shot MultiBox Detector via Convolutional Neural Network

Tiny SSD: A Tiny Single-shot Detection Deep Convolutional Neural Network for Real-time Embedded Object Detection

MDSSD: Multi-scale Deconvolutional Single Shot Detector for small objects

Accurate Single Stage Detector Using Recurrent Rolling Convolution

Residual Features and Unified Prediction Network for Single Stage Detection

FPN

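FPN (Feature Pyramid Networks) is a feature extraction method that fuses feature information from multiple layers, and it can be combined with a variety of deep neural networks.
By contrast, SSD's multi-scale feature scheme has no upsampling step and does not use sufficiently low-level features (in SSD, the lowest-level feature is conv4_3 of the VGG network).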
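The multi-layer fusion can be sketched as a top-down pass with lateral connections, in the spirit of the FPN paper. This is a simplified illustration: the 1 x 1 lateral convolutions are written as plain matrix products with hypothetical weights, each level is assumed to be exactly half the resolution of the previous one, and the 3 x 3 smoothing convolution that FPN applies after merging is left out.

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of a (C, H, W) map."""
    return x.repeat(2, axis=1).repeat(2, axis=2)

def top_down_merge(features, lateral_weights):
    """Fuse backbone features from coarse to fine.

    features: list of (C_i, H_i, W_i) maps ordered fine to coarse.
    lateral_weights: list of (D, C_i) matrices acting as 1x1 convolutions.
    Returns a list of (D, H_i, W_i) maps in the same order.
    """
    # Lateral connections bring every level to the same channel count D.
    laterals = [np.einsum("dc,chw->dhw", w, f)
                for w, f in zip(lateral_weights, features)]
    merged = [laterals[-1]]                 # start from the coarsest level
    for lat in reversed(laterals[:-1]):
        # Upsample the coarser merged map and add the finer lateral map.
        merged.append(lat + upsample2x(merged[-1]))
    return merged[::-1]

feats = [np.ones((4, 8, 8)), np.ones((8, 4, 4)), np.ones((16, 2, 2))]
weights = [np.ones((2, f.shape[0])) for f in feats]
for level in top_down_merge(feats, weights):
    print(level.shape, level[0, 0, 0])
```

The upsampling step is what lets semantically strong coarse features reach the high-resolution levels, which is the part the text above notes is missing from SSD's scheme.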


Feature Pyramid Networks for Object Detection pdf


Action-Driven Object Detection with Top-Down Visual Attentions

arxiv: https://arxiv.org/abs/1612.06704

Beyond Skip Connections: Top-Down Modulation for Object Detection

Wide-Residual-Inception Networks for Real-time Object Detection

Attentional Network for Visual Object Detection

Learning Chained Deep Features and Classifiers for Cascade in Object Detection

DeNet: Scalable Real-time Object Detection with Directed Sparse Sampling

Discriminative Bimodal Networks for Visual Localization and Detection with Natural Language Queries

Spatial Memory for Context Reasoning in Object Detection

Deep Occlusion Reasoning for Multi-Camera Multi-Target Detection

https://arxiv.org/abs/1704.05775

LCDet: Low-Complexity Fully-Convolutional Neural Networks for Object Detection in Embedded Systems

Point Linking Network for Object Detection

Perceptual Generative Adversarial Networks for Small Object Detection

https://arxiv.org/abs/1706.05274

Few-shot Object Detection

https://arxiv.org/abs/1706.08249

Yes-Net: An effective Detector Based on Global Information

https://arxiv.org/abs/1706.09180

Towards lightweight convolutional neural networks for object detection

https://arxiv.org/abs/1707.01395

RON: Reverse Connection with Objectness Prior Networks for Object Detection

Deformable Part-based Fully Convolutional Network for Object Detection

Adaptive Feeding: Achieving Fast and Accurate Detections by Adaptively Combining Object Detectors

Recurrent Scale Approximation for Object Detection in CNN

DSOD: Learning Deeply Supervised Object Detectors from Scratch

Object Detection from Scratch with Deep Supervision

https://arxiv.org/abs/1809.09294

CoupleNet: Coupling Global Structure with Local Parts for Object Detection

Incremental Learning of Object Detectors without Catastrophic Forgetting

Zoom Out-and-In Network with Map Attention Decision for Region Proposal and Object Detection

https://arxiv.org/abs/1709.04347

StairNet: Top-Down Semantic Aggregation for Accurate One Shot Detection

https://arxiv.org/abs/1709.05788

Dynamic Zoom-in Network for Fast Object Detection in Large Images

https://arxiv.org/abs/1711.05187

Zero-Annotation Object Detection with Web Knowledge Transfer

MegDet: A Large Mini-Batch Object Detector

Receptive Field Block Net for Accurate and Fast Object Detection

An Analysis of Scale Invariance in Object Detection - SNIP

Feature Selective Networks for Object Detection

https://arxiv.org/abs/1711.08879

Learning a Rotation Invariant Detector with Rotatable Bounding Box

Scalable Object Detection for Stylized Objects

Learning Object Detectors from Scratch with Gated Recurrent Feature Pyramids

Deep Regionlets for Object Detection

Training and Testing Object Detectors with Virtual Images

Large-Scale Object Discovery and Detector Adaptation from Unlabeled Video

  • keywords: object mining, object tracking, unsupervised object discovery by appearance-based clustering, self-supervised detector adaptation
  • arxiv: https://arxiv.org/abs/1712.08832

Spot the Difference by Object Detection

Localization-Aware Active Learning for Object Detection

Object Detection with Mask-based Feature Encoding

https://arxiv.org/abs/1802.03934

LSTD: A Low-Shot Transfer Detector for Object Detection

Pseudo Mask Augmented Object Detection

https://arxiv.org/abs/1803.05858

Revisiting RCNN: On Awakening the Classification Power of Faster RCNN

Decoupled Classification Refinement: Hard False Positive Suppression for Object Detection

Learning Region Features for Object Detection

Object Detection for Comics using Manga109 Annotations

Task-Driven Super Resolution: Object Detection in Low-resolution Images

https://arxiv.org/abs/1803.11316

Transferring Common-Sense Knowledge for Object Detection

https://arxiv.org/abs/1804.01077

Multi-scale Location-aware Kernel Representation for Object Detection

Loss Rank Mining: A General Hard Example Mining Method for Real-time Detectors

DetNet: A Backbone network for Object Detection

AdvDetPatch: Attacking Object Detectors with Adversarial Patches

https://arxiv.org/abs/1806.02299

Attacking Object Detectors via Imperceptible Patches on Background

https://arxiv.org/abs/1809.05966

Physical Adversarial Examples for Object Detectors

Object detection at 200 Frames Per Second

Object Detection using Domain Randomization and Generative Adversarial Refinement of Synthetic Images

SNIPER: Efficient Multi-Scale Training

Soft Sampling for Robust Object Detection

https://arxiv.org/abs/1806.06986

MetaAnchor: Learning to Detect Objects with Customized Anchors

Localization Recall Precision (LRP): A New Performance Metric for Object Detection

Pooling Pyramid Network for Object Detection

Modeling Visual Context is Key to Augmenting Object Detection Datasets

Acquisition of Localization Confidence for Accurate Object Detection

CornerNet: Detecting Objects as Paired Keypoints

Unsupervised Hard Example Mining from Videos for Improved Object Detection

SAN: Learning Relationship between Convolutional Features for Multi-Scale Object Detection

https://arxiv.org/abs/1808.04974

A Survey of Modern Object Detection Literatur