AlphaTree: Object Detection

Object Detection

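R-CNN, Fast R-CNN, and Faster R-CNN form a single line of development. The other two directions are YOLO and SSD. YOLO has iterated up to YOLO v3, and SSD's design has since been applied in many other areas.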

Christian Szegedy / Google also made an early attempt at object detection using AlexNet.

[1] Szegedy, Christian, Alexander Toshev, and Dumitru Erhan. "Deep neural networks for object detection." Advances in Neural Information Processing Systems. 2013. pdf

However, the work that achieved the real breakthrough and set off the wave of deep-learning-based object detection was R-CNN.

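One way to locate a region is to treat it as a regression problem: predict the four values (x, y, w, h) and thereby obtain the position of the box. However, a regression model takes much longer to converge during training, so the regression problem is recast as a classification problem, solved in two steps:

Step 1: generate boxes of different sizes over the image.
Step 2: extract features from the data inside each box, score them with a classifier, and select the highest-scoring box as the object's localization box.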
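
The two steps above can be sketched as follows. This is a minimal illustration only: `sliding_windows`, `best_box`, and `toy_score` are hypothetical names, and `toy_score` is a stand-in for real feature extraction plus a classifier (it simply rewards boxes close to a made-up target).

```python
def sliding_windows(img_w, img_h, sizes, stride):
    """Step 1: yield (x, y, w, h) boxes of several sizes over the image."""
    for w, h in sizes:
        for y in range(0, img_h - h + 1, stride):
            for x in range(0, img_w - w + 1, stride):
                yield (x, y, w, h)


def best_box(boxes, score_fn):
    """Step 2: score every box and keep the highest-scoring one."""
    return max(boxes, key=score_fn)


def toy_score(box, target=(40, 30, 20, 20)):
    """Hypothetical classifier score: higher when the box is near `target`."""
    return -sum(abs(a - b) for a, b in zip(box, target))


boxes = list(sliding_windows(100, 80, sizes=[(20, 20), (40, 40)], stride=10))
print(len(boxes), best_box(boxes, toy_score))
```

Even this toy setting produces 98 candidate boxes for a 100x80 image, which hints at why exhaustive window scoring becomes expensive on real images.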

old.png
compare.png

Evaluation metrics: IoU (Intersection over Union); mAP (Mean Average Precision). Speed: frame rate (FPS)
iou.png
obj.png
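
As a rough illustration of the IoU metric, the sketch below computes it for two axis-aligned boxes. The corner format `(x1, y1, x2, y2)` and the function name `iou` are assumptions made for this example; the text above does not fix a box format.

```python
def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap; zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap 1, union 7
```

A detection is commonly counted as correct when its IoU with a ground-truth box exceeds a chosen threshold; mAP is then built on top of those per-box decisions.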

link

  • Method
    • [SPPNet]

  • [Two-Stage Object Detection]

    • [R-CNN]
    • [Fast R-CNN]
    • [Faster R-CNN]
  • [Single-Shot Object Detection]

    • [YOLO]
    • [YOLOv2]
    • [YOLOv3]
    • [SSD]
    • [RetinaNet]
  • [Great improvement]

  • [R-FCN]

  • Feature Pyramid Network (FPN)

Method

SPPNet He Kaiming / MSRA

  • SPPNet: Spatial Pyramid Pooling
    [3] He, Kaiming, et al. "Spatial pyramid pooling in deep convolutional networks for visual recognition." European Conference on Computer Vision. Springer International Publishing, 2014. pdf

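A CNN is usually followed by fully connected layers or a classifier, both of which require a fixed input size, so the input has to be cropped or warped. This preprocessing loses data or distorts the geometry. SPPNet brings the pyramid idea into the CNN and makes multi-scale input possible. The network input can then be of any size: in the SPP layer, each pooling filter adjusts its size to the input, while the size of the SPP output always stays fixed.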
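
A minimal sketch of the idea, assuming a single-channel feature map stored as a list of rows and pyramid levels of 1x1, 2x2, and 4x4 bins (the function name `spp` and the level choice are illustrative, not taken from the paper's code). The bin size is derived from the input size, so the output length depends only on the levels.

```python
import math


def spp(feature_map, levels=(1, 2, 4)):
    """Max-pool one 2-D feature map into a fixed-length vector.

    For each level n the map is split into an n x n grid whose bin size
    is computed from the input size; each bin contributes one maximum.
    """
    h, w = len(feature_map), len(feature_map[0])
    out = []
    for n in levels:
        for i in range(n):
            y0 = math.floor(i * h / n)
            y1 = math.ceil((i + 1) * h / n)
            for j in range(n):
                x0 = math.floor(j * w / n)
                x1 = math.ceil((j + 1) * w / n)
                out.append(max(feature_map[y][x]
                               for y in range(y0, y1)
                               for x in range(x0, x1)))
    return out


small = [[r * 5 + c for c in range(5)] for r in range(5)]   # 5 x 5 input
large = [[r * 9 + c for c in range(9)] for r in range(7)]   # 7 x 9 input
print(len(spp(small)), len(spp(large)))  # both 1 + 4 + 16 = 21
```

Both inputs yield a vector of length 21, which is what allows a fixed-size fully connected layer to follow a variable-size convolutional output.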

spp.png

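This broke with the earlier assumption that detection boxes must first be proposed, resized to a fixed size, and only then passed through the CNN. Instead, the image goes through the CNN once to obtain features, the features for the different detection boxes are extracted on the feature map, and they are then pooled to the same size for the subsequent steps. This speeds up object detection.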

Two-Stage Object Detection

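  • RCNN: the R-CNN framework replaces the sliding window plus hand-crafted features used in traditional object detection with CNN-based feature extraction. It is an application of deep neural networks.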

Traditional region proposal methods + CNN classifier

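In other words, the second step is replaced by feature extraction with a deep neural network.
A linear SVM classifier then identifies the object's class, and a regression model tightens the bounding box.
Innovation: applying a CNN to object detection, which improved detection accuracy.
Drawback: selective search extracts 2,000 candidate regions per image, and the CNN extracts features for each region separately, so computation is repeated and detection is slow (40-50 seconds).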

R-CNN raised the detection result on PASCAL VOC2007 to 66% (mAP).

rcnn

[2] Girshick, Ross, et al. "Rich feature hierarchies for accurate object detection and semantic segmentation." Proceedings of the IEEE conference on computer vision and pattern recognition. 2014. pdf

intro: R-CNN
arxiv: http://arxiv.org/abs/1311.2524
supp: http://people.eecs.berkeley.edu/~rbg/papers/r-cnn-cvpr-supp.pdf
slides: http://www.image-net.org/challenges/LSVRC/2013/slides/r-cnn-ilsvrc2013-workshop.pdf
slides: http://www.cs.berkeley.edu/~rbg/slides/rcnn-cvpr14-slides.pdf
github: https://github.com/rbgirshick/rcnn
notes: http://zhangliliang.com/2014/07/23/paper-note-rcnn/
caffe-pr(“Make R-CNN the Caffe detection example”): https://github.com/BVLC/caffe/pull/482

Fast RCNN Ross B. Girshick

  • Fast RCNN
    [4] Girshick, Ross. "Fast r-cnn." Proceedings of the IEEE International Conference on Computer Vision. 2015.

If the convolutional features in R-CNN only need to be computed once, the running time drops sharply.

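Ross Girshick applied the SPPNet approach to R-CNN and proposed a layer that can be seen as a single-level SPP layer, called RoI Pooling. This layer maps inputs of different sizes to a fixed-size feature vector. The image is fed into a CNN to produce a convolutional feature map. Candidate regions from a region proposal algorithm are then located on this feature map. Finally, the RoI pooling layer reshapes every candidate region to a fixed size so that it can be fed into a fully connected network.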

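1. Take the image as input.
2. Pass the image through a convolutional neural network to compute the convolutional features.
3. Extract the RoIs with the earlier proposal method, apply the RoI pooling layer to every region of interest to bring the regions to a common size, and pass each region on to the fully connected layers.
4. A softmax layer on top of the fully connected network outputs the class. In parallel with the softmax layer, a linear regression layer outputs the bounding box coordinates for the predicted class.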
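
The RoI pooling in step 3 can be sketched as below. This is a simplified, single-channel illustration: the name `roi_pool`, the 2x2 output size, and the convention that the RoI is given in feature-map cells as `(x0, y0, x1, y1)` with exclusive upper bounds are all assumptions for this example, not the reference implementation.

```python
import math


def roi_pool(feature_map, roi, out_h=2, out_w=2):
    """Max-pool the region roi = (x0, y0, x1, y1) into an out_h x out_w grid.

    Coordinates are feature-map cells; x1 and y1 are exclusive.
    """
    x0, y0, x1, y1 = roi
    rh, rw = y1 - y0, x1 - x0
    pooled = []
    for i in range(out_h):
        ys = y0 + math.floor(i * rh / out_h)
        ye = y0 + math.ceil((i + 1) * rh / out_h)
        row = []
        for j in range(out_w):
            xs = x0 + math.floor(j * rw / out_w)
            xe = x0 + math.ceil((j + 1) * rw / out_w)
            row.append(max(feature_map[y][x]
                           for y in range(ys, ye)
                           for x in range(xs, xe)))
        pooled.append(row)
    return pooled


# One shared feature map, computed once for the whole image.
fmap = [[r * 6 + c for c in range(6)] for r in range(6)]
print(roi_pool(fmap, (0, 0, 4, 4)))  # 4 x 4 region -> 2 x 2
print(roi_pool(fmap, (1, 2, 6, 5)))  # 5 x 3 region -> 2 x 2
```

Regions of different shapes come out at the same size while sharing one feature map, which is where the saving over per-region CNN passes comes from.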

fastrcnn

Fast R-CNN

arxiv: http://arxiv.org/abs/1504.08083
slides: http://tutorial.caffe.berkeleyvision.org/caffe-cvpr15-detection.pdf
github: https://github.com/rbgirshick/fast-rcnn
github(COCO-branch): https://github.com/rbgirshick/fast-rcnn/tree/coco
webcam demo: https://github.com/rbgirshick/fast-rcnn/pull/29
notes: http://zhangliliang.com/2015/05/17/paper-note-fast-rcnn/
notes: http://blog.csdn.net/linj_m/article/details/48930179
github(“Fast R-CNN in MXNet”): https://github.com/precedenceguo/mx-rcnn
github: https://github.com/mahyarnajibi/fast-rcnn-torch
github: https://github.com/apple2373/chainer-simple-fast-rnn
github: https://github.com/zplizzi/tensorflow-fast-rcnn

A-Fast-RCNN: Hard Positive Generation via Adversary for Object Detection

intro: CVPR 2017
arxiv: https://arxiv.org/abs/1704.03414
paper: http://abhinavsh.info/papers/pdfs/adversarial_object_detection.pdf
github(Caffe): https://github.com/xiaolonw/adversarial-frcnn

Faster RCNN He Kaiming

  • Faster RCNN
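    Fast RCNN still uses a traditional method for region proposal, whereas Faster RCNN unifies the Region Proposal Network, feature extraction, object classification, and bounding box regression in a single framework.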

Faster R-CNN = Region Proposal Network +Fast R-CNN

fasterrcnn1

fasterrcnn

fasterrcnn2

Region proposal is carried out by a CNN called the Region Proposal Network (RPN). With the RPN, the extra cost of region proposal is only a two-layer network. For more on the RPN, see link

rpn

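Faster R-CNN introduces a network, the RPN, to extract candidate regions. It replaces the time-consuming Selective Search and greatly increases detection speed. The table below compares the detection speed of R-CNN, Fast R-CNN, and Faster R-CNN: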

speed

[5] Ren, Shaoqing, et al. "Faster R-CNN: Towards real-time object detection with region proposal networks." Advances in neural information processing systems. 2015.

Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks

R-CNN minus R

Faster R-CNN in MXNet with distributed implementation and data parallelization

Contextual Priming and Feedback for Faster R-CNN

An Implementation of Faster RCNN with Study for Region Sampling

Interpretable R-CNN

Light-Head R-CNN: In Defense of Two-Stage Object Detector

Cascade R-CNN: Delving into High Quality Object Detection

Cascade R-CNN: High Quality Object Detection and Instance Segmentation

SMC Faster R-CNN: Toward a scene-specialized multi-object detector

Domain Adaptive Faster R-CNN for Object Detection in the Wild

Robust Physical Adversarial Attack on Faster R-CNN Object Detector

Auto-Context R-CNN

Grid R-CNN

Grid R-CNN Plus: Faster and Better

Few-shot Adaptive Faster R-CNN

Libra R-CNN: Towards Balanced Learning for Object Detection

Rethinking Classification and Localization in R-CNN

Reprojection R-CNN: A Fast and Accurate Object Detector for 360° Images

Single-Shot Object Detection

Yolo

  • Yolo(You only look once)

    yolologo

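    YOLO's approach to detection differs from that of the R-CNN family: it treats object detection as a regression task. The core idea of YOLO is to take the whole image as the network input and to regress the position of each bounding box and its class directly at the output layer.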

    yolo

    yolo

    [6] Redmon, Joseph, et al. "You only look once: Unified, real-time object detection." arXiv preprint arXiv:1506.02640 (2015). pdf (YOLO: outstanding work, really practical)
    PPT
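
To make "regressing the box directly" concrete, the sketch below decodes one grid cell's regressed values into pixel coordinates. It assumes the YOLO v1 style parameterization in which the image is divided into a grid, `x` and `y` are the box centre offsets inside the cell, and `w` and `h` are relative to the whole image. The function name and argument layout are hypothetical.

```python
def decode_cell(pred, row, col, grid_size, img_w, img_h):
    """Convert one cell's regressed (x, y, w, h) to pixel corner coordinates.

    x, y: box centre offset inside the cell, each in [0, 1]
    w, h: box size as a fraction of the whole image, each in [0, 1]
    """
    x, y, w, h = pred
    cx = (col + x) / grid_size * img_w   # centre in pixels
    cy = (row + y) / grid_size * img_h
    bw, bh = w * img_w, h * img_h        # size in pixels
    return (cx - bw / 2, cy - bh / 2, cx + bw / 2, cy + bh / 2)


# Centre cell of a 7 x 7 grid on a 448 x 448 image.
print(decode_cell((0.5, 0.5, 0.25, 0.5), row=3, col=3,
                  grid_size=7, img_w=448, img_h=448))
```

Because every cell emits its boxes in a single forward pass, no separate region proposal stage is needed.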

C (official): https://pjreddie.com/darknet/yolo/ v3
https://pjreddie.com/darknet/yolov2/ v2
https://pjreddie.com/darknet/yolov1/ v1

pytorch (tencent) v1, v2, v3 :https://github.com/TencentYoutuResearch/ObjectDetection-OneStageDet

For an introduction to YOLO, see this introduction

darkflow - translate darknet to tensorflow. Load trained weights, retrain/fine-tune them using tensorflow, export constant graph def to C++

Start Training YOLO with Our Own Data

YOLO: Core ML versus MPSNNGraph

TensorFlow YOLO object detection on Android

Computer Vision in iOS – Object Detection

YOLOv2

YOLO9000: Better, Faster, Stronger

darknet_scripts

Yolo_mark: GUI for marking bounded boxes of objects in images for training Yolo v2

LightNet: Bringing pjreddie’s DarkNet out of the shadows

YOLO v2 Bounding Box Tool

YOLOv3

YOLOv3: An Incremental Improvement

Gaussian YOLOv3: An Accurate and Fast Object Detector Using Localization Uncertainty for Autonomous Driving

YOLO-LITE: A Real-Time Object Detection Algorithm Optimized for Non-GPU Computers

Spiking-YOLO: Spiking Neural Network for Real-time Object Detection

SSD (The Single Shot Detector) detailed explanation: detail

What’s the difference in performance between this new code you pushed and the previous code?

DSSD : Deconvolutional Single Shot Detector

Enhancement of SSD by concatenating feature maps for object detection

Context-aware Single-Shot Detector

Feature-Fused SSD: Fast Detection for Small Objects

FSSD: Feature Fusion Single Shot Multibox Detector

Weaving Multi-scale Context for Single Shot Detector

Extend the shallow part of Single Shot MultiBox Detector via Convolutional Neural Network

Tiny SSD: A Tiny Single-shot Detection Deep Convolutional Neural Network for Real-time Embedded Object Detection

MDSSD: Multi-scale Deconvolutional Single Shot Detector for small objects

Accurate Single Stage Detector Using Recurrent Rolling Convolution

Residual Features and Unified Prediction Network for Single Stage Detection

FPN

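FPN (Feature Pyramid Networks) is a feature extraction method that fuses feature information from multiple layers and can be combined with various deep neural networks.
By contrast, SSD's multi-scale feature fusion has no upsampling step and does not use sufficiently low-level features (in SSD, the lowest-level feature is conv4_3 of the VGG network).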
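
A rough sketch of the fusion step, under simplifying assumptions: single-channel maps stored as lists of rows, nearest-neighbour 2x upsampling, and plain element-wise addition. The lateral 1x1 convolutions used in the FPN paper are omitted, and the names `upsample2x` and `top_down_merge` are illustrative.

```python
def upsample2x(fmap):
    """Nearest-neighbour 2x upsampling of a 2-D list."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(2)]  # repeat each column
        out.append(wide)
        out.append(list(wide))                     # repeat each row
    return out


def top_down_merge(coarse, fine):
    """Add the upsampled coarse (deep) map to the fine (shallow) map."""
    up = upsample2x(coarse)
    return [[u + f for u, f in zip(up_row, fine_row)]
            for up_row, fine_row in zip(up, fine)]


coarse = [[1, 2], [3, 4]]                # deep, low-resolution features
fine = [[10] * 4 for _ in range(4)]      # shallow, high-resolution features
for row in top_down_merge(coarse, fine):
    print(row)
```

The merged map keeps the resolution of the shallow layer while carrying information from the deeper one, which is the upsampling path that the SSD comparison above refers to.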

fpn

Feature Pyramid Networks for Object Detection pdf

Feature Pyramid Networks for Object Detection

Action-Driven Object Detection with Top-Down Visual Attentions

arxiv: https://arxiv.org/abs/1612.06704

Beyond Skip Connections: Top-Down Modulation for Object Detection

Wide-Residual-Inception Networks for Real-time Object Detection

Attentional Network for Visual Object Detection

Learning Chained Deep Features and Classifiers for Cascade in Object Detection

DeNet: Scalable Real-time Object Detection with Directed Sparse Sampling

Discriminative Bimodal Networks for Visual Localization and Detection with Natural Language Queries

Spatial Memory for Context Reasoning in Object Detection

Deep Occlusion Reasoning for Multi-Camera Multi-Target Detection

https://arxiv.org/abs/1704.05775

LCDet: Low-Complexity Fully-Convolutional Neural Networks for Object Detection in Embedded Systems

Point Linking Network for Object Detection

Perceptual Generative Adversarial Networks for Small Object Detection

https://arxiv.org/abs/1706.05274

Few-shot Object Detection

https://arxiv.org/abs/1706.08249

Yes-Net: An effective Detector Based on Global Information

https://arxiv.org/abs/1706.09180

Towards lightweight convolutional neural networks for object detection

https://arxiv.org/abs/1707.01395

RON: Reverse Connection with Objectness Prior Networks for Object Detection

Deformable Part-based Fully Convolutional Network for Object Detection

Adaptive Feeding: Achieving Fast and Accurate Detections by Adaptively Combining Object Detectors

Recurrent Scale Approximation for Object Detection in CNN

DSOD: Learning Deeply Supervised Object Detectors from Scratch

Object Detection from Scratch with Deep Supervision

https://arxiv.org/abs/1809.09294

CoupleNet: Coupling Global Structure with Local Parts for Object Detection

Incremental Learning of Object Detectors without Catastrophic Forgetting

Zoom Out-and-In Network with Map Attention Decision for Region Proposal and Object Detection

https://arxiv.org/abs/1709.04347

StairNet: Top-Down Semantic Aggregation for Accurate One Shot Detection

https://arxiv.org/abs/1709.05788

Dynamic Zoom-in Network for Fast Object Detection in Large Images

https://arxiv.org/abs/1711.05187

Zero-Annotation Object Detection with Web Knowledge Transfer

MegDet: A Large Mini-Batch Object Detector

Receptive Field Block Net for Accurate and Fast Object Detection

An Analysis of Scale Invariance in Object Detection - SNIP

Feature Selective Networks for Object Detection

https://arxiv.org/abs/1711.08879

Learning a Rotation Invariant Detector with Rotatable Bounding Box

Scalable Object Detection for Stylized Objects

Learning Object Detectors from Scratch with Gated Recurrent Feature Pyramids

Deep Regionlets for Object Detection

Training and Testing Object Detectors with Virtual Images

Large-Scale Object Discovery and Detector Adaptation from Unlabeled Video

  • keywords: object mining, object tracking, unsupervised object discovery by appearance-based clustering, self-supervised detector adaptation
  • arxiv: https://arxiv.org/abs/1712.08832

Spot the Difference by Object Detection

Localization-Aware Active Learning for Object Detection

Object Detection with Mask-based Feature Encoding

https://arxiv.org/abs/1802.03934

LSTD: A Low-Shot Transfer Detector for Object Detection

Pseudo Mask Augmented Object Detection

https://arxiv.org/abs/1803.05858

Revisiting RCNN: On Awakening the Classification Power of Faster RCNN

Decoupled Classification Refinement: Hard False Positive Suppression for Object Detection

Learning Region Features for Object Detection

Object Detection for Comics using Manga109 Annotations

Task-Driven Super Resolution: Object Detection in Low-resolution Images

https://arxiv.org/abs/1803.11316

Transferring Common-Sense Knowledge for Object Detection

https://arxiv.org/abs/1804.01077

Multi-scale Location-aware Kernel Representation for Object Detection

Loss Rank Mining: A General Hard Example Mining Method for Real-time Detectors

DetNet: A Backbone network for Object Detection

AdvDetPatch: Attacking Object Detectors with Adversarial Patches

https://arxiv.org/abs/1806.02299

Attacking Object Detectors via Imperceptible Patches on Background

https://arxiv.org/abs/1809.05966

Physical Adversarial Examples for Object Detectors

Object detection at 200 Frames Per Second

Object Detection using Domain Randomization and Generative Adversarial Refinement of Synthetic Images

SNIPER: Efficient Multi-Scale Training

Soft Sampling for Robust Object Detection

https://arxiv.org/abs/1806.06986

MetaAnchor: Learning to Detect Objects with Customized Anchors

Localization Recall Precision (LRP): A New Performance Metric for Object Detection

Pooling Pyramid Network for Object Detection

Modeling Visual Context is Key to Augmenting Object Detection Datasets

Acquisition of Localization Confidence for Accurate Object Detection

CornerNet: Detecting Objects as Paired Keypoints

Unsupervised Hard Example Mining from Videos for Improved Object Detection

SAN: Learning Relationship between Convolutional Features for Multi-Scale Object Detection

https://arxiv.org/abs/1808.04974

A Survey of Modern Object Detection Literature using Deep Learning

https://arxiv.org/abs/1808.07256

Tiny-DSOD: Lightweight Object Detection for Resource-Restricted Usages

Deep Feature Pyramid Reconfiguration for Object Detection

MDCN: Multi-Scale, Deep Inception Convolutional Neural Networks for Efficient Object Detection

Recent Advances in Object Detection in the Age of Deep Convolutional Neural Networks

https://arxiv.org/abs/1809.03193

Deep Learning for Generic Object Detection: A Survey

https://arxiv.org/abs/1809.02165

Training Confidence-Calibrated Classifier for Detecting Out-of-Distribution Samples

Fast and accurate object detection in high resolution 4K and 8K video using GPUs

  • intro: Best Paper Finalist at IEEE High Performance Extreme Computing Conference (HPEC) 2018
  • intro: Carnegie Mellon University
  • arxiv: https://arxiv.org/abs/1810.10551

Hybrid Knowledge Routed Modules for Large-scale Object Detection

BAN: Focusing on Boundary Context for Object Detection

https://arxiv.org/abs/1811.05243

R2CNN++: Multi-Dimensional Attention Based Rotation Invariant Detector with Robust Anchor Strategy

DeRPN: Taking a further step toward more general object detection

Fast Efficient Object Detection Using Selective Attention

https://arxiv.org/abs/1811.07502

Sampling Techniques for Large-Scale Object Detection from Sparsely Annotated Objects

https://arxiv.org/abs/1811.10862

Efficient Coarse-to-Fine Non-Local Module for the Detection of Small Objects

https://arxiv.org/abs/1811.12152

Deep Regionlets: Blended Representation and Deep Learning for Generic Object Detection

https://arxiv.org/abs/1811.11318

Transferable Adversarial Attacks for Image and Video Object Detection

https://arxiv.org/abs/1811.12641

Anchor Box Optimization for Object Detection

AutoFocus: Efficient Multi-Scale Inference

Few-shot Object Detection via Feature Reweighting

https://arxiv.org/abs/1812.01866

Practical Adversarial Attack Against Object Detector

https://arxiv.org/abs/1812.10217

Scale-Aware Trident Networks for Object Detection

Region Proposal by Guided Anchoring

Bottom-up Object Detection by Grouping Extreme and Center Points

Bag of Freebies for Training Object Detection Neural Networks

Augmentation for small object detection

https://arxiv.org/abs/1902.07296

Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression

SimpleDet: A Simple and Versatile Distributed Framework for Object Detection and Instance Recognition

BayesOD: A Bayesian Approach for Uncertainty Estimation in Deep Object Detectors

DetNAS: Neural Architecture Search on Object Detection

ThunderNet: Towards Real-time Generic Object Detection

https://arxiv.org/abs/1903.11752

Feature Intertwiner for Object Detection

Improving Object Detection with Inverted Attention

https://arxiv.org/abs/1903.12255

What Object Should I Use? - Task Driven Object Detection

Towards Universal Object Detection by Domain Attention

Prime Sample Attention in Object Detection

https://arxiv.org/abs/1904.04821

BAOD: Budget-Aware Object Detection

https://arxiv.org/abs/1904.05443

An Analysis of Pre-Training on Object Detection

DuBox: No-Prior Box Objection Detection via Residual Dual Scale Detectors

NAS-FPN: Learning Scalable Feature Pyramid Architecture for Object Detection

Objects as Points

CenterNet: Object Detection with Keypoint Triplets

CenterNet: Keypoint Triplets for Object Detection

CornerNet-Lite: Efficient Keypoint Based Object Detection

Automated Focal Loss for Image based Object Detection

https://arxiv.org/abs/1904.09048

Exploring Object Relation in Mean Teacher for Cross-Domain Detection

An Energy and GPU-Computation Efficient Backbone Network for Real-Time Object Detection

RepPoints: Point Set Representation for Object Detection

Object Detection in 20 Years: A Survey

https://arxiv.org/abs/1905.05055

Light-Weight RetinaNet for Object Detection

https://arxiv.org/abs/1905.10011

Learning Data Augmentation Strategies for Object Detection

Towards Adversarially Robust Object Detection

Object as Distribution

Detecting 11K Classes: Large Scale Object Detection without Fine-Grained Bounding Boxes

Relation Distillation Networks for Video Object Detection

FreeAnchor: Learning to Match Anchors for Visual Object Detection

Efficient Neural Architecture Transformation Search in Channel-Level for Object Detection

https://arxiv.org/abs/1909.02293

Self-Training and Adversarial Background Regularization for Unsupervised Domain Adaptive One-Stage Object Detection

R-FCN

  • R-FCN
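    R-FCN is an improvement on Faster R-CNN. The fully connected layers that follow RoI pooling in Faster R-CNN are computationally expensive. However, dropping them (they fuse features and perform feature mapping) and connecting the feature map produced by RoI pooling directly to the final classification and regression layers gives poor detection results. According to "Deep residual learning for image recognition", image classification is insensitive to translation, whereas object detection is sensitive to it, so R-FCN adds a position-sensitive design to RoI pooling.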

    rfcn

    [8] Dai, Jifeng, et al. "R-FCN: Object Detection via Region-based Fully Convolutional Networks." arXiv preprint arXiv:1605.06409 (2016). pdf

Introduction

R-FCN-3000 at 30fps: Decoupling Detection and Classification

https://arxiv.org/abs/1712.01802

Mask R-CNN

  • Mask R-CNN

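Best paper at ICCV 2017. Mask R-CNN accomplishes three things: object detection, object classification, and pixel-level segmentation. It adds a mask prediction branch on top of the Faster R-CNN architecture and improves RoI Pooling into RoI Align. This was the first time the object detection and object segmentation tasks were unified.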

maskrcnn

[9] He, Gkioxari, et al. "Mask R-CNN" arXiv preprint arXiv:1703.06870 (2017). [pdf]

Introduction
zhihu

Video Object Detection

Learning Object Class Detectors from Weakly Annotated Video

Analysing domain shift factors between videos and images for object detection

arxiv: https://arxiv.org/abs/1501.01186

Video Object Recognition

slides: http://vision.princeton.edu/courses/COS598/2015sp/slides/VideoRecog/Video%20Object%20Recognition.pptx

Deep Learning for Saliency Prediction in Natural Video

  • intro: Submitted on 12 Jan 2016
  • keywords: Deep learning, saliency map, optical flow, convolution network, contrast features
  • paper: https://hal.archives-ouvertes.fr/hal-01251614/document

T-CNN: Tubelets with Convolutional Neural Networks for Object Detection from Videos

Object Detection from Video Tubelets with Convolutional Neural Networks

Object Detection in Videos with Tubelets and Multi-context Cues

Context Matters: Refining Object Detection in Video with Recurrent Neural Networks

CNN Based Object Detection in Large Video Images

Object Detection in Videos with Tubelet Proposal Networks

arxiv: https://arxiv.org/abs/1702.06355

Flow-Guided Feature Aggregation for Video Object Detection

Video Object Detection using Faster R-CNN

Improving Context Modeling for Video Object Detection and Tracking

http://image-net.org/challenges/talks_2017/ilsvrc2017_short(poster).pdf

Temporal Dynamic Graph LSTM for Action-driven Video Object Detection

Mobile Video Object Detection with Temporally-Aware Feature Maps

https://arxiv.org/abs/1711.06368

Towards High Performance Video Object Detection

https://arxiv.org/abs/1711.11577

Impression Network for Video Object Detection

https://arxiv.org/abs/1712.05896

Spatial-Temporal Memory Networks for Video Object Detection

https://arxiv.org/abs/1712.06317

3D-DETNet: a Single Stage Video-Based Vehicle Detector

https://arxiv.org/abs/1801.01769

Object Detection in Videos by Short and Long Range Object Linking

https://arxiv.org/abs/1801.09823

Object Detection in Video with Spatiotemporal Sampling Networks

Towards High Performance Video Object Detection for Mobiles

Optimizing Video Object Detection via a Scale-Time Lattice

Pack and Detect: Fast Object Detection in Videos Using Region-of-Interest Packing

https://arxiv.org/abs/1809.01701

Fast Object Detection in Compressed Video

https://arxiv.org/abs/1811.11057

Tube-CNN: Modeling temporal evolution of appearance for object detection in video

AdaScale: Towards Real-time Video Object Detection Using Adaptive Scaling

SCNN: A General Distribution based Statistical Convolutional Neural Network with Application to Video Object Detection

Looking Fast and Slow: Memory-Guided Mobile Video Object Detection

Progressive Sparse Local Attention for Video object detection

Sequence Level Semantics Aggregation for Video Object Detection

https://arxiv.org/abs/1907.06390

Object Detection in Video with Spatial-temporal Context Aggregation

A Delay Metric for Video Object Detection: What Average Precision Fails to Tell

Minimum Delay Object Detection From Video

Object Detection on Mobile Devices

Pelee: A Real-Time Object Detection System on Mobile Devices

Object Detection in 3D

Vote3Deep: Fast Object Detection in 3D Point Clouds Using Efficient Convolutional Neural Networks

arxiv: https://arxiv.org/abs/1609.06666

VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection

Complex-YOLO: Real-time 3D Object Detection on Point Clouds

Focal Loss in 3D Object Detection

3D Object Detection Using Scale Invariant and Feature Reweighting Networks

3D Backbone Network for 3D Object Detection

https://arxiv.org/abs/1901.08373

Complexer-YOLO: Real-Time 3D Object Detection and Tracking on Semantic Point Clouds

https://arxiv.org/abs/1904.07537

Monocular 3D Object Detection and Box Fitting Trained End-to-End Using Intersection-over-Union Loss

https://arxiv.org/abs/1906.08070

IoU Loss for 2D/3D Object Detection

Fast Point R-CNN

Object Detection on RGB-D

Learning Rich Features from RGB-D Images for Object Detection and Segmentation

arxiv: http://arxiv.org/abs/1407.5736

Differential Geometry Boosts Convolutional Neural Networks for Object Detection

A Self-supervised Learning System for Object Detection using Physics Simulation and Multi-view Pose Estimation

https://arxiv.org/abs/1703.03347

Cross-Modal Attentional Context Learning for RGB-D Object Detection

Zero-Shot Object Detection

Zero-Shot Detection

Zero-Shot Object Detection

https://arxiv.org/abs/1804.04340

Zero-Shot Object Detection: Learning to Simultaneously Recognize and Localize Novel Concepts

Zero-Shot Object Detection by Hybrid Region Embedding

Visual Relationship Detection

Visual Relationship Detection with Language Priors

ViP-CNN: A Visual Phrase Reasoning Convolutional Neural Network for Visual Relationship Detection

Visual Translation Embedding Network for Visual Relation Detection

arxiv: https://www.arxiv.org/abs/1702.08319

Deep Variation-structured Reinforcement Learning for Visual Relationship and Attribute Detection

Detecting Visual Relationships with Deep Relational Networks

Identifying Spatial Relations in Images using Convolutional Neural Networks

https://arxiv.org/abs/1706.04215

PPR-FCN: Weakly Supervised Visual Relation Detection via Parallel Pairwise R-FCN

Natural Language Guided Visual Relationship Detection

https://arxiv.org/abs/1711.06032

Detecting Visual Relationships Using Box Attention

Google AI Open Images - Visual Relationship Track

Context-Dependent Diffusion Network for Visual Relationship Detection

A Problem Reduction Approach for Visual Relationships Detection

Exploring the Semantics for Visual Relationship Detection

https://arxiv.org/abs/1904.02104