Deep Learning from Beginner to Giving Up: CV, Pose Estimation Index

Contents

I. Common Datasets

II. Mainstream Methods

III. Single Person Pose Estimation

IV. Multi-Person Pose Estimation

---------------------------------------------------------------------------

I. Common Datasets

Commonly used datasets for pose estimation / keypoint detection

1. Posetrack:posetrack.net/

  • > 500 video sequences
  • > 20K frames
  • > 150K body pose annotations
  • 3 challenges


2. LSP:sam.johnson.io/research

  • Samples: 2K
  • Number of joints: 14
  • Full body, single person

3. FLIC:bensapp.github.io/flic-

  • Samples: 20K
  • Number of joints: 9
  • Upper body, single person

4. MPII:human-pose.mpi-inf.mpg.de

  • Samples: 25K
  • Number of joints: 16
  • Full body, single-person and multi-person, 40K people, 410 human activities

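5. MSCOCO:cocodataset.org/#

  • Samples: >= 300K
  • Number of joints: 17 annotated (often listed as 18 when a neck point is added, as in OpenPose)
  • Full body, multi-person, keypoints on 100K people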

6. AI Challenger:challenger.ai/competiti

  • Samples: 210K training, 30K validation, 30K testing
  • Number of joints: 14
  • Full body, multi-person, 380K people

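II. Mainstream Methods

The main difficulties in 2D pose estimation are occlusion, cluttered backgrounds, lighting, the complexity of real-world poses, varying person scale, and unconstrained camera angles.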

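Single-person pose estimation

▪ Traditional methods: based on Pictorial Structures and DPM

▪ Deep learning methods either regress joint coordinates directly (DeepPose) or regress them through heatmaps (CPM, Hourglass)

For single-person pose estimation, the mainstream algorithms today are variants of the Hourglass architecture.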
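To make the difference between direct coordinate regression and heatmap regression concrete, here is a minimal NumPy sketch of the heatmap side: a joint is encoded as a 2D Gaussian target and decoded again by taking the peak. The function names, the map size, and `sigma=2.0` are assumptions chosen for illustration; they are not taken from CPM or Hourglass.

```python
import numpy as np

def make_heatmap(height, width, x, y, sigma=2.0):
    """Render one joint as a 2D Gaussian centered at pixel (x, y)."""
    ys, xs = np.mgrid[0:height, 0:width]
    return np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))

def decode_heatmap(heatmap):
    """Return the (x, y) pixel with the highest response."""
    row, col = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return int(col), int(row)

# Encode a joint at x=20, y=33 on a 64x48 map, then decode it again.
heatmap = make_heatmap(64, 48, x=20, y=33)
joint_xy = decode_heatmap(heatmap)
```

A network trained this way predicts one such map per joint, and the coordinates are read off the maps afterwards, whereas a direct-regression network would output the two numbers `(x, y)` itself.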

Multi-person pose estimation

For 2D images, CNN-based multi-person pose estimation usually follows one of two approaches: top-down or bottom-up.

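(1) Top-down approaches, also called the two-step framework: first run a person detector to obtain bounding boxes, then detect the body keypoints inside each box and connect them into a skeleton. The drawback is a heavy dependence on the detection boxes: missed detections, false detections, and the IoU of the boxes all affect the result. Representative algorithms include RMPE and Mask R-CNN.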
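The dependence on box quality can be illustrated with two small helpers: one computes the IoU between a detected box and a reference box, the other maps keypoints predicted inside a resized crop back to image coordinates. This is only a sketch; the `(x1, y1, x2, y2)` box layout, the crop size, and both function names are assumptions for the example.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def crop_to_image(keypoints, box, crop_size):
    """Map (x, y) keypoints predicted in a resized crop back to image coordinates."""
    x1, y1, x2, y2 = box
    crop_w, crop_h = crop_size
    sx, sy = (x2 - x1) / crop_w, (y2 - y1) / crop_h
    return [(x1 + kx * sx, y1 + ky * sy) for kx, ky in keypoints]

gt_box = (100, 50, 200, 350)
det_box = (120, 50, 220, 350)  # detection shifted 20 px to the right
overlap = box_iou(gt_box, det_box)
image_kpts = crop_to_image([(32, 128)], det_box, crop_size=(64, 256))
```

Because every keypoint is mapped through the detected box, any shift or scale error in that box changes what the second-stage pose network gets to see.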

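(2) Bottom-up approaches, also called the part-based framework: first detect every body keypoint (part) over the whole image, then assemble the detected parts into individual skeletons. The drawback is that parts belonging to different people may be assembled into one person. Representative methods include OpenPose, DeepCut, and PAFs.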
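As a rough illustration of the grouping step, the sketch below scores a candidate limb in the spirit of Part Affinity Fields: it samples points on the segment between two joint candidates and averages the dot product between the field and the segment direction. The toy field, the nearest-pixel sampling, and the name `paf_score` are assumptions made for this example, not the reference OpenPose implementation.

```python
import numpy as np

def paf_score(paf_x, paf_y, p_a, p_b, num_samples=10):
    """Average alignment between the field and the segment from p_a to p_b."""
    p_a, p_b = np.asarray(p_a, float), np.asarray(p_b, float)
    direction = p_b - p_a
    length = np.linalg.norm(direction)
    if length == 0:
        return 0.0
    unit = direction / length
    scores = []
    for t in np.linspace(0.0, 1.0, num_samples):
        # Sample the field at the nearest pixel along the segment.
        x, y = np.rint(p_a + t * direction).astype(int)
        scores.append(paf_x[y, x] * unit[0] + paf_y[y, x] * unit[1])
    return float(np.mean(scores))

# Toy field: one horizontal limb along row 5, pointing in the +x direction.
paf_x = np.zeros((10, 20))
paf_y = np.zeros((10, 20))
paf_x[5, 2:18] = 1.0

same_person = paf_score(paf_x, paf_y, (2, 5), (17, 5))
wrong_pair = paf_score(paf_x, paf_y, (2, 5), (17, 0))
```

A candidate pair that follows the limb gets a high score, while a pair that cuts across the image gets a low one. A matching step (greedy or bipartite) can then choose which joints belong to the same person.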

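Tricks

  • Use multi-scale, multi-resolution network structures
  • Build the network from residual blocks
  • Enlarge the receptive field (large kernels, dilated convolution, Spatial Transformer Networks, hourglass modules)
  • Preprocessing matters (place the person at the center of the input image, normalize the person's scale as far as possible, and augment the images with flips and rotations)
  • Post-processing matters just as much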
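One detail of the flip augmentation mentioned in the list is easy to get wrong: after mirroring the image, left and right joints must also swap labels. The sketch below covers only that flip step, using a hypothetical 5-joint layout; the joint order and `FLIP_PAIRS` are assumptions for the example and differ from dataset to dataset.

```python
import numpy as np

# Hypothetical joint layout, used only for this example:
# 0 head, 1 left shoulder, 2 right shoulder, 3 left hip, 4 right hip
FLIP_PAIRS = [(1, 2), (3, 4)]

def flip_horizontal(image, keypoints, flip_pairs=FLIP_PAIRS):
    """Mirror an image and its (N, 2) keypoints, swapping left/right joints."""
    width = image.shape[1]
    flipped_image = image[:, ::-1].copy()
    flipped = keypoints.astype(float).copy()
    flipped[:, 0] = (width - 1) - flipped[:, 0]  # mirror the x coordinate
    for left, right in flip_pairs:
        # After mirroring, the person's left side shows up where the right was.
        flipped[[left, right]] = flipped[[right, left]]
    return flipped_image, flipped

image = np.zeros((8, 10, 3), dtype=np.uint8)
keypoints = np.array([[5, 1], [3, 2], [7, 2], [4, 6], [6, 6]])
flipped_image, flipped_keypoints = flip_horizontal(image, keypoints)
```

The same swap is needed when a model's predictions on a flipped image are averaged with its predictions on the original image as a post-processing step.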

III. Single Person Pose Estimation


2014----Articulated Pose Estimation by a Graphical Model with Image Dependent Pairwise Relations

2014----DeepPose: Human Pose Estimation via Deep Neural Networks

2014----Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation

2014----Learning Human Pose Estimation Features with Convolutional Networks

2014----MoDeep: A Deep Learning Framework Using Motion Features for Human Pose Estimation

2015----Efficient Object Localization Using Convolutional Networks

2015----Human Pose Estimation with Iterative Error Feedback

2015----Pose-based CNN Features for Action Recognition

2016----Advancing Hand Gesture Recognition with High Resolution Electrical Impedance Tomography

2016----Chained Predictions Using Convolutional Neural Networks

2016----CPM----Convolutional Pose Machines

2016----CVPR-2016----End-to-End Learning of Deformable Mixture of Parts and Deep Convolutional Neural Networks for Human Pose Estimation

2016----Deep Learning of Local RGB-D Patches for 3D Object Detection and 6D Pose Estimation

2016----PAFs----Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields

2016----Stacked hourglass----Stacked Hourglass Networks for Human Pose Estimation

2016----Structured Feature Learning for Pose Estimation

2017----Adversarial PoseNet: A Structure-aware Convolutional Network for Human Pose Estimation

2017----CVPR2017 oral----Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields

2017----Learning Feature Pyramids for Human Pose Estimation

2017----Multi-Context Attention for Human Pose Estimation

2017----Self Adversarial Training for Human Pose Estimation


IV. Multi-Person Pose Estimation

2016----Associative Embedding: End-to-End Learning for Joint Detection and Grouping

2016----DeepCut----Joint Subset Partition and Labeling for Multi Person Pose Estimation

2016----DeepCut----Joint Subset Partition and Labeling for Multi Person Pose Estimation (poster)

2016----DeeperCut----DeeperCut: A Deeper, Stronger, and Faster Multi-Person Pose Estimation Model

2017----G-RMI----Towards Accurate Multi-person Pose Estimation in the Wild

2017----RMPE: Regional Multi-Person Pose Estimation

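   This paper comes from Prof. Cewu Lu's group at Shanghai Jiao Tong University and takes a top-down approach.
    Its motivation is to solve two problems: localization errors in the detected boxes and redundant detections. It brings in the Spatial Transformer Networks proposed by Google, which give conventional convolutions the ability to crop, translate, scale, and rotate.
    One experiment in the paper, "Upper Bound of Our Framework", feeds the ground-truth human bounding boxes directly into the pipeline and reaches 84.2 mAP on the validation set. This shows that good human bounding boxes alone are not enough; the second-stage single-person pose estimator also has to improve.
    Speculation: MSRA's deformable convolutional network could be applied here, and it would probably yield a new paper.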
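To show what cropping, translation, and scaling by a spatial transformer mean in practice, the sketch below builds the 2x3 affine matrix that maps a normalized output grid onto a person box in the input image. It is a fixed, hand-computed transform for illustration only: in RMPE the transform parameters are predicted by the network, and the `[-1, 1]` coordinate convention, the omission of rotation, and the helper names are assumptions of this example.

```python
import numpy as np

def box_to_theta(box, image_size):
    """Affine matrix mapping output coords in [-1, 1] onto a box in the input."""
    x1, y1, x2, y2 = box
    width, height = image_size
    sx = (x2 - x1) / width            # horizontal scale
    sy = (y2 - y1) / height           # vertical scale
    tx = (x1 + x2) / width - 1.0      # box center x in normalized coords
    ty = (y1 + y2) / height - 1.0     # box center y in normalized coords
    return np.array([[sx, 0.0, tx],
                     [0.0, sy, ty]])

def to_pixels(norm_xy, image_size):
    """Convert normalized [-1, 1] coordinates to pixel coordinates."""
    width, height = image_size
    return ((norm_xy[0] + 1.0) * width / 2.0,
            (norm_xy[1] + 1.0) * height / 2.0)

size = (400, 400)
theta = box_to_theta((100, 50, 200, 350), size)
# The corners of the output grid should land on the corners of the box.
top_left = to_pixels(theta @ np.array([-1.0, -1.0, 1.0]), size)
bottom_right = to_pixels(theta @ np.array([1.0, 1.0, 1.0]), size)
```

Sampling the input image at the transformed grid positions yields a crop of the person. Because the matrix is an ordinary differentiable parameter, a network can learn to adjust it and correct an inaccurate detection box.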


2017----PyraNet----Learning Feature Pyramids for Human Pose Estimation

2017----COCO2017 Keypoints winner----Cascaded Pyramid Network for Multi-Person Pose Estimation



Edited on 2018-01-27
