
YOLO Pose Detection: Input/Output Interface and Inference-Speed Tests

0. References:

Pose - Ultralytics YOLO Docs

The notes below cover only the model's input/output interface and its inference speed; training is not discussed.

Pose and segmentation are fairly natural extensions of class detection: classifying the pixels inside an object box gives segmentation, and predicting a point array inside an object box gives pose.

Face detection can run extremely fast, so segmentation and pose should be able to as well.

1. Measured speed comparison (yolo-class vs. yolo-pose)

Conclusion: the pose model runs about as fast as a plain detection model.

One pose run:

image 1/1 /dataset/pose/body.jpg: 448x640 2 persons, 44.7ms
Speed: 2.7ms preprocess, 44.7ms inference, 59.1ms postprocess per image at shape (1, 3, 448, 640)

For comparison, object detection only:

image 1/1 /dataset/pose/body.jpg: 448x640 2 persons, 48.0ms
Speed: 5.8ms preprocess, 48.0ms inference, 132.0ms postprocess per image at shape (1, 3, 448, 640)
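A quick way to compare end-to-end cost per image is to sum the three stages in each log line. A minimal parsing sketch; the regex assumes the Ultralytics "Speed: ..." log wording quoted above:

```python
import re

# The two "Speed: ..." log lines quoted above.
logs = {
    "pose":   "Speed: 2.7ms preprocess, 44.7ms inference, 59.1ms postprocess per image at shape (1, 3, 448, 640)",
    "detect": "Speed: 5.8ms preprocess, 48.0ms inference, 132.0ms postprocess per image at shape (1, 3, 448, 640)",
}

def total_ms(line):
    """Sum the per-stage timings (ms) reported in one Speed line."""
    return sum(float(ms) for ms in re.findall(r"([\d.]+)ms", line))

for name, line in logs.items():
    print(f"{name}: {total_ms(line):.1f} ms total per image")
# pose: 106.5 ms total per image
# detect: 185.8 ms total per image
```

In this single run the pose pipeline is actually faster end-to-end, mostly because of its lighter postprocess step; a single image is of course not a rigorous benchmark.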

2. Published speed comparison

At each model size, the pose model's published inference speed is close to that of the corresponding detection model.

| Model | size (pixels) | mAPval 50-95 | Speed CPU ONNX (ms) | Speed T4 TensorRT10 (ms) | params (M) | FLOPs (B) |
|---|---|---|---|---|---|---|
| YOLO11n | 640 | 39.5 | 56.1 ± 0.8 | 1.5 ± 0.0 | 2.6 | 6.5 |
| YOLO11s | 640 | 47.0 | 90.0 ± 1.2 | 2.5 ± 0.0 | 9.4 | 21.5 |
| YOLO11m | 640 | 51.5 | 183.2 ± 2.0 | 4.7 ± 0.1 | 20.1 | 68.0 |
| YOLO11l | 640 | 53.4 | 238.6 ± 1.4 | 6.2 ± 0.1 | 25.3 | 86.9 |
| YOLO11x | 640 | 54.7 | 462.8 ± 6.7 | 11.3 ± 0.2 | 56.9 | 194.9 |
| Model | size (pixels) | mAPpose 50-95 | mAPpose 50 | Speed CPU ONNX (ms) | Speed T4 TensorRT10 (ms) | params (M) | FLOPs (B) |
|---|---|---|---|---|---|---|---|
| YOLO11n-pose | 640 | 50.0 | 81.0 | 52.4 ± 0.5 | 1.7 ± 0.0 | 2.9 | 7.6 |
| YOLO11s-pose | 640 | 58.9 | 86.3 | 90.5 ± 0.6 | 2.6 ± 0.0 | 9.9 | 23.2 |
| YOLO11m-pose | 640 | 64.9 | 89.4 | 187.3 ± 0.8 | 4.9 ± 0.1 | 20.9 | 71.7 |
| YOLO11l-pose | 640 | 66.1 | 89.9 | 247.7 ± 1.1 | 6.4 ± 0.1 | 26.2 | 90.7 |
| YOLO11x-pose | 640 | 69.5 | 91.1 | 488.0 ± 13.9 | 12.1 ± 0.2 | 58.8 | 203.3 |
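The tables bear out the claim: comparing the T4 TensorRT10 columns, the pose head adds at most about 13% inference time at any model size. A small sketch computing the ratios from the table values:

```python
# T4 TensorRT10 inference times (ms) taken from the two tables above.
detect = {"n": 1.5, "s": 2.5, "m": 4.7, "l": 6.2, "x": 11.3}
pose   = {"n": 1.7, "s": 2.6, "m": 4.9, "l": 6.4, "x": 12.1}

# Relative inference-time overhead of the pose model at each size.
for size in detect:
    ratio = pose[size] / detect[size]
    print(f"YOLO11{size}-pose / YOLO11{size}: {ratio:.2f}x")
```

The largest overhead is on the nano model (1.7 / 1.5 ≈ 1.13x); the bigger variants stay within a few percent.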

3. Output data

3.1 The 17 keypoints listed in the YOLO docs:

  1. Nose
  2. Left Eye
  3. Right Eye
  4. Left Ear
  5. Right Ear
  6. Left Shoulder
  7. Right Shoulder
  8. Left Elbow
  9. Right Elbow
  10. Left Wrist
  11. Right Wrist
  12. Left Hip
  13. Right Hip
  14. Left Knee
  15. Right Knee
  16. Left Ankle
  17. Right Ankle

3.2 Actual detection output

Record length 56 = class ID (1 value; 0 = person) + box (x_center, y_center, width, height; 4 values) + 17 × (x, y, visibility)

0 0.60229 0.518654 0.305345 0.725004 0.5296 0.247867 0.989055 0.538345 0.229076 0.987131 0.523728 0.236682 0.803624 0.565252 0.220562 0.97476 0 0 0.178567 0.605952 0.272757 0.99945 0.524902 0.320555 0.996575 0.67903 0.325687 0.998379 0.477179 0.385596 0.969531 0.720728 0.381518 0.997152 0.460585 0.311368 0.971978 0.632899 0.481503 0.999804 0.596195 0.502827 0.999446 0.554051 0.568819 0.999557 0.600891 0.602427 0.998719 0.601888 0.723575 0.994931 0.714637 0.736034 0.991084
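The 56-value layout can be checked and sliced programmatically. A minimal sketch under that layout assumption; `split_record` and the dummy record are illustrative, not part of the YOLO API:

```python
# Layout of one 56-number pose record: class + box + 17 keypoint triples.
N_KEYPOINTS = 17
RECORD_LEN = 1 + 4 + N_KEYPOINTS * 3
assert RECORD_LEN == 56

def split_record(values):
    """Slice a flat 56-value record into (class_id, box, keypoints)."""
    assert len(values) == RECORD_LEN
    class_id = values[0]
    box = values[1:5]  # x_center, y_center, width, height
    kpts = [tuple(values[5 + 3 * i : 8 + 3 * i]) for i in range(N_KEYPOINTS)]
    return class_id, box, kpts

# Dummy record: class 0, centered box, all keypoints at (0.5, 0.5) with visibility 1.0.
record = [0.0, 0.5, 0.5, 0.5, 0.5] + [0.5, 0.5, 1.0] * N_KEYPOINTS
cid, box, kpts = split_record(record)
print(cid, box, len(kpts))  # 0.0 [0.5, 0.5, 0.5, 0.5] 17
```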

3.2.1 The same record, formatted

Class ID: 0.0
Bounding box (x_center, y_center, width, height): [0.60229, 0.518654, 0.305345, 0.725004]
Keypoints:
Nose: x=0.5296, y=0.247867, visibility=0.989055
Left Eye: x=0.538345, y=0.229076, visibility=0.987131
Right Eye: x=0.523728, y=0.236682, visibility=0.803624
Left Ear: x=0.565252, y=0.220562, visibility=0.97476
Right Ear: x=0.0, y=0.0, visibility=0.178567
Left Shoulder: x=0.605952, y=0.272757, visibility=0.99945
Right Shoulder: x=0.524902, y=0.320555, visibility=0.996575
Left Elbow: x=0.67903, y=0.325687, visibility=0.998379
Right Elbow: x=0.477179, y=0.385596, visibility=0.969531
Left Wrist: x=0.720728, y=0.381518, visibility=0.997152
Right Wrist: x=0.460585, y=0.311368, visibility=0.971978
Left Hip: x=0.632899, y=0.481503, visibility=0.999804
Right Hip: x=0.596195, y=0.502827, visibility=0.999446
Left Knee: x=0.554051, y=0.568819, visibility=0.999557
Right Knee: x=0.600891, y=0.602427, visibility=0.998719
Left Ankle: x=0.601888, y=0.723575, visibility=0.994931
Right Ankle: x=0.714637, y=0.736034, visibility=0.991084
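All coordinates above are normalized, so pixel positions come from multiplying by the image dimensions. A sketch, assuming the keypoints are normalized against the 448x640 (height x width) inference shape from the log earlier; if your label is normalized against the original image size, use those dimensions instead:

```python
# Convert a normalized keypoint to pixel coordinates.
# Image shape taken from the inference log: height 448, width 640.
IMG_W, IMG_H = 640, 448

def to_pixels(x_norm, y_norm):
    """Map normalized (x, y) in [0, 1] to integer pixel coordinates."""
    return round(x_norm * IMG_W), round(y_norm * IMG_H)

# Nose keypoint from the formatted record above.
print(to_pixels(0.5296, 0.247867))  # (339, 111)
```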

Appendix A: Pose record formatting script

# Names of the 17 body keypoints
keypoint_names = ["Nose", "Left Eye", "Right Eye", "Left Ear", "Right Ear",
                  "Left Shoulder", "Right Shoulder", "Left Elbow", "Right Elbow",
                  "Left Wrist", "Right Wrist", "Left Hip", "Right Hip",
                  "Left Knee", "Right Knee", "Left Ankle", "Right Ankle"]

# Example YOLO Pose training-label line
label_data_str = " 0 0.844458 0.649356 0.310934 0.681836 0 0 0.18451 0 0 0.0718303 0 0 0.140077 0 0 0.0899211 0 0 0.225879 0 0 0.0909999 0.91247 0.856484 0.628418 0 0 0.022336 0 0 0.262855 0 0 0.10164 0 0 0.426727 0 0 0.009039 0 0 0.0300809 0 0 0.0136058 0 0 0.035202 0 0 0.0130479 0 0 0.0247065"

# Convert the string into a list of floats
label_data = [float(num) for num in label_data_str.split()]

# Extract the class ID and bounding box
class_id = label_data[0]
bbox = label_data[1:5]

# Extract the keypoint triples
keypoints_data = label_data[5:]

# Make sure the keypoint count matches the name list
if len(keypoints_data) == len(keypoint_names) * 3:
    print(f"Class ID: {class_id}")
    print(f"Bounding box (x_center, y_center, width, height): {bbox}")
    print("Keypoints:")
    for i, name in enumerate(keypoint_names):
        x, y, visibility = keypoints_data[i * 3 : i * 3 + 3]
        print(f"{name}: x={x}, y={y}, visibility={visibility}")
else:
    print("Keypoint data length does not match the keypoint name list.")

 


