Parcel Grasp-Point Detection Based on YOLO11 Instance Segmentation and an Orbbec Camera
This blog post is by CSDN author 机器鱼; reposting without permission is not allowed.
Contents
0 Introduction
1 Orbbec Camera Test
1.1 First Steps with the SDK
1.2 Adapting the SDK Examples into the Program We Need
2 ONNX Runtime Inference for YOLO11s-seg
3 Conclusion
0 Introduction
The project uses a six-axis robot arm with a vacuum suction cup as the end effector to pick target parcels out of an unordered pile. The AI algorithm must provide the 3D coordinates and 3D orientation of each parcel's grasp point. Since parcels have multiple faces and most of them lie tilted, the algorithm needs the following steps to grasp reliably:
① Detect and segment each individual parcel from the color image of the pile, obtaining a 2D mask, Mask1;
② From the aligned depth image, extract the depth mask Mask2 corresponding to each parcel's Mask1;
③ Using the camera intrinsics, compute the 3D point cloud DisMask corresponding to Mask2 (the back-projection formula is written out below);
④ Remove outlier points from DisMask, then run plane segmentation and keep only the plane containing the most points;
⑤ Take that plane's centroid as the grasp point, and the angles between the plane's normal vector and the x, y, z axes as the end-effector grasp orientation.
In the end, six values are returned to the lower-level controller to drive the robot arm.
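For reference, steps ③ and ⑤ reduce to the standard pinhole back-projection and a normal-vector angle. Below is a quick sketch of the math that GetDistance and ProcessPointcloud implement later; the symbols are only shorthand introduced here (fx, fy, cx, cy are the depth intrinsics, d(u, v) the raw depth value at pixel (u, v), and s the depth value scale):

Z = d(u, v) · s,   X = (u − cx) · Z / fx,   Y = (v − cy) · Z / fy

With the fitted plane A·X + B·Y + C·Z + D = 0 and normal n = (A, B, C), the grasp orientation is obtained from angles such as θz = arccos( n·z / ‖n‖ ) for the z axis, and likewise for the x and y axes; together with the plane centroid (X, Y, Z) these make up the six returned values.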
Development is done on Windows. The software and SDKs I used are: cmake-3.28.1, VS2019, opencv-4.6.1, open3d-0.18.0, OrbbecSDK 1.10.18, CUDA 11.6, cuDNN 8.5.0, and onnxruntime-gpu 1.14.0. (I would normally use TensorRT since it is faster, but I have already published a blog on TensorRT inference for yolov8-seg, and yolov11-seg can reuse that directly, so writing it again would add little. I switched to onnxruntime-gpu instead; in practice it is not much slower than TensorRT and saves the ONNX-to-TensorRT conversion step, which is quite convenient. On my PC with an RTX 3060, yolo11s-seg runs in about 11 ms with TensorRT and around 24 ms with onnxruntime.)
The model is yolo11s-seg trained with the latest ultralytics release. The trained model, the exported ONNX file, and the training data can be downloaded 【here】. Training itself is covered everywhere online and is not the focus of this blog.
1 Orbbec Camera Test
1.1 First Steps with the SDK
For projects like this with modest accuracy requirements I used to use Intel RealSense cameras, but they keep getting more expensive, and Orbbec promotes itself heavily online, so I bought a second-hand Gemini Pro camera on Xianyu; it was co-developed by Orbbec and 轮趣科技.
First, install the Orbbec camera driver and SDK; the camera cannot be used until the driver is installed. Download address: 地址1. Since I am on Win11, I downloaded the win_x64 package.
I extracted it to D:/Project/OrbbecSDK/ (that is simply where I put it; any other path works). Inside there is a driver folder; just run the exe in it and click through the installer.
After installation, open CMake GUI and VS2019 (my CMake version is 3.28.1). First open CMakeLists.txt and change OpenCV_DIR to your actual OpenCV path; building and installing OpenCV is covered in plenty of tutorials online, so I will not repeat it here.
The SDK releases are available at https://github.com/orbbec/OrbbecSDK/releases. Next, open cmake-gui, configure and generate the VS project files following the usual steps, open the generated solution, and build ALL_BUILD.
After right-clicking ALL_BUILD and building, a number of exe files appear in the bin directory. Plug in the camera and run color_viewer; if you get an image, the SDK is working correctly. Next we rewrite the example into the program we need.
1.2 Adapting the SDK Examples into the Program We Need
In my case, the SDK wrapper needs the following capabilities:
① set parameters and initialize the camera; ② grab the color and depth images; ③ align the depth image to the color image; ④ take a detected box and mask and compute the 3D point cloud; ⑤ post-process the point cloud to obtain the 3D grasp point and 3D orientation.
First create camera.h and declare the required functions; the code is as follows:
#ifndef CAMERAUTIL_H
#define CAMERAUTIL_H
#include "libobsensor/ObSensor.hpp"
#include "libobsensor/hpp/Pipeline.hpp"
#include <Eigen/Dense>
#include <open3d/Open3D.h>
#include "opencv2/opencv.hpp"
#include <fstream>
#include <iostream>
#include <cmath>
#include <vector>
class Orbbec {
public:
    bool Init(void);                                  // initialize the camera
    bool GetRgbAndDepth(cv::Mat& src, cv::Mat& dep);  // grab the current frame and align the depth image with the color image
    void GetDistance(int x, int y);                   // print the x, y, z coordinates of a single pixel
    void GetDistance(int box[4], cv::Mat& mask, cv::Mat& dis_mask); // compute the 3D points of every masked pixel inside a box and assemble them into dis_mask
    void ProcessPointcloud(cv::Mat& dis_mask, cv::Mat imgcrop, std::vector<float>& result, float *ylength, float *xlength); // post-process the point cloud to obtain the grasp coordinates and grasp orientation
    void Postprocess(int box[4], cv::Mat& mask, cv::Mat srcimg, float result[10]); // post-process the extracted dis_mask; GetDistance and ProcessPointcloud are the helpers it calls
    void close(void);
private:
    ob::Pipeline pipe;
    OBCameraParam cameraParam;
    cv::Mat depth;
    float scale;
    ob::Align* align = nullptr;
};
#endif
Then create camera.cpp and implement the Orbbec functions; the code is as follows:
#include "camera.h"bool Orbbec::Init(void) {// Query the list of connected devices// Create a Context.ob::Context ctx;auto devList = ctx.queryDeviceList();// Get the number of connected devicesif (devList->deviceCount() == 0) {std::cerr << "Device not found!" << std::endl;return false;}// Configure which streams to enable or disable for the Pipeline by creating a Configstd::shared_ptr<ob::Config> config = std::make_shared<ob::Config>();// Get all stream profiles of the color camera, including stream resolution, frame rate, and frame formatauto colorProfiles = pipe.getStreamProfileList(OB_SENSOR_COLOR);// Get the default color profile//auto colorProfile = colorProfiles->getProfile(OB_PROFILE_DEFAULT);std::shared_ptr<ob::VideoStreamProfile> colorProfile = nullptr;colorProfile = colorProfiles->getVideoStreamProfile(640, 480, OB_FORMAT_RGB, 30);config->enableStream(colorProfile);// Get all stream profiles of the depth camera, including stream resolution, frame rate, and frame formatauto depthProfiles = pipe.getStreamProfileList(OB_SENSOR_DEPTH);// Get the default depth profile //auto depthProfile = depthProfiles->getProfile(OB_PROFILE_DEFAULT);std::shared_ptr<ob::VideoStreamProfile> depthProfile = nullptr;depthProfile = depthProfiles->getVideoStreamProfile(640, OB_HEIGHT_ANY, OB_FORMAT_Y11, 30);config->enableStream(depthProfile);// Configure the alignment mode as hardware D2C alignmentconfig->setAlignMode(ALIGN_D2C_HW_MODE);align = new ob::Align(OB_STREAM_COLOR);try {pipe.start(config);cameraParam = pipe.getCameraParam();}catch (ob::Error& e) {std::cerr << "function:" << e.getName() << "\nargs:" << e.getArgs() << "\nmessage:" << e.getMessage() << "\ntype:" << e.getExceptionType() << std::endl;return false;}return true;
};bool Orbbec::GetRgbAndDepth(cv::Mat& src, cv::Mat& dep) {auto frameset = pipe.waitForFrames(100);if (frameset == nullptr || frameset->depthFrame() == nullptr || frameset->colorFrame() == nullptr) {return false;}auto newFrame = align->process(frameset);auto newFrameSet = newFrame->as<ob::FrameSet>();auto colorFrame = newFrameSet->colorFrame();auto depthFrame = newFrameSet->depthFrame();uint32_t width = colorFrame->width();uint32_t height = colorFrame->height();cv::Mat rawMat(height, width, CV_8UC3, (uint16_t*)colorFrame->data(), cv::Mat::AUTO_STEP);cv::cvtColor(rawMat, src, cv::COLOR_RGB2BGR);width = depthFrame->width();height = depthFrame->height();scale = depthFrame->getValueScale();uint16_t* data = (uint16_t*)depthFrame->data();cv::Mat depimg(height, width, CV_16U, data, cv::Mat::AUTO_STEP);depimg.copyTo(depth);// 将深度图转换为8位图像,进行可视化cv::Mat depth_image_8bit;double minVal, maxVal;cv::Point minLoc, maxLoc;cv::minMaxLoc(depimg, &minVal, &maxVal, &minLoc, &maxLoc);cv::convertScaleAbs(depimg, depth_image_8bit, 255.0 / maxVal);cv::applyColorMap(depth_image_8bit, dep, cv::COLORMAP_JET);//float centerDistance = data[width * height / 2 + width / 2] * scale;return true;
};
void Orbbec::GetDistance(int x, int y) {
    float Z = (float)depth.at<uint16_t>(y, x) * scale;
    float X = (x - cameraParam.depthIntrinsic.cx) * Z / cameraParam.depthIntrinsic.fx;
    float Y = (y - cameraParam.depthIntrinsic.cy) * Z / cameraParam.depthIntrinsic.fy;
    printf("X,Y,Z:%f,%f,%f\n", X, Y, Z);
}

void Orbbec::GetDistance(int box[4], cv::Mat& mask, cv::Mat& dis_mask) {
    for (int i = box[0]; i < box[2]; i++) {
        for (int j = box[1]; j < box[3]; j++) {
            if (mask.at<uchar>(j - box[1], i - box[0]) != 0) {
                float upoint[3];
                upoint[2] = (float)depth.at<uint16_t>(j, i) * scale / 1000.f; // convert to meters
                upoint[0] = (i - cameraParam.depthIntrinsic.cx) * upoint[2] / cameraParam.depthIntrinsic.fx;
                upoint[1] = (j - cameraParam.depthIntrinsic.cy) * upoint[2] / cameraParam.depthIntrinsic.fy;
                dis_mask.at<cv::Vec3f>(j - box[1], i - box[0]) = cv::Vec3f(upoint[0], upoint[1], upoint[2]);
                if (upoint[2] == 0.0f) {
                    mask.at<uchar>(j - box[1], i - box[0]) = 0; // where the depth value is 0, zero out the mask as well
                }
            }
        }
    }
}

void Orbbec::Postprocess(int box[4], cv::Mat& mask, cv::Mat srcimg, float result[10]) {
    cv::Rect r = cv::Rect(box[0], box[1], box[2] - box[0], box[3] - box[1]);
    cv::Mat imgcrop = srcimg(r).clone();
    mask = mask(r).clone();
    cv::Mat dis_mask = cv::Mat::zeros(r.height, r.width, CV_32FC3);
    GetDistance(box, mask, dis_mask);
    //cv::Mat dis_show;
    //cv::convertScaleAbs(dis_mask, dis_show, 255.0 / 1000);
    //cv::applyColorMap(dis_show, dis_show, cv::COLORMAP_JET);
    //cv::imshow("dismask", dis_show);
    // Post-process the point cloud with Open3D.
    // Plane segmentation first: a parcel's segmentation result may include side faces, so we fit planes,
    // keep the largest one, and use the center of that face as the grasp point and its normal vector as the orientation.
    float ylength = 0, xlength = 0;
    std::vector<float> Normals_and_Axis;
    ProcessPointcloud(dis_mask, imgcrop, Normals_and_Axis, &ylength, &xlength);
    // Project the 3D grasp point back onto the color image to check visually that the right point is grasped
    float CX = Normals_and_Axis[3] * cameraParam.depthIntrinsic.fx / Normals_and_Axis[5] + cameraParam.depthIntrinsic.cx;
    float CY = Normals_and_Axis[4] * cameraParam.depthIntrinsic.fy / Normals_and_Axis[5] + cameraParam.depthIntrinsic.cy;
    // From the plane's length and width measured in the point cloud, back-compute the 2D minimal bounding box
    // in the color image, used to check that the right plane was selected
    float Width = 0, Height = 0;
    if (Normals_and_Axis[5] > 0) { // only when there is a valid depth value
        // from xlength and ylength, back-compute the box width and height; together with CX and CY this forms the 2D bounding box of the grasp plane
        Width = xlength * cameraParam.depthIntrinsic.fx / Normals_and_Axis[5];
        Height = ylength * cameraParam.depthIntrinsic.fy / Normals_and_Axis[5];
    }
    for (int i = 0; i < 6; i++) {
        result[i] = Normals_and_Axis[i];
    }
    result[6] = CX;
    result[7] = CY;
    result[8] = Width;
    result[9] = Height;
}
void Orbbec::ProcessPointcloud(cv::Mat& dis_mask, cv::Mat image, std::vector<float>& result, float* ylength, float* xlength)
{
    // dis_mask to point cloud
    auto outputPC = std::make_shared<open3d::geometry::PointCloud>();
    int num_points = dis_mask.cols * dis_mask.rows;
    for (int i = 0; i < dis_mask.rows; i++) {
        for (int j = 0; j < dis_mask.cols; j++) {
            Eigen::Vector3d pc = { dis_mask.at<cv::Vec3f>(i, j)[0], dis_mask.at<cv::Vec3f>(i, j)[1], dis_mask.at<cv::Vec3f>(i, j)[2] };
            Eigen::Vector3d clr = { (float)image.at<cv::Vec3b>(i, j)[2] / 255.0, (float)image.at<cv::Vec3b>(i, j)[1] / 255.0, (float)image.at<cv::Vec3b>(i, j)[0] / 255.0 };
            outputPC->points_.push_back(pc);
            outputPC->colors_.push_back(clr);
        }
    }
    //open3d::visualization::DrawGeometries({ outputPC }, "outputPC", 800, 450);
    // Outlier removal: discard invalid points whose z distance is 0
    std::vector<size_t> indices;
    for (size_t i = 0; i < outputPC->points_.size(); i++) {
        if (outputPC->points_[i][2] == 0.0) {
            indices.push_back(i);
        }
    }
    outputPC = outputPC->SelectByIndex(indices, true);
    if (outputPC->points_.size() <= 3) {
        result = { 0,0,0,0,0,0,0 };
        printf("points size too small\n");
        return;
    }
    // Plane segmentation: keep the largest plane
    auto start = std::chrono::system_clock::now();
    double tDis = 0.01;  // distance threshold
    int minNum = 3;      // minimum number of points
    int numIter = 200;   // number of iterations
    std::tuple<Eigen::Vector4d, std::vector<size_t>> vRes = outputPC->SegmentPlane(tDis, minNum, numIter);
    // Plane parameters: AX+BY+CZ+D=0  ==>  Z = -(AX+BY+D)/C
    Eigen::Vector4d para = std::get<0>(vRes);
    //printf("Segment plane:%.3fX+%.3fY+%.3fZ+%.3f=0\n", para[0], para[1], para[2], para[3]);
    // Inlier indices
    std::vector<size_t> selectedIndex = std::get<1>(vRes);
    std::shared_ptr<open3d::geometry::PointCloud> inPC = outputPC->SelectByIndex(selectedIndex, false);
    //open3d::visualization::DrawGeometries({ inPC }, "inPC", 800, 450);
    // Use the point-cloud centroid as the grasp coordinate
    Eigen::Vector3d Axis;
    Axis = inPC->GetCenter();
    // Compute the angles between the normal vector and the axes
    Eigen::Vector3d A(para[0], para[1], para[2]), X(-1, 0, 0), Y(0, -1, 0), Z(0, 0, 1);
    double angleX0, angleX, angleY, angleZ;
    double PI = 3.1415926;
    angleZ = std::acos(A.dot(Z) / (A.norm() * Z.norm())) * 180 / PI;
    angleX0 = std::acos(A.dot(X) / (A.norm() * X.norm())) * 180 / PI;
    angleY = std::acos(A.dot(Y) / (A.norm() * Y.norm())) * 180 / PI;
    angleX = std::atan2(A[1], A[0]) * 180 / PI;
    if (angleZ > 90) {
        angleZ = 180 - angleZ;
        X = { 1, 0, 0 };
        Y = { 0, 1, 0 };
        angleX0 = std::acos(A.dot(X) / (A.norm() * X.norm())) * 180 / PI;
        angleY = std::acos(A.dot(Y) / (A.norm() * Y.norm())) * 180 / PI;
        angleX = std::atan2(-A[1], A[0]) * 180 / PI;
    }
    //printf("angle:%f,%f,%f,xplane:%f\n", angleX0, angleY, angleZ, angleX);
    // angleX is the angle between the normal projected onto the xoy plane and the negative x direction:
    // clockwise is 0 to 180, counter-clockwise is 0 to -180
    result.push_back(angleX0);
    result.push_back(angleY);
    result.push_back(angleZ);
    result.push_back(Axis[0]);
    result.push_back(Axis[1]);
    result.push_back(Axis[2]);
    auto end = std::chrono::system_clock::now();
    //std::cout << "open3d process time: " << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() << "ms" << std::endl;
    // Bounds of the segmented plane
    Eigen::Vector3d min_pt, max_pt;
    min_pt = inPC->GetMinBound();
    max_pt = inPC->GetMaxBound();
    double Xmin = min_pt[0], Ymin = min_pt[1], Zmin = min_pt[2];
    double Xmax = max_pt[0], Ymax = max_pt[1], Zmax = max_pt[2];
    *ylength = (float)Ymax - (float)Ymin;
    *xlength = (float)Xmax - (float)Xmin;
    // Visualize the result
    if (0) {
        // Outliers in green
        std::shared_ptr<open3d::geometry::PointCloud> outPC = outputPC->SelectByIndex(selectedIndex, true);
        const Eigen::Vector3d colorOut = { 0,1,0 };
        outPC->PaintUniformColor(colorOut);
        // open3d::visualization::DrawGeometries({ outPC }, "outPC", 800, 450);
        auto Plane = std::make_shared<open3d::geometry::PointCloud>();
        int width = 20, height = 20;
        for (int i = 0; i < width; i++) {
            double x = Xmin + (Xmax - Xmin) / width * i;
            for (int j = 0; j < height; j++) {
                double y = Ymin + (Ymax - Ymin) / height * j;
                double z = -(para[0] * x + para[1] * y + para[3]) / para[2];
                Eigen::Vector3d pc = { x, y, z };
                Plane->points_.push_back(pc);
            }
        }
        const Eigen::Vector3d color = { 1,0,1 };
        Plane->PaintUniformColor(color);
        // Gripper direction
        auto lineCloud = std::make_shared<open3d::geometry::PointCloud>();
        lineCloud->points_.resize(20);
        for (size_t i = 0; i < lineCloud->points_.size(); ++i) {
            lineCloud->points_[i] = Axis - i * 0.01 * A;
        }
        lineCloud->PaintUniformColor(color);
        // Grasp point
        auto mesh = open3d::geometry::TriangleMesh::CreateSphere(0.01);
        mesh->ComputeVertexNormals();
        mesh->PaintUniformColor(Eigen::Vector3d(0.1, 0.1, 0.1));
        mesh->Translate(Axis); // center coordinate
        auto mesh1 = open3d::geometry::TriangleMesh::CreateCoordinateFrame(0.1);
        mesh1->ComputeVertexNormals();
        // Rotate to the camera view for easier viewing
        Eigen::Matrix4d transform;
        transform << 1.0, 0.0, 0.0, 0.0,
                     0.0, -1.0, 0.0, 0.0,
                     0.0, 0.0, -1.0, 0.0,
                     0.0, 0.0, 0.0, 1.0;
        inPC->Transform(transform);
        outPC->Transform(transform);
        lineCloud->Transform(transform);
        Plane->Transform(transform);
        mesh->Transform(transform);
        mesh1->Transform(transform);
        // open3d::io::WritePointCloudToPCD("inPC.pcd", *inPC, false);
        // open3d::io::WritePointCloudToPCD("outPC.pcd", *outPC, false);
        // open3d::io::WritePointCloudToPCD("lineCloud.pcd", *lineCloud, false);
        // open3d::io::WritePointCloudToPCD("Plane.pcd", *Plane, false);
        open3d::visualization::DrawGeometries({ inPC, outPC, lineCloud, Plane, mesh, mesh1 }, "Result", 800, 450);
    }
}

void Orbbec::close(void) {
    pipe.stop();
};
Next, create a main program to call the functions above, named main1_test_camera.cpp; the main code is as follows:
#include "camera.h"
// #include "opencv2/opencv.hpp"
#include <fstream>
#include <iostream>
#include <cmath>
#include <conio.h>
using namespace std;

int main(int argc, char** argv) try {
    Orbbec camera;
    camera.Init();
    while (true) {
        cv::Mat src, dep;
        if (!camera.GetRgbAndDepth(src, dep)) {
            continue;
        }
        cv::Mat show = src.clone();
        //cv::addWeighted(src, 0.5, dep, 0.8, 0.5, show);
        int box[4] = { 280, 200, 360, 280 }; // xmin, ymin, xmax, ymax of the detection box
        cv::Mat mask = cv::Mat::ones(cv::Size(src.cols, src.rows), CV_8UC1);
        float result[10] = { 0 };
        camera.Postprocess(box, mask, src, result);
        printf("ax,ay,az,x,y,z,:%f,%f,%f,%f,%f,%f\n", result[0], result[1], result[2], result[3], result[4], result[5]);
        cv::Point p1((int)(0.5 * box[0] + 0.5 * box[2]), (int)(0.5 * box[1] + 0.5 * box[3]));
        cv::circle(show, p1, 3, cv::Scalar(0, 255, 255), 3); // center point
        cv::Point p2((int)result[6], (int)result[7]);
        cv::circle(show, p2, 3, cv::Scalar(0, 0, 255), 3); // pick point
        cv::Rect rdraw0 = cv::Rect((int)box[0], (int)box[1], (int)(box[2] - box[0]), (int)(box[3] - box[1]));
        cv::rectangle(show, rdraw0, cv::Scalar(0, 255, 255), 2); // detection box
        cv::Rect rdraw1 = cv::Rect((int)(result[6] - 0.5 * result[8]), (int)(result[7] - 0.5 * result[9]), (int)result[8], (int)result[9]);
        cv::rectangle(show, rdraw1, cv::Scalar(0, 0, 255), 2); // post-processed box
        printf("-------------------------------------------------\n");
        cv::imshow("show", show);
        char c = cv::waitKey(1);
        if (c == 27) {
            cv::destroyAllWindows();
            break;
        }
    }
    camera.close();
    return 0;
}
catch (ob::Error& e) {
    std::cerr << "function:" << e.getName() << "\nargs:" << e.getArgs() << "\nmessage:" << e.getMessage() << "\ntype:" << e.getExceptionType() << std::endl;
    exit(EXIT_FAILURE);
}
Finally, create a CMakeLists.txt to configure the environment, as shown below; adjust OpenCV_DIR, OrbbecSDK_ROOT_DIR, and Open3D_DIR to match your setup.
cmake_minimum_required(VERSION 3.10)
project(orbbec)
add_definitions(-std=c++11)
add_definitions(-DAPI_EXPORTS)
set(CMAKE_CXX_STANDARD 11)
set(CMAKE_BUILD_TYPE Release)
set(CMAKE_RUNTIME_OUTPUT_DIRECTORY ${PROJECT_SOURCE_DIR}/bin)

include_directories(include/)
link_directories(lib/) # extra library search path, equivalent to g++'s -L option
include_directories(${PROJECT_SOURCE_DIR}/src/)
file(GLOB_RECURSE SRCS ${PROJECT_SOURCE_DIR}/src/*.cpp)

# opencv
set(OpenCV_DIR D:/Project/opencv/build/)
find_package(OpenCV)
include_directories(${OpenCV_INCLUDE_DIRS})
include_directories(${OpenCV_DIR}/include/)
link_directories(${OpenCV_DIR}/x64/vc15/lib/)

# orbbec
set(OrbbecSDK_ROOT_DIR D:/Project/OrbbecSDK/SDK)
set(OrbbecSDK_LIBRARY_DIRS ${OrbbecSDK_ROOT_DIR}/lib)
set(OrbbecSDK_INCLUDE_DIR ${OrbbecSDK_ROOT_DIR}/include)
include_directories(${OrbbecSDK_INCLUDE_DIR})
link_directories(${OrbbecSDK_LIBRARY_DIRS})

# open3d
set(Open3D_DIR D:/Project/open3d-0.18.0/CMake/)
find_package(Open3D REQUIRED)
include_directories(${Open3D_INCLUDE_DIRS})
link_directories(${Open3D_LIBRARY_DIRS})

add_executable(test main1_test_camera.cpp ${SRCS})
target_link_libraries(test OrbbecSDK ${OpenCV_LIBS})
target_link_libraries(test ${Open3D_LIBRARIES})
target_include_directories(test PUBLIC ${Open3D_INCLUDE_DIRS})
Finally, generate the executable with CMake GUI and VS2019. When running the program, put these three DLL files in the same directory as test.exe.
2 ONNX Runtime Inference for YOLO11s-seg
Straight to the code. First create Segment.h, which declares the functions used for inference:
#ifndef SEGMENT_H
#define SEGMENT_H
#include <fstream>
#include <sstream>
#include <iostream>
#include <opencv2/imgproc.hpp>
#include <opencv2/highgui.hpp>
//#include <cuda_provider_factory.h>
#include <onnxruntime_cxx_api.h>
#include <chrono>
using namespace std;
using namespace Ort;
struct BoxInfo
{
    float x;
    float y;
    float w;
    float h;
    float score;
    int label;
    float mask[32];
};

struct Net_config
{
    float confThreshold; // Confidence threshold
    float nmsThreshold;  // Non-maximum suppression threshold
    std::string modelpath;
    std::string labelpath;
};

constexpr static int num_class = 1; // we have only one class; change this for multi-class models
const static int kOutputSize = num_class + 4 + 32;
class YOLOV8
{
public:
    YOLOV8(Net_config config); // load the model
    void preprocess_img(cv::Mat& img, cv::Mat& dis, int input_w, int input_h); // letterbox-resize the image to 640*640
    void normalize_(cv::Mat img); // divide by 255 to normalize, and convert rgbrgbrgb to rrrgggbbb
    void detect(cv::Mat& frame, std::vector<BoxInfo>& generate_boxes, std::vector<cv::Mat>& masks); // run detection and post-processing on the input image to get each target's box and mask
    float iou(float lbox[4], float rbox[4]); // IoU computation
    void nms(vector<BoxInfo>& input_boxes);  // NMS
    std::vector<cv::Mat> process_mask(const float* proto, int proto_size, std::vector<BoxInfo>& dets, cv::Mat img); // post-process to get the masks
    cv::Rect get_downscale_rect(float bbox[4], float scale); // downscale a box from the 640*640 input to the 160*160 feature map
    cv::Mat scale_mask(cv::Mat mask, cv::Mat img); // map a mask from the 640*640 input back to the original image; simply the inverse of preprocess_img
    cv::Rect get_rect(cv::Mat& img, float bbox[4]); // map a box from the 640*640 input back to the original image; simply the inverse of preprocess_img
private:
    int inpWidth;
    int inpHeight;
    int nout;
    int num_proposal;
    std::vector<string> class_names;
    float confThreshold;
    float nmsThreshold;
    std::vector<float> input_image_; // ONNX Runtime receives the input data as a vector
    Env env = Env(ORT_LOGGING_LEVEL_ERROR, "YOLOV8");
    Ort::Session* ort_session = nullptr;
    SessionOptions sessionOptions = SessionOptions();
    std::vector<char*> input_names;
    std::vector<char*> output_names;
    std::vector<vector<int64_t>> input_node_dims;  // >=1 inputs
    std::vector<vector<int64_t>> output_node_dims; // >=1 outputs
    const std::vector<uint32_t> colors = { 0xFF3838, 0xFF9D97, 0xFF701F, 0xFFB21D, 0xCFD231, 0x48F90A,
        0x92CC17, 0x3DDB86, 0x1A9334, 0x00D4BB, 0x2C99A8, 0x00C2FF,
        0x344593, 0x6473FF, 0x0018EC, 0x8438FF, 0x520085, 0xCB38FF,
        0xFF95C8, 0xFF37C7 };
};
#endif
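Before implementing these functions, it is worth writing down the tensor layout the code below assumes, namely a single-class yolo11s-seg exported at a 640×640 input (read the actual shapes from your own ONNX file if they differ):
output0 is 1 × (4 + num_class + 32) × 8400, i.e. 1 × 37 × 8400 here: for each of the 8400 proposals (80·80 + 40·40 + 20·20 anchor points) there are 4 box values (cx, cy, w, h), one class score, and 32 mask coefficients.
output1 is 1 × 32 × 160 × 160, the mask prototypes at 1/4 of the input resolution.
An instance's mask pixel is then M(x, y) = sigmoid( Σ_j c_j · P_j(x, y) ), thresholded at 0.5, where c_j are that proposal's 32 coefficients and P_j the prototype channels; this is exactly the multiply-accumulate loop in process_mask below.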
Then create Segment.cpp and implement these functions; the code is as follows:
#include "Segment.h"YOLOV8::YOLOV8(Net_config config)
{confThreshold = config.confThreshold;nmsThreshold = config.nmsThreshold;string classesFile = config.labelpath;string model_path = config.modelpath;std::wstring widestr = std::wstring(model_path.begin(), model_path.end());OrtStatus* status = OrtSessionOptionsAppendExecutionProvider_CUDA(sessionOptions, 0);sessionOptions.SetGraphOptimizationLevel(ORT_ENABLE_BASIC);ort_session = new Session(env, widestr.c_str(), sessionOptions);size_t numInputNodes = ort_session->GetInputCount();size_t numOutputNodes = ort_session->GetOutputCount();AllocatorWithDefaultOptions allocator;for (int i = 0; i < numInputNodes; i++){AllocatedStringPtr input_name_Ptr = ort_session->GetInputNameAllocated(i, allocator);input_names.push_back(input_name_Ptr.get());Ort::TypeInfo input_type_info = ort_session->GetInputTypeInfo(i);auto input_tensor_info = input_type_info.GetTensorTypeAndShapeInfo();auto input_dims = input_tensor_info.GetShape();input_node_dims.push_back(input_dims);}for (int i = 0; i < numOutputNodes; i++){AllocatedStringPtr output_name_Ptr = ort_session->GetOutputNameAllocated(i, allocator);output_names.push_back(output_name_Ptr.get());Ort::TypeInfo output_type_info = ort_session->GetOutputTypeInfo(i);auto output_tensor_info = output_type_info.GetTensorTypeAndShapeInfo();auto output_dims = output_tensor_info.GetShape();output_node_dims.push_back(output_dims);}inpHeight = input_node_dims[0][2];inpWidth = input_node_dims[0][3];nout = output_node_dims[0][1];num_proposal = output_node_dims[0][2];ifstream ifs(classesFile.c_str());string line;while (getline(ifs, line)) class_names.push_back(line);printf("classsize is:%d\n", num_class);assert(num_class = class_names.size());
}
cv::Rect YOLOV8::get_rect(cv::Mat& img, float bbox[4]) {
    float l, r, t, b;
    float r_w = inpWidth / (img.cols * 1.0);
    float r_h = inpHeight / (img.rows * 1.0);
    if (r_h > r_w) {
        l = bbox[0] - bbox[2] / 2.f;
        r = bbox[0] + bbox[2] / 2.f;
        t = bbox[1] - bbox[3] / 2.f - (inpHeight - r_w * img.rows) / 2;
        b = bbox[1] + bbox[3] / 2.f - (inpHeight - r_w * img.rows) / 2;
        l = l / r_w;
        r = r / r_w;
        t = t / r_w;
        b = b / r_w;
    }
    else {
        l = bbox[0] - bbox[2] / 2.f - (inpWidth - r_h * img.cols) / 2;
        r = bbox[0] + bbox[2] / 2.f - (inpWidth - r_h * img.cols) / 2;
        t = bbox[1] - bbox[3] / 2.f;
        b = bbox[1] + bbox[3] / 2.f;
        l = l / r_h;
        r = r / r_h;
        t = t / r_h;
        b = b / r_h;
    }
    return cv::Rect(round(l), round(t), round(r - l), round(b - t));
}

cv::Rect YOLOV8::get_downscale_rect(float bbox[4], float scale) {
    float left = bbox[0] - bbox[2] / 2;
    float top = bbox[1] - bbox[3] / 2;
    float right = bbox[0] + bbox[2] / 2;
    float bottom = bbox[1] + bbox[3] / 2;
    left /= scale;
    top /= scale;
    right /= scale;
    bottom /= scale;
    return cv::Rect(round(left), round(top), round(right - left), round(bottom - top));
}

cv::Mat YOLOV8::scale_mask(cv::Mat mask, cv::Mat img) {
    int x, y, w, h;
    float r_w = inpWidth / (img.cols * 1.0);
    float r_h = inpHeight / (img.rows * 1.0);
    if (r_h > r_w) {
        w = inpWidth;
        h = r_w * img.rows;
        x = 0;
        y = (inpHeight - h) / 2;
    }
    else {
        w = r_h * img.cols;
        h = inpHeight;
        x = (inpWidth - w) / 2;
        y = 0;
    }
    cv::Rect r(x, y, w, h);
    cv::Mat res;
    cv::resize(mask(r), res, img.size());
    return res;
}

std::vector<cv::Mat> YOLOV8::process_mask(const float* proto, int proto_size, std::vector<BoxInfo>& dets, cv::Mat img) {
    std::vector<cv::Mat> masks;
    // proto: 32 x (H/4) x (W/4)
    //auto start = std::chrono::system_clock::now();
    for (size_t i = 0; i < dets.size(); i++) {
        cv::Mat mask_mat = cv::Mat::zeros(inpHeight / 4, inpWidth / 4, CV_8UC1); // 160*160
        float bbox[4] = { dets[i].x, dets[i].y, dets[i].w, dets[i].h };
        auto r = get_downscale_rect(bbox, 4); // downscale the box from the 640*640 input to the 160*160 feature map
        for (int x = r.x; x < r.x + r.width; x++) {
            for (int y = r.y; y < r.y + r.height; y++) {
                float e = 0.0f;
                for (int j = 0; j < 32; j++) {
                    // 1x32 times 1*32*160*160 = 1*160*160: each pixel on the 160*160 feature map has 32 values,
                    // which are multiplied with the 32 mask coefficients and summed.
                    // 1*32*160*160 is stored in memory as 160*160, 160*160, 160*160, ...,
                    // so each multiplication has to skip ahead by 160*160 elements.
                    e += dets[i].mask[j] * proto[j * proto_size / 32 + y * mask_mat.cols + x];
                }
                e = 1.0f / (1.0f + expf(-e));
                if (e > 0.5) {
                    mask_mat.at<uchar>(y, x) = 1;
                }
            }
        }
        // resize the 160*160 mask to 640*640
        cv::resize(mask_mat, mask_mat, cv::Size(inpWidth, inpHeight), 0, 0, 1);
        // then map the 640*640 mask back to the original image; simply the inverse of preprocess_img
        cv::Mat img_mask = scale_mask(mask_mat, img);
        // map the 640*640 box back to the original image; simply the inverse of preprocess_img
        r = get_rect(img, bbox);
        cv::Rect holeImgRect(0, 0, img.cols, img.rows);
        r = r & holeImgRect;
        dets[i].x = r.x;
        dets[i].y = r.y;
        dets[i].w = r.width;
        dets[i].h = r.height;
        masks.push_back(img_mask);
    }
    return masks;
}
void YOLOV8::normalize_(cv::Mat img)
{
    int row = img.rows;
    int col = img.cols;
    input_image_.resize(row * col * img.channels());
    // bgrbgrbgr to rrrgggbbb
    for (int c = 0; c < 3; c++)
    {
        for (int i = 0; i < row; i++)
        {
            for (int j = 0; j < col; j++)
            {
                float pix = img.ptr<uchar>(i)[j * 3 + 2 - c];
                input_image_[c * row * col + i * col + j] = pix / 255.0;
            }
        }
    }
}
float YOLOV8::iou(float lbox[4], float rbox[4]) {
    float interBox[] = {
        (std::max)(lbox[0] - lbox[2] / 2.f, rbox[0] - rbox[2] / 2.f), // left
        (std::min)(lbox[0] + lbox[2] / 2.f, rbox[0] + rbox[2] / 2.f), // right
        (std::max)(lbox[1] - lbox[3] / 2.f, rbox[1] - rbox[3] / 2.f), // top
        (std::min)(lbox[1] + lbox[3] / 2.f, rbox[1] + rbox[3] / 2.f), // bottom
    };
    if (interBox[2] > interBox[3] || interBox[0] > interBox[1])
        return 0.0f;
    float interBoxS = (interBox[1] - interBox[0]) * (interBox[3] - interBox[2]);
    return interBoxS / (lbox[2] * lbox[3] + rbox[2] * rbox[3] - interBoxS);
}
void YOLOV8::nms(vector<BoxInfo>& input_boxes)
{
    sort(input_boxes.begin(), input_boxes.end(), [](BoxInfo a, BoxInfo b) { return a.score > b.score; });
    vector<bool> isSuppressed(input_boxes.size(), false);
    for (int i = 0; i < int(input_boxes.size()); ++i)
    {
        if (isSuppressed[i]) { continue; }
        for (int j = i + 1; j < int(input_boxes.size()); ++j)
        {
            if (isSuppressed[j]) { continue; }
            float lbox[4] = { input_boxes.at(i).x, input_boxes.at(i).y, input_boxes.at(i).w, input_boxes.at(i).h };
            float rbox[4] = { input_boxes.at(j).x, input_boxes.at(j).y, input_boxes.at(j).w, input_boxes.at(j).h };
            float ovr = iou(lbox, rbox);
            if (ovr >= nmsThreshold)
            {
                isSuppressed[j] = true;
            }
        }
    }
    int idx_t = 0;
    input_boxes.erase(remove_if(input_boxes.begin(), input_boxes.end(), [&idx_t, &isSuppressed](const BoxInfo& f) { return isSuppressed[idx_t++]; }), input_boxes.end());
}
void YOLOV8::preprocess_img(cv::Mat& img, cv::Mat& dis, int input_w, int input_h) {
    int w, h, x, y;
    float r_w = input_w / (img.cols * 1.0);
    float r_h = input_h / (img.rows * 1.0);
    if (r_h > r_w) { // the image is wider than it is tall
        w = input_w;
        h = r_w * img.rows;
        x = 0;
        y = (input_h - h) / 2;
    }
    else {
        w = r_h * img.cols;
        h = input_h;
        x = (input_w - w) / 2;
        y = 0;
    }
    cv::Mat re(h, w, CV_8UC3);
    cv::resize(img, re, re.size(), 0, 0, cv::INTER_LINEAR);
    cv::Mat out(input_h, input_w, CV_8UC3, cv::Scalar(128, 128, 128));
    re.copyTo(out(cv::Rect(x, y, re.cols, re.rows)));
    out.copyTo(dis);
}

void YOLOV8::detect(cv::Mat& frame, vector<BoxInfo>& generate_boxes, std::vector<cv::Mat>& masks)
{
    cv::Mat dstimg;
    preprocess_img(frame, dstimg, inpWidth, inpHeight);
    normalize_(dstimg);
    array<int64_t, 4> input_shape_{ 1, 3, inpHeight, inpWidth };
    auto allocator_info = MemoryInfo::CreateCpu(OrtDeviceAllocator, OrtMemTypeCPU);
    Value input_tensor_ = Value::CreateTensor<float>(allocator_info, input_image_.data(), input_image_.size(), input_shape_.data(), input_shape_.size());
    // run inference
    const std::array<const char*, 1> inputNames = { "images" };
    const std::array<const char*, 2> outNames = { "output0", "output1" };
    vector<Value> ort_outputs = ort_session->Run(Ort::RunOptions{ nullptr }, inputNames.data(), &input_tensor_, 1, outNames.data(), outNames.size());
    // generate proposals
    int n = 0, k = 0;
    const float* pdata = ort_outputs[0].GetTensorMutableData<float>();  // 1 x (4+num_class+32) x 8400
    const float* pdata1 = ort_outputs[1].GetTensorMutableData<float>(); // 1 x 32 x 160 x 160, mask prototypes
    for (n = 0; n < num_proposal; n++)
    {
        float out[kOutputSize]; // 4 + class + 32
        float mask[32];
        for (int j = 0; j < nout; j++) {
            out[j] = pdata[n + num_proposal * j];
        }
        memcpy(&mask, &out[4 + num_class], 32 * sizeof(float));
        int max_ind = 0;
        float max_class_socre = 0;
        // find the largest class score
        for (k = 0; k < num_class; k++)
        {
            if (out[k + 4] > max_class_socre)
            {
                max_class_socre = out[k + 4];
                max_ind = k;
            }
        }
        if (max_class_socre > confThreshold)
        {
            float cx = out[0];
            float cy = out[1];
            float w = out[2];
            float h = out[3];
            BoxInfo det;
            det.x = cx;
            det.y = cy;
            det.w = w;
            det.h = h;
            det.score = max_class_socre;
            det.label = max_ind;
            for (int jj = 0; jj < 32; jj++) {
                det.mask[jj] = mask[jj];
            }
            generate_boxes.push_back(det);
        }
    }
    nms(generate_boxes);
    int kOutputSize2 = 32 * (inpHeight / 4) * (inpWidth / 4);
    masks = process_mask(pdata1, kOutputSize2, generate_boxes, frame);
    // draw on the original frame
    for (size_t i = 0; i < generate_boxes.size(); i++) {
        cv::Mat img_mask = masks[i];
        auto color = colors[(int)generate_boxes[i].label % colors.size()];
        auto bgr = cv::Scalar(color & 0xFF, color >> 8 & 0xFF, color >> 16 & 0xFF);
        cv::Rect r = cv::Rect((int)generate_boxes[i].x, (int)generate_boxes[i].y, (int)generate_boxes[i].w, (int)generate_boxes[i].h);
        cv::rectangle(frame, r, bgr, 2);
        // draw the mask on the original image
        for (int x = r.x; x < r.x + r.width; x++) {
            for (int y = r.y; y < r.y + r.height; y++) {
                float val = img_mask.at<uchar>(y, x);
                if (val == 0) continue;
                frame.at<cv::Vec3b>(y, x)[0] = frame.at<cv::Vec3b>(y, x)[0] / 2 + bgr[0] / 2;
                frame.at<cv::Vec3b>(y, x)[1] = frame.at<cv::Vec3b>(y, x)[1] / 2 + bgr[1] / 2;
                frame.at<cv::Vec3b>(y, x)[2] = frame.at<cv::Vec3b>(y, x)[2] / 2 + bgr[2] / 2;
            }
        }
        char label[100];
        sprintf(label, "%s:%.2f", class_names[generate_boxes[i].label].c_str(), generate_boxes[i].score);
        cv::putText(frame, label, cv::Point(r.x, r.y - 4), cv::FONT_HERSHEY_PLAIN, 1.2, cv::Scalar(128, 128, 255), 2);
    }
}
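As a quick sanity check of the letterbox math (just arithmetic, assuming the 640×480 color stream configured earlier and the 640×640 model input): r_w = 640/640 = 1.0 and r_h = 640/480 ≈ 1.33, so r_h > r_w, the image is resized to 640×480 and padded with x = 0, y = (640 − 480) / 2 = 80, i.e. 80 gray rows at the top and bottom. get_rect and scale_mask simply undo this offset and scaling when mapping boxes and masks back onto the original frame.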
Finally, create a main program called main.cpp to tie everything together; the code is as follows:
#include <iostream>
#include <chrono>
#include <cmath>
#include <assert.h>
#include "Segment.h"
#include "camera.h"
#include <stdio.h>
using namespace std;

int main(int argc, char** argv) {
    Net_config YOLOV8_nets = { 0.9, 0.5, "D:/Project/inference/Orbbec_example/model/kuaidi.onnx", "D:/Project/inference/Orbbec_example/model/kuaidi.names" };
    YOLOV8 net(YOLOV8_nets);
    Orbbec camera;
    camera.Init();
    while (1)
    {
        cv::Mat src, dep;
        if (!camera.GetRgbAndDepth(src, dep)) {
            continue;
        }
        cv::Mat show = src.clone();
        // Run inference
        auto start = std::chrono::system_clock::now();
        std::vector<BoxInfo> res;
        std::vector<cv::Mat> masks;
        net.detect(src, res, masks);
        auto end = std::chrono::system_clock::now();
        std::cout << "inference time: " << std::chrono::duration_cast<std::chrono::milliseconds>(end - start).count() << "ms" << std::endl;
        if (res.size() == 0) {
            printf("no target detect\n");
        }
        cv::imshow("result", src);
        char c = cv::waitKey(1);
        for (int i = 0; i < res.size(); i++)
        {
            int box[4] = { (int)res[i].x, (int)res[i].y, (int)(res[i].x + res[i].w), (int)(res[i].y + res[i].h) };
            cv::Mat mask = masks[i].clone();
            float result[10] = { 0 };
            camera.Postprocess(box, mask, src, result);
            cv::Point p1((int)(0.5 * box[0] + 0.5 * box[2]), (int)(0.5 * box[1] + 0.5 * box[3]));
            cv::circle(show, p1, 3, cv::Scalar(0, 255, 255), 3); // center point
            cv::Point p2((int)result[6], (int)result[7]);
            cv::circle(show, p2, 3, cv::Scalar(0, 0, 255), 3); // pick point
            cv::Rect rdraw0 = cv::Rect((int)box[0], (int)box[1], (int)(box[2] - box[0]), (int)(box[3] - box[1]));
            cv::rectangle(show, rdraw0, cv::Scalar(0, 255, 255), 2); // detection box
            cv::Rect rdraw1 = cv::Rect((int)(result[6] - 0.5 * result[8]), (int)(result[7] - 0.5 * result[9]), (int)result[8], (int)result[9]);
            cv::rectangle(show, rdraw1, cv::Scalar(0, 0, 255), 2); // post-processed box
        }
        printf("-------------------------------------------------\n");
        cv::imshow("show", show);
        c = cv::waitKey(1);
        if (c == 27) {
            cv::destroyAllWindows();
            break;
        }
    }
    camera.close();
    printf("exit program\n");
    return 0;
}
Rewrite the CMakeLists.txt so that both the Section 1.2 program and this section's program can be built, as shown below:
cmake_minimum_required(VERSION 3.10)
project(orbbec)
add_definitions(-std=c++11)
add_definitions(-DAPI_EXPORTS)
set(CMAKE_CXX_STANDARD 11)
set(CMAKE_BUILD_TYPE Release)
set(CMAKE_RUNTIME_OUTPUT_DIRECTORY ${PROJECT_SOURCE_DIR}/bin)

include_directories(include/)
link_directories(lib/) # extra library search path, equivalent to g++'s -L option
include_directories(${PROJECT_SOURCE_DIR}/src/)
file(GLOB_RECURSE SRCS ${PROJECT_SOURCE_DIR}/src/*.cpp)

# opencv
set(OpenCV_DIR D:/Project/opencv/build/)
find_package(OpenCV)
include_directories(${OpenCV_INCLUDE_DIRS})
include_directories(${OpenCV_DIR}/include/)
link_directories(${OpenCV_DIR}/x64/vc15/lib/)

# orbbec
set(OrbbecSDK_ROOT_DIR D:/Project/OrbbecSDK/SDK)
set(OrbbecSDK_LIBRARY_DIRS ${OrbbecSDK_ROOT_DIR}/lib)
set(OrbbecSDK_INCLUDE_DIR ${OrbbecSDK_ROOT_DIR}/include)
include_directories(${OrbbecSDK_INCLUDE_DIR})
link_directories(${OrbbecSDK_LIBRARY_DIRS})

# open3d
set(Open3D_DIR D:/Project/open3d-0.18.0/CMake/)
find_package(Open3D REQUIRED)
include_directories(${Open3D_INCLUDE_DIRS})
link_directories(${Open3D_LIBRARY_DIRS})

# onnxruntime
set(ONNXRUNTIME_ROOT_DIR D:/Project/onnxruntime-win-x64-gpu-1.14.0/)
include_directories(${ONNXRUNTIME_ROOT_DIR}/include/)
link_directories(${ONNXRUNTIME_ROOT_DIR}/lib/)

add_executable(test main1_test_camera.cpp ${SRCS})
target_link_libraries(test OrbbecSDK ${OpenCV_LIBS})
target_link_libraries(test ${Open3D_LIBRARIES})
target_include_directories(test PUBLIC ${Open3D_INCLUDE_DIRS})
target_link_libraries(test onnxruntime)

add_executable(main main.cpp ${SRCS})
target_link_libraries(main OrbbecSDK ${OpenCV_LIBS})
target_link_libraries(main ${Open3D_LIBRARIES})
target_include_directories(main PUBLIC ${Open3D_INCLUDE_DIRS})
target_link_libraries(main onnxruntime)
As before, generate the executables with cmake-gui and VS. When running them, remember to copy the DLLs shown below into the same directory, otherwise the program will fail to start.
In addition, if you change the 0 to 1 in the if (0) block at the end of ProcessPointcloud in camera.cpp, the grasp point and orientation will be visualized.
3 Conclusion
In practice, the Orbbec camera's color image is quite dark; even after spending a long time tuning exposure and related parameters, it never looked as natural as the RealSense color image. This may well be specific to the camera model I bought.
Finally, this post is best suited to readers with some C++ background; at a minimum you should know how to use CMake and VS.