OpenCV DNN Batch Inference: A Guide for Image Classification
OpenCV features a powerful DNN module that is efficient and easy to use. To implement a deep learning inference application, we only need to call a few core APIs provided by the module. The basic workflow for implementing DNN inference with OpenCV is as follows:
- Initialization: Create a cv::dnn::Net object by reading the network weights (e.g., Caffe, ONNX, etc.).
- Preprocessing: Determine the input shape required by the network and resize the raw input images accordingly. This step usually includes operations like normalization and mean subtraction.
- Inference: Execute the forward pass using the created cv::dnn::Net object.
- Post-processing: Decode the output data and perform further business logic or data handling.
In my experience, since the network weights are fixed during deployment, the most critical parts are preprocessing and post-processing. You must determine the exact input shape and normalization parameters (mean/std values). Post-processing can be significantly more complex: while classification is straightforward, tasks like object detection or segmentation require substantial data wrangling. Sometimes, you may even need to rewrite certain operations from scratch because C++ lacks direct equivalents to specific Python implementations.
In this article, I will describe a simple implementation of image classification using the OpenCV DNN module and provide a quick guide to batch inference.
Important APIs
To load weights and create the DNN Net object, the OpenCV DNN module provides the generic readNet function, which supports ONNX, Caffe, TensorFlow, OpenVINO, and more, along with format-specific readers such as readNetFromCaffe.
In this example, we use a Caffe model:
Net cv::dnn::readNetFromCaffe(const String& prototxt,
                              const String& caffeModel = String())
Parameters:
- prototxt: Path to the .prototxt file containing the network architecture description.
- caffeModel: Path to the .caffemodel file containing the learned weights.
Returns: A Net object.
We also use blobFromImage to preprocess the input cv::Mat.
Mat cv::dnn::blobFromImage( InputArray image,
double scalefactor = 1.0,
const Size & size = Size(),
const Scalar & mean = Scalar(),
bool swapRB = false,
bool crop = false,
int ddepth = CV_32F)
Creates a 4-dimensional blob from an image. It optionally resizes and center-crops the image, subtracts mean values, scales pixel values, and swaps Blue and Red channels.
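Parameters:
- image: Input image (1, 3, or 4 channels).
- scalefactor: Multiplier for image values.
- size: Spatial size for the output image.
- mean: Scalar values subtracted from the channels. If the image is in BGR order and swapRB is true, the values should be given in (R, G, B) order.
- swapRB: Flag to swap the first and last channels (BGR to RGB).
- crop: Whether to crop the image after resizing. If true, the image is resized so that one side matches size and the other is equal or larger, then center-cropped; if false, it is resized directly without preserving the aspect ratio.
- ddepth: Depth of the output blob (CV_32F or CV_8U).
Returns: A 4-dimensional Mat with NCHW dimension order.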
Difference Between cv::Mat and Blob
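The primary difference lies in the memory layout. A standard 3-channel cv::Mat stores pixels interleaved in HWC order (e.g., BGRBGR...BGR, since OpenCV loads images in BGR order by default). A blob uses the planar NCHW layout (batch, channels, height, width), which is standard for most neural networks: all values of the first channel come first, then all values of the second channel, and so on.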
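To make the layout difference concrete, here is a plain C++ sketch (no OpenCV) of the reordering that happens for a single image inside a blob, ignoring resizing and normalization. The function name `hwc_to_chw` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Converts one interleaved HWC image (e.g. BGRBGR...) into planar CHW order,
// which is how each image is laid out inside an NCHW blob.
std::vector<float> hwc_to_chw(const std::vector<float>& hwc, int height, int width, int channels)
{
    assert(hwc.size() == static_cast<std::size_t>(height) * width * channels);
    std::vector<float> chw(hwc.size());
    for (int y = 0; y < height; ++y)
        for (int x = 0; x < width; ++x)
            for (int c = 0; c < channels; ++c)
            {
                // HWC offset: (y * W + x) * C + c; CHW offset: (c * H + y) * W + x
                chw[(static_cast<std::size_t>(c) * height + y) * width + x] =
                    hwc[(static_cast<std::size_t>(y) * width + x) * channels + c];
            }
    return chw;
}
```

For a 1x2 image with BGR pixels (1, 2, 3) and (4, 5, 6), the planar result is {1, 4, 2, 5, 3, 6}: the two B values, then the two G values, then the two R values.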
In a standard 2D cv::Mat, we access dimensions via rows and cols. A blob has four dimensions, and for any Mat with more than two dimensions, rows and cols are set to -1. Instead, we retrieve sizes using blob.size[0] (batch), blob.size[1] (channels), blob.size[2] (height), and blob.size[3] (width). This is crucial when decomposing result matrices from batch inference.
Including Dependencies
First, we include the necessary headers and declare a global cv::dnn::Net variable. This object handles the model loading and all neural network calculations.
#include <iostream>
#include <string>
#include <vector>
#include "opencv2/core.hpp"
#include "opencv2/dnn.hpp"
#include "opencv2/core/cuda.hpp"
// Global net variable.
cv::dnn::Net net;
In the init function, we load the model and enable the CUDA backend for faster inference. This requires OpenCV to be built with CUDA and cuDNN support; if you need to build it yourself, refer to the OpenCV CUDA Build Guide.
void init(const std::string& model_deploy, const std::string& model_bin, int cuda_id = 0)
{
    // Load the architecture (.prototxt) and the learned weights (.caffemodel)
    net = cv::dnn::readNetFromCaffe(model_deploy, model_bin);
    // Select the GPU and enable the CUDA backend and target
    cv::cuda::setDevice(cuda_id);
    net.setPreferableBackend(cv::dnn::DNN_BACKEND_CUDA);
    net.setPreferableTarget(cv::dnn::DNN_TARGET_CUDA);
}
Single Image Inference
Single image inference is straightforward: convert the image to a blob using blobFromImage, set it as the input, and call forward().
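// Forward declaration; post_process is defined below.
void post_process(const cv::Mat& out);

void single_inference(const cv::Mat& image)
{
    if (image.empty())
    {
        std::cout << "Empty image!!!" << std::endl;
        return;
    }
    // Resize to 224x224; no scaling or mean subtraction, keep BGR order, no crop
    cv::Mat blob = cv::dnn::blobFromImage(image, 1.0, cv::Size(224, 224),
                                          cv::Scalar(0, 0, 0), false, false);
    net.setInput(blob);
    cv::Mat out = net.forward();
    post_process(out);
}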
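For post-processing, we assume the final layer is a Softmax layer and that the network output is a 2D matrix with one row per image and one column per class (cv::minMaxLoc is designed for 2D arrays; cv::minMaxIdx handles the n-dimensional case). We use cv::minMaxLoc to identify the class with the highest probability: the column index of the maximum is the class label.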
void post_process(const cv::Mat& out)
{
double min_val;
double max_val;
cv::Point min_pos;
cv::Point max_pos;
cv::minMaxLoc(out, &min_val, &max_val, &min_pos, &max_pos);
int label = max_pos.x;
std::cout << "Result label: " << label << " Score: " << max_val << std::endl;
}
Batch Inference
Batch inference processes multiple images in a single forward pass, improving throughput. We use blobFromImages to create the batch blob.
void batch_inference(const std::vector<cv::Mat>& images)
{
// Create a batch blob (NCHW)
cv::Mat blob = cv::dnn::blobFromImages(images, 1.0, cv::Size(224, 224), cv::Scalar(0, 0, 0), false, false);
net.setInput(blob);
cv::Mat out = net.forward();
// Decompose the output: each row represents one image's result
for (int i = 0; i < out.rows; ++i)
{
cv::Mat out_single = out.rowRange(i, i + 1);
post_process(out_single);
}
}
Overall Demo
The following is a complete implementation demonstrating the concepts discussed. You can download the ImageNet Caffe model from this repository.
#include <iostream>
#include <string>
#include <vector>
#include "opencv2/core.hpp"
#include "opencv2/dnn.hpp"
#include "opencv2/highgui.hpp"
#include "opencv2/imgproc.hpp"
typedef std::pair<int, float> Result;
Result post_process(const cv::Mat& out)
{
double min_val, max_val;
cv::Point min_loc, max_loc;
cv::minMaxLoc(out, &min_val, &max_val, &min_loc, &max_loc);
return std::make_pair(max_loc.x, (float)max_val);
}
std::vector<Result> batch_post_process(const cv::Mat& outs)
{
std::vector<Result> ret;
for(int i = 0; i < outs.rows; i++)
{
cv::Mat out = outs.rowRange(i, i + 1);
auto result = post_process(out);
ret.push_back(result);
}
return ret;
}
int main()
{
// Load the network
cv::dnn::Net net = cv::dnn::readNetFromCaffe("models/deploy.prototxt", "models/resnet50_cvgj_iter_320000.caffemodel");
std::vector<std::string> image_files = {"data/a.jpeg", "data/b.jpeg", "data/c.jpeg"};
std::vector<cv::Mat> images;
for (const auto& file : image_files)
{
cv::Mat img = cv::imread(file);
if (!img.empty()) images.push_back(img);
}
if (images.empty()) return -1;
// Preprocessing multiple images
cv::Mat blobs = cv::dnn::blobFromImages(images, 1.0, cv::Size(224, 224), cv::Scalar(104, 117, 123), false, false);
// Inference
net.setInput(blobs);
cv::Mat outs = net.forward();
// Post-processing and visualization
auto results = batch_post_process(outs);
for (size_t i = 0; i < results.size(); ++i)
{
auto result = results[i];
std::cout << "Image " << i << " -> Label: " << result.first << " Score: " << result.second << std::endl;
std::string result_text = cv::format("ID:%d Score:%.2f", result.first, result.second);
cv::putText(images[i], result_text, cv::Point(10, 30), cv::FONT_HERSHEY_SIMPLEX, 0.7, cv::Scalar(0, 0, 255), 2);
cv::imshow("Inference Result", images[i]);
cv::waitKey(0);
}
return 0;
}
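The demo sends every image through a single forward pass. For long image lists, GPU memory limits how large one batch can be, so one option is to split the list into fixed-size chunks and call blobFromImages and forward() once per chunk. A generic sketch of the splitting step; the name `make_batches` and the batch size you pass in are assumptions to tune for your model and hardware.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Splits items into consecutive batches of at most batch_size elements;
// the last batch holds the remainder.
template <typename T>
std::vector<std::vector<T>> make_batches(const std::vector<T>& items, std::size_t batch_size)
{
    assert(batch_size > 0 && "batch_size must be positive");
    std::vector<std::vector<T>> batches;
    if (batch_size == 0)
        return batches;  // guard for builds where assert is disabled
    for (std::size_t start = 0; start < items.size(); start += batch_size)
    {
        const std::size_t end = std::min(start + batch_size, items.size());
        batches.emplace_back(items.begin() + static_cast<std::ptrdiff_t>(start),
                             items.begin() + static_cast<std::ptrdiff_t>(end));
    }
    return batches;
}
```

For example, `make_batches(images, 8)` yields groups of up to eight cv::Mat objects, each of which can be passed to blobFromImages as in batch_inference.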