Getting Started with PaddleOCR: Conda Env + GPU Inference End-to-End
Operation Demo
1. Create an Independent Conda Environment
To maintain a clean development environment and avoid dependency conflicts, I prefer using Conda to create isolated environments. According to the official documentation, the base version of PaddleOCR supports Python 3.8+, but if you need to install optional dependency groups like [all] (which include advanced features such as document parsing and information extraction), a Python version above 3.9 is recommended. Here, I choose Python 3.11, which satisfies all feature requirements while offering good performance.
conda create -n paddleocr python=3.11
conda activate paddleocr
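Once the environment is active, a quick interpreter check can confirm that it meets the version requirements described above. This is a minimal sketch: the function name `check_python` and the verdict strings are made up for illustration, and it assumes "above 3.9" means 3.9 or newer, so adjust `EXTRAS_MIN` if the official documentation says otherwise.

```python
import sys

# Thresholds taken from the text above; EXTRAS_MIN is an assumed reading
# of "above 3.9" and may need adjusting.
BASE_MIN = (3, 8)    # base PaddleOCR install
EXTRAS_MIN = (3, 9)  # optional groups such as [all]


def check_python(version=None):
    """Return a short verdict for a (major, minor) version tuple."""
    major_minor = tuple((version or sys.version_info)[:2])
    if major_minor < BASE_MIN:
        return "unsupported"
    if major_minor < EXTRAS_MIN:
        return "base-only"
    return "ok"


# Report the interpreter that is currently active.
print(sys.version.split()[0], check_python())
```

Running this inside the activated environment should report a 3.11.x interpreter with the verdict `ok`.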
2. Install PaddleOCR and PaddlePaddle GPU
Installation takes two steps: first install the OCR toolkit, then install the underlying PaddlePaddle engine. paddleocr[all] pulls in every optional component, which suits scenarios that need the full feature set. For GPU inference, the engine must be the paddlepaddle-gpu build rather than the CPU one.
One key point: the choice between cu118 and cu126 depends on your GPU driver version, not on the CUDA Toolkit version installed on the host. If the driver version is ≥ 550.54.14, you can choose cu126; otherwise, cu118 is recommended. In this walkthrough I chose cu126 to match a newer driver. After installing, print the PaddlePaddle version to confirm that the package imports correctly.
python -m pip install "paddleocr[all]"
python -m pip install paddlepaddle-gpu==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
python -c "import paddle; print(paddle.__version__)"
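The driver-version rule above can be sketched as a small helper. The function names here are hypothetical, and the 550.54.14 threshold is simply the one quoted in the text, which assumes Linux-style driver numbering. `detect_driver_version` relies on `nvidia-smi` being on the PATH and returns `None` when it is not.

```python
import subprocess

CU126_MIN_DRIVER = (550, 54, 14)  # threshold quoted in the text above


def parse_version(text):
    """Turn '550.54.14' into (550, 54, 14) for numeric comparison."""
    return tuple(int(part) for part in text.strip().split("."))


def pick_cuda_build(driver_version):
    """Return 'cu126' if the driver meets the threshold, else 'cu118'."""
    if parse_version(driver_version) >= CU126_MIN_DRIVER:
        return "cu126"
    return "cu118"


def detect_driver_version():
    """Ask nvidia-smi for the driver version; return None if unavailable."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None
    lines = out.strip().splitlines()
    return lines[0].strip() if lines else None
```

For example, `pick_cuda_build("535.183.01")` returns `"cu118"`, while `pick_cuda_build("560.35.03")` returns `"cu126"`. The result corresponds to the last path segment of the package index URL used in the install command.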
3. Quick Verification via Command Line
After installation, you can verify directly using the paddleocr command-line tool. During testing, I tried both CPU and GPU modes and adjusted the output directory (e.g., ./output or ./temp). To improve inference speed, you can disable preprocessing steps like document orientation classification and dewarping using parameters such as --use_doc_orientation_classify False.
Below is a typical GPU inference command, supporting direct input of local paths or image URLs. Results are printed to the terminal and visualization images are saved to the specified directory.
paddleocr ocr -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_002.png \
--use_doc_orientation_classify False \
--use_doc_unwarping False \
--use_textline_orientation False \
--save_path ./output \
--device gpu:0
If you want to try other OCR model versions, you can switch using the --ocr_version parameter, for example, specifying PP-OCRv4:
paddleocr ocr -i ./general_ocr_002.png --ocr_version PP-OCRv4
When switching to CPU inference, simply change --device gpu:0 to --device cpu, keeping the rest of the parameters unchanged.
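If you drive the CLI from a script, for instance to compare CPU and GPU runs, building the command as an argument list keeps the two variants identical except for the device. The sketch below uses only the flags shown above; `build_ocr_command` and its `fast` parameter are illustrative names, not part of PaddleOCR.

```python
import shlex


def build_ocr_command(image, device="gpu:0", save_path="./output", fast=True):
    """Assemble the paddleocr CLI call as an argument list."""
    cmd = ["paddleocr", "ocr", "-i", image]
    if fast:
        # Skip the optional preprocessing stages, as in the command above.
        for flag in ("--use_doc_orientation_classify",
                     "--use_doc_unwarping",
                     "--use_textline_orientation"):
            cmd += [flag, "False"]
    cmd += ["--save_path", save_path, "--device", device]
    return cmd


gpu_cmd = build_ocr_command("./general_ocr_002.png")
cpu_cmd = build_ocr_command("./general_ocr_002.png", device="cpu")

# Print a copy-pasteable shell form of the CPU variant.
print(shlex.join(cpu_cmd))
```

Either list can be passed to `subprocess.run(cmd, check=True)` to launch the actual inference.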
4. Python Script Integration
Besides the command line, PaddleOCR also provides a convenient Python API for easy integration into business code. Based on my test directory structure, I placed images in the data directory and output results to the output directory.
mkdir paddleocr_test && cd paddleocr_test
mkdir data && cd data
wget https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_002.png
cd .. && mkdir output
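On systems without wget (for example a stock Windows shell), the same layout can be created from Python using only the standard library. This is a rough equivalent rather than part of the original setup; `prepare_workspace` and its `download` switch are names chosen for this sketch.

```python
from pathlib import Path
from urllib.request import urlretrieve

DEMO_URL = ("https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/"
            "demo_image/general_ocr_002.png")


def prepare_workspace(root="paddleocr_test", download=True):
    """Create root/data and root/output; optionally fetch the demo image."""
    root = Path(root)
    data_dir, output_dir = root / "data", root / "output"
    data_dir.mkdir(parents=True, exist_ok=True)
    output_dir.mkdir(parents=True, exist_ok=True)
    image_path = data_dir / DEMO_URL.rsplit("/", 1)[-1]
    if download and not image_path.exists():
        urlretrieve(DEMO_URL, str(image_path))  # needs network access
    return image_path
```

Calling `prepare_workspace()` from the parent directory reproduces the data and output folders and returns the path of the demo image.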
When initializing PaddleOCR, you can pass the same preprocessing switches used on the command line. The predict() method returns a list of result objects, one per input image. Each result object provides .print(), .save_to_img(), and .save_to_json() methods for printing to the console, saving a visualization image, and saving structured data, respectively.
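# test.py
import os

from paddleocr import PaddleOCR

# Make sure the output directory exists before saving results.
os.makedirs("output", exist_ok=True)

# Disable the optional preprocessing stages, matching the CLI flags used earlier.
ocr = PaddleOCR(
    use_doc_orientation_classify=False,
    use_doc_unwarping=False,
    use_textline_orientation=False,
    lang="ch",
)

result = ocr.predict("./data/general_ocr_002.png")
for res in result:
    res.print()                 # print the recognition result to the console
    res.save_to_img("output")   # save the visualization image
    res.save_to_json("output")  # save the structured result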
Run:
python test.py
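After the script runs, the JSON written by save_to_json() can be post-processed without loading PaddleOCR again. The sketch below rests on assumptions that should be checked against your own output file: that the recognized lines are stored under `rec_texts` with matching confidences under `rec_scores`, and that these fields sit either at the top level or under a `res` key. The helper names are hypothetical.

```python
import json
from pathlib import Path


def extract_lines(payload, min_score=0.0):
    """Pair each recognized text with its score, dropping low-confidence lines."""
    # Assumption: fields may sit at the top level or under a "res" key.
    body = payload.get("res", payload)
    texts = body.get("rec_texts", [])
    scores = body.get("rec_scores", [])
    return [(t, s) for t, s in zip(texts, scores) if s >= min_score]


def load_result(json_path, min_score=0.0):
    """Read one saved result file and return its (text, score) pairs."""
    with Path(json_path).open(encoding="utf-8") as fh:
        return extract_lines(json.load(fh), min_score)


# Synthetic payload for illustration only; not real PaddleOCR output.
sample = {"rec_texts": ["hello", "w0rld"], "rec_scores": [0.98, 0.41]}
print(extract_lines(sample, min_score=0.5))  # [('hello', 0.98)]
```

With a real run, you would point `load_result` at the JSON file that appears in the output directory and pick a `min_score` that suits your data.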
Summary
Deploying PaddleOCR is straightforward: an isolated Conda environment plus two pip installs is enough to get started. The command-line tool suits quick tests and single-image processing, while the Python API is the better fit for integration into application code. Disabling the preprocessing stages you do not need (document orientation classification, dewarping, and text-line orientation) reduces the work done per image, and it should not cost accuracy as long as the input images are already upright and undistorted.