Getting Started with PaddleOCR: Conda Env + GPU Inference End-to-End

Walkthrough

1. Create an Independent Conda Environment

To keep the development environment clean and avoid dependency conflicts, I use Conda to create an isolated environment. According to the official documentation, the base PaddleOCR package supports Python 3.8+, but if you need optional dependency groups such as [all] (which adds advanced features like document parsing and information extraction), a Python version above 3.9 is recommended. I chose Python 3.11 here, which satisfies all feature requirements while offering good performance.

conda create -n paddleocr python=3.11
conda activate paddleocr
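
To confirm that the activated environment really runs the interpreter you expect, a small check can be run inside it. This is only a sketch: the function name python_is_supported is hypothetical, and the (3, 9) threshold is my reading of the recommendation above, not something PaddleOCR exposes.

```python
import sys


def python_is_supported(minimum=(3, 9), version=None):
    """Return True if the interpreter version is at least `minimum`.

    `version` defaults to the running interpreter; pass a tuple to test
    another version without switching environments.
    """
    current = tuple(version) if version is not None else tuple(sys.version_info[:3])
    # Compare only as many components as `minimum` specifies.
    return current[:len(minimum)] >= tuple(minimum)


if __name__ == "__main__":
    print("Python", ".".join(map(str, sys.version_info[:3])))
    print("Meets the assumed 3.9 threshold:", python_is_supported((3, 9)))
```

Running it inside the paddleocr environment created above should report a 3.11.x interpreter.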

2. Install PaddleOCR and PaddlePaddle GPU

The installation process consists of two steps: first install the OCR toolkit, then install the underlying PaddlePaddle engine. paddleocr[all] pulls in all optional components, suitable for scenarios requiring full functionality. For the PaddlePaddle engine, if you are using GPU inference, you must install the paddlepaddle-gpu version.

One key point: the choice between cu118 and cu126 depends on your GPU driver version, not on the CUDA Toolkit version installed on the host. If the driver version is ≥ 550.54.14, you can choose cu126; otherwise, cu118 is recommended. In this walkthrough I chose cu126 because my driver is recent enough. After installing, print the Paddle version to confirm that the installation succeeded.
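
The driver rule above can be expressed as a small helper. This is a minimal sketch: the function names are hypothetical, the 550.54.14 threshold is the one quoted in the text, and reading the driver version relies on nvidia-smi being on the PATH.

```python
import subprocess

# Threshold quoted in the text above for choosing the cu126 wheels.
CU126_MIN_DRIVER = (550, 54, 14)


def parse_version(text):
    """Turn a dotted version string such as '550.54.14' into (550, 54, 14)."""
    return tuple(int(part) for part in text.strip().split("."))


def pick_cuda_build(driver_version):
    """Return 'cu126' if the driver meets the threshold, otherwise 'cu118'."""
    return "cu126" if parse_version(driver_version) >= CU126_MIN_DRIVER else "cu118"


def detect_driver_version():
    """Ask nvidia-smi for the driver version; return None if it is unavailable."""
    try:
        out = subprocess.run(
            ["nvidia-smi", "--query-gpu=driver_version", "--format=csv,noheader"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        return None
    lines = out.strip().splitlines()
    return lines[0].strip() if lines else None


if __name__ == "__main__":
    driver = detect_driver_version()
    if driver is None:
        print("No NVIDIA driver detected; consider CPU inference")
    else:
        print(driver, "->", pick_cuda_build(driver))
```

The returned string corresponds to the last path segment of the package index URL passed to pip in the install command.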

python -m pip install "paddleocr[all]"
python -m pip install paddlepaddle-gpu==3.2.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
python -c "import paddle; print(paddle.__version__)"
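
Printing the version only shows that the package imports. To also check that the GPU build is the one in use, something like the following may help. It is a sketch, not an official check: describe_paddle_install is a hypothetical helper, and it only relies on paddle.device.is_compiled_with_cuda() and paddle.device.cuda.device_count().

```python
def describe_paddle_install():
    """Collect basic facts about the local PaddlePaddle installation."""
    try:
        import paddle
    except ImportError:
        # PaddlePaddle is not installed in this environment.
        return {"installed": False}

    info = {
        "installed": True,
        "version": paddle.__version__,
        # True only for the paddlepaddle-gpu build.
        "cuda_build": paddle.device.is_compiled_with_cuda(),
    }
    # Only query GPUs when the build supports CUDA at all.
    info["gpu_count"] = paddle.device.cuda.device_count() if info["cuda_build"] else 0
    return info


if __name__ == "__main__":
    for key, value in describe_paddle_install().items():
        print(f"{key}: {value}")
```

If cuda_build is False after installing paddlepaddle-gpu, the CPU package was probably picked up instead. For a fuller self-test, PaddlePaddle also ships paddle.utils.run_check().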

3. Quick Verification via Command Line

After installation, you can verify directly using the paddleocr command-line tool. During testing, I tried both CPU and GPU modes and adjusted the output directory (e.g., ./output or ./temp). To improve inference speed, you can disable preprocessing steps like document orientation classification and dewarping using parameters such as --use_doc_orientation_classify False.

Below is a typical GPU inference command, supporting direct input of local paths or image URLs. Results are printed to the terminal and visualization images are saved to the specified directory.

paddleocr ocr -i https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_002.png \
  --use_doc_orientation_classify False \
  --use_doc_unwarping False \
  --use_textline_orientation False \
  --save_path ./output \
  --device gpu:0

If you want to try other OCR model versions, you can switch using the --ocr_version parameter, for example, specifying PP-OCRv4:

paddleocr ocr -i ./general_ocr_002.png --ocr_version PP-OCRv4

To switch to CPU inference, change --device gpu:0 to --device cpu and keep the remaining parameters unchanged.
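
When the same command has to be run with different devices or model versions, it can be convenient to assemble it in Python. The sketch below only uses the flags shown above; build_ocr_command and its parameter names are hypothetical.

```python
import shlex


def build_ocr_command(image, device="gpu:0", save_path="./output",
                      skip_preprocessing=True, ocr_version=None):
    """Build the argument list for the `paddleocr ocr` command."""
    cmd = ["paddleocr", "ocr", "-i", image]
    if skip_preprocessing:
        # Disable the optional preprocessing stages to speed up inference.
        for flag in ("--use_doc_orientation_classify",
                     "--use_doc_unwarping",
                     "--use_textline_orientation"):
            cmd += [flag, "False"]
    if ocr_version:
        cmd += ["--ocr_version", ocr_version]
    cmd += ["--save_path", save_path, "--device", device]
    return cmd


if __name__ == "__main__":
    # Print the CPU variant of the command as a copy-pasteable shell line.
    print(shlex.join(build_ocr_command("./general_ocr_002.png", device="cpu")))
```

The returned list can be passed directly to subprocess.run(cmd, check=True) to execute the command.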

4. Python Script Integration

Besides the command line, PaddleOCR also provides a convenient Python API for easy integration into business code. Based on my test directory structure, I placed images in the data directory and output results to the output directory.

mkdir paddleocr_test && cd paddleocr_test
mkdir data && cd data
wget https://paddle-model-ecology.bj.bcebos.com/paddlex/imgs/demo_image/general_ocr_002.png
cd .. && mkdir output
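
On systems without wget (for example, a default Windows shell), the same layout can be created from Python. This is a sketch using only the standard library; prepare_workspace is a hypothetical helper, and the download is off by default so the function can be called without network access.

```python
from pathlib import Path
from urllib.request import urlretrieve

# Sample image used throughout this walkthrough.
SAMPLE_URL = ("https://paddle-model-ecology.bj.bcebos.com/"
              "paddlex/imgs/demo_image/general_ocr_002.png")


def prepare_workspace(root="paddleocr_test", download=False):
    """Create <root>/data and <root>/output; optionally fetch the sample image."""
    root = Path(root)
    data_dir = root / "data"
    output_dir = root / "output"
    data_dir.mkdir(parents=True, exist_ok=True)
    output_dir.mkdir(parents=True, exist_ok=True)

    # Keep the file name from the URL, as wget would.
    image_path = data_dir / SAMPLE_URL.rsplit("/", 1)[-1]
    if download and not image_path.exists():
        urlretrieve(SAMPLE_URL, str(image_path))
    return data_dir, output_dir, image_path


if __name__ == "__main__":
    data_dir, output_dir, image_path = prepare_workspace(download=True)
    print("Image:", image_path)
    print("Output directory:", output_dir)
```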

When initializing PaddleOCR, you can pass the same preprocessing switches that were used on the command line. The predict() method returns a list of Result objects. Each Result object provides .print(), .save_to_img(), and .save_to_json() methods for printing to the console, saving visualization images, and saving structured data, respectively.

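# test.py
import os

from paddleocr import PaddleOCR

# Make sure the output directory exists before saving results.
os.makedirs("output", exist_ok=True)

# Disable the optional preprocessing stages, as in the CLI example.
ocr = PaddleOCR(
    use_doc_orientation_classify=False,
    use_doc_unwarping=False,
    use_textline_orientation=False,
    lang='ch',
)

# predict() returns one Result object per input image.
result = ocr.predict("./data/general_ocr_002.png")

for res in result:
    res.print()                  # print the recognition result to the console
    res.save_to_img("output")    # save the visualization image
    res.save_to_json("output")   # save the structured result as JSON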

Run:

python test.py
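
Once the script has written its JSON file, the recognized text can be read back for downstream processing. The sketch below is built on assumptions I have not verified against every PaddleOCR release: that the saved JSON holds parallel lists under the keys rec_texts and rec_scores, possibly wrapped in a top-level res key. The helper name load_recognized_text is hypothetical; inspect one of your own output files before relying on it.

```python
import json
from pathlib import Path


def load_recognized_text(json_path, min_score=0.0):
    """Return (text, score) pairs from a saved result file.

    Assumes parallel 'rec_texts' and 'rec_scores' lists (an assumption
    about the file layout, not a documented guarantee).
    """
    data = json.loads(Path(json_path).read_text(encoding="utf-8"))
    # Tolerate an optional wrapper key around the payload (assumption).
    data = data.get("res", data)
    texts = data.get("rec_texts", [])
    scores = data.get("rec_scores", [])
    # Keep only lines whose confidence reaches the requested threshold.
    return [(text, score) for text, score in zip(texts, scores) if score >= min_score]


if __name__ == "__main__":
    for path in sorted(Path("output").glob("*.json")):
        for text, score in load_recognized_text(path, min_score=0.5):
            print(f"{score:.2f}  {text}")
```

Filtering by score is a simple way to drop low-confidence lines; the 0.5 threshold here is arbitrary and should be tuned on your own images.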

Summary

Deploying PaddleOCR is straightforward: an isolated Conda environment plus a pip installation is enough to get started quickly. The command-line tool suits quick tests and single-image processing, while the Python API offers more flexible integration into your own code. Disabling preprocessing steps you do not need, such as orientation classification and dewarping, reduces inference time; accuracy should be unaffected as long as the input images are already upright and undistorted.