MMOCR Installation and Training on Custom Datasets

What is OCR?

According to Wikipedia, Optical Character Recognition (OCR) is the electronic or mechanical conversion of images of typed, handwritten, or printed text into machine-encoded text.

Why MMOCR?

PyTorch-based: It leverages the strengths of the PyTorch ecosystem.
Maintained by OpenMMLab: Backed by SenseTime, which makes continued support and updates likely.
Clean Architecture: Follows a consistent structure similar to MMDetection.
Deployment Ready: Includes experimental deployment code for C++ environments.

Installation

The MMOCR documentation provides comprehensive installation instructions. Below is the setup process I followed.

First, create an isolated virtual environment:

conda create -n open-mmlab python=3.8
conda activate open-mmlab

Install PyTorch 1.10 with CUDA 10.2:

conda install pytorch=1.10.0 torchvision cudatoolkit=10.2 -c pytorch

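Install mmcv-full and mmdet. The mmcv-full wheel index in the URL must match the CUDA and PyTorch versions installed above (cu102 and torch1.10.0 here):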

pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu102/torch1.10.0/index.html
pip install mmdet

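Clone the MMOCR repository and build it. The steps in this post target the MMOCR 0.x series (the one that depends on mmcv-full); if the repository's default branch has moved on to a newer major version, check out a matching 0.x release before installing: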

git clone https://github.com/open-mmlab/mmocr.git
cd mmocr
pip install -r requirements.txt
pip install -v -e . # or "python setup.py develop"

Note: The official docs suggest adding the repo path to your PYTHONPATH. I typically work within the repo directory, so this step is optional unless you need to import MMOCR from external projects.
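
Before moving on, it can help to confirm which versions actually landed in the environment. The following is a minimal sketch that uses only the standard library. The helper names installed_versions and report are made up for illustration, and the package names are assumed to be the distribution names used in the commands above (with mmocr assumed to register under the name mmocr after the editable install).

```python
from importlib import metadata  # standard library since Python 3.8


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def report(packages=("torch", "torchvision", "mmcv-full", "mmdet", "mmocr")):
    """Print one line per package so a missing or mismatched install stands out."""
    for name, version in installed_versions(packages).items():
        print(f"{name:12s} {version or 'NOT INSTALLED'}")
```

Calling report() inside the activated open-mmlab environment prints one line per package; any line reading NOT INSTALLED points at the step to repeat.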

Simple Test

Run the following command to test the installation; it will automatically download weight files and perform end-to-end detection and recognition:

python mmocr/utils/ocr.py demo/demo_text_ocr.jpg --print-result --imshow

To run text detection only, choose a detector with --det (TextSnake in this example) and disable recognition with --recog None:

python mmocr/utils/ocr.py demo/demo_text_det.jpg --output demo/det_out.jpg --det TextSnake --recog None --export demo/
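
When many images need checking, the same script can be driven from Python. The sketch below only assembles the command line from the flags already shown above (--det, --recog, --output, --export) and hands it to subprocess. build_ocr_command and run_detection are hypothetical helper names, and the code assumes the working directory is the cloned mmocr repository.

```python
import subprocess
import sys


def build_ocr_command(image, det=None, recog=None, output=None, export=None):
    """Assemble the argument list for mmocr/utils/ocr.py from the given options."""
    cmd = [sys.executable, "mmocr/utils/ocr.py", image]
    if output is not None:
        cmd += ["--output", output]
    if det is not None:
        cmd += ["--det", det]
    if recog is not None:
        cmd += ["--recog", recog]
    if export is not None:
        cmd += ["--export", export]
    return cmd


def run_detection(images, export_dir="demo/"):
    """Run detection only (TextSnake, recognition disabled) on each image."""
    for image in images:
        # "None" is passed as a literal string, exactly as on the command line.
        cmd = build_ocr_command(image, det="TextSnake", recog="None",
                                export=export_dir)
        # check=True raises CalledProcessError if the script exits with an error.
        subprocess.run(cmd, check=True)
```

Each call starts a new process and reloads the model, so this is only convenient for small batches.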

Training with Custom Data

My training data was annotated in PASCAL VOC format (one XML file per image), while MMOCR's text detection datasets expect COCO-style JSON annotations, so I had to convert it first.

1. Generate ID Lists

Create a gen_ids.py script to split your data into training and testing IDs:

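import argparse
import os

def parse_args():
    parser = argparse.ArgumentParser(description='Generate train/test ID list files.')
    # required=True gives a clear error instead of a crash inside os.walk(None).
    parser.add_argument('--path', type=str, required=True, help='Path to the annotation files directory.')
    return parser.parse_args()

def generate_ids(anno_path):
    # Collect the base name (file name without extension) of every XML annotation.
    ids = []
    for _, _, files in os.walk(anno_path):
        for file_name in files:
            # Skip stray non-XML files such as editor backups.
            if not file_name.lower().endswith('.xml'):
                continue
            # splitext keeps dots inside the name, unlike split('.')[0].
            ids.append(os.path.splitext(file_name)[0])
    # Sort so the split is reproducible; os.walk order depends on the filesystem.
    ids.sort()
    with open('./train_ids.txt', 'w') as ftrain, open('./test_ids.txt', 'w') as ftest:
        for index, file_id in enumerate(ids):
            # Every 10th file goes to the test set, giving a roughly 90/10 split.
            target = ftest if index % 10 == 0 else ftrain
            target.write(file_id + '\n')

def main():
    args = parse_args()
    generate_ids(args.path)

if __name__ == '__main__':
    main()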

Run it with: python gen_ids.py --path ./Annotations/

2. Format Conversion

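Use a voc2coco.py script (a third-party converter, not part of MMOCR) to convert the XML files to COCO JSON. It needs a label file, here coco.names, containing a single line with the only class name: text. Make sure the output directory exists before running the commands.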

Execute the conversion:

python voc2coco.py --ann_dir ./Annotations/ --ann_ids ./train_ids.txt --labels ./coco.names --output ./annotations/instances_training.json --ext xml
python voc2coco.py --ann_dir ./Annotations/ --ann_ids ./test_ids.txt --labels ./coco.names --output ./annotations/instances_test.json --ext xml
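
For readers without a conversion script at hand, the core of the VOC-to-COCO mapping can be sketched as follows. This is not the voc2coco.py script used above, only an illustration of what such a converter does: VOC stores each box as (xmin, ymin, xmax, ymax), while COCO stores [x, y, width, height]. The function name voc_to_coco is hypothetical, and filling segmentation with the four box corners is my own assumption, made because text detectors generally train on polygons.

```python
import xml.etree.ElementTree as ET


def voc_to_coco(xml_strings, label="text"):
    """Convert VOC XML documents (given as strings) into one COCO-style dict."""
    coco = {
        "images": [],
        "annotations": [],
        "categories": [{"id": 1, "name": label}],
    }
    ann_id = 1
    for image_id, xml_text in enumerate(xml_strings, start=1):
        root = ET.fromstring(xml_text)
        size = root.find("size")
        coco["images"].append({
            "id": image_id,
            "file_name": root.findtext("filename"),
            "width": int(size.findtext("width")),
            "height": int(size.findtext("height")),
        })
        for obj in root.iter("object"):
            box = obj.find("bndbox")
            xmin, ymin, xmax, ymax = (
                int(float(box.findtext(tag)))
                for tag in ("xmin", "ymin", "xmax", "ymax")
            )
            width, height = xmax - xmin, ymax - ymin
            coco["annotations"].append({
                "id": ann_id,
                "image_id": image_id,
                "category_id": 1,
                "bbox": [xmin, ymin, width, height],  # COCO: x, y, w, h
                "area": width * height,
                "iscrowd": 0,
                # Assumption: use the box corners as a 4-point polygon.
                "segmentation": [[xmin, ymin, xmax, ymin,
                                  xmax, ymax, xmin, ymax]],
            })
            ann_id += 1
    return coco
```

Writing the result with json.dump produces a file in the same overall shape as instances_training.json; every object is assigned the single category text regardless of its VOC name.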

3. Training

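Create a new dataset config at configs/_base_/det_datasets/new_dataset.py by copying icdar2015.py from the same directory and pointing its paths at your images and the two JSON files generated above.
Create a model config at configs/textdet/dbnet/dbnet_r50dcnv2_fpnc_1200e_new_dataset.py, adapted from the corresponding icdar2015 DBNet config, and change its dataset reference to new_dataset.py.
Train on multiple GPUs. The arguments are the config file, the working directory, and the number of GPUs (8 here):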

./tools/dist_train.sh configs/textdet/dbnet/dbnet_r50dcnv2_fpnc_1200e_new_dataset.py work_dirs/dbnet_r50dcnv2_fpnc_1200e_new_dataset 8
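
The dataset config mentioned above might look roughly like this. It is a sketch modelled on the layout of icdar2015.py as I remember it from MMOCR 0.x, so compare it against the file in your checkout. The data_root location and the imgs folder name are hypothetical, while the two JSON file names are the ones produced in the conversion step.

```python
# Hypothetical contents of configs/_base_/det_datasets/new_dataset.py,
# assuming the MMOCR 0.x dataset config layout.
dataset_type = 'IcdarDataset'
data_root = 'data/new_dataset'  # hypothetical location of the custom data

train = dict(
    type=dataset_type,
    ann_file=f'{data_root}/annotations/instances_training.json',
    img_prefix=f'{data_root}/imgs',  # hypothetical image folder
    pipeline=None)

test = dict(
    type=dataset_type,
    ann_file=f'{data_root}/annotations/instances_test.json',
    img_prefix=f'{data_root}/imgs',
    pipeline=None)

# The model config picks the datasets up through these two lists.
train_list = [train]
test_list = [test]
```

In the copied model config, the dataset reference is then usually the det_datasets entry in the _base_ list, which would point to new_dataset.py instead of icdar2015.py.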

Conclusion

In my experience, MMOCR is stable and easy to configure, and it writes training logs automatically. The deployment code is experimental but functional: I have successfully converted models to ONNX and integrated them into C++ applications.