Scene Text Recognition

/img/content-concepts-raw-computer-vision-scene-text-recognition-img.png

Introduction

Definition: Text is a fundamental means of communicating information and appears throughout natural scenes, e.g., on street signs, product labels, and license plates. Automatically reading text in natural scene images is an important machine learning task that has gained increasing attention because of its many applications. For example, reading text in images can help visually impaired people understand their surroundings. Autonomous driving requires accurately detecting and recognizing road signs. Indexing the text in images enables image search and retrieval across billions of consumer photos on the internet.
Applications: Indexing multimedia archives, recognizing signs in driver-assistance systems, providing scene information to visually impaired people, and identifying vehicles by reading their license plates.
Scope: No scope decided yet.
Tools: OpenCV, Tesseract, PaddleOCR

Models

Process flow

Step 1: Collect Images

Fetch images from a database, scrape them from the internet, or use public datasets. Set up the database connection and load the data into the Python environment.
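As a minimal sketch of the database route, the snippet below reads image bytes from SQLite using only the standard library. The `images` table, its `id` and `data` columns, and the `make_demo_db` helper are hypothetical stand-ins for illustration; a real project would point at its own database and schema.

```python
import sqlite3


def fetch_images(conn, limit=100):
    """Return (image_id, image_bytes) pairs from a hypothetical `images` table."""
    cursor = conn.execute(
        "SELECT id, data FROM images ORDER BY id LIMIT ?", (limit,)
    )
    return [(row[0], bytes(row[1])) for row in cursor]


def make_demo_db():
    """Build an in-memory database with two fake image blobs (demo only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE images (id INTEGER PRIMARY KEY, data BLOB NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO images (data) VALUES (?)",
        [(b"\x89PNG-demo-1",), (b"\x89PNG-demo-2",)],
    )
    conn.commit()
    return conn
```

Calling `fetch_images(make_demo_db())` returns the two demo rows. The same function should work against a file-backed SQLite database opened with `sqlite3.connect(path)`, provided the schema matches.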

Step 2: Data Preparation

Explore the data, validate it, and create a preprocessing strategy. Clean the data and make it ready for processing.
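One possible validation pass is sketched below. It keeps only files whose leading bytes look like PNG or JPEG and drops exact duplicates by SHA-256 hash. The function names and the choice of checks are assumptions for illustration, not a prescribed cleaning strategy; real pipelines usually add checks such as minimum resolution or label consistency.

```python
import hashlib

# File signatures (magic bytes) for the two formats accepted in this sketch.
MAGIC_BYTES = {
    b"\x89PNG\r\n\x1a\n": "png",
    b"\xff\xd8\xff": "jpeg",
}


def sniff_format(data):
    """Return 'png' or 'jpeg' based on the leading bytes, else None."""
    for magic, name in MAGIC_BYTES.items():
        if data.startswith(magic):
            return name
    return None


def clean_dataset(samples):
    """Split (name, bytes) samples into kept items and rejected (name, reason)."""
    seen_hashes = set()
    kept, rejected = [], []
    for name, data in samples:
        digest = hashlib.sha256(data).hexdigest()
        if sniff_format(data) is None:
            rejected.append((name, "unknown format"))
        elif digest in seen_hashes:
            rejected.append((name, "duplicate"))
        else:
            seen_hashes.add(digest)
            kept.append((name, data))
    return kept, rejected
```

Returning the rejection reasons alongside the kept samples makes it easier to report how much data each check removed during exploration.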

Step 3: Model Building

Apply different kinds of detection, recognition, and single-shot models to the images. Track progress and experiments. Validate the final set of models, then select or assemble the final model.
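To compare candidate recognition models, one common approach is to score their predictions against ground-truth strings. The sketch below computes character error rate (CER, based on Levenshtein edit distance) and exact-match word accuracy, then picks the model with the lowest CER. The metric choice and the names `evaluate` and `select_best` are assumptions for illustration; the source does not specify a selection criterion.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, 1):
        curr = [i]
        for j, ch_b in enumerate(b, 1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def evaluate(predictions, references):
    """Return CER and exact-match word accuracy for paired string lists."""
    pairs = list(zip(predictions, references))
    total_edits = sum(edit_distance(p, r) for p, r in pairs)
    total_chars = sum(len(r) for r in references)
    exact = sum(p == r for p, r in pairs)
    return {
        "cer": total_edits / max(total_chars, 1),
        "word_accuracy": exact / max(len(references), 1),
    }


def select_best(candidates, references):
    """Pick the model name with the lowest CER from {name: predictions}."""
    scores = {name: evaluate(preds, references)
              for name, preds in candidates.items()}
    best = min(scores, key=lambda name: scores[name]["cer"])
    return best, scores
```

For example, with references `["STOP", "EXIT"]`, a model predicting `["STOP", "EX1T"]` has one character error in eight reference characters, so it would be preferred over a model predicting `["ST0P", "EX1T"]`.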

Step 4: User Acceptance Testing (UAT)

Wrap the model inference engine in an API for client testing.
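A minimal sketch of such a wrapper, using only Python's standard `http.server` module, is shown below. The `/predict` endpoint, the response fields, and the placeholder `recognize_text` function are hypothetical; a production service would more likely use a dedicated web framework and call the real model.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def recognize_text(image_bytes):
    """Placeholder inference (hypothetical); swap in the real model call."""
    return {"num_bytes": len(image_bytes), "text": ""}


class InferenceHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Only one (assumed) endpoint is exposed in this sketch.
        if self.path != "/predict":
            self.send_error(404, "Unknown endpoint")
            return
        length = int(self.headers.get("Content-Length", 0))
        image_bytes = self.rfile.read(length)
        body = json.dumps(recognize_text(image_bytes)).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # Keep the demo quiet; remove to restore request logging.


def make_server(host="127.0.0.1", port=8000):
    """Create (but do not start) the HTTP server."""
    return HTTPServer((host, port), InferenceHandler)
```

Running `make_server().serve_forever()` starts the service locally; a client can then POST raw image bytes to `/predict` and receive a JSON response.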

Step 5: Deployment

Deploy the model to the cloud or to edge devices, as the requirements dictate.

Step 6: Documentation

Prepare the documentation and transfer all assets to the client.

Use Cases

Scene Text Detection with EAST and Tesseract

Detect text in images and videos using the EAST model, then read the characters using Tesseract. Check out this Notion page.
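The detect-then-read flow can be sketched roughly as follows, assuming OpenCV's `dnn` module, the `pytesseract` wrapper, and a pretrained EAST graph file are available. The model path, the 320x320 input size, the thresholds, and the simplified axis-aligned box decoding are assumptions for illustration, not the configuration used in the linked notes.

```python
import math


def decode_east(scores, geometry, min_conf=0.5):
    """Turn EAST score/geometry maps into axis-aligned (x, y, w, h) boxes."""
    boxes, confidences = [], []
    rows, cols = len(scores[0][0]), len(scores[0][0][0])
    for y in range(rows):
        for x in range(cols):
            score = float(scores[0][0][y][x])
            if score < min_conf:
                continue
            # Output maps are 4x smaller than the network input.
            offset_x, offset_y = x * 4.0, y * 4.0
            top, right, bottom, left = (float(geometry[0][k][y][x])
                                        for k in range(4))
            angle = float(geometry[0][4][y][x])
            cos_a, sin_a = math.cos(angle), math.sin(angle)
            end_x = int(offset_x + cos_a * right + sin_a * bottom)
            end_y = int(offset_y - sin_a * right + cos_a * bottom)
            w, h = int(right + left), int(top + bottom)
            boxes.append((end_x - w, end_y - h, w, h))
            confidences.append(score)
    return boxes, confidences


def read_scene_text(image_path, east_model_path, min_conf=0.5, nms_thresh=0.4):
    """Detect text regions with EAST, then read each one with Tesseract."""
    import cv2          # third-party: opencv-python
    import numpy as np
    import pytesseract  # third-party: needs the Tesseract binary installed

    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(image_path)
    orig_h, orig_w = image.shape[:2]
    size = 320  # EAST input sides must be multiples of 32 (assumed value)
    net = cv2.dnn.readNet(east_model_path)
    blob = cv2.dnn.blobFromImage(image, 1.0, (size, size),
                                 (123.68, 116.78, 103.94),
                                 swapRB=True, crop=False)
    net.setInput(blob)
    scores, geometry = net.forward(["feature_fusion/Conv_7/Sigmoid",
                                    "feature_fusion/concat_3"])
    boxes, confidences = decode_east(scores, geometry, min_conf)
    keep = cv2.dnn.NMSBoxes([list(b) for b in boxes], confidences,
                            min_conf, nms_thresh)
    results = []
    for i in np.asarray(keep, dtype=int).reshape(-1):
        x, y, w, h = boxes[i]
        # Scale back to the original image and clamp to its bounds.
        x0 = max(0, int(x * orig_w / size))
        y0 = max(0, int(y * orig_h / size))
        x1 = min(orig_w, int((x + w) * orig_w / size))
        y1 = min(orig_h, int((y + h) * orig_h / size))
        roi = image[y0:y1, x0:x1]
        if roi.size == 0:
            continue
        rgb = cv2.cvtColor(roi, cv2.COLOR_BGR2RGB)
        text = pytesseract.image_to_string(rgb, config="--psm 7").strip()
        results.append(((x0, y0, x1, y1), text))
    return results
```

A call such as `read_scene_text("sign.jpg", "frozen_east_text_detection.pb")` (both paths are placeholders) would return a list of bounding boxes with the text read from each. For video, the same function body could be applied frame by frame, loading the network once outside the loop.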