Building an Image Text Extraction Web Application Using Flask and Tesseract

Optical Character Recognition (OCR) extracts text from images. Its applications range from digitizing printed documents to reading signs and billboards in photos. In this post, we'll walk through the creation of a simple web application that extracts text from an uploaded image using Flask and Tesseract.

Introduction

The goal of this project is to develop a web application where users can upload an image and see the text found within it. We will use Flask as the web framework, Tesseract for the OCR step, and OpenCV for image processing. The tutorial covers setting up the application, preprocessing the image, and extracting the text.

What You Will Learn

By the end of this tutorial, you will have learned:

How to set up a Flask web application.
How to integrate Tesseract OCR with Python using the pytesseract library.
How to process images using OpenCV.
How to build a user-friendly web interface for uploading images and displaying extracted text.

Prerequisites

Before you begin, make sure you have the following installed:

Flask (pip install flask)
OpenCV (pip install opencv-python)
pytesseract (pip install pytesseract)
Tesseract-OCR installed on your machine (pytesseract is only a wrapper and needs the Tesseract executable to be present)
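
Because pytesseract depends on the Tesseract executable, a quick check that the binary is reachable can save some confusion later. The sketch below uses only the standard library. The helper name find_tesseract is hypothetical, and it assumes the executable is named tesseract and sits on your PATH.

```python
import shutil


def find_tesseract(executable: str = "tesseract"):
    """Return the full path of the executable, or None if it is not on PATH."""
    return shutil.which(executable)


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("Tesseract was not found on PATH; install it or set tesseract_cmd explicitly.")
    else:
        print(f"Tesseract found at: {path}")
```

If the check prints a path, that same path can be assigned to pytesseract.pytesseract.tesseract_cmd when the executable is not picked up automatically.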

Understanding the Provided Code

Let's go through the provided code in detail, explaining each part and its role in the overall application.

1. Importing Required Libraries

from flask import Flask, render_template, request
import cv2
import pytesseract
import numpy as np
import base64

Here, we import the essential libraries:

flask: For creating the web application and handling HTTP requests.
cv2: OpenCV for image processing.
pytesseract: A Python wrapper for Google’s Tesseract-OCR Engine.
numpy: For handling numerical operations, particularly image data in array form.
base64: For encoding images to be rendered in HTML.

2. Tesseract OCR Function

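def read_text(image) -> tuple:
    # Read the uploaded file and decode its bytes into a BGR image
    img_bytes = image.read()
    img_arr = np.frombuffer(img_bytes, np.uint8)
    img = cv2.imdecode(img_arr, cv2.IMREAD_COLOR)
    if img is None:
        # cv2.imdecode returns None when the bytes cannot be decoded as an image
        return 'Could not read the uploaded file as an image', None

    # Re-encode the image as JPEG and base64-encode it for display in the page
    img_str = cv2.imencode('.jpg', img)[1]
    img_str = base64.b64encode(img_str).decode('utf-8')

    # Convert the image to grayscale
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Binarize with Otsu's method so the text stands out from the background
    gray = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]

    # Point pytesseract at the Tesseract executable; adjust this path for your installation
    pytesseract.pytesseract.tesseract_cmd = r'Tesseract-OCR/tesseract'
    text = pytesseract.image_to_string(gray)

    # Return the extracted text and the image string for rendering
    return text, img_str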

This function, read_text, processes the image and extracts the text using Tesseract OCR:

Image Loading: The uploaded file's bytes are read into a NumPy array, which OpenCV then decodes into an image.
Image Encoding: The decoded image (before any preprocessing) is re-encoded as JPEG and converted to a base64 string so it can be rendered on the web page.
Grayscale Conversion: The image is converted to a single-channel grayscale image, which the thresholding step requires.
Thresholding: Otsu's method picks a threshold automatically and turns the grayscale image into a black-and-white one, making the text easier to distinguish from the background.
OCR Application: Tesseract OCR is applied to the thresholded image to extract text.
Return: The function returns the extracted text and the encoded image string.
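
To make the thresholding step less of a black box, here is a minimal NumPy sketch of the idea behind Otsu's method: try every possible threshold and keep the one that best separates dark and bright pixels (the one with the largest between-class variance). The function name otsu_threshold is hypothetical, and OpenCV's own implementation may differ in details such as tie-breaking, so treat this as an illustration and not as a replacement for cv2.threshold.

```python
import numpy as np


def otsu_threshold(gray: np.ndarray) -> int:
    """Return the threshold t (0-255) that maximizes between-class variance.

    Pixels with value <= t form one class, pixels with value > t the other.
    `gray` is expected to be a uint8 array.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    levels = np.arange(256)

    weight_low = np.cumsum(hist)              # pixel count with value <= t
    weight_high = hist.sum() - weight_low     # pixel count with value > t
    sum_low = np.cumsum(hist * levels)        # intensity sum of the low class
    sum_high = sum_low[-1] - sum_low          # intensity sum of the high class

    # Empty classes produce 0/0; silence the warnings and treat them as 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        mean_low = sum_low / weight_low
        mean_high = sum_high / weight_high
        between = weight_low * weight_high * (mean_low - mean_high) ** 2
    between = np.nan_to_num(between)

    return int(np.argmax(between))


if __name__ == "__main__":
    # Synthetic image: dark "text" pixels (50) on a bright background (200)
    img = np.array([[50] * 4 + [200] * 4] * 2, dtype=np.uint8)
    t = otsu_threshold(img)
    binary = np.where(img > t, 255, 0).astype(np.uint8)
    print("threshold:", t)
    print(binary)
```

In the application itself, cv2.threshold with the cv2.THRESH_OTSU flag does this work, which is why the threshold argument passed to it is simply 0.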

3. Setting Up Flask Application

app = Flask(__name__)

This line initializes the Flask application, which will manage routing, rendering templates, and handling user requests.

4. Defining Flask Routes

a) Home Route

@app.route('/')
def home():
    return render_template('home.html')

This route renders the home.html template when the user accesses the root URL. The template provides the form where users can upload their image files.

b) Prediction Route

@app.route('/predict', methods=['GET', 'POST'])
def get_predict():
    if request.method == 'POST':
        if 'image' not in request.files:
            return render_template('predict.html', text='No image file selected', img_str=None)
        file = request.files['image']
        if file.filename == '':
            return render_template('predict.html', text='No image file selected', img_str=None)
        
        # Get text from the image
        image_text, img_str = read_text(file)
        return render_template('predict.html', text=image_text, img_str=img_str)
    else:
        return render_template('predict.html', text='GET', img_str=None)

This route handles both GET and POST requests:

POST Request:
- Checks that the request contains an image file with a non-empty filename; if not, predict.html is rendered with the message "No image file selected".
- If an image is provided, it extracts the text using the read_text function.
- The extracted text and image are then passed to the predict.html template for display.
GET Request: Simply renders the predict.html template with the placeholder text "GET" and no image.
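
The route only checks that some file was sent. One possible extra guard is to look at the file extension before handing the upload to OpenCV. The sketch below is an assumption layered on top of the tutorial's code: the names ALLOWED_EXTENSIONS and allowed_file are hypothetical, and the set of extensions is just an example.

```python
# Hypothetical whitelist of image extensions; adjust to the formats you expect.
ALLOWED_EXTENSIONS = {"png", "jpg", "jpeg", "bmp", "tif", "tiff"}


def allowed_file(filename: str) -> bool:
    """Return True if the filename ends in one of the allowed extensions."""
    if "." not in filename:
        return False
    extension = filename.rsplit(".", 1)[1].lower()
    return extension in ALLOWED_EXTENSIONS
```

Inside get_predict, a call such as allowed_file(file.filename) could sit right after the empty-filename check. Keep in mind that an extension check says nothing about the actual file contents, so it complements the decoding step and does not replace it.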

5. Running the Application

if __name__ == '__main__':
    app.run(debug=True)

This block ensures that the Flask application runs when the script is executed directly. Setting debug=True enables the interactive debugger and automatic reloading, which is convenient during development but should be turned off in production.
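
One way to avoid hard-coding the debug setting is to read it from an environment variable. This is only a sketch: the helper env_flag and the variable name OCR_DEBUG are hypothetical and not part of the original application.

```python
import os


def env_flag(name: str, default: bool = False, environ=None) -> bool:
    """Interpret an environment variable as a boolean flag."""
    environ = os.environ if environ is None else environ
    value = environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


# Hypothetical use in ocr_page.py:
# if __name__ == '__main__':
#     app.run(debug=env_flag("OCR_DEBUG"))
```

With this in place, debug mode stays off unless OCR_DEBUG is set to a truthy value such as 1 or true.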

The User Interface

The application includes two HTML templates: home.html and predict.html.

a) Home Page (home.html)

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/bootstrap/5.0.2/css/bootstrap.min.css">
    <script src="https://cdnjs.cloudflare.com/ajax/libs/bootstrap/5.0.2/js/bootstrap.min.js"></script>
    <link rel="stylesheet" href="{{ url_for('static', filename='css/styles.css') }}">
    <title>Image Text Extraction App</title>
</head>
<body>
    <nav class="nav navbar p-2">
        <a class="nav navbar-brand" href="/">Image_Text</a>
    </nav>
    <div class="container mx-auto text-center">
        <h1>Text Extraction from image</h1>
        <form action="{{ url_for('get_predict') }}" method="POST" enctype="multipart/form-data">
            <div class="mb-3">
                <label for="image" class="form-label">Select an image:</label>
                <input type="file" class="form-control" id="image" name="image">
            </div>
            <button type="submit" class="btn btn-secondary">Predict</button>
        </form>
    </div>
</body>
</html>

This template provides the main interface where users can upload an image for text extraction:

Navbar: A simple navigation bar with the app's title.
Form: Allows users to upload an image file and submit it for processing.
Styling: Bootstrap is used for responsive design and styling.

b) Prediction Page (predict.html)

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/bootstrap/5.0.2/css/bootstrap.min.css">
    <script src="https://cdnjs.cloudflare.com/ajax/libs/bootstrap/5.0.2/js/bootstrap.min.js"></script>
    <link rel="stylesheet" href="{{ url_for('static', filename='css/styles.css') }}">
    <title>Document</title>
</head>
<body>
    <nav class="nav navbar p-2">
        <a class="nav navbar-brand" href="/">Image_Text</a>
    </nav>
    <div class="container py-5">
        <div class="row">
          <div class="col-md-6">
            {% if img_str %}
                <h2>Image:</h2>
                <img src="data:image/jpeg;base64,{{ img_str }}" alt="Image">
            {% endif %}
          </div>
          <div class="col-md-6" style="background-color: lightcyan;">
            <h2>Text from the image</h2>
            <p>{{ text }}</p>
          </div>
        </div>
      </div>
</body>
</html>

This template displays the results after the image is processed:

Left Column: Shows the uploaded image.
Right Column: Displays the text extracted from the image.
Styling: Uses Bootstrap for layout and styling.
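
The img tag in this template relies on a data URI: the image bytes are embedded directly in the page as base64 text, so no separate image URL is needed. The sketch below shows how such a string is put together using only the standard library. The helper name to_data_uri is hypothetical; in the application, the template adds the data:image/jpeg;base64, prefix itself.

```python
import base64


def to_data_uri(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Build a data URI that an <img src="..."> attribute can display."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


if __name__ == "__main__":
    # Placeholder bytes stand in for real JPEG data here.
    print(to_data_uri(b"abc"))
```

Base64 output is roughly a third larger than the raw bytes, so this approach suits single images on a results page better than large galleries.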

Running the Application

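To run the application, save the Python code as ocr_page.py, place home.html and predict.html in a templates folder next to it (Flask's render_template looks there by default), and put styles.css under static/css. Ensure your Tesseract-OCR installation is correctly set up and accessible by the pytesseract library. Then, start the Flask server by running: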

python ocr_page.py

Once the server is running, open your web browser and navigate to http://127.0.0.1:5000/ to access the web app.
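
If the page fails with a template-not-found error, the files are probably not where Flask expects them. The following sketch checks the layout implied by the code in this post: templates in a templates folder and the stylesheet under static/css. The helper name missing_files is hypothetical.

```python
from pathlib import Path

# Files the application refers to, relative to the project folder
EXPECTED_FILES = (
    "ocr_page.py",
    "templates/home.html",
    "templates/predict.html",
    "static/css/styles.css",
)


def missing_files(project_root) -> list:
    """Return the expected files that do not exist under project_root."""
    root = Path(project_root)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print("Missing files:", ", ".join(missing))
    else:
        print("Project layout looks complete.")
```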

Complete Code

ocr_page.py

from flask import Flask, render_template, request

# OCR and image-processing libraries
import cv2
import pytesseract
import numpy as np
import base64

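# OCR function: returns the extracted text and a base64 image string
def read_text(image) -> tuple:
    # Read the file contents and convert them to a NumPy array
    img_bytes = image.read()
    img_arr = np.frombuffer(img_bytes, np.uint8)
    # Decode the NumPy array into an image using OpenCV
    img = cv2.imdecode(img_arr, cv2.IMREAD_COLOR)
    if img is None:
        # cv2.imdecode returns None when the bytes cannot be decoded as an image
        return 'Could not read the uploaded file as an image', None

    # Re-encode the image as JPEG and base64-encode it for display in the page
    img_str = cv2.imencode('.jpg', img)[1]
    img_str = base64.b64encode(img_str).decode('utf-8')

    # Convert the image to grayscale
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Binarize with Otsu's method so the text stands out from the background
    gray = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)[1]

    # Point pytesseract at the Tesseract executable; adjust this path for your installation
    pytesseract.pytesseract.tesseract_cmd = r'Tesseract-OCR/tesseract'
    text = pytesseract.image_to_string(gray)

    # Return the extracted text and the image string for rendering
    return text, img_str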

app = Flask(__name__)

@app.route('/')
def home():
    return render_template('home.html')

@app.route('/predict', methods=['GET', 'POST'])
def get_predict():
    if request.method == 'POST':
        if 'image' not in request.files:
            return render_template('predict.html', text='No image file selected', img_str=None)
        file = request.files['image']
        if file.filename == '':
            return render_template('predict.html', text='No image file selected', img_str=None)
        # getting text from the image
        image_text, img_str = read_text(file)
        return render_template('predict.html', text=image_text, img_str=img_str)
    else:
        return render_template('predict.html', text='GET', img_str=None)

if __name__ == '__main__':
    app.run(debug=True)

home.html

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/bootstrap/5.0.2/css/bootstrap.min.css">
    <script src="https://cdnjs.cloudflare.com/ajax/libs/bootstrap/5.0.2/js/bootstrap.min.js"></script>
    <link rel="stylesheet" href="{{ url_for('static', filename='css/styles.css') }}">
    <title>Image Text Extraction App</title>
</head>
<body>
    <nav class=" nav navbar p-2">
        <a class=" nav navbar-brand" href="/">Image_Text</a>
    </nav>
    <div class="container mx-auto text-center">
        <h1> Text Extraction from image</h1>
        <form action="{{ url_for('get_predict') }}" method="POST" enctype="multipart/form-data">
            <div class="mb-3">
                <label for="image" class="form-label">Select an image:</label>
                <input type="file" class="form-control" id="image" name="image">
            </div>
            <button type="submit" class="btn btn-secondary">Predict</button>
        </form>
</div>
</body>
</html>

predict.html

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta http-equiv="X-UA-Compatible" content="IE=edge">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <link rel="stylesheet" href="https://cdnjs.cloudflare.com/ajax/libs/bootstrap/5.0.2/css/bootstrap.min.css">
    <script src="https://cdnjs.cloudflare.com/ajax/libs/bootstrap/5.0.2/js/bootstrap.min.js"></script>
    <link rel="stylesheet" href="{{ url_for('static', filename='css/styles.css') }}">
    <title>Document</title>
</head>
<body>
    <nav class=" nav navbar p-2">
        <a class=" nav navbar-brand" href="/">Image_Text</a>
    </nav>
    <div class="container py-5">
        <div class="row">
          <div class="col-md-6">
            {% if img_str %}
                <h2>Image:</h2>
                <img src="data:image/jpeg;base64,{{ img_str }}" alt="Image">
            {% endif %}
          </div>
          <div class="col-md-6" style="background-color: lightcyan;">
            <h2>Text from the image</h2>
            <p>{{ text }}</p>
          </div>
        </div>
      </div>
</body>
</html>

We have now built a web application that extracts text from images using Flask and Tesseract. It shows how OCR can be combined with a web framework so that users can upload an image and get back the text it contains.

This project can be extended with additional features, such as support for multiple languages, better image preprocessing techniques, or even deploying the application to a cloud platform for broader accessibility.
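
As a small illustration of the multi-language idea: Tesseract accepts several language codes joined with a plus sign, and pytesseract's image_to_string takes them through its lang parameter. The helper below only builds that string. Its name, build_lang_argument, is hypothetical, and the example assumes the matching language data files are installed alongside Tesseract.

```python
def build_lang_argument(codes) -> str:
    """Join Tesseract language codes into a single 'eng+fra' style string."""
    cleaned = [code.strip() for code in codes if code.strip()]
    if not cleaned:
        raise ValueError("at least one language code is required")
    # dict.fromkeys keeps the first occurrence of each code and preserves order
    return "+".join(dict.fromkeys(cleaned))


# Hypothetical use inside read_text:
# text = pytesseract.image_to_string(gray, lang=build_lang_argument(["eng", "fra"]))
```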