Building a high-performance image processing pipeline to create vernacular catalogs

Building the infrastructure to create personalized vernacular image content that supports multiple languages for billions of users.

Preview: Catalogs for Hindi and English created at run time.

Why an async pipeline for processing images?

Reading, manipulating, and saving images are compute-heavy tasks that take time. Running them on the main app server makes them compete for CPU and memory with other critical APIs and can slow those APIs down. For a smooth overall experience, it’s better to execute this work on a separate pool of workers.

A good rule of thumb is to avoid API requests that run longer than 300 ms. ~ Experience
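
To make the hand-off concrete, here is a minimal sketch of the pattern using only the Python standard library. The in-process `queue.Queue` and the threads are stand-ins for a real broker and a separate worker pool, and `process_image`, `handle_request`, and `results` are hypothetical names chosen for illustration.

```python
import queue
import threading
import time
import uuid

jobs = queue.Queue()   # stand-in for the message broker
results = {}           # stand-in for wherever finished images are stored


def process_image(image_id):
    """Placeholder for the slow read/manipulate/save work."""
    time.sleep(0.05)  # simulate compute-heavy work
    return f"{image_id}-processed"


def handle_request(image_id):
    """Request handler: enqueue the job and return right away."""
    job_id = uuid.uuid4().hex
    jobs.put((job_id, image_id))
    return {"job_id": job_id, "status": "queued"}


def worker():
    """Worker loop: runs outside the request path and drains the queue."""
    while True:
        job_id, image_id = jobs.get()
        try:
            results[job_id] = process_image(image_id)
        finally:
            jobs.task_done()


# Start a small pool of daemon worker threads.
for _ in range(2):
    threading.Thread(target=worker, daemon=True).start()
```

The handler only pays for a queue insert, so its latency stays flat no matter how long the image work takes. The caller can fetch the result later using the returned `job_id`.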

The major components in processing pipelines are:

Setting up a high-throughput message broker

Broker selection depends on the nature of the data. For payment-related transaction data, it’s advisable to use a persistent distributed queue like Kafka. For short-lived jobs, where scale matters more than consistency, an in-memory broker like Redis is a strong candidate. Since persistence is not the main goal of this data store, disabling Redis snapshots improves performance considerably.
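
One way to switch snapshots off on a running server is the `CONFIG SET` command. The sketch below assumes `client` is a redis-py `redis.Redis` instance (or any object with the same `config_set` and `config_get` methods); `disable_snapshots` is a hypothetical helper name, not something from the sample project.

```python
def disable_snapshots(client):
    """Clear the RDB snapshot schedule on a running Redis server.

    `client` is assumed to be a redis-py `redis.Redis` instance, or any
    object exposing the same `config_set` / `config_get` methods.
    """
    client.config_set("save", "")     # an empty schedule disables snapshots
    return client.config_get("save")  # read the setting back to confirm


# Usage, assuming a Redis server is reachable on localhost:
#   import redis
#   disable_snapshots(redis.Redis(host="localhost", port=6379))
```

A setting changed this way applies to the running server only. To make it permanent, the equivalent `save ""` line would go into `redis.conf`.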

Redis Push and Pop operation on Google’s n1-standard-2.
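
For reference, a job queue on top of a Redis list can be as small as the sketch below. It assumes a redis-py style client with `rpush` and `lpop`; the key name `image-jobs` and the helper names are hypothetical.

```python
import json

QUEUE_KEY = "image-jobs"  # hypothetical list name


def push_job(client, payload, key=QUEUE_KEY):
    """Append a JSON-encoded job to the tail of the list (RPUSH)."""
    return client.rpush(key, json.dumps(payload))


def pop_job(client, key=QUEUE_KEY):
    """Take the oldest job from the head of the list (LPOP), or None if empty."""
    raw = client.lpop(key)
    return json.loads(raw) if raw is not None else None
```

Pushing at the tail and popping from the head gives first-in, first-out ordering. A worker that should wait for new jobs instead of polling could use the blocking variant `BLPOP`.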

Workers pool

The key concerns to address while selecting a worker framework are:

  1. It can handle and store failed jobs so they can be triaged later.
  2. It can prioritize jobs based on the message.
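
The two requirements above can be sketched in plain Python as follows. This is a toy, in-memory illustration and not how any particular framework implements them; the priority levels, the `priority` field on the message, and the function names are all assumptions.

```python
import heapq
import itertools
import traceback

PRIORITY = {"high": 0, "default": 1, "low": 2}  # hypothetical priority levels

_pending = []                  # min-heap of (priority, sequence, message)
_sequence = itertools.count()  # tie-breaker keeps FIFO order within a level
failed_jobs = []               # failed messages kept for later triage


def enqueue(message):
    """Queue a message; its optional "priority" field decides the order."""
    level = PRIORITY[message.get("priority", "default")]
    heapq.heappush(_pending, (level, next(_sequence), message))


def work_once(handler):
    """Run the most urgent pending job; record it if the handler raises."""
    if not _pending:
        return False
    _, _, message = heapq.heappop(_pending)
    try:
        handler(message)
    except Exception:
        # Store the message with its traceback instead of losing it.
        failed_jobs.append({"message": message, "error": traceback.format_exc()})
    return True
```

A framework such as RQ covers the same ground differently: a worker is started with an ordered list of queues and drains them in that order, and jobs that raise are kept in a failed-job registry for inspection.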

Monitoring and Dashboard

RQ-dashboard provides the basic view you need of queues with pending and failed tasks. It’s a lightweight, Flask-based web front-end to monitor your RQ queues, jobs, and workers in real time.

Dashboard showing the state of workers and jobs

Implementation with Python and Redis

Architecture diagram in GCP.
{
    "id": 1,
    "style": {
        "fill_color": "white",
        "stroke_color": "black",
        "x": 512,
        "y": 900
    },
    "title": {
        "hindi": "जोकर",
        "english": "Joker"
    },
    "fonts": {
        "hindi": "NotoSans-Bold.ttf",
        "english": "NotoSans-Bold.ttf"
    }
}
Image title rendered in Hindi and English script, based on the config above.
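
A worker could turn such a config into a rendered title roughly as sketched below. The article lists Wand under Further Reading, so the drawing step uses Wand, but this is an assumption about how the pieces fit together and not the sample project's actual code. The `font_size` parameter, the fallback to English, and the centred alignment are not part of the config and are hypothetical.

```python
CONFIG = {
    "id": 1,
    "style": {"fill_color": "white", "stroke_color": "black", "x": 512, "y": 900},
    "title": {"hindi": "जोकर", "english": "Joker"},
    "fonts": {"hindi": "NotoSans-Bold.ttf", "english": "NotoSans-Bold.ttf"},
}


def resolve_title(config, language, fallback="english"):
    """Pick the title text, font, and style for one language from the config."""
    titles, fonts, style = config["title"], config["fonts"], config["style"]
    # Assumption: fall back to English when a language is not configured.
    lang = language if language in titles and language in fonts else fallback
    return {
        "text": titles[lang],
        "font": fonts[lang],
        "fill_color": style["fill_color"],
        "stroke_color": style["stroke_color"],
        "x": style["x"],
        "y": style["y"],
    }


def draw_title(src_path, dst_path, config, language, font_size=64):
    """Draw the resolved title onto an image with Wand (imported lazily)."""
    from wand.color import Color
    from wand.drawing import Drawing
    from wand.image import Image

    title = resolve_title(config, language)
    with Image(filename=src_path) as img, Drawing() as draw:
        draw.font = title["font"]        # path to a .ttf file
        draw.font_size = font_size       # hypothetical: not in the config
        draw.fill_color = Color(title["fill_color"])
        draw.stroke_color = Color(title["stroke_color"])
        draw.text_alignment = "center"   # assumption: x marks the text centre
        draw.text(title["x"], title["y"], title["text"])
        draw(img)                        # apply the drawing to the image
        img.save(filename=dst_path)
```

Calling `draw_title("poster.jpg", "poster_hi.jpg", CONFIG, "hindi")` would write the Hindi variant, given a hypothetical `poster.jpg` and the font file on disk. How well complex scripts are shaped depends on the underlying ImageMagick build, so the output is worth checking by eye.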

Sample Project on Docker

The sample project is available at https://github.com/arinkverma/vernacular-image. It’s free to download and explore.

Other use-cases for Image processing pipeline

  1. Adaptive resolution: Adjusting image size and resolution for a better experience across screen sizes.
  2. Annotations: Adding badges and icons over the image to grab attention.
  3. Creative filters: Applying blends and filters to make the image more visually appealing.
Example of image processing in Native Ads and OTT thumbnail
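
As a small illustration of the adaptive-resolution case, the sketch below computes target dimensions that fit a screen while preserving the aspect ratio. The screen buckets are made-up values, and `fit_within` and `variants` are hypothetical helper names.

```python
def fit_within(width, height, max_width, max_height):
    """Scale (width, height) down to fit the box, preserving aspect ratio."""
    scale = min(max_width / width, max_height / height, 1.0)  # never upscale
    return max(1, round(width * scale)), max(1, round(height * scale))


# Hypothetical screen buckets; real values would come from client data.
SCREEN_BUCKETS = {
    "small": (320, 480),
    "medium": (720, 1280),
    "large": (1080, 1920),
}


def variants(width, height):
    """Target size per screen bucket for one source image."""
    return {
        name: fit_within(width, height, *box)
        for name, box in SCREEN_BUCKETS.items()
    }
```

A worker would then resize the source image once per bucket and store each variant, so clients download only the size they need.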

Further Reading

  1. Redis’s Push/Pop operation: https://redis.io/commands/rpush
  2. RQ: https://github.com/rq/rq
  3. RQ-Dashboard: https://github.com/Parallels/rq-dashboard
  4. Wand.py, ctypes-based simple ImageMagick binding for Python.

Let’s talk about scalable solutions, arts, and aspirations. co-Founded GreedyGame | IIT Ropar. Found at www.arinkverma.in
