Read words from pictures - Earvin Kayonga’s Tech Blog

Earvin Kayonga Rwogera EarvinKayonga, a Software Engineering Student

Simple Blog

Read words from pictures

January 24, 2017    ClickBait C/C++ JavaScript Code Tesseract OldTimes


I miss C/C++, but not that much.

2016 was the year deep learning and algorithm training took off.

One day I came across this picture on a hiring website (LINKED Fucking in) and said to myself: why not try some OCR on that shit? Tesseract is a C++ library for OCR, and it was also my excuse for going back to the C/C++ way of life.

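#include <cstdio>
#include <tesseract/baseapi.h>
#include <leptonica/allheaders.h>

int main()
{
  char *outText;

  // setting up Tesseract with the French language data ("fra")
  tesseract::TessBaseAPI *api = new tesseract::TessBaseAPI();
  if (api->Init(NULL, "fra")) {
    fprintf(stderr, "Could not initialize tesseract.\n");
    return (1);
  }

  // giving an image to Tesseract
  Pix *image = pixRead("/code/ressources/hired.png");
  if (image == NULL) {
    fprintf(stderr, "Could not read the image.\n");
    api->End();
    delete api;
    return (1);
  }
  api->SetImage(image);

  // result: the recognized text as a UTF-8 string
  outText = api->GetUTF8Text();
  printf("OCR output: %s", outText);

  // cleaning up
  api->End();
  delete api;
  delete [] outText;
  pixDestroy(&image);

  return (0);
}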


     # (Tesseract needs to be installed)

     g++ main.cpp -W -Wall -Werror -llept -ltesseract -o meatGrinder 


     OCR output: Rky skorrkaxky uvvuxzatozky bokttktz g buay yax Noxkj

Fuuuu, that was easy


That shit smells like Caesar (not the salad recipe, but the basic cipher for encoding ASCII text). Crypto 101.

Caesar Cipher

The Caesar cipher is a shift cipher: encoding a message means shifting each letter of the message by a fixed number N of positions in the alphabet, wrapping around from Z back to A.

For example, with N = 5, 'JavaScript' is encoded as 'OfafXhwnuy'.
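A minimal C++ sketch of the encoding step, assuming only ASCII letters are shifted (case preserved) and everything else (spaces, digits, punctuation) is copied unchanged. The name `caesar_encode` is just for illustration, not taken from the original repository.

```cpp
#include <string>

// Shift every ASCII letter of `text` forward by `n` positions, wrapping
// from 'z' back to 'a' (and 'Z' back to 'A'). Other characters are copied as-is.
std::string caesar_encode(const std::string &text, int n)
{
    // Normalize the shift into [0, 25] so negative or large shifts also work.
    n = ((n % 26) + 26) % 26;

    std::string out = text;
    for (char &c : out) {
        if (c >= 'a' && c <= 'z')
            c = static_cast<char>('a' + (c - 'a' + n) % 26);
        else if (c >= 'A' && c <= 'Z')
            c = static_cast<char>('A' + (c - 'A' + n) % 26);
    }
    return out;
}
```

With this, `caesar_encode("JavaScript", 5)` gives `"OfafXhwnuy"`, and because negative shifts are normalized, `caesar_encode("OfafXhwnuy", -5)` gives the original word back.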

How I finally cracked the message

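I looked for the most used letter in the OCR output: k.
In French, E is the most used letter, so k most likely stands for e.

The distance between them, K - E (counted modulo 26, with letters numbered from A = 0), is N, the shift used during encoding. Here that gives 10 - 4 = 6.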
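The frequency analysis can be sketched in C++ as below. This is a minimal version under a few assumptions: letters are ASCII, case is ignored, ties go to the letter that comes first in the alphabet, and the most used plaintext letter is assumed to be 'e'. The names `most_used_letter` and `guess_shift` are illustrative only.

```cpp
#include <string>

// Return the most frequent letter in `text` as a lowercase char, ignoring
// case and non-letters. Ties go to the earliest letter; an input with no
// letters at all returns 'a'.
char most_used_letter(const std::string &text)
{
    int counts[26] = {0};
    for (char c : text) {
        if (c >= 'A' && c <= 'Z')
            c = static_cast<char>(c - 'A' + 'a');  // fold to lowercase
        if (c >= 'a' && c <= 'z')
            ++counts[c - 'a'];
    }

    int best = 0;
    for (int i = 1; i < 26; ++i)
        if (counts[i] > counts[best])
            best = i;
    return static_cast<char>('a' + best);
}

// Assuming `most_used` is the encoded form of `assumed` ('e' for French),
// the shift N is their distance, taken modulo 26 so it stays in [0, 25].
int guess_shift(char most_used, char assumed = 'e')
{
    return ((most_used - assumed) % 26 + 26) % 26;
}
```

On the OCR output above, `most_used_letter` returns `'k'` and `guess_shift('k')` returns 6. Taking the difference modulo 26 matters when the cipher letter comes before 'e' in the alphabet (for example 'c' gives 24, not -2).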


     git clone  code; 
     # (Tesseract needs to be installed)
     cd code
     make; ./meatGrinder;


    OCR output:
    Rky skorrkaxky uvvuxzatozky bokttktz g buay yax Noxkj
    Most Used Letter: k
    Traduction: Les meilleures opportunites viennent a vous sur Hired
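Decoding then just shifts every letter back by N. A minimal C++ sketch, assuming the same ASCII-only rules as the encoder (case kept, non-letters untouched); `caesar_decode` is a hypothetical name, not the repository's code. With N = 6 it turns the OCR output into the French sentence above.

```cpp
#include <string>

// Undo a Caesar shift of `n`: move every ASCII letter back by `n` positions,
// wrapping from 'a' to 'z'. Case is preserved; other characters are unchanged.
std::string caesar_decode(const std::string &cipher, int n)
{
    // Shifting back by n is the same as shifting forward by 26 - n.
    int back = 26 - ((n % 26) + 26) % 26;

    std::string plain = cipher;
    for (char &c : plain) {
        if (c >= 'a' && c <= 'z')
            c = static_cast<char>('a' + (c - 'a' + back) % 26);
        else if (c >= 'A' && c <= 'Z')
            c = static_cast<char>('A' + (c - 'A' + back) % 26);
    }
    return plain;
}
```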


The best opportunities come to you on Hired?

A little bit of JavaScript

Tesseract.js wraps an Emscripten port of the Tesseract OCR engine and works on both the client side and the server side. Just magic! I pushed a simple online service to Heroku: after you upload an image file (PNG, TIFF, etc.), you receive a string containing the words in the picture.


  "use strict";
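  const express = require('express'),
        multer = require('multer'),
        Tesseract = require('tesseract.js'),
        responseTime = require('response-time'),
        fs = require('fs');

  const port = process.env.PORT || 4000,
        field = 'userPhoto',
        folder = './uploads',
        // uploaded files are written to ./uploads before OCR
        storage = multer.diskStorage({
          destination: (req, file, callback) => {
            callback(null, folder);
          },
          filename: (req, file, callback) => {
            // the original suffix was lost; Date.now() is an assumed stand-in
            callback(null, file.fieldname + '-' + Date.now());
          }
        }),
        upload = multer({ storage: storage }).single(field);

  // multer does not create the folder when destination is a function
  if (!fs.existsSync(folder))
    fs.mkdirSync(folder);

  let app = express();

  app.use(responseTime());            // adds an X-Response-Time header
  app.use(express.static('public'));  // serves the upload form

  app.post('/api/photo', (req, res) => {
    upload(req, res, (err) => {
      if (err || !req.file)
        return res.status(400).json("Error uploading file.");

      let file = `${folder}/${req.file.filename}`;
      // tesseract.js 1.x resolves with result.text (2.x+ uses result.data.text)
      return Tesseract.recognize(file)
        .then((result) => {
          // remove the upload once it has been read
          fs.unlink(file, (err) => {
            if (err)
              return res.status(500).json({err: err});

            return res.status(200).json({
              result: result.text
            });
          });
        })
        .catch((err) => res.status(500).json({err: String(err)}));
    });
  });

  app.listen(port, () => {
    console.log(`Working on port ${port}`);
  });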


   <!DOCTYPE html>
   <html>
     <head>
       <meta charset="utf-8">
       <title>Tesseract Server Side</title>
     </head>
     <body>
       <form id="uploadForm" enctype="multipart/form-data" action="/api/photo" method="post">
         <input type="file" name="userPhoto" />
         <input type="submit" value="Upload Image" name="submit">
       </form>
     </body>
   </html>

Playing with Tesseract from Earvin Kayonga on Vimeo.

By Earvin Kayonga