Accurate binary image classification


I'm trying to extract letters from a game board for a project. Currently, I can detect the game board, segment it into the individual squares and extract images of every square.

The input I'm getting is like this (these are individual letters):

[six images, one per extracted letter]

At first, I was counting the number of black pixels per image and using that as a way of identifying the different letters, which worked somewhat well for controlled input images. The problem I have, though, is that I can't make this work for images that differ slightly from these.

I have around 5 samples of each letter to work with for training, which should be good enough.

Does anybody know what would be a good algorithm to use for this?

My ideas were (after normalizing the image):

  • Counting the difference between an image and every letter image to see which one produces the least amount of error. This won't work for large datasets, though.
  • Detecting corners and comparing relative locations.
  • ???

Any help would be appreciated!

2012-04-04 20:10
by Blender
Welcome to OCR - NoName 2012-04-04 20:14
Heh, I tried Tesseract on the test images after dilating them a little bit, but it failed miserably (even after setting the segmentation mode to "one word"). OCR seems like overkill for this specific case, IMO, as the images are really similar in every case - Blender 2012-04-04 20:17
What about scale and rotation invariance? - moooeeeep 2012-04-04 20:36
The rotation is negligible and doesn't distort the letters any more than a horizontal compression does. As for scale, I normalize every image to a fixed size - Blender 2012-04-04 20:40


I think this is some sort of Supervised Learning. You need to do some feature extraction on the images and then do your classification on the basis of the feature vector you've computed for each image.

Feature Extraction

At first sight, the feature extraction part looks like a good scenario for Hu moments. Just calculate the image moments, then compute cv::HuMoments from these. That gives you a 7-dimensional real-valued feature space (one feature vector per image). Alternatively, you could omit this step and use each pixel value as a separate feature. I think the suggestion in this answer goes in this direction, but adds a PCA compression to reduce the dimensionality of the feature space.
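To give an idea of what the moments look like, here is a rough NumPy sketch that computes only the first two of the seven Hu invariants by hand. In practice you would let OpenCV's HuMoments do this; the function names and the 8x8 test glyph below are made up for illustration.

```python
import numpy as np

def normalized_central_moment(img, p, q):
    """Scale-normalized central moment eta_pq of a 2-D intensity array."""
    ys, xs = np.mgrid[:img.shape[0], :img.shape[1]]
    m00 = img.sum()
    cx = (xs * img).sum() / m00          # centroid, x coordinate
    cy = (ys * img).sum() / m00          # centroid, y coordinate
    mu = ((xs - cx) ** p * (ys - cy) ** q * img).sum()
    return mu / m00 ** (1 + (p + q) / 2)

def first_two_hu_moments(img):
    """Return the first two Hu invariants as a feature vector."""
    img = np.asarray(img, dtype=float)
    eta20 = normalized_central_moment(img, 2, 0)
    eta02 = normalized_central_moment(img, 0, 2)
    eta11 = normalized_central_moment(img, 1, 1)
    h1 = eta20 + eta02
    h2 = (eta20 - eta02) ** 2 + 4 * eta11 ** 2
    return np.array([h1, h2])

# Hypothetical 8x8 "L" glyph (1 = letter pixel).
glyph = np.zeros((8, 8))
glyph[1:7, 2] = 1
glyph[6, 2:6] = 1

shifted = np.roll(glyph, 1, axis=1)   # same glyph moved one pixel to the right
rotated = np.rot90(glyph)             # same glyph rotated by 90 degrees

print(first_two_hu_moments(glyph))
print(first_two_hu_moments(shifted))
print(first_two_hu_moments(rotated))
```

All three printed vectors should agree up to floating-point error, which is the point of using these moments as features.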


As for the classification part, you can use almost any classification algorithm you like. You could use an SVM for each letter (binary yes-no classification), you could use naive Bayes (which letter is most likely), or you could use a k-nearest-neighbor approach (kNN, minimum distance in feature space), e.g. with FLANN.
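As a sketch of the kNN option, the following brute-force version votes among the k closest training vectors. It is not FLANN (which uses approximate search structures), just the same idea in plain NumPy; the 2-D feature vectors and labels are invented for the example.

```python
from collections import Counter

import numpy as np

def knn_predict(train_X, train_y, x, k=3):
    """Label `x` by majority vote among its k nearest training vectors."""
    distances = np.linalg.norm(train_X - x, axis=1)   # Euclidean distances
    nearest = np.argsort(distances)[:k]               # indices of the k closest
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]

# Made-up feature vectors: three samples each of two letters.
train_X = np.array([
    [0.10, 0.20], [0.20, 0.10], [0.15, 0.15],
    [0.90, 0.80], [0.80, 0.90], [0.85, 0.85],
])
train_y = ["A", "A", "A", "T", "T", "T"]

print(knn_predict(train_X, train_y, np.array([0.2, 0.2])))
```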

Especially for distance-based classifiers (e.g. kNN), you should consider normalizing your feature space (e.g. scale all dimensions to a common range before using Euclidean distance, or use something like the Mahalanobis distance). This avoids overrepresenting features with large value ranges in the classification process.
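One simple way to do the scaling is min-max normalization, sketched below. The helper names `fit_min_max` and `apply_min_max` are hypothetical; the important detail is that the minimum and range are taken from the training data and then reused unchanged for new inputs.

```python
import numpy as np

def fit_min_max(train_X):
    """Compute per-feature minimum and range from the training data."""
    lo = train_X.min(axis=0)
    span = train_X.max(axis=0) - lo
    span[span == 0] = 1.0   # constant features: avoid dividing by zero
    return lo, span

def apply_min_max(X, lo, span):
    """Scale features with the parameters learned from the training data."""
    return (X - lo) / span

# Two made-up features on very different scales.
train_X = np.array([[1.0, 1000.0], [2.0, 3000.0], [3.0, 2000.0]])
lo, span = fit_min_max(train_X)
scaled = apply_min_max(train_X, lo, span)
print(scaled)
```

After scaling, both columns lie in the range 0 to 1, so neither dominates the Euclidean distance.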


Of course you need training data, that is, the images' feature vectors together with the correct letter for each. You also need a way to evaluate the classifier, e.g. cross-validation.
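With only a handful of samples per letter, leave-one-out cross-validation is one reasonable choice: hold out each sample in turn, train on the rest, and count how often the held-out sample is classified correctly. The sketch below uses a 1-nearest-neighbor classifier purely as a stand-in, and the data is invented.

```python
import numpy as np

def nearest_neighbor_label(train_X, train_y, x):
    """Stand-in classifier: label of the single closest training vector."""
    distances = np.linalg.norm(train_X - x, axis=1)
    return train_y[int(np.argmin(distances))]

def leave_one_out_accuracy(X, y):
    """Hold out each sample once and classify it using all the others."""
    correct = 0
    for i in range(len(X)):
        keep = np.arange(len(X)) != i                      # mask without sample i
        train_y = [label for label, k in zip(y, keep) if k]
        if nearest_neighbor_label(X[keep], train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)

# Made-up feature vectors: three samples each of two letters.
X = np.array([
    [0.10, 0.20], [0.20, 0.10], [0.15, 0.15],
    [0.90, 0.80], [0.80, 0.90], [0.85, 0.85],
])
y = ["A", "A", "A", "T", "T", "T"]

accuracy = leave_one_out_accuracy(X, y)
print(accuracy)
```

Unlike accuracy measured on the training set itself, this number says something about how the classifier handles inputs it has not seen.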

You might also want to have a look at template matching. Here you would convolve the candidate image with the available patterns in your training set. High values in the output image indicate a high probability that the pattern is located at that position.
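A minimal sketch of template matching, assuming normalized cross-correlation as the similarity score (OpenCV's matchTemplate offers comparable scoring modes and is much faster). The function name and the tiny test image are made up for illustration.

```python
import numpy as np

def match_template(image, template):
    """Slide `template` over `image`; return normalized cross-correlation scores."""
    th, tw = template.shape
    t = template - template.mean()
    t_norm = np.sqrt((t ** 2).sum())
    out_h = image.shape[0] - th + 1
    out_w = image.shape[1] - tw + 1
    scores = np.zeros((out_h, out_w))
    for y in range(out_h):
        for x in range(out_w):
            patch = image[y:y + th, x:x + tw]
            p = patch - patch.mean()
            denom = np.sqrt((p ** 2).sum()) * t_norm
            if denom > 0:                      # flat patches keep a score of 0
                scores[y, x] = (p * t).sum() / denom
    return scores

# Hypothetical 3x3 "L" pattern placed at row 2, column 1 of an empty 6x6 image.
template = np.array([[1, 0, 0], [1, 0, 0], [1, 1, 1]], dtype=float)
image = np.zeros((6, 6))
image[2:5, 1:4] = template

scores = match_template(image, template)
best = np.unravel_index(np.argmax(scores), scores.shape)
print(best, scores[best])
```

The highest score marks the top-left corner of the best match; for classification you would run this once per letter pattern and keep the pattern with the highest peak.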

2012-04-04 20:55
by moooeeeep
Thank you very much for your help! I got as far as computing the Hu moments for the individual images, but after that the classification has stumped me with loads of errors. Hopefully I can get it to work within the next day or so and see how well it works - Blender 2012-04-05 07:56
Just got the classifier to work! It's 100% accurate for my training data (duh) but has some trouble with new input. I'm going to train it some more with more accurate samples - Blender 2012-04-05 08:26
@Blender - glad this helped - moooeeeep 2012-04-05 08:49
Just as a status update, I found out that having high-quality training data and images isn't a good idea. My accuracy increased to 100% when I resized my images to (oddly enough) 5px by 5px - Blender 2012-04-06 04:22
@Blender that is a rather odd result. How do you compute your accuracy? Anyway, if it is a study project and you are not able to explain why is it working, it might be a problem for you - Simon Bergot 2012-04-06 07:58
This isn't really a study project, but I am trying to learn about machine learning and image processing in my spare time. I think the computed Hu moments are not distinct enough for the different image classes, which degraded the K-means algorithm's ability to classify input images properly. I am currently converting the images to 15x15 matrices of ones and zeroes (ones representing black) and then flattening them to 225-element lists. This gives me a bit more accuracy for arbitrary input. Thanks so much for your help - Blender 2012-04-07 06:38


This is a recognition problem. I'd personally use a combination of PCA and a machine learning technique (likely SVM). These are fairly large topics so I'm afraid I can't really elaborate too much, but here's the very basic process:

  1. Gather your training images (more than one per letter, but don't go crazy)
  2. Label them (could mean a lot of things, in this case it means group the letters into logical groups -- All A images -> 1, All B images -> 2, etc.)
  3. Train your classifier
    • Run everything through PCA decomposition
    • Project all of your training images into PCA space
    • Run the projected images through an SVM (if it's a one-class classifier, do them one at a time, otherwise do them all at once.)
    • Save off your PCA eigenvector and SVM training data
  4. Run recognition
    • Load in your PCA space
    • Load in your SVM training data
    • For each new image, project it into PCA space and ask your SVM to classify it.
    • If you get an answer (a number) map it back to a letter (1 -> A, 2 -> B, etc).
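The steps above can be sketched in NumPy roughly as follows. Note the swap: to keep the example self-contained, a nearest-centroid rule in PCA space stands in for the SVM, so this shows the PCA projection and the label mapping rather than SVM training. The random 25-pixel "images" and all function names are invented for illustration.

```python
import numpy as np

def fit_pca(X, n_components):
    """Return the mean and the top principal axes of the row vectors in X."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(X, mean, components):
    """Project row vectors (or a single vector) into PCA space."""
    return (X - mean) @ components.T

def classify(x, mean, components, centroids):
    """Stand-in for the SVM: pick the class with the nearest centroid in PCA space."""
    z = project(x, mean, components)
    return min(centroids, key=lambda label: np.linalg.norm(z - centroids[label]))

# Steps 1 and 2: made-up flattened training images with numeric labels.
rng = np.random.default_rng(0)
base = {1: rng.random(25), 2: rng.random(25)}
y = np.array([1, 1, 1, 2, 2, 2])
X = np.array([base[label] + 0.05 * rng.standard_normal(25) for label in y])

# Step 3: fit PCA, project the training images, "train" the stand-in classifier.
mean, components = fit_pca(X, n_components=2)
Z = project(X, mean, components)
centroids = {label: Z[y == label].mean(axis=0) for label in (1, 2)}

# Step 4: classify a new image and map the number back to a letter.
letters = {1: "A", 2: "B"}
print(letters[classify(base[2], mean, components, centroids)])
```

For persistence, `mean`, `components` and the classifier state are the pieces you would save after training and load again before recognition.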
2012-04-04 20:28
by Chris Eberle
Thank you! I'm reading up on PCA right now. Finally a use for Linear Algebra.. - Blender 2012-04-04 20:40


2012-04-04 21:02
by karlphillip
I read through the second one and I seem to be doing this already (comparing differing pixels and finding the image which minimizes that error). The first one is a bit covert and doesn't explain what happens very well, but thank you for the links! I'll do some research into how the first one works - Blender 2012-04-04 21:09


I had a similar problem a few days back, but it was digit recognition rather than letters.

I implemented a simple OCR for this using kNearestNeighbour in OpenCV.

Below is the link and code:

Simple Digit Recognition OCR in OpenCV-Python

Adapt it for letters. Hope it works.

2012-04-05 05:42
by Abid Rahman K
This answer was really helpful when I was actually coding the algorithms. Thank you - Blender 2012-04-05 09:01


Please look at these two answers related to OCR

Scoreboard digit recognition using OpenCV

and here

OCR of low-resolution text from screenshots

2012-04-05 05:38
by Sam


You can try building a model by uploading your training data (~50 images of 1s,2s,3s....9s) to (free to use)

1) Upload your training data here:

2) Then query the API using the following (Python Code):

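import json
import urllib.request

import requests

model_name = "Enter-Your-Model-Name-Here"
url = ""  # URL of the image to classify (missing in the source)
files = {'uploadfile': urllib.request.urlopen(url).read()}
url = "" + model_name  # API endpoint prefix (missing in the source)
# The request call was garbled in the source; requests.post is an assumption.
r = requests.post(url, files=files)
print(json.loads(r.content))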

3) the response looks like:

{
  "message": "Model trained",
  "result": [
    {
      "label": "1",
      "probability": 0.95
    },
    {
      "label": "2",
      "probability": 0.01
    },
    ...
    {
      "label": "9",
      "probability": 0.005
    }
  ]
}
2016-12-13 11:33
by sj7


Since your images are coming off a computer screen of a board game, the variation can't be 'too crazy'. I just got something working for the same type of problem. I normalized my images by cropping right down to the 'core'.

With 5 samples per letter, you might already have complete coverage.

I organized my work by 'stamping' the identifier at the start of the image filename. I then could sort on the filename (=identifier). Windows Explorer allows you to view the directory with Medium Icons turned on. I would get the identifier by a 'fake-rename' action and copy it into the Python program.

Here is some working code that can be revamped for any of these problems.

import math

import numpy as np

# Black-pixel ratios observed in the sample images, grouped by letter.
KNOWN_RATIOS = {
    'A': (.740740740740740, .688034188034188),
    'T': (.797979797979798,),
    'I': (.803030303030303,),
    'H': (.5050505050505051, .5555555555555556),
    ############ ... etc.
}

def getLetter(im):
    # Assumes im is a binary PIL image (mode '1'), so white pixels count as 1.
    area = im.height * im.width
    white_area = np.sum(np.array(im))
    black_area = area - white_area
    black_ratio = black_area / area           # between 0 and 1
    for letter, ratios in KNOWN_RATIOS.items():
        # Compare with a tolerance; exact float equality is fragile.
        if any(math.isclose(black_ratio, r, rel_tol=1e-9) for r in ratios):
            return letter

    return '@' # when this comes out you have some more work to do

Note: It is possible that the same identifier (here we are using black_ratio) might point to more than one letter. If it happens, you'll need to take another attribute of the image to discriminate between them.

2019-02-05 02:20
by CopyPasteIt