Well, I decided to workout myself on my question to solve the above problem. What I wanted is to implement a simple OCR using KNearest or SVM features in OpenCV. And below is what I did and how. (it is just for learning how to use KNearest for simple OCR purposes).
1) My first question was about letter_recognition.data
file that comes with OpenCV samples. I wanted to know what is inside that file.
It contains a letter, along with 16 features of that letter.
And this SOF
helped me to find it. These 16 features are explained in the paper Letter Recognition Using Holland-Style Adaptive Classifiers
.
(Although I didn’t understand some of the features at the end)
2) Since I knew, without understanding all those features, it is difficult to do that method. I tried some other papers, but all were a little difficult for a beginner.
So I just decided to take all the pixel values as my features. (I was not worried about accuracy or performance, I just wanted it to work, at least with the least accuracy)
I took the below image for my training data:
(I know the amount of training data is less. But, since all letters are of the same font and size, I decided to try on this).
To prepare the data for training, I made a small code in OpenCV. It does the following things:
- It loads the image.
- Selects the digits (obviously by contour finding and applying constraints on area and height of letters to avoid false detections).
- Draws the bounding rectangle around one letter and wait for
key press manually
. This time we press the digit key ourselves corresponding to the letter in the box. - Once the corresponding digit key is pressed, it resizes this box to 10×10 and saves all 100 pixel values in an array (here, samples) and corresponding manually entered digit in another array(here, responses).
- Then save both the arrays in separate
.txt
files.
At the end of the manual classification of digits, all the digits in the training data (train.png
) are labeled manually by ourselves, image will look like below:
Below is the code I used for the above purpose (of course, not so clean):
import sys
import numpy as np
import cv2
im = cv2.imread('pitrain.png')
im3 = im.copy()
gray = cv2.cvtColor(im,cv2.COLOR_BGR2GRAY)
blur = cv2.GaussianBlur(gray,(5,5),0)
thresh = cv2.adaptiveThreshold(blur,255,1,1,11,2)
################# Now finding Contours ###################
contours,hierarchy = cv2.findContours(thresh,cv2.RETR_LIST,cv2.CHAIN_APPROX_SIMPLE)
samples = np.empty((0,100))
responses = []
keys = [i for i in range(48,58)]
for cnt in contours:
if cv2.contourArea(cnt)>50:
[x,y,w,h] = cv2.boundingRect(cnt)
if h>28:
cv2.rectangle(im,(x,y),(x+w,y+h),(0,0,255),2)
roi = thresh[y:y+h,x:x+w]
roismall = cv2.resize(roi,(10,10))
cv2.imshow('norm',im)
key = cv2.waitKey(0)
if key == 27: # (escape to quit)
sys.exit()
elif key in keys:
responses.append(int(chr(key)))
sample = roismall.reshape((1,100))
samples = np.append(samples,sample,0)
responses = np.array(responses,np.float32)
responses = responses.reshape((responses.size,1))
print "training complete"
np.savetxt('generalsamples.data',samples)
np.savetxt('generalresponses.data',responses)
Now we enter in to training and testing part.
For the testing part, I used the below image, which has the same type of letters I used for the training phase.
For training we do as follows:
- Load the
.txt
files we already saved earlier - create an instance of the classifier we are using (it is KNearest in this case)
- Then we use KNearest.train function to train the data
For testing purposes, we do as follows:
- We load the image used for testing
- process the image as earlier and extract each digit using contour methods
- Draw a bounding box for it, then resize it to 10×10, and store its pixel values in an array as done earlier.
- Then we use KNearest.find_nearest() function to find the nearest item to the one we gave. ( If lucky, it recognizes the correct digit.)
I included last two steps (training and testing) in single code below:
import cv2
import numpy as np
####### training part ###############
samples = np.loadtxt('generalsamples.data',np.float32)
responses = np.loadtxt('generalresponses.data',np.float32)
responses = responses.reshape((responses.size,1))
model = cv2.KNearest()
model.train(samples,responses)
############################# testing part #########################
im = cv2.imread('pi.png')
out = np.zeros(im.shape,np.uint8)
gray = cv2.cvtColor(im,cv2.COLOR_BGR2GRAY)
thresh = cv2.adaptiveThreshold(gray,255,1,1,11,2)
contours,hierarchy = cv2.findContours(thresh,cv2.RETR_LIST,cv2.CHAIN_APPROX_SIMPLE)
for cnt in contours:
if cv2.contourArea(cnt)>50:
[x,y,w,h] = cv2.boundingRect(cnt)
if h>28:
cv2.rectangle(im,(x,y),(x+w,y+h),(0,255,0),2)
roi = thresh[y:y+h,x:x+w]
roismall = cv2.resize(roi,(10,10))
roismall = roismall.reshape((1,100))
roismall = np.float32(roismall)
retval, results, neigh_resp, dists = model.find_nearest(roismall, k = 1)
string = str(int((results[0][0])))
cv2.putText(out,string,(x,y+h),0,1,(0,255,0))
cv2.imshow('im',im)
cv2.imshow('out',out)
cv2.waitKey(0)
And it worked, below is the result I got:
Here it worked with 100% accuracy. I assume this is because all the digits are of the same kind and the same size.
But anyway, this is a good start to go for beginners (I hope so).