Sunday, August 30, 2009

Activity 14: Pattern Recognition

With today's technology, cameras with smile recognition and laptops with face-recognition security are no longer devices seen only in science fiction movies but are available for ordinary people's use. But the big question is: how is a camera able to detect that a person is smiling? And in the same way, how is a laptop able to recognize its owner's face?

In this activity, we studied pattern recognition. In image processing, features are defined as quantifiable properties such as color, shape, size, etc. A pattern is a set of features. A class, on the other hand, is a set of patterns that share a common property. The aim of pattern recognition is to decide whether a given pattern belongs to one of several classes [1].

To understand the concept further, let's consider an image of a plant with its flowers and leaves. Our task is to distinguish a leaf from a flower. The leaf and the flower are both classes, each composed of patterns that make it distinct from the other. For this example, we may use color, size, eccentricity and even shape as features to tell a flower from a leaf. Once the features are extracted we are ready to classify: we have to decide to which class a given pattern belongs. To be able to make such decisions, classifiers are needed.

Classifiers attempt to find decision boundaries that separate the classes. Depending on the features employed, the decision boundaries may be a plane, a convex or concave surface, or arbitrary closed regions in feature space [1].

For this activity, the classifier used is minimum distance classification. With this type of classifier, class membership is decided by computing, for each class j, the distance function

d_j(x) = x·m_j - (1/2) m_j·m_j

where m_j is the mean pattern of class j and x is the feature vector of the object being classified. The object x is assigned to class j if the corresponding d_j is maximum.
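
To make the procedure concrete, below is a minimal Scilab sketch of minimum distance classification. The training matrices, their values and the variable names are hypothetical stand-ins, not the actual measurements used in this activity.

// Minimal Scilab sketch of the minimum distance classifier.
// The feature matrices below are hypothetical stand-ins: one row per
// training object, columns = [area/perimeter^2, perimeter/width].
train_flower = [0.060 9.8; 0.058 10.1; 0.061 9.9];
train_coin   = [0.079 3.2; 0.080 3.1; 0.078 3.3];

// mean pattern of each class (column means of the training features)
m_flower = mean(train_flower, 'r');
m_coin   = mean(train_coin, 'r');
m = [m_flower; m_coin];

// test object feature vector (hypothetical)
x = [0.062 9.7];

// decision function d_j(x) = x.m_j - 0.5*m_j.m_j for each class j;
// the object is assigned to the class with the largest d_j
d = zeros(1, size(m, 1));
for j = 1:size(m, 1)
    d(j) = x*m(j,:)' - 0.5*m(j,:)*m(j,:)';
end
[dmax, winner] = max(d);   // winner = index of the predicted class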

In this activity, four different objects were classified: long leaf, rectangular leaf, flower and 25 centavo coin.
Each group of objects was cropped from the image shown above. For each set of objects, five were used for training while the other five were used as test objects. The features used are the quotient of the area and the square of the perimeter, and the quotient of the perimeter and the width of the object (its extent in the y-direction). Note that both features are unit-less. This is because, as much as possible, the features should be scale-invariant, so that the imaging conditions (for example, the height of the camera above the objects) won't affect the classification.
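
As a rough sketch of how the two features might be measured from a binarized object, the Scilab snippet below counts object pixels for the area, estimates the perimeter from the object pixels that touch the background, and takes the row extent as the width. The mask and the perimeter estimate are illustrative only and not necessarily the exact procedure used here.

// Sketch of extracting the two scale-invariant features from a binary
// mask bw (1 = object, 0 = background). The object and the perimeter
// estimate are only illustrative.
bw = zeros(50, 50);
bw(15:35, 10:40) = 1;                  // hypothetical rectangular object

area = sum(bw);                        // area = number of object pixels

// perimeter estimate: object pixels with a 4-neighbor in the background
cross = [0 1 0; 1 1 1; 0 1 0];
interior = conv2(bw, cross, 'same') == 5;
edge = (bw == 1) & ~interior;
perimeter = sum(bool2s(edge));

// width = extent of the object along the y-direction (rows)
rows = find(sum(bw, 'c') > 0);
width = max(rows) - min(rows) + 1;

feature1 = area / perimeter^2;         // quotient of area and perimeter^2
feature2 = perimeter / width;          // quotient of perimeter and width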

The graph above shows the plot of the classes extracted from the training set. Notice that the flower and 25-centavo classes lie far from the rest, while the long leaf and rectangular leaf classes lie close to one another.

Minimum distance classification was then applied to determine the class of each of the remaining 20 test objects (five for each of the four kinds). The following results were obtained.
The tables above show the calculated distances for each test object; an object is assigned to the class with the largest computed value of d. In the first table, the five test objects come from the long leaf class, and the calculated distance is indeed maximum in the long leaf column for all of them, meaning they were correctly classified as long leaves. The same holds for the remaining three tables: the rectangular leaf, flower and 25-centavo test objects were all classified correctly.

For this activity I give myself a grade of 10, for I was able to use features that resulted in 100% classification accuracy.

I would like to thank Irene Crisologo, Jica Monsanto and Thirdy Buno for useful discussions.


References:
[1] Activity 14: Pattern Recognition Manual

Activity 13: Correcting Geometric Distortion

Have you ever noticed lines in images that look bent when they should be straight? Such image distortions are introduced by the spherical lenses used in capturing the image. There are two common kinds of image distortion: barrel and pincushion. In barrel distortion the image seems bloated in the middle and pinched at the sides, whereas in pincushion distortion the image seems pinched in the middle and expanded at the boundaries [1]. Figure 1 shows an image of the Petronas Twin Towers that exhibits barrel distortion.

Figure 1. http://www.photos-of-the-year.com/barrel-distortion/1.jpg

In this activity, we corrected distortions found in images. To perform the correction, both the pixel locations and the grayscale values of the image must be remapped.

The first step in generating the undistorted image is to find the ideal pixel locations of the grids.

Let g be the function describing the distorted image while the undistorted image is represented by the function f(x, y). Each ideal pixel location (x, y) maps to a location (x', y') in the distorted image, so that f(x, y) = g(x', y'). From this, the distorted pixel locations are expressed by the bilinear equations

x' = c1*x + c2*y + c3*x*y + c4
y' = c5*x + c6*y + c7*x*y + c8
Since there are eight unknown c's, four tie points are needed to calculate all of them (each point contributes one equation for x' and one for y'). These points should be near one another, typically the four corner points of one grid cell, and a set of c's is calculated for each set of corner points.
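
A minimal Scilab sketch of this step is shown below, with hypothetical corner coordinates; the eight coefficients are obtained by solving two small linear systems built from the four tie points.

// Sketch: solving for the eight c coefficients from four tie points.
// (xi, yi) are ideal (undistorted) corner coordinates and (xd, yd) the
// corresponding corners in the distorted image; the values are hypothetical.
xi = [10; 60; 60; 10];   yi = [10; 10; 60; 60];
xd = [12; 58; 61; 9];    yd = [11; 12; 59; 62];

// x' = c1*x + c2*y + c3*x*y + c4 ,  y' = c5*x + c6*y + c7*x*y + c8
T = [xi, yi, xi.*yi, ones(4, 1)];
c14 = T \ xd;            // c1..c4
c58 = T \ yd;            // c5..c8

// distorted location of any ideal pixel (x, y) inside this patch
x = 35; y = 35;
xhat = [x, y, x*y, 1] * c14;
yhat = [x, y, x*y, 1] * c58;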

After finding the ideal pixel locations, a gray-level interpolation technique should be applied, since the computed distorted locations generally do not fall on integer pixel coordinates. For this purpose we used bilinear interpolation. In this method, the gray level v at an arbitrary location (x, y) in the distorted image can be found using the equation

v(x, y) = a*x + b*y + c*x*y + d

The four equations needed to compute the four constants (a, b, c, d) may be formed using the four nearest-neighbor pixels surrounding the point (x, y).
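
Below is a short Scilab sketch of this interpolation for a single pixel, using a random array as a stand-in for the distorted image; the four constants are solved from the four neighboring gray levels.

// Sketch of bilinear interpolation of the gray level at a non-integer
// location (xhat, yhat) in the distorted image g (stand-in array here).
g = rand(100, 100);        // stand-in for the distorted grayscale image
xhat = 35.4;  yhat = 52.7;

// four nearest integer neighbors enclosing (xhat, yhat)
x0 = floor(xhat);  x1 = x0 + 1;
y0 = floor(yhat);  y1 = y0 + 1;

// solve v(x, y) = a*x + b*y + c*x*y + d from the four neighbors
A = [x0 y0 x0*y0 1; x1 y0 x1*y0 1; x0 y1 x0*y1 1; x1 y1 x1*y1 1];
v = [g(x0, y0); g(x1, y0); g(x0, y1); g(x1, y1)];
abcd = A \ v;

// interpolated gray level assigned to the ideal pixel location
vout = [xhat, yhat, xhat*yhat, 1] * abcd;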

For this activity, we used the distorted image of a grid,

image taken from: http://images.trustedreviews.com/images/article/inline/7690-FujiJ10barrel.jpg

The ideal grid points of this image were calculated using one of the grid cells that appears undistorted. Using the dimensions of this cell, the ideal grid points were generated.

Applying the algorithm discussed above, the undistorted image was recovered.


The undistorted image still looks slightly distorted. This may be because the corner points (the four neighboring tie points) used in the computation are relatively far from one another: here, the corner points of one hollow block make up a set.

For this activity I give myself a grade of 9.3. This is because I was able to recover the undistorted image, although the recovered image still shows a little distortion.

I thank Jaya Combinido, Thirdy Buno, Miguel Sison and Jica Monsanto for useful discussions.

References:
[1] Activity 13: Correcting Geometric Distortion Manual


Thursday, August 6, 2009

Activity 12: Color Image Segmentation

Image segmentation is used in various applications such as medical imaging, face recognition, fingerprint recognition and machine vision [1]. It can be thought of as the process of locating an object within an image.

There are different ways of implementing image segmentation, one of which is by thresholding grayscale images. However, this method sometimes fails because there are cases in which the object to be segmented has the same grayscale value as the background. In such cases, color image segmentation can be implemented.

Basically, in color image segmentation the color of the object of interest is used to segment it from the image. However, the color space used is not the raw RGB values. This is because 3D objects have shading variations, so the same color can take on a wide range of RGB values. To separate brightness from chromaticity, we use the normalized chromaticity coordinates. That is, per pixel,

I = R + G + B;
r = R/I ; g = G/I ; b = B/I

We note that r + g + b = 1, thus b can be represented by,

b = 1 - r - g;

This means that the chromaticity can be represented by just two coordinates, r and g. The normalized chromaticity space is therefore represented by the following graph.
Normalized Chromaticity space. The x-axis is r while the y-axis is g.

There are two methods of color image segmentation: parametric and non-parametric. In the parametric method, pixels similar to those in the region of interest (ROI) are tagged by calculating the probability that they belong to the ROI. This is done by assuming a Gaussian distribution independently along the normalized chromaticity coordinates r and g; the probability for r is given by

p(r) = (1 / (σr √(2π))) exp( -(r - µr)² / (2σr²) )

where µr and σr are the mean and standard deviation of r over the ROI, respectively. An equation of the same form gives p(g). The joint probability of obtaining r and g is the product p(r) p(g).
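
A minimal Scilab sketch of the parametric method is given below. The R, G, B arrays and the patch chromaticities are random or hypothetical stand-ins for the actual image and ROI, and the final threshold is arbitrary.

// Sketch of parametric segmentation using the r-g chromaticity of a patch.
// R, G, B and the patch chromaticities are stand-ins for the real data.
R = rand(120, 160); G = rand(120, 160); B = rand(120, 160);
I = R + G + B;  I(I == 0) = 1e6;         // guard against division by zero
r = R ./ I;     g = G ./ I;              // normalized chromaticity coordinates

// mean and standard deviation of r and g over the patch (hypothetical)
roi_r = [0.20 0.22 0.21 0.19];  roi_g = [0.55 0.57 0.54 0.56];
mu_r = mean(roi_r);  sig_r = stdev(roi_r);
mu_g = mean(roi_g);  sig_g = stdev(roi_g);

// independent Gaussians p(r), p(g) and their joint probability
p_r = exp(-(r - mu_r).^2 / (2*sig_r^2)) / (sig_r*sqrt(2*%pi));
p_g = exp(-(g - mu_g).^2 / (2*sig_g^2)) / (sig_g*sqrt(2*%pi));
p = p_r .* p_g;
segmented = p > 0.5*max(p);              // arbitrary threshold on the joint probability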

While parametric segmentation uses Gaussian probabilities, non-parametric segmentation uses histogram backprojection. In this method, the 2-D histogram of the ROI in (r, g) chromaticity space is computed and used as a look-up table: each pixel in the image is assigned the histogram value of the (r, g) bin it falls into.
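
The sketch below continues from the parametric one above (it reuses r, g, roi_r and roi_g) and shows one simple way to build the 2-D chromaticity histogram of the patch and backproject it onto the image; the 32 x 32 binning is an arbitrary choice.

// Sketch of histogram backprojection. hist2d is a 2-D histogram of the
// patch chromaticities binned into 32 x 32 cells.
nbins = 32;
hist2d = zeros(nbins, nbins);
for k = 1:length(roi_r)                       // build the ROI histogram
    i = min(nbins, 1 + floor(roi_r(k)*nbins));
    j = min(nbins, 1 + floor(roi_g(k)*nbins));
    hist2d(i, j) = hist2d(i, j) + 1;
end
hist2d = hist2d / max(hist2d);                // normalize for backprojection

// backproject: each pixel takes the histogram value of its (r, g) bin
[nr, nc] = size(r);
backproj = zeros(nr, nc);
for y = 1:nr
    for x = 1:nc
        i = min(nbins, 1 + floor(r(y, x)*nbins));
        j = min(nbins, 1 + floor(g(y, x)*nbins));
        backproj(y, x) = hist2d(i, j);
    end
end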

To test these two methods, the image below was used.

http://www.babygadget.net/pics/kitty-crayons.jpg

A patch of this image was cropped and was used as the region of interest.

The histogram of this patch was calculated using the code given in the exercise protocol and is presented in the image below.

When compared with the plot of the normalized chromaticity space, the obtained histogram indeed corresponds to shades in the green region.

Using the green patch, the green kitten in the original image was segmented, first using parametric distribution estimation.

As can be observed from the image above, the method was able to locate and segment the object having the same color as the patch. Notice also that it detected some portions of the light green kitten. This is because the green and light green kittens have the same hue and differ only in brightness.

Using the same patch and its calculated histogram, image segmentation was implemented using the non-parametric approach with histogram backprojection. The result of applying this method is given by the image below.

Observe that this method was also able to detect both kittens, just like the previous one. However, notice that in the image above the method seems to have located more portions of the light green kitten than of the green one, despite the fact that the patch was obtained from the green kitten.

The same process was then applied a second time using a brown patch to segment the brown kitten shown in the reference image.
The images above show the patch used and its histogram. As can be observed, the brown patch is composed of shades of yellow, orange and red.

Parametric and non-parametric image segmentation were then applied to the image; the results are shown below.


Parametric distribution estimation (left) and non-parametric distribution estimation (right).

As can be observed in the results for both methods, five colors were spotted: dark orange, orange, dark yellow, yellow and brown. Orange and yellow of varying brightness (light and dark) were detected because these colors make up brown, as is clearly shown in the calculated histogram of the brown patch.

Comparing the two methods, parametric distribution estimation gave more accurate results. I think this is because the non-parametric method is highly dependent on the patch chosen: its backprojection uses the histogram of the patch itself, so only chromaticities actually present in the patch are tagged. In the parametric method, on the other hand, the probabilities of the normalized chromaticities r and g are calculated from fitted Gaussians, which cover a wider, smoother range of values than the raw patch histogram.

For this activity, I will give myself a grade of 10 for I was able to do all the required tasks while enjoying the activity.

I thank Thirdy Buno and Irene Crisologo for sharing information regarding this activity.

References:
[1] http://en.wikipedia.org/wiki/Segmentation_(image_processing)
[2] Activity 12: Color Image Segmentation Manual

Activity 10: Preprocessing Text

For this activity, we were asked to extract handwritten text from an imaged document with lines. The given image is shown below.

Observe that the image is rotated: the lines that should be horizontal are tilted by some angle. To rotate the image so that these lines become horizontal, the mogrify function was used. This resulted in the image below.

After rotating the image, a small portion of it was cropped.

Since we want to extract the text, the first thing to do is to remove the lines in the image. To do this, the Fourier transform of the cropped image was calculated.

Equipped with the knowledge from the previous activities (Activities 6 and 7), a filter was created in GIMP to block the frequencies of the lines.

We know from Fourier optics that the frequencies of horizontal lines lie along a vertical strip in Fourier space. Knowing this, the filter shown above was created. Notice that the center of the Fourier transform was not blocked, because this region contains a large amount of information, not only about the lines but about the text as well.
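
The Scilab sketch below illustrates the idea on a synthetic image with horizontal lines: the spectrum is masked along a vertical strip while the center is kept open. The mask dimensions here are arbitrary, and the actual filter was drawn by hand in GIMP.

// Sketch of blocking the line frequencies in Fourier space. The image
// and the filter mask are synthetic stand-ins for the cropped document
// image and the hand-drawn mask.
img = rand(128, 128);                 // stand-in for the cropped grayscale image
img(32:8:96, :) = 0;                  // add dark horizontal lines

F = fftshift(fft(img));               // centered 2-D Fourier transform

// mask: block the vertical strip where the horizontal-line frequencies
// lie, but keep the central region (DC and low frequencies) open
mask = ones(128, 128);
mask(:, 63:67) = 0;                   // vertical strip through the center
mask(60:70, 60:70) = 1;               // reopen the center of the spectrum

filtered = real(ifft(fftshift(F .* mask)));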

Applying the created filter and inverting the image resulted in the image below.

The image was then binarized using thresholding,

Notice that the image contains noise; for it to be "clean", morphological operations must be applied. This was done by applying erosion and dilation to the image.
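
A minimal Scilab sketch of this cleaning step is shown below, implementing erosion followed by dilation (an opening) with conv2 on a synthetic noisy binary image; the 2 x 2 structuring element is an arbitrary choice, not necessarily the one used here.

// Sketch of removing small specks with erosion followed by dilation
// (an opening), implemented with conv2 on a binary image bw.
bw = rand(128, 128) > 0.98;                 // stand-in noisy binary image
bw = bool2s(bw);
se = ones(2, 2);                            // structuring element

eroded  = conv2(bw, se, 'same') == sum(se); // keep pixels fully covered by se
dilated = conv2(bool2s(eroded), se, 'same') > 0;
cleaned = bool2s(dilated);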


Afterwards, the text was thinned to one pixel thick by applying the thin function in Scilab.


Lastly, the occurrences of the word DESCRIPTION were located using correlation. This was done by creating a binarized image of the same size as the text image, with the word DESCRIPTION as its only object. The word DESCRIPTION was rendered in Arial with a font size of 11.
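
As a rough Scilab sketch of the correlation step, the snippet below correlates a synthetic binarized text image with a same-size template image through the Fourier transform; both arrays are stand-ins for the actual text image and the rendered word DESCRIPTION.

// Sketch of locating a template by correlation in the Fourier domain:
// correlation = inverse FFT of F(image) times the conjugate of F(template).
text_img = rand(128, 128) > 0.9;            // stand-in binarized text image
template = zeros(128, 128);
template(60:68, 40:90) = 1;                 // stand-in for the word "DESCRIPTION"

Fi = fft(bool2s(text_img));
Ft = fft(template);
corr = real(ifft(Fi .* conj(Ft)));          // bright peaks = likely template locations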


In the image above, white signifies high correlation. As can be observed, the locations of the word DESCRIPTION were found, although locations of other text were highlighted as well.

For this activity, I'm giving myself a grade of 8. This is because I know I could have done more image processing to obtain better results. However, I was not able to do so because of the time constraint.

I thank Irene and all my classmates who discussed with me this activity.

Tuesday, August 4, 2009

Activity 11: Color Camera Processing

Almost all cameras today, including the built-in ones found in regular cellphones, have white balancing options. Common white balancing options include cloudy, daylight, fluorescent, tungsten and AWB (Auto White Balance). Before white balancing was explained to me by a senior in the laboratory I belong to, I thought it was a camera option that was there for color enhancement depending on the scene to be captured. In a way I was correct, but I did not actually know its deeper meaning.

An image is composed of three channels, namely red, green and blue. These three channels can be described by the following equations:

R = K_R ∫ S(λ) p(λ) n_R(λ) dλ
G = K_G ∫ S(λ) p(λ) n_G(λ) dλ
B = K_B ∫ S(λ) p(λ) n_B(λ) dλ

In these equations, S(λ) represents the spectral power distribution of the incident light, p(λ) is the surface reflectance, and n(λ) for each channel is the spectral sensitivity of the camera. Observe that a factor K is present in each of the RGB equations. This factor is termed the white balancing constant and is equal to the inverse of the corresponding camera output when a white object is imaged.

White balancing can be thought of as the process of finding and applying the right white balancing constants so that a white patch in an image is visually seen as white.

There are two well-known methods for achieving automatic white balance: the White Patch algorithm and the Gray World algorithm. In the White Patch algorithm, given an unbalanced image, the RGB values of a known white object in the image are used to compute the constants K (each channel is divided by the corresponding value of the white object). The Gray World algorithm, on the other hand, assumes that the average color of the world is gray; the balancing constants are therefore taken from the average R, G and B of the whole image (possibly multiplied by some constant), since gray is a family of white.
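
The Scilab sketch below applies both algorithms to random stand-in channels; the white-patch values are hypothetical, and the results are simply clipped to 1 to avoid overflow.

// Sketch of the two automatic white balancing algorithms on an RGB image
// stored as three matrices R, G, B (random stand-ins here). Rw, Gw, Bw
// are the values of a known white patch in the unbalanced image.
R = rand(100, 100); G = rand(100, 100); B = rand(100, 100);

// White Patch algorithm: divide by the RGB of a known white object
Rw = 0.9; Gw = 0.7; Bw = 0.6;               // hypothetical white-patch values
R_wp = min(R / Rw, 1); G_wp = min(G / Gw, 1); B_wp = min(B / Bw, 1);

// Gray World algorithm: divide by the average RGB of the whole image
R_gw = min(R / mean(R), 1); G_gw = min(G / mean(G), 1); B_gw = min(B / mean(B), 1);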

For this exercise, the two methods, the White Patch algorithm and the Gray World algorithm, were applied to images taken with different white balancing settings.

The images above show the results when the White Patch algorithm was implemented. Each column corresponds to a white balancing setting: the images in the first column were taken with the auto white balance setting, the second column with a cloudy setting, the third column with a daylight setting, and the last column with a fluorescent setting.

When the gray world algorithm was applied, the resulting images are as follows.


The order of the images is the same as described previously. It can be observed that for the Gray World algorithm the rendered images are generally whitish. Also notice that in the images rendered using Gray World, part of the image is somewhat saturated. This may be because the cropped white object in the image is itself saturated. On the other hand, it can be observed that for both algorithms, images in which white objects appear white were successfully reconstructed.

Images of varying brightness were also captured with a white balancing setting of daylight.

The image rendered using the White Patch algorithm,
On the other hand, that rendered by the Gray World algorithm,

Again, it can be observed that the image rendered using the White Patch algorithm is better than that rendered by the Gray World algorithm.

For this activity I give myself a grade of 8. This is because the patch that I used for the Gray World algorithm is saturated.