Abstract:
In this thesis project, we explore the problem of recognizing products in retail scenes. We propose a method that localizes the products by detecting their separating boundaries and recognizes the product images using supervised learning methods. We experimented with several image descriptors: Histograms of Oriented Gradients, Local Binary Pattern Histograms, and responses to a Root Filter Set. We also evaluated several classifiers: a Quadratic Classifier, AdaBoost with single-level decision trees, and Support Vector Machines. We found that using Histograms of Oriented Gradients as descriptors and classifying them with a Support Vector Machine gave the highest accuracy under a query-time constraint. We achieved 88.5% accuracy at classifying a set of sparsely collected descriptors from boundary and non-boundary regions. Using the trained SVM classifier, we calculated the probability that a product boundary exists at every possible location with a sliding window. We localized the boundaries using probability peak detection and achieved product boundary detection precision and recall above 75%. The same method was used to detect shelf boundaries with 97% precision. We described the product images with sparse and dense SIFT descriptors and achieved 96% accuracy using a bag-of-words approach on a 20-class subset of products. We experimented with various techniques and verified that using dense SIFT descriptors and random codebook generation instead of K-means-based clustering yields better performance at recognizing retail products.
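
To illustrate the probability peak detection step, the following is a minimal Python sketch, not the implementation used in this thesis. It assumes the sliding-window classifier has already produced a one-dimensional profile of boundary probabilities, and it applies a greedy non-maximum suppression, which is one plausible form of peak detection. The function name `detect_boundary_peaks` and the parameters `threshold` and `min_gap` are hypothetical and chosen only for this example.

```python
from typing import List, Sequence


def detect_boundary_peaks(
    probs: Sequence[float],
    threshold: float = 0.5,  # hypothetical minimum probability to accept a boundary
    min_gap: int = 3,        # hypothetical minimum spacing between two boundaries
) -> List[int]:
    """Pick boundary positions by greedy non-maximum suppression.

    `probs[i]` is assumed to be the classifier's boundary probability
    at sliding-window position i.
    """
    # Candidates above the threshold, strongest first, so that a strong
    # response suppresses weaker responses in its neighbourhood.
    candidates = sorted(
        (i for i, p in enumerate(probs) if p >= threshold),
        key=lambda i: probs[i],
        reverse=True,
    )
    peaks: List[int] = []
    for i in candidates:
        # Keep a candidate only if every already accepted peak is at
        # least min_gap positions away.
        if all(abs(i - j) >= min_gap for j in peaks):
            peaks.append(i)
    return sorted(peaks)


# Made-up probabilities at 12 horizontal window positions.
profile = [0.1, 0.2, 0.9, 0.8, 0.1, 0.05, 0.3, 0.7, 0.75, 0.2, 0.1, 0.6]
print(detect_boundary_peaks(profile))  # prints [2, 8, 11]
```

In this sketch, raising `threshold` would generally favor precision over recall, while `min_gap` encodes an assumed minimum product width in window steps.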