Real-time multiple object localization has remained a major challenge in digital image processing for many years. With the advent of deep learning and convolutional neural networks, these efforts have yielded quite promising results, and well-trained models that detect many classes of objects very accurately are now within our reach.
In this post I intend to present a model famously known as Yolo, which stands for ‘You Only Look Once’, proposed by Joseph Redmon et al. It has made great strides towards very fast localization of multiple objects, and I will also cover its implementation using Keras, a high-level deep learning library.
Let us first differentiate among the terms classification, localization and detection. We hear these terms often in the image processing world, and they are distinct from each other in their applications.
Classification — Refers to identifying whether a given object is present inside an image or not. A common example: cat or no-cat.
Localization — Refers to not only identifying whether a given object is present inside an image, but also pinpointing the object’s location using a bounding box.
Detection — Simply refers to multiple localizations in a single image.
Yolo addresses the detection of objects in images, and with the publication of the Yolo V2 paper, the technique was quickly popularized in the field. The main steps of the Yolo V2 algorithm can be outlined as below:
Divide the image using a grid (e.g. 19x19). Dividing the image into a grid of smaller regions makes detecting an object inside each individual cell easier.
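As a toy sketch of the grid idea, the cell responsible for an object is the one containing the object’s center point. The 608x608 input size below is an assumption chosen for illustration so that a 19x19 grid gives a clean 32-pixel stride; it is not prescribed by this post.

```python
# Assumed sizes for illustration: a 608x608 input divided into a 19x19 grid.
IMAGE_SIZE = 608
GRID = 19
CELL = IMAGE_SIZE / GRID  # 32 pixels per cell

def cell_for_center(x, y):
    # The grid cell whose region contains the object's center point
    # is the one responsible for detecting that object.
    return int(x // CELL), int(y // CELL)

print(cell_for_center(300, 150))  # (9, 4)
```
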
Perform image classification and localization on each grid cell. The output of this step is a vector for each cell representing the probability that an object is present, the dimensions of its bounding box, and the class of the detected object.
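The per-cell output can be sketched as follows. This is a hedged illustration of a YOLO-style encoding, with assumed sizes (5 anchors, 80 classes) and a made-up detection; the exact layout in a given implementation may differ.

```python
import numpy as np

# Each grid cell predicts, per anchor box, a vector
# [p_c, b_x, b_y, b_h, b_w, c_1 ... c_n]:
#   p_c  — probability that an object is present,
#   b_*  — bounding box center and dimensions,
#   c_i  — class probabilities.
GRID_SIZE = 19
NUM_ANCHORS = 5   # assumed anchor count
NUM_CLASSES = 80  # assumed class count (e.g. COCO)

# Raw network output for one image.
prediction = np.zeros((GRID_SIZE, GRID_SIZE, NUM_ANCHORS, 5 + NUM_CLASSES))

# Made-up example: mark one anchor in cell (7, 9) as containing class 16.
prediction[7, 9, 0, 0] = 0.92                     # objectness p_c
prediction[7, 9, 0, 1:5] = [0.5, 0.4, 1.2, 0.8]   # b_x, b_y, b_h, b_w
prediction[7, 9, 0, 5 + 16] = 0.88                # class probability

print(prediction.shape)  # (19, 19, 5, 85)
```
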
Perform thresholding to remove multiple detected instances. Thresholding keeps only the cells with the highest probabilities, so that the most plausible bounding boxes are selected.
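Score thresholding can be sketched like this, assuming the per-cell vector layout `[p_c, b_x, b_y, b_h, b_w, c_1 ... c_n]` described above; the function name and threshold value are illustrative, not from a specific library.

```python
import numpy as np

def filter_boxes(prediction, threshold=0.6):
    # Box score for each class = objectness probability * class probability.
    objectness = prediction[..., 0:1]
    class_probs = prediction[..., 5:]
    scores = objectness * class_probs
    best_scores = scores.max(axis=-1)     # best class score per box
    best_classes = scores.argmax(axis=-1)
    mask = best_scores >= threshold       # discard low-confidence boxes
    boxes = prediction[..., 1:5][mask]
    return best_scores[mask], boxes, best_classes[mask]

# Toy example: a 2x2 grid, one anchor, three classes,
# with a single confident detection in cell (0, 1).
pred = np.zeros((2, 2, 1, 5 + 3))
pred[0, 1, 0] = [0.9, 0.5, 0.5, 1.0, 1.0, 0.1, 0.8, 0.1]
scores, boxes, classes = filter_boxes(pred, threshold=0.6)
print(len(scores), int(classes[0]))  # 1 detection, class 1
```

Only the box whose combined score (0.9 × 0.8 = 0.72) exceeds the threshold survives; everything else is dropped before non-max suppression.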
Perform non-max suppression to further refine the boxes. Non-max suppression offers a convenient way to refine the results, using a calculation known as Intersection over Union (IoU).
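A minimal sketch of IoU and greedy non-max suppression is shown below. Boxes are assumed to be in (x1, y1, x2, y2) corner format, and the code is an illustration of the general technique rather than the exact Yolo V2 implementation.

```python
import numpy as np

def iou(box_a, box_b):
    # Intersection rectangle of the two boxes.
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    order = list(np.argsort(scores)[::-1])  # highest score first
    keep = []
    while order:
        best = order.pop(0)
        keep.append(int(best))
        # Drop remaining boxes that overlap the kept box too much.
        order = [i for i in order
                 if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep

# Two heavily overlapping boxes plus one far away:
# only the higher-scoring of the overlapping pair survives.
boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60]], float)
scores = np.array([0.9, 0.8, 0.7])
print(non_max_suppression(boxes, scores))  # [0, 2]
```
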
Additional point — Anchor boxes are used to detect several objects in one grid cell. This is a specialty of the Yolo V2 algorithm compared to the others.
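To illustrate the anchor-box idea: each cell predicts one box per anchor shape, so two differently shaped objects centered in the same cell can both be detected. The sketch below matches a box to the anchor with the most similar shape; the anchor dimensions are made-up examples, not the values used in the Yolo V2 paper.

```python
# Made-up anchor shapes (width, height) for illustration only.
anchors = [(1.0, 1.0),   # roughly square
           (2.0, 0.5)]   # wide and flat

def best_anchor(box_w, box_h):
    # Match a box to the anchor with the most similar shape, comparing
    # IoU of the width/height pairs as if both boxes shared a center.
    def shape_iou(a_w, a_h):
        inter = min(box_w, a_w) * min(box_h, a_h)
        return inter / (box_w * box_h + a_w * a_h - inter)
    return max(range(len(anchors)),
               key=lambda i: shape_iou(*anchors[i]))

print(best_anchor(0.9, 1.1))  # roughly square object -> anchor 0
print(best_anchor(1.8, 0.6))  # wide object           -> anchor 1
```
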
The first implementation of Yolo was presented by Joseph Redmon et al. using a framework written in C known as Darknet, and as the method evolved, implementations with currently more popular ML libraries such as TensorFlow and Keras were also built.
My GitHub repository here presents a quick implementation of this algorithm using Keras. With the code, anyone can test it on their own images and dig into its workings. All the details regarding installation and execution can be found in the repo.
Feel free to use the code in your applications and share it to spread the knowledge!