Scale Invariant Feature Transform Algorithm

SIFT (Scale Invariant Feature Transform) is an image descriptor for image-based matching and recognition developed by David Lowe. Like other local descriptors, it is used for a wide range of computer vision tasks that rely on point matching for object recognition. The SIFT descriptor is invariant to geometric transformations such as translation, rotation, and scaling in the image domain, is moderately robust to perspective transformations and variations in illumination, and has been shown experimentally to be useful and effective for object recognition and image matching under real-world conditions.

SIFT includes a method for detecting interest points in a gray-level image, in which the statistics of the local gradient directions of the image intensities are accumulated to provide a summary description of the image structure in a local neighborhood around each interest point; this descriptor is then used to match corresponding interest points between different images. The SIFT descriptor was subsequently extended from gray-level images to color images.

The algorithm builds on the Difference of Gaussians (DoG), an approximation of the Laplacian of Gaussian (LoG), which is expensive to compute directly. The DoG is obtained as the difference between two Gaussian-blurred versions of the image computed with different values of σ, which serves as the scale parameter. Once the DoG images are available, they are searched for local extrema over scale and space: each pixel is compared with its 8 neighbors in the same scale, with 9 pixels in the next scale, and with 9 pixels in the previous scale. If it is a local extremum, it is a potential keypoint. This search is performed over several octaves of the image in the Gaussian pyramid, as shown in Figure 2.12. An image pyramid is a series of images in which each image is the result of downsampling (rescaling by a certain factor) the previous one.

The next step is keypoint localization. Once potential keypoint locations are found, they are refined to obtain a more accurate estimate of the position of each extremum, and a contrast threshold is applied: if the intensity at an extremum is below this threshold, the candidate is rejected.

Orientation must then be taken into account, so an orientation is assigned to each keypoint to make the descriptor invariant to image rotation. A neighborhood around the keypoint location is selected according to its scale, and the gradient magnitude and direction are computed in this region and accumulated into an orientation histogram. The dominant orientation is found by detecting the peaks of this histogram. If there is more than one dominant orientation around the interest point, additional peaks are accepted when their height exceeds 80% of the height of the highest peak, and in that case each accepted peak is used to compute a descriptor for the corresponding orientation.

Finally, once the keypoint orientation has been established, the keypoint descriptor is created: the neighborhood around the keypoint is taken, divided into sub-blocks, and an orientation histogram is created for each sub-block. In the standard formulation, a 16×16 neighborhood is divided into sixteen 4×4 sub-blocks with an 8-bin histogram each, so the final descriptor is a vector of 128 values.
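
To make the scale-space search concrete, the following is a minimal sketch of the DoG construction and the 26-neighbor extremum test described above, written in Python with OpenCV and NumPy. It processes a single octave only; the number of blur levels, the base sigma, the scale factor k, and the contrast threshold are illustrative choices, not Lowe's exact parameters.

    # Minimal single-octave sketch of DoG construction and scale-space extrema search.
    # Parameter values here are illustrative, not Lowe's exact settings.
    import cv2
    import numpy as np

    def dog_extrema(gray, num_scales=5, sigma0=1.6, k=2 ** 0.5, contrast_thresh=0.03):
        gray = gray.astype(np.float32) / 255.0
        # Gaussian pyramid for one octave: the same image blurred with increasing sigma.
        blurred = [cv2.GaussianBlur(gray, (0, 0), sigma0 * (k ** i)) for i in range(num_scales)]
        # Difference of Gaussians: subtract adjacent blur levels.
        dog = np.stack([blurred[i + 1] - blurred[i] for i in range(num_scales - 1)])

        keypoints = []
        # Compare each pixel with its 26 neighbors: 8 in the same DoG level,
        # 9 in the level above and 9 in the level below.
        for s in range(1, dog.shape[0] - 1):
            for y in range(1, dog.shape[1] - 1):
                for x in range(1, dog.shape[2] - 1):
                    val = dog[s, y, x]
                    if abs(val) < contrast_thresh:   # low-contrast rejection
                        continue
                    patch = dog[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
                    if val == patch.max() or val == patch.min():
                        keypoints.append((x, y, s))   # potential keypoint
        return keypoints

A full implementation would repeat this search over several downsampled octaves and then refine each candidate's position, as outlined in the keypoint-localization step above.

In practice the complete pipeline, from detection through orientation assignment to the 128-dimensional descriptor, is available in existing libraries. The short example below uses OpenCV's SIFT implementation (exposed as cv2.SIFT_create in OpenCV 4.4 and later) together with brute-force matching and a ratio test to match two images; the file names are placeholders.

    # Detect SIFT keypoints in two images and match their descriptors.
    import cv2

    img1 = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)    # placeholder file names
    img2 = cv2.imread("object.jpg", cv2.IMREAD_GRAYSCALE)

    sift = cv2.SIFT_create()
    kp1, des1 = sift.detectAndCompute(img1, None)   # keypoints + 128-dim descriptors
    kp2, des2 = sift.detectAndCompute(img2, None)

    # Brute-force matching with a ratio test to discard ambiguous matches.
    matcher = cv2.BFMatcher()
    good = [m for m, n in matcher.knnMatch(des1, des2, k=2) if m.distance < 0.75 * n.distance]
    print(f"{len(good)} reliable matches found")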