Feature Detection
A feature detection and matching pipeline that can reliably compare two images even when the camera shifts position, rotates, or the lighting changes.
Project Details
This project implements a feature detection and matching pipeline that stays reliable when the camera shifts position, rotates, or the lighting changes. Pixel-level comparison breaks down immediately under those transformations, so the project focuses on a different idea: find distinctive points in each image, describe what those points look like in a compact way, then match those descriptions across images. That single loop is the foundation for panorama stitching, object recognition, structure from motion, and visual SLAM.
This project is implemented in Python using NumPy, SciPy, and OpenCV. The core concepts are Harris corner detection, local feature descriptors (Simple and MOPS), and matching strategies that trade off recall and precision depending on how aggressively you reject ambiguous correspondences.
Overview
Feature matching is basically a question of agreement. If two images show the same scene from different viewpoints, which parts of the images are stable enough to line up? The pipeline here has three moving parts: a detector that proposes interest points, a descriptor that turns each interest point into a vector you can compare, and a matcher that decides which pairs are trustworthy. When those three parts cooperate, the output is a set of correspondences that makes downstream tasks like stitching feel almost inevitable.
The detector is Harris corner detection, which finds points where intensity changes strongly in more than one direction. I compute image gradients with 3×3 Sobel filters, build the local structure tensor, and score each pixel with the Harris corner response. To keep detections meaningful instead of noisy, I then select corners that are local maxima within a 7×7 neighborhood. The result is a set of repeatable, distinctive points that tend to survive small translations and rotations.
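A minimal sketch of that detection stage, assuming a grayscale float image. The function names, the Gaussian window sigma, and k = 0.04 are my choices for illustration, not necessarily the exact settings used in the project.

```python
import numpy as np
from scipy import ndimage

def harris_response(image, k=0.04, window_sigma=0.5):
    """Harris corner response for a grayscale float image (illustrative sketch)."""
    # Image gradients via 3x3 Sobel filters.
    Ix = ndimage.sobel(image, axis=1, mode='reflect')
    Iy = ndimage.sobel(image, axis=0, mode='reflect')

    # Structure tensor entries, accumulated over a Gaussian window.
    Ixx = ndimage.gaussian_filter(Ix * Ix, window_sigma)
    Iyy = ndimage.gaussian_filter(Iy * Iy, window_sigma)
    Ixy = ndimage.gaussian_filter(Ix * Iy, window_sigma)

    # Harris score per pixel: det(H) - k * trace(H)^2.
    det = Ixx * Iyy - Ixy * Ixy
    trace = Ixx + Iyy
    response = det - k * trace * trace

    # Gradient orientation, reused later as the keypoint orientation.
    orientation = np.arctan2(Iy, Ix)
    return response, orientation

def local_maxima(response, size=7):
    """True where a pixel is the maximum of its size x size neighborhood."""
    return response == ndimage.maximum_filter(response, size=size)
```

Keypoints are then the pixels where local_maxima is True and the response clears a threshold.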
After detection, I describe each interest point in two different ways. The Simple descriptor is a 5×5 patch of pixel intensities around the keypoint. It is small and surprisingly effective when the image pair is mostly related by translation, but it is not built to handle rotation. The MOPS descriptor is designed to be more robust to rotation by turning a larger neighborhood into a normalized, canonical patch. I extract a 40×40 region around the keypoint, rotate it so the keypoint orientation points to the right, subsample down to 8×8 using an affine warp (cv2.warpAffine), then normalize to zero mean and unit variance. That produces a compact 64-dimensional descriptor that is easier to compare and more stable under rotation.
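Here is a sketch of that MOPS path built around a single cv2.warpAffine call; the function name, the default arguments, and the near-constant-patch guard are my own additions, so treat it as an illustration of the idea rather than the project's exact code.

```python
import cv2
import numpy as np

def mops_descriptor(image, x, y, angle, window=40, out_size=8):
    """MOPS-style descriptor (sketch): canonical 8x8 patch from a 40x40 window.

    image: grayscale float32 image; (x, y): keypoint position;
    angle: keypoint orientation in radians.
    """
    # Compose the affine map as 3x3 homogeneous matrices:
    # move keypoint to origin -> rotate so its orientation points right
    # -> scale 40 px down to 8 px -> recenter in the output patch.
    t1 = np.array([[1, 0, -x], [0, 1, -y], [0, 0, 1]], dtype=np.float32)
    c, s = np.cos(-angle), np.sin(-angle)
    rot = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]], dtype=np.float32)
    scale = out_size / float(window)
    sc = np.array([[scale, 0, 0], [0, scale, 0], [0, 0, 1]], dtype=np.float32)
    t2 = np.array([[1, 0, out_size / 2], [0, 1, out_size / 2], [0, 0, 1]],
                  dtype=np.float32)
    M = (t2 @ sc @ rot @ t1)[:2]  # 2x3 matrix for warpAffine

    patch = cv2.warpAffine(image, M, (out_size, out_size), flags=cv2.INTER_LINEAR)
    vec = patch.flatten().astype(np.float32)

    # Normalize to zero mean, unit variance; zero out near-constant patches.
    std = vec.std()
    if std < 1e-5:
        return np.zeros_like(vec)
    return (vec - vec.mean()) / std
```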
For matching, I implemented two strategies. SSD compares descriptors by the sum of squared differences, i.e., squared Euclidean distance, which is simple but tends to accept ambiguous matches in repetitive textures. The Ratio Test compares the best-match distance to the second-best match distance and rejects matches where the best is not decisively better than the runner-up. In practice, this one rule dramatically improves match quality because it filters out cases where a feature could plausibly correspond to many locations.
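Both strategies reduce to operations on a pairwise distance matrix. Below is a sketch using scipy.spatial.distance.cdist; the function names and the (index, index, score) return format are illustrative, not the project's actual interface.

```python
import numpy as np
from scipy.spatial.distance import cdist

def match_ssd(desc1, desc2):
    """Nearest neighbor by SSD (sum of squared differences); lower score = better."""
    dists = cdist(desc1, desc2, 'sqeuclidean')          # shape (N1, N2)
    nearest = np.argmin(dists, axis=1)
    rows = np.arange(len(desc1))
    return list(zip(rows, nearest, dists[rows, nearest]))

def match_ratio(desc1, desc2):
    """Ratio test: score = best / second-best distance (lower = more distinctive)."""
    dists = cdist(desc1, desc2, 'sqeuclidean')
    order = np.argsort(dists, axis=1)
    best, second = order[:, 0], order[:, 1]
    rows = np.arange(len(desc1))
    ratio = dists[rows, best] / np.maximum(dists[rows, second], 1e-10)
    return list(zip(rows, best, ratio))
```

Thresholding the ratio (commonly somewhere around 0.7 to 0.8) is what actually discards the ambiguous matches; sweeping that threshold is also what traces out the ROC curves in the results below.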
Results
I evaluated four combinations on the Yosemite benchmark dataset by pairing two descriptors (Simple and MOPS) with two matching rules (SSD and Ratio Test). The results were consistent with what I hoped to see: the Ratio Test improves reliability across the board, and MOPS benefits the most when paired with it.
The average AUC scores were:
- Simple + SSD: 0.8855
- Simple + Ratio: 0.9007
- MOPS + SSD: 0.7988
- MOPS + Ratio: 0.9039 (best)
The ROC curves make the matching behavior feel concrete. Ratio Test consistently produces cleaner separations because it rejects ambiguous correspondences, and MOPS benefits most when it is paired with a matcher that filters uncertainty rather than accepting the nearest neighbor by default.
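For context, an AUC number like the ones above can be computed by sweeping the acceptance threshold over the match scores and integrating the resulting ROC curve. The sketch below assumes you already have a correctness label per match; how those labels are derived from the benchmark's ground truth is outside this sketch, and the integration details are my own rather than the benchmark's exact protocol.

```python
import numpy as np

def roc_auc(scores, is_correct):
    """Area under the ROC curve for a set of scored matches.

    scores: one score per match (lower = more confident)
    is_correct: boolean per match, True if it agrees with ground truth
    """
    order = np.argsort(scores)                 # sweep threshold from best to worst
    labels = np.asarray(is_correct, dtype=bool)[order]

    tp = np.cumsum(labels)                     # correct matches accepted so far
    fp = np.cumsum(~labels)                    # incorrect matches accepted so far
    tpr = tp / max(labels.sum(), 1)            # true positive rate
    fpr = fp / max((~labels).sum(), 1)         # false positive rate

    return float(np.trapz(tpr, fpr))           # trapezoidal integration of the ROC
```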
Best Method: MOPS descriptor with Ratio Test provides the best performance due to rotation invariance and robust matching criteria.
A useful side experiment was tuning the Harris threshold on yosemite1.jpg. At a very strict threshold, detections are extremely sparse and only the most dominant corners survive. Lowering the threshold steadily increases coverage across ridges, rock textures, and forest detail, but eventually produces a dense field of weak points that are less stable and more likely to generate bad matches. Seeing that progression made the tradeoff feel concrete: sensitivity buys recall, but it also buys noise, and matching has to compensate for it.
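That sweep is straightforward to reproduce with the detection helpers sketched earlier; the threshold values here are illustrative, not the ones used on yosemite1.jpg.

```python
import numpy as np

def corner_counts(response, maxima_mask, thresholds):
    """Count how many Harris corners survive at each threshold (sketch)."""
    return {t: int(np.count_nonzero(maxima_mask & (response > t)))
            for t in thresholds}

# Example sweep over progressively looser thresholds:
# counts = corner_counts(resp, local_maxima(resp), [1e-2, 1e-3, 1e-4, 1e-5])
```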
Discussion
This pipeline is a practical building block for real vision systems. Panorama stitching relies on feature correspondences to find overlap regions. Object recognition matches local features across different views of the same object. Structure from motion uses matches across many frames to recover 3D geometry. Visual SLAM depends on stable features and reliable matching to localize and map in real time. What I like about this project is that it makes those larger systems feel less mysterious, because you can see the exact moment where two images begin to agree on what they share.