This project warps images into different perspectives. Combined with blending techniques, we can further build mosaics that consist of multiple photos. In the second part, we implement an algorithm that automatically finds corresponding points between two images.
Omitted. Please see the following sections.
First, label the two images using the provided tool. Suppose \((x, y, 1), (wX, wY, w)\) is a pair of corresponding points. We get the equation \(H\begin{bmatrix}x\\y\\1\end{bmatrix}=\begin{bmatrix}wX\\wY\\w\end{bmatrix}\), where \(H\) is the transformation matrix. Let \(H=\begin{bmatrix}a&b&c\\d&e&f\\g&h&1\end{bmatrix}\). We can rewrite it as \[\begin{cases} ax+by+c = wX\\ dx+ey+f = wY\\ gx+hy+1 = w \end{cases}\] Substituting the third equation into the first two, we get \[\begin{cases} ax+by+c = (gx+hy+1)X\\ dx+ey+f = (gx+hy+1)Y \end{cases}\] Thus \[\begin{cases} ax+by+c-gxX-hyX = X\\ dx+ey+f-gxY-hyY = Y \end{cases}\] Since \(x, y, X, Y\) are known constants, we have eight unknowns. Each pair of corresponding points contributes two such equations, so the system can be solved with at least 4 pairs. With more than 4 pairs (over-determined), we can obtain a least-squares solution.
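As a concrete illustration, here is a minimal sketch of stacking these equations and solving them in the least-squares sense with NumPy (the function name `computeH` and the argument layout are our own choices, not necessarily those used in the project code):

```python
import numpy as np

def computeH(pts1, pts2):
    """Estimate H such that H @ [x, y, 1]^T ~ [X, Y, 1]^T.

    pts1, pts2: (N, 2) arrays of corresponding (x, y) points, N >= 4.
    """
    A, b = [], []
    for (x, y), (X, Y) in zip(pts1, pts2):
        # ax + by + c - gxX - hyX = X
        A.append([x, y, 1, 0, 0, 0, -x * X, -y * X])
        b.append(X)
        # dx + ey + f - gxY - hyY = Y
        A.append([0, 0, 0, x, y, 1, -x * Y, -y * Y])
        b.append(Y)
    A, b = np.array(A, dtype=float), np.array(b, dtype=float)
    # Least-squares solution of the (possibly over-determined) system
    h, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.append(h, 1.0).reshape(3, 3)
```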
As we did in project 3, the warped image is obtained by inverse mapping the coordinates through \(H^{-1}\) and interpolating the nearby pixels. However, it is helpful to first map the four corners to their new coordinates to determine the size and boundaries of the new image. With this information, we can create an array of coordinates covering the region of the new image and then map them back to the original image to obtain the pixel values. Additionally, we create a mask with the shape of the new image, which is useful for the blending task later.
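A rough sketch of this inverse-warping procedure, assuming `scipy.ndimage.map_coordinates` for the interpolation (the function name and the `offset` convention for the new canvas origin are our own):

```python
import numpy as np
import scipy.ndimage

def warp_image(im, H, out_shape, offset=(0, 0)):
    """Inverse-warp a color image `im` with homography H into a new canvas.

    out_shape: (rows, cols) of the new image; offset: (x, y) shift of the
    canvas origin, e.g. the top-left corner of the warped bounding box.
    """
    rows, cols = out_shape
    # Grid of output coordinates, shifted by the canvas offset
    ys, xs = np.mgrid[0:rows, 0:cols]
    xs, ys = xs + offset[0], ys + offset[1]
    out_pts = np.stack([xs.ravel(), ys.ravel(), np.ones(rows * cols)])
    # Map output coordinates back into the source image with H^-1
    src = np.linalg.inv(H) @ out_pts
    src_x, src_y = src[0] / src[2], src[1] / src[2]
    # Interpolate nearby pixels; samples outside the source become 0
    warped = np.zeros((rows, cols, im.shape[2]))
    for c in range(im.shape[2]):
        warped[..., c] = scipy.ndimage.map_coordinates(
            im[..., c], [src_y, src_x], order=1, cval=0.0
        ).reshape(rows, cols)
    # Mask of pixels that actually came from inside the source image
    mask = ((src_x >= 0) & (src_x <= im.shape[1] - 1) &
            (src_y >= 0) & (src_y <= im.shape[0] - 1)).reshape(rows, cols)
    return warped, mask
```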
Note that if we warp too much, the result breaks down. In the image below, the top and bottom edges of the monitor intersect within the image. If we want the monitor to become a rectangle in the warped image, the resulting \(H\) maps points to the right of the intersection to coordinates \(\begin{bmatrix}\cdot\\\cdot\\w\end{bmatrix}\) where \(w\) is negative, which cannot be converted to a valid image coordinate.
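A small illustrative check for this failure mode, using made-up values for \(H\) and the points: after mapping to homogeneous coordinates, any point whose third coordinate \(w\) is not positive has no valid position on the image plane.

```python
import numpy as np

# Hypothetical homography and points, purely for illustration
H = np.array([[1.0, 0.2, 5.0],
              [0.1, 1.0, 3.0],
              [0.002, 0.001, 1.0]])
pts = np.array([[100.0, 50.0], [800.0, 400.0]])

pts_h = np.column_stack([pts, np.ones(len(pts))])  # homogeneous coordinates
w = (H @ pts_h.T)[2]                               # third coordinate after mapping
valid = w > 0   # points with w <= 0 cannot be divided back to a real pixel location
```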
First, we warp the second image so that its labeled points match those of the first image. We also build masks for both images to indicate their meaningful regions (since large parts of the warped image are completely black), and distance maps that record the distance of each pixel to the boundaries of its image. Since the shape and size of the second image change significantly after warping, we add zero padding to the two images so that their shapes and positions align; the same padding is applied to the masks and distance maps. Then we separate the low and high frequency components of both images using a Gaussian filter. In the overlapped region, each pixel of the low frequency component is a weighted sum of the two images, weighted by their distance map values, while each pixel of the high frequency component comes entirely from either image 1 or image 2, depending on which one has the larger distance map value. This 2-level Laplacian pyramid blending significantly improves the results compared to direct mixing (taking the average of the two images in the overlapped region).
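A condensed sketch of the blending step, assuming both images have already been warped and padded to the same shape, and that the distance maps come from `scipy.ndimage.distance_transform_edt` (the blur sigma and the helper name `blend_two_level` are our own choices):

```python
import numpy as np
from scipy.ndimage import gaussian_filter, distance_transform_edt

def blend_two_level(im1, im2, mask1, mask2, sigma=5):
    """2-level (low/high frequency) blending of two aligned color images.

    im1, im2: float images of the same shape; mask1, mask2: boolean masks
    of their meaningful (non-padding) regions.
    """
    # Distance of each pixel to the boundary of its image's valid region
    d1 = distance_transform_edt(mask1)
    d2 = distance_transform_edt(mask2)
    w1 = (d1 / np.maximum(d1 + d2, 1e-8))[..., None]  # per-pixel weight for image 1

    # Split each image into low and high frequency components
    low1 = gaussian_filter(im1, sigma=(sigma, sigma, 0))
    low2 = gaussian_filter(im2, sigma=(sigma, sigma, 0))
    high1, high2 = im1 - low1, im2 - low2

    # Low frequencies: distance-weighted average; high frequencies: winner takes all
    low = w1 * low1 + (1 - w1) * low2
    high = np.where((d1 > d2)[..., None], high1, high2)
    return low + high
```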
Here are some other examples:
For this part, we demonstrate the intermediate steps using this set of images:
Harris corners are detected by the sample code.
We first compute an array that contains \(r_i\) for all Harris corners, where \(r_i\) is defined in the paper as \[r_i = \min\limits_j |\mathbf{x}_i - \mathbf{x}_j| \text{, s.t. } f(\mathbf{x}_i) < c_{\text{robust}} f(\mathbf{x}_j), \mathbf{x}_j\in I\] Intuitively, \(r_i\) is the distance to the nearest neighbor whose corner strength \(f\) is sufficiently larger than its own. We then sort the Harris corners by \(r_i\) in decreasing order and keep the first 1000 as the result of ANMS.
In contrast, if we simply pick the 1000 points with the highest corner strengths, the result looks like this:
Clearly, ANMS gives much more evenly distributed results.
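For reference, here is a sketch of the ANMS selection described above, using a straightforward \(O(N^2)\) pairwise-distance computation (the function name and the default \(c_{\text{robust}}=0.9\) are our own choices here):

```python
import numpy as np

def anms(coords, strengths, num_keep=1000, c_robust=0.9):
    """Adaptive non-maximal suppression.

    coords: (N, 2) corner coordinates; strengths: (N,) Harris responses f.
    Returns the `num_keep` corners with the largest suppression radii r_i.
    """
    # Pairwise distances between all corners (O(N^2); fine for a few thousand points)
    diff = coords[:, None, :] - coords[None, :, :]
    dists = np.sqrt((diff ** 2).sum(-1))
    # x_j "dominates" x_i if f(x_i) < c_robust * f(x_j)
    dominated = strengths[:, None] < c_robust * strengths[None, :]
    dists = np.where(dominated, dists, np.inf)
    # r_i: distance to the nearest dominating neighbor
    r = dists.min(axis=1)
    keep = np.argsort(-r)[:num_keep]
    return coords[keep]
```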
We sub-sample an 8x8 patch from a 40x40 window in the blurred image, then normalize each patch by taking the z-score of its pixels. Some examples are visualized below: the 8x8 patch on the left is sampled from the 40x40 window (bounded by four red dots) in the blurred image in the middle; on the right is the original (unblurred) image.
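A sketch of this descriptor extraction; the blur sigma is an assumption (the write-up does not specify it), and corners too close to the border are simply skipped:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def extract_descriptors(im_gray, corners, window=40, patch=8, sigma=2):
    """Sample an 8x8, z-score-normalized descriptor around each corner.

    im_gray: 2-D grayscale image; corners: (N, 2) array of (row, col) coords.
    Returns the descriptors and the corners they were extracted from.
    """
    blurred = gaussian_filter(im_gray, sigma)
    step = window // patch          # sub-sampling stride (5)
    half = window // 2
    descriptors, kept = [], []
    for r, c in corners:
        r, c = int(r), int(c)
        if (r < half or c < half or
                r + half > im_gray.shape[0] or c + half > im_gray.shape[1]):
            continue                # skip corners too close to the border
        win = blurred[r - half:r + half, c - half:c + half]
        desc = win[::step, ::step]                          # 8x8 sub-sample
        desc = (desc - desc.mean()) / (desc.std() + 1e-8)   # z-score normalization
        descriptors.append(desc.ravel())
        kept.append((r, c))
    return np.array(descriptors), np.array(kept)
```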
We first compute the L2 distance between each pair of feature descriptors, which does not take long with the provided function. Then, for each point in image A, we find its nearest and second-nearest neighbors in image B. If the ratio of these two distances is smaller than a threshold (set to 0.3), we take the nearest neighbor as a correct match.
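A sketch of this ratio-test matching, with a brute-force distance computation standing in for the provided helper:

```python
import numpy as np

def match_features(desc1, desc2, ratio_thresh=0.3):
    """Match descriptors with the nearest-neighbor ratio test.

    desc1: (N1, D), desc2: (N2, D). Returns (i, j) index pairs of matches.
    """
    # Pairwise L2 distances (stand-in for the provided distance function)
    sq = ((desc1 ** 2).sum(1)[:, None] + (desc2 ** 2).sum(1)[None, :]
          - 2 * desc1 @ desc2.T)
    d = np.sqrt(np.maximum(sq, 0))
    matches = []
    for i in range(len(desc1)):
        order = np.argsort(d[i])
        nn1, nn2 = order[0], order[1]
        # Accept only if the best match is much closer than the second best
        if d[i, nn1] / d[i, nn2] < ratio_thresh:
            matches.append((i, nn1))
    return matches
```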
For many (50,000) rounds, we choose 4 random pairs of matched points, compute their \(H\) matrix, and check which other pairs are consistent with it (reprojection error < 1.5). Only the largest set of pairs that agrees with a single matrix is kept for the next step; the others are removed.
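A sketch of this RANSAC loop, reusing the `computeH` function sketched earlier; the iteration count and error threshold follow the values above:

```python
import numpy as np

def ransac_homography(pts1, pts2, n_iters=50000, thresh=1.5):
    """Keep the largest set of matches consistent with a single homography.

    pts1, pts2: (N, 2) arrays of matched points (pts1 in image 1, pts2 in image 2).
    Returns a boolean inlier mask over the matches.
    """
    N = len(pts1)
    pts1_h = np.column_stack([pts1, np.ones(N)])
    best_inliers = np.zeros(N, dtype=bool)
    rng = np.random.default_rng(0)
    for _ in range(n_iters):
        idx = rng.choice(N, 4, replace=False)
        H = computeH(pts1[idx], pts2[idx])
        # Project all points of image 1 and measure the reprojection error
        proj = (H @ pts1_h.T).T
        proj = proj[:, :2] / proj[:, 2:3]
        err = np.linalg.norm(proj - pts2, axis=1)
        inliers = err < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    return best_inliers
```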
The figures below summarize the corner matching process.
After ANMS: green+blue+red
After nearest neighbor: green+blue
After RANSAC: green
The approach is explained in Part A.
I was surprised that the algorithm actually works. When I learned it in the course, I felt it made some sense but would surely fail in practice, since there are so many confounding factors while the algorithm just assumes some ideal circumstances. This is quite true, actually: no single component gives promising results on its own. However, by aggregating them all together, we are able to make the results more "pure". The process of selecting corresponding points resembles the Swiss cheese model: going through Harris corners -> ANMS -> nearest neighbor -> RANSAC is like passing through layers of cheese. A single layer of cheese, or algorithm, cannot give results we can rely on, but if a pair of points passes through all the layers, it is very likely to be valid.