Project 1
Images of the Russian Empire: Colorizing the Prokudin-Gorskii photo collection
Yuan-Hao Huang
yhhuang20@berkeley.edu
Overview
In this project, we look for the displacements that align the r, g, and b layers of the images taken by Prokudin-Gorskii.
Algorithms
For small figures (.jpg)
For small figures, simply shift the g-layer (using np.roll) over a [-20, 20] range in both the x and y directions, and keep the displacement at which the g-layer best matches the b-layer.
Then do the same for the r-layer.
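The exhaustive search can be sketched as below. This is a minimal illustration, not the project's actual code: the names best_shift and sad are hypothetical, and a plain sum of absolute differences stands in for the matching criterion, which is passed in as a parameter so any other distance can be swapped in.

```python
import numpy as np

def sad(a, b):
    """Stand-in distance (an assumption): sum of absolute pixel differences."""
    return np.abs(a.astype(np.float64) - b.astype(np.float64)).sum()

def best_shift(moving, fixed, distance=sad, radius=20):
    """Try every (dx, dy) in [-radius, radius]^2 and keep the lowest-distance one."""
    best, best_score = (0, 0), np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # np.roll wraps pixels around the border; rows move by dy, columns by dx.
            shifted = np.roll(moving, (dy, dx), axis=(0, 1))
            score = distance(shifted, fixed)
            if score < best_score:
                best, best_score = (dx, dy), score
    return best
```

Calling best_shift(g, b) and then best_shift(r, b) would give the two offsets for one image, assuming the r-layer is also matched against the b-layer.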
What do you mean by "matches"?
We say two layers "match" when the "distance" between them is minimized.
The distance between two layers is the sum of the distances between each pair of corresponding pixels in the two layers.
What's the distance between two pixels?
Let a, b be the corresponding pixels in the g-layer, b-layer, respectively.
Directly comparing the values of a and b is not reasonable because they only measure the brightness in green and in blue, respectively.
Instead, we want to compare their role, or position, in the entire figure.
Therefore, we represent each pixel by a 4-vector holding the absolute differences between the pixel and its N, E, S, W neighbors.
For example, a is represented by (aN, aE, aS, aW), where aN = |value of a - value of the pixel above a| in the g-layer, etc.
Comparing a pixel with its neighbors allows us to identify the "edges" within the image, so that we can align the "edges" of the two layers.
The distance between a and b is simply |aN-bN|+|aE-bE|+|aS-bS|+|aW-bW|.
Summing up the distance of all pixels gives the distance of the two layers.
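A minimal sketch of this distance is given below. The function names are hypothetical, and two details are assumptions not stated above: border pixels take their missing neighbor from the opposite edge (np.roll wraps around), and the layers are converted to float so that differences of unsigned 8-bit values cannot overflow.

```python
import numpy as np

def neighbor_features(layer):
    """Per-pixel 4-vector (|p-N|, |p-E|, |p-S|, |p-W|), returned with shape (H, W, 4)."""
    layer = layer.astype(np.float64)
    north = np.roll(layer, 1, axis=0)    # value of the pixel above
    south = np.roll(layer, -1, axis=0)   # value of the pixel below
    west = np.roll(layer, 1, axis=1)     # value of the pixel to the left
    east = np.roll(layer, -1, axis=1)    # value of the pixel to the right
    neighbors = np.stack([north, east, south, west], axis=-1)
    return np.abs(layer[..., None] - neighbors)

def layer_distance(a, b):
    """Sum over all pixels of |aN-bN| + |aE-bE| + |aS-bS| + |aW-bW|."""
    return np.abs(neighbor_features(a) - neighbor_features(b)).sum()
```

Because only differences between neighbors enter the features, adding a constant brightness to one layer leaves the distance unchanged, which is the reason for not comparing raw pixel values.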
For large figures (.tif)
Recursively downsize the figure by building a pyramid until the height is less than 100 pixels.
At that coarsest level, search for the best fit within a [-5, 5] range in both the x and y directions.
Then trace back through the recursion: given the best displacement found on the downsized (pyramid) version, find the best displacement for the figure one level up by searching within a [-2, 2] neighborhood of the given displacement.
The final result is the best displacement for the full-size layers.
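The coarse-to-fine recursion might look like the sketch below. All function names are hypothetical, a plain sum of absolute differences stands in for the layer distance, and one step is an assumption: the displacement found on the half-size level is doubled before the [-2, 2] refinement, since one coarse pixel covers two pixels at the next level.

```python
import numpy as np

def halve(img):
    """Average each 2 by 2 block (odd trailing row/column is dropped)."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w].astype(np.float64)
    return (img[0::2, 0::2] + img[1::2, 0::2] + img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def sad(a, b):
    """Stand-in distance (an assumption): sum of absolute differences."""
    return np.abs(a - b).sum()

def search(moving, fixed, center, radius, distance):
    """Best (dx, dy) within `radius` of `center`, shifting with np.roll."""
    cx, cy = center
    best, best_score = center, np.inf
    for dy in range(cy - radius, cy + radius + 1):
        for dx in range(cx - radius, cx + radius + 1):
            score = distance(np.roll(moving, (dy, dx), axis=(0, 1)), fixed)
            if score < best_score:
                best, best_score = (dx, dy), score
    return best

def pyramid_align(moving, fixed, distance=sad, min_height=100):
    """Recursive coarse-to-fine alignment; returns (dx, dy) for `moving`."""
    if moving.shape[0] < min_height:
        # Coarsest level: full search in [-5, 5].
        return search(moving, fixed, (0, 0), 5, distance)
    cx, cy = pyramid_align(halve(moving), halve(fixed), distance, min_height)
    # Assumed step: scale the coarse estimate by 2, then refine in [-2, 2].
    return search(moving, fixed, (2 * cx, 2 * cy), 2, distance)
```

Each level evaluates only 25 candidate shifts, so the cost is dominated by the full-resolution level rather than by the size of the final displacement.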
Pyramid
The pyramid is built by scaling an image to 1/4 of its size (half the height and half the width), replacing each 2 by 2 block of adjacent pixels with its average.
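One way to write this downscaling step with NumPy is sketched below. The name downscale_2x2 is hypothetical, and dropping an odd trailing row or column is an assumption about how odd sizes are handled.

```python
import numpy as np

def downscale_2x2(img):
    """Halve height and width by averaging each 2 by 2 block of pixels."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    # Split rows and columns into (block index, position inside block).
    blocks = img[:h, :w].astype(np.float64).reshape(h // 2, 2, w // 2, 2)
    return blocks.mean(axis=(1, 3))
```

For a 4 by 4 image holding 0..15 in row order, the top-left output pixel is the mean of 0, 1, 4, 5, which is 2.5.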
Results of small figures (.jpg)
displacement = [x-offset of g-layer, y-offset of g-layer, x-offset of r-layer, y-offset of r-layer]
cathedral, displacement = [2 5 3 12]
monastery, displacement = [2 -3 2 3]
tobolsk, displacement = [2 3 3 6]
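As a sketch of how such a displacement vector could be applied to produce the color image, the snippet below shifts the g- and r-layers and stacks them with the b-layer. The function name is hypothetical, and the convention is an assumption: x is taken as the column shift, y as the row shift, with the sign used by np.roll.

```python
import numpy as np

def compose_rgb(r, g, b, displacement):
    """Shift g and r by the given offsets and stack the layers as an RGB image."""
    gx, gy, rx, ry = displacement
    # np.roll takes (row shift, column shift), i.e. (y, x).
    g_aligned = np.roll(g, (gy, gx), axis=(0, 1))
    r_aligned = np.roll(r, (ry, rx), axis=(0, 1))
    return np.dstack([r_aligned, g_aligned, b])
```

With the cathedral entry, the call would be compose_rgb(r, g, b, [2, 5, 3, 12]).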
Results of large figures (.tif)
displacement = [x-offset of g-layer, y-offset of g-layer, x-offset of r-layer, y-offset of r-layer]
church, displacement = [4 25 -4 58]
emir, displacement = [23 50 41 106]
harvesters, displacement = [14 60 12 123]
icon, displacement = [16 39 23 89]
lady, displacement = [9 57 13 120]
melons, displacement = [10 80 13 177]
onion_church, displacement = [24 52 35 108]
sculpture, displacement = [-11 33 -27 140]
self_portrait, displacement = [30 82 37 175]
three_generations, displacement = [12 56 8 111]
train, displacement = [-2 40 28 85]
displacement = [x-offset of g-layer, y-offset of g-layer, x-offset of r-layer, y-offset of r-layer]
Religious painting, displacement = [6 29 7 69]
Arched entranceway, displacement = [1 38 -8 97]
Gondola, displacement = [15 15 30 81]
castle, displacement = [15 41 25 91]
Religious candlestick, displacement = [1 48 -7 106]
References
numpy
numpy.dstack
numpy.roll
numpy.linalg.norm
numpy.sqrt
numpy.concatenate
numpy.heaviside
Indexing on ndarrays
numpy.average
scikit-image
skimage
stackoverflow
PIL TypeError: Cannot handle this data type
int() vs .astype('int') Python
TypeError: only integer scalar arrays can be converted to a scalar index with 1D numpy indices array
How do I get time of a Python program's execution?
https://stackoverflow.com/questions/8885663/how-to-format-a-floating-number-to-fixed-width-in-python