Colorizing the Prokudin-Gorskii Photo Collection

Shivansh Baveja

Project Description

This project originates with the brilliant Russian photographer Sergei Mikhailovich Prokudin-Gorskii, who travelled across Russia photographing people, places, and objects. Though this may sound routine, his methodology was unique for his time: he captured each picture across three color plates (red, green, and blue). These glass plate negatives were purchased by the Library of Congress in 1948 and have recently been digitized. In this project, I reconstruct the color images through phase-based alignment, percentile-based white balancing, and automatic cropping.

Initial Approaches

Motivations

I started this project by testing several approaches to aligning the images. The idea was to see which approaches worked best qualitatively (clearest images, fewest artifacts) and quantitatively (time taken to align the images). Since many of these initial approaches were based on an exhaustive window search, they were too slow to run on the .tif files, so these initial tests were constrained to the .jpg files. The purpose of these tests was to determine the best baseline method on which to build improvements.

Implementations

Of the methods shown below, L2 and NCC are as defined in the project spec, kernel is a convolution-based approach using scipy.signal.correlate2d, and SSIM uses skimage.metrics.structural_similarity. Lastly, the phase-based alignment method was implemented by hand and is detailed further in the Bells & Whistles section.
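As a rough illustration of the two spec-defined metrics, here is a minimal sketch of an exhaustive window search scored by L2 or NCC. It is not the project code: the function names, the zero-mean form of NCC, the use of np.roll (which wraps pixels around the edges), and the (dx, dy) return order are all assumptions made for the example.

```python
import numpy as np

def l2_score(a, b):
    """Sum of squared differences between two images; lower is better."""
    return np.sum((a - b) ** 2)

def ncc_score(a, b):
    """Zero-mean normalized cross-correlation; higher is better.
    (The zero-mean variant is an assumption of this sketch.)"""
    a = a - a.mean()
    b = b - b.mean()
    return np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def window_search(moving, reference, radius=15, metric="ncc"):
    """Try every shift in [-radius, radius] x [-radius, radius] and
    return the (dx, dy) roll of `moving` that best matches `reference`."""
    best_shift, best_score = (0, 0), None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # np.roll wraps around the border; hypothetical simplification.
            shifted = np.roll(moving, (dy, dx), axis=(0, 1))
            if metric == "l2":
                score = -l2_score(shifted, reference)  # negate: higher is better
            else:
                score = ncc_score(shifted, reference)
            if best_score is None or score > best_score:
                best_shift, best_score = (dx, dy), score
    return best_shift
```

The search visits (2 * radius + 1)^2 candidate shifts, which is why this style of alignment becomes slow on large images.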

Observations

Although most of the methods converge to very similar offsets for the red and green channels, they do so at different speeds and with varying accuracy. For example, the kernel-based method runs far slower than the others (roughly 40 seconds per image), disagrees with the majority on all three test images, and produces a slightly blurrier output. On the other hand, the custom phase-based method runs one to three orders of magnitude faster than the other methods (0.01 to 0.02 seconds per image, versus roughly 0.4 seconds for L2 and NCC, 8 to 9 seconds for SSIM, and 37 to 40 seconds for kernel) and consistently provides the clearest images. As such, the phase-based alignment method was chosen for further iteration and improvement. The details of this implementation can be found in my code and are discussed in the Bells & Whistles section.
Image      Method  Red       Green     Time (sec)
Cathedral  Kernel  [2, 11]   [1, 4]    39.70
Cathedral  L2      [3, 12]   [2, 5]    0.44
Cathedral  NCC     [3, 12]   [2, 5]    0.40
Cathedral  Phase   [3, 12]   [2, 5]    0.01
Cathedral  SSIM    [3, 12]   [2, 5]    8.41
Monastery  Kernel  [1, 2]    [1, -4]   39.29
Monastery  L2      [2, 3]    [2, -3]   0.45
Monastery  NCC     [2, 3]    [2, -3]   0.39
Monastery  Phase   [2, 3]    [2, -3]   0.01
Monastery  SSIM    [2, 3]    [2, -3]   8.94
Tobolsk    Kernel  [3, 5]    [2, 2]    37.61
Tobolsk    L2      [3, 6]    [3, 3]    0.45
Tobolsk    NCC     [3, 6]    [3, 3]    0.39
Tobolsk    Phase   [3, 6]    [2, 3]    0.02
Tobolsk    SSIM    [3, 7]    [3, 3]    8.51

Improvements and Pipelining

Improvements

It is important to note that these alignment techniques are more robust when the images are cropped, since metrics such as NCC are sensitive to borders and other image artifacts. An example is shown below: the L2 and NCC alignment techniques produce slightly blurry photos when the images are not cropped. In the cropped case, 10% was cropped off each side. Our phase-based alignment method appears to be robust to these artifacts, further making it the obvious choice to proceed with.
[Figure: alignment results for L2, NCC, and Phase, each shown Cropped and Not Cropped]
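The 10% border crop described above could look something like the following sketch. The helper name crop_interior and its frac parameter are hypothetical; the idea is only that the alignment metric is computed on the interior of each channel so that plate borders do not dominate the score.

```python
import numpy as np

def crop_interior(channel, frac=0.10):
    """Return the central region of `channel`, dropping `frac` of the
    height from the top and bottom and `frac` of the width from each side."""
    h, w = channel.shape[:2]
    dy, dx = int(h * frac), int(w * frac)
    return channel[dy:h - dy, dx:w - dx]
```

In such a setup the offset would be estimated on the cropped channels and then applied to the full, uncropped ones.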

General Pipeline

Now that we have identified our optimal alignment approach, we can sketch out the pipeline we will use to generate the cleanest, crispest possible images.
  1. Splitting the image into three channels. This is done through simple array slicing. We proceed by aligning both the red and green channels to the blue channel.
  2. Image-pyramid based alignment. We use our phase-alignment method to recursively align the larger .tif images, providing a large speedup without any drop in alignment accuracy or image quality.
    1. The image is halved in both dimensions 4 times to give us our lowest-resolution image. Since most of the larger .tif images are about 4000px x 10000px, this image is about 250px x 625px.
    2. An exhaustive window search is performed on this lowest-resolution image over offsets in [-16, +16].
    3. As the alignments propagate recursively upwards, we test displacements at progressively higher resolutions. Though we are still using a sliding-window approach, we now only examine displacements in the window [-2, +2]. These displacements are centered at the optimal alignment found at the previous, lower resolution (scaled up by 2 to account for the doubled resolution).
    4. This method allows for a much larger search window to be covered with far less computation, providing the necessary speedup to make the .tif files tractable.
  3. Automatic cropping is performed by thresholding row and column values to remove rows and columns that are outliers with respect to their neighbors or that consist almost entirely of white or black pixels. This will be expanded upon in the Bells & Whistles section.
  4. Percentile based white balancing recolorizes the images into a more photo-realistic space. This will be expanded upon in the Bells & Whistles section.
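The first two pipeline steps can be sketched as follows. This is an illustrative sketch rather than the project code: it assumes the plate is stored as three equal-height channels stacked vertically in blue, green, red order, it downsamples by 2x2 averaging, and it scores candidate shifts with NCC as a simple stand-in for the phase-based method. All function names are hypothetical.

```python
import numpy as np

def split_channels(plate):
    """Slice a vertically stacked plate into (blue, green, red) thirds.
    The top-to-bottom B, G, R order is an assumption of this sketch."""
    h = plate.shape[0] // 3
    return plate[:h], plate[h:2 * h], plate[2 * h:3 * h]

def downsample(img):
    """Halve both dimensions by averaging non-overlapping 2x2 blocks."""
    h, w = (img.shape[0] // 2) * 2, (img.shape[1] // 2) * 2
    return img[:h, :w].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def ncc(a, b):
    """Zero-mean normalized cross-correlation (stand-in scoring function)."""
    a = a - a.mean()
    b = b - b.mean()
    return np.sum(a * b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)

def search(moving, reference, center, radius):
    """Exhaustively score shifts within `radius` of `center` = (dx, dy)."""
    cx, cy = center
    best, best_score = center, -np.inf
    for dy in range(cy - radius, cy + radius + 1):
        for dx in range(cx - radius, cx + radius + 1):
            s = ncc(np.roll(moving, (dy, dx), axis=(0, 1)), reference)
            if s > best_score:
                best, best_score = (dx, dy), s
    return best

def pyramid_align(moving, reference, levels=4, coarse_radius=16, fine_radius=2):
    """Coarse-to-fine alignment; returns the (dx, dy) roll for `moving`."""
    if levels == 0:
        # Coarsest level: wide search centered at zero displacement.
        return search(moving, reference, (0, 0), coarse_radius)
    dx, dy = pyramid_align(downsample(moving), downsample(reference),
                           levels - 1, coarse_radius, fine_radius)
    # One pixel at the coarser level is two pixels here, so double the
    # estimate and refine it within a small window.
    return search(moving, reference, (2 * dx, 2 * dy), fine_radius)
```

With 4 levels, a [-16, +16] search at the coarsest level covers roughly [-256, +256] pixels at full resolution, while each finer level adds only 25 candidate shifts.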

Bells & Whistles

Phase-Based Alignment

The fastest, most robust alignment technique for me ended up being phase-based alignment, which was derived in a paper titled "The Phase Correlation Image Alignment Method" by C. D. Kuglin and D. C. Hines.

This technique works by first converting both images from the spatial domain to the frequency domain using a 2-dimensional FFT. It then computes the cross-power spectrum, which identifies the frequencies common to both images. The result is then normalized and taken back to the spatial domain with an inverse FFT, directly providing a shift.

The intuition for this is that if one image is a "shifted" version of another, this shift appears as a "peak" in the final result. Since this technique operates in the frequency domain, we avoid exhaustively testing every candidate offset: the cost is that of a few FFTs on the image and does not depend on the size of the underlying shift. It is also important to note that operating in the frequency domain grants us additional robustness to artifacts such as borders.
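A minimal sketch of phase correlation with NumPy's FFT routines is given below. It is meant to illustrate the steps just described, not to reproduce the project implementation; the function name, the small epsilon added before normalizing, and the (dx, dy) return order are assumptions.

```python
import numpy as np

def phase_correlate(moving, reference):
    """Estimate the (dx, dy) roll that aligns `moving` to `reference`."""
    # 1. Move both images into the frequency domain.
    f_ref = np.fft.fft2(reference)
    f_mov = np.fft.fft2(moving)
    # 2. Cross-power spectrum, normalized to unit magnitude so that only
    #    the phase difference between the two images remains.
    cross_power = f_ref * np.conj(f_mov)
    cross_power /= np.abs(cross_power) + 1e-12  # epsilon avoids divide-by-zero
    # 3. Back to the spatial domain: a pure shift shows up as a single peak.
    corr = np.fft.ifft2(cross_power).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # 4. Peaks past the midpoint correspond to negative shifts (wrap-around).
    h, w = corr.shape
    if dy > h // 2:
        dy -= h
    if dx > w // 2:
        dx -= w
    return int(dx), int(dy)
```

Applying np.roll(moving, (dy, dx), axis=(0, 1)) with the returned values then brings the moving channel into register with the reference.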

Automatic Cropping

Automatic cropping was implemented through a relatively simple procedure. We used thresholding to identify rows and columns that were almost completely white or almost completely black and removed them from the image. Analyzing histograms helped me empirically determine the thresholding interval [0.05, 0.95]. Results from this cropping on a handful of images are shown below.
[Figure: Cathedral, Train, and Emir, each shown Not Cropped and Cropped]
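One way to realize this kind of threshold-based cropping is sketched below. It assumes float images in [0, 1], uses the mean intensity of each row and column as the thresholded value, and keeps the bounding box of the rows and columns that pass. Those choices and the function name are assumptions for illustration; the neighbor-outlier check mentioned in the pipeline is not included.

```python
import numpy as np

def auto_crop(img, low=0.05, high=0.95):
    """Crop away border rows/columns whose mean intensity is nearly
    black (<= low) or nearly white (>= high)."""
    gray = img.mean(axis=2) if img.ndim == 3 else img
    row_mean = gray.mean(axis=1)
    col_mean = gray.mean(axis=0)
    rows = np.flatnonzero((row_mean > low) & (row_mean < high))
    cols = np.flatnonzero((col_mean > low) & (col_mean < high))
    if rows.size == 0 or cols.size == 0:
        return img  # nothing passes the threshold; leave the image alone
    # Keep the bounding box spanned by the first and last passing indices.
    return img[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
```

Because the test uses whole-row and whole-column means, thick borders on one side can pull the mean of a perpendicular border row back inside the interval, so a sketch like this works best when borders are thin relative to the image.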

Percentile-Based White Balancing

My percentile-based white balancing approach has a few main components. First, we split the image into its three channels and identify the 5th and 95th percentiles of the values in each. We then clip the values outside this interval and stretch the remaining values to fill the [0, 1] interval for float images. Intuitively, we are ignoring the darkest and brightest pixels and taking what was the most "normal bright" and making it correspond to the color white. This is done separately for each channel, giving more degrees of freedom for balancing the image. Empirically, this produces a more photo-realistic version of the images, as can be seen below.
[Figure: Cathedral, Train, and Emir, each shown Not White Balanced and White Balanced]
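The percentile stretch described above might be written along these lines. This is a sketch assuming an H x W x 3 float image; the function name, parameter names, and the guard for constant channels are assumptions added for the example.

```python
import numpy as np

def white_balance(img, low_pct=5, high_pct=95):
    """Per-channel percentile stretch of an H x W x 3 float image."""
    out = np.empty_like(img, dtype=np.float64)
    for c in range(img.shape[2]):
        channel = img[..., c].astype(np.float64)
        lo, hi = np.percentile(channel, [low_pct, high_pct])
        if hi <= lo:
            out[..., c] = channel  # constant channel: nothing to stretch
            continue
        # Map [lo, hi] onto [0, 1]; values outside the interval are clipped.
        out[..., c] = np.clip((channel - lo) / (hi - lo), 0.0, 1.0)
    return out
```

Because each channel is stretched independently, a color cast that compresses one channel more than the others is reduced, at the cost of saturating the darkest and brightest 5% of pixels in every channel.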

Final Results

Below are all 14 images in their fully aligned, cropped, and white balanced forms with the relevant offsets provided.
Image Name               Red          Green
monastery.jpg            [2, 3]       [2, -3]
tobolsk.jpg              [3, 6]       [2, 3]
cathedral.jpg            [3, 12]      [2, 5]
emir.tif                 [41, 106]    [24, 49]
church.tif               [-4, 58]     [4, 25]
three_generations.tif    [8, 111]     [12, 55]
melons.tif               [14, 176]    [8, 79]
onion_church.tif         [34, 107]    [19, 51]
train.tif                [28, 85]     [0, 40]
icon.tif                 [23, 88]     [16, 39]
self_portrait.tif        [37, 175]    [29, 77]
harvesters.tif           [11, 118]    [18, 60]
sculpture.tif            [-27, 140]   [-11, 33]
lady.tif                 [13, 120]    [9, 57]