Towards a holistic approach to the socio-historical analysis of vernacular photos
Lorenzo Stacchio
Alessia Angeli
Giuseppe Lisanti
Gustavo Marfia
[GitHub]
[Paper]


Example input grayscale photos and output colorizations from our algorithm. These examples are cases where our model works especially well. For randomly selected examples, see the Performance comparisons section below.

Abstract

Although assembling family photo albums has been one of the most popular photographic practices since the end of the 19th century, scholarly interest in such albums only grew starting in the early 1980s. These collections of photos may reveal sociological and historical insights regarding specific cultures and times. They are, however, in most cases scattered among private homes and available only on paper or photographic film, which makes their collection and analysis by historians, socio-cultural anthropologists, and cultural theorists very cumbersome. Computer-based methodologies could aid such a process in various ways, for example by speeding up the cataloging step with modern computer vision techniques. We investigate such an approach here, introducing the design and development of a multimedia application that automatically catalogs vernacular pictures drawn from family photo albums. To this aim, we introduce the IMAGO dataset, composed of photos belonging to family albums assembled at the University of Bologna's Rimini campus since 2004. Exploiting the proposed application, IMAGO has offered the opportunity to experiment with photos taken between 1845 and 2009. In particular, it has been possible to estimate their socio-historical content, i.e., the dates and contexts of the images, without resorting to any other sources of information. Exceeding our initial expectations, this approach has proven its merit not only in terms of performance but also in terms of the foreseeable implications for socio-historical research. To the best of our knowledge, this contribution is among the few that move along this path at the intersection of socio-historical studies, multimedia computing, and artificial intelligence.
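
To make the cataloging idea concrete, below is a minimal sketch of the kind of model the abstract describes: a CNN classifier that predicts the decade in which a scanned album photo was taken. The paper does not ship this code; the backbone choice (an ImageNet-pretrained ResNet-50), the folder layout, and the decade binning are illustrative assumptions, not the exact setup used for IMAGO.

```python
# Hedged sketch: decade estimation for scanned album photos with a pretrained CNN.
# Assumes a hypothetical folder layout album_photos/train/<decade_label>/<image>.jpg
import torch
import torch.nn as nn
from torchvision import datasets, models, transforms

NUM_DECADES = 17  # assumed binning of 1845-2009 into decades (1840s ... 2000s)

preprocess = transforms.Compose([
    transforms.Grayscale(num_output_channels=3),  # many album scans are black and white
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

train_set = datasets.ImageFolder("album_photos/train", transform=preprocess)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=32, shuffle=True)

# ImageNet-pretrained backbone, re-headed for decade classification
model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
model.fc = nn.Linear(model.fc.in_features, NUM_DECADES)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

model.train()
for images, decades in train_loader:
    optimizer.zero_grad()
    loss = criterion(model(images), decades)
    loss.backward()
    optimizer.step()
```

A context classifier (e.g., free time, work, schooling) would follow the same pattern with a different label set and output head.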


[Bibtex]



Semantic interpretability of results

Here, we show the ImageNet categories for which our colorization helps and hurts the most on object classification. Categories are ranked according to the difference in VGG classification performance on the colorized result compared to the grayscale version. This is an extension of Figure 6 in the [v1] paper.
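
The ranking can be reproduced roughly as sketched below: per-category top-5 VGG accuracy on colorized versus grayscale inputs, with categories sorted by the accuracy difference. The `colorize()` function and `grayscale_val_loader` are placeholders for whatever colorization model and ImageNet validation loader you have at hand, not part of our released code.

```python
# Hedged sketch: rank ImageNet categories by how much colorization changes
# VGG top-5 accuracy relative to the grayscale input.
from collections import defaultdict

import torch
from torchvision import models

vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).eval()

def top5_correct(images, labels):
    """Return a bool tensor: does the true label appear in VGG's top 5?"""
    with torch.no_grad():
        logits = vgg(images)
    top5 = logits.topk(5, dim=1).indices
    return (top5 == labels.unsqueeze(1)).any(dim=1)

hits = {"gray": defaultdict(int), "color": defaultdict(int)}
counts = defaultdict(int)

for gray_batch, labels in grayscale_val_loader:   # placeholder loader
    color_batch = colorize(gray_batch)            # placeholder colorizer
    for key, batch in (("gray", gray_batch), ("color", color_batch)):
        for label, ok in zip(labels.tolist(), top5_correct(batch, labels).tolist()):
            hits[key][label] += int(ok)
    for label in labels.tolist():
        counts[label] += 1

# Positive delta: colorization helps that category; negative: it hurts.
delta = {c: (hits["color"][c] - hits["gray"][c]) / counts[c] for c in counts}
ranking = sorted(delta, key=delta.get, reverse=True)
```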

Click a category below to see our results on all test images in that category.

Top (colorization helps classification most)
  1. Rapeseed
  2. Lorikeet
  3. Cheeseburger
  4. Meat Loaf
  5. Pomegranate
  6. Green Snake
  7. Pizza
  8. Yellow Lady's Slipper
  9. Orange
  10. Goldfinch

Bottom (colorization hurts classification most)
  1. Chain
  2. Wok
  3. Can opener
  4. Water bottle
  5. Modem
  6. Standard Schnauzer
  7. Pickelhaube
  8. Half Track
  9. Barbershop
  10. Military Uniform




Recent Related Work

There have been a number of works in the field of automatic image colorization in the last few months! We would like to direct you to these recent related works for comparison. For a more thorough discussion of related work, please see our full paper.

Concurrent Work

Gustav Larsson, Michael Maire, and Gregory Shakhnarovich. Learning Representations for Automatic Colorization. In ECCV, 2016. [PDF][Website]
Satoshi Iizuka, Edgar Simo-Serra, and Hiroshi Ishikawa. Let there be Color!: Joint End-to-end Learning of Global and Local Image Priors for Automatic Image Colorization with Simultaneous Classification. In SIGGRAPH, 2016. [PDF][Website]


Previous Work

Ryan Dahl. Automatic Colorization. Jan 2016. [Website]
Aditya Deshpande, Jason Rock, and David Forsyth. Learning Large-Scale Automatic Image Colorization. In ICCV, Dec 2015. [PDF][Website]
Zezhou Cheng, Qingxiong Yang, and Bin Sheng. Deep Colorization. In ICCV, Dec 2015. [PDF]



Acknowledgements

This research was supported, in part, by ONR MURI N000141010934, NSF SMA-1514512, an Intel research grant, and a Tesla K40 GPU hardware donation by NVIDIA Corp. We thank members of the Berkeley Vision Lab for helpful discussions, Philipp Krähenbühl and Jeff Donahue for help with self-supervision experiments, and Aditya Deshpande and Gustav Larsson for providing help with comparisons to Deshpande et al. and Larsson et al.