A quick summary of CVPR 2015

Sebastian Nowozin - Thu 11 June 2015 -

CVPR 2015, the conference on "Computer Vision and Pattern Recognition", is the main conference of the computer vision community and has just finished. Unfortunately, I was only able to stay for the three main conference days, but here is my short subjective summary.

For an overview of individual research papers, see this excellent summary page by Andrej Karpathy.

Of the papers I saw at the conference, my personal favorite is Barron et al., "Fast Bilateral-Space Stereo for Synthetic Defocus", PDF here. I liked it for a number of reasons. First, this research has already been turned into a feature of a high-profile product, and the presentation of the work was excellent. Second, the flavour of this work is to take a data structure (the permutohedral lattice) that has been used successfully for one problem (bilateral filtering) and use it to solve a more difficult problem (disparity from stereo) within the domain of the data structure. This general idea may be useful in other contexts. To admit the truth, I never liked pixels as a representation of image data, and many statistical models are just awkward to specify at the pixel level; for this reason we as a community often use higher-level representations such as superpixels or region proposals. This paper shows how a regular representation that is more closely aligned with the semantic content of the image can be used to solve a problem in such a way that a solution can still be reconstructed at the pixel level.
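To make the "solve in the data structure, then reconstruct at the pixel level" pattern concrete, here is a minimal sketch in Python. It is not the method of the paper: it assumes a dense, coarse bilateral grid instead of the permutohedral lattice, uses nearest-vertex assignment, and only performs a simple blur in the grid rather than solving a stereo problem. The function name and the parameters `sigma_spatial` and `sigma_range` are hypothetical choices for illustration.

```python
import numpy as np

def bilateral_grid_filter(image, values, sigma_spatial=4.0, sigma_range=0.1):
    """Smooth `values` in a coarse bilateral grid guided by `image`.

    `image` is a 2D float array with entries in [0, 1]; `values` has the
    same shape. Simplified sketch: dense grid, nearest-vertex splat/slice.
    """
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]

    # Map every pixel to a grid vertex in (y, x, intensity) space.
    gy = np.rint(ys / sigma_spatial).astype(int)
    gx = np.rint(xs / sigma_spatial).astype(int)
    gz = np.rint(image / sigma_range).astype(int)
    shape = (gy.max() + 1, gx.max() + 1, gz.max() + 1)

    # Splat: accumulate values and pixel counts at the grid vertices.
    num = np.zeros(shape)
    den = np.zeros(shape)
    np.add.at(num, (gy, gx, gz), values)
    np.add.at(den, (gy, gx, gz), 1.0)

    # Process in grid space: separable [1, 2, 1] blur along each axis.
    def blur(grid):
        for axis in range(3):
            n = grid.shape[axis]
            pad = [(1, 1) if a == axis else (0, 0) for a in range(3)]
            padded = np.pad(grid, pad, mode="constant")
            lo = np.take(padded, np.arange(0, n), axis=axis)
            mid = np.take(padded, np.arange(1, n + 1), axis=axis)
            hi = np.take(padded, np.arange(2, n + 2), axis=axis)
            grid = lo + 2.0 * mid + hi
        return grid

    num, den = blur(num), blur(den)

    # Slice: read the result back at each pixel's vertex and normalize.
    return num[gy, gx, gz] / den[gy, gx, gz]
```

Because pixels on opposite sides of a strong intensity edge land on distant grid vertices, the blur in grid space does not mix them, yet the final lookup still yields one value per pixel.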

Research Trends

  • Deep Learning and Convolutional Neural Networks. Since the seminal ECCV 2012 workshop presentation by Alex Krizhevsky, which announced the ImageNet results and was published as a NIPS paper the same year, the computer vision community has rapidly adopted convolutional networks. Some of the largest vision labs have developed toolkits that democratized this technology, such as Caffe, and existing toolkits such as Torch and Theano are also used. I estimate that around 30 percent of all papers used convolutional networks or features derived from them, often substantially increasing predictive performance on the given task. Significant research directions remain open to everyone, but it is fair to say that standard convnets are now a mature vision technology regularly used by large parts of the community.
  • Rich Linguistic Outputs. Automatic image captioning is now feasible. There is an excellent summary of the many works at Piotr Dollar's blog and also in another summary by John Platt. Many of these works are enabled by the recent Microsoft COCO dataset and by recurrent neural networks.

Non-Research Trends and the IEEE Controversy

  • Growth in attendance. More than 2,400 people attended, continuing the rapid growth of the computer vision community.
  • More code published. Almost every second poster showed a GitHub URL, and the licenses are generally very liberal (MIT, BSD, etc.) so as to permit wide distribution; this is great as it further accelerates the speed at which efforts can be redirected towards promising approaches.
  • IEEE splits from CVPR. The conference has always been organized in part by IEEE, in various capacities as an insurer, organizer, and publisher. However, with traditional publishing models becoming obsolete, with examples of independent conferences and journals in the machine learning community (NIPS, ICML, and JMLR), and considering that CVPR is one of the premier conferences in all of computer science, the power balance has shifted away from IEEE towards the computer vision community; as a result, over the last few years the ties with IEEE have weakened and now seem to be lost. To be fair, following CVPR 2011, IEEE moved and negotiated a fairer deal, with CVPR papers made available open-access since CVPR 2013 and with co-sponsoring arrangements with the Computer Vision Foundation allowed. But now, after threats made by IEEE, it has been voted at the PAMI-TC meeting that for future CVPR conferences (starting with CVPR 2016) the Computer Vision Foundation will take over the functions previously carried out by the IEEE. More details will be announced shortly, I am sure. Whether this has any repercussions for the TPAMI journal is unclear at this point, but before making threats and taking actions that would serve as a catalyst for community action, IEEE would be wise to consider what happened to Springer's Machine Learning journal in 2001 and the events that led to the founding of the Journal of Machine Learning Research, a very successful experiment.