Efficient Image Scene Analysis and Applications
- 👤 Speaker: Ming-Ming Cheng, University of Oxford
- 📅 Date & Time: Thursday 24 April 2014, 15:00 - 16:00
- 📍 Venue: Cambridge University Engineering Department, LR3
Abstract
Images remain one of the most popular and ubiquitous media for capturing and documenting the world around us. Developing efficient algorithms for understanding such images is of great importance for many applications in computer vision and computer graphics. In this report, I will present three algorithms for efficient image scene understanding as well as their applications.
Automatic estimation of salient object regions across images, without any prior assumption or knowledge of the contents of the corresponding scenes, enhances many computer vision and computer graphics applications. We introduce a regional contrast based salient object extraction algorithm, which simultaneously evaluates global contrast differences and spatially weighted coherence scores. Experimental results on widely used benchmarks demonstrate that our algorithm consistently outperforms existing salient object detection and segmentation methods, yielding higher precision and better recall rates. The proposed method, which does not require expensive training data annotation in advance, provides an economical and practical tool for analysing large-scale unlabelled datasets (e.g. internet images).
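The regional contrast idea can be illustrated with a short sketch: each region's saliency is the sum of its colour contrast to every other region, weighted by region size and by a Gaussian falloff over spatial distance so that nearby regions contribute more. This is a hypothetical simplification for illustration; the function name, the `sigma2` parameter, and the normalisation are assumptions, not the authors' exact formulation.

```python
import numpy as np

def region_contrast_saliency(colors, centers, sizes, sigma2=0.16):
    """Sketch of regional-contrast saliency (simplified, illustrative).

    colors:  (K, 3) mean colour per region
    centers: (K, 2) normalised region centroids in [0, 1]^2
    sizes:   (K,)   region sizes (pixel counts)
    """
    K = len(colors)
    sal = np.zeros(K)
    for i in range(K):
        for j in range(K):
            if i == j:
                continue
            d_color = np.linalg.norm(colors[i] - colors[j])
            d_space2 = np.sum((centers[i] - centers[j]) ** 2)
            w = np.exp(-d_space2 / sigma2)  # nearby regions count more
            sal[i] += w * sizes[j] * d_color  # larger regions count more
    return sal / sal.max()  # normalise to [0, 1]
```

With three regions where one has a distinctive colour, that region receives the highest saliency score, which is the behaviour the abstract describes.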
Training a generic objectness measure to produce a small set of candidate object windows has been shown to speed up the classical sliding window object detection paradigm. We propose a novel binarized normed gradients (BING) feature for objectness estimation of image windows. Our novel feature enables a few atomic operations (e.g. ADD, BITWISE SHIFT, etc.) to test the objectness score of an image window. Experiments on the challenging PASCAL VOC 2007 dataset show that our method efficiently (300 fps on a single laptop CPU, 1,000 times faster than existing methods) generates a small set of category-independent, high-quality object windows, yielding a 96.2% object detection rate (DR) with 1,000 proposals.
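The bitwise trick behind BING-style scoring can be sketched as follows: the binarized 8×8 normed-gradient window is packed into a 64-bit integer, the learned linear model is approximated by a few binary basis vectors (each stored as a bitmask of its +1 entries) with real coefficients, and the dot product reduces to AND plus popcount. The function name and the exact binarisation details here are assumptions; this is a plausible sketch of the idea, not the paper's implementation.

```python
def bing_score(window_mask, bases, betas):
    """Sketch of BING-style scoring via bitwise operations.

    window_mask: int whose bits are the binarized normed gradients
                 of an 8x8 window (a 64-bit descriptor in {0, 1}^64).
    bases:       bitmasks of binary basis vectors in {+1, -1}^64,
                 with a set bit meaning +1.
    betas:       per-basis coefficients approximating the linear model.
    """
    on = bin(window_mask).count("1")  # popcount of the window
    score = 0.0
    for b, beta in zip(bases, betas):
        # dot product of a {+1,-1} basis with a {0,1} window:
        # +1 for each shared set bit, -1 for each window bit the
        # basis negates, i.e. 2*popcount(b & m) - popcount(m).
        score += beta * (2 * bin(b & window_mask).count("1") - on)
    return score
```

Because AND and popcount are single machine instructions on modern CPUs, evaluating many windows this way is far cheaper than a floating-point dot product, which is what makes the reported 300 fps plausible.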
Humans describe images in terms of nouns and adjectives, while algorithms operate on images represented as sets of pixels. Bridging this gap between how we would like to access images and their typical representation is the goal of image parsing. In this work we propose treating nouns as object labels and adjectives as visual attributes. This allows us to formulate the image parsing problem as one of jointly estimating per-pixel object and attribute labels from a set of training images. We propose an efficient (interactive-time) solution to this problem. Using the extracted attribute labels as handles, our system empowers a user to verbally refine the results. This enables hands-free parsing of an image into pixel-wise object/attribute labels that correspond to human semantics.
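The per-pixel output described above can be sketched in a deliberately simplified form: each pixel receives one object label (argmax over per-class scores) and a set of attribute labels (every attribute whose score clears a threshold). The function name and threshold are assumptions for illustration; the actual system couples these decisions in a joint model rather than deciding them independently.

```python
import numpy as np

def parse_pixels(obj_scores, attr_scores, attr_thresh=0.5):
    """Simplified per-pixel object/attribute labelling (illustrative only).

    obj_scores:  (H, W, C) per-pixel scores over C object classes
    attr_scores: (H, W, A) per-pixel scores for A binary attributes
    Returns an (H, W) object label map and an (H, W, A) boolean
    attribute map (a pixel may carry several attributes, e.g.
    "wooden" and "brown").
    """
    obj_labels = obj_scores.argmax(axis=-1)   # one object label per pixel
    attr_labels = attr_scores > attr_thresh   # any number of attributes
    return obj_labels, attr_labels
```

The attribute map is what gives a user verbal "handles": selecting all pixels labelled, say, ("chair", "wooden") identifies the region to refine.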
Series This talk is part of the CUED Computer Vision Research Seminars series.
Included in Lists
- bld31
- Cambridge Centre for Data-Driven Discovery (C2D3)
- Cambridge talks
- Cambridge University Engineering Department, LR3
- Chris Davis' list
- CUED Computer Vision Research Seminars
- Information Engineering Division seminar list
- Interested Talks
- ndk22's list
- ob366-ai4er
- rp587
- Trust & Technology Initiative - interesting events
- yk449