Ground Truth Dataset and Benchmarks for Mid-Level Vision

People

David Martin (PI)

Sponsor

National Science Foundation
NSF 05-579 CAREER Program
Award IIS-0643887

Abstract

Machine vision systems can now do amazing things. Reading irises and faces, helping to drive autonomous cars in real environments, and locating and measuring anatomical structures in medical scans are just a few of the capabilities that have emerged in recent years. Special-purpose domains still mark the limit of our success, however. The goal of human-level machine vision remains out of reach because the solutions found to these problems do not require the machine to understand the rich structure of visual information.

It is essential to take an empirical approach to the problem of visual perception. The primary goal of this project is to build a dataset of ground truth image annotations that captures the perception of scenes at the level of surfaces, objects, and basic 3D scene geometry. This dataset will be unprecedentedly rich and detailed, providing precisely the information and representations needed to bring general-purpose capabilities to machine vision systems. A secondary goal of this project is to create the associated benchmarks and methodologies for evaluating machine systems with respect to the ground truth data.

Intellectual Merit: At the heart of this proposal is the design of a dataset for mid-level vision. The mid-level representation of visual information proposed in this project is of fundamental importance, because there is currently no viable mid-level representation in machine vision. A good mid-level representation is both computable from images and useful for higher-level tasks. A generic, concrete, and testable mid-level representation is perhaps the most important deliverable of the proposed project.

Broader Impact: The proposed project will have broad impact on the machine vision and human vision research communities. Machine vision models require complex ground truth data for training and benchmarks for evaluation, while psychophysics modelers face the challenge of data from natural images. Additionally, this project will make all its data and tools freely available to the research community. In general, this is an exciting time for machine vision. We are at the threshold of building machines that attain human-level visual perception, which would dramatically alter the relationship between people and machines. With datasets targeting the strategic research problems, significant progress is at hand.

Publications

TBA

Downloads

TBA

Related Links


Berkeley Segmentation Dataset and Benchmark
LabelMe: The open image labeling tool
Caltech 101
Caltech 256
Lotus Hill Research Institute
PASCAL Visual Object Classes
MNIST Dataset of Handwritten Digits