63
ECS 289H: Visual Recognition Fall 2014 Yong Jae Lee Department of Computer Science

ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

Page 1: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

ECS 289H: Visual RecognitionFall 2014

Yong Jae Lee

Department of Computer Science

Page 2: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

Plan for today

• Topic overview

• Introductions

• Course requirements and administrivia

• Syllabus tour (if time permits)

Page 3: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

Computer Vision

• Let computers see!

• Automatic understanding of visual data (i.e., images and video)

─ Computing properties of the 3D world from visual data (measurement)

─ Algorithms and representations to allow a machine to recognize objects, people, scenes, and activities (perception and interpretation)

Kristen Grauman

Page 4: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

What does recognition involve?

Fei-Fei Li

Page 5: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

Detection: are there people?

Page 6: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

Activity: What are they doing?

Page 7: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

Object categorization

mountain

building

tree

banner

vendor

people

street lamp

Page 8: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

Instance recognition

Potala

Palace

A particular

sign

Page 9: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

Scene categorization

• outdoor

• city

• …

Page 10: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

Attribute recognition

flat

gray

made of fabric

crowded

Page 11: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

Why recognition?

• Recognition a fundamental part of perception• e.g., robots, autonomous agents

• Organize and give access to visual content• Connect to information

• Detect trends and themes

• Why now?

Kristen Grauman

Page 12: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

Posing visual queries

Kooaba, Bay & Quack et al.

Yeh et al., MIT

Belhumeur et al.

Kristen Grauman

Page 13: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

Interactive systems

Shotton et al.

Page 14: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

Autonomous agents

Google self-driving car

Mars rover

Page 15: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

Exploring community photo collections

Snavely et al.

Simon & SeitzKristen Grauman

Page 16: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

Discovering visual patterns

Actions

Categories

Characteristic elements

Wang et al.

Doersch et al.

Lee & Grauman

Page 17: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

Recognizing flat, textured objects (e.g. books, CD covers, posters)

Reading license plates, zip codes, checks

Fingerprint recognition

Frontal face detection

What kinds of things work best today?

Kristen Grauman

Page 18: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

What are the challenges?

Page 19: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

What we see What a computer sees

Page 20: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

Challenges: robustness

Illumination Object pose

ViewpointIntra-class appearance

Occlusions

Clutter

Kristen Grauman

Page 21: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

Context cues

Kristen Grauman

Challenges: context and human experience

Page 22: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

Challenges: context and human experience

Context cues Function Dynamics

Kristen Grauman

Page 23: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

Challenges: context and human experience

Fei Fei Li, Rob Fergus, Antonio Torralba

Page 24: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

Challenges: context and human experience

Fei Fei Li, Rob Fergus, Antonio Torralba

Page 25: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

Challenges: context and human experience

Fei Fei Li, Rob Fergus, Antonio Torralba

Page 26: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

Challenges: scale, efficiency

• Half of the cerebral cortex in primates is devoted to processing visual information

Page 27: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

6 billion images 70 billion images 1 billion images served daily

10 billion images

100 hours uploaded per minute

Almost 90% of web traffic is visual!

:From

Challenges: scale, efficiency

Page 28: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

Biederman 1987

Challenges: scale, efficiency

Fei Fei Li, Rob Fergus, Antonio Torralba

Page 29: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

Challenges: learning with minimal supervision

MoreLess

Kristen Grauman

Page 30: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

Slide from Pietro Perona, 2004 Object Recognition workshop

Page 31: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

Slide from Pietro Perona, 2004 Object Recognition workshop

Page 32: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

Inputs in 1963…

L. G. Roberts, Machine Perception

of Three Dimensional Solids,

Ph.D. thesis, MIT Department of

Electrical Engineering, 1963.

Kristen Grauman

Page 33: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

Personal photo albums

Surveillance and security

Movies, news, sports

Medical and scientific images

… and inputs today

Svetlana Lazebnik

Understand and organize and index all this data!!

Page 34: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)
Page 35: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

Introductions

Page 36: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

Course Information

• Website: https://sites.google.com/a/ucdavis.edu/ecs-289h-visual-recognition/

• Office hours: by appointment (email)

• Email: start subject with “[ECS289H]”

Page 37: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

This course

• Survey state-of-the-art research in computer vision

• High-level vision and learning problems

Page 38: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

Goals

• Understand, analyze, and discuss state-of-the-art techniques

• Identify interesting open questions and future directions

Page 39: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

Prerequisites

• Interest in computer vision!

• Courses in computer vision, machine learning, image processing is a plus (but not required)

Page 40: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

Requirements

• Class Participation [25%]

• Paper Reviews [25%]

• Paper Presentations (1-2 times) [25%]

• Final Project [25%]

Page 41: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

Class participation

• Actively participate in discussions

• Read all assigned papers before each class

Page 42: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

Paper reviews

• Detailed review of one paper before each class

• One page in length─ A summary of the paper (2-3 sentences)

─ Main contributions

─ Strengths and weaknesses

─ Experiments convincing?

─ Extensions?

─ Additional comments, including unclear points, open research questions, and applications

• Email by 11:59 pm the day before each class

• Email subject “[ECS 289H] paper review”

Page 43: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

Paper presentations

• 1 or 2 times during the quarter

• ~20 minutes long, well-organized, and polished:─ Clear statement of the problem

─ Motivate why the problem is interesting, important, and/or difficult

─ Describe key contributions and technical ideas

─ Experimental setup and results

─ Strengths and weaknesses

─ Interesting open research questions, possible extensions, and applications

• Slides should be clear and mostly visual (figures, animations, videos)

• Look at background papers as necessary

Page 44: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

Paper presentations

• Search for relevant material, e.g., from the authors' webpage, project pages, etc.

• Each slide that is not your own must be clearly cited

• Meet with me one week before your presentation with prepared slides

• Email slides on day of presentation

• Email subject "[ECS 289H] presentation slides"

• Skip paper review the day you are presenting

Page 45: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

Final project

• One of:─ Design and evaluation of a novel approach

─ An extension of an approach studied in class

─ In-depth analysis of an existing technique

─ In-depth comparison of two related existing techniques

• Work in pairs

• Talk to me if you need ideas

Page 46: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

Final project

• Final Project Proposal (5%) due 10/31─ 1 page

• Final Project Presentation (10%) due 12/9, 12/11─ 15 minutes

• Final Project Paper (10%) due TBD─ 6-8 pages

Page 47: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)
Page 48: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

Local Features and Matching

• Local invariant features; detectors and descriptors

• Matching, indexing, bag-of-words

Page 49: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

Image classification

• Object and scene classification

• Global and object-based representations

Page 50: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

Object detection

• Discriminative methods

• Faces, people, objects

Page 51: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

Unsupervised/weakly-supervised discovery

• Clustering, context

• Mid-level visual elements

Page 52: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

Segmentation

• Grouping pixels

Page 53: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

Human-in-the-loop

• Interactive systems, crowdsourcing

• Active learning

Page 54: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

Activity recognition

• What are the humans in the image or video doing?

Page 55: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

Human pose

• Finding people and their poses

Page 56: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

Attributes

• Describing properties of objects, people, scenes

Page 57: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

Image search and mining

• Large-scale instance-based recognition

• Discovering connections

Page 58: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

Language and images

• Generating image descriptions

Page 59: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

Big data

• Data-driven techniques

• Dataset bias

Page 60: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

First-person vision

• Recognition from the first-person view

• Social scene understanding

Page 61: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

Deep convolutional neural networks

Page 62: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)
Page 63: ECS 289H: Visual Recognitionweb.cs.ucdavis.edu/~yjlee/teaching/ecs289h-fall2014/introduction.pdf · to recognize objects, people, scenes, and activities (perception and interpretation)

Coming up

• Carefully read class webpage

• Sign-up for papers you want to present

• Next class: overview of my research