29
CSE 659A: Advances in Computer Vision Use Leʿ/Right PgUp/PgDown to navigate slides Spring 2020: T-R: 1:00-2:20pm @ Whitaker 216 Instructor: Ayan Chakrabarti ([email protected]). Jan 14, 2019 http://www.cse.wustl.edu/~ayan/courses/cse659a/ 1

CSE 659A: Advances in Computer Visionayan/courses/cse659a/pdfs/lec1.pdfSource: Carven von Bearnensquash, "Paper Gestalt", 2010. 4. INTRODUCTION Computer Vision is a rapidly changing

  • Upload
    others

  • View
    0

  • Download
    0

Embed Size (px)

Citation preview

Page 1: CSE 659A: Advances in Computer Visionayan/courses/cse659a/pdfs/lec1.pdfSource: Carven von Bearnensquash, "Paper Gestalt", 2010. 4. INTRODUCTION Computer Vision is a rapidly changing

CSE 659A: Advances inComputer Vision

Use Le /Right PgUp/PgDown to navigate slides

Spring 2020: T-R: 1:00-2:20pm @ Whitaker 216

Instructor: Ayan Chakrabarti ([email protected]).

Jan 14, 2019

http://www.cse.wustl.edu/~ayan/courses/cse659a/

1

Page 2: CSE 659A: Advances in Computer Visionayan/courses/cse659a/pdfs/lec1.pdfSource: Carven von Bearnensquash, "Paper Gestalt", 2010. 4. INTRODUCTION Computer Vision is a rapidly changing

INTRODUCTIONINTRODUCTION

No. of Papers Submitted & Accepted at CVPR (CVPR 2018 Opening Slides)

(Many rejected papers still have good ideas, and just end up on arXiv)

3

Page 3: CSE 659A: Advances in Computer Visionayan/courses/cse659a/pdfs/lec1.pdfSource: Carven von Bearnensquash, "Paper Gestalt", 2010. 4. INTRODUCTION Computer Vision is a rapidly changing

INTRODUCTIONINTRODUCTION

Source: Carv en v on Bearnensquash, "Paper Gestalt", 2010.

4

Page 4: CSE 659A: Advances in Computer Visionayan/courses/cse659a/pdfs/lec1.pdfSource: Carven von Bearnensquash, "Paper Gestalt", 2010. 4. INTRODUCTION Computer Vision is a rapidly changing

INTRODUCTIONINTRODUCTIONComputer Vision is a rapidly changing field !

Every conference (CVPR, ICCV/ECCV, NIPS, ...) leads to improvements in performance on many tasks.At any one time, researchers are exploring diff erent strategies in parallel for each task.At the same time, people are coming up with new tasks as old ones get "solved".

This is both ex citing, and ov erwhelming ...

What it means for y ou:

Need to do have an idea of the diff erent advanced techniques that people are trying to solve problems.Have a broad sense of how state-of-the-art methods for diff erent tasks work.

Know general techniques that are commonly applied to diff erent application domains.Given a problem you're trying to solve, know what's a good starting point.Make sure you don't reinvent the wheel on a problem !

Have the ability to read and understand papers that are, and continue, appearing in conferences.

5

Page 5: CSE 659A: Advances in Computer Visionayan/courses/cse659a/pdfs/lec1.pdfSource: Carven von Bearnensquash, "Paper Gestalt", 2010. 4. INTRODUCTION Computer Vision is a rapidly changing

PRE-REQUISITEPRE-REQUISITECSE 559A (or some other foundational vision c ourse)

http://www.cse.wustl.edu/~ayan/courses/cse559a/

6

Page 6: CSE 659A: Advances in Computer Visionayan/courses/cse659a/pdfs/lec1.pdfSource: Carven von Bearnensquash, "Paper Gestalt", 2010. 4. INTRODUCTION Computer Vision is a rapidly changing

PRE-REQUISITEPRE-REQUISITEIf y ou hav en't taken CSE 559A (with me)

Go to http://www.cse.wustl.edu/~ay an/courses/cse559a/Read the list of topics, and make sure you are familiar with all / most.

What CSE 559A taught y ou: (hopefully)

Basics of the math behind computer vision (image representations, geometry, photometry, ML, ...)Building blocks of image processing and vision algorithms (convolutions, optimiz ation, ...)Linking up equations and concepts to implementations (guided by problem sets)

Read about an algorithm, and figuring out how to implement it (efficiently).

7

Page 7: CSE 659A: Advances in Computer Visionayan/courses/cse659a/pdfs/lec1.pdfSource: Carven von Bearnensquash, "Paper Gestalt", 2010. 4. INTRODUCTION Computer Vision is a rapidly changing

THIS COURSETHIS COURSEAssuming y ou hav e the foundations cov ered:

Will teach you about more advanced algorithms that build on these foundations.Significant fraction of the course will be talking about specific methods in published papers.

Including 'competing' alternate methods, since "the best approach" is not settled.But as part of covering these specific examples, you'll learn about broader techniques and strategies.

Will introduce you to emerging new applications.

8

Page 8: CSE 659A: Advances in Computer Visionayan/courses/cse659a/pdfs/lec1.pdfSource: Carven von Bearnensquash, "Paper Gestalt", 2010. 4. INTRODUCTION Computer Vision is a rapidly changing

TOPICSTOPICSTopic 1: Image Priors / Spatial Models

We saw simple regulariz ers and smoothness models in 559A (image restoration, stereo, ...)Describe more complex models for image intensities, and other continuous and discrete valued variables.

Gradient-based regulariz ers, field-of-experts, Markov Random Fields.Talk about optimiz ation algorithms that incorporate these models for inference.Talk about how these interact with neural networks.This is perhaps going to be the technically "heaviest" part of the course.

But these concepts are widely used in modern vision algorithms.Community knowledge, but not typically covered in intro vision courses.

9

Page 9: CSE 659A: Advances in Computer Visionayan/courses/cse659a/pdfs/lec1.pdfSource: Carven von Bearnensquash, "Paper Gestalt", 2010. 4. INTRODUCTION Computer Vision is a rapidly changing

TOPICSTOPICSTopic 2: Depth and Motion Estimation

More complex methods for stereo and optical flowComplex in the kind of 'priors' they assumeAlso look at un-rectified stereo

CNN-based methods for stereo and flowMonocular depth and surface normal estimation

10

Page 10: CSE 659A: Advances in Computer Visionayan/courses/cse659a/pdfs/lec1.pdfSource: Carven von Bearnensquash, "Paper Gestalt", 2010. 4. INTRODUCTION Computer Vision is a rapidly changing

TOPICSTOPICSTopic 3: Classification and Recognition

Begin by describing 'classical' methods for semantic reasoningInterest point detectors and descriptorsWhole Scene DescriptorsAlgorithms based on these

Advanced CNN-based methods for detection (and how their architectures match up to traditional methods)Algorithms for segmentation (including interaction with spatial models)New tasks: image captioning, visual question answering, ...

11

Page 11: CSE 659A: Advances in Computer Visionayan/courses/cse659a/pdfs/lec1.pdfSource: Carven von Bearnensquash, "Paper Gestalt", 2010. 4. INTRODUCTION Computer Vision is a rapidly changing

TOPICSTOPICSTopic 4: Computational Photography I

"Advanced Image Processing"

Texture synthesis, image retargeting (seam carving)Image harmoniz ation (for image splicing), CG2RealMotion MagnificationImage Inpainting and "Un-cropping"Image Editing with Smart Contours

12

Page 12: CSE 659A: Advances in Computer Visionayan/courses/cse659a/pdfs/lec1.pdfSource: Carven von Bearnensquash, "Paper Gestalt", 2010. 4. INTRODUCTION Computer Vision is a rapidly changing

TOPICSTOPICSTopic 5: Photometric Reasoning

Deeper conceptual dive into what does shading and appearance tell us about shape ?Modern algorithms for photometric stereo.Shape from shading, intrinsic image decomposition.Color Constancy with CNNs. Multiple illuminant estimation & separation.

13

Page 13: CSE 659A: Advances in Computer Visionayan/courses/cse659a/pdfs/lec1.pdfSource: Carven von Bearnensquash, "Paper Gestalt", 2010. 4. INTRODUCTION Computer Vision is a rapidly changing

TOPICSTOPICSTopic 6: Computational Photography II

Wacky cameras

Dark flash photographyCoded aperture and coded exposure photography. Light-field cameras.Structured light, Time-of-Flight cameras.Others (flatcam, modulo cameras, ....)Learning camera designs and acquisition setups.

14

Page 14: CSE 659A: Advances in Computer Visionayan/courses/cse659a/pdfs/lec1.pdfSource: Carven von Bearnensquash, "Paper Gestalt", 2010. 4. INTRODUCTION Computer Vision is a rapidly changing

GRADINGGRADING(probably) less work than 559A, but more self-directed !

Most of your grade (60%) will be based on two projects (30% each)You will also give a 15-20 minute presentation on your first project (15%)

Length will be based on no. of people.25% of your grade will be based on completing 5 homework 'paper reviews'.

Again: No Exams !

But do read the late and collaboration policies on course website. (No late days!)

16

Page 15: CSE 659A: Advances in Computer Visionayan/courses/cse659a/pdfs/lec1.pdfSource: Carven von Bearnensquash, "Paper Gestalt", 2010. 4. INTRODUCTION Computer Vision is a rapidly changing

GRADING: HOMEWORK REVIEWSGRADING: HOMEWORK REVIEWSFor each homework, I will assign a paper for you to read and review.

See website for (tentative) schedule of when papers assigned, and reviews due.Your review will be short: 1-2 pages (LaTeX template will be posted)

Should summariz e three things:

Novelty/Significance of the paperBased on stuff w e've been talking about in class, what's new and diff erent about this paper ?

Summary of Technical ContentGenerally brief. You should read and understand the paper to the level that you're confident you couldimplement it if you tried.Summariz e the content in your own words so that I know you've understood.Think of eff ort being equal to writing a 'project proposal' if you were going to implement this paper asyour project.

Criticisms and LimitationsNeed to understand that papers, even those accepted to CVPR, aren't 'authoritative'. Read it critically.Talk about where the paper falls short: e.g., certain cases not considered in experiments, novelty lessthan claimed, real-world utility less than claimed, ...

17

Page 16: CSE 659A: Advances in Computer Visionayan/courses/cse659a/pdfs/lec1.pdfSource: Carven von Bearnensquash, "Paper Gestalt", 2010. 4. INTRODUCTION Computer Vision is a rapidly changing

GRADING: PROJECTSGRADING: PROJECTSTwo projects: due in the middle and end of the semester.Expect you to read and implement 1-2 published papers in each.Restriction: Project 1 should be on a paper related to topics 1-3, and Project 2 on topics 3-5.

You can't have both projects based on topic 3 (recognition/classification)Paper can be one we discussed in class: but then bar is higher in terms of analysis.Like for 559A, a short proposal will be due where you tell me which paper(s) you're working on.Analyz e method. Implement yourself / modify or analyz e existing implementation.Write a report (about 6 pages). Template and Rubric will be posted.

18

Page 17: CSE 659A: Advances in Computer Visionayan/courses/cse659a/pdfs/lec1.pdfSource: Carven von Bearnensquash, "Paper Gestalt", 2010. 4. INTRODUCTION Computer Vision is a rapidly changing

GRADING: PRESENTATIONGRADING: PRESENTATIONPrepare a 15-20 minute talk based on Project 1.Including 3-5 minutes for questions.Last month of semester, one day a week will be for these presentations.Schedule will be determined randomly.But irrespective of when you present, everyone's slides will need to be submitted at the same time (prior tothe first day of presentations).

19

Page 18: CSE 659A: Advances in Computer Visionayan/courses/cse659a/pdfs/lec1.pdfSource: Carven von Bearnensquash, "Paper Gestalt", 2010. 4. INTRODUCTION Computer Vision is a rapidly changing

GRADINGGRADINGUnlike 559A, you won't dive into and implement every algorithm we discuss.

Too many of them !

Three Levels of Familiarity

Basic overview of many algorithms in lectures.Suggested readings for you to choose, when you want to go deeper.

Read and Review five papers for homeworks.Read analyz e and implement 1-2 paper each for two projects.

20

Page 19: CSE 659A: Advances in Computer Visionayan/courses/cse659a/pdfs/lec1.pdfSource: Carven von Bearnensquash, "Paper Gestalt", 2010. 4. INTRODUCTION Computer Vision is a rapidly changing

ADMINISTRIVIAADMINISTRIVIAGo to the course website: http://www.cse.wustl.edu/~ay an/courses/cse659a/Has links to piazz a and canvas page for course.My Offic e Hours: By appointment, make a private post on piazz a (no TAs)Slides will be posted on the course website a er class.Readings will also be posted:

Typically papers we are discussing, will discuss in class.Slides will announce suggested papers to read before coming lectures. . . .

Use Piazz a to air questions and have more in-depth discussions about papers we're covering in class.Take part in the discussion !

Any questions ?

21

Page 20: CSE 659A: Advances in Computer Visionayan/courses/cse659a/pdfs/lec1.pdfSource: Carven von Bearnensquash, "Paper Gestalt", 2010. 4. INTRODUCTION Computer Vision is a rapidly changing

MODELING SPATIAL RELATIONSHIPSMODELING SPATIAL RELATIONSHIPSConsider the generic task of estimating an image like object

is the value of at pixel location

Continuous valued: like images, depth maps, flow fields, normal maps, ...

Discrete valued: for some label set

Per-pixel labels or segmentation maps, discrete valued disparities, etc.Since inference problems in vision are ill-posed or under-determined, we want to impose a "prior" orregulariz er on

Typically, this takes the form of constraining neighboring values of to be related in some way (e.g.,imposing smoothness).

Diff erent such models for spatial relationships: for continuous and discrete values.

For each model, we must ensure that it faithfully encodes the expected structure of , and allows efficientinference.

X

X[n] X n

X[n] ∈ ℝ

d

X[n] ∈ � �

X

X[n]

X

22

Page 21: CSE 659A: Advances in Computer Visionayan/courses/cse659a/pdfs/lec1.pdfSource: Carven von Bearnensquash, "Paper Gestalt", 2010. 4. INTRODUCTION Computer Vision is a rapidly changing

CONTINUOUS VALUED MAPSCONTINUOUS VALUED MAPS

is the data terms: derived from observations in some way.

E.g., image restoration:

is based on our prior expectation for . And that's where we encode these spatial relationships.

Recap: Bay esian interpretation

= arg    ϕ(X) + R(X)X

 

min

X

ϕ(X)

ϕ(X) = ‖X − Y ,ϕ(X) = ‖AX − Y‖

2

2

R(X) X

Φ(X) = − log p(Y|X),          R(X) = − log p(X)     ⇒ = arg p(X|Y) = arg p(Y|X)p(X)X

 

max

X

max

X

23

Page 22: CSE 659A: Advances in Computer Visionayan/courses/cse659a/pdfs/lec1.pdfSource: Carven von Bearnensquash, "Paper Gestalt", 2010. 4. INTRODUCTION Computer Vision is a rapidly changing

CONTINUOUS VALUED MAPSCONTINUOUS VALUED MAPSSimplest form of prior: gradients are small !

Define as sum of independent penalties on diff erent gradients of (based on gradient filters )

For example: the summation could be over the diff erent wavelet coefficients of ( correspond to waveletfilters).

Each takes a scalar-valued input and imposes its own penalty.

E.g., squared regulariz ers:

We can also have L1 penalties (remember problem set 2 from 559)

More generally,

R(X) X { }∇

f

R(X) =

(

( ∗ X)(n)

)

n

f

R

f

f

X ∇

f

(⋅)R

f

(v) = λR

f

v

2

(v) = λ|v|R

f

(v) = λ|v ,    p > 0R

f

|

p

24

Page 23: CSE 659A: Advances in Computer Visionayan/courses/cse659a/pdfs/lec1.pdfSource: Carven von Bearnensquash, "Paper Gestalt", 2010. 4. INTRODUCTION Computer Vision is a rapidly changing

CONTINUOUS VALUED MAPSCONTINUOUS VALUED MAPS

What is the eff ect of diff erent values of ?

Two ways to analyz eThink of what probability distribution each corresponds to.

Think of the eff ect of minimizing

For , this is a z ero-mean Gaussian distribution, with variance .

For , this is a "Laplace" distribution. It also has z ero mean, and variance

More generally for arbitrary , these correspond to Generaliz ed Gaussian Distributions

As you decrease the value of , the distribution gets "heavier tailed".

R(v) = λ|v ,    p > 0|

p

p

(v − + R(v)v

0

)

2

R(v) = λ|v     ⇒    p(v) ∝ exp(−λ|v )|

p

|

p

p = 2 =σ

2 1

p = 1 � =v

2 2

λ

2

p

p

25

Page 24: CSE 659A: Advances in Computer Visionayan/courses/cse659a/pdfs/lec1.pdfSource: Carven von Bearnensquash, "Paper Gestalt", 2010. 4. INTRODUCTION Computer Vision is a rapidly changing

CONTINUOUS VALUED MAPSCONTINUOUS VALUED MAPSPlots of , for diff erent values of . Diff erent values were chosen for each , sothat these distributions have the same variance.

We see that has higher probability than a Gaussian ( ) at both 0 and high-magnitudes (tails) !

The distributions drop sharply from 0, but then flatten out.

Turns out, heavier-tailed distributions are a closer match to actual distributions of gradients in naturalimages (and depth maps, etc.)This is because gradients are very small in most (smooth) regions except at discontinuities / edges. But atthese edges, the gradients can be arbitrarily large.

log p(v), p(v) ∝ exp(−λ|v )|

p

p λ p

p < 2 p = 2

p < 2

26

Page 25: CSE 659A: Advances in Computer Visionayan/courses/cse659a/pdfs/lec1.pdfSource: Carven von Bearnensquash, "Paper Gestalt", 2010. 4. INTRODUCTION Computer Vision is a rapidly changing

CONTINUOUS VALUED MAPSCONTINUOUS VALUED MAPSUnderstanding the eff ect of heavy-tailed regulariz ers

Shrinkage Function

Can be thought of as "denoising" with the prior corresponding to .

Would correspond to an algorithm for actual denoising if you defined on the entire image in terms ofsums of regulariz ers on wavelet coefficients.

is a unitary matrix corresponding to the wavelet transform, and is the wavelet coefficient of .

Then, you would take the wavelet transform to , solve an independent minimiz ation for each coefficient ,and take the inverse Wavelet transform.

( ) = arg α(v − + R(v)s

R:α

v

0

min

v

v

0

)

2

R

R(X)

= α|X − Y + ((WX )X

 

|

2

i

R

i

)

i

W (WX)

i

i

th

X

Y

C ← WY ,      ← ( ),      =C

 

i

s

:αR

i

C

i

X

 

W

−1

C

 

27

Page 26: CSE 659A: Advances in Computer Visionayan/courses/cse659a/pdfs/lec1.pdfSource: Carven von Bearnensquash, "Paper Gestalt", 2010. 4. INTRODUCTION Computer Vision is a rapidly changing

CONTINUOUS VALUED MAPSCONTINUOUS VALUED MAPSShrinkage Function: for diff erent

Again, regulariz ers correspond to the same prior variance. Same value of used.

For / Gaussian, we get a straight line with a lower slope. Because it shrinks by a constant factor.

: The eff ect is non-linear. All values below some threshold are z ero-ed out. But conversely, these betterpreserve higher-magnitude values which could correspond to true edges.

( ) = arg α(v − + R(v)s

R:α

v

0

min

v

v

0

)

2

p

α

p = 2

p < 2

28

Page 27: CSE 659A: Advances in Computer Visionayan/courses/cse659a/pdfs/lec1.pdfSource: Carven von Bearnensquash, "Paper Gestalt", 2010. 4. INTRODUCTION Computer Vision is a rapidly changing

CONTINUOUS VALUED MAPSCONTINUOUS VALUED MAPSSo, when talking about simple gradient regulariz ers, or heavy-tailed regulariz ers are o en preferable.

Because they better encode true distributions of gradients, and "do what we want".But denoising by shrinkage is (mostly) a toy example. How do you apply them in a general optimiz ationscenario ?

When its not simple denoising ?When we aren't considering coefficients of a W avelet or some other 'unitary' transform ?

We've seen the case in 559A. This is a least-squares problem. Can be solved in the Fourier domain, orby conjugate gradient.

is a convex optimiz ation problem, but not least-squares.

is not even convex !

p < 2

= arg ‖AX − Y + λ + λX

 

min

X

2

n

|( ∗ X)[n]|∇

x

p

n

( ∗ X)[n]

y

p

p = 2

p = 1

p < 1

29

Page 28: CSE 659A: Advances in Computer Visionayan/courses/cse659a/pdfs/lec1.pdfSource: Carven von Bearnensquash, "Paper Gestalt", 2010. 4. INTRODUCTION Computer Vision is a rapidly changing

OPTIMIZATIONOPTIMIZATION

Recall for the case when

are matrices corresponding to convolution with and

Represent a image as a single vector with elements.

Do the same for the gradient maps.

and are matrices which represent convolution.

Sparse matrices: most elements are 0, and you have the same values in diff erent columns but shi edaround.

This is a least squares problem for minimizing

where is a matrix and is a -dimensional vector.

= arg ‖AX − Y + λ + λX

 

min

X

2

n

|( ∗ X)[n]|∇

x

p

n

( ∗ X)[n]

y

p

p = 2

= arg ‖AX − Y + λ‖ X + λ‖ XX

 

min

X

2

G

x

2

G

y

2

,G

x

G

y

x

y

W × H X[n] X WH

G

x

G

y

WH × WH

QX − 2 B    Q = A + λ( + ),   B = YX

T

X

T

A

T

G

T

x

G

x

G

T

x

G

y

A

T

= BX

 

Q

−1

Q WH × WH B WH

30

Page 29: CSE 659A: Advances in Computer Visionayan/courses/cse659a/pdfs/lec1.pdfSource: Carven von Bearnensquash, "Paper Gestalt", 2010. 4. INTRODUCTION Computer Vision is a rapidly changing

OPTIMIZATIONOPTIMIZATION

is a huge matrix but sparse, and in this case (because of convolutions) diagonaliz ed by the Fouriertransform.

We talked about how you can compute in the Fourier domain, or by using conjugate gradient (whereat each iteration, you only have to multiply by ).

But what about ?

This is the domain of continuous optimiz ation. A number of algorithms available (ESE 415).But we'll talk about two simple (yet eff ective) approaches:

Iterative Reweighted Least Squares (IRLS)Half-quadratic Splitting

= arg ‖AX − Y + λ + λ ,    p = 2X

 

min

X

2

n

|( ∗ X)[n]|∇

x

p

n

( ∗ X)[n]

y

p

= arg ‖AX − Y + λ‖ X + λ‖ XX

 

min

X

2

G

x

2

G

y

2

= B,     Q = A + λ( + ),   B = YX

 

Q

−1

A

T

G

T

x

G

x

G

T

x

G

y

A

T

Q

BQ

−1

Q

p < 2

31