Art and Visual Perception Thesis

  • 8/14/2019 Art and Visual Perception Thesis

    THE ART OF SEEING: VISUAL PERCEPTION IN DESIGN AND EVALUATION OF

    NON-PHOTOREALISTIC RENDERING

    BY ANTHONY SANTELLA

    A Dissertation submitted to the

    Graduate School-New Brunswick

    Rutgers, The State University of New Jersey

    in partial fulfillment of the requirements

    for the degree of

    Doctor of Philosophy

    Graduate Program in Computer Science

    Written under the direction of

    Doug DeCarlo

    and approved by

    New Brunswick, New Jersey

    May, 2005

    ABSTRACT OF THE DISSERTATION

    The Art of Seeing: Visual Perception in Design and

    Evaluation of Non-Photorealistic Rendering

    by Anthony Santella

    Dissertation Director: Doug DeCarlo

    Visual displays such as art and illustration benefit from concise presentation of information. We present several approaches for simplifying photographs to create such concise, artistically abstracted images. The difficulty of abstraction lies in selecting what is important. These approaches apply models of human vision, models of image structure, and new methods of interaction to select important content. Important locations are identified from eye movement recordings. Using a perceptual model, features are then preserved where the viewer looked, and removed elsewhere. Several visual styles using this method are presented. The perceptual motivation for these techniques makes predictions about how they should affect viewers. In this context, we validate our approach using experiments that measure eye movements over these images. Results also provide some interesting insights into artistic abstraction and human visual perception.

    Acknowledgements

    Thanks go to the many people whose help and support were essential in making this work possible. None of this would have happened without my advisor Doug DeCarlo. Thanks go also to my other committee members: Adam Finkelstein, Eileen Kowler, Casimir Kulikowski and Peter Meer for their advice and encouragement at various (in some cases many) stages of this process.

    Thanks go also to the many friends and family members who have supported and kept me sane through this long process. I wouldn't have survived it without my parents and brothers Nick and Dennis. Special thanks go to Bethany Weber. Thanks also to Jim Housell, all the old NYU crowd, the grad group at St. Peter's and all the supportive souls in the CS Department, RuCCS and the VILLAGE.

    Finally, thanks go to Phillip Greenspun for photos used in several renderings that appear in chapters 7 and 9, as well as models Marybeth Thomas, Adeline Yeo and Franco Figliozzi. Special thanks to Georgio Dellachiesa for looking equally thoughtful in countless illustrative examples.

    Table of Contents

    Abstract

    Acknowledgements

    List of Figures

    1. Introduction

    1.1. Inspirations

    1.1.1. Artistic Practice

    1.1.2. Psychology

    1.1.3. Computer Graphics

    1.2. Our Goal

    2. Abstraction in Computer Graphics

    2.1. Manual Annotation

    2.2. Automatic Methods

    2.3. Level Of Detail

    3. Human Vision

    3.1. Eye Movements

    3.1.1. Eye Movement Control

    3.1.2. Salience Models

    3.2. Eye Tracking

    3.3. Limits of Vision

    3.3.1. Models of Sensitivity

    3.3.2. Sensitivity Away from the Visual Center

    3.3.3. Applicability to Natural Imagery

    4. Vision and Image Processing

    4.1. Image Structure Features and Representation

    4.2. Segmentation

    4.3. Edge Detection

    5. Our Approach

    5.1. Eye tracking as Interaction

    5.2. Using Visibility for Abstraction

    6. Painterly Rendering

    6.1. Image Structure

    6.2. Applying the Limits of Vision

    6.3. Rendering

    6.4. Results

    7. Colored Drawings

    7.1. Feature Representation

    7.1.1. Segmentation

    7.1.2. Edges

    7.2. Perceptual Model

    7.3. Rendering

    7.4. Results

    8. Photorealistic Abstraction

    8.1. Image Structure

    8.2. Measuring Importance

    8.3. Results and Discussion

    9. Evaluation

    9.1. Evaluation of NPR

    9.1.1. Analysis of Eye Movement Data

    9.2. Experiment

    9.2.1. Stimuli

    9.2.2. Subjects

    9.2.3. Physical Setup

    9.2.4. Calibration and Presentation

    9.3. Analysis

    9.3.1. Data Analysis

    9.3.2. Statistical Analysis

    9.4. Results

    9.4.1. Quantitative Results

    9.4.2. Discussion

    9.5. Evaluation Conclusion

    10. Future Work

    10.1. Image Modeling

    10.1.1. Segmentation

    10.1.2. Edges

    10.2. Perceptual Models

    10.3. Applications

    11. Conclusion

    References

    Curriculum Vita

    List of Figures

    1.1. (a) Henri de Toulouse-Lautrec's Moulin Rouge: La Goulue (Lithographic print in four colors, 1891). (b) Odd Nerdrum's Self-portrait as Baby (Oil, 2000). Artists control detail as well as other features such as color and texture to focus a viewer on important features and create a mood. La Goulue's swirling under-dress is a highly detailed focal point of the image, and contributes to the picture's air of reckless excitement. Artists have a fair amount of latitude in how they allocate detail to create an effect. Nerdrum renders his eyes (usually one of the most prominent features in a portrait) in a sfumato style that makes them almost nonexistent. Detail is instead allocated to the child's prophetic gesture. These choices change a common baby picture into something mysterious and unsettling.

    1.2. Judith Schaechter's Corona Borealis (Stained glass, 2001). Skillful artists use the formal properties and constraints of a medium for expressive purposes. The high dynamic range provided by transmitted light and the heavy black outlines of the lead caming that holds the glass together are used to set the figure off from the background, creating a powerful image of joy in isolation.

    2.1. Direct placement of strokes. Complete control of abstraction is possible when a user provides actual strokes that are rendered in a given style. Reproduced from [Durand et al., 2001].

    2.2. Manual annotation for textural indication. Important edges on a 3D model are marked and have texture rendered near them, while it is omitted in the interior. Reproduced from [Winkenbach and Salesin, 1994].

    4.1. (a) Scale space of one dimensional signal. Features disappear through scale space but no new features appear. (b) Plot of inflection points of another one dimensional signal through scale space. Reproduced from [Witkin 1983].

    4.2. Interval tree for 1D signal illustrating decomposition of the signal into a hierarchy. Reproduced from [Witkin 1983].

    5.1. (a) Computing eccentricities with respect to a particular fixation at $p$. (b) A simple attention model defined as a piecewise-linear function for determining the scaling factor $a_i$ for fixation $f_i$ based on its duration $t_i$. Very brief fixations (below $t_{min}$) are ignored, with a ramping up (at $t_{max}$) to a maximum level of $a_{max}$.

    6.1. Painterly rendering results. The first column shows the fixations made by a viewer. Circles are fixations, size is proportional to duration, the bar at the lower left is the diameter that corresponds to one second. The second column illustrates the painterly renderings built based on that fixation data.

    6.2. Detail in background adjacent to important features can be inappropriately emphasized. The main subject has a halo of detailed shutter slats.

    6.3. Sampling strokes from an anisotropic scale space avoids giving the image an overall blurred look, but produces a somewhat jagged look in background areas.

    6.4. Color and contrast manipulation. Side by side comparison of rendering with and without color and contrast manipulation (precise stroke placement varies between the two images due to randomness).

    7.1. Slices through several successive levels of a hierarchical segmentation tree generated using our method.

    7.2. Line drawing style results.

    7.3. Stylistic decisions. Lines in isolation (a) are largely uninteresting. Unsmoothed regions (b) can look jagged. Smoothed regions (c) have a somewhat vague and bloated look without the black edges superimposed.

    7.4. Renderings with uniform high and low detail.

    7.5. Several derivative styles of the same line drawing transformation. (a) Fully colored, (b) color comic, (c) black and white comic.

    8.1. Mean shift filtering tends to create images that no longer look like photographs.

    8.2. Photo abstraction results.

    8.3. Photo in (a) is abstracted using fixations in (b) in a variety of different styles. (c) Painterly rendering, (d) line drawing, (e) locally disordered [Koenderink and van Doorn, 1999], (f) blurred, (g) anisotropically blurred.

    8.4. (a) Detail of our approach, (b) the same algorithm using an importance map where total dwell is measured locally. Notice in (b) the leaking of detail to the wood texture from the object on the desk. Here differences are relatively subtle; but in general it is preferable to allocate detail in a way that respects region boundaries.

    8.5. The range of abstraction possible with this technique is limited. With greater abstraction the scene begins to appear foggy. In some sense it no longer looks like the same scene.

    9.1. Example stimuli. Detail points in white are from eye tracking, black detail points are from an automatic salience algorithm.

    9.2. Illustration of data analysis, per image condition. Each colored collection of points is a cluster. Ellipses mark 99% of variance. Large black dots are detail points. We measure the number of clusters, distance between clusters and nearest detail point, and distance between detail points and nearest cluster.

    9.3. Statistical significance is achieved for number of clusters over a wide range of clustering scales. The magnitude of the effect decreases, but its significance remains quite constant over a wide interval. Our results do not hinge on the scale value selected.

    9.4. Average results for all analyses per image.

    9.5. Average results for all analyses per viewer.

    9.6. Original photo and high detail NPR image with viewers' filtered eye tracking data. Though we found no global effect across these image types, there are sometimes significantly different viewing patterns, as can be seen here.

    10.1. A rendering from our line drawing system (b), can be compared to an alternate locally varying segmentation (c). This segmentation more closely follows the shape of shading contours.

    10.2. Locally varying segmentation cannot replace a segmentation hierarchy. Another example of a locally varying segmentation controlled by a perceptual model (c), compared to a rendering from our line drawing system. Note fine detail in the brick preserved near the subject's head in (c). This is a consequence of the threshold varying continuously as a function of distance from the fixations on the face.

    10.3. A rendering from our line drawing system demonstrates how long but unimportant edges can be inappropriately emphasized. Also, prominent lower frequency edges like creases in clothing are detected in fragments and filtered out because edges are detected at only one scale.

    10.4. Attempting technical illustration of mechanical parts pushes our image analysis techniques close to (if not over) their limits.

    Chapter 1

    Introduction

    In all eras and visual styles, artists control the amount of detail in the images they create, both locally and globally. This is not just a technique to limit the effort involved in rendering a scene. It makes a definite statement about what is important and streamlines understanding. Our goal is to largely automate this artistic abstraction in computer renderings. The hope is to remove detail in a meaningful way, while automating individual decisions about what features to include. Eye tracking allows the capture of what a viewer looks at and, indirectly, what they find important. We demonstrate that this information alone is sufficient to control detail in an image based rendering, and change the way successive viewers look at the resulting image. Our method is grounded in the mechanisms and nature of vision: how we see and understand the world. This is an intuitive idea, if often overlooked. Artists must first be viewers [Ruskin, 1858] and viewers ultimately consume the resulting images. So, vision must be central in the design of algorithms for creating imagery.

    Vision appears simple and effortless. Because under most circumstances it requires no conscious effort or exertion, it seems like a trivial operation, something that just happens, as if the light falling on the eye made one see in the same way it warms a stone. But sight is the product of an extraordinarily developed and complicated visual system. In seeing we are all experts, and experts make things seem easy. Without any effort we can navigate and act in the world and recognize objects even under difficult conditions. The abilities of our sight outreach even our awareness of them. Experiments have shown that the eyes of radiologists searching for tumors linger longer over tumors that they fail to notice and report [Mello-Thoms et al., 2002]. The limited success of attempts to mimic these human abilities in computer vision systems highlights both the difficulty of the computations involved, and our phenomenal success at them.

    The apparent ease with which we see slips when our vision is stressed: struggling to keep a written page in focus as we fall asleep, searching for a loved one's face in the shifting crowd of an airport. At these times we become conscious of sight as a struggle to organize and make sense of the world. This struggle has continual victories, but also failures. An old friend waving to us on the street is passed by, a typo makes its way into an important document. The apparent ease of vision also masks our limitations. We miss much, and are easily overloaded. Sometimes our failures are engineered: a camouflaged soldier, the proverbial fine print. More often, however, they are accidental. Some information was present, or presented, and we failed to notice it.

    Well designed displays of visual information ensure we don't miss anything important by careful arrangement and manipulation. A wide variety of techniques are used to make meaning clear. Detail is put just where it is important, shapes can be changed or removed, colors and textures enhanced or suppressed. Paintings, sketches, technical illustrations, and even the most apparently photorealistic of art, all products of the human hand, have been simplified and manipulated for ease of understanding. Reality is complicated and messy. Rather than realism, what is more often desired is verisimilitude. We want the appearance of reality which has been organized and structured to make its meaning clearer, if necessarily more limited than the infinite complexity of reality.

    Achieving this kind of clarity has always been the job of artists and designers who make subjective, but not arbitrary, decisions about what is important, and how to convey it. The ubiquity of digital media creates a need for automation in achieving this kind of good design. The goal is not to replace the artist who creates a carefully crafted one-off display, but instead to create a potentially vast number of adaptive displays, tailored to particular situations and viewers. This information would otherwise be displayed in some less well-designed manner, laying more of a cognitive burden on the user. It has been argued, in fact, that avoiding this burden is one of the primary characteristics of powerful art [Zeki, 1999]. If good design can be formalized, this will enhance understanding and aid effective communication, as well as improve our own understanding of the workings of visual communication. This thesis presents some initial steps toward this goal.

    1.1 Inspirations

    There are many techniques proposed by various artists, and perhaps even more theory proposed by various researchers and critics on how to achieve good visual design. Yet it remains imperfectly understood in all of the fields where it has been studied. Because of this, a successful practical approach must necessarily draw on elements from many areas of practice and theory. If a practical system is designed to be as general as possible, its creation can improve understanding of what visual clarity means, and how it relates to communication. It can also provide a framework in which to unify concepts and techniques from many fields.

    1.1.1 Artistic Practice

    One important source of inspiration for this work is artistic practice and practical theory. Artists have always had strong motivation to capture the attention and interest of uninterested, sometimes hostile viewers. Much ingenuity has been applied to creating images that are as gripping and clearly communicative as possible. Careful observation of such images can yield interesting insights (see Figure 1.1). Similarly, artists have throughout history given advice on the practice of their craft. Theorists and art historians have tried to make generalizations and analyze techniques [Ruskin, 1857, Gombrich et al., 1970, Graham, 1970, Arnheim, 1988]. This is true in graphic design as well as fine art. Classical texts like Tufte [1990] try to explore the qualities of good and bad presentations of information and make generalizations from carefully chosen examples.

    However, these instructions and recommendations are often difficult to apply. They are sometimes limited in scope, providing specific instructions for a particular narrow problem. More often, guidelines are too broad and vague in their application. They depend for their functioning on the judgment of the artist. The advice of artists and designers often comes in the form of heuristics, rules of thumb to be taken with a grain of salt, kept in the back of one's mind, and applied when the moment seems right. Becoming an expert in a visual field is often a question of cultivating, through practice and observation, an instinctive sense of when to apply such rules, and conversely when to break them.

    Figure 1.1: (a) Henri de Toulouse-Lautrec's Moulin Rouge: La Goulue (Lithographic print in four colors, 1891). (b) Odd Nerdrum's Self-portrait as Baby (Oil, 2000). Artists control detail as well as other features such as color and texture to focus a viewer on important features and create a mood. La Goulue's swirling under-dress is a highly detailed focal point of the image, and contributes to the picture's air of reckless excitement. Artists have a fair amount of latitude in how they allocate detail to create an effect. Nerdrum renders his eyes (usually one of the most prominent features in a portrait) in a sfumato style that makes them almost nonexistent. Detail is instead allocated to the child's prophetic gesture. These choices change a common baby picture into something mysterious and unsettling.

    Figure 1.2: Judith Schaechter's Corona Borealis (Stained glass, 2001). Skillful artists use the formal properties and constraints of a medium for expressive purposes. The high dynamic range provided by transmitted light and the heavy black outlines of the lead caming that holds the glass together are used to set the figure off from the background, creating a powerful image of joy in isolation.

    1.1.2 Psychology

    A somewhat different approach is to study good design with the methodologies of psychology, psychophysics and neuroscience. This is in essence an attempt to understand good design from first principles: the functioning of the human mind and visual system. Visual perception obviously mediates all information that passes from a display to a user. So, as a form of visual communication, art must be constrained by the laws of psychology and the visual system [Arnheim, 1988, Zeki, 1999, Ramachandran and Hirstein, 1999]. This is an attractive idea. By understanding the strengths and weaknesses of the process that allows us to see, it should be possible to maximize use of the limited cognitive bandwidth between a display and viewer.

    This is perhaps not so far from what artists have done all along. One could view every daub of paint, every pen stroke as an informal experiment in vision. Artists test their actions against the evidence of their own visual systems, and make predictions about how they will affect others. Formal attempts to understand perception and art are simply more conscious, more systematic, and more interested in understanding the creative process itself than making a statement through it. A number of psychologists have speculated on this, and pointed to specific examples from art history [Arnheim, 1988, Leyton, 1992, Zeki, 1999, Ramachandran and Hirstein, 1999]. Studies have indeed found empirical evidence of perceptual effects resulting from artistic style or composition [Ryan and Schwartz, 1956, Locher, 1996].

    Like most attempts to do anything complicated from first principles, looking at art and design using cognition is hard. There is much that has been understood about the visual system, but also much that is not. The more basic and low level the area of visual function is, the more we know about it, and the less useful that information is for design. Much, for example, is known about the physical mechanism of how we perceive color; substantially less is known about how we parse shapes out of a background and assemble them into objects. It's not surprising that many researchers looking at art from a cognitive standpoint consider primarily 20th century painters, like Mondrian, Kandinsky, or even Picasso at his more abstract, who themselves were largely concerned with the purely formal aspects of pictorial space rather than the semantics of subject matter. The semantic aspects of vision which reference the rest of the world and its non-visual aspects are ill understood, so little cognitive research can be brought to bear on the semantics of art.

    Given the limited basic knowledge, general theories of how art functions cognitively are, almost of necessity, rather vague in their application. Ramachandran [1999], for example, suggests that all art is guided by the peak shift principle. This principle, found in a number of situations in psychology, says that if a response is trained to some stimulus, the greatest, or peak, response will be found with a stimulus that is greater than the one used in training. A depiction functions by emphasizing the features that normally let one know what it is. In this view all art is a form of caricature. However, this does not tell us the qualities of a successful caricature. In another example, Leyton [1992] argues that art maximally encodes a causal history that can be read by viewers. Good art should contain as much information in the form of asymmetry as possible to stimulate viewers, but not too much, which will disturb them. Though a reasonable sounding standard, this only hints at what the correct level of complexity is.

    The application of psychology to design is difficult. However, we do not need to build a system directly on these principles. Inspired by them, we can apply knowledge from low-level vision and computer graphics techniques to build practical systems.

    1.1.3 Computer Graphics

    A large body of work in computer graphics ignores all these difculties and sets out

    to create attractive synthetic art and illustration. Attempts at algorithmic denitions of

    good design surface in a number of areas in computer science, graphics, scientic vi-

    sualization, document layout, human computer interaction, and interface design. Con-

    cerns of effective art-like visual communication have particularly come to the forefront

    in the realm of non-photorealistic rendering, or NPR. This area is perhaps excessively

    broad. It includes almost any part of graphics that aims to create images that are not an

    imitation of reality. It includes things as diverse as computer generation of geometrical

    patterns, instructional diagrams and impressionist paintings. NPR images run a gamut

    between the purely ornamental and those designed to convey very specic information.

    A large area of research in NPR has been the production of many, often quite impres-

    sive, phenomenological models for rendering in various traditional media and styles.

    There is however an increasing interest in NPR as not just a way to imitate traditional

    visual styles, but also as a set of techniques for trying to display visual information in

    a concise and abstract way.

    The link between concise presentation and imitating traditional artistic styles is not

    accidental. Almost all the visual styles of traditional media, line drawings, wood-block

    prints, comics, expressionist or impressionist paintings, pencil sketches, necessarily

    discard vast amounts of information as a direct consequence of their visual style. There

    is, for example, no color or shading in a pure line drawing. However, these images still

    carry the essential content that the artist (and viewer) requires of them. Skillful artists

    can use the properties and constraints of a medium to enhance the expressiveness of

    a work (see Figure 1.2). A brief time spent working with photo filters in a program

    like Adobe Photoshop suggests that computer implementations of these styles capture

    some of the effects of traditional media, but often in a way that does not adapt to

    particular situations with an artist's flexibility. Artists ultimately can judge their results

    as they go. Applying a technique in a blanket manner is often less satisfactory. What

    is acceptable as reality in a photograph can look fussy and crowded as a painting.

    1.2 Our Goal

    Though today's algorithms cannot model the general intelligence of an artist, we argue

    that carefully designed systems can make use of minimal user interaction to create

    much more expressive images. Specifically, we look at modulation of local detail, an

    important cue used in traditional art and visualization. Including detail only where it

    is needed focuses viewer interest and can help clarify the point of an image. Besides

    being a feature of art and illustration, such modulation could benefit applications in

    visualization. It would allow the computer to hand-craft displays for clarity and efficient

    understanding in a particular situation.

    This work does not directly address specific visualization applications. Rather than

    exploring visualization directly, we keep art as the focus, and this thesis remains firmly in

    the realm of artistic NPR. Our hope, however, is that insights gained in this way will

    be applicable to a number of areas in visualization. Art is a particularly good place to

    explore the link between cognition and design of displays. Specific applications tend

    to distract with their own implementation details and domain constraints. Radiology,

    for example, is a domain where complexity and high stakes greatly constrain practical

    applications. Art encourages a wider view, in which it is easier to look at general

    techniques and patterns that are widely useful. Similarly, in evaluation, validation of

    a particular system is of limited interest, while evaluation of more general techniques

    can provide insights into cognition and be more widely relevant.

    Grounding our work in knowledge of visual perception also helps focus attention

    away from application engineering and towards general concepts. We are interested in

    interactively efficient methods for achieving expressive NPR images. Knowledge of

    visual perception suggests that by exploiting the visual system we can reserve human

    effort for just the hardest parts of the process of crafting images, and pass the majority

    of the work over to a computer. For a computer application, the hardest part of

    abstraction is deciding what is important. This is not hard for people, since it is done

    instinctively. Deciding what to paint a picture of is the easy part for an artist. It is the

    mechanics of turning that intention into an image that takes training, time and effort.

    This leads us to a simple, minimally interactive method for controlling detail via eye

    tracking. As we will soon see, vision research leads us to believe that where people

    look indicates importance. Such areas should be portrayed in detail. Conversely, what

    viewers don't look at is unimportant to them and can be removed or de-emphasized.

    The same insights about vision that lead to this methodology also lead us to quantitative

    methods for evaluation. If our approach is successful, increased interest in areas

    highlighted with detail should be reflected in eye movements. This methodology holds

    the promise of images that are carefully crafted for understanding on sound principles,

    and can be formally evaluated for effectiveness. Such images and techniques can in

    turn serve as a tool for further investigating human vision in a way targeted toward the

    questions that are important for crafting images. With more information, even better

    techniques and images can be built.

    In this thesis we begin in Chapter 2 by laying out the basic problem of controlling

    detail in NPR imagery, and look at the range of techniques that have been used

    to address it. In Chapters 3 and 4 we then review the basic background in human and

    computer vision underlying our approach to this problem. The nature of vision leads

    us to an approach of capturing the intentionality central to design via eye tracking.

    Information about where people look alone is sufficient to control detail in a directed

    way, allowing us to craft semi-automatic NPR images with much of the attractive and

    engaging intentionality of completely hand made art. The basic nature of this interaction

    is described in Chapter 5. In Chapters 6, 7 and 8 we then present several systems

    for creating NPR renderings built on this idea, and discuss their strengths and weaknesses.

    An evaluation of one of these systems is presented in Chapter 9, which not only

    validates the general approach but gives some interesting insights into abstraction and

    human vision. Finally, in Chapter 10 we discuss some directions for future research.

    Chapter 2

    Abstraction in Computer Graphics

    In any work of art all parts of the picture plane do not receive equal attention from the

    artist. Critical areas are more detailed, while others are left relatively abstract. This is

    the case even in quite realistic styles, and in technical illustration. Such effects have not

    been ignored in computer graphics and NPR. Local control of detail has been addressed

    in several visual styles. Whatever the rendering techniques used, important areas can

    be identified and depicted with greater detail, or emphasis on fidelity. Deciding what is

    important is difficult to do automatically. Two broad approaches to selecting important

    areas can be characterized: manual user annotation, and simple heuristics.

    Figure 2.1: Direct placement of strokes. Complete control of abstraction is possible when a user provides actual strokes that are rendered in a given style. Reproduced from [Durand et al., 2001].

    2.1 Manual Annotation

    At one extreme, near complete control of detail can remain in the hands of a user.

    This provides many expressive possibilities at the expense of much interaction. At its

    Figure 2.2: Manual annotation for textural indication. Important edges on a 3D model are marked and have texture rendered near them, while it is omitted in the interior. Reproduced from [Winkenbach and Salesin, 1994].

    Figure 2.3: Manual local importance images. Hand painted images can indicate important areas to be rendered in greater detail or fidelity. Reproduced from [Hertzmann, 2001].

    furthest extreme the computer becomes merely a digital paintbrush the user directly

    manipulates [Baxter et al., 2001]. A number of intermediate approaches exist that aid

    the user in the technicalities of creating an image while still giving them complete

    control over detail. The earliest work creating a painting-like appearance, or painterly

    rendering effect [Haeberli, 1990] took this approach. A user places strokes entirely

    by hand, their color being sampled from an underlying source image. The approach

    is in effect a form of tracing, where the user ultimately remains in control of stroke

    placement and size while, like a traditional media artist, making their own decisions

    about which details are important as they go. A similar kind of interaction has been

    used [Durand et al., 2001] in generating pencil renderings (see Figure 2.1). The user

    places strokes which are shaded and shaped automatically to create a final drawing.

    The same stroke based interactive methods are applicable in 3D [Kalnins et al., 2002].

    One step distant from actually drawing strokes, it is also possible to indicate increased

    importance for some areas of a rendering using an importance map, where

    higher intensity indicates the need for more attention or detail in that area. For example,

    in a painterly rendering framework [Hertzmann, 2001], a hand drawn importance

    map was used to indicate that a source image should be more closely approximated in

    certain locations (see Figure 2.3). Similarly, in 3D [Winkenbach and Salesin, 1994],

    hand drawn lines have been used to indicate locations near which textural detail should

    be included (see Figure 2.2). In another painterly rendering application [Gooch and

    Willemsen, 2002] rectangles to be painted in greater detail could be drawn by hand.

    Various digital versions of other media, such as pen and ink [Salisbury et al., 1994]

    and watercolor [Curtis et al., 1997] have been developed that provide the user with

    significant control over the detail present in different areas. Such approaches can yield

    attractive results, but require careful attention on the part of a user.
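
    The role of an importance map can be illustrated with a small sketch. The code below is
    not the method of any of the systems cited above: it simply assumes a grayscale image
    and an importance map with values in [0, 1], and uses a box blur as a stand-in for
    "less detail". The names box_blur and modulate_detail are hypothetical.

```python
def box_blur(image, radius):
    """Blur a grayscale image (list of rows of floats) with a square window,
    clipping the window at the image borders."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total, count = 0.0, 0
            for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                    total += image[yy][xx]
                    count += 1
            out[y][x] = total / count
    return out


def modulate_detail(image, importance, radius=2):
    """Keep the source where importance is 1 and a blurred copy where it is 0,
    blending linearly in between. The blur is only a placeholder for whatever
    abstraction a real renderer would apply."""
    blurred = box_blur(image, radius)
    return [
        [w * src + (1.0 - w) * blr
         for src, blr, w in zip(src_row, blr_row, imp_row)]
        for src_row, blr_row, imp_row in zip(image, blurred, importance)
    ]


# Toy input: a checkerboard whose left half is marked as important.
SIZE = 6
image = [[float((x + y) % 2) for x in range(SIZE)] for y in range(SIZE)]
importance = [[1.0 if x < SIZE // 2 else 0.0 for x in range(SIZE)]
              for y in range(SIZE)]
result = modulate_detail(image, importance)
```

    In this toy case the left half of the result reproduces the checkerboard exactly, while
    the right half is smoothed toward gray.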

    Figure 2.4: (a) Original image. (b) Corresponding salience map [Itti et al., 1998]. (c) Corresponding salience map [Itti and Koch, 2000]. Salience methods pick out potentially important areas on the basis of contrast in some space (not limited to intensity). The two methods pictured here differ in the method of normalization used to enhance contrast between salient and nonsalient regions.

    2.2 Automatic Methods

    More common in NPR have been purely automatic methods. Automatic methods also

    run a gamut, from approaches that process an image in a completely local, uniform

    manner to those that automatically extract some quantity from an image as a proxy for

    importance. Uniform approaches perform some (not necessarily local) operation uniformly

    across an image, and have been used extensively in painterly rendering [Hertzmann,

    1998, Litwinowicz, 1997, Shiraishi and Yamaguchi, 2000]. A global effect provides

    users with only limited control. Rather than being truly uniform, some of these

    approaches make a (largely implicit) simple assumption that some low level features

    are important and worth preserving. Automatic painterly rendering methods, for example,

    largely assume strong high frequency features are important and should be

    preserved in a rendering. In fact, painterly techniques vary largely in their method

    for respecting these boundaries: aligning strokes perpendicular to the image gradient

    [Haeberli, 1990], terminating strokes at edges [Litwinowicz, 1997], or drawing in

    a coarse-to-fine fashion [Hertzmann, 1998, Shiraishi and Yamaguchi, 2000, Hays and

    Essa, 2004]. Similarly, automatic line drawing approaches (both 2D and 3D) assume

    the importance of all lines that meet certain purely geometrical definitions: occluding

    contours, creases [Saito and Takahashi, 1990, Interrante, 1996, Markosian et al., 1997],

    and suggestive contours [DeCarlo et al., 2003]. Such techniques can create attractive

    images, but lack the selective omission which gives art much of its expressive power.

    The kind of omission commonly used in depicting specific objects can sometimes

    be explicitly stated. In drawing trees, for example [Kowalski et al., 1999, Deussen and

    Strothotte, 2000], you can avoid drawing detail in the center of the tree, especially as the

    tree is drawn smaller. Though this may be an accurate characterization of a particular

    common style of depiction, it is not generally applicable to any subject.

    For general images, there are relatively few options for automatically selecting

    important areas. Some attempts have been made to predict importance using various

    image analysis techniques. In 3D, image pyramids have been applied to omit detail in

    the interior of a shape [Grabli et al., 2004]. In 2D, drawing on vision research, some

    approaches have attempted to use salience measures to capture importance. Salience

    measures are a guess at the ability of a feature to capture interest based on its low level

    properties [Itti et al., 1998, Itti and Koch, 2000]. Similarly motivated salience measures

    have been applied to attempt to predict features worth preserving in painterly rendering

    [Collomosse and Hall, 2003]. Because faces are often an important component of

    images, detecting them also provides a useful (though not always reliable) automatic

    cue for what areas are important. Face detection has been used alongside salience

    methods in other areas of graphics loosely related to NPR where identifying important

    features is useful, such as automatic cropping [Chen et al., 2002, Suh et al., 2003] and

    recomposing of photographs [Setlur et al., 2004].

    2.3 Level Of Detail

    An area of computer graphics left out in the above discussion has dealt with many of

    these same issues. Various adaptive rendering and level of detail (LOD) schemes have

    used the visibility or potential interest of features to skip computations that are unlikely

    to be noticed. This is different from our goal. We are interested in detail modulation for

    stylistic and expressive reasons. Level of detail seeks to control the computational cost

    of rendering through approximation, not abstraction. Though both are concerned with

    simplification, LOD and various other corner cutting is usually meant to be invisible,

    or nearly so, while expressive abstraction is meant to be seen and indeed have a strong

    effect on the way a viewer looks at an image. Though the goals are different, some

    of the methodologies overlap. The goal of imperceptible omission has encouraged

    researchers to look at perceptually motivated methods. Salience measures have been

    applied to concentrate computation on noticeable areas, [Yee et al., 2001, Cater et al.,

    2003]. In addition, a variety of low level perceptual models have been applied to try

    to quantify the visibility of features and guarantee that simplification is invisible, or

    minimize visibility. We adopt several of these metrics in our own efforts. One of

    our contributions can be seen as applying and expanding perceptual models originally

    adopted in LOD to create expressive artistic abstraction.

    Both perceptually motivated LOD methods and the methods we present in this

    thesis use models of vision to identify expendable areas of an image. It is the functional

    definition of an expendable area that differs between the two. In the following chapter

    we present the relevant background in human vision necessary for understanding why

    such areas exist, and how they may be identified.

    Chapter 3

    Human Vision

    A background in human vision is essential in computationally defining artistic abstraction.

    We have extraordinarily complex abilities to analyze images; these abilities have

    weaknesses and strengths. Level of detail simplification methods seek to exploit the

    limits of vision to cut corners in an unnoticeable way. In contrast, we hope to use the

    related strengths of the visual system to improve visual design, clarifying content and

    making things that need to pop out, pop out. Our interactive technique uses eye movements

    and the limits of vision to indirectly measure the importance of features. Some

    background will clarify the motivation for this approach.

    3.1 Eye Movements

    The human eye is maximally sensitive over a relatively small central area called the

    macula. This area of relatively high resolution is approximately 5 degrees across, while

    the most sensitive region (the fovea) is only 1.3 degrees (from a total visual angle of

    about 160 degrees) [Wandell, 1995]. Sensitivity rapidly degrades outside of this central

    region. Our perception of uniform detail throughout space is a result of continually

    switching the point at which our eyes are looking (the point of regard or POR).
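
    To give a sense of scale, the angular sizes above can be converted to sizes on a display.
    The sketch below assumes a flat screen viewed head-on and uses the standard relation
    s = 2 d tan(angle / 2). The 60 cm viewing distance and the function name are
    illustrative assumptions, not values from the text.

```python
import math


def extent_on_screen(angle_deg, distance_cm):
    """Width (cm) subtended by a visual angle at a given viewing distance.

    Assumes the region is centered on the line of sight and the screen is
    perpendicular to it: s = 2 * d * tan(angle / 2).
    """
    return 2.0 * distance_cm * math.tan(math.radians(angle_deg) / 2.0)


# Illustrative viewing distance (an assumption, not from the text).
DISTANCE_CM = 60.0

fovea_cm = extent_on_screen(1.3, DISTANCE_CM)   # most sensitive region
macula_cm = extent_on_screen(5.0, DISTANCE_CM)  # high-resolution region

print(f"fovea  ~ {fovea_cm:.2f} cm across at {DISTANCE_CM:.0f} cm")
print(f"macula ~ {macula_cm:.2f} cm across at {DISTANCE_CM:.0f} cm")
```

    Under these assumptions the fovea covers roughly 1.4 cm of the screen and the macula
    roughly 5.2 cm, which is small compared to a typical image.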

    This process involves two important types of eye motions: fixations, relatively long

    periods spent looking at a particular spot, and saccades, very rapid changes of eye position.

    These are not the only kinds of motion of which the eye is capable. In smooth

    pursuit the eye follows a moving object, and even when fixated the eye continually

    makes very small jittery motions. Fixations and saccades, however, are the most significant

    motions when viewing static scenery. Saccades can be initiated consciously, but

    for the most part occur naturally as we explore a scene. Though fixating on a location

    Figure 3.1: Patterns of eye movements of a single subject over an image when given different instructions. Note (1) free observation, which shows fixations that are relatively dispersed yet still focused on relevant areas. Contrast it with (3), where the viewer is instructed to estimate the figures' ages. Reproduced from [Yarbus, 1967].

    is not identical to attending to it, for the most part an attended location is fixated (i.e., if

    we pay attention to something, we strongly tend to look at it directly) [Underwood and

    Radach, 1998].
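
    The distinction between fixations and saccades can be made concrete with a simple
    velocity threshold on recorded gaze samples. The sketch below is one common, simple
    scheme and is not claimed to be the procedure used in this work. The 30 degrees/second
    threshold, the sample format, and the function names are all illustrative assumptions.

```python
import math


def label_samples(samples, velocity_threshold=30.0):
    """Label each gaze sample as 'fixation' or 'saccade'.

    samples: list of (time_s, x_deg, y_deg) tuples with strictly increasing
    times. velocity_threshold is in degrees/second and is an illustrative
    value. Each sample is labeled by the speed of the step arriving at it;
    the first sample inherits the label of the second.
    """
    labels = []
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        speed = math.hypot(x1 - x0, y1 - y0) / (t1 - t0)
        labels.append("saccade" if speed > velocity_threshold else "fixation")
    return labels[:1] + labels


def summarize(run):
    """Reduce a run of fixation samples to (mean_x, mean_y, duration_s)."""
    xs = [s[1] for s in run]
    ys = [s[2] for s in run]
    return (sum(xs) / len(xs), sum(ys) / len(ys), run[-1][0] - run[0][0])


def group_fixations(samples, labels):
    """Merge consecutive fixation samples into one record per fixation."""
    fixations, run = [], []
    for sample, label in zip(samples, labels):
        if label == "fixation":
            run.append(sample)
        elif run:
            fixations.append(summarize(run))
            run = []
    if run:
        fixations.append(summarize(run))
    return fixations


# Synthetic 100 Hz recording: jitter near (0, 0), a jump, jitter near (10, 0).
samples = [
    (0.00, 0.00, 0.00), (0.01, 0.01, 0.00), (0.02, 0.00, 0.01),
    (0.03, 0.01, 0.01), (0.04, 10.00, 0.00), (0.05, 10.01, 0.00),
    (0.06, 10.00, 0.01),
]
labels = label_samples(samples)
fixations = group_fixations(samples, labels)
```

    Each resulting record carries a location and a duration, the two quantities that are
    drawn as circle position and circle diameter in plots of fixations.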

    Figure 3.2: Similar effects to [Yarbus, 1967] are easily (even unintentionally) achieved when using eye tracking for interaction. Circles are fixations; their diameter is proportional to duration. The first viewer was instructed to find the important subject matter in the image. The second viewer was told to just look at the image. The viewer assumed, from prior experience in perceptual experiments, that he was going to be later asked detailed questions about the contents of the scene. This resulted in a much more diffuse pattern of viewing.

    3.1.1 Eye Movement Control

    Qualitatively, a great deal is known about fixations. Eye movements are highly goal

    directed. Viewers don't just look around at random. Instead, they fixate meaningful

    parts of images [Mackworth and Morandi, 1967, Underwood and Radach, 1998, Henderson

    and Hollingworth, 1998], and fixation duration is related to processing [Just

    and Carpenter, 1976, Henderson and Hollingworth, 1998]. Viewing is highly influenced

    by task. The classic example of this [Yarbus, 1967] showed that viewers examining

    the same image, with different tasks to perform, showed drastically different

    patterns of viewing, in which they focused on the features relevant to their task

    (see Figure 3.1). Given the same task, the motions of a particular viewer over an

    image at different viewings can be quite different, yet the overall distribution of fixations

    remains similar [Yarbus, 1967]. In real activities, actions, even those thought

    of as automatic, are usually preceded by (largely unperceived) fixations of relevant

    features [Land et al., 1999]. These effects have been noted from some of the earliest

    research in the field [Yarbus, 1967], but the mechanisms involved remain for the most

    part informally understood.

    In general, understanding of most higher-level aspects of eye movement control

    is largely qualitative. In limited domains such as reading, attempts have been made

    to formulate mathematical models of viewing behavior. For complex natural scenes,

    much less is known [Henderson and Hollingworth, 1998]. Clearly any information

    used in guiding eye movements must come from the scene. Likewise, the process of

    selecting a new location to view must be guided in part by low frequency information

    gathered from the periphery during earlier fixations. A matter of debate is whether low-level

    visual information gained like this is a direct control of behavior or whether it is

    primarily used when integrated into a higher level understanding. The precise factors

    involved in control and planning of eye movements are an active and highly debated

    topic [Kowler, 1990].

    3.1.2 Salience Models

    Much effort has gone into attempts to identify purely low-level image measurements

    that can account for a significant amount of viewing behavior. Clearly it would be interesting

    if what appears to be a highly complex behavior requiring general understanding

    could be modeled or at least reasonably predicted by a simple approach. Results have

    been mixed. Fixation locations do not correlate very well over time with the presence

    of simple low level image features such as areas of high contrast, junctions, etc. [Underwood

    and Radach, 1998].

    More complex models have been formulated, such as the salience methods mentioned

    earlier. All measure contrast in one sense or another. In general, salience methods

    embody the assumption that unusual features are likely to be important and looked

    at. Choice of feature space, and scale of measurement and comparison differ. One

    popular approach [Itti et al., 1998, Itti and Koch, 2000] uses center surround filters to

    measure local contrast in color, orientation and intensity to model general viewing behavior.

    [Rosenholtz, 2001] uses a probabilistic framework to measure the probability of

    a feature given a Gaussian model of color or velocity in the surround. This was used to

    predict visual search performance. A related salience framework was proposed [Walker

    et al., 1998] to select unique image locations to match for image alignment. This approach

    used kernel estimation to measure the rarity of local differential features in the

    global image-wide distribution of those features.
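
    The idea of center surround contrast can be sketched in a few lines. The code below is a
    deliberately reduced illustration, not the model of Itti et al.: it works at a single
    scale, on intensity only, and uses square averaging windows in place of proper filters.
    The window radii and function names are assumptions chosen for the example.

```python
def local_mean(image, y, x, radius):
    """Mean intensity in a square window around (y, x), clipped to the image."""
    h, w = len(image), len(image[0])
    values = [
        image[yy][xx]
        for yy in range(max(0, y - radius), min(h, y + radius + 1))
        for xx in range(max(0, x - radius), min(w, x + radius + 1))
    ]
    return sum(values) / len(values)


def center_surround_contrast(image, center_radius=1, surround_radius=4):
    """Per-pixel |center mean - surround mean| on a grayscale image."""
    h, w = len(image), len(image[0])
    return [
        [
            abs(local_mean(image, y, x, center_radius)
                - local_mean(image, y, x, surround_radius))
            for x in range(w)
        ]
        for y in range(h)
    ]


# A dark field with one small bright patch: the patch should stand out.
SIZE = 15
image = [[0.0] * SIZE for _ in range(SIZE)]
for y in range(6, 9):
    for x in range(6, 9):
        image[y][x] = 1.0

contrast = center_surround_contrast(image)
peak = max(max(row) for row in contrast)
peak_locations = [(y, x) for y in range(SIZE) for x in range(SIZE)
                  if contrast[y][x] == peak]
```

    On this toy image the response peaks at the center of the bright patch and is zero in
    the empty corners, which is the sense in which such measures respond to features that
    differ from their surroundings.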

    These approaches share the same basic idea but vary in what they attempt to model.

    This raises the question of what one is really trying to capture with salience. One can

    look at salience as simply a quantitative method of deciding whether something is

    present in a particular location in the visual field. In this context, salience doesn't actually

    state the location is important, just that it might be because something is there. It

    seems quite plausible that a measure like this plays a role in perception. However, more

    is usually claimed for salience, for example that it predicts most of viewing behavior

    or the valuable content in an image.

    Salience would seem to have some additional predictive power because in a wide

    class of images the semantically important subject does contrast with the rest of the

    scene. Relatively few people take pictures of their family members dressed in camouflage

    and lurking in the bushes. Nobody takes a picture of a blade of grass in a field. The

    tendency of meaningful features to be visually prominent is by no means universal. It

    is also unclear if this is really a property of the world, or a property of pictures people

    take, but it does seem to underlie some of the success of salience as an engineering tool

    in graphics.

    Salience models have also been used to model viewing in narrower domains where

    their applicability is more clear. The presence or absence of pop out effects in search

    for example [Rosenholtz, 1999, Rosenholtz, 2001] is effectively modeled by simple

    salience models that measure how distracting a distracter actually is.

    Debate about how useful salience is in understanding general viewing is ongoing.

    Some optimistically state that salience predictions correlate well with real eye motions

    of subjects free viewing images [Privitera and Stark, 2000, Parkhurst et al., 2002]. Others

    are more doubtful and claim that when measured more carefully and in the context

    of a goal driven activity, the correlation is quite poor [Land et al., 1999, Turano et al.,

    2003]. This mismatch in experimental results fits the intuition that visually prominent,

    eye catching features might be more correlated with idle exploration of a scene,

    and much less related to eye movements made during a task. In spite of this controversy,

    salience methods are quite popular and have seen a fair amount of application

    in computer graphics. They show some correlation with visually prominent features

    and are fairly simple to implement. Code for some is publicly available. Clearly both

    semantics and low-level features play a part in eye movements. Further investigation

    is necessary to clarify the contributions to viewing behavior of salience and scene semantics.

    Though they seem unable to model important aspects of viewing behavior,

    salience models may provide important measures of visual prominence.

    3.2 Eye Tracking

    Much of the knowledge above about human eye motion has been gained through the

    use of eye-tracking. A system measures a viewer's eye in one of several manners

    and records the point where it is looking, termed the point of regard or POR. One

    common approach involves a video camera and an infrared light source. The relative

    positions of the pupil and corneal reflection in the resulting image are used to calculate

    point of regard [Duchowski, 2000]. These systems are reasonably reliable and accurate

    and improve with each generation, though they are still subject to drift over time and

    variability between viewers. The same technology is used in producing units that sit

    in front of a fixed display, and in head mounted units for use in more general scenes.

    Video based trackers have the virtue of not interfering directly with a viewer, making

    them useful as both a natural interactive method and a research tool.

    Outside of research in human vision, eye-trackers have seen increasing use as a

    mode of human computer interaction. Eye tracking has also enabled the use of eye movements

    as a gauge of cognitive activity for psychological investigations and for evaluation of

    visual displays.

    Eye position has been used as a cursor for selection tasks in a GUI [Sibert and Jacob,

    2000]. Eye trackers have also been used to indicate a user's attention to others in a videoconferencing

    environment [Vertegaal, 1999]. Another class of use, related to ours, uses

    POR to control simplifying images or scenes for efficiency purposes. Knowing where

    a user looks enables pruning of information that is not perceptible, and need not be

    transmitted in a video stream [Duchowski, 2000]. Similarly, unexamined content need

    not be rendered in a 3D environment. In practice, few current systems that make use

    of such simplification actually use eye tracking, presumably because of limited availability;

    head tracking is typically used instead [Reddy, 2001].

    On the whole, eye tracking has been found more useful in interaction where it

    serves as an indirect measure of user interest. Eye movements are not under full vol-

    untary control. Because of this, when viewers attempt to explicitly point with their

    eyes the result tends to lack control and suffer from the so called Midas Touch prob-

    lem [Jacob, 1993] where struggling to control eye position, like a cursor, based on

    visual feedback creates even more uncontrolled looking, touching on many irrelevant

    or undesirable locations.

    The same involuntary link of eye movement to thought processes that makes eye

    tracking a bad mouse has made it useful as an indirect measure of interest and cognitive

    activity. Eye tracking has been used to evaluate the effectiveness of informational

    displays including application interfaces [Crowe and Narayanan, 2000], web

    pages [Goldberg et al., 2002], and air traffic control systems [Mulligan, 2002]. As

    mentioned earlier, eye movements may even reveal information that viewers are trying

    to report, but cannot, because it is not consciously available. Experiments have shown

    that professional radiologists examining slides look longer at locations where tumors

    are present, even when they fail to identify and report them [Mello-Thoms et al., 2002].

    In the future, this might hold the promise of computer assisted technologies to avoid

    such mistakes. Several consulting companies currently sell evaluation services using

    eye tracking to graphic design houses and web content creators among others (see footnote 1).

    3.3 Limits of Vision

    Eye movements are related to the resolutional limitations of the eye. At any of the fixations

    with which a viewer explores a scene, the most detailed information is received

    only in the fovea, but lower frequency information is received throughout the visual

    field. These limits on sensitivity within the visual field are not a weakness of the visual

    system. On the contrary, they are part of our ability to efficiently process wide fields

    of view and integrate information across eye movements and changes in viewpoint.

    3.3.1 Models of Sensitivity

    Quantitative models of visual acuity and contrast sensitivity have been developed to

    model sensitivity to stimuli with different properties. Models of acuity predict whether

    an observer can detect a black feature of a particular size on a white background. Contrast

    sensitivity measures an observer's ability to discriminate a repeating pattern of a

    particular contrast and frequency from a uniform gray field. The drop-off in these sensitivities

    away from the visual center is modeled as a function of eccentricity, location

    relative to the point of fixation.

    Contrast sensitivity has been studied extensively in a variety of conditions usually

    using monochromatic sinusoidal gratings (smoothly varying, repeating patterns of light

    and dark bands). This sensitivity declines sharply with eccentricity [Kelly, 1984, Mannos

    and Sakrison, 1974, Koenderink et al., 1978]. Contrast threshold is defined as the

    1 http://www.eyetools.com, http://www.factone.com, http://www.veridicalresearch.com

    (unitless) contrast value (0 to 1 with 1 being maximal contrast) at which a grating and

    uniform gray become indistinguishable. Contrast sensitivity is the reciprocal of this

    value.
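    As a small illustration of these definitions, the sketch below computes the contrast of a
    grating and converts a threshold into a sensitivity. The text does not name a contrast
    formula; the Michelson definition used here is an assumption, chosen because it yields
    values between 0 and 1, and the function names are hypothetical.

```python
def michelson_contrast(l_max, l_min):
    """Contrast of a grating from its brightest and darkest luminance.

    Assumption: Michelson contrast, which lies in [0, 1] for
    non-negative luminances; the text does not specify a formula.
    """
    if l_max + l_min == 0:
        return 0.0
    return (l_max - l_min) / (l_max + l_min)


def sensitivity_from_threshold(threshold):
    """Contrast sensitivity is the reciprocal of the contrast threshold."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    return 1.0 / threshold


# A grating swinging between luminances 0.25 and 0.75.
grating_contrast = michelson_contrast(0.75, 0.25)
# A viewer who can just detect a contrast of 0.005.
sensitivity = sensitivity_from_threshold(0.005)
```

    A lower threshold therefore means a higher sensitivity, which is why the vertical axis of
    a sensitivity plot is an inverse contrast.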

    Figure 3.3: Log-log plot of contrast sensitivity from equation (3.2). The horizontal axis
    is frequency (cycles/degree) and the vertical axis is inverse contrast; the plot marks
    visible and invisible regions. This function is used to define a threshold between visible
    and invisible features.

    Many researchers have empirically studied human contrast sensitivity and several
    have developed mathematical models from their data. Researchers in computer science
    have also used existing data and models in applications. Different aspects of a stimulus
    are important in different situations. Fitting models to data collected from different
    viewers under different circumstances gives somewhat different results. Two examples
    are given here to illustrate the form these mathematical models take.

    Kelly [1984] developed a mathematical model for the contrast sensitivity curve (at
    the center of the visual field) including appropriate scaling factors describing the effects
    of velocity (v) as well as frequency (f in cycles/degree) of a grating on sensitivity.

    \[ A(f, v) = \left(6.1 + 7.3\,\left|\log_{10}(v/3)\right|^{3}\right) v f^{2} e^{-2 f (v + 2)/4.59} \tag{3.1} \]

    Mannos and Sakrison [1974] fit a mathematical model appropriate to still imagery
    to results of prior empirical studies for use as a metric in evaluating image compression.


    \[ A(f) = s_{max} \cdot 2.6\,(0.0192 + 0.144 f)\, e^{-(0.144 f)^{1.1}} \tag{3.2} \]

    where $s_{max}$ is the peak contrast sensitivity (this is around 400, but varies from
    person to person).
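    Equation (3.2) can be evaluated directly. The following is a minimal sketch that uses
    the constants exactly as printed above; the function name and the default of 400 for the
    peak sensitivity are illustrative assumptions, not part of the original model description.

```python
import math


def contrast_sensitivity(f, s_max=400.0):
    """Sensitivity at the visual center, following equation (3.2).

    f      spatial frequency in cycles/degree (must be non-negative)
    s_max  peak contrast sensitivity (assumed default; varies by viewer)
    """
    if f < 0:
        raise ValueError("frequency must be non-negative")
    return s_max * 2.6 * (0.0192 + 0.144 * f) * math.exp(-((0.144 * f) ** 1.1))


# Sample the curve at a few frequencies to see its hump-like shape:
# sensitivity is low at very low frequencies, rises to a peak, then falls.
frequencies = [0.5, 1.0, 2.0, 4.0, 6.0, 8.0, 16.0, 32.0]
curve = [(f, contrast_sensitivity(f)) for f in frequencies]
```

    A feature with a given frequency would be treated as visible when its contrast exceeds
    the reciprocal of the value returned for that frequency.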

    3.3.2 Sensitivity Away from the Visual Center

    A number of researchers have explored how sensitivity varies with eccentricity [Kelly,
    1984, Rovamo and Virsu, 1979]. At larger eccentricities (expressed in degrees of visual
    angle) the contrast sensitivity function is multiplied by another function which models
    the drop-off of sensitivity in the visual periphery. This function is termed the cortical
    magnification factor. It is not radially symmetric, but drops off faster vertically than
    horizontally. It can be approximated [Rovamo and Virsu, 1979] with separate formulas
    for decrease in sensitivity in four areas. For simplicity a bound from the most sensitive
    area can be used in estimating visibility [Reddy, 2001, Reddy, 1997].

    \[ M(e) = \frac{1}{1 + 0.29 e + 0.000012 e^{3}} \tag{3.3} \]

    The cubic term can usually be ignored, as its contribution in the range of eccentricities
    normal in a screen display is negligible [Reddy, 1997]. The contrast sensitivity is then
    $M(e)\,A(f)$.
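    To make the combination concrete, the sketch below scales the central sensitivity of
    equation (3.2) by the magnification factor of equation (3.3) and compares a feature's
    contrast against the resulting threshold. The visibility test and all function names are
    illustrative assumptions; the constants are those printed in the equations above.

```python
import math


def contrast_sensitivity(f, s_max=400.0):
    """Central contrast sensitivity, equation (3.2); f in cycles/degree."""
    return s_max * 2.6 * (0.0192 + 0.144 * f) * math.exp(-((0.144 * f) ** 1.1))


def cortical_magnification(e):
    """Drop-off of sensitivity with eccentricity e in degrees, equation (3.3)."""
    return 1.0 / (1.0 + 0.29 * e + 0.000012 * e ** 3)


def is_visible(contrast, f, e, s_max=400.0):
    """Hypothetical test: a feature is visible when its contrast (0 to 1)
    is at least the threshold, i.e. the reciprocal of the sensitivity."""
    sensitivity = cortical_magnification(e) * contrast_sensitivity(f, s_max)
    return sensitivity > 0 and contrast >= 1.0 / sensitivity


# The same low-contrast feature, viewed at fixation and far in the periphery.
at_fixation = is_visible(contrast=0.01, f=6.0, e=0.0)
in_periphery = is_visible(contrast=0.01, f=6.0, e=40.0)
```

    Under these assumptions the feature is detectable at the point of fixation but falls
    below threshold at 40 degrees of eccentricity.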

    3.3.3 Applicability to Natural Imagery

    Some caution is necessary in applying these models derived from simple monochro-

    matic repeating patterns to complex natural imagery. Though these models have been

    applied with good results in graphics [Reddy, 2001], our goal of creating visible ab-

    straction rather than conservative level of detail is more ambitious, and more likely to

    stress the models involved.


    Figure 3.4: Cortical magnification plotted against eccentricity (0 to 100 degrees).
    Cortical magnification describes the drop-off of visual sensitivity with angular distance
    from the visual center.

    How to measure contrast is relatively obvious in gratings: there are only two extrema,
    so a single contrast exists for the entire grating. Between two regions in a scene
    the meaning of contrast is less clear. Regions are neither uniform in color nor uniformly
    varying. No strong perceptually motivated approach to this problem appears to
    have been formulated. Lillesaeter [1993] attempts to address this by defining a contrast
    between a nonuniform figure and ground. This contrast measure is a weighted average
    of the contrast between the region and background and the integral of the contrast
    along the edge of the region. This is demonstrated to provide more intuitive results
    than simpler alternatives on regions with flat colors. Issues related to sampling in real
    images are not addressed. Measuring contrast in a color image presents another problem.
    Contrast in colored gratings has been studied, and much work has been done in
    general on color perception. However, there does not appear to be a simple general
    contrast sensitivity model defined in color space [Regan, 2000]. Adapting a luminance
    based model therefore remains a plausible course of action in designing a model for a
    practical application.
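    One simple way to adapt a luminance based model to color regions is sketched below.
    This is not Lillesaeter's measure: it only averages luminance over each region and
    compares the two means. The Rec. 709 luminance weights, the assumption that pixels are
    linear RGB triples in the range 0 to 1, and the function names are all choices made for
    this illustration.

```python
def relative_luminance(rgb):
    """Luminance of a linear RGB pixel using Rec. 709 weights (assumed)."""
    r, g, b = rgb
    return 0.2126 * r + 0.7152 * g + 0.0722 * b


def region_contrast(region_a, region_b):
    """Contrast between two regions from their mean luminances.

    Each region is a non-empty list of RGB triples. The result lies
    between 0 (equal mean luminance) and 1 (one region is black).
    """
    mean_a = sum(relative_luminance(p) for p in region_a) / len(region_a)
    mean_b = sum(relative_luminance(p) for p in region_b) / len(region_b)
    if mean_a + mean_b == 0:
        return 0.0
    return abs(mean_a - mean_b) / (mean_a + mean_b)


# Two small, slightly nonuniform regions: a mid gray area and a dark blue one.
gray_region = [(0.5, 0.5, 0.5), (0.52, 0.5, 0.48)]
blue_region = [(0.0, 0.0, 0.4), (0.05, 0.05, 0.45)]
contrast = region_contrast(gray_region, blue_region)
```

    Averaging discards the variation inside each region, which is exactly the kind of
    simplification the discussion above cautions about.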

    Applying the notion of visibility for a grating to a non-repeating pattern of regions


    also presents problems. The hump-like shape of the contrast sensitivity curve tells us
    something counterintuitive if the size of an area is treated as proportional to an inverse
    frequency [Reddy, 2001]. Very low frequencies are much less visible than some higher
    ones at a given contrast. This is because detectability of a grating is related to the
    density of receptive fields of corresponding size. There are upper bounds on the size of
    human receptive fields. Intuitively, a large slowly varying sine wave may be difficult
    to see.

    This has been less of a concern in previous work where judgments were being made

    mostly about high frequency parts of the curve [Reddy, 2001], but will be noticeable

    when visibly abstracting images.

    It can be argued [Reddy, 1997] that natural images, at least in places (and certainly
    the uniform color regions that we will ultimately use in rendering) more closely resemble
    square wave, rather than sine, gratings. Since a square wave can be approximated
    by the sum of an infinite sequence of sine waves, and sensitivity to combined sinusoidal
    patterns is closely related to that of the independent components [Campbell and
    Robson, 1968], one might think the visibility for low frequency square waves would
    be higher than that for equal frequency sine waves. The actual relation has been studied
    empirically [Campbell and Robson, 1968] and confirms this intuition. For square
    waves at frequencies below about 1 cycle/degree sensitivity levels off rather than dropping.
    A theoretical derivation of the difference is presented in [Campbell and Robson,
    1968]. It matches some but not all features of the empirical data.
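    The decomposition of a square wave into sine waves can be checked numerically. The
    sketch below sums the odd harmonics of the standard Fourier series for a unit square
    wave; the function name and the number of harmonics are illustrative. It shows only the
    decomposition itself and does not model the empirical leveling off of sensitivity.

```python
import math


def square_wave_partial_sum(x, n_harmonics):
    """Partial Fourier sum of a unit-amplitude square wave with period 2*pi.

    Only odd harmonics k = 1, 3, 5, ... contribute, each with
    amplitude 4 / (pi * k).
    """
    total = 0.0
    for i in range(n_harmonics):
        k = 2 * i + 1
        total += math.sin(k * x) / k
    return 4.0 / math.pi * total


# Amplitude of the fundamental sine component of a unit square wave.
fundamental_amplitude = 4.0 / math.pi

# With many harmonics the sum approaches the square wave's plateau value.
plateau_estimate = square_wave_partial_sum(math.pi / 2, 1000)
```

    The fundamental of a square wave is larger than the square wave's own amplitude, which
    is consistent with a square wave being somewhat easier to detect than a sine wave of
    the same frequency and contrast.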

    These concerns remind us that when applying these models to real images they

    cannot serve as an accurate absolute perceptual measure of visibility. Rather, they

    provide a plausible relative sense of the visibility of different features. The absolute

    contrast or acuity threshold at which a feature becomes visible is not necessary for our

    application. What is important is the relative ordering of feature visibility, that allows

    us to create a prioritization. It is necessary to model visual sensitivity only up to the

    level where results correspond to our intuitions about this prioritization.


    To apply these models in actual scenes, we need to decide on a definition of the
    features whose visibility we are judging with these methods. For example, these models
    have been used in 3D level of detail [Reddy, 1997] to avoid rendering invisible
    features. In this context the obvious choice of feature is a polygon which may or may
    not be included in the rendering. For images the choice is less clear, as image properties
    can be measured in an unstructured, local way or an image can be partitioned
    into a more structured representation. We review some of the possibilities for image
    representation in the following chapter.


    Chapter 4

    Vision and Image Processing

    4.1 Image Structure Features and Representation

    Figure 4.1: (a) Scale space of a one dimensional signal. Features disappear through
    scale space but no new features appear. (b) Plot of inflection points of another one
    dimensional signal through scale space. Reproduced from [Witkin 1983].

    Image representation and processing is a large field of relevance in both human and
    computer vision. We concentrate on some basic concepts relevant to the task of simplifying
    images. Scale space theory provides a way of characterizing the different scales
    of information present in an image and making correspondences between features at
    different scales. Segmentation divides an image into distinct regions, enabling an explicit,
    non-local representation of image content. Edge detection provides a measure
    of the prominent boundaries in an image.

    An important unifying concept in image analysis is that the same image data can

    be represented in many forms. In any of these certain information in the image is

    explicit and other information is less easy to access [Marr, 1982]. The information and

    representation appropriate is task dependent. A variety of representations with different

    properties are available. With the exception of 3D techniques, NPR applications have

    largely used low-level representations, often functioning locally on the original image

    itself. However, human artistic processes operate on richer representations. Ruskin,


    one of the 19th century's most prominent art historians and theorists, famously argued
    that in teaching art technique, the most important lesson was teaching the student to
    see [Ruskin, 1858]. There seems to be an assumption in image based NPR that seeing
    is simply capturing a bitmap representation of the scene, and that it can be considered
    accomplished in the presence of a source photo. Human vision, however, is much more
    than simply capturing an image. If a computer is to produce artistic renderings that
    capture some of the expressiveness of real art, especially in highly abstracted styles,
    some higher level representation is necessary, analogous to those created in the artist's
    head as she understands the scene before her, and begins to paint. The better suited
    this representation is to the task, the easier it should be to drastically simplify an image
    while retaining its important features.

    The lowest level representation is the image itself, analogous to the retinal image.

    This is the starting point of any further representation, making explicit the light inten-

    sities at each pixel. There is structure here that can be more explicitly represented in

    other ways. Information in the image exists over a variety of scales, small and large

    features, making up parts and whole objects in the scene.

    One common way to come to terms with the multiple scales of information in an
    image is through its scale space. From a single image, a three dimensional stack of
    images is generated in which each contains progressively coarser scale information.
    Again, this representation has an analogue in human vision where neurons have receptive
    fields of different sizes, in effect generating a multi scale representation from the
    retinal image.

    Scale space has come to refer to such a space of increasingly simple images gener-

    ated by a range of processes. Generically this can be thought of as a stack of images

    with decreasing information contained at each level as scale increases. This stack is

    in theory continuous, in practice sampled at some discrete interval. Starting with the

    original image, detail is progressively lost until a uniform color is all that remains (see

    Figure 4.1).
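    The idea of a sampled stack can be sketched for a one dimensional signal. The code
    below uses Gaussian blurring, one of several possible constructions, and builds one
    level per chosen scale. The kernel truncation at four standard deviations, the reflected
    boundary handling, and the particular list of scales are assumptions of this sketch.

```python
import math


def _reflect(i, n):
    """Map an out-of-range index back into [0, n) by mirroring at the ends."""
    while i < 0 or i >= n:
        i = -i - 1 if i < 0 else 2 * n - 1 - i
    return i


def gaussian_blur_1d(signal, sigma):
    """Blur a list of numbers with a sampled, normalized Gaussian kernel."""
    if sigma <= 0:
        return list(signal)
    radius = max(1, int(math.ceil(4 * sigma)))
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma)) for k in offsets]
    total = sum(weights)
    n = len(signal)
    blurred = []
    for i in range(n):
        acc = sum(w * signal[_reflect(i + k, n)] for k, w in zip(offsets, weights))
        blurred.append(acc / total)
    return blurred


def scale_space(signal, sigmas):
    """Stack of progressively coarser versions of the signal, one per sigma."""
    return [gaussian_blur_1d(signal, sigma) for sigma in sigmas]


signal = [0, 0, 10, 0, 0, 5, 5, 5, 0, 0, 0, 8, 0, 0, 0, 0]
stack = scale_space(signal, [0, 1, 2, 4, 8])
```

    The first level is the original signal, and each later level spans a narrower range of
    values as detail is averaged away.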


    A number of constructions for such a space have been developed. Perhaps the simplest
    approach creates something like an image pyramid, successively downsampling
    the image so it is more coarsely pixelated. This approach has a problem in that detailed,
    high frequency information (the edges between the new larger pixels) may have
    been introduced which was not in the original image. This is the problem of spurious
    resolution [Koenderink, 1984]. New information has been hallucinated into existence
    by imposing a coarser grid structure on the data. Convolution with a Gaussian kernel
    (blurring) generates a space that avoids this problem [Witkin, 1983, Koenderink, 1984].
    In fact this blurring has been proven [Koenderink, 1984] to be the unique way to generate
    a scale space which is both uniform or uncommitted (i.e., the process is uniform
    across image space and through the scale dimension), and also avoids spurious resolution.
    Information disappears but cannot be created. In one dimension, this ensures
    that any feature will only disappear as scale increases. In two dimensions new features,
    maxima for example, can appear. However, in both cases clear judgments can be made
    about what features exist at what range of scales.

    That the process of blurring is uniform is an advantage in that filtering can be
    applied to any signal; one doesn't need to have a model of what the important features
    present are. A disadvantage is that coarser features are more coarsely located: the
    blurring process that reveals them distorts their spatial extent.

    If you know what you're looking for, there is no reason why the blurring operation
    must be uniform or uncommitted. A number of nonuniform or nonlinear scale spaces
    have been formulated which do not introduce false content but remove information
    selectively in certain locations. One of the best known of such methods is anisotropic
    diffusion [Perona and Malik, 1990]. Here the diffusion process is not uniform but rather
    inversely proportional to the magnitude of the gradient at any position. This results in
    an edge preserving blurring which removes low contrast detail while preserving strong
    edges. This has the advantage that edges are better preserved in their initial location
    until the point at which they disappear. Niessen et al. [1997] compare this and several


    other nonlinear methods in the context of segmentation. Nonlinear methods perform
    well but are significantly more expensive.
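    A one dimensional sketch of edge preserving diffusion in the spirit of Perona and Malik
    is given below. The conductance 1 / (1 + (g / kappa)^2) is one of the edge-stopping
    functions they proposed; the values of kappa, the step size, and the iteration count are
    illustrative assumptions, and a real application would operate on 2D images.

```python
def perona_malik_1d(signal, n_iter=20, kappa=10.0, step=0.2):
    """Explicit 1D anisotropic diffusion with zero-flux boundaries.

    Flow between neighbors is scaled by a conductance that shrinks as the
    local difference grows, so small ripples are smoothed while large
    steps are left almost untouched.
    """
    def conductance(gradient):
        return 1.0 / (1.0 + (gradient / kappa) ** 2)

    values = [float(v) for v in signal]
    n = len(values)
    for _ in range(n_iter):
        updated = values[:]
        for i in range(n):
            right = values[i + 1] - values[i] if i + 1 < n else 0.0
            left = values[i - 1] - values[i] if i > 0 else 0.0
            updated[i] = values[i] + step * (
                conductance(right) * right + conductance(left) * left
            )
        values = updated
    return values


# A step edge of height 100 at index 32, with a small alternating ripple.
noisy_step = [
    (0.0 if i < 32 else 100.0) + (1.0 if i % 2 == 0 else -1.0)
    for i in range(64)
]
smoothed = perona_malik_1d(noisy_step)
```

    Comparing neighboring samples before and after shows the ripple on the flat parts
    shrinking while the step at index 32 keeps most of its height.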

    A practical application must sample the continuous scale space at some discrete
    intervals. One would like to sample sufficiently finely to capture interesting events,
    the order of disappearance of different features, but not more densely than need be.
    Looking at the linear scale space, Koenderink [1984] derives an appropriate sampling
    as logarithmic along the scale axis corresponding to a uniform sampling in the scale
    parameter t, the standard deviation of the Gaussian kernel used in blurring. This is
    intuitive. At small scales many tiny regions are merging quite often, requiring dense
    sampling. At higher scales, there are fewer regions, fewer events to capture, and much
    less dense sampling in t is required. The issue is the same for nonlinear spaces. Relating
    scales in different spaces is not straightforward. Some attempt at doing this has
    been made in [Niessen, 1997].
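    One reading of this sampling rule is that successive kernel widths should differ by a
    constant ratio, so that samples are dense at fine scales and sparse at coarse ones. The
    sketch below generates such a sequence; the function name, the end points, and the
    number of levels are hypothetical choices rather than values from the cited work.

```python
def logarithmic_scales(sigma_min, sigma_max, n_levels):
    """Kernel widths spaced by a constant ratio between two end points."""
    if n_levels < 2 or sigma_min <= 0 or sigma_max <= sigma_min:
        raise ValueError("need n_levels >= 2 and 0 < sigma_min < sigma_max")
    ratio = (sigma_max / sigma_min) ** (1.0 / (n_levels - 1))
    return [sigma_min * ratio ** k for k in range(n_levels)]


# Five levels between a width of 1 and a width of 16 samples.
scales = logarithmic_scales(1.0, 16.0, 5)
# Gaps between successive widths grow as the scale becomes coarser.
gaps = [b - a for a, b in zip(scales, scales[1:])]
```

    Each level here is twice as coarse as the previous one, so the absolute spacing between
    samples widens at higher scales where fewer events need to be captured.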

    Figure 4.2: Interval tree for 1D signal illustrating decomposition of the signal into a
    hierarchy. Reproduced from [Witkin 1983].

    While a scale space such as this begins to capture structural relations of features

    across scales, this is still largely an implicit representation. To make this explicit,

    features at different scales need to be directly related to each other. Witkin [1983]


    addresses this problem in 1D signals. In the scale space of a one dimensional signal,
    new features never appear as the scale becomes coarser. So, any features found at a coarse
    scale can (if the sampling is dense enough) be traced directly back to their fine scale origin.
    This allows localization of features found at a coarse level. Witkin demonstrates this,
    choosing as a feature zero crossings in the second derivative, that is, inflection points in the
    signal (Figure 4.1).
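    The feature Witkin uses can be located with a few lines of code. The sketch below
    finds sign changes in the discrete second difference of a sampled signal, before and
    after Gaussian blurring. The test signal, the blur width of 8 samples, and the helper
    names are assumptions for illustration; tracing individual crossings between scales,
    as Witkin does, is not attempted here.

```python
import math


def _reflect(i, n):
    """Map an out-of-range index back into [0, n) by mirroring at the ends."""
    while i < 0 or i >= n:
        i = -i - 1 if i < 0 else 2 * n - 1 - i
    return i


def gaussian_blur_1d(signal, sigma):
    """Blur a list of numbers with a sampled, normalized Gaussian kernel."""
    if sigma <= 0:
        return list(signal)
    radius = max(1, int(math.ceil(4 * sigma)))
    offsets = range(-radius, radius + 1)
    weights = [math.exp(-(k * k) / (2.0 * sigma * sigma)) for k in offsets]
    total = sum(weights)
    n = len(signal)
    return [
        sum(w * signal[_reflect(i + k, n)] for k, w in zip(offsets, weights)) / total
        for i in range(n)
    ]


def inflection_points(signal):
    """Indices just before each sign change of the second difference."""
    second = [
        signal[i - 1] - 2.0 * signal[i] + signal[i + 1]
        for i in range(1, len(signal) - 1)
    ]
    return [j + 1 for j in range(len(second) - 1) if second[j] * second[j + 1] < 0]


# A slow oscillation with a faster, weaker oscillation riding on top of it.
signal = [
    math.sin(2 * math.pi * i / 64) + 0.3 * math.sin(2 * math.pi * i / 8)
    for i in range(128)
]
fine = inflection_points(signal)
coarse = inflection_points(gaussian_blur_1d(signal, 8.0))
```

    At the fine scale the fast oscillation contributes most of the inflection points; after
    blurring only those belonging to the slow oscillation (plus any boundary effects) remain.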

    Similarly, using these correspondences across scale it is also possible to create
    a structure that captures the relationship between all features at all scales. Intervals
    between two zero crossings (which again correspond to sections of the signal between
    two inflection points) disappear in only one way. Two successive zero crossings merge
    together, with the result that three intervals, the one between the crossings and those on
    either side, merge into one. These three intervals can be made children of the resulting
    interval to create an interval tree which characterizes the structure of the signal at all
    scales. Witkin observes that those intervals which have longer persistence through
    scale space appear to be those identified by human observers as subjectively salient or
    important in the signal.

    Extending this nice analytical derivation to a practical application in 2D is not
    trivial. In 2D, features such as maxima, or curves defined by inflection points, may split
    into two at coarser scales. Koenderink [1984] suggests the use of equiluminance curves
    in the image as a 2D equivalent to Witkin's intervals. Generic equiluminance curves
    form a single closed curve. There are two singularities: extrema where the curve is
    just a point, and saddle points where the curve forms multiple loops which intersect at
    one point. Each loop may contain other saddle points and has to contain at least one
    extremum [Koenderink and van Doorn, 1979]. The nesting of these saddle points gives
    the structure of the image regions. Though new saddle points may appear inside a loop,
    centermost saddle points must disappear before outer ones. Because of this the saddle
    points present at all scales can be represented as a tree. Such a structure is difficult to
    calculate in practice. It is not obvious how to find these saddle points efficiently or if


    they provide a subjectively intuitive partitioning of the image. In addition it's not clear

    how color could be handled. In a naive approach, each band would produce its own

    surface with its own saddle points, resulting in 3 separate scale space trees that would

    need to be unied in some way.

    4.2 Segmentation

    The process described above of dividing up a signal based on the intervals between
    features is a particular approach to the general problem of segmentation. This problem
    again occurs in both computer and human vision. Segmentation makes explicit the
    association (or disassociation) between different areas of an image. It produces an explicit
    representation of parts of the image that are associated with each other, assigning
    each pixel to one, usually connected, group or region. These regions should be uniform
    by some measure. Separate regions, at least the adjoining ones, should be markedly
    different. How people do this, parsing shapes and objects from the background, is only
    partially understood. In computer vision, a tremendous variety of methods have been
    devised to define similarity measures for this using color, gray scale intensity, texture,
    etc. This segmentation is usually a partitioning of an image at a single scale. However,
    it is sometimes desirable to define a segmentation over a range of scales.
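    To illustrate what assigning each pixel to a connected, roughly uniform region means,
    the sketch below grows regions on a small gray scale grid. It is a deliberately simple
    flood fill and not any of the methods cited in this chapter; the similarity measure
    (absolute difference from the seed pixel), the tolerance, and the function name are
    assumptions.

```python
from collections import deque


def segment_by_intensity(image, tolerance):
    """Label 4-connected regions whose pixels stay within `tolerance`
    of the intensity of the region's seed pixel.

    image is a list of equal-length rows of numbers. Returns a grid of
    region labels with the same shape and the number of regions found.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    labels = [[-1] * width for _ in range(height)]
    n_regions = 0
    for r in range(height):
        for c in range(width):
            if labels[r][c] != -1:
                continue
            seed = image[r][c]
            labels[r][c] = n_regions
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and labels[ny][nx] == -1
                            and abs(image[ny][nx] - seed) <= tolerance):
                        labels[ny][nx] = n_regions
                        queue.append((ny, nx))
            n_regions += 1
    return labels, n_regions


# A dark area in the upper left and a bright area in the lower right.
image = [
    [10, 10, 200],
    [10, 12, 200],
    [11, 200, 200],
]
labels, n_regions = segment_by_intensity(image, tolerance=5)
```

    Each pixel receives exactly one label, pixels within a region are similar to its seed,
    and the two adjoining regions differ markedly in intensity.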

    Scale space has been considered in segmentation. It is typically used to make segmentations
    produced with other methods more robust. Niessen et al. [1997] link pixels
    with their neighbors who have similar color in both the spatial and scale dimensions
    to create a hierarchy. The end product is a single flat segmentation taking its set of
    regions from a coarse scale and their spatial extent from a fine scale. A similar approach
    is taken in Bangham et al. [1998]. Here, the desire is to create a hierarchical
    segmentation tree that describes the image over a variety of scales. An alternate approach
    [Ahuja, 1996] creates a multi-scale representation without explicitly generating
    a scale space.


    Each of these methods computes a hierarchical representation of image structure.
    However, there is no clear relation between the hierarchy and the theoretical hierarchy
    induced by scale space. This is not a major concern; scale space structure is attractive
    because of its simple formal definition, but is not the single correct answer in any
    meaningful sense for a given practical application. Hierarchical representations are
    not general purpose; desirable properties depend on the application. For the purposes
    of image abstraction, an important question is whether each subtree in the structure
    represents some coherent area or region. This is guaranteed in some geometric sense

    by scale space proper, since nodes occur in the tree only when features disappear. In

    contrast, methods for building a hierarchy that i