
The 6Sight Report • THE FUTURE OF IMAGING
Volume 19, Issue 1 • February 2010 • www.6sight.com

Processing Pixels

Printing Pictures

Photo book software

Analysts Argue Issues


EDITORIAL TEAM

Paul Worthington, Editor; Senior Analyst, Consumer Imaging

Worthington has specialized in photographic technology for 12 years, authoring eight research volumes as well as this newsletter.

Prior to Future Image, he was a technology correspondent and magazine editor for more than a decade, specifically covering imaging, multimedia, and digital video. [email protected]

Tony Henning, Senior Analyst, Mobile Imaging

Henning helped found AXS/Digital Arts & Sciences in 1989, where he directed the development of applications and utilities to manage visual assets.

He is the author of a dozen reports on mobile imaging and camera-phones. He has presented on the above topics at a variety of conferences, and is also a noted expert on the related areas of Visual Asset Management and Digital Rights Management. [email protected]

Joseph M. Byrd, President, 6Sight

Byrd is the co-founder of the 6Sight Conferences.

He was founder and chief operating officer of PhotoHighway.com, the first digital photography Web portal.

He also developed online systems for CompuServe, The Source, and [email protected]

Consulting Editors: Bob McKay, Rudy Burger, Bob Goldstein.

Alexis J. Gerard, Chairman, 6Sight; President, Future Image Inc.

Gerard founded Future Image in 1991 and 6Sight LLC in 2006. He chaired the inaugural DIMA conference in 1995, the Future Image/Forbes Visual Communication Executive Summit in 2002, the Mobile Imaging Summit 2003-2005 and, since 2006, the 6Sight conference. He is the co-author of “Going Visual,” sits on the International Advisory Council of the George Eastman House, and is a past president of the Digital Imaging Group (DIG), an industry consortium now merged into the I3A. [email protected]

©2010 6sight Conferences LLC. No portion of The 6sight Report may be copied or reproduced in any form without authorization. The 6sight Report is published 10 times a year by 6sight Conferences LLC, 3000 Picture Place, Jackson, MI 49201 USA, Tel: 517-788-8100 • Fax: 517-788-8371. Subscription information available at www.6sight.com

A digital camera captures light and color values, converts that information into digital data, and either stores the Raw image file for later processing on a personal computer, or, more commonly, applies an assortment of processing techniques to yield a pleasing but compressed JPEG photo.

Enthusiasts enjoy working with those unprocessed Raw exposures on a PC, but cameras constantly upgrade their internal image processing as well. That upgrade has been driven primarily by the need to handle more exposures at ever faster speeds and higher resolutions — but a side effect is that there is a lot of computational power in those pocket-sized devices. And wouldn't it be a waste to leave it sitting unused?

Cameras now apply Photoshop-like “Pop Art” effects, combine multiple exposures, apply “makeup” functions that smooth skin and soften facial shadows, and most impressively, instantly combine dozens of shots into high-resolution panoramas — even editing out moving people or objects from the consecutive photos before stitching them together.

Computational photography means many things, including processing that original data in ways that render otherwise invisible images, and even using entirely different capture methods to generate images we humans can discern.

It is not new: the combination of photography and computer science goes back at least to the 1960s, when NASA processed lunar images.

While the idea is not new, it is what comes next: as the computers on our desktops and in our cameras and other mobile devices continue to increase in raw power and refined capabilities, it is computational photography that holds the true future of imaging, not higher-resolution sensors or ever more evolved optics.

This issue spotlights Ramesh Raskar's keynote at the 2009 6Sight Future of Imaging Conference. Raskar is the head of the Camera Culture research group and co-director of the Center for Future Storytelling at the Massachusetts Institute of Technology. He focuses on developing tools to help capture and share visual experiences. His research includes cameras with unusual optical elements and programmable illumination. In 2004, Raskar received the TR100 Award from Technology Review, which recognizes top young innovators under the age of 35. In 2009, he was awarded a Sloan Research Fellowship and has received four Mitsubishi Electric Invention Awards. He holds 35 U.S. patents.

Computational photography exceeds human vision

Inside this issue:
• Computational Photography: Ramesh Raskar surveys the scientific scene. Page 2
• Print on Demand Technology: Leading vendors debate the market potential. Page 7
• Photo Book Creation Software: Simplicity versus functionality, and other obstacles. Page 12
• Analysts Argue Imaging. Page 17
• Imaging News. Page 24
• Inside Out: Bob McKay's insider's view. Page 31

Cover and conference photos by Bill Fitz-Patrick


Computational Photography

Beyond the Human Eye, by Ramesh Raskar
Keynote talk summarized by Paul Worthington

Ramesh Raskar is the imaging research director at MIT’s Media Lab.

The transition from film to digital hasn’t fundamentally changed photography.

Look inside the early Kodak DCS series from the ’90s, married with the Nikon F3. The cartridge is still there, but it’s digital instead of film.

Digital photography is like a lion that, after years in a cage, stays in one place when finally let loose in a jungle, rather than running off and exploring the space around him.

We may now have a billion people with visual communication tools, but we’re still following the principle of the human eye. If we look at successful biological vision, it’s very diverse, based on shadows, refraction, or reflection. For example, scallops don’t even have lenses. Light comes in, and it gets reflected off this concave mirror onto the sensor. Lobsters have vertical mirrors that focus light on a curved sensor. These are biological creatures that are not even doing sophisticated computation afterward.

So, like that uncaged lion, we have an opportunity to explore the space of imaging – the whole pipeline – from capture to display.

So far, the way we have solved the problem of capturing and sharing visual information is to make the photograph compatible with what the human eye sees. We try to mimic a lens that behaves like the cornea, and then we have a detector that mimics the retina – and that's the end of the story, that's our image. We have solved this problem by reproducing what the eye sees.

This is great for a direct view; but if we want to manipulate that, and if we want to understand the world and do something additional with the photos, this model simply doesn't work. We need to go beyond just mimicking what the human eye can see.

We're going to see if we can push the envelope of what we can dream. Can we create cameras and mechanisms that take a photo of what's around the corner, beyond the line of sight? Can we create cameras and software so that, instead of a photo, we get an emotive artistic rendering?

COMPUTATIONAL WISH LIST
If we ask consumers about their wish lists for photography, these are the types of answers we get:
• Amazing resolution, some kind of superhuman vision.
• High speed.
• Seeing inside their bodies – maybe a medical device.
• Automatic triggers, maybe based on a smile or some external event.
• Dealing with the millions of photos they take, keeping only the good pictures and finding the most relevant ones.
The most common request we get – we hear it all the time – is “Put the photographer back in the picture so, when I go on a trip, I'm still in my own pictures.”

Other critical problems on everybody’s wish list are cost; low-light sensitivity; stereo and 3D; mechanically free motion for zoom and focus; and improved sharing, tagging, and recognition.

I look at what computational photography can deliver beyond the current vision of photography.

Digital photography captures raw information – just photons to electrons at a good signal-to-noise ratio – and the synthesis is very low level. With more spectrum, we are capturing some non-visual data, such as GPS and communication with other devices, metadata, and so forth. A lot of the work is focused on this axis.

Computational photography looks at this in very interesting ways – not just the low-level experience of photography, of matching the eye, but the ability to manipulate and to have a meaningful extraction of the semantics of those photons. Human vision does not care about absolute intensity. It cares about regions and boundaries, segmentation and motion, what's foreground, what's background, what's lit directly by light, what's lit by a scattering of light, and so on. Human stereo vision gives mid-level cues, but we don't have to stop at what the biology can do for us. We really want to create an augmented human experience, and create hyper-realistic synthesis of our photos.

POST-CAPTURE CONTROL
First on my list is the ultimate post-capture control, where very few decisions are taken at the time of capture. I should be able to take the camera; wave it; not worry about low light, motion blur, focus, or the viewpoint; snap away; and then have amazing control of that in post. The movie special effects industry tries to do that, but how can we bring that to consumers?

FOCUS
A good example many are familiar with is the concept of a plenoptic camera, which was invented 100 years ago, in 1908. It has progressed significantly at MIT and Stanford, with Refocus Imaging turning a 16-megapixel sensor into a 300-by-300 pixel image, where we can change the focus after the photo is taken – very impressive, amazing post-capture control. The photo is encoded. Not just the raw pixels are captured; it's the angular variation coming through the lens.

The problem is, in going from a 16-megapixel sensor to an image that's only 300-by-300 pixels, we give up a lot of resolution. In addition, we're in traditional optics, with all the usual issues of chromatic aberrations, geometric aberrations, alignment, and so on.

We can work any camera into this kind of light-field plenoptic camera, for a couple of dollars, in 35 seconds. Drop in a clear film, which has a special pattern printed on it. The film rests just about a millimeter above the sensor. Think of this almost as a parallax barrier for 3D displays; but while most of the light passes through, it is doing the encoding of the incoming direction of light. We're capturing information that's angle-dependent.

Once we capture a 2D photo and convert it into not a single pair but an array of views, each view is as if the viewpoint has moved through the aperture. So the wider the aperture, the more parallax and disparity there is between those views.

From any pair, we can create 3D imagery and extract 3D depth information. We can also do digital refocusing.
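To make "digital refocusing" concrete, here is a minimal sketch of the standard shift-and-add approach (my own illustration, not Refocus Imaging's or MIT's code; the function and variable names are hypothetical): given the array of sub-aperture views recovered from the encoded photo, shifting each view in proportion to its viewpoint offset and then averaging brings a chosen depth plane into focus.

```python
import numpy as np

def refocus(views, offsets, alpha):
    """Shift-and-add refocusing over an array of sub-aperture views.

    views   : list of HxW (or HxWx3) arrays, one image per viewpoint
    offsets : list of (dx, dy) viewpoint offsets within the aperture
    alpha   : refocus parameter; 0 keeps the captured focus, other values
              slide the synthetic focal plane nearer or farther
    """
    acc = np.zeros_like(views[0], dtype=np.float64)
    for img, (dx, dy) in zip(views, offsets):
        # Shift each view against its parallax at the chosen depth so that
        # points on that plane line up across all views, then average.
        shift = (int(round(alpha * dy)), int(round(alpha * dx)))
        acc += np.roll(img.astype(np.float64), shift, axis=(0, 1))
    return acc / len(views)
```

Sweeping alpha through a range of values yields a focal stack from a single encoded exposure, which is exactly the kind of post-capture control described above.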

This is all instead of using a micro-lens array and adding more optics – and optics will always add issues for different colors and spectrums. By using this so-called heterodyne camera, there are no additional geometric or chromatic aberrations. We don't have to worry about the zoom or F-number being compatible with this, because there are no additional optics. The most important part is we can still recover the full resolution of the images. We just lose some light. In our current prototype, it's 50 percent; but we're trying to improve that. Also, it has an extremely low cost.

MOTION
What about correcting for motion? It's a common problem for low-light conditions. Here's a solution that seems to work: Instead of keeping the shutter open for the entire duration of the exposure, if we flutter it open and closed in a carefully chosen binary sequence, we can record sufficient information about the scene.


In one photo, we can barely make out a car and its license plate. Because the fluttered shutter preserves the high spatial frequencies, the motion blur becomes a well-conditioned convolution with the code, and we can invert it in post to deblur the image.
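A rough 1-D illustration of why the coded exposure helps (my own toy sketch; the binary code below is made up, not the optimized sequence from the research): motion blur is modeled as convolution of the signal with the shutter code, and because a well-chosen code keeps every frequency of its spectrum away from zero, the blur can be undone by division in the frequency domain, whereas an ordinary open-shutter (box) blur cannot.

```python
import numpy as np

# Hypothetical binary shutter code; an ordinary exposure would be all ones.
code = np.array([1,0,1,1,0,0,1,0,1,1,1,0,0,1,0,0,1,1,0,1,
                 0,1,0,0,1,1,1,0,1,0,0,1,0,1,1,0,1,0,0,1,
                 1,0,1,0,0,1,1,1,0,0,1,1], dtype=float)

def coded_blur(signal, kernel):
    """Simulate motion blur: the moving scene integrates through the open chops of the shutter."""
    return np.convolve(signal, kernel, mode="full")

def coded_deblur(blurred, kernel, n):
    """Invert the coded blur by division in the frequency domain."""
    N = len(blurred)
    K = np.fft.rfft(kernel, N)
    B = np.fft.rfft(blurred, N)
    # The coded kernel has no near-zero frequencies, so the division is stable;
    # a plain box kernel would zero out frequencies and amplify noise instead.
    return np.fft.irfft(B / K, N)[:n]

row = np.random.rand(200)                      # stand-in for one row of a blurred photo
observed = coded_blur(row, code)
recovered = coded_deblur(observed, code, len(row))
print(np.max(np.abs(recovered - row)))         # reconstruction error is tiny without noise
```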

LIGHTING
We have gone from monstrous cameras to something we can carry in our pocket, but what's happening to the light? The difference between a professional and a consumer camera is still the lighting. They still have to carry around spotlights, umbrellas, and so on. So here's a wish list: How can I use my camera and the compact flash on it so, in post-capture, I can create any lighting for the right mood? I want to emulate studio lights from a compact flash.

It’s challenging. There are lots of efforts going on in this direction.

FREEDOM FROM FORM
If I want a 50mm and a 200mm lens, I must get a lens that's 4 times as long – and that is painful! I want freedom from that. Will I be able to carry a camera tomorrow that's as flat as my business card? Can I just wave it around and capture a nice photo?

The way we have been solving this, especially for mobile camera phones, is to make the cameras really thin.

This means we are capturing less and less light, but remember the scallop and the lobster? They are not trying to mimic a traditional lens. They are exploiting a very clever mechanism to capture a lot of light, using multiple lenses or different kinds of sensors.

LIGHT-SENSING LCD
Can we convert the LCD on a mobile device into a camera? Sharp has created LCDs where every emitting pixel is also light sensitive. The main application for that now is touch, and to record a fingerprint. It has extremely high resolution.

What I would like to do is convert that LCD, which is already sensing light, into a full-fledged camera. It’s going to collect a lot of light. It’s 4-by-5 inches. It’s going to collect more light than the most fancy SLR I may have. The problem is the moment I take my finger off this LCD, everything’s going to be completely blurred because these light-sensing LCDs are not designed to do that.

So we think about how we can work these light-sensing LCDs into cameras and initially support 3D gestures, as well as video conferencing. The idea is very similar to parallax barriers, where the LCD can do double duty. In even frames, it shows the image; but in the odd frames, it acts as a parallax barrier, and senses touches and gestures. We have built a prototype with a very thin LCD form factor. Anything more than a meter away is blacked out, so it can also preserve privacy. Since it's collecting light over such a large area, low light is not a problem.

DEPTH OF FIELD
People are working to extend depth of field, but we also pay a lot of money for the glass to make depth of field more shallow. In some applications, we would like excellent depth of field; but with a compact camera, we can't get a shallow depth of field.

Can we build a tiny camera that can create shallow depth of field? The image stabilization mechanism, which corrects motion between lenses and sensors, actually does this. Typically, stabi-lization compensates for jitter or shake. If we hold the camera steady and then intentionally shake the camera lens and sensor, we get shallower depth of field. All we need to do is have a very precise relative motion between the lens and the sensor. If we move them with just the right relative speeds, we can focus on any particular plane. This can essentially create a lens in time, as opposed to in space. The result is a small aperture photo with which we can focus in the front, focus in the mid-ground, or focus all the way in the back.

NETWORKED IMAGING
Can we take a photo at the Trevi Fountain or in Old Town in Prague – a simple 2D photo – and automatically turn it into a 3D photo? It's very challenging to do that if seen as an artificial intelligence problem; but if we exploit the concept of photo tourism [developed by the University of Washington, which, at Microsoft, became Photosynth], we can place the photo into the 3D space of all the photos other folks have taken. By doing that, we can figure out the exact location and the viewpoint of the camera; and, for every pixel in the photo, we can also figure out how far that point is from where the photo was taken.

We want to see how we can use the internet as the fifth element of the camera. The first four elements are the optics, the sensor, the illumination, and on-board processing. The fifth element is access to the online collection.


SEE AROUND A CORNER
We have been taught a photo is taken within the line of sight; but it seems we're able to see beyond the line of sight, based on echo. The way my voice echoes in this room is different with the door open than if it's closed. By doing an analysis of that echo, we can tell what's just around the door. We have developed femto photography, where we are using laser pulses that last a few femtoseconds – that's 10⁻¹⁵ seconds (nano, pico, femto). If there is some serious synchronization between the flash and a very fast sensor, which is also working in the picosecond range, we can analyze and compute what's around the corner.

We have done some initial experiments, and it’s very promising. When we can see beyond the line of sight, we are not breaking any laws of physics – so far.
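To give a sense of the timing scales involved, here is a small back-of-the-envelope sketch (my own illustration, not from the talk): light covers roughly 0.3 mm per picosecond, so separating echo paths that differ by a few millimeters around a corner requires a sensor that can resolve photon arrivals to within a few picoseconds.

```python
# Back-of-the-envelope timing for "femto photography"-style capture.
C = 299_792_458.0  # speed of light, m/s

def light_path_mm(seconds):
    """Distance light travels in the given time, in millimetres."""
    return C * seconds * 1000.0

print(light_path_mm(1e-15))  # ~0.0003 mm per femtosecond (pulse-length scale)
print(light_path_mm(1e-12))  # ~0.3 mm per picosecond (sensor timing / depth-resolution scale)
```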

BARCODES
Barcodes are taking over the world. They are everywhere, and they're getting larger and larger. Barcodes, however, are for machines, not humans; so why are they cluttering our world? Can we create encoded information that's aesthetic and pleasing?

We have developed a new code called Bokode (bokeh-based code). To the naked eye, or a camera that’s taking a photo in focus, it looks like a tiny dot. It’s only 3-by-3mm. When we take the camera out of focus, the blur reveals the encoded pattern, even from several meters away.

Typically, we encode information in space, such as a traditional 2D barcode; or in time, by blinking; or by wavelength, in fiber optics communications. Here, we're encoding in angle. So, this Bokode exploits the “circle of confusion” [the blur spot in which light rays are not in focus], and converts it to a “circle of information.” We can apply them to product tags or support augmented reality applications, because they're very aesthetic. They don't clutter up the space. With a cell phone camera, we can capture more than 10,000 bytes of information, as opposed to a traditional barcode that gives only 100 bits of information. We can put a song or a picture in 10,000 bytes of information.

ARTIFICIAL INTELLIGENCE AND FACIAL RECOGNITION
A lot of work has gone into it, but those problems remain difficult to solve. Think about how we have solved the problems in the grocery shop. We are not trying to build smart AI systems to recognize which brand of Coke this is, and which brand of bananas this is. We just use barcodes to get around that. So would we be willing to put a barcode on our children?

I'm partly kidding there – but we're going to see mechanisms where we're going to create an ecosystem of devices, of optical elements, of sensors that communicate with each other. As bizarre as that concept sounds – of putting barcodes on ourselves – there will be mechanisms allowing us to securely and privately identify ourselves to devices. So we might get a camera that has a communication device with a beacon that's completely invisible, allowing us to record that. Or maybe we have a high-resolution camera that can directly recognize the iris of the person whose picture we're taking, even if that person is meters away.

There will be mechanisms that will solve that problem by changing the rules of the game. We don’t know how to mimic human vision. We can probably walk around this room and recog-nize half the people; but today, face recognition software cannot do that. It’s quite challenging in any arbitrary lighting conditions and changes in viewpoint. So we need to get around that. The barcode industry didn’t wait for the AI problem to be solved. They said, “Let’s just tag everything.”

UNDERSTANDING THE WORLD
I want a camera where I can take a picture and it will recognize not just people and things, but it will also give me an index of each material. If I take a picture, it will tell me what's fabric, what's wood, what's skin, what's glass, what's metal, and all that.

As those of us familiar with astronomy, medical imaging, or forensics know, we can look at materials that are light years away by doing spectral analysis of those materials; but if we use the right types of lights, the right types of sensors, and sophisticated algorithms, we'll be able to index all those things. Then, if we open a photo editor, we can just click and replace, and edit anything we want. Again, we'll get closer to the dream of being able to relight and create new moods in a scene.


sHARING OUR vIsUAl ExPERIENCE Photography is all about capturing and sharing our visual experi-ences. Processing power, storage, and communication are at a stage where 24/7 “lifelogs” are interesting – but who wants to look at them? They’re extremely boring, very repetitive.

So I want new techniques that create automatic summaries of not just what was captured from my own camera that was on 24/7, but from cameras around me. If I create the right mechanism between the cameras of strangers and my camera, I will actually get a story, a summary, a meaningful abstract of the activities.

That will be helpful – not just in a narcissistic way, but for health, education, entertainment, and even for government. There are lots of issues dealing with transparency and access to information, but if we start logging lots of data and sharing it…

CAPTURING ESSENCE
There is an opportunity with computational photography to go beyond digital photography to enable new forms of visual arts – a bridge between the purely synthetic and something live and real.

If we want to share with friends what’s inside our cars, we take a photo; but if car manufacturers want to tell us what’s inside the car, they hire artists for car instruction manuals.

Why do they do that? It seems strange to create sketches of something that can be photographed. The answer is straightforward: The real world is not the best way to convey information and aesthetics – shadows, clutter, too many colors … Artists are great at highlighting what's most important, and sometimes intentionally using very simple colors.

So how about this challenge? Why not create a camera – a whole pipeline – that gives me this illustration as opposed to this photo?

Here’s a trick we used to get closer to this goal. The idea is to use a multi-flash camera. When the shutter is released, instead of taking one photo, it takes four photos, flashing one light at a time. When we take a picture with a flash that’s offset from the lens, we get this annoying layer of shadow at each discontinuity. If we stand against a wall, we see this layer of shadow. We can exploit that and, by analyzing the shadows, figure out where all the shape contours are, the depth discontinuities. That’s exactly what is important perceptually, and that’s what artists will draw to convey the shape and the geometric relationship. With our technique, we can analyze all the shadows, take something that looks really complex, and create emotive line drawings.
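For the curious, here is a minimal sketch of that multi-flash idea (my own simplification of the technique; the function name, array layout, and the 0.5 threshold are assumptions): each offset flash casts a thin shadow on the far side of every depth discontinuity, so comparing each flash image against the per-pixel maximum over all of them isolates those shadow slivers, and the transitions into shadow trace the contours an artist would draw.

```python
import numpy as np

def depth_edges(flash_images, shadow_ratio=0.5):
    """Find depth discontinuities from photos lit by flashes at different
    positions around the lens (simplified multi-flash edge detection).

    flash_images : list of HxW grayscale arrays, one per flash position
    returns      : HxW boolean mask, True near depth edges
    """
    stack = np.stack([img.astype(np.float64) for img in flash_images])
    max_img = stack.max(axis=0) + 1e-6          # approximates a shadow-free image
    shadow = (stack / max_img) < shadow_ratio   # each flash's cast-shadow slivers
    edges = np.zeros(shadow.shape[1:], dtype=bool)
    for s in shadow:
        # Transitions into or out of shadow mark shape contours.
        edges[:, 1:] |= s[:, 1:] != s[:, :-1]
        edges[1:, :] |= s[1:, :] != s[:-1, :]
    return edges
```

The resulting edge map is what feeds the stylized line drawings described next; the published method additionally traverses each flash's direction so that only true depth edges, not texture edges, are kept.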

We are not going to replace artists by creating smarter cameras, but we want to take away all the rotoscoping and cumbersome tasks, and let them really focus on the creative aspect of creating beautiful renditions.

A CAMERA SO ADVANCED — WE DON'T NEED IT
Last, maybe all a consumer wants is a big black box with a big button, with no lenses, no sensors, and no flashes. If I am in Times Square or at the Eiffel Tower, it's really debatable whether I should take that picture, because lots of people before me have taken that picture. So all I want is, when I release the shutter, to go online and trawl Flickr and retrieve an image taken in the right direction at about the right time of day and season. I can guarantee, with a few minutes of operation, it is going to be much better than any image I can take.

There are so many good pictures out there of so many great places, a person has to make the decision whether it’s worth investing money and time to take that photo, because it’s never going to look like that photo on Flickr and eBay. We might be approaching the time when we have completely saturated the space of all the photos we can take.

The only delta is the people we care about – how they look that day, and so on. If I'm standing in front of the Eiffel Tower and I take a picture, I don't really care about how the Eiffel Tower looks. I don't need a camera that captures the tower well. All I really care about is if my kid or my wife looks right in that picture. So I want this delta that ignores most of the pixels and only focuses on pixels I care about. Even then, I probably have a much better picture of my daughter and my wife somewhere in my photo collection. So they don't have to be dressed the best and be in the best mood, and my daughter doesn't have to smile at the right moment. All that information is already available.

We don’t know how many of these wishes will come true. Lots of smart people around the world are thinking about it. We’ll see many of them in the next 5 to 10 years; but we can be sure computational photography will be there. The photo of tomorrow will not be just recorded; it will be computed.