Interactivity = Reflective Expressiveness

1070-986X/07/$25.00 © 2007 IEEE. Published by the IEEE Computer Society.

All types of media influence our opinions, feelings, and mood. We consume media because it reinforces our behavior and we identify with the roles and values in it. It educates, informs, and entertains us through both escapism and emotional release. In short, it shapes our lives.

Today, however, we’re bombarded by media as we surround ourselves with tools that let us consume as well as produce and exchange any sort of medium, at nearly any time and place. Much of this media is pre-prepared and pushed toward us, such as local information via PDAs and mobile phones, high-definition advertisement clips, and music that’s so common that we realize its permanence only once it’s absent.

And at home, we really enter the game with massive amounts of user-generated content. We use laptops to distribute media documents to the various channels, such as blogs for the news, Flickr for photographs, YouTube for films, MySpace for information, and so forth. (See the “Community Links for Getting Involved” sidebar for specific URLs.) If we’re in a playful mode (or simply have had enough of this life altogether and wish for a different experience), we enter a networked multiplayer virtual world, where we again generate media information.

Yet, these forms of communication produce the phenomenon Neil Postman described in his book Amusing Ourselves to Death [1]. He asserted that mass communication (he used TV as an example) confounds serious issues with entertainment, demeaning and undermining discourse by making it less about ideas than image. The technology that today lets us interact with nearly everyone about any topic fosters the broadcasting approach and drives our interactive behavior to a point of arbitrariness, where every medium is appropriate for every kind of knowledge transfer.

If that is so, can we get back to a place where users can make educated decisions about the appropriate use of media for communication? Or are the current tools sufficient? Are we happy to communicate with others through technology because it lets us hide behind it? We’ve assembled some ideas on the relationship between interactivity and media technology and drawn a few conclusions.

Interactivity is expressiveness

People certainly like to express themselves, and for that purpose, they will make use of any type of technology offered to them. Looking at Flickr, YouTube, and MySpace, it’s apparent that these platforms serve people accustomed to communication tools such as mobile phones, digital cameras, and camcorders. They let people express themselves via the captured material to a controlled smaller group or the infinite public.

Yet, this doesn’t mean that each contribution is meant to start a discussion or give an argument, advice, or a recommendation. Often the contribution is understood as merely a simple statement: look at, listen to, or read what I think is funny, dramatic, aesthetically pleasing, fashionable, or uncool. The presented material can be simple, such as the photograph of a letter from the alphabet (a group at Flickr is doing just this) or a funny, candid-camera-type accident filmed with a mobile phone.

More elaborate material that requires planning and authoring is also available. Examples include a karaoke piece portrayed with the camcorder and edited with a software tool or happenings such as “street kissing,” where participants attack innocent victims on the street with a kiss and then distribute the reactions on the Web.

IEEE MultiMedia, Visions & Views (editor: Susanne Boll, University of Oldenburg)

Michael Hausenblas, Joanneum Research

Frank Nack, LIRIS, Université Claude Bernard Lyon 1

Editor’s Note

Multimedia has always aimed for interactivity. However, interaction has long been seen as “click, select, and consume.” This article discusses the potential and challenges of the next generation of interaction, in which users get involved and shape their own multimedia world. Consumption and production of multimedia melt into a reflexive and directly interactive process.

Susanne Boll

These types of self-reflexive audiovisual statements don’t aim for sophisticated presentation and don’t require responses beyond a place on the most popular contributions or latest uploads lists. These scenarios, which feature a type of interaction based on a user–system relationship, follow the TV model. A user provides the content, and the observing audience rates it.

In this respect, current environments serve their audience well. Interactivity is rooted in single-direction networks that don’t focus on the content per se, but on distributing it to foster visibility and accessibility. Here current technology supports a simple interactivity that serves the inherent human urges for narcissism and voyeurism: the implementation of Andy Warhol’s theorem that in the future, everyone will be famous for 15 minutes.

Interactivity is reflection

Zennström and Friis (see the Joost project, http://www.joost.com) demonstrate the most indicative implementation of this user–system interaction model with their new universal TV project Joost (formerly known as the Venice Project). Millions of networked PCs fortified with traditional video servers establish interactive TV. The content owners support it (the content is secure via industrial-strength encryption), users love it because it’s free, and advertisers adore it because they can continue using their established business models since the medium is essentially the same. Yet, Joost differs from YouTube and others because it distributes high-quality content produced by professionals who’ve put lots of thought into it.

This quality aspect will sooner or later also infiltrate the simple expression-based interaction mode that the current narcissism–voyeurism game is based on. It will open interactivity up to a more complex, reflexive type of interaction, where the mediator is no longer the distribution system but rather the content itself.

This will occur because the compassion element drives the development. The ego-supportive adrenaline of the new fame game and the attention of peers (represented in the form of comments and friend tags) will become increasingly relevant. At this community-building stage, contributors start to think about what they want to communicate, how they wish to do it, and how their peers will perceive it. The aim becomes putting yourself out there in a way that allows others to relate. At this point, a shift occurs toward an interaction between user and material as well as between users themselves.

This is the stage of reflected expressiveness. What does this mean? As users evolve, so will their urge for better content. This will require user- and group-based rating systems that let them retrieve what they desire. Yet, the desire covers not only currently available material but, on a far larger scale, material they wish to produce. People will invest even more effort into producing content for private matters than for their professional needs.

Eventually, users will reach a level where their ideas can’t be fully realized because the vision is beyond their production capabilities. Thus, there will be a need for tools that help them achieve their goals, such as cameras that help develop a plot; an example is the common-sense camera that Barbara Berry from the Massachusetts Institute of Technology developed [2]. The challenge for these tools is to combine the user’s individuality and capabilities with the intrinsic expressiveness of the medium the tool covers to achieve constantly improving results.

Community Links for Getting Involved

To see examples of successful endeavors with interactive multimedia, we recommend the following links:

❚ Flickr: http://flickr.com

❚ YouTube: http://youtube.com

❚ MySpace: http://myspace.com

❚ San Francisco International Film Festival, International Remixer: http://media.sffs.org/remix/RemixerHome.php

❚ Advene Project: http://liris.cnrs.fr/advene/

❚ New Media, New Millennium (NM2) project: http://www.ist-nm2.org

❚ Second Life (a 3D online digital world): http://secondlife.com

❚ Joost (formerly the Venice Project): http://www.joost.com

❚ Introduction to the new game console generation: http://media.www.hartfordinformer.com/media/storage/paper146/news/2006/12/07/Entertainment/New-Game.Consoles.Swipe.The.Wii.Xbox.Playstation.3-2528632.shtml

Moreover, new ways of interaction will emerge between users and content when users wish to alter existing high-quality material to achieve their communication goal. Millions of karaoke-style videos on YouTube demonstrate this, where users add their personal interpretation or presentation of professionally produced music; some even create videos with reused material. For example, in the anime culture on the West Coast of the US, such re-editing is an essential part of the community (see the anime convention at http://www.anime-expo.org/).

For the public to reuse material on a spectrum from low to high semantics, that is, from searching for images by color scheme (low-level semantics) to searching for a particular mood in a video scene (high-level semantics), available content must be automatically or semiautomatically analyzed in far more detail than for mere consumption. Semantics such as validity, referentialness, aesthetic value, communicational usability, and production methods will be relevant. Making the content accessible on these layers requires different annotation tools that embed these tasks in the authoring process itself [3]. A simple example might be a semantically enhanced Microsoft PowerPoint that helps establish the right structure for the presentation, utilize established genre patterns, and reuse existing presentations on a similar topic. Such a system would also support the user by retrieving the content that best matches the established structure with respect to its communication role and its expressive form (modality).

Both the guided production and the reuse and manipulation of material are rooted in the user’s wish to communicate successfully with peers on a face-to-face or group basis. For every community-building environment, it’s then essential for the content and its context to merge. This means that the clear line between consumption and authoring we know today will disappear, as will the linear content model. Content will turn from a dead object into an evolving element as part of an ongoing communication process, alternating between idiosyncratic self-reflection and distributed multilevel discourse. In other words, production (authoring) and consumption (perception) will be the same and every media item will carry its own history.

Are we far off? Google has announced that it will make YouTube’s material available through Google Video, but how it will do so remains to be seen. Yahoo has started using the machine tags from its acquisition of Flickr for more than simple search capabilities. (See the “Machine Tags in Flickr” sidebar for more details.) Microsoft now offers the Vista operating system, which already tries to blur production and consumption. In addition, the newest generation of gaming consoles, including the PS3, Wii, and Xbox 360, all provide Internet connectivity. They encourage massive gaming, where players can seamlessly cross over from text-message communication to playing a multiplayer game to sharing a video from the game with some friends. But the first real change in how to interact with content in a multiuser environment will come from an unexpected content provider: the broadcasters.

Interactivity and nonlinearity

For those who believe traditional linear TV has reached its limits, it might be astonishing to learn how much innovation is coming from broadcasters. The main reason is that revenues from head-on advertising are declining because audiences, especially younger generations who have grown up with the Web and modern technology, have found other ways to be entertained and informed than what the old monopoly can offer. Moreover, broadcasters have to worry about broadband providers. These providers also are selling the broadcasters’ services, so they have little concern for experimenting with new formats that are compatible with the iGeneration (http://en.wikipedia.org/wiki/Internet_generation), for whom boundaries between end-devices and content blur.

Yet, broadcasters understand that people are interested in the content and the possibilities connected with it rather than the presentation system they use to distribute or receive it. Thus, in some way, broadcasters take the same approach as Zennström and Friis. Broadcasters have a well-established technology for an old distribution model; but they can also take the best thing about TV (the content) and add the best thing from the Internet (interactivity) to create something really new.

April–June 2007

Machine Tags in Flickr

Flickr is rolling out a new feature called machine tags that lets users be more precise in how they tag and search for photos. These tags use a special syntax to define extra information about a tag; they have a namespace, a predicate, and a value. The namespace defines a class or a facet that a tag belongs to (geo, flickr, and so on), and the predicate is the name of the property for a namespace (latitude, user, and so on). See http://www.flickr.com/groups/api/discuss/72157594497877875/ for more details.
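To make the three-part structure of a machine tag concrete, the following is a minimal sketch in Python. It assumes the textual form namespace:predicate=value; the sidebar names the three parts but doesn't spell out the separators, so treat that form, the function names, and the sample tags as illustrative assumptions rather than Flickr’s actual implementation.

```python
import re
from typing import List, NamedTuple, Optional


class MachineTag(NamedTuple):
    namespace: str   # class or facet the tag belongs to, e.g. "geo"
    predicate: str   # property name within the namespace, e.g. "latitude"
    value: str       # value assigned to that property


# Assumed textual form (not stated in the sidebar): namespace:predicate=value
_MACHINE_TAG = re.compile(r"^([A-Za-z][A-Za-z0-9_]*):([A-Za-z][A-Za-z0-9_]*)=(.+)$")


def parse_machine_tag(raw: str) -> Optional[MachineTag]:
    """Return the three parts of a machine tag, or None for an ordinary tag."""
    match = _MACHINE_TAG.match(raw.strip())
    if match is None:
        return None
    namespace, predicate, value = match.groups()
    return MachineTag(namespace.lower(), predicate.lower(), value)


def filter_by_namespace(tags: List[str], namespace: str) -> List[MachineTag]:
    """Keep only the machine tags that belong to the given namespace."""
    parsed = (parse_machine_tag(tag) for tag in tags)
    return [tag for tag in parsed if tag is not None and tag.namespace == namespace.lower()]


# Hypothetical tags on one photo: two geo tags, one flickr tag, one plain tag.
photo_tags = ["geo:latitude=47.07", "geo:longitude=15.44", "flickr:user=anna", "beach"]
geo_tags = filter_by_namespace(photo_tags, "geo")
```

Splitting a tag into its parts is what allows a search such as "everything in the geo namespace" instead of a plain string match, which is the extra precision the sidebar describes.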

In late 2006, the first ShapeShifting TV program [4], titled “Accidental Lovers” [5], went live in Finland (see http://news.com.com/BT+invents+semantic+television/2100-1026_3-6143307.html). This marked a step toward the coalescence of two research communities: multimedia and the Semantic Web.

ShapeShifting TV uses Internet Protocol television (IPTV) platform technologies for nonlinear, interactive, narrative-based movie production. The tools for personalized, reconfigurable media productions are developed and evaluated in six audiovisual productions that range from news to documentaries to an experimental TV production.

“Accidental Lovers” is a participatory, black comedy musical (see Figure 1) that explores variations of a deadly love relationship between a 61-year-old cabaret singer, Juulia, and a pop star, Roope, 30 years her junior. Viewers affect the episodes’ twists and outcomes by sending text messages to a system that triggers story events. The interaction grammar is based on keyword recognition: each submitted Short Message Service (SMS) message is scanned for keywords that trigger a single thought (a voice-over audio clip) for one of the main characters. Events are collected from a large database of video and audio clips of improvised scenes that are associated with keywords and sensitive to the incoming SMSs. The system responds immediately by playing audio dialog for the characters and indirectly through consequential thematic changes in the video images and scenes.
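The keyword-recognition step can be pictured with a small Python sketch. The production’s actual vocabulary, clip database, and matching rules aren’t given here, so the keyword table, the clip names, and the rule that the first recognized keyword wins are all assumptions made for illustration.

```python
import re
from typing import Dict, NamedTuple, Optional


class Thought(NamedTuple):
    character: str   # main character who "thinks" the line
    clip: str        # hypothetical identifier of a voice-over audio clip


# Hypothetical keyword table; the real vocabulary of the show is not known.
KEYWORD_THOUGHTS: Dict[str, Thought] = {
    "love": Thought("Juulia", "juulia_love.wav"),
    "jealous": Thought("Roope", "roope_jealous.wav"),
    "leave": Thought("Juulia", "juulia_leave.wav"),
}


def thought_for_sms(text: str) -> Optional[Thought]:
    """Scan one SMS and return the thought for the first known keyword, if any.

    Assumption: each message triggers at most one thought, and the first
    recognized keyword in reading order decides which one.
    """
    for word in re.findall(r"\w+", text.lower()):
        thought = KEYWORD_THOUGHTS.get(word)
        if thought is not None:
            return thought
    return None


# A message with two known keywords triggers only the first one.
triggered = thought_for_sms("I think he is JEALOUS of her love")
```

A message without any known keyword returns None, which would be consistent with only a fraction of incoming messages ending up in the storyline.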

The Finnish National Broadcasting company, YLE, broadcast “Accidental Lovers” several times in December 2006; the audience acceptance was overwhelming, resulting in different story lines at every show. The analyzed statistics showed that the production reached more than 1 million viewers. During the show, approximately 10 percent of the received text messages were integrated into the storyline in the form of thought bubbles.

“Accidental Lovers” is one example of interactivity based on semantic manipulation, merging end devices for the live manipulation of audiovisual media. Although it provides a first look at what interactivity means (with respect to the relationship between content and user and between user and user), it also runs into substantial interactivity problems:

❚ Identification. Will an engager be able to find the “correct” point of view and identify with his or her hero?

❚ Relevance. What impact does a single vote have in a 100,000-user environment?

❚ Immediacy. How do participants get feedback on their decision?

❚ Granularity of decision making. How many decision points will be offered? Are there cheat codes or shortcuts to step over problematic sections or avoid boring repetitions?

Another system that further addresses the problems of content and user interactivity is Ryan Shaw and Patrick Schmitz’s International Remix [6], a Web-based editing suite that lets the public edit videos to create their own 1-minute films. At the 2006 San Francisco International Film Festival, 19 directors from Brazil, Canada, England, Macedonia, the Philippines, South Africa, Spain, Taiwan, and the United States agreed to allow their films to be sliced and diced by the world remixers (see Figure 2). The best 50 films were shown at a special festival event. People used the tool enthusiastically, and the main conclusion drawn by the creators is that community annotation and remix of multimedia archives are paradigmatic human-centered computing applications; their design requires careful attention to user experience and social dynamics.

It’s a long way off

So, where is all this leading? We can certainly say that the future of media is bright and increasingly interactive on all types of levels. Yet, interactivity will remain synonymous with handling pre-prepared, linear media assets on several devices for a long time. By handling, we mean uploading and downloading as well as navigating the content. This monodirected interaction is based on the idea that media is there to be consumed (for the couch potato in all of us). Its direct manipulation isn’t something for those of us conditioned by the TV distribution model. Thus, broadband providers and device producers will survive for some time on the classic triple-play model: bundle a telephone service, a high-speed Internet connection, and TV at a flat rate and the world is fine. Hence, improving these technologies to enhance their services isn’t a waste of time. The same goes for traditional TV and other communication tools, such as the mobile phone, the digital camera, or the camcorder. They will stay with us because they serve the expressive side of interaction well.

Figure 1. “Accidental Lovers” production screen shot. The first subtitle presents the context (such as commentator or reason for comment) and the larger subtitle presents the comment itself. The heart represents the relationship’s status. (Copyright 2007, “Accidental Lovers” [“Sydän kierroksella”], courtesy of Crucible Studio, University of Art and Design, Helsinki, and Mika Lumi Tuomola; see http://mlab.uiah.fi/~lsaarine/accidentallovers.html.)

But new generations will evolve, even if there will be those who prefer the “laid-back” experience over interactivity. Yet, an increasing number of people will come to expect interactivity that’s beyond the click, search, and consume pattern. For them the content is the message, not the end-device presenting it, and they will make use of the bidirectional communication path already available today to its full potential. For them, interactivity is reflective expressiveness, a way to make educated decisions with respect to the appropriate use of media for communication purposes.

Let’s face it, our children will request, sooner rather than later, ways to interact with content and through that content with their peers. This means we have to think about authoring, reuse, and presentation in environments mainly built out of media items. Production and consumption must blur into one interaction process, driven by the individual in group settings. What we need are tools that facilitate continuous authoring, editing, manipulation, and altering of content. We would perform all that in a context that the tools then remember, but such tools aren’t ready yet.

The real challenge, however, will be to provide these tools not only for desktop environments, but also for interfaces that allow an even more direct and interactive way of manipulating media, such as authoring tools on mobile phones (a production unit with camera and voice control). Such tools certainly will be more accessible to those who are more accustomed to talking than to writing but are highly trained in visual communication, such as large populations in Asia.

As a community, multimedia researchers have ignored content authoring for everyday users and professionals on a grand scale. (Some exceptions include the groups run by Dick Bulterman and Lynda Hardman at CWI, http://www.computer.org/portal/cms_docs_multimedia/multimedia/content/blog/bulterman.wmv; the SILEX group at University Claude Bernard Lyon 1; and Susanne Boll’s work at Oldenburg University.) Authoring for interactivity is a hard problem, but it’s our community’s responsibility to address it as well. More is required to offer a media future, a future that will be interactive in different ways than we may think.

References

1. N. Postman, Amusing Ourselves to Death, Methuen Publishing, 1987.

2. B. Berry, “Mindful Documentary,” doctoral thesis, Media Arts and Sciences, Massachusetts Inst. Technology, 2005.

3. F. Nack and W. Putz, “Designing Annotation Before It’s Needed,” 2001; http://citeseer.ist.psu.edu/nack01designing.html.

4. D. Williams et al., “ShapeShifted TV – Enabling Multi-Sequential Narrative Productions for Delivery over Broadband,” Proc. 2nd Institution of Eng. and Technology Media Conf., IEE Press, 2006; http://conferences.iee.org/multimedia/ithd_digest/pdf/Doug_Williams.pdf.

5. “Accidental Lovers,” Crucible Studio, M. Lumi Tuomola, dir., Univ. of Art and Design, Helsinki, Finland, 2006; http://mlab.uiah.fi/~lsaarine/accidentallovers.html.

6. R. Shaw and P. Schmitz, “Community Annotation and Remix: A Research Platform and Pilot Deployment,” 2006; http://www.ludicrum.org/plsWork/papers/ShawSchmitzHCM06.pdf.

Contact Frank Nack via email at [email protected] and Michael Hausenblas via email at michael.hausenblas@joanneum.at.

Contact Visions and Views editor Susanne Boll at [email protected].


Figure 2. Visitors at the San Francisco International Film Festival used International Remix [6], a Web-based editing suite, to remix multimedia selections from 19 films. (See http://timetags.research.yahoo.com/ for example results.)
