
    Draft: Keynote Talk at Int. Conf. On Data Engineering 2003.

    Out-of-the-Box Data Engineering: Events in Heterogeneous Data Environments

    Ramesh Jain

    Electrical and Computer Engineering, and College of Computing

    Georgia Institute of Technology,

    Atlanta, GA 30332-0250

    [email protected]

    Abstract

    Data has changed significantly over the last few decades. Computing systems that initially dealt with data and computation rapidly moved to information and communication. The next step on the evolutionary scale is insight and experience. Applications are demanding the use of live, spatio-temporal, heterogeneous data. Data engineering must keep pace by designing experiential environments that let users apply their senses to observe data and information about an event and to interact with the aspects of the event that are of particular interest. I call this out-of-the-box data engineering because it means we must think beyond many of our timeworn perspectives and technical approaches.

    1.0 Introduction

    Data engineering has evolved, and continues to evolve, faster than most people ever imagined. While computing in the early 1970s handled only letters and numbers, technology now furnishes its users with an unprecedented volume and variety of data: from encyclopedia pages to clips of the latest music, from a spreadsheet to a real-time recording of a triple bypass. And access methods and requirements are evolving at the same pace.

    To keep pace with the demand for live, spatio-temporal, heterogeneous data, data engineering must let go of old paradigms. They have outlived their application. It is time to think out of the box: to consider what the operating environments of these new systems should look like. How can we build something that is experiential, not information-centric?

    Equally interesting is that user expectations of the data system have changed more rapidly than the data itself. To keep up with these changes, we must consider what the operating environments of future systems should be and how to realize those environments, rather than how to accommodate new functionality in our existing paradigm.

    In this paper, I look at the changing nature of applications by considering a few novel applications that use large volumes of data, and then discuss the functionality expected from these systems. That computing systems have evolved to follow user demand and application development is an important insight in this discussion: applications initially focused on data and computation, then information and communication, and now insight and experience. Most techniques in data engineering were developed to meet the needs of data systems of the last quarter of the 20th century. Data engineers must now address the needs of this century.

    2.0 Emerging Applications

    Some of us are old enough to remember the gentler times of database engineering. To define the requirements and structure of a database application, we merely looked at the corporate database. An entity, usually an employee, consisted of alphanumeric fields, each of which represented some attribute. Users posed a query to discover an employee attribute or to find all employees that satisfied certain attribute-related predicates.

    But although 2003 users have vastly different expectations, most databases still have the 1970s philosophy: users ask queries to get answers in an information-centric environment. This works well as long as all the users have the same requirements. The database is a resource for many people and provides a well-defined environment for articulating queries.

    Problems come when user requirements differ or when users don't know enough to ask the right questions. I see some of these problems already in applications that at first glance don't appear to be database applications. But that's why we need out-of-the-box thinking: to recognize that these applications are in fact the future of databases.

    2.1 Personalized Event Experience

    Suppose you are interested in cricket. A match lasts for five days and may not result in a decision, even after 30 hours of play. You may be an avid fan, but you don't want to spend five days watching a game video. You want to watch specific events in the game, how specific players perform, all scoring or all defensive highlights of a specific team, or any player comments on a specific play. You may also want to understand how the game evolved by seeing a fast animation of statistics related to different categories, or see a particular event from several camera perspectives, or listen to different commentators describe the same event.

    All these desires translate to query types that the game database must answer, and it must present the answer in a way that lets you enjoy the game. Current portals isolate information in silos so completely that users spend more time trying to navigate within and across silos than in enjoying the game. Also, users, not portals, should determine the events of interest.

    2.2 EventWeb

    Web search engines, for example, are notorious for their lack of discrimination. XML has not solved these problems because ultimately no search engine can anticipate a user's exact needs. The semantic Web is receiving a lot of press, and people are pinning many expectations on personal agents to help find the right information and services.

    I'm not convinced that this will solve the problem. The semantic Web still follows the legacy of Gutenberg. It is a web of pages that are predominantly prepared in a document mode. Again, this is fine if all you want are descriptions. But we can do so much more. Visualize instead a web of events, in which each node represents an event, past or present. Each event is not just someone's description of the event or some statistics related to it. It is the event itself, brought to you by one or more cameras, microphones, infrared sensors, or any other technology that lets you experience the event. For each event, all the information from sensors, documents, and other sources is united and presented to the user independently of the source. The user then experiences the preferred parts of a particular event in the preferred medium.

    In this vision, events are treated equally. The archived video of a news event is accessible in the same way as the CEO's Web cast or your son's first football game. The source can be anything from CNN to the local elementary school: whatever or whoever generates events worth archiving. And perhaps most important, because it is not text centered, the event Web will reach the 90% of humanity who either cannot grasp or cannot access current information and communication technology.

    I see the rudiments of this vision already. Stores are stocking Web cams in every shape and size at prices that even students can afford. Sensors that were once discrete are now being connected to form networks for various Internet applications, from a sushi bar in San Francisco to an ant colony in Lansing, Michigan. Multimedia phones with built-in cameras will be next. In short, we are witnessing the beginnings of the event Web explosion, just as decades ago we witnessed the document Web transformation.

    2.3 Scientific Applications and Data Warehouses

    Transforming data from disparate sources to a sensory medium that people can experience provides the opportunity for deeper insight into a problem or situation. This was the basis for the use of the oscilloscope and many similar instruments. Visualization techniques, a more modern oscilloscope, emerged as a powerful analysis and insight-generation environment. Now data for many applications, from customer transactions in a retail outlet to bioinformatics, is collected and stored in data warehouses.

    The data sources for these are diverse and the volumes very large. To use the data in data warehouses effectively, we need tools and techniques to transform disparate data into a form that will let people experience and gain insights into the situation. Current database technology was developed for applications that were challenging three decades ago. With the completely changed landscape of data, we require new technology to explore this data, in most cases to generate insights into an event. We cannot do this in a query-centric environment. The oscilloscope brought an experiential environment to science and technology in the last century. We require an oscilloscope to bring experience to computing in this century.

    2.4 Situation Monitoring and Analysis

    The increasing number of applications requires that we establish a large network of disparate data sources, including both sensors and human-entered data systems, to produce a data stream. The data from these sources must be interpreted and combined to provide an overall picture of the situation. This data is used to warn about potentially disastrous events, to provide the status of activities at different locations, and to analyze the causes of past events. In many cases, using a real past situation, different what-if analyses must be done to develop solutions for similar situations. In all such applications, real-time data analysis must combine with real-time data assimilation from all sensors to provide a unified model of the situation. Users should not see the situation as raw data from different sensors, but as the evolving big picture of the situation. Thus, in an emergency, users see not just isolated sensor streams from different locations, but a situation characterized as needing medical help, fire engines, or police. In all these applications, users are interested in the real-world situation from their own perspective, not in the data from a specific source. Sensor data is but one of several sources that form the model of the situation.

    3.0 Common Data Characteristics

    On the surface the requirements for experiential applications seem very different, but upon closer examination they have important similarities:

    - Spatio-temporal data is important.
    - Different data sources provide information to form the holistic picture.
    - Users are not concerned with the data source. They want to know the result of the data assimilation (the big picture of the event).
    - Real-time data processing is the only way to extract meaningful information.
    - Exploration, not querying, is the predominant mode of interaction, which makes context and state critical.
    - The user is interested in experience and information, independent of the medium and the source.

    Data sources are broadly of two types, precise and imprecise, and user requirements for the data fall into roughly two categories, insight and information. The matrix in Figure 1 captures the tensions between these four characteristics. In many situations, I know the data source precisely, even though it may be distributed. In other cases, I know only that what I need is available from somewhere. Likewise, in some situations, I am trying to gain insights into the behavior of a system, event, or concept, so my primary need is to explore and understand. In other situations, I need information and I want a specific answer.

    Predictably, databases are at the intersection of precise and information, the bottom left quadrant. Nothing beats them as a means of getting information from a precise source. In the top left quadrant are visualization environments and tools, promising ways to gain insight from a precise source. In the bottom right quadrant are search engines. Few people will dispute that search engines are an imprecise source. However, their intention is to provide information, not further exploration. Unfortunately, exploration does occur, but it is usually to find a suitable match for the query, not to explore an event further. Finally, the top right quadrant is the intersection of insight and imprecise source. This intersection produces what I call the experiential environment, a new way of presenting data that will become increasingly common in most data-intensive applications. This will then improve techniques in the other three quadrants.


    Figure 1. Data sources and access goals

    4.0 Experiential Environment

    An experiential environment is a collection of sensors and other data sources presented in a unified model that lets the user directly apply his senses to observe data and information of interest related to an event and to interact with the data according to his interests.

    Current database systems are essentially stateless, which lets them interact with multiple users in multiple contexts as efficiently as possible. The drawback is that people must adapt to the machine's way of doing things. In the early days of computing, people were in awe of the machine, so they were willing to rearrange their perspectives and activities to be part of its environment. Now they are far less reverent and demand that the machine make the adjustments. My Yahoo, My AOL, and other personalized Web pages reflect this "my way" shift. People expect their systems to remember what they like, where they went, what they need to do next, and where they like to shop. The system must remember how they got to a particular state, answer questions in the context of that state, and evolve to another state in a kind of symbiotic partnership with the user. E-commerce recognizes this shift, which is why Amazon suggests other books you might enjoy and displays books and other products that you most recently browsed.

    Ironically, this relationship brings out the best in both partners. Humans are efficient conceptual and perceptual analysts, but relatively poor at mathematical and logical evaluation; computers are exactly the opposite. Computers can perform mathematical and logical operations millions of times faster than any person, but their perceptual capabilities, even after all the progress of the last 40 years, remain relatively primitive.

    Yet current databases present sequential and logical information to humans and expect computers to detect complex patterns. The powerful synergy of human and machine is short-circuited. If we use computers and users synergistically, we can develop the experiential environment.

    4.1 Emphasis on natural interaction

    We are interaction-oriented creatures; it is how we learn about our environment. We perform an action, see its effect on the environment, and act in response to that. In the typical query system, however, we articulate a query, wait for the system to provide an answer, analyze the response to see if the system understood what we really wanted, and, more often than not, formulate a new query and start all over again. This process is painfully out of synch with our natural desire to learn through interaction. In experiential environments, users get data that they can easily and rapidly interpret using their natural senses. Once they interpret the data, they can interact with the data set either to get a modified data set or to perform certain actions. At any given time, the data set from the previous interaction is available, and the user interacts with the system on the basis of this holistic information.

    4.2 Similar query and presentation spaces

    Most current systems use different query and presentation spaces. A query environment helps users articulate their queries; the system computes the results of the query and presents them in a very different form. Search engines provide a box for the user to enter keywords, and the system responds with a list of thousands of entries spanning hundreds of pages. A user has no idea how the entries on the first page relate to the entries on the 113th page (if she gets that far), how many times the same entry appears, and often how entries on the same page can possibly have anything in common. Contrast this to video games, in which the player formulates a query by selecting some action on the screen and the system presents the result as some change on the same screen. Here the query and presentation spaces are the same, and the relationships among all relevant objects are clearly presented in a form that is obvious to the user.

    4.3 State and context

    People don't willingly change their physical and mental states abruptly. A gradual shift in context is much preferred, even in natural language, which is why we take pains to insert transitions like "on the other hand" to signal a contrast or "similarly" to signal a comparison. People simply operate better in known contexts because they can understand enough about the relationships among different objects in space and time to draw inferences about them or create models of them. We live in a world that is continuous in both space and time, so we are most comfortable organizing our knowledge of our surroundings in that manner.

    The space-time continuum is foreign to databases, and to information systems in general. How can a stateless system respect the demands of spatio-temporal data? Databases may be efficient, but being stateless has distinct disadvantages. Latency is a big one. Not only is the system slow, it doesn't even give feedback about its lack of progress. The perpetual hourglass, the bar that takes an agonizing amount of time to fill, or the endless flitting pages are the only indications that the system hasn't completely abandoned its task. Some Web sites try to reveal the number of bytes left, which is marginally useful as long as traffic allows. Nothing, however, will induce users to explore if it takes too long to move from place to place. When latency is low, on the other hand, exploration is much more pleasurable. Video games are an example: their appeal is due in part to their near-zero latency.

    4.4 Multimedia immersion

    Video games are also popular because they provide a powerful visual environment, and in some cases tactile inputs. Early computing environments were strongly text oriented because they had to be; the technology couldn't support any alternative form. There is no longer a reason to avoid powerful visual and audio presentations. Other senses may also become a familiar part of the computing environment. In some cases, like chemical industry or culinary applications, smell could make a presentation more compelling and immersive.

    I allude to video games often because they are a powerful example of a small-scale experiential environment. Anyone can use them, even children who cannot yet read or write. A video game can keep even these young users engaged for hours, a testimony to its natural interactivity.

    5.0 Assimilating Data into Unified Information and Knowledge

    In the experiential environment, data can be anything (audio, video, text, alphanumerics, infrared), depending on the sensors employed. Current databases and information systems were designed using data obtained and mediated by people, so predictably they ended up in alphanumeric form or in text. Database designers developed techniques to organize such data and deal with it effectively. New applications require not just the data repository, but complete environments for information and insight.

    This complete environment requires rethinking the way we index data. Current indexing techniques for different data types depend on metadata for that particular type. Metadata plays a key role in introducing semantics and is important in determining how data will be used. Schemas provide semantics in relational tables.

    XML has become very popular for introducing semantics in text. Ironically, XML came about because researchers were trying to develop automatic approaches to deduce semantics from the data. When it became clear that reaching this goal was far more complex than they had thought, researchers turned to a mark-up approach to semantics and threw in languages as well. I fail to understand the degree of excitement about XML. Clearly, it will solve some interesting problems, but it is not the panacea many people believe.

    XML's utility is limited to the introduction of semantics in strongly human-mediated environments. For sensory data like audio and video, feature-based techniques are much more useful. Here the goal is to identify some features in the data that will serve as a bridge between data and semantics. The idea of using clearly detectable attributes as features to infer semantics seems an obvious solution. For images, commonly used features are color histograms and simple measures of texture and structure. Most techniques measure global images, not objects within images, yet people are most often interested in objects. How to get to those objects is a problem that most data engineers are ignoring. The same is true for most other signals in medical, seismic, and other applications. Signals are usually indexed using features that capture global or semi-global properties, while semantics usually requires the structure of local features.

    Unfortunately, these techniques are still immature, primarily because researchers are interested in developing general-purpose techniques rather than restricting their systems to a specific context. Researchers can learn from the success of natural language and speech recognition systems: all successful systems work in a specific context.
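
    To make the feature-based idea concrete, here is a minimal sketch (in Python) of indexing images by a global color histogram, the kind of feature mentioned above. The binning, the L1 distance, and the array-based image format are illustrative assumptions, not a description of any particular system.

        import numpy as np

        def color_histogram(image: np.ndarray, bins: int = 8) -> np.ndarray:
            """Global color feature: a normalized joint histogram over R, G, B."""
            pixels = image.reshape(-1, 3)            # image is H x W x 3, uint8
            hist, _ = np.histogramdd(pixels, bins=(bins,) * 3,
                                     range=[(0, 256)] * 3)
            return hist.ravel() / pixels.shape[0]    # normalize to sum to 1

        def nearest(query: np.ndarray, index: dict) -> str:
            """Return the indexed image whose histogram is closest in L1 distance."""
            return min(index, key=lambda name: np.abs(index[name] - query).sum())

        # Toy index of two random "images", then a lookup.
        rng = np.random.default_rng(0)
        def toy_image():
            return rng.integers(0, 256, (32, 32, 3), dtype=np.uint8)
        index = {name: color_histogram(toy_image())
                 for name in ("sunset.jpg", "forest.jpg")}
        print(nearest(color_histogram(toy_image()), index))

    A global histogram like this describes the whole image, which is exactly the limitation noted above: it says nothing about the objects inside the image.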

    5.1 Breaking down silos

    Everyone gets information about objects and events from different sources in different data types. What I know about the war against terrorism is based on what I saw on TV, read in newspapers and magazines, heard on the radio, and discussed with my friends. That is, my perception is based on information I've assimilated from multiple data sources. Somehow, and quite naturally, I've assimilated all this information and represented it in a unified form that is independent of the individual data sources.

    Information systems, in contrast, create data silos. The metadata is defined and introduced for data of a particular type, which is indexed and neatly stashed in its own place. A video collection cannot interact with a text collection to produce a video-text collection. Indeed, the silo structure is strongly defined, with little or no interaction among silos, as Figure 2 shows.

    Figure 2. Different data sources have different indexing mechanisms, but these sources live in their own silos.

    The challenge to the database community, then, is to break down these silos to unify information. This requires more out-of-the-box thinking, because most data sources are designed to behave like independent silos. Their creators assume that after the integration system analyzes the silos and extracts their metadata, it will somehow combine the metadata to provide correct results. Indeed, many current research efforts are aimed at this kind of solution.

    Researchers also form strong silos. I know from experience in many research areas, including image and video database research, that tunnel vision is common. Just as the six blind men had vastly different ideas about the size, shape, and function of an elephant, so the database, computer vision, and information retrieval communities have diverse (and equally stubborn) views of an image database. Having all these people develop systems without communicating is no more productive than having five students in separate rooms attempt to produce a coherent thesis. Perhaps the challenge is to break down both kinds of silos.

    5.2 Information Assimilation

    Many system engineers, particularly designers of control and communication systems, use a strong, domain-based information-assimilation approach, which estimates system parameters from many disjoint and disparate information sources. The parameters of the mathematical system model are successively refined by observing the data as it becomes available. In this approach, each data source is just one more source that contributes to the model's refinement, and the goal is to get the most precise model possible. At some point, it is possible to completely ignore data from a specific source. Thus, a data source is just that, a data source, and the model represents the current knowledge about the system, knowledge that in turn is based on evidence from all the data sources. Conceptually this approach is very different from current information integration, in which the system analyzes a particular data source and then combines its results with those of other data sources.

    A very important result of the assimilation approach is that the system can efficiently deal with real-time data by keeping only what is important for the goal of the system. Most applications collect real-time data, and it is very important to recognize that all data is not equal and that importance is context dependent. Data-engineering systems will have to learn to ignore data.

    This approach also allows a very smooth and effective introduction of semantics into the process. Here the semantics are brought in by modeling the data and information flow in the system, representing states and state transitions, and the roles of different objects in different states. Thus, this modeling process helps in representing data as well as in analyzing the data as it arrives, extracting only meaningful information.
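
    As a concrete, drastically simplified illustration of this style of successive refinement, the sketch below maintains a single scalar model parameter and folds each new observation in with a precision-weighted update, ignoring observations whose contribution would be negligible. The scalar state, the Gaussian-noise framing, and the gain threshold are assumptions for illustration, not details of the approach described above.

        class AssimilatingModel:
            """One scalar parameter, successively refined from disparate sources."""

            def __init__(self, prior_mean: float, prior_var: float):
                self.mean = prior_mean   # current best estimate
                self.var = prior_var     # uncertainty of that estimate

            def assimilate(self, value: float, source_var: float) -> None:
                # Precision-weighted update: each source is just one more source.
                gain = self.var / (self.var + source_var)
                self.mean += gain * (value - self.mean)
                self.var *= 1.0 - gain

            def assimilate_if_useful(self, value: float, source_var: float,
                                     min_gain: float = 0.01) -> None:
                # Learn to ignore data: skip sources that barely refine the model.
                if self.var / (self.var + source_var) >= min_gain:
                    self.assimilate(value, source_var)

        model = AssimilatingModel(prior_mean=0.0, prior_var=100.0)
        model.assimilate(4.2, source_var=1.0)            # a precise sensor
        model.assimilate(3.0, source_var=25.0)           # a noisy human report
        model.assimilate_if_useful(9.9, source_var=1e6)  # effectively ignored

    The point of the last call is the one made above: a data source is just a data source, and when its evidence no longer refines the model, the system can simply drop it.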

    5.3 Event Graphs for Unified Indexing

    An approach to breaking down data silos, which my colleagues at PRAJA and at UC San Diego's Electrical and Computer Engineering Department and I developed, is to build a unifying indexing system that introduces a layer on top of the metadata layer of each data silo, or disparate data source. The layer uses an event-based domain model and metadata to construct a new index that is independent of data type. We decided to use the event, a significant occurrence or happening located at a single point in space and time, as the basic organizational entity for the unifying index because it has many applications and theories in human memory organization. An application domain can be modeled in terms of events and objects. Events are hierarchical and have all the desirable characteristics that have made objects so popular in software development. In fact, events could be considered objects whose primary attributes are time and space.

    In our approach, an event graph parses the data as it arrives and assimilates it to build an environment model that reflects knowledge about the event on the basis of the information collected so far. As Figure 3 shows, event graphs essentially create a list of spatio-temporal events as they take place. This becomes the database that describes domain semantics and links these events to individual data streams. Users can study as many or as few of these as they want, or they can visit the entire stream to experience the full event. Event graphs also capture the entities and their roles in the event, the event's location and time, and event-transition information. They capture causality in an event-transition mechanism.

    An event base stores the name and nature of the event and all other relevant information. The relevant information may not be available at the time the event is created; when it becomes available, the system attaches it to the event. For example, comments in the local newspaper about the CEO's talk can be linked to the talk when they become available. Thus, the event base is an organic database that keeps growing as a result of the many different processes running, and it differs from current databases in this respect.

    The event base also stores links to original data sources, which means the system can present the appropriate media in the context of a particular event. Thus, when the cricket fan accesses the match to see all of Virender Sehwag's boundaries in the first innings, the system can suppress all other shots and show only what the user wants.

    The user interacts with the event base directly, and the event base uses original data sources as required. This has several important implications: the system can preprocess important information related to events and objects according to its domain knowledge; it can present information using domain-based visualization; and it can provide unified access to all information related to an event, independent of the time the data became available or was entered in an associated database. Because of these characteristics, unified indexing is the backbone of an experiential environment.
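
    A minimal sketch of such an event base follows. The class and field names are illustrative choices, not the PRAJA system's; the sketch captures only the ideas above: events carry what, where, and when attributes, link back to their original data sources, can acquire information after creation, and can be queried independently of data type.

        from dataclasses import dataclass, field

        @dataclass
        class Event:
            name: str                 # e.g., "CEO talk", "boundary by Sehwag"
            event_class: str          # class in the application's ontology
            time: float               # when (seconds since some epoch)
            location: tuple           # where (latitude, longitude)
            sources: list = field(default_factory=list)     # links to raw media
            sub_events: list = field(default_factory=list)  # events are hierarchical

        class EventBase:
            """An organic store: events keep acquiring information after creation."""

            def __init__(self):
                self.events = []

            def add(self, event: Event) -> None:
                self.events.append(event)

            def attach_source(self, name: str, source: str) -> None:
                # Late-arriving data (a newspaper comment, say) is linked to its event.
                for e in self.events:
                    if e.name == name:
                        e.sources.append(source)

            def find(self, event_class: str, region: tuple, interval: tuple) -> list:
                # Unified, type-independent index: query by what, where, and when.
                (lat0, lat1), (lon0, lon1) = region
                t0, t1 = interval
                return [e for e in self.events
                        if e.event_class == event_class
                        and t0 <= e.time <= t1
                        and lat0 <= e.location[0] <= lat1
                        and lon0 <= e.location[1] <= lon1]

        eb = EventBase()
        eb.add(Event("CEO talk", "talk", time=1.0e9, location=(33.7, -84.4),
                     sources=["webcast.mpg"]))
        eb.attach_source("CEO talk", "newspaper-comment.txt")  # arrives later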


    Figure 3. Event graphs unify different data sources by providing a semantic indexing and linking approach.

    6.0 WYSIWYG Search

    As Figure 4 shows, an event has three dimensions: what it is (its name and class), where it took place, and when it took place. A user navigating through an event base is interested in finding all events in a certain class that occurred at a particular location and time. The event name captures the event's purpose and identity; the event class is organized in an ontology defined for the application. The What part of the screen (top left) presents a list of all events modeled in the system. Location can be specified in some kind of map: geographic, schematic, or conceptual. Time is organized as a timeline.


    Figure 4. An approach to showing events and creating a WYSIWYG search environment. Clear What-When-Where areas provide a multidimensional WYSIWYG search environment.

    Users can select one or two event classes or navigate through class ontology hierarchies endlessly; there is no theoretical limit on the subclass structure. The depth of the hierarchy depends on the model used in the application and the data available. Selecting a class automatically selects all its subclasses (see the sketch below). To navigate through event location and time, users either zoom or move in different directions, similar to the way video game players select parts of a map, from a room to an entire world. The event timeline could be anything from microseconds to centuries.

    At all points of the search the user experiences What-You-See-Is-What-You-Get (WYSIWYG) characteristics. Once a user selects event classes, part of the map, and a time, the system presents all events and their selected attributes in all three places. In the figure, the user has selected the inventory class for SBU accessories worldwide in 2001, which is akin to the text query, "Show me the inventory status of all the SBU accessories worldwide in 2001." The event list (bottom half of the screen) shows the details of the inventories. The colored dots on the map show the location and status of the inventory: needs immediate attention, needs some attention, okay. To avoid confusion, this example does not show a color-coded list and timeline, but if the user selects an item in the list, the display will change color to highlight that selection and its corresponding symbols in the time and location areas. The exact mix of color and symbols depends on the application.
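
    Building on the event-base sketch in section 5.3, the following illustrative code shows the two mechanics just described: selecting a class selects its entire subtree in the ontology, and a single What-Where-When selection drives the list, map, and timeline views from the same result set. The ontology dictionary and the view structure are assumptions for illustration only.

        def expand_class(ontology: dict, root: str) -> set:
            """Selecting a class automatically selects all its subclasses."""
            selected = {root}
            for child in ontology.get(root, []):
                selected |= expand_class(ontology, child)
            return selected

        def wysiwyg_query(event_base, ontology, root_class, region, interval):
            # One selection feeds all three windows, so the query and
            # presentation spaces stay equal.
            classes = expand_class(ontology, root_class)
            hits = [e for c in classes
                    for e in event_base.find(c, region, interval)]
            return {"list": hits,
                    "map": [e.location for e in hits],
                    "timeline": sorted(e.time for e in hits)}

        ontology = {"inventory": ["sbu-accessories"], "sbu-accessories": []}
        views = wysiwyg_query(eb, ontology, "inventory",
                              region=((-90, 90), (-180, 180)),   # worldwide
                              interval=(978307200, 1009843200))  # year 2001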

    By displaying events on a map as well as on a timeline, the WYSIWYG approach maintains event context. The user can then refine the search through any window, say by zooming into and out of the map or timeline or by moving left or right. A change in one part automatically updates the results in the other windows. Consequently, the query and presentation spaces remain equal. Also, as users change the search criteria, they get immediate results with minimal latency; in most applications, the results can be instantaneous. Users can experiment with the data set on their own terms and develop insights at their own pace, always within the event's context. The system displays results continuously, making it easier to hypothesize about data relationships. It will be possible to test a hypothesis by linking such a system to data-mining tools that would let the user explore large data warehouses.

    If a user wants to know more about an event, he can explore it by double-clicking in any of the three windows (what, when, or where). The system then provides all the data sources (audio, video, or text) and any other event characteristics packaged in an event envelope, which the system generates automatically from the information assimilated in the event base as it is created. The user can launch a variety of sources from the envelope, and they will open in the desired mode, either in a different window than the user originally selected or in the same window. The system accesses and appropriately presents much of the information in the event envelope through links to original sources, such as programs launched to present the results of a particular dataset or a simulation.

    An event envelope is a powerful mechanism that unifies the results of many complex operations. If selected variables have dynamic attributes, the event envelope can present historical attributes for those variables. Users can then save an event envelope as a snapshot, the particular state of an event, and compare it with other snapshots representing later states. The snapshot button (top right in Figure 4) lists all the event envelopes the user has saved. An event envelope can be sent through e-mail and hence can help build communities around specific themes. Amateur astronomers, for example, are interested in scanning the sky for near-Earth objects like comets. Clubs could exchange event envelopes and commentary about the images in the same e-mails. Moreover, the envelopes would contain links to details like magnitude and angular distance from a known star. This kind of rich communication increases both individual and community knowledge.

    7.0 Applications

    Here we briefly describe three applications to give an idea of what could be done in this environment.

    7.1 Football

    Figure 5: Event graph of a football game

    The graph in Figure 5 shows the events in a football game. The text shows several levels of event hierarchies, from the complete game down to a particular drive. For simplicity, the figure does not show levels below a drive. The graph also shows the potential transitions from one event to another in the game in terms of downs. Thus, the graph represents a subset of an event-transition diagram for the game.

    [Figure 5 content: the hierarchy Game > Quarter > Team A/B Drive; within a drive, transitions among 1st through 4th downs, ending in a field goal (FG), touchdown (TD), or turnover (TO).]


    The model is generic for the game; the sequence of events generated depends on the particular game. Figure 5 shows only a small subset of the event model, to give a flavor of the application.

    Our data sources for the game included video (plus audio) from multiple cameras; play-by-play information, which various companies generated and made available as a data stream; and a player and statistics database. A rule-based system decided whether a particular play (an event) would be of interest to anyone, and thus whether or not to save the related video.
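
    The actual rule base is not published here, but a sketch of the idea might look like the following; the specific rules and field names are hypothetical.

        def is_interesting(play: dict) -> bool:
            """Decide whether a play's video is worth saving."""
            rules = [
                lambda p: p.get("scoring", False),         # touchdowns, field goals
                lambda p: p.get("turnover", False),        # interceptions, fumbles
                lambda p: p.get("yards_gained", 0) >= 20,  # big gains
                lambda p: p.get("down") == 4,              # fourth-down attempts
            ]
            return any(rule(play) for rule in rules)

        # Applied while parsing the play-by-play stream:
        plays = [{"down": 1, "yards_gained": 3},
                 {"down": 3, "yards_gained": 42, "scoring": True}]
        kept = [p for p in plays if is_interesting(p)]     # keeps only the second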

    The system parsed the play-by-play data stream, applied the rule base, and prepared an event base for the game. As Figure 6 shows, the event base appeared to the user as a time machine: users could go to any moment and see all the related statistics, including the score, timeouts left, rushing yards, and first downs for each team at that particular time.

    This display is like the one discussed above, but designed using the domain information for football and hence familiar to football fans. By clicking on the timeline, a user can select any time instant and see the state of the game at that time. By moving the pointer along the timeline, users can see how the game evolved. They could filter events of their choice and see them in a standard football representation, the football field at the bottom of the screen. By double-clicking on a play, they could get more information about the play or watch a video of it through the event viewer. As they watched scoring plays from various angles, they could click on a player to get more information.

    Twenty-five college football teams used this system. Fans who could not watch a game on national TV, either because the game was not televised or because they were in the wrong place, could still enjoy their team's games in a compelling way using this system. They could watch the same play by their favorite player from multiple angles to gain insight into what really happened.


    Figure 6: Experiential environment for football fans to enjoy a multimedia presentation in a time-machine format.

    Insight building via experience and exploration is not limited to entertainment applications. In a modern enterprise, line managers would like to identify, in real time, potential problem areas, how they arose, and how they can change things. The focus is on performance indicators, which measure the discrepancies between planned and actual performance, and on the relationships among performance indicators and available infrastructure, environmental factors, promotional efforts, and so on. It is not enough to discover a problem; companies want to analyze why the problem arose so that they can improve their processes. In this context, the why includes activities related to the problem as well as its historical perspective.

    [Figure 7 content: MonthlySalesActivity and YearlySalesActivity nodes, each carrying target, actual, and discrepancy attributes for sales amount and sales calls, rolled up from daily and quarterly figures; alongside a What taxonomy of Sales (overall, by customer, by product category down to individual products), Inventory, and Marketing.]

    Figure 7: Event graph and taxonomy of a Demand Activity Monitoring application.

    Figure 7 shows an event graph for a sales forecast and inventory monitoring system designed to monitor an automotive parts manufacturer's key activities. These include sales (monthly, daily, and hourly forecast target and actual for different sales regions) and inventory (monthly, daily, and hourly available inventory for different warehouses). Activities are rolled up temporally (hourly to daily to monthly) and by various actors (customers, parts, parts line, and so on). Figure 8 shows a screenshot of the EventViewer for this application. Performance indicators for each activity are mapped to red, yellow, and green based on domain-specific criteria. The display in Figure 8 is a close-up version of the display in Figure 4 shown earlier. It is easy to see that one can select different geographic regions and different parts to understand what was going on in that part of the world. These displays provide a holistic picture of the situation to an analyst, who can then drill deeper into it. The system provides tools for doing so, but we do not discuss them here due to space limitations.
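
    The two mechanisms this application leans on, temporal rollup and the traffic-light mapping of performance indicators, are easy to sketch. The 5% and 15% thresholds below are hypothetical stand-ins for the domain-specific criteria mentioned above.

        from collections import defaultdict

        def roll_up(hourly: dict, hours_per_bucket: int = 24) -> dict:
            """Aggregate hourly figures into coarser buckets (hourly to daily)."""
            buckets = defaultdict(float)
            for hour, value in hourly.items():
                buckets[hour // hours_per_bucket] += value
            return dict(buckets)

        def indicator_color(target: float, actual: float) -> str:
            """Map the planned-versus-actual discrepancy to a status color."""
            shortfall = (target - actual) / target
            if shortfall <= 0.05:
                return "green"    # on plan, or within 5% of it
            if shortfall <= 0.15:
                return "yellow"   # needs some attention
            return "red"          # needs immediate attention

        daily = roll_up({0: 10.0, 1: 12.0, 25: 9.0})        # hours 0-23 -> day 0
        print(indicator_color(target=100.0, actual=82.0))   # "red"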

    Figure 8: Another display of the inventory application. Compare this display to the one in Figure 4 to see how the system can be used in WYSIWYG mode.

    8.0 Conclusion

    Rapid advances in many related areas have brought some interesting challenges to the data engineering community. Traditional database techniques need to be reconsidered and readapted for the new applications. Relational approaches are powerful and will still be useful as a back end. But the front end of these systems requires data engineering that is very different from what we have done so far. The challenge is to take a more solution-oriented perspective or be boxed into back-end repository management.

    Some new attributes of data emerge as dominant issues: semantics, multimedia, liveness, location sensitivity, and separate streams of sensor and other data. To unify all sources of information, events appear to offer a powerful approach for modeling, managing, and presenting data. I believe that event-based experiential environments will be useful in many emerging applications. The thoughts and ideas I have presented are still in the conceptual stage. We have a long way to go in refining this approach to make it practical, but it is clear we must take a new path, one that is outside conventional thinking, if we are to keep pace with and enable these new applications.
