
[IEEE 2007 IEEE 23rd International Conference on Data Engineering - Istanbul, Turkey (2006.04.15-2007.04.20)] 2007 IEEE 23rd International Conference on Data Engineering - Information


Information for People

Laura M. Haas, Steve B. Cousins

IBM Almaden Research Center

Laura at almaden.ibm.com, scousins at us.ibm.com

Abstract

Ordinary people have access to unprecedented

volumes of information today. Researchers in the fields of

information management (IM) and human-computer

interaction (HCI) are reacting to this challenge from their

own unique perspectives. Having access to a billion

records is cool, but having access to a billion people is

awesome. In this paper, we look at recent research from

both communities, and speculate on how interactions

between the communities could enhance the user

experience of information.

1. Introduction

The Information Management (IM) and Human

Computer Interaction (HCI) communities have

traditionally had different interests, and different claims

to fame. The IM community has worried about storing

vast volumes of data, and supporting the complex,

mission-critical manipulations of data needed by the

business world. Core concerns have included features

and functions, and of course, systems issues, such as

scalability (in both number of users and data volumes),

performance and robustness. The traditional motivations

were to simplify application development (by raising the

level of abstraction beyond bits and bytes), to keep

information safe (meeting the ACID test), and to meet the

needs of the business using the data. As a result, the IM

community’s core technologies include query languages

and query optimization, data models, support for

particular types of data, concurrency control, transaction

processing, security and other “systems” issues.

Meanwhile, the HCI community cares about people,

and, in particular, how to make it easier for people to use

computers. Their core concerns are centered around

learnability, usability, and accessibility. Since social and

physical issues impact how people work, they think about

supporting collaboration, creating interfaces that work for

people with disabilities, and leveraging human perceptual

abilities when designing user experience. Typical HCI

competencies are the optimization of interactions,

graphical user interfaces, social aspects of interaction,

design, user studies, visualization, multi-modal

interaction, end-user programming and ethnography.

These two communities rarely work together, and in

fact, hardly overlap. A recent call for papers for a major

database conference lists forty major topics, only one of

which is directly relevant to users1. The rejection rate for

user interface submissions to mainline database

conferences is legendary. At the major HCI conferences,

having a properly conducted user study is de rigueur, and

often the topics of interest to that community would not

attract mainstream information management practitioners.

In fact, the call for papers for an upcoming conference2

reads to an information management person like a

promotional piece about a self-help convention. Not only

are the interests of the two communities distinct, but their

styles are also completely different.

Both goals and styles keep the communities apart.

Where HCI researchers are concerned with making

applications more usable on the front end, IM researchers

work to make the applications easier to implement from

the back end. The applications themselves are often

implemented by a layer of people who come between the

two communities, making it even less likely that these

communities will interact without the help of

intermediaries.

Occasionally, both disciplines get interested in similar

problems, of course. When they do, the difference in the

emphases is instructive. In the late 1970s, for example,

two very different (and yet, very similar) tools for

information storage and analysis were introduced to the

world. From the information management community

came the relational database. Out of the user-focused

community, the spreadsheet emerged. Both are based on

a tabular view of data. In both cases, the idea is credited

to a brilliant researcher, but one (Codd) worked for a

provider of information technology, while the other

(Mattessich) was a business school professor, interested

in helping a particular group of users (accountants).

While the information management folks focused on

retrieval and management for large amounts of data (how

to select just the data desired, and quickly retrieve it), the

inventors of the spreadsheet focused on simple

manipulation of information to accomplish a task. Of

course, the two technologies were later wed, so that

1 http://www.vldb2007.org/callforpapers.html

2 http://www.chi2007.org/

1-4244-0803-2/07/$20.00 ©2007 IEEE. 21


spreadsheets can store information in relational databases,

and relational database users can use spreadsheets to

access their data.

Today, the world has shifted, and the emphasis is no

longer on building standalone applications which can then

be connected to monolithic databases. In fact,

information is likely to be spread over many sources,

including databases, files, and applications. Both

communities are addressing the problem of creating

applications over diverse and distributed sources. The

information management community, for example,

pioneered data federation technologies, while the user-

centric folks created “mashups”. Now, the IM community

is trying to create some of the solid systems advantages of

federation as an underpinning that makes mashups easier

to create and more robust [1].

2. Today’s Challenges

With the coming of Web 2.0, there is a shift from a

world in which there are very few information producers

relative to the number of consumers, to one in which

there is much more end-user created content. People are

more likely to be computer-literate and they are

interacting with information in new and unanticipated

ways. Contributing to advances in this new world

requires knowledge and skills from both the IM and HCI

communities. Challenges include, for example,

information quality (when data is increasingly produced

by random people, how trustworthy is it?), ubiquitous

access to data, and finding information in massive (and

widely distributed), user-created information collections.

Data in the hands of the people. Information

constructed by many people, even if not edited

beforehand, can be very valuable. In fact, it has been

argued that large groups of people are smarter than an

elite few [2]. In recent work in social computing, the HCI

community has been building tools to exploit this

phenomenon, such as delicious3 and ePinions4. But how

good is the information? What kind of tools and methods

will make it easier to author high quality information?

The HCI folks approach these problems from a social

engineering perspective, while the IM community has

worked on such issues as tracking provenance and

answering queries in the presence of uncertainty.

3 http://del.icio.us

4 http://www.epinions.com

Ubiquitous access. The push toward ubiquitous

computing has consumed many HCI cycles. Nowadays,

people take for granted the ability to be connected almost

anywhere, anytime. And that doesn’t always mean from a

laptop. A lot of the techniques that permit text to be

entered on small devices, from a phone keypad to stylus

input to thumb keyboards, have been published and

evaluated in the HCI literature. Similarly, HCI folks are

interested in how various sensors can be deployed in

ways that preserve privacy. Recent work has focused on

instrumenting homes of the elderly in order to help them

remain self-sufficient for as long as possible, while

reducing the risk that a fall or other medical emergency

goes undetected. The IM community, meanwhile, has

looked at caching information on mobile devices, at p2p

and networked data management, and at other distributed

information management issues such as processing

queries over streams of sensor data.

Finding a needle in a (growing) haystack. The

amount of data available online is huge, and growing

constantly. Much of it is produced by people,

unstructured, and with insufficient or inaccurate metadata.

As a result, human input is often needed today to interpret

or filter information. For example, how can a user search

a large corpus of photographs? An IM researcher would

think about how to store and then index them, what

language to use to query them, how to make the metadata

searchable, and how to let people search for photographs

across multiple repositories. He or she might even borrow

from the machine learning community, and think about

how we can categorize the photographs automatically

(i.e., generate metadata). An HCI researcher, on the other

hand, might first ask how to enlist people to label the

pictures. In one clever approach, a video game was

created where people are shown a picture and try to guess

the label [3]. Since it is a large distributed system, many

people are looking at the picture at once. When they

agree, they win, and so does the system because now it

has a label for the photograph!
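The agreement idea behind such a labeling game can be sketched in a few lines. This is only an illustration under our own assumptions, not the implementation described in [3]: the function names, the normalization of guesses, the optional list of excluded ("taboo") words, and the rule that a label is kept once two independent pairs of players agree on it are all hypothetical choices made for the example.

```python
from collections import Counter


def agreed_labels(guesses_a, guesses_b, taboo=()):
    """Return the labels that both players typed, in player A's order.

    Guesses are compared case-insensitively with surrounding whitespace
    removed. Words in `taboo` (a hypothetical exclusion list) never count.
    """
    def norm(s):
        return s.strip().lower()

    taboo_set = {norm(t) for t in taboo}
    seen_b = {norm(g) for g in guesses_b}
    result = []
    for guess in guesses_a:
        label = norm(guess)
        if label in seen_b and label not in taboo_set and label not in result:
            result.append(label)
    return result


def accept_labels(rounds, min_pairs=2):
    """Keep a label once at least `min_pairs` player pairs agreed on it.

    `rounds` is a list of (guesses_a, guesses_b) tuples for one picture.
    The threshold is an assumption of this sketch.
    """
    counts = Counter()
    for guesses_a, guesses_b in rounds:
        counts.update(agreed_labels(guesses_a, guesses_b))
    return sorted(label for label, n in counts.items() if n >= min_pairs)


if __name__ == "__main__":
    rounds = [
        (["Dog", "grass"], ["puppy", "dog"]),
        (["dog", "ball"], ["ball", "DOG "]),
        (["cat"], ["dog"]),
    ]
    print(accept_labels(rounds))
```

Requiring agreement from more than one pair is a simple guard against a single lucky or mischievous match; the resulting labels are exactly the kind of metadata an IM system could then index and query.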

3. Separate but equal?

Despite these shared challenges, the IM and HCI

communities still work largely separately. The different

values, languages and literatures make cooperation

difficult. Yet in several of the above examples, the work

of the two communities is complementary, and a full

solution to the challenge would need both parts. The

individual communities are doing good work, which

should be exposed to a broader audience.

For example, the IM community has produced a

number of systems that help to collect, analyze or present

particular types of information for specific user groups.

One such system is IBM’s WebFountain [4], which does

a tailored crawl of the Web to find information along a

given theme, then analyzes it, categorizes it, cross-

references it, and exposes it to privileged users via a

powerful (if somewhat arcane) query language. John

Battelle [5] wishes for the power of WebFountain – for

the masses. More recently, DBLife [6] trawls the Web for


information of interest to database researchers, then uses

text analytics, entity resolution, and a model of what

types of events (e.g., talks, paper acceptances) database

researchers are interested in to create a portal for the IM

community. Other examples of IM work that should be

of interest to the HCI community include work on tools

for database administration or schema mapping, text

analytics, privacy and de-identification, to name a few.

Likewise, the HCI community has a sizable literature

on topics of interest to the IM world, starting, of course,

with papers on information visualization [7]. For

example, Treemaps and related techniques provide

mechanisms for visualizing hierarchically-structured

information [8], which should be important when so

much attention in the IM community is focused on XML.

Domain-specific interactive visualizations can be

particularly powerful, a good example being the Baby

Name Visualizer [9]. Beyond visualization, work on

animated interaction [10], activity-centric computing [11],

and interacting with information on small devices [12]

should be read and discussed by IM researchers.

Designers of information systems should know the HCI

literature on design for people with disabilities [13],

usability methods [14], and end-user programming [15].

The HCI community, like the IM community, is also

concerned with supporting particular communities of

users (though they are more interested in social

interaction), and has also done work on privacy, not to

mention applications of concern to both communities,

such as healthcare informatics.
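For IM readers unfamiliar with treemaps, the basic layout idea can be sketched briefly. The code below is a minimal "slice-and-dice" variant written for illustration, assuming a hierarchy given as nested dictionaries with hypothetical keys `name`, `value` and `children`; it is not taken from [8], and production treemap tools typically use more refined layouts.

```python
def size(node):
    """Weight of a node: its own value for a leaf, else the sum of its children."""
    children = node.get("children")
    if not children:
        return node.get("value", 0)
    return sum(size(child) for child in children)


def treemap(node, x, y, w, h, depth=0, out=None):
    """Assign each leaf a rectangle (name, x, y, width, height).

    The rectangle of a parent is divided among its children in proportion
    to their weights, splitting left-to-right at even depths and
    top-to-bottom at odd depths.
    """
    if out is None:
        out = []
    children = node.get("children")
    if not children:
        out.append((node["name"], x, y, w, h))
        return out
    total = size(node)
    offset = 0.0  # fraction of the parent rectangle already used
    for child in children:
        frac = size(child) / total if total else 0.0
        if depth % 2 == 0:
            treemap(child, x + offset * w, y, frac * w, h, depth + 1, out)
        else:
            treemap(child, x, y + offset * h, w, frac * h, depth + 1, out)
        offset += frac
    return out


if __name__ == "__main__":
    tree = {
        "name": "root",
        "children": [
            {"name": "a", "value": 1},
            {"name": "b", "children": [
                {"name": "c", "value": 1},
                {"name": "d", "value": 2},
            ]},
        ],
    }
    for rect in treemap(tree, 0, 0, 4, 3):
        print(rect)
```

Because each leaf's area is proportional to its weight, the same traversal could be applied to any hierarchically-structured data, for example an XML document weighted by subtree size.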

Interestingly, both communities are starting to

cooperate more closely with machine learning (ML)

researchers. IM has interacted with ML sporadically over

the years, on such challenges as data mining (pattern

recognition), matching, and discovery. Likewise, HCI

has worked with ML on end-user programming and

intelligent user interfaces. What powerful new systems

and tools could come from pooling all of these diverse

talents?

4. Cooperation Needed?

Arguably, both IM and HCI are fast followers, rather

than innovators. Academic disciplines are much better at

focusing on problems than identifying new opportunities.

Innovation often comes from outsiders, who have

understood the central ideas of the discipline, and use

those ideas as a component of a new solution.

Consider the World-Wide Web, which is now a theme

of any conference on IM or HCI. Clearly it drew on ideas

from hypertext, information retrieval, database, HCI and

others, yet none of those communities can claim to have

given birth to the Web.

In fact, the Web came from the physics community.

Tim Berners-Lee built it to solve a problem in his

community, using simplified techniques from those other

communities. The hypertext community scoffed at the

World-Wide Web: it didn’t even have bi-directional links,

much less links as first-class objects, so how could it be

truly hypertext? The database community was similarly

uninterested. (This wouldn’t be so much fun to talk about

if it had only happened once….)

Neither community was interested in wikis when they

were invented. Content management systems already

existed. Wikis were just a new application. Why should

the IM community care? Likewise, the HCI community

was unimpressed. Where is the new interaction? Aren’t

wikis a step backward from true WYSIWYG editing?

Yet today, wikis have proven to be an important way that

people interact with information, and point to interesting

social phenomena for further study, as well as new

challenges for connecting, navigating and searching for

information.

MySQL and other light-weight databases were much

more interesting for what they left out than for any new

features they added. This is a classic “innovator’s

dilemma” situation: while the established community

attacks the “real” problems (scalability, query

optimization), a lightweight contender attacks a different

market (small, uninteresting databases behind websites)

but continues to improve until it ultimately commoditizes

the original community’s technology and business [16].

But when members of different communities go out of

their way to get to know another community, amazing

results are possible. In 1994, the Stanford Digital Library

Initiative started as a collaboration between the database

group, the HCI group, and the AI group. One professor

from each discipline was a PI on the project, because the

funders were intent on breaking down silos. The link

between database and HCI was particularly fruitful, and

produced a number of theses, some from each area [17].

The sub-project from the Digital Library Initiative that

had the biggest impact didn’t result in a thesis, and didn’t

quite fit with the theme of the larger initiative. In fact, it

wasn’t even viewed as the deepest research, since after

all, the world already had five successful search engine

companies, so what could a new academic initiative

possibly add? But the tight pairing of one student from

HCI and one database student led to Google.

5. Time to Break Down the Silos?

The future, whether of the Web or of the enterprise,

requires us to join forces. A joint community would be

able to make a stronger contribution to today’s challenges

than the individual communities can alone.

One of the most pressing problems in today’s world is

information overload. Information flows at us from all

sides. The volume of email alone has grown to the point


that high-school students view email as something for old

people (email = junk mail). Digital storage

has been growing faster than Moore’s law. Instant

messages are being captured and retained, along with

voice mail, news feeds, stock price trends, and so on.

Thanks to the Web, individuals have access to more

information at home than they had at the best academic

libraries just a decade ago. How can they get any value

out of all this information?

Information overload is a problem that requires skills

from both IM and HCI (and probably other disciplines as

well) to solve. Rather than chipping away at this problem

from separate silos, researchers should form a new

community to attack it. Successful HCI researchers in

this new field will learn to embrace research approaches

and results from the IM field. Successful IM researchers

will become more “touchy-feely.” Not a proposition for

the faint of heart, but a combined attack may get to a

solution more quickly than if either community attacks

the problem alone.

A shared community could help accelerate solutions to

any number of problems. Everyone could benefit from

better decision support for individuals. Which long term

care insurance is best, if any? Which is the best car for a

teenage son to drive? These types of questions require

not only finding relevant information, but presenting it in

a way that allows users to understand and act on it

quickly. Rapid deployment in response to crises similarly

creates a need for appropriate information, but

additionally requires coordinating multiple parties, with

their own information sources, processes and tools. Even

more exciting is the opportunity to reduce death from

medical errors dramatically if the communities work

together. The right information, presented in the right

way, on the right device, at the optimal time, could

literally save lives. There are issues here to keep both

communities, as well as researchers in medical

informatics, gainfully employed for many years to come.

We are testing the value of cross-community

collaboration in a number of projects at Almaden. For

example, Avatar [18] provides semantic search over text.

The text is run past a set of annotators that label portions

of the text with the concepts they represent, e.g.,

“person’s phone number”. But where do the annotators

come from? People build them, so we are leveraging our

HCI colleagues’ skills not only to make the tools for

building annotators user-friendly, but also to apply social

tagging principles to the building of annotators, so that

they can be spread virally.
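To give a flavor of what an annotator does, here is a minimal sketch. It is not Avatar's interface or implementation [18]; the `Annotation` type, the function names, and the simple North American phone-number pattern are assumptions made purely for illustration.

```python
import re
from typing import Callable, List, NamedTuple


class Annotation(NamedTuple):
    concept: str  # the concept the span represents
    start: int    # character offset where the span begins
    end: int      # character offset just past the span
    text: str     # the annotated text itself


# Hypothetical pattern: three digits, three digits, four digits,
# separated by a dash, dot or space (e.g. 408-555-0100).
PHONE = re.compile(r"\b\d{3}[-. ]\d{3}[-. ]\d{4}\b")


def phone_annotator(text: str) -> List[Annotation]:
    """Label every span that looks like a phone number."""
    return [
        Annotation("phone number", m.start(), m.end(), m.group())
        for m in PHONE.finditer(text)
    ]


def annotate(text: str,
             annotators: List[Callable[[str], List[Annotation]]]) -> List[Annotation]:
    """Run the text past each annotator and return all labels in document order."""
    found: List[Annotation] = []
    for annotator in annotators:
        found.extend(annotator(text))
    return sorted(found, key=lambda a: a.start)


if __name__ == "__main__":
    sample = "Call Laura at 408-555-0100 today."
    for annotation in annotate(sample, [phone_annotator]):
        print(annotation)
```

Treating each annotator as a plain function from text to labeled spans keeps them easy to write, share and combine, which is the property that matters if annotators are to be built and spread by many people.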

Indeed, whole new sciences may be born of multi-

disciplinary research. For example, services science is

emerging through the interaction of computer scientists,

mathematicians, and economists [19]. But computer

science is broad enough that intra-disciplinary work is

needed. We believe it is time for a new discipline within

computer science, a new “information interaction”

community. That community should jointly pursue the

information-intensive challenges that are increasingly

facing people today.

6. Acknowledgements

Thanks to Eser Kandogan and Kevin Beyer for their

thoughtful comments on this manuscript.

7. Bibliography

[1] A. Jhingran, “Enterprise Information Mashups: Integrating

Information, Simply”, Proc. VLDB, Seoul, Korea, September

2006, pp. 3-4.

[2] J. Surowiecki, The Wisdom of Crowds, Random House,

2004.

[3] Luis von Ahn and Laura Dabbish. Labeling Images with a

Computer Game. In ACM Conference on Human

Factors in Computing Systems, CHI 2004. Pages 319-326. Try

it at http://www.espgame.org/

[4] D. Gruhl, L. Chavet, D. Gibson, J. Meyer, P. Pattanayak, A.

Tomkins, and J. Zien, “How to build a WebFountain: An

architecture for very large-scale text analytics”, IBM Systems

Journal, (43):1, 2004.

[5] http://battellemedia.com/archives/000428.php

[6] A. Doan, R. Ramakrishnan, F. Chen, P. DeRose, Y. Lee, R.

McCann, M. Sayyadian, and W. Shen. IEEE Data Engineering

Bulletin, Special Issue on Probabilistic Databases, 29(1), 2006.

or try it at http://dblife.cs.wisc.edu/

[7] S.K. Card, J. Mackinlay and B. Shneiderman, Readings in

Information Visualization: Using Vision to Think, Morgan

Kaufmann, 1999.

[8] http://www.cs.umd.edu/hcil/treemap-history/index.shtml

[9] http://babynamewizard.com/namevoyager/lnv0105.html

[10] B.-W. Chang, D. Ungar: “Animation: From Cartoons to the

User Interface.” ACM Symposium on User Interface Software

and Technology 1993: 45-55.

[11] T.P. Moran, A. Cozzi, S.P. Farrell. “Unified activity

management: supporting people in e-business.”

Communications of the ACM, (48), 2005. pp. 67-70.

[12] Trevor, J., Hilbert, D. M., Schilit, B. N., and Koh, T. K.

2001. From Desktop to Phonetop: a UI for Web Interaction on

Very Small Devices. In Proceedings of the 14th Annual ACM

Symposium on User interface Software and Technology

(Orlando, Florida, November 11 - 14, 2001). UIST '01. ACM

Press, New York, NY, 121-130.

[13] ACM Transactions on Accessible Computing,

http://www.is.umbc.edu/taccess/index.html

[14] J. Tidwell, Designing Interfaces, O’Reilly Media, 2005.

[15] H. Lieberman, F. Paternò, and V. Wulf, End-User

Development, Springer, 2006.

[16] C. M. Christensen and M. E. Raynor, The Innovator’s

Solution: Creating and Sustaining Successful Growth, Harvard

Business School Press, 2003.

[17] A. Paepcke, S. B. Cousins, H. Garcia-Molina, S. W.

Hassan, S. P. Ketchpel, M. Roscheisen, and T. Winograd.

"Using Distributed Objects for Digital Library Interoperability."

Computer 29 (May 1996): 61-68


[18] T.S. Jayram, R. Krishnamurthy, S. Raghavan,

S. Vaithyanathan and H. Zhu, Avatar Information Extraction

System, IEEE Data Engineering Bulletin, 2006.

[19] Paul P. Maglio, Savitha Srinivasan, Jeffrey T. Kreulen, and

Jim Spohrer, “Service Systems, Service Scientists, SSME, and

Innovation”, Communications of the ACM, (49), 2007, pp. 81-

85.
