Information for People
Laura M. Haas, Steve B. Cousins
IBM Almaden Research Center
Laura at almaden.ibm.com, scousins at us.ibm.com
Abstract
Ordinary people have access to unprecedented
volumes of information today. Researchers in the fields of
information management (IM) and human-computer
interaction (HCI) are reacting to this challenge from their
own unique perspectives. Having access to a billion
records is cool, but having access to a billion people is
awesome. In this paper, we look at recent research from
both communities, and speculate on how interactions
between the communities could enhance the user
experience of information.
1. Introduction
The Information Management (IM) and Human
Computer Interaction (HCI) communities have
traditionally had different interests, and different claims
to fame. The IM community has worried about storing
vast volumes of data, and supporting the complex,
mission-critical manipulations of data needed by the
business world. Core concerns have included features
and functions, and of course, systems issues, such as
scalability (in both number of users and data volumes),
performance and robustness. The traditional motivations
were to simplify application development (by raising the
level of abstraction beyond bits and bytes), to keep
information safe (meeting the ACID test), and to meet the
needs of the business using the data. As a result, the IM
community’s core technologies include query languages
and query optimization, data models, support for
particular types of data, concurrency control, transaction
processing, security and other “systems” issues.
Meanwhile, the HCI community cares about people,
and, in particular, how to make it easier for people to use
computers. Their core concerns center on
learnability, usability, and accessibility. Since social and
physical issues impact how people work, they think about
supporting collaboration, creating interfaces that work for
people with disabilities, and leveraging human perceptual
abilities when designing user experience. Typical HCI
competencies are the optimization of interactions,
graphical user interfaces, social aspects of interaction,
design, user studies, visualization, multi-modal
interaction, end-user programming and ethnography.
These two communities rarely work together, and in
fact, hardly overlap. A recent call for papers for a major
database conference lists forty major topics, only one of
which is directly relevant to users¹. The rejection rate for
user interface submissions to mainline database
conferences is legendary. At the major HCI conferences,
having a properly conducted user study is de rigueur, and
often the topics of interest to that community would not
attract mainstream information management practitioners.
In fact, the call for papers for an upcoming conference²
reads to an information management person like a
promotional piece about a self-help convention. Not only
are the interests of the two communities distinct, but their
styles are also completely different.
Both goals and styles keep the communities apart.
Where HCI researchers are concerned with making
applications more usable on the front end, IM researchers
work to make the applications easier to implement from
the back end. The applications themselves are often
implemented by a layer of people who come between the
two communities, making it even less likely that these
communities will interact without the help of
intermediaries.
Occasionally, both disciplines get interested in similar
problems, of course. When they do, the difference in the
emphases is instructive. In the late 1970s, for example,
two very different (and yet, very similar) tools for
information storage and analysis were introduced to the
world. From the information management community
came the relational database. Out of the user-focused
community, the spreadsheet emerged. Both are based on
a tabular view of data. In both cases, the idea is credited
to a brilliant researcher, but one (Codd) worked for a
provider of information technology, while the other
(Mattessich) was a business school professor, interested
in helping a particular group of users (accountants).
While the information management folks focused on
retrieval and management for large amounts of data (how
to select just the data desired, and quickly retrieve it), the
inventors of the spreadsheet focused on simple
manipulation of information to accomplish a task. Of
course, the two technologies were later wed, so that
spreadsheets can store information in relational databases,
and relational database users can use spreadsheets to
access their data.
¹ http://www.vldb2007.org/callforpapers.html
² http://www.chi2007.org/
1-4244-0803-2/07/$20.00 ©2007 IEEE. 21
Today, the world has shifted, and the emphasis is no
longer on building standalone applications which can then
be connected to monolithic databases. In fact,
information is likely to be spread over many sources,
including databases, files, and applications. Both
communities are addressing the problem of creating
applications over diverse and distributed sources. The
information management community, for example,
pioneered data federation technologies, while the user-
centric folks created “mashups”. Now, the IM community
is trying to create some of the solid systems advantages of
federation as an underpinning that makes mashups easier
to create and more robust [1].
2. Today’s Challenges
With the coming of Web 2.0, there is a shift from a
world in which there are very few information producers
relative to the number of consumers, to one in which
there is much more end-user created content. People are
more likely to be computer-literate and they are
interacting with information in new and unanticipated
ways. Contributing to advances in this new world
requires knowledge and skills from both the IM and HCI
communities. Challenges include, for example,
information quality (when data is increasingly produced
by random people, how trustworthy is it?), ubiquitous
access to data, and finding information in massive (and
widely distributed), user-created information collections.
Data in the hands of the people. Information
constructed by many people, even if not edited
beforehand, can be very valuable. In fact, it has been
argued that large groups of people are smarter than an
elite few [2]. In recent work in social computing, the HCI
community has been building tools to exploit this
phenomenon, such as delicious³ and ePinions⁴. But how
good is the information? What kind of tools and methods
will make it easier to author high quality information?
The HCI folks approach these problems from a social
engineering perspective, while the IM community has
worked on such issues as tracking provenance and
answering queries in the presence of uncertainty.
Ubiquitous access. The push toward ubiquitous
computing has consumed many HCI cycles. Nowadays,
people take for granted the ability to be connected almost
anywhere, anytime. And that doesn’t always mean from a
laptop. Many of the techniques that permit text to be
entered on small devices, from a phone keypad to stylus
input to thumb keyboards, have been published and
evaluated in the HCI literature. Similarly, HCI folks are
interested in how various sensors can be deployed in
ways that preserve privacy. Recent work has focused on
instrumenting homes of the elderly in order to help them
remain self-sufficient for as long as possible, while
reducing the risk that a fall or other medical emergency
goes undetected. The IM community, meanwhile, has
looked at caching information on mobile devices, at p2p
and networked data management, and at other distributed
information management issues such as processing
queries over streams of sensor data.
³ http://del.icio.us
⁴ http://www.epinions.com
Finding a needle in a (growing) haystack. The
amount of data available online is huge, and growing
constantly. Much of it is produced by people,
unstructured, and with insufficient or inaccurate metadata.
As a result, human input is often needed today to interpret
or filter information. For example, how can a user search
a large corpus of photographs? An IM researcher would
think about how to store and then index them, what
language to use to query them, how to make the metadata
searchable, and how to let people search for photographs
across multiple repositories. He or she might even borrow
from the machine learning community, and think about
how we can categorize the photographs automatically
(i.e., generate metadata). An HCI researcher, on the other
hand, might first ask how to enlist people to label the
pictures. In one clever approach, a video game was
created where people are shown a picture and try to guess
the label [3]. Since it is a large distributed system, many
people are looking at the picture at once. When they
agree, they win, and so does the system because now it
has a label for the photograph!
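The agreement rule behind such a labeling game can be sketched in a few lines. This is only an illustration of the idea described above, not the implementation from [3]: the function name `agreed_label`, the turn-by-turn pairing of guesses, and the lowercase normalization are all assumptions made for the example.

```python
from typing import Iterable, Optional, Set


def agreed_label(guesses_a: Iterable[str],
                 guesses_b: Iterable[str]) -> Optional[str]:
    """Return the first label both players have typed, or None if they never agree.

    Hypothetical helper: the two guess streams are consumed in lockstep,
    as if the players typed one guess per round.
    """
    seen_a: Set[str] = set()
    seen_b: Set[str] = set()
    for a, b in zip(guesses_a, guesses_b):
        # Normalize so that "Park" and "park " count as the same guess.
        seen_a.add(a.strip().lower())
        seen_b.add(b.strip().lower())
        common = seen_a & seen_b
        if common:
            # Players agree: the shared word becomes a label for the picture.
            # If two words match in the same round, pick one deterministically.
            return sorted(common)[0]
    return None


if __name__ == "__main__":
    player_1 = ["dog", "puppy", "park"]
    player_2 = ["animal", "park", "tree"]
    print(agreed_label(player_1, player_2))
```

Neither player sees the other's guesses, so a match is unlikely unless the word really describes the picture; that is what makes the agreed word usable as metadata.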
3. Separate but equal?
Despite these shared challenges, the IM and HCI
communities still work largely separately. The different
values, languages and literatures make cooperation
difficult. Yet in several of the above examples, the work
of the two communities is complementary, and a full
solution to the challenge would need both parts. The
individual communities are doing good work, which
should be exposed to a broader audience.
For example, the IM community has produced a
number of systems that help to collect, analyze or present
particular types of information for specific user groups.
One such system is IBM’s WebFountain [4], which does
a tailored crawl of the Web to find information along a
given theme, then analyzes it, categorizes it, cross-
references it, and exposes it to privileged users via a
powerful (if somewhat arcane) query language. John
Battelle [5] wishes for the power of WebFountain – for
the masses. More recently, DBLife [6] trawls the Web for
information of interest to database researchers, then uses
text analytics, entity resolution, and a model of what
types of events (e.g., talks, paper acceptances) database
researchers are interested in to create a portal for the IM
community. Other examples of IM work that should be
of interest to the HCI community include work on tools
for database administration or schema mapping, text
analytics, privacy and de-identification, to name a few.
Likewise, the HCI community has a sizable literature
on topics of interest to the IM world, starting, of course,
with papers on information visualization [7]. For
example, Treemaps and related techniques provide
mechanisms for visualizing hierarchically-structured
information [8], which should be important when so
much attention in the IM community is focused on XML.
Domain-specific interactive visualizations can be
particularly powerful, a good example being the Baby
Name Visualizer [9]. Beyond visualization, work on
animated interaction [10], activity-centric computing [11],
and interacting with information on small devices [12]
should be read and discussed by IM researchers.
Designers of information systems should know the HCI
literature on design for people with disabilities [13],
usability methods [14], and end-user programming [15].
The HCI community, like the IM community, is also
concerned with supporting particular communities of
users (though they are more interested in social
interaction), and has also done work on privacy, not to
mention applications of concern to both communities,
such as healthcare informatics.
Interestingly, both communities are starting to
cooperate more closely with machine learning (ML)
researchers. IM has interacted with ML sporadically over
the years, on such challenges as data mining (pattern
recognition), matching, and discovery. Likewise, HCI
has worked with ML on end-user programming and
intelligent user interfaces. What powerful new systems
and tools could come from pooling all of these diverse
talents?
4. Cooperation Needed?
Arguably, both IM and HCI are fast followers, rather
than innovators. Academic disciplines are much better at
focusing on problems than identifying new opportunities.
Innovation often comes from outsiders, who have
understood the central ideas of the discipline, and use
those ideas as a component of a new solution.
Consider the World-Wide Web, which is now a theme
of any conference on IM or HCI. Clearly it drew on ideas
from hypertext, information retrieval, database, HCI and
others, yet none of those communities can claim to have
given birth to the Web.
In fact, the Web came from the physics community.
Tim Berners-Lee built it to solve a problem in his
community, using simplified techniques from those other
communities. The hypertext community scoffed at the
World-Wide Web: it didn’t even have bi-directional links,
much less links as first-class objects, so how could it be
truly hypertext? The database community was similarly
uninterested. (This wouldn’t be so much fun to talk about
if it had only happened once….)
Neither community was interested in wikis when they
were invented. Content management systems already
existed. Wikis were just a new application. Why should
the IM community care? Likewise, the HCI community
was unimpressed. Where is the new interaction? Aren’t
wikis a step backward from true WYSIWYG editing?
Yet today, wikis have proven to be an important way that
people interact with information, and point to interesting
social phenomena for further study, as well as new
challenges for connecting, navigating and searching for
information.
MySQL and other light-weight databases were much
more interesting for what they left out than for any new
features they added. This is a classic “innovator’s
dilemma” situation: while the established community
attacks the “real” problems (scalability, query
optimization), a lightweight contender attacks a different
market (small, uninteresting databases behind websites)
but continues to improve until it ultimately commoditizes
the original community’s technology and business [16].
But when members of different communities go out of
their way to get to know another community, amazing
results are possible. In 1994, the Stanford Digital Library
Initiative started as a collaboration between the database
group, the HCI group, and the AI group. One professor
from each discipline was a PI on the project, because the
funders were intent on breaking down silos. The link
between database and HCI was particularly fruitful, and
produced a number of theses, some from each area [17].
The sub-project from the Digital Library Initiative that
had the biggest impact didn’t result in a thesis, and didn’t
quite fit with the theme of the larger initiative. In fact, it
wasn’t even viewed as the deepest research, since after
all, the world already had five successful search engine
companies, so what could a new academic initiative
possibly add? But the tight pairing of one student from
HCI and one database student led to Google.
5. Time to Break Down the Silos?
The future, whether of the Web or of the enterprise,
requires us to join forces. A joint community would be
able to make a stronger contribution to today’s challenges
than the individual communities can alone.
One of the most pressing problems in today’s world is
information overload. Information flows at us from all
sides. The volume of email alone has grown to the point
that high-school students view email as something for old
people (email = junk mail). Digital storage
has been growing faster than Moore’s law. Instant
messages are being captured and retained, along with
voice mail, news feeds, stock price trends, and so on.
Thanks to the Web, individuals have access to more
information at home than they had at the best academic
libraries just a decade ago. How can they get any value
out of all this information?
Information overload is a problem that requires skills
from both IM and HCI (and probably other disciplines as
well) to solve. Rather than chipping away at this problem
from separate silos, researchers should form a new
community to attack it. Successful HCI researchers in
this new field will learn to embrace research approaches
and results from the IM field. Successful IM researchers
will become more “touchy-feely.” Not a proposition for
the faint of heart, but a combined attack may get to a
solution more quickly than if either community attacks
the problem alone.
A shared community could help accelerate solutions to
any number of problems. Everyone could benefit from
better decision support for individuals. Which long term
care insurance is best, if any? Which is the best car for a
teenage son to drive? These types of questions require
not only finding relevant information, but presenting it in
a way that allows users to understand and act on it
quickly. Rapid deployment in response to crises similarly
creates a need for appropriate information, but
additionally requires coordinating multiple parties, with
their own information sources, processes and tools. Even
more exciting is the opportunity to reduce death from
medical errors dramatically if the communities work
together. The right information, presented in the right
way, on the right device, at the optimal time, could
literally save lives. There are issues here to keep both
communities, as well as researchers in medical
informatics, gainfully employed for many years to come.
We are testing the value of cross-community
collaboration in a number of projects at Almaden. For
example, Avatar [18] provides semantic search over text.
The text is run past a set of annotators that label portions
of the text with the concepts they represent, e.g.,
“person’s phone number”. But where do the annotators
come from? People build them, so we are leveraging our
HCI colleagues’ skills not only to make the tools for
building annotators user-friendly, but also to apply social
tagging principles to the building of annotators, so that
they can be spread virally.
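To make the notion of an annotator concrete, here is a minimal sketch of one that labels spans of text with a concept. It is not Avatar's design or API, which this paper does not describe: the `Annotation` record, the `annotate_phone_numbers` function, and the simple `ddd-ddd-dddd` pattern are hypothetical and chosen only for illustration.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Annotation:
    concept: str  # the concept the span represents, e.g. "phone_number"
    start: int    # offset of the first character of the span
    end: int      # offset one past the last character of the span
    text: str     # the labeled span itself


# Hypothetical pattern: matches only numbers written as ddd-ddd-dddd.
PHONE_PATTERN = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")


def annotate_phone_numbers(text: str) -> List[Annotation]:
    """Label every substring that looks like a phone number."""
    return [
        Annotation("phone_number", m.start(), m.end(), m.group())
        for m in PHONE_PATTERN.finditer(text)
    ]


if __name__ == "__main__":
    sample = "Call the help desk at 408-555-0100 before noon."
    for annotation in annotate_phone_numbers(sample):
        print(annotation)
```

A semantic search could then index the `concept` field alongside the text, so that a query for phone numbers matches the span even though the words "phone number" never appear in it.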
Indeed, whole new sciences may be born of multi-
disciplinary research. For example, services science is
emerging through the interaction of computer scientists,
mathematicians, and economists [19]. But computer
science is broad enough that intra-disciplinary work is
needed. We believe it is time for a new discipline within
computer science, a new “information interaction”
community. That community should jointly pursue the
information-intensive challenges that are increasingly
facing people today.
6. Acknowledgements
Thanks to Eser Kandogan and Kevin Beyer for their
thoughtful comments on this manuscript.
7. Bibliography
[1] A. Jhingran, “Enterprise Information Mashups: Integrating
Information, Simply”, Proc. VLDB, Seoul, Korea, September
2006, pp. 3-4.
[2] J. Surowiecki, The Wisdom of Crowds, Random House,
2004.
[3] L. von Ahn and L. Dabbish, “Labeling Images with a
Computer Game”, ACM Conference on Human Factors in
Computing Systems (CHI), 2004, pp. 319-326. Try it at
http://www.espgame.org/
[4] D. Gruhl, L. Chavet, D. Gibson, J. Meyer, P. Pattanayak, A.
Tomkins, and J. Zien, “How to build a WebFountain: An
architecture for very large-scale text analytics”, IBM Systems
Journal, (43):1, 2004.
[5] http://battellemedia.com/archives/000428.php
[6] A. Doan, R. Ramakrishnan, F. Chen, P. DeRose, Y. Lee, R.
McCann, M. Sayyadian, and W. Shen, IEEE Data Engineering
Bulletin, Special Issue on Probabilistic Databases, 29(1), 2006.
Or try it at http://dblife.cs.wisc.edu/
[7] S.K. Card, J. Mackinlay, and B. Shneiderman, Readings in
Information Visualization: Using Vision to Think, Morgan
Kaufmann, 1999.
[8] http://www.cs.umd.edu/hcil/treemap-history/index.shtml
[9] http://babynamewizard.com/namevoyager/lnv0105.html
[10] B.-W. Chang and D. Ungar, “Animation: From Cartoons to
the User Interface”, ACM Symposium on User Interface
Software and Technology (UIST), 1993, pp. 45-55.
[11] T.P. Moran, A. Cozzi, and S.P. Farrell, “Unified activity
management: supporting people in e-business”,
Communications of the ACM, (48), 2005, pp. 67-70.
[12] J. Trevor, D.M. Hilbert, B.N. Schilit, and T.K. Koh, “From
Desktop to Phonetop: a UI for Web Interaction on Very Small
Devices”, Proc. 14th Annual ACM Symposium on User
Interface Software and Technology (UIST ’01), Orlando,
Florida, November 11-14, 2001, ACM Press, New York, NY,
pp. 121-130.
[13] ACM Transactions on Accessible Computing,
http://www.is.umbc.edu/taccess/index.html
[14] J. Tidwell, Designing Interfaces, O’Reilly Media, 2005.
[15] H. Lieberman, F. Paternò, and V. Wulf, End-User
Development, Springer, 2006.
[16] C.M. Christensen and M.E. Raynor, The Innovator’s
Solution: Creating and Sustaining Successful Growth, Harvard
Business School Press, 2003.
[17] A. Paepcke, S.B. Cousins, H. Garcia-Molina, S.W.
Hassan, S.P. Ketchpel, M. Roscheisen, and T. Winograd,
“Using Distributed Objects for Digital Library
Interoperability”, Computer, 29, May 1996, pp. 61-68.
[18] T.S. Jayram, R. Krishnamurthy, S. Raghavan,
S. Vaithyanathan, and H. Zhu, “Avatar Information Extraction
System”, IEEE Data Engineering Bulletin, 2006.
[19] P.P. Maglio, S. Srinivasan, J.T. Kreulen, and J. Spohrer,
“Service Systems, Service Scientists, SSME, and
Innovation”, Communications of the ACM, (49), 2007,
pp. 81-85.