
© Tefko Saracevic 1

Information Science 2005

Tefko Saracevic, PhD
School of Communication, Information and Library Studies
Rutgers University
New Brunswick, New Jersey USA

http://www.scils.rutgers.edu/~tefko

Information science: a short definition

“the science dealing with the efficient collection, storage, and retrieval of information”

Webster

Organization of presentation

1. Big picture – problems, solutions, social place
2. Structure – main areas in research & practice
3. Technology – information retrieval – largest part
4. Information – representation; bibliometrics
5. People – users, use, seeking, context
6. Paradigm shift – distancing of areas
7. Digital libraries – whose are they anyhow?
8. Conclusions – big questions for the future

Scope

α Evolution and state of the field in the last decade of the old and first decade of the new century

1. The big picture

Problems addressed

α Bit of history: Vannevar Bush (1945):
β Defined problem as “... the massive task of making more accessible of a bewildering store of knowledge.”
β Problem still with us & growing

… solution

α Bush suggested a machine: “Memex ... association of ideas ... duplicate mental processes artificially.”
α Technological fix to problem
α Still with us: technological determinant

At the base of information science: Problem

Trying to control content in
α Information explosion
β exponential growth of information artifacts, if not of information itself

PLUS today
α Communication explosion
β exponential growth of means and ways by which information is communicated, transmitted, accessed, used

technological solution, BUT …

applying technology to solving problems of effective use of information

BUT: from a HUMAN & SOCIAL and not only TECHNOLOGICAL perspective

or a symbolic model

Information

Technology

People

Problems & solutions: SOCIAL CONTEXT

α Professional practice AND scientific inquiry related to: effective communication of knowledge records - ‘literature’ - among humans in the context of social, organizational, & individual need for and use of information

α Taking advantage of modern information technology

or as White & McCain put it:

“modeling the world of publications with a practical goal of being able to deliver their content to inquirers [users] on demand.”

Elaboration

α Knowledge records = texts, sounds, images, multimedia, web ... ‘literature’ in given domains
β content-bearing structures – central to information science
α Communication = human-computer-literature interface
β study of information science is the interface between people & literatures
α Information need, seeking, and use = raison d'être

α Effectiveness = relevance, utility

General characteristics

α Interdisciplinarity - relations with a number of fields, some more or less predominant

α Technological imperative - driving force, as in many modern fields

α Information society - social context and role in evolution - shared with many fields

2. Structure

Composition of the field

α Like many fields, information science has different areas of concentration & specialization
α They change, evolve over time
β grow closer, grow apart
β ignore each other, less or more

most importantly different areas…

α receive more or less in funding & emphasis
β producing great imbalances in work & progress
β attracting different audiences & fields
α this includes
β vastly different levels of support for research and
β huge commercial investments & applications

How to view structure?

By decomposing areas & efforts in research & practice, emphasizing Technology or Information or People

Part 3.

Technology

α Identified with information retrieval (IR)
β by far biggest effort and investment
β international & global
β commercial interest large & growing

Information Retrieval – definition & objective

“ IR: ... intellectual aspects of description of information, ... search, ... & systems, machines...”

Calvin Mooers, 1951

α How to provide users with relevant information effectively?

For that objective:
1. How to organize information intellectually?
2. How to specify the search & interaction intellectually?
3. What techniques & systems to use effectively?

Streams in IR Res. & Dev.

1. Information science:
β Services, users, use
β Human-computer interaction
β Cognitive aspects
2. Computer science:
β Algorithms, techniques
β Systems aspects
3. Information industry:
β Products, services, Web
β Market aspects

α Problem:
β relative isolation – discussed later

Contemporary IR research

α Now mostly done within computer science
β e.g. Special Interest Group on IR, Association for Computing Machinery (SIGIR, ACM)
α Spread globally
β e.g. major IR research communities emerged in China, Korea, Singapore
α Branched outside of information science - “everybody does information retrieval”
β data mining, machine learning, natural language processing, artificial intelligence, computer graphics …

Text REtrieval Conference (TREC)

α Started in 1992, now probably ending
β “support research within the IR community by providing the infrastructure necessary for large-scale evaluation”
α Methods
β provides large test beds, queries, relevance judgments, comparative analyses
β essentially using the Cranfield methodology of the 1960s
β organized around tracks
γ various topics – changing over years
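
The Cranfield-style evaluation that TREC builds on compares what a system retrieves for a query against human relevance judgments for that query. The sketch below is only an illustration of that idea in Python: it assumes set-based precision and recall as the measures (the slides do not name specific measures), and the function name and document identifiers are made up for the example.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: document ids returned by the system (hypothetical ids)
    relevant:  document ids judged relevant by human assessors
    """
    retrieved_set = set(retrieved)
    relevant_set = set(relevant)
    hits = len(retrieved_set & relevant_set)  # relevant documents that were retrieved
    precision = hits / len(retrieved_set) if retrieved_set else 0.0
    recall = hits / len(relevant_set) if relevant_set else 0.0
    return precision, recall


# Made-up relevance judgments and system output for one query.
judged_relevant = {"d1", "d3", "d7", "d9"}
system_output = ["d1", "d2", "d3", "d4", "d5"]

p, r = precision_recall(system_output, judged_relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

In a test collection the same computation would be repeated over many queries and averaged, which is what makes comparative analyses across systems possible.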

TREC impact

α International – big impact on creating research communities
α Annual conferences
β report, exchange results, foster cooperation
α Results
β mostly in reports, available at http://trec.nist.gov/
β overviews provided as well
β but, only a fraction published in journals or books

TREC tracks 2004

103 groups from 21 countries

α Genomics with 4 sub tracks
α HARD (High Accuracy Retrieval from Documents)
α Novelty (new, nonredundant information)
α Question answering
α Robust (improving poorly performing topics)
α Terabyte (very large collections)
α Web track

α Previous tracks:
β ad-hoc (1992-1999)
β routing (92-97)
β interactive (94-02)
β filtering (95-02)
β cross language (97-02)
β speech (97-00)
β Spanish (94-96)
β video (00-01)
β Chinese (96-97)
β query (98-00)
β and a few more run for two years only

Broadening of IR – ever changing, ever new areas added

α Cross language IR (CLIR)
α Natural language processing (NLP IR)
α Music IR (MIR)
α Image, video, multimedia retrieval
α Spoken language retrieval
α IR for bioinformatics and genomics
α Summarization; text extraction
α Question answering
α Many human-computer interactions
α XML IR
α Web IR; Web search engines
α DB and IR integration – structured and unstructured data

Commercial IR

α Search engines based on IR
α But added many elaborations & significant innovations
β dealing with HUGE numbers of pages fast
β countering spamming & page rank games – adversarial IR
γ never ending combat of algorithms
α Spread & impact worldwide
β about 2000 engines in over 160 countries
β English was dominant, but not any more

Commercial IR: brave new world

α Large investments & economic sector
β hope for big profits, as yet questionable
α Leading to proprietary, secret IR
β also aggressive hiring of best talent
β new commercial research centers in different countries (e.g. MS in China)
α Academic research funding is changing
β brain drain from academe

IR successfully effected:

α Emergence & growth of the INFORMATION INDUSTRY
α Evolution of IS as a PROFESSION & SCIENCE
α Many APPLICATIONS in many fields
β including on the Web – search engines
α Improvements in HUMAN - COMPUTER INTERACTION
α Evolution of INTERDISCIPLINARITY

IR has a long, proud history

Part 4.

Information

α Several areas of investigation:
β as basic phenomenon – not much progress
γ measures such as Shannon's not successful
γ concentrated on manifestations and effects
β information representation
γ large area connected with IR, librarianship
γ metadata
β bibliometrics
γ structures of literature

Covered in separate lecture: What_is_information.ppt
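
As a small illustration of the Shannon measure mentioned above, the sketch below computes entropy in bits per symbol from character frequencies. It is a minimal example under stated assumptions: the function name and sample strings are invented here, and the closing comparison reflects one common reading of why such measures have not served information science well (they capture statistical properties of signals, not meaning), which the slides themselves do not spell out.

```python
import math
from collections import Counter


def shannon_entropy(symbols):
    """Shannon entropy, in bits per symbol, of a sequence of symbols."""
    counts = Counter(symbols)
    total = len(symbols)
    # H = -sum(p * log2(p)) over the observed symbol frequencies
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical example: a word and the same letters in reverse order.
meaningful = "information"
scrambled = "noitamrofni"

h_meaningful = shannon_entropy(meaningful)
h_scrambled = shannon_entropy(scrambled)
print(round(h_meaningful, 3), round(h_scrambled, 3))
```

Both strings have the same letter frequencies and therefore the same entropy, although only one of them is a meaningful word.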

Part 5.

People

α Professional services
β in organization – moving toward knowledge management, competitive intelligence
β in industry – vendors, aggregators, Internet,
α Research
β user & use studies
β interaction studies
β broadening to information seeking studies, social context, collaboration
β relevance studies
β social informatics

User & use studies

α Oldest area
β covers many topics, methods, orientations
β many studies related to IR
γ e.g. searching, multitasking, browsing, navigation
α Branching into Web use studies
β quantitative & qualitative studies
β emergence of webmetrics

Interaction

α Traditional IR model concentrates on matching, not on the user side & interaction
α Several interaction models suggested
β Ingwersen’s cognitive, Belkin’s episode, Saracevic’s stratified model
β hard to get experiments & confirmation
α Considered key to providing
β basis for better design
β understanding of use of systems
α Web interactions a major new area
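
To make concrete what "concentrates on matching" means in the traditional IR model, here is a deliberately simple sketch: documents are ranked only by how many query terms they share with the query. The scoring rule, function names, and sample documents are assumptions for illustration, not a model described in the slides.

```python
def match_score(query, document):
    """Number of distinct query terms that also occur in the document."""
    query_terms = set(query.lower().split())
    doc_terms = set(document.lower().split())
    return len(query_terms & doc_terms)


def rank(query, documents):
    """Return ids of matching documents, best match first (ties broken by id)."""
    scored = [(match_score(query, text), doc_id) for doc_id, text in documents.items()]
    ordered = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for score, doc_id in ordered if score > 0]


# Hypothetical mini-collection.
documents = {
    "d1": "digital libraries and preservation",
    "d2": "information retrieval evaluation",
    "d3": "evaluation of digital libraries",
}
query = "digital libraries evaluation"

print(rank(query, documents))
```

Nothing in this procedure refers to who is asking, why, or in what situation, which is the gap the interaction models listed above try to address.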

Information seeking

α Concentrates on the broader context, not only IR or interaction: people as they move in life & work

α Based on concept of social construction of information

α Most active area, particularly in Europe, with annual conferences

Information seeking: sampling of theories, models

α Why people seek information:
β Taylor’s stages of information need
β Dervin’s Sense-Making – gap, bridge
β Belkin’s Anomalous State of Knowledge
β Chatman’s life in the round – inf. poverty
α How people seek information:
β Wilson’s General Model of inf. seeking
β Bates’ berrypicking – acts in searching
β Kuhlthau’s information search process
β Chang’s browsing model
β Benoit’s communicative action - Habermas

Part 6. Paradigm split in technology - people

α Split from early 80’s to date into two orientations
β System-centered
γ algorithms, TREC
γ continue traditional IR model
β Human-(user)-centered
γ cognitive, situational, user studies
γ interaction models, some started in TREC
α These became almost separate universes – one based in computer science, the other in information science & librarianship

Critiques, cultures

α Number of critiques (e.g. Dervin & Nilan) about isolated systems approach
β calls for user-centered approaches, designs & evaluation
α But user-centered studies did not deliver very useful design pointers, guides
α Very different cultures:
β computer science has own, more science & technology oriented
β information science more humanities oriented
β C.P. Snow’s two cultures

Human vs. system

α Human (user) side:
β often highly critical, even one-sided
β mantra of implications for design
β but does not deliver concretely
α System side:
β mostly ignores user side & studies
β ‘tell us what to do & we will’
α Issue NOT H or S approach
β even less H vs. S
β but how can H AND S work together
β major challenge for the future

Reconciliation?

α Several efforts to provide human-centered design
β but more discussion than real application
α Integration of information seeking and information retrieval in context (Ingwersen & Järvelin)
α Research & development toward
β using search context, improving user search experiences & search quality
β machine learning, incorporating semantics

Funding

α Most funding goes toward systems side & computer science
β most (very large %) support for system work

α In the digital age support is for digital

α True globally

Part 7. Digital libraries

LARGE & growing area

α “Hot” area in R&D
β a number of large grants & projects in the US, European Union, & other countries up to now;
β will it continue? It is not growing
β but “DIGITAL” big & “libraries” small
α “Hot” area in practice
β building digital collections, hybrid libraries,
β many projects throughout the world
β growing at a high rate

Technical problems

α Substantial - larger & more complex than anticipated:
β representing, storing & retrieving of library objects
γ particularly if originally designed to be printed & then digitized
β operationally managing large collections - issues of scale
β dealing with diverse & distributed collections
γ interoperability
β assuring preservation & persistence
β incorporating rights management

Digital Library Initiatives in the US (DLI)

α Research consortia under National Science Foundation
β DLI 1: 1994-98, 3 agencies, $24M, six large projects
β DLI 2: 1999-2006, 8 agencies, $60+M, 77 large & small projects in various categories
α ‘digital library’ left undefined, so as to cover many topics & stretch ideas
β not constrained by practice

European Union

α DELOS Network of Excellence on Digital Libraries
β many projects throughout European Union
γ heavily technological
β many meetings, workshops
β resembles DLIs in the US
β well funded, long range

Research issues

β understanding objects in DL
γ representing in many formats
γ non-textual materials
β metadata, cataloging, indexing
β conversion, digitization
β organizing large collections
β federated searching over distributed (various) collections
β managing collections, scaling
β preservation, archiving
β interoperability, standardization
β accessing, using,

DL projects in practice

α Heavily oriented toward a variety of institutions – primarily libraries
β but also museums, professional societies, specific domains, etc.
α Main orientation: institutional missions, contexts, finances
β sustainability, preservation in real world
β managing growth, rights, access

Agendas

α Most DL research agendas are set from top down
β from funding agencies to projects
β imprint of the computer science community's interest & vision
α Most DL practice agendas are set from bottom up
β from institutions, incl. many libraries
β imprint of institutional missions, interests & vision
γ providing access to specialized materials and collections from an institution(s) that are otherwise not accessible
γ covering in an integral way a domain with a range of sources

Connection?

α DL research & DL practice presently are conducted
β mostly independent of each other,
β minimally informing each other,
β & having slight, or no connection
α Parallel universes with little connection & interaction

8. Conclusions

IS contributions

α IS effected handling of inf. in society
α Developed an organized body of knowledge & professional competencies
α Applied interdisciplinarity
α IR reached a mature stage
α IR penetrated many fields & human activities
α Stressed HUMAN in human-computer interaction

Challenges

α Adjust to the growing & changing social & organizational role of inf. & related inf. infrastructure

α Play a positive role in globalization of information

α Respond to technological imperative in human terms

α Respond to changes from inf. to communication explosion - bringing own experiences to resolutions, particularly to the INTERNET

α Join competition with quality
α Join DIGITAL with LIBRARIES

Juncture

α IS is at a critical juncture in its evolution
α Many fields, groups ... moving into information
β big competition
β entrance of powerful players
β fight for stakes
α To be a major player IS needs to progress in its:
β research & development
β professional competencies
β educational efforts
β interdisciplinary relations

α Reexamination necessary

Bibliography

Bates, M. J. (1999). Invisible Substrate of Information Science. Journal of the American Society for Information Science, 50, 1043-1050.

Bush, V. (1945). As We May Think. Atlantic Monthly, 176 (11), 101-108. Available: http://www.theatlantic.com/unbound/flashbks/computer/bushf.htm

Hjørland, B. (2000). Library and Information Science: Practice, Theory, and Philosophical Basis. Information Processing & Management, 36 (3), 501-531.

Pettigrew, K.E. & McKechnie, L.E.F. (2000). The use of theory in information science research. Journal of the American Society for Information Science and Technology, 52 (1), 62 - 73.

Saracevic, T. (1999). Information Science. Journal of the American Society for Information Science, 50 (9) 1051-1063. Available: http://www.scils.rutgers.edu/~tefko/JASIS1999.pdf

Saracevic, T. (2005). How were digital libraries evaluated? Presentation at the course and conference Libraries in the Digital Age (LIDA), 30 May-3 June 2005, Dubrovnik, Croatia. Available: http://www.scils.rutgers.edu/~tefko/DL_evaluation_LIDA.pdf

Webber, S. (2003). Information Science in 2003: A Critique. Journal of Information Science, 29 (4), 311-330.

White, H. and McCain, K. (1998). Visualizing a Discipline: An Author Co-citation Analysis of Information Science 1972-1995. Journal of the American Society for Information Science, 49 (4), 327-355.