
Information Highlighting


A study of the discourse-analytical and other textual criteria people use to select words when they are highlighting a text for others.


Page 1: Information Highlighting

Information Highlighting

Coping with the deluge of data.

Technology developed during the cold war is helping organisations to cope with the information explosion

Analysts at the Meta Group estimate that the amount of private information stored globally doubles every 12-14 months. This information explosion is the result of the proliferation of personal computers, which has allowed individuals and workgroups to create documents and manage information to meet their own needs. As a result, there is a massive amount of information which needs to be retrieved and shared internally.

“There is now a greater volume of information than can be searched manually,” says Mr Philippe Courtot, chairman and chief executive of Verity (http://www.verity.com), a leading provider of search and retrieval applications in enterprise computing.

“Surfing the Internet is impractical with so much data, so you need a different metaphor. Users need information presented to them in a way which is personal.” Verity was formed as a spin-off from Advanced Decision Systems (ADS), a US government-funded project to automate the process of finding information.

ADS created a software technology which reads documents, allowing users to find stored information in response to a specific query. It can also monitor incoming documents to find anything which is of interest to individual users. Because the entire document is read, the results are always accurate and are delivered in order of relevance to the user. The commercial product that Verity has come up with is “Topic”.

Vast electronic archives

The original project was launched because the US Central Intelligence Agency was interested in using technology to help it find information in its vast electronic archives. “Topic was soon used in the White House and by the National Security Council,” says Mr Courtot. “It was a natural move to the US government and to world security agencies. From there, it soon moved to large corporations.”

Verity was ideally poised to help users cope with the enormous volume of information on the Internet. “We knew it was important, so we sought out Netscape as a partner,” says Mr Courtot.

“If you know what you are looking for, you can describe it in words and Topic will find it quickly,” explains Mr Courtot. “The problem is if you don’t know what information you want, because you don’t have the words to describe it.

“Information used to be found by asking corporate librarians to gather it together. They did this on an iterative basis, as they searched through catalogues and indexes, refining their queries as they worked to improve the quality of the result. This doesn’t work any more because there is too much information.”

Organisations now have to give users direct access to information because they can no longer afford to have specialist corporate librarians do the searching. Software vendors such as Verity give end-users tools to navigate through the available information without reading it, guiding them down a path by offering choices such as: “Do you want Europe or North America?”

Mr Courtot is keen to point out that Topic is far more effective than the popular Internet search engines because it reads each document and therefore returns a more accurate answer to queries. “The search engines on the Web tend to return thousands of irrelevant answers,” he explains. “If you type in ‘President of the United States’, Lycos and Yahoo! will give you 10,000 answers and the first few may not even mention Bill Clinton.”

The technological challenge which Verity faces is considerable. Users need to be able to search, retrieve and filter information in the enterprise, in online databases or across the Internet. The technology also needs to cope with dissimilar document types, incompatible information sources and geographically dispersed datastores.

Mr Courtot has already approached these problems by introducing a large partnering programme. More than 100 applications are indexing their information with the Verity format, including Documentum, Informix, Lotus Notes, Netscape, PC Docs and Sybase.

Verity has partnership agreements with IT vendors, such as AT&T, Compaq, Microsoft, Object Design, SAP, SCO and Tandem, as well as with information providers, including Knight Ridder, Time Warner Pathfinder and FT Profile, a sister company to the Financial Times.

The accelerating growth in the amount of information is going to create problems. Ninety per cent of inventions have been made in the last 50 years, and 90 per cent will be made in the next 25, predicts Mr Courtot. “The answer is for the system to categorise information by understanding the nature of the document. Computers will never be perfect at categorising, so you must ask the publisher to categorise it. You need the system to automatically create categories and an abstract, which the author can then edit. We have to minimise the iterative process which the CIA were using.”

The concept of searching for information, rather than reading through it, raises some important issues. Scanning a newspaper or other document may expose a reader to new ideas and stimulate innovation, a process which may be lost if we use computerised tools, such as Topic. “Innovation is in the individual, not the information,” says Mr Courtot. “Users will get more knowledge as they browse information structures by discovering a new category.”

Expanding intelligence

“The computer age expands the opportunities for humans to innovate,” he adds. “Part of man’s evolution has been the ability to use tools and learn new ways to apply them. Today, Microsoft’s Encarta encyclopaedia is a good start in presenting knowledge.

“Eventually, with virtual reality and human gene mapping, we will extend our lives. When we can decode the genes, virtual reality will give us a major new tool to shape the brain. Human intelligence hasn’t grown very much, but I believe it will.”

With ever-increasing volumes of information, users face the danger of not having all the information relevant to an important decision, so products such as Topic are going to become increasingly important.

“With information becoming available at an accelerating rate, the challenge is to find the right type of information with minimal effort,” concludes Mr Courtot. “If we don’t, decision-making will become stifled by the demands of finding and managing the information needed.”

Philippe Courtot, a Frenchman born and raised in the Basque Country, earned degrees in electrical engineering and physics at the University of Paris.

A former chief executive of Thomson-CGR Medical Corporation (now a division of GE Medical), he received the Benjamin Franklin Award from the Saturday Evening Post for his role in a national awareness campaign that reached more than 75m people with the life-saving benefits of mammography screening.

His association with Verity began during his tenure as president and chief executive of cc:Mail.

FT 7 May 1997

Tim Ostler, Cognitive Architecture, Anaphora Ltd

[email protected]@cogarch.com

InfoVis’99, London, 16 July 1999

Page 2: Information Highlighting

Summary

1 Highlighters

2 Highlighting as information visualisation

3 Past studies of visual cueing

4 User study

5 Heuristics

6 Identifying discourse markers

7 “Given” and “new” information

8 Future directions

Page 3: Information Highlighting

1 Highlighters


1 Origins

2 Cognitive function

3 Highlighting for others

Page 4: Information Highlighting

Highlighters 1/3: Origins

1960s: use of yellow fibre or felt pens to highlight text begins in the USA

1971: Schwan-Stabilo of West Germany launches first fluorescent highlighter pen

Page 5: Information Highlighting

Highlighters 2/3: Cognitive function

Highlighting feels as though it helps revising, perhaps by encoding or priming material for incorporation into long-term memory

Partly confirmed by research: Hult et al. (1984) found that note-taking does involve semantic encoding

Page 6: Information Highlighting

Highlighters 3/3: Highlighting for others

Also used to mark up a text for selective attention of another person

This function chosen for study because of clear application to information overload

Conducted user study to define suitable heuristics for text selection


Page 8: Information Highlighting

2 Highlighting as information visualisation

1 Syntax highlighting

2 SeeSoft

3 TextLight

4 Readers vs. authors

Page 9: Information Highlighting

Highlighting as info visualisation 1/4: Syntax highlighting

Highlighting can be seen as a means of visualising the logical or conceptual structure of a text
– Enhances understanding of text
– Guides eye to most important passages

Principle is widely demonstrated by the syntax highlighting in text editors for programmers
– Useful: the need to visualise logical structure is acute
– Easy: programming languages offer a finite and precise set of cues for editors to detect and colour
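To make the “finite and precise set of cues” point concrete, here is a minimal Python sketch of the detection half of syntax highlighting. The keyword set and token rules describe a tiny hypothetical language, not any particular editor's grammar; the function only classifies tokens and leaves the choice of colours to the display.

```python
import re

# Hypothetical lexicon for a tiny language: the point is that the cues
# an editor must detect form a small, closed, precisely defined set.
KEYWORDS = {"if", "else", "while", "return", "def"}

TOKEN_RE = re.compile(
    r"(?P<comment>#[^\n]*)"          # comment running to end of line
    r'|(?P<string>"[^"\n]*")'        # double-quoted string literal
    r"|(?P<word>[A-Za-z_]\w*)"       # identifier or keyword
    r"|(?P<other>.)",                # any other single character
    re.DOTALL,
)

def highlight(source: str) -> list[tuple[str, str]]:
    """Split source into (category, text) pairs that an editor could colour."""
    spans = []
    for m in TOKEN_RE.finditer(source):
        kind, text = m.lastgroup, m.group()
        if kind == "word":
            kind = "keyword" if text in KEYWORDS else "identifier"
        spans.append((kind, text))
    return spans
```

Because every character falls into exactly one category, joining the token texts reproduces the source; natural-language text offers no such closed set, which is the contrast the slide draws.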

Page 10: Information Highlighting

Highlighting as info visualisation 2/4: SeeSoft

One of a suite of text structure visualisation tools from a team led by Stephen Eick at Lucent (formerly Bell) Laboratories

Each line of code reduced to a line of single-pixel thickness, coloured according to a range of user-specified criteria

Thousands of lines of code can be displayed on the screen at once
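A rough sketch of the SeeSoft reduction, assuming each line arrives with one numeric statistic (for example, days since the line was last changed) and a hypothetical four-colour ramp. It keeps each line's indentation and length, which is what lets the shape of the code survive at one row per line; the names `Row`, `RAMP`, `seesoft_rows` and `render` are illustrative and not part of SeeSoft itself.

```python
from dataclasses import dataclass

@dataclass
class Row:
    indent: int   # leading spaces, kept so the shape of the code survives
    length: int   # visible characters after the indent
    colour: str   # colour picked from the line's statistic

# Hypothetical low-to-high colour ramp; SeeSoft let users choose the criterion.
RAMP = ["blue", "green", "yellow", "red"]

def seesoft_rows(lines, values, lo, hi):
    """Reduce each line of code to one thin row coloured by its value."""
    rows = []
    for line, value in zip(lines, values):
        stripped = line.lstrip(" ")
        indent = len(line) - len(stripped)
        # Map value linearly from [lo, hi] onto a ramp index, clamping the ends.
        frac = (value - lo) / (hi - lo) if hi > lo else 0.0
        idx = min(len(RAMP) - 1, max(0, int(frac * len(RAMP))))
        rows.append(Row(indent, len(stripped.rstrip()), RAMP[idx]))
    return rows

def render(rows):
    """Text stand-in for the display: one character per column, one row per line."""
    return [" " * r.indent + r.colour[0] * r.length for r in rows]
```

On a real display each row would be a single pixel high, so a screen of, say, 1,000 pixel rows could show 1,000 lines of code at once.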

Page 11: Information Highlighting

Highlighting as info visualisation 3/4: TextLight

TextLight
– Conceived as a tool to:
    Detect certain attributes of a text’s cognitive structure
    Encode them in visual, non-lexical form
    Superimpose them in place on the corresponding text
– Like a GIS, can reveal attributes of its data set that would otherwise be obscured, throwing the underlying structure into high relief
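The detect, encode and superimpose steps can be sketched as follows. The two detectors look for a handful of discourse markers and are assumptions chosen purely for illustration; they are not TextLight's actual rules or the heuristics derived from the user study. Encoding is done by naming a CSS class per attribute, and superimposing by wrapping each match in place in an HTML `<mark>` element.

```python
import html
import re

# Illustrative attribute detectors (assumptions, not TextLight's rules):
# each maps an attribute name to a pattern that finds it in the text.
DETECTORS = {
    "contrast": re.compile(r"\b(?:but|however|although)\b", re.IGNORECASE),
    "cause": re.compile(r"\b(?:because|therefore|so)\b", re.IGNORECASE),
}

def detect(text):
    """Step 1: find (start, end, attribute) spans, sorted by position."""
    spans = []
    for attr, pattern in DETECTORS.items():
        spans += [(m.start(), m.end(), attr) for m in pattern.finditer(text)]
    return sorted(spans)

def superimpose(text, spans):
    """Steps 2-3: encode each span as a CSS class and wrap it in place."""
    out, pos = [], 0
    for start, end, attr in spans:
        if start < pos:  # skip overlapping spans so the markup stays well-formed
            continue
        out.append(html.escape(text[pos:start]))
        out.append(f'<mark class="{attr}">{html.escape(text[start:end])}</mark>')
        pos = end
    out.append(html.escape(text[pos:]))
    return "".join(out)
```

For example, `superimpose(s, detect(s))` on a sentence containing “therefore” wraps that word in `<mark class="cause">` and leaves the rest of the text untouched, so the visual layer sits directly on the words it describes.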

Page 12: Information Highlighting

Highlighting as info visualisation 4/4: Readers vs. authors

For readers, no benefit from using different colours for different categories of “new” information

But for authors and text analysts, extending TextLight to identify text attributes is as valuable as colouring different CAD layers is to architects

Revealing the pattern of distribution of attributes such as readability or level of completion, like a knowledge discovery system for authors
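One way an author-facing tool might reveal how readability is distributed is to score each paragraph and label it. This sketch swaps in mean sentence length (words per sentence) as a crude readability proxy and uses a hypothetical threshold of 25 words; a real tool would probably use a fuller measure, and the sentence splitter here is deliberately naive.

```python
import re

def sentence_lengths(paragraph):
    """Words per sentence, splitting naively on runs of . ! ? (an assumption)."""
    sentences = [s for s in re.split(r"[.!?]+", paragraph) if s.strip()]
    return [len(s.split()) for s in sentences]

def readability_bands(text, long_threshold=25.0):
    """Score each blank-line-separated paragraph by mean sentence length
    and label it 'dense' or 'easy', so an author can see where hard
    passages cluster."""
    bands = []
    paragraphs = [p for p in text.split("\n\n") if p.strip()]
    for i, para in enumerate(paragraphs):
        lengths = sentence_lengths(para)
        mean = sum(lengths) / len(lengths) if lengths else 0.0
        bands.append((i, round(mean, 1), "dense" if mean > long_threshold else "easy"))
    return bands
```

Laid out paragraph by paragraph, the bands act like a CAD layer: an author can switch them on to see where dense passages cluster, then switch them off to read the text normally.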


Page 14: Information Highlighting

3 Past studies on visual cueing


Philippe Courtot, a Basque born and raised Frenchman, earned degrees in electricalengineering and physics at the University of Paris.

A former chief executive of Thomson-CGR Medical corporation — now a division ofOE Medical — his personal achievements include the Benjamin Franklin Award from theSaturday Evening Post for his role in promoting a national awareness campaign in reachingmore than 75m people in the promotion of the lifes-aving benefits of mammographyscreening.

His association with Verity began during his tenure as president and chief executive ofcc:Mail.

FT 7 May 1997

1 Judging importance

2 Choosing words 1

3 Choosing words 2

4 Core content

5 How many words?

6 Large variance

Page 15: Information Highlighting

Past studies 1/6: Judging importance

Hubert Dreyfus: is the ability to tell the important from the unimportant a fundamentally human cognitive operation?

Perhaps, but in some genres there is widespread agreement on the signals for different stages in a discourse

So while we can’t tell what seems important to every person, we can assess what is being presented as important

Page 16: Information Highlighting

Past studies 2/6: Choosing words 1

Weakness of all research: no formal rules on which text to cue

– Foster (1979): 26 students and lecturers were given a 3400-word text and asked to underline the sentences containing the key ideas the author was trying to put over

– Half the subjects were told to underline no more than 16 sentences, half no more than 8

– First case: 213 selections spanned 80 sentences, with only 9 sentences selected by 6 or more subjects

– Second case: 102 selections were distributed over 52 sentences, with only 2 selected by 6 or more subjects

Foster’s conclusion: it is difficult to identify sections for cueing
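Foster's figures for each condition (total selections, distinct sentences chosen, and sentences chosen by six or more subjects) are simple tallies over the raw underlining data. Below is a minimal Python sketch. It assumes each underlined sentence is recorded as a (subject, sentence) pair; the function name and the data layout are illustrative and are not taken from Foster's study.

```python
from collections import Counter


def agreement_summary(selections, min_votes):
    """Tally underlining choices as totals, spread and agreement.

    selections: iterable of (subject_id, sentence_index) pairs, one pair
    for each sentence a subject underlined (a hypothetical data layout).
    Returns (total selections, distinct sentences chosen,
             sentences chosen by at least min_votes subjects).
    """
    pairs = set(selections)  # count each subject at most once per sentence
    votes = Counter(sentence for _, sentence in pairs)
    widely_agreed = sum(1 for n in votes.values() if n >= min_votes)
    return len(pairs), len(votes), widely_agreed


# Toy data: three subjects, sentences numbered from 0
toy = [(1, 3), (1, 7), (2, 3), (2, 8), (3, 3), (3, 7)]
print(agreement_summary(toy, min_votes=2))  # (6, 3, 2)
```

Under this layout, Foster's first condition with `min_votes=6` would correspond to a result of (213, 80, 9).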

Page 17: Information Highlighting

Past studies 3/6: Choosing words 2

Other experiments

– Klare et al (1955) cued single words

– Dearborn et al (1949) emphasised the word carrying the "peak stress" in a sentence (but did not describe how the word was selected)

– Crouse & Ildstein (1972) cued statements or sentences

Page 18: Information Highlighting

Past studies 4/6: “Core” content

Most specific suggestions by Hershberger & Terry (1965)

– “Core” content made up 1/3 of total text length:

New key words

Familiar key words

Key statements

Basic core statements

Key examples

Rephrasing of key statements

Page 19: Information Highlighting

Past studies 5/6: How many words?

Crouse & Ildstein (1972)

– Density of cued material influences its effect

Foster (1979)

– Optimal proportion of text to be highlighted still not established

Page 20: Information Highlighting

Past studies 6/6: Large variance

Fowler & Barker (1974)

– Pointed to the large variance (4% to 32%) observed in the proportion of text highlighted by members of the test group who were asked to highlight for themselves

Rickards & August (1975)

– When asked to highlight passages of structural importance, test subjects all chose passages that Rickards & August considered relatively unimportant

Page 21: Information Highlighting

Summary

1 Highlighters

2 Highlighting as information visualisation

3 Past studies of visual cueing

4 User study

5 Heuristics

6 Identifying discourse markers

7 “Given” and “new” information

8 Future directions

Page 22: Information Highlighting

4 User study


1 Experimental procedure

2 Analytical procedure

3 Analysis of results

4 Observations

Page 23: Information Highlighting

User study 1/4: Experimental procedure

11 subjects were given an 1111-word article from the Financial Times IT supplement, with instructions to imagine they were corporate librarians identifying the key points in an article for a board member

Questionnaire sought:

– Subjects’ past experience of highlighting

– Criteria for text selection

– At what points they made their selection

– Other comments

Page 24: Information Highlighting

User study 2/4: Analytical procedure

Article entered down the left axis of a spreadsheet, spanning 1111 rows (one word per row)

Attributes for each word (36 categories) entered along the top of the spreadsheet

For each word, the probability of lying in a highlighted passage given as a decimal figure between 0 and 1

All other parameters rebased to fall between 0 and 1

This gave the correlation of any given parameter with the probability that a word fell within a highlighted group of words
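The spreadsheet procedure amounts to three steps. First, compute each word's probability of being highlighted as the fraction of subjects who marked it. Second, rescale each attribute column to the range 0 to 1. Third, correlate each column with that probability. The minimal Python sketch below makes two assumptions: highlighting is recorded as one 0/1 row per subject, and "rebased" means min-max scaling. The attribute names are hypothetical stand-ins for the 36 categories.

```python
from math import sqrt


def rebase(values):
    """Min-max rescale a column so that it falls between 0 and 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)  # a constant column carries no information
    return [(v - lo) / (hi - lo) for v in values]


def pearson(xs, ys):
    """Pearson correlation coefficient; None if either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return None
    return cov / (sx * sy)


def highlight_probability(highlights):
    """highlights[s][w] is 1 if subject s highlighted word w, else 0.
    Returns the fraction of subjects who highlighted each word."""
    n_subjects = len(highlights)
    return [sum(column) / n_subjects for column in zip(*highlights)]


def attribute_correlations(attributes, probability):
    """attributes maps an attribute name to one value per word (one
    spreadsheet column); returns each column's correlation with probability."""
    return {name: pearson(rebase(column), probability)
            for name, column in attributes.items()}


# Toy example: 3 subjects, 4 words, two hypothetical attribute columns
highlights = [[1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 1, 0]]
probability = highlight_probability(highlights)
print(attribute_correlations({"word_length": [9, 7, 5, 3],
                              "present_tense": [1, 0, 1, 0]}, probability))
```

Pearson's r is unchanged by a positive linear rescaling. The rebasing step therefore only makes the columns comparable on the sheet; it does not alter the correlations themselves.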

Page 25: Information Highlighting

User study 3/4: Analysis of results

Results show wide variance in the number of words highlighted

– Minimum of 50 (4.5%)

– Maximum of 396 (35.64%)

– (Fowler & Barker 1974: 4-32%)

Marked difference between male and female subjects

– Males averaging 15%

– Females 25.5%

Little correlation between part of speech/syntactic role and probability of highlighting

Noticeable association with longer words
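The percentages on this slide are word counts divided by the article's 1111 words. The small Python check below assumes that each group average is the mean of the subjects' individual percentages; the slide does not say how the male and female figures were averaged. The function names are illustrative.

```python
from statistics import mean

ARTICLE_WORDS = 1111  # length of the test article used in the study


def highlighted_share(words_highlighted, article_words=ARTICLE_WORDS):
    """Percentage of the article that a subject highlighted."""
    return 100 * words_highlighted / article_words


def mean_share_by_group(counts_by_group, article_words=ARTICLE_WORDS):
    """Average of each subject's own percentage within each group
    (an assumption: the slide does not say how group means were taken)."""
    return {group: mean(highlighted_share(c, article_words) for c in counts)
            for group, counts in counts_by_group.items()}


print(f"{highlighted_share(50):.2f}%")   # 4.50%
print(f"{highlighted_share(396):.2f}%")  # 35.64%
```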

Page 26: Information Highlighting

User study 4/4: Observations

None of the subjects made highlighting decisions before having read at least one paragraph

Large majority (70%) delayed highlighting until the whole passage had been read

Conclusion: decisions are made at a discourse-analytical and not a strictly linguistic level

Page 27: Information Highlighting

Summary

1 Highlighters

2 Highlighting as information visualisation

3 Past studies of visual cueing

4 User study

5 Heuristics

6 Identifying discourse markers

7 “Given” and “new” information

8 Future directions

Page 28: Information Highlighting

5 Heuristics


1 Correlation with average choice

2 Key correlations

3 Best heuristics

4 Highlighting by humans 1

5 Highlighting by humans 2

6 Highlighting by best heuristics

7 Performance of best heuristics

Page 29: Information Highlighting

Heuristics 1/7: Correlation with average choice

Average correlation between any one person’s highlighting decisions and the scores for probability of given words being highlighted was 0.44

For any individual word, probability varied between 0 and 0.83, offering clear guidelines for assessing any trial selection criteria
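Each person's choices form a 0/1 value per word. That vector can be correlated with the per-word probability of highlighting, and the results averaged over subjects. The minimal Python sketch below assumes that the probability is computed over all subjects, including the one being scored; the slide does not say whether a leave-one-out variant was used. The data are hypothetical.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient; None if either column is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return None
    return cov / (sx * sy)


def correlation_with_average(highlights):
    """highlights[s][w] is 1 if subject s highlighted word w, else 0.
    Returns (mean of the subjects' correlations with the per-word
    probability, lowest word probability, highest word probability)."""
    n = len(highlights)
    probability = [sum(column) / n for column in zip(*highlights)]
    scores = [pearson(row, probability) for row in highlights]
    scores = [s for s in scores if s is not None]  # skip all-or-nothing subjects
    return sum(scores) / len(scores), min(probability), max(probability)


# Toy data: 3 subjects, 4 words
print(correlation_with_average([[1, 1, 0, 0], [1, 0, 0, 0], [1, 1, 1, 0]]))
```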

Page 30: Information Highlighting

Heuristics 2/7: Key correlations

[Bar chart: correlation with probability of highlighting (scale 0 to 0.6) for: combination of best criteria; first statement in discourse segment; proximity to start of sentence; “solution” stage; first statement in quote; present tense; list status; proximity to start of paragraph]

Page 31: Information Highlighting

Heuristics 3/7: Best heuristics

Most successful heuristics:

1 Word should be part of the first statement in a discourse segment

2 Word should be part of the first statement in any quote that is not an immediate continuation of a previous quote

3 Word should be part of a list

4 Word should be part of the “solution” stage
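Once a text carries discourse annotations, the four heuristics reduce to simple per-word tests. The Python sketch below assumes those annotations already exist: segment and statement ids, quote membership, list membership and a stage label. The `Word` fields are hypothetical, and producing them automatically is the open problem the next slides raise. The slide also does not say how the heuristics were combined; counting how many each word satisfies is only one illustrative choice.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Word:
    text: str
    segment: int                   # discourse segment id
    statement: int                 # statement id, increasing through the text
    quote: Optional[int] = None    # quote id, or None outside quotes
    quote_statement: int = 0       # position of the statement within its quote
    continues_quote: bool = False  # quote immediately continues a previous one
    in_list: bool = False
    stage: str = ""                # discourse stage label, e.g. "solution"


def heuristic_flags(words):
    """Return, for each word, which of the four heuristics it satisfies."""
    first_statement = {}
    for w in words:  # earliest statement id seen in each segment
        first_statement[w.segment] = min(first_statement.get(w.segment, w.statement), w.statement)
    return [{
        "first_in_segment": w.statement == first_statement[w.segment],
        "first_in_quote": (w.quote is not None and not w.continues_quote
                           and w.quote_statement == 0),
        "in_list": w.in_list,
        "solution_stage": w.stage == "solution",
    } for w in words]


def heuristic_score(words):
    """Hypothetical combination: number of heuristics each word meets (0-4)."""
    return [sum(flags.values()) for flags in heuristic_flags(words)]


words = [Word("a", 0, 0), Word("b", 0, 1),
         Word("c", 1, 2, quote=0, stage="solution"),
         Word("d", 1, 3, quote=0, quote_statement=1)]
print(heuristic_score(words))  # [1, 0, 3, 0]
```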

Page 32: Information Highlighting

Heuristics 4/7: Highlighting by humans 1

[Figure: excerpt from the FT article (“Verity was ideally poised…” to “…may not even mention Bill Clinton.”) with areas where the probability of highlighting is greater than 0.4 marked]

Page 33: Information Highlighting

Heuristics 5/7: Highlighting by humans 2

[Figure: excerpt from the FT article (“Verity was ideally poised…” to “…may not even mention Bill Clinton.”) with areas where the probability of highlighting is greater than 0.33 marked]
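The two human-highlighting slides mark every stretch of text whose probability of highlighting exceeds a cut-off: 0.4 on the first slide, 0.33 on the second. The minimal Python sketch below shows that grouping step. It assumes the per-word probabilities are already available and treats "greater than" as strict, as the captions state. The function name is hypothetical.

```python
def highlighted_areas(words, probability, threshold):
    """Group consecutive words whose highlighting probability exceeds threshold."""
    areas, current = [], []
    for word, p in zip(words, probability):
        if p > threshold:
            current.append(word)
        elif current:  # a below-threshold word closes the current area
            areas.append(" ".join(current))
            current = []
    if current:
        areas.append(" ".join(current))
    return areas


words = "a b c d e".split()
probs = [0.5, 0.45, 0.2, 0.35, 0.9]
print(highlighted_areas(words, probs, 0.4))   # ['a b', 'e']
print(highlighted_areas(words, probs, 0.33))  # ['a b', 'd e']
```

Lowering the cut-off can only add words to the marked areas, never remove them.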

Page 34: Information Highlighting

Heuristics 6/7Heuristics 6/7 Highlighting by best heuristicsHighlighting by best heuristics

Verity was ideallypoised to help users copewith the enormousvolume of informationon the Internet. “Weknew it was important,so we sought outNetscape as a partner,”says Mr Courtot.“If you know what you

are looking for, you candescribe it in words andTopic will find itquickly,” explains MrCourtot. “The problem isif you don't know whatinformation you want,because you don't havethe words to describe it.“Information used to be

found by askingcorporate librarians togather it together. Theydid this on an iterativebasis, as they searched

through catalogues andindexes, refining theirqueries as they worked toimprove the quality ofthe result. This doesn'twork any more becausethere is too muchinformation.”Organisations now have

to give direct access toinformation to usersbecause they can nolonger afford to getspecialist corporatelibrarians to search.Software vendors such asVerity give end-userstools to navigate throughthe informationavailable, withoutreading it, and guidethem down a path bygiving them choices,such as “Do you wantEurope or North

America?"Mr Courtot is keen to

point out that Topic is farmore effective than thepopular Internet searchengines because it readseach document andtherefore returns a moreaccurate answer toqueries. “The searchengines on the Web tendto return thousands ofirrelevant answers,” heexplains. “If you type in“President of the UnitedStates”, Lycos andYahoo! will give you10000 answers and thefirst few may not evenmention Bill Clinton.”

KEY

First statement in a quote

“Solution” stage

First statement in a discourse segment

Page 35: Information Highlighting

The best combination of heuristics produced a correlation of 0.56 with actual highlighting probability (against an average of 0.43 for test subjects)

In other words, selecting text according to specified criteria achieved a correlation higher than that of all but one of the test subjects, and considerably higher than the average

BUT: the challenge is to identify the markers denoting relevant features in a discourse

Heuristics 7/7: Performance of best heuristics
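The slides do not state which correlation coefficient was used. Assuming Pearson's r, a heuristic's score could be computed roughly as below. The sketch correlates a 0/1 "selected by heuristic" flag per sentence with the observed highlighting probability. Pearson's r with one binary variable is the point-biserial correlation. `pearson` and the sample data are hypothetical.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson product-moment correlation of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: 1 = sentence selected by the heuristic, 0 = not selected,
# paired with the proportion of subjects who highlighted that sentence.
selected = [1, 0, 1, 0]
probability = [0.75, 0.0, 0.5, 0.25]
print(round(pearson(selected, probability), 3))  # 0.894
```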

Page 36: Information Highlighting

Summary

1 Highlighters

2 Highlighting as information visualisation

3 Past studies of visual cueing

4 User study

5 Heuristics

6 Identifying discourse markers

7 “Given” and “new” information

8 Future directions

Page 37: Information Highlighting

6 Identifying discourse markers

Coping with the deluge of data

Technology developed during the cold war is helping organisations to cope with the information explosion

Analysts at the Meta Group estimate that the amount of private information stored globally doubles every 12-14 months. This information explosion is the result of the proliferation of personal computers, which has allowed individuals and workgroups to create documents and manage information to meet their own needs. As a result, there is a massive amount of information which needs to be retrieved and shared internally.

“There is now a greater volume of information than can be searched manually,” says Mr Philippe Courtot, chairman and chief executive of Verity (http://www.verity.com), a leading provider of search and retrieval applications in enterprise computing.

“Surfing the Internet is impractical with so much data, so you need a different metaphor. Users need information presented to them in a way which is personal.” Verity was formed as a spin-off from Advanced Decision Systems (ADS), a US government-funded project to automate the process of finding information.

ADS created a software technology which reads documents, allowing users to find stored information in response to a specific query. It can also monitor incoming documents to find anything which is of interest to individual users. Because the entire document is read, the results are always accurate and are delivered in order of relevance to the user. The commercial product that Verity has come up with is “Topic”.

Vast electronic archives

The original project was launched because the US Central Intelligence Agency was interested in using technology to help it find information in its vast electronic archives. “Topic was soon used in the White House and by the National Security Council,” says Mr Courtot. “It was a natural move to the US government and to world security agencies. From there, it soon moved to large corporations.”

Verity was ideally poised to help users cope with the enormous volume of information on the Internet. “We knew it was important, so we sought out Netscape as a partner,” says Mr Courtot.

“If you know what you are looking for, you can describe it in words and Topic will find it quickly,” explains Mr Courtot. “The problem is if you don’t know what information you want, because you don’t have the words to describe it.

“Information used to be found by asking corporate librarians to gather it together. They did this on an iterative basis, as they searched through catalogues and indexes, refining their queries as they worked to improve the quality of the result. This doesn’t work any more because there is too much information.”

Organisations now have to give direct access to information to users because they can no longer afford to get specialist corporate librarians to search. Software vendors such as Verity give end-users tools to navigate through the information available, without reading it, and guide them down a path by giving them choices, such as “Do you want Europe or North America?”

Mr Courtot is keen to point out that Topic is far more effective than the popular Internet search engines because it reads each document and therefore returns a more accurate answer to queries. “The search engines on the Web tend to return thousands of irrelevant answers,” he explains. “If you type in ‘President of the United States’, Lycos and Yahoo! will give you 10,000 answers and the first few may not even mention Bill Clinton.”

The technological challenge which Verity faces is considerable. Users need to be able to search, retrieve and filter information in the enterprise, in online databases or across the Internet. The technology also needs to cope with dissimilar document types, incompatible information sources and geographically dispersed datastores.

Mr Courtot has already approached these problems by introducing a large partnering programme. More than 100 applications are indexing their information with the Verity format, including Documentum, Informix, Lotus Notes, Netscape, PC Docs and Sybase.

Verity has partnership agreements with IT vendors, such as AT&T, Compaq, Microsoft, Object Design, SAP, SCO and Tandem, as well as with information providers, including Knight Ridder, Time Warner Pathfinder and FT Profile, a sister company to the Financial Times.

The accelerating growth in the amount of information is going to create problems. Ninety per cent of inventions have been made in the last 50 years, and 90 per cent will be made in the next 25, predicts Mr Courtot. “The answer is for the system to categorise information by understanding the nature of the document. Computers will never be perfect for categorising, so you must ask the publisher to categorise it. You need the system to automatically create categories and an abstract, which the author can then edit. We have to minimise the iterative process which the CIA were using.”

The concept of searching for information, rather than reading through it, raises some important issues. Scanning a newspaper or other document may expose a reader to new ideas and stimulate innovation, a process which may be lost if we use computerised tools such as Topic. “Innovation is in the individual, not the information,” says Mr Courtot. “Users will get more knowledge as they browse information structures by discovering a new category.”

Expanding intelligence

“The computer age expands the opportunities for humans to innovate,” he adds. “Part of man’s evolution has been the ability to use tools and learn new ways to apply them. Today, Microsoft’s Encarta encyclopaedia is a good start in presenting knowledge.

“Eventually, with virtual reality and human gene mapping, we will extend our lives. When we can decode the genes, virtual reality will give us a major new tool to shape the brain. Human intelligence hasn’t grown very much, but I believe it will.”

With ever-increasing volumes of information, users face a danger of not having all the information relevant to an important decision, so products such as Topic are going to be increasingly important.

“With information becoming available at an accelerating rate, the challenge is to find the right type of information with minimal effort,” concludes Mr Courtot. “If we don’t, decision-making will become stifled by the demands of finding and managing the information needed.”

Philippe Courtot, a Basque-born and -raised Frenchman, earned degrees in electrical engineering and physics at the University of Paris.

A former chief executive of Thomson-CGR Medical Corporation, now a division of GE Medical, he counts among his personal achievements the Benjamin Franklin Award from the Saturday Evening Post, for his role in promoting a national awareness campaign that reached more than 75m people with the life-saving benefits of mammography screening.

His association with Verity began during his tenure as president and chief executive of cc:Mail.

FT 7 May 1997

1 Segments

2 Statements

3 Solution stages

4 Stage labels

5 Cue words as signals

6 “Solution” signals

Page 38: Information Highlighting

Identifying discourse markers 1/6: Segments

Different means of discourse segmentation are beyond the scope of this paper

Segments most often coincide with the beginning of a paragraph, and normally begin with a proposition or assertion

Most effective technique found: select the opening statement in its simplest form
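A rough sketch of the opening-statement heuristic. It assumes paragraphs are separated by blank lines and that a sentence ends at ".", "!" or "?" followed by whitespace. It does not attempt to reduce the statement to its "simplest form", and `opening_statements` is a hypothetical name.

```python
import re
from typing import List

# Naive sentence boundary: '.', '!' or '?' followed by whitespace.
# Abbreviations such as "e.g." will be split wrongly; this is only a sketch.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")
# Paragraphs are assumed to be separated by one or more blank lines.
PARAGRAPH_BREAK = re.compile(r"\n\s*\n")


def opening_statements(text: str) -> List[str]:
    """Return the first sentence of each paragraph."""
    paragraphs = [p.strip() for p in PARAGRAPH_BREAK.split(text) if p.strip()]
    return [SENTENCE_END.split(p, maxsplit=1)[0] for p in paragraphs]


sample = (
    "Dogs make great pets. However, they can get fleas.\n\n"
    "Winalot have now launched a new anti-flea dog food. Owners are pleased."
)
print(opening_statements(sample))
```

Segmenting on blank lines follows the observation above that segments most often coincide with paragraph openings; a fuller system would need a real discourse segmenter.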

Page 39: Information Highlighting

Identifying discourse markers 2/6: Statements

Statements are sometimes preceded or followed by a coherence relation: a question or other linguistic feature that makes the proposition’s relevance to the preceding text clear

The following text tends to fill out details and/or provide supporting evidence for the assertion

Page 40: Information Highlighting

Identifying discourse markers 3/6: Solution stage

“Situation-problem-solution-evaluation” structure

– Narrative structures (e.g. boy meets girl – boy loses girl – boy regains girl – boy & girl live happily ever after)

– Feature articles (e.g. dogs make great pets – however, they can get fleas – Winalot have now launched a new anti-flea dog food – owners have declared it a success)

Page 41: Information Highlighting

Identifying discourse markers 4/6: Stage signals

Hoey (1994): elements of structure are often signalled by characteristic words

Stage signals as the most basic level
– “Cars are a common way of getting from A to B. However, the congestion that they cause is a problem. The solution is to get people to use public transport. In this way, everyone can get to work quickly.”
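The cue phrases in this example (“However”, “The solution is”, “In this way”) suggest a simple labelling pass. The sketch below assumes a tiny hypothetical lexicon taken from this one example. A sentence with no cue inherits the previous stage, and "situation" is the assumed starting stage.

```python
import re
from typing import List, Tuple

# Hypothetical cue-phrase lexicon, taken only from the example on this slide.
# Checked in order; the first matching cue decides the sentence's stage.
STAGE_CUES = [
    ("solution", re.compile(r"\bthe solution is\b", re.IGNORECASE)),
    ("evaluation", re.compile(r"\bin this way\b", re.IGNORECASE)),
    ("problem", re.compile(r"\b(?:however|problem)\b", re.IGNORECASE)),
]


def label_stages(text: str) -> List[Tuple[str, str]]:
    """Label each sentence with a situation-problem-solution-evaluation stage."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    labelled = []
    stage = "situation"  # assumed starting stage before any cue is seen
    for sentence in sentences:
        for cue_stage, pattern in STAGE_CUES:
            if pattern.search(sentence):
                stage = cue_stage
                break
        # A sentence without a cue inherits the previous stage.
        labelled.append((stage, sentence))
    return labelled


cars = (
    "Cars are a common way of getting from A to B. However, the congestion "
    "that they cause is a problem. The solution is to get people to use public "
    "transport. In this way everyone can get to work quickly."
)
for stage, sentence in label_stages(cars):
    print(f"{stage:<10} {sentence}")
```

Checking "solution" before "problem" is a design choice so that a sentence such as "The solution is to fix the problem" is labelled as a solution.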

Page 42: Information Highlighting

Identifying discourse markers 5/6: Cue words as signals

Hoey (ibid.): discourse structure is essentially evaluative
– e.g. “If thyristors are used to control the motor of an electric car, the vehicle moves smoothly but with poor efficiency at low speeds”
– “Problem” stage signalled by the negative evaluation “poor”

So stages can be identified by spotting cue words or phrases
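Spotting negative evaluation could be sketched as a simple lexicon lookup. The word list here is a hypothetical placeholder built around the example's “poor”. A real system would need a much larger, validated lexicon and some handling of negation.

```python
import re
from typing import List

# Hypothetical, deliberately tiny lexicon of negative evaluative words; it
# ignores negation ("not poor") and context entirely.
NEGATIVE_EVALUATION = {"poor", "inefficient", "difficult", "problem", "drawback", "fails"}


def negative_cues(sentence: str) -> List[str]:
    """Return the negative-evaluation words found in a sentence, in order."""
    words = re.findall(r"[a-z]+", sentence.lower())
    return [w for w in words if w in NEGATIVE_EVALUATION]


def signals_problem(sentence: str) -> bool:
    """True if the sentence contains at least one negative evaluation."""
    return bool(negative_cues(sentence))


thyristor = ("If thyristors are used to control the motor of an electric car, "
             "the vehicle moves smoothly but with poor efficiency at low speeds")
print(negative_cues(thyristor))    # ['poor']
print(signals_problem(thyristor))  # True
```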

Page 43: Information Highlighting

Identifying discourse markers 6/6: “Solution” signals

TextLight need only be concerned with “solution” signals

Two examples of such signals:
– Words to do with “solving”, “developing” or “inventing”
– Change of verb form into the present perfect tense, as in “have -ed”. Tense then reverts to the simple present to denote that a new situation exists as a result of the solution

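The two signals could be approximated with regular expressions, as in the sketch below. The word stems and the optional-adverb list are assumptions. Irregular participles (“made”, “found”) are missed, and the later reversion to the simple present is not checked.

```python
import re
from typing import Dict

# Word families named on the slide ("solving", "developing", "inventing"),
# plus "solution"; the stems themselves are an assumption.
SOLUTION_WORDS = re.compile(r"\b(?:solv|solution|develop|invent)\w*", re.IGNORECASE)

# Present perfect "have/has + -ed participle", allowing one of a few adverbs in
# between ("have now launched"). Irregular participles ("made", "found") are
# not covered, and the later return to the simple present is not checked.
PRESENT_PERFECT = re.compile(
    r"\b(?:have|has)\s+(?:(?:now|just|already|recently)\s+)?\w+ed\b",
    re.IGNORECASE,
)


def solution_signals(sentence: str) -> Dict[str, bool]:
    """Report which of the two hypothesised solution signals a sentence shows."""
    return {
        "solution_word": bool(SOLUTION_WORDS.search(sentence)),
        "present_perfect": bool(PRESENT_PERFECT.search(sentence)),
    }


print(solution_signals("Winalot have now launched a new anti-flea dog food"))
# {'solution_word': False, 'present_perfect': True}
```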
Page 44: Information Highlighting

Summary

1 Highlighters

2 Highlighting as information visualisation

3 Past studies of visual cueing

4 User study

5 Heuristics

6 Identifying discourse markers

7 “Given” and “new” information

8 Future directions

Page 45: Information Highlighting

7 “Given” and “new” information


1 Highlighting the new

2 Narrative stages

3 Importance

4 Intonation

5 First statement

6 Lists

7 “Solution” as “new”

8 Quasi-revision

9 Levels of “new”

Page 46: Information Highlighting

“Given” and “new” information 1/9: Highlighting the new

Why were the best heuristics more effective than others?

Prague school (1930s): information is composed of a mixture of “given” and “new” information

Proposition: the essential factor behind the choice of text to highlight is that the selected features are all ways in which “new” information is signalled at the discourse level

Page 47: Information Highlighting

“Given” and “new” information 2/9: Narrative stages

The theory is supported by the fact that 80% of subjects stated that they were highlighting words that “marked significant stages in the narrative.”

This implies that they were highlighting information that is new in the context of the preceding text
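One crude way to operationalise “new in the context of the preceding text” is lexical: score each sentence by the share of its content words not seen earlier in the document. This is a hypothetical proxy, not the Prague school's notion of information structure, and the stop-word list is an assumption.

```python
import re
from typing import List

# Hypothetical stop-word list: function words are treated as carrying no
# "new" information of their own.
STOP_WORDS = {"a", "an", "the", "and", "or", "but", "of", "to", "in", "on", "is",
              "are", "was", "it", "they", "that", "this", "for", "with"}


def novelty_scores(sentences: List[str]) -> List[float]:
    """Share of each sentence's content words not seen in any earlier sentence."""
    seen = set()
    scores = []
    for sentence in sentences:
        words = [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOP_WORDS]
        if words:
            scores.append(sum(w not in seen for w in words) / len(words))
        else:
            scores.append(0.0)  # nothing but function words: treat as no new content
        seen.update(words)
    return scores


print(novelty_scores(["Dogs make great pets.", "Dogs can get fleas.", "Dogs make pets."]))
# [1.0, 0.75, 0.0]
```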

Page 48: Information Highlighting

“Given” and “new” information 3/9: Importance

We can argue that an idea’s perceived importance is judged according to the extent to which it:
– is new as opposed to given
– matches a perceived gap in the structure of the reader’s domain knowledge

When highlighting on behalf of others, we have to make an informed judgement on how the ultimate reader will define importance

Page 49: Information Highlighting

“Given” and “new” information 4/9: Intonation

Halliday (1970): in spoken discourse, intonation is used to signal to the listener what the speaker understands to be new information

Could highlighting perform an equivalent function?

Page 50: Information Highlighting

“Given” and “new” information 5/9: First statement

The first statement in a paragraph can be considered supporting structure for the statement at the beginning of the discourse segment that contains it

It operates as one of the primary statements containing most of the “new” information in the document

Page 51: Information Highlighting

“Given” and “new” information 6/9: Lists

Lists typically act as a systematic tabulation of what the author believes to be important (i.e. “new” and relevant) information

They are often used for predictive purposes within a discourse, or for enumerating significant points

People therefore tend to identify lists as concentrated sources of meaning, and as such eligible for highlighting

A speaker might well emphasise this by counting the points off on his or her fingers

Page 52: Information Highlighting

“Given” and “new” information 7/9: “Solution” as “new”

Solution stages comprise “new” information: a climactic point of novelty in the schema, justifying their status as “highlightable” text

If an article is modelled as a histogram whose columns depict sentences plotted against new information content, highlighting is like slicing across the graph at a threshold value
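The threshold metaphor can be sketched directly. Given a per-sentence "new information" score (however it is obtained), the sketch highlights the sentences at or above a threshold. It can also pick the threshold that highlights a target fraction of sentences. Both function names are hypothetical, and ties at the threshold may select slightly more than the target fraction.

```python
from typing import List, Sequence


def slice_histogram(scores: Sequence[float], threshold: float) -> List[int]:
    """Indices of sentences whose 'new information' score reaches the threshold."""
    return [i for i, s in enumerate(scores) if s >= threshold]


def threshold_for_fraction(scores: Sequence[float], fraction: float) -> float:
    """Threshold that highlights roughly the given fraction of sentences."""
    if not scores or not 0 < fraction <= 1:
        raise ValueError("need non-empty scores and 0 < fraction <= 1")
    ranked = sorted(scores, reverse=True)
    k = max(1, round(fraction * len(ranked)))  # number of sentences to keep
    return ranked[k - 1]


scores = [1.0, 0.75, 0.0, 0.6]  # hypothetical per-sentence scores
t = threshold_for_fraction(scores, 0.5)
print(t, slice_histogram(scores, t))  # 0.75 [0, 1]
```

Lowering the threshold corresponds to slicing lower across the histogram, so more of the text is highlighted.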

Page 53: Information Highlighting

“Given” and “new” information 8/9: Quasi-revision

Criteria and procedure would have been different for quasi-revision:
– Shorter range
– More spontaneously applied

The reader has more detailed knowledge of what is “new” information for him/herself

Highlighting can be done:
– In real time
– With greater precision

Page 54: Information Highlighting

“Given” and “new” information 9/9: Levels of “newness”

Information can also be perceived as “new” at several levels:
– Within a sentence, particular words can be seen as new
– Within a paragraph, some sentences can be interpreted as new and others as contextual or supporting information
– Within a discourse segment or discourse, still longer passages may be perceived as containing “new” information

Page 55: Information Highlighting

Summary

1 Highlighters

2 Highlighting as information visualisation

3 Past studies of visual cueing

4 User study

5 Heuristics

6 Identifying discourse markers

7 “Given” and “new” information

8 Future directions

Page 56: Information Highlighting

8 Future directions

Coping with thedeluge of data.

Technology developed during the cold war is helping organisations to cope with the information explosion

Analysts at the Meta Group estimate that the amount of private information stored globallydoubles every 12-14 months. This information explosion is the result of the proliferation ofpersonal computers which has allowed individuals and workgroups to create documentsand manage information to meet their own needs. As a result, there is a massive amount ofinformation which needs to be retrieved and shared internally.

“There is now a greater volume of information than can be searched manually,” saysMr Philippe Courtot, chairman and chief executive of Verity (http.//www.verity.com), aleading provider of search and retrieval applications in enterprise computing.

“Surfing the Internet is impractical with so much data, so you need a differentmetaphor. Users need information presented to them ina way which is personal.” Verity wasformed as a spin-off from Advanced Decision Systems (ADS), a US government-fundedproject to automate the process of finding information.

ADS created a software technology which reads documents, allowing users to findstored information in response to a specific query. It can also monitor incoming documentsto find anything which is of interest to individual users. Because the entire document isread, the results are always accurate and are delivered in order of relevance to the user.The commercial product that Verity has come up with is “Topic” .

Vast electronic archivesThe original project was launched because the US Central Intelligence Agency was

interested in using technology to help it find information in its vast electronic archives.“Topic was soon used in the White House and by the National Security Council,” says

Mr Courtot. “It was a natural move to the US government and to world security agencies.From there, it soon moved to large corporations.”

Verity was ideally poised to help users cope with the enormous volume of informationon the Internet. “We knew it was important, so we sought out Netscape as a partner,” saysMr Courtot.

“ If you know what you are looking for, you can describe it in words and Topic will find itquickly,” explains Mr Courtot. “The problem is if you don’t know what information you want,because you don’t have the words to describe it.

“ Information used to be found by asking corporate librarians to gather it together. Theydid this on an iterative basis, as they searched through catalogues and indexes, refiningtheir queries as they worked to improve the quality of the result. This doesn’t work any morebecause there is too much information.”

Organisations now have to give direct access to information to users because theycan no longer afford to get specialist corporate librarians to search. Software vendors suchas Verity give end-users tools to navigate through the information available, without readingit, and guide them down a path by giving them choices. such as. “Do you want Europe orNorth America?"

Mr Courtot is keen to point out that Topic is far more effective than the popular Internetsearch engines because it reads each document and therefore returns a more accurateanswer to queries. “The search engines on the Web tend to return thousands of irrelevantanswers,” he explains. “ If you type in “President of the United States” , Lycos and Yahoo!will give you 10,000 answers and the first few may not even mention Bill Clinton.”

The technological challenge which Verity faces is considerable. Users need to be ableto search, retrieve and filter information in the enterprise, in online databases or across the

Internet. The technology also needs to cope with dissimilar document types, incompatibleinformation sources and geographically dispersed datastores.

Mr Courtot has already approached these problems by introducing a large partneringprogramme. More than 100 applications are indexing their information with the Verityformat, including Documentum, Informix, Lotus Notes, Netscape, PC Docs and Sybase.

Verity has partnership agreements with IT vendors, such as AT&T, Compaq,Microsoft, Object Design, SAP, SCO and Tandem, as well as with information providers,including Knight Ridder, Time Warner Pathfinder and FT Profile, a sister company to theFinancial Times.

The accelerating growth in the amount of information is going to create problems. Ninety per cent of inventions have been made in the last 50 years, and 90 per cent will be made in the next 25, predicts Mr Courtot. “The answer is for the system to categorise information by understanding the nature of the document. Computers will never be perfect at categorising, so you must ask the publisher to categorise it. You need the system to automatically create categories and an abstract, which the author can then edit. We have to minimise the iterative process which the CIA were using.”

The concept of searching for information, rather than reading through it, raises some important issues. Scanning a newspaper or other document may expose a reader to new ideas and stimulate innovation, a process which may be lost if we use computerised tools such as Topic. “Innovation is in the individual, not the information,” says Mr Courtot. “Users will get more knowledge as they browse information structures by discovering a new category.”

Expanding intelligence

“The computer age expands the opportunities for humans to innovate,” he adds. “Part of man’s evolution has been the ability to use tools and learn new ways to apply them. Today, Microsoft’s Encarta encyclopaedia is a good start in presenting knowledge.

“Eventually, with virtual reality and human gene mapping, we will extend our lives. When we can decode the genes, virtual reality will give us a major new tool to shape the brain. Human intelligence hasn’t grown very much, but I believe it will.”

With ever-increasing volumes of information, users face the danger of not having all the information relevant to an important decision, so products such as Topic are going to be increasingly important.

“With information becoming available at an accelerating rate, the challenge is to find the right type of information with minimal effort,” concludes Mr Courtot. “If we don’t, decision-making will become stifled by the demands of finding and managing the information needed.”

Philippe Courtot, a Frenchman born and raised in the Basque Country, earned degrees in electrical engineering and physics at the University of Paris.

Mr Courtot is a former chief executive of Thomson-CGR Medical Corporation, now a division of GE Medical. His personal achievements include the Benjamin Franklin Award from the Saturday Evening Post for his role in a national awareness campaign that reached more than 75m people with the life-saving benefits of mammography screening.

His association with Verity began during his tenure as president and chief executive of cc:Mail.

FT 7 May 1997

1 Highlighting long neglected

2 Virtues of highlighting

3 TextLight: to do

Page 57: Information Highlighting

Future directions 1/3
Highlighting long neglected

The study of the selection of words for highlighting previously neglected

Potential of automatic highlighting as a tool to handle information overload also neglected

Future directions 2/3
Virtues of highlighters

Output familiar to users

Highlighting shown to be helpful in content recall

Addresses issue of confidence – highlighting acts not as a censor but as a guide: non-selected text (and therefore the context) always in view

Suitable as a plug-in module for other programs

Future directions 3/3
TextLight: to do

Incorporate discourse segmentation algorithms

Complete lexical dictionary for cue recognition

Port from Prolog to Java for greater portability
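TextLight's cue recognition appears here only as a to-do item, and its Prolog code and lexical dictionary are not reproduced in this section. The following is a hypothetical Python sketch of the general idea: sentences containing a phrase from a small cue dictionary are wrapped in markers, while every other sentence stays where it was, so the surrounding context remains in view (the "guide, not censor" point on the 2/3 slide). `CUE_PHRASES`, the `[[` and `]]` markers and the naive sentence splitter are all assumptions, not TextLight's actual design.

```python
import re

# Hypothetical cue dictionary: phrases that often signal an important sentence.
# TextLight's real lexical dictionary is not reproduced here.
CUE_PHRASES = ("in conclusion", "importantly", "in summary", "we propose", "the main")


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def highlight(text: str, cues=CUE_PHRASES, start: str = "[[", end: str = "]]") -> str:
    """Wrap cue-bearing sentences in markers; all other sentences are kept unchanged."""
    out = []
    for sentence in split_sentences(text):
        lowered = sentence.lower()
        # Whole-word match so that, e.g., "importantly" does not fire inside another word.
        if any(re.search(r"\b" + re.escape(cue) + r"\b", lowered) for cue in cues):
            out.append(f"{start}{sentence}{end}")
        else:
            out.append(sentence)
    return " ".join(out)


sample = "Highlighting is old. Importantly, it keeps context visible. Prolog is a language."
print(highlight(sample))
# Highlighting is old. [[Importantly, it keeps context visible.]] Prolog is a language.
```

Marking sentences in place, rather than extracting them, mirrors the slide's argument that non-selected text should stay visible; a fuller version would replace the naive splitter with the discourse segmentation listed above.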

TextLight URLs

http://www.cogarch.demon.co.uk/textlight.html

mailto:[email protected]