See conference video - http://www.lucidimagination.com/devzone/events/conferences/ApacheLuceneEurocon2011 See and hear how IBM applies Lucene in its commercial software offerings. Hear about the development experience and the advantages of this approach.
© 2011 IBM Corporation
LIGHTNING TALKS
Powered by Lucene:
IBM Content Analytics with Enterprise Search
Wolfgang Jung
Barcelona, 19th October 2011
Our agenda in the next 10 minutes
• IBM is committed to Open Source
– A decade of contributions to the community.
• Adoption of Apache Lucene in IBM Content Analytics
– The why, the what, and examples.
• Demonstration of IBM Content Analytics
– See the development results live.
Be enlightened!
IBM is committed to Open Source
• A decade of lineage and contributions to the open source community
– Apache Hadoop.
IBM’s use of BigIndex for search is mentioned in Chuck Lam’s “Hadoop in Action”
– Apache Derby
– Apache Geronimo and Jetty
– Eclipse: Founded by IBM, PMC Board of Directors
– Apache UIMA: Unstructured Information Management Architecture.
Developed by IBM, contributed to Apache
– Apache Jakarta: Lucene. PMC members
Significant contributions via IBM Lucene Extension Library (ILEL)
– Linux ... and more!
Adoption of Apache Lucene in IBM Content Analytics with Enterprise Search
• UIMA has been in use since the first release of IBM OmniFind in 2005, later in
IBM Content Analytics, and continues in today’s IBM Content Analytics with Enterprise Search
http://www-01.ibm.com/software/data/content-management/analytics/uima.html
• IBM’s reasons for choosing Lucene
– The index is a common technology and easier to improve
– Lower cost of maintenance
– Advantage in incremental indexing
– Extensibility
Adoption of Apache Lucene in IBM Content Analytics with Enterprise Search
• IBM is a very active contributor. Look for the PMC members:
– Michael McCandless; Shai Erera; Doron Cohen
http://lucene.apache.org/who.html
• IBM extended Lucene based on our needs. Two examples already
contributed to the community:
– Query Parser
– Facets
Adoption of Apache Lucene in IBM Content Analytics with Enterprise Search
• On 13th December 2006, IBM and Yahoo! announced IBM OmniFind Yahoo! Edition, a
“no-cost, entry level enterprise search product developed to help eliminate financial and
technology barriers to intranet and Web search.”
http://www-03.ibm.com/press/us/en/pressrelease/20767.wss
• This technology used Lucene as its index technology and was fully supported by IBM
– 45,000+ downloads from the website http://omnifind.ibm.yahoo.net
– IBM support contracts for clients with “IBM Elite Support for OmniFind Yahoo Edition”
– Fewer than 15 incidents regarding the index technology
• The technology is seen as a success for IBM
Analysed documents with identified concepts
Example: “Claus sprained his ankle on the step”
– Parts of speech: Noun, Verb, Noun Phrase, Prep Phrase
– Semantic types: Person, Injury, Body Part, Location
– Extracted concept (automatic): Claimant: Soft Tissue Injury
Visualizing: Results of concept evaluation are displayed to the users
Sources of information: Internal (ECM, Files, DBMS, etc.) and External (Social, News, etc.)
Content Analytics generates new insights and aggregates key findings gathered from large data volumes in a visualized form
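The flow on this slide (sentence, then semantic types, then an extracted concept) can be illustrated with a toy sketch. It is not how IBM Content Analytics or UIMA annotators are implemented: the lexicon, the word-to-type mapping, the rule, and the function names `annotate` and `extract_concept` are all hypothetical and exist only to make the idea concrete.

```python
# Toy illustration only: a hand-written lexicon stands in for real
# linguistic analysis. All entries and names here are hypothetical.
LEXICON = {
    "claus": "Person",
    "sprained": "Injury",
    "ankle": "Body Part",
    "step": "Location",
}


def annotate(sentence):
    """Return (word, semantic_type) pairs for words found in the lexicon."""
    annotations = []
    for raw in sentence.split():
        word = raw.strip(".,!?")          # drop trailing punctuation
        label = LEXICON.get(word.lower())  # case-insensitive lookup
        if label is not None:
            annotations.append((word, label))
    return annotations


def extract_concept(annotations):
    """Hypothetical rule: Person + Injury + Body Part yields a claimant concept."""
    types = {label for _, label in annotations}
    if {"Person", "Injury", "Body Part"} <= types:
        return "Claimant: Soft Tissue Injury"
    return None


annotations = annotate("Claus sprained his ankle on the step")
concept = extract_concept(annotations)
print(annotations)
print(concept)
```

A production system would replace the lexicon lookup with tokenization, part-of-speech tagging, and rule or dictionary annotators, and would store the resulting concepts as facets in the search index.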
Rapid Insights from Automotive Complaints
• We will be using publicly available data from the National Highway Traffic Safety Administration (NHTSA)
to demonstrate how IBM Content Analytics can be used to identify problems with automobiles.
NHTSA receives various reports about malfunctions, accidents, and other issues with automobiles
from dealerships, repair facilities, and the general public. NHTSA publishes the data at
http://www.nhtsa.gov. For this demo we have created a collection from the NHTSA “complaints”
data spanning several years ending in early 2010. We will show how this and similar data can be
analyzed to arrive at rapid insights not possible by manually reading through the complaint records.
See Content Analytics live!
Be enlightened!