DESCRIPTION
In this webinar, Nate Treloar, Principal Search Technology Evangelist in the Microsoft Enterprise Search Group, shared the new Microsoft Search strategy, focusing on FAST for SharePoint 2010 and what it means to your organization. Learn how Concept Searching's award-winning conceptClassifier eliminates manual metadata tagging through automatic conceptual metadata generation and provides the framework to rapidly build and deploy taxonomies that improve the search experience. Aeturnum, recipient of the FAST Innovative Solution Award for its Search Solutions Framework at the SharePoint 2009 Conference, will share its expertise and best practices in deploying FAST and conceptClassifier as an enterprise search solution.
Solving Enterprise Search Challenges
Sponsored by: Microsoft, Aeturnum, Concept Searching
Nate Treloar – Microsoft: Principal Search Technology Evangelist in the Microsoft Enterprise Search Group, responsible for the group’s technology innovation and evangelism programs
Don Miller – Concept Searching: Vice President Business Development
John Challis – Concept Searching: CTO/CEO
Mike Knuts – Aeturnum: Vice President Business Development
Sashika Dias – Aeturnum: Knowledge Management & Information Access Practice
Welcome - Agenda
Search is the key to engaging information experiences
Connecting people to information, driving better outcomes
Search helps your customers get what they want
Search helps your employees get their jobs done
Cutting costs, increasing revenue
Solutions for Business Productivity
Solutions for Internet Sites
Search
OR
Best of SharePoint
Best of High-end
Best of Microsoft
Products for Every Customer Need
Complete out-of-the-box (OOB) search
High-end search delivered through SharePoint
• Common UI Framework
• Social search features and integration
• SharePoint platform integration
• End user and site administrator enablement
Common across the product line
• Common Connector Framework (BDC)
• APIs and developer experience
• Admin & deployment capabilities
• Operations advantages (SCOM, scripting)
Deep Refinement
Thumbnails
Previews
Sorting
Similar Results
Federation
People Search
Streamline how you find and collaborate with others
Filter by title, expertise & other attributes
Expertise matching
Phonetic name lookup
Org browsing
Find recent content
Real-time presence
A systematic approach to interpreting your content
Map Crawled Properties: Maps all of the metadata discovered by the various pipeline stages.
Date and Time Normalization: Converts dates and times to a standard representation, handling locale-specific representations. For example, it knows that 14-Mar-10 is equivalent to March 14, 2010.
Entity Extraction: Finds terms in the content and maps them to predefined categories. Out-of-the-box support covers People, Companies, and Locations, but can be extended to any category.
Language Encoding and Detection: Identifies the native written language and locale-specific encoding so that the proper dictionaries can be used by the tokenization and lemmatization stages.
Format Conversion: Extracts plain text from multiple file formats, encodings, and applications.
Custom Stage: Insert custom stages to perform specialized content enrichment and other processing.
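The stages above can be sketched as an ordered chain of functions, each enriching a document record before it is indexed. This is a minimal illustration in plain Python, not the FAST for SharePoint 2010 pipeline API: the `Document` type, the stage names, the company dictionary, and the `%d-%b-%y` date format (English month abbreviations, two-digit year) are all assumptions chosen to mirror the 14-Mar-10 example.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable, Dict, List

# Hypothetical document record; not a FAST or SharePoint type.
@dataclass
class Document:
    text: str
    properties: Dict[str, str] = field(default_factory=dict)

Stage = Callable[[Document], Document]

def normalize_dates(doc: Document) -> Document:
    """Rewrite a 'modified' property such as '14-Mar-10' to ISO 8601."""
    raw = doc.properties.get("modified")
    if raw:
        # Assumed input format: day-abbreviated month-two-digit year.
        parsed = datetime.strptime(raw, "%d-%b-%y")
        doc.properties["modified"] = parsed.date().isoformat()
    return doc

def extract_companies(doc: Document) -> Document:
    """Toy entity extraction (a 'custom stage'): match words against a fixed dictionary."""
    known = {"Microsoft", "Aeturnum"}  # hypothetical category dictionary
    words = {w.strip(".,;:") for w in doc.text.split()}
    found = sorted(words & known)
    if found:
        doc.properties["companies"] = ";".join(found)
    return doc

def run_pipeline(doc: Document, stages: List[Stage]) -> Document:
    # Each stage receives the output of the previous one.
    for stage in stages:
        doc = stage(doc)
    return doc
```

For example, running `run_pipeline(Document("Aeturnum deploys FAST with Microsoft.", {"modified": "14-Mar-10"}), [normalize_dates, extract_companies])` leaves `modified` as `2010-03-14` and `companies` as `Aeturnum;Microsoft`. Keeping each stage a pure function of the document makes it easy to insert or reorder stages.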
Enterprise Search from Microsoft
UX: Go beyond the search box
IT: Eliminate compromise
DX: Do more with search
© 2009 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market
conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation.
MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
microsoft.com /Enterprise Search
Leveraging Metadata to Improve Findability, Records Management, and Compliance in SharePoint
Donald T. Miller, Vice President Business Development
Concept Searching, Inc.
Company founded in 2002
Product launched in 2003
Focus on management of structured and unstructured information
Technology: automatic concept identification, content tagging, auto-classification, taxonomy management
Only statistical vendor that can extract conceptual metadata
2009 and 2010 ‘100 Companies that Matter in KM’ (KMWorld Magazine)
KMWorld ‘Trend Setting Product’ of 2009
Locations: US, UK, & South Africa
Client base: Fortune 500/1000 organizations
Managed Partner under Microsoft global ISV Program: “go to partner” for Microsoft for auto-classification and taxonomy management
Microsoft Enterprise Search ISV, FAST Partner
Taxonomy
Classification hierarchy
Provides a manageable information infrastructure
Groups unstructured information together based on an understanding of the concepts and ideas that share mutual attributes
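To make the idea of a classification hierarchy concrete, here is a minimal sketch of a taxonomy tree in which each node carries a few "clue" terms, and a document is assigned to the node whose clues it matches most often. The node names, clue terms, and simple hit-count scoring are hypothetical; conceptClassifier's own statistical scoring is not described in this deck.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Node:
    name: str
    clues: List[str] = field(default_factory=list)   # hypothetical clue terms
    children: List["Node"] = field(default_factory=list)

def walk(node: Node, path: Tuple[str, ...] = ()):
    """Yield (path, node) for every node in the tree, depth first."""
    path = path + (node.name,)
    yield path, node
    for child in node.children:
        yield from walk(child, path)

def classify(text: str, root: Node) -> Optional[str]:
    """Return the '/'-joined path of the best-matching node, or None."""
    lowered = text.lower()
    best_path, best_score = None, 0
    for path, node in walk(root):
        # Crude substring counts; a real classifier weights terms statistically.
        score = sum(lowered.count(clue) for clue in node.clues)
        if score > best_score:
            best_path, best_score = "/".join(path), score
    return best_path
```

With a root `Node("Health")` whose children are `Node("Cardiology", ["heart", "coronary"])` and `Node("Orthopedics", ["bone", "fracture"])`, the text "Coronary artery surgery and heart care" is placed under `Health/Cardiology`, while unrelated text returns `None`.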
Taxonomies and Metadata Drive Business Value
Search: foundation for improving search outcomes (keyword search alone provides only 33% of results)
Consistency of indexing and tagging, resulting in the ability to guide the end user to the ‘right’ information
Finds concepts, eliminating the ambiguity of single words
Taxonomy browse and faceted search (guided navigation increases access to content by over 35%)
Solves the problem when people don’t know what they are looking for, or don’t know whether it even exists
Identification and protection of sensitive information (PII, PHI, etc.)
Only solution that combines pattern matching with associated vocabulary
Documents can be tagged and locked down and rendered unavailable in search
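The idea of combining pattern matching with associated vocabulary can be illustrated with a short sketch: a regular expression finds candidate U.S. Social Security numbers, and a match only counts when supporting vocabulary appears nearby, which filters out arbitrary digit strings such as part numbers. The pattern, vocabulary list, and 60-character window are assumptions for illustration, not conceptClassifier's rules.

```python
import re
from typing import List

# Assumed SSN layout: 3-2-4 digits separated by hyphens.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
# Hypothetical supporting vocabulary that must appear near the pattern.
SSN_VOCABULARY = ("social security", "ssn", "taxpayer")
WINDOW = 60  # characters of context checked on each side of a match

def find_ssns(text: str) -> List[str]:
    """Return pattern matches that also have supporting vocabulary nearby."""
    hits = []
    lowered = text.lower()
    for match in SSN_PATTERN.finditer(text):
        start = max(0, match.start() - WINDOW)
        context = lowered[start:match.end() + WINDOW]
        if any(term in context for term in SSN_VOCABULARY):
            hits.append(match.group())
    return hits

def is_sensitive(text: str) -> bool:
    # A document with any confirmed hit would be tagged and locked down.
    return bool(find_ssns(text))
```

A pattern alone would flag "Part number 123-45-6789 in stock"; requiring nearby vocabulary keeps that out while still catching "Employee SSN: 123-45-6789".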
Enable more effective Records Management
Identify and declare documents of record and tag with the appropriate retention code and route to the Records Center
A manual metadata approach will fail 95% of the time
Issue / Organizational Impact
Inconsistent: Less than 50% of content is correctly indexed, meta-tagged, or efficiently searchable, rendering it unusable to the organization (IDC)
Subjective: Highly trained Information Specialists will agree on meta tags about 33% of the time (C. Cleverdon)
Cumbersome: The average cost of manually tagging one item runs from $4 to $7 per document and does not factor in the accuracy of the meta tags or the repercussions of mis-tagged content (Hoovers)
Malicious Compliance: End users select the first value in the list (Perspectives on Metadata, Sarah Courier)
No perceived value for end user: What’s in it for me? The end user creates the document but sees neither its value to the organization nor the risks associated with litigation and non-conformance to policies.
What have you seen? Metadata will continue to be a problem due to inconsistent human behavior.
The answer to consistent metadata is an automated approach that can extract the meaning from content, eliminating manual metadata generation while still providing the ability to manage knowledge assets in alignment with the unique corporate knowledge infrastructure.
An Automated Metadata Approach Drives Business Value
Create an enterprise metadata framework/model: average return on investment is a minimum of 38% and runs as high as 600% (IDC)
Apply consistent, meaningful metadata to enterprise content: incorrect meta tags cost an organization $2,500 per user per year, in addition to potential costs for non-compliance (IDC)
Guide users to relevant content with taxonomy navigation: savings of $8,965 per year per user based on an $80K salary (Chen & Dumais)
Use automatic conceptual metadata generation to improve Records Management: eliminate inconsistent end user tagging at $4-$7 per record (Hoovers)
Improve compliance processes and eliminate potential privacy exposures
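The per-document figures above translate directly into a rough budget. The sketch below applies the cited $4 to $7 manual tagging cost (Hoovers) to a document count; the 100,000-document collection is a hypothetical example, and the calculation ignores the accuracy and mis-tagging costs that the figure itself does not include.

```python
def manual_tagging_cost(documents: int, low: float = 4.0, high: float = 7.0):
    """Return the (low, high) cost range of tagging `documents` by hand."""
    return documents * low, documents * high

low, high = manual_tagging_cost(100_000)
print(f"${low:,.0f} to ${high:,.0f}")  # $400,000 to $700,000
```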
1. Model and Validate
2. Automate Tagging
3. Findability
4. Business Processes
5. Records Management and PII
6. Life Cycle Management
Concept Based Metadata Generation
Compound Term Processing – the ability to extract ‘concepts in context’
• Only statistical metadata generation and classification company that can extract concepts from content as it is created or ingested
Example: ‘triple heart bypass’. Taken as single keywords, each word is ambiguous: Triple (baseball, three), Heart (organ, center), Bypass (highway, avoid).
conceptClassifier generates conceptual metadata by extracting multi-word terms that identify ‘triple heart bypass’ as a concept, as opposed to single keywords
• Search will return results based on the concept even if the exact terms are not contained in the document (e.g., ‘coronary artery surgery’, ‘heart surgery’)
• Metadata can be used by any search engine index or any application/process that uses metadata
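Concept Searching's Compound Term Processing is proprietary and not detailed here. As a rough illustration of the general idea of extracting multi-word terms, the sketch below uses a simple, swapped-in frequency technique: count word sequences of two to three words that do not begin or end with a stop word, and keep those that recur. The tokenizer, stop-word list, and `min_count` threshold are assumptions.

```python
import re
from collections import Counter
from typing import List

STOP_WORDS = {"a", "an", "the", "of", "and", "after", "for", "was", "is"}  # assumed list

def candidate_terms(text: str, max_len: int = 3, min_count: int = 2) -> List[str]:
    """Return recurring 2..max_len word sequences, longest first."""
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for n in range(2, max_len + 1):
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            # Skip sequences that begin or end with a stop word.
            if gram[0] in STOP_WORDS or gram[-1] in STOP_WORDS:
                continue
            counts[" ".join(gram)] += 1
    frequent = [term for term, c in counts.items() if c >= min_count]
    return sorted(frequent, key=lambda t: (-len(t.split()), t))
```

On "The patient had a triple heart bypass. Recovery after triple heart bypass surgery was quick." this returns `['triple heart bypass', 'heart bypass', 'triple heart']`, surfacing the full compound term ahead of its fragments. A statistical system would go further, for example by weighting terms against a reference corpus.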
conceptClassifier and TaxonomyManager
We Make Metadata Work For You
Automatic Conceptual Metadata Generation
Automated Classification
Taxonomy Development & Management
• Proven to reduce taxonomy development by 80%
Microsoft Integration
• Runs natively in SharePoint 2007 and SharePoint 2010, Microsoft Office Applications, SharePoint Search and FAST, and Windows Server 2008 R2 File Classification Infrastructure (FCI)
• Fully integrated with SharePoint Content Types
Content Type Updater
• Automatically changes the Content Type based on the presence of organizationally defined metadata found within the document
• Identification of confidential/privacy data
• Ability to identify records based on the records retention schedule and route them to the Records Center
Technology
• Downloadable in 30 minutes, no programming required
• Fully SOA compliant, delivered as Web Parts, based on open standards
• Highly scalable
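The Content Type Updater behavior described above, changing a document's Content Type when organizationally defined metadata is present and routing the result, can be sketched as an ordered rule table. The rule structure, tag names, content type names, and routing targets are hypothetical, and the sketch does not call any SharePoint API.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical rules: (required tags, resulting content type), checked in order.
RULES: List[Tuple[Set[str], str]] = [
    ({"PII"}, "Confidential Document"),
    ({"Contract", "Signed"}, "Contract Record"),
]

def update_content_type(tags: Set[str], current: str) -> str:
    """Return the first content type whose required tags are all present."""
    for required, content_type in RULES:
        if required <= tags:  # subset test: every required tag found
            return content_type
    return current  # no rule matched; keep the existing content type

def route(content_type: str) -> str:
    # Hypothetical routing targets for records and sensitive content.
    destinations: Dict[str, str] = {
        "Contract Record": "Records Center",
        "Confidential Document": "Secure Server",
    }
    return destinations.get(content_type, "Original Library")
```

Ordering the rules puts sensitive-data checks first, so a signed contract that also contains PII is treated as confidential rather than as an ordinary record.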
conceptClassifier for FAST Search
Improves search outcomes by placing conceptual metadata in the FAST Search index to increase relevancy of search results
Enables import of FAST Entities into the conceptClassifier taxonomy manager to fine-tune them with metadata generated from your own content and nomenclature
Runs natively as a FAST Pipeline Stage eliminating integration and customization issues
Eliminates vocabulary normalization issues across global boundaries through controlled vocabularies
Improves faceted search results as facets are based on concepts aligned with the taxonomy
Provides taxonomy browse capabilities based on the nodes within the corporate taxonomy(s)
Provides accurate metadata filters such as numeric range searching and wildcard alphanumeric matching
Removes confidential/sensitive documents from search results through automatic Content Type updating and routing to a secure server
Automatically tags content with both vocabulary and retention codes, and respects SharePoint security, which can prevent access to the document once it has been declared a record
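Faceted search over conceptual metadata comes down to counting how many documents in the current result set carry each concept value, then filtering when a user selects one. The sketch below shows that mechanic on in-memory data; the `concepts` field and sample records are hypothetical, and a real search engine computes these counts in its index rather than in client code like this.

```python
from collections import Counter
from typing import Dict, List

def facet_counts(results: List[Dict], field: str = "concepts") -> Counter:
    """Count how many result documents carry each value of `field`."""
    counts = Counter()
    for doc in results:
        # set() so a document is counted once per distinct concept
        counts.update(set(doc.get(field, [])))
    return counts

def refine(results: List[Dict], value: str, field: str = "concepts") -> List[Dict]:
    """Keep only results tagged with the selected facet value."""
    return [doc for doc in results if value in doc.get(field, [])]
```

Because the facet values come from taxonomy-aligned concepts rather than raw keywords, the same counting code yields cleaner refiners without any change to the mechanics.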
Roadmap from SharePoint 2007 to 2010
Enterprise Metadata Management
Properties (currently flat lists) become hierarchical “Term Sets”
Term Sets provide the capability for faceted search and hierarchical navigation, for example: Regions/Country/State, Business Unit/Departments, Band Names/Album Names, TV Show Titles/Characters
Ability to automatically extract all meaningful concepts from content when it is created or ingested to be used by the Term Sets
Augments EMM through auto-classification to automatically apply all semantic (conceptual) metadata to the Term Sets
Automates the management, validation, and testing of the Term Sets in EMM from conceptClassifier’s Taxonomy Manager
Facilitates the ongoing taxonomy and Term Set maintenance through easy-to-use taxonomy features designed for Subject Matter Experts
conceptClassifier fully supports SharePoint 2010 EMM as the primary location for taxonomy definitions, with no need to import/export. Changes to the taxonomy structure made using Microsoft tools will be immediately visible in conceptClassifier, and vice versa.
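Hierarchical Term Sets support drill-down navigation because a document tagged with a child term also counts toward every ancestor term. A minimal sketch, assuming terms are written as "/"-separated paths such as `Region/Country/State`; this path encoding is an illustration only, not how SharePoint's managed metadata service stores terms.

```python
from collections import Counter
from typing import Iterable

def rollup_counts(tagged_paths: Iterable[str]) -> Counter:
    """Count each term path and all of its ancestors for hierarchical navigation."""
    counts = Counter()
    for path in tagged_paths:
        parts = path.split("/")
        # Credit the term itself and every ancestor above it.
        for depth in range(1, len(parts) + 1):
            counts["/".join(parts[:depth])] += 1
    return counts
```

Given tags `North America/USA/Texas`, `North America/USA/Ohio`, and `North America/Canada`, the navigation tree shows North America (3), USA (2), and Texas (1), so users can start broad and narrow down.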
Solving Enterprise Search Challenges
www.aeturnum.com
Contents
Why we need metadata
Architectural considerations
Why Concept Searching?
Why we need metadata (a search practitioner’s view)
» Improve search relevancy
• Create personas based on information needs (e.g., a bank loan officer vs. a branch manager)
• Attach different content sources, relevancy models and user experiences to these personas based on metadata
» Better post-search navigation
• Selectively expose metadata as navigators
» Enable other features
• Metadata can be used as input for workflows and alerting features (e.g., notify the risk management department when documents containing Social Security numbers are shared on the company intranet)
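The persona idea can be sketched as a per-persona set of allowed content sources plus metadata boosts applied when re-ranking results. The persona names follow the bank example above; the source names, boost weights, and multiplicative scoring are hypothetical, and production engines express this through their own relevancy configuration rather than client-side re-ranking.

```python
from dataclasses import dataclass
from typing import Dict, List, Set

@dataclass
class Persona:
    sources: Set[str]            # content sources this persona searches
    boosts: Dict[str, float]     # metadata value -> score multiplier

# Hypothetical personas for the bank example.
PERSONAS = {
    "loan_officer": Persona({"loans", "policies"}, {"credit risk": 2.0}),
    "branch_manager": Persona({"policies", "hr"}, {"staffing": 1.5}),
}

def rerank(results: List[dict], persona: Persona) -> List[dict]:
    """Drop results from other sources, then boost by matching metadata."""
    scored = []
    for doc in results:
        if doc["source"] not in persona.sources:
            continue
        score = doc["score"]
        for tag in doc.get("tags", []):
            score *= persona.boosts.get(tag, 1.0)
        scored.append((score, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]
```

The same query then yields different result lists: a loan officer sees credit-risk material promoted, while a branch manager sees staffing documents and never sees the loan repository.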
Architectural considerations
Should metadata reside in a separate metadata repository, within the content repository, or within a search engine?
[Diagram: two alternative architectures, each connecting content stores, conceptClassifier, and a search engine]
Search engine
• Simple to configure (connect the search engine ingest process to the classifier)
• Some content stores can’t hold metadata (e.g., shared drives)
• Best suited for multiple repositories
Metadata or content repository
• Metadata is actionable beyond search (drive workflows, alerts, etc.)
• More complex implementation as the number of repositories increases
Why use Concept Searching?
» Iterative taxonomy development cycle avoids false positives and surprises
» Pass taxonomy control into the hands of Business & KM users
» Allow taxonomy to change with the business
» Compound term classification, statistical ‘clues’ suggestions, etc.
Iterative cycle: Define / Refine taxonomy, Find clues, Test results
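The define, find clues, and test cycle depends on measuring each taxonomy node against a sample of documents whose correct category is already known. A minimal sketch of the "test results" step, assuming a hypothetical labeled sample and any classifier function: it reports per-node false positives, the surprises the iterative cycle is meant to catch before a taxonomy goes live.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Tuple

def false_positives(
    sample: List[Tuple[str, str]],
    classify: Callable[[str], Optional[str]],
) -> Dict[str, List[str]]:
    """Map each predicted node to the texts it claimed but should not have."""
    errors: Dict[str, List[str]] = defaultdict(list)
    for text, expected in sample:
        predicted = classify(text)
        # Unclassified texts (None) are misses, not false positives.
        if predicted is not None and predicted != expected:
            errors[predicted].append(text)
    return dict(errors)
```

With a naive classifier that files anything containing "heart" under Cardiology, the phrase "heart of the city" shows up as a Cardiology false positive, a signal to refine that node's clues (for example, toward compound terms) and run the test again.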