© Copyright 2008 EMC Corporation. All rights reserved.
Implementing DFS Search Services
Pierre-Yves “Pitch” Chevalier on behalf of Marc Brette
DFS 6.5 Search and Classification Services
DFS: Service-oriented and platform-agnostic
Search service in DFS since 6.0:
– Federated search on Documentum repositories and external repositories
New in 6.5, search and content intelligence features:
– Nonblocking search
– Clustering of search results
– Saved searches
– Classification service
A platform for building a wide range of search applications, from mobile search to advanced discovery interfaces
This presentation puts the services into practice by progressively building an example application.
Agenda
Search
Clustering
Saved Queries
Classification
Troubleshooting
DFS 6.5 Search and Classification Services
Search service
Simple search
Federated search
Nonblocking search
Advanced queries
Search Service
SearchService:
– execute: executes a query and returns results
– getRepositoryList: returns the list of available sources (managed and unmanaged repositories)
A query can be structured or passthrough (straight DQL)
Results contain the query status and a DataPackage (a list of DataObject)
Stateless: relies on a caching mechanism
Services Architecture
[Architecture diagram. Labels: Consumers; DFS; WSDL-based proxies; JAX-WS / JAXB; DFS Runtime; Search Service; Query Store Service; Analytics Service; DFC; Content Server; ECI Server; CI Server; control flow]
Demo: A Simple Search Application
Example: A Simple Search Application
A simple example that performs a search on one repository and displays results
Architecture of the example:
– User interface in AJAX
– Java servlets call DFS and format results in JSON for the UI
– Remote calls to DFS, but these could also be local calls
[Diagram labels: Browser (AJAX application); HTTP/JSON; Java servlets; SOAP; DFS; JBoss; Content Server; Full-text Indexer]
Example: Execute Query
Set up context
// Credentials for the repository to search
RepositoryIdentity identity = new RepositoryIdentity("MSSQL60ECI4", "userdev1", "userdev1", "");
ContextFactory contextFactory = ContextFactory.getInstance();
IServiceContext context = contextFactory.newContext();
context.addIdentity(identity);
// Remote proxy for the search service ("search" module) at the DFS endpoint
ISearchService searchService = ServiceFactory.getInstance().getRemoteService(ISearchService.class,
context, "search", "http://127.0.0.1:8080/services");
Build and execute query
StructuredQuery q = new StructuredQuery();
q.addRepository("MSSQL60ECI4");
q.setObjectType("dm_document");
// searchQuery holds the full-text terms entered by the user
ExpressionSet expressionSet = new ExpressionSet();
expressionSet.addExpression(new FullTextExpression(searchQuery));
q.setRootExpressionSet(expressionSet);
// Start at result 0, return at most 100 results, at most 100 per source
QueryExecution queryExec = new QueryExecution(0, 100, 100);
QueryResult queryResult = searchService.execute(q, queryExec, null);
Example: Wrap the Query in a Servlet
Get parameter
public class SearchServlet extends HttpServlet
{
protected void doPost(HttpServletRequest httpServletRequest, HttpServletResponse httpServletResponse) throws ServletException, IOException
{
String searchQuery = httpServletRequest.getParameter("queryTerms");
//…
Example: Format Response as JSON
JSON: A JavaScript-friendly structure
Easy to represent lists and name/value pairs
public void writeJSON(PrintWriter writer, QueryResult response) {
writer.append("[");
for (Iterator<DataObject> it = response.getDataObjects().iterator(); it.hasNext();) {
DataObject dataObject = it.next();
writer.append("{");
PropertySet set = dataObject.getProperties();
Iterator<Property> iterator = set.iterator();
while (iterator.hasNext()) {
Property prop = iterator.next();
String strName = prop.getName();
String value = prop.getValueAsString();
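// Note: names and values are written unescaped; a quote, backslash, or newline in a value would produce invalid JSON
writer.append("\"").append(strName).append("\":\"").append(value).append("\"");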
if (iterator.hasNext()) writer.append(",");
}
writer.append("}\n");
if (it.hasNext()) writer.append(",");
}
writer.append("]");
}
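The method above writes property values into the JSON output as they are, so a value containing a double quote, a backslash, or a control character would break the response. A minimal sketch of an escaping helper follows, assuming plain Java with no JSON library; `JsonEscaper` and `quote` are hypothetical names introduced here and are not part of DFS.

```java
/** Hypothetical helper (not part of DFS) that turns a Java string into a JSON string literal. */
final class JsonEscaper {
    private JsonEscaper() {}

    /** Returns the value as a double-quoted JSON string, or the JSON literal null for a null value. */
    static String quote(String value) {
        if (value == null) {
            return "null";
        }
        StringBuilder sb = new StringBuilder(value.length() + 2);
        sb.append('"');
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters must be written as \ uXXXX escapes
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        sb.append('"');
        return sb.toString();
    }
}
```

In the servlet, each name and value could then be written as `writer.append(JsonEscaper.quote(strName)).append(":").append(JsonEscaper.quote(value))`, which replaces the hand-written quote characters.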
Example: HTML Form
function updatepage(str){
var rsp = JSON.parse(str); // parse the JSON response (safer than eval, which would run any script in the response)
var html = "<table>";
for (var i = 0; i < rsp.length; i++) {
var result = rsp[i];
html += "\n<tr><td>" + result.object_name + "</td></tr>";
}
html += "</table>";
document.getElementById("result").innerHTML = html;
}
<!-- … -->
<form name="searchForm"
onsubmit='xmlhttpPost("/EMCWorldDemo/search",updatepage, getQueryParams()); return false;'>
<p>query: <input name="queryTerms" type="text">
<input value="Go" type="submit"></p>
<div id="result"></div>
</form>
Federated Search
DFS Search Service supports federated search across multiple Documentum repositories and external repositories
Requires ECI option for external repositories
ECI supports a large catalog of adapters to external sources:
– CMS (FileNet, SharePoint, IBMCM…)
– Websites (Google, Yahoo …)
– Databases
– Indexers (Verity, Fast, IndexServer…)
– Specialized sources (legal, science, regulation, patents, health…)
– EMC products (eRoom, EX, AX…)
Support for authentication using the same service as Docbase repositories
Federated Search: Configure ECI
To search external repositories:
Install ECIS
Edit dfc.properties in the DFS EAR:
– dfc.search.ecis.enable=true
– dfc.search.ecis.host=ecishost
Querying multiple sources
Example: Querying Several Sources
String[] sources = httpServletRequest.getParameterValues("sources");
ContextFactory contextFactory = ContextFactory.getInstance();
IServiceContext context = contextFactory.newContext();
for (String source: sources) {
RepositoryIdentity identity = new RepositoryIdentity( source, "userdev1", "userdev1", "");
context.addIdentity(identity);
}
StructuredQuery q = new StructuredQuery();
for (String source: sources) q.addRepository(source);
Listing available sources
List<Repository> repositories = searchService.getRepositoryList(null);
for (Repository dataObject: repositories) {
String sourceName = dataObject.getName();
String userLogin = dataObject.getProperties().getUserLoginCapability();
}
Demo: Nonblocking Search
Nonblocking Search
DFS is based on DFC, which supports asynchronous search execution
Allows dynamic display of results
DFS supports it through a nonblocking query call:
– Allows multiple successive calls to get new results and the query status
Sequence between the DFS client and the DFS service:
– execute(query, 0, 100): no results
– wait 1 second
– execute(query, 0, 100): 10 results
– wait 1 second
– execute(query, 10, 100): 90 results
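The exchange above can be sketched as a client-side polling loop. The sketch is self-contained and does not call DFS: `PollingSearch`, `Source`, and `Page` are hypothetical stand-ins for a nonblocking `execute` call and its result, and the `finished` flag is an assumption about how a client would learn that the query is complete. Unlike the slide, which always asks for up to 100 results, this loop asks only for the number still missing.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical polling loop; Source stands in for one nonblocking execute() call. */
final class PollingSearch {
    private PollingSearch() {}

    /** Stand-in for the service: returns the results available so far from index start. */
    interface Source {
        Page fetch(int start, int max);
    }

    /** One response: the new results plus whether the query has finished on all sources. */
    static final class Page {
        final List<String> results;
        final boolean finished;

        Page(List<String> results, boolean finished) {
            this.results = results;
            this.finished = finished;
        }
    }

    /** Polls until the query finishes, max results are collected, or maxPolls is reached. */
    static List<String> collect(Source source, int max, int maxPolls, long waitMillis) {
        List<String> all = new ArrayList<>();
        for (int poll = 0; poll < maxPolls && all.size() < max; poll++) {
            // Ask only for results not yet received: start index = count so far
            Page page = source.fetch(all.size(), max - all.size());
            all.addAll(page.results);
            if (page.finished) {
                break;
            }
            try {
                Thread.sleep(waitMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop polling
                break;
            }
        }
        return all;
    }
}
```

A user interface would typically render each page as it arrives instead of waiting for the full list, which is what makes the dynamic display of results possible.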
Nonblocking Search: Cache
DFS queries are cached
Each query has a definition and a query ID used as key in the cache
Cache policy is size-based and time-based
Each Search Service call contains the initial query (definition) so that the query may be re-executed in case of cache miss.
Configurable in dfs-runtime.properties:
– dfs.query_cache_house_keeper.period = 5
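The caching behaviour described on this slide (query ID as key, size-based and time-based eviction, re-execution from the query definition on a miss) can be sketched in a few lines. This is an illustration of the idea only, not the DFS implementation: `QueryCacheSketch` is a hypothetical class, least-recently-used eviction is an assumed size policy, and the current time is passed in as a parameter so that expiry is easy to exercise.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical size- and time-bounded cache keyed by query ID (not the DFS implementation). */
final class QueryCacheSketch<V> {
    /** A cached value together with the time it was stored. */
    private static final class Cached<T> {
        final T value;
        final long storedAtMillis;

        Cached(T value, long storedAtMillis) {
            this.value = value;
            this.storedAtMillis = storedAtMillis;
        }
    }

    private final long ttlMillis;
    private final Map<String, Cached<V>> entries;

    QueryCacheSketch(final int maxEntries, long ttlMillis) {
        this.ttlMillis = ttlMillis;
        // Access-ordered map: the eldest entry is the least recently used one
        this.entries = new LinkedHashMap<String, Cached<V>>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, Cached<V>> eldest) {
                return size() > maxEntries; // size-based eviction
            }
        };
    }

    /** Returns the cached value, or re-runs the query via execute when it is missing or expired. */
    synchronized V get(String queryId, long nowMillis, Function<String, V> execute) {
        Cached<V> cached = entries.get(queryId);
        if (cached == null || nowMillis - cached.storedAtMillis > ttlMillis) {
            cached = new Cached<>(execute.apply(queryId), nowMillis); // cache miss: re-execute
            entries.put(queryId, cached);
        }
        return cached.value;
    }

    synchronized int size() {
        return entries.size();
    }
}
```

Because the caller always supplies the means to re-execute the query, a miss is invisible to the client apart from the extra latency, which is why each service call can carry the full query definition and remain stateless.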
Nonblocking Search: QueryStatus
QueryStatus contains status of the query for each repository
Example: Two sources, one successful, one failed with network error
Example: Nonblocking Query Execution
Set asynchronous call
QueryExecution queryExec = new QueryExecution(start, len, 350);
queryExec.setQueryId(queryId);
SearchProfile profile = new SearchProfile();
profile.setAsyncCall(true);
OperationOptions options = new OperationOptions();
options.setSearchProfile(profile);
QueryResult queryResult = searchService.execute(q, queryExec, options);
Advanced Queries
StructuredQuery: an abstract query
Allows the query to be refined.
Allows the query to be bound to UI controls.
Independent of the Full-text Indexer and Content Server versions, and of whether an indexer is present at all.
FullTextExpression
– Supports a Boolean ‘mini-language’: phrases, AND, OR, NOT, and parentheses
– Example: EMC contract AND (“end of life” OR termination) NOT ECIS
ExpressionSet
– Boolean combination of FullTextExpression and PropertyExpression
PropertyExpression
– Constraints on document attributes
– Operators: EQUAL, NOT_EQUAL, GREATER_THAN, LESS_THAN, GREATER_EQUAL, LESS_EQUAL, BEGINS_WITH, CONTAINS, DOES_NOT_CONTAIN, ENDS_WITH, IN, NOT_IN, BETWEEN, IS_NULL, IS_NOT_NULL
– Values: SimpleValue, ValueList, ValueRange, RelativeDateValue
Advanced Queries: Example
Example of structured query:
object_name contains “test”, r_modify_date falls within the last month, and owner_name is “marc” or “ghislain”
Advanced query example
ExpressionSet expr = new ExpressionSet();
expr.addExpression(new PropertyExpression("object_name", Condition.CONTAINS, "test"));
expr.addExpression(new PropertyExpression("r_modify_date", Condition.GREATER_EQUAL, new RelativeDateValue(-1, TimeUnit.MONTH)));
ExpressionSet orExpr = new ExpressionSet(ExpressionSetOperator.OR);
orExpr.addExpression(new PropertyExpression("owner_name", Condition.EQUAL, "marc"));
orExpr.addExpression(new PropertyExpression("owner_name", Condition.EQUAL, "ghislain"));
expr.addExpression(orExpr);
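To make the nesting in the example above easier to see, the following self-contained sketch builds the same tree with stand-in classes and prints it as a single Boolean expression. `ExpressionTreeSketch`, `Leaf`, and `Group` are hypothetical and only mimic the shape of ExpressionSet and PropertyExpression; treating the outer set as an AND is an assumption taken from the slide's description of the query.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-ins that mimic the shape of nested expression sets (not the DFS classes). */
final class ExpressionTreeSketch {
    private ExpressionTreeSketch() {}

    interface Node {
        String render();
    }

    /** A single constraint, kept as display text for this illustration. */
    static final class Leaf implements Node {
        private final String text;

        Leaf(String text) {
            this.text = text;
        }

        public String render() {
            return text;
        }
    }

    /** A group of child expressions joined by one Boolean operator. */
    static final class Group implements Node {
        private final String operator; // "AND" or "OR"
        private final List<Node> children = new ArrayList<>();

        Group(String operator) {
            this.operator = operator;
        }

        Group add(Node child) {
            children.add(child);
            return this;
        }

        public String render() {
            List<String> parts = new ArrayList<>();
            for (Node child : children) {
                parts.add(child.render());
            }
            return "(" + String.join(" " + operator + " ", parts) + ")";
        }
    }

    /** Rebuilds the slide's example: two constraints AND an OR group on owner_name. */
    static String example() {
        Group owners = new Group("OR")
                .add(new Leaf("owner_name = 'marc'"))
                .add(new Leaf("owner_name = 'ghislain'"));
        return new Group("AND")
                .add(new Leaf("object_name contains 'test'"))
                .add(new Leaf("r_modify_date >= one month ago"))
                .add(owners)
                .render();
    }
}
```

Calling `ExpressionTreeSketch.example()` returns `(object_name contains 'test' AND r_modify_date >= one month ago AND (owner_name = 'marc' OR owner_name = 'ghislain'))`, which shows why the two owner constraints need their own OR set rather than being added to the root set directly.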
DFS 6.5 Search and Classification Services
Clustering
Simple clustering of search results
Multiple facets and strategies
Getting results
Go beyond search
Clustering
Dynamic grouping of results into ‘clusters’
Based on results properties (not content)
Uses linguistic rules
Option of Search Service
Requires an SBO to be installed
– An installer is provided (Webtop Extended Search)
Supports hierarchical clustering
SearchService:
– getClusters: returns the clusters for a query
– getSubClusters: returns the clusters for a subset of a query
– getResultsProperties: returns the properties for a subset of a query
The services are stateless
– They reuse the query cached by SearchService.execute and re-execute it if needed.
– All the methods take the query and query execution parameters in case of a cache miss
Demo: Enhance the Search Application with Clustering
Example: Computing Clusters
Get clusters for a query
QueryExecution queryExec = new QueryExecution(0, 100, 350);
queryExec.setQueryId(queryId);
ClusteringProfile profile = new ClusteringProfile();
profile.addClusteringStrategy(new ClusteringStrategy("Topics",
Arrays.asList("object_name", "title", "subject", "summary")));
OperationOptions options = new OperationOptions();
options.setClusteringProfile(profile);
QueryCluster queryClusters = searchService.getClusters(query, queryExec, options);
Example: Clustering Response Objects
[Class diagram: the getClusters() response is a QueryCluster. Classes shown: ClusterTree (+ isRefreshable: Boolean); Cluster (+ clusterSize: int, + clusterValues: List<String>, + isSubClusterTreeAvailable: Boolean); ClusteringStrategy (+ strategyName: String); ObjectIdentitySet. Multiplicities shown: 0..*, 0..*, 1, 0..1]
Multiple Facets and Strategies
Several ways to group results together
Defined by a strategy:
– Topic
– Person names
– Dates
– Document sizes
Example: Multiple Strategies
Set clustering strategies for ‘Topics’ and ‘Date’ (dates grouped by quarter)
ClusteringProfile profile = new ClusteringProfile();
profile.addClusteringStrategy(new ClusteringStrategy("Topics", Arrays.asList("object_name", "title", "subject", "summary")));
ClusteringStrategy dateClusteringStrategy =
new ClusteringStrategy("Date", Arrays.asList("r_modify_date"));
PropertySet tokenizerPropSet = new PropertySet(new StringProperty("r_modify_date", "quarterdate"));
dateClusteringStrategy.setTokenizers(tokenizerPropSet);
profile.addClusteringStrategy(dateClusteringStrategy);
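The "quarterdate" tokenizer above groups results by the quarter of their modification date. The slides do not show what the resulting cluster values look like, so the sketch below only illustrates the grouping idea in plain Java; `QuarterBucket`, its methods, and the "2008 Q2" label format are all assumptions made for this example.

```java
import java.time.LocalDate;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical illustration of grouping dates by calendar quarter (label format is assumed). */
final class QuarterBucket {
    private QuarterBucket() {}

    /** Maps a date to a label such as "2008 Q2". */
    static String label(LocalDate date) {
        int quarter = (date.getMonthValue() - 1) / 3 + 1; // months 1-3 -> Q1, 4-6 -> Q2, ...
        return date.getYear() + " Q" + quarter;
    }

    /** Counts how many dates fall in each quarter, sorted by label. */
    static Map<String, Integer> countByQuarter(List<LocalDate> dates) {
        Map<String, Integer> counts = new TreeMap<>();
        for (LocalDate date : dates) {
            counts.merge(label(date), 1, Integer::sum);
        }
        return counts;
    }
}
```

Each map key corresponds to what a date cluster would represent, and each count to the size of that cluster.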
Go Beyond Search
Clustering can be used for nonsearch applications
Example: most active subjects in a repository (automatic tag clouds)
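As a rough picture of what a tag cloud computation involves, the sketch below counts how often each term occurs across a list of subject strings and returns the most frequent ones. It is a deliberately naive stand-in: the clustering service applies linguistic rules that are not reproduced here, and `TagCloudSketch`, its methods, and the rule of dropping terms shorter than three characters are assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical term-frequency counter; far simpler than the clustering service's linguistic rules. */
final class TagCloudSketch {
    private TagCloudSketch() {}

    /** Counts lower-cased terms of at least three characters across all subjects. */
    static Map<String, Integer> countTerms(List<String> subjects) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String subject : subjects) {
            // Split on anything that is not a letter or a digit
            for (String token : subject.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
                if (token.length() < 3) {
                    continue; // skips empty tokens and very short words
                }
                counts.merge(token, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Returns up to n terms, most frequent first; ties are broken alphabetically. */
    static List<String> top(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        List<String> terms = new ArrayList<>();
        for (int i = 0; i < Math.min(n, entries.size()); i++) {
            terms.add(entries.get(i).getKey());
        }
        return terms;
    }
}
```

The counts would drive the font size of each tag in the cloud, in the same way that the size of a cluster indicates how many results it contains.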
Saved Queries
QueryStoreService:
– listSavedQueries
– loadSavedQuery
– saveQuery
Allows manipulation of dm_smart_list objects (exposed in Webtop since 5.3)
Allows control over which results are saved
List the saved queries for the current user.
IQueryStoreService service = ServiceFactory.getInstance().getRemoteService(IQueryStoreService.class, context, "core", "http://127.0.0.1:8080/services");
QueryExecution queryExec = new QueryExecution(0, 100, 100);
SavedQueryFilter filter = new SavedQueryFilter(SavedQueryAccessibility.OWNED);
DataPackage queryResult = service.listSavedQueries("MSSQL60ECI4", queryExec, filter, null);
Load a saved query
ObjectIdentity queryId = new ObjectIdentity(new ObjectId("0821f7588000132e"), "MSSQL60ECI3");
QueryExecution queryExec = new QueryExecution(0, 100, 100);
SavedQuery queryResult = queryStoreService.loadSavedQuery(queryId, queryExec, null);
[Class diagram. Classes shown: SavedQuery; QueryResult; RichQuery (+ displayedAttributes: List<String>, + propertySet: PropertySet); Query. Multiplicities shown: 0..1, 1, 1]
Save a query
Query query = //…
ObjectIdentity queryId = new ObjectIdentity("MSSQL60ECI3");
DataObject metadata = new DataObject(queryId) ;
metadata.getProperties().set("object_name", "My Saved Query");
RichQuery richQuery = new RichQuery();
richQuery.setQuery(query);
QueryExecution queryExec = new QueryExecution(0, 100, 100);
ObjectIdentity queryResult = queryStoreService.saveQuery(metadata, richQuery, queryExec, null, null);
Classification
Introduces a service to compute ‘tags’ for documents
Based on the CIS classification engine and a managed taxonomy
AnalyticsService:
– analyze: takes a list of object IDs and computes the list of categories for each document
Classification Configuration
Install CIS Server
– The installer deploys an EAR with an embedded app server (JBoss)
Install a taxonomy
– Available taxonomies:
Energy / Energy Industry Energy / Oil Trading General Finance General Knowledge Information Science and Technology Law / Federal Legislation Terms Life Sciences Manufacturing / Chemical Hazards Military / DTIC Science and Engineering …
Classification: Compute Categories
Analyze an object
ObjectIdentitySet documentsSet = new ObjectIdentitySet(new ObjectIdentity(new ObjectId("0821f7588000132e"), MY_DOCBASE));
OperationOptions operationOptions = new OperationOptions();
PropertyProfile propProfile = new PropertyProfile();
propProfile.setIncludeProperties(Arrays.asList("CATEGORIES"));
operationOptions.setPropertyProfile(propProfile);
IAnalyticsService analyticsService = ServiceFactory.getInstance().getRemoteService(IAnalyticsService.class, context, "analytics", "http://127.0.0.1:7001/services");
List<AnalyticsResult> analyticsResults = analyticsService.analyze(documentsSet, operationOptions);
Classification: ‘Analyze’ Response
Display the categories for each object
for (AnalyticsResult classResult : analyticsResults)
{
System.out.println("Document ID: " + classResult.getObjIdentity());
List<CategoryAssign> catAssigns = classResult.getCategoryAssignList();
for (CategoryAssign catAssign : catAssigns)
{
System.out.println("\t " + catAssign.getCategory().getName());
}
}
Troubleshooting
Diagnose query issues: print the QueryStatus object
Diagnose ECIS communication problems: use log4j traces
Diagnose query issues after execute
QueryResult queryResult = searchService.execute(query, exec, options);
System.out.println(queryResult.getStatus());
Trace DFS requests and responses on a Sun JVM:
System.setProperty("com.sun.xml.ws.transport.http.client.HttpTransportPipe.dump", "true");