Implementing DFS Search Services

Page 1: Implementing DFS Search Services

© Copyright 2008 EMC Corporation. All rights reserved.

Implementing DFS Search Services

Pierre-Yves “Pitch” Chevalier on behalf of Marc Brette

Page 2: Implementing DFS Search Services

DFS 6.5 Search and Classification Services

DFS: Service-oriented and platform-agnostic

Search service in DFS since 6.0:

– Federated search on Documentum repositories and external repositories

6.5: New search and content intelligence features:

– Nonblocking search

– Clustering of search results

– Saved searches

– Classification service

A platform for building a wide range of search applications, from mobile search to advanced discovery interfaces

This presentation puts the services into practice by progressively building an example application.

Page 3: Implementing DFS Search Services

Agenda

Search

Clustering

Saved Queries

Classification

Troubleshooting

Page 4: Implementing DFS Search Services

DFS 6.5 Search and Classification Services

Search service

Simple search

Federated search

Nonblocking search

Advanced queries

Page 5: Implementing DFS Search Services

Search Service

SearchService:

– execute: executes a query and returns results

– getRepositoryList: returns the list of available sources (managed and unmanaged repositories)

Query can be structured or passthrough (straight DQL)

Results contain the query status (QueryStatus) and a DataPackage (a list of DataObject instances)

Stateless: relies on a caching mechanism

Page 6: Implementing DFS Search Services

Services Architecture

[Architecture diagram. Consumer side: WSDL-based proxies, DFS runtime. DFS side: Search Service, Query Store Service, Analytics Service, DFS runtime (JAX-WS / JAXB), DFC search service. Back ends: Content Server, ECI Server, CI Server. Arrows show control flow.]

Page 7: Implementing DFS Search Services

Demo: A Simple Search Application

Page 8: Implementing DFS Search Services

Example: A Simple Search Application

A simple example that performs a search on one repository and displays results

Architecture of the example:

– User interface in AJAX

– Java servlets call DFS and format results in JSON for the UI

– Remote calls to DFS, although local calls would also work

[Diagram: browser (AJAX application) -> HTTP/JSON -> Java servlets -> SOAP -> DFS -> Content Server and full-text indexer; JBoss is the application server]

Page 9: Implementing DFS Search Services

Example: Execute Query

Set up context

    // Credentials for the repository to search: repository, user, password, domain
    RepositoryIdentity identity = new RepositoryIdentity("MSSQL60ECI4", "userdev1", "userdev1", "");
    ContextFactory contextFactory = ContextFactory.getInstance();
    IServiceContext context = contextFactory.newContext();
    context.addIdentity(identity);
    // Remote proxy for the search service deployed at this DFS endpoint
    ISearchService searchService = ServiceFactory.getInstance().getRemoteService(
        ISearchService.class, context, "search", "http://127.0.0.1:8080/services");

Build and execute query

    // Full-text query on dm_document objects in one repository
    StructuredQuery q = new StructuredQuery();
    q.addRepository("MSSQL60ECI4");
    q.setObjectType("dm_document");
    ExpressionSet expressionSet = new ExpressionSet();
    expressionSet.addExpression(new FullTextExpression(searchQuery));
    q.setRootExpressionSet(expressionSet);
    // Start at index 0 and return up to 100 results
    QueryExecution queryExec = new QueryExecution(0, 100, 100);
    QueryResult queryResult = searchService.execute(q, queryExec, null);

Page 10: Implementing DFS Search Services

Example: Wrap the Query in a Servlet

Get parameter

    public class SearchServlet extends HttpServlet
    {
        protected void doPost(HttpServletRequest httpServletRequest, HttpServletResponse httpServletResponse)
                throws ServletException, IOException
        {
            // Query terms posted by the search form
            String searchQuery = httpServletRequest.getParameter("queryTerms");
            //…

Page 11: Implementing DFS Search Services

Example: Format Response as JSON

JSON: A JavaScript-friendly structure

Easy to represent lists and name/value pairs

Page 12: Implementing DFS Search Services

Example: Format Response as JSON

    // Writes the results as a JSON array with one object per DataObject.
    // Names and values are written unescaped, so a quote or backslash in a
    // property value produces invalid JSON.
    public void writeJSON(PrintWriter writer, QueryResult response) {
        writer.append("[");
        for (Iterator<DataObject> it = response.getDataObjects().iterator(); it.hasNext();) {
            DataObject dataObject = it.next();
            writer.append("{");
            PropertySet set = dataObject.getProperties();
            Iterator<Property> iterator = set.iterator();
            while (iterator.hasNext()) {
                Property prop = iterator.next();
                String strName = prop.getName();
                String value = prop.getValueAsString();
                writer.append("\"").append(strName).append("\":\"").append(value).append("\"");
                if (iterator.hasNext()) writer.append(",");
            }
            writer.append("}\n");
            if (it.hasNext()) writer.append(",");
        }
        writer.append("]");
    }
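The writeJSON method above appends property names and values verbatim, so a value containing a double quote, a backslash, or a line break would break the JSON that the browser parses. Below is a minimal sketch of an escaping helper that could be applied to each name and value before it is appended. The class and method names (JsonEscaper, escape) are hypothetical and not part of DFS.

```java
// Hypothetical helper, not part of DFS: escapes a string for use inside a JSON string literal.
final class JsonEscaper {
    private JsonEscaper() {}

    static String escape(String s) {
        if (s == null) {
            return "";
        }
        StringBuilder out = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '"':  out.append("\\\""); break;
                case '\\': out.append("\\\\"); break;
                case '\n': out.append("\\n"); break;
                case '\r': out.append("\\r"); break;
                case '\t': out.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters become a four-digit hexadecimal escape
                        out.append(String.format("\\u%04x", (int) c));
                    } else {
                        out.append(c);
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // A title with embedded quotes and a line break
        System.out.println(escape("Contract \"draft\"\nv2"));
    }
}
```

In writeJSON, the call would wrap both arguments, for example append(JsonEscaper.escape(value)).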

Page 13: Implementing DFS Search Services

Example: HTML Form

    function updatepage(str){
        var rsp = JSON.parse(str); // parse the JSON response
        var html = "<table>";
        for (var i = 0; i < rsp.length; i++) {
            var result = rsp[i];
            html += "\n<tr><td>" + result.object_name + "</td></tr>";
        }
        html += "</table>";
        document.getElementById("result").innerHTML = html;
    }

    <!-- … -->

    <form name="searchForm"
          onsubmit='xmlhttpPost("/EMCWorldDemo/search", updatepage, getQueryParams()); return false;'>
        <p>query: <input name="queryTerms" type="text">
        <input value="Go" type="submit"></p>
        <div id="result"></div></td>
    </form>

Page 14: Implementing DFS Search Services

Federated Search

DFS Search Service supports federated search across multiple Documentum repositories and external repositories

Requires ECI option for external repositories

ECI supports a large catalog of adapters to external sources:

– CMS (FileNet, SharePoint, IBMCM…)

– Websites (Google, Yahoo…)

– Databases

– Indexers (Verity, Fast, IndexServer…)

– Specialized sources (legal, science, regulation, patents, health…)

– EMC products (eRoom, EX, AX…)

Support for authentication using the same service as Docbase repositories

Page 15: Implementing DFS Search Services

Federated Search: Configure ECI

To search external repositories:

Install ECIS

Edit dfc.properties in the DFS EAR:

– dfc.search.ecis.enable=true

– dfc.search.ecis.host=ecishost

Page 16: Implementing DFS Search Services

Example: Querying Several Sources

Querying multiple sources

    // Sources selected by the user in the search form
    String[] sources = httpServletRequest.getParameterValues("sources");
    ContextFactory contextFactory = ContextFactory.getInstance();
    IServiceContext context = contextFactory.newContext();
    // One identity per source; this example reuses the same credentials for all of them
    for (String source : sources) {
        RepositoryIdentity identity = new RepositoryIdentity(source, "userdev1", "userdev1", "");
        context.addIdentity(identity);
    }
    StructuredQuery q = new StructuredQuery();
    for (String source : sources) q.addRepository(source);

Listing available sources

    List<Repository> repositories = searchService.getRepositoryList(null);
    for (Repository repository : repositories) {
        String sourceName = repository.getName();
        String userLogin = repository.getProperties().getUserLoginCapability();
    }

Page 17: Implementing DFS Search Services

Demo: Nonblocking Search

Page 18: Implementing DFS Search Services

Nonblocking Search

DFS is based on DFC, which supports asynchronous search execution

Allows dynamic display of results

DFS supports it through a nonblocking query call:

– Allows multiple successive calls to get new results and the query status

Sequence between DFS client and DFS service:

– execute(query, 0, 100) returns no results

– wait 1 second

– execute(query, 0, 100) returns 10 results

– wait 1 second

– execute(query, 10, 100) returns 90 results
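The call sequence above can be sketched as a polling loop: the client repeats the same call, moves the start index past the results it already holds, and waits between calls until the service reports that the query has finished. The sketch runs against a fake in-memory service. SearchCall, Page, and FakeSearch are hypothetical stand-ins, not DFS classes, and the finished flag is an assumed simplification of the query status.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the DFS search service; none of these types are DFS classes.
final class PollingSketch {

    /** One batch of results plus a flag telling whether the query has finished. */
    static final class Page {
        final List<String> results;
        final boolean finished;
        Page(List<String> results, boolean finished) {
            this.results = results;
            this.finished = finished;
        }
    }

    /** Mirrors execute(query, start, max) from the sequence above. */
    interface SearchCall {
        Page execute(String query, int start, int max);
    }

    /** Fake service: each call makes 'step' more results available, up to 'total'. */
    static final class FakeSearch implements SearchCall {
        private final int total;
        private final int step;
        private int available = 0;

        FakeSearch(int total, int step) {
            this.total = total;
            this.step = step;
        }

        @Override
        public Page execute(String query, int start, int max) {
            List<String> page = new ArrayList<>();
            int end = Math.min(available, max);
            for (int i = start; i < end; i++) {
                page.add("result-" + i);
            }
            boolean finished = available >= total;
            available = Math.min(total, available + step);
            return new Page(page, finished);
        }
    }

    /** Polls until the query finishes, 'max' results are held, or 'maxPolls' calls were made. */
    static List<String> collect(SearchCall service, String query, int max, long waitMillis, int maxPolls) {
        List<String> all = new ArrayList<>();
        for (int poll = 0; poll < maxPolls; poll++) {
            // Ask only for results beyond those already received
            Page page = service.execute(query, all.size(), max);
            all.addAll(page.results);
            if (page.finished || all.size() >= max) {
                break;
            }
            try {
                Thread.sleep(waitMillis); // the slide waits 1 second between calls
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return all;
    }

    public static void main(String[] args) {
        List<String> results = collect(new FakeSearch(100, 50), "contract", 100, 10, 20);
        System.out.println(results.size() + " results");
    }
}
```

A user interface would typically refresh its result list after each poll instead of waiting for the full list, which is what makes the dynamic display possible.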

Page 19: Implementing DFS Search Services

Nonblocking Search: Cache

DFS queries are cached

Each query has a definition and a query ID used as key in the cache

Cache policy is size-based and time-based

Each Search Service call contains the initial query (definition) so that the query may be re-executed in case of cache miss.

Configurable in dfs-runtime.properties:

– dfs.query_cache_house_keeper.period = 5
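To make the cache behavior concrete, here is a small sketch of a cache keyed by query ID that is bounded by both size and age and that re-executes the query definition on a miss. It is not the DFS implementation, which these slides do not show. All names are hypothetical, and expiry is checked lazily on lookup rather than by a periodic housekeeper.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of a size- and time-bounded query cache; not the DFS implementation.
final class QueryCacheSketch {

    private static final class CachedQuery {
        final String results;
        final long storedAt;
        CachedQuery(String results, long storedAt) {
            this.results = results;
            this.storedAt = storedAt;
        }
    }

    private final long ttlMillis;
    private final LinkedHashMap<String, CachedQuery> entries;

    QueryCacheSketch(final int maxEntries, long ttlMillis) {
        this.ttlMillis = ttlMillis;
        // Access-ordered map: the least recently used entry is evicted once the size limit is exceeded
        this.entries = new LinkedHashMap<String, CachedQuery>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, CachedQuery> eldest) {
                return size() > maxEntries;
            }
        };
    }

    /**
     * Returns the cached results for queryId. On a miss, or when the entry is older than
     * the time limit, the query definition is executed again and the result is stored.
     */
    String get(String queryId, String definition, Function<String, String> executor, long now) {
        CachedQuery cached = entries.get(queryId);
        if (cached == null || now - cached.storedAt > ttlMillis) {
            cached = new CachedQuery(executor.apply(definition), now);
            entries.put(queryId, cached);
        }
        return cached.results;
    }

    int size() {
        return entries.size();
    }

    public static void main(String[] args) {
        QueryCacheSketch cache = new QueryCacheSketch(2, 1000);
        int[] executions = {0};
        Function<String, String> executor = def -> {
            executions[0]++;
            return "results for " + def;
        };
        cache.get("q1", "contract", executor, 0);    // miss: executes the query
        cache.get("q1", "contract", executor, 500);  // hit: served from the cache
        cache.get("q1", "contract", executor, 2000); // expired: executes again
        System.out.println("executions: " + executions[0]);
    }
}
```

Passing the definition on every call is what lets a stateless service recover from a miss: the caller never needs to know whether the entry was still cached.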

Page 20: Implementing DFS Search Services

Nonblocking Search: QueryStatus

QueryStatus contains status of the query for each repository

Example: Two sources, one successful, one failed with network error

Page 21: Implementing DFS Search Services

Example: Nonblocking Query Execution

Set asynchronous call

    QueryExecution queryExec = new QueryExecution(start, len, 350);
    // Reuse the same query ID on every call so that the cached query is found
    queryExec.setQueryId(queryId);
    SearchProfile profile = new SearchProfile();
    profile.setAsyncCall(true);
    OperationOptions options = new OperationOptions();
    options.setSearchProfile(profile);
    QueryResult queryResult = searchService.execute(q, queryExec, options);

Page 22: Implementing DFS Search Services

Advanced Queries

StructuredQuery: an abstract query

Allows the query to be refined.

Allows the query to be bound to UI controls.

Independent of the full-text indexer and Content Server versions, and of whether an indexer is present.

Page 23: Implementing DFS Search Services

Advanced Queries

FullTextExpression

– Supports a Boolean ‘mini-language’: phrases, AND, OR, NOT, and parentheses

– Example: EMC contract AND (“end of life” OR termination) NOT ECIS

ExpressionSet

– Boolean combination of FullTextExpression and PropertyExpression instances

PropertyExpression

– Constraints on document attributes

– Operators: EQUAL, NOT_EQUAL, GREATER_THAN, LESS_THAN, GREATER_EQUAL, LESS_EQUAL, BEGINS_WITH, CONTAINS, DOES_NOT_CONTAIN, ENDS_WITH, IN, NOT_IN, BETWEEN, IS_NULL, IS_NOT_NULL

– Values: SimpleValue, ValueList, ValueRange, RelativeDateValue

Page 24: Implementing DFS Search Services

Advanced Queries: Example

Example of a structured query:

object_name contains “test”, r_modify_date is within the last month, and owner_name is “marc” or “ghislain”

Advanced query example

    // Root expression set combining the three conditions
    ExpressionSet expr = new ExpressionSet();
    expr.addExpression(new PropertyExpression("object_name", Condition.CONTAINS, "test"));
    // Modified within the last month, relative to the time of execution
    expr.addExpression(new PropertyExpression("r_modify_date", Condition.GREATER_EQUAL,
        new RelativeDateValue(-1, TimeUnit.MONTH)));
    // Nested OR set: either owner matches
    ExpressionSet orExpr = new ExpressionSet(ExpressionSetOperator.OR);
    orExpr.addExpression(new PropertyExpression("owner_name", Condition.EQUAL, "marc"));
    orExpr.addExpression(new PropertyExpression("owner_name", Condition.EQUAL, "ghislain"));
    expr.addExpression(orExpr);

Page 25: Implementing DFS Search Services

Agenda

Search

Clustering

Saved Queries

Classification

Troubleshooting

Page 26: Implementing DFS Search Services

DFS 6.5 Search and Classification Services

Clustering

Simple clustering of search results

Multiple facets and strategies

Getting results

Go beyond search

Page 27: Implementing DFS Search Services

Clustering

Dynamic grouping of results into ‘clusters’

Based on result properties (not content)

Uses linguistic rules

Option of Search Service

Requires an SBO to be installed

– An installer is provided (Webtop Extended Search)

Supports hierarchical clustering

Page 28: Implementing DFS Search Services

Clustering

SearchService:

– getClusters: returns the clusters for a query

– getSubClusters: returns the clusters for a subset of a query

– getResultsProperties: returns the properties for a subset of a query

The services are stateless:

– They reuse the query cached by SearchService.execute and re-execute it if needed.

– All the methods take the query and query execution parameters in case of a cache miss.

Page 29: Implementing DFS Search Services

Demo: Enhance the Search Application with Clustering

Page 30: Implementing DFS Search Services

Example: Computing Clusters

Get clusters for a query

    QueryExecution queryExec = new QueryExecution(0, 100, 350);
    queryExec.setQueryId(queryId);
    // Cluster by topic, using these four properties of each result
    ClusteringProfile profile = new ClusteringProfile();
    profile.addClusteringStrategy(new ClusteringStrategy("Topics",
        Arrays.asList("object_name", "title", "subject", "summary")));
    OperationOptions options = new OperationOptions();
    options.setClusteringProfile(profile);
    QueryCluster queryClusters = searchService.getClusters(query, queryExec, options);

Page 31: Implementing DFS Search Services

Example: Clustering Response Objects

[Class diagram of the getClusters() response: QueryCluster, ClusterTree (isRefreshable: Boolean), Cluster (clusterSize: int, clusterValues: List<String>, isSubClusterTreeAvailable: Boolean), ClusteringStrategy (strategyName: String), and ObjectIdentitySet, linked by associations with multiplicities 0..*, 0..*, 1, and 0..1.]

Page 32: Implementing DFS Search Services

Multiple Facets and Strategies

Several ways to group results together

Defined by a strategy:

– Topic

– Person names

– Dates

– Document sizes

Page 33: Implementing DFS Search Services

Example: Multiple Strategies

Set cluster strategies for ‘Topic’ and ‘Date’

    ClusteringProfile profile = new ClusteringProfile();
    // Topic strategy: cluster on four text properties
    profile.addClusteringStrategy(new ClusteringStrategy("Topics",
        Arrays.asList("object_name", "title", "subject", "summary")));
    // Date strategy: cluster on the modification date, tokenized by quarter
    ClusteringStrategy dateClusteringStrategy =
        new ClusteringStrategy("Date", Arrays.asList("r_modify_date"));
    PropertySet tokenizerPropSet = new PropertySet(new StringProperty("r_modify_date", "quarterdate"));
    dateClusteringStrategy.setTokenizers(tokenizerPropSet);
    profile.addClusteringStrategy(dateClusteringStrategy);
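The Date strategy above groups results by the quarter of r_modify_date. As an illustration of what such grouping means, the sketch below buckets dates by calendar quarter in plain Java. It does not use DFS. The label format ("2008-Q2") and all names are assumptions, since the actual output of the quarterdate tokenizer is not shown in these slides.

```java
import java.time.LocalDate;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustration only: groups dates by calendar quarter. Names and label format are assumptions.
final class QuarterBuckets {

    /** Maps a date to a label such as "2008-Q2". */
    static String quarterLabel(LocalDate date) {
        int quarter = (date.getMonthValue() - 1) / 3 + 1;
        return date.getYear() + "-Q" + quarter;
    }

    /** Counts how many dates fall into each quarter, sorted by label. */
    static Map<String, Integer> countByQuarter(List<LocalDate> dates) {
        Map<String, Integer> counts = new TreeMap<>();
        for (LocalDate date : dates) {
            counts.merge(quarterLabel(date), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<LocalDate> modified = Arrays.asList(
            LocalDate.of(2008, 1, 15),
            LocalDate.of(2008, 3, 31),
            LocalDate.of(2008, 4, 1),
            LocalDate.of(2007, 12, 24));
        // prints {2007-Q4=1, 2008-Q1=2, 2008-Q2=1}
        System.out.println(countByQuarter(modified));
    }
}
```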

Page 34: Implementing DFS Search Services

Go Beyond Search

Clustering can be used for nonsearch applications

Example: most active subjects in a repository (automatic tag clouds)
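A tag cloud of this kind needs little more than the name and size of each cluster. The sketch below scales cluster sizes linearly to font sizes. It assumes the names and sizes were already read from the clustering response; the class, the method, and the pixel range are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: turns cluster sizes into tag-cloud font sizes. Not part of DFS.
final class TagCloudSketch {

    /** Scales each cluster size linearly into the range [minPx, maxPx]. */
    static Map<String, Integer> fontSizes(Map<String, Integer> clusterSizes, int minPx, int maxPx) {
        int smallest = Integer.MAX_VALUE;
        int largest = Integer.MIN_VALUE;
        for (int size : clusterSizes.values()) {
            smallest = Math.min(smallest, size);
            largest = Math.max(largest, size);
        }
        Map<String, Integer> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : clusterSizes.entrySet()) {
            // When all clusters have the same size, every tag gets the largest font
            int px = (largest == smallest)
                ? maxPx
                : minPx + (entry.getValue() - smallest) * (maxPx - minPx) / (largest - smallest);
            result.put(entry.getKey(), px);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> clusters = new LinkedHashMap<>();
        clusters.put("contract", 40);
        clusters.put("invoice", 10);
        clusters.put("patent", 25);
        // prints {contract=30, invoice=10, patent=20}
        System.out.println(fontSizes(clusters, 10, 30));
    }
}
```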

Page 35: Implementing DFS Search Services

Agenda

Search

Clustering

Saved Queries

Classification

Troubleshooting

Page 36: Implementing DFS Search Services

Saved Queries

QueryStoreService:

– listSavedQueries

– loadSavedQuery

– saveQuery

Allows manipulation of dm_smart_list objects (exposed in Webtop since 5.3)

Allows control over which results are saved

Page 37: Implementing DFS Search Services

Saved Queries

List the saved queries for the current user.

    IQueryStoreService service = ServiceFactory.getInstance().getRemoteService(
        IQueryStoreService.class, context, "core", "http://127.0.0.1:8080/services");
    QueryExecution queryExec = new QueryExecution(0, 100, 100);
    // Only the queries owned by the current user
    SavedQueryFilter filter = new SavedQueryFilter(SavedQueryAccessibility.OWNED);
    DataPackage queryResult = service.listSavedQueries("MSSQL60ECI4", queryExec, filter, null);

Page 38: Implementing DFS Search Services

Saved Queries

Load a saved query

    // Identify the saved query by object ID and repository
    ObjectIdentity queryId = new ObjectIdentity(new ObjectId("0821f7588000132e"), "MSSQL60ECI3");
    QueryExecution queryExec = new QueryExecution(0, 100, 100);
    SavedQuery queryResult = queryStoreService.loadSavedQuery(queryId, queryExec, null);

[Class diagram: SavedQuery, QueryResult, RichQuery (displayedAttributes: List<String>, propertySet: PropertySet), and Query, linked by associations with multiplicities 0..1, 1, and 1.]

Page 39: Implementing DFS Search Services

Saved Queries

Save a query

    Query query = //…
    // Metadata of the saved query object to create in the target repository
    ObjectIdentity queryId = new ObjectIdentity("MSSQL60ECI3");
    DataObject metadata = new DataObject(queryId);
    metadata.getProperties().set("object_name", "My Saved Query");
    RichQuery richQuery = new RichQuery();
    richQuery.setQuery(query);
    QueryExecution queryExec = new QueryExecution(0, 100, 100);
    ObjectIdentity queryResult = queryStoreService.saveQuery(metadata, richQuery, queryExec, null, null);

Page 40: Implementing DFS Search Services

Agenda

Search

Clustering

Saved Queries

Classification

Troubleshooting

Page 41: Implementing DFS Search Services

Classification

Introduces a service that computes ‘tags’ for documents

Based on the CIS classification engine and a managed taxonomy

AnalyticsService:

– analyze: takes a list of object IDs and computes the list of categories for each document

Page 42: Implementing DFS Search Services

Classification Configuration

Install CIS Server

– The installer deploys an EAR with an embedded application server (JBoss)

Install a taxonomy

– Available taxonomies:

Energy / Energy Industry

Energy / Oil Trading

General Finance

General Knowledge

Information Science and Technology

Law / Federal Legislation Terms

Life Sciences

Manufacturing / Chemical Hazards

Military / DTIC

Science and Engineering

…

Page 43: Implementing DFS Search Services

Classification: Compute Categories

Analyze an object

    // Documents to classify, identified by object ID and repository
    ObjectIdentitySet documentsSet = new ObjectIdentitySet(
        new ObjectIdentity(new ObjectId("0821f7588000132e"), MY_DOCBASE));
    // Ask for the categories in the response
    OperationOptions operationOptions = new OperationOptions();
    PropertyProfile propProfile = new PropertyProfile();
    propProfile.setIncludeProperties(Arrays.asList("CATEGORIES"));
    operationOptions.setPropertyProfile(propProfile);
    IAnalyticsService analyticsService = ServiceFactory.getInstance().getRemoteService(
        IAnalyticsService.class, context, "analytics", "http://127.0.0.1:7001/services");
    List<AnalyticsResult> analyticsResults = analyticsService.analyze(documentsSet, operationOptions);

Page 44: Implementing DFS Search Services

Classification: ‘Analyze’ Response

Display the categories for each object

    for (AnalyticsResult classResult : analyticsResults)
    {
        System.out.println("Document ID: " + classResult.getObjIdentity());
        List<CategoryAssign> catAssigns = classResult.getCategoryAssignList();
        // One line per category assigned to the document
        for (CategoryAssign catAssign : catAssigns)
        {
            System.out.println("\t " + catAssign.getCategory().getName());
        }
    }

Page 45: Implementing DFS Search Services

Agenda

Search

Clustering

Saved Queries

Classification

Troubleshooting

Page 46: Implementing DFS Search Services

Troubleshooting

Diagnose query issues: print the QueryStatus object

Diagnose ECIS communication problems: log4j traces

Diagnose query issues after execute

    QueryResult queryResult = searchService.execute(query, exec, options);
    // Print the per-repository status of the query
    System.out.println(queryResult.getQueryStatus());

Page 47: Implementing DFS Search Services

Troubleshooting

Trace DFS requests and responses on a Sun JVM:

System.setProperty("com.sun.xml.ws.transport.http.client.HttpTransportPipe.dump", "true");
