
Sanne Grinovero
JBoss by Red Hat

Hibernate Search
full-text queries for Hibernate and Infinispan

Index

• What Hibernate Search can do for your project
• How to use it
  – with Hibernate ORM and OGM
  – with Infinispan
• News
  – Faceting
  – Spatial
  – Clustering
• Plans for the future

What you will learn

Sanne Grinovero
• Senior Software Engineer at Red Hat
• Hibernate team
  – lead of Hibernate Search
  – Hibernate OGM
• Infinispan
  – Search, Query and Lucene integrations
• Apache Lucene
• JGroups

Who I am

• Object Index Mapper
• Full-text search engine library based on Lucene
• API at the Object level
• Integrates with Hibernate ORM and Infinispan
• Used by many other projects
• Cluster friendly

Hibernate Search: What’s that?

The Search problem

• Whoever searches doesn't know exactly what they are searching for:
  – Please, don't ask the user to know the primary key
  – The user doesn't know the exact content of the document either

Hibernate model example

The naive search engine

The almost naive search engine

Does it work? How about these:

String author = "De Andre, Fabrizio";
String title = "Nuvole barocche";
List<Product> list = s.createQuery( " ...? " ).list();

String author = "De André, Fabrizio";
String title = "Nuvole barocche";
List<Product> list = s.createQuery( " ...? " ).list();

String author = "Fabrizio De André";
String title = "Nuvole barocche";
List<Product> list = s.createQuery( " ...? " ).list();

Evolving the requirements
• Unique search input
  – Might contain the author's name, the product title, or both
  – Both values might be composed of multiple terms
• Relevance
  – Products matching both fields should be listed on top
  – Exact word matches should be scored better
✦ So you need approximate word matches?
✦ How about typos?

SQL is the hammer:

List<Product> list = s.createQuery( " ...? " )
    .setParameter( "F. de André nuvole barocche" )
    .list();

• Mixed case, accents
• Relative order of terms, distance of related terms
• Abbreviations, typos
• Match on multiple fields
• 18,800 results in 20 milliseconds
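The "mixed case, accents" requirement can be illustrated with a small normalization step, so that "De André" and "de andre" reduce to the same term. This is only a sketch using the JDK's `java.text.Normalizer`; the class name `TermNormalizer` is hypothetical, and Lucene analyzers perform this kind of folding with their own token filters.

```java
import java.text.Normalizer;
import java.util.Locale;

// Hypothetical helper, not part of Hibernate Search or Lucene.
final class TermNormalizer {

    // Lowercases the input and strips combining accent marks.
    static String normalize(String input) {
        // NFD splits "é" into "e" followed by a combining acute accent.
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        // \p{M} matches the combining marks produced by the decomposition.
        String withoutMarks = decomposed.replaceAll("\\p{M}+", "");
        return withoutMarks.toLowerCase(Locale.ROOT);
    }
}
```

Applying the same normalization to both the indexed text and the user input is what makes the two spellings comparable.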

What if Google returned results in alphabetical order?
Would you still use it?

"hibernate search": About 3,580,000 results (0.04 seconds)

We have a Problem

• The database is not a good fit
• SQL is not appropriate for the task
• Still, we want SQL for many other reasons
• Need to integrate them, keep data integrity and consistency

A key element of the solution: Apache Lucene

• Open source Apache™ top level project
• In the "top 10" of most downloaded and active projects
• Very advanced implementation
• Main language is Java, some ports exist in other languages
• Rich open source environment around it
• Many products embed it

Examples of Lucene users

Nexus

Similarity
• N-Grams (edit distance)
• Phonetic (Soundex™)
• Any custom...

Cagliari ⁓ càliari
Cagliari ⁓ cag agl gli lia iar ari
Cagliari ⁓ CGRI
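The N-gram decomposition can be sketched as a sliding window over the lowercased term. This is a minimal illustration with a hypothetical `NGrams` class, not Lucene's own n-gram tokenizer; a complete three-character window over "cagliari" yields six trigrams: cag, agl, gli, lia, iar, ari.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, for illustration only.
final class NGrams {

    // Returns every substring of length n, moving the window one character at a time.
    static List<String> ngrams(String term, int n) {
        String lower = term.toLowerCase(Locale.ROOT);
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= lower.length(); i++) {
            grams.add(lower.substring(i, i + n));
        }
        return grams;
    }
}
```

Two terms that share many n-grams are likely to be close spellings of each other, which is why this decomposition helps with typos.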

Lucene: Synonyms
• Can be applied at "index time"
• or at "query time"
• Requires a vocabulary
  – WordNet

newspaper ⁓ daily ⁓ journal
Journal ??⁓ newspaper

Lucene: Stemming

continuait ⁓ continu
continuation ⁓ continu
continué ⁓ continu
continuelle ⁓ continuel

Lucene: Stop-words
• Removes terms which are so frequently used that they are not suited as search keywords – might depend on your domain!

a,able,about,across,after,all,almost,also,am,among,an,and,any,are,as,at,be,because,been,but,by,can,cannot,could,dear,did,do,does,either,else,ever,every,for,from,get,got,had,has,have,he,her,hers,him,his,how,however,i,if,in,into,is,it,its,just,least,let,like,likely,may,me,might,most,must,my,neither,no,nor,not,of,off,often,on,only,or,other,our,own,rather,said,say,says,she,should,since,so,some,than,that,the,their,them,then,there,these,they,this,tis,to,too,twas,us,wants,was,we,were,what,when,where,which,while,who,whom,why,will,with,would,yet,you,your
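Stop-word removal can be sketched as a filter applied after tokenization. The example below is an assumption-laden illustration: the `StopWords` class is hypothetical and uses only a handful of words from the list above, whereas Lucene applies a configurable stop filter inside the analyzer chain.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, for illustration only.
final class StopWords {

    // A tiny subset of the stop-word list; a real list depends on your domain.
    static final Set<String> STOP_WORDS = Set.of("a", "and", "in", "of", "the", "to");

    // Splits on anything that is not a letter or digit, then drops stop-words.
    static List<String> filter(String text) {
        List<String> kept = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }
}
```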

Apache Lucene: Index

• It requires an Index
  – On filesystem
  – In memory
  – ...
• Made of immutable segments
• Optimized for search speed, not for updates
• A world of strings and frequencies
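"A world of strings and frequencies" can be pictured as an inverted index: each term maps to the documents containing it, together with an occurrence count. The sketch below is a toy model with a hypothetical `TinyInvertedIndex` class; it ignores segments, immutability and everything else that makes Lucene's real index fast.

```java
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical toy model, not Lucene's index format.
final class TinyInvertedIndex {

    // term -> (document id -> number of occurrences of the term in that document)
    private final Map<String, Map<Integer, Integer>> postings = new TreeMap<>();

    // Tokenizes the text and records one occurrence per token.
    void add(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (term.isEmpty()) {
                continue;
            }
            postings.computeIfAbsent(term, t -> new TreeMap<>())
                    .merge(docId, 1, Integer::sum);
        }
    }

    // Returns the documents containing the term, with their term frequency.
    Map<Integer, Integer> lookup(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Map.of());
    }
}
```

A lookup is a single map access regardless of how many documents exist, which hints at why such a structure is optimized for search rather than for updates.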

Integrating with a database

• The index structure is deeply different from a relational database – not everything is possible
• You need to keep the data in sync
  – If they are not in sync, which one should be trusted more?
• What do queries look like?
• What do queries return?

Different worlds

• A Lucene Document is unstructured (schemaless), something similar to Map<String,String>

• A Hibernate model is structured to represent your business model
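The mismatch can be made concrete by flattening a small object graph into a map of strings. This is a sketch only: the `Writer` and `Article` classes and the `DocumentSketch` helper are hypothetical, and the dotted field name "author.name" follows the naming used by the query fields later in these slides.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical structured model: an article referencing its writer.
final class Writer {
    final String name;
    Writer(String name) { this.name = name; }
}

final class Article {
    final String summary;
    final Writer author;
    Article(String summary, Writer author) {
        this.summary = summary;
        this.author = author;
    }
}

// Hypothetical helper, not the Hibernate Search implementation.
final class DocumentSketch {

    // Flattens the object graph into schemaless name/value pairs.
    static Map<String, String> toDocument(Article article) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("summary", article.summary);
        // The association disappears; only a prefixed field name remains.
        doc.put("author.name", article.author.name);
        return doc;
    }
}
```

The types, the association and the identity of the author are all lost in the flattened form, which is the gap an Object Index Mapper has to bridge.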

The data mismatch

The architectural mismatch

Quickstart Hibernate Search

• Add the hibernate-search dependency to an existing Hibernate project:

<dependency>
    <groupId>org.hibernate</groupId>
    <artifactId>hibernate-search-orm</artifactId>
    <version>4.2.0.Final</version>
</dependency>

Quickstart Hibernate Search

• Any other configuration is optional:
  – Where we store indexes
  – Extension modules, custom analyzers
  – Performance tuning
  – Advanced mapping
  – Clustering

Quickstart Hibernate Search

@Entity
public class Essay {
    @Id
    public Long getId() { return id; }

    public String getSummary() { return summary; }

    @Lob
    public String getText() { return text; }

    @ManyToOne
    public Author getAuthor() { return author; }
    ...

Quickstart Hibernate Search

@Entity @Indexed
public class Essay {
    @Id
    public Long getId() { return id; }

    public String getSummary() { return summary; }

    @Lob
    public String getText() { return text; }

    @ManyToOne
    public Author getAuthor() { return author; }
    ...

Quickstart Hibernate Search

@Entity @Indexed
public class Essay {
    @Id
    public Long getId() { return id; }

    @Field
    public String getSummary() { return summary; }

    @Lob
    public String getText() { return text; }

    @ManyToOne
    public Author getAuthor() { return author; }
    ...

Quickstart Hibernate Search

@Entity @Indexed
public class Essay {
    @Id
    public Long getId() { return id; }

    @Field
    public String getSummary() { return summary; }

    @Lob @Field @Boost(0.8f)
    public String getText() { return text; }

    @ManyToOne
    public Author getAuthor() { return author; }
    ...

Quickstart Hibernate Search

@Entity @Indexed
public class Essay {
    @Id
    public Long getId() { return id; }

    @Field
    public String getSummary() { return summary; }

    @Lob @Field @Boost(0.8f)
    public String getText() { return text; }

    @ManyToOne @IndexedEmbedded
    public Author getAuthor() { return author; }
    ...

@Entity
public class Author {
    @Id @GeneratedValue
    private Integer id;
    private String name;
    @OneToMany
    private Set<Book> books;
}

@Entity
public class Book {
    @Id @GeneratedValue
    private Integer id;
    private String title;
}

Another example

@Entity @Indexed
public class Author {
    @Id @GeneratedValue
    private Integer id;
    @Field(store=Store.YES)
    private String name;
    @OneToMany @IndexedEmbedded
    private Set<Book> books;
}

@Entity
public class Book {
    @Id @GeneratedValue
    private Integer id;
    @Field(store=Store.YES)
    private String title;
}

Index structure

String[] productFields = {"summary", "author.name"};

Query luceneQuery = // query builder or any Lucene Query

FullTextEntityManager ftEm = Search.getFullTextEntityManager(entityManager);

FullTextQuery query = ftEm.createFullTextQuery( luceneQuery, Product.class );

List<Product> items = query.setMaxResults(100).getResultList();

int totalNbrOfResults = query.getResultSize();

Query

Creating a Query with the DSL

Results

• Managed POJO: updates are applied to both Lucene and database
• JPA pagination, known APIs:
  – .setMaxResults( 20 ).setFirstResult( 100 );
• Type restrictions, polymorphic full-text queries:
  – .createFullTextQuery( luceneQuery, A.class, B.class, ..);
• Projection
• Result mapping

Filters


// s is a FullTextSession
FullTextQuery ftQuery = s.createFullTextQuery( query, Product.class );
// enableFullTextFilter returns the filter, so parameters are set per filter
ftQuery.enableFullTextFilter( "minorsFilter" );
ftQuery.enableFullTextFilter( "specialDayOffers" )
    .setParameter( "day", "20130117" );
ftQuery.enableFullTextFilter( "inStockAt" )
    .setParameter( "location", "Bangalore" );
List<Product> results = ftQuery.list();

Advanced text analysis

@Entity @Indexed
@AnalyzerDef(name = "frenchAnalyzer",
    tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
    filters = {
        @TokenFilterDef(factory = LowerCaseFilterFactory.class),
        @TokenFilterDef(factory = SnowballPorterFilterFactory.class,
            params = {@Parameter(name = "language", value = "French")})
    })
public class Book {

    @Field(index=Index.TOKENIZED, store=Store.NO)
    @Analyzer(definition = "frenchAnalyzer")
    private String title;

    ...

More...
• @Boost and @DynamicBoost
• @AnalyzerDiscriminator
• @DateBridge(resolution=Resolution.MINUTE)
• @ClassBridge and @FieldBridge
• @Similarity
• Automatic Index optimization
• Sharding, sharding filters
• Faceting
• @Spatial

@Spatial queries

• Where can I find events on Hibernate, Errai and Infinispan within a 5 km radius of Bangalore?
• And coffee within 100 m, please?

• Boolean query on longitude and latitude ranges
  – Good for a small corpus (<100k documents)
  – Smaller index
• Quad Tree
  – Efficient for a larger data corpus
  – Works better with heterogeneous distribution

Indexing Coordinates

[Figure: quad-tree grid with cells labeled 1.1, 1.2, 1.3.1.1, 1.3.1.2, 1.4, 2, 3, 4]
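The hierarchical cell labels of a quad tree can be sketched by repeatedly splitting the map into four quadrants and recording which one contains the point. Everything here is an assumption made for illustration: the `QuadCells` class is hypothetical, the quadrant numbering (1 = north-west, 2 = north-east, 3 = south-west, 4 = south-east) is arbitrary, and Hibernate Search's actual spatial indexing differs in its details.

```java
// Hypothetical helper, for illustration only.
final class QuadCells {

    // Returns a dotted path such as "2.3.4": one quadrant number per level.
    static String cellPath(double latitude, double longitude, int depth) {
        double south = -90, north = 90, west = -180, east = 180;
        StringBuilder path = new StringBuilder();
        for (int level = 0; level < depth; level++) {
            double midLat = (south + north) / 2;
            double midLon = (west + east) / 2;
            boolean northHalf = latitude >= midLat;
            boolean eastHalf = longitude >= midLon;
            // Assumed numbering: 1 = NW, 2 = NE, 3 = SW, 4 = SE.
            int quadrant = (northHalf ? 1 : 3) + (eastHalf ? 1 : 0);
            // Shrink the bounding box to the chosen quadrant.
            if (northHalf) { south = midLat; } else { north = midLat; }
            if (eastHalf) { west = midLon; } else { east = midLon; }
            if (level > 0) {
                path.append('.');
            }
            path.append(quadrant);
        }
        return path.toString();
    }
}
```

Points that are close together usually share a long path prefix, so a proximity search can start as a term match on cell labels and compute exact distances only for the candidates. Points on opposite sides of a cell boundary are the exception, so neighbouring cells must be considered too.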

@Entity
@Indexed
public class POI {

    @Spatial(spatialMode = SpatialMode.GRID)
    @Embedded
    public Coordinates getLocation() {
        return new Coordinates() {
            @Override
            public double getLatitude() { return latitude; }

            @Override
            public double getLongitude() { return longitude; }
        };
    }
    ...

Query query = builder
    .spatial()
        .onCoordinates( "location" )
        .within( 51, Unit.KM )
        .ofCoordinates( coordinates )
    .createQuery();

List results = fullTextSession
    .createFullTextQuery( query, POI.class )
    .list();

Creating a Spatial Query

Infinispan Query

Infinispan

• An advanced clusterable cache
• A very fast, transactional scalable datagrid
• A "NoSQL", a key-value store
  – How do you query a key value store?

SELECT * FROM GRID

To Query a “Grid”

• What's in C7?

Object value = cache.get( "c7" );

If you don't know the key, there is no way to find the value
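One way to picture what a query module adds to a key-value store is a secondary index that maps terms found in the values back to their keys. The sketch below is a toy with a hypothetical `IndexedStore` class built on plain Java maps; it is not how Infinispan Query is implemented, which delegates indexing to Hibernate Search and Lucene.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical toy store, for illustration only.
final class IndexedStore {

    private final Map<String, String> values = new HashMap<>();          // key -> value
    private final Map<String, Set<String>> keysByTerm = new HashMap<>(); // term -> keys

    // Stores the value and indexes each of its terms.
    // Simplification: overwriting a key does not remove its old terms.
    void put(String key, String value) {
        values.put(key, value);
        for (String term : value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!term.isEmpty()) {
                keysByTerm.computeIfAbsent(term, t -> new TreeSet<>()).add(key);
            }
        }
    }

    // Plain key-value access: only works if the key is known.
    String get(String key) {
        return values.get(key);
    }

    // Index access: finds the keys of all values containing the term.
    Set<String> keysContaining(String term) {
        return keysByTerm.getOrDefault(term.toLowerCase(Locale.ROOT), Set.of());
    }
}
```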

Infinispan Query quickstart
• Enable it in configuration
• Have infinispan-query.jar in your classpath
• Annotate your POJO values to specify what to index

<dependency>
    <groupId>org.infinispan</groupId>
    <artifactId>infinispan-query</artifactId>
    <version>5.2.0.CR1</version>
</dependency>

Enable Infinispan Query, programmatically

Configuration c = new Configuration()
    .fluent()
        .indexing()
            .addProperty( "hibernate.search.option", "valueForOption" )
    .build();
CacheManager manager = new DefaultCacheManager( c );

Enable Query in Infinispan XML

<?xml version="1.0" encoding="UTF-8"?>
<infinispan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="urn:infinispan:config:5.2 http://www.infinispan.org/schemas/infinispan-config-5.2.xsd"
    xmlns="urn:infinispan:config:5.2">
  <default>
    <indexing enabled="true">
      <properties>
        <property name="hibernate.search.option" value="value" />
      </properties>
    </indexing>
  </default>
</infinispan>

Annotate your model

@Indexed
public class Book implements Serializable {

    @Field String title;
    @Field String author;
    @Field String editor;

    public Book(String title, String author, String editor) {
        this.title = title;
        this.author = author;
        this.editor = editor;
    }

}

Run a Query

SearchManager qf = Search.getSearchManager(cache);
Query query = qf.buildQueryBuilderForClass(Book.class)
    .get()
    .phrase()
        .onField("title")
        .sentence("in action")
    .createQuery();
List<Object> list = qf.getQuery(query).list();

Clustering

• Requires an Index
  – on a filesystem
  – in memory
  – in Infinispan

Drawbacks of any Lucene based solution

Scalability issues

• Global writer locks
• NFS based index sharing is very tricky

Clustering using a queue

Index stored in Infinispan

Single node performance idea

multi-node setup

Support?
• Volunteer-based, by the community (in order of preference):
  – on the Hibernate forums
  – on the Infinispan forums
  – stackoverflow
• Professionally supported products:
  – Web Framework Kit 2.1

What’s next
• Ease configuration aspects for clustering
• Support for Lucene 4.x
• Supporting out-of-VM Lucene servers
• Parallel searching on an Infinispan cluster
• Cloud-tm platform: self-tuning ergonomics
• Your involvement!

Questions?

http://search.hibernate.org
Hibernate Search in Action [Manning]

http://lucene.apache.org
Lucene In Action (2nd ed.) [Manning]

http://www.infinispan.org

http://in.relation.to

http://forum.hibernate.org

@SanneGrinovero @Hibernate @Infinispan
