
Hibernate Search
full-text queries for Hibernate and Infinispan

Sanne Grinovero
JBoss by Red Hat


Index

What you will learn

• What Hibernate Search can do for your project
• How to use it
    • with Hibernate ORM and OGM
    • with Infinispan
• News
    • Faceting
    • Spatial
    • Clustering
• Plans for the future

Who I am

Sanne Grinovero
• Senior Software Engineer at Red Hat
• Hibernate team
    • lead of Hibernate Search
    • Hibernate OGM
• Infinispan
    • Search, Query and Lucene integrations
• Apache Lucene
• JGroups

Hibernate Search: what's that?

• Object Index Mapper
• Full-text search engine library based on Lucene
• API at the object level
• Integrates with Hibernate ORM and Infinispan
• Used by many other projects
• Cluster friendly

The Search problem

• Whoever searches doesn't know exactly what they are searching for:
    • Please, don't ask the user to know the primary key
    • They don't know the exact content of the document either


Hibernate model example


The naive search engine


The almost naive search engine

Does it work? How about these:

String author = "De Andre, Fabrizio";
String title = "Nuvole barocche";
List<Product> list = s.createQuery( " ...? " ).list();

String author = "De André, Fabrizio";
String title = "Nuvole barocche";
List<Product> list = s.createQuery( " ...? " ).list();

String author = "Fabrizio De André";
String title = "Nuvole barocche";
List<Product> list = s.createQuery( " ...? " ).list();
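The three spellings above differ only in accents, punctuation and word order. A minimal sketch, using only the JDK, of how an input could be reduced to a set of normalized terms so that all three compare equal. The class and method names are hypothetical, and this is not how Hibernate Search or Lucene analyze text; it only illustrates the idea.

```java
import java.text.Normalizer;
import java.util.Arrays;
import java.util.Collections;
import java.util.Locale;
import java.util.Set;
import java.util.TreeSet;

public class NameNormalizer {

    // Returns the sorted set of lower-cased, accent-free terms found in the input.
    public static Set<String> terms(String input) {
        // NFD splits an accented letter into the base letter plus a combining mark,
        // which the next step removes.
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        String unaccented = decomposed.replaceAll("\\p{M}", "");
        String cleaned = unaccented.toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9]+", " ")
                .trim();
        if (cleaned.isEmpty()) {
            return Collections.emptySet();
        }
        return new TreeSet<>(Arrays.asList(cleaned.split(" ")));
    }

    public static void main(String[] args) {
        // "\u00e9" is the letter e with an acute accent.
        System.out.println(terms("De Andre, Fabrizio"));
        System.out.println(terms("De Andr\u00e9, Fabrizio"));
        System.out.println(terms("Fabrizio De Andr\u00e9"));
    }
}
```

All three calls print the same set, [andre, de, fabrizio]. Typos and abbreviations are not covered by this sketch.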

Evolving the requirements

• A single search input
    • Might contain either or both of the author's names and the product title
    • Both entities might be composed of multiple terms
• Relevance
    • Products matching both fields should be listed on top
    • Exact word matches should be scored better
✦ So you need approximate word matches?
✦ How about typos?

SQL is the hammer:

List<Product> list = s.createQuery( " ...? " )
    .setParameter( "F. de André nuvole barocche" )
    .list();

• Mixed case, accents
• Relative order of terms, distance of related terms
• Abbreviations, typos
• Match on multiple fields
• 18,800 results in 20 milliseconds

What if...

Google returned results in alphabetical order? Would you still use it?

"hibernate search"
About 3,580,000 results (0.04 seconds)

We have a Problem

• The database is not a good fit
• SQL is not appropriate for the task
• Still we want SQL for many other reasons
• Need to integrate them, keep data integrity and consistency

A key element of the solution: Apache Lucene

• Open source Apache™ top level project
• In the "top 10" of most downloaded and active projects
• Very advanced implementation
• Main language is Java, some ports exist in other languages
• Rich open source environment around it
• Many products embed it


Examples of Lucene users

Nexus

Similarity

• N-Grams (edit distance)
• Phonetic (Soundex™)
• Any custom...

Cagliari ⁓ càliari
Cagliari ⁓ cag agl gli lia iar ari
Cagliari ⁓ CGRI
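A minimal sketch of the n-gram idea in plain Java: split a term into overlapping n-grams and score two terms by the share of n-grams they have in common. The class name and the choice of a Jaccard ratio are assumptions made for illustration; Lucene's own n-gram filters and scoring work differently.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class NGrams {

    // Splits a term into overlapping, lower-cased n-grams of length n.
    public static List<String> ngrams(String term, int n) {
        List<String> grams = new ArrayList<>();
        String t = term.toLowerCase(Locale.ROOT);
        for (int i = 0; i + n <= t.length(); i++) {
            grams.add(t.substring(i, i + n));
        }
        return grams;
    }

    // Jaccard ratio: shared n-grams divided by all distinct n-grams of both terms.
    public static double similarity(String a, String b, int n) {
        Set<String> ga = new HashSet<>(ngrams(a, n));
        Set<String> gb = new HashSet<>(ngrams(b, n));
        Set<String> union = new HashSet<>(ga);
        union.addAll(gb);
        if (union.isEmpty()) {
            return 0.0; // both terms shorter than n
        }
        Set<String> shared = new HashSet<>(ga);
        shared.retainAll(gb);
        return (double) shared.size() / union.size();
    }

    public static void main(String[] args) {
        System.out.println(ngrams("Cagliari", 3));
        System.out.println(similarity("cagliari", "caliari", 3));
    }
}
```

A misspelling such as "caliari" still shares the trigrams lia, iar and ari with "cagliari", so it gets a non-zero score where an exact match would find nothing.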

Lucene: Synonyms

• Can be applied at "index time"
• or at "query time"
• Requires a vocabulary
    • WordNet

newspaper ⁓ daily ⁓ journal
Journal ??⁓ newspaper
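A minimal sketch of query-time synonym expansion in plain Java, assuming a tiny hand-written vocabulary in place of WordNet. The class is hypothetical and treats every group as symmetric, which is exactly the kind of assumption the "Journal ?? newspaper" line above questions.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

public class SynonymExpander {

    // term -> the whole group of terms considered equivalent to it
    private final Map<String, Set<String>> groups = new HashMap<>();

    // Registers a group of mutually equivalent, lower-case terms.
    public void addGroup(String... terms) {
        Set<String> group = Collections.unmodifiableSet(new TreeSet<>(Arrays.asList(terms)));
        for (String term : terms) {
            groups.put(term, group);
        }
    }

    // Returns the terms to search for: the group of the term, or the term alone.
    public Set<String> expand(String term) {
        String key = term.toLowerCase(Locale.ROOT);
        return groups.getOrDefault(key, Collections.singleton(key));
    }

    public static void main(String[] args) {
        SynonymExpander expander = new SynonymExpander();
        expander.addGroup("newspaper", "daily", "journal");
        System.out.println(expander.expand("Journal"));
        System.out.println(expander.expand("book"));
    }
}
```

Applying the same expansion at index time instead would store all synonyms in the index, making the index larger while keeping queries simpler.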

Lucene: Stemming

continuait ⁓ continu
continuation ⁓ continu
continué ⁓ continu
continuelle ⁓ continuel

Lucene: Stop-words

• Removes terms which are so frequently used that they are not suited as search keywords (this might depend on your domain!)

a,able,about,across,after,all,almost,also,am,among,an,and,any,are,as,at,be,because,been,but,by,can,cannot,could,dear,did,do,does,either,else,ever,every,for,from,get,got,had,has,have,he,her,hers,him,his,how,however,i,if,in,into,is,it,its,just,least,let,like,likely,may,me,might,most,must,my,neither,no,nor,not,of,off,often,on,only,or,other,our,own,rather,said,say,says,she,should,since,so,some,than,that,the,their,them,then,there,these,they,this,tis,to,too,twas,us,wants,was,we,were,what,when,where,which,while,who,whom,why,will,with,would,yet,you,your
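A minimal sketch of stop-word removal in plain Java, assuming a shortened stop-word list and a simple split on non-word characters. The class is hypothetical; Lucene applies this step as one filter inside an analyzer chain.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

public class StopWordFilter {

    // A shortened list for illustration; a real list depends on language and domain.
    private static final Set<String> STOP_WORDS = new HashSet<>(Arrays.asList(
            "a", "an", "and", "in", "is", "it", "of", "the", "to", "with"));

    // Lower-cases the text, splits it on non-word characters and drops stop-words.
    // Note: "\\W+" only treats ASCII letters, digits and "_" as word characters.
    public static List<String> filter(String text) {
        List<String> kept = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(filter("The art of the search"));
    }
}
```

The example prints [art, search]: only the terms that can distinguish one document from another remain.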

Apache Lucene: Index

• It requires an Index
    • On filesystem
    • In memory
    • ...
• Made of immutable segments
• Optimized for search speed, not for updates
• A world of strings and frequencies
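A minimal sketch of what "strings and frequencies" can mean: an inverted index mapping each term to the documents that contain it, together with the number of occurrences. This is a toy in-memory structure written for illustration, not Lucene's segment format; all names are hypothetical.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class TinyInvertedIndex {

    // term -> (document id -> number of occurrences of the term in that document)
    private final Map<String, Map<Integer, Integer>> postings = new TreeMap<>();

    // Splits the text into lower-cased terms and counts them for the given document.
    public void add(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (term.isEmpty()) {
                continue;
            }
            postings.computeIfAbsent(term, t -> new TreeMap<>())
                    .merge(docId, 1, Integer::sum);
        }
    }

    // Returns the documents containing the term, with the term frequency for each.
    public Map<Integer, Integer> lookup(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptyMap());
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add(1, "Nuvole barocche");
        index.add(2, "Le nuvole, le nuvole");
        System.out.println(index.lookup("nuvole"));
    }
}
```

A lookup is a single map access regardless of how many documents were added, which is why the structure favours search speed over cheap updates.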

Integrating with a database

• The index structure is very different from a relational database: not everything is possible
• You need to keep the data in sync
    • If they are not in sync, which one should be trusted more?
• What do queries look like?
• What do queries return?

Different worlds

• A Lucene Document is unstructured (schemaless), something similar to Map<String,String>
• A Hibernate model is structured to represent your business model
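A minimal sketch of the gap between the two worlds: a small structured model is flattened into string fields, with the associated object addressed through a dotted field name. The Essay and Author classes and the field names are hypothetical and hand-mapped here; this only illustrates the shape of the result, not how any library performs the mapping.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DocumentFlattening {

    // A tiny structured model with an association between two types.
    static class Author {
        final String name;
        Author(String name) { this.name = name; }
    }

    static class Essay {
        final String summary;
        final Author author;
        Essay(String summary, Author author) {
            this.summary = summary;
            this.author = author;
        }
    }

    // Flattens the object graph into string fields; the association becomes "author.name".
    static Map<String, String> toDocument(Essay essay) {
        Map<String, String> document = new LinkedHashMap<>();
        document.put("summary", essay.summary);
        document.put("author.name", essay.author.name);
        return document;
    }

    public static void main(String[] args) {
        Essay essay = new Essay("Nuvole barocche", new Author("Fabrizio De Andre"));
        System.out.println(toDocument(essay));
    }
}
```

Types, associations and identity are lost in the flat form, so something has to remember how to get from a matching document back to the original object.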


The data mismatch


The architectural mismatch


Quickstart Hibernate Search

• Add the hibernate-search dependency to an existing Hibernate project:

<dependency>
    <groupId>org.hibernate</groupId>
    <artifactId>hibernate-search-orm</artifactId>
    <version>4.2.0.Final</version>
</dependency>

Quickstart Hibernate Search

• Any other configuration is optional:
    • Where we store indexes
    • Extension modules, custom analyzers
    • Performance tuning
    • Advanced mapping
    • Clustering

Quickstart Hibernate Search

@Entity
public class Essay {
    @Id
    public Long getId() { return id; }

    public String getSummary() { return summary; }

    @Lob
    public String getText() { return text; }

    @ManyToOne
    public Author getAuthor() { return author; }
    ...

Quickstart Hibernate Search

@Entity @Indexed
public class Essay {
    @Id
    public Long getId() { return id; }

    public String getSummary() { return summary; }

    @Lob
    public String getText() { return text; }

    @ManyToOne
    public Author getAuthor() { return author; }
    ...

Quickstart Hibernate Search

@Entity @Indexed
public class Essay {
    @Id
    public Long getId() { return id; }

    @Field
    public String getSummary() { return summary; }

    @Lob
    public String getText() { return text; }

    @ManyToOne
    public Author getAuthor() { return author; }
    ...

Quickstart Hibernate Search

@Entity @Indexed
public class Essay {
    @Id
    public Long getId() { return id; }

    @Field
    public String getSummary() { return summary; }

    @Lob @Field @Boost(0.8f)
    public String getText() { return text; }

    @ManyToOne
    public Author getAuthor() { return author; }
    ...

Quickstart Hibernate Search

@Entity @Indexed
public class Essay {
    @Id
    public Long getId() { return id; }

    @Field
    public String getSummary() { return summary; }

    @Lob @Field @Boost(0.8f)
    public String getText() { return text; }

    @ManyToOne @IndexedEmbedded
    public Author getAuthor() { return author; }
    ...

Another example

@Entity
public class Author {
    @Id @GeneratedValue private Integer id;
    private String name;
    @OneToMany private Set<Book> books;
}

@Entity
public class Book {
    @Id private Integer id;
    private String title;
}

Index structure

@Entity @Indexed
public class Author {
    @Id @GeneratedValue private Integer id;
    @Field(store=Store.YES) private String name;
    @OneToMany @IndexedEmbedded private Set<Book> books;
}

@Entity
public class Book {
    @Id private Integer id;
    @Field(store=Store.YES) private String title;
}

Query

String[] productFields = {"summary", "author.name"};

Query luceneQuery = // query builder or any Lucene Query

FullTextEntityManager ftEm = Search.getFullTextEntityManager(entityManager);
FullTextQuery query = ftEm.createFullTextQuery( luceneQuery, Product.class );
List<Product> items = query.setMaxResults(100).getResultList();
int totalNbrOfResults = query.getResultSize();


Creating a Query with the DSL

Results

• Managed POJO: updates are applied to both Lucene and database
• JPA pagination, known APIs:
    • .setMaxResults( 20 ).setFirstResult( 100 );
• Type restrictions, polymorphic full-text queries:
    • .createQuery( luceneQuery, A.class, B.class, ..);
• Projection
• Result mapping


Filters

Filters

// s is a FullTextSession
FullTextQuery ftQuery = s.createFullTextQuery( query, Product.class );
// enableFullTextFilter returns the filter, so parameters are set on each filter in turn
ftQuery.enableFullTextFilter( "minorsFilter" );
ftQuery.enableFullTextFilter( "specialDayOffers" )
    .setParameter( "day", "20130117" );
ftQuery.enableFullTextFilter( "inStockAt" )
    .setParameter( "location", "Bangalore" );
List<Product> results = ftQuery.list();

Advanced text analysis

@Entity @Indexed
@AnalyzerDef(name = "frenchAnalyzer",
    tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
    filters = {
        @TokenFilterDef(factory = LowerCaseFilterFactory.class),
        @TokenFilterDef(factory = SnowballPorterFilterFactory.class,
            params = {@Parameter(name = "language", value = "French")})
    })
public class Book {

    @Field(index=Index.TOKENIZED, store=Store.NO)
    @Analyzer(definition = "frenchAnalyzer")
    private String title;
    ...

More...

• @Boost and @DynamicBoost
• @AnalyzerDiscriminator
• @DateBridge(resolution=Resolution.MINUTE)
• @ClassBridge and @FieldBridge
• @Similarity
• Automatic index optimization
• Sharding, sharding filters
• Faceting
• @Spatial

@Spatial queries

• Where can I find events on Hibernate, Errai and Infinispan within a 5 km radius of Bangalore?
• And coffee within 100 m, please?

Indexing Coordinates

• Boolean query on longitude and latitude ranges
    • Good for a small corpus (<100k documents)
    • Smaller index
• Quad Tree
    • Efficient for a larger data corpus
    • Works better with heterogeneous distribution
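A minimal sketch of the first option, a range check on latitude and longitude: a search radius is turned into a bounding box and each point is tested against the two ranges. The class name, the spherical-Earth radius and the decision to ignore the poles and the 180° meridian are simplifying assumptions; this is not the Hibernate Search implementation.

```java
public class BoundingBox {

    static final double EARTH_RADIUS_KM = 6371.0; // mean radius, spherical approximation

    final double minLat, maxLat, minLon, maxLon;

    BoundingBox(double minLat, double maxLat, double minLon, double maxLon) {
        this.minLat = minLat;
        this.maxLat = maxLat;
        this.minLon = minLon;
        this.maxLon = maxLon;
    }

    // Builds the box that encloses a circle of the given radius around a centre point.
    static BoundingBox around(double lat, double lon, double radiusKm) {
        double dLat = Math.toDegrees(radiusKm / EARTH_RADIUS_KM);
        // A degree of longitude gets shorter as the latitude moves away from the equator.
        double dLon = Math.toDegrees(radiusKm / (EARTH_RADIUS_KM * Math.cos(Math.toRadians(lat))));
        return new BoundingBox(lat - dLat, lat + dLat, lon - dLon, lon + dLon);
    }

    // The "boolean query": both coordinates must fall inside their range.
    boolean contains(double lat, double lon) {
        return lat >= minLat && lat <= maxLat && lon >= minLon && lon <= maxLon;
    }

    public static void main(String[] args) {
        BoundingBox box = BoundingBox.around(12.9716, 77.5946, 5.0); // roughly Bangalore
        System.out.println(box.contains(12.98, 77.60));
        System.out.println(box.contains(13.50, 77.59));
    }
}
```

The box is slightly larger than the circle, so an exact distance check on the few remaining candidates would normally follow.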

[Figure: quad-tree subdivision of an area; cells are labelled 1.1, 1.2, 1.3.1.1, 1.3.1.2, 1.4, 2, 3 and 4]
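A minimal sketch of how a quad tree can assign dotted cell labels like the ones above: the area is split into four quadrants again and again, and the path of quadrants containing a point becomes its cell id. The quadrant numbering (1 north-west, 2 north-east, 3 south-west, 4 south-east), the fixed depth and the class name are assumptions made for illustration, not the scheme used by Hibernate Search.

```java
public class QuadTreeCell {

    // Returns the dotted path of quadrants (1 to 4) that contain the point,
    // descending `depth` levels from a box covering the whole globe.
    static String cellId(double lat, double lon, int depth) {
        double minLat = -90, maxLat = 90, minLon = -180, maxLon = 180;
        StringBuilder id = new StringBuilder();
        for (int level = 0; level < depth; level++) {
            double midLat = (minLat + maxLat) / 2;
            double midLon = (minLon + maxLon) / 2;
            boolean north = lat >= midLat;
            boolean east = lon >= midLon;
            int quadrant = (north ? 1 : 3) + (east ? 1 : 0);
            // Shrink the box to the chosen quadrant before the next split.
            if (north) { minLat = midLat; } else { maxLat = midLat; }
            if (east) { minLon = midLon; } else { maxLon = midLon; }
            if (level > 0) {
                id.append('.');
            }
            id.append(quadrant);
        }
        return id.toString();
    }

    public static void main(String[] args) {
        System.out.println(cellId(12.9716, 77.5946, 2));
        System.out.println(cellId(12.9716, 77.5946, 8));
    }
}
```

Points that are close together usually share a long prefix of the cell id, so a radius search can be reduced to looking up a handful of cell ids as plain terms in the index.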

@Entity
@Indexed
public class POI {

    @Spatial(spatialMode = SpatialMode.GRID)
    @Embedded
    public Coordinates getLocation() {
        return new Coordinates() {
            @Override
            public double getLatitude() { return latitude; }

            @Override
            public double getLongitude() { return longitude; }
        };
    }
    ...

Creating a Spatial Query

Query query = builder
    .spatial()
        .onCoordinates( "location" )
        .within( 51, Unit.KM )
        .ofCoordinates( coordinates )
    .createQuery();

List results = fullTextSession
    .createFullTextQuery( query, POI.class )
    .list();


Infinispan Query

Infinispan

• An advanced clusterable cache
• A very fast, transactional, scalable data grid
• A "NoSQL", a key-value store
    – How do you query a key-value store?

SELECT * FROM GRID

To Query a "Grid"

• What's in C7?

Object value = cache.get( "c7" );

If you don't know the key, there is no way to find the value
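A minimal sketch of the problem in plain Java, using a java.util.Map as a stand-in for the grid (an assumption for illustration; no Infinispan API is used). Without an index, the only way to find entries by their content is to visit every value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class GridLookup {

    // Finds keys by content: every entry has to be inspected, one by one.
    static List<String> keysWithValueContaining(Map<String, String> cache, String word) {
        String needle = word.toLowerCase(Locale.ROOT);
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, String> entry : cache.entrySet()) {
            if (entry.getValue().toLowerCase(Locale.ROOT).contains(needle)) {
                keys.add(entry.getKey());
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        Map<String, String> cache = new TreeMap<>();
        cache.put("a1", "Lucene in Action");
        cache.put("c7", "Hibernate Search in Action");
        System.out.println(cache.get("c7"));                            // fast: the key is known
        System.out.println(keysWithValueContaining(cache, "hibernate")); // slow: full scan
    }
}
```

On a data grid spread over many nodes such a scan would touch every node, which is the motivation for maintaining a full-text index next to the values.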

Infinispan Query quickstart

• Enable it in configuration
• Have infinispan-query.jar in your classpath
• Annotate your POJO values to specify what to index

<dependency>
    <groupId>org.infinispan</groupId>
    <artifactId>infinispan-query</artifactId>
    <version>5.2.0.CR1</version>
</dependency>

Enable Infinispan Query, programmatically

Configuration c = new Configuration()
    .fluent()
        .indexing()
        .addProperty( "hibernate.search.option", "valueForOption" )
    .build();
CacheManager manager = new DefaultCacheManager( c );

Enable Query in Infinispan XML

<?xml version="1.0" encoding="UTF-8"?>
<infinispan
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="urn:infinispan:config:5.2 http://www.infinispan.org/schemas/infinispan-config-5.2.xsd"
    xmlns="urn:infinispan:config:5.2">
    <default>
        <indexing enabled="true">
            <properties>
                <property name="hibernate.search.option" value="value" />
            </properties>
        </indexing>
    </default>
</infinispan>

Annotate your model

@Indexed
public class Book implements Serializable {

    @Field String title;
    @Field String author;
    @Field String editor;

    public Book(String title, String author, String editor) {
        this.title = title;
        this.author = author;
        this.editor = editor;
    }

}

Run a Query

SearchManager qf = Search.getSearchManager(cache);
Query query = qf.buildQueryBuilderForClass(Book.class)
    .get()
        .phrase()
        .onField("title")
        .sentence("in action")
    .createQuery();
List<Object> list = qf.getQuery(query).list();


Clustering

Drawbacks of any Lucene based solution

• Requires an Index
    • on a filesystem
    • in memory
    • in Infinispan

Scalability issues

• Global writer locks
• NFS based index sharing very tricky


Clustering using a queue
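The diagram for this slide is not available. A minimal sketch of the general idea, assuming it refers to nodes sending index changes to a queue that a single writer consumes, so that only one party ever needs the write lock. The in-memory queue, the String changes and all names are hypothetical stand-ins; a real cluster would use a messaging system between machines.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class QueuedIndexWriter {

    // Changes waiting to be written; safe to fill from many threads or nodes.
    private final BlockingQueue<String> pending = new LinkedBlockingQueue<>();

    // Stand-in for the index; only applyPending() ever modifies it.
    private final List<String> index = new ArrayList<>();

    // Called by any node: the change is queued, the index is not touched.
    public void submit(String change) {
        pending.add(change);
    }

    // Called by the single writer: applies all queued changes in arrival order.
    public int applyPending() {
        List<String> batch = new ArrayList<>();
        pending.drainTo(batch);
        index.addAll(batch);
        return batch.size();
    }

    public List<String> indexedChanges() {
        return Collections.unmodifiableList(index);
    }

    public static void main(String[] args) {
        QueuedIndexWriter writer = new QueuedIndexWriter();
        writer.submit("add Book#1");
        writer.submit("update Author#7");
        System.out.println(writer.applyPending() + " changes applied");
        System.out.println(writer.indexedChanges());
    }
}
```

The trade-off is that a change becomes searchable only after the writer has processed it, so searches on other nodes may briefly see older data.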


Index stored in Infinispan


Single node performance idea


multi-node setup

Support?

• Community support from volunteers (in order of preference):
    • on the Hibernate forums
    • on the Infinispan forums
    • Stack Overflow
• Professionally supported products:
    • Web Framework Kit 2.1

What's next

• Ease configuration aspects for clustering
• Support for Lucene 4.x
• Supporting out-of-VM Lucene servers
• Parallel searching on an Infinispan cluster
• Cloud-tm platform: self-tuning ergonomics
• Your involvement!

Questions?

http://search.hibernate.org
Hibernate Search in Action [Manning]

http://lucene.apache.org
Lucene in Action (2nd ed.) [Manning]

http://www.infinispan.org
http://in.relation.to
http://forum.hibernate.org

@SanneGrinovero
@Hibernate
@Infinispan