Sanne Grinovero, JBoss by Red Hat
Hibernate Search: full-text queries for Hibernate and Infinispan
Index: what you will learn
• What Hibernate Search can do for your project
• How to use it
  • with Hibernate ORM and OGM
  • with Infinispan
• News
  • Faceting
  • Spatial
  • Clustering
• Plans for the future
Who I am
Sanne Grinovero
• Senior Software Engineer at Red Hat
• Hibernate team
  • lead of Hibernate Search
  • Hibernate OGM
• Infinispan
  • Search, Query and Lucene integrations
• Apache Lucene
• JGroups
Hibernate Search: what's that?
• Object Index Mapper
• Full-text search engine library based on Lucene
• API at the Object level
• Integrates with Hibernate ORM and Infinispan
• Used by many other projects
• Cluster friendly
The Search problem
• The person searching doesn't know exactly what they are looking for:
  • Please, don't ask the user to know the primary key
  • They don't know the exact content of the document either
Hibernate model example
The naive search engine
The almost naive search engine
Does it work? How about these:
String author = "De Andre, Fabrizio";
String title = "Nuvole barocche";
List<Product> list = s.createQuery( " ...? " ).list();

String author = "De André, Fabrizio";
String title = "Nuvole barocche";
List<Product> list = s.createQuery( " ...? " ).list();

String author = "Fabrizio De André";
String title = "Nuvole barocche";
List<Product> list = s.createQuery( " ...? " ).list();
Evolving the requirements
• Unique search input
  • Might contain either or both: names of the author, product title
  • Both might be composed of multiple terms
• Relevance
  • Products matching both fields should be listed on top
  • Exact word matches should be scored better
✦ So you need approximate word matches?
✦ How about typos?
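The relevance requirement can be illustrated with a deliberately crude scoring function. This is a toy sketch under stated assumptions, not Lucene's scoring. The `NaiveRelevance` class and its `Product` record are hypothetical. A product's score is simply the number of query terms found in its title plus the number found in its author, and results are sorted by that score. The sketch shows why products matching both fields rise to the top. It does nothing about approximate matches or typos.

```java
import java.util.*;

final class NaiveRelevance {
    // Hypothetical product with two searchable fields.
    record Product(String title, String author) {}

    // Lowercased terms of a text, split on anything that is not a letter or digit.
    static Set<String> terms(String text) {
        return new HashSet<>(Arrays.asList(text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")));
    }

    // Crude score: query terms found in the title plus query terms found in the author,
    // so a product matching in both fields outranks one matching in a single field.
    static int score(Set<String> queryTerms, Product p) {
        int points = 0;
        Set<String> title = terms(p.title());
        Set<String> author = terms(p.author());
        for (String t : queryTerms) {
            if (title.contains(t)) points++;
            if (author.contains(t)) points++;
        }
        return points;
    }

    // Sorts a copy of the products by descending score (stable for ties).
    static List<Product> rank(String input, List<Product> products) {
        Set<String> queryTerms = terms(input);
        List<Product> ranked = new ArrayList<>(products);
        ranked.sort(Comparator.comparingInt((Product p) -> score(queryTerms, p)).reversed());
        return ranked;
    }
}
```

Consider the input "de andre nuvole barocche". Here "Nuvole barocche" by "Fabrizio De Andre" scores 4. It ranks ahead of "Creuza de ma" by the same author, which scores 3, and ahead of "Nuvole barocche" by "Mina", which scores 2.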
SQL is the hammer:
List<Product> list = s.createQuery( " ...? " )
    .setParameter( "input", "F. de André nuvole barocche" ) // "input" is a hypothetical parameter name
    .list();
• Mixed case, accents
• Relative order of terms, distance of related terms
• Abbreviations, typos
• Match on multiple fields
• 18,800 results in 20 milliseconds
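The first item in the list, mixed case and accents, already defeats plain string equality. The sketch below uses only `java.text.Normalizer`. It lowercases the text and strips diacritics, which is roughly what a lowercase filter plus an ASCII-folding filter do inside a Lucene analyzer. The `Folding.fold` helper is hypothetical. It only covers case and accents, not term order, abbreviations or typos.

```java
import java.text.Normalizer;
import java.util.Locale;

final class Folding {
    // Hypothetical helper: lowercase the text, decompose accented characters
    // (é becomes e followed by a combining acute accent) and drop the combining marks.
    static String fold(String text) {
        String decomposed = Normalizer.normalize(text.toLowerCase(Locale.ROOT), Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        String stored = "De André, Fabrizio";
        String typed = "de andre, fabrizio";
        System.out.println(stored.equals(typed));             // false: naive comparison
        System.out.println(fold(stored).equals(fold(typed))); // true: after folding
    }
}
```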
What if Google returned results in alphabetical order? Would you still use it?
"hibernate search": about 3,580,000 results (0.04 seconds)
We have a Problem
• The database is not a good fit
• SQL is not appropriate for the task
• Still, we want SQL for many other reasons
• We need to integrate them, keeping data integrity and consistency
A key element of the solution: Apache Lucene
• Open source Apache™ top-level project
• In the "top 10" of most downloaded and active projects
• Very advanced implementation
• Main language is Java; some ports exist in other languages
• Rich open source environment around it
• Many products embed it
Examples of Lucene users
Nexus
Similarity
• N-Grams (edit distance)
• Phonetic (Soundex™)
• Any custom...
Cagliari ⁓ càliari
Cagliari ⁓ cag agl gli lia iar ari
Cagliari ⁓ CGRI
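The n-gram and edit-distance ideas from this slide fit in a few lines of plain Java. The `Similarity` class below is hypothetical and only illustrates the two measures. Lucene ships its own n-gram tokenizers and fuzzy queries, which are not shown here.

```java
import java.util.ArrayList;
import java.util.List;

final class Similarity {
    // Splits a word into overlapping character n-grams (n = 3 gives trigrams).
    static List<String> ngrams(String word, int n) {
        List<String> grams = new ArrayList<>();
        for (int i = 0; i + n <= word.length(); i++) {
            grams.add(word.substring(i, i + n));
        }
        return grams;
    }

    // Levenshtein edit distance: the minimum number of single-character
    // insertions, deletions and substitutions that turn a into b.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) prev[j] = j;
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp; // reuse the two rows
        }
        return prev[b.length()];
    }
}
```

`Similarity.ngrams("cagliari", 3)` returns [cag, agl, gli, lia, iar, ari]. `Similarity.levenshtein("cagliari", "càliari")` returns 2: one substitution (a to à) and one deletion (g).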
Lucene: Synonyms
• Can be applied at "index time"
• or at "query time"
• Requires a vocabulary
  • WordNet
newspaper ⁓ daily ⁓ journal
Journal ??⁓ newspaper
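Query-time synonym expansion can be sketched as a lookup that widens each query term into itself plus its synonyms. In this sketch the vocabulary is a small hypothetical map standing in for something like WordNet. The `SynonymExpansion` class is illustrative only and is not Lucene's synonym filter. "journal" is deliberately not expanded, because it can also mean a diary, which echoes the doubtful "journal ⁓ newspaper" example.

```java
import java.util.*;

final class SynonymExpansion {
    // Hypothetical vocabulary: each key lists the extra terms a query word may also match.
    // "journal" is intentionally absent as a key: it is ambiguous (newspaper or diary).
    static final Map<String, List<String>> VOCABULARY = Map.of(
        "newspaper", List.of("daily", "journal"),
        "daily", List.of("newspaper", "journal"));

    // Query-time expansion: every query term is kept and its synonyms are added.
    static Set<String> expand(List<String> queryTerms) {
        Set<String> expanded = new LinkedHashSet<>();
        for (String term : queryTerms) {
            expanded.add(term);
            expanded.addAll(VOCABULARY.getOrDefault(term, List.of()));
        }
        return expanded;
    }
}
```

Expanding at query time keeps the index small and lets you change the vocabulary without reindexing. Expanding at index time makes queries cheaper instead.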
Lucene: Stemming
continuait ⁓ continu
continuation ⁓ continu
continué ⁓ continu
continuelle ⁓ continuel
Lucene: Stop-words
• Removes terms which are so frequently used that they are not suited as search keywords; this might depend on your domain!
a, able, about, across, after, all, almost, also, am, among, an, and, any, are, as, at, be, because, been, but, by, can, cannot, could, dear, did, do, does, either, else, ever, every, for, from, get, got, had, has, have, he, her, hers, him, his, how, however, i, if, in, into, is, it, its, just, least, let, like, likely, may, me, might, most, must, my, neither, no, nor, not, of, off, often, on, only, or, other, our, own, rather, said, say, says, she, should, since, so, some, than, that, the, their, them, then, there, these, they, this, tis, to, too, twas, us, wants, was, we, were, what, when, where, which, while, who, whom, why, will, with, would, yet, you, your
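Stop-word removal is a simple filtering step after tokenization. In Lucene it is one stage of the analyzer chain. The `StopWords` class below is a hypothetical sketch that uses a small subset of the list above, a crude "split on non-letters" tokenizer, and lowercasing.

```java
import java.util.*;

final class StopWords {
    // A small subset of the stop-word list above; a real list depends on your domain.
    static final Set<String> STOP_WORDS = Set.of("a", "an", "and", "in", "is", "of", "the", "to");

    // Lowercases the text, splits it on anything that is not a letter or digit,
    // and keeps only the tokens that are not stop words.
    static List<String> keywords(String text) {
        List<String> result = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty() && !STOP_WORDS.contains(token)) result.add(token);
        }
        return result;
    }
}
```

For example, `StopWords.keywords("The Lord of the Rings")` returns [lord, rings].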
Apache Lucene: Index
• It requires an Index
  • On filesystem
  • In memory
  • ...
• Made of immutable segments
• Optimized for search speed, not for updates
• A world of strings and frequencies
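"A world of strings and frequencies" refers to the inverted index. Each term maps to the documents containing it, together with how often it occurs in each. The hypothetical `InvertedIndex` class below is a minimal in-memory sketch of that structure. Unlike Lucene, it is a single mutable map. It has no immutable segments, no compression, and no scoring.

```java
import java.util.*;

final class InvertedIndex {
    // term -> (document id -> number of occurrences of the term in that document)
    private final Map<String, Map<Integer, Integer>> postings = new TreeMap<>();

    // Tokenizes the text crudely and records a term frequency per document.
    void add(int docId, String text) {
        for (String term : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (term.isEmpty()) continue;
            postings.computeIfAbsent(term, t -> new TreeMap<>()).merge(docId, 1, Integer::sum);
        }
    }

    // Documents containing the term, with the term frequency in each.
    Map<Integer, Integer> lookup(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), Map.of());
    }
}
```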
Integrating with a database
• The index structure is deeply different from a relational database: not everything is possible
• You need to keep the data in sync
  • In case they are not, which one should be trusted more?
• What do queries look like?
• What do queries return?
Different worlds
The data mismatch
• A Lucene Document is unstructured (schemaless), something similar to Map<String,String>
• A Hibernate model is structured, so that it works as a representation of your business model
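Bridging the data mismatch means flattening an object graph into the string-only, schemaless shape of a Lucene Document. The sketch below is a hypothetical illustration of that idea, not Hibernate Search's implementation. The `Flattening` class and its `Essay`/`Author` records are assumptions. The associated author ends up under a dotted field name, the same "author.name" style of field that appears in the `productFields` array of the query example.

```java
import java.util.*;

final class Flattening {
    // Hypothetical domain model: an essay with a many-to-one author.
    record Author(String name) {}
    record Essay(Long id, String summary, Author author) {}

    // Flattens the object graph into a Map<String, String>, the rough shape of a
    // Lucene Document; fields of the associated entity get a dotted prefix.
    static Map<String, String> toDocument(Essay essay) {
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", String.valueOf(essay.id()));
        doc.put("summary", essay.summary());
        if (essay.author() != null) {
            doc.put("author.name", essay.author().name());
        }
        return doc;
    }
}
```

Going back from such a map to a managed entity needs the stored id, which is why the id is always part of the document.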
The architectural mismatch
Quickstart Hibernate Search
• Add the hibernate-search dependency to an existing Hibernate project:
<dependency>
<groupId>org.hibernate</groupId>
<artifactId>hibernate-search-orm</artifactId>
<version>4.2.0.Final</version>
</dependency>
Quickstart Hibernate Search
• Any other configuration is optional:
  • Where we store indexes
  • Extension modules, custom analyzers
  • Performance tuning
  • Advanced mapping
  • Clustering
@Entity
public class Essay {
    @Id
    public Long getId() { return id; }

    public String getSummary() { return summary; }

    @Lob
    public String getText() { return text; }

    @ManyToOne
    public Author getAuthor() { return author; }
    ...
Quickstart Hibernate Search
@Entity @Indexed
public class Essay {
    @Id
    public Long getId() { return id; }

    @Field
    public String getSummary() { return summary; }

    @Lob @Field @Boost(0.8f)
    public String getText() { return text; }

    @ManyToOne @IndexedEmbedded
    public Author getAuthor() { return author; }
    ...
@Entity
public class Author {
    @Id @GeneratedValue
    private Integer id;
    private String name;
    @OneToMany
    private Set<Book> books;
}
@Entity
public class Book {
    @Id @GeneratedValue
    private Integer id;
    private String title;
}
Another example
@Entity @Indexed
public class Author {
    @Id @GeneratedValue
    private Integer id;
    @Field(store=Store.YES)
    private String name;
    @OneToMany @IndexedEmbedded
    private Set<Book> books;
}
@Entity
public class Book {
    @Id @GeneratedValue
    private Integer id;
    @Field(store=Store.YES)
    private String title;
}
Index structure
Query
String[] productFields = { "summary", "author.name" };
Query luceneQuery = ...; // query builder or any Lucene Query
FullTextEntityManager ftEm = Search.getFullTextEntityManager( entityManager );
FullTextQuery query = ftEm.createFullTextQuery( luceneQuery, Product.class );
List<Product> items = query.setMaxResults( 100 ).getResultList();
int totalNbrOfResults = query.getResultSize();
Creating a Query with the DSL
Results
• Managed POJOs: updates are applied to both Lucene and the database
• JPA pagination, known APIs:
  • .setMaxResults( 20 ).setFirstResult( 100 );
• Type restrictions, polymorphic full-text queries:
  • .createFullTextQuery( luceneQuery, A.class, B.class, ... );
• Projection
• Result mapping
Filters
FullTextQuery ftQuery = s // s is a FullTextSession
    .createFullTextQuery( query, Product.class );
// enableFullTextFilter() returns the FullTextFilter, so parameters are set on it
ftQuery.enableFullTextFilter( "minorsFilter" );
ftQuery.enableFullTextFilter( "specialDayOffers" )
    .setParameter( "day", "20130117" );
ftQuery.enableFullTextFilter( "inStockAt" )
    .setParameter( "location", "Bangalore" );
List<Product> results = ftQuery.list();
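Enabling several full-text filters narrows the result to the documents that every filter accepts. Conceptually, each filter can be seen as the set of document ids it lets through, and enabling several filters intersects those sets. The sketch below models that with `java.util.BitSet`. The `FilterIntersection` class and the document ids are hypothetical. Lucene's own filters produce per-segment document id sets, and Hibernate Search can cache them, but neither is shown here.

```java
import java.util.BitSet;

final class FilterIntersection {
    // Keeps only the matching documents that every enabled filter accepts.
    // The input BitSet is cloned, so the caller's query result is not modified.
    static BitSet apply(BitSet matches, BitSet... filters) {
        BitSet result = (BitSet) matches.clone();
        for (BitSet filter : filters) {
            result.and(filter);
        }
        return result;
    }

    // Convenience constructor for a set of document ids.
    static BitSet of(int... docIds) {
        BitSet bits = new BitSet();
        for (int id : docIds) bits.set(id);
        return bits;
    }
}
```

Suppose the query matches documents 1 to 5, a hypothetical "minors" filter accepts {1, 2, 3, 5} and an "in stock" filter accepts {2, 3, 4, 5}. The final result is then {2, 3, 5}.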
Advanced text analysis
@Entity @Indexed
@AnalyzerDef(name = "frenchAnalyzer",
    tokenizer = @TokenizerDef(factory = StandardTokenizerFactory.class),
    filters = {
        @TokenFilterDef(factory = LowerCaseFilterFactory.class),
        @TokenFilterDef(factory = SnowballPorterFilterFactory.class,
            params = { @Parameter(name = "language", value = "French") })
    })
public class Book {
    @Field(index = Index.YES, analyze = Analyze.YES, store = Store.NO)
    @Analyzer(definition = "frenchAnalyzer")
    private String title;
    ...
More...
• @Boost and @DynamicBoost
• @AnalyzerDiscriminator
• @DateBridge(resolution=Resolution.MINUTE)
• @ClassBridge and @FieldBridge
• @Similarity
• Automatic index optimization
• Sharding, sharding filters
• Faceting
• @Spatial
@Spatial queries
• Where can I find events on Hibernate, Errai and Infinispan within a 5 km radius of Bangalore?
• And coffee within 100 m, please?
• Boolean query on longitude and latitude ranges
  • Good for a small corpus (<100k documents)
  • Smaller index
• Quad tree
  • Efficient for a larger data corpus
  • Works better with a heterogeneous distribution
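The range-based approach roughly needs two steps. First a cheap check that a point falls inside the latitude/longitude box around the circle, then an exact distance check. The sketch below is a hypothetical illustration of those two steps, not Hibernate Search's internals. It assumes a spherical Earth with a mean radius of 6371 km, uses the haversine formula for distance, and ignores the poles and the 180th meridian.

```java
final class RadiusSearch {
    // Spherical approximation of the Earth, using the mean radius in kilometres.
    static final double EARTH_RADIUS_KM = 6371.0;

    // Great-circle distance between two points given in degrees (haversine formula).
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.pow(Math.sin(dLat / 2), 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.pow(Math.sin(dLon / 2), 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // Latitude/longitude ranges of a box enclosing the circle, as
    // {minLat, maxLat, minLon, maxLon}. Ignores the poles and the 180th meridian.
    static double[] boundingBox(double lat, double lon, double radiusKm) {
        double angular = radiusKm / EARTH_RADIUS_KM; // radius as an angle, in radians
        double dLat = Math.toDegrees(angular);
        double dLon = Math.toDegrees(Math.asin(Math.sin(angular) / Math.cos(Math.toRadians(lat))));
        return new double[] { lat - dLat, lat + dLat, lon - dLon, lon + dLon };
    }

    // Cheap range check first (the role of the range query), exact distance second.
    static boolean within(double lat, double lon, double centerLat, double centerLon, double radiusKm) {
        double[] box = boundingBox(centerLat, centerLon, radiusKm);
        boolean inBox = lat >= box[0] && lat <= box[1] && lon >= box[2] && lon <= box[3];
        return inBox && distanceKm(centerLat, centerLon, lat, lon) <= radiusKm;
    }
}
```

One degree of longitude at the equator is about 111.19 km on this sphere. A point 0.04° east of the centre is therefore inside a 5 km radius, and a point 0.05° east is outside it.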
Indexing Coordinates
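[Figure: the map split recursively into a grid of cells with hierarchical labels: 2, 3 and 4 at the top level; 1.1, 1.2 and 1.4 inside cell 1; 1.3.1.1 and 1.3.1.2 nested inside cell 1.3]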
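The hierarchical labels in the grid illustration (1, 1.1, 1.3.1.2, ...) come from splitting the map into four quadrants again and again. Each point is then indexed with the labels of the cells that contain it. The sketch below computes such a label for one point. The `GridCells` class and its numbering (1 = north-west, 2 = north-east, 3 = south-west, 4 = south-east, matching the 1 2 / 3 4 layout of the figure) are assumptions for illustration. This is not Hibernate Search's actual cell encoding.

```java
final class GridCells {
    // Label of the cell containing (lat, lon) after `levels` subdivisions of the world.
    // Quadrants: 1 = north-west, 2 = north-east, 3 = south-west, 4 = south-east.
    static String cellLabel(double lat, double lon, int levels) {
        double minLat = -90, maxLat = 90, minLon = -180, maxLon = 180;
        StringBuilder label = new StringBuilder();
        for (int level = 0; level < levels; level++) {
            double midLat = (minLat + maxLat) / 2;
            double midLon = (minLon + maxLon) / 2;
            boolean north = lat >= midLat;
            boolean east = lon >= midLon;
            int quadrant = north ? (east ? 2 : 1) : (east ? 4 : 3);
            // Shrink the current cell to the chosen quadrant before the next level.
            if (north) minLat = midLat; else maxLat = midLat;
            if (east) minLon = midLon; else maxLon = midLon;
            if (label.length() > 0) label.append('.');
            label.append(quadrant);
        }
        return label.toString();
    }
}
```

Nearby points usually share a label prefix. For example, (60, -150) and (61, -149) both fall in cell 1.1.3. Two close points on opposite sides of a cell border do not share a prefix, which is why a radius search has to look at several neighbouring cells.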
@Entity
@Indexed
public class POI {
    @Spatial(spatialMode = SpatialMode.GRID) @Embedded
    public Coordinates getLocation() {
        return new Coordinates() {
            @Override
            public double getLatitude() { return latitude; }

            @Override
            public double getLongitude() { return longitude; }
        };
    }
    ...
Creating a Spatial Query
Query query = builder
    .spatial()
      .onCoordinates( "location" )
      .within( 51, Unit.KM )
        .ofCoordinates( coordinates )
    .createQuery();
List results = fullTextSession
    .createFullTextQuery( query, POI.class )
    .list();
Infinispan Query
Infinispan
• An advanced clusterable cache
• A very fast, transactional, scalable data grid
• A "NoSQL" key-value store
  • How do you query a key-value store?
SELECT * FROM GRID
To Query a "Grid"
• What's in C7?
Object value = cache.get( "c7" );
If you don't know the key, there is no way to find the value.
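Without the key, a plain key-value store offers no way to find a value. The usual fix is a secondary index, updated on every write, that maps content back to keys. In the sketch below, a `HashMap` stands in for the cache and a word-to-keys map stands in for the Lucene index that Infinispan Query maintains. The `IndexedStore` class is hypothetical and is not Infinispan API.

```java
import java.util.*;

final class IndexedStore {
    // Stand-in for the key-value cache: the only direct access path is the key.
    private final Map<String, String> data = new HashMap<>();
    // Secondary full-text index kept in sync on every write: word -> keys.
    private final Map<String, Set<String>> index = new HashMap<>();

    void put(String key, String title) {
        String old = data.put(key, title);
        if (old != null) { // un-index the previous value so stale words stop matching
            for (String word : words(old)) {
                Set<String> keys = index.get(word);
                if (keys != null) keys.remove(key);
            }
        }
        for (String word : words(title)) {
            index.computeIfAbsent(word, w -> new TreeSet<>()).add(key);
        }
    }

    String get(String key) { return data.get(key); }

    // Finds keys by content, which a plain get() cannot do.
    Set<String> search(String word) {
        return index.getOrDefault(word.toLowerCase(Locale.ROOT), Set.of());
    }

    private static List<String> words(String text) {
        List<String> result = new ArrayList<>();
        for (String w : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!w.isEmpty()) result.add(w);
        }
        return result;
    }
}
```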
Infinispan Query quickstart
• Enable it in configuration
• Have infinispan-query.jar in your classpath
• Annotate your POJO values to specify what to index
<dependency>
  <groupId>org.infinispan</groupId>
  <artifactId>infinispan-query</artifactId>
  <version>5.2.0.CR1</version>
</dependency>
Enable Infinispan Query, programmatically
Configuration c = new Configuration()
    .fluent()
      .indexing()
        .addProperty( "hibernate.search.option", "valueForOption" )
    .build();
CacheManager manager = new DefaultCacheManager( c );
Enable Query in Infinispan XML
<?xml version="1.0" encoding="UTF-8"?>
<infinispan xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
            xsi:schemaLocation="urn:infinispan:config:5.2 http://www.infinispan.org/schemas/infinispan-config-5.2.xsd"
            xmlns="urn:infinispan:config:5.2">
  <default>
    <indexing enabled="true">
      <properties>
        <property name="hibernate.search.option" value="value" />
      </properties>
    </indexing>
  </default>
</infinispan>
Annotate your model
@Indexed
public class Book implements Serializable {
    @Field String title;
    @Field String author;
    @Field String editor;

    public Book(String title, String author, String editor) {
        this.title = title;
        this.author = author;
        this.editor = editor;
    }
}
Run a Query
SearchManager qf = Search.getSearchManager( cache );
Query query = qf.buildQueryBuilderForClass( Book.class ).get()
    .phrase()
      .onField( "title" )
      .sentence( "in action" )
    .createQuery();
List<Object> list = qf.getQuery( query ).list();
Clustering
• Requires an Index
  • on a filesystem
  • in memory
  • in Infinispan
Drawbacks of any Lucene-based solution
Scalability issues
• Global writer locks
• NFS-based index sharing is very tricky
Clustering using a queue
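Queue-based clustering works around the global writer lock. Every node sends its index changes to a queue, and a single master applies them, so only one process ever writes the index. The sketch below models that in a single JVM, under stated assumptions. The producer threads play the nodes, a `BlockingQueue` plays the shared queue (in a real cluster that role is taken by a messaging transport such as JMS or JGroups), and one consumer thread plays the master. The `QueueBackend` class and `IndexWork` record are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;

final class QueueBackend {
    // One unit of index work (e.g. "re-index this entity"), produced on some node.
    record IndexWork(String node, String entityId) {}

    // Simulates `nodes` nodes each enqueueing `worksPerNode` changes. A single
    // master thread drains the queue and is the only one applying changes,
    // so only one index writer (and one writer lock) is ever needed.
    static List<IndexWork> run(int nodes, int worksPerNode) {
        BlockingQueue<IndexWork> queue = new LinkedBlockingQueue<>();
        List<IndexWork> applied = new ArrayList<>(); // touched only by the master thread
        int total = nodes * worksPerNode;

        Thread master = new Thread(() -> {
            try {
                for (int i = 0; i < total; i++) {
                    applied.add(queue.take()); // stand-in for writing to the index
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        master.start();

        ExecutorService producers = Executors.newFixedThreadPool(nodes);
        for (int n = 0; n < nodes; n++) {
            String node = "node-" + n;
            producers.submit(() -> {
                for (int w = 0; w < worksPerNode; w++) {
                    queue.add(new IndexWork(node, node + "/entity-" + w));
                }
            });
        }
        producers.shutdown();
        try {
            master.join(); // join() makes the master's writes to `applied` visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return applied;
    }
}
```

The trade-off is that index updates become asynchronous. A change made on one node shows up in search results only after the master has applied it and the nodes see the updated index.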
Index stored in Infinispan
Single node performance idea
multi-node setup
Support?
• Volunteer-based, by the community (in order of preference):
  • on the Hibernate forums
  • on the Infinispan forums
  • on Stack Overflow
• Professionally supported products:
  • Web Framework Kit 2.1
What's next
• Easier configuration for clustering
• Support for Lucene 4.x
• Support for out-of-VM Lucene servers
• Parallel searching on an Infinispan cluster
• Cloud-TM platform: self-tuning ergonomics
• Your involvement!
Questions?
http://search.hibernate.org
Hibernate Search in Action [Manning]
http://lucene.apache.org
Lucene in Action, 2nd edition [Manning]
http://www.infinispan.org
http://in.relation.to
http://forum.hibernate.org
@SanneGrinovero @Hibernate @Infinispan