
OQL querying and indexes with Apache Geode (incubating)


OQL and Indexing


OQL

OQL (Object Query Language) is a SQL-like language with extended functionality for querying complex objects, object attributes, and methods.

Geode supports only a subset of the OQL features.

Advantages of OQL:

● You can query on any arbitrary object
● You can navigate object collections
● You can invoke methods and access the behavior of objects
● You are not required to declare types. Since you do not need type definitions, you can work across multiple languages
● You are not constrained by a schema


Commonly used Keywords

SELECT: * or a field projection
FROM: "select * from /users"
WHERE: "select * from /users where id = 0"
AND: "select * from /users where id > 0 and age > 21"
OR: "select * from /users where id != 0 or age < 21"
AS: "select * from /users as u where u.id <> 0", "select * from /users u where u.id > 0"
COUNT: "select count(*) from /users"
DISTINCT: "select distinct * from /users", "select distinct u.name from /users u"
IN: "select * from /users u where u.id in set (0, 1, 2)", "select * from /users u where u.id in (select e.id from /employees e)"
LIMIT: "select * from /users u limit 5"
LIKE: "select * from /users u where u.name like '%a%'"
NOT: "select * from /users u where NOT (u.id = 2)"
ORDER BY: "select * from /users u where u.name = 'Joe' order by u.id"
TO_DATE (parsed using SimpleDateFormat): to_date('05/09/10', 'yy/dd/MM'), to_date('050910', 'yyddMM')

That’s not all! More keywords and information can be found in the Geode Documentation
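TO_DATE is described above as parsing with SimpleDateFormat, so its two examples can be checked with the JDK alone. ToDateSketch below is a hypothetical helper, not a Geode class. It turns off lenient parsing so that an impossible value such as month 13 is rejected; whether the query engine configures the parser the same way is an assumption not stated on this slide.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.Date;

class ToDateSketch {
    // Parse text with a SimpleDateFormat pattern, as the slide says TO_DATE does.
    static Date toDate(String text, String pattern) {
        SimpleDateFormat format = new SimpleDateFormat(pattern);
        format.setLenient(false); // reject impossible values such as month 13
        try {
            return format.parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("cannot parse " + text + " with " + pattern, e);
        }
    }

    // Return {year, month (1-12), day of month} so the parsed result is easy to inspect.
    static int[] yearMonthDay(Date date) {
        Calendar calendar = Calendar.getInstance();
        calendar.setTime(date);
        return new int[] {
            calendar.get(Calendar.YEAR),
            calendar.get(Calendar.MONTH) + 1,
            calendar.get(Calendar.DAY_OF_MONTH)
        };
    }
}
```

Both patterns read the digits as year 05, day 09, month 10, so the two calls produce the same date.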


Geode Specific Keywords

IS_DEFINED

● Query function. Returns TRUE if the expression does not evaluate to UNDEFINED.

IS_UNDEFINED

● Query function. Returns TRUE if the expression evaluates to UNDEFINED. In most queries, undefined values are not included in the query results. The IS_UNDEFINED function allows undefined values to be included, so you can identify elements with undefined values.
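To make the IS_DEFINED and IS_UNDEFINED semantics concrete, here is a small plain-Java model (records need Java 16 or later). It assumes that navigating through a null attribute, here a user with no address, evaluates to UNDEFINED. The UNDEFINED sentinel and the User/Address records are illustrative stand-ins, not Geode types.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

class UndefinedSketch {
    // Hypothetical sentinel standing in for the query engine's UNDEFINED value.
    static final Object UNDEFINED = new Object() {
        @Override
        public String toString() {
            return "UNDEFINED";
        }
    };

    record Address(String city) {}

    record User(String name, Address address) {}

    // Evaluating u.address.city: a null address yields UNDEFINED instead of throwing.
    static Object city(User user) {
        return user.address() == null ? UNDEFINED : user.address().city();
    }

    static boolean isDefined(Object value) {
        return value != UNDEFINED;
    }

    static boolean isUndefined(Object value) {
        return value == UNDEFINED;
    }

    // Names of the users whose city value passes the filter.
    static List<String> namesWhere(List<User> users, Predicate<Object> filter) {
        return users.stream()
                .filter(user -> filter.test(city(user)))
                .map(User::name)
                .collect(Collectors.toList());
    }
}
```

A filter such as city = 'Austin' silently drops the user whose city is UNDEFINED; only the isUndefined filter selects that user.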


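Geode Specific Keywords (continued)

<trace>: "<trace> select * from /users u where u.id = 0"

Prefixing a query with <trace> logs its execution time, result row count, and the indexes it used.

Example log output:

No indexes used: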

● [info 2015/05/26 10:25:35.102 PDT Server <main> tid=0x1] Query Executed in 9.619656 ms; rowCount = 99; indexesUsed(0) "select * from /users u where id > 0 and status='active'"

One index used:

● [info 2015/05/26 10:25:35.317 PDT Server <main> tid=0x1] Query Executed in 1.5342 ms; rowCount = 199; indexesUsed(1):sampleIndex-1(Results: 199) "select count * from /users u where u.id > 0"

When more than one index is used:

● [info 2015/05/26 10:25:35.673 PDT Serve <main> tid=0x1] Query Executed in 2.43847 ms; rowCount = 199; indexesUsed(2):sampleIndex-2(Results: 100),sampleIndex-1(Results: 199) "select * from /users u where u.id > 0 OR u.status='active'"

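To log this information for every query, not only for queries prefixed with <trace>, set the gemfire.Query.VERBOSE system property:

System.setProperty("gemfire.Query.VERBOSE", "true");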

<hint 'indexName'> or <hint 'indexName1', 'indexName2'>: tells the query engine to prefer the named index or indexes.

Example: "<hint 'nameIndex'>select * from /users u where u.name = 'Joe' and u.age > 10"


Query Bind Parameters

What:

● Similar to a SQL prepared statement
● Parameters start with '$' followed by a number, starting from 1

Example:

String queryString = "SELECT DISTINCT * FROM /exampleRegion p WHERE p.status = $1 and p.symbol = $2";
...
Object[] params = {"sold", "abc"};
SelectResults results = (SelectResults) query.execute(params);

Possible exceptions:

● QueryParameterCountInvalidException
● TypeMismatchException

Binding a region as a parameter:

● Binding a region parameter requires the actual Region object, not its string name

"SELECT DISTINCT * FROM $1 p WHERE p.status = $2"
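The placeholder rule above ($ followed by a number starting from 1) is enough to see why a parameter-count mismatch can be detected before the query runs. The following is a simplified, hypothetical model of that check in plain Java. Geode reports the mismatch as QueryParameterCountInvalidException, and its real counting logic may differ; this regex, for instance, would also match a "$1" inside a string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BindParamSketch {
    // Matches placeholders such as $1 or $12 (simplified: string literals are not skipped).
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$(\\d+)");

    // The highest placeholder number tells how many values the query expects.
    static int expectedParameterCount(String query) {
        Matcher matcher = PLACEHOLDER.matcher(query);
        int highest = 0;
        while (matcher.find()) {
            highest = Math.max(highest, Integer.parseInt(matcher.group(1)));
        }
        return highest;
    }

    // Hypothetical stand-in for the check behind QueryParameterCountInvalidException.
    static void checkParameterCount(String query, Object... params) {
        int expected = expectedParameterCount(query);
        if (params.length != expected) {
            throw new IllegalArgumentException(
                    "query expects " + expected + " bind parameters but got " + params.length);
        }
    }
}
```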


Field visibility and Method Invocation

The query engine first tries to evaluate the attribute using a public field with that name. If no public field is found, it calls the getter formed from the field name with its first character uppercased (for example, firstName resolves to getFirstName()).

Examples:

SELECT DISTINCT * FROM /users u where u.firstName = 'Joe'

SELECT DISTINCT * FROM /users u where u.getFirstName() = 'Joe'

SELECT DISTINCT * FROM /users u where u.combineFullName() = 'Joe''s Full Name'
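The field-then-getter lookup described above can be modeled with Java reflection. This is a sketch of the rule as stated on this slide, not Geode's implementation; AttributeResolver and SampleUser are hypothetical names.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Method;

class AttributeResolver {
    // Resolve "name" on target: a public field first, then a public getter such as getName().
    static Object resolve(Object target, String name) {
        try {
            Field field = target.getClass().getField(name); // finds public fields only
            return field.get(target);
        } catch (NoSuchFieldException e) {
            // no public field with this name: fall back to the getter below
        } catch (IllegalAccessException e) {
            throw new IllegalStateException(e);
        }
        String getterName = "get" + Character.toUpperCase(name.charAt(0)) + name.substring(1);
        try {
            Method getter = target.getClass().getMethod(getterName);
            return getter.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("no public field or getter for " + name, e);
        }
    }
}

class SampleUser {
    public String nickname = "Joey";        // resolved directly as a public field
    private final String firstName = "Joe"; // private: reachable only through the getter

    public String getFirstName() {
        return firstName;
    }
}
```

Here nickname resolves through the public field, while firstName is private and is only reachable through getFirstName().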


Type conversions

The Geode query engine implicitly performs the following conversions.

Binary Numeric Promotion

The query processor performs binary numeric promotion on the operands of the following operators:

● Operators <, <=, >, >=, =, and <>

The rules are applied in order until a conversion is made:

1. If either operand is of type double, the other is converted to double.
2. If either operand is of type float, the other is converted to float.
3. If either operand is of type long, the other is converted to long.
4. Otherwise, both operands are converted to type int.
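A plain-Java sketch of the promotion order listed above: the comparison is carried out in the widest type either operand uses. NumericPromotion is a hypothetical helper that only handles the boxed primitive types, and it uses Double.compare and Float.compare for the floating-point cases.

```java
class NumericPromotion {
    // Apply the rules in order: double, then float, then long, otherwise int.
    static int compare(Number a, Number b) {
        if (a instanceof Double || b instanceof Double) {
            return Double.compare(a.doubleValue(), b.doubleValue());
        }
        if (a instanceof Float || b instanceof Float) {
            return Float.compare(a.floatValue(), b.floatValue());
        }
        if (a instanceof Long || b instanceof Long) {
            return Long.compare(a.longValue(), b.longValue());
        }
        // Integer, Short, and Byte operands all end up compared as int.
        return Integer.compare(a.intValue(), b.intValue());
    }
}
```

For example, compare(1, 1L) is carried out as a long comparison and returns 0, and compare(1, 1.5) is carried out as a double comparison.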

Temporal Type Conversion

java.util.Date, java.sql.Date, java.sql.Time, and java.sql.Timestamp are compared as nanosecond values.

Enum Conversion

Enums are not converted implicitly; a toString() call is needed to compare an enum value with a string.

Query Evaluation of Float.NaN and Double.NaN

Float.NaN and Double.NaN are not evaluated as primitives; instead, they are compared in the same manner as the JDK methods Float.compareTo and Double.compareTo.
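The difference between primitive and compareTo semantics is easy to see with the JDK alone. The snippet below only demonstrates the JDK behavior that the query engine is described as following; it does not run a query, and NaNComparison is a hypothetical name.

```java
class NaNComparison {
    // Primitive semantics: NaN is unequal to every value, itself included.
    static boolean primitiveEquals(double a, double b) {
        return a == b;
    }

    // Double.compareTo semantics: NaN equals NaN and sorts above +Infinity; 0.0 sorts above -0.0.
    static int compareAsDouble(double a, double b) {
        return Double.valueOf(a).compareTo(b);
    }

    // Float.compareTo applies the same rules to float values.
    static int compareAsFloat(float a, float b) {
        return Float.valueOf(a).compareTo(b);
    }
}
```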


Query a Partitioned Region

Operations summary:

1.) The "coordinating" node calculates where all the data resides.
2.) It creates and executes tasks to query the data on remote nodes.
    a.) Each node executes the query, using any indexes that node currently has.
3.) It executes the query on the local node.
4.) On failure, it recalculates where the failed data now resides.
5.) It executes tasks to query the failed data on the remote nodes where that data now resides.
6.) It combines the results and returns them.
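The six steps can be simulated in a single JVM to show how the retry works. In this hypothetical model, buckets stand in for the partitioned data, a node listed in failingNodes throws when asked to query, and the failed buckets are simply reassigned to a named node before the retry. The local node is treated like any other node, and real index use, parallelism, and rebalancing are left out.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.function.Predicate;

class PartitionedQuerySketch {
    private final Map<Integer, List<Integer>> buckets; // bucket id -> rows stored in that bucket
    private final Map<Integer, String> owner;          // bucket id -> node currently hosting it
    private final Set<String> failingNodes;            // nodes that fail when asked to query

    PartitionedQuerySketch(Map<Integer, List<Integer>> buckets, Map<Integer, String> owner,
                           Set<String> failingNodes) {
        this.buckets = buckets;
        this.owner = owner;
        this.failingNodes = failingNodes;
    }

    // Step 1: work out which node hosts each bucket and group the buckets by node.
    private Map<String, List<Integer>> plan(Collection<Integer> bucketIds) {
        Map<String, List<Integer>> byNode = new TreeMap<>();
        for (int bucket : bucketIds) {
            byNode.computeIfAbsent(owner.get(bucket), node -> new ArrayList<>()).add(bucket);
        }
        return byNode;
    }

    // Steps 2-3: one node evaluates the filter over its buckets (a real node would use its indexes).
    private List<Integer> runOnNode(String node, List<Integer> bucketIds, Predicate<Integer> filter) {
        if (failingNodes.contains(node)) {
            throw new IllegalStateException(node + " failed");
        }
        List<Integer> rows = new ArrayList<>();
        for (int bucket : bucketIds) {
            for (int row : buckets.get(bucket)) {
                if (filter.test(row)) {
                    rows.add(row);
                }
            }
        }
        return rows;
    }

    List<Integer> query(Predicate<Integer> filter, String rebalanceTarget) {
        List<Integer> results = new ArrayList<>();
        List<Integer> failedBuckets = new ArrayList<>();
        for (Map.Entry<String, List<Integer>> task : plan(buckets.keySet()).entrySet()) {
            try {
                results.addAll(runOnNode(task.getKey(), task.getValue(), filter));
            } catch (IllegalStateException e) {
                failedBuckets.addAll(task.getValue());
            }
        }
        // Step 4: simulate the failed buckets having moved, so their location is recalculated.
        for (int bucket : failedBuckets) {
            owner.put(bucket, rebalanceTarget);
        }
        // Step 5: query the failed buckets again at their new location.
        for (Map.Entry<String, List<Integer>> task : plan(failedBuckets).entrySet()) {
            results.addAll(runOnNode(task.getKey(), task.getValue(), filter));
        }
        // Step 6: combine (sorted only so the output is deterministic).
        Collections.sort(results);
        return results;
    }
}
```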


Query Monitor

Query Timeout: set the system property gemfire.Cache.MAX_QUERY_EXECUTION_TIME (in milliseconds). It is disabled by default (set to -1).

ResourceManager: Monitoring Queries for Low Memory

Helps prevent out-of-memory exceptions when querying or creating indexes. This feature is automatically enabled when you set a critical-heap-percentage attribute for the resource-manager element in cache.xml, or when you use the cache.getResourceManager().setCriticalHeapPercentage(float heapPercentage) API. Once it is enabled, the query timeout is set to 5 hours if no timeout has been configured. When memory is low, queries are cancelled with QueryExecutionLowMemoryException and index creation fails with InvalidIndexException.

To disable this feature, set the system property gemfire.cache.DISABLE_QUERY_MONITOR_FOR_LOW_MEMORY to true.

Partitioned Region Queries and Low Memory

Partitioned region queries are likely causes for out-of-memory exceptions. If query monitoring is enabled, partitioned region queries drop or ignore results that are being gathered by other servers if the executing server is low in memory.

Query-monitoring does not address a scenario in which a low-level collection is expanded while the partitioned region query is gathering results. For example, if a row is added and then causes a Java level collection or array to expand, it is possible to then encounter an out-of-memory exception. This scenario is rare and is only possible if the collection size itself expands before a low memory condition is met and then expands beyond the remaining available memory. As a workaround, in the event that you encounter this situation, you may be able to tune the system by additionally lowering the critical-heap-percentage.


Indexing

Why use an index?

● Significantly improves query speed
● The query no longer iterates through the entire region when a matching index can be used

Additional info:

● Indexed fields must implement Comparable
● Provides a simple way to index fields, nested object fields, nested collections of objects/fields, and nested maps

Types:

● Functional Index
● Functional (Compact) Index
● Map Index
● Hash Index
● Primary Key Index


Functional Index

A sorted index, internally represented as a tuple and a copy of the value.

How to create:

qs.createIndex("indexName", "d.name", "/users u, u.dependents d"); // List or Set
qs.createIndex("indexName", "d.name", "/users u, u.dependents.values d"); // Map

Representation:

Key | Values
Sonny | Collection: [(User:Joe, Sonny)]
Cheryl | Collection: [(User:Joe, Cheryl), (User:John, Cheryl)]

Example query: "select * from /users u, u.dependents d where d.name = 'Sonny'"

Restrictions: Cannot be created on overflow regions
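A rough in-memory model of the functional index structure shown above: a sorted map from the index key (d.name) to the tuples that produced it. The records and class name are hypothetical, and where the slide says the real index keeps a copy of the value, this sketch simply stores references.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class FunctionalIndexSketch {
    record Dependent(String name) {}

    record User(String name, List<Dependent> dependents) {}

    // One index value: the (u, d) tuple produced by iterating "/users u, u.dependents d".
    record Tuple(User user, Dependent dependent) {}

    // Sorted map from the index key (d.name) to every tuple that produced that key.
    private final TreeMap<String, List<Tuple>> index = new TreeMap<>();

    void add(User user) {
        for (Dependent dependent : user.dependents()) {
            index.computeIfAbsent(dependent.name(), key -> new ArrayList<>()).add(new Tuple(user, dependent));
        }
    }

    // where d.name = key
    List<Tuple> equalTo(String key) {
        return index.getOrDefault(key, List.of());
    }

    // where d.name >= from and d.name < to: sorted keys make range lookups cheap
    List<Tuple> range(String from, String to) {
        List<Tuple> matches = new ArrayList<>();
        index.subMap(from, true, to, false).values().forEach(matches::addAll);
        return matches;
    }
}
```

Adding Joe (dependents Sonny and Cheryl) and John (dependent Cheryl) reproduces the two index rows shown in the representation above.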


Functional Index (Compact)

Saves memory compared with the non-compact index, at the expense of extra work during index maintenance.

How to create:

qs.createIndex("user names", "u.name", "/users u");
qs.createIndex("nested field", "u.nestedObject.fieldName", "/users u");

Representation:

Key | Values
Joe | Region Entry
John | [Region Entry, Region Entry]
Jerry | Collection(Region Entry, Region Entry)

Restrictions:

● Index maintenance is synchronous
● Only applies when there is one iterator in the FROM clause (example: /users u)

Additional info:

● What about updates in progress?
● What about "in place modification"?
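The compact representation above stores the region entry directly when a key maps to one entry and only switches to a collection when a key is shared. The sketch below models just those two states with a List (the representation above also shows an array form, which is omitted here). Entries are plain strings and the class name is hypothetical; update shows the extra maintenance step of removing the entry from its old key before adding it under the new one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

class CompactIndexSketch {
    // Index key -> a single region entry, or a List of entries once a key is shared.
    private final TreeMap<String, Object> index = new TreeMap<>();

    @SuppressWarnings("unchecked")
    void add(String key, String entry) {
        Object current = index.get(key);
        if (current == null) {
            index.put(key, entry); // common case: store the entry itself, no collection wrapper
        } else if (current instanceof List) {
            ((List<Object>) current).add(entry);
        } else {
            List<Object> entries = new ArrayList<>(); // second entry for this key: switch to a list
            entries.add(current);
            entries.add(entry);
            index.put(key, entries);
        }
    }

    // Remove the entry from a key, shrinking a one-element list back to a bare entry.
    void remove(String key, String entry) {
        Object current = index.get(key);
        if (current instanceof List<?> entries) {
            entries.remove(entry);
            if (entries.size() == 1) {
                index.put(key, entries.get(0));
            } else if (entries.isEmpty()) {
                index.remove(key);
            }
        } else if (entry.equals(current)) {
            index.remove(key);
        }
    }

    // Synchronous maintenance on update: the old key must be found and cleaned up first.
    void update(String oldKey, String newKey, String entry) {
        remove(oldKey, entry);
        add(newKey, entry);
    }

    List<Object> get(String key) {
        Object current = index.get(key);
        if (current == null) {
            return List.of();
        }
        if (current instanceof List<?> entries) {
            return new ArrayList<>(entries);
        }
        return List.of(current);
    }

    boolean storedAsSingleEntry(String key) {
        Object current = index.get(key);
        return current != null && !(current instanceof List);
    }
}
```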


Key Index

Creating a key index makes the query service aware of the relationship between the values in the region and the keys in the region.

This allows the query service to translate a query on the key into a get.

How to create: qs.createKeyIndex("indexName", "u.id", "/users u");

Example query: "select * from /users u where u.id = 1"

Restrictions: Equality comparisons only
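In terms of work done, a key index turns an equality predicate on the key field into a map lookup. The model below compares a get() against a full scan. It assumes, as a key index requires, that the region key equals u.id; the class and record are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class KeyIndexSketch {
    record User(int id, String name) {}

    private final Map<Integer, User> region; // region key -> value, where the key equals u.id
    int entriesScanned = 0;

    KeyIndexSketch(Map<Integer, User> region) {
        this.region = region;
    }

    // With a key index, "where u.id = 1" becomes a single get() on the region.
    User whereIdEqualsWithKeyIndex(int id) {
        return region.get(id);
    }

    // Without it, the same predicate is checked against every value in the region.
    List<User> whereIdEqualsByScan(int id) {
        List<User> matches = new ArrayList<>();
        for (User user : region.values()) {
            entriesScanned++;
            if (user.id() == id) {
                matches.add(user);
            }
        }
        return matches;
    }
}
```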


Hash Index

The good:

● Saves memory by not storing index key values
● Hash values are computed from the index key

The bad:

● Slower maintenance and query times
● Only a slight savings in memory
● The name is a bit misleading

Representation (RE = region entry): Array: [RE, RE, null, RE, REMOVED, null, RE, ...]

How to create: qs.createHashIndex("indexName", "u.name", "/users u");

Restrictions:

● Only equality-based queries
● Single iterator
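The array representation above suggests an open-addressing hash table that stores only region entries and recomputes each entry's index key when probing. The sketch below is one way to build that, using linear probing and a REMOVED tombstone; the probing strategy, load factor, and names are assumptions, not Geode's implementation. It also shows why lookups are slower: each probed entry has to be re-evaluated because its key is not stored.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class HashIndexSketch<E> {
    private static final Object REMOVED = new Object(); // tombstone left behind by a removal
    private final Function<E, Object> indexedExpression; // e.g. u -> u.name; results are never stored
    private Object[] slots = new Object[8];
    private int occupied = 0; // live entries plus tombstones

    HashIndexSketch(Function<E, Object> indexedExpression) {
        this.indexedExpression = indexedExpression;
    }

    // The hash comes from the index key, but only the entry itself goes into the array.
    private int home(Object key) {
        return Math.floorMod(key.hashCode(), slots.length);
    }

    void add(E entry) {
        if ((occupied + 1) * 2 > slots.length) {
            resize(); // keep at most half the slots occupied so probes always reach a null
        }
        int i = home(indexedExpression.apply(entry));
        while (slots[i] != null && slots[i] != REMOVED) {
            i = (i + 1) % slots.length; // linear probing
        }
        if (slots[i] == null) {
            occupied++;
        }
        slots[i] = entry;
    }

    // Equality lookup: every probed entry is re-evaluated, since its key was not stored.
    @SuppressWarnings("unchecked")
    List<E> get(Object key) {
        List<E> matches = new ArrayList<>();
        for (int i = home(key); slots[i] != null; i = (i + 1) % slots.length) {
            if (slots[i] != REMOVED && key.equals(indexedExpression.apply((E) slots[i]))) {
                matches.add((E) slots[i]);
            }
        }
        return matches;
    }

    void remove(E entry) {
        for (int i = home(indexedExpression.apply(entry)); slots[i] != null; i = (i + 1) % slots.length) {
            if (entry.equals(slots[i])) {
                slots[i] = REMOVED; // a tombstone keeps later probe chains intact
                return;
            }
        }
    }

    @SuppressWarnings("unchecked")
    private void resize() {
        Object[] old = slots;
        slots = new Object[old.length * 2];
        occupied = 0;
        for (Object slot : old) {
            if (slot != null && slot != REMOVED) {
                add((E) slot); // tombstones are dropped while rehashing
            }
        }
    }
}
```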


Map Index

Allows indexing a map field of an object.

How to create:

qs.createIndex("indexName", "u.name[*]", "/users u");
qs.createIndex("indexName", "u.name['first', 'middle']", "/users u");

In gfsh: gfsh>create index --name="IndexName" --expression="u.name['first', 'middle']" --region="/users u"

Example query: "SELECT * FROM /users u WHERE u.name['first'] = 'John' OR u.name['last'] = 'Smith'"

Gotcha: An expression written as u.name.get('first') neither creates nor uses the map index.


Map Index (continued)

Representation: each indexed map key points to its own range index.

'first': Range Index
Key | Value
Joe | Collection: [(User: Joe Bob, Joe)]
John | Collection: [(User: John Jacob Schmidt, John)]
Jerry | Collection: [(User: Jerry Schmidt, Jerry)]

'middle': Range Index
Key | Value
Jacob | Collection: [(User: John Jacob Schmidt, Jacob)]

'last': Range Index
Key | Value
Bob | Collection: [(User: Joe Bob, Bob)]
Schmidt | Collection: [(User: John Jacob Schmidt, Schmidt), (User: Jerry Schmidt, Schmidt)]
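The structure above, one range index per map key, can be modeled with a map of sorted maps. In this hypothetical sketch an empty key set stands for u.name[*] (index every key) and a non-empty set stands for an explicit list such as u.name['first', 'middle']; users are plain strings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class MapIndexSketch {
    // Map key (e.g. 'first') -> sorted index of that key's values -> users holding each value.
    private final Map<String, TreeMap<String, List<String>>> perKey = new HashMap<>();
    private final Set<String> indexedKeys; // empty set models u.name[*]: index every key

    MapIndexSketch(Set<String> indexedKeys) {
        this.indexedKeys = indexedKeys;
    }

    void add(String user, Map<String, String> name) {
        for (Map.Entry<String, String> entry : name.entrySet()) {
            if (!indexedKeys.isEmpty() && !indexedKeys.contains(entry.getKey())) {
                continue; // key not listed in the index expression, so it is not indexed
            }
            perKey.computeIfAbsent(entry.getKey(), key -> new TreeMap<>())
                    .computeIfAbsent(entry.getValue(), value -> new ArrayList<>())
                    .add(user);
        }
    }

    // u.name['first'] = 'John'
    List<String> equalTo(String mapKey, String value) {
        TreeMap<String, List<String>> range = perKey.get(mapKey);
        if (range == null) {
            return List.of(); // this map key is not indexed; a real engine would fall back to a scan
        }
        return range.getOrDefault(value, List.of());
    }
}
```

Adding the three users from the representation above rebuilds the same three range indexes.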


Multiple Index Creation

Creating multiple indexes on a populated region one at a time requires iterating over that region once for each index. This has a significant impact on overflow regions. Defining the indexes first and then creating them together populates them in a single pass over the region; the same mechanism is used internally when the cache is brought up.

Example of multiple index creation:

Cache cache = new CacheFactory().create();
QueryService queryService = cache.getQueryService();
queryService.defineIndex("name1", "indexExpr1", "regionPath1");
queryService.defineIndex("name2", "indexExpr2", "regionPath2");
queryService.defineHashIndex("name3", "indexExpr3", "regionPath2");
queryService.defineKeyIndex("name4", "indexExpr4", "regionPath2");
List<Index> indexes = queryService.createDefinedIndexes();

To clear any defined indexes that have not been created yet:

queryService.clearDefinedIndexes();
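To see why defining indexes first helps, the model below builds every defined index during one pass over the region and counts the entries it visits; creating the same indexes one at a time would visit each entry once per index. DefinedIndexSketch and its method names only mirror the Geode calls above for readability; it is a hypothetical in-memory model, not the Geode API.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

class DefinedIndexSketch<V> {
    // Index name -> expression that extracts the index key from a region value.
    private final Map<String, Function<V, Object>> defined = new LinkedHashMap<>();
    int entriesVisited = 0;

    void defineIndex(String name, Function<V, Object> expression) {
        defined.put(name, expression); // nothing is built yet
    }

    void clearDefinedIndexes() {
        defined.clear();
    }

    // Build every defined index during one pass over the region's values.
    Map<String, Map<Object, List<V>>> createDefinedIndexes(Collection<V> region) {
        Map<String, Map<Object, List<V>>> built = new LinkedHashMap<>();
        for (String name : defined.keySet()) {
            built.put(name, new HashMap<>());
        }
        for (V value : region) {
            entriesVisited++; // each entry is read once, however many indexes are defined
            for (Map.Entry<String, Function<V, Object>> definition : defined.entrySet()) {
                built.get(definition.getKey())
                        .computeIfAbsent(definition.getValue().apply(value), key -> new ArrayList<>())
                        .add(value);
            }
        }
        return built;
    }
}
```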


Querying with Functions

Benefits:

● Allows targeting specific nodes by filtering on the partitioning key
● Runs closer to the data
● Logic and computation run on each node's results, so possibly less data is sent back

Drawbacks:

● More work for users (writing the function)
● More work for users (registering the function)


Equijoin Queries

Restrictions:

● Partitioned regions being joined must be colocated

Problems:

● Slow due to the cartesian product
● Memory usage due to temporary joined result sets

Some improvements are coming:

● Significantly reduce join time for single-iterator filters where indexes can be used:

"select * from /users u, /employees e where u.name = 'John' and u.id = e.id"
"select * from /users u, /employees e where u.name = 'John' and u.age > 21 and u.id = e.id"
"select * from /users u, /employees e, /office o where u.name = 'John' and u.id = e.id and e.location = o.location"
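The cost difference behind this planned improvement can be shown with two join plans over in-memory lists. The nested loop evaluates the full WHERE clause on every pair of the cartesian product, while the filter-first plan applies the single-iterator filter (u.name = 'John') and then probes an id index. This is a generic illustration with hypothetical records, not Geode's join implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class EquijoinSketch {
    record User(int id, String name) {}

    record Employee(int id, String location) {}

    record Pair(User user, Employee employee) {}

    static int predicateChecks = 0; // rough work counter for comparing the two plans

    // Naive plan: evaluate the whole WHERE clause on every (u, e) pair of the cartesian product.
    static List<Pair> nestedLoop(List<User> users, List<Employee> employees, String name) {
        List<Pair> joined = new ArrayList<>();
        for (User user : users) {
            for (Employee employee : employees) {
                predicateChecks++;
                if (user.name().equals(name) && user.id() == employee.id()) {
                    joined.add(new Pair(user, employee));
                }
            }
        }
        return joined;
    }

    // Filter-first plan: apply the single-iterator filter on u.name, then probe an index on e.id.
    static List<Pair> filterThenProbe(List<User> users, Map<Integer, List<Employee>> employeesById, String name) {
        List<Pair> joined = new ArrayList<>();
        for (User user : users) {
            predicateChecks++;
            if (!user.name().equals(name)) {
                continue;
            }
            for (Employee employee : employeesById.getOrDefault(user.id(), List.of())) {
                joined.add(new Pair(user, employee));
            }
        }
        return joined;
    }
}
```

With three users and three employees, the nested loop checks 9 pairs while the filter-first plan checks 3 users; the gap grows with the product of the region sizes.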


General Tips/Tricks

● The FROM clause of the query and the FROM clause of the index should match

● For AND operators, put the more selective filter first in the query

● Whenever possible, provide a hint to allow the query engine to prefer a specific index