26
OCTOBER 11-14, 2016 BOSTON, MA

Solr JDBC - Lucene/Solr Revolution 2016

Embed Size (px)

Citation preview

Page 1: Solr JDBC - Lucene/Solr Revolution 2016

O C T O B E R 1 1 - 1 4 , 2 0 1 6 • B O S T O N , M A

Page 2: Solr JDBC - Lucene/Solr Revolution 2016

Solr JDBCKevin Risden

Apache Lucene/Solr Committer; Avalon Consulting, LLC

Page 3: Solr JDBC - Lucene/Solr Revolution 2016

3

03About me

• Consultant with Avalon Consulting, LLC• ~4 years working with Hadoop and Search• Contributed patches to Ambari, HBase, Knox, Solr, Storm• Installation, security, performance tuning, development,

administration

• Kevin Risden• Apache Lucene/Solr Committer• YCSB Contributor

Page 4: Solr JDBC - Lucene/Solr Revolution 2016

4

03Overview• Background• Use Case• Solr JDBC• Demo• Future Development/Improvements

Page 5: Solr JDBC - Lucene/Solr Revolution 2016

5

01Background - What is JDBC?

The JDBC API is a Java API that can access any kind of tabular data, especially data stored in a Relational Database.

Source: https://docs.oracle.com/javase/tutorial/jdbc/overview/

JDBC drivers convert SQL into a backend query.

Page 6: Solr JDBC - Lucene/Solr Revolution 2016

6

01Background - Why should you care about Solr JDBC?

• SQL skills are prolific.• JDBC drivers exist for most relational databases.• Existing reporting tools work with JDBC/ODBC

drivers.

Solr 6 works with SQL and existing JDBC tools!

Page 7: Solr JDBC - Lucene/Solr Revolution 2016

7

01Use Case – Analytics – Utility RatesData set: 2011 Utility RatesQuestions:• How many utility companies serve the state of Maryland?• Which Maryland utility has the cheapest residential rates?• What are the minimum and maximum residential power rates excluding missing data

elements?• What is the state and zip code with the highest residential rate?

How could you answer those questions with Solr?

Inspired By: http://blog.cloudera.com/blog/2015/10/how-to-use-apache-solr-to-query-indexed-data-for-analytics/

• Facets• Filter Queries• Filters• Grouping

• Sorting• Stats• String queries

together

Page 8: Solr JDBC - Lucene/Solr Revolution 2016

8

01Use Case – Analytics – Utility Rates

Inspired By: http://blog.cloudera.com/blog/2015/10/how-to-use-apache-solr-to-query-indexed-data-for-analytics/

Method: Lucene syntaxQuestions:• How many utility companies serve the state of Maryland?

http://solr:8983/solr/rates/select?q=state%3A%22MD%22&wt=json&indent=true&group=true&group.field=utility_name&rows=10&group.limit=1

• Which Maryland utility has the cheapest residential rates?http://solr:8983/solr/rates/select?q=state%3A%22MD%22&wt=json&indent=true&group=true&group.field=utility_name&rows=1&group.limit=1&sort=res_rate+asc

• What are the minimum and maximum residential power rates excluding missing data elements?http://solr:8983/solr/rates/select?q=*:*&fq=%7b!frange+l%3D0.0+incl%3Dfalse%7dres_rate&wt=json&indent=true&rows=0&stats=true&stats.field=res_rate

• What is the state and zip code with the highest residential rate?http://solr:8983/solr/rates/select?q=res_rate:0.849872773537&wt=json&indent=true&rows=1Is there a better way?

Page 9: Solr JDBC - Lucene/Solr Revolution 2016

9

01Solr JDBCHighlights• JDBC Driver for Solr• Powered by Streaming Expressions and Parallel SQL

• Thursday - Parallel SQL and Analytics with Solr – Yonik Seeley• Thursday - Creating New Streaming Expressions – Dennis

Gove• Integrates with any* JDBC client * tested with the JDBC clients in this presentation

Usagejdbc:solr://SOLR_ZK_CONNECTION_STRING?collection=COLLECTION_NAME

Apache Solr Reference Guide - Parallel SQL Interface

Page 10: Solr JDBC - Lucene/Solr Revolution 2016

10

01Solr JDBC - Architecture

Page 11: Solr JDBC - Lucene/Solr Revolution 2016

11

01DemoProgramming Languages• Java• Python/Jython• R• Apache Spark

Web• Apache Zeppelin• RStudio

GUI – JDBC• DbVisualizer• SQuirreL SQL

GUI – ODBC• Microsoft Excel• Tableau*

https://github.com/risdenk/solrj-jdbc-testing

Page 12: Solr JDBC - Lucene/Solr Revolution 2016

12

01Demo – Javaimport org.slf4j.Logger;import org.slf4j.LoggerFactory;import java.sql.*;public class SolrJJDBCTestingJava { private static final Logger LOGGER = LoggerFactory.getLogger(SolrJJDBCTestingJava.class); public static void main(String[] args) throws Exception { String sql = args[0]; try (Connection con = DriverManager.getConnection("jdbc:solr://solr:9983?collection=test")) { try (Statement stmt = con.createStatement()) { try (ResultSet rs = stmt.executeQuery(sql)) { ResultSetMetaData rsMetaData = rs.getMetaData(); int columns = rsMetaData.getColumnCount(); StringBuilder header = new StringBuilder(); for(int i = 1; i < columns + 1; i++) { header.append(rsMetaData.getColumnLabel(i)).append(","); } LOGGER.info(header.toString()); while (rs.next()) { StringBuilder row = new StringBuilder(); for(int i = 1; i < columns + 1; i++) { row.append(rs.getObject(i)).append(","); } LOGGER.info(row.toString()); } } } } }} Apache Solr Reference Guide - Generic

Page 13: Solr JDBC - Lucene/Solr Revolution 2016

13

01Demo – Python#!/usr/bin/env python# https://pypi.python.org/pypi/JayDeBeApi/import jaydebeapiimport sysif __name__ == '__main__': jdbc_url = "jdbc:solr://solr:9983?collection=test” driverName = "org.apache.solr.client.solrj.io.sql.DriverImpl” statement = "select fielda, fieldb, fieldc, fieldd_s, fielde_i from test limit 10” conn = jaydebeapi.connect(driverName, jdbc_url) curs = conn.cursor() curs.execute(statement) print(curs.fetchall()) conn.close()

Apache Solr Reference Guide - Python/Jython

Page 14: Solr JDBC - Lucene/Solr Revolution 2016

14

01Demo – Jython#!/usr/bin/env jython# http://www.jython.org/jythonbook/en/1.0/DatabasesAndJython.html# https://wiki.python.org/jython/DatabaseExamples#SQLite_using_JDBCimport sys from java.langimport Class from java.sqlimport DriverManager, SQLExceptionif __name__ == '__main__': jdbc_url = "jdbc:solr://solr:9983?collection=test” driverName = "org.apache.solr.client.solrj.io.sql.DriverImpl” statement = "select fielda, fieldb, fieldc, fieldd_s, fielde_i from test limit 10” dbConn = DriverManager.getConnection(jdbc_url) stmt = dbConn.createStatement() resultSet = stmt.executeQuery(statement) while resultSet.next(): print(resultSet.getString("fielda")) resultSet.close() stmt.close() dbConn.close() Apache Solr Reference Guide - Python/Jython

Page 15: Solr JDBC - Lucene/Solr Revolution 2016

15

01Demo – R# https://www.rforge.net/RJDBC/library("RJDBC")solrCP <- c(list.files('/opt/solr/dist/solrj-lib', full.names=TRUE), list.files('/opt/solr/dist', pattern='solrj', full.names=TRUE, recursive = TRUE))drv <- JDBC("org.apache.solr.client.solrj.io.sql.DriverImpl", solrCP, identifier.quote="`")conn <- dbConnect(drv, "jdbc:solr://solr:9983?collection=test", "user", "pwd")dbGetQuery(conn, "select fielda, fieldb, fieldc, fieldd_s, fielde_i from test limit 10")dbDisconnect(conn)

Apache Solr Reference Guide - R

Page 16: Solr JDBC - Lucene/Solr Revolution 2016

16

01Demo – Apache Zeppelin

Apache Solr Reference Guide - Apache Zeppelin

Page 17: Solr JDBC - Lucene/Solr Revolution 2016

17

01Demo – RStudio

Page 18: Solr JDBC - Lucene/Solr Revolution 2016

18

01Demo – DbVisualizer

Apache Solr Reference Guide - DbVisualizer

Page 19: Solr JDBC - Lucene/Solr Revolution 2016

19

01Demo – SQuirreL SQL

Apache Solr Reference Guide - SQuirreL SQL

Page 20: Solr JDBC - Lucene/Solr Revolution 2016

20

01Demo – Microsoft Excel

Page 21: Solr JDBC - Lucene/Solr Revolution 2016

21

01Use Case – Analytics – Utility Rates

Inspired By: http://blog.cloudera.com/blog/2015/10/how-to-use-apache-solr-to-query-indexed-data-for-analytics/

Method: Lucene syntaxQuestions:• How many utility companies serve the state of Maryland?

http://solr:8983/solr/rates/select?q=state%3A%22MD%22&wt=json&indent=true&group=true&group.field=utility_name&rows=10&group.limit=1

• Which Maryland utility has the cheapest residential rates?http://solr:8983/solr/rates/select?q=state%3A%22MD%22&wt=json&indent=true&group=true&group.field=utility_name&rows=1&group.limit=1&sort=res_rate+asc

• What are the minimum and maximum residential power rates excluding missing data elements?http://solr:8983/solr/rates/select?q=*:*&fq=%7b!frange+l%3D0.0+incl%3Dfalse%7dres_rate&wt=json&indent=true&rows=0&stats=true&stats.field=res_rate

• What is the state and zip code with the highest residential rate?http://solr:8983/solr/rates/select?q=res_rate:0.849872773537&wt=json&indent=true&rows=1Is there a better way?

Page 22: Solr JDBC - Lucene/Solr Revolution 2016

22

01Use Case – Analytics – Utility RatesMethod: SQLQuestions:• How many utility companies serve the state of Maryland?

select distinct utility_name from rates where state='MD';• Which Maryland utility has the cheapest residential rates?

select utility_name,min(res_rate) from rates where state='MD' group by utility_name order by min(res_rate) asc limit 1;

• What are the minimum and maximum residential power rates excluding missing data elements?

select min(res_rate),max(res_rate) from rates where not res_rate = 0;• What is the state and zip code with the highest residential rate?

select state,zip,max(res_rate) from rates group by state,zip order by max(res_rate) desc limit 1;

How should you answer those questions with Solr? – Using SQL!Inspired By: http://blog.cloudera.com/blog/2015/10/how-to-use-apache-solr-to-query-indexed-data-for-analytics/

Page 23: Solr JDBC - Lucene/Solr Revolution 2016

23

01Use Case – Analytics – Utility Rates

How should you answer those questions with Solr? – Using SQL!Inspired By: http://blog.cloudera.com/blog/2015/10/how-to-use-apache-solr-to-query-indexed-data-for-analytics/

Page 24: Solr JDBC - Lucene/Solr Revolution 2016

24

01Future Development/Improvements• Replace Presto with Apache Calcite - SOLR-8593• Improve SQL compatibility• Ability to specify optimization rules (push downs, joins,

etc)• Potentially use Avatica JDBC/ODBC drivers

• Streaming Expressions/Parallel SQL improvements - SOLR-8125

• JDBC driver improvements - SOLR-8659Info on how to get involved

Page 25: Solr JDBC - Lucene/Solr Revolution 2016

25

01Future Development/ImprovementsSQL Join

Info on how to get involved

SELECT movie_title,character_name,lineFROM movie_dialogs_movie_titles_metadata aJOIN movie_dialogs_movie_lines bONa.movieID=b.movieID;

select( innerJoin( search(movie_dialogs_movie_titles_metadata, q=*:*, fl="movieID,movie_title", sort="movieID asc"), search(movie_dialogs_movie_lines, q=*:*, fl="movieID,character_name,line", sort="movieID asc"), on="movieID” ), movie_title,character_name,line)

Streaming Expression Join

Page 26: Solr JDBC - Lucene/Solr Revolution 2016

26

01Questions?