Solr JDBC - Lucene/Solr Revolution 2016


OCTOBER 11-14, 2016 BOSTON, MA

Solr JDBC
Kevin Risden
Apache Lucene/Solr Committer; Avalon Consulting, LLC

About me

  • Consultant with Avalon Consulting, LLC
  • ~4 years working with Hadoop and Search
  • Contributed patches to Ambari, HBase, Knox, Solr, Storm
  • Installation, security, performance tuning, development, administration

Kevin Risden
Apache Lucene/Solr Committer
YCSB Contributor

Overview

  • Background
  • Use Case
  • Solr JDBC
  • Demo
  • Future Development/Improvements

Background - What is JDBC?

"The JDBC API is a Java API that can access any kind of tabular data, especially data stored in a Relational Database."

Source: https://docs.oracle.com/javase/tutorial/jdbc/overview/

JDBC drivers convert SQL into a backend query.

Background - Why should you care about Solr JDBC?

  • SQL skills are widespread.
  • JDBC drivers exist for most relational databases.
  • Existing reporting tools work with JDBC/ODBC drivers.

Solr 6 works with SQL and existing JDBC tools!

Use Case - Analytics - Utility Rates

Data set: 2011 Utility Rates

Questions:
  • How many utility companies serve the state of Maryland?
  • Which Maryland utility has the cheapest residential rates?
  • What are the minimum and maximum residential power rates excluding missing data elements?
  • What is the state and zip code with the highest residential rate?

How could you answer those questions with Solr?
  • Facets
  • Filter Queries
  • Filters
  • Grouping
  • Sorting
  • Stats
  • String queries together

Inspired By: http://blog.cloudera.com/blog/2015/10/how-to-use-apache-solr-to-query-indexed-data-for-analytics/

Use Case - Analytics - Utility Rates

Method: Lucene syntax

Questions:
  • How many utility companies serve the state of Maryland?
    http://solr:8983/solr/rates/select?q=state%3A%22MD%22&wt=json&indent=true&group=true&group.field=utility_name&rows=10&group.limit=1
  • Which Maryland utility has the cheapest residential rates?
    http://solr:8983/solr/rates/select?q=state%3A%22MD%22&wt=json&indent=true&group=true&group.field=utility_name&rows=1&group.limit=1&sort=res_rate+asc
  • What are the minimum and maximum residential power rates excluding missing data elements?
    http://solr:8983/solr/rates/select?q=*:*&fq=%7b!frange+l%3D0.0+incl%3Dfalse%7dres_rate&wt=json&indent=true&rows=0&stats=true&stats.field=res_rate
  • What is the state and zip code with the highest residential rate?
    http://solr:8983/solr/rates/select?q=res_rate:0.849872773537&wt=json&indent=true&rows=1

Is there a better way?
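The URL-encoded parameters in the queries above are hard to read at a glance. As a minimal sketch, the following decodes the third query with Python's standard library to show the filter it applies. The helper name `decode_params` is hypothetical and not part of the original slides.

```python
from urllib.parse import urlsplit, parse_qsl

# Third query from the slide: min/max residential rate, excluding missing data.
url = ("http://solr:8983/solr/rates/select?q=*:*"
       "&fq=%7b!frange+l%3D0.0+incl%3Dfalse%7dres_rate"
       "&wt=json&indent=true&rows=0&stats=true&stats.field=res_rate")


def decode_params(request_url):
    """Return the decoded query parameters of a request URL as a dict.

    Hypothetical helper for illustration; a repeated parameter name
    would keep only its last value, which is fine for this query.
    """
    return dict(parse_qsl(urlsplit(request_url).query))


params = decode_params(url)
for name, value in params.items():
    print(f"{name} = {value}")
```

Decoded, the `fq` parameter is a function range filter on `res_rate` with an exclusive lower bound of 0.0, and `rows=0` with `stats=true` returns only the statistics for `res_rate`.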

Solr JDBC

Highlights
  • JDBC Driver for Solr
  • Powered by Streaming Expressions and Parallel SQL
      • Thursday - Parallel SQL and Analytics with Solr - Yonik Seeley
      • Thursday - Creating New Streaming Expressions - Dennis Gove
  • Integrates with any* JDBC client

* tested with the JDBC clients in this presentation

Usage
  jdbc:solr://SOLR_ZK_CONNECTION_STRING?collection=COLLECTION_NAME

Apache Solr Reference Guide - Parallel SQL Interface
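The usage pattern above can be filled in mechanically. The sketch below assumes only the two placeholders shown on the slide; the function name `solr_jdbc_url` is hypothetical, and the sample values are the ones used in the demo slides.

```python
def solr_jdbc_url(zk_connection_string, collection):
    """Fill in jdbc:solr://SOLR_ZK_CONNECTION_STRING?collection=COLLECTION_NAME.

    Hypothetical helper for illustration; it only formats the string and
    does not validate the ZooKeeper address or the collection name.
    """
    return f"jdbc:solr://{zk_connection_string}?collection={collection}"


# Values taken from the demo slides: ZooKeeper at solr:9983, collection "test".
demo_url = solr_jdbc_url("solr:9983", "test")
print(demo_url)
```

The resulting string is what the demos pass to `DriverManager.getConnection` in Java or to `jaydebeapi.connect` in Python.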

Solr JDBC - Architecture

Demo

Programming Languages
  • Java
  • Python/Jython
  • R
  • Apache Spark

Web
  • Apache Zeppelin
  • RStudio

GUI JDBC
  • DbVisualizer
  • SQuirreL SQL

GUI ODBC
  • Microsoft Excel
  • Tableau*

https://github.com/risdenk/solrj-jdbc-testing

Demo - Java

    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    import java.sql.*;

    public class SolrJJDBCTestingJava {
      private static final Logger LOGGER = LoggerFactory.getLogger(SolrJJDBCTestingJava.class);

      public static void main(String[] args) throws Exception {
        // The SQL statement to run is passed as the first command-line argument.
        String sql = args[0];

        try (Connection con = DriverManager.getConnection("jdbc:solr://solr:9983?collection=test")) {
          try (Statement stmt = con.createStatement()) {
            try (ResultSet rs = stmt.executeQuery(sql)) {
              ResultSetMetaData rsMetaData = rs.getMetaData();
              int columns = rsMetaData.getColumnCount();

              // Log the column labels as a header line (JDBC columns are 1-based).
              StringBuilder header = new StringBuilder();
              for (int i = 1; i < columns + 1; i++) {
                header.append(rsMetaData.getColumnLabel(i)).append(",");
              }
              LOGGER.info(header.toString());

              // Log each result row as a comma-separated line.
              while (rs.next()) {
                StringBuilder row = new StringBuilder();
                for (int i = 1; i < columns + 1; i++) {
                  row.append(rs.getObject(i)).append(",");
                }
                LOGGER.info(row.toString());
              }
            }
          }
        }
      }
    }

Apache Solr Reference Guide - Generic

Demo - Python

    #!/usr/bin/env python
    # https://pypi.python.org/pypi/JayDeBeApi/

    import jaydebeapi
    import sys

    if __name__ == '__main__':
      jdbc_url = "jdbc:solr://solr:9983?collection=test"
      driverName = "org.apache.solr.client.solrj.io.sql.DriverImpl"
      statement = "select fielda, fieldb, fieldc, fieldd_s, fielde_i from test limit 10"

      conn = jaydebeapi.connect(driverName, jdbc_url)
      curs = conn.cursor()
      curs.execute(statement)
      print(curs.fetchall())

      conn.close()

Apache Solr Reference Guide - Python/Jython
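The Python demo above prints the raw list of tuples returned by `fetchall()`. As a sketch of how a header and rows could be printed in the spirit of the Java demo, the following uses `cursor.description` from the Python DB-API, the interface JayDeBeApi is documented to follow. So that it runs without a Solr cluster, it uses an in-memory `sqlite3` database as a stand-in. The helper `rows_as_csv`, the table definition, and the sample rows are all hypothetical.

```python
import sqlite3


def rows_as_csv(cursor):
    """Return a header line plus one comma-separated line per result row.

    Works with any executed DB-API 2.0 cursor; the first item of each
    entry in cursor.description is the column name.
    """
    lines = [",".join(col[0] for col in cursor.description)]
    for row in cursor.fetchall():
        lines.append(",".join(str(value) for value in row))
    return lines


# Stand-in for the Solr connection (assumption: sqlite3 instead of jaydebeapi).
conn = sqlite3.connect(":memory:")
curs = conn.cursor()
curs.execute("create table test (fielda text, fielde_i integer)")
curs.executemany("insert into test values (?, ?)", [("a1", 1), ("a2", 2)])
curs.execute("select fielda, fielde_i from test order by fielda limit 10")
csv_lines = rows_as_csv(curs)
conn.close()

print("\n".join(csv_lines))
```

With a real Solr connection, only the connection setup would change; the cursor handling is the same DB-API pattern.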

Demo - Jython

    #!/usr/bin/env jython
    # http://www.jython.org/jythonbook/en/1.0/DatabasesAndJython.html
    # https://wiki.python.org/jython/DatabaseExamples#SQLite_using_JDBC

    import sys

    from java.lang import Class
    from java.sql import DriverManager, SQLException

    if __name__ == '__main__':
      jdbc_url = "jdbc:solr://solr:9983?collection=test"
      driverName = "org.apache.solr.client.solrj.io.sql.DriverImpl"
      statement = "select fielda, fieldb, fieldc, fieldd_s, fielde_i from test limit 10"

      dbConn = DriverManager.getConnection(jdbc_url)
      stmt = dbConn.createStatement()

      resultSet = stmt.executeQuery(statement)
      while resultSet.next():
        print(resultSet.getString("fielda"))

      resultSet.close()
      stmt.close()
      dbConn.close()

Apache Solr Reference Guide - Python/Jython

Demo - R

# https://www.rforge.net/RJDBC/

library("RJDBC")

solrCP