tl;dr: Solr

Tldr solr-courseload

Page 1: Tldr solr-courseload

tl;dr: Solr

Page 2: Tldr solr-courseload

Dumbledore: "I use the Pensieve. One simply siphons the excess thoughts from one's mind, pours them into the basin, and examines them at one's leisure. It becomes easier to spot patterns and links, you understand, when they are in this form."
Harry: "You mean... that stuff's your thoughts?"
Dumbledore: "Certainly."

Page 4: Tldr solr-courseload

Solr is Lucene-based
  Lucene = text search engine library written in Java
  All kinds of crazy goodies:
    Ranked search
    Multiple indexing
    Simultaneous read & write
    Date-range search
    ...the list goes on
  Platform-independent (thanks, Java!)
  Fast & efficient
    Index size ~= 20-30% of the size of the indexed data
    Very high throughput indexing (95GB/hour)

Page 5: Tldr solr-courseload

Solr is NoSQL
  NoSQL == non-relational database
  RDBMS metaphor:
    One database
    One table
    Denormalized data
    Query parameters instead of SQL
    “Documents” instead of rows

Bottom line: it's a persistent datastore, and we use it to store data persistently.

Page 6: Tldr solr-courseload

Vocabulary
  Master
  Slave
  Replication
  Document
  API

Page 7: Tldr solr-courseload

Master
  There can be only one
  Read & write operations
  Must be secure
  Younger, stronger brother of production DB
  Home base for Solr slaves

Page 8: Tldr solr-courseload

Slave
  There are many copies
  They have a plan: replication
  Read-only
  Gets a copy of the index from the Solr master every k minutes
  Responds to queries

Page 9: Tldr solr-courseload

Replication
  Slaves --HTTP GET--> Master
  Replication is differential
  Configuration is set in solrconfig.xml
  http://tinyurl.com/DESolrRepl
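
Because replication runs over plain HTTP GET, it can also be inspected from the command line. The following is a minimal sketch, assuming Solr's stock replication handler is mounted at /replication; the host names solr-master and solr-slave and the port 8983 are placeholders, and multi-core setups usually add a core name to the path.

```shell
# Build a URL for Solr's replication handler.
# Host names and port are hypothetical placeholders, not values from this deployment.
replication_url() {
  # $1 = host:port, $2 = handler command (indexversion, details, fetchindex)
  printf 'http://%s/solr/replication?command=%s\n' "$1" "$2"
}

replication_url solr-master:8983 indexversion   # which index version does the master hold?
replication_url solr-slave:8983 details         # replication status as seen by a slave
replication_url solr-slave:8983 fetchindex      # ask a slave to pull now rather than wait
```

Against a live cluster, each URL can be fetched with something like curl -s "$(replication_url solr-master:8983 indexversion)".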

Page 10: Tldr solr-courseload

Document
  RDBMS = row; Solr = document
  Denormalized relational data
  Flatten a bunch of related RDBMS rows into a single Solr document
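
To make the flattening concrete, here is a rough sketch in the spirit of "SQL in the shell". The two tables (companies and jobs), their columns, and the key=value output format are all invented for illustration; a real indexer would emit whatever fields the Solr schema defines.

```shell
# Hypothetical sample rows: companies(id,name) and jobs(id,title,company_id).
companies='1,Acme
2,Globex'
jobs='10,Engineer,1
11,Analyst,2'

# Tag each row with its table, then let awk resolve the foreign key
# and emit one flat, denormalized record per job.
flatten_jobs() {
  {
    printf '%s\n' "$companies" | sed 's/^/C,/'
    printf '%s\n' "$jobs" | sed 's/^/J,/'
  } | awk -F, '
    $1 == "C" { name[$2] = $3; next }
    $1 == "J" { printf "id=%s title=%s company=%s\n", $2, $3, name[$4] }
  '
}

flatten_jobs
```

Each output line carries the company name alongside the job fields, so a query never needs a join.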

Page 11: Tldr solr-courseload

API
  Application programming interface
  Primary means of communicating with Solr is an HTTP API
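
As a small illustration of what talking to the HTTP API looks like, the sketch below builds a query URL for Solr's select handler. The base URL (localhost:8983) and the field name title are assumptions; q, rows, and wt are standard Solr query parameters.

```shell
# Assumed base URL for a local Solr instance; adjust for a real deployment.
SOLR_URL='http://localhost:8983/solr'

# Build a select URL. No URL-encoding is done here, so this only
# suits simple queries without spaces or special characters.
solr_select_url() {
  # $1 = query string, $2 = number of rows to return (default 10)
  printf '%s/select?q=%s&rows=%s&wt=json\n' "$SOLR_URL" "$1" "${2:-10}"
}

solr_select_url 'title:solr' 5
```

For queries that need encoding, curl can do it: curl -s -G "$SOLR_URL/select" --data-urlencode 'q=title:"open source"' --data-urlencode 'wt=json'.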

Page 12: Tldr solr-courseload

The Good Stuff: Unix & Diagnostics

“This is the Unix philosophy: Write programs that do one thing and do it well. Write programs to work together. Write programs to handle text streams, because that is a universal interface.”

- Doug McIlroy

Examples of things beyond the scope of this talk: cat, awk, grep, sed, cut, wc, sort, tail, head

Great read: http://matt.might.net/articles/sql-in-the-shell/

Page 13: Tldr solr-courseload

The Good Stuff: Unix & Diagnostics

You cannot effectively troubleshoot without parsing logs
You cannot effectively parse logs without good text-parsing tools:
  cat, awk, grep, sed, cut, wc, sort, tail, head

No *nix OS? PowerShell!

Page 14: Tldr solr-courseload

The Good Stuff: Unix & Diagnostics

Example commands:

Output the Celery log to stdout, in real time:
  tail -f /var/log/celery/project.log

Parse the Celery log, printing a list of unique BUIDs:
  cat /ebs2/log/celery/project.log | grep -oE 'BUID:([0-9]{0,5})' | grep -oE '[0-9]{0,5}' | sort --unique

Parse the Celery log, listing the BUIDs whose feed files failed with a DocumentInvalid error:
  cat /ebs2/log/celery/project.log | grep -B 15 "DocumentInvalid" | grep -E 'Download complete for BUID ([0-9]{1,5})' | awk '{sub(/\[/, "");print $1 " " $2 " " $7 ":" $8}'
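
The unique-BUID pipeline can be tried without access to the real log. This is a sketch run against an invented three-line excerpt; the actual format of project.log may differ, and the numeric sort is a small variation on the original command.

```shell
# Hypothetical log excerpt; the real lines in project.log may look different.
sample_log='[2013-01-01 12:00:00,000: INFO/MainProcess] Parsing BUID:123
[2013-01-01 12:00:01,000: INFO/MainProcess] Parsing BUID:45
[2013-01-01 12:00:02,000: INFO/MainProcess] Parsing BUID:123'

# Extract every "BUID:<digits>" token, keep the digits, and
# de-duplicate with a numeric sort.
unique_buids() {
  grep -oE 'BUID:[0-9]{1,5}' | cut -d: -f2 | sort -n -u
}

printf '%s\n' "$sample_log" | unique_buids
```

Reading from stdin keeps the function reusable: the same call works with tail -f, a file redirect, or a longer pipeline.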

Page 15: Tldr solr-courseload

Conclusion
  RTFreakingM
    http://wiki.apache.org/solr/SolrQuerySyntax
    http://wiki.apache.org/solr/SolrCaching
    http://wiki.apache.org/solr/SchemaXml
    http://django-haystack.readthedocs.org/en/latest/
  Experiment & tinker & reinvent the wheel
  Get comfortable with the command line: you can't effectively administer Solr (or any sufficiently complex system) with a web GUI
  Read the logs
  Connect Solr behavior to application operations

Page 16: Tldr solr-courseload