Search All the Things CODING & DEVELOPMENT | KEVIN BRIDGES | FEBRUARY 8 2013 Friday, February 8, 13

DESCRIPTION

This outlines a 24-hour hackathon project at Acquia that combines generated API documentation and docs from GitHub-hosted resources into a single indexable interface managed by Solr and Drupal.

1. Search All the Things
CODING & DEVELOPMENT | KEVIN BRIDGES | FEBRUARY 8 2013

2. Introduction
Kevin Bridges
Senior Software Engineer, Cloud Systems at Acquia
Avid technologist who believes Drupal is a component of larger systems.
  • http://drupal.org/user/27802 (aka cyberswat)
  • https://twitter.com/cyberswat

3. The Problem
Large organizations have lots of data that can be in multiple formats. Different teams can use different tools and services, making a cohesive interface difficult.
  • Hosted data with services like GitHub
  • Internal APIs
  • Wikis
  • Documents and text files
This data can span multiple languages and formats. How can we combine all of these sources into a single interface that is easy to use while maintaining context?

4. Engineering Week Hackathon
We had 24 hours to solve the problem.
  • Build a Drupal 7 site
  • Integrate with LDAP over SSL for secure access
  • Serve generated API docs like RDoc
  • Index generated docs and GitHub docs for searching
  • Enable an effective faceted search

5. The Team
We needed a few specialists to pull this off: 3 Drupal developers, 1 Drupal themer, and 2 operations hackers.
  • Kevin Bridges (@cyberswat) - Drupal & DevOps
  • Peter Wolanin (@pwolanin) - Drupal & Solr
  • Peter Jackson (@faoiseamh) - Drupal & DevOps
  • Richard Burford (@psynaptic) - Drupal Themer
  • Amin Astaneh (@aastaneh) - Operations
  • Chris Rutter (@ChrisRut) - Operations

6. Drupal Modules
We used 6 contributed modules to accelerate our development efforts. We needed to create 1 custom module that currently lives in a Drupal Sandbox.
Contributed Modules
  • Acquia Connector - Contains the Acquia Search module, which provides integration between a Drupal site and Acquia's hosted search service
  • Apache Solr - Integrates Drupal with the Apache Solr search platform
  • Apache Solr Attachments - Allows searching within file attachments from Solr

7. Drupal Modules
Contributed Modules Continued
  • Apache Solr Multisite Search - Search across multiple sites with Solr
  • Facet API - Abstract facet API that can be used by various search backends
  • LDAP - Provides integration with LDAP services
Custom Modules
  • API docs search - Search API docs with Solr

8. Custom StreamWrappers
Drupal's StreamWrappers allow us to keep local copies of the data we need to index while maintaining control over how the data is displayed to the end user.
generated
  • Stores generated content for indexing and viewing.
  • Allows the files to be viewable from the search results in the context of the Drupal site.
  • Allows us to store raw HTML for display from search results.
github
  • Stores GitHub content for pre-processing and indexing.
  • Modifies external links to this content to reference the document as it lives on GitHub for additional context.

9. Jenkins
Jenkins runs a cron that gathers all of the data we want indexed and pushes it into the main git repository as rendered content for the site. Once content is in git, it is pulled onto the server for our StreamWrappers to work.
  • Checks out the allthethings repo that runs the main Drupal install.
  • Loops over each of the git repositories we are interested in indexing.
  • Scans our standard documentation types and locations for changes and commits them to allthethings.
  • Runs RDoc to generate Ruby docs and commits the documentation to allthethings if it has changed.

10. Scanning Content for Indexing
Before we can index content in Solr we need to identify what should be indexed. Once identified, the file is tracked in MySQL so that it can be processed efficiently.
  • Cron is used to pull down changes Jenkins may have pushed.
  • Each of the StreamWrapper file directories is scanned for valid content.
  • A hash of the content is generated with the timestamps to help target what should be indexed.
  • The database record includes uri, hash, timestamp, type, mimetype and status.

11. Passing Content to Solr
For each of the scanned documents we need to build a Solr document to be used in search results.
  • Evaluate the content and render it using the github-markup gem if necessary.
  • Evaluate the content for HTML tags to assist with surfacing content in searches.
  • Identify a good title for the document by searching for title and h1 tags.
  • Send the completed document to Solr for indexing.
  • Update the scanned document's status to indicate it has been indexed.

12. Create Facets with FacetAPI
The FacetAPI is used to create custom facets. We wanted a facet to allow filtering by API Source and Content Type.
  • During generation of the Solr document, populate the ss_apisource attribute.
  • FacetAPI provides a block for each content type. This corresponds with the entity_type attribute in our Solr document.
  • Implement hook_facetapi_facet_info to provide the definition of the facet.
  • Use apidocs_search_map_source to map different sources to labels.

13. Drush Integration
It's always a good idea to start with Drush while building advanced tools. This provides easier development, troubleshooting and maintenance capabilities.
  • apidocs-clean - Removes file references from the database that no longer exist in the filesystem.
  • apidocs-index - Indexes files referenced in {apidocs_search_files}.
  • apidocs-scan - Scans existing documentation to record references in the database.
  • apidocs-markup - Parses a GitHub Flavored Markdown file into markup.

14. Custom apidocs_search Module
The bulk of our customizations were focused in the apidocs_search module. This module is available in a sandbox on drupal.org for your inspection.
  • apidocs_search.index.inc - Manages Solr indexing
  • apidocs_search.install - Manages the apidocs_search_files schema
  • apidocs_search_markup.rb - Uses the github-markup gem to render GitHub Flavored Markdown
  • apidocs_search_streamwrappers.inc - Provides the generated documentation and github stream wrappers
  • apidocs_search.module - Provides the necessary callbacks and methods to make it all work

15. Resources and Links
Developers
  • cyberswat - http://drupal.org/user/27802
  • pwolanin - http://drupal.org/user/49851
  • faoiseamh - http://drupal.org/user/1999750
  • psynaptic - http://drupal.org/user/93429
  • aastaneh - http://drupal.org/user/2318122
  • ChrisRut - http://drupal.org/user/597820
More Reading
  • https://www.acquia.com/blog/finding-all-things-engineering-hackathon
  • http://www.slideshare.net/cyberswat/drupalcon-sydney

16. Resources and Links
Contrib Modules
  • http://drupal.org/project/acquia_connector
  • http://drupal.org/project/apachesolr
  • http://drupal.org/project/apachesolr_attachments
  • http://drupal.org/project/apachesolr_multisitesearch
  • http://drupal.org/project/facetapi
  • http://drupal.org/project/ctools
  • http://drupal.org/project/ldap
Custom Modules
  • http://drupal.org/sandbox/pwolanin/1801674

17. Acquia is Hiring in Australia (and elsewhere)
https://www.acquia.com/careers

18. Search All the Things
We Need Your Feedback
http://sydney2013.drupal.org/node/348
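
Note on slide 10 (Scanning Content for Indexing): the change-detection idea described there, hashing each file's content together with its timestamp and recording uri, hash, timestamp, type, mimetype and status, can be illustrated with a minimal sketch. This is a Python illustration under stated assumptions, not the PHP code in the apidocs_search module. The function names, the choice of SHA-256, the `scheme://path` URI shape, the "pending" status value, and the dict standing in for the database table are all hypothetical.

```python
import hashlib
import mimetypes
import os


def content_hash(path):
    """Hash the file's bytes together with its modification time."""
    digest = hashlib.sha256()  # hash algorithm is an assumption
    with open(path, "rb") as handle:
        digest.update(handle.read())
    digest.update(str(os.stat(path).st_mtime_ns).encode("ascii"))
    return digest.hexdigest()


def scan(root, scheme, records):
    """Walk `root` and mark new or changed files as pending.

    `records` is a dict keyed by uri; it stands in for the database
    table (hypothetical). Returns the uris that need (re)indexing.
    """
    pending = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            relative = os.path.relpath(path, root).replace(os.sep, "/")
            uri = f"{scheme}://{relative}"  # URI shape is an assumption
            new_hash = content_hash(path)
            previous = records.get(uri)
            if previous is not None and previous["hash"] == new_hash:
                continue  # unchanged since the last scan, leave status alone
            records[uri] = {
                "uri": uri,
                "hash": new_hash,
                "timestamp": os.stat(path).st_mtime,
                "type": scheme,
                "mimetype": mimetypes.guess_type(path)[0]
                or "application/octet-stream",
                "status": "pending",  # status value is an assumption
            }
            pending.append(uri)
    return pending
```

Calling `scan(directory, "github", records)` twice in a row returns an empty list the second time unless a file changed in between, which is the property that lets the indexer skip unchanged documents.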