# Mastering Your Universe with P4 Search

Sven Erik Knop, Technical Marketing Manager

Ralf Gronkowski, Principal Product Consultant


P4Search is a tool built internally and open-sourced in the Perforce Workshop. It builds and queries an external search index that lets users search the content of a Perforce server. This talk explains the inner workings of P4Search, how to set it up and use it, and ideas for extending it.



Sven Erik Knop, Perforce Software

Ralf Gronkowski, Perforce Software

## Overview

• Why P4Search?
• What is P4Search?
• Implementation details and demonstration

## Why P4Search?

## What is Search?

File names, changes and the like can already be searched with p4 files, p4 fstat, and similar commands. But how do you search file content, whether it is C#, .h headers, Java, PowerPoint (PPTX), or PDF?

## p4 grep

• Built-in command since Perforce 2010.1
• Searches files stored in P4D by content
  – Case-sensitive and case-insensitive searches
  – Regular expressions
  – Search through all revisions
  – Context search (lines around a match)
• Returns depot paths
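To make the feature list above concrete, here is a toy model of a content search in the spirit of p4 grep: regular-expression matching, optional case-insensitivity, context lines, and depot paths as results. It is a sketch over a hypothetical in-memory dict of revisions, not how P4D implements p4 grep. Note that every query rescans the text of every revision, which is the cost behind the performance concerns that follow.

```python
import re

def toy_grep(revisions, pattern, ignore_case=False, context=0):
    """Search file revisions by content, in the spirit of p4 grep.

    revisions: dict mapping "depot path#rev" to file text (hypothetical input).
    Returns a list of (depot path#rev, 1-based line number, matched line plus
    up to `context` lines on each side).
    """
    regex = re.compile(pattern, re.IGNORECASE if ignore_case else 0)
    hits = []
    for depot_rev, text in revisions.items():   # scan every revision, every time
        lines = text.splitlines()
        for i, line in enumerate(lines):
            if regex.search(line):
                lo, hi = max(0, i - context), min(len(lines), i + context + 1)
                hits.append((depot_rev, i + 1, lines[lo:hi]))
    return hits


# Hypothetical revisions of two files.
revisions = {
    "//depot/main/src/Widget.java#1": "class Widget {\n  int bolt;\n}",
    "//depot/main/src/Widget.java#2": "class Widget {\n  int Bolt;\n  int nut;\n}",
    "//depot/main/README.txt#1": "No widgets here.",
}

# Case-insensitive regex search across all revisions.
for depot_rev, line_no, _ in toy_grep(revisions, r"\bbolt\b", ignore_case=True):
    print(f"{depot_rev}:{line_no}")
```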

## What’s Not to Like?

• A few drawbacks:
  – Text search only, with a 4K line limit
  – No search on metadata such as attributes
• Performance concerns:
  – Limited to 10,000 revisions by default
  – Memory and CPU consumption
  – But: lockless reads (peeking) since 2013.3

## Solution: External Index

(Diagram labels: p4 files / p4 fstat, index, store, search)

• A search engine indexes the content and stores it in its own database
• Users search the index first; the index returns a depot path
• The index and the Perforce server can live on separate hosts
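The core idea behind such an external index is an inverted index, which maps each term to the documents containing it; it is also the central data structure in Lucene. The toy sketch below keeps everything in memory and uses naive lowercase tokenization. Analyzers, relevance scoring and on-disk storage are what Lucene and Solr add on top. The point is only that a query becomes a few set lookups that return depot paths, instead of a rescan of every file. The class name and the depot paths are hypothetical.

```python
import re
from collections import defaultdict

class TinyIndex:
    """Toy inverted index: token -> set of depot paths containing it."""

    def __init__(self):
        self._postings = defaultdict(set)

    @staticmethod
    def _tokens(text):
        # Naive tokenizer: lowercase runs of letters, digits and underscores.
        return re.findall(r"[a-z0-9_]+", text.lower())

    def add(self, depot_path, content):
        """Index one file revision's content under its depot path."""
        for tok in self._tokens(content):
            self._postings[tok].add(depot_path)

    def search(self, query):
        """Return sorted depot paths containing every query token (AND)."""
        toks = self._tokens(query)
        if not toks:
            return []
        # .get() avoids creating empty postings for unknown tokens.
        result = set(self._postings.get(toks[0], set()))
        for tok in toks[1:]:
            result &= self._postings.get(tok, set())
        return sorted(result)


idx = TinyIndex()
idx.add("//depot/main/src/Widget.java#2", "class Widget { int bolt; }")
idx.add("//depot/main/README.txt#1", "No widgets here.")
print(idx.search("widget bolt"))
```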

## Apache Lucene, Solr and Tika

• Lucene
  – Scalable, high-performance indexing
  – Search algorithms
• Solr
  – Stand-alone enterprise search server
  – HTML administration interface
  – Extensible
• Tika
  – Content analysis tool
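Solr serves its index over HTTP. A query against a core's standard /select handler is just a URL with parameters such as q (the query), rows, fl (fields to return) and wt (response format). The sketch below only builds such a URL and sends nothing. The host, port 8983 (Solr's usual default), the core name collection1 and the field name content are assumptions for illustration, not P4Search's actual configuration. Anyone who can reach this URL can read the index directly, which is why the security questions raised later matter.

```python
from urllib.parse import urlencode

def solr_select_url(base, core, query, rows=10, fields=("id",)):
    """Build a URL for a Solr core's standard /select request handler."""
    params = {
        "q": query,               # Lucene/Solr query syntax, e.g. field:term
        "rows": rows,             # maximum number of documents to return
        "fl": ",".join(fields),   # stored fields to include in each result
        "wt": "json",             # response writer (format)
    }
    return f"{base.rstrip('/')}/solr/{core}/select?{urlencode(params)}"


# Hypothetical host, core and field names.
print(solr_select_url("http://localhost:8983", "collection1", "content:widget"))
```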

## Additional Components Required

• P4Search
  – Index queue (processes indexing requests)
  – Search controller (security)
  – RESTful API (integration into other tools)
  – UI (simple searches)
• Runs in Jetty
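An index queue decouples incoming indexing requests (for example, one per submitted change) from the slower work of fetching and indexing content. Below is a generic producer/consumer sketch with a single worker thread. It is not P4Search's implementation; the class name is hypothetical and the index_change callback stands in for whatever actually fetches the files and sends them to Solr.

```python
import queue
import threading

class IndexQueue:
    """Toy indexing queue: callers enqueue change numbers, and one worker
    thread indexes them in FIFO order so callers never wait on indexing."""

    def __init__(self, index_change):
        self._q = queue.Queue()
        self._index_change = index_change   # hypothetical callback: index one change
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def enqueue(self, change):
        self._q.put(change)

    def _run(self):
        while True:
            change = self._q.get()
            if change is None:              # sentinel sent by close()
                self._q.task_done()
                return
            try:
                self._index_change(change)
            except Exception as exc:        # keep the worker alive on failures
                print(f"indexing change {change} failed: {exc}")
            finally:
                self._q.task_done()

    def close(self):
        """Wait until all queued changes are processed, then stop the worker."""
        self._q.join()
        self._q.put(None)
        self._worker.join()


indexed = []
q = IndexQueue(indexed.append)
for change in (101, 102, 103):
    q.enqueue(change)
q.close()
print(indexed)
```

A single worker keeps changes in submit order; more workers would raise throughput at the cost of ordering.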

## What We Want to Search For

//depot/Talkhouse/rel1.0/com/walkerbros/common/widget/EBolt.java#10
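The unit of search is a file revision: a depot path plus a revision number, as in the example above. Anything that stores or returns such strings has to split them. A minimal sketch, assuming the `path#rev` form shown on the slide; Perforce writes a literal `#` in a file name as `%23`, so the last `#` separates the revision. The names FileRev and parse_file_rev are hypothetical.

```python
from typing import NamedTuple

class FileRev(NamedTuple):
    depot_path: str
    rev: int

def parse_file_rev(spec):
    """Split '//depot/...#rev' into its depot path and revision number."""
    if not spec.startswith("//"):
        raise ValueError(f"not a depot path: {spec!r}")
    path, sep, rev = spec.rpartition("#")
    if not sep or not rev.isdigit():
        raise ValueError(f"missing numeric revision: {spec!r}")
    return FileRev(path, int(rev))


print(parse_file_rev("//depot/Talkhouse/rel1.0/com/walkerbros/common/widget/EBolt.java#10"))
```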

## What We Don’t Want to Search For

• Changes/changelists
• Branches
• Jobs
• Users
• Workspaces
• Depots

## What We Search By

• Content
• Metadata (whatever that might be)

## There is Content …

## And There is P4 Metadata

• Accessible through p4 files / p4 fstat ...

## And There is Common Metadata

## There is Even Custom P4 Metadata

• For ordinary folks:
  – p4 edit file
  – p4 attribute -n tags -v cool file
  – p4 submit -d "just defined a cool tag on file rev"
• For admins:
  – p4 attribute -f -n tags -v cool file#rev
• Find them with:
  – p4 fstat -Oa -F "attr-tags=cool" //depot/...
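An indexer that wants these attributes could read them from tagged output such as that printed by p4 fstat -Oa, where each field is a `... field value` line and records are separated by blank lines. The sketch below parses that format and reproduces the `-F "attr-tags=cool"` filter in Python. The sample text is made up for illustration, the function names are hypothetical, and multi-line field values are not handled.

```python
SAMPLE = """\
... depotFile //depot/widget/EBolt.java
... headRev 10
... attr-tags cool

... depotFile //depot/widget/Nut.java
... headRev 3
"""

def parse_tagged(output):
    """Parse '... field value' lines into one dict per blank-line-separated record."""
    records, current = [], {}
    for line in output.splitlines():
        if not line.strip():              # a blank line ends a record
            if current:
                records.append(current)
                current = {}
        elif line.startswith("... "):
            field, _, value = line[4:].partition(" ")
            current[field] = value
    if current:                           # the last record may lack a trailing blank line
        records.append(current)
    return records

def files_with_attr(records, name, value):
    """Depot files whose attribute `name` equals `value` (cf. -F "attr-<name>=<value>")."""
    key = f"attr-{name}"
    return [r["depotFile"] for r in records if r.get(key) == value]


print(files_with_attr(parse_tagged(SAMPLE), "tags", "cool"))
```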

## P4Search Will Index ...

• File content
• P4 metadata
• P4 attributes
• And the common metadata, if desired

## Details

## What We Store in Solr

(Slide figure: the fields P4Search stores in Solr, "+ other fields")

## Solr Search Does Know A Lot But…

• No ACLs, no permissions

## So A Search Controller

• Is the query endpoint for users
• Has a simplified API
• Provides P4 authentication (password or ticket)
• Filters query results, honoring the existing P4 protections

## Accessing the Index

• P4Search: search controller
• Solr: search index

## Security Concerns

• External index and protection table?
• Solution:
  – Use a programmable search engine
  – Use Perforce protections to filter results

Users need read access to files to be able to search them.
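The filtering step can be illustrated with a deliberately simplified protections model: each line names a user (or `*`) and a depot path pattern, a leading `-` on the path marks an exclusion, and a later matching line overrides an earlier one. Groups, hosts and access levels are ignored, and every line is read as granting or revoking read access. This is not how P4Search or the server evaluate protections; it only shows the "drop hits the user cannot read" step. All names and paths are hypothetical.

```python
import re
from typing import NamedTuple

class Protection(NamedTuple):
    """One simplified protections line. Groups, hosts, access levels omitted."""
    user: str   # a user name, or "*" for every user
    path: str   # depot path pattern; a leading "-" marks an exclusion

def _wildcard_regex(pattern):
    """Translate Perforce wildcards: '...' spans directories, '*' stays within one."""
    parts, i = [], 0
    while i < len(pattern):
        if pattern.startswith("...", i):
            parts.append(".*")
            i += 3
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))

def can_read(protections, user, depot_path):
    """Last matching line wins: later inclusions grant, later exclusions revoke."""
    allowed = False
    for line in protections:
        if line.user not in ("*", user):
            continue
        exclude = line.path.startswith("-")
        pattern = line.path[1:] if exclude else line.path
        if _wildcard_regex(pattern).fullmatch(depot_path):
            allowed = not exclude
    return allowed

def filter_results(protections, user, hits):
    """Keep only search hits ('//path#rev') whose path the user may read."""
    return [h for h in hits if can_read(protections, user, h.partition("#")[0])]


PROTECTS = [
    Protection("*", "//depot/..."),
    Protection("*", "-//depot/secret/..."),
    Protection("alice", "//depot/secret/*.txt"),
]
HITS = ["//depot/main/a.java#1", "//depot/secret/plan.txt#2", "//depot/secret/deep/x.txt#1"]
print(filter_results(PROTECTS, "bob", HITS))
print(filter_results(PROTECTS, "alice", HITS))
```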

## Implementation

• Jetty running Solr
  – Lucene
• Jetty running P4Search
  – Search queue/indexer
  – Search controller
  – RESTful API
  – UI

## Open Source: Where to Find It

• swarm.workshop.perforce.com/projects/perforce-software-p4search/files/main

## Installation

• Download from the Workshop
• Follow the provided instructions to install
• Run two services:
  – p4search-solr
  – p4search-jetty

## Ways to Populate the Index

• On first run, index your entire depot
  – You probably don’t want to do this
• On submit, index new file revisions
  – Via a change-commit trigger on the depot location
• At any time, index any given change
  – curl -X POST --data "commit,change#" http://p4search:8080/api/queue/{token} (where change# is the changelist number)
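The curl call above can also be issued from a script, for example one run by the change-commit trigger. The sketch below builds the same POST with Python's standard library. The payload format `commit,<change>` and the URL shape come from the slide; the token value is a placeholder and the function names are hypothetical.

```python
import sys
import urllib.request

P4SEARCH_URL = "http://p4search:8080"   # host and port as shown on the slide
TOKEN = "CHANGE_ME"                      # hypothetical placeholder for {token}

def build_queue_request(change, base=P4SEARCH_URL, token=TOKEN):
    """Build the POST asking P4Search to queue one submitted change for indexing.

    Mirrors: curl -X POST --data "commit,<change>" <base>/api/queue/<token>
    """
    change = str(change)
    if not change.isdigit():
        raise ValueError(f"changelist number expected, got {change!r}")
    return urllib.request.Request(
        f"{base}/api/queue/{token}",
        data=f"commit,{change}".encode("ascii"),
        # curl --data sends this content type by default
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )

def main(argv):
    """Entry point for a change-commit trigger that passes %change% as argv[1]."""
    with urllib.request.urlopen(build_queue_request(argv[1]), timeout=10) as resp:
        return 0 if 200 <= resp.status < 300 else 1

if __name__ == "__main__" and len(sys.argv) == 2:
    sys.exit(main(sys.argv))
```

As a trigger, this could be registered with a trigger table line such as `p4search change-commit //depot/... "python3 /path/to/queue_change.py %change%"`, where the trigger name and script path are hypothetical. Because change-commit triggers run after the submit completes, a failed POST does not block the submit.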

## Who Uses P4Search Today

• Indexing
  – Via the P4D trigger, so ultimately any client and user
• Searching
  – P4Search UI
  – Piper
  – Commons
  – Custom tools through the P4Search API

## Tweaking P4Search

• Deep dive once you have learned Lucene/Solr
• Starting point: p4search/solr/example/solr/collection1/conf
  – schema.xml
  – solrconfig.xml

## DEMO

## Thank you!

Sven Erik Knop

Ralf Gronkowski (@_gronk)