In these slides, I propose a solution for fast prototyping of a search engine back-end API. It consists of Linux + Django + Solr + Python (LDSP), all of which are open-source software. The solution also provides a code repository with automation scripts. Anyone can build a search engine back-end API in seconds using LDSP.
Search Engine Back End API Solution for Fast Prototyping
https://github.com/jimmylai/search_engine_backend_template
Introduction and Tutorial
Jimmy Lai
r97922028 [at] ntu.edu.tw
http://tw.linkedin.com/pub/jimmy-lai/27/4a/536
2013/01/23
Requirements:
• A Backend API provides:
  1. Full-text search
  2. Spatial search
  3. RESTful API
  4. Input validation
  5. Output rendering
• Simple and easy for fast prototyping
LDSP 2
LDSP: Linux + Django + Solr + Python
• Django: API serving
  – Input validation
  – Data mash-up
  – Output rendering
• Solr: Indexing and searching
  – Schema
  – Indexing
  – Query
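The input-validation step on the Django side can be sketched without any framework. The function below is only an illustration: the parameter names lat, lon, and radius, the bounds, and the default radius are assumptions, not the repository's actual API.

```python
def validate_geo_params(params):
    """Validate raw query-string values and return them as floats.

    `params` is any mapping of strings, such as Django's request.GET.
    The names lat/lon/radius are hypothetical.
    """
    bounds = {"lat": (-90.0, 90.0), "lon": (-180.0, 180.0), "radius": (0.0, 1000.0)}
    defaults = {"radius": "10"}  # assumed default radius, in kilometres
    errors = {}
    cleaned = {}
    for name, (low, high) in bounds.items():
        raw = params.get(name, defaults.get(name))
        if raw is None:
            errors[name] = "missing"
            continue
        try:
            value = float(raw)
        except (TypeError, ValueError):
            errors[name] = "not a number"
            continue
        if not low <= value <= high:
            errors[name] = "out of range"
            continue
        cleaned[name] = value
    if errors:
        # One exception carrying every problem, so the API can report them together.
        raise ValueError(errors)
    return cleaned
```

A view would call this first and turn a ValueError into an HTTP 400 response before any query reaches Solr.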
Automation: Shell command + Fabric
• Installation:
  – Solr
  – Python packages: django, djangorestframework, pysolr
• Start services:
  – Solr
  – Django standalone server
• Data:
  – Create Solr core
  – Feed data
[Tutorial] Requirement
• Example Dataset: Taiwan movie dataset http://data.gov.tw/node/7731
• Data source http://nrchbms.culture.tw/OpenData/API/iCultureAPI.aspx?type=26&radius=100&format=json
• We’d like to build a geo search API to find nearby movie items given a location
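A "nearby items" lookup of this kind maps onto Solr's spatial filter. The sketch below only builds the query URL. It assumes a Solr spatial field named location holding "lat,lon" values (the form the tutorial's preprocessed data uses), a core named movie, and Solr's default local address; the function name is hypothetical.

```python
from urllib.parse import urlencode

SOLR_URL = "http://localhost:8983/solr"  # assumed: default local Solr address


def nearby_query_url(core, lat, lon, distance_km, rows=10):
    """Build a Solr select URL for documents within distance_km of a point."""
    params = [
        ("q", "*:*"),                      # match everything, then filter
        ("fq", "{!geofilt}"),              # Solr's distance filter
        ("sfield", "location"),            # assumed spatial field name
        ("pt", "{},{}".format(lat, lon)),  # centre point as "lat,lon"
        ("d", distance_km),                # radius in kilometres
        ("sort", "geodist() asc"),         # nearest first
        ("rows", rows),
        ("wt", "json"),
    ]
    return "{}/{}/select?{}".format(SOLR_URL, core, urlencode(params))


url = nearby_query_url("movie", 23.853, 120.49, 5)
print(url)
```

Sorting by geodist() returns the closest items first, which is usually what a nearby search wants.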
[Tutorial] Data preprocessing (1/2)
• Original data:
{"sno":"29277","mainTitle":"「新南光」電影《孫悟空遊台灣》,描寫孫悟空化為大學生的外型遊台灣","year":"未知","format":"508x640pixels","subject":"戲劇電影歌仔戲","date":"創作日期:民國時期民國時期","sourcesite":"http://nrch.cca.gov.tw/ccahome/photo/photo_meta.jsp?xml_id=0005780574&dofile=cca100009-hp-tmtp0281_00-0001-w.jpg","px":"120.49","py":"23.853"}
[Tutorial] Data preprocessing (2/2)
• Preprocessed data: normalized the location format
• The preprocessed data is stored as data/movie.json
{
  "format": "508x640pixels",
  "sno": "29277",
  "sourcesite": "http://nrch.cca.gov.tw/ccahome/photo/photo_meta.jsp?xml_id=0005780574&dofile=cca100009-hp-tmtp0281_00-0001-w.jpg",
  "mainTitle": "「新南光」電影《孫悟空遊台灣》,描寫孫悟空化為大學生的外型遊台灣",
  "location": "23.853,120.49",
  "year": "未知",
  "date": "創作日期:民國時期民國時期",
  "id": "29277",
  "subject": "戲劇電影歌仔戲"
},
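The normalization shown above can be sketched as a small function. It is inferred from the before/after records only (px and py merged into a "lat,lon" location string, sno copied to id); the repository's own preprocessing script is not shown in the slides, so the function name and structure are assumptions.

```python
import json


def preprocess(record):
    """Return a copy of a raw record in the preprocessed layout."""
    # Drop the separate coordinate fields; they are merged below.
    item = {key: value for key, value in record.items() if key not in ("px", "py")}
    # py is the latitude and px the longitude, joined as "lat,lon".
    item["location"] = "{},{}".format(record["py"], record["px"])
    # Reuse the serial number as the unique document id.
    item["id"] = record["sno"]
    return item


raw = {"sno": "29277", "year": "未知", "px": "120.49", "py": "23.853"}
print(json.dumps(preprocess(raw), ensure_ascii=False, indent=2))
```

The input record is left unmodified, so the raw download can be reprocessed if the target layout changes.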
[Tutorial] Data indexing
• pip install fabric
• fab setup_env
• fab create_core:movie
• fab run_solr
• fab feed_data:movie,data/movie.json
  – Core name as the 1st parameter, data file name as the 2nd parameter
• fab run_django
• We are done!
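What the feed step has to achieve can be illustrated with the standard library alone. The slides do not show how fab feed_data is implemented, so the sketch below is an assumption: it builds, without sending, a JSON POST to the core's update handler at Solr's default local address. The helper name is hypothetical.

```python
import json
from urllib.request import Request


def build_feed_request(core, docs, solr_url="http://localhost:8983/solr"):
    """Build (but do not send) a request that adds docs to a Solr core."""
    url = "{}/{}/update?commit=true".format(solr_url, core)
    body = json.dumps(docs).encode("utf-8")
    # Supplying a body makes urllib issue a POST.
    return Request(url, data=body, headers={"Content-Type": "application/json"})


docs = [{"id": "29277", "location": "23.853,120.49", "year": "未知"}]
request = build_feed_request("movie", docs)
# Sending requires a running Solr: urllib.request.urlopen(request)
```

Since the setup installs pysolr, the Fabric task may well go through that library instead; there the equivalent call is roughly pysolr.Solr(url).add(docs).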
[Tutorial] Solr Admin
[Tutorial] Djangorestframework UI
[Tutorial] BE API Query, json output
[Tutorial] BE API Query, xml output
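Serving the same result as JSON or XML comes down to choosing a renderer per request. Djangorestframework handles this itself; the plain-Python sketch below only illustrates the idea. The element names root and list-item and the function name are assumptions, and the item keys must be valid XML element names.

```python
import json
import xml.etree.ElementTree as ET


def render(items, fmt="json"):
    """Serialize a list of flat dicts as a JSON or XML string."""
    if fmt == "json":
        return json.dumps(items, ensure_ascii=False)
    if fmt == "xml":
        root = ET.Element("root")
        for item in items:
            node = ET.SubElement(root, "list-item")
            for key, value in item.items():
                # One child element per field, with the value as its text.
                ET.SubElement(node, key).text = str(value)
        return ET.tostring(root, encoding="unicode")
    raise ValueError("unsupported format: {}".format(fmt))


items = [{"id": "29277", "location": "23.853,120.49"}]
print(render(items, "json"))
print(render(items, "xml"))
```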
The next steps
• Understand the mechanism of:
  – Solr:
    • Update schema
    • Update partial data
    • Query language of Solr
  – Djangorestframework:
    • Add more parameters
    • Input validation
    • Output rendering
• Deploy to production environment