Search Engine Back End API Solution for Fast Prototyping
https://github.com/jimmylai/search_engine_backend_template
Introduction and Tutorial
Jimmy Lai
r97922028 [at] ntu.edu.tw
http://tw.linkedin.com/pub/jimmy-lai/27/4a/536
2013/01/23

[LDSP] Search Engine Back End API Solution for Fast Prototyping

DESCRIPTION

In these slides, I propose a solution for fast prototyping of a search engine back end API. It consists of Linux + Django + Solr + Python (LDSP), all of which are open source software. The solution also provides a code repository with automation scripts. Anyone can build a search engine back end API in seconds by using LDSP.


Requirements:

•  A Backend API provides:
    1.  Full-text search
    2.  Spatial search
    3.  RESTful API
    4.  Input validation
    5.  Output rendering

•  Simple and easy for fast prototyping


LDSP: Linux + Django + Solr + Python

•  Django: API serving
    –  Input validation
    –  Data mash-up
    –  Output rendering

•  Solr: Indexing and searching
    –  Schema
    –  Indexing
    –  Query
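
The input validation step on the Django side can be illustrated without any framework. The following is a minimal sketch in plain Python 3, not the repository's code and not the djangorestframework API; the parameter names `lat`, `lon`, and `radius`, the 10 km default, and the helper `validate_geo_input` are all assumptions made for illustration.

```python
def validate_geo_input(params):
    """Parse and range-check lat, lon, and radius from a dict of strings.

    Returns a dict of floats; raises ValueError with a readable message.
    The parameter names and the default radius are hypothetical.
    """
    try:
        lat = float(params["lat"])
        lon = float(params["lon"])
        radius = float(params.get("radius", 10))
    except KeyError as exc:
        raise ValueError("missing parameter: {}".format(exc.args[0]))
    except (TypeError, ValueError):
        raise ValueError("lat, lon, and radius must be numbers")
    if not -90 <= lat <= 90:
        raise ValueError("lat must be between -90 and 90")
    if not -180 <= lon <= 180:
        raise ValueError("lon must be between -180 and 180")
    if not radius > 0:
        raise ValueError("radius must be positive")
    return {"lat": lat, "lon": lon, "radius": radius}
```

A view would call such a function on the request's query parameters and turn a `ValueError` into an HTTP 400 response before anything is sent to Solr.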


Automation: Shell command + Fabric

•  Installation:
    –  Solr
    –  Python packages: django, djangorestframework, pysolr

•  Start services:
    –  Solr
    –  Django standalone server

•  Data:
    –  Create Solr core
    –  Feed data


[Tutorial] Requirement

•  Example Dataset: Taiwan movie dataset http://data.gov.tw/node/7731

•  Data source http://nrchbms.culture.tw/OpenData/API/iCultureAPI.aspx?type=26&radius=100&format=json

•  We’d like to build a geo search API to find nearby movie items given a location


[Tutorial] Data preprocessing (1/2)

•  Original data:
{"sno":"29277","mainTitle":"「新南光」電影《孫悟空遊台灣》,描寫孫悟空化為大學生的外型遊台灣","year":"未知","format":"508x640pixels","subject":"戲劇電影歌仔戲","date":"創作日期:民國時期民國時期","sourcesite":"http://nrch.cca.gov.tw/ccahome/photo/photo_meta.jsp?xml_id=0005780574&dofile=cca100009-hp-tmtp0281_00-0001-w.jpg","px":"120.49","py":"23.853"}


[Tutorial] Data preprocessing (2/2)

•  Preprocessed data: the location format is normalized
•  The preprocessed data is stored as data/movie.json
{
  "format": "508x640pixels",
  "sno": "29277",
  "sourcesite": "http://nrch.cca.gov.tw/ccahome/photo/photo_meta.jsp?xml_id=0005780574&dofile=cca100009-hp-tmtp0281_00-0001-w.jpg",
  "mainTitle": "「新南光」電影《孫悟空遊台灣》,描寫孫悟空化為大學生的外型遊台灣",
  "location": "23.853,120.49",
  "year": "未知",
  "date": "創作日期:民國時期民國時期",
  "id": "29277",
  "subject": "戲劇電影歌仔戲"
},
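
Comparing the two samples, the preprocessing step appears to merge `px` (longitude) and `py` (latitude) into a single `location` string in `lat,lon` order and to copy `sno` into `id`. A minimal Python 3 sketch of that transformation follows; `normalize_record` is a hypothetical helper inferred from the samples, not the repository's script.

```python
import json


def normalize_record(raw):
    """Return a copy of one raw record in the preprocessed layout."""
    # Keep every field except the two separate coordinate fields.
    record = {k: v for k, v in raw.items() if k not in ("px", "py")}
    # px is the longitude and py the latitude in the original data;
    # the preprocessed form stores them as one "lat,lon" string.
    record["location"] = "{},{}".format(raw["py"], raw["px"])
    # The preprocessed sample reuses the serial number as the document id.
    record["id"] = raw["sno"]
    return record


if __name__ == "__main__":
    sample = {"sno": "29277", "year": "未知", "px": "120.49", "py": "23.853"}
    print(json.dumps(normalize_record(sample), ensure_ascii=False, indent=2))
```

Keeping the coordinates as strings avoids any change in precision between the source data and the indexed value.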


[Tutorial] Data indexing

•  pip install fabric
•  fab setup_env
•  fab create_core:movie
•  fab run_solr
•  fab feed_data:movie,data/movie.json
    –  Core name as the 1st parameter, data file name as the 2nd parameter
•  fab run_django
•  We are done!
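
The `fab feed_data` task hides the upload itself. As a rough illustration of what such a step might involve (not the repository's actual task, which may use pysolr instead), the Python 3 sketch below builds a JSON update request for a Solr core using only the standard library. The host and port, the `/solr/<core>/update` path, the assumption that the data file holds a JSON list of documents, and the helper names `build_feed_request` and `feed_file` are all assumptions.

```python
import json
import urllib.request

SOLR_BASE = "http://localhost:8983/solr"  # assumption: default local Solr address


def build_feed_request(core, docs):
    """Build (but do not send) a JSON update request for the given core."""
    url = "{}/{}/update?commit=true".format(SOLR_BASE, core)
    body = json.dumps(docs).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def feed_file(core, path):
    """Load a JSON list of documents from path and post it to Solr."""
    with open(path, encoding="utf-8") as f:
        docs = json.load(f)
    with urllib.request.urlopen(build_feed_request(core, docs)) as resp:
        return resp.status
```

With Solr running, `feed_file("movie", "data/movie.json")` would mirror the arguments of the `fab feed_data` call above: core name first, data file second.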


[Tutorial] Solr Admin


[Tutorial] Djangorestframework UI


[Tutorial] BE API Query, json output
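
A nearby-items query of this kind can be expressed with Solr's spatial filter. The Python 3 sketch below only assembles the request parameters; it assumes the `location` field is indexed as a Solr spatial type holding `lat,lon`, that Solr listens on its default local address, and that the radius is given in kilometres. The helpers `geo_query_params` and `geo_query_url` are hypothetical and do not come from the repository.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumption: default local Solr address


def geo_query_params(lat, lon, radius_km, rows=10):
    """Return Solr select parameters for a distance-filtered search."""
    return {
        "q": "*:*",                      # match everything, then filter by distance
        "fq": "{!geofilt}",              # keep documents within d km of pt
        "sfield": "location",            # spatial field holding "lat,lon"
        "pt": "{},{}".format(lat, lon),  # centre point of the search
        "d": radius_km,
        "sort": "geodist() asc",         # nearest documents first
        "rows": rows,
        "wt": "json",
    }


def geo_query_url(core, lat, lon, radius_km):
    """Return the full select URL for the given core and search centre."""
    query = urlencode(geo_query_params(lat, lon, radius_km))
    return "{}/{}/select?{}".format(SOLR_BASE, core, query)
```

For example, `geo_query_url("movie", 23.853, 120.49, 5)` yields a select URL for items within 5 km of the sample record's location.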


[Tutorial] BE API Query, xml output


The next steps

•  Understand the mechanism of:
    –  Solr:
        •  Update schema
        •  Update partial data
        •  Query language of Solr
    –  Djangorestframework:
        •  Add more parameters
        •  Input validation
        •  Output rendering

•  Deploy to production environment