© 2013 LucidWorks

Edanz journal selector case study a prototype based on solr nutch hadoop

DESCRIPTION

Presented by Liang Shen, Developer, European Bioinformatics Institute. I'm going to introduce a project I built in 2011: Edanz Journal Selector. It's a tool that helps scholars find the right journals in which to publish their manuscripts. This will be a typical “How We Did It” development case study. We built Edanz Journal Selector on Solr/Lucene/Hadoop/Hive and deployed it on Amazon Web Services. I'm going to share experiences about architecture, the cloud, and more from this project.

Page 1: Edanz journal selector case study a prototype based on solr nutch hadoop

Page 2: Edanz journal selector case study a prototype based on solr nutch hadoop

Edanz Journal Selector Case Study: a Prototype based on Solr/Nutch/Hadoop

Liang SHEN @shenzhuxi

European Bioinformatics Institute

Page 3: Edanz journal selector case study a prototype based on solr nutch hadoop

Edanz Journal Selector

a Prototype based on Solr/Nutch/Hadoop

Page 4: Edanz journal selector case study a prototype based on solr nutch hadoop

English editing for scientists

Page 5: Edanz journal selector case study a prototype based on solr nutch hadoop

Help scientists publish papers

Page 6: Edanz journal selector case study a prototype based on solr nutch hadoop

Target journal?

Page 7: Edanz journal selector case study a prototype based on solr nutch hadoop

Journal Selector

Page 8: Edanz journal selector case study a prototype based on solr nutch hadoop

Open Access

PubMed

Page 9: Edanz journal selector case study a prototype based on solr nutch hadoop

Journal TOCs

created in 2009

21,498 journals from 1,677 publishers

Institute for Computer Based Learning, Heriot-Watt University

Page 10: Edanz journal selector case study a prototype based on solr nutch hadoop

Partner

• Springer Metadata API

Provides metadata for over 5 million online documents

• Springer Open Access API

Provides metadata, full-text content, and images for over 80,000 open access articles

Page 11: Edanz journal selector case study a prototype based on solr nutch hadoop

Open Source Stack

• Infrastructure: Amazon Web Services

• Data processing: Hadoop/Hive

• Index: Solr/Lucene

• Web service: Drupal

• Secret Sauce/Custom Works

Page 12: Edanz journal selector case study a prototype based on solr nutch hadoop

Infrastructure: Amazon EC2

Page 13: Edanz journal selector case study a prototype based on solr nutch hadoop

Data processing

HDFS

Index

API

Feeds

Web Pages

Page 14: Edanz journal selector case study a prototype based on solr nutch hadoop

<script src="http://global.js.widget.eja.hk/ja/edanz_ja/w.js"></script>

Web service

Page 15: Edanz journal selector case study a prototype based on solr nutch hadoop

Embeddable web widget

Page 16: Edanz journal selector case study a prototype based on solr nutch hadoop

Split Index for performance

The index can be divided without losing ranking if every query always includes a facet field, because the index can then be split along that field.
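
The idea can be illustrated with a toy in-memory sketch. This is not the project's Solr setup: the `subject` facet field, the sample documents, and the term-counting `score` function are all assumptions made for illustration. When every query is filtered on the facet, searching only the matching partition returns the same ranked list as searching the whole collection.

```python
from collections import defaultdict

def score(query_terms, doc):
    """Toy relevance score: number of query terms present in the document text."""
    words = set(doc["text"].lower().split())
    return sum(1 for term in query_terms if term in words)

def search(docs, query, facet_value, top_k=3):
    """Rank docs that match the facet filter; ties are broken by id."""
    terms = query.lower().split()
    hits = [(score(terms, d), d["id"]) for d in docs if d["subject"] == facet_value]
    hits = [(s, doc_id) for s, doc_id in hits if s > 0]
    hits.sort(key=lambda h: (-h[0], h[1]))
    return [doc_id for _, doc_id in hits[:top_k]]

def split_by_facet(docs):
    """Partition the collection into one sub-index per value of the facet field."""
    parts = defaultdict(list)
    for d in docs:
        parts[d["subject"]].append(d)
    return parts

# Hypothetical sample data; "subject" plays the role of the facet field.
DOCS = [
    {"id": 1, "subject": "biology", "text": "gene expression in yeast"},
    {"id": 2, "subject": "biology", "text": "protein folding and gene regulation"},
    {"id": 3, "subject": "physics", "text": "gene therapy imaging with lasers"},
    {"id": 4, "subject": "physics", "text": "laser cooling of atoms"},
]

parts = split_by_facet(DOCS)
whole = search(DOCS, "gene expression", "biology")              # full index
shard = search(parts["biology"], "gene expression", "biology")  # one partition
```

Here `whole` and `shard` hold the same ranked ids. One caveat: the toy score depends only on the document itself, whereas Lucene's default scoring also uses index-wide statistics such as document frequency, so scores in a real split index may differ from those of the single index.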

Page 17: Edanz journal selector case study a prototype based on solr nutch hadoop

@shenzhuxi

Thanks!

Questions?