Combining Systems and Databases: A Search Engine Retrospective By: Rooma Rathore Rohini Prinja Author: Eric A. Brewer

Combining Systems and Databases: A Search Engine Retrospective

By: Rooma Rathore, Rohini Prinja

Author: Eric A. Brewer

Problem Statement:

How Search Engines (SEs) should have been designed.

How to leverage database principles in designing data-intensive (DI) applications without necessarily using the same semantics.

Importance of the paper in the current context:

Search engines have become an important part of life for billions of people. It is intriguing how SEs manage an enormous amount of data, on the order of 3 billion documents and growing every second.

Behind-the-scenes challenges of designing SEs in terms of:

- Ranking documents
- Ranking query results
- Availability
- Freshness of data

Discusses data-intensive applications in the wake of SEs.

Finally, the paper prompts thought about how scalable these models can be, since in the case of SEs the data on the internet grows every second.

Contributions:

It gives numbers for various search engine parameters, such as:
- Number of documents
- Data stored
- Number of queries, etc.

Discusses the challenges of designing an SE.

What principles of DBMS can be (should be) applied when designing DI applications like search engines:
- Top-down design
- Data independence
- Declarative query language

Contributions (contd.):

Why did SEs not use a DBMS in the first place?

Why SEs cannot be implemented as a DBMS in the true sense:
- Speed: DBMSs are slow
- Cost: DBMSs are not cost-effective given the magnitude of the data
- High availability vs. consistency: DBMSs prefer consistency, in contrast to SEs
- Update: the model of updating data in SEs is entirely different from that of databases
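
The availability-versus-consistency point can be made concrete with a small sketch. This is an illustration only, not the design from the paper: it assumes a hypothetical index split into partitions, each modelled as a Python callable, and hypothetical helpers `make_partition` and `scatter_gather`. A partition that fails is skipped, so the query still returns whatever the healthy partitions hold instead of failing as a whole.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical model: a partition maps a query term to (doc_id, score)
# pairs and raises ConnectionError when its node is down.
Partition = Callable[[str], List[Tuple[str, float]]]


def make_partition(postings: Dict[str, List[Tuple[str, float]]],
                   up: bool = True) -> Partition:
    def lookup(term: str) -> List[Tuple[str, float]]:
        if not up:
            raise ConnectionError("partition unavailable")
        return postings.get(term, [])
    return lookup


def scatter_gather(term: str,
                   partitions: List[Partition]) -> Tuple[List[Tuple[str, float]], float]:
    """Query every partition and skip failed ones instead of failing the query.

    Returns the merged hits (best score first) and the fraction of
    partitions that answered.
    """
    hits: List[Tuple[str, float]] = []
    answered = 0
    for partition in partitions:
        try:
            hits.extend(partition(term))
            answered += 1
        except ConnectionError:
            continue  # favour availability: serve a partial answer
    hits.sort(key=lambda hit: hit[1], reverse=True)
    coverage = answered / len(partitions) if partitions else 0.0
    return hits, coverage


partitions = [
    make_partition({"search": [("doc1", 0.9)]}),
    make_partition({"search": [("doc2", 0.7)]}, up=False),  # simulated failure
    make_partition({"search": [("doc3", 0.4)]}),
]
results, coverage = scatter_gather("search", partitions)
```

A transactional system would typically abort or block in this situation. Here the caller receives a partial result together with the fraction of partitions that answered, and can decide how to present it.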

Contributions (contd.):

New Design
- Uses static databases and a large degree of offline work to build and rebuild them.
- Overview of SE design:
  - Crawl, Index, Serve
  - Query (read-only)
    - Scoring of documents and words
    - Making a query plan
    - Query implementation
      - Access methods and physical operators
      - Optimizing queries to maximize the throughput of the system
      - Providing redundancy using clustering
      - Compression and other optimizations
  - Updating data
  - Fault tolerance
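
The crawl, index, serve pipeline with a static, read-only index can be sketched in a few lines. Everything here is an assumption made for illustration: the tiny `CRAWLED` corpus, the helper names `build_index` and `serve`, and the scoring rule (term frequency divided by document length), which stands in for whatever scoring a real engine uses. The index is built offline, frozen, and only read at query time; an update means building a new index and swapping the reference.

```python
from collections import Counter, defaultdict
from types import MappingProxyType
from typing import Dict, List, Mapping, Tuple

# Hypothetical crawl output: doc id -> page text.
CRAWLED: Dict[str, str] = {
    "a": "search engines crawl the web",
    "b": "databases index data",
    "c": "search engines index the web and serve queries",
}

Index = Mapping[str, Mapping[str, float]]


def build_index(docs: Dict[str, str]) -> Index:
    """Offline step: build term -> {doc_id: score}, then freeze it."""
    index: Dict[str, Dict[str, float]] = defaultdict(dict)
    for doc_id, text in docs.items():
        words = text.lower().split()
        for term, count in Counter(words).items():
            # Assumed scoring: length-normalised term frequency.
            index[term][doc_id] = count / len(words)
    # Read-only views: the serving path cannot modify the index.
    return MappingProxyType({t: MappingProxyType(p) for t, p in index.items()})


def serve(index: Index, query: str) -> List[Tuple[str, float]]:
    """Online step: AND the query terms, rank by summed score."""
    terms = query.lower().split()
    if not terms:
        return []
    postings = [index.get(term, {}) for term in terms]
    matching = set(postings[0]).intersection(*postings[1:])
    ranked = [(doc_id, sum(p[doc_id] for p in postings)) for doc_id in matching]
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


index = build_index(CRAWLED)            # offline build
top = serve(index, "search engines")    # read-only query

# An update rebuilds the whole index offline and swaps the reference.
index = build_index({**CRAWLED, "d": "fresh page about search"})
```

Because the serving path never writes, no locking or transaction machinery is needed at query time. The cost is freshness, since a new page becomes visible only after the next rebuild.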

Contributions (contd.):

SE challenges that differ from a traditional DBMS:
- Personalization
- Logging
- Query rewriting
- Phrase queries
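
Of these challenges, phrase queries are the easiest to illustrate in code. The sketch below shows one common technique, a positional index, which is an assumption on our part and not necessarily what the paper describes. The helper names `build_positional_index` and `phrase_query` and the sample documents are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

PositionalIndex = Dict[str, Dict[str, List[int]]]


def build_positional_index(docs: Dict[str, str]) -> PositionalIndex:
    """Map each word to {doc_id: [positions where the word occurs]}."""
    index: PositionalIndex = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for position, word in enumerate(text.lower().split()):
            index[word][doc_id].append(position)
    return index


def phrase_query(index: PositionalIndex, phrase: str) -> List[str]:
    """Return ids of documents containing the words of `phrase` consecutively."""
    words = phrase.lower().split()
    if not words:
        return []
    matches = []
    for doc_id, starts in index.get(words[0], {}).items():
        for start in starts:
            # Each following word must sit exactly one position further on.
            if all(
                start + offset in index.get(word, {}).get(doc_id, [])
                for offset, word in enumerate(words[1:], start=1)
            ):
                matches.append(doc_id)
                break
    return sorted(matches)


docs = {
    "d1": "new york times",
    "d2": "york new times",
    "d3": "the new york subway",
}
pos_index = build_positional_index(docs)
hits = phrase_query(pos_index, "new york")
```

A plain term index would match all three documents for the words "new" and "york". The stored positions are what let the query reject `d2`, where the words appear in the wrong order.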

Validations:

The author draws on his experience developing the Inktomi search engine to arrive at an improved search engine design.

The author also studied the workings of various contemporary search engines such as Google, AltaVista, and Infoseek.

Assumptions:

The author makes the following assumptions in this paper:

DI applications are essentially like SEs and hence should be no different when it comes to applying database principles.

When scoring documents, the author assumes that the shorter a document is, the higher the score it should be assigned.

Updates to the systems can always happen offline.

Documents from one site are assumed to be evenly distributed across the cluster nodes for load balancing.
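
The last assumption, even distribution of one site's documents across nodes, can be sketched as follows. The approach shown, hashing the full URL instead of the site name, is our assumption about how such a spread could be achieved; the slides do not say how it is done. The function `node_for`, the node count, and the example URLs are hypothetical.

```python
import hashlib
from collections import Counter


def node_for(url: str, num_nodes: int) -> int:
    """Pick a node from a stable hash of the full URL.

    Hashing the whole URL (not just the host) spreads pages of a single
    site over all nodes, so one popular site does not overload one node.
    """
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_nodes


# 1000 hypothetical pages from a single site, spread over 4 nodes.
urls = [f"http://example.com/page/{i}" for i in range(1000)]
load = Counter(node_for(url, 4) for url in urls)
```

Inspecting `load` shows how many of the site's pages land on each node. With a well-mixed hash the counts should be close to one another, though the exact numbers depend on the hash function.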

Conclusion of the paper:

Data-intensive systems should employ the principles of databases.

Many systems are a good fit for DBMS principles (though they may not use the same artifacts):
- Logging systems
- Google File System
- Batch-Aware Distributed File System

Additional information that could be rewritten or added if the paper were written today:

- More emphasis and detail on logging: companies like Google earn their revenue from advertising (on the order of billions of dollars).
- How the following factors should affect the design of an SE:
  - Probability of click attacks
  - Privacy/copyright concerns while crawling the web
  - Generic search vs. search against a particular domain, like law or image search
- A comparison of the proposed design with one current popular search engine.

Thanks!!