Open Source Technologies

Page 1: Open source Technology

Open Source Technologies

Page 2: Open source Technology
Page 3: Open source Technology

What is Open Source?

Page 4: Open source Technology

Simple: you can read the code.

You can see how it's made.

Page 5: Open source Technology

Two main characteristics 

First, it's FREE

Page 6: Open source Technology

Second (much more important and interesting), it's free as in freedom.

Page 7: Open source Technology

Four Freedoms

* The freedom to run the program for any purpose

* The freedom to study how the program works, and adapt it to your needs

* The freedom to redistribute copies

* The freedom to improve the program

Page 8: Open source Technology

Why is this cool?

Page 9: Open source Technology

Anyone can do whatever they like with it.

Nobody owns it, everyone can use it, anyone can improve it.

Page 10: Open source Technology

Improved in terms of quantity of code (functionality)

People add layers on top of other people's code

Page 11: Open source Technology

As the code base grows, the potential grows

This improves the chances of it being used for something the originator never intended

Page 12: Open source Technology

What does it take to be a Web Developer?

Page 13: Open source Technology

HTML & PHP

Page 14: Open source Technology

Let's take a brief look at what a “Web Developer” is

Page 15: Open source Technology
Page 16: Open source Technology
Page 17: Open source Technology
Page 18: Open source Technology
Page 19: Open source Technology

And that was just the Ruby stack

Page 20: Open source Technology

Now back to the question

Page 21: Open source Technology

What does it take to be a Web Developer?

Page 22: Open source Technology

A Passion for Learning

Page 23: Open source Technology

LAMP

Page 24: Open source Technology

Linux

Page 25: Open source Technology

 * Very reliable OS

 * Extremely powerful

 * Performs well even with limited resources

 * Compelling graphics

 * Powerful programming support

 * Scalable

 * No piracy issues

Page 26: Open source Technology

Apache

Page 27: Open source Technology

A web server can refer to either the hardware (the computer) or the software (the computer application) that helps deliver Web content that can be accessed through the Internet.

The most common use of web servers is to host websites, but there are other uses such as gaming, data storage or running enterprise applications.

Apache

 * Runs on all major platforms (Unix-like systems, Windows, Mac OS X, FreeBSD and many others)

 * Largest market share among web servers since 1996

Page 28: Open source Technology

MySQL

Page 29: Open source Technology

 * Relational database

 * One of the world's fastest-growing open source database servers

 * Fast performance, high reliability and ease of use

 * Used on every continent, even Antarctica

 * Works on more than 20 platforms, including Linux, Windows, OS X, HP-UX, AIX and NetWare, to name a few

 * Supports various storage engines

Page 30: Open source Technology

PHP

Page 31: Open source Technology

 * Open source server-side scripting language designed specifically for the web

 * Most widely used server-side language on the web

 * Outputs not only HTML but also XML, images (JPG and PNG), PDF files and even Flash movies (using libswf and Ming), all generated on the fly. It can also write these files to the filesystem.

 * Supports a wide range of databases (20+, plus ODBC)

 * Perl- and C-like syntax; relatively easy to learn

Page 32: Open source Technology

LAMP Overview

Page 33: Open source Technology

Let's CODE :)

Page 34: Open source Technology

Memcache

Page 35: Open source Technology

What is Caching?

Page 36: Open source Technology

A copy of real data with faster (and/or cheaper) access.

From Wikipedia: "A cache is a collection of data duplicating original values stored elsewhere or computed earlier, where the original data is expensive to fetch (owing to longer access time) or to compute, compared to the cost of reading the cache."

Page 37: Open source Technology

MySQL query cache: cache in the DB

Disk: file cache

In memory: Memcached

Page 38: Open source Technology

What is Memcache?

Free and open source, high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load.

Memcached is an in-memory key-value store for small chunks of arbitrary data (strings, objects) from results of database calls, API calls, or page rendering.

Memcached is simple yet powerful. Its simple design promotes quick deployment and ease of development, and solves many problems facing large data caches. Its API is available for most popular languages.

Page 39: Open source Technology

Memcache Users

Facebook, Naukri, LiveJournal, Wikipedia, Flickr, Bebo, Twitter, Typepad, Yellowbot, YouTube, Digg, WordPress.com, Craigslist, Mixi

Page 40: Open source Technology

Pattern

- Fetch from cache

- If there, return

- Else calculate, place in cache, return

Page 41: Open source Technology

Program

function get_foo(foo_id)
    foo = memcached_get("foo:" . foo_id)
    return foo if defined foo

    foo = fetch_foo_from_database(foo_id)
    memcached_set("foo:" . foo_id, foo)
    return foo
end
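The fetch-or-compute pattern from this slide can be sketched in runnable form. This is only an illustration under stated assumptions: `FakeCache` is a hypothetical in-process stand-in for a memcached client (not a real client API), `fetch_foo_from_database` is a placeholder for the slow lookup, and the 60-second TTL is an arbitrary choice.

```python
import time


class FakeCache:
    """Hypothetical in-process stand-in for a memcached client."""

    def __init__(self):
        self._store = {}  # key -> (value, expiry timestamp)

    def get(self, key):
        value, expires_at = self._store.get(key, (None, None))
        if expires_at is not None and expires_at < time.monotonic():
            del self._store[key]  # entry expired: treat it as a miss
            return None
        return value

    def set(self, key, value, ttl=60):
        self._store[key] = (value, time.monotonic() + ttl)


DB_CALLS = []  # records every simulated database hit


def fetch_foo_from_database(foo_id):
    """Placeholder for the expensive lookup the cache is protecting."""
    DB_CALLS.append(foo_id)
    return {"id": foo_id, "name": f"foo-{foo_id}"}


def get_foo(cache, foo_id):
    key = f"foo:{foo_id}"
    foo = cache.get(key)                   # 1. fetch from cache
    if foo is not None:
        return foo                         # 2. if there, return
    foo = fetch_foo_from_database(foo_id)  # 3. else calculate ...
    cache.set(key, foo)                    #    ... place in cache ...
    return foo                             #    ... and return


cache = FakeCache()
first = get_foo(cache, 42)   # miss: goes to the "database"
second = get_foo(cache, 42)  # hit: served from the cache
```

After the two calls, `DB_CALLS` holds a single entry, which is the whole point of the pattern: the second read never touches the database.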

Page 42: Open source Technology
Page 43: Open source Technology

Let's add Memcache to the CODE

Page 44: Open source Technology
Page 45: Open source Technology

GEARMAN?

Page 46: Open source Technology

MANAGER

Page 47: Open source Technology
Page 48: Open source Technology

Gearmand

- Daemon that manages the work.

- Does not do any work itself.

- Accepts a job id and a binary payload from clients.

- Workers keep connections open at all times.

Page 49: Open source Technology
Page 50: Open source Technology

Client

- Clients connect to Gearmand and ask for work to be done.

- The client can fire and forget, or wait for a response.

- Multiple jobs can be done asynchronously by workers for one client.

Page 51: Open source Technology
Page 52: Open source Technology

Workers

- A single worker can do just one job or can do many jobs.

- Does not have to be written in the same language as the clients.

Page 53: Open source Technology

An Example Client

<?php
# Create our client object.
$client = new GearmanClient();

# Add default server (localhost).
$client->addServer();

echo "Sending job\n";

# Send reverse job and wait for the result.
# doNormal() replaces do(), which is deprecated since pecl/gearman 1.0.0.
$result = $client->doNormal("reverse", "Hello!");
if ($result) {
  echo "Success: $result\n";
}

Page 54: Open source Technology

An Example Worker

<?php
# Create our worker object.
$worker = new GearmanWorker();

# Add default server (localhost).
$worker->addServer();

# Register function "reverse" with the server.
$worker->addFunction("reverse", "reverse_fn");

while (1) {
  print "Waiting for job...\n";
  $ret = $worker->work();
  if ($worker->returnCode() != GEARMAN_SUCCESS)
    break;
}

# A much simpler reverse function
function reverse_fn($job) {
  $workload = $job->workload();
  echo "Received job: " . $job->handle() . "\n";
  echo "Workload: $workload\n";
  $result = strrev($workload);
  echo "Result: $result\n";
  return $result;
}
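To make the division of labour between client, job server and worker easier to see, here is a toy in-process model. It is a sketch only: `JobServer` and `Worker` are hypothetical classes, not the Gearman protocol or any Gearman library API, and everything runs in one process with no networking.

```python
from collections import deque


class JobServer:
    """Toy stand-in for gearmand: queues and routes jobs, never runs them."""

    def __init__(self):
        self.queues = {}    # function name -> deque of (handle, payload)
        self.results = {}   # handle -> result reported by a worker
        self._counter = 0

    def submit(self, function, payload):
        """Called by clients: enqueue a job and return its handle."""
        self._counter += 1
        handle = f"H:toy:{self._counter}"
        self.queues.setdefault(function, deque()).append((handle, payload))
        return handle

    def grab_job(self, function):
        """Called by workers: take the next job for a function, if any."""
        queue = self.queues.get(function)
        return queue.popleft() if queue else None

    def complete(self, handle, result):
        """Called by workers: report a finished job."""
        self.results[handle] = result


class Worker:
    """Registers functions and pulls matching jobs from the server."""

    def __init__(self, server):
        self.server = server
        self.functions = {}

    def add_function(self, name, fn):
        self.functions[name] = fn

    def work(self):
        """Process at most one job; return True if a job was handled."""
        for name, fn in self.functions.items():
            job = self.server.grab_job(name)
            if job is not None:
                handle, payload = job
                self.server.complete(handle, fn(payload))
                return True
        return False


server = JobServer()
worker = Worker(server)
worker.add_function("reverse", lambda payload: payload[::-1])

handle = server.submit("reverse", "Hello!")  # client side: fire and forget
while worker.work():                         # worker side: drain the queue
    pass
result = server.results[handle]
```

The server only stores and hands out jobs; the reversal itself happens inside the worker, mirroring the "does not do any work" rule above.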

Page 55: Open source Technology
Page 56: Open source Technology

NOSQL

Page 57: Open source Technology

Database paradigms

* Relational (RDBMS)

* NoSQL
    * Key-value stores
    * Document databases
    * Graph databases

* Others

Page 58: Open source Technology

Relational Databases

* ACID
    * Atomicity
    * Consistency
    * Isolation
    * Durability

* SQL

* Mature
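As a small illustration of the "A" in ACID, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `accounts` table, the names and the amounts are made up for the example; the point is only that a failed transfer leaves no partial update behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)",
                 [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move funds in one transaction; any failure rolls back both updates."""
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst))
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


ok = transfer(conn, "alice", "bob", 30)
failed = transfer(conn, "alice", "bob", 500)  # would overdraw alice
balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

The second transfer credits bob before the CHECK constraint rejects the debit, yet the final balances show that the credit was undone along with it.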

Page 59: Open source Technology
Page 60: Open source Technology
Page 61: Open source Technology

NoSQL

* No relational tables

* No fixed table schemas

* No joins

* No risk, no fun !

* Massive data stores

* Scaling is easy

* Simpler to implement 

Page 62: Open source Technology

Goodbye rows and tables, hello documents and collections

Page 63: Open source Technology

Lots of pretty pictures to fool you.

Page 64: Open source Technology

Noise

Page 65: Open source Technology

Introduction

MongoDB bridges the gap between key-value stores (which are fast and highly scalable) and traditional RDBMS systems (which provide rich queries and deep functionality).

MongoDB is document-oriented, schema-free, scalable, high-performance and open source. It is written in C++.

Mongo is not a relational database like MySQL

Goodbye rows and tables, hello documents and collections

Features

Document-oriented
* Documents (objects) map nicely to programming language data types
* Embedded documents and arrays reduce the need for joins
* No joins and no multi-document transactions, for high performance and easy scalability

High performance
* No joins, plus embedding, make reads and writes fast
* Indexes, including indexing of keys from embedded documents and arrays

High availability
* Replicated servers with automatic master failover

Easy scalability
* Automatic sharding (auto-partitioning of data across servers)
* Reads and writes are distributed over shards
* No joins or multi-document transactions make distributed queries easy and fast
* Eventually-consistent reads can be distributed over replicated servers
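The idea behind auto-partitioning can be sketched with a simple hash-based router. This is a deliberate simplification and not MongoDB's actual mechanism (which splits data into chunks by shard key and rebalances them); the shard names and the `shard_for` helper are hypothetical.

```python
import hashlib

SHARDS = ["shard-a", "shard-b", "shard-c"]  # hypothetical server names


def shard_for(key, shards):
    """Pick a shard deterministically from a hash of the key."""
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


# Route 1000 hypothetical user ids and record where each one lands.
placement = {}
for user_id in range(1000):
    placement.setdefault(shard_for(user_id, SHARDS), []).append(user_id)

total_routed = sum(len(ids) for ids in placement.values())
```

Because the same key always hashes to the same shard, a read for one document only needs to contact one server, which is why reads and writes spread across shards.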

Page 66: Open source Technology

Why?

Cost - MongoDB is free

MongoDB is easy to install.

MongoDB supports various programming languages such as C, C++, Java, JavaScript and PHP.

MongoDB is blazingly fast

MongoDB is schemaless

Ease of scale-out

If load increases it can be distributed to other nodes across computer networks.

It's trivially easy to add more fields -- even complex fields -- to your objects.

So as requirements change, you can adapt code quickly.

Background Indexing

MongoDB is a stand-alone server

Development time is faster, too, since there are no schemas to manage.

It supports server-side JavaScript execution, which allows a developer to use a single programming language for both client-side and server-side code.

Page 67: Open source Technology

Limitations

Mongo is limited to a total data size of 2GB for all databases in 32-bit mode.

No referential integrity

Data size in MongoDB is typically higher, since every document stores its own field names.

At the moment Map/Reduce (e.g. for aggregations and data analysis) is OK, but not blisteringly fast.

Group By: limited to fewer than 10,000 keys. For larger grouping operations without limits, use map/reduce.

Lack of predefined schema is a double-edged sword

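No support for joins or multi-document transactions (true of early MongoDB releases; `$lookup` arrived in 3.2 and multi-document transactions in 4.0)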

Page 68: Open source Technology

Mongo data model

MySQL Term     Mongo Term

database       database
table          collection
index          index
row            BSON document
column         BSON field
primary key    _id field

* A Mongo system (see deployment above) holds a set of databases
* A database holds a set of collections
* A collection holds a set of documents
* A document is a set of fields
* A field is a key-value pair
* A key is a name (string)
* A value is a basic type (string, integer, float, timestamp, binary, etc.), a document, or an array of values
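The containment hierarchy above can be mirrored with plain Python dictionaries and lists. This is only a model of the data shape, not a MongoDB driver: the `blog` database, the `posts` collection, the sample documents and the `find` helper are all invented for illustration.

```python
# system -> databases -> collections -> documents -> fields
system = {
    "blog": {                      # database
        "posts": [                 # collection
            {                      # document
                "_id": 1,
                "title": "Hello documents",
                "tags": ["mongodb", "nosql"],           # array value
                "author": {"name": "Ann", "karma": 7},  # embedded document
            },
            # Different fields in the same collection: no fixed schema.
            {"_id": 2, "title": "Schemaless", "views": 10},
        ]
    }
}


def find(collection, **criteria):
    """Return documents whose top-level fields equal every criterion."""
    return [doc for doc in collection
            if all(doc.get(field) == value
                   for field, value in criteria.items())]


posts = system["blog"]["posts"]
by_id = find(posts, _id=2)
tagged = [doc["title"] for doc in posts if "nosql" in doc.get("tags", [])]
author_name = posts[0]["author"]["name"]  # no join needed: data is embedded
```

The embedded `author` document shows why joins are needed less often: related data travels inside the document that uses it.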

Page 69: Open source Technology

SQL to Mongo Mapping Chart

Page 70: Open source Technology

Continued ...

SQL Statement Mongo Statement

Page 71: Open source Technology

Debugging & Profiling

Page 72: Open source Technology

Page 73: Open source Technology

Page 74: Open source Technology

Why & How?

* Bugs are bad

* Locate issues during runtime

* Speed up issue resolution

* Breakpoints

* Xdebug

Page 75: Open source Technology

Xdebug

Xdebug is a PHP extension that aims to lend a helping hand in the process of debugging your applications. Xdebug offers features like:

    * Automatic stack trace upon error
    * Function call logging
    * Display features such as enhanced var_dump() output and code coverage information

    - Open source
    - Free

Page 76: Open source Technology

Enabling Xdebug in php.ini

; Load the Xdebug extension
zend_extension="/usr/lib/php5/20090626+lfs/xdebug.so"

; Remote debugging: connect to a debug client on 127.0.0.1:9000
xdebug.remote_enable=1
xdebug.remote_host="127.0.0.1"
xdebug.remote_port=9000

; Profiling, local variable display and trace output
xdebug.profiler_enable=1
xdebug.show_local_vars=On
xdebug.trace_output_dir="/tmp/xprofile/"
xdebug.trace_output_name=%t.trace
xdebug.profiler_output_name=%s.%t.profile
xdebug.profiler_output_dir="/tmp/xprofile/"

Page 77: Open source Technology

Page 78: Open source Technology

Page 79: Open source Technology

Lucene

Page 80: Open source Technology

Apache Lucene is a free/open source information retrieval software library, originally created in Java by Doug Cutting.

Page 81: Open source Technology

Scalable, High-Performance Indexing

   * small RAM requirements
   * incremental indexing as fast as batch indexing
   * index size roughly 20-30% the size of text indexed

Powerful, Accurate and Efficient Search Algorithms

   * ranked searching -- best results returned first
   * many powerful query types: phrase queries, wildcard queries, proximity queries, range queries and more
   * fielded searching (e.g., title, author, contents)
   * date-range searching
   * sorting by any field
   * multiple-index searching with merged results
   * allows simultaneous update and searching

Cross-Platform Solution

   * Available as Open Source software under the Apache License, which lets you use Lucene in both commercial and Open Source programs
   * 100%-pure Java
   * Implementations in other programming languages available that are index-compatible

Page 82: Open source Technology

Page 83: Open source Technology

Page 84: Open source Technology

Page 85: Open source Technology

Pitfalls

* Update = Delete + Add

* No Partial document update

* No Joins

Page 86: Open source Technology

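Code: FS Indexer

// Targets the older Lucene API used on the slide; later releases changed these constructors.
import java.io.File;
import java.io.FileFilter;
import java.io.FileReader;
import java.io.IOException;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class Indexer {
  private IndexWriter writer;

  // Open an index stored in indexDir, overwriting any existing one.
  public Indexer(String indexDir) throws IOException {
    Directory dir = FSDirectory.open(new File(indexDir));
    writer = new IndexWriter(dir,
        new StandardAnalyzer(Version.LUCENE_CURRENT),
        true,                                   // true = create a new index
        IndexWriter.MaxFieldLength.UNLIMITED);
  }

  public void close() throws IOException {
    writer.close();
  }

  // Index every regular file in dataDir that the filter accepts.
  public void index(String dataDir, FileFilter filter) throws Exception {
    File[] files = new File(dataDir).listFiles();
    for (File f : files) {
      if (f.isDirectory() || (filter != null && !filter.accept(f))) continue;
      Document doc = new Document();
      doc.add(new Field("contents", new FileReader(f)));
      doc.add(new Field("filename", f.getName(), Field.Store.YES, Field.Index.NOT_ANALYZED));
      writer.addDocument(doc);
    }
  }
}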

Page 87: Open source Technology

Code: Searcher

public void search(String indexDir, String q) throws IOException, ParseException {
  Directory dir = FSDirectory.open(new File(indexDir));
  IndexSearcher is = new IndexSearcher(dir, true);   // true = open read-only

  QueryParser parser = new QueryParser("contents", new StandardAnalyzer(Version.LUCENE_CURRENT));
  Query query = parser.parse(q);
  TopDocs hits = is.search(query, 10);               // keep the top 10 hits
  System.err.println("Found " + hits.totalHits + " document(s)");

  for (int i = 0; i < hits.scoreDocs.length; i++) {
    ScoreDoc scoreDoc = hits.scoreDocs[i];
    Document doc = is.doc(scoreDoc.doc);
    System.out.println(doc.get("filename"));
  }

  is.close();
}
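For readers who want to see what indexing and ranked searching boil down to, here is a toy inverted index. It is a sketch under heavy assumptions and not how Lucene is implemented: `TinyIndex` is a hypothetical class, scoring is plain term frequency, and everything lives in memory. It also shows the "Update = Delete + Add" pitfall directly.

```python
import re
from collections import defaultdict


class TinyIndex:
    """Toy inverted index: term -> {doc_id: term frequency}."""

    def __init__(self):
        self.postings = defaultdict(dict)
        self.docs = {}  # doc_id -> original text

    @staticmethod
    def analyze(text):
        """Lowercase the text and split it into alphanumeric terms."""
        return re.findall(r"[a-z0-9]+", text.lower())

    def add(self, doc_id, text):
        self.docs[doc_id] = text
        for term in self.analyze(text):
            freq = self.postings[term].get(doc_id, 0)
            self.postings[term][doc_id] = freq + 1

    def delete(self, doc_id):
        self.docs.pop(doc_id, None)
        for doc_freqs in self.postings.values():
            doc_freqs.pop(doc_id, None)

    def update(self, doc_id, text):
        """No partial update: the old document is removed, then re-added."""
        self.delete(doc_id)
        self.add(doc_id, text)

    def search(self, query, limit=10):
        """Rank documents by summed term frequency, best first."""
        scores = defaultdict(int)
        for term in self.analyze(query):
            for doc_id, freq in self.postings.get(term, {}).items():
                scores[doc_id] += freq
        ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
        return [doc_id for doc_id, _ in ranked[:limit]]


index = TinyIndex()
index.add("a.txt", "open source search library")
index.add("b.txt", "search search everywhere")
hits_before = index.search("search")             # b.txt ranks first
index.update("b.txt", "nothing relevant here")   # delete + add
hits_after = index.search("search")
```

Before the update `b.txt` outranks `a.txt` because it mentions the term twice; after the update it drops out of the results entirely.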

Page 88: Open source Technology