amardeep-vishwakarma
Open Source Technologies
What is Open Source?
Simple: You can read the code.
You can see how it's made
Two main characteristics
First, it's FREE.
Second (much more important and interesting), it's free as in freedom.
Four Freedoms
* The freedom to run the program for any Purpose
* The freedom to study how the program works, and adapt it to your needs
* The freedom to redistribute copies
* The freedom to improve the program
Why is this cool?
Anyone can do whatever they like with it. Nobody owns it, everyone can use it, anyone can improve it.
Improved in terms of quantity of code (functionality): people add layers on top of other people's code.
As the code base grows, the potential grows: it improves the chances of the code being used for something not intended by the originator.
What does it take to be a Web Developer?
HTML & PHP
Let's take a brief look at what a “Web Developer” is
And that was just the Ruby stack
Now back to the question
What does it take to be a Web Developer?
A Passion for Learning
LAMP
Linux
* Very reliable OS
* Extremely powerful
* Performs well even with limited resources
* Compelling graphics
* Powerful programming support
* Scalable
* No piracy issues
Apache
A web server can refer to either the hardware (the computer) or the software (the computer application) that helps to deliver Web content that can be accessed through the Internet.
The most common use of web servers is to host websites, but there are other uses such as gaming, data storage or running enterprise applications.
* Runs on all major platforms (Unix-like systems, Windows, Mac, FreeBSD and more)
* Largest market share among web servers since 1996
MySQL
* Relational database
* One of the world's fastest-growing open source database servers
* Fast performance, high reliability and ease of use
* Used on every continent. Yes, even Antarctica
* Works on more than 20 platforms, including Linux, Windows, OS X, HP-UX, AIX and NetWare, to name a few
* Supports various storage engines
PHP
* Open source server-side scripting language designed specifically for the web
* One of the most widely used languages on the web
* Outputs not only HTML but also XML, images (JPG & PNG), PDF files and even Flash movies (using libswf and Ming), all generated on the fly. Can also write these files to the filesystem.
* Supports a wide range of databases (20+, plus ODBC)
* Perl- and C-like syntax. Relatively easy to learn.
LAMP Overview
Let's CODE :)
Memcache
What is Caching?
A copy of real data with faster (and/or cheaper) access.
From Wikipedia: "A cache is a collection of data duplicating original values stored elsewhere or computed earlier, where the original data is expensive to fetch (owing to longer access time) or to compute, compared to the cost of reading the cache."
MySQL query cache: cache in the DB
Disk: file cache
In memory: Memcached
What is Memcache?
A free and open source, high-performance, distributed memory object caching system, generic in nature, but intended for use in speeding up dynamic web applications by alleviating database load.
Memcached is an in-memory key-value store for small chunks of arbitrary data (strings, objects) from results of database calls, API calls, or page rendering.
Memcached is simple yet powerful. Its simple design promotes quick deployment and ease of development, and solves many problems facing large data caches. Its API is available for most popular languages.
Memcache Users
Facebook, Naukri, LiveJournal, Wikipedia, Flickr, Bebo, Twitter, Typepad, Yellowbot, YouTube, Digg, WordPress.com, Craigslist, Mixi
Pattern
Fetch from cache
If there, return
Else calculate, place in cache, return
Program
function get_foo(foo_id)
    # Try the cache first.
    foo = memcached_get("foo:" . foo_id)
    return foo if defined foo

    # Cache miss: load from the database, then store it for next time.
    foo = fetch_foo_from_database(foo_id)
    memcached_set("foo:" . foo_id, foo)
    return foo
end
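The fetch, check, fill pattern can be tried without a memcached server. The following is a minimal Java sketch, assuming a plain HashMap as a stand-in for memcached and a lambda as a stand-in for the database; the class name CacheAsideDemo, the sourceCalls counter and the "row-" values are hypothetical and only serve the illustration.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Cache-aside lookup; a HashMap plays the role of memcached. */
class CacheAsideDemo {
    private final Map<String, String> cache = new HashMap<>();  // stand-in for memcached
    private final Function<Integer, String> slowSource;          // stand-in for the database
    int sourceCalls = 0;                                          // counts cache misses

    CacheAsideDemo(Function<Integer, String> slowSource) {
        this.slowSource = slowSource;
    }

    String getFoo(int fooId) {
        String key = "foo:" + fooId;
        String foo = cache.get(key);      // 1. fetch from cache
        if (foo != null) {
            return foo;                   // 2. if there, return
        }
        sourceCalls++;
        foo = slowSource.apply(fooId);    // 3. else calculate,
        cache.put(key, foo);              //    place in cache,
        return foo;                       //    and return
    }

    public static void main(String[] args) {
        CacheAsideDemo demo = new CacheAsideDemo(id -> "row-" + id);
        System.out.println(demo.getFoo(7));                       // miss: goes to the source
        System.out.println(demo.getFoo(7));                       // hit: served from the cache
        System.out.println("source calls: " + demo.sourceCalls);  // prints "source calls: 1"
    }
}
```

A real deployment would also set an expiry time on each entry and invalidate the key when the underlying row changes; both are left out here to keep the pattern visible.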
Let's add Memcache to the CODE
Gearman?
GEARMAN is an anagram of MANAGER.
Gearmand
Daemon that manages the work.
Does not do any work itself.
Accepts a job ID and a binary payload from clients.
Workers keep connections open at all times.
Client
Clients connect to Gearmand and ask for work to be done.
The client can fire and forget, or wait for a response.
Multiple jobs can be done asynchronously by workers for one client.
Workers
A single worker can do just one job or many jobs.
Workers do not have to be written in the same language as the clients.
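An Example Client
<?php
# Create our client object.
$client = new GearmanClient();

# Add default server (localhost).
$client->addServer();

echo "Sending job\n";

# Send reverse job and wait for the result.
# doNormal() replaces do(), which newer versions of the gearman extension deprecate.
$result = $client->doNormal("reverse", "Hello!");
if ($result) {
    echo "Success: $result\n";
}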
An Example Worker
<?php
# Create our worker object.
$worker = new GearmanWorker();

# Add default server (localhost).
$worker->addServer();

# Register function "reverse" with the server.
$worker->addFunction("reverse", "reverse_fn");

while (1) {
    print "Waiting for job...\n";
    $ret = $worker->work();
    if ($worker->returnCode() != GEARMAN_SUCCESS) {
        break;
    }
}

# A much simpler reverse function.
function reverse_fn($job)
{
    $workload = $job->workload();
    echo "Received job: " . $job->handle() . "\n";
    echo "Workload: $workload\n";
    $result = strrev($workload);
    echo "Result: $result\n";
    return $result;
}
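To make the division of roles concrete, here is a small in-process Java sketch of the same idea: a job server that only queues work, a worker that registers a function and runs jobs, and a client that gets a handle back immediately. It is not the Gearman protocol or any Gearman API; the names MiniGearman, JobServer, Worker, submit, grab and workOnce are hypothetical, and unlike a real Gearman server this queue does not check which worker registered which function.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Function;

class MiniGearman {
    /** A queued job: function name, payload, and a slot for the eventual result. */
    static final class Job {
        final String function;
        final String payload;
        final CompletableFuture<String> result = new CompletableFuture<>();

        Job(String function, String payload) {
            this.function = function;
            this.payload = payload;
        }
    }

    /** The manager: it queues jobs and hands them out, but does no work itself. */
    static final class JobServer {
        private final Queue<Job> queue = new ConcurrentLinkedQueue<>();

        /** Client side: queue a job and return a handle right away. */
        CompletableFuture<String> submit(String function, String payload) {
            Job job = new Job(function, payload);
            queue.add(job);
            return job.result;
        }

        /** Worker side: take the next job, or null if the queue is empty. */
        Job grab() {
            return queue.poll();
        }
    }

    /** A worker: registers named functions and executes jobs for them. */
    static final class Worker {
        private final Map<String, Function<String, String>> functions = new HashMap<>();

        void addFunction(String name, Function<String, String> fn) {
            functions.put(name, fn);
        }

        /** Runs at most one job; returns false when there was nothing to do. */
        boolean workOnce(JobServer server) {
            Job job = server.grab();
            if (job == null) {
                return false;
            }
            Function<String, String> fn = functions.get(job.function);
            if (fn == null) {
                job.result.completeExceptionally(
                        new IllegalArgumentException("unknown function: " + job.function));
            } else {
                job.result.complete(fn.apply(job.payload));
            }
            return true;
        }
    }

    public static void main(String[] args) {
        JobServer server = new JobServer();
        Worker worker = new Worker();
        worker.addFunction("reverse", s -> new StringBuilder(s).reverse().toString());

        CompletableFuture<String> pending = server.submit("reverse", "Hello!");
        System.out.println("done before a worker ran: " + pending.isDone());  // false
        worker.workOnce(server);
        System.out.println("Success: " + pending.join());  // prints "Success: !olleH"
    }
}
```

A client that wants fire-and-forget behavior simply ignores the returned handle; one that needs the answer waits on it, which mirrors the two client modes described above.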
NOSQL
Database paradigms
* Relational (RDBMS)
* NoSQL
  * Key-value stores
  * Document databases
  * Graph databases
* Others
Relational Databases
* ACID
  * Atomicity
  * Consistency
  * Isolation
  * Durability
* SQL
* Mature
NoSQL
* No relational tables
* No fixed table schemas
* No joins
* No risk, no fun !
* Massive data stores
* Scaling is easy
* Simpler to implement
Goodbye rows and tables, hello documents and collections
Lots of pretty pictures to fool you.
Noise
Introduction
MongoDB bridges the gap between key-value stores (which are fast and highly scalable) and traditional RDBMS systems (which provide rich queries and deep functionality).
MongoDB is document-oriented, schema-free, scalable, high-performance, open source. Written in C++
Mongo is not a relational database like MySQL
Features
Document-oriented
* Documents (objects) map nicely to programming language data types
* Embedded documents and arrays reduce the need for joins
* No joins and no multi-document transactions, for high performance and easy scalability
High performance
* No joins, plus embedding, make reads and writes fast
* Indexes, including indexing of keys from embedded documents and arrays
High availability
* Replicated servers with automatic master failover
Easy scalability
* Automatic sharding (auto-partitioning of data across servers)
* Reads and writes are distributed over shards
* No joins or multi-document transactions make distributed queries easy and fast
* Eventually-consistent reads can be distributed over replicated servers
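The idea of partitioning data across servers can be illustrated with a toy router. The Java sketch below assumes simple hash partitioning over in-memory maps; this is a simplification chosen for brevity and is not a description of how MongoDB assigns documents to shards. The class name ShardRouterDemo and the "user:" keys are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Routes each key to one of several in-memory "shards" by hashing the key. */
class ShardRouterDemo {
    private final List<Map<String, String>> shards = new ArrayList<>();

    ShardRouterDemo(int shardCount) {
        for (int i = 0; i < shardCount; i++) {
            shards.add(new HashMap<>());
        }
    }

    /** The same key always maps to the same shard index in [0, shardCount). */
    int shardFor(String key) {
        return Math.floorMod(key.hashCode(), shards.size());
    }

    /** A write touches exactly one shard. */
    void put(String key, String doc) {
        shards.get(shardFor(key)).put(key, doc);
    }

    /** A read by key also touches exactly one shard. */
    String get(String key) {
        return shards.get(shardFor(key)).get(key);
    }

    int sizeOf(int shard) {
        return shards.get(shard).size();
    }

    public static void main(String[] args) {
        ShardRouterDemo router = new ShardRouterDemo(3);
        for (int i = 0; i < 100; i++) {
            router.put("user:" + i, "doc-" + i);
        }
        for (int s = 0; s < 3; s++) {
            System.out.println("shard " + s + " holds " + router.sizeOf(s) + " documents");
        }
        System.out.println(router.get("user:42"));  // prints "doc-42"
    }
}
```

Because every read and write by key goes to a single shard, load spreads across servers without any cross-server join, which is the property the feature list above relies on.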
Why?
Cost: MongoDB is free
MongoDB is easy to install.
MongoDB supports various programming languages, such as C, C++, Java, JavaScript and PHP.
MongoDB is blazingly fast
MongoDB is schemaless
Ease of scale-out
If load increases, it can be distributed to other nodes across computer networks.
It's trivially easy to add more fields, even complex fields, to your objects.
So as requirements change, you can adapt code quickly.
Background indexing
MongoDB is a stand-alone server
Development time is faster, too, since there are no schemas to manage.
It supports server-side JavaScript execution, which allows a developer to use a single programming language for both client-side and server-side code.
Limitations
Mongo is limited to a total data size of about 2 GB for all databases in 32-bit mode.
No referential integrity
Data size in MongoDB is typically higher.
At the moment, Map/Reduce (e.g. for aggregations and data analysis) is OK, but not blisteringly fast.
Group By: limited to fewer than 10,000 keys. For larger grouping operations without limits, use Map/Reduce.
The lack of a predefined schema is a double-edged sword
No support for joins or transactions
Mongo data model
MySQL Term      Mongo Term
database        database
table           collection
index           index
row             BSON document
column          BSON field
primary key     _id field
A Mongo system (see deployment above) holds a set of databases
A database holds a set of collections
A collection holds a set of documents
A document is a set of fields
A field is a key-value pair
A key is a name (string)
A value is a basic type (string, integer, float, timestamp, binary, etc.), a document, or an array of values
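The document model above maps directly onto nested maps and lists. Here is a minimal Java sketch using only the standard library, with no MongoDB driver involved; the class name DocumentModelDemo and the field names (title, author, karma, tags) are hypothetical sample data.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Models one document as a set of key-value fields, using plain Java collections. */
class DocumentModelDemo {
    static Map<String, Object> buildPost() {
        // An embedded document: itself a set of fields.
        Map<String, Object> author = new LinkedHashMap<>();
        author.put("name", "alice");   // key is a string, value is a basic type
        author.put("karma", 42);

        Map<String, Object> post = new LinkedHashMap<>();
        post.put("_id", 1);                                    // plays the role of the primary key
        post.put("title", "Hello documents");
        post.put("author", author);                            // value is a document
        post.put("tags", Arrays.asList("nosql", "mongodb"));   // value is an array of values
        return post;
    }

    /** Reads a field of the embedded document; no join against a second table is needed. */
    @SuppressWarnings("unchecked")
    static String authorName(Map<String, Object> post) {
        Map<String, Object> author = (Map<String, Object>) post.get("author");
        return (String) author.get("name");
    }

    public static void main(String[] args) {
        Map<String, Object> post = buildPost();
        System.out.println(post);
        System.out.println(authorName(post));  // prints "alice"
    }
}
```

In a relational schema the author and the tags would typically live in separate tables joined by keys; here they travel inside the document itself.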
SQL to Mongo Mapping Chart
[Chart not reproduced; columns: SQL Statement, Mongo Statement]
Debugging & Profiling
Why & How?
* Bugs are bad
* Locate issues during runtime
* Speed up issue resolution
* Breakpoints
* Xdebug
Xdebug
Xdebug is a PHP extension that aims to lend a helping hand in the process of debugging your applications. Xdebug offers features like:
* Automatic stack trace upon error
* Function call logging
* Display features such as enhanced var_dump() output and code coverage information
Xdebug is open source and free.
Enabling Xdebug in php.ini
zend_extension="/usr/lib/php5/20090626+lfs/xdebug.so"
xdebug.remote_enable=1
xdebug.remote_host="127.0.0.1"
xdebug.remote_port=9000
xdebug.profiler_enable=1
xdebug.show_local_vars=On
xdebug.trace_output_dir="/tmp/xprofile/"
xdebug.trace_output_name=%t.trace
xdebug.profiler_output_name=%s.%t.profile
xdebug.profiler_output_dir="/tmp/xprofile/"
Lucene
Apache Lucene is a free/open source information retrieval software library, originally created in Java by Doug Cutting.
Scalable, High-Performance Indexing
* small RAM requirements
* incremental indexing as fast as batch indexing
* index size roughly 20-30% the size of text indexed
Powerful, Accurate and Efficient Search Algorithms
* ranked searching: best results returned first
* many powerful query types: phrase queries, wildcard queries, proximity queries, range queries and more
* fielded searching (e.g., title, author, contents)
* date-range searching
* sorting by any field
* multiple-index searching with merged results
* allows simultaneous update and searching
Cross-Platform Solution
* Available as Open Source software under the Apache License, which lets you use Lucene in both commercial and Open Source programs
* 100% pure Java
* Implementations in other programming languages available that are index-compatible
Pitfalls
* Update = Delete + Add
* No Partial document update
* No Joins
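The first two pitfalls are easier to see on a toy index. The Java sketch below is a deliberately tiny in-memory inverted index, not Lucene's implementation or API; the class name TinyInvertedIndex and its methods are hypothetical. It shows why changing one word of a document means removing the old document's postings and indexing the new text from scratch.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Toy inverted index: maps each term to the ids of the documents containing it. */
class TinyInvertedIndex {
    private final Map<String, Set<Integer>> postings = new HashMap<>();  // term -> doc ids
    private final Map<Integer, String> stored = new HashMap<>();         // doc id -> text

    void add(int docId, String text) {
        stored.put(docId, text);
        for (String term : tokenize(text)) {
            if (!term.isEmpty()) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    void delete(int docId) {
        String text = stored.remove(docId);
        if (text == null) {
            return;  // unknown document, nothing to remove
        }
        for (String term : tokenize(text)) {
            Set<Integer> ids = postings.get(term);
            if (ids != null) {
                ids.remove(docId);
                if (ids.isEmpty()) {
                    postings.remove(term);
                }
            }
        }
    }

    /** Update = delete + add: the whole document is re-indexed, never patched in place. */
    void update(int docId, String newText) {
        delete(docId);
        add(docId, newText);
    }

    Set<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), Collections.emptySet());
    }

    private static String[] tokenize(String text) {
        return text.toLowerCase().split("\\W+");
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        index.add(1, "Open source search");
        index.add(2, "Search with Lucene");
        System.out.println(index.search("search"));   // prints [1, 2]

        index.update(1, "Open source caching");
        System.out.println(index.search("search"));   // prints [2]
        System.out.println(index.search("caching"));  // prints [1]
    }
}
```

Because the postings are keyed by term rather than by document, there is no single place where "document 1" lives that could be edited in isolation, which is the intuition behind the lack of partial updates.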
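Code: FS Indexer
// Uses the Lucene 2.9/3.0-era API shown on the original slide.
import java.io.File;
import java.io.FileFilter;
import java.io.FileReader;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class Indexer {
    private IndexWriter writer;

    // Open the index directory and create a fresh index in it.
    public Indexer(String indexDir) throws IOException {
        Directory dir = FSDirectory.open(new File(indexDir));
        writer = new IndexWriter(dir,
                new StandardAnalyzer(Version.LUCENE_CURRENT),
                true,  // create a new index, overwriting any existing one
                IndexWriter.MaxFieldLength.UNLIMITED);
    }

    public void close() throws IOException {
        writer.close();
    }

    // Index every regular file in dataDir that the filter accepts (a null filter accepts all).
    public void index(String dataDir, FileFilter filter) throws Exception {
        File[] files = new File(dataDir).listFiles(filter);
        if (files == null) {
            return;  // dataDir is not a readable directory
        }
        for (File f : files) {
            if (f.isDirectory()) {
                continue;  // a FileReader cannot open a directory
            }
            Document doc = new Document();
            doc.add(new Field("contents", new FileReader(f)));  // tokenized, not stored
            doc.add(new Field("filename", f.getName(),
                    Field.Store.YES, Field.Index.NOT_ANALYZED));
            writer.addDocument(doc);
        }
    }
}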
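Code: Searcher
// Uses the Lucene 2.9/3.0-era API shown on the original slide.
import java.io.File;
import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.ParseException;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

public class Searcher {
    public void search(String indexDir, String q) throws IOException, ParseException {
        Directory dir = FSDirectory.open(new File(indexDir));
        IndexSearcher is = new IndexSearcher(dir, true);  // true = open read-only

        // In this API generation the Version argument comes first.
        QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, "contents",
                new StandardAnalyzer(Version.LUCENE_CURRENT));
        Query query = parser.parse(q);
        TopDocs hits = is.search(query, 10);  // fetch the 10 best-ranked hits
        System.err.println("Found " + hits.totalHits + " document(s)");

        // Print the stored filename of each hit.
        for (int i = 0; i < hits.scoreDocs.length; i++) {
            ScoreDoc scoreDoc = hits.scoreDocs[i];
            Document doc = is.doc(scoreDoc.doc);
            System.out.println(doc.get("filename"));
        }
        is.close();
    }
}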