Apache Systems Projects in the Real World
Srinath Perera Ph.D.
Senior Software Architect, WSO2 Inc.
Member, Apache Software Foundation
Visiting Faculty, University of Moratuwa
Research Scientist, Lanka Software Foundation

IESL Talk Series: Apache System Projects in the Real World


Page 1: IESL Talk Series: Apache System Projects in the Real World

Apache Systems Projects in the Real World

Srinath Perera Ph.D.
Senior Software Architect, WSO2 Inc.
Member, Apache Software Foundation
Visiting Faculty, University of Moratuwa
Research Scientist, Lanka Software Foundation

Page 2: IESL Talk Series: Apache System Projects in the Real World

Goals of this Talk

Introduce Apache and open source.
Describe a large-scale E-Science project built on Apache technology, and some open problems.
o Apache Airavata
Discuss “should your project move to Apache?”

Photo by John Trainor on Flickr, http://www.flickr.com/photos/trainor/2902023575/, licensed under CC

Page 3: IESL Talk Series: Apache System Projects in the Real World

Open Source

The basic definition is code accessible to everyone. Yes, you can write something and make it open source.
But community is one of the key aspects.
Often built by volunteers (or at least by people not paid by the project).
Does serious crowdsourcing.
Ideally, code contributions, governance, and the decision model are all open and decentralized.
Not all open source projects are equal (different licenses)
o GPL License – Linux etc.; if you distribute modified versions, you have to release your changes under the same license
o Apache License – commercial friendly

Copyright digitalART2 and licensed for reuse under CC License, http://www.flickr.com/photos/digitalart/2101765353/

Page 4: IESL Talk Series: Apache System Projects in the Real World

How Does an Open Source Project Work?

Open code repository (SVN, Git, etc.)
Two parts of the community
o Developer community
o User community
Communication through mailing lists / IRC channel
o Developer mailing list
o User mailing list
Bug tracking database to track errors (Jira, Bugzilla)
People submit improvements as patches through Jira etc.
Committers have write access to the repository.
Committers review and apply patches, and when you submit a lot of them, they will make you a committer.

Page 5: IESL Talk Series: Apache System Projects in the Real World

Success Stories

Apache Web Server
Linux
MySQL
Apache Tomcat
Apache Axis2
Apache Synapse/WSO2 ESB
Firefox
Eclipse
…

Copyright kafka4prez and licensed for reuse under CC License, http://www.flickr.com/photos/kafka4prez/198465913

Victory

Gartner predicted that by 2012 most systems will use open source components.

Page 6: IESL Talk Series: Apache System Projects in the Real World

Why Do People Contribute?

Because they enjoy it
To work with smart people
Because they get paid to do it
o If you are a reputed open source developer, chances are that you can get someone to pay you for contributing to open source.
Visibility, to make an impact
o Recognition, prestige
o To improve your brand / profile
o To get into grad school
As a business strategy
o Building or supporting an open source project may be a long-term strategic action.

Copyright U. S. Fish and Wildlife Service and licensed for reuse under CC License, http://www.flickr.com/photos/usfwsnortheast/4754624921, and Copyright WxMom and licensed for reuse under CC License, http://www.flickr.com/photos/wxmom/1359996991.

Great investments need faith and patience

Page 7: IESL Talk Series: Apache System Projects in the Real World

Open Source Business Model

Open source projects occupy a significant portion of the middleware space and many others.
o Many commercial products are powered by open source projects
o Many large companies invest a significant amount of resources in open source projects (sometimes 1000s)
Often there are companies around open source projects.
Business models
o Build improved pro versions and sell them
o Sell production support
o Provide consultancy, learning, etc.

Copyright Emdot and licensed for reuse under CC License, http://www.flickr.com/photos/emdot/2418695

Page 8: IESL Talk Series: Apache System Projects in the Real World

Apache Software Foundation

Built on the success of the Apache Web Server
Home to many successful and highly influential open source projects like the Apache Web Server
Governed by the Apache License
o Can edit and redistribute, and even sell
o Not viral, you are free to make money on top of it
Community is the key
o User community
o Developer community
Open development model with open decisions
o Communication through mailing lists

Copyright Jeff Kubina and licensed for reuse under CC License, http://www.flickr.com/photos/95118988@N00/416015918

Warm Springs Chiricahua Apache

Page 9: IESL Talk Series: Apache System Projects in the Real World

Apache System Projects

Web service support
o Apache Axis2, Apache Rampart, Apache Sandesha, Apache CXF ...
Workflow engine
o Apache ODE
Enterprise Service Bus
o Apache Synapse
o Apache Camel
Messaging
o Apache Qpid / ActiveMQ
Data storage
o Apache Cassandra, CouchDB, Apache OODT
J2EE container
o Apache Geronimo

Copyright ind{yeah} and licensed for reuse under CC License , http://www.flickr.com/photos/flickcoolpix/3566848458/

Page 10: IESL Talk Series: Apache System Projects in the Real World

A Large E-Science Project as a Case Study

Page 11: IESL Talk Series: Apache System Projects in the Real World

E-Science

Continuation of High Performance Computing, Parallel Computing, and Grid.
The underlying theme is “Cyber-infrastructures to support Scientific Research”.
Built around “Computation” as the third pillar of Science (along with Analysis and Experimentation).
Characterized by a wide range of computing (CPU minutes to CPU years) and data (a few KB to PBs of data) requirements.
Based on real-life use cases.

Page 12: IESL Talk Series: Apache System Projects in the Real World

“Tis strange—but true; for truth is always strange,Stranger than fiction.”

---- Lord Byron, Don Juan (1818-24)

E-Science joins theory with real-life data.
Real-life applications often go beyond our experiences.
Most weather models are calculated at much less than ideal resolutions; otherwise a 24-hour forecast would take more than 24 hours!
Physics use cases (e.g. the Large Hadron Collider), telescopes, and genome analysis generate terabytes of data in days if not hours, and moving 1 TB takes hours even on the 10 GB networks of TeraGrid.
Scale, geographical distribution of resources, and heterogeneity make these use cases complex.

Copyright Nrbelex and licensed for reuse under CC License , http://www.flickr.com/photos/nrbelex/529393643

Surprise
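As a rough check on the data-movement point on this slide, the sketch below estimates how long moving 1 TB takes. It assumes that "10 GB network" means a 10 gigabit-per-second link and uses decimal units; the `efficiency` parameter and its 10% value are hypothetical, chosen only to illustrate how sustained throughput below line rate stretches the transfer from minutes into hours.

```python
def transfer_seconds(size_bytes: float, link_bits_per_sec: float,
                     efficiency: float = 1.0) -> float:
    """Time to move size_bytes over a link, given the fraction of line rate sustained."""
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return size_bytes * 8 / (link_bits_per_sec * efficiency)


ONE_TB = 1e12          # bytes, decimal terabyte (assumption)
LINK_10_GBPS = 10e9    # bits per second (assumed reading of "10 GB network")

# Ideal case: the link is dedicated and runs at full line rate.
ideal = transfer_seconds(ONE_TB, LINK_10_GBPS)

# Hypothetical case: only 10% of line rate is sustained end to end.
degraded = transfer_seconds(ONE_TB, LINK_10_GBPS, efficiency=0.1)

print(f"ideal:    {ideal / 60:.1f} minutes")
print(f"degraded: {degraded / 3600:.1f} hours")
```

Under these assumptions the ideal transfer takes minutes, so the "hours" figure would correspond to shared links, protocol overhead, or slow storage at either end rather than to the raw link speed.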

Page 13: IESL Talk Series: Apache System Projects in the Real World

Linked Environments for Atmospheric Discovery (LEAD)

U.S. NSF funded, 10+ universities, $11M, 5 years.

Used for U.S. national weather forecasts by NOAA.

Presented to the U.S. Congress as an example to justify scientific research spending by the U.S. NSF.

Has brought state-of-the-art forecasting capabilities to a wider audience, ranging from hardcore scientists to high school students.

Copyright f2n_downtown and licensed for reuse under CC License , http://www.flickr.com/photos/myneighborhood/4809104443

Page 14: IESL Talk Series: Apache System Projects in the Real World

LEAD: Dynamic Weather Analysis in U.S. Wide Scale

Page 15: IESL Talk Series: Apache System Projects in the Real World

Why is it Hard?

Geographically distributed sensors, computing power, storage, and expertise
Handling failures and recovery
Long-running jobs (> 1 hour)
Large-scale jobs (10-1000+ processors)
Large data sizes (KBs to GBs of data)
Need to serve many parallel users
Usage spikes

Copyright Wonderlane and licensed for reuse under CC License , http://www.flickr.com/photos/wonderlane/3302165946

Page 16: IESL Talk Series: Apache System Projects in the Real World

LEAD as an Example

Assume a hurricane has developed, and 1000 scientists across the U.S. come to the LEAD portal to run forecasts.

Let's assume:
Each user runs 3 workflows.
Each workflow has 6 services, generates about 300 notifications, moves 50 files of 100 MB each, generates 50 files of 100 MB each, and runs for one hour.
Each service needs 5 CPU hours.

Copyright gletham GIS, Social, Mobile Tech Images and licensed for reuse under CC License, http://www.flickr.com/photos/gisuser/54062274/

Page 17: IESL Talk Series: Apache System Projects in the Real World

Which Means

3000 parallel workflows
Need 90,000 CPUs for that hour (90,000 CPU hours)
250 TPS for the messaging system
Move 8 GB/sec through the network
Generate 15 TB of data per hour

Not all of this can be handled now, but the numbers give us an idea about the challenge.

Copyright matsuyuki and licensed for reuse under CC License, http://www.flickr.com/photos/matsuyuki/5461363022

Do the math
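The arithmetic behind these figures can be reproduced with a short calculation. This is a sketch: the inputs come from the "LEAD as an Example" slide, while decimal units (1 TB = 1,000,000 MB) and the reading that both the moved and the generated files cross the network are assumptions made here to arrive at the 8 GB/sec figure.

```python
# Inputs from the "LEAD as an Example" slide.
USERS = 1000
WORKFLOWS_PER_USER = 3
SERVICES_PER_WORKFLOW = 6
CPU_HOURS_PER_SERVICE = 5
NOTIFICATIONS_PER_WORKFLOW = 300
FILES_MOVED_PER_WORKFLOW = 50
FILES_GENERATED_PER_WORKFLOW = 50
FILE_SIZE_MB = 100
RUN_HOURS = 1


def lead_load() -> dict:
    """Aggregate load if every user runs all workflows in the same hour."""
    workflows = USERS * WORKFLOWS_PER_USER
    cpu_hours = workflows * SERVICES_PER_WORKFLOW * CPU_HOURS_PER_SERVICE
    seconds = RUN_HOURS * 3600

    generated_mb = workflows * FILES_GENERATED_PER_WORKFLOW * FILE_SIZE_MB
    moved_mb = workflows * FILES_MOVED_PER_WORKFLOW * FILE_SIZE_MB

    return {
        "workflows": workflows,
        # CPU hours that must fit into the run time give the CPU count.
        "cpus": cpu_hours / RUN_HOURS,
        "notifications_per_sec": workflows * NOTIFICATIONS_PER_WORKFLOW / seconds,
        "generated_tb_per_hour": generated_mb / 1_000_000 / RUN_HOURS,
        # Assumption: moved and generated files both travel over the network.
        "network_gb_per_sec": (moved_mb + generated_mb) / 1000 / seconds,
    }


if __name__ == "__main__":
    for name, value in lead_load().items():
        print(f"{name}: {value:,.2f}")
```

With these inputs the network rate comes out at roughly 8.3 GB/sec, which the slide rounds to 8 GB/sec.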

Page 18: IESL Talk Series: Apache System Projects in the Real World

SOA, E-Science and LEAD

E-Science infrastructures are distributed, complex, and heterogeneous.
SOA is designed to handle just that.
LEAD is based on many SOA specs
o WSDL, SOAP, WS-Addressing for communication
o WS-BPEL for workflows
o WS-Eventing for messaging
o WSDM for service management
LEAD people have worked closely with and contributed to Web Services, pushing its limits to apply it to LEAD.

Page 19: IESL Talk Series: Apache System Projects in the Real World

LEAD Architecture

Page 20: IESL Talk Series: Apache System Projects in the Real World

Workflow Subsystem

Page 21: IESL Talk Series: Apache System Projects in the Real World

Workflow Subsystem Challenges

Maximizing resource utilization
o Utilizing the cloud
o Cloud bursting
o Handling priorities
Scaling up
Service and workflow governance
Execution delegation

Copyright Doug Lee and licensed for reuse under CC License,

http://www.geograph.org.uk/photo/1893583

Page 22: IESL Talk Series: Apache System Projects in the Real World

Data Subsystem

Page 23: IESL Talk Series: Apache System Projects in the Real World

Data Subsystem Challenges

Large-scale data repositories
o To detect, collect metadata, and store
o To search
o Replica management
Data mining
o CEP
o Clustering algorithms etc.
Data provenance
o Data quality

Copyright Anne Petty and licensed for reuse under CC License, http://www.geograph.org.uk/photo/101401

Page 24: IESL Talk Series: Apache System Projects in the Real World

Messaging Subsystem

Page 25: IESL Talk Series: Apache System Projects in the Real World

Messaging Subsystem Challenges

The underlying model is the publish/subscribe pattern.
Challenges are
o How to scale up? Supporting a large number of users and a large number of subscriptions
o Avoiding a single point of failure
o Ensuring guaranteed delivery
o Security within the publish/subscribe pattern
Related projects
o WS-Messenger
o Narada Broker
o Apache Qpid

Copyright Dave Croker and licensed for reuse under CC License,

http://www.geograph.org.uk/photo/689155
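For readers unfamiliar with the publish/subscribe pattern named on this slide, the following is a minimal in-memory sketch. The `Broker` class, its method names, and the topic string are hypothetical and chosen for illustration; this is not the API of WS-Messenger, Narada Broker, or Apache Qpid.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Callback = Callable[[str], None]


class Broker:
    """Topic-based publish/subscribe broker held in a single process (illustrative only)."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        """Register a callback to be invoked for every message on the topic."""
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: str) -> int:
        """Deliver the message to current subscribers and return how many received it."""
        delivered = 0
        for callback in list(self._subscribers.get(topic, [])):
            callback(message)
            delivered += 1
        return delivered


# Usage: a workflow engine publishes status notifications, a portal listens.
broker = Broker()
received: List[str] = []
broker.subscribe("workflow/status", received.append)
count = broker.publish("workflow/status", "service 3 of 6 finished")
missed = broker.publish("workflow/errors", "no one is listening")
print(count, missed, received)
```

The sketch also makes the listed challenges concrete: the single `Broker` object is a single point of failure, a message published with no subscriber is simply dropped (no guaranteed delivery), and every subscription lives in one process, which limits scaling.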

Page 26: IESL Talk Series: Apache System Projects in the Real World

LEAD & Apache WS History

The LEAD and Apache teams have both contributed to each other (and there is overlap).
LEAD is older than Axis2; it forked off in the Axis era, mainly because of async messaging support.
Five years ago LEAD implemented many tools (e.g. registries, async messaging, workflow engine) that are hot topics now.
The team received continuing funding to make it open source under OGCE.
The LEAD code base is now based on Axis2, ODE, and others.
Moved into Apache as “Apache Airavata”.

Page 27: IESL Talk Series: Apache System Projects in the Real World

LEAD with Apache Projects

LEAD switched to Apache ODE for workflow execution more than 3 years ago.
LEAD data subsystems switched to Axis2 about 3 years ago.
Job submission was switched to Axis2 about 2 years back.
Service Factory has been undergoing conversion to Axis2 since about a year back.
The messaging system was converted about a year back (through an Indiana University and LSF collaboration).

Page 28: IESL Talk Series: Apache System Projects in the Real World

Apache Airavata

All partners agreed that the best option for the OGCE project to continue is as an Apache project.

Joined the Apache Incubator about 2 months back.

Includes the following subprojects
o XBaya workflow composer
o WS-Messenger as the messaging system
o Generic Service Toolkit
o Service Registry

Copyright ZeePack and licensed for reuse under CC License,

http://www.flickr.com/photos/zeepack/3681815248

Page 29: IESL Talk Series: Apache System Projects in the Real World

Should You Try to Move Your Project to Apache?

Page 30: IESL Talk Series: Apache System Projects in the Real World

Apache as a Sustainability model for Research projects

Industry values “People”, we (opensource) value “Code”, and Academia values “Ideas”.

Most NSF Grants, now, ask for a Sustainability Model as part of Proposals.

One option is a commercial spin off.

Doing it in an open source way, building a community and users around a project, is also a potential solution.

Many challenges: ownership, the need to renounce control, and active engagement of the community are key.

“Source Open” is not good enough!! “Dump and Run” does not work either.

Copyright stephend9 and licensed for reuse under CC License, http://www.flickr.com/photos/stephend9/372996705

Diamonds are Forever

Page 31: IESL Talk Series: Apache System Projects in the Real World

Pros & Cons

Advantages
o Reach a wider audience and a healthy user community; the world debugs your project for you.
o Potential long lifetime for the project, and a self-sustaining community if successful.
o Take advantage of the Apache process throughout the project life cycle (releases, SVN, Jira, wiki, culture).
o Better chances of attracting external developers and more input. Better chance of avoiding “source open”.
o Take advantage of Apache infrastructure.

Disadvantages
o You have to let go of the ownership, at least to some extent.
o The need for community consent might slow you down.
o You have to learn to listen and explain. Some arguments are harder to make on a mailing list.
o You have to time publications.

Page 32: IESL Talk Series: Apache System Projects in the Real World

How Does the Model Work?

Need a champion
Have to submit a proposal to the Apache Incubator
If accepted, the project will be placed in the Incubator
Team should work to build the community
o Users
o Developers
o Diversity of the community
Graduation
More users usually means more contributions
Apache Board continues to monitor for compliance

Page 33: IESL Talk Series: Apache System Projects in the Real World

Conclusion

Wanted to share a real-life, large-scale SOA use case.
Wanted to show LEAD-Apache interactions as a real-life case study of interactions between Apache and an academic project.
Wanted to showcase Apache as a sustainability mechanism, if it is done right.
Wanted to give you a sense of some open problems, and the kinds of problems distributed systems and E-Science are trying to solve.

Page 34: IESL Talk Series: Apache System Projects in the Real World

Copyright by romainguy, and licensed for reuse under CC License http://www.flickr.com/photos/romainguy/249370084

Questions?