SilverLining
Stuff we're covering
• Hardware infrastructure and scaling
• Cloud platform as a service
• The SilverLining Project
Some context
• We work at a university
• Funding based on projects
• Biodiversity web apps and APIs
• Focus on software (not hardware)
Infrastructure
• Applications depend on infrastructure
• Infrastructure that "just works" is expensive
• More money for infrastructure means less money for application development
• Degenerates without long-term funding
• Unreliability is bad for applications
• Increasingly bad user experience over time
• $1.6M USD total budget to 17 institutions
• $245k USD (30.6% of direct costs) for infrastructure
• $100k USD (12.6% of direct costs) for core application development
  o DiGIR provider, DiGIR portal
MaNIS, ORNIS, HerpNet, FishNet
• $7.6M USD combined budgets, 71 institutions
• $196k USD annual operating cost
• $179k USD (92%) for infrastructure
Infrastructure as a Problem (IaaP)
• Unsustainable
• Creates a barrier to innovation
• And this is all before scaling comes into play!
Scalability
"The ability for infrastructure to reliably handle heavy request loads in a high performance way."
IaaP at scale
Scaling up
• Scale up vertically with a server upgrade
• Scale out horizontally with more servers
Scaling DiGIR networks
MaNIS, ORNIS, HerpNet, FishNet
• ~85 million records
• ~100 servers
Query: All records with a point
Response: Error: IO problem
"Scaling is hard."- Alex Payne
al3x.net/2010/07/27/node.html
Scaling in the small
• Handling dozens of requests per second
• Scaling up vertically is sufficient
• Performance improvements are software-related
al3x.net/2010/07/27/node.html
Scaling in the large
• Billions of requests per week (Google)
• Millions of active users (Facebook)
• Data centers worldwide with millions of servers
al3x.net/2010/07/27/node.html
Are we scaling large or small?
• GBIF ~220 million records
• eBird ~2 million new records per month
• Undigitized collections ~2.5 billion records
Scaling in the "small-ish"
• We're at the brink!
• IaaP is in the way, scaling is making it worse
• Where's the silver lining in all of this?
Platform as a Service (PaaS)
en.wikipedia.org/wiki/Platform_as_a_service
Conceptually quite simple:
• Computing power over the Internet
• No servers to maintain
• Pay for use
• Scales large (even if your application is small)
• Provided by companies such as Amazon, Microsoft, Google
SilverLining
silver-lining.googlecode.com
• Experiments, metrics, prototypes (not products)
• Picked Google App Engine
• PaaS with biodiversity data
• Simple Darwin Core
• Bulk loading, storage
• MapReduce - indexes, validation, statistics
• Optimize for resource efficiency, search performance
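The "MapReduce - statistics" bullet can be pictured with a minimal in-memory sketch. This is not the SilverLining code and it does not use the App Engine MapReduce library. It only shows the map, group, and reduce steps in plain Python. The sample records and the choice of the Darwin Core `country` term as the grouping key are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical Simple Darwin Core style records (illustrative values only).
SAMPLE_RECORDS = [
    {"scientificName": "Puma concolor", "country": "Mexico"},
    {"scientificName": "Puma concolor", "country": "Peru"},
    {"scientificName": "Lynx rufus", "country": "Mexico"},
    {"scientificName": "Lynx rufus"},  # country missing
]


def map_record(record):
    """Map step: emit one (key, 1) pair per record."""
    yield record.get("country", "unknown"), 1


def reduce_counts(key, values):
    """Reduce step: sum the counts emitted for one key."""
    return key, sum(values)


def run_mapreduce(records):
    """Group mapped pairs by key, then reduce each group."""
    groups = defaultdict(list)
    for record in records:
        for key, value in map_record(record):
            groups[key].append(value)
    return dict(reduce_counts(key, values) for key, values in groups.items())


records_per_country = run_mapreduce(SAMPLE_RECORDS)
print(records_per_country)
```

The same shape would apply to validation or index building: only `map_record` and `reduce_counts` change.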
Cost comparison
Total annual operating costs of vertebrate networks:
• Current architecture: USD $195,600
• Projected App Engine: USD $19,540
Total cost for SilverLining work to date:
• 50 cents
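To make the comparison concrete, the arithmetic on the two annual figures above can be worked through as follows. The variable names are ours, and the App Engine figure is the projection stated on the slide, not a measured cost.

```python
# Annual operating costs from the slide, in USD.
current_usd = 195_600    # current architecture
projected_usd = 19_540   # projected App Engine cost

# Absolute difference and the projected cost as a share of the current cost.
savings_usd = current_usd - projected_usd
projected_share_pct = round(projected_usd / current_usd * 100, 1)

print(f"Projected annual savings: USD ${savings_usd:,}")
print(f"Projected cost as share of current: {projected_share_pct}%")
```

In other words, the projection is roughly one tenth of the current operating cost.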
App Engine
code.google.com/appengine
• Develop scalable web apps on Google's infrastructure
• No servers or hardware to maintain
• Free quotas
• Standards-based Java and Python SDKs
• IDE support for Eclipse, NetBeans, IntelliJ
• Local development server
• Integrated support for unit testing
App Engine constraints
• Practical constraints for performance and scalability
• The datastore is not a relational database
• A query can use inequality filters on only one property
• Fails: year >= 1980 and year <= 1982 and elevation > 10
• Solution: set membership queries
Set membership queries
• Before: year >= 1980 and year <= 1982 and elevation > 10
• After: year "within 1 year" of 1981 and elevation > 10
• List for "within 1 year" of 1980: [1979, 1980, 1981]
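A minimal sketch of the set membership idea, in plain Python with no App Engine calls. It assumes each record stores a precomputed list of the years it is "within 1 year" of, and that the datastore can match a single value against any element of such a list, leaving elevation as the only inequality filter. The records, the field name `within_1_year`, and the helper functions are hypothetical.

```python
# Hypothetical in-memory records; field names are illustrative only.
RECORDS = [
    {"id": "a", "year": 1979, "elevation": 50},
    {"id": "b", "year": 1980, "elevation": 5},
    {"id": "c", "year": 1980, "elevation": 120},
    {"id": "d", "year": 1982, "elevation": 30},
    {"id": "e", "year": 1984, "elevation": 900},
]


def within_one_year(year):
    """Years a record is 'within 1 year' of, e.g. 1980 -> [1979, 1980, 1981]."""
    return [year - 1, year, year + 1]


def index_record(record):
    """Precompute the list property once, at write time."""
    indexed = dict(record)
    indexed["within_1_year"] = within_one_year(record["year"])
    return indexed


def query(records, center_year, min_elevation):
    """Membership test on the list plus one inequality filter on elevation."""
    return [
        r["id"]
        for r in records
        if center_year in r["within_1_year"] and r["elevation"] > min_elevation
    ]


INDEXED = [index_record(r) for r in RECORDS]

# Same result set as: year >= 1980 and year <= 1982 and elevation > 10
matches = query(INDEXED, 1981, 10)
print(matches)
```

The trade-off is extra storage and write-time work per record in exchange for a query that needs only one inequality filter.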
Aggregation and synchronization
code.google.com/p/pubsubhubbub
code.google.com/apis/feed/push
• Fast aggregation via API
• Subscribe to changes at the source
• Changes pushed automatically
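"Subscribe to changes at the source" can be sketched as building a PubSubHubbub subscription request. The snippet below only constructs the form-encoded body and sends nothing. `hub.mode`, `hub.topic`, and `hub.callback` are parameter names from the PubSubHubbub specification; the topic and callback URLs are placeholders, and a given hub or spec version may require additional parameters.

```python
from urllib.parse import parse_qs, urlencode


def build_subscription_body(topic_url, callback_url):
    """Form-encode a subscription request for a PubSubHubbub hub.

    The aggregator asks the hub to push changes for topic_url (the data
    provider's feed) to callback_url (the aggregator's own endpoint).
    """
    return urlencode({
        "hub.mode": "subscribe",
        "hub.topic": topic_url,
        "hub.callback": callback_url,
    })


# Placeholder URLs, assumed for illustration only.
body = build_subscription_body(
    "http://provider.example.org/records.atom",
    "http://aggregator.example.org/push",
)
print(body)
```

The body would be sent as an HTTP POST to the hub; once the subscription is confirmed, the hub pushes new or changed entries to the callback URL.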
What's the end game?
• PaaS instead of IaaP
• SaaS (software as a solution)
• BaaS (biodiversity applications at scale)
Any QaaC? (Questions as a challenge)
Aaron [email protected]
John [email protected]