Big Data Infrastructure for Scientific Computing
Mathijs Kattenberg – [email protected]
Big Data Landscape
Large Hadron Collider:
- Uses: Grid
- Volume: ~15 PB per year (~4 PB @ SURFsara)
- Type of data: structured
Next Generation Sequencing (GoNL):
- Uses: Grid, Cloud, Cluster
- Volume: ~100 GB to 300 TB
- Type of data: various formats and noise
Information retrieval and NLP:
- Uses: Hadoop, Cloud
- Volume: ~70 TB
- Type of data: text, unstructured
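To put the volumes above in perspective, the short calculation below converts them into a sustained data rate and a transfer time. It is only a back-of-the-envelope sketch: decimal units (1 PB = 10^15 bytes) and the 1 Gbit/s link speed are assumptions chosen for illustration, not figures from the slides.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 s, leap years ignored
PB = 10**15  # decimal petabyte (assumption)
TB = 10**12  # decimal terabyte (assumption)


def sustained_rate_mb_per_s(bytes_per_year):
    """Average rate in MB/s needed to keep up with a yearly volume."""
    return bytes_per_year / SECONDS_PER_YEAR / 10**6


def transfer_days(size_bytes, link_gbit_per_s):
    """Days needed to move size_bytes over a fully used link."""
    seconds = size_bytes * 8 / (link_gbit_per_s * 10**9)
    return seconds / 86400


# ~15 PB per year, the LHC volume quoted above
lhc_rate = sustained_rate_mb_per_s(15 * PB)

# ~70 TB of text over a hypothetical 1 Gbit/s link
text_days = transfer_days(70 * TB, 1.0)

print(f"LHC: about {lhc_rate:.0f} MB/s sustained")
print(f"70 TB at 1 Gbit/s: about {text_days:.1f} days")
```

Under these assumptions the yearly LHC volume corresponds to several hundred MB/s around the clock, and moving the text collection once takes several days, which is why network bandwidth counts as a capacity limit alongside CPU cores and disk space.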
http://bit.ly/173ddfz
Effectiveness of Data
Where having and exploiting data leads to insights:
- Brainscanr
- Healthmap
(Open) Data Sources
• Lots of open data:
  - Open data Nederland
  - CitySDK
  - Community of Amsterdam
  - Rijkswaterstaat
  - Twitter
  - Facebook
  - Google
• Different formats:
  - Excel files
  - JSON
  - Web services
• Different quality:
  - Noise
  - Missing values
  - Availability
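The quality issues listed for open data sources (noise, missing values) usually show up as soon as a file is parsed. The sketch below illustrates one possible way to handle them for JSON input. The records, the `station` and `level_cm` field names, and the choice to drop unusable records are all hypothetical; they are not taken from any of the sources named above.

```python
import json
import math

# Hypothetical sample: a missing key, a null, a placeholder string,
# and numbers stored both as text and as a number.
RAW = """[
  {"station": "A", "level_cm": "102.5"},
  {"station": "B", "level_cm": null},
  {"station": "C", "level_cm": "n/a"},
  {"station": "D"},
  {"station": "E", "level_cm": 98}
]"""


def parse_number(value):
    """Return a finite float, or None if the value is missing or noisy."""
    if value is None:
        return None
    try:
        number = float(value)
    except (TypeError, ValueError):
        return None
    return number if math.isfinite(number) else None


def clean(records, field):
    """Keep records with a usable numeric field; count the rest."""
    kept, dropped = [], 0
    for record in records:
        value = parse_number(record.get(field))
        if value is None:
            dropped += 1
        else:
            kept.append({**record, field: value})
    return kept, dropped


records = json.loads(RAW)
kept, dropped = clean(records, "level_cm")
print(f"kept {len(kept)} records, dropped {dropped}")
```

Dropping records is only one option. Depending on the analysis, it may be better to keep them and mark the value as missing, so that the number of gaps stays visible downstream.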
Computing Big Data
Capacity:
• CPU cores
• Hard drive space
• Network bandwidth
Solutions:
• Scale up: get faster tools
• Scale out: work with more tools
Complexity:
• Data:
  - Noise, missing data
  - Formats
  - Access
• Distributed computing:
  - Failures
  - Parallel programming
Solutions:
• Data: deal with it
• Distributed computing:
  - Super/Cluster computer
  - Grid
  - Hadoop
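Scaling out and coping with failures can be sketched in a few lines. The example below counts words in the map/reduce style that Hadoop popularised: the input is split into chunks, each chunk is counted independently, failed tasks are retried, and the partial results are merged. It is a single-machine illustration only. Threads stand in for cluster nodes, and `map_chunk`, `run_with_retry`, and `word_count` are hypothetical helper names, not part of the Hadoop API.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def map_chunk(lines):
    """Map step: count the words in one chunk of lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def run_with_retry(task, chunk, attempts=3):
    """Re-run a failed task, much as a scheduler would reschedule it."""
    for attempt in range(attempts):
        try:
            return task(chunk)
        except Exception:
            if attempt == attempts - 1:
                raise  # give up after the last attempt


def word_count(lines, workers=4, chunk_size=2):
    """Split the input, count chunks concurrently, merge the results."""
    chunks = [lines[i:i + chunk_size]
              for i in range(0, len(lines), chunk_size)]
    total = Counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = pool.map(lambda c: run_with_retry(map_chunk, c), chunks)
        for partial in partials:
            total.update(partial)  # reduce step: sum the partial counts
    return total


LINES = ["big data", "Big compute", "data data", "grid cloud hadoop"]
totals = word_count(LINES)
print(totals.most_common(2))
```

In CPython, threads do not speed up CPU-bound work like this. The point of the sketch is the structure: independent chunks, retry on failure, and a merge step. A real framework adds what makes this hard at scale, such as moving the computation to the data and detecting failed nodes.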
What SURFsara Offers
SURFsara provides:
1. Infrastructure: supercomputer, clusters, grid, cloud, Hadoop
2. Support: development, parallelization, consultancy
3. R&D: piloting new technologies
4. Hosting datasets for common use
www.surfsara.nl
Mathijs Kattenberg – [email protected]
What kind of technologies would you consider using in order to deal with technical Big Data challenges?