Big Data at the Barcelona Supercomputing Center
1
Activity in Big Data
26/03/2012
2
Previous work in Big Data
3
Scenario
Application placement and scheduling: MapReduce
Data management: Key-Value storage
Target Applications: Data Analytics, Bioinformatics
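The scenario names MapReduce as the placement and scheduling target and key-value storage as the data model. As a minimal sketch of that programming model only (not of BSC's schedulers or of Hadoop itself), the following single-process word count shows the three steps a MapReduce framework runs across a cluster. The function names `map_phase`, `shuffle`, `reduce_phase` and `run_job` are hypothetical and chosen for illustration.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) key-value pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group emitted values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values seen for one key into a single result."""
    return key, sum(values)


def run_job(documents):
    """Run map, shuffle and reduce in one process (illustrative only)."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(run_job(["big data analytics", "big clusters"]))
```

In a real deployment the map and reduce calls run as separate tasks on different nodes, which is where the placement and scheduling questions raised on this slide come from.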
4
Big Data Papers
High level performance goals and Big Data
• Resource-aware Adaptive Scheduling for MapReduce Clusters. J. Polo, C. Castillo, D. Carrera, Y. Becerra, I. Whalley, M. Steinder, J. Torres, E. Ayguadé. In the ACM/IFIP/USENIX 12th International Middleware Conference (Middleware 2011).
• Performance-Driven Task Co-Scheduling for MapReduce Environments. J. Polo, D. Carrera, Y. Becerra, J. Torres, E. Ayguadé, M. Steinder, I. Whalley. In the 12th IEEE/IFIP Network Operations and Management Symposium (NOMS 2010).
Hybrid Hardware and Big Data
• Speeding Up Distributed MapReduce Applications Using Hardware Accelerators. Y. Becerra, V. Beltran, D. Carrera, M. González, J. Torres and E. Ayguadé. In the 38th International Conference on Parallel Processing (ICPP 2009).
• Performance Management of Accelerated MapReduce Workloads in Heterogeneous Clusters. J. Polo, D. Carrera, Y. Becerra, V. Beltran, J. Torres, E. Ayguadé. In the 39th International Conference on Parallel Processing (ICPP 2010).
Big Data and Energy:
• Towards Energy-Efficient Management of MapReduce Workloads. J. Polo, Y. Becerra, D. Carrera, V. Beltran, J. Torres and E. Ayguadé. First International Conference on Energy-Efficient Computing and Networking (e-Energy 2010).
• GreenHadoop: Leveraging Green Energy in Data-Processing Frameworks. Í. Goiri, K. Le, T. D. Nguyen, J. Guitart, J. Torres, and R. Bianchini. European Conference on Computer Systems (EuroSys 2012).
5
Ongoing research in Big Data
6
New challenges in Big Data: OUR VISION
[Figure: Execution Time plotted against Data Volume (GBs to PBs). Labels: "Conventional Storage Systems"; "Large data sets, growing too big for conventional storage/tools"; "New requirements for real-time decisions".]
7
New challenges in Big Data: OUR APPROACH
[Figure: Execution Time plotted against Data Volume (GBs to PBs). Labels: "Conventional Storage Systems"; "MapReduce & NoSQL"; "In-memory".]
8
Ongoing research projects
Columns: Goal; Use case; Collaborators; Technology involved

MapReduce & NoSQL
• Goal: Snapshot isolation (support for online data generation). Use case: Data Analytics. Collaborators: IBM. Technology involved: Hadoop & Cassandra.
• Goal: High-level performance goals and automatic query configuration. Use case: Data Analytics and Bioinformatics (support for drug discovery). Collaborators: Life Science Dept. (BSC). Technology involved: Hadoop & Cassandra.
• Goal: Automatic configuration and data organization to meet high-level performance goals. Use case: Bioinformatics (support for drug discovery). Collaborators: Life Science Dept. (BSC). Technology involved: Cassandra.

In-Memory
• Goal: In-memory bioinformatics workflows (index construction, alignment, sorting, data processing). Use case: Bioinformatics (genomic sequencing). Collaborators: IBM and Life Science Dept. (BSC). Technology involved: PIMD.
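One of the goals listed for the MapReduce & NoSQL projects is snapshot isolation to support online data generation. The sketch below illustrates only the read side of that idea, assuming a toy multi-version key-value store: an analytics job takes a snapshot and keeps seeing a consistent view while new data continues to arrive. The classes `VersionedStore` and `Snapshot` are hypothetical and do not reflect how the project, Hadoop, or Cassandra implement it; write-conflict detection, which full snapshot isolation also needs, is left out.

```python
class VersionedStore:
    """Toy multi-version key-value store (hypothetical, for illustration)."""

    def __init__(self):
        self._versions = {}  # key -> list of (commit_version, value), ascending
        self._clock = 0      # version number of the last committed write

    def put(self, key, value):
        """Commit a new version of key and return its version number."""
        self._clock += 1
        self._versions.setdefault(key, []).append((self._clock, value))
        return self._clock

    def snapshot(self):
        """Return a read-only view fixed at the current version."""
        return Snapshot(self, self._clock)

    def read_at(self, key, version):
        """Return the newest value of key committed at or before version."""
        for committed, value in reversed(self._versions.get(key, [])):
            if committed <= version:
                return value
        return None


class Snapshot:
    """Read-only view of a VersionedStore as of one version."""

    def __init__(self, store, version):
        self._store = store
        self._version = version

    def get(self, key):
        return self._store.read_at(key, self._version)


if __name__ == "__main__":
    store = VersionedStore()
    store.put("samples", 10)
    view = store.snapshot()        # the analytics job starts here
    store.put("samples", 25)       # online data keeps arriving
    print(view.get("samples"))             # value as of the snapshot
    print(store.snapshot().get("samples")) # value seen by a later snapshot
```

Keeping old versions is what lets the long-running job and the writers proceed without blocking each other; a real store would also need to garbage-collect versions that no open snapshot can still read.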
9
Next planned research in Big Data
10
New challenges in Big Data: Our approach
[Figure: Execution Time plotted against Data Volume (GBs to PBs). Labels: "Conventional Storage Systems"; "MapReduce & NoSQL"; "In-memory"; "Storage Hierarchy Management"; "RDBMS"; "IN-MEMORY"; "APPLICATION".]
Our Big Data resource management picture
Resource Management
Application placement and scheduling: (multi-job performance goals, resource awareness, hybrid hardware)
Data Management: automatic data organization and configuration (NoSQL/In-Memory/Hierarchy management)
Highly Scalable NoSQL
In-Memory DB
Heterogeneous Compute Nodes
Storage Hierarchy:
Mix of Mechanical + Flash + SCM
Data Analytics
Drug Discovery
Air Quality Forecasting
Genomic Sequencing
Business Intelligence
SQL
To meet performance goals such as:
Consistency,
Availability,
Partition Tolerance,
Energy Consumption,
Response Time,
…
Collaboration with other BSC departments
Data-centric Resource Manager
Custom Data Mgmt.
Compute nodes Storage
In-mem Key/Val
eDSL Prog. Models
NoSQL
FileSystems
Persistent Objects
Mix of Mechanical + Flash + SCM
Legacy Code (MPI)
Heterogeneous Application Flows (Domain Specific, Differentiated Resource Requirements)
13
Autonomic & eBusiness Group
14
Group Goal
To research autonomic and intelligent resource management for today's business applications.
Autonomic and Intelligent Resource Management
Cloud Computing
Big Data
Business Analytics
High Performance Computing
Sustainable Computing
The objective is to create new components at the middleware level that provide holistic solutions for some of the new IT challenges in the industry.
15
Current main interrelated areas
Middleware that provides a holistic solution
Workload Management
Massively Distributed Data Stores
Embedded Domain Specific Languages for HPC
Exploiting Heterogeneous Hardware
BLO-driven Management
High performance architectures for Big Data
Online predictors
Service-aware VM Management
Energy-aware Management
16
Group members
www.bsc.es/eBusiness