Opportunities in Parallel I/O for Scientific Data Management
Rajeev Thakur and Rob Ross
Mathematics and Computer Science Division
Argonne National Laboratory
Outline
• Brief review of our accomplishments so far
• Thoughts on component coupling
• Topics for future work
PVFS2
• Collaborative effort between ANL, Clemson, Northwestern, Ohio State, etc.
• Very successful as a freely available parallel file system for Linux clusters
• Also deployed on IBM BG/L
• Used by many groups as a research vehicle for implementing new parallel file system concepts
• True open source software
• Open source development
• Tightly coupled MPI-IO implementation (ROMIO); see the sketch below
• Forms the basis for higher layers to deliver high performance to applications
[Diagram: compute clients (CC) connected over a communication network to I/O servers (IOS) running PVFS]
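As a concrete illustration of the ROMIO coupling, here is a minimal sketch of a collective MPI-IO write to a PVFS2 file. The "pvfs2:" file-name prefix is how ROMIO selects its PVFS2 driver; the path and buffer size below are illustrative assumptions, not taken from the slides.

/* Minimal sketch: collective MPI-IO write to a PVFS2 file via ROMIO.
 * The "pvfs2:" prefix selects ROMIO's PVFS2 driver; the path and
 * buffer size are illustrative assumptions. */
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank;
    double buf[1024] = {0};
    MPI_File fh;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_File_open(MPI_COMM_WORLD, "pvfs2:/pvfs2/checkpoint.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* Each rank writes one contiguous block; the collective call lets
     * ROMIO coordinate and optimize the access pattern across ranks. */
    MPI_File_write_at_all(fh, (MPI_Offset)(rank * sizeof(buf)),
                          buf, 1024, MPI_DOUBLE, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}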
PVFS2 Performance

[Chart: Time to Create Files Through MPI-IO; average file create time (ms) versus number of processes, comparing GPFS, Lustre, and PVFS2]
PnetCDF
• Parallel version of the popular netCDF library
• Major contribution of the SDM SciDAC (funded solely by it)
• Collaboration between Argonne and Northwestern
• Main implementers: Jianwei Li (NW) and Rob Latham (ANL)
• Addresses the lack of parallelism in serial netCDF without the complexity of parallel HDF
• Only minor changes to the standard netCDF API (see the sketch below)
• Being used in many applications
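A minimal sketch of what "only minor changes" means in practice, assuming an illustrative file and variable name: the ncmpi_* calls mirror the serial netCDF API, adding an MPI communicator and info object at create time.

/* Minimal sketch of a parallel write with PnetCDF. The file and variable
 * names are illustrative assumptions, not from the slides. */
#include <mpi.h>
#include <pnetcdf.h>

int main(int argc, char **argv)
{
    int rank, nprocs, ncid, dimid, varid;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Same define-mode/data-mode structure as serial netCDF, with an
     * MPI communicator and info object added to the create call. */
    ncmpi_create(MPI_COMM_WORLD, "output.nc", NC_CLOBBER, MPI_INFO_NULL, &ncid);
    ncmpi_def_dim(ncid, "x", (MPI_Offset)nprocs, &dimid);
    ncmpi_def_var(ncid, "temperature", NC_DOUBLE, 1, &dimid, &varid);
    ncmpi_enddef(ncid);

    /* Each process writes its own element collectively. */
    MPI_Offset start = rank, count = 1;
    double value = (double)rank;
    ncmpi_put_vara_double_all(ncid, varid, &start, &count, &value);

    ncmpi_close(ncid);
    MPI_Finalize();
    return 0;
}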
MPI-IO over Logistical Networking (LN)
• LN is a technology that many applications are using to move data efficiently over the wide area
• Implementing MPI-IO over LN enables applications to access their data directly from parallel programs
• We are implementing a new ROMIO ADIO layer for LN (Jonghyun Lee, ANL); see the sketch below
• Nontrivial because the LN API is unlike a traditional file system API
• Collaboration between Argonne and Univ. of Tennessee
[Diagram: Application over MPI-IO over ADIO, which dispatches to PVFS or UFS (local storage) and LN (remote storage)]
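From the application's point of view, nothing changes once the LN ADIO layer is in place: ROMIO already dispatches on file-name prefixes such as "ufs:" or "pvfs2:". The "ln:" prefix and path below are purely hypothetical illustrations, not an interface defined by this work.

/* Hypothetical sketch: reaching remote LN storage through unchanged MPI-IO
 * code. ROMIO really dispatches on prefixes like "ufs:" or "pvfs2:"; the
 * "ln:" prefix and path are assumptions for illustration only. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_File fh;

    MPI_Init(&argc, &argv);
    MPI_File_open(MPI_COMM_WORLD, "ln:/depot/experiment/output.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    /* ... the usual independent or collective reads and writes ... */
    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}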
Fruitful Collaborations
• Key to our successes in this SciDAC have been strong collaborations with other participants in the Center
  • Northwestern University
    • PnetCDF, PVFS2
    • Jianwei, Avery, Kenin, Alok, Wei-keng
  • ORNL
    • Nagiza's group
    • MPI-IO and PnetCDF for visualization (parallel VTK)
  • LBNL
    • Ekow (MPI-IO on SRM)
• Ongoing collaboration with Univ. of Tennessee for MPI-IO/LN
Component Coupling via Standard Interfaces
• We believe that well-defined standard APIs are the right way to couple different components of the software stack
• Having the right API at each level is crucial for performance
[Diagram: software stack with the Application on top of HDF-5 and PnetCDF, over MPI-IO, over PVFS, Lustre, or GPFS]
Topics for Future Work
Guiding Theme
• How can we cater better to the needs of SciDAC applications?
Make Use of Extended Attributes on Files
• PVFS2 now allows users to store extended attributes along with files
• Also available in local Linux file systems, so a standard is emerging
• This feature has many applications (see the sketch below):
  • Store metadata for high-level libraries as extended attributes instead of directly in the file
    • avoids the problem of unaligned file accesses
  • Store MPI-IO hints for persistence
  • Store provenance information
[Diagram: a file with an extended attribute, e.g. name "Mesh Size", value "1K x 1K"]
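A minimal sketch using the standard Linux extended-attribute calls; the attribute name and file path are illustrative, and whether applications would use this interface or a PVFS2-specific one is an assumption.

/* Sketch: attaching high-level metadata to a file as an extended attribute,
 * using the standard Linux xattr interface (sys/xattr.h). The path and
 * attribute name are illustrative; PVFS2 may also expose extended
 * attributes through its own API. */
#include <stdio.h>
#include <string.h>
#include <sys/xattr.h>

int main(void)
{
    const char *path  = "/pvfs2/dataset.dat";
    const char *name  = "user.mesh_size";   /* "user." namespace required on Linux */
    const char *value = "1K x 1K";

    /* Store the attribute alongside the file, not inside it. */
    if (setxattr(path, name, value, strlen(value), 0) != 0)
        perror("setxattr");

    /* Read it back, e.g. when a high-level library reopens the file. */
    char buf[64];
    ssize_t len = getxattr(path, name, buf, sizeof(buf) - 1);
    if (len >= 0) {
        buf[len] = '\0';
        printf("%s = %s\n", name, buf);
    }
    return 0;
}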
Next Generation High-Level Library
• HDF and netCDF were written 15-20 years ago as serial libraries
• Explore the possibility of designing a new high-level library that is explicitly built as a parallel library for modern times
  • What features are needed?
• Can we exploit extended attributes?
• Can the data span multiple files instead of one file, with a directory as the object?
• What is the portable file format?
• New, more efficient implementation techniques
Implement Using a Combination of Database and Parallel I/O
• Use a real database to store metadata and a parallel file system to store actual data
• Flexible and high performance
• Powerful search and retrieval capability
• Prototype implemented in 1999-2000 (published in SC2000 and JPDC)
  • Jaechun No, Rajeev Thakur, Alok Choudhary
• Needs more work; collaboration with application scientists
• Serializability/portability of data is a challenge
• What is the right API for this? (see the sketch below)
[Diagram: Application over an SDM layer; metadata goes to a database (Berkeley DB, Oracle, DB2), data goes through MPI-IO to a parallel file system]
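A rough sketch of the idea, assuming Berkeley DB on the metadata side and MPI-IO for the bulk data; the key names, paths, and layout are illustrative assumptions, not the API of the 1999-2000 prototype.

/* Sketch: searchable metadata in Berkeley DB, bulk data through MPI-IO.
 * Key names and file paths are illustrative assumptions. */
#include <string.h>
#include <db.h>      /* Berkeley DB */
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Rank 0 records metadata about the dataset in the database. */
    if (rank == 0) {
        DB *dbp;
        DBT key, val;
        const char *k = "run42/mesh_size";
        const char *v = "1K x 1K";

        db_create(&dbp, NULL, 0);
        dbp->open(dbp, NULL, "metadata.db", NULL, DB_BTREE, DB_CREATE, 0664);

        memset(&key, 0, sizeof(key));
        memset(&val, 0, sizeof(val));
        key.data = (void *)k;  key.size = (u_int32_t)strlen(k);
        val.data = (void *)v;  val.size = (u_int32_t)strlen(v);
        dbp->put(dbp, NULL, &key, &val, 0);
        dbp->close(dbp, 0);
    }

    /* All ranks write the actual data collectively to the parallel file system. */
    MPI_File fh;
    double chunk[1024] = {0};
    MPI_File_open(MPI_COMM_WORLD, "pvfs2:/data/run42.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_File_write_at_all(fh, (MPI_Offset)(rank * sizeof(chunk)),
                          chunk, 1024, MPI_DOUBLE, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    MPI_Finalize();
    return 0;
}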
Parallel File System Improvements
• Autonomic
  • Self-tuning, self-maintaining, self-healing
• Fault tolerant
  • Tolerate server failures
• Scalability
  • Ten thousand to a hundred thousand clients
• Active storage
  • Run operations on the server, such as data reduction, filtering, and transformation
End-to-End Data and Performance Management
• Applications run and write data at one site (say NERSC)
• Scientists need to access the data at their home location, which is geographically distant
• Need high performance for, and management of, this whole process
• We intend to focus on ensuring that our “local access” tools (PVFS, MPI-IO, PnetCDF) integrate well with other tools that access data over the wide area (SRM, Logistical Networking, GridFTP)
Summary
• Despite progress on various fronts, managing scientific data continues to be a challenge for application scientists
• We plan to continue to tackle the important problems by
  • focusing on our strengths in the areas of parallel file systems and parallel I/O
  • collaborating with other groups doing complementary work in other areas to ensure that our tools integrate well with theirs