20
Centralized Decentralized MapReduce Grid Brief History Classes of DFS An Overview of the Classes and Uses of Distributed Filesystems Calvin Winkowski Linux and Unix Users Group at Virginia Tech April 24, 2014 Calvin Winkowski Distributed FileSystems 1/20

An Overview of the Classes and Uses of Distributed Filesystems · April 24, 2014 Calvin Winkowski Distributed FileSystems 1/20. Centralized Decentralized MapReduce Grid Brief History

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: An Overview of the Classes and Uses of Distributed Filesystems · April 24, 2014 Calvin Winkowski Distributed FileSystems 1/20. Centralized Decentralized MapReduce Grid Brief History

CentralizedDecentralized

MapReduceGrid

Brief HistoryClasses of DFS

An Overview of theClasses and Uses of

Distributed Filesystems

Calvin Winkowski

Linux and Unix Users Group at Virginia Tech

April 24, 2014

Calvin Winkowski Distributed FileSystems 1/20

Page 2: An Overview of the Classes and Uses of Distributed Filesystems · April 24, 2014 Calvin Winkowski Distributed FileSystems 1/20. Centralized Decentralized MapReduce Grid Brief History

CentralizedDecentralized

MapReduceGrid

Brief HistoryClasses of DFS

OriginsITS

• MIT AI Lab - IncompatibleTimesharing System

• Virtual software devices• Inter-machine FS access

Locus

• Atomic operations• Transparent data location• Replication• Caching• Fault-tolerance Calvin Winkowski Distributed FileSystems 2/20

Page 3: An Overview of the Classes and Uses of Distributed Filesystems · April 24, 2014 Calvin Winkowski Distributed FileSystems 1/20. Centralized Decentralized MapReduce Grid Brief History

CentralizedDecentralized

MapReduceGrid

Brief HistoryClasses of DFS

AFSGoals

• Develop the future environment of computing• Reduction in central timesharing devices• Networking will be pivotal

Design

• File locking• Location independent• Hierarchy of file systems• Storage independent of consumers

Calvin Winkowski Distributed FileSystems 3/20

Page 4: An Overview of the Classes and Uses of Distributed Filesystems · April 24, 2014 Calvin Winkowski Distributed FileSystems 1/20. Centralized Decentralized MapReduce Grid Brief History

CentralizedDecentralized

MapReduceGrid

Brief HistoryClasses of DFS

Modern World

Data

• Large data stores, very fast access• High availability

Compute

• Variable requirements• High speed & small storage vs Low speed & large storage

Calvin Winkowski Distributed FileSystems 4/20

Page 5: An Overview of the Classes and Uses of Distributed Filesystems · April 24, 2014 Calvin Winkowski Distributed FileSystems 1/20. Centralized Decentralized MapReduce Grid Brief History

CentralizedDecentralized

MapReduceGrid

Brief HistoryClasses of DFS

Gains of DFSs

Capability

• Speed (Write vs Read)• Replication• Locking• De-duplication

Utility

• Unified Namespace• Access Control• Enumeration• Failover vs Fault Tolerance• Load Balancing

Calvin Winkowski Distributed FileSystems 5/20

Page 6: An Overview of the Classes and Uses of Distributed Filesystems · April 24, 2014 Calvin Winkowski Distributed FileSystems 1/20. Centralized Decentralized MapReduce Grid Brief History

CentralizedDecentralized

MapReduceGrid

Brief HistoryClasses of DFS

Classes of DFSs

• Centralized• Decentralized• Mapreduce• Grid

Calvin Winkowski Distributed FileSystems 6/20

Page 7: An Overview of the Classes and Uses of Distributed Filesystems · April 24, 2014 Calvin Winkowski Distributed FileSystems 1/20. Centralized Decentralized MapReduce Grid Brief History

CentralizedDecentralized

MapReduceGrid

MethodologyExamples

Basics of Centralized DFSs

• Central Metadata server• Distributed object storage• Parallel Access is simplified• Fast throughput, higher latency• Loss of high-availability• Favours big compute

Calvin Winkowski Distributed FileSystems 7/20

Page 8: An Overview of the Classes and Uses of Distributed Filesystems · April 24, 2014 Calvin Winkowski Distributed FileSystems 1/20. Centralized Decentralized MapReduce Grid Brief History

CentralizedDecentralized

MapReduceGrid

MethodologyExamples

Lustre

Calvin Winkowski Distributed FileSystems 8/20

Page 9: An Overview of the Classes and Uses of Distributed Filesystems · April 24, 2014 Calvin Winkowski Distributed FileSystems 1/20. Centralized Decentralized MapReduce Grid Brief History

CentralizedDecentralized

MapReduceGrid

MethodologyExamples

Examples of Centralized DFSs

• Lustre ← Note the spelling• XtreemFS• pNFS• AFS

Calvin Winkowski Distributed FileSystems 9/20

Page 10: An Overview of the Classes and Uses of Distributed Filesystems · April 24, 2014 Calvin Winkowski Distributed FileSystems 1/20. Centralized Decentralized MapReduce Grid Brief History

CentralizedDecentralized

MapReduceGrid

MethodologyExamples

Basics of Decentralized DFSs

• All data is clustered• Increases locking complexity• HA friendly• Loss of throughput• May increase latency

Calvin Winkowski Distributed FileSystems 10/20

Page 11: An Overview of the Classes and Uses of Distributed Filesystems · April 24, 2014 Calvin Winkowski Distributed FileSystems 1/20. Centralized Decentralized MapReduce Grid Brief History

CentralizedDecentralized

MapReduceGrid

MethodologyExamples

CEPH

Calvin Winkowski Distributed FileSystems 11/20

Page 12: An Overview of the Classes and Uses of Distributed Filesystems · April 24, 2014 Calvin Winkowski Distributed FileSystems 1/20. Centralized Decentralized MapReduce Grid Brief History

CentralizedDecentralized

MapReduceGrid

MethodologyExamples

Examples of Decentralized DFSs

• CEPH• GFS• GlusterFS• FhGFS• Tahoe-LAFS

Calvin Winkowski Distributed FileSystems 12/20

Page 13: An Overview of the Classes and Uses of Distributed Filesystems · April 24, 2014 Calvin Winkowski Distributed FileSystems 1/20. Centralized Decentralized MapReduce Grid Brief History

CentralizedDecentralized

MapReduceGrid

MethodologyExamplesMapReduce DFSsExamples

MapReduction

Map

• Mathematical definition• Organizing the data for consumption

Reduce

• Produce a series of values• Generating results

Calvin Winkowski Distributed FileSystems 13/20

Page 14: An Overview of the Classes and Uses of Distributed Filesystems · April 24, 2014 Calvin Winkowski Distributed FileSystems 1/20. Centralized Decentralized MapReduce Grid Brief History

CentralizedDecentralized

MapReduceGrid

MethodologyExamplesMapReduce DFSsExamples

MapReduction Example

Calvin Winkowski Distributed FileSystems 14/20

Page 15: An Overview of the Classes and Uses of Distributed Filesystems · April 24, 2014 Calvin Winkowski Distributed FileSystems 1/20. Centralized Decentralized MapReduce Grid Brief History

CentralizedDecentralized

MapReduceGrid

MethodologyExamplesMapReduce DFSsExamples

Basics of MapReduce DFSs

• Favours big data• Some centralized entry point• Master maps to storage servers• Integrate with MapReduce frameworks• Usually provides an API

Calvin Winkowski Distributed FileSystems 15/20

Page 16: An Overview of the Classes and Uses of Distributed Filesystems · April 24, 2014 Calvin Winkowski Distributed FileSystems 1/20. Centralized Decentralized MapReduce Grid Brief History

CentralizedDecentralized

MapReduceGrid

MethodologyExamplesMapReduce DFSsExamples

Hadoop

Calvin Winkowski Distributed FileSystems 16/20

Page 17: An Overview of the Classes and Uses of Distributed Filesystems · April 24, 2014 Calvin Winkowski Distributed FileSystems 1/20. Centralized Decentralized MapReduce Grid Brief History

CentralizedDecentralized

MapReduceGrid

MethodologyExamplesMapReduce DFSsExamples

Examples of MapReduce DFSs

• Hadoop• GFS (Not to be confused with GFS, GFS2, GlusterFS,

GPFS)• GloudStore• QFS

Calvin Winkowski Distributed FileSystems 17/20

Page 18: An Overview of the Classes and Uses of Distributed Filesystems · April 24, 2014 Calvin Winkowski Distributed FileSystems 1/20. Centralized Decentralized MapReduce Grid Brief History

CentralizedDecentralized

MapReduceGrid

Grid File Systems

• Take advantage of many small nodes• Often run on workstations• "Crowd source" storage• Gfarm File System• Scalable I/O• Grid computing

Calvin Winkowski Distributed FileSystems 18/20

Page 19: An Overview of the Classes and Uses of Distributed Filesystems · April 24, 2014 Calvin Winkowski Distributed FileSystems 1/20. Centralized Decentralized MapReduce Grid Brief History

CentralizedDecentralized

MapReduceGrid

Gfarm

Calvin Winkowski Distributed FileSystems 19/20

Page 20: An Overview of the Classes and Uses of Distributed Filesystems · April 24, 2014 Calvin Winkowski Distributed FileSystems 1/20. Centralized Decentralized MapReduce Grid Brief History

CentralizedDecentralized

MapReduceGrid

Questions

Calvin Winkowski Distributed FileSystems 20/20