STORAGE ARCHITECTURE / GETTING STARTED: SAN SCHOOL 101
Marc Farley
President, Building Storage, Inc.
Author, Building Storage Networks

Agenda
- Lesson 1: Basics of SANs
- Lesson 2: The I/O path
- Lesson 3: Storage subsystems
- Lesson 4: RAID, volume management and virtualization
- Lesson 5: SAN network technology
- Lesson 6: File systems
Lesson #1: Basics of storage networking

Connecting
- Networking or bus technology
- Cables + connectors
- System adapters + network device drivers
- Network devices such as hubs, switches, routers
- Virtual networking
- Flow control
- Network security
Storing
- Device (target) command and control
- Drives, subsystems, device emulation
- Block storage address space manipulation (partition management)
  - Mirroring
  - RAID
  - Striping
  - Virtualization
  - Concatenation
Filing
- Namespace presents data to end users and applications as files and directories (folders)
- Manages use of storage address spaces
- Metadata for identifying data
  - file name
  - owner
  - dates
Connecting, storing and filing as a complete storage system

NAS and SAN analysis
- NAS is filing over a network
- SAN is storing over a network
- NAS and SAN are independent technologies
  - They can be implemented independently
  - They can co-exist in the same environment
  - They can both operate and provide services to the same users/applications
Protocol analysis for NAS and SAN
[Diagram: NAS layers filing over connecting; SAN layers storing over connecting]

Integrated SAN/NAS environment
[Diagram: a NAS head combines a NAS server (filing over connecting) with a SAN initiator (storing over connecting), allowing common wiring for NAS and SAN]
Lesson #2: The I/O path

Host hardware path components
- Processor
- Memory Bus
- System I/O Bus
- Storage Adapter (HBA)
- Memory

Host software path components
- Application
- Operating System
- Filing System
- Volume Manager
- Device Driver
- Multi-Pathing
- Cache Manager
Network hardware path components
- Cabling
  - Fiber optic
  - Copper
- Switches, hubs, routers, bridges, gateways
- Port buffers, processors
- Backplane, bus, crossbar, mesh, memory

Network software path components
- Routing
- Fabric Services
- Virtual Networking
- Access and Security
- Flow Control

Subsystem path components
- Network Ports
- Access and Security
- Internal Bus or Network
- Cache
- Resource Manager

Device and media path components
- Disk drives
- Tape drives
- Solid state devices
- Tape media
The end-to-end I/O path picture
[Diagram: the complete path from the host (processor, memory bus, system I/O bus, HBA; application, operating system, filing system, volume manager, device driver, multi-pathing, cache manager) through the network (cabling, routing, fabric services, virtual networking, access and security, flow control) to the subsystem (network ports, access and security, internal bus or network, cache, resource manager) and on to disk and tape drives]
Lesson #3: Storage subsystems

Generic storage subsystem model
- Network Ports
- Cache Memory
- Controller (logic + processors)
  - Access control
  - Resource manager
- Internal Bus or Network
- Storage Resources
- Power

Redundancy for high availability
- Multiple hot-swappable power supplies
- Hot-swappable cooling fans
- Data redundancy via RAID
- Multi-path support
- Network ports to storage resources
Physical and virtual storage
[Diagram: the subsystem controller's resource manager (RAID, mirroring, etc.) presents virtual storage built from physical devices, including a hot spare device]
SCSI communications architectures determine SAN operations
- SCSI communications are independent of connectivity
- SCSI initiators (HBAs) generate I/O activity
  - They communicate with targets
- Targets have communications addresses
- Targets can have many storage resources
  - Each resource is a single SCSI logical unit (LU) with a universally unique ID (UUID), sometimes referred to as a serial number
  - An LU can be represented by multiple logical unit numbers (LUNs)
- Provisioning associates LUNs with LUs & subsystem ports
- A storage resource is not a LUN, it's an LU
Provisioning storage
[Diagram: controller functions map LUNs 0-3 through subsystem ports S1-S4 onto the underlying LUs]
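The LUN-to-LU mapping described above can be sketched as a small table. This is a toy illustration, not a real subsystem API; all names are hypothetical.

```python
# Toy sketch of SCSI provisioning: each logical unit (LU) has a unique ID,
# and provisioning maps (port, LUN number) pairs onto LUs. The same LU can
# be exposed on several ports under different LUN numbers.

class Subsystem:
    def __init__(self):
        self.lus = {}   # uuid -> description of the storage resource
        self.map = {}   # (port, lun_number) -> uuid

    def add_lu(self, uuid, description):
        self.lus[uuid] = description

    def provision(self, port, lun_number, uuid):
        self.map[(port, lun_number)] = uuid

    def resolve(self, port, lun_number):
        return self.map.get((port, lun_number))

sub = Subsystem()
sub.add_lu("uuid-0001", "RAID 5 array")
sub.provision("S1", 0, "uuid-0001")   # exposed as LUN 0 on port S1
sub.provision("S2", 3, "uuid-0001")   # the same LU as LUN 3 on port S2
assert sub.resolve("S1", 0) == sub.resolve("S2", 3)
```

The assertion makes the slide's point concrete: two different LUNs can resolve to one and the same LU.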
Multipathing
[Diagram: multipathing software (MP SW) in the host reaches the same LU over two paths, each presented as LUN X]
Caching
- Read caches
  1. Recently Used
  2. Read Ahead
- Write caches
  1. Write Through (to disk)
  2. Write Back (from cache)
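The write-through / write-back distinction can be sketched in a few lines. A toy model only: the dict `disk` stands in for the backing store, and no eviction or read caching is modeled.

```python
# Write-through: the backing store is updated at write time.
# Write-back: the write is held dirty in cache and reaches disk on flush.

class WriteCache:
    def __init__(self, write_back=False):
        self.disk = {}
        self.cache = {}
        self.dirty = set()
        self.write_back = write_back

    def write(self, block, data):
        self.cache[block] = data
        if self.write_back:
            self.dirty.add(block)      # acknowledged before reaching disk
        else:
            self.disk[block] = data    # write-through: disk updated immediately

    def flush(self):
        for block in self.dirty:
            self.disk[block] = self.cache[block]
        self.dirty.clear()

wt = WriteCache(write_back=False)
wt.write(7, b"data")
assert wt.disk[7] == b"data"           # already on disk

wb = WriteCache(write_back=True)
wb.write(7, b"data")
assert 7 not in wb.disk                # still only in cache
wb.flush()
assert wb.disk[7] == b"data"
```

This also shows why write-back caches need protection (battery backup, mirrored cache): until the flush, the only copy of the data is in cache memory.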
Tape subsystems
[Diagram: a tape subsystem controller with a robot, tape slots and multiple tape drives]
Subsystem management
- Out-of-band management: Ethernet/TCP/IP management port to a management station (browser-based or network mgmt software)
- In-band management: over the storage I/O path, via an exported storage resource
- Now with SMI-S
Data redundancy
- Duplication: 2n
- Parity: n+1
- Difference: d(x) = f(x) − f(x−1)
Duplication redundancy with mirroring
- Host-based or within a subsystem
- A mirroring operator on the I/O path terminates each I/O and regenerates new I/Os on paths A and B
- Error recovery/notification
Duplication redundancy with remote copy
[Diagram: host writes are copied uni-directionally (writes only) from the local subsystem to a remote subsystem]
Point-in-time snapshot
[Diagram: the host continues writing to the subsystem while a snapshot preserves the point-in-time image of the data]
Lesson #4: RAID, volume management and virtualization

RAID = parity redundancy
- Duplication: 2n
- Parity: n+1
- Difference: d(x) = f(x) − f(x−1)
History of RAID
- Late 1980s R&D project at UC Berkeley
  - David Patterson
  - Garth Gibson (independent)
- Redundant array of inexpensive disks
- Striping without redundancy was not defined (RAID 0)
- Original goals were to reduce the cost and increase the capacity of large disk storage

Benefits of RAID
- Capacity scaling
  - Combine multiple address spaces as a single virtual address
- Performance through parallelism
  - Spread I/Os over multiple disk spindles
- Reliability/availability with redundancy
  - Disk mirroring (striping to 2 disks)
  - Parity RAID (striping to more than 2 disks)
Capacity scaling
[Diagram: the RAID controller (resource manager) combines storage extents 1-12 into a single virtual address space]
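Combining extents into one virtual address space amounts to a simple address translation. A minimal sketch of concatenation (extent sizes and block addressing are illustrative):

```python
# Resolve a virtual block address to (extent index, offset within extent)
# by walking the concatenated extents in order.

def resolve(vblock, extent_sizes):
    for i, size in enumerate(extent_sizes):
        if vblock < size:
            return (i, vblock)
        vblock -= size
    raise ValueError("address beyond combined capacity")

extents = [100, 100, 50]               # three extents, 250 blocks total
assert resolve(0, extents) == (0, 0)
assert resolve(150, extents) == (1, 50)
assert resolve(249, extents) == (2, 49)
```

Striping differs only in the mapping function (round-robin across extents rather than end-to-end), but the principle is the same: the host sees one flat address space.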
Performance
- RAID controller: microsecond performance
- Disk drives: millisecond performance, from rotational latency and seek time
Parity redundancy
- RAID arrays use XOR for calculating parity

  Operand 1   Operand 2   XOR Result
  False       False       False
  False       True        True
  True        False       True
  True        True        False

- XOR is the inverse of itself
  - Apply XOR in the table above from right to left
  - Apply XOR to any two columns to get the third
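The "any two columns give the third" property is exactly what makes parity RAID recoverable. A short sketch over byte strings (member contents are arbitrary example data):

```python
from functools import reduce

# XOR corresponding bytes of any number of equal-length blocks.
def xor_blocks(*blocks):
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

m1, m2, m3 = b"\x0f\x12", b"\xf0\x34", b"\x55\x56"
parity = xor_blocks(m1, m2, m3)        # written to the parity member

# If member 2 fails, its data is regenerated from the survivors + parity,
# which is what happens on every read in reduced mode and during a rebuild.
rebuilt_m2 = xor_blocks(m1, m3, parity)
assert rebuilt_m2 == m2
```

The same computation serves both reduced-mode reads and parity rebuilds; the performance cost comes from having to read all surviving members for each reconstructed block.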
Reduced mode operations
- When a member is missing, data that is accessed must be reconstructed with XOR: XOR {M1, M2, M3, P}
- An array that is reconstructing data is said to be operating in reduced mode
- System performance during reduced mode operations can be significantly reduced

Parity rebuild
- The process of recreating data on a replacement member is called a parity rebuild: XOR {M1, M2, M3, P}
- Parity rebuilds are often scheduled for non-production hours because performance disruptions can be so severe
Hybrid RAID: 0+1, 10
[Diagram: RAID controller stripes data blocks 1-5 across mirrored pairs of striped members]
Volume management and virtualization
- Storing-level functions
- Provide RAID-like functionality in host systems and SAN network systems
- Aggregation of storage resources for:
  - scalability
  - availability
  - cost / efficiency
  - manageability
Volume management
- RAID & partition management
- Device driver layer between the kernel and storage I/O drivers
[Diagram: OS kernel → file system → volume manager → HBA drivers → HBAs]
[Diagram: a server system's volume manager presents virtual storage built from both SAN disk resources (via a SAN HBA, SAN cable and SAN switch) and SCSI disk resources (via a SCSI HBA and SCSI bus)]
Volume managers can use all available connections and resources and can span multiple SANs as well as SCSI and SAN resources.
SAN storage virtualization
- RAID and partition management in SAN systems
- Two architectures:
  - In-band virtualization (synchronous)
  - Out-of-band virtualization (asynchronous)
In-band virtualization
[Diagram: system(s), switch or router in the I/O path export virtual storage built from disk subsystems]
Out-of-band virtualization
- Distributed volume management
- Virtualization agents are managed from a central system in the SAN
[Diagram: virtualization agents in the hosts access the disk subsystems directly]
Lesson #5: SAN networks

Fibre Channel
- The first major SAN networking technology
- Very low latency
- High reliability
- Fiber optic cables
- Copper cables
- Extended distance
- 1, 2 or 4 Gb transmission speeds
- Strongly typed
Fibre Channel fabrics
- A Fibre Channel fabric presents a consistent interface and set of services across all switches in a network
- Hosts and subsystems all 'see' the same resources
Fibre Channel port definitions
- FC ports are defined by their network role
- N-ports: end node ports connecting to fabrics
- L-ports: end node ports connecting to loops
- NL-ports: end node ports connecting to fabrics or loops
- F-ports: switch ports connecting to N-ports
- FL-ports: switch ports connecting to N-ports or NL-ports in a loop
- E-ports: switch ports connecting to other switch ports
- G-ports: generic switch ports that can be F, FL or E ports
Ethernet / TCP / IP SAN technologies
- Leveraging the installed base of Ethernet and TCP/IP networks
- iSCSI: native SAN over IP
- FC/IP: FC SAN extensions over IP
iSCSI
- Native storage I/O over TCP/IP
- New industry standard
- Locally over Gigabit Ethernet
- Remotely over ATM, SONET, 10Gb Ethernet
[Protocol stack: iSCSI / TCP / IP / MAC / PHY]

iSCSI equipment
- Storage NICs (HBAs)
- SCSI drivers
- Cables: copper and fiber
- Network systems: switches/routers, firewalls
FC/IP
- Extending FC SANs over TCP/IP networks
- FCIP gateways operate as virtual E-port connections
- FCIP creates a single fabric where all resources appear to be local
[Diagram: two FCIP gateways bridge E-ports across a TCP/IP LAN, MAN or WAN]
SAN switching & fabrics
- High-end SAN switches have latencies of 1-3 µsec
- Transaction processing requires lowest latency
  - Most other applications do not
- Transaction processing requires non-blocking switches
  - No internal delays preventing data transfers
Switches and directors
- Switches
  - 8 - 48 ports
  - Redundant power supplies
  - Single system supervisor
- Directors
  - 64+ ports
  - HA redundancy
  - Dual system supervisors
  - Live SW upgrades
SAN topologies
- Star: simplest, single hop
- Dual star: simple network + redundancy, single hop, independent or integrated fabric(s)
- N-wide star: scalable, single hop, independent or integrated fabric(s)
- Core-edge: scalable, 1 - 3 hops, integrated fabric
- Ring: scalable, integrated fabric, 1 to N/2 hops
- Ring + star: scalable, integrated fabric, 1 to 3 hops
Lesson #6: File systems

File system functions
[Diagram: file systems span the filing and storing functions]
- Think of the storage address space as a sequence of storage locations (a flat address space)
Superblocks
- Superblocks are known addresses used to find file system roots (and mount the file system)
[Diagram: superblock (SB) copies at known locations in the address space]
Filing and scaling
- File systems must have a known and dependable address space
- The fine print in scalability: how does the filing function know about the new storing address space?
[Diagram: a 25-location address space (5 rows of 5) grows to a 42-location address space (7 rows of 6); the filing function must learn about the added locations]
Lesson #2: SCSI's role in storage networking
- Legacy open systems server storage
  - Physical parallel bus
  - Independent master/slave protocol
- Storing in SANs
  - Compatibility requirements with system software force the use of the SCSI protocol
- Storing and wiring in NAS
  - SCSI and ATA (IDE) used with NAS
Parallel SCSI bus technologies
- 8-bit and 16-bit (narrow and wide)
- Single-ended, differential, low voltage differential (LVD) electronics
- 5, 10, 20, 40, 80, 160 and 320 MB/s
  - Ultra SCSI 3 is 320 MB/s
- Distances vary from 3 to 25 meters
  - Current LVD SCSI is 12 meters
- A bus, with address lines and data lines
SCSI command protocol
- Master/slave relationships
  - host = master, device = slave
- Independent of physical connectivity
- CDBs = Command Descriptor Blocks
  - Command format
  - Used for both device operations and data transfers
- Serial SCSI standard created and implemented as:
  - Fibre Channel Protocol (FCP)
  - iSCSI
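To make the CDB idea concrete, here is a sketch of packing a READ(10) CDB, one of the most common SCSI commands: opcode 0x28, a 4-byte big-endian logical block address, and a 2-byte transfer length in blocks (flags and control bytes left zero for simplicity). The field layout follows the SCSI command set; the function name is ours.

```python
import struct

# READ(10) CDB layout: byte 0 = opcode 0x28, byte 1 = flags,
# bytes 2-5 = LBA (big-endian), byte 6 = group, bytes 7-8 = transfer
# length in blocks (big-endian), byte 9 = control.
def read10_cdb(lba, num_blocks):
    return struct.pack(">BBIBHB", 0x28, 0, lba, 0, num_blocks, 0)

cdb = read10_cdb(lba=2048, num_blocks=8)
assert len(cdb) == 10
assert cdb[0] == 0x28
```

Whether this CDB travels over a parallel bus, FCP or iSCSI, the bytes are the same; that is what "independent of physical connectivity" means in practice.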
SCSI addressing model
- 16 bus addresses with LUN sub-addressing
[Diagram: host system addresses a target storage subsystem by target ID, then by LUN]

SCSI daisy chain connectivity
[Diagram: the host system connects through in/out storage interfaces on a chain of target devices or subsystems]
SCSI arbitration
- Host system at ID 7; targets at IDs 6, 5, 4, 3, 2, 1, 0
- The highest-numbered address 'wins' arbitration to access the bus next
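The arbitration rule is a one-liner, sketched here for illustration:

```python
# Parallel SCSI bus arbitration: every device wanting the bus asserts its
# ID, and the highest ID wins -- which is why the host adapter
# conventionally takes ID 7.

def arbitrate(requesting_ids):
    return max(requesting_ids)

assert arbitrate([2, 5, 7]) == 7   # the HBA at ID 7 always wins
assert arbitrate([0, 3]) == 3
```

A consequence worth noting: low-ID devices can be starved on a busy bus, since they only win when no higher-ID device is contending.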
SCSI resource discovery
- SCSI inquiry CDB: "tell me your resources"
- There is no domain server concept in SCSI
[Diagram: the host system queries a target storage subsystem for its LUNs]

SCSI performance capabilities
- Overlapped I/O
- Tagged command queuing (reshuffled I/Os)
[Diagram: write, read and status operations interleaved on the bus]

Parallel SCSI bus shortcomings
- Bus length
- Servers and storage tied together
  - Single initiator; access to data depends on the server
- A standard full of variations
  - Change is the only constant
Lesson #4: Disk drives
- Disk drive components
- Areal density
- Rotational latency
- Seek time
- Buffer memory
- Dual porting

Disk drives
- Complex electro-mechanical devices
  - Media
  - Motor and speed control logic
  - Bearings and suspension
  - Actuator (arm)
  - Read/write heads
  - Read/write channels
  - I/O controller (external interface + internal operations)
  - Buffer memory
  - Power

Disk drive areal density
- Amount of signal per unit area of media
- Keeps pace with Moore's law
  - Areal density doubles approximately every 18 months
- Increasingly smaller magnetic particles
- Continued refinement of head technology
- Electro-magnetic physics research
Rotational latency
- Time for data on media to rotate underneath the heads
  - Faster rotational speed = lower rotational latency
  - 2 to 10 milliseconds are common
- Application-level I/O operations can generate multiple disk accesses, each impacted by rotational latency
- Access-time scale: memory (nanoseconds, 10⁻⁹), SAN switch (microseconds, 10⁻⁶), disk drive (milliseconds, 10⁻³)
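The relationship between spindle speed and latency is simple arithmetic: on average the head waits half a revolution for the target sector.

```python
# Average rotational latency = half a revolution = (60 / RPM) / 2 seconds.
def avg_rotational_latency_ms(rpm):
    return (60.0 / rpm) / 2 * 1000

assert round(avg_rotational_latency_ms(7200), 2) == 4.17   # desktop-class drive
assert avg_rotational_latency_ms(15000) == 2.0             # high-end drive
```

This is why, as the slide notes, an application operation that fans out into several disk accesses pays the latency penalty several times over.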
Rotational latency & filing systems
- Filing systems (file systems and databases) determine contiguous data lengths
- Block size definitions: 512, 2K, 4K, 16K, 512K, 2M

Seek time
- Time needed to position the actuator over the track
- Equivalent to rotational latency in time
[Diagram: disk head on the actuator positioned over the disk media]

Disk drive buffer memory
- FIFO memory for data transfers
  - Not cache
- Overcomes mechanical latencies with faster memory storage
- Enables overlapped I/Os to multiple drives
- Performance metrics
  - Burst transfer rate = transfer in/out of buffer memory
  - Sustained transfer rate = transfer with track changes

Dual-ported disk drives
- Redundant connectivity interfaces
- Only FC to date
[Diagram: controllers A and B each connect to one port of the drive]
Forms of data redundancy
- Duplication: 2n
- Parity: n+1
- Difference: d(x) = f(x) − f(x−1)
Business continuity
- 24 x 7 data access is the goal
  - 5 nines through planning and luck
- There are many potential threats
  - People
  - Power
  - Natural disasters
  - Fires
- Redundancy is the key
  - Multiple techniques cover different threats

Lesson #8: Backup and recovery
- Removable media, usually tape
  - Removable redundancy
- Backup systems
- Backup operations
- Media rotation
- Backup metadata
- Backup challenges
Forms of data redundancy in backup
- Duplication: 2n
- Parity: n+1
- Difference: d(x) = f(x) − f(x−1)
Backup and recovery tape media
- Magnetic 'ribbon'
  - Multiple layers of backing, adhesive, magnetic particles and lube/coating
  - Corrodes and cracks
  - Requires near-perfect conditions for long-term storage
- Sequential access
  - Slow load and seek times
  - Reasonable transfer rates
  - Can hold multiple versions of files
Tape drives and formats
- Two primary geometries
  - Longitudinal tracking
  - Helical tracking
- Highly differentiated
  - Speeds (3MB/s to 30MB/s)
  - Capacities (20GB to 160GB)
  - Physical formats (layouts): 4mm, 8mm, ½ inch, DLT, LTO, 19mm
  - Cartridge construction, tape lengths
- Compatibility is a constant issue
- Mostly parallel SCSI
Longitudinal tracking
- Parallel data tracks written lengthwise on tape by a 'stack' of heads
- Technologies: DLT, SDLT, LTO, QIC

Helical tracking
- Single data tracks written diagonally across tape by a rotating cylindrical head assembly
- Technologies: 4mm, 8mm, 19mm
Tape subsystems: libraries & autoloaders
[Diagram: a tape subsystem controller with a robot, tape drives and tape slots]
Generic backup system components
- I/O bus/network subsystem
- Work scheduler & manager
- Data mover
- Metadata (database or catalog)
- Media manager (rotation scheduler)
- File system and database backup agents
Generic network backup system
[Diagram: backup agents on file, web, DB and app servers send data over the Ethernet network to a backup server running the work scheduler, data mover, metadata system and media manager, which writes to tape drive(s) or a tape subsystem over a SCSI bus]
Backup operations
- Full (all data)
  - Longest backup operations
  - Usually done over/on weekends
  - Easiest recovery, with 1 tape set
- Incremental (changed data)
  - Shortest backup operation
  - Often done on days of the week
  - Most involved recovery
- Differential (accumulated changed data)
  - Compromise for easier backups and recovery
  - Max 2 tape set restore

Backup operations and data redundancy
- Full = duplication redundancy
  - One backup for complete redundancy
- Incremental = difference redundancy
  - Multiple backups for complete redundancy
- Differential = difference redundancy
  - Two backups for complete redundancy
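The restore-set sizes above can be sketched as a small scheduling exercise. A toy model: backups are listed oldest-first and tape-set names are illustrative.

```python
# Which tape sets does a restore need? A full anchors every scheme; every
# incremental since the full is needed, but only the latest differential.

def restore_set(backups):
    last_full = max(i for i, (kind, _) in enumerate(backups) if kind == "full")
    needed = [backups[last_full]]
    for kind, name in backups[last_full + 1:]:
        if kind == "incremental":
            needed.append((kind, name))                   # all incrementals
        elif kind == "differential":
            needed = [backups[last_full], (kind, name)]   # latest one only
    return [name for _, name in needed]

incr = [("full", "Sun"), ("incremental", "Mon"), ("incremental", "Tue")]
diff = [("full", "Sun"), ("differential", "Mon"), ("differential", "Tue")]
assert restore_set(incr) == ["Sun", "Mon", "Tue"]   # full + every incremental
assert restore_set(diff) == ["Sun", "Tue"]          # max 2 tape sets
```

This is the trade-off the slides describe: incrementals minimize backup time but stretch the restore chain, while differentials cap any restore at two tape sets.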
Media rotations
- Change of tapes with common names and purposes
  - Tape sets, not individual tapes
- Backup job schedules anticipate certain tapes
  - Monday, Tuesday, Wednesday, etc.
  - Even days, odd days
  - 1st Friday, 2nd Friday, etc.
  - January, February, March, etc.
  - 1st Qtr, 2nd Qtr, etc.
Media rotation problems
- What happens when the wrong tapes are used by mistake?
  - Say you use last Friday's tape on the next Tuesday
  - Data you might need to restore sometime can be overwritten!
- Backup system logic may have to choose between:
  - Not completing the backup (restore will fail)
  - Deleting older backup files (restore might fail)
Backup metadata
- A database for locating data on tape:
  - Version: create/modify date & size
  - Date/time of backup job
  - Tape names & backup job ID on tape
  - Owner
  - Delete records (don't restore deleted data!)
- Transaction processing during backup
  - Many small files create heavy processor loads
  - This is where backup fails to scale
- Backup databases need to be pruned
  - Performance and capacity problems

Traditional backup challenges
- Completing backups within the backup window
  - Backup window = time allotted for daily backups
  - Starts after daily processing finishes
  - Ends before next day's processing begins
- Media management and administration
  - Thousands of tapes to manage
  - Audit requirements are increasing
  - On/offsite movement for disaster protection
- Balancing backup time against restore complexity
LAN-free backup in SANs
[Diagram: file, web, DB and app servers run backup software and use the Ethernet client network (LAN) for control, while backup data moves through a SAN switch to tape drives or a tape subsystem (SAN)]

Advantages of LAN-free backup
- Consolidated resources (especially media)
- Centralized administration
- Performance
  - Offloads LAN traffic
- Platform optimization
Path management
- Dual pathing
- Zoning
- LUN masking
- Reserve / release
- Routing
- Virtual networking

Dual pathing
- System software for redundant paths
- Path management is a super-driver process
  - Redirects I/O traffic over a different path to the same storage resource
  - Typically invoked after SCSI timeout errors
- Active/active or active/passive
  - Static load balancing only
Zoning 1
- I/O segregation
- Switch function that restricts forwarding
- Zone membership is based on port (port zoning) or address (address zoning)
[Diagram: Zone 1 (Addr 1, Addr 2), Zone 2 (Addr 3, Addr 4), Zone 3 (Addr 5, Addr 6)]
Zoning 2
- Address zoning allows nodes to belong to more than one zone
  - For example, tape subsystems can belong to all zones
- Zone 1: Addr 1 (server A), Addr 2 (disk subsystem port target address A), Addr 7 (tape subsystem port target address A)
- Zone 2: Addr 3 (server B), Addr 4 (disk subsystem port target address B), Addr 7 (tape subsystem port target address A)
- Zone 3: Addr 5 (server C), Addr 6 (disk subsystem port target address C), Addr 7 (tape subsystem port target address A)
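The forwarding rule behind address zoning is simply "two addresses may talk only if they share a zone." A sketch using the zone layout from the example above (addresses are the slide's illustrative ones):

```python
# Address zoning: forwarding is permitted only between members of a
# common zone. Addr 7 (the tape subsystem) is in every zone.

zones = {
    "zone1": {"addr1", "addr2", "addr7"},
    "zone2": {"addr3", "addr4", "addr7"},
    "zone3": {"addr5", "addr6", "addr7"},
}

def can_forward(src, dst):
    return any(src in z and dst in z for z in zones.values())

assert can_forward("addr1", "addr7")       # every server reaches the tape subsystem
assert can_forward("addr3", "addr4")       # server B reaches its own disk
assert not can_forward("addr1", "addr4")   # but not another zone's disk
```

Overlapping membership is what port zoning cannot express: a port sits in exactly one port zone, whereas an address can appear in as many zones as needed.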
Zoning 3: changing zones
- Zones (or zone memberships) can be 'swapped' to reflect different operating environments
LUN masking
- Restricts subsystem access to defined servers
- Target- or LUN-level masking
- Non-response to the SCSI inquiry CDB
- Can be used with zoning for multi-level control
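The "non-response to inquiry" behavior can be sketched as a lookup in a mask table. A toy model; initiator names and LUN numbers are illustrative.

```python
# LUN masking: the subsystem answers a SCSI inquiry only for
# (initiator, LUN) pairs listed in its mask table.

mask_table = {
    "server_a": {0, 1},   # server A may see LUNs 0 and 1
    "server_b": {2},      # server B may see only LUN 2
}

def inquiry(initiator, lun):
    if lun in mask_table.get(initiator, set()):
        return "LU data"
    return None            # non-response: the LUN is invisible

assert inquiry("server_a", 1) == "LU data"
assert inquiry("server_b", 1) is None   # masked: LUN 1 hidden from server B
```

Because masking happens in the subsystem while zoning happens in the switch, combining the two gives the multi-level control the slide mentions: the fabric limits who can reach the port, and the subsystem limits what they see there.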
Reserve / release
- SCSI function
- Typically implemented in SCSI/SAN storage routers
- Used to reserve tape resources during backups
  - Tape drives
  - Robotics
[Diagram: a first access reserves the resource through the storage router; a second access is blocked until release]
Routing
- Path decisions made by switches
- Large TCP/IP networks require routing in switches instead of in end nodes
- Looping is avoided by protocols that ensure loop-free paths, such as spanning tree in Ethernet
- FSPF (Fabric Shortest Path First) is the path selection protocol for Fibre Channel
- Routing is not HA failover technology
Name space
- The name space is the representation of data to end users and applications
- Identification and searching
- Organizational structure
  - Directories or folders in file systems
  - Rows and columns in databases
- Associations of data
  - Database indexing
  - File system linking

Metadata and access control (security)
- Metadata is the description of data
  - Intrinsic information and accounting information
- Access control determines how (or if) a user or application can use the data
  - For example, read-only
- Access control is often incorporated with metadata but can be a separate function
- Data has attributes that describe it
- Storage is managed based on data attributes
  - Activity info
  - Owner info
  - Capacity info
  - Whatever info
- Data can have security associated with it
- Data can be erased, copied, renamed, etc.

Locking
- Managing multiple users or applications with concurrent access to data
- Locking has been done in multi-user systems for decades
- Locking in NAS has been a central issue
  - NFS advisory locks provide no guarantees
  - CIFS oplocks are enforced
  - Lock persistence
File systems organize data in blocks
- Blocks are SCSI's address abstraction layer
- Filing functions use block addresses to communicate with storing-level entities
- Filing systems manage the utilization of block address spaces (space management)
- Block address structures typically are uniform
- Block address boundaries are static for efficient and error-free space management
Journaling
- File system structure has to be verified when mounting (fsck)
  - fsck can take hours on large file systems
- Journaling file systems keep a log of file system updates
- Like a database log file, journal updates can be checked against actual structures
- Incomplete updates can be rolled forward or backward to maintain system integrity
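The roll-forward/roll-back idea can be sketched with a toy journal replay at mount time. This is a simplification (real journals log block images or operations, not key/value pairs), with all names illustrative.

```python
# Toy journal replay: committed-but-unapplied records roll forward into the
# file system state; uncommitted records (a crash mid-update) are discarded.
# Either way, only the journal is scanned -- not the whole structure, as fsck must.

def mount_replay(journal, fs_state):
    for rec in journal:
        if rec["committed"]:
            fs_state[rec["key"]] = rec["value"]   # roll forward
        # uncommitted records are simply ignored (rolled back)
    return fs_state

journal = [
    {"key": "/a", "value": "inode-1", "committed": True},
    {"key": "/b", "value": "inode-2", "committed": False},  # crash mid-update
]
state = mount_replay(journal, {})
assert state == {"/a": "inode-1"}
```

The mount-time cost is proportional to the journal length, not the file system size, which is why journaling replaces hours of fsck with seconds of replay.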
V/VM and filing
- Filing is a filing function
- Virtualization & volume management (V/VM) is a storing function
- V/VM manipulates block addresses and creates real and virtual address spaces
- Filing manages the placement of data in the address spaces exported by virtualization