2012 Storage Developer Conference. © Microsoft Corporation. All Rights Reserved.
Testing 'Continuously Available' File Servers: An end-to-end service viewpoint
Tsan Zheng, Aniket Malatpure
Microsoft Corporation
Agenda
• ‘Continuously Available’ file server overview
• Problem space & testing goals
• Test methodology
• Test model
• Test infrastructure
• Q & A
‘Continuously Available’ File Server
Continuous Availability
• Transparent failover of application data storage
• Application sees contained IO delay
Value propositions
• Servicing without downtime
• Reliable low-cost file storage
Deployment Scenarios
• Server application storage platform
• File Server consolidation
• Virtual Desktop Infrastructure
Deployment Variations
• Multiple customer segments (enterprises, hosters)
• Multiple networking configurations (Ethernet, InfiniBand, etc.)
• Multiple storage options (JBOD, RAID, SAS, SATA, FC, etc.)
[Figure: Hyper-V, SQL, IIS, etc. access \\fs1\share, which is served by File Server Node A and File Server Node B on a shared disk.]
Continuous Availability: Scenarios
[Figure: deployment scenarios. Application servers (Hyper-V clusters, SQL Server, information workers, a DPM server for backup; App Server Node 1 to Node N running apps and clustered apps) connect through two switches, with NIC1 and NIC2 on each node, to File Server Node 1 to Node N (Node A to Node H). Resource Group A has a leader and a clone, each with file share(s) and a distributed network name; VIP resource groups hold IP address A and IP address B. Shared storage with SAS drives hosts one storage pool per resource group (A and B). SS: StorageSpaces.]
Continuous Availability: Variations
[Figure: App Server Node 1 to Node N and File Server Node 1 to Node N, each with NIC1 and NIC2, connected through two switches; the file server nodes attach to shared storage.]
Application workloads
• Hyper-V
• SQL Server
• Information Worker
• IIS
• …
Networking configurations
• DCB
• NIC Teaming
• RDMA
• IPSec
File Server & Storage Configurations
File servers
• Clustered
• Scale-out
File systems
• NTFS
• ReFS
RAID solutions
• StorageSpaces
• PCI RAID
• RBOD
Testing end-to-end scenarios with ‘Continuously Available’ file servers
Personae
• Client applications
• System administrators
Experiences
• Ability to smoothly migrate workload
• Increased network bandwidth
• Fast and efficient file access
• Scalable application file access
• Reliable crash recovery
• Resiliency to storage corruptions
• Resiliency to storage failures
SLA
• [C] Zero application client error
• [C] Application client response time
• [C] Increased network bandwidth utilization
• [S] Scalability w.r.t. number of nodes
• [S] Time to full crash recovery
• [S] Ease of use: configure, manage, diagnose
Problem space & testing goals
Problem space
• Complex module inter-ops
• Application specifics
• Sensitive timing conditions
• Hardware variations
• Software and hardware updates from various sources
Testing goals
• Test with real-world configurations
• Test with real-world operations
• Assess service availability for a long period of time
Test methodology
Black-box validation
o Various personae interacting with the service
o SLA between persona and service
  • Client applications (quantitative)
  • Admin (qualitative)
White-box validation
o Service health monitoring and validation
Service-Centered Approach
[Figure: Client and Administrator personae interacting with the Service.]
Evaluating Experiences And Services
[Figure: experience-by-service matrix. Services: VM Hosting, Database transactions, Web Hosting. Experiences: Mobility, Memory error recovery, Resiliency against network component failures. Also labeled: Overall result/coverage.]
Concept: Validate experiences relevant to different persona in the context of services
1 Defining Services
Modeling “typical” customer deployments
o First step: customer engagement and survey
o Second step: service modeling
Covering various configurations
o Different ways to implement a service depending on business needs
o Modeled by different “profiles”: different hardware and software configurations
2 Defining Experiences
Experience:
o Ability provided to a specific persona by the product to perform a task
Use case:
o Precise set of steps relevant to an experience and performed by a persona
SLA:
o In context of experience, persona and use cases
Covering all phases of the I.T. lifecycle
o Setup and deployment
o Operating: managing, monitoring, troubleshooting
o De-commissioning
3 Measuring Success
Meeting persona expectations
o Client SLA (quantitative)
o Admin SLA (qualitative)
Staying healthy
o Monitoring and measuring the health of the system’s components: SCOM alerts, performance counters
Lifecycle acceleration and SLA projection
Continuously Available File Servers: Experiences, use cases and models
Modeling service
Service
  Persona, SLA, Experience
    Use case
Testing object model
Test profile
  Test topology
    Roles, Configurations
  Test scenarios
    Action group
      Action, Fault, SLA test
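The testing object model above can be sketched as a handful of Python dataclasses. This is only an illustration: the class and field names (Topology, Profile, kind, count_by_kind) are assumptions made for the sketch and do not come from the actual test harness, and the sample values are borrowed from the Hyper-V over SMB example in the appendix.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Action:
    name: str
    kind: str = "action"  # one of "action", "fault", "sla_test" (hypothetical labels)


@dataclass
class ActionGroup:
    name: str
    actions: List[Action] = field(default_factory=list)


@dataclass
class Topology:
    roles: List[str]
    configurations: List[str]


@dataclass
class Profile:
    topology: Topology
    scenarios: List[ActionGroup] = field(default_factory=list)

    def count_by_kind(self, kind: str) -> int:
        """Count actions of one kind across all action groups."""
        return sum(1 for g in self.scenarios for a in g.actions if a.kind == kind)


# Sample values taken from the Hyper-V over SMB example in the appendix.
profile = Profile(
    topology=Topology(
        roles=["file server node", "Hyper-V node", "DPM server"],
        configurations=["2-node file server cluster", "dual 10G NICs"],
    ),
    scenarios=[
        ActionGroup(
            name="data protection",
            actions=[
                Action("start VMs"),
                Action("back up VMs"),
                Action("unexpected reboot of hosting node", kind="fault"),
                Action("verify VM client access", kind="sla_test"),
            ],
        )
    ],
)
print(profile.count_by_kind("fault"), "fault(s) in profile")
```

Keeping faults and SLA tests as kinds of action, rather than separate class hierarchies, is one possible design; it lets a scheduler treat every step uniformly.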
Testing end-to-end scenarios with ‘Continuously Available’ File Servers (cont.)
Actions
Application based actions
• Hyper-V: live migration, storage migration, snapshot/restore, start/stop/pause/resume
• SQL Server: DB backup/restore, DBCC, BCP, create/delete
• Information worker: DFSR, DFSN, Quota, classification
Common actions
• Networking: NIC teaming/un-teaming, NIC swap
• Storage: array rebuilding, disk swap, Dedup, BitLocker, chkdsk
• Clustering: planned fail-over, patching
Faults
Networking
• NIC failure (disable/enable adapters)
• Packet loss
• Packet delay
Storage
• Meta-data corruption
• User-data corruption
Clustering
• Power loss
• Low memory
SLA test
Application
• No data access failure
• Application specific performance goals
Networking
• Multi adapter/channel
• Throughput
• Utilization
Clustering
• Clustered file server up time
• Fail-over completion time
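As a rough illustration of a client-side SLA test, the sketch below checks two of the metrics named in this deck: zero application client errors and a bounded client response time during fail-over. The IoSample record, the function name and the 30-second threshold are hypothetical; the slides do not state concrete thresholds.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class IoSample:
    timestamp_s: float  # seconds since scenario start
    ok: bool            # False if the client saw an I/O error
    latency_s: float    # observed response time for this request


def evaluate_client_sla(samples: List[IoSample], max_latency_s: float) -> Dict[str, object]:
    """Summarize client errors and worst-case latency against a threshold."""
    errors = sum(1 for s in samples if not s.ok)
    worst = max((s.latency_s for s in samples), default=0.0)
    return {
        "client_errors": errors,
        "worst_latency_s": worst,
        "passed": errors == 0 and worst <= max_latency_s,
    }


# A fail-over shows up as one slow but successful request.
during_failover = [
    IoSample(0.0, True, 0.02),
    IoSample(1.0, True, 12.5),  # I/O stalled while the share moved to another node
    IoSample(14.0, True, 0.03),
]
result = evaluate_client_sla(during_failover, max_latency_s=30.0)  # threshold is an assumption
print(result)
```

The point of the example is that a contained I/O delay still passes, while any client-visible error fails the SLA regardless of latency.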
Test infrastructure overview
[Figure: test infrastructure components. Labels shown: test machines running test processes, scheduler client, SCOM client, test harness, setup tool, scheduler, smart action scheduling, configuration XML, scenario XML, SQL database, SCOM server, SCOM* database, monitoring, reporting dashboard.]
Setup Scheduling Monitoring Reporting
*SCOM: Microsoft System Center Operations Manager
Test infrastructure: setup
Goals
• A lightweight & self-contained tool
• An extensible object model to enable authoring and scheduling complex test scenarios
Setup object model for CA file server
FS & Apps Server Cluster & RG setup; common external share storage setup by nodes
1. Cluster Nodes: 1/FS, 2, 4, 8
2. Resource Group Type: Singleton (Non CSV), Scale out (CSV), iSCSI Target, Virtual Machine
3. Share Type: SMB, NFS
4. # Shares Per Volume: Single 1:1, Multiple N:1
5. File System: NTFS, ReFS
6. # Vol Per Disk: Single 1:1, Multiple N:1
7. Resilient Spaces: No RAID, RAID 0, RAID 1, RAID 5
8. # Spaces Per Pool: Single 1:1, Multiple N:1
9. Disk/LUN & Pools: Disk & LUN (MBR/GPT), Storage Pools (1+)
10. Disk & Bus Type: JBOD/SAS (MBR/GPT), RBOD/FC (MBR/GPT), (1+) iSCSI Targets (MBR/GPT)
Network setup by nodes
3. DCB*: CA, N Other Traffic
4. QoS*: By Port, By IP
5. VLAN*: By Port, By IP
6. Subnet: Mask Value…
7. IP address: Static, Dynamic
8. Teaming Type
9. NIC config*: Physical, Virtual
Other labels: FCoE, HBA, iSCSI, FC, SAS
Legend: Cluster setting, Node setting, Network setting
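To give a feel for the size of the configuration space, the sketch below enumerates combinations over five of the dimensions listed in the setup object model and draws a reproducible sample. The option values are copied from the slide; the dictionary keys, function names and the seeded-sampling strategy are assumptions made for illustration, and the slides do not say whether every combination is valid or actually tested.

```python
import itertools
import random
from typing import Dict, List

# Option values copied from the setup object model; key names are hypothetical.
DIMENSIONS: Dict[str, list] = {
    "cluster_nodes": [1, 2, 4, 8],
    "resource_group_type": [
        "Singleton (Non CSV)", "Scale out (CSV)", "iSCSI Target", "Virtual Machine",
    ],
    "share_type": ["SMB", "NFS"],
    "file_system": ["NTFS", "ReFS"],
    "resilient_spaces": ["No RAID", "RAID 0", "RAID 1", "RAID 5"],
}


def all_configurations(dimensions: Dict[str, list]) -> List[dict]:
    """Return the full cross product of the option lists as dictionaries."""
    keys = list(dimensions)
    value_lists = [dimensions[k] for k in keys]
    return [dict(zip(keys, values)) for values in itertools.product(*value_lists)]


def sample_configurations(configs: List[dict], count: int, seed: int) -> List[dict]:
    """Pick a reproducible subset so a run can be repeated with the same seed."""
    rng = random.Random(seed)
    return rng.sample(configs, min(count, len(configs)))


configs = all_configurations(DIMENSIONS)
subset = sample_configurations(configs, count=5, seed=2012)
print(len(configs), "combinations;", len(subset), "sampled")
```

Five of the listed dimensions already produce 4 x 4 x 2 x 2 x 4 = 256 combinations, before networking options are considered.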
Test infrastructure: Scheduling and execution
ActionGroup & action scheduling
Goals:
• Fixed and random scheduling policies
• Enable different workflows (test, troubleshooting, verification)
Scheduling ActionGroups
• Scheduling policy
  o Fixed: repeatable, pre-defined sequence
  o Random: based on certain distribution of type of ActionGroups
• ActionGroup selection
  o Applicability: based on test env. state
• Scheduling policy
  o Repeat/Re-run for verification (“Fixed”): complete re-run, partial re-run
ActionGroup definitions
• Action details
• Error/failure conditions
• Verification type & definitions
Scheduler & scheduler client
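The two scheduling policies can be sketched as follows. This is a minimal illustration, not the scheduler described in the talk: the ActionGroup fields, the weight table and the environment-state dictionary are hypothetical. A seeded random generator is used here as one way to make a random schedule repeatable for re-runs.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence

EnvState = Dict[str, int]


@dataclass
class ActionGroup:
    name: str
    kind: str                               # e.g. "action" or "fault" (hypothetical labels)
    applicable: Callable[[EnvState], bool]  # applicability based on test environment state


def fixed_schedule(groups: Sequence[ActionGroup]) -> List[ActionGroup]:
    """Fixed policy: run the pre-defined sequence exactly as authored."""
    return list(groups)


def random_schedule(groups: Sequence[ActionGroup], weights: Dict[str, float],
                    env: EnvState, count: int, seed: int) -> List[ActionGroup]:
    """Random policy: draw applicable groups according to a per-kind distribution."""
    rng = random.Random(seed)
    candidates = [g for g in groups if g.applicable(env)]
    w = [weights.get(g.kind, 0.0) for g in candidates]
    if not candidates or sum(w) <= 0:
        return []
    return rng.choices(candidates, weights=w, k=count)


GROUPS = [
    ActionGroup("live_migration", "action", lambda env: env["nodes_up"] >= 2),
    ActionGroup("planned_failover", "action", lambda env: env["nodes_up"] >= 2),
    ActionGroup("nic_failure", "fault", lambda env: True),
    ActionGroup("power_loss", "fault", lambda env: env["nodes_up"] >= 2),
]
WEIGHTS = {"action": 0.7, "fault": 0.3}

degraded = random_schedule(GROUPS, WEIGHTS, {"nodes_up": 1}, count=10, seed=7)
healthy = random_schedule(GROUPS, WEIGHTS, {"nodes_up": 2}, count=10, seed=7)
print([g.name for g in degraded])
```

With only one node up, the applicability filter leaves a single candidate, so the degraded schedule contains nothing but NIC failures; calling random_schedule again with the same seed reproduces the same sequence.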
Test infrastructure: Error model and diagnostics
Goals:
• Track issue life cycle
• Evaluate impact on product quality
Error model
• Sev0: break on error/failure
  o Test/action failure
  o Failure to meet SLA
• Sev1: log error and continue
  o Issue details
  o Diagnostics forensics
Diagnostics & issue tracking
• Forensics
  o Logs
  o Traces
  o System state
• Issue tracking
  o Associate with bug tracking
  o Track private and scenario impact
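The Sev0/Sev1 error model can be illustrated with a small scenario runner. This is a sketch under assumptions: the exception type, the Issue record and the forensics callback are hypothetical names, and real forensics collection (logs, traces, system state) is reduced to a placeholder dictionary.

```python
import logging
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable, Dict, List, Sequence, Tuple

log = logging.getLogger("scenario")


class Severity(Enum):
    SEV0 = 0  # break on error/failure
    SEV1 = 1  # log error and continue


class ActionError(Exception):
    def __init__(self, severity: Severity, details: str):
        super().__init__(details)
        self.severity = severity
        self.details = details


@dataclass
class Issue:
    action: str
    severity: Severity
    details: str
    forensics: Dict[str, str] = field(default_factory=dict)


def run_scenario(actions: Sequence[Tuple[str, Callable[[], None]]],
                 collect_forensics: Callable[[str], Dict[str, str]]):
    """Run actions in order; record every issue, stop at the first Sev0."""
    issues: List[Issue] = []
    executed: List[str] = []
    for name, run in actions:
        executed.append(name)
        try:
            run()
        except ActionError as err:
            issues.append(Issue(name, err.severity, err.details, collect_forensics(name)))
            if err.severity is Severity.SEV0:
                log.error("Sev0 in %s: %s (stopping)", name, err.details)
                break
            log.warning("Sev1 in %s: %s (continuing)", name, err.details)
    return issues, executed


def failing(severity: Severity, details: str) -> Callable[[], None]:
    def _run() -> None:
        raise ActionError(severity, details)
    return _run


issues, executed = run_scenario(
    [
        ("start_vms", lambda: None),
        ("backup_vms", failing(Severity.SEV1, "backup needed a retry")),
        ("reboot_node", failing(Severity.SEV0, "client I/O error after fail-over")),
        ("verify_access", lambda: None),
    ],
    collect_forensics=lambda name: {"logs": f"{name}.log"},  # placeholder forensics
)
print(executed, len(issues))
```

Here the Sev1 issue is recorded and the run continues, while the Sev0 issue stops the scenario before the final verification step.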
Test infrastructure: Monitoring
Goals:
• Provide data necessary to assess product quality
• Enable error handling semantics specified by the tests
What to monitor?
• System health: SCOM infra.
• Test progress and status: scenario testing infra.
Test infrastructure: Reporting
[Figures: real-time dashboard; scenario roll-up report]
Goals:
• Map test results to user SLAs
• Reflect trends in product quality
• Track progress and coverage of testing
What to report
• Admin SLA
• Service SLA
• Test coverage
• Trending
An in-depth peek at reporting dashboard
Test scenarios
SLA metric overview
Scenario schedule results
User SLA metric details
Admin SLA metric details
Test scenarios result history
Key takeaways
Approach testing by modeling the service first
Map test results to user visible metrics (SLA)
• Persona focused
• Internal verifications
Test with agility
• Common needs addressed by infrastructure
• Test focuses on building re-usable test content
• Invest in important application workloads
Q & A
Appendix
Test profile: An example
Test profile
Test scenario
• (Groups of) use cases
• Groups of test actions
• Scheduling policies of defined test actions per group
• SLA metrics expected
• Error handling policies
Test topology (product specific)
• File server nodes
• Application nodes
• Networking
• Clustering
• Storage
• Domain topology
A sample data protection test profile for the Hyper-V over SMB scenario
Test scenario (ActionGroup of Actions)
• Start VMs (remote VHD on file server cluster)
• Take a backup of VMs
• Unexpected reboot of the hosting file server node
• Verify VM client access remains intact
Test topology
• 2-node file server cluster
• 2-node Hyper-V cluster (100 VMs)
• DPM server
• Dual 10G network NICs
• Mirror StorageSpaces with 50 SAS drives
• Same domain with 2 DCs
Test positioning
Position testing right
[Figure: quadrant chart positioning test types (functional tests, scenario tests, unit tests, random stress tests) by operations (limited vs. extended) and verification (limited vs. rich); one quadrant is marked "Recommended".]