Fault Tolerance in Charm++

  • Fault Tolerance in Charm++
    Gengbin Zheng, 10/11/2005
    Parallel Programming Lab, University of Illinois at Urbana-Champaign

  • Motivation
    - As machines grow in size, MTBF decreases
    - Applications have to tolerate faults
    - Applications need fast, low cost and scalable fault tolerance support
    - Fault tolerant runtime for:
      - Charm++
      - Adaptive MPI

  • Outline
    - Disk Checkpoint/Restart
    - FTC-Charm++: in-memory checkpoint/restart
    - Proactive Fault Tolerance
    - FTL-Charm++: message logging

  • Disk Checkpoint/Restart

  • Checkpoint/Restart
    - Simplest scheme for application fault tolerance
    - Any long-running application periodically saves its state to disk at certain points
      - Coordinated checkpointing strategy (barrier)
    - State information is saved in a directory of your choosing
    - The application data is checkpointed by invoking the pup routine of every object
    - Restore also uses pup, so no additional application code is needed (pup is all you need)
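The "pup is all you need" point can be illustrated with a minimal, self-contained sketch. The `Pupper` class below is a hypothetical stand-in written for this example; it is not the Charm++ PUP framework, and the `Particles` object and the `checkpoint`/`restore` helpers are likewise assumed names. The sketch only shows how a single pup routine can serve both saving and restoring.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical pack/unpack helper; NOT the Charm++ PUP::er class.
class Pupper {
public:
    enum class Mode { Packing, Unpacking };

    // Packing: start with an empty buffer that pup calls append to.
    Pupper() : mode_(Mode::Packing) {}
    // Unpacking: read back from a previously packed buffer.
    explicit Pupper(std::vector<unsigned char> buf)
        : mode_(Mode::Unpacking), buf_(std::move(buf)) {}

    bool isUnpacking() const { return mode_ == Mode::Unpacking; }
    const std::vector<unsigned char>& buffer() const { return buf_; }

    // Copy raw bytes into the buffer (packing) or out of it (unpacking).
    void bytes(void* data, std::size_t n) {
        if (n == 0) return;
        unsigned char* p = static_cast<unsigned char*>(data);
        if (mode_ == Mode::Packing) {
            buf_.insert(buf_.end(), p, p + n);
        } else {
            assert(pos_ + n <= buf_.size());
            std::memcpy(p, buf_.data() + pos_, n);
            pos_ += n;
        }
    }

    // Convenience for trivially copyable values such as int or double.
    template <typename T>
    void operator()(T& value) { bytes(&value, sizeof(T)); }

private:
    Mode mode_;
    std::vector<unsigned char> buf_;
    std::size_t pos_ = 0;
};

// Example application object: one pup routine serves checkpoint and restore.
struct Particles {
    int step = 0;
    std::vector<double> positions;

    void pup(Pupper& p) {
        p(step);
        std::size_t n = positions.size();
        p(n);                                      // the length travels with the data
        if (p.isUnpacking()) positions.resize(n);  // make room before reading
        p.bytes(positions.data(), n * sizeof(double));
    }
};

// Checkpoint: run pup in packing mode and keep the bytes.
std::vector<unsigned char> checkpoint(Particles& obj) {
    Pupper p;
    obj.pup(p);
    return p.buffer();
}

// Restore: run the same pup routine in unpacking mode on a fresh object.
Particles restore(const std::vector<unsigned char>& saved) {
    Pupper p(saved);
    Particles obj;
    obj.pup(p);
    return obj;
}
```

Calling `checkpoint(obj)` and later `restore(saved)` reproduces the object's `step` and `positions` without any restore-specific code in `Particles`.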

  • Checkpointing Job
    - In Charm++, use:
      void CkStartCheckpoint(char* dirname, const CkCallback& cb)
      - Called on one processor; calls resume when checkpoint is complete
    - In AMPI, use:
      MPI_Checkpoint();
      - Collective call; returns when checkpoint is complete

  • Restart Job from Checkpoint
    - The charmrun option ++restart is used to restart:
      ./charmrun +p4 ./pgm ++restart log
    - Number of processors need not be the same
    - Parallel objects are redistributed when needed
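As a rough illustration of restarting on a different number of processors, the sketch below places checkpointed objects onto a new processor count. The round-robin placement and the function name `redistribute` are assumptions made for this example only; the slides do not say how Charm++ chooses the new placement.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assign each checkpointed object (identified by its index) to a processor.
// Round-robin placement is an assumption made purely for illustration.
std::vector<std::vector<int>> redistribute(int numObjects, int numPes) {
    assert(numPes > 0);
    std::vector<std::vector<int>> placement(static_cast<std::size_t>(numPes));
    for (int obj = 0; obj < numObjects; ++obj) {
        placement[static_cast<std::size_t>(obj % numPes)].push_back(obj);
    }
    return placement;
}
```

For example, ten objects checkpointed on four processors could be restarted on three with `redistribute(10, 3)`, which gives the processors four, three, and three objects.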

  • FTC-Charm++: In-Memory Checkpoint/Restart

  • Disk vs. In-memory Scheme
    - Disk checkpointing suffers:
      - Needs user intervention to restart a job
      - Assumes reliable storage (disk)
      - Disk I/O is slow
    - In-memory checkpoint/restart scheme:
      - Online version of the previous scheme
      - Low impact on fault-free execution
      - Provides fast and automatic restart capability
      - Does not rely on extra processors
      - Maintains execution efficiency after restart
      - Does not rely on any fault-free component
      - Does not assume stable storage

  • Overview
    - Coordinated checkpointing scheme
      - Simple, low overhead on fault-free execution
      - Scientific applications that are iterative
    - Double checkpointing
      - Tolerates one failure at a time
    - In-memory checkpointing
      - Diskless checkpointing
      - Efficient for applications with a small memory footprint
    - When there are no extra processors
      - The program continues to run on the remaining processors
      - Load balancing for restart

  • Checkpoint Protocol
    - Similar to the previous scheme
      - Coordinated checkpointing strategy
      - Programmers decide what to checkpoint
      - void CkStartMemCheckpoint(CkCallback &cb)
    - Each object packs its data and sends it to two different (buddy) processors

  • Restart Protocol
    - Initiated by the failure of a physical processor
    - Every object rolls back to the state preserved in the most recent checkpoints
    - Combined with the load balancer to sustain performance

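  • Checkpoint/Restart Protocol
    [Diagram: objects A-J spread over PE0, PE1, PE2, and PE3, each with two checkpoint copies (checkpoint 1, checkpoint 2). PE1 crashed (lost 1 processor); the remaining PE0, PE2, and PE3 hold the restored objects. Legend: object, checkpoint 1, checkpoint 2, restored object.]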
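The checkpoint and restart protocols above can be sketched as a small simulation. This is a toy model under stated assumptions, not the FTC-Charm++ implementation: the types `Copy` and `Pe` and the functions `buddyOf`, `checkpointAll`, `crash`, and `recover` are hypothetical names, and the choice of holding one copy on the object's own processor and the other on the next live processor is an assumption (the slides only say that two different processors hold the copies). Load balancing after restart is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

struct Copy {
    int home;   // PE that owned the object when the checkpoint was taken
    int state;  // checkpointed object state
};

struct Pe {
    bool alive = true;
    std::map<char, int> objects;       // live objects: id -> state
    std::map<char, Copy> checkpoints;  // checkpoint copies held on this PE
};

// Next live processor after `pe`, wrapping around (assumed buddy choice).
int buddyOf(const std::vector<Pe>& pes, int pe) {
    int n = static_cast<int>(pes.size());
    for (int step = 1; step < n; ++step) {
        int candidate = (pe + step) % n;
        if (pes[candidate].alive) return candidate;
    }
    return pe;  // no other live PE: a single copy offers no protection
}

// Coordinated checkpoint: every live PE stores each of its objects twice.
void checkpointAll(std::vector<Pe>& pes) {
    for (Pe& pe : pes) pe.checkpoints.clear();
    for (int i = 0; i < static_cast<int>(pes.size()); ++i) {
        if (!pes[i].alive) continue;
        int buddy = buddyOf(pes, i);
        for (const auto& kv : pes[i].objects) {
            pes[i].checkpoints[kv.first] = Copy{i, kv.second};      // copy 1
            pes[buddy].checkpoints[kv.first] = Copy{i, kv.second};  // copy 2
        }
    }
}

// Simulate a processor failure: everything stored on that PE is lost.
void crash(std::vector<Pe>& pes, int pe) {
    pes[pe].alive = false;
    pes[pe].objects.clear();
    pes[pe].checkpoints.clear();
}

// Restart: every object rolls back to its checkpointed state. Objects whose
// home PE is gone are restored on the PE that holds the surviving copy.
void recover(std::vector<Pe>& pes) {
    for (Pe& pe : pes) pe.objects.clear();
    for (int i = 0; i < static_cast<int>(pes.size()); ++i) {
        if (!pes[i].alive) continue;
        for (const auto& kv : pes[i].checkpoints) {
            const Copy& c = kv.second;
            if (c.home == i || !pes[c.home].alive) {
                pes[i].objects[kv.first] = c.state;
            }
        }
    }
}
```

With four processors and one crash, every object still has at least one surviving copy, which is why the scheme tolerates one failure at a time. A real runtime would take a fresh checkpoint after recovery so that two copies exist again.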

  • Local Disk-Based Protocol
    - Double in-memory checkpointing
      - Memory concern
      - Pick a checkpointing time where the global state is small
    - Double in-disk checkpointing
      - Makes use of local disk
      - Also does not rely on any reliable storage
      - Useful for applications with a very big memory footprint

  • Compiling FTC-Charm++
    - Build charm with the syncft option:
      ./build charm++ net-linux syncft -O
    - Command line switch +ftc_disk for disk/memory checkpointing:
      charmrun ./pgm +ftc_disk

  • Performance Evaluation
    - IA-32 Linux cluster at NCSA
    - 512 dual 1 GHz Intel Pentium III processors
    - 1.5 GB RAM each processor
    - Connected by both Myrinet and 100 Mbit Ethernet

  • Performance Comparisons with Traditional Disk-based Checkpointing

    [Chart: checkpoint overhead (s) vs. problem size (MB). The chart plots every series in the table below except Local Disk.]

    Checkpoint overhead (s):

    Problem size (MB) | double in-memory (Myrinet) | double in-memory (100Mb) | Local Disk | double in-disk (Myrinet) | NFS disk
    6.4               | 0.004                      | 0.042                    | 0.218      | 0.387                    | 2.196
    12.8              | 0.008                      | 0.045                    | 0.244      | 0.405                    | 2.234
    25.6              | 0.016                      | 0.097                    | 0.623      | 0.78                     | 2.353
    51.2              | 0.032                      | 0.171                    | 1.198      | 1.148                    | 2.546
    102.4             | 0.061                      | 0.304                    | 1.216      | 1.585                    | 3.52
    204.8             | 0.14                       | 0.6                      | 1.598      | 2.164                    | 8.3
    409.6             | 0.27                       | 1.188                    | 2.112      | 3.582                    | 17.65
    819.2             | 0.517                      | 2.37                     | 4.669      | 6.854                    | 33.2
    1638.4            | 1.035                      | 4.71                     | 6.901      | 14.012                   | 79.71
    3276.8            | 2.029                      | 9.43                     | 11.762     | 26.629                   | 129.87
    6553.6            | 3.845                      | 18.83                    | 21.481     | 47.06                    | 215.78
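To make the gap between the schemes concrete, the following sketch computes how many times slower NFS-disk checkpointing is than double in-memory checkpointing over Myrinet, using three of the measured points from the performance comparison. The `Row` struct and the function names are illustrative assumptions; only the numbers come from the slide data.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Row {
    double sizeMB;      // problem size (MB)
    double memMyrinet;  // double in-memory (Myrinet) overhead, seconds
    double nfsDisk;     // NFS disk overhead, seconds
};

// Three of the measured points: smallest, a middle, and largest problem size.
std::vector<Row> measurements() {
    return {
        {6.4, 0.004, 2.196},
        {409.6, 0.27, 17.65},
        {6553.6, 3.845, 215.78},
    };
}

// How many times slower NFS-disk checkpointing is than in-memory over Myrinet.
double slowdown(const Row& r) { return r.nfsDisk / r.memMyrinet; }
```

For these points the ratio is roughly 549 at 6.4 MB, about 65 at 409.6 MB, and about 56 at 6553.6 MB, so the in-memory scheme stays well over an order of magnitude cheaper across the measured range.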

  • Recovery Performance
    - Molecular Dynamics Simulation application: LeanMD
    - Apoa1 benchmark (92K atoms)
    - 128 processors
    - Crash simulated by killing processes
      - No backup processors
      - With load balancing

  • Performance Improves with Load Balancing
    LeanMD, Apoa1, 128 processors

    [Chart: simulation time per step (s) vs. timestep, series "With LB". Step time starts near 2 s, settles to roughly 0.9-1.4 s, spikes above 3 s for a few steps, then returns to roughly 0.9-1.5 s. Per-timestep values omitted.]

    [Chart: second data series of per-step simulation times (s); legend not preserved. Step time starts near 2 s, settles to roughly 0.9-1.4 s, spikes above 3 s for a few steps, then stays near 1.9-2.0 s. Per-timestep values omitted.]

    1.95393

    1.952592

    1.961637

    1.925792

    1.92846

    1.987622

    1.949983

    1.967137

    1.922768

    1.938523

    1.915719

    1.963132

    1.891995

    1.911063

    1.951286

    1.929961

    1.918687

    1.943827

    1.952086

    1.987933

    1.92094

    1.949606

    1.923209

    1.921539

    1.885482

    1.923213

    1.944563

    1.981385

    1.910231

    1.939388

    1.897881

    1.999324

    1.905841

    1.924322

    1.924182

    1.906817

    1.920647

    1.913346

    1.918498

    1.989822

    1.937303

    1.90323

    1.918191

    1.910337

    1.909426

    1.944418

    1.948985

    1.907036

    1.953726

    1.923503

    1.949636

    1.96483

    1.935372

    1.943347

    1.945511

    1.976622

    1.900946

    1.90749

    1.906462

    1.993606

    1.919058

    1.970217

    1.925131

    1.965142

    1.922287

    1.931435

    1.936987

    1.938416

    1.967613

    1.907173

    1.892249

    Timestep

    Simulation time per step (s)

    Without LB

    Sheet1

    2.009614

    1.715303

    1.695939

    1.715194

    1.371235

    1.950715

    1.978949

    1.757267

    1.687806

    1.701496

    1.700515

    1.626623

    1.597934

    1.440968

    1.355004

    1.214825

    1.228043

    1.078525

    1.081117

    1.158797

    0.935191

    0.926109

    0.943082

    0.926749

    0.96469

    0.927354

    0.961384

    0.899342

    0.931402

    0.949933

    0.988828

    0.994174

    1.025657

    1.047986

    0.895591

    0.965002

    1.042113

    0.979932

    1.032694

    1.001448

    1.003917

    1.0304

    1.038706

    1.087721

    1.069413

    1.02002

    1.174063

    1.192655

    1.098414

    1.153628

    1.099

    0.967127

    1.157128

    1.223548

    1.213982

    1.212825

    1.261162

    1.040725

    1.035942

    0.944774

    0.927978

    0.947262

    0.913283

    1.042702

    0.950637

    0.933469

    0.869751

    0.958103

    1.040595

    1.064749

    1.041798

    0.966193

    1.075353

    0.991624

    1.118805

    1.101449

    1.044793

    1.166518

    1.063404

    0.949263

    0.985938

    1.044694

    0.97056

    0.944894

    1.026142

    0.992745

    1.04838

    1.019766

    1.043817

    1.063878

    1.062542

    1.077293

    1.078989

    0.927144

    1.045735

    1.184777

    0.968462

    1.010038

    0.943256

    0.902721

    0.898092

    0.935502

    0.881208

    0.936183

    0.899393

    0.88032

    0.967682

    1.004605

    1.03273

    1.054115

    1.14725

    1.007501

    1.002491

    1.101577

    1.203901

    1.170128

    1.105745

    1.172174

    1.362858

    1.349746

    1.276214

    1.232062

    1.223564

    1.200905

    1.246819

    1.257887

    1.261652

    1.266324

    1.322484

    1.148785

    1.170641

    1.128079

    1.080882

    1.146171

    1.112182

    1.184934

    1.103204

    1.241024

    1.255153

    1.137981

    1.10045

    1.05994

    1.160035

    1.11048

    1.071578

    1.089167

    1.156947

    1.134598

    1.176696

    1.051746

    1.08226

    1.148134

    1.077623

    1.114203

    1.157763

    1.120371

    1.082479

    1.100304

    1.139752

    1.095038

    1.102715

    1.073009

    1.130851

    1.235905

    1.29095

    1.21523

    1.149708

    1.11114

    1.26187

    1.048929

    1.038305

    1.071786

    1.193509

    1.176457

    1.292901

    1.19144

    1.196389

    1.168556

    1.127554

    1.103472

    1.175753

    1.141691

    1.168513

    1.089175

    1.108441

    1.079966

    1.004083

    1.109842

    1.080457

    1.0846

    1.011798

    1.029041

    1.010761

    1.090176

    1.106607

    1.094634

    1.134613

    1.198009

    1.267268

    1.116588

    0.906991

    0.930289

    0.947122

    0.991644

    0.859498

    2.88266

    3.293003

    3.388471

    3.211137

    3.232565

    3.17068

    2.866359

    2.631764

    2.534208

    2.352287

    2.087968

    1.896062

    1.994562

    1.998752

    2.010156

    1.97682

    1.994103

    1.995991

    2.007732

    1.924195

    1.923999

    1.947834

    2.002525

    1.987392

    1.931417

    1.941952

    1.961183

    1.965781

    1.964549

    1.979992

    1.962649

    1.984623

    2.042883

    2.032125

    1.99038

    2.028829

    2.020808

    1.950227

    1.922465

    1.955345

    2.047458

    2.038188

    1.982511

    1.932677

    1.936854

    2.03362

    1.950351

    1.924653

    1.965704

    1.954082

    2.03491

    1.934451

    1.958489

    1.927321

    1.930575

    1.943952

    1.987292

    1.918903

    1.941163

    1.924595

    1.991671

    1.980222

    1.973316

    2.051759

    2.033623

    2.031429

    1.968183

    2.083912

    2.07348

    2.015043

    1.914541

    1.957682

    1.931788

    1.923598

    1.937796

    1.955661

    1.921733

    2.001874

    1.97265

    1.967819

    1.930842

    1.928904

    2.024632

    1.927654

    1.99485

    2.019106

    2.0046

    1.984987

    1.965613

    1.977414

    1.974106

    1.99165

    1.955033

    1.93804

    1.934888

    1.958446

    1.962997

    1.980825

    2.063889

    2.010659

    2.063889

    1.965613

    1.977414

    1.974106

    1.99165

    1.955033

    1.93804

    1.934888

    1.958446

    2.200172

    1.981707

    2.017706

    1.912272

    1.975005

    1.901524

    1.989627

    1.950138

    1.944595

    1.909765

    1.952108

    1.941951

    1.909355

    1.956829

    1.985959

    1.960357

    1.897455

    1.923289

    1.933349

    1.945136

    1.908449

    1.921104

    1.898984

    1.935738

    1.939566

    1.962883

    1.959983

    2.004256

    1.95039

    1.990977

    1.988153

    1.971523

    1.931339

    1.936834

    1.929571

    1.932951

    1.927635

    1.907997

    1.914607

    1.961778

    1.96884

    1.914686

    1.93231

    1.92104

    1.930202

    2.006238

    1.974535

    1.971951

    1.958482

    1.995817

    1.942418

    1.970828

    1.934542

    1.97977

    1.964298

    1.93154

    1.93805

    1.958871

    1.985174

    1.934364

    1.970927

    1.945426

    1.93256

    1.944642

    1.898459

    1.924726

    1.975699

    1.933096

    1.956693

    1.90019

    1.976753

    1.950444

    1.986071

    1.976988

    1.949482

    1.984249

    1.942268

    1.923077

    1.975876

    1.969254

    1.97008

    1.957495

    1.9133

    1.951175

    1.974283

    1.920863

    1.935784

    1.954725

    1.969842

    1.917033

    1.950176

    1.975876

    1.969254

    1.97008

    1.957495

    1.9133

    1.951175

    1.974283

    1.920863

    1.935784

    1.954725

    1.975135

    1.994313

    1.986376

    1.932574

    1.930701

    1.938055

    1.931742

    1.925482

    1.920733

    1.904041

    1.877132

    1.892237

    1.968057

    1.930821

    1.966495

    1.923515

    1.934028

    2.001933

    1.968827

    1.964655

    1.914891

    1.97013

    1.903098

    1.98663

    1.951209

    1.941464

    1.921714

    1.915151

    1.993102

    1.982194

    1.928612

    1.912259

    1.950094

    1.979134

    1.945588

    2.004483

    1.935869

    1.937256

    1.965171

    1.951399

    1.92098

    1.916989

    1.932824

    1.914597

    1.887689

    1.893165

    1.93146

    1.926887

    1.953792

    1.936362

    1.957064

    1.930316

    1.926294

    1.977252

    1.938591

    1.950371

    1.97272

    1.93755

    1.97443

    1.971506

    1.942103

    1.966804

    1.93183

    1.925058

    1.966637

    2.013649

    1.924295

    1.933238

    1.897978

    1.920966

    1.949399

    1.892507

    1.886588

    1.917741

    1.925794

    1.9202

    1.913254

    1.91181

    1.974917

    1.902329

    1.88712

    1.93976

    1.904374

    1.980521

    1.920983

    1.927846

    1.896039

    1.921018

    1.93107

    1.906748

    1.9202

    1.913254

    1.91181

    1.974917

    1.902329

    1.88712

    1.93976

    1.904374

    1.980521

    1.920983

    2.048415

    1.932005

    1.996658

    1.950263

    1.963529

    1.931642

    1.914656

    1.921085

    1.922576

    1.98944

    1.920537

    1.962262

    1.894785

    1.959783

    1.929744

    1.91836

    1.917831

    1.930237

    1.906392

    1.95393

    1.952592

    1.961637

    1.925792

    1.92846

    1.987622

    1.949983

    1.967137

    1.922768

    1.938523

    1.915719

    1.963132

    1.891995

    1.911063

    1.951286

    1.929961

    1.918687

    1.943827

    1.952086

    1.987933

    1.92094

    1.949606

    1.923209

    1.921539

    1.885482

    1.923213

    1.944563

    1.981385

    1.910231

    1.939388

    1.897881

    1.999324

    1.905841

    1.924322

    1.924182

    1.906817

    1.920647

    1.913346

    1.918498

    1.989822

    1.937303

    1.90323

    1.918191

    1.910337

    1.909426

    1.944418

    1.948985

    1.907036

    1.953726

    1.923503

    1.949636

    1.96483

    1.935372

    1.943347

    1.945511

    1.976622

    1.900946

    1.90749

    1.906462

    1.993606

    1.919058

    1.970217

    1.925131

    1.965142

    1.922287

    1.931435

    1.936987

    1.938416

    1.967613

    1.907173

    1.892249

    Sheet1

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    0

    Timestep

    Simulation time per step (s)

    Without LB

    Sheet2

    Sheet3

  • Recovery Performance. 10 crashes. 128 processors. Checkpoint every 10 time steps.

  • LeanMD with Apoa1 benchmark. 90K atoms. 8498 objects.

  • Proactive Fault Tolerance

  • Motivation. Run-time reacts to a failure. Proactively migrate from a processor about to fail. Modern hardware supports early fault indication: SMART protocol, motherboard temperature sensors, Myrinet interface cards. Possible to create a mechanism for fault prediction.

  • Requirements. Response time should be as low as possible. No new processes should be required. Collective operations should still work. Efficiency loss should be proportional to computing power loss.

  • System. Application is warned of an impending fault via a signal. Processor, memory and interconnect should continue to work correctly for some time after the warning. Run-time ensures that the application continues to run on the remaining processors even if one processor crashes.

  • Solution Design. Migrate Charm++ objects off the warned processor. Point-to-point message delivery should continue to work. Collective operations should cope with the possible loss of multiple processors: modify the runtime system's reduction tree to remove the warned processor. A minimal number of processors should be affected. The runtime system should remain load balanced after a processor has been evacuated.
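The reduction-tree step in the Solution Design slide can be sketched as follows. This is a minimal illustration, not the Charm++ runtime's code: it assumes a binary tree over processor numbers as the starting layout, and the names `ReductionTree`, `evacuate`, `parentOf`, `childrenOf` and `isLive` are hypothetical. When a processor is warned, only its parent and its children are rewired, which is one way to keep the number of affected processors small.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical sketch of a reduction tree that can drop a warned processor.
// Initial layout (an assumption): processor pe has parent (pe - 1) / 2.
class ReductionTree {
public:
    explicit ReductionTree(int numPes)
        : parent_(numPes, -1), children_(numPes), live_(numPes, true) {
        for (int pe = 1; pe < numPes; ++pe) {
            parent_[pe] = (pe - 1) / 2;
            children_[(pe - 1) / 2].push_back(pe);
        }
    }

    // Remove a warned processor: its children are adopted by its parent.
    // If the warned processor is the root, its first child becomes the root.
    void evacuate(int pe) {
        assert(pe >= 0 && pe < static_cast<int>(live_.size()));
        if (!live_[pe]) return;
        live_[pe] = false;
        std::vector<int> orphans;
        orphans.swap(children_[pe]);
        int newParent = parent_[pe];
        parent_[pe] = -1;
        if (newParent >= 0) {
            std::vector<int>& siblings = children_[newParent];
            siblings.erase(std::remove(siblings.begin(), siblings.end(), pe),
                           siblings.end());
        } else if (!orphans.empty()) {
            newParent = orphans.front();
            orphans.erase(orphans.begin());
            parent_[newParent] = -1;
        }
        for (int child : orphans) {
            parent_[child] = newParent;
            children_[newParent].push_back(child);
        }
    }

    // Parent of pe, or -1 if pe is the root or has been evacuated.
    int parentOf(int pe) const { return parent_[pe]; }
    const std::vector<int>& childrenOf(int pe) const { return children_[pe]; }
    bool isLive(int pe) const { return live_[pe]; }

private:
    std::vector<int> parent_;
    std::vector<std::vector<int>> children_;
    std::vector<bool> live_;
};
```

With 7 processors, evacuating processor 1 re-parents processors 3 and 4 to processor 0 and leaves the subtree under processor 2 untouched. The tree is no longer strictly binary after an evacuation; a production runtime might rebalance it, which this sketch does not attempt.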

  • Proactive FT: Current Status. Support for multiple faults is ready; support for simultaneous faults is currently being tested. Faults are simulated via a signal sent to the process. The current version is fully integrated into Charm++ and AMPI. Example: sweep3d (MPI code) on NCSA's Tungsten (chart: utilization after LB).

  • How to Use. Part of the default version of Charm++: no extra compiler flags required; this code does not get executed until a warning. Any detection system can be plugged in: it can send a signal (USR1) to the process on the compute node, or it can call a method (CkDecideEvacPe) to evacuate a processor. Used with any Charm++ and AMPI program. For AMPI, it needs to be used with -memory isomalloc.
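The signal-based warning path can be illustrated with standard signal calls. The sketch below is not Charm++ code: it only shows a process installing a SIGUSR1 handler that records the warning in a flag, which a scheduler loop could poll before starting an evacuation. The names `installWarningHandler`, `warningPending` and `onFaultWarning` are hypothetical, and the evacuation call itself is left out because the signature of CkDecideEvacPe is not given in these slides.

```cpp
#include <csignal>

// Set by the signal handler; polled by a (hypothetical) scheduler loop.
static volatile std::sig_atomic_t g_warned = 0;

// Handler kept async-signal-safe: it only records that a warning arrived.
extern "C" void onFaultWarning(int) { g_warned = 1; }

// Install the handler for SIGUSR1 (POSIX). Returns true on success.
bool installWarningHandler() {
    return std::signal(SIGUSR1, onFaultWarning) != SIG_ERR;
}

// True once a warning signal has been delivered to this process.
bool warningPending() { return g_warned != 0; }
```

A fault detector running on the same node could then deliver the warning with the standard `kill` utility and the USR1 signal, targeting the process ID of the compute process.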

  • FTL-Charm++: Message Logging

  • Motivation. Checkpointing is not fully automatic. Coordinated checkpointing is expensive. Checkpoint/rollback doesn't scale: all nodes are rolled back just because one crashed, and even nodes independent of the crashed node are restarted.

  • Design. Message logging: sender-side message logging. Asynchronous checkpoints: each processor has a buddy processor, stores its checkpoint in the buddy's memory, and checkpoints on its own (no barrier).

  • Message to Remote Chares. Chare P: sender. Chare Q: receiver.

    If has been seen earlier TN is marked as received Otherwise create new TN and store the

  • Status. Most of Charm++ and AMPI has been ported. Support for migration has not yet been implemented in the fault-tolerant protocol. Parallel restart is not yet implemented. Not in the Charm main branch.

  • Thank You! Free source, binaries, manuals, and more information at: http://charm.cs.uiuc.edu/ Parallel Programming Lab at University of Illinois

    Today I will briefly present the work we have done in this in-memory double checkpoint project, which we named FTC-Charm++. As high-performance clusters grow in size, the MTBF shrinks, which makes fault tolerance (FT) more and more important for application scalability. We look at FT in the context of a processor virtualization environment. This matters because science and engineering applications often run for several hours or even days. Low cost means low overhead and cheap storage.

    The applications are not mission critical, but they need fast, low-cost FT. An application should not pay much overhead for FT in fault-free runs; when a fault does happen, however, the restart should be fast.

    This is our interest. This motivation defines some of the requirements in our design for fault tolerance: have very low overhead, eliminate the need for stable storage, and make checkpointing efficient. The advantage of the coordinated checkpointing protocol is that it is simple. For fault-free execution it imposes no constant overhead; the only overhead is paid on demand, when the application starts checkpointing.

    Our scheme is based on in-memory checkpointing and automatic restart, which eliminates the need for stable storage. When a crash happens, it can restart the program on the remaining processors. We don't assume there are extra processors to replace the crashed ones, which is the more realistic scenario. The scheme is integrated with the load balancer to maintain application speed, so that the impact of a crash is low.

    We use a coordinated checkpointing strategy. The user is responsible for starting checkpointing at a globally consistent state. During checkpointing, the state of every object is stored in the memory of two different (buddy) processors.

    Coordinated: all processors coordinate their checkpoints to form a globally consistent state.
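The double in-memory checkpoint idea can be sketched in a few lines. This is a toy model under stated assumptions, not the FTC-Charm++ implementation: processor memories are simulated with maps, the buddy of processor `pe` is assumed to be the next processor in a ring, object state is a plain string, and the names `DoubleCheckpoint`, `buddyOf`, `checkpoint`, `crash` and `recover` are hypothetical.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Toy model of double in-memory checkpointing: every object's state is kept
// in the memory of two different processors, so one crash never loses it.
class DoubleCheckpoint {
public:
    explicit DoubleCheckpoint(int numPes) : memory_(numPes) {
        assert(numPes >= 2);  // two distinct processors are needed
    }

    // Assumed buddy mapping: the next processor in a ring.
    int buddyOf(int pe) const {
        return (pe + 1) % static_cast<int>(memory_.size());
    }

    // Store the object's state on its home processor and on the buddy.
    void checkpoint(int objId, int homePe, const std::string& state) {
        memory_[homePe][objId] = state;
        memory_[buddyOf(homePe)][objId] = state;
    }

    // A crash loses everything held in that processor's memory.
    void crash(int pe) { memory_[pe].clear(); }

    // Copy any surviving checkpoint of the object into out.
    // Returns false if no copy survived.
    bool recover(int objId, std::string& out) const {
        for (const auto& mem : memory_) {
            auto it = mem.find(objId);
            if (it != mem.end()) {
                out = it->second;
                return true;
            }
        }
        return false;
    }

private:
    std::vector<std::map<int, std::string>> memory_;  // per-processor memory
};
```

In this model an object checkpointed on processor 3 of 4 survives a crash of processor 3 because processor 0 holds the second copy; the state is lost only if both processors fail before the next checkpoint.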

    How to reduce memory overhead? 1. Allow the programmer to encapsulate the application data so that only the useful data is checkpointed. 2. The application can choose to initiate checkpointing at a time when its memory footprint is small.

    The restart protocol is initiated by the failure of a physical processor. The program then rolls back to the state preserved in the last checkpoint. The load balancer can be called dynamically soon after a crash, which helps to sustain performance.

    Memory usage increases by a factor of 2. We have run many tests to study the performance of our scheme. Our experiments were carried out on the NCSA Platinum cluster. This cluster is composed of 512 1 GHz processors, each with 1.5 GB of RAM, connected by Myrinet and 100 Mbit Ethernet. The plot uses a log scale; we varied the problem size from 6.4 MB to as much as 6 GB. For LeanMD, we compared the simulation speed over timesteps with and without the load balancer. This ran on 128 processors with checkpointing every 100 timesteps.

    The first figure is without load balancing. It shows that when a crash happened, the simulation slowed down by a factor of 2, although only one processor out of 128 was lost.

    The second figure is with load balancing. We can see that the simulation speed is almost the same as before the crash.
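A back-of-the-envelope model is consistent with both figures. It rests on assumptions that the notes do not state explicitly: the total work per step $W$ is spread evenly over $P = 128$ processors, the step time is set by the most loaded processor, and without load balancing all objects of the crashed processor are restored on a single surviving processor.

```latex
\begin{align*}
T_{\text{before}} &= \frac{W}{P}, \\
T_{\text{no LB}}  &= \frac{2W}{P} = 2\,T_{\text{before}}, \\
T_{\text{LB}}     &= \frac{W}{P-1} = \frac{P}{P-1}\,T_{\text{before}}
                   = \frac{128}{127}\,T_{\text{before}} \approx 1.008\,T_{\text{before}}.
\end{align*}
```

Under these assumptions, one overloaded processor doubles the step time, while rebalancing over the 127 survivors costs less than 1%, which is the sense in which efficiency loss stays proportional to the computing power lost.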

    Run time with multiple

    Any-time migration. Resend messages based on the log. Discard the log.
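The resend and discard steps can be sketched with a small sender-side log. This is an illustration under assumptions, not the FTL-Charm++ protocol: it uses per-receiver sequence numbers assigned by the sender and does not model the ticket numbers (TN) mentioned on the slides. The names `SenderLog`, `LoggedMessage`, `logAndSend`, `resendTo` and `discardBefore` are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

struct LoggedMessage {
    int receiver;
    int seqNo;
    std::string payload;
};

// Sender-side message log: messages stay in the sender's memory until the
// receiver's checkpoint makes them unnecessary.
class SenderLog {
public:
    // Log a message before sending it; returns its per-receiver sequence number.
    int logAndSend(int receiver, const std::string& payload) {
        int seq = nextSeq_[receiver]++;
        log_.push_back({receiver, seq, payload});
        return seq;
    }

    // Messages to replay, in send order, for a receiver that restarted and
    // still needs everything from sequence number firstMissing onwards.
    std::vector<LoggedMessage> resendTo(int receiver, int firstMissing) const {
        std::vector<LoggedMessage> out;
        for (const LoggedMessage& m : log_)
            if (m.receiver == receiver && m.seqNo >= firstMissing)
                out.push_back(m);
        return out;
    }

    // Discard entries already covered by the receiver's latest checkpoint.
    void discardBefore(int receiver, int checkpointedSeq) {
        log_.erase(std::remove_if(log_.begin(), log_.end(),
                                  [=](const LoggedMessage& m) {
                                      return m.receiver == receiver &&
                                             m.seqNo < checkpointedSeq;
                                  }),
                   log_.end());
    }

    std::size_t size() const { return log_.size(); }

private:
    std::map<int, int> nextSeq_;      // next sequence number per receiver
    std::vector<LoggedMessage> log_;  // messages kept for possible resend
};
```

Because only the crashed receiver replays messages from the log, the other processors do not have to roll back, which is the motivation given for message logging over coordinated checkpoint/rollback.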