Scalable Stochastic Programming


Cosmin G. Petra

Mathematics and Computer Science Division, Argonne National Laboratory

petra@mcs.anl.gov

Joint work with Mihai Anitescu, Miles Lubin and Victor Zavala

Motivation

Sources of uncertainty in complex energy systems
– Weather
– Consumer demand
– Market prices

Applications @ Argonne – Anitescu, Constantinescu, Zavala
– Stochastic unit commitment with wind power generation
– Energy management of co-generation
– Economic optimization of a building energy system


Stochastic Unit Commitment with Wind Power

Wind forecast – WRF (Weather Research and Forecasting) model
– Real-time grid-nested 24h simulation
– 30 samples require 1h on 500 CPUs (Jazz@Argonne)


$$\min\ \frac{1}{N_S}\sum_{s=1}^{N_S}\sum_{k=1}^{T}\sum_{j=1}^{N}\Big(c^{p}_{j}\,p_{sjk}+c^{u}_{j}\,u_{jk}+c^{d}_{j}\,d_{jk}\Big) \qquad \text{(COST)}$$

$$\text{s.t.}\quad \sum_{j} p_{sjk}+\sum_{j} p^{\mathrm{wind}}_{sjk}\ \ge\ D_k, \qquad \forall s,\ \forall k,$$

$$\phantom{\text{s.t.}}\quad \sum_{j} p_{sjk}+\sum_{j} p^{\mathrm{wind}}_{sjk}\ \ge\ D_k+R_k, \qquad \forall s,\ \forall k,$$

ramping constr., min. up/down constr.

($N_S$ wind scenarios $s$, $T$ time periods $k$, $N$ thermal units $j$; $p$ = dispatch, $u$/$d$ = commitment decisions, $D_k$ = demand, $R_k$ = reserve.)

Slide courtesy of V. Zavala & E. Constantinescu

[Figure: wind farms and thermal generators supplying the grid]

Thermal units schedule?
– Minimize cost
– Satisfy demand / adopt wind power
– Have a reserve
– Technological constraints

Optimization under Uncertainty

Two-stage stochastic programming with recourse (“here-and-now” first-stage decisions)


$$\min_{x_0}\ f_0(x_0)+\mathbb{E}_{\xi}\big[\,Q(x_0,\xi)\,\big] \qquad \text{subj. to}\ \ A_0x_0=b_0,\ \ x_0\ge 0,$$

where the recourse function is

$$Q(x_0,\xi)=\min_{x}\ f(x;\xi) \qquad \text{subj. to}\ \ B(\xi)\,x_0+A(\xi)\,x=b(\xi),\ \ x\ge 0,$$

and $\xi := \big(A(\xi),B(\xi),b(\xi),Q(\xi),c(\xi)\big)$ collects the random data (continuous distribution).

Sampling ($S$ scenarios $\xi_1,\xi_2,\ldots,\xi_S$; $M$ batches for statistical inference) gives the sample average approximation (SAA) over a discrete distribution:

$$\min_{x_0,x_1,\ldots,x_S}\ f_0(x_0)+\frac{1}{S}\sum_{i=1}^{S} f_i(x_i)$$

$$\text{subj. to}\quad A_0x_0=b_0,\qquad B_ix_0+A_ix_i=b_i,\qquad x_0\ge 0,\ \ x_i\ge 0,\qquad i=1,\ldots,S.$$
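To make the sampling step concrete, here is a minimal sketch with hypothetical toy data (a generic recourse LP, not the slides' unit-commitment model): it estimates $\mathbb{E}_{\xi}[Q(x_0,\xi)]$ for a fixed first-stage decision $x_0$ by sampling scenarios and averaging the optimal values of the corresponding recourse problems.

```python
# A minimal SAA sketch on hypothetical toy data (not the slides' model):
# estimate E[Q(x0, xi)] for a fixed first-stage decision x0 by sampling S
# recourse LPs   Q(x0, xi) = min c(xi)^T x   s.t.  A x >= d(xi) - B x0,  x >= 0,
# and averaging their optimal values.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n0, n, m = 3, 4, 2                       # 1st-stage vars, 2nd-stage vars, constraints
x0 = np.array([1.0, 0.5, 2.0])           # a fixed "here-and-now" decision
A = rng.uniform(0.5, 1.5, size=(m, n))   # recourse matrix (kept deterministic here)
B = rng.uniform(0.0, 1.0, size=(m, n0))  # technology matrix coupling x0 into each scenario

def recourse_value(d, c):
    """Q(x0, xi): min c^T x  s.t.  A x >= d - B x0,  x >= 0 (feasible since A, c > 0)."""
    res = linprog(c, A_ub=-A, b_ub=-(d - B @ x0),
                  bounds=[(0, None)] * n, method="highs")
    return res.fun

S = 200                                   # number of sampled scenarios
values = [recourse_value(rng.uniform(5.0, 10.0, size=m),   # random demand d(xi)
                         rng.uniform(1.0, 2.0, size=n))    # random cost c(xi)
          for _ in range(S)]
print("SAA estimate of E[Q(x0, xi)]:", np.mean(values))
```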



Solving the SAA problem – PIPS solver

Interior-point methods (IPMs)
– Polynomial iteration complexity: $O(\sqrt{n}\,L)$ (in theory)
– IPMs perform better in practice (infeasible primal-dual path-following)
– No more than 30-50 iterations have been observed for $n$ less than 10 million
– We can confirm that this is still true for $n$ a hundred times larger

Two linear systems are solved at each IPM iteration.

Direct solvers need to be used because the IPM linear systems are ill-conditioned and need to be solved accurately.

We solve the SAA problems with a standard IPM (Mehrotra's predictor-corrector) and specialized linear algebra – the PIPS solver.

Linear Algebra of Primal-Dual Interior-Point Methods


Convex quadratic problem:

$$\min_x\ \tfrac{1}{2}x^TQx+c^Tx \qquad \text{subj. to}\ \ Ax=b,\ \ x\ge 0.$$

IPM linear system (solved at each iteration; the barrier contribution enters the diagonal block):

$$\begin{bmatrix} Q & A^T\\ A & 0\end{bmatrix}\begin{bmatrix}\Delta x\\ \Delta y\end{bmatrix}=\text{rhs}.$$
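A minimal sketch with toy random data of the saddle-point system above, for the equality-constrained part only (the barrier/diagonal term an IPM would add for $x \ge 0$ is omitted); it only illustrates why each IPM step reduces to one structured symmetric indefinite solve.

```python
# A toy sketch of the saddle-point system: for an equality-constrained convex QP
# min 1/2 x^T Q x + c^T x  s.t.  A x = b, the KKT step solves
# [[Q, A^T], [A, 0]] [x, y] = [-c, b].
import numpy as np

rng = np.random.default_rng(1)
nvar, mcon = 5, 2
W = rng.normal(size=(nvar, nvar))
Q = W @ W.T + nvar * np.eye(nvar)        # symmetric positive definite Hessian
A = rng.normal(size=(mcon, nvar))        # full-row-rank constraint Jacobian
c = rng.normal(size=nvar)
b = rng.normal(size=mcon)

K = np.block([[Q, A.T], [A, np.zeros((mcon, mcon))]])
sol = np.linalg.solve(K, np.concatenate([-c, b]))
x, y = sol[:nvar], sol[nvar:]
print("stationarity residual:", np.linalg.norm(Q @ x + c + A.T @ y))
print("feasibility residual: ", np.linalg.norm(A @ x - b))
```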

Two-stage SP: arrow-shaped linear system (modulo a permutation)

$$\begin{bmatrix}
H_1 & & & & G_1^T\\
& H_2 & & & G_2^T\\
& & \ddots & & \vdots\\
& & & H_S & G_S^T\\
G_1 & G_2 & \cdots & G_S & H_0
\end{bmatrix},
\qquad
H_i=\begin{bmatrix} Q_i & A_i^T\\ A_i & 0\end{bmatrix},\quad
H_0=\begin{bmatrix} Q_0 & A_0^T\\ A_0 & 0\end{bmatrix},$$

where the border blocks $G_i$ carry the coupling matrices $B_i$ and $S$ is the number of scenarios.

Multi-stage SP: nested arrow-shaped systems.
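A minimal sketch (hypothetical random blocks) that assembles this arrow-shaped structure with scipy.sparse, just to visualize the sparsity pattern the scenario-based decomposition exploits; the block contents are arbitrary.

```python
# A toy assembly of the arrow-shaped structure: S scenario blocks H_i on the
# diagonal, border blocks G_i coupling them to the first-stage block H_0.
# None entries in bmat are the structural zero blocks.
import numpy as np
import scipy.sparse as sp

rng = np.random.default_rng(2)
S, ni, n0 = 3, 4, 2                         # scenarios, scenario size, 1st-stage size
H = [sp.csr_matrix(rng.normal(size=(ni, ni))) for _ in range(S)]
G = [sp.csr_matrix(rng.normal(size=(n0, ni))) for _ in range(S)]
H0 = sp.csr_matrix(rng.normal(size=(n0, n0)))

blocks = [[H[i] if j == i else (G[i].T if j == S else None) for j in range(S + 1)]
          for i in range(S)]
blocks.append(G + [H0])                     # last block row: G_1 ... G_S, H_0
K = sp.bmat(blocks, format="csr")
print(K.shape, "nonzeros:", K.nnz)
print((np.abs(K.toarray()) > 0).astype(int))  # the arrow-shaped sparsity pattern
```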


The Direct Schur Complement Method (DSC)

Uses the arrow shape of $H$.

1. Implicit factorization:

$$H_i = L_iD_iL_i^T,\qquad L_{ci}=G_iL_i^{-T}D_i^{-1},\qquad i=1,\ldots,S,$$

$$C = H_0-\sum_{i=1}^{S}G_iH_i^{-1}G_i^T,\qquad C = L_cD_cL_c^T,$$

so that

$$H=\begin{bmatrix}
L_1 & & &\\
& \ddots & &\\
& & L_S &\\
L_{c1} & \cdots & L_{cS} & L_c
\end{bmatrix}
\begin{bmatrix}
D_1 & & &\\
& \ddots & &\\
& & D_S &\\
& & & D_c
\end{bmatrix}
\begin{bmatrix}
L_1^T & & & L_{c1}^T\\
& \ddots & & \vdots\\
& & L_S^T & L_{cS}^T\\
& & & L_c^T
\end{bmatrix}.$$

2. Solving $Hz=r$:

2.1. Backward substitution: $w_i=L_i^{-1}r_i,\ i=1,\ldots,S$, and $w_0=L_c^{-1}\big(r_0-\sum_{i=1}^{S}L_{ci}w_i\big)$.

2.2. Diagonal solve: $v_i=D_i^{-1}w_i,\ i=1,\ldots,S$, and $v_0=D_c^{-1}w_0$.

2.3. Forward substitution: $z_0=L_c^{-T}v_0$, then $z_i=L_i^{-T}\big(v_i-L_{ci}^Tz_0\big),\ i=1,\ldots,S$.
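A minimal serial sketch of the DSC mechanics above on toy random blocks; plain LU from scipy stands in for the sparse $LDL^T$ factorizations used in PIPS, and the triangular/diagonal sweeps are folded into the equivalent block elimination. The result is checked against a monolithic solve of the full arrow system.

```python
# A toy serial Direct Schur Complement solve of the arrow-shaped system H z = r.
import numpy as np
from scipy.linalg import lu_factor, lu_solve

rng = np.random.default_rng(3)
S, ni, n0 = 3, 4, 2

def spd(n):
    M = rng.normal(size=(n, n))
    return M @ M.T + n * np.eye(n)

H = [spd(ni) for _ in range(S)]                      # scenario diagonal blocks H_i
G = [rng.normal(size=(n0, ni)) for _ in range(S)]    # border blocks G_i
H0 = spd(n0)                                         # first-stage block
r = [rng.normal(size=ni) for _ in range(S)]          # scenario parts of the rhs
r0 = rng.normal(size=n0)                             # first-stage part of the rhs

# 1. "Implicit factorization": factor each H_i, form C = H_0 - sum_i G_i H_i^{-1} G_i^T.
facs = [lu_factor(Hi) for Hi in H]
C = H0 - sum(G[i] @ lu_solve(facs[i], G[i].T) for i in range(S))
Cfac = lu_factor(C)

# 2. Solve H z = r: eliminate scenario blocks, solve with C for z_0, recover each z_i.
z0 = lu_solve(Cfac, r0 - sum(G[i] @ lu_solve(facs[i], r[i]) for i in range(S)))
z = [lu_solve(facs[i], r[i] - G[i].T @ z0) for i in range(S)]

# Check against a monolithic solve of the full arrow-shaped system.
K = np.zeros((S * ni + n0,) * 2)
for i in range(S):
    sl = slice(i * ni, (i + 1) * ni)
    K[sl, sl], K[-n0:, sl], K[sl, -n0:] = H[i], G[i], G[i].T
K[-n0:, -n0:] = H0
direct = np.linalg.solve(K, np.concatenate(r + [r0]))
print("max difference vs. monolithic solve:",
      np.max(np.abs(direct - np.concatenate(z + [z0]))))
```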

Parallelizing DSC – 1. Factorization phase


Each process owns a subset of scenarios, factors them, and accumulates its contribution to the first-stage Schur complement:

Process 1 (scenarios $i\in\mathcal{S}_1$): $L_iD_iL_i^T=H_i$, $L_{ci}=G_iL_i^{-T}D_i^{-1}$, $C_1=\sum_{i\in\mathcal{S}_1}G_iH_i^{-1}G_i^T$

Process 2 (scenarios $i\in\mathcal{S}_2$): likewise, giving $C_2$

...

Process p (scenarios $i\in\mathcal{S}_p$): likewise, giving $C_p$

Then $C=H_0-\sum_{j=1}^{p}C_j$ is formed and factored densely, $L_cD_cL_c^T=C$.

Scenario work: sparse linear algebra (MA57). First-stage work: dense linear algebra (LAPACK).

Factorization of the 1st-stage Schur complement matrix = BOTTLENECK
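A minimal mpi4py sketch (hypothetical toy data) of this factorization phase: every rank factors its local scenario blocks, accumulates its share of $\sum_i G_iH_i^{-1}G_i^T$, and an MPI_Allreduce gives all ranks the full sum, after which $C$ is formed and factored redundantly (Cholesky/LU stand in for MA57 and the dense $LDL^T$).

```python
# A toy parallel DSC factorization phase.  Run with:  mpiexec -n 4 python dsc_fact.py
import numpy as np
from mpi4py import MPI
from scipy.linalg import cho_factor, cho_solve, lu_factor

comm = MPI.COMM_WORLD
rank, nprocs = comm.Get_rank(), comm.Get_size()
S, ni, n0 = 16, 8, 4                       # scenarios, scenario block size, 1st-stage size
my_scens = range(rank, S, nprocs)          # simple cyclic distribution of scenarios

def scenario_blocks(i):
    """Deterministically generate toy H_i (SPD for simplicity) and G_i."""
    rng = np.random.default_rng(i)
    M = rng.normal(size=(ni, ni))
    return M @ M.T + ni * np.eye(ni), rng.normal(size=(n0, ni))

H0 = np.diag(np.arange(1.0, n0 + 1.0))     # toy first-stage block (same on all ranks)
local = np.zeros((n0, n0))
for i in my_scens:
    Hi, Gi = scenario_blocks(i)
    local += Gi @ cho_solve(cho_factor(Hi), Gi.T)   # G_i H_i^{-1} G_i^T

total = np.empty((n0, n0))
comm.Allreduce(local, total, op=MPI.SUM)   # sum the contributions of all ranks
C = H0 - total
Cfac = lu_factor(C)                        # replicated dense factorization of C
if rank == 0:
    print("C formed and factored on", nprocs, "ranks")
```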


Parallelizing DSC – 2. Triangular solves

Process $j$ ($j=1,\ldots,p$), for its local scenarios $i\in\mathcal{S}_j$: backward substitutions $w_i=L_i^{-1}r_i$ and the local first-stage contribution $\sum_{i\in\mathcal{S}_j}L_{ci}w_i$ (sparse linear algebra).

Process 1: first-stage solve with the dense factors of $C$: $w_0=L_c^{-1}\big(r_0-\sum_i L_{ci}w_i\big)$, $v_0=D_c^{-1}w_0$, $z_0=L_c^{-T}v_0$ (dense linear algebra).

All processes, for their local scenarios: $v_i=D_i^{-1}w_i$, $z_i=L_i^{-T}\big(v_i-L_{ci}^Tz_0\big)$.

1st-stage backsolve = BOTTLENECK
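A minimal mpi4py sketch of the triangular-solve phase, continuing the toy setup of the previous sketch: each rank eliminates its local scenarios from the right-hand side, an MPI_Allreduce assembles the first-stage residual, the (replicated) dense solve gives $z_0$, and each rank recovers its own $z_i$.

```python
# A toy parallel DSC solve phase.  Run with:  mpiexec -n 4 python dsc_solve.py
import numpy as np
from mpi4py import MPI
from scipy.linalg import lu_factor, lu_solve

comm = MPI.COMM_WORLD
rank, nprocs = comm.Get_rank(), comm.Get_size()
S, ni, n0 = 16, 8, 4
my_scens = list(range(rank, S, nprocs))

def scenario_data(i):
    rng = np.random.default_rng(i)
    M = rng.normal(size=(ni, ni))
    return M @ M.T + ni * np.eye(ni), rng.normal(size=(n0, ni)), rng.normal(size=ni)

H0, r0 = np.diag(np.arange(1.0, n0 + 1.0)), np.ones(n0)
facs, G, r = {}, {}, {}
local_C, local_r0 = np.zeros((n0, n0)), np.zeros(n0)
for i in my_scens:
    Hi, G[i], r[i] = scenario_data(i)
    facs[i] = lu_factor(Hi)
    local_C  += G[i] @ lu_solve(facs[i], G[i].T)    # contribution to the Schur complement
    local_r0 += G[i] @ lu_solve(facs[i], r[i])      # contribution to the 1st-stage rhs

sum_C, sum_r0 = np.empty((n0, n0)), np.empty(n0)
comm.Allreduce(local_C, sum_C, op=MPI.SUM)
comm.Allreduce(local_r0, sum_r0, op=MPI.SUM)

z0 = np.linalg.solve(H0 - sum_C, r0 - sum_r0)       # replicated dense 1st-stage solve
z = {i: lu_solve(facs[i], r[i] - G[i].T @ z0) for i in my_scens}  # local recoveries
if rank == 0:
    print("z0 =", z0)
```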



Implementation of DSC

Proc 1, ..., Proc p each execute the same timeline:

– Factorization: Fact + Backsolves for the local scenarios → Comm (MPI_Allreduce of $\sum_i C_i$) → dense factorization of $C$ + dense backsolve.

– Triangular solves: Backsolves for the local scenarios → Comm (MPI_Allreduce of $\sum_i r_i$, the first-stage right-hand-side contributions) → dense solve → forward substitution.

The dense first-stage computations are replicated on each process.

Scalability of DSC


– Unit commitment: 76.7% efficiency on Fusion @ Argonne

– but not always the case: with a large number of 1st-stage variables, 38.6% efficiency


BOTTLENECK SOLUTION 1: STOCHASTIC PRECONDITIONER

The Stochastic Preconditioner

– The exact structure of $C$ is

$$C=\begin{bmatrix} S_S & A_0^T\\ A_0 & 0\end{bmatrix},\qquad S_S = Q_0+\frac{1}{S}\sum_{i=1}^{S}B_i^T\big(A_iQ_i^{-1}A_i^T\big)^{-1}B_i.$$

– IID subset of $n$ scenarios: $K=\{k_1,k_2,\ldots,k_n\}$, $n\ll S$.

– The stochastic preconditioner (P. & Anitescu, in COAP 2011):

$$M=\begin{bmatrix} \bar S_n & A_0^T\\ A_0 & 0\end{bmatrix},\qquad \bar S_n = Q_0+\frac{1}{n}\sum_{i=1}^{n}B_{k_i}^T\big(A_{k_i}Q_{k_i}^{-1}A_{k_i}^T\big)^{-1}B_{k_i}.$$

– For $C$, use the constraint preconditioner (Keller et al., 2000).
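A minimal sketch (hypothetical random scenario data) of why subsampling gives a good preconditioner: both $S_S$ and $\bar S_n$ are sample averages of the same per-scenario matrix (written here in the form reconstructed above), so $\bar S_n^{-1}S_S$ tends to have eigenvalues close to 1 even for a small subset.

```python
# A toy illustration of the stochastic-preconditioner idea: compare the (1,1)
# block built from all S scenarios with one built from a small IID subset.
import numpy as np

rng = np.random.default_rng(0)
S, nscen, n0 = 2000, 6, 4                 # scenarios, 2nd-stage size, 1st-stage size

def contribution(i):
    """Per-scenario term  B_i^T (A_i Q_i^{-1} A_i^T)^{-1} B_i  on toy random data."""
    r = np.random.default_rng(i)
    M = r.normal(size=(nscen, nscen))
    Q = M @ M.T + nscen * np.eye(nscen)
    A = r.normal(size=(2, nscen))
    B = r.normal(size=(2, n0))
    return B.T @ np.linalg.solve(A @ np.linalg.solve(Q, A.T), B)

Q0 = np.eye(n0)
S_S = Q0 + np.mean([contribution(i) for i in range(S)], axis=0)

subset = rng.choice(S, size=30, replace=False)        # IID subset of n = 30 scenarios
S_n_bar = Q0 + np.mean([contribution(i) for i in subset], axis=0)

eig = np.linalg.eigvals(np.linalg.solve(S_n_bar, S_S))
print("eigenvalues of S_n_bar^{-1} S_S:", np.sort(eig.real))
```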

Implementation of PSC


Factorization phase:

– Proc 1, ..., Proc p: Fact + Backsolves for the local scenarios; Comm: MPI_Reduce of $\sum_{i\in K}C_i$ to proc $p{+}1$ (the preconditioner terms) and MPI_Allreduce of $\sum_i C_i$ among procs 1, ..., p.

– Proc $p{+}1$: Fact + Backsolves for its scenarios, then dense factorization of the preconditioner.

Triangular solves:

– Proc 1, ..., Proc p: Backsolves; Comm: MPI_Reduce of $\sum_i r_i$ (first-stage contributions) to proc 1; idle during the first-stage solve; forward substitutions after the MPI_Bcast of the first-stage solution.

– Proc 1: Krylov solve of the first-stage system; preconditioner triangular solves on proc $p{+}1$, with communication at each Krylov iteration.

REMOVES the factorization bottleneck. Slightly larger solve bottleneck.

The “Ugly” Unit Commitment Problem


– DSC on P processes vs. PSC on P+1 processes.

– Optimal use of PSC – linear scaling.

– Factorization of the preconditioner cannot be hidden anymore.

– 120 scenarios.

Quality of the Stochastic Preconditioner

“Exponentially” better preconditioning (P. & Anitescu, 2011)

Proof: Hoeffding inequality

– Hoeffding-type bound: the probability that $\bar S_n$ (built from the $n$ sampled scenarios) deviates from $S_S$ (built from all $S$ scenarios) by more than a prescribed tolerance decays exponentially in $n$.

– Assumptions on the problem's random data (not restrictive):
  1. Boundedness
  2. Uniform full rank of $A(\xi)$ and $B(\xi)$

Quality of the Constraint Preconditioner

– With

$$M=\begin{bmatrix} \bar S_n & A_0^T\\ A_0 & 0\end{bmatrix},\qquad C=\begin{bmatrix} S_S & A_0^T\\ A_0 & 0\end{bmatrix},$$

$M^{-1}C$ has an eigenvalue 1 with order of multiplicity $2r$, where $r$ is the number of rows of $A_0$.

– The rest of the eigenvalues satisfy

$$0<\lambda_{\min}\big(\bar S_n^{-1}S_S\big)\ \le\ \lambda\big(M^{-1}C\big)\ \le\ \lambda_{\max}\big(\bar S_n^{-1}S_S\big).$$

– Proof: based on Bergamaschi et al., 2004.

The Krylov Methods Used for $Cz_0=r_0$

– BiCGStab using the constraint preconditioner $M$.

– Preconditioned Projected CG (PPCG) (Gould et al., 2001)
  – Preconditioned projection onto $\operatorname{Ker}A_0$:
    $$P=Z_0\big(Z_0^T\bar S_nZ_0\big)^{-1}Z_0^T,\qquad \text{where the columns of } Z_0 \text{ span } \operatorname{Ker}A_0.$$
  – Does not compute the basis $Z_0$ for $\operatorname{Ker}A_0$. Instead, the projected residual $g=Pr$ is computed from
    $$\begin{bmatrix} \bar S_n & A_0^T\\ A_0 & 0\end{bmatrix}\begin{bmatrix} g\\ u\end{bmatrix}=\begin{bmatrix} r\\ 0\end{bmatrix}.$$
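A minimal sketch (toy dense data) of the constraint-preconditioner idea with BiCGStab from scipy: the preconditioner shares the constraint blocks $A_0$ with $C$ but uses a cheaper stand-in for the (1,1) block (here an arbitrary perturbation rather than the subsampled $\bar S_n$), and it is applied via an explicit inverse only because the toy system is tiny.

```python
# A toy constraint-preconditioned BiCGStab solve of a saddle-point system.
import numpy as np
import scipy.sparse.linalg as spla

rng = np.random.default_rng(4)
n0, m0 = 30, 5
W = rng.normal(size=(n0, n0))
S_exact = W @ W.T + n0 * np.eye(n0)                       # the "exact" dense (1,1) block
S_cheap = S_exact + 0.1 * np.diag(rng.normal(size=n0))    # its cheap approximation
A0 = rng.normal(size=(m0, n0))                            # full-row-rank constraint block

def saddle(block11):
    return np.block([[block11, A0.T], [A0, np.zeros((m0, m0))]])

C, M = saddle(S_exact), saddle(S_cheap)
Minv = np.linalg.inv(M)                                   # fine for a tiny toy problem
M_op = spla.LinearOperator(C.shape, matvec=lambda v: Minv @ v, dtype=float)

rhs = rng.normal(size=n0 + m0)
z, info = spla.bicgstab(C, rhs, M=M_op)
print("bicgstab info:", info, " residual:", np.linalg.norm(C @ z - rhs))
```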

Performance of the preconditioner

– Eigenvalue clustering & Krylov iterations [plots].

– Affected by the well-known ill-conditioning of IPMs.


SOLUTION 2: PARALLELIZATION OF STAGE 1 LINEAR ALGEBRA


Parallelizing the 1st stage linear algebra

We distribute the 1st stage Schur complement system.

C is treated as dense.

Alternative to PSC for problems with a large number of 1st-stage variables.

Removes the memory bottleneck of PSC and DSC.

We investigated ScaLAPACK and Elemental (successor of PLAPACK)
– Neither has a solver for symmetric indefinite matrices (Bunch-Kaufman); LU or Cholesky only.
– So we had to think of modifying one of them.

$$C=\begin{bmatrix} Q & A_0^T\\ A_0 & 0\end{bmatrix},\qquad Q \text{ dense symm. pos. def.},\quad A_0 \text{ sparse, full rank.}$$


Cholesky-based $LDL^T$-like factorization

Can be viewed as an “implicit” normal equations approach.

In-place implementation inside Elemental: no extra memory needed.

Idea: modify the Cholesky factorization, by changing the sign after processing p columns.

It is much easier to do in Elemental, since this distributes elements, not blocks.

Twice as fast as LU

Works for more general saddle-point linear systems, i.e., pos. semi-def. (2,2) block.

$$C=\begin{bmatrix} Q & A_0^T\\ A_0 & 0\end{bmatrix}
=\begin{bmatrix} L_{11} & 0\\ L_{21} & L_{22}\end{bmatrix}
\begin{bmatrix} I & 0\\ 0 & -I\end{bmatrix}
\begin{bmatrix} L_{11}^T & L_{21}^T\\ 0 & L_{22}^T\end{bmatrix},$$

where $Q=L_{11}L_{11}^T$, $L_{21}=A_0L_{11}^{-T}$, and $L_{22}L_{22}^T=A_0Q^{-1}A_0^T$.
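A minimal numpy sketch (toy dense data) verifying this sign-flipped Cholesky-like factorization: factor $Q$, form the border block, Cholesky-factor the negated Schur complement, and check that $L\,\mathrm{diag}(I,-I)\,L^T$ reproduces $C$.

```python
# A toy verification of the signed Cholesky factorization of C = [[Q, A0^T], [A0, 0]].
import numpy as np

rng = np.random.default_rng(5)
n0, m0 = 6, 3
W = rng.normal(size=(n0, n0))
Q = W @ W.T + n0 * np.eye(n0)                    # dense SPD (1,1) block
A0 = rng.normal(size=(m0, n0))                   # full-row-rank constraint block

L11 = np.linalg.cholesky(Q)                      # Q = L11 L11^T
L21 = np.linalg.solve(L11, A0.T).T               # L21 = A0 L11^{-T}
L22 = np.linalg.cholesky(L21 @ L21.T)            # L22 L22^T = A0 Q^{-1} A0^T
L = np.block([[L11, np.zeros((n0, m0))], [L21, L22]])
Sigma = np.diag(np.concatenate([np.ones(n0), -np.ones(m0)]))

C = np.block([[Q, A0.T], [A0, np.zeros((m0, m0))]])
print("factorization error:", np.linalg.norm(L @ Sigma @ L.T - C))
```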


Distributing the 1st stage Schur complement matrix

All processors contribute to all of the elements of the (1,1) dense block

A large amount of inter-process communication occurs. Each term is too big to fit in a node’s memory.

Possibly more costly than the factorization itself.

Solution: collective MPI_Reduce_scatter calls
• Reduce (sum) terms, then partition and send to destination (scatter)
• Need to reorder (pack) elements to match the matrix distribution
• Columns of the Schur complement matrix are distributed as they are calculated

Each scenario contributes a dense term to the (1,1) block:

$$S_S=Q_0+\frac{1}{S}\sum_{i=1}^{S}B_i^T\big(A_iQ_i^{-1}A_i^T\big)^{-1}B_i.$$



DSC with distributed first-stage

– The Schur complement matrix is computed and reduced block-wise ($B$ blocks of columns $B_1,B_2,\ldots,B_B$).

– For each $b=1{:}B$: Proc 1, ..., Proc p compute their local contributions $\sum_i C_i(:,B_b)$ to the columns in $B_b$, and a collective MPI_Reduce_scatter sums them and delivers each piece to the process that owns it.

– Per-process timeline: Fact + Backsolves for the local scenarios → Comm (MPI_Reduce_scatter, block by block) → distributed dense factorization (ELEMENTAL) → backsolves → Comm (MPI_Allreduce of $\sum_i r_i$) → distributed dense solve (ELEMENTAL) → forward substitution.


Reduce operations

– Streamlined copying procedure – Lubin and Petra (2010): loop over contiguous memory and copy elements into the send buffer; avoids the division and modulus operations needed to compute the positions.

– "Symmetric" reduce for $LDL^T$: only the lower triangle is reduced.

– Fixed buffer size: a variable number of columns is reduced per call.

– Effectively halves the communication (both data volume and number of MPI calls).
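A minimal mpi4py sketch (toy data) of the column-wise reduce-and-distribute pattern: every rank contributes to the whole matrix, and a single MPI_Reduce_scatter both sums the contributions and leaves each rank holding only the block of columns it owns (packing into contiguous buffers is shown in its simplest form via a transpose).

```python
# A toy MPI_Reduce_scatter of column blocks.  Run with:
#   mpiexec -n 4 python reduce_scatter_columns.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, nprocs = comm.Get_rank(), comm.Get_size()
n = 8                                            # matrix dimension
assert n % nprocs == 0, "keep it simple: equal column blocks per rank"
cols_per_rank = n // nprocs

local = np.full((n, n), float(rank + 1))         # this rank's contribution to the matrix

recvcounts = [cols_per_rank * n] * nprocs        # elements each rank keeps after the scatter
my_cols_T = np.empty((cols_per_rank, n))         # my column block, stored transposed
comm.Reduce_scatter(np.ascontiguousarray(local.T), my_cols_T, recvcounts, op=MPI.SUM)

expected = nprocs * (nprocs + 1) / 2             # sum of the contributions 1 + 2 + ... + p
print(f"rank {rank}: columns {rank * cols_per_rank}..{(rank + 1) * cols_per_rank - 1},"
      f" all values == {expected}: {np.allclose(my_cols_T, expected)}")
```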


Large-scale performance

– First-stage linear algebra: ScaLAPACK (LU), Elemental (LU), and the $LDL^T$-like factorization in Elemental.

– Strong scaling of PIPS with $LDL^T$ and LU: 90.1% from 64 to 1,024 cores; 75.4% from 64 to 2,048 cores; more than 4,000 scenarios; on Fusion. (Lubin, P., Anitescu, in OMS 2011.)

– SAA problem: 82,000 1st-stage variables; 189 million variables in total; 1,000 thermal units; 1,200 wind farms.


Towards real-life models – Economic dispatch with transmission constraints

Current status: ISOs (independent system operators) use
– deterministic wind profiles, market prices and demand
– network (transmission) constraints
– outer simulation: 1-h timestep, 24-h horizon
– inner corrections: 5-min timestep, 1-h horizon

Stochastic ED with transmission constraints (V. Zavala et al., 2010)
– stochastic wind profiles & transmission constraints
– deterministic market prices and demand
– 24-h horizon with 1-h timestep
– Kirchhoff's laws are part of the constraints
– the problem is huge: KKT systems are 1.8 billion x 1.8 billion

[Figure: transmission network with generators and load nodes (buses)]


Solving ED with transmission constraints on Intrepid BG/P
– 32k wind scenarios (k = 1,024)
– 32k nodes (131,072 cores) on Intrepid BG/P
– Hybrid programming model: SMP inside MPI
  – sparse 2nd-stage linear algebra: WSMP (IBM)
  – dense 1st-stage linear algebra: Elemental with SMP BLAS + OpenMP for packing/unpacking buffers
– For a 4-h horizon problem: very good strong scaling
– Lubin, P., Anitescu, Zavala – in proceedings of SC11



Stochastic programming – a scalable computation pattern

– Scenario parallelization in a hybrid programming model MPI+SMP
  – DSC, PSC (1st stage < 10,000 variables)

– Hybrid MPI/SMP running on Blue Gene/P
  – 131k cores (96% strong scaling) for the Illinois ED problem with grid constraints; 2B variables, maybe the largest ever solved?

– Close to real-time solutions (24-hr horizon in 1-hr wallclock)

– Further development needed, since users aim for
  • more uncertainty, more detail (x 10)
  • faster dynamics, i.e., shorter decision windows (x 10)
  • longer horizons (California == 72 hours) (x 3)

Thank you for your attention!

Questions?

