LHCb Experience with LFC Database Replication

Federico BONIFAZI (INFN-CNAF), Angelo CARBONE (INFN-CNAF / LHCb Bologna), Barbara MARTELLI (INFN-CNAF), Gianluca PECO (INFN-Bologna / LHCb Bologna)


Overview

• LHCb LFC replica implementation in collaboration with the 3D project
    • LFC role in the LHCb computing model
    • Oracle technologies deployed
    • Production setup
• Single-replica setup: functionality, scalability and stability tests
    • Tests description and goals
    • Tests results
• Multi-replica setup: functionality, scalability and stability tests
    • Tests description and goals: does the setup scale with more than one replica?
    • Tests results
• Conclusions

Barbara Martelli CHEP 2007 2


LFC in LHCb Computing model

• Every LHCb application running at T0/T1/T2 needs to read/write from the LFC.
• Every T1 + CERN will run reconstruction using information stored in the conditions database.
• The LHCb computing model foresees LFC and conditions database replication at each T1.
• Database replication becomes quite important in order to ensure:
    • Scalability
    • Geographic redundancy
    • Fault tolerance


LHCb LFC usage and test

• Monte Carlo simulation: transfer the output of an MC job to one or more Storage Elements and register the file in the catalogue (write).
• Data processing (raw data reconstruction, analysis, stripping, etc.): send the job to the T1 site where the data are available and produce an output to be registered (read/write).
• Data transfer: find the replica to transfer, perform the transfer and register the new destination (read/write).
• Etc.

To use a replicated database efficiently, the master and replica databases must be synchronized with low latency.

• Measure the latency between the source and destination databases.
• LHCb requirements are not dramatically strict: less than 30 minutes.


Database Deployment for LFC

At each site the LFC back-end databases are implemented using high-availability technologies:

• Storage level: protection from disk failures is achieved using Oracle Automatic Storage Management (ASM) on a Storage Area Network.
• Database level: Oracle Real Application Cluster (RAC) allows sharing of a database across multiple instances.
• Replication level: Oracle Streams enables the propagation and management of data, transactions and events in a data stream from one database to another.

[Diagram labels: RAC (failover/load balancing); ASM or RAID (mirroring/striping).]


Streams Replication

• CAPTURE: source database events are captured, filtered and stored in LCRs (Logical Change Records).
• STAGING: Streams publishes the captured LCRs into a staging area.
    • Implemented as a queue.
    • Uses a temporary buffer for quick access to the queue.
    • If the filling rate becomes too high, the buffer of the Streams queue becomes full and Oracle needs to write the LCRs to disk (the persistent part of the queue). This decreases performance (spill-over).
• APPLY: staged events are consumed by subscribers at the destination database.

[Diagram: Master DB (Database Objects, Redo Log, Capture, Queue) sends LCRs to Replica DB (Queue, Apply, Database Objects).]
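The capture, staging and apply stages can be illustrated with a toy model. This is a minimal sketch, not Oracle's implementation: the class `StagingQueue`, its `buffer_size` parameter, the dictionary layout of an LCR and the second deque standing in for the on-disk part of the queue are all assumptions made for illustration. It shows LCRs spilling over once the in-memory buffer is full, while the apply order is preserved.

```python
from collections import deque


class StagingQueue:
    """Toy staging area: a bounded in-memory buffer plus a 'disk' overflow.

    Hypothetical model for illustration only, not Oracle Streams itself.
    """

    def __init__(self, buffer_size):
        self.buffer_size = buffer_size
        self.buffer = deque()   # fast, in-memory part of the queue
        self.disk = deque()     # stand-in for the persistent part
        self.spilled = 0        # number of LCRs that had to be spilled

    def enqueue(self, lcr):
        # Once anything has spilled, later LCRs also go to disk so that
        # they cannot overtake the LCRs already spilled.
        if self.disk or len(self.buffer) >= self.buffer_size:
            self.disk.append(lcr)
            self.spilled += 1
        else:
            self.buffer.append(lcr)

    def dequeue(self):
        # Buffer entries are always older than disk entries.
        if self.buffer:
            return self.buffer.popleft()
        if self.disk:
            return self.disk.popleft()
        return None


def capture(redo_log, tables):
    """Filter redo-log entries, keeping only changes to replicated tables."""
    return [entry for entry in redo_log if entry["table"] in tables]


def apply_all(queue, replica):
    """Consume staged LCRs in order and apply them to the replica (a dict)."""
    while True:
        lcr = queue.dequeue()
        if lcr is None:
            break
        if lcr["op"] == "insert":
            replica[lcr["key"]] = lcr["value"]
        elif lcr["op"] == "delete":
            replica.pop(lcr["key"], None)


# Example run with an artificially small buffer to force a spill-over.
redo_log = [
    {"table": "CNS_FILE_METADATA", "op": "insert", "key": 1, "value": "file1"},
    {"table": "OTHER_TABLE", "op": "insert", "key": 99, "value": "ignored"},
    {"table": "CNS_FILE_REPLICA", "op": "insert", "key": 2, "value": "replica1"},
    {"table": "CNS_FILE_REPLICA", "op": "delete", "key": 2, "value": None},
]
queue = StagingQueue(buffer_size=2)
for lcr in capture(redo_log, {"CNS_FILE_METADATA", "CNS_FILE_REPLICA"}):
    queue.enqueue(lcr)
replica = {}
apply_all(queue, replica)
```

With a buffer of two entries the third captured LCR spills to the stand-in disk queue, which is the situation the tests in this talk try to detect.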


LHCb LFC Replication deployment: CERN-CNAF

[Diagram: CERN side: population clients and read-only clients, LFC R-W and LFC R-O servers, master Oracle DB on a 6-node cluster. CNAF side: read-only clients, LFC R-O server, replica Oracle DB on a 2-node cluster. Oracle Streams replicates from the master to the replica over the WAN.]


LHCb LFC Replication Tests

Two different tests have been carried out to evaluate:

• the time latency between the master and the replicated database;
• the performance of the LFC front-end with writing/deleting operations as a function of an increasing number of clients.

Python scripts using LFC API functions add files and replicas to lfc-lhcb.cern.ch (master database).

Tests were performed with an increasing number of simultaneously writing and deleting clients: 10, 20, 40, 76. For each number of clients, the following were added:

• Test I: 8K files and 10 replicas for each file (similar to LHCb usage)
• Test II: 16K files and 25 replicas for each file (beyond LHCb usage)

The load is uniformly distributed over the clients.
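The statement that the load is uniformly distributed over the clients can be made concrete with a short sketch. The helper `split_evenly` is hypothetical (the original test scripts are not shown in the slides), and "8K files" is taken here as 8000 for illustration.

```python
def split_evenly(n_files, n_clients):
    """Return the number of files assigned to each client.

    The counts differ by at most one, so the load is as uniform as possible.
    """
    base, extra = divmod(n_files, n_clients)
    return [base + 1 if i < extra else base for i in range(n_clients)]


if __name__ == "__main__":
    # Client counts used in the tests; 8000 files is an assumed value for "8K".
    for n_clients in (10, 20, 40, 76):
        shares = split_evenly(8000, n_clients)
        print(n_clients, "clients:", min(shares), "to", max(shares), "files each")
```

When the file count is not a multiple of the client count, as with 76 clients, some clients simply receive one file more than the others.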


Strmmon web monitoring tool

Most of the measurements and plots shown are taken from Strmmon, the official Streams monitoring tool in the 3D project.

http://itrac315.cern.ch:4889/streams/streams.php

The tool plots on the web the Streams monitoring quantities previously stored in a dedicated repository database. It is very useful for measuring:

• the total LCR latency (time elapsed between the creation of the LCR at the master and its apply at the destination database);
• the LCR rate (captured, queued, dequeued, applied).
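The two monitored quantities, LCR latency and LCR rate, can be computed from timestamps as in the sketch below. It rests on assumptions: the record layout (pairs of `created_at` and `applied_at` in seconds) and the function names are invented for illustration and are not Strmmon's actual schema or API.

```python
def lcr_latency(created_at, applied_at):
    """Latency of one LCR: creation at the master to apply at the replica."""
    return applied_at - created_at


def summarize(records):
    """Return (average latency [s], applied rate [LCR/s]) for a batch.

    `records` is a list of (created_at, applied_at) pairs in seconds.
    The rate is the number of LCRs divided by the span of apply times.
    """
    if not records:
        raise ValueError("no LCR records")
    latencies = [lcr_latency(created, applied) for created, applied in records]
    applied_times = [applied for _, applied in records]
    window = max(applied_times) - min(applied_times)
    rate = len(records) / window if window > 0 else float("nan")
    return sum(latencies) / len(latencies), rate


# Made-up sample values, chosen only to exercise the functions.
sample = [(0.0, 12.0), (0.5, 12.5), (1.0, 13.0), (1.5, 14.0)]
avg_latency, rate = summarize(sample)
```

The sample timestamps are invented; real values would come from the monitoring repository database.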


Test results [TEST I]

TEST I: add and delete 8K files (10 replicas for each file) with 10, 20, 40, 76 parallel clients (~90K entries).

Two tables used:

• CNS_FILE_METADATA
• CNS_FILE_REPLICA

[Plots, 10 clients: LCR/s vs. time [s] for add entries and delete entries; latency in seconds vs. time [s] for delete.]


Test results [Test I]

[Plots, 76 clients: LCR/s vs. time [s] for add entries and delete entries; latency in seconds vs. time [s] for add and delete.]


Test results [Test I]

[Plots: average LCR/s as a function of clients and average latency [s] as a function of clients (10, 20, 40, 76), for add entries (■) and delete entries (▲).]

• The LCR rate grows linearly with the number of writing and deleting clients.
• Latency is stable (12-13 s), independent of the number of clients.


Test result [Test II]

Test II: add and delete 16K files (25 replicas for each file) with 10, 20, 40, 76 parallel clients (~560K entries).

Adding more replicas per file increases the LCR rate.

[Plots: LCR/s vs. time [s] for delete entries (76 clients), Test I compared with Test II.]


Test result [Test II]

[Plots: average LCR/s as a function of clients and average latency [s] as a function of clients (10, 20, 40, 76), for add entries (■) and delete entries (▲).]

• The LCR rate increases linearly, but with 76 clients the LFC front-end becomes a limit.
• At a rate of 900 LCR/s the replication starts to accumulate latency.
    • I/O at the source database increases due to the growing activity.
• The latency is still much better than the LHCb requirements.


Multi-Replica Setup

The LHCb computing model foresees 6 LFC read-only replicas at T1s: CNAF, GRIDKA, IN2P3, PIC, RAL, SARA.

At the moment:

• One LFC replica in production at CNAF: front-end and back-end deployed.
• LFC replica back-ends connected to CERN, but LFC front-ends not yet deployed, at GRIDKA, IN2P3, PIC, RAL.
• LFC database replica not yet deployed at SARA.

[Map legend: LFC production replica; database replica connected to CERN via Streams but LFC front-ends not set up; replica setup to be done.]


Multi-replica Setup: Scalability and Stability Tests

Scalability and stability tests performed:

• inserting entries in the LFC front-end at CERN;
• monitoring the replication speed, latency and synchronization at CNAF, GRIDKA, IN2P3, PIC, RAL, SARA.

While the tests with the CNAF replica were performed by reading the entries from the LFC front-end at the T1, we now need to read directly from the database back-ends because the LFC front-ends are not yet deployed. This does not affect the results.

The same Python test suite written for the single-replica test is used.

Scalability test:

• 8K files are inserted plus 10 replicas for each file (similar to LHCb usage).
• 10, 20, 40, 76 threads per LFC are used: near the maximum (80 threads) in the present deployment.
• Comparison with previous tests (done in a single-replica setup).

Stability test: the same script is run with 76 clients and 100K files (plus 10 replicas for each file). The files are first added and, after a pause of 5 minutes, removed from the catalog. This operation keeps the LFC busy for ~1:30 hours.
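The add, pause, delete sequence of the stability test can be sketched as follows. Everything here is hypothetical: `FakeCatalog` is an in-memory stand-in for the LFC (the real test used LFC API functions, which are not reproduced here), and the names `run_stability_test`, `n_files` and `pause_s` are illustrative. The numbers in the example run are scaled down so that the sketch finishes quickly.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


class FakeCatalog:
    """In-memory stand-in for a file catalogue (NOT the LFC API)."""

    def __init__(self):
        self._lock = threading.Lock()
        self.entries = {}  # file name -> list of replica names

    def add_file(self, name, n_replicas):
        replicas = [f"{name}@se{i}" for i in range(n_replicas)]
        with self._lock:
            self.entries[name] = replicas

    def delete_file(self, name):
        with self._lock:
            del self.entries[name]


def run_stability_test(catalog, n_files, n_replicas, n_clients, pause_s):
    """Add all files in parallel, pause, then delete them in parallel.

    Returns the number of catalogue entries right after the add phase.
    """
    names = [f"file{i:06d}" for i in range(n_files)]
    with ThreadPoolExecutor(max_workers=n_clients) as pool:
        # list() waits for completion and re-raises any worker exception.
        list(pool.map(lambda name: catalog.add_file(name, n_replicas), names))
    added = len(catalog.entries)
    time.sleep(pause_s)  # 5 minutes in the test described above
    with ThreadPoolExecutor(max_workers=n_clients) as pool:
        list(pool.map(catalog.delete_file, names))
    return added


# Scaled-down run: 1000 files instead of 100K, a 10 ms pause instead of 5 min.
catalog = FakeCatalog()
added = run_stability_test(
    catalog, n_files=1000, n_replicas=10, n_clients=76, pause_s=0.01
)
```

A thread pool with 76 workers mirrors the 76 clients of the test; a real driver would replace `FakeCatalog` with calls to the catalogue under test.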


Scalability Test Results

[Plots, 76 clients: LCR/s vs. time for add entries and delete entries; latency in seconds vs. time for add and delete.]


Stability Tests: Apply Speed

• Reached a speed of 1K LCR/s (not less than in the 1-replica tests).
• No spilling was detected during the tests.
• Replication rate sustained for one and a half hours.
• Plots have the same shape: all replicas behave in the same way.
• Peaks are due to the queue filling up and emptying.

[Plots: apply speed (LCR/s) vs. timeline, roughly 18:40 to 20:10, for GRIDKA, IN2P3, RAL, CNAF and PIC.]


Stability Tests: Replication Latency

• Latency during the stress test varies from ~15 s to ~55 s (peaks).
• Considering that latency during low-load periods is about 10-15 seconds, the impact of the stress tests on latency is very low.

[Plots: latency (seconds) vs. timeline, roughly 18:40 to 20:10, with test start and test end marked, for GRIDKA, IN2P3, CNAF, RAL and PIC.]
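To put these figures in perspective, the peak latency can be compared with the LHCb requirement of less than 30 minutes quoted earlier in the talk. The sketch below is plain arithmetic on those two numbers; the variable names are illustrative.

```python
REQUIREMENT_S = 30 * 60   # LHCb requirement: latency below 30 minutes
PEAK_LATENCY_S = 55       # approximate peak seen during the stress test

# How many times smaller the peak latency is than the allowed maximum.
headroom = REQUIREMENT_S / PEAK_LATENCY_S
meets_requirement = PEAK_LATENCY_S < REQUIREMENT_S

print(f"requirement met: {meets_requirement}, headroom factor: {headroom:.1f}")
```

Even the worst peaks stay more than an order of magnitude below the requirement.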


No spilling during the stress tests!

So at the maximum load on the LFC we don't stress Streams.


Conclusions

• High availability is a key issue for database services and is well addressed by present Oracle technologies.
• The 3D project has successfully deployed such technologies, achieving good stability and reliability of the service at CNAF as a pilot site, and now at all the other T1 centres.
• Adding replicas to the setup doesn't impact Streams replication performance:
    • Latency doesn't grow.
    • Replication speed doesn't decrease.
• All T1s behave in the same way: the plots of replication speed and latency are pretty much the same.
• Streams replication is not a bottleneck for LFC performance.
• LHCb requirements on latency and performance are largely met.