The Data Access Layer for D0 Run II

Design and Features of SAM

Vicky White for the SAM team

Lee Lueking, Vicky White, Heidi Schellman, Igor Terekhov, Matt Vranicar, Julie Trumbo, Rich Wellner, Steve White, Sinisa Veseli

CHEP2000 - Feb 8, 2000



SAM Overview

• SAM stands for Sequential Access Model. It is a major part of the D0 Data Handling System, handling all data access:

– from the writing of RAW event data files directly from the online system,

– to the reading of whole data streams into the Farms for reconstruction and the writing of reconstructed data,

– to read/write access to Thumbnail data and datasets of AOD,

– to read access to RAW or reconstructed data for a single event

• Other services and layers of functionality are needed to build the entire Data Handling System. These include:

– access to tapes and movement of data from tape to disk

– regulation of analysis jobs through a batch system (where needed)

– management of production jobs through a Farm system

Software Layers for Data Management and Analysis

[Diagram: software layers above the hardware resources. Labels recovered from the figure:]

• Layers: User Programs/Jobs for Analysis and writing of files; Farms Production System; Batch System(s)/Job Scheduler; Data Access (SAM); Storage Management (ENSTORE); Operator Interaction

• Hardware Resources: CPUs and scratch disk; Local Disk; Network Disk; Tape Drives; Tape Robots; Tape Shelves

• Interface annotations: e.g. D0 file access package d0om, D0 Framework package calls; e.g. stage files, inter-Station; e.g. analysis job resources and parameters, stages of jobs; e.g. Farm-specific job hooks


D0 Hardware Environment

Goals and Strategies

• Cluster data into tertiary storage in a manner corresponding to expected access patterns.

• Cache frequently accessed data on disk.

• Organize data access to optimize use of robot, tape drive, and network resources.

• Carefully track locations and processing steps for all data.

• Estimate the resources required before access requests are initiated.

• Provide user interfaces which integrate easily into data processing and analysis activities, i.e. d0 framework programs.
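
One way to picture "organize data access to optimize use of robot, tape drive, and network resources" is to batch file requests by tape volume, so that each tape is mounted once and read in order. The sketch below is only an illustration of that idea, not SAM's actual optimizer; the function name, the volume labels, and the `location` mapping are all hypothetical.

```python
from collections import defaultdict

def plan_tape_reads(requests, location):
    """Group requested files by tape volume, largest batch first.

    requests: iterable of file names (hypothetical)
    location: dict mapping file name -> (volume label, position on tape)
    Files missing from `location` are assumed not to be on tape and are skipped.
    """
    by_volume = defaultdict(list)
    for name in requests:
        if name in location:
            volume, position = location[name]
            by_volume[volume].append((position, name))
    plan = []
    # Mount the volume that serves the most requests first (ties broken by label),
    # and read the files on each volume in tape order.
    ordered = sorted(by_volume.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for volume, entries in ordered:
        plan.append((volume, [name for _, name in sorted(entries)]))
    return plan

# Made-up file locations for illustration only.
location = {
    "raw_001": ("VOL_A", 3),
    "raw_002": ("VOL_B", 1),
    "raw_003": ("VOL_A", 1),
}
plan = plan_tape_reads(["raw_001", "raw_002", "raw_003"], location)
```

Here the two files on `VOL_A` are served from a single mount, which is the kind of saving the goal describes.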

Organization of data - an optimization strategy

[Diagram: data organization. Labels recovered from the figure: Physical Clustering; Event Data Tiers; Warm Cache (shown twice); User and physics group (derived) data]

File & Event Catalog Database ties together all data tiers

Categorize access patterns - handling built into SAM

[Diagram: access patterns, drawn in three columns: Mass Storage; Type of data/mode of processing; Data Consumers]

• Legend: Disk Storage; Tape Storage; File; Event; Data flow; Group of Users; Single User; Pipeline Name

• Pipelines shown: Freight Train; Pick Event; User File; Thumbnail (shown twice); Farm reconst. (one user, many jobs)

Enstore

• Enstore is the storage management system (E176 - next talk)

• Provides:

– encp - copy files between media and disk

– pnfs - namespace of files (courtesy DESY)

– enstore - user control interfaces to Enstore system

– web interface - details of what the system is doing

– enstore_tape - volume import mechanism

• Enstore provides a scalable system, designed for each mover/tape combination to read/write data to tape at the full tape rate (each expected at ~10 MB/sec)

SAM Database knows all!

The SAM database tracks information about:

• Events, Files, File locations and File processing

– lineage, meta-data and how/when analysed are known for every file

• Also configuration and operational data for SAM itself

– configure and control resources, set caching policies, etc.

– support robustness features -- restart

– track and understand performance of the system itself

– support for data export and remote sites

The database keeps track of the correlation between “Physics Data” and “Conditions Data”.

SAM Simplified Schema - Event/File meta-data part

[Diagram: simplified schema. Entities recovered from the figure:]

• Files: ID, Name, Format, Size, # Events

• Events: ID, Event Number, Trigger L1, Trigger L2, Trigger L3, Off-line Filter, Thumbnail

• Event-File Catalog

• Volume; Project; Data Tier; Physical Data Stream; Trigger Configuration; Creation & Processing Info; Run

• Run Conditions; Luminosity; Calibration; Alignment
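
To make the Files / Events / Event-File Catalog relationship concrete, here is a minimal in-memory sketch. It is not the Oracle schema: the class names, field types (for example, triggers as integer bit masks) and the `data_tier` field on a file are assumptions chosen for illustration; only the attribute names come from the diagram.

```python
from dataclasses import dataclass, field

@dataclass
class FileRecord:
    file_id: int
    name: str
    format: str
    size_bytes: int
    n_events: int
    data_tier: str        # assumption: each file belongs to one data tier

@dataclass
class EventRecord:
    event_id: int
    event_number: int
    run: int
    trigger_l1: int       # assumption: trigger results stored as bit masks
    trigger_l2: int
    trigger_l3: int

@dataclass
class EventFileCatalog:
    # event_id -> list of file_ids that contain a representation of that event
    entries: dict = field(default_factory=dict)

    def add(self, event, file):
        self.entries.setdefault(event.event_id, []).append(file.file_id)

    def files_for(self, event_id):
        return list(self.entries.get(event_id, []))

# Made-up records: the same event appears in a raw file and a reconstructed file.
raw = FileRecord(1, "raw_0001", "raw", 10**9, 1000, "raw")
reco = FileRecord(2, "reco_0001", "reco", 5 * 10**8, 1000, "reconstructed")
event = EventRecord(42, 1234, 100, 0b01, 0b10, 0b11)

catalog = EventFileCatalog()
catalog.add(event, raw)
catalog.add(event, reco)
file_ids = catalog.files_for(42)
```

A catalog of this shape is what lets one event be located in every data tier, which is the role the diagram gives the Event-File Catalog.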

Processing Chains including Reprocessing/Merging/Splitting

[Diagram: processing chains. Labels recovered from the figure: DAQ-L3; Reco Farm; Freight train processing; Thumbnail processing; Thumbnail; Analysis processing; Pick Events selection; Reprocessing; Good; Bad; Archived; Other streams; Other; EDU50,250; Typical file split; Potential file merge location (shown twice)]
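
Because files can be split, merged and reprocessed along these chains, answering "where did this file come from?" means walking a parent graph. The sketch below shows one simple way to do that, assuming lineage is available as a mapping from each file to its direct parents; the mapping layout and the file names are hypothetical and do not reflect how the SAM database stores lineage.

```python
def ancestors(lineage, name):
    """Return every file that `name` was derived from, directly or indirectly.

    lineage: dict mapping a file name to the list of its parent files.
    A merged file has several parents; files split from one parent share it.
    """
    found = set()
    stack = list(lineage.get(name, []))
    while stack:
        parent = stack.pop()
        if parent not in found:
            found.add(parent)
            stack.extend(lineage.get(parent, []))
    return found

# Made-up chain: two raw files are reconstructed, then merged into one thumbnail file.
lineage = {
    "reco_1": ["raw_1"],
    "reco_2": ["raw_2"],
    "thumb_merged": ["reco_1", "reco_2"],
}
result = ancestors(lineage, "thumb_merged")
```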

The database is implemented in Oracle - why …

• robust and mature product - essential to stability -- need high availability system

• size of database will be large (including Event Catalog), up to 0.5 TB or more - needs partition/backup/index support for large databases

• many design tools and monitoring, backup and tuning tools available

– Using Oracle Enterprise Manager to manage and monitor databases

SAM is a fully distributed client/server system

• both internal and client interfaces defined in IDL and implemented using CORBA. This gives us:

– simple clients

– support for multiple languages -- C++, Python, Java

– support for multiple platforms right from the start - stay on top of technology changes (current platforms: Linux, IRIX, OSF1, Sun)

SAM command -> Servers

[Diagram: a sam command or the web page/GUI talks to the following servers]

• Station Master: manages disk cache and all projects on a single ‘Station’. Interfaces with Batch system

• Project Master or File Storage Server: arranges the delivery of the set of files for a single project - or stores a file, records location

• Database Server: supplies information, resolves queries, records transactions and file information

Behind the scenes are more servers ...

[Diagram: servers shown: Log; Info; Station; Project or File Storage; Database; Optimizer; Stager(s); CORBA Name Server]

• Stager(s): program which copies or ‘gets’ a file for you when it is not in the local disk cache
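
The stager's role, fetching a file only when it is not already in the local disk cache, can be sketched as a small cache wrapper. This is a toy model under stated assumptions: the class name, the callable stager, and the least-recently-used eviction rule are all hypothetical. The slides say caching policies are configurable but do not say which policy SAM uses.

```python
class StationCache:
    """Toy disk cache: serve from cache when possible, otherwise call a stager."""

    def __init__(self, stager, capacity):
        self.stager = stager      # callable: file name -> local path
        self.capacity = capacity  # maximum number of cached files (at least 1)
        self.cached = {}          # file name -> local path, oldest use first
        self.stage_count = 0      # how many times the stager had to run

    def get(self, name):
        if name in self.cached:
            # Cache hit: remove and re-insert below to mark as most recently used.
            path = self.cached.pop(name)
        else:
            # Cache miss: ask the stager to copy the file in.
            path = self.stager(name)
            self.stage_count += 1
            if len(self.cached) >= self.capacity:
                oldest = next(iter(self.cached))
                del self.cached[oldest]   # evict the least recently used entry
        self.cached[name] = path
        return path

def fake_stager(name):
    # Stand-in for a real copy from tape or another station.
    return "/cache/" + name

cache = StationCache(fake_stager, capacity=2)
paths = [cache.get(n) for n in ["a", "b", "a", "c", "b"]]
```

In this run the second request for `a` is served from cache, while `b` has to be staged a second time because it was evicted to make room for `c`.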

Client/Server and CORBA

• Only one ‘singleton’ - Optimizer -- must look globally to control access to the Robot

• Easy to run many parallel universes

– production, development, integration, Mary’s test, etc.

• Many CORBA products -- chose 2 freeware ones

– Orbacus (for C++ and Java) http://www.ooc.com

– Fnorb (for Python) http://www.fnorb.org

• Servers are currently all in C++ or Python

• clients exist for all 3 languages

SAM clients

• programs or people that use the services of SAM to store or retrieve data, categorize it, browse it, configure SAM resources and policies, etc.

Using SAM

• SAM (from the user perspective) is just a few useful commands

– all are available on the command line

– a few from a web-GUI (define project etc.)

– some (more later) are available from within d0reco or any other d0 framework program

• The SAM database can be queried and browsed extensively

SAM user commands - e.g.

sam create project definition <defin. params>

sam create project snapshot <project params>

sam create analysis project <project params>

sam verify snapshot <snap params>

SAM user commands - e.g.

sam start project <…>

sam start consumer <…>

sam start process <…>

sam get next file <…>

sam release <file params…>

sam store <file and file metadata params…>

sam locate <file>

sam dump <project>

… and many more …
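
The command names above suggest a consumer lifecycle: start a project, then repeatedly get the next file, process it, and release it. The sketch below models only that loop in memory. It does not call the real `sam` command or reproduce its parameters (which are elided on the slides); `ToyProject`, `run_consumer` and the rule that a consumer must release a file before asking for the next one are assumptions made for illustration.

```python
class ToyProject:
    """In-memory stand-in for a project that hands out files to consumers."""

    def __init__(self, files):
        self.pending = list(files)   # files not yet delivered
        self.in_use = {}             # consumer id -> file currently held
        self.done = []               # files released after processing

    def get_next_file(self, consumer):
        if consumer in self.in_use:
            raise RuntimeError("release the current file first")
        if not self.pending:
            return None              # project exhausted
        name = self.pending.pop(0)
        self.in_use[consumer] = name
        return name

    def release(self, consumer):
        self.done.append(self.in_use.pop(consumer))

def run_consumer(project, consumer, process):
    # Loop in the spirit of "get next file" / "release": stop when no files remain.
    while True:
        name = project.get_next_file(consumer)
        if name is None:
            break
        process(name)
        project.release(consumer)

project = ToyProject(["f1", "f2", "f3"])
seen = []
run_consumer(project, "consumer-1", seen.append)
```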

SAMManager and Framework and d0om (persistency)

SAM interaction through

a) name expanders - used by d0StreamName

b) File Open/Close messages generated by ReadEvent and WriteEvent

“sam:” in a file name is resolved by a SAM name expander --> SAM Servers, to get the next file or to get the place/name for an output file
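
The name-expander idea can be sketched as a function that inspects the file name and only involves the SAM servers when it sees the "sam:" prefix. This is an illustration, not the d0om or SAMManager code: `expand_name`, the `next_file` callable and the assumption that the text after "sam:" names a dataset are all hypothetical.

```python
SAM_PREFIX = "sam:"

def expand_name(name, next_file):
    """Return a concrete file path for `name`.

    Names starting with "sam:" are resolved by calling `next_file`, a
    hypothetical callable standing in for the request to the SAM servers.
    Any other name is treated as an ordinary path and returned unchanged.
    """
    if name.startswith(SAM_PREFIX):
        return next_file(name[len(SAM_PREFIX):])
    return name

def fake_next_file(dataset):
    # Stand-in for the server round trip; the path layout is made up.
    return "/cache/" + dataset + "/file_0001"

resolved = expand_name("sam:my_dataset", fake_next_file)
plain = expand_name("/data/local.evt", fake_next_file)
```

Keeping the decision inside the expander means a framework program can read SAM-managed and ordinary files through the same open call.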


Constraint to SQL Helper


Usage and Status

• System is documented and has been used to store over 1.5 TB of Monte Carlo data (see E311)

• Most of the files have been fetched and processed through a 50-node reconstruction Farm, using SAM (E60 - Monday)

• Data is added to the system daily -- produced at Fermilab and by collaborators in France, Amsterdam and Prague, and ftp’d to Fermilab along with the required meta-data description file. A feature to store parameter files along with the data files has recently been implemented

Performance and Robustness

• we have started to build a serious test harness to emulate the entire load - from online --> random end users

• promising performance numbers so far -- easily got 20 MB/sec into Origin with only 1 Gbit Ethernet and a limited number of current-generation tape drives (3 MB/sec)

• the test harness runs with tens of projects, each delivering cached and tape-resident files, each with tens of consumer analysis processes gaining access to all of them

• Farms also got up to 20 MB/sec, with 50 nodes all requesting files - this is the required rate during a run (but of course we will test to the saturation point of the Farm nodes)
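
A quick back-of-the-envelope check puts these figures in context. The calculation below assumes the worst case in which every byte comes from tape (no cache hits) and every drive streams at its full 3 MB/sec; it also treats 1 Gbit/sec as 125 MB/sec, ignoring protocol overhead. These are simplifying assumptions, not measurements from the talk.

```python
import math

TARGET_MB_S = 20.0            # delivery rate quoted on the slide
TAPE_DRIVE_MB_S = 3.0         # current-generation drive rate quoted on the slide
GIGABIT_MB_S = 1000.0 / 8.0   # 1 Gbit/s in MB/s, ignoring protocol overhead

# Drives needed to sustain the target rate from tape alone.
drives_needed = math.ceil(TARGET_MB_S / TAPE_DRIVE_MB_S)

# Fraction of a single Gbit Ethernet link used at the target rate.
link_fraction = TARGET_MB_S / GIGABIT_MB_S
```

Under these assumptions, about seven drives would be needed without any help from the disk cache, while the network link is far from saturated.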


What next?

• strong focus on testing and robustness

– aiming for a very high availability system -- all servers restart, clients recover, etc.

• need to integrate with the Batch system and address the resulting resource management issues

• will add the feature to pick single-event data, and test use of the Event Catalog more extensively

• more support for sites outside Fermilab to use the system and/or set up their own

Analysis outside Fermilab, using SAM

• In addition to your program, which must talk to a SAM Project Server and Database Server somewhere, and may need to have files staged, you will need:

– Calibration Data

– Alignment Data

– Geometry Data

– RCP Data

[Diagram labels: get through d0om; RCP manager; dspack files; interface to a Database Server; other I/O possibilities; extracted RCP files; interface to a Database Server]

Conclusions

• We have a working system to first order

• Users are starting to use it

• Making it robust and highly available is high on our priority list

– many aspects are involved, from Robot hardware, operating systems, network components, the database server machine, and Enstore movers, servers and software, to, last of all, the SAM servers and client code which sit atop all of that

• Involving off-site users and planning to provide access to data for all is starting now and will be there before we run