CHEP2000 - Feb 8, 2000
Vicky White 1
The Data Access Layer for D0 Run II
Design and Features of SAM
Vicky White for the SAM team
Lee Lueking, Vicky White, Heidi Schellman, Igor Terekhov, Matt Vranicar, Julie Trumbo, Rich Wellner, Steve White, Sinisa Veseli
SAM Overview
• SAM stands for Sequential Access Model. It is a major part of the D0 Data Handling System, handling all data access:
– from the writing of RAW event data files directly from the online,
– to the reading of whole data-streams into the Farms for reconstruction and writing of reconstructed data
– to read/write access to Thumbnail data and datasets of AOD
– to read access to RAW or reconstructed data for a single event
• Other services and layers of functionality are needed to build the entire Data Handling System. These include
– access to tapes and movement of data from tape to disk
– regulation of analysis jobs through a batch system (where needed)
– management of production jobs through a Farm system
Software Layers for Data Management and Analysis
[Figure: layered diagram. Layers shown:]
• User Programs/Jobs for Analysis and writing of files
• Farms Production System
• Batch System(s)/Job Scheduler
• Data Access (SAM)
• Storage Management (ENSTORE)
• Hardware Resources: CPUs and scratch disk, Local Disk, Network Disk, Tape Drives, Tape Robots, Tape Shelves
[Interface annotations in the figure: e.g. D0 file access package d0om, D0 Framework package calls; e.g. Farm specific job hooks; e.g. Analysis job resources and parameters, stages of jobs; e.g. Stage files Inter-Station; Operator Interaction]
D0 Hardware Environment
Goals and Strategies
• Cluster data into tertiary storage in a manner corresponding to expected access patterns.
• Cache frequently accessed data on disk.
• Organize data access to optimize use of robot, tape drive, and network resources.
• Carefully track locations and processing steps for all data.
• Estimate the resources required before access requests are initiated.
• Provide user interfaces which integrate easily into data processing and analysis activities, i.e. D0 framework programs.
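The goal of estimating resources before an access request is initiated can be illustrated with a minimal sketch. It assumes a hypothetical catalog record per file (size and whether a disk-cached copy exists) and a single assumed tape rate. The names FileEntry and estimate_request are illustrative only and are not part of SAM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileEntry:
    """Hypothetical catalog record: file name, size, and cache status."""
    name: str
    size_mb: float
    on_disk: bool  # True if a copy is already in the disk cache


def estimate_request(files, tape_rate_mb_s):
    """Estimate how much data must come from tape and roughly how long
    one drive would need to deliver it (mount and seek time are ignored)."""
    tape_mb = sum(f.size_mb for f in files if not f.on_disk)
    disk_mb = sum(f.size_mb for f in files if f.on_disk)
    return {
        "disk_mb": disk_mb,
        "tape_mb": tape_mb,
        "tape_seconds": tape_mb / tape_rate_mb_s,
    }


# Illustrative request: one cached file and two tape-resident files.
request = [
    FileEntry("raw_0001", 500.0, on_disk=True),
    FileEntry("raw_0002", 500.0, on_disk=False),
    FileEntry("raw_0003", 250.0, on_disk=False),
]
estimate = estimate_request(request, tape_rate_mb_s=10.0)
print(estimate)
```

An estimate of this kind would let a scheduler decide whether to start a request now or defer it; the real system presumably also accounts for robot and network load, which this sketch leaves out.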
Organization of data - an optimization strategy
[Figure: Physical Clustering of the Event Data Tiers, each with a Warm Cache, alongside User and physics group (derived) data]
File & Event Catalog Database ties together all data tiers
Categorize Access patterns - handling built into SAM
[Figure: data flow from Mass Storage, by Type of data/mode of processing, to Data Consumers]
[Legend: Disk Storage, Tape Storage, File, Event, Data flow, Group of Users, Single User, Pipeline Name]
[Pipelines shown: Freight Train, Pick Event, User File, Thumbnail, Farm reconst.; also labelled: One user many jobs]
Enstore
• Enstore is the storage management system (E176 - next talk)
• Provides:
– encp - copy files between media and disk
– pnfs - namespace of files (courtesy DESY)
– enstore - user control interfaces to Enstore system
– web interface - details of what the system is doing
– enstore_tape - volume import mechanism
• Enstore provides a scalable system - designed for each mover/tape combination to read/write data to tape at full tape rate (~10 MB/sec each expected)
SAM Database knows all!
The SAM database tracks information about
• Events, Files, File locations, File processing
– lineage, meta-data, how/when analysed: known for every file
• Also configuration and operational data for SAM itself
– configure and control resources, set caching policies, etc.
– support robustness features -- restart
– track and understand performance of system itself
• The database keeps track of the correlation between “Physics Data” and “Conditions Data”.
– Support for Data export and remote sites
SAM Simplified Schema - Event/File meta-data part
[Figure: schema diagram. Entities and attributes shown:]
• Files: ID, Name, Format, Size, # Events
• Events: ID, Event Number, Trigger L1, Trigger L2, Trigger L3, Off-line Filter, Thumbnail
• Event-File Catalog
• Volume, Project, Data Tier, Physical Data Stream, Trigger Configuration, Creation & Processing Info
• Run, Run Conditions, Luminosity, Calibration, Alignment
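A minimal sketch of how the Files, Events, and Event-File Catalog parts of such a schema might look as relational tables. It uses an in-memory SQLite database as a stand-in for Oracle. All table names, column names, and sample rows are assumptions derived from the box labels in the figure; treating the Event-File Catalog as a link table is also an assumption, and the actual SAM schema is not shown in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE files (
    file_id    INTEGER PRIMARY KEY,
    name       TEXT NOT NULL UNIQUE,
    format     TEXT,
    size_bytes INTEGER,
    num_events INTEGER,
    data_tier  TEXT
);
CREATE TABLE events (
    event_id     INTEGER PRIMARY KEY,
    run_number   INTEGER NOT NULL,
    event_number INTEGER NOT NULL,
    trigger_l1   INTEGER,
    trigger_l2   INTEGER,
    trigger_l3   INTEGER
);
-- Assumed link table: which events are stored in which files.
CREATE TABLE event_file_catalog (
    event_id INTEGER NOT NULL REFERENCES events(event_id),
    file_id  INTEGER NOT NULL REFERENCES files(file_id),
    PRIMARY KEY (event_id, file_id)
);
""")

# Illustrative rows: two events that appear in a raw and a reconstructed file.
conn.executemany(
    "INSERT INTO files VALUES (?, ?, ?, ?, ?, ?)",
    [(1, "raw_run100_001", "example", 1000, 2, "raw"),
     (2, "reco_run100_001", "example", 400, 2, "reconstructed")],
)
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?, ?, ?, ?)",
    [(1, 100, 1, 1, 0, 1), (2, 100, 2, 1, 1, 0)],
)
conn.executemany(
    "INSERT INTO event_file_catalog VALUES (?, ?)",
    [(1, 1), (2, 1), (1, 2), (2, 2)],
)


def files_for_event(run_number, event_number):
    """Return the names of all files that contain the given event."""
    rows = conn.execute(
        """
        SELECT f.name FROM files f
        JOIN event_file_catalog c ON c.file_id = f.file_id
        JOIN events e ON e.event_id = c.event_id
        WHERE e.run_number = ? AND e.event_number = ?
        ORDER BY f.name
        """,
        (run_number, event_number),
    ).fetchall()
    return [row[0] for row in rows]


print(files_for_event(100, 2))
```

A lookup of this shape is what a single-event read would need: given a run and event number, find every file, in any data tier, that holds the event.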
Processing Chains including Reprocessing/Merging/Splitting
[Figure: processing chain diagram. Stages shown: DAQ-L3; Reco Farm Freight train processing; Thumbnail processing; Analysis processing; Pick Events selection; Reprocessing]
[Other labels: Good, Bad, Thumbnail, Archived, Other streams, Other, EDU50,250, Typical file split, Potential file merge location (twice)]
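Tracking lineage through chains that split, merge, and reprocess files can be sketched as a parent map plus a graph walk. This is only an illustration of the bookkeeping involved: the file names and the functions record_step and ancestors are hypothetical, and SAM keeps this information in its database rather than in memory.

```python
from collections import defaultdict

# Maps each derived file to the files it was produced from (hypothetical names).
parents = defaultdict(list)


def record_step(inputs, outputs):
    """Record one processing step: every output descends from every input."""
    for out in outputs:
        parents[out].extend(inputs)


def ancestors(name):
    """Return all files that `name` was derived from, directly or indirectly."""
    seen = set()
    stack = list(parents.get(name, []))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(parents.get(current, []))
    return seen


record_step(["raw_1"], ["reco_1a", "reco_1b"])    # split: one input, two outputs
record_step(["raw_2"], ["reco_2"])
record_step(["reco_1a", "reco_2"], ["thumb_1"])   # merge: two inputs, one output
record_step(["raw_1"], ["reco_1_v2"])             # reprocessing of the same input

print(sorted(ancestors("thumb_1")))
```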
The database is implemented in Oracle - why …
• robust and mature product - essential to stability -- need high availability system
• size of database will be large (including Event Catalog) up to 0.5 TB or more - needs partition/backup/index support for large databases
• many design tools and monitoring, backup and tuning tools available
– Using Oracle Enterprise Manager to manage and monitor databases
SAM is a fully distributed client/server system
• both internal and client interfaces defined in IDL and implemented using CORBA. This gives us
– simple clients
– support for multiple languages -- C++, Python, Java
– support for multiple platforms right from start - stay on top of technology changes (Linux, IRIX, OSF1, Sun current platforms)
SAM command -> Servers
[Figure: clients (sam command, web page/GUI) and the servers they talk to]
• Station Master: manages disk cache and all projects on a single ‘Station’. Interfaces with Batch system
• Project Master or File Storage Server: arranges the delivery of the set of files for a single project - or stores a file, records location
• Database Server: supplies information, resolves queries, records transactions and file information
Behind the scenes are more servers ...
[Figure: servers shown: Log, Info, Station, Project or File Storage, Database, Optimizer, Stager(s), CORBA Name Server]
• Stager(s): program which copies or ‘gets’ a file for you when it is not in the local disk cache
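The interplay between a station's disk cache and a stager can be sketched as follows. This is a toy model under stated assumptions: the least-recently-used eviction policy, the class StationCache, and the function fake_stager are all illustrative, since the slides say only that caching policies are configurable and that a stager fetches files missing from the local cache.

```python
from collections import OrderedDict


class StationCache:
    """Toy disk cache with least-recently-used eviction (an assumed policy)."""

    def __init__(self, capacity, stager):
        self.capacity = capacity      # maximum number of cached files
        self.stager = stager          # callable that fetches a missing file
        self.files = OrderedDict()    # file name -> local path, oldest first
        self.staged = []              # names that had to be staged, in order

    def get(self, name):
        """Return the local path of a file, staging it on a cache miss."""
        if name in self.files:
            self.files.move_to_end(name)       # mark as recently used
            return self.files[name]
        path = self.stager(name)                # cache miss: stage the file
        self.staged.append(name)
        self.files[name] = path
        if len(self.files) > self.capacity:
            self.files.popitem(last=False)      # evict the least recently used
        return path


def fake_stager(name):
    """Stand-in for a real copy from tape or from another station."""
    return "/cache/" + name


cache = StationCache(capacity=2, stager=fake_stager)
for requested in ["a", "b", "a", "c", "b"]:
    cache.get(requested)
print(cache.staged)
```

In this run the second request for "a" is served from the cache, while the second request for "b" has to be staged again because "b" was evicted to make room for "c".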
Client/Server and CORBA
• Only one ‘singleton’ - Optimizer -- must look globally to control access to the Robot
• Easy to run many parallel universes
– production, development, integration, Mary’s test, etc.
• Many CORBA products -- chose 2 freeware ones
– Orbacus (for C++ and Java) http://www.ooc.com
– Fnorb (for Python) http://www.fnorb.org
• Servers are currently all in C++ or Python
• clients exist for all 3 languages
SAM clients
• programs or people that use the services of SAM to store or retrieve data, categorize it, browse it, configure SAM resources and policies, etc.
Using SAM
• SAM (from user perspective) is just a few useful commands
– all are available on the command line
– a few from a web-GUI (define project etc.)
– some (more later) are available from within d0reco or any other d0 framework program
• The SAM database can be queried and browsed extensively
SAM user commands - e.g.
sam create project definition < defin. params>
sam create project snapshot <project params>
sam create analysis project <project params>
sam verify snapshot <snap params >
SAM user commands e.g.
sam start project <…>
sam start consumer <…>
sam start process <…>
sam get next file <…>
sam release < file params…>
sam store <file and file metadata params…>
sam locate <file>
sam dump <project>
… and many more …..
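The order in which a consumer would use commands such as "get next file" and "release" can be sketched with an in-memory mock. Nothing here calls SAM: the class MockProject and the function run_consumer are hypothetical stand-ins, and the real commands take parameters that the slides elide.

```python
class MockProject:
    """In-memory stand-in for a running project; not a real SAM client."""

    def __init__(self, name, files):
        self.name = name
        self.pending = list(files)   # files still to be delivered
        self.released = []           # files the consumer has finished with

    def get_next_file(self):
        """Return the next file, or None when the project is exhausted."""
        return self.pending.pop(0) if self.pending else None

    def release(self, file_name):
        """Tell the project the consumer no longer needs the file."""
        self.released.append(file_name)


def run_consumer(project, process_file):
    """Loop mirroring 'get next file' followed by 'release' for each file."""
    processed = []
    while True:
        file_name = project.get_next_file()
        if file_name is None:
            break
        processed.append(process_file(file_name))
        project.release(file_name)
    return processed


project = MockProject("demo_project", ["f1", "f2", "f3"])
results = run_consumer(project, lambda name: name.upper())
print(results)
```

Releasing each file as soon as it has been processed is what would let a station reclaim cache space while the project is still running.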
SAMManager and Framework and d0om (persistency)
SAM interaction through
a) name expanders - used by d0StreamName
b) File Open/Close messages generated by ReadEvent and WriteEvent
A sam: prefix in a file name will be resolved by a SAM name expander --> SAM Servers, to get the next file or to get the place/name for an output file
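The name-expander idea can be sketched as a function that intercepts names carrying the sam: prefix and passes everything else through. This is a minimal illustration: make_expander is hypothetical, the lookup is a plain dictionary rather than a call to SAM servers, and treating the text after the prefix as a project name is an assumption.

```python
def make_expander(next_input_file):
    """Build an expander; `next_input_file` stands in for a call to SAM servers."""

    def expand(stream_name):
        prefix = "sam:"
        if stream_name.startswith(prefix):
            # Names marked with 'sam:' are resolved by the (mock) server lookup.
            return next_input_file(stream_name[len(prefix):])
        return stream_name  # ordinary file names pass through unchanged

    return expand


# Hypothetical mapping from a project name to the file delivered next.
delivered = {"my_project": "/cache/raw_0001"}
expand = make_expander(lambda project: delivered[project])

print(expand("sam:my_project"))
print(expand("/home/user/local_file"))
```

Because ordinary names pass through untouched, a framework program written against plain files would need no changes beyond the stream name it is given.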
Constraint to SQL Helper
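The slide content is not preserved beyond its title, so the following is only a guess at the general idea: turning user-level constraints into a parameterized SQL query. The function constraints_to_sql, the column whitelist, and the table name are all hypothetical and do not describe the actual helper.

```python
# Hypothetical whitelist of columns a user may constrain on.
ALLOWED_COLUMNS = {"data_tier", "run_number", "trigger_l3"}


def constraints_to_sql(constraints, table="files"):
    """Turn {column: value} constraints into a parameterized SELECT statement."""
    clauses, params = [], []
    for column in sorted(constraints):          # sorted for a stable clause order
        if column not in ALLOWED_COLUMNS:
            raise ValueError("unknown constraint: " + column)
        clauses.append(column + " = ?")
        params.append(constraints[column])
    sql = "SELECT name FROM " + table
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    return sql, params


sql, params = constraints_to_sql({"run_number": 100, "data_tier": "raw"})
print(sql)
print(params)
```

Keeping values in a separate parameter list, rather than pasting them into the SQL text, avoids quoting problems when the query is handed to the database.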
Usage and Status
• System is documented and has been used to store over 1.5 TB of Monte Carlo data (see E311)
• Most of the files have been fetched and processed through a 50 node reconstruction Farm, using SAM (E60 - Monday)
• Data is added to the system daily -- produced at Fermilab and by collaborators in France, Amsterdam, Prague and ftp’d to Fermilab along with the required meta-data description file. A feature to store parameter files along with the data files has recently been implemented
Performance and Robustness
• we have started to build a serious test harness to emulate the entire load - from online --> random end users
• promising performance numbers so far -- easily got 20 MB/sec into Origin with only 1 Gbit Ethernet and a limited number of current generation tape drives (3 MB/sec)
• the test harness runs with tens of projects, each delivering cached and tape resident files, each with tens of consumer analysis processes gaining access to all of them
• Farms also got up to 20 MB/sec, with 50 nodes all requesting files - this is the required rate during a run (but of course we will test to the saturation point of the Farm nodes)
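The quoted rates can be put in perspective with a little arithmetic. This sketch uses only the numbers on the slide (20 MB/sec delivered, 3 MB/sec per tape drive, 1 Gbit Ethernet, 50 Farm nodes). It assumes that 1 Gbit/s corresponds to 125 MB/s with protocol overhead ignored, and treats the no-cache-hits case as an upper bound on the drives needed.

```python
import math

target_mb_s = 20.0          # delivered rate quoted on the slide
tape_drive_mb_s = 3.0       # current-generation drive rate quoted on the slide
gigabit_mb_s = 1000.0 / 8   # 1 Gbit/s in MB/s, ignoring protocol overhead
farm_nodes = 50             # number of Farm nodes quoted on the slide

# Drives needed if every byte had to come from tape (no cache hits at all).
drives_needed = math.ceil(target_mb_s / tape_drive_mb_s)

# Fraction of the raw Gigabit Ethernet capacity used at the target rate.
link_fraction = target_mb_s / gigabit_mb_s

# Average rate per Farm node when all nodes share the target rate equally.
per_node_mb_s = target_mb_s / farm_nodes

print(drives_needed, link_fraction, per_node_mb_s)
```

Since the test harness delivered both cached and tape resident files, fewer drives than this bound would suffice in practice, which fits the slide's remark about a limited number of drives.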
What next?
• strong focus on testing and robustness
– aiming for a very high availability system -- all servers restart, clients recover, etc.
• need to integrate with the Batch system and address the resulting resource management issues
• will add pick-of single event data feature and test use of Event Catalog more extensively
• more support for sites outside Fermilab to use the system and/or set up their own
Analysis outside Fermilab, using SAM
• In addition to your program, which must talk to a SAM Project Server and Database Server somewhere, and may need to have files staged, you will need
[Figure: data needed: Calibration Data, Alignment Data, Geometry Data, RCP Data; get through d0om or the RCP manager]
[Sources shown: dspack files, extracted RCP files, interface to a Database Server (twice), other I/O possibilities]
Conclusions
• We have a working system to first order
• Users are starting to use it
• Making it robust and highly available is high on our priority list
– many aspects, from Robot hardware, operating systems, network components, database server machine, Enstore movers and servers and software, and last of all the SAM servers and client code which sit atop all of that
• Involving off-site users and planning to provide access to data for all is starting now and will be there before we run