High Energy Physics Data Management using Cloud Computing
Analysis of the BaBar Experiment Data Handling

Paper by: Abhishek Dey, CSE 2nd Year | Diya Ghosh, CSE 2nd Year | Mr. Somenath Roy Chowdhury

06/26/2022

Page 1: Handling High Energy Physics Data using Cloud Computing


Page 2: Handling High Energy Physics Data using Cloud Computing


Contents

Motivation

HEP Legacy Project

CANFAR Astronomical Research Facility

System Architecture

Operational Experience

Summary

Page 3: Handling High Energy Physics Data using Cloud Computing


What exactly is BaBar?

Its design was motivated by the investigation of CP violation.

BaBar was set up to understand the disparity between the matter and antimatter content of the universe by measuring CP violation.

BaBar focuses on the study of CP violation in the B meson system.

The name comes from the nomenclature for the B meson (symbol B) and its antiparticle (symbol B̄, pronounced "B bar").

Page 4: Handling High Energy Physics Data using Cloud Computing


BaBar: A Data Point of View

The codebase is 9.5 million lines of C++ and Fortran.

The compiled size is 30 GB.

A significant amount of manpower is required to maintain the software.

Each installation must be validated before its generated results are accepted.

CANFAR is a partnership between:

– University of Victoria
– University of British Columbia
– National Research Council, Canadian Astronomy Data Centre
– Herzberg Institute for Astrophysics

CANFAR provides the infrastructure for running VMs.

Page 5: Handling High Energy Physics Data using Cloud Computing


Need for Cloud Computing:

Jobs are embarrassingly parallel, much like HEP.

Each of these surveys requires a different processing environment, which requires:

A specific version of a Linux distribution.

A specific compiler version.

Specific libraries.

Applications have little documentation.

These environments are evolving rapidly.

Page 6: Handling High Energy Physics Data using Cloud Computing


Data is precious, too precious.

We need infrastructure, and it should come easily, as a service.

Page 7: Handling High Energy Physics Data using Cloud Computing


A word about Cloud Computing:

Page 8: Handling High Energy Physics Data using Cloud Computing


IaaS: What next?

With IaaS, we can easily create many instances of a VM image. But:

How do we manage the VMs once booted?

How do we get jobs to the VMs?

Page 9: Handling High Energy Physics Data using Cloud Computing

Our Solution: Cloud Scheduler + Condor

Users create a VM with their experiment software installed.

A basic VM is created by one group, and users add their analysis or processing software on top to create their custom VM.

Users then create batch jobs as they would on a regular cluster, but they specify which VM image should run their jobs.
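The scheduling idea behind this setup can be sketched roughly as follows. This is a simplified Python illustration, not the real Cloud Scheduler code; the function name and image names are hypothetical. The scheduler looks at the idle jobs, notes which VM image each one requests, and boots instances of those images on the IaaS cloud, up to the cloud's capacity.

```python
# Conceptual sketch of a Cloud Scheduler-style decision loop:
# jobs wait in a Condor-style queue, each naming the VM image it needs,
# and the scheduler boots instances of exactly those images.

from collections import Counter

def plan_boots(queued_jobs, running_vms, max_slots):
    """Decide which VM images to boot so queued job types get VMs.

    queued_jobs: list of image names, one per idle job.
    running_vms: list of image names already booted.
    max_slots:   total VMs the IaaS cloud will let us run.
    """
    demand = Counter(queued_jobs)      # jobs waiting, per image
    supply = Counter(running_vms)      # VMs already up, per image
    free_slots = max_slots - len(running_vms)

    boots = []
    for image, wanted in demand.most_common():
        missing = wanted - supply[image]
        while missing > 0 and free_slots > 0:
            boots.append(image)        # ask the IaaS cloud to start this image
            missing -= 1
            free_slots -= 1
    return boots

# Example: two BaBar jobs and one MACHO job queued, one BaBar VM running.
print(plan_boots(["babar.img", "babar.img", "macho.img"], ["babar.img"], 4))
# → ['babar.img', 'macho.img']
```

Condor then matches each queued job to a booted VM of the right type, exactly as it would match jobs to nodes on a regular cluster.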

Page 10: Handling High Energy Physics Data using Cloud Computing

Steps for the successful architecture setup:

Page 11: Handling High Energy Physics Data using Cloud Computing


Page 12: Handling High Energy Physics Data using Cloud Computing


Page 13: Handling High Energy Physics Data using Cloud Computing


Page 14: Handling High Energy Physics Data using Cloud Computing

CANFAR: MACHO (Massive Compact Halo Objects)

Detailed re-analysis of data from the MACHO experiment Dark Matter search.

Jobs perform a wget to retrieve the input data (40 MB) and have a 4-6 hour run time; this low-I/O profile is a great fit for clouds.

Astronomers happy with the environment.
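The shape of one such batch job can be sketched as below: fetch the small input once up front, then spend hours in CPU-bound analysis. The URL and the body of `analyse()` are hypothetical placeholders; only the fetch-then-compute pattern comes from the slide.

```python
# Sketch of a MACHO-style batch job: one small download, then long CPU work.

import urllib.request

def fetch_input(url, dest):
    """Equivalent of the job's wget: one ~40 MB download up front."""
    urllib.request.urlretrieve(url, dest)
    return dest

def analyse(path):
    """Placeholder for the 4-6 hour CPU-bound re-analysis step."""
    with open(path, "rb") as f:
        data = f.read()
    return len(data)           # stand-in for the real result

# A job would then run something like:
#   fetch_input("http://data.example.org/macho/field_042.dat", "input.dat")
#   analyse("input.dat")
```

Because almost all of the wall time is computation rather than I/O, such jobs are insensitive to the network distance between the cloud and the data server.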

Page 15: Handling High Energy Physics Data using Cloud Computing


Data Handling in BaBar:

Analysis jobs consume: event data (real data and simulated data), configuration, and the BaBar conditions database.

The data volume is approximately 2 PB.

The file system is hosted on a cluster of six nodes: a management/metadata server (MGS/MDS) and five object storage servers (OSS), using a single gigabit interface/VLAN to communicate both internally and externally.

Page 16: Handling High Energy Physics Data using Cloud Computing


Xrootd : Need for Distributed Data

Xrootd is a file server providing byte-level access to data; it is used by many high energy physics experiments and provides access to the distributed data.

A read-ahead value of 1 MB and a read-ahead cache size of 10 MB were set on each Xrootd client.
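The effect of those two settings can be illustrated with a small sketch: every client read, even of a few bytes, pulls a full 1 MB chunk into a bounded cache, so subsequent nearby reads are served locally instead of over the wide-area network. This is a concept sketch only, not the Xrootd client API.

```python
# Read-ahead concept sketch: 1 MB fetch granularity, 10 MB client cache.

from collections import OrderedDict

READ_AHEAD = 1 * 1024 * 1024     # 1 MB fetched per cache miss
CACHE_SIZE = 10 * 1024 * 1024    # 10 MB kept on each client

class ReadAheadFile:
    def __init__(self, fileobj):
        self.f = fileobj
        self.cache = OrderedDict()   # chunk start offset -> bytes
        self.fetches = 0             # remote round trips performed

    def _chunk(self, start):
        """Fetch (or reuse) the 1 MB chunk beginning at `start`."""
        if start not in self.cache:
            self.f.seek(start)
            self.cache[start] = self.f.read(READ_AHEAD)
            self.fetches += 1
            while len(self.cache) * READ_AHEAD > CACHE_SIZE:
                self.cache.popitem(last=False)   # evict oldest chunk
        return self.cache[start]

    def read(self, offset, size):
        """Serve an arbitrary byte range from cached 1 MB chunks."""
        out = b""
        while size > 0:
            start = (offset // READ_AHEAD) * READ_AHEAD
            piece = self._chunk(start)[offset - start : offset - start + size]
            if not piece:
                break                # past end of file
            out += piece
            offset += len(piece)
            size -= len(piece)
        return out
```

Reading a 3 MB file sequentially in 4 KB requests costs only three remote fetches with this scheme, instead of hundreds; that round-trip saving is the point of the 1 MB read-ahead on a high-latency link.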

Page 17: Handling High Energy Physics Data using Cloud Computing


How does a DFS work?

Blocks are replicated across several datanodes (usually 3).

A single namenode stores metadata (file names, block locations, etc.).

The design is optimized for large files and sequential reads; clients read from the closest available replica (note: locality of reference).

If the replication for a block drops below target, it is automatically re-replicated.
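The re-replication rule above can be sketched as follows: the namenode tracks which datanodes hold each block, and when a block's replica count falls below the target it schedules copies onto nodes that lack it. This is an illustrative Python sketch of the rule, not HDFS code; the block and node names are hypothetical.

```python
# Namenode-side re-replication sketch: restore each block to 3 replicas.

TARGET = 3   # desired replicas per block

def repair_plan(block_map, live_datanodes):
    """block_map: block id -> set of live datanodes currently holding it.
    Returns (block, destination) copy operations to reach TARGET replicas."""
    plan = []
    for block, holders in block_map.items():
        need = TARGET - len(holders)
        candidates = [n for n in live_datanodes if n not in holders]
        for dest in candidates[:need]:
            plan.append((block, dest))   # schedule a copy onto dest
    return plan

# Datanode "d3" has just died, dropping block "b2" to two replicas:
blocks = {"b1": {"d1", "d2", "d4"}, "b2": {"d1", "d5"}}
print(repair_plan(blocks, ["d1", "d2", "d4", "d5"]))
# → [('b2', 'd2')]
```

Because the namenode only holds metadata, this scan is cheap, and the actual block copies happen directly between datanodes.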

[Diagram: blocks 1-4 replicated across five datanodes, roughly three replicas each, with a single namenode holding the metadata]

Page 18: Handling High Energy Physics Data using Cloud Computing


Results and Analysis:

Page 19: Handling High Energy Physics Data using Cloud Computing


Fault-tolerant model:

Page 20: Handling High Energy Physics Data using Cloud Computing


Acknowledgements

A special word of appreciation and thanks to Mr. Somenath Roy Chowdhury.

My heartiest thanks to the entire team who worked hard to build the cloud.

Page 21: Handling High Energy Physics Data using Cloud Computing

Questions, please?