An Efficient and Transparent Transaction Management based on the Data Workflow of HVEM DataGrid

Im Young Jung
Seoul National University


Introduction

  • Transaction management for safe data update and insertion on an e-Science DataGrid
      - Heterogeneous storages are used according to the characteristics and the size of the data
      - Based on the workflow, there is a storing precedence of data across heterogeneous storages within a transaction
  • In this paper
      - An efficient and transparent transaction management on HVEM DataGrid
      - The transaction is divided into sub-transactions according to the transaction states, and the sub-transactions are classified
      - Transaction hierarchy and parallelism provide:
          - efficient and safe upload of large data to HVEM DataGrid
          - transparency in the transaction, including simultaneous access to heterogeneous storages
      - Automatic garbage collection

HVEM Grid

  • High Voltage Electron Microscope (HVEM)
      - Lets scientists carry out 3D structure analysis of new materials at micrometer scale
  • HVEM Grid
      - Remote users can perform the same tasks as on-site scientists:
          - Remote control of the HVEM
          - Storing, retrieving and searching data through HVEM DataGrid
          - Processing data through HVEM Computational Grid

HVEM DataGrid

  • Designed for biological experiments using HVEM
  • A logical view of one storage for the DB and the file storage
      - The small metadata is stored in the DB: information on materials, material handling methods, HVEM experiments, images and experimenters
      - The large files are stored in file storages: 2D or 3D image files and the documents related to HVEM experiments
  • Internal process to find files
      - After finding the logical path of a file in the file storage by searching the DB, users can retrieve the file they want from the file storage
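
The two-step file lookup described on this slide (search the DB for the logical path, then fetch the file from the file storage) can be sketched as follows. This is a minimal illustration, not the HVEM DataGrid implementation: the table name image_meta, its columns, the sample path and the dictionary standing in for the file storage are all assumptions made for the example.

```python
import sqlite3


def find_logical_path(conn, image_name):
    """Search the metadata DB and return the logical path, or None if unknown."""
    row = conn.execute(
        "SELECT logical_path FROM image_meta WHERE name = ?", (image_name,)
    ).fetchone()
    return row[0] if row else None


def retrieve_file(conn, file_storage, image_name):
    """Resolve the logical path through the DB, then read from the file storage."""
    path = find_logical_path(conn, image_name)
    if path is None:
        raise KeyError(f"no metadata for {image_name!r}")
    return file_storage[path]


# Hypothetical schema and data, used only for this demonstration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE image_meta (name TEXT PRIMARY KEY, experimenter TEXT, logical_path TEXT)"
)
conn.execute(
    "INSERT INTO image_meta VALUES (?, ?, ?)",
    ("cell_001", "demo_user", "/hvem/exp1/cell_001.tif"),
)

# A dict stands in for the (possibly remote) file storage.
file_storage = {"/hvem/exp1/cell_001.tif": b"fake image bytes"}

data = retrieve_file(conn, file_storage, "cell_001")
```

The user only names the image; the DB and the file storage appear as one logical store, which is the "logical view of one storage" the slide refers to.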

HVEM DataGrid

  • A unified data management
      - The storing precedence among data: when we store all biological information for the images, we should keep the images in HVEM Grid at the same time
      - The relational semantics between various data stored in distributed heterogeneous storages
  • To upload many large files to HVEM DataGrid efficiently and safely
      - Upload dependency and serialization
      - Ensure the transactions for safe parallel uploads

An efficient and transparent transaction management

  • Requirements for the transactions on HVEM DataGrid
  • Consider the semantics of HVEM DataGrid
      - A project is composed of several experiments
      - The data for an experiment should be inserted according to its data workflow
      - A file and its metadata should be stored to HVEM DataGrid simultaneously; otherwise, all of them should be deleted
  • Support
      - the long-lifetime transaction, according to the time limit of an experiment or project
      - the short-lifetime transaction, which physically stores the data to HVEM DataGrid
  • The optimization of large-file upload to reduce the blocking time should still ensure safe transactions
      - An asynchronous and parallel upload scheme should protect upload dependency and ensure safe transactions
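
The requirement that a file and its metadata are stored together or not at all can be illustrated with a small compensation sketch. Two dictionaries stand in for the DB and the file storage, and the names store_pair, StoreError and fail_on are hypothetical, introduced only to simulate failures; the slides do not describe the actual mechanism at this level of detail.

```python
class StoreError(Exception):
    """Raised by a storage backend when a write fails (simulated here)."""


def store_pair(db, file_storage, key, metadata, payload, fail_on=None):
    """Store metadata and file together; if either write fails, delete both.

    fail_on is a test hook ("db" or "file") that simulates a failing backend.
    Returns True when both writes succeeded, False after a compensating delete.
    """
    try:
        if fail_on == "db":
            raise StoreError("simulated DB failure")
        db[key] = metadata
        if fail_on == "file":
            raise StoreError("simulated file storage failure")
        file_storage[key] = payload
        return True
    except StoreError:
        # Compensation: remove whatever part was already written.
        db.pop(key, None)
        file_storage.pop(key, None)
        return False


db, files = {}, {}
ok = store_pair(db, files, "img1", {"experiment": "exp1"}, b"pixels")
failed = store_pair(db, files, "img2", {"experiment": "exp1"}, b"pixels", fail_on="file")
```

After the second call neither store contains img2, even though the metadata write had already succeeded before the simulated file storage failure.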

An efficient and transparent transaction management

  • Transaction hierarchy
      - The transaction units act as checkpoints on incomplete data insertion
      - The hierarchy confines the rollback extent
      - When the data for an experiment or a project is not inserted to HVEM DataGrid by its time limit, the experiment or the project is removed by the rollback of TnE or TnP
      - In the index of TnS((((1)2)5)2), the innermost (1) represents the identity of the TnP it belongs to; the next index '2' indicates the identity of the TnE, and so on

  [Figure: transaction hierarchy. TnP: for a project; TnE: for an experiment; TnW: for a group of TnSs; TnS: for storing data to physical storage]

  • Support for autonomous garbage collection
      - It is up to users to insert data into or delete data from HVEM DataGrid
      - When they stop inserting experimental data for any reason without deleting the related data, HVEM DataGrid would accumulate a large amount of garbage
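
To make the nested transaction index concrete, the sketch below parses an index such as ((((1)2)5)2) and uses it to compute which TnSs fall inside a rollback at a given level. It assumes that the four numbers, read from the innermost outwards, identify TnP, TnE, TnW and TnS in that order; the slide only states the first two positions explicitly, so the last two are an assumption. The function names are hypothetical.

```python
import re

# Assumed order of the hierarchy levels, innermost index first.
LEVELS = ("TnP", "TnE", "TnW", "TnS")


def parse_index(index):
    """Map a nested index like '((((1)2)5)2)' to its level identities."""
    numbers = [int(n) for n in re.findall(r"\d+", index)]
    if len(numbers) != len(LEVELS):
        raise ValueError(f"expected {len(LEVELS)} identities in {index!r}")
    return dict(zip(LEVELS, numbers))


def rollback_extent(indices, failed, level):
    """Return the indices that share the failed TnS's ancestors up to `level`.

    Rolling back at 'TnE' affects every TnS of the same project and
    experiment; rolling back at 'TnP' affects the whole project.
    """
    depth = LEVELS.index(level) + 1
    target = parse_index(failed)
    prefix = [target[name] for name in LEVELS[:depth]]
    return [
        i for i in indices
        if [parse_index(i)[name] for name in LEVELS[:depth]] == prefix
    ]


all_tns = ["((((1)2)5)2)", "((((1)2)5)3)", "((((1)3)1)1)"]
parsed = parse_index("((((1)2)5)2)")
same_experiment = rollback_extent(all_tns, "((((1)2)5)2)", "TnE")
same_project = rollback_extent(all_tns, "((((1)2)5)2)", "TnP")
```

Because each TnS carries the identities of its ancestors, a rollback of a TnE or TnP can be confined to the matching prefix without scanning unrelated transactions.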

Parallel Processing

Transaction management Scheme

  • HVEM DataGrid forks two processes, one to connect to the DB and one to connect to the file storage. When the connections succeed, it gets the next requests, and so on.
  • The state change of TnS(((())j)i):
      - jSiS → jSiD (the notification from the DB) or jSiF (the notification from the file storage) → jSiE (both of them arrive): TnS completes
  • On a light failure (LF) due to temporary failures of the network or a server, the transaction is retried a fixed number of times
  • When the retries fail, a serious failure (SF) is assumed → rollback process
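
The scheme on this slide can be sketched as a small state tracker: the DB write and the file upload run in parallel, each is retried a fixed number of times on a light failure, and a rollback follows if either still fails. The sketch uses threads rather than forked processes for brevity, and the names LightFailure, run_tns, flaky and the "ROLLBACK" marker are hypothetical; the state letters S, D, F and E follow the suffixes of the states named on the slide.

```python
from concurrent.futures import ThreadPoolExecutor


class LightFailure(Exception):
    """A temporary network or server failure that is worth retrying."""


def with_retries(task, max_retries):
    """Run task once, then retry up to max_retries times on LightFailure."""
    for _ in range(max_retries + 1):
        try:
            task()
            return True
        except LightFailure:
            continue
    return False  # retries exhausted: treated as a serious failure (SF)


def run_tns(store_metadata, upload_file, max_retries=3):
    """Run both sub-transactions in parallel and return the visited states."""
    states = ["S"]
    with ThreadPoolExecutor(max_workers=2) as pool:
        db_future = pool.submit(with_retries, store_metadata, max_retries)
        file_future = pool.submit(with_retries, upload_file, max_retries)
        db_ok = db_future.result()
        file_ok = file_future.result()
    if db_ok:
        states.append("D")  # notification from the DB arrived
    if file_ok:
        states.append("F")  # notification from the file storage arrived
    # E only when both notifications arrived; otherwise roll back.
    states.append("E" if db_ok and file_ok else "ROLLBACK")
    return states


def flaky(failures):
    """Build a task that raises LightFailure the first `failures` times."""
    remaining = [failures]

    def task():
        if remaining[0] > 0:
            remaining[0] -= 1
            raise LightFailure("temporary failure")

    return task


recovered = run_tns(flaky(0), flaky(2))
rolled_back = run_tns(flaky(0), flaky(10), max_retries=3)
```

In the first run the upload recovers on its third attempt and the TnS completes; in the second run the retries are exhausted, so the metadata write that already succeeded must be undone by the rollback process.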

Evaluation

  • Analysis
      - Transparency: through the transaction hierarchy and fine-grained state management, the transaction manager in HVEM DataGrid enables a transparent transaction that uploads the image files to the file storage and stores their metadata to the DB simultaneously.
      - Serializability: many TnSs are upload serializable because their state changes are logged through the transaction index. To keep the upload dependency, the transaction manager protects the first user entering a TnW. If he withdraws the TnW, then another user can initiate the TnW.
  • Transaction performance
      - The transaction scheme supports asynchronism and parallelism
  • Experiment setting
      - Because the sub-transaction time on the DB is negligible compared with that on the file storage due to data size, we only considered the upload time for image files
      - The semantics of the data workflow in HVEM DataGrid are considered
      - For an asynchronous file transfer, the request intervals for file transfer are chosen randomly within 50 sec
      - The physical locations of the file storages are assumed to be distributed

Evaluation

  • Overhead
      - Log management cost
          - The additional cost is the log for TnP, TnE and TnW; general transaction management already requires the log for TnS
          - The log size for TnP, TnE and TnW is smaller than that for TnS because they function as checkpoints rather than real transaction units
      - Rollback cost
          - The cascade rollback of TnSs in a TnW, due to the upload dependency in parallel processing of TnSs
          - At an LF, if the retry succeeds, the gain from transaction parallelism can be very large, especially for large file handling
          - There are not many SFs or LFs because e-Science DataGrid is not as popular as multimedia storage
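
The cascade rollback mentioned under rollback cost can be pictured as a traversal of the upload dependencies inside one TnW. The sketch below is an assumption about how such dependencies might be represented (a mapping from each TnS to the TnSs it depends on); the slides do not specify the data structure, and the names are hypothetical.

```python
from collections import deque


def cascade_rollback(depends_on, failed):
    """Return the failed TnS plus every TnS that transitively depends on it.

    depends_on maps each TnS name to the list of TnSs it depends on.
    """
    # Invert the mapping: prerequisite -> TnSs that depend on it.
    dependents = {}
    for tns, prereqs in depends_on.items():
        for prereq in prereqs:
            dependents.setdefault(prereq, []).append(tns)

    to_rollback = [failed]
    seen = {failed}
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for dependent in dependents.get(current, []):
            if dependent not in seen:
                seen.add(dependent)
                to_rollback.append(dependent)
                queue.append(dependent)
    return to_rollback


# Hypothetical TnW with four TnSs: tns3 depends on tns2, which depends on tns1.
depends_on = {"tns1": [], "tns2": ["tns1"], "tns3": ["tns2"], "tns4": []}
chain = cascade_rollback(depends_on, "tns1")
isolated = cascade_rollback(depends_on, "tns4")
```

An independent TnS such as tns4 rolls back alone, which is why the rollback cost depends on how deep the upload dependency chains within a TnW are.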

Conclusion

  • A transaction management on HVEM Grid
      - Safety: ensures a safe transaction considering the data workflow in HVEM DataGrid
      - Efficiency: improves the performance of uploading large files through asynchronism and parallelism
      - Transparency: data management across the heterogeneous storages
      - Automatic garbage collection: reduces garbage