Standardized Workflows (II) Carlos Oscar Sorzano Techn. Director I 2 PC Natl. Center Biotechnology (CSIC)


Page 1

Standardized Workflows (II)

Carlos Oscar Sorzano
Techn. Director I2PC

Natl. Center Biotechnology (CSIC)

Page 2

Interchange points need an interchange standard

Specific proposal in the discussion

Page 3

Some interchange points

• CTF estimation of micrographs:
    • List of Micrograph, Voltage, DefocusU, DefocusV, AngleUX, Cs
• Particle picking:
    • List of Micrograph, micrographXcoor, micrographYcoor
• Particle extraction:
    • List of images
• 2D Alignment/Classification:
• 3D Alignment/Classification:
    • List of image/volume [2D or 3D alignment]
    • [A list of class representatives]
    • [Class representative assignment]

Page 4

Interchange Proposal

• Interchange information:
    – Data: images, volumes and stacks
    – MetaData: list of …
• Data structure
    – Data: Array of real values with X varying fastest
    – MetaData: Table with specific column names
• Data format
    – Data: MRC file
    – MetaData: STAR file with specific block names

Page 5

Data: Images, volumes and stacks

.mrc    MRC 2D image
.mrc    MRC 3D image (can be distinguished from 2D by header)
.mrcs   MRC Stack of 2D images

Heymann, J. B.; Chagoyen, M. & Belnap, D. M. J. Structural Biology, 2005, 151, 196-207

[Figure: file layout with a Header followed by Val0, Val1, ..., ValN, and the same values placed on image grids with axes x, y and origin (0,0)]
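The "X varying fastest" ordering of the proposal can be illustrated with a flat buffer. This is a minimal sketch, not part of the proposal itself: the helper name `voxel_offset` and the sizes `nx`, `ny`, `nz` are hypothetical, and the header that precedes the values in a real file is ignored here.

```python
from array import array


def voxel_offset(x, y, z, nx, ny):
    """Position of voxel (x, y, z) in a flat buffer where X varies fastest,
    then Y, then Z (for a stack, Z is the index of the 2D image)."""
    return x + nx * (y + ny * z)


# Hypothetical 4 x 3 x 2 volume whose values equal their own offsets.
nx, ny, nz = 4, 3, 2
volume = array("f", range(nx * ny * nz))

# Moving one step in X moves one value; one step in Y moves nx values;
# one step in Z moves nx * ny values.
step_x = voxel_offset(1, 0, 0, nx, ny)
step_y = voxel_offset(0, 1, 0, nx, ny)
step_z = voxel_offset(0, 0, 1, nx, ny)
```

Under this assumption, a 2D image is the special case `nz == 1`, and a stack of 2D images is read slice by slice along Z.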

Page 6

MetaData: Interchange format

Page 7

MetaData: Components

• UID: Unique Identifier
• Micrograph (one or many motifs)
• Image (motif of interest)
• Volume
• CTF
• Coordinates
• 2D Alignment
• 3D Alignment
• Comment

Page 8

MetaData: UID, Micrograph, Comment

Any entry in the metadata file has a unique entry number

listOfMicrographs.3dem:
    # 3DEM_STAR_1 General file comment
    #
    data_block_1
    loop_
    _UID _micrographLocator _comment
    1 InputData/micrograph1.mrc "Comment 1"
    2 InputData/micrograph2.mrc "Comment 2"
    data_block_2
    loop_
    _UID _micrographLocator
    3 InputData/micrographA.mrc
    4 InputData/micrographB.mrc
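A file such as listOfMicrographs.3dem could be read with a few lines of code. The sketch below is only an illustration under stated assumptions: the function name `parse_star` is hypothetical, each block is assumed to hold a single `loop_` table, lines starting with `#` are treated as comments, and quoted values use straight double quotes. It is not a full STAR parser.

```python
import shlex


def parse_star(text):
    """Parse loop-only STAR text into {block_name: [row_dict, ...]}."""
    blocks = {}
    labels, rows, pending = [], None, []
    in_header = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        for token in shlex.split(line):  # keeps "Comment 1" as one value
            if token.startswith("data_"):
                if pending:
                    raise ValueError("incomplete row before " + token)
                rows = blocks.setdefault(token[len("data_"):], [])
                labels, in_header = [], False
            elif token == "loop_":
                in_header = True
            elif in_header and token.startswith("_"):
                labels.append(token[1:])  # column name without the underscore
            else:
                if rows is None or not labels:
                    raise ValueError("value outside a loop_: " + token)
                in_header = False
                pending.append(token)
                if len(pending) == len(labels):
                    rows.append(dict(zip(labels, pending)))
                    pending = []
    if pending:
        raise ValueError("incomplete last row")
    return blocks


SAMPLE = '''# 3DEM_STAR_1 General file comment
#
data_block_1
loop_
_UID _micrographLocator _comment
1 InputData/micrograph1.mrc "Comment 1"
2 InputData/micrograph2.mrc "Comment 2"
data_block_2
loop_
_UID _micrographLocator
3 InputData/micrographA.mrc
4 InputData/micrographB.mrc
'''

blocks = parse_star(SAMPLE)
```

All values are kept as strings; converting them to numbers is left to the code that knows what each column means.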

Page 9

MetaData: CTF

Any entry in the metadata file has a unique entry number

listOfCTFs.3dem:
    # 3DEM_STAR_1 *
    #
    data_block_1
    loop_
    _UID _Voltage _DefocusU _DefocusV _AngleUX _Cs
    1 200 1.50 1.54 30.0 2.0

Units: Voltage in kV, DefocusU in μm, DefocusV in μm, AngleUX in degrees, Cs in mm.

[Figure: image axes X and Y with the defocus axes U and V]

Page 10

MetaData: Images and Volumes

Images can be in individual files (mrc) or stacks (mrcs).
Volumes are in individual files (mrc).

listOfImages.3dem:
    # 3DEM_STAR_1 *
    #
    data_block
    loop_
    _UID _imageLocator
    1 image00001.mrc
    2 image00002.mrc
    3 [email protected]
    4 [email protected]

listOfVolumes.3dem:
    # 3DEM_STAR_1 *
    #
    data_block
    loop_
    _UID _volumeLocator
    1 volume00001.mrc
    2 volume00002.mrc
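Since a locator may point either to an individual file or to one image inside a stack, a reader has to tell the two apart. The sketch below assumes that a stack entry is written as `index@stackfile`, the convention used by Xmipp and RELION; that syntax, the `Locator` type and the file name `stack.mrcs` are assumptions made for illustration.

```python
from typing import NamedTuple, Optional


class Locator(NamedTuple):
    path: str              # file that holds the image or volume
    index: Optional[int]   # position inside a stack, None for a single file


def parse_locator(text):
    """Split an assumed 'index@stackfile' locator; plain paths pass through."""
    index, sep, path = text.partition("@")
    if not sep:
        return Locator(text, None)
    return Locator(path, int(index))


single = parse_locator("image00001.mrc")
stacked = parse_locator("3@stack.mrcs")  # hypothetical stack entry
```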

Page 11

MetaData: Coordinates

Heymann, J. B.; Chagoyen, M. & Belnap, D. M. J. Structural Biology, 2005, 151, 196-207

[Figure: micrograph with axes x, y and origin (0,0)]

listOfCoordinates.3dem:
    # 3DEM_STAR_1 *
    #
    data_block
    loop_
    _UID _micrographXcoor _micrographYcoor
    1 500 1000
    2 750 2500

Page 12

MetaData: 2D Alignment

[Figure: three images on x, y axes, related by Δx=+5, Δy=-15 and Δψ=60°]

listOfAlignedImages.3dem:
    # 3DEM_STAR_1 *
    #
    data_block
    loop_
    _UID _imageLocator _imageXOff _imageYOff _imagePsiOff
    1 [email protected] 0 0 0
    2 [email protected] 5 -15 0
    3 [email protected] 5 -15 60

Page 13

MetaData: 2D Alignment

listOfAlignedImages.3dem:
    # 3DEM_STAR_1 *
    #
    data_alignedImageList_A
    loop_
    _UID _imageLocator _imageXOff _imageYOff _imagePsiOff _imageFlip
    1 [email protected] 0 0 0 0
    2 [email protected] 5 -15 60 1

[Figure: two images on x, y axes, related by Δx=+5, Δy=-15, Δψ=60° and a flip]

Page 14

MetaData: 2D Alignment

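[Figure: two images on x, y axes, related by Δx=+5, Δy=-15, Δψ=60° and a flip]

[Equation: the aligned image is the original image evaluated at transformed coordinates, I(r) and I(R r), with r = (x, y, 1) in homogeneous coordinates. R is a 3×3 matrix with last row (0 0 1) that combines the rotation terms cos ψ and sin ψ, the shifts Δx and Δy, and the flip factor m.]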
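The four alignment columns (_imageXOff, _imageYOff, _imagePsiOff, _imageFlip) can be folded into a single homogeneous matrix acting on r = (x, y, 1). The sketch below is one possible convention and not necessarily the one on the slide: it assumes a counter-clockwise rotation, a flip that mirrors the X axis after rotation and shift (m = -1 when flipped, +1 otherwise), and hypothetical function names.

```python
import math


def alignment_matrix(x_off, y_off, psi_deg, flip=0):
    """3x3 homogeneous matrix from shift, in-plane angle and flip flag.

    Assumed convention: rotate counter-clockwise by psi, shift by
    (x_off, y_off), then mirror X if flip is set.
    """
    psi = math.radians(psi_deg)
    c, s = math.cos(psi), math.sin(psi)
    m = -1.0 if flip else 1.0
    return [
        [m * c, -m * s, m * x_off],
        [s, c, y_off],
        [0.0, 0.0, 1.0],
    ]


def apply(matrix, x, y):
    """Transform the point (x, y) through its homogeneous form (x, y, 1)."""
    r = (x, y, 1.0)
    xp, yp, _ = (sum(a * b for a, b in zip(row, r)) for row in matrix)
    return xp, yp


# Row "2 ... 5 -15 60 1" of the flipped example above.
R = alignment_matrix(5, -15, 60, flip=1)
shifted_origin = apply(alignment_matrix(5, -15, 0), 0, 0)
```

Whatever convention a package uses, writing it down as an explicit matrix like this is what makes the alignment parameters comparable between programs.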

Page 15

MetaData: 3D Alignment

List of 3D aligned images.3dem:
    # 3DEM_STAR_1 *
    #
    data_block
    loop_
    _UID _imageLocator _homogeneousEulerMatrix
    1 image00001.mrc [a11 a12 a13 Xoff a21 a22 a23 Yoff a31 a32 a33 0]

Heymann, J. B.; Chagoyen, M. & Belnap, D. M. J. Structural Biology, 2005, 151, 196-207

List of 3D aligned volumes.3dem:
    # 3DEM_STAR_1 *
    #
    data_block
    loop_
    _UID _volumeLocator _homogeneousEulerMatrix
    1 volume00001.mrc [a11 a12 a13 Xoff a21 a22 a23 Yoff a31 a32 a33 Zoff]
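The twelve values of _homogeneousEulerMatrix are the first three rows of a 4×4 homogeneous matrix. The sketch below expands the bracketed list into a full matrix, assuming the implicit last row is (0 0 0 1), as is usual for homogeneous transforms; the function names are hypothetical.

```python
def parse_homogeneous_matrix(text):
    """Turn '[a11 a12 a13 Xoff ... a33 Zoff]' into a 4x4 list of floats."""
    values = [float(v) for v in text.strip().strip("[]").split()]
    if len(values) != 12:
        raise ValueError("expected 12 values, got %d" % len(values))
    rows = [values[i:i + 4] for i in range(0, 12, 4)]
    rows.append([0.0, 0.0, 0.0, 1.0])  # assumed implicit last row
    return rows


def transform(matrix, point):
    """Apply the matrix to a 3D point through its homogeneous form."""
    r = (*point, 1.0)
    return tuple(sum(a * b for a, b in zip(row, r)) for row in matrix[:3])


# Identity rotation with offsets (2.5, -3, 0); for an image the last
# offset is 0, as in the images example above.
M = parse_homogeneous_matrix("[1 0 0 2.5 0 1 0 -3 0 0 1 0]")
moved = transform(M, (0.0, 0.0, 0.0))
```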

Page 16

MetaData: Relationships

listOfImagesAndDownsampledImages.3dem:
    # 3DEM_STAR_1 *
    #
    data_fullSize_images
    loop_
    _UID _imageLocator
    1 [email protected]
    2 [email protected]
    data_downsampledSize_images
    loop_
    _UID _imageLocator
    3 [email protected]
    4 [email protected]
    data_correspondingDownsampleImage
    loop_
    _UID _UID1 _UID2
    5 1 3
    6 2 4

UID1 R UID2
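Because every entry has a file-wide unique UID, a relationship block only needs to store pairs of UIDs. The sketch below resolves such pairs back to the rows they refer to. It assumes the blocks have already been read into dictionaries of rows; the function name and the locator file names are hypothetical.

```python
def resolve_relationships(blocks, relation_block):
    """Return (row_of_UID1, row_of_UID2) pairs for a relationship block."""
    by_uid = {}
    for name, rows in blocks.items():
        for row in rows:
            if row["UID"] in by_uid:
                raise ValueError("duplicate UID " + row["UID"])
            by_uid[row["UID"]] = row
    return [(by_uid[rel["UID1"]], by_uid[rel["UID2"]])
            for rel in blocks[relation_block]]


# Same structure as the example above, with made-up locators.
blocks = {
    "fullSize_images": [
        {"UID": "1", "imageLocator": "full_a.mrc"},
        {"UID": "2", "imageLocator": "full_b.mrc"},
    ],
    "downsampledSize_images": [
        {"UID": "3", "imageLocator": "small_a.mrc"},
        {"UID": "4", "imageLocator": "small_b.mrc"},
    ],
    "correspondingDownsampleImage": [
        {"UID": "5", "UID1": "1", "UID2": "3"},
        {"UID": "6", "UID1": "2", "UID2": "4"},
    ],
}

pairs = resolve_relationships(blocks, "correspondingDownsampleImage")
```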

Page 17

MetaData: Interchange format

• There can be any number of blocks within a STAR file.
• Each block can combine any number of components.
• Examples:
    • Micrograph and locations: micrographLocator, micrographXcoor, micrographYcoor
    • Micrograph and CTF: micrographLocator, Voltage, DefocusU, DefocusV, AngleUX, Cs
    • Micrograph, image, location and CTF: micrographLocator, imageLocator, micrographXcoor, micrographYcoor, Voltage, DefocusU, DefocusV, AngleUX, Cs

Page 18

Some interchange points

• CTF estimation of micrographs:
    • Block name: micrographCTFs
    • micrographLocator, Voltage, DefocusU, DefocusV, AngleUX, Cs
• Particle picking:
    • Block name: particlePicking
    • micrographLocator, micrographXcoor, micrographYcoor
• Particle extraction:
    • Block name: imageList
    • imageLocator
• 2D Alignment/Classification:
• 3D Alignment/Classification:
    • A block per class: class_00001, class_00002, …
    • Class blocks: imageLocator [2D or 3D alignment]
    • [A block of class representatives]
        • Block name: class_representatives
        • imageLocator
    • [Class representative assignment]
        • Block name: class_representative_assignment
        • UID1, UID2

Page 19

How to check?

Automatic check of results

Syntactically and Semantically
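An automatic check could look roughly like the sketch below. It takes blocks that have already been read into dictionaries of rows and reports problems as text. The split is an assumption made for illustration: "syntactic" here means that the columns required for a named block are present, and "semantic" means that UIDs are unique across the file and that numeric columns hold numbers. The block and column names come from the interchange points listed earlier; the function name `check` is hypothetical.

```python
REQUIRED = {
    "micrographCTFs": {"micrographLocator", "Voltage", "DefocusU",
                       "DefocusV", "AngleUX", "Cs"},
    "particlePicking": {"micrographLocator", "micrographXcoor",
                        "micrographYcoor"},
    "imageList": {"imageLocator"},
    "class_representatives": {"imageLocator"},
    "class_representative_assignment": {"UID1", "UID2"},
}
NUMERIC = {"Voltage", "DefocusU", "DefocusV", "AngleUX", "Cs",
           "micrographXcoor", "micrographYcoor"}


def check(blocks):
    """Return a list of error messages; an empty list means no problem found."""
    errors, seen = [], set()
    for name, rows in blocks.items():
        required = {"UID"} | REQUIRED.get(name, set())
        for n, row in enumerate(rows, start=1):
            where = "%s row %d" % (name, n)
            # Syntactic: every required column is present.
            missing = required - set(row)
            if missing:
                errors.append("%s: missing %s" % (where, sorted(missing)))
            # Semantic: UIDs are unique across the whole file.
            uid = row.get("UID")
            if uid is not None:
                if uid in seen:
                    errors.append("%s: duplicate UID %s" % (where, uid))
                seen.add(uid)
            # Semantic: numeric columns parse as numbers.
            for label in sorted(NUMERIC.intersection(row)):
                try:
                    float(row[label])
                except ValueError:
                    errors.append("%s: %s is not a number" % (where, label))
    return errors


good = {"micrographCTFs": [{
    "UID": "1", "micrographLocator": "InputData/micrograph1.mrc",
    "Voltage": "200", "DefocusU": "1.50", "DefocusV": "1.54",
    "AngleUX": "30.0", "Cs": "2.0"}]}
bad = {"particlePicking": [
    {"UID": "1", "micrographLocator": "m.mrc", "micrographXcoor": "500"},
    {"UID": "1", "micrographLocator": "m.mrc", "micrographXcoor": "abc",
     "micrographYcoor": "2500"}]}

good_errors = check(good)
bad_errors = check(bad)
```

Range checks (for example, a positive voltage) would fit in the same loop, but the acceptable ranges are a matter for the standard to define.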

Page 20

Discussion

• Is this the correct strategy? (interchange points + interchange format)

• Which interchange points?

• Which interchange format?

• How to reach a standard?

Page 21

Interchange objects

CTF determination

Micrograph screening

Particle picking

Particle extraction and screening

2D alignment and classification

Initial volume construction

3D Model refinement

Micrograph phase correction

3D Model amplitude correction

Page 22

• What do we want to interchange?
• What is the minimum amount of information needed?
• How are we going to store it?