

DDS, A Seismic Processing Architecture

Reproducible research workshop UBC, Vancouver, 2006

Randall L. Selzler RSelzler @ Data-Warp.com

Jerry Ehlers Jerry.Ehlers @ BP.com

Joseph A. Dellinger* Joseph.Dellinger @ BP.com



DDS ORIGINS: Amoco TRC, early 90’s

DDS began at the Amoco Tulsa Research Center at a time of great organizational strain.

The job of the TRC was to do research and crunch data, not to write software.

Creating software is expensive!

Amoco’s solution was an edict that

“everyone will use DISCO, or else”.


Else!

But DISCO just wasn’t good enough! And so chaos ensued... We were “mired in seismic processing diversity”.

DDS grew up surrounded by:

• USP (Amoco internal trace-header based)

• SEPlib (ASCII header pointing to data cubes)

• SU (SEGY trace-header based)

• DISCO (proprietary monitor-based system)

... and needed to be compatible with all of these!


Although formally cast as a research group, in fact the TRC also functioned as an “internal contractor” processing shop.

1) So to catch on, not only would any software have to be usable for quick-turnaround research, but

2) the ability to process large datasets efficiently and in parallel was also of vital importance.

[Terabytes of data, Connection Machines, MPI, OpenMP]

3) The group had accumulated a considerable number and variety of computers. [All “Unix”, but CM5, Cray, Sun, SGI, Linux, Linux clusters, 32 and 64 bit...]

4) Finally, there was an urgent need for software that could accommodate all the various mutant SEGY formats coming into the shop, as well as DISCO, SEPlib, SU, and USP!


and out of the chaos came...

John Etgen was using SEPlib for migration algorithm research on the CM200, a machine that required massively parallel data I/O.

He showed SEPlib to Randy Selzler:

“I want something that looks like THIS, but can handle the large industrial-strength jobs I need to do!”

And thus DDS was born...


How SEPlib did it

“header” file:

    ... processing history ...
    esize=4 (bytes)
    data_format=xdr_float
    in=data_location
    n1=trace_length
    n2=number_traces_per_record
    n3=number_records
    d1=sample_interval
    o1=starting sample
    etc...

data file:

    regularly sampled cube of IEEE 4-byte floats of dimension n1 x n2 x n3

SEPlib was the system favored by the folks writing programs that worked on large data volumes instead of individual traces.
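To make the header/data split concrete, here is a minimal Python sketch (not SEPlib code) that reads `key=value` pairs from a header like the one above and works out how large the data cube should be. The file name and the numeric values are made up for illustration, and letting a later assignment override an earlier one is an assumption of this sketch.

```python
def parse_header(text):
    """Collect key=value pairs from header text.

    Assumption of this sketch: when a key appears more than once,
    the last assignment wins.
    """
    params = {}
    for token in text.split():
        if "=" in token:
            key, value = token.split("=", 1)
            params[key] = value
    return params


# Hypothetical header contents; only the key names come from the slide.
header_text = """
esize=4 data_format=xdr_float
in=/tmp/example.H@
n1=1000 n2=96 n3=24
d1=0.008 o1=0.0
"""

params = parse_header(header_text)

# The data file is a dense n1 x n2 x n3 cube of esize-byte samples.
n_samples = int(params["n1"]) * int(params["n2"]) * int(params["n3"])
expected_bytes = n_samples * int(params["esize"])
print(params["in"], expected_bytes)
```

Because the header is plain text and the cube is headerless, a program only needs these few numbers to seek directly to any sample.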


DDS can look a lot like SEPlib

SEPlib header file:

    ... processing history ...
    esize=4 (bytes)
    data_format=xdr_float
    in=data_location
    n1=trace_length
    n2=number_traces_per_record
    n3=number_records
    d1=sample_interval
    o1=starting sample
    label1=seconds
    etc...

DDS “dictionary” file:

    ... processing history ...
    type=float4
    format=fcube
    data= data location
    axis= t offset cdp
    size.t= trace length
    size.offset= number traces per record
    size.cdp= number records
    delta.t= sample_interval
    origin.t= starting sample
    units.t= seconds
    etc...


DDS can look a lot like SEPlib

“dictionary” file:

    type=float4
    format=fcube
    data= data location
    axis= t offset cdp
    size.t= trace length
    size.offset= number traces per record
    size.cdp= number records
    delta.t= sample_interval
    origin.t= starting sample
    units.t= seconds
    etc...

data file:

    regularly sampled cube of IEEE 4-byte floats of dimension size.t x size.offset x size.cdp

(command-line arguments look a LOT like SEPlib too)
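The correspondence between the two naming schemes can be sketched in a few lines of Python. This is only an illustration of the side-by-side comparison on the slides, not DDS code: the function name, the mapping table, and the example values are assumptions, and only the key names (`n1`, `d1`, `o1`, `label1`, `in`, `size.t`, `delta.t`, `origin.t`, `units.t`, `data`, `axis`) come from the slides.

```python
import re

# Per-axis SEPlib key prefixes and the DDS-style attribute each one
# lines up with on the slide (label1 sits opposite units.t there).
KEY_MAP = {"n": "size", "d": "delta", "o": "origin", "label": "units"}


def seplib_to_dds(sep, axis_names):
    """Rename numbered SEPlib-style keys to named-axis DDS-style keys."""
    dds = {"axis": " ".join(axis_names)}
    for key, value in sep.items():
        match = re.fullmatch(r"(n|d|o|label)(\d+)", key)
        if match:
            axis = axis_names[int(match.group(2)) - 1]  # n1 -> first axis
            dds[f"{KEY_MAP[match.group(1)]}.{axis}"] = value
        elif key == "in":
            dds["data"] = value
    return dds


# Hypothetical values for illustration.
sep = {
    "n1": "1000", "n2": "96", "n3": "24",
    "d1": "0.008", "o1": "0", "label1": "seconds",
    "in": "cube.bin",
}
dds = seplib_to_dds(sep, ["t", "offset", "cdp"])
for key in sorted(dds):
    print(f"{key}= {dds[key]}")
```

The only extra information the DDS form needs is the list of axis names, which replaces the positional numbering 1, 2, 3.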


DDS’s Generalizations

Dictionary:

    …
    axis= t y cmp
    …
    size.t= 1000
    size.y= 96
    size.cmp= 24
    …
    delta.t= 0.008
    units.t= s
    …
    origin.y= 5000
    units.y= m
    …
    format= segy
    data= oak39_@

Binary Data:

    Card Header
    Line Header
    Traces…

• N-Dimensional Array of I/O Records
    • Densely populated for random access
    • Sequential access if sparse

• Meaningful Axis Names
    • t, x, y, z, w, kx, ky, kz, cmp, shot, offset, …

• Extensible Axis Attributes
    • Regular grid (size, origin, delta, units, …)
    • Variable grid (grid.z= 1 3 5 7 11, …)
    • Non-numeric (label.attr= Vp Vs rho)

Great for research! Exotic algorithms and unforeseen domains can be accurately represented and processed as easily as traditional ones.
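As a rough illustration of regular versus variable grids, the following Python sketch computes the coordinates along a named axis from dictionary-style entries. It is not the DDS API: the function name and the defaults (origin 0, delta 1 when an entry is missing) are assumptions made for this example; the attribute names and the sample values are taken from the slide.

```python
def axis_coordinates(dictionary, name):
    """Return the coordinate of every grid point along one named axis."""
    grid = dictionary.get(f"grid.{name}")
    if grid is not None:
        # Variable grid: the coordinates are listed explicitly.
        return [float(value) for value in grid.split()]
    # Regular grid: origin + i * delta for i in 0..size-1.
    size = int(dictionary[f"size.{name}"])
    origin = float(dictionary.get(f"origin.{name}", 0.0))  # assumed default
    delta = float(dictionary.get(f"delta.{name}", 1.0))    # assumed default
    return [origin + i * delta for i in range(size)]


dictionary = {
    "axis": "t y cmp",
    "size.t": "1000", "delta.t": "0.008", "units.t": "s",
    "size.y": "96", "origin.y": "5000", "units.y": "m",
    "grid.z": "1 3 5 7 11",
}

t = axis_coordinates(dictionary, "t")
z = axis_coordinates(dictionary, "z")
print(len(t), t[1], z)
```

A program written against named axes like this does not care whether the third axis is called cmp, shot, or kx.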


How USP did it

USP-format data file:

    historical line header (processing history and 3 data dimensions)
    traces:
        element count, trace header, trace samples
        element count, trace header, trace samples
        element count, trace header, trace samples
        ...

USP: Unix Seismic Processing

USP was Amoco’s internally home-grown trace-based processing system, beloved of Amoco’s signal processors.

USP is similar to SU in concept.

USP uses longer trace headers than SU, but they still turned out to not be long enough!

USP is still used as much as ever today.


SU and USP use fixed-format trace headers defined by include files

    /*
     * hdr.h - SU include file for segy offset array
     */
    static struct {
        char *key;   /* header field name */
        char *type;  /* "i" = 4-byte int, "h" = 2-byte short */
        int offs;    /* byte offset within the trace header */
    } hdr[] = {
        { "tracl",   "i",   0},
        { "tracr",   "i",   4},
        { "fldr",    "i",   8},
        { "tracf",   "i",  12},
        { "ep",      "i",  16},
        { "cdp",     "i",  20},
        { "cdpt",    "i",  24},
        { "trid",    "h",  28},
        { "nvs",     "h",  30},
        { "nhs",     "h",  32},
        { "duse",    "h",  34},
        { "offset",  "i",  36},
        { "gelev",   "i",  40},
        { "selev",   "i",  44},
        { "sdepth",  "i",  48},
        { "gdel",    "i",  52},
        { ...
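To show what “fixed-format” means in practice, here is a small Python sketch that reads fields out of a trace header using the first few (name, type, offset) entries from the table above. The big-endian byte order is an assumption (it matches standard SEGY; SU files written on a little-endian host may differ), and the header contents are invented for the example.

```python
import struct

# First entries of the offset table shown above: (key, type, byte offset).
# "i" is a 4-byte integer and "h" a 2-byte integer.
HDR = [
    ("tracl", "i", 0), ("tracr", "i", 4), ("fldr", "i", 8),
    ("tracf", "i", 12), ("ep", "i", 16), ("cdp", "i", 20),
    ("cdpt", "i", 24), ("trid", "h", 28), ("nvs", "h", 30),
    ("nhs", "h", 32), ("duse", "h", 34), ("offset", "i", 36),
]


def read_fields(header_bytes, byteorder=">"):
    """Unpack each listed field from its fixed byte offset."""
    return {
        key: struct.unpack_from(byteorder + type_code, header_bytes, offs)[0]
        for key, type_code, offs in HDR
    }


# Build a made-up 240-byte trace header with three fields filled in.
header = bytearray(240)
struct.pack_into(">i", header, 20, 1234)   # cdp
struct.pack_into(">h", header, 28, 1)      # trid
struct.pack_into(">i", header, 36, -250)   # offset

fields = read_fields(bytes(header))
print(fields["cdp"], fields["trid"], fields["offset"])
```

The weakness is visible here: every program that includes the table is tied to these exact offsets, so a file with a different header layout needs different code.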


DDS also plays well with USP

USP-format data file:

    line header (three dimensions)
    traces:
        element count, trace header, trace samples
        element count, trace header, trace samples
        element count, trace header, trace samples
        ...

DDS dictionary file:

    type=float4
    format=usp
    data= data location
    axis= t offset cdp comp
    size.t= trace length
    size.offset= number traces per record
    size.cdp= number records
    size.comp= number components
    delta.t= sample_interval
    origin.t= starting sample
    units.t= seconds
    etc...

DDS knows what USP headers look like!


and SEGY...

SEGY-format data file:

    EBCDIC cards
    binary header
    traces:
        trace header, IBM-format samples
        trace header, IBM-format samples
        trace header, IBM-format samples
        ...

DDS dictionary file:

    type=float4ibm
    format=segy
    data= data location
    axis= t offset cdp comp
    size.t= trace length
    size.offset= number traces per record
    size.cdp= number records
    size.comp= number components
    delta.t= sample_interval
    origin.t= starting sample
    units.t= seconds
    etc...

Note DDS only bothers to convert back to SEGY’s archaic IBM floats when writing to disk!
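For readers who have not met IBM floats, this Python sketch shows the read-side conversion from 32-bit IBM hexadecimal floating point to native floats. It is a generic illustration of the number format, not DDS’s conversion code; the function names are invented for the example.

```python
import struct


def ibm32_to_float(word):
    """Convert one 32-bit IBM hexadecimal float (as an unsigned int)."""
    sign = -1.0 if word & 0x80000000 else 1.0
    exponent = (word >> 24) & 0x7F                    # power of 16, bias 64
    fraction = (word & 0x00FFFFFF) / float(1 << 24)   # 24-bit fraction in [0, 1)
    return sign * fraction * 16.0 ** (exponent - 64)


def unpack_ibm_samples(raw):
    """Decode a big-endian byte string of IBM floats into Python floats."""
    count = len(raw) // 4
    words = struct.unpack(f">{count}I", raw[: count * 4])
    return [ibm32_to_float(word) for word in words]


# Three samples encoded as IBM floats: 100.0, -118.625 and 1.0.
raw = bytes.fromhex("42640000" "c276a000" "41100000")
samples = unpack_ibm_samples(raw)
print(samples)
```

Doing this once on input and once on output, and working in native floats in between, is the behaviour the note above describes.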


DDS can speak SU

    editd in=minute2.usp \
        3s=16 3e=16 2s=2 2e=32 2i=2 \
        out_format= su \
        out_data= stdout: | \
    supswigp clip=.2 > wiggle.ps

(note input format auto-detected)


DDS dictionaries can point at dictionaries!

Dictionary pointing at the per-component dictionaries:

    type=float4ibm
    format=segy slice.comp
    data= dict.comp1 dict.comp2 dict.comp3
    axis= t offset cdp comp
    size.t= trace length
    size.offset= number traces per record
    size.cdp= number records
    size.comp= number components
    ...

dict.comp1 (points at SEGY binary data file data.c1.segy):

    type=float4ibm
    format=segy
    data= data.c1.segy
    axis= t offset cdp
    size.t= trace length
    size.offset= number traces per record
    size.cdp= number records
    ...

dict.comp2 (points at SEGY binary data file data.c2.segy):

    type=float4ibm
    format=segy
    data= dict.c2.segy
    axis= t offset cdp
    size.t= trace length
    size.offset= number traces per record
    size.cdp= number records
    ...
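The idea of a dictionary whose `data=` entry names other dictionaries can be sketched as a small recursive lookup. This Python example is only an illustration under stated assumptions: dictionaries are modelled as an in-memory mapping rather than files, the name `dict.all` is hypothetical, and any name that is not a known dictionary is treated as a binary data file. There is no cycle detection.

```python
def resolve_data_files(name, dictionaries):
    """Follow data= entries until only binary data files remain."""
    entry = dictionaries.get(name)
    if entry is None:
        # Not a dictionary we know about: assume it is a binary data file.
        return [name]
    files = []
    for target in entry["data"].split():
        files.extend(resolve_data_files(target, dictionaries))
    return files


# In-memory stand-ins for dictionary files (names partly hypothetical).
dictionaries = {
    "dict.all": {"format": "segy", "data": "dict.comp1 dict.comp2"},
    "dict.comp1": {"format": "segy", "data": "data.c1.segy"},
    "dict.comp2": {"format": "segy", "data": "data.c2.segy"},
}

files = resolve_data_files("dict.all", dictionaries)
print(files)
```

The application opens one dictionary and sees one multi-component dataset, even though the samples live in several separate files.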


DDS plays well with mutant SEGY

    bridge in= Atlantis_EQ.segy \
        in_format=segy \
        out_format=usp \
        comment="Component Type" \
        map:segy:usp.RcComp= "TotalStatic" \
        \
        comment="Src and rec locations" \
        map:segy:usp.SrPtXC= "SrcX / 10" \
        map:segy:usp.SrPtYC= "SrcY / 10" \
        map:segy:usp.SrPtEl= "15" \
        map:segy:usp.ShtDep= "SrcDepth / 10" \
        \
        map:segy:usp.RcPtXC= "GrpX / 10" \
        map:segy:usp.RcPtYC= "GrpY / 10" \
        map:segy:usp.GrpElv= "Spare.I4[10] / 10" \
        map:segy:usp.CabDep= "Spare.I4[10]" \
        map:segy:usp.DstSgn= "DstSgn / 10" \
        \
        comment="Rec point and line numbers" \
        map:segy:usp.DpPtLn= "Spare.I4[8]" \
        map:segy:usp.DpPtLt= "Spare.I4[9]" \
        \
        comment="Dead or Live" \
        map:segy:usp.StaCor= '( TrcIdCode - 1 ) * 30000' \
        | \
    editd in= stdin: 3e=106 out_data= raw.usp

Callouts on the slide mark three kinds of mapping: straight map, fixed number, arithmetic calculation.
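The three kinds of mapping called out on this slide can be mimicked with a table of small functions. This Python sketch is an analogy, not how `bridge` evaluates its `map:` expressions: the rules are written as Python callables rather than parsed expression strings, the input header values are invented, and Python’s `/` is true division, which may differ from the arithmetic DDS applies to integer header fields.

```python
# Output (USP-style) field name -> rule computing it from a SEGY-style header.
USP_FROM_SEGY = {
    "RcComp": lambda h: h["TotalStatic"],               # straight map
    "SrPtEl": lambda h: 15,                             # fixed number
    "SrPtXC": lambda h: h["SrcX"] / 10,                 # arithmetic calculation
    "StaCor": lambda h: (h["TrcIdCode"] - 1) * 30000,   # arithmetic calculation
}


def map_header(segy_header, rules):
    """Apply every rule to one input header and return the output header."""
    return {name: rule(segy_header) for name, rule in rules.items()}


# Made-up input header values for illustration.
segy_header = {"TotalStatic": 2, "SrcX": 123450, "TrcIdCode": 1}
usp_header = map_header(segy_header, USP_FROM_SEGY)
print(usp_header)
```

Keeping the rules in a table (or, in DDS, on the command line) means a new mutant format needs a new table, not new code.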


Data formats and mappings

• This is how DDS differs from SEPlib... The properties of the binary data, and all the elements within the binary data, are looked up in the “dictionary”.

• Even the array of trace samples is just another trace field as far as DDS is concerned.

• DDS knows a few default formats, but can use any format that you can define.

• It can also map to and from any format for which you can define the necessary mappings.

• This has the important side effect of documenting the data format, making future reproducibility possible.
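A data-driven format definition of this kind can be sketched in Python: the record layout is a list of fields, and the sample array is simply the last field in that list. This is an illustration of the idea only, not DDS’s format language. The field list, the type names `int4` and `int2`, and the function names are assumptions (`float4` is the one type name that appears on the slides), and big-endian byte order is assumed.

```python
import struct

# Hypothetical type names mapped to struct codes.
TYPE_CODES = {"int4": "i", "int2": "h", "float4": "f"}


def build_struct(fields, byteorder=">"):
    """Turn a list of (name, type, count) fields into a struct layout."""
    codes = "".join(f"{count}{TYPE_CODES[kind]}" for _, kind, count in fields)
    return struct.Struct(byteorder + codes)


def unpack_record(raw, fields, byteorder=">"):
    """Decode one record; multi-element fields come back as lists."""
    values = build_struct(fields, byteorder).unpack(raw)
    record, position = {}, 0
    for name, _, count in fields:
        chunk = values[position:position + count]
        record[name] = chunk[0] if count == 1 else list(chunk)
        position += count
    return record


# The trace samples are described exactly like any other field.
fields = [("cdp", "int4", 1), ("offset", "int4", 1), ("Samples", "float4", 4)]
raw = struct.pack(">2i4f", 7, -100, 0.0, 0.5, 1.0, 1.5)
record = unpack_record(raw, fields)
print(record)
```

Because the layout is data rather than code, the same field list also serves as documentation of what is in the file.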


DDS supports generic formats

In fact, besides having a few built-in default formats such as USP, SU, and SEGY that are convenient for geophysicists, there is nothing in the core of DDS that limits it to being a seismic processing system!


Internal data formats

• Programs can define their own internal data formats as well, simply by writing definitions into their own internal dictionary:

    fdds_printf ('MOD_FIELD', ' *+ float MyHeader1, MyHeader2;\n\0')

• DDS will then convert from the format of the data, as documented by its dictionary, to the internal format specified by the program.

• On output, the internal format is converted back into whatever output format was requested on the command line. By default, the output format is the same as the input format.


Leverage Diversity? Interoperate!

Data handling is fundamental…

[Diagram. Labels: DDS Application; Generic Read, Generic Write, Generic I/O; API Emulation; Non-DDS Application; Disk File, Pipe/Socket, Tape; Any DDS Supported Format; Foreign Format; Foreign Library. Callouts: “DISCO Support 1997-2003”; “USP Re-link 1998 Proof of Concept”; “Format and API Emulation With Random Access I/O”.]


Are you scared yet?

• You can probably imagine that all this translating between formats can get very complicated...

...

    fmt:SAMPLE_TYPE= typedef float4 SAMPLE_TYPE;
    fmt:USP_ADJUST= typedef enum4 {USP_LINE_PAD \= 0, USP_TRACE_PAD \= 0, USP_HLH_SIZE \= 2236} USP_ADJUST;
    fmt:SEQUENCE= typedef USP_TRACE SEQUENCE;
    alias:fmt:USP_TRACE_PAD= fmt:USP_ADJUST
    alias:fmt:USP_HLH_SIZE= fmt:USP_ADJUST
    alias:fmt:USP_LINE_PAD= fmt:USP_ADJUST
    usp_NumRec= 2056

...

But still better than having to change your code or relink your code for every different mutant data format! It also makes it possible to interoperate with historical data formats without too much pain.


DDS scripting as a Rosetta stone

    /apps/global/bin/bridge \
        in= /hpc/dat13/zdsr01/Node/EQ/all.segy \
        in_format=segy out_format=usp \
        comment="Component Type" \
        map:segy:usp.RcComp= "TotalStatic" \
        comment="Src and rec locations" \
        map:segy:usp.SrPtXC= "SrcX / 10" \
        map:segy:usp.SrPtYC= "SrcY / 10" \
        map:segy:usp.SrPtEl= "15" \
        map:segy:usp.ShtDep= "SrcDepth / 10" \
        comment="Azimuth, Roll Tilt" \
        map:segy:usp.TVPT01= "100 * Spare.F4[11]" \
        map:segy:usp.TVPT02= "100 * Spare.F4[12]" \
        map:segy:usp.TVPT03= "100 * Spare.F4[13]" \
        comment="Dead or Live" \
        map:segy:usp.StaCor= '( TrcIdCode - 1 ) * 30000' \
        comment="Shot Time" \
        map:segy:usp.TVPT15=Date.DateYear \
        map:segy:usp.TVPT16=Date.DateDay \
        map:segy:usp.TVPT17=Date.DateHour \
        map:segy:usp.TVPT18=Date.DateMin \
        map:segy:usp.TVPT19=Date.DateSec \
        ....


In Conclusion: caveats

• Things aren’t so complicated if you use DDS as if it were SEPlib, but then what’s the point?

• Because so much functionality already exists in USP, there has been little motivation to flesh out DDS.

• The external distribution is a subset of the same stuff we use internally. There has been little effort put into improving the “packaging”.

• While there is some documentation, it is somewhat lacking!


In Conclusion: upsides

• The software infrastructure inside BP today is based almost entirely on DDS and USP. It is BP’s infrastructure both for research and for processing. BP’s advanced imaging team in Houston is “BP’s largest contractor”.

• The DDS I/O library was released publicly in 2003 on “freeusp.org”. The core of the USP system was released a year or so earlier on the same web site, along with some ARCO-heritage processing systems.

• By releasing USP and DDS, BP hoped to make it easier to share algorithms with academia and contractors.

• Randy Selzler now wants to create a successor to DDS, but that’s his talk, as the “prophet”, to give...