DDS, A Seismic Processing Architecture
Reproducible Research Workshop, UBC, Vancouver, 2006
Randall L. Selzler RSelzler @ Data-Warp.com
Jerry Ehlers Jerry.Ehlers @ BP.com
Joseph A. Dellinger* Joseph.Dellinger @ BP.com
DDS ORIGINS: Amoco TRC, early 90’s
DDS began at the Amoco Tulsa Research Center at a time of great organizational strain.
The job of the TRC was to do research and crunch data, not to write software.
Creating software is expensive!
Amoco’s solution was an edict that
“everyone will use DISCO, or else”.
Else!
But DISCO just wasn’t good enough! And so chaos ensued... We were “mired in seismic processing diversity”.
DDS grew up surrounded by:
• USP (Amoco internal trace-header based)
• SEPlib (ASCII header pointing to data cubes)
• SU (SEGY trace-header based)
• DISCO (proprietary monitor-based system)
...and DDS needed to be compatible with all of these!
Although formally cast as a research group, in fact the TRC also functioned as an “internal contractor” processing shop.
1) So to catch on, not only would any software have to be usable for quick-turnaround research, but
2) the ability to process large datasets efficiently and in parallel was also of vital importance.
[Terabytes of data, Connection Machines, MPI, OpenMP]
3) The group had accumulated a considerable number and variety of computers. [All “Unix”, but CM5, Cray, Sun, SGI, Linux, Linux clusters, 32 and 64 bit...]
4) Finally, there was an urgent need for software that could accommodate all the various mutant SEGY formats coming into the shop, as well as DISCO, SEPlib, SU, and USP!
and out of the chaos came...
John Etgen was using SEPlib for migration algorithm research on the CM200, a machine that required massively parallel data I/O.
He showed SEPlib to Randy Selzler:
“I want something that looks like THIS, but can handle the large industrial-strength jobs I need to do!”
And thus DDS was born...
How SEPlib did it
“header” file
... processing history ...
esize=4 (bytes)
data_format=xdr_float
in=data_location
n1=trace_length
n2=number_traces_per_record
n3=number_records
d1=sample_interval
o1=starting sample
etc...
regularly sampled cube of IEEE 4-byte floats of dimension n1 x n2 x n3
data file
SEPlib was the system favored by the folks writing programs that worked on large data volumes instead of individual traces.
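Because a SEPlib data file is nothing more than a dense cube with axis 1 varying fastest, locating any sample takes only arithmetic on the header values. A minimal C sketch, assuming that layout and the esize/n1/n2/n3/o1/d1 meanings above; `cube_offset` and `axis_coord` are hypothetical helpers, not SEPlib library routines:

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of sample (i1, i2, i3) in a SEPlib-style cube whose axis 1
 * (time) varies fastest: offset = esize * (i1 + n1 * (i2 + n2 * i3)).
 * Hypothetical helper for illustration; not a SEPlib library routine. */
int64_t cube_offset(int64_t i1, int64_t i2, int64_t i3,
                    int64_t n1, int64_t n2, int64_t n3, int64_t esize)
{
    assert(0 <= i1 && i1 < n1);
    assert(0 <= i2 && i2 < n2);
    assert(0 <= i3 && i3 < n3);
    return esize * (i1 + n1 * (i2 + n2 * i3));
}

/* Physical coordinate of index i on a regular axis, e.g. o1 + i * d1. */
double axis_coord(int64_t i, double o, double d)
{
    return o + (double)i * d;
}
```

With esize=4, n1=1000 and n2=96, stepping one trace along axis 2 moves 4000 bytes and one record along axis 3 moves 384000 bytes, which is what makes random access into very large cubes cheap.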
DDS can look a lot like SEPlib
SEPlib header file
... processing history ...
esize=4 (bytes)
data_format=xdr_float
in=data_location
n1=trace_length
n2=number_traces_per_record
n3=number_records
d1=sample_interval
o1=starting sample
label1=seconds
etc...
DDS “dictionary” file
... processing history ...
type=float4
format=fcube
data= data location
axis= t offset cdp
size.t= trace length
size.offset= number traces per record
size.cdp= number records
delta.t= sample_interval
origin.t= starting sample
units.t= seconds
etc...
regularly sampled cube of IEEE 4-byte floats of dimension size.t x size.offset x size.cdp
data file
(command-line arguments look a LOT like SEPlib too)
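Reading such a dictionary amounts to looking up `name= value` definitions, and the SEPlib-to-DDS correspondence is mostly a renaming (n1 becomes size.<first axis>, n2 becomes size.<second axis>, and so on). The following C sketch makes two simplifying assumptions that are ours, not necessarily the full DDS dictionary syntax: one definition per line, and a later definition of the same name overrides an earlier one (as when processing history is appended). `dict_lookup` and `sep_to_dds_size_key` are hypothetical names:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Copy the value of the LAST "name= value" line into out (trimmed).
 * Assumes one definition per line with the name at the start of the line.
 * Returns 1 if the name was found, 0 otherwise. */
int dict_lookup(const char *text, const char *name, char *out, size_t outlen)
{
    size_t nlen = strlen(name);
    int found = 0;
    const char *line = text;
    assert(outlen > 0);
    while (*line) {
        const char *eol = strchr(line, '\n');
        if (!eol) eol = line + strlen(line);
        if ((size_t)(eol - line) > nlen && strncmp(line, name, nlen) == 0 &&
            line[nlen] == '=') {
            const char *v = line + nlen + 1;
            while (v < eol && *v == ' ') v++;          /* skip blanks after '=' */
            const char *end = eol;
            while (end > v && end[-1] == ' ') end--;   /* trim trailing blanks */
            size_t len = (size_t)(end - v);
            if (len >= outlen) len = outlen - 1;
            memcpy(out, v, len);
            out[len] = '\0';
            found = 1;                                 /* keep scanning: last wins */
        }
        line = (*eol == '\n') ? eol + 1 : eol;
    }
    return found;
}

/* Translate SEPlib's n<k> (k is 1-based) into the DDS name size.<axis k>,
 * taking the axis names from the dictionary's "axis=" list. */
int sep_to_dds_size_key(const char *axes, int k, char *out, size_t outlen)
{
    char tok[64];
    int used = 0, i = 1;
    const char *p = axes;
    assert(k >= 1);
    while (sscanf(p, "%63s%n", tok, &used) == 1) {
        if (i == k) {
            snprintf(out, outlen, "size.%s", tok);
            return 1;
        }
        p += used;
        i++;
    }
    return 0;
}
```

For example, with `axis= t offset cdp`, SEPlib's n2 corresponds to `size.offset`.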
DDS’s Generalizations
Dictionary
…
axis= t y cmp
…
size.t= 1000
size.y= 96
size.cmp= 24
…
delta.t= 0.008
units.t= s
…
origin.y= 5000
units.y= m
…
format= segy
data= oak39_@
Binary Data
Card Header
Line Header
Traces…
• N-Dimensional Array of I/O Records
  • Densely populated for random access
  • Sequential access if sparse
• Meaningful Axis Names
  • t, x, y, z, w, kx, ky, kz, cmp, shot, offset, …
• Extensible Axis Attributes
  • Regular grid (size, origin, delta, units, …)
  • Variable grid (grid.z= 1 3 5 7 11, …)
  • Non-numeric (label.attr= Vp Vs rho)
Great for research! Exotic algorithms and unforeseen domains can be accurately represented and processed as easily as traditional ones.
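The distinction between regular and variable grids, and the random access that a densely populated record array allows, can be sketched in a few lines of C. The functions are hypothetical, and `record_index` assumes records are ordered with the lower-numbered axis (y) varying fastest, as in a SEPlib cube:

```c
#include <assert.h>
#include <stddef.h>

/* Regular grid: coordinate = origin + index * delta
 * (e.g. axis t with delta.t= 0.008 and units.t= s). */
double regular_coord(double origin, double delta, long i)
{
    return origin + (double)i * delta;
}

/* Variable grid: coordinates are listed explicitly, as in grid.z= 1 3 5 7 11. */
double variable_coord(const double *grid, size_t n, size_t i)
{
    assert(i < n);
    return grid[i];
}

/* Position of the I/O record (trace) for indices (iy, icmp) when every
 * record exists (densely populated), assuming axis= t y cmp with t inside
 * each record and y varying fastest among records. Multiplying by the
 * record size gives a direct seek offset. */
long record_index(long iy, long icmp, long size_y, long size_cmp)
{
    assert(0 <= iy && iy < size_y);
    assert(0 <= icmp && icmp < size_cmp);
    return iy + size_y * icmp;
}
```

For the example dictionary (size.y= 96, size.cmp= 24), the last record is number 95 + 96 * 23 = 2303. If the array were sparse, that arithmetic would no longer hold and the records would have to be read sequentially.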
How USP did it
USP-format data file
historical line header (processing history and 3 data dimensions)
element count | trace header | trace samples
element count | trace header | trace samples
element count | trace header | trace samples
...
traces
Unix Seismic Processing
USP was Amoco’s internally home-grown trace-based processing system, beloved of Amoco’s signal processors.
USP is similar to SU in concept.
USP uses longer trace headers than SU, but even they turned out not to be long enough!
USP is still used as much as ever today.
SU and USP use fixed-format trace headers defined by include files
/*
 * hdr.h - SU include file for segy offset array
 */
static struct {
    char *key;  char *type;  int offs;
} hdr[] = {
    { "tracl",  "i",  0}, { "tracr",  "i",  4}, { "fldr",   "i",  8},
    { "tracf",  "i", 12}, { "ep",     "i", 16}, { "cdp",    "i", 20},
    { "cdpt",   "i", 24}, { "trid",   "h", 28}, { "nvs",    "h", 30},
    { "nhs",    "h", 32}, { "duse",   "h", 34}, { "offset", "i", 36},
    { "gelev",  "i", 40}, { "selev",  "i", 44}, { "sdepth", "i", 48},
    { "gdel",   "i", 52}, ...
};
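To make the contrast with DDS concrete, here is roughly how a program uses such a compiled-in table: it looks a key up to find the type and byte offset, then reads the raw header bytes. This C sketch copies a few entries from the table above and assumes the 240-byte header is held in the host's native byte order (as SU traditionally keeps its data); `get_hdr` and `set_hdr` are hypothetical names, not SU functions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* A subset of the hdr.h table: key, type ("i" = 4-byte int,
 * "h" = 2-byte short), byte offset into the trace header. */
static const struct {
    const char *key;
    const char *type;
    int offs;
} hdr_subset[] = {
    { "tracl", "i", 0 }, { "cdp", "i", 20 }, { "trid", "h", 28 },
    { "nvs", "h", 30 }, { "offset", "i", 36 },
};

static int find_key(const char *key)
{
    size_t i;
    for (i = 0; i < sizeof hdr_subset / sizeof hdr_subset[0]; i++)
        if (strcmp(hdr_subset[i].key, key) == 0)
            return (int)i;
    return -1;
}

/* Read a header word by name. Returns 1 on success, 0 for an unknown key. */
int get_hdr(const unsigned char *trhdr, const char *key, long *val)
{
    int k = find_key(key);
    assert(trhdr != NULL && val != NULL);
    if (k < 0) return 0;
    if (hdr_subset[k].type[0] == 'i') {
        int32_t v;
        memcpy(&v, trhdr + hdr_subset[k].offs, sizeof v);
        *val = v;
    } else {
        int16_t v;
        memcpy(&v, trhdr + hdr_subset[k].offs, sizeof v);
        *val = v;
    }
    return 1;
}

/* Write a header word by name; the caller keeps val within the field's range. */
int set_hdr(unsigned char *trhdr, const char *key, long val)
{
    int k = find_key(key);
    assert(trhdr != NULL);
    if (k < 0) return 0;
    if (hdr_subset[k].type[0] == 'i') {
        int32_t v = (int32_t)val;
        memcpy(trhdr + hdr_subset[k].offs, &v, sizeof v);
    } else {
        int16_t v = (int16_t)val;
        memcpy(trhdr + hdr_subset[k].offs, &v, sizeof v);
    }
    return 1;
}
```

Adding a new header word means editing the table and recompiling every program that uses it, which is the limitation DDS sidesteps by reading field definitions from the dictionary.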
DDS also plays well with USP
USP-format data file
line header (three dimensions)
element count | trace header | trace samples
element count | trace header | trace samples
element count | trace header | trace samples
...
DDS dictionary file
type=float4
format=usp
data= data location
axis= t offset cdp comp
size.t= trace length
size.offset= number traces per record
size.cdp= number records
size.comp= number components
delta.t= sample_interval
origin.t= starting sample
units.t= seconds
etc...
traces
DDS knows what USP headers look like!
and SEGY...
SEGY-format data file
EBCDIC cards, binary header
...
DDS dictionary file
type=float4ibm
format=segy
data= data location
axis= t offset cdp comp
size.t= trace length
size.offset= number traces per record
size.cdp= number records
size.comp= number components
delta.t= sample_interval
origin.t= starting sample
units.t= seconds
etc...
traces
trace header | IBM-format samples
trace header | IBM-format samples
trace header | IBM-format samples
Note: DDS only bothers converting back to SEGY’s archaic IBM floats when writing to disk!
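IBM System/360 floats use a base-16 exponent, so converting them is more than a byte swap. A C sketch of both directions, assuming the 32-bit IBM value has already been assembled from its big-endian bytes into a `uint32_t`. The reverse conversion truncates rather than rounds, flushes underflow to zero, and clamps overflow; those are our choices, not necessarily what DDS does:

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* IBM single precision: 1 sign bit, 7-bit base-16 exponent biased by 64,
 * 24-bit fraction read as 0.F, so
 * value = (-1)^s * (F / 2^24) * 16^(E - 64) = F * 2^(4*(E-64) - 24). */
double ibm_to_ieee(uint32_t ibm)
{
    uint32_t sign = ibm >> 31;
    int exp16 = (int)((ibm >> 24) & 0x7f);
    uint32_t frac = ibm & 0x00ffffffu;
    double v = ldexp((double)frac, 4 * (exp16 - 64) - 24);
    return sign ? -v : v;
}

/* Finite IEEE value -> IBM single precision (fraction truncated to 24 bits). */
uint32_t ieee_to_ibm(double x)
{
    uint32_t sign = 0, frac;
    int e2, e16;
    double m, f;

    assert(isfinite(x));
    if (x == 0.0) return 0;
    if (x < 0) { sign = 0x80000000u; x = -x; }
    m = frexp(x, &e2);                                /* x = m * 2^e2, 0.5 <= m < 1 */
    e16 = (e2 >= 0) ? (e2 + 3) / 4 : -((-e2) / 4);    /* ceil(e2 / 4) */
    f = ldexp(m, e2 - 4 * e16);                       /* now 1/16 <= f < 1 */
    frac = (uint32_t)ldexp(f, 24);                    /* 24 fraction bits */
    e16 += 64;                                        /* apply the bias */
    if (e16 > 127) return sign | 0x7fffffffu;         /* overflow: clamp */
    if (e16 < 0) return 0;                            /* underflow: flush */
    return sign | ((uint32_t)e16 << 24) | frac;
}
```

For example, 100.0 is 0x42640000 in IBM format (0.390625 * 16^2) and -118.625 is 0xC276A000.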
DDS can speak SU
editd in=minute2.usp \
    3s=16 3e=16 2s=2 2e=32 2i=2 \
    out_format= su \
    out_data= stdout: | \
supswigp clip=.2 > wiggle.ps
(note: input format auto-detected)
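How DDS actually auto-detects input formats isn't described here. As one illustrative heuristic (our assumption, not a documented DDS rule), a SEG-Y file can often be recognized from its standard 3600-byte preamble: a 3200-byte EBCDIC textual header that traditionally begins with the card label "C 1" (EBCDIC 'C' is 0xC3), followed by a big-endian binary header whose data sample format code sits at bytes 3225-3226. `sniff_format` is a hypothetical name:

```c
#include <assert.h>
#include <stddef.h>

enum guess { GUESS_UNKNOWN, GUESS_SEGY };

/* Heuristic only: first byte is EBCDIC 'C' and the binary-header sample
 * format code (big-endian, byte offset 3224) is a plausible small value.
 * SEG-Y rev 1 defines codes 1-5 and 8; later revisions add more. */
enum guess sniff_format(const unsigned char *buf, size_t len)
{
    unsigned code;
    assert(buf != NULL);
    if (len < 3600 || buf[0] != 0xC3)
        return GUESS_UNKNOWN;
    code = ((unsigned)buf[3224] << 8) | buf[3225];
    return (code >= 1 && code <= 16) ? GUESS_SEGY : GUESS_UNKNOWN;
}
```

A real detector would also need signatures for USP, SU, and the other supported formats; a file that fails every test would still need an explicit in_format=.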
DDS dictionaries can point at dictionaries!
type=float4ibm
format=segy
slice.comp
data= dict.comp1 dict.comp2 dict.comp3
axis= t offset cdp comp
size.t= trace length
size.offset= number traces per record
size.cdp= number records
size.comp= number components
...
dict.comp1:
type=float4ibm
format=segy
data= data.c1.segy
axis= t offset cdp
size.t= trace length
size.offset= number traces per record
size.cdp= number records
...
points to the SEGY binary data file data.c1.segy
dict.comp2:
type=float4ibm
format=segy
data= data.c2.segy
axis= t offset cdp
size.t= trace length
size.offset= number traces per record
size.cdp= number records
...
points to the SEGY binary data file data.c2.segy
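Conceptually, the top-level dictionary slices its comp axis across the component dictionaries, so a reader only has to decide which component file to open and which trace to read within it. A C sketch, assuming every component describes the same (t, offset, cdp) cube with offset varying fastest; `locate_trace` and `struct trace_loc` are hypothetical names:

```c
#include <assert.h>
#include <stddef.h>

/* Where one trace of the combined (t, offset, cdp, comp) dataset lives. */
struct trace_loc {
    const char *dict;   /* component dictionary named on the data= line */
    long trace;         /* trace number inside that component's data file */
};

/* comp selects the dictionary; (offset, cdp) selects the trace inside it. */
struct trace_loc locate_trace(const char *const *dicts, size_t ncomp,
                              long ioffset, long icdp, size_t icomp,
                              long size_offset, long size_cdp)
{
    struct trace_loc loc;
    assert(icomp < ncomp);
    assert(0 <= ioffset && ioffset < size_offset);
    assert(0 <= icdp && icdp < size_cdp);
    loc.dict = dicts[icomp];
    loc.trace = ioffset + size_offset * icdp;
    return loc;
}
```

Because each component dictionary also documents its own binary file, the components could even be stored in different formats without the reader caring.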
DDS plays well with mutant SEGY
bridge in= Atlantis_EQ.segy \
    in_format=segy \
    out_format=usp \
    comment="Component Type" \
    map:segy:usp.RcComp= "TotalStatic" \
    comment="Src and rec locations" \
    map:segy:usp.SrPtXC= "SrcX / 10" \
    map:segy:usp.SrPtYC= "SrcY / 10" \
    map:segy:usp.SrPtEl= "15" \
    map:segy:usp.ShtDep= "SrcDepth / 10" \
    map:segy:usp.RcPtXC= "GrpX / 10" \
    map:segy:usp.RcPtYC= "GrpY / 10" \
    map:segy:usp.GrpElv= "Spare.I4[10] / 10" \
    map:segy:usp.CabDep= "Spare.I4[10]" \
    map:segy:usp.DstSgn= "DstSgn / 10" \
    comment="Rec point and line numbers" \
    map:segy:usp.DpPtLn= "Spare.I4[8]" \
    map:segy:usp.DpPtLt= "Spare.I4[9]" \
    comment="Dead or Live" \
    map:segy:usp.StaCor= '( TrcIdCode - 1 ) * 30000' | \
editd in= stdin: 3e=106 out_data= raw.usp
Three kinds of mapping appear: a straight map (RcComp= "TotalStatic"), a fixed number (SrPtEl= "15"), and an arithmetic calculation (e.g. SrPtXC= "SrcX / 10").
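The three kinds of mapping can be modelled as a tiny rule table. `bridge` accepts general arithmetic expressions on the right-hand side; this C sketch deliberately hard-codes only the three shapes seen above (copy, constant, and `(src + add) * mul / div`), and every type and function name in it is hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One named value from an input (e.g. SEGY) trace header. */
struct field {
    const char *name;
    double value;
};

enum map_kind { MAP_COPY, MAP_CONST, MAP_AFFINE };

/* Simplified rule covering the three callout cases:
 *   MAP_COPY    out = src                        "TotalStatic"
 *   MAP_CONST   out = constant                   "15"
 *   MAP_AFFINE  out = (src + add) * mul / div    "SrcX / 10",
 *                                                "( TrcIdCode - 1 ) * 30000" */
struct map_rule {
    const char *out;    /* output (e.g. USP) header name */
    enum map_kind kind;
    const char *src;    /* input header name; unused for MAP_CONST */
    double constant, add, mul, div;
};

static double lookup(const struct field *in, size_t n, const char *name)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (strcmp(in[i].name, name) == 0)
            return in[i].value;
    assert(!"input header not found");
    return 0.0;
}

double apply_rule(const struct map_rule *r, const struct field *in, size_t n)
{
    switch (r->kind) {
    case MAP_COPY:
        return lookup(in, n, r->src);
    case MAP_CONST:
        return r->constant;
    case MAP_AFFINE:
        assert(r->div != 0.0);
        return (lookup(in, n, r->src) + r->add) * r->mul / r->div;
    }
    return 0.0;
}
```

Keeping the rules as data rather than code is the point: the same program handles a new mutant SEGY once someone writes down its mappings.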
Data formats and mappings
• This is how DDS differs from SEPlib... The properties of the binary data, and all the elements within the binary data, are looked up in the “dictionary”.
• Even the array of trace samples is just another trace field as far as DDS is concerned.
• DDS knows a few default formats, but can use any format that you can define.
• It can also map to and from any format that you can define the necessary mappings for.
• This has the important side effect of documenting the data format, making future reproducibility possible.
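Treating the sample array as "just another field" means a record layout can be described entirely by a list of (name, type, offset, count) entries. A C sketch of that idea; the field names `TrcNum`, `Offset`, and `Samples`, the restriction to 4-byte types, and native byte order are all assumptions for illustration, not DDS's format-definition syntax:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum ftype { INT4, FLOAT4 };   /* both 4 bytes wide in this sketch */

/* One element of a record, as a format description might list it. */
struct field_def {
    const char *name;
    enum ftype type;
    size_t offset;   /* bytes from the start of the record */
    size_t count;    /* number of elements (1 for a scalar header word) */
};

/* Record size = end of the furthest field. */
size_t record_size(const struct field_def *f, size_t n)
{
    size_t i, end, size = 0;
    for (i = 0; i < n; i++) {
        end = f[i].offset + 4 * f[i].count;
        if (end > size) size = end;
    }
    return size;
}

const struct field_def *find_field(const struct field_def *f, size_t n,
                                   const char *name)
{
    size_t i;
    for (i = 0; i < n; i++)
        if (strcmp(f[i].name, name) == 0) return &f[i];
    return NULL;
}

/* Element j of a FLOAT4 field, assuming native byte order. */
float get_float(const unsigned char *rec, const struct field_def *fd, size_t j)
{
    float x;
    assert(fd != NULL && fd->type == FLOAT4 && j < fd->count);
    memcpy(&x, rec + fd->offset + 4 * j, sizeof x);
    return x;
}
```

With two 4-byte header words followed by `Samples` of count 1000 (size.t), each record is 4008 bytes, and the samples are found by name exactly like any header word.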
DDS supports generic formats
In fact, besides having a few built-in default formats such as USP, SU, and SEGY that are convenient for geophysicists,
there is nothing in the core of DDS that limits it to being a seismic processing system!
Internal data formats
• Programs can define their own internal data formats as well, simply by writing definitions into their own internal dictionary:
fdds_printf ('MOD_FIELD', ' *+ float MyHeader1, MyHeader2;\n\0')
• DDS will then convert from the format of the data, as documented by its dictionary, to the internal format specified by the program.
• On output, the internal format will be converted back into whatever output format has been requested on the command line, or by default, the output format will be the same as the input format.
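As a concrete instance of the input/output conversion just described, suppose (hypothetically) that a dataset stores MyHeader1 as a 4-byte big-endian integer while the program declared it as `float`. DDS derives such conversions from the dictionaries; this C sketch hard-codes that one case, and the function names are ours:

```c
#include <assert.h>
#include <stdint.h>

/* Decode a 4-byte big-endian two's-complement integer, independent of host byte order. */
int32_t get_be_int32(const unsigned char *p)
{
    uint32_t u = ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
                 ((uint32_t)p[2] << 8) | (uint32_t)p[3];
    /* avoid an implementation-defined unsigned-to-signed conversion */
    return (u & 0x80000000u) ? -(int32_t)(~u) - 1 : (int32_t)u;
}

void put_be_int32(unsigned char *p, int32_t v)
{
    uint32_t u = (uint32_t)v;   /* well-defined, modulo 2^32 */
    p[0] = (unsigned char)(u >> 24);
    p[1] = (unsigned char)(u >> 16);
    p[2] = (unsigned char)(u >> 8);
    p[3] = (unsigned char)u;
}

/* Input side: on-disk int4 -> program's internal float (e.g. MyHeader1). */
float field_to_internal(const unsigned char *p)
{
    return (float)get_be_int32(p);
}

/* Output side: internal float -> on-disk int4, rounded to nearest. */
void internal_to_field(unsigned char *p, float x)
{
    assert(x >= -2.0e9f && x <= 2.0e9f);
    put_be_int32(p, (int32_t)(x >= 0.0f ? x + 0.5f : x - 0.5f));
}
```

The program only ever sees the float; if the output format were changed on the command line, only the output-side conversion would differ.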
Leverage Diversity? Interoperate!
Data handling is fundamental…
[Figure: a DDS application uses generic read and write on disk files, pipes/sockets, or tape, in any DDS-supported format, and so exchanges data with non-DDS applications. Through API emulation, a non-DDS application can run on DDS generic I/O, and a DDS application's generic I/O can reach a foreign format through API emulation over a foreign library. Milestones: DISCO support, 1997-2003; USP re-link, 1998 proof of concept; format and API emulation with random access I/O.]
Are you scared yet?
• You can probably imagine that all this translating between formats can get very complicated...
...
fmt:SAMPLE_TYPE= typedef float4 SAMPLE_TYPE;
fmt:USP_ADJUST= typedef enum4 {USP_LINE_PAD \= 0, USP_TRACE_PAD \= 0, USP_HLH_SIZE \= 2236} USP_ADJUST;
fmt:SEQUENCE= typedef USP_TRACE SEQUENCE;
alias:fmt:USP_TRACE_PAD= fmt:USP_ADJUST
alias:fmt:USP_HLH_SIZE= fmt:USP_ADJUST
alias:fmt:USP_LINE_PAD= fmt:USP_ADJUST
usp_NumRec= 2056
...
But it's still better than having to change or relink your code for every different mutant data format! It also makes it possible to interoperate with historical data formats without too much pain.
DDS scripting as a Rosetta stone
/apps/global/bin/bridge \
    in= /hpc/dat13/zdsr01/Node/EQ/all.segy \
    in_format=segy out_format=usp \
    comment="Component Type" \
    map:segy:usp.RcComp= "TotalStatic" \
    comment="Src and rec locations" \
    map:segy:usp.SrPtXC= "SrcX / 10" \
    map:segy:usp.SrPtYC= "SrcY / 10" \
    map:segy:usp.SrPtEl= "15" \
    map:segy:usp.ShtDep= "SrcDepth / 10" \
    comment="Azimuth, Roll Tilt" \
    map:segy:usp.TVPT01= "100 * Spare.F4[11]" \
    map:segy:usp.TVPT02= "100 * Spare.F4[12]" \
    map:segy:usp.TVPT03= "100 * Spare.F4[13]" \
    comment="Dead or Live" \
    map:segy:usp.StaCor= '( TrcIdCode - 1 ) * 30000' \
    comment="Shot Time" \
    map:segy:usp.TVPT15=Date.DateYear \
    map:segy:usp.TVPT16=Date.DateDay \
    map:segy:usp.TVPT17=Date.DateHour \
    map:segy:usp.TVPT18=Date.DateMin \
    map:segy:usp.TVPT19=Date.DateSec \
    ...
In Conclusion: caveats
• Things aren’t so complicated if you use DDS as if it were SEPlib, but then what’s the point?
• Because so much functionality already exists in USP, there has been little motivation to flesh out DDS.
• The external distribution is a subset of the same stuff we use internally. There has been little effort put into improving the “packaging”.
• While there is some documentation, it is somewhat lacking!
In Conclusion: upsides
• The software infrastructure inside BP today is based almost entirely on DDS and USP. It is BP’s infrastructure both for research and for processing. BP’s advanced imaging team in Houston is “BP’s largest contractor”.
• The DDS I/O library was released publicly in 2003 on “freeusp.org”. The core of the USP system was released a year or so earlier on the same web site, along with some ARCO-heritage processing systems as well.
• By releasing USP and DDS, BP hoped to make it easier to share algorithms with academia and contractors.
• Randy Selzler now wants to create a successor to DDS, but that’s his talk, as the “prophet”, to give...