GFOSS04 interoperability
Free GIS and Interoperability
GIS Open Source, interoperability and the 'culture of data'
in the spatial data warehouses of the Public Administration
GFOSS'04
ITC-irst, 16 Nov 2004
(last revised 10 2005)
M. Neteler
neteler at itc it
http://mpa.itc.it
ITC-irst, Povo (Trento), Italy
The need for Interoperability
The problem
nowadays data often have to be exchanged across very heterogeneous groups
the personal choice of application software/operating system should not affect the data exchange
data exchange standards are available
limited awareness of the need for interoperability
limited implementation of interoperability in processes and software
commonly used file formats lead people to believe in interoperability: false friends
What are Standardization & Interoperability?
Standardization versus Interoperability
Standardization: written/published document describing data formats, models etc.
Example Office Standards: ASCII, HTML, XML, ...
Example GIS Standards: GML, ISO 8211, ISO/IEC 15444-1, WMS etc.
Only published standards are acceptable.
Interoperability: more than the application of a standard, it also comprises the interpretation of the standard (sometimes definitions are incomplete)
Desired: lossless transfer of static or dynamic data between
- different users, systems, applications, and
- different operating systems, platforms.
Interoperability?
The two dimensions of Interoperability
Longitudinal Interoperability: time - long-term storage
Data shall be readable over time (years, decades, ...). This is of particular interest for data of public administration and long-term projects.
Transversal Interoperability: sharing data between users
Data shall be readable across user communities, independent of the software or operating system used (freedom of software choice). Again, this is of particular interest for data of public administration and long-term projects.
Part I: Office Interoperability
Example: MS-Word .DOC format
Are WORD .doc files suitable for data exchange?
the format is undocumented; to some extent it was reverse-engineered
does not support transversal interoperability
the format is regularly changed (Word 1, 2, 95, 97, NT, 2000, XP, ..., also named WinWORD 6, 8, 10, ...)
does not support longitudinal interoperability
Prone to MS-Windows macro viruses
severe security/privacy issues (example next slide)
- DOC files contain sensitive information about the user (unrelated to the contents)
- deleted text may still be legible outside of MS-Word
contents cannot be completely verified
Example: MS-Word .DOC format - security/privacy issues
Descrambling a WORD.doc file
Your unique MS-Windows user ID (or similar):
PID_GUIDAN{714738E3-FF4C-11D3-ZD7C-00E0281D67A7}
This makes your (anonymous) document traceable.
Sometimes deleted text is still visible (think of re-using an existing WORD file)
A famous example:
In February 2003, the British government of Tony Blair published a dossier on Iraq's security and intelligence organizations. This dossier was cited by Colin Powell in his address to the United Nations the same month.
Dr. Glen Rangwala, a lecturer in politics at Cambridge University, quickly discovered that much of the material in the dossier was actually plagiarized from a U.S. researcher on Iraq.
http://www.computerbytesman.com/privacy/blair.htm
# in any UNIX/Linux system, simply run:
tr -d '[:cntrl:]' < wordfile.doc
What you may find:
Descrambling a WORD.doc file: The British Iraq dossier 2003 1/2
http://nytimes.com
[neteler@dandre2 gfoss04]$ tr -d [:cntrl:] < blair.doc
>z|y [...]-xxxx-o#o#{'?^,k6-* RuG (-$IRAQ ITS INFRASTRUCTURE OF CONCEALMENT,DECEPTION AND INTIMIDATIONThis report draws upon a number of sources, including intelligence material, and shows how the Iraqi regime is constructed to have, and to keep, WMD, and is now engaged in a campaign of obstruction of the United Nations Weapons Inspectors.[...][`azbhhhh?h-i/isjcic22JC:\DOCUME~1\phamill\LOCALS~1\Temp\AutoRecovery save of Iraq - security.asdcic22JC:\DOCUME~1\phamill\LOCALS~1\Temp\AutoRecovery save of Iraq - security.asdcic22JC:\DOCUME~1\phamill\LOCALS~1\Temp\AutoRecovery save of Iraq - security.asdJPrattC:\TEMP\Iraq - security.docJPrattA:\Iraq - security.docablackshaw!C:\ABlackshaw\Iraq - security.docablackshaw#C:\ABlackshaw\A;Iraq -security.docablackshawA:\Iraq - security.docMKhanC:\TEMP\Iraq - security.docMKhan(C:\WINNT\Profiles\mkhan\Desktop\Iraq.docPjzXV*uzLl_bzLl_[...]jP@GTimes New Roman5SymbolG&ArialHelveticaA&Arial Narrow?&ArialBlack"qh_r&r&aq#JV,?RVW,!??20di?fCIraq- ITS INFRASTRUCTURE OFCONCEALMENT, DECEPTION AND INTIMIDATIONdefaultMKhanOh+'0?4DPlx??DIraq- ITS INFRASTRUCTURE OF CONCEALMENT, DECEPTION ANDINTIMIDATIONraqdefaultefaefaNormal.dotNMKhan.d4haMicrosoft Word 8.0C@Ik@n)@"Zf@du#JV[...]
http://www.computerbytesman.com/privacy/blair.htm
- "cic22" stands for "Communications Information Centre," a unit of the British Government
- Paul Hamill - Foreign Office official
- John Pratt - Downing Street official
- Alison Blackshaw - The personal assistant of the Prime Minister's press secretary
- Murtaza Khan - Junior press officer for the Prime Minister
WMD = Weapons of mass destruction
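The same inspection also works outside a UNIX shell. The Python sketch below is an approximation only. `strip_control` mimics `tr -d '[:cntrl:]'` in the C/POSIX locale, where control characters are bytes 0x00-0x1F and 0x7F. `printable_runs` is a hypothetical helper similar in spirit to the UNIX `strings` utility: it keeps only longer runs of printable ASCII, which are usually easier to read than the raw `tr` output.

```python
import re

# In the C/POSIX locale, [:cntrl:] covers bytes 0x00-0x1F and 0x7F (DEL).
_CONTROL_BYTES = bytes(range(0x20)) + b"\x7f"


def strip_control(data: bytes) -> bytes:
    """Delete control bytes, approximating `tr -d '[:cntrl:]'` in the C locale."""
    return data.translate(None, _CONTROL_BYTES)


def printable_runs(data: bytes, min_len: int = 4) -> list[str]:
    """Return runs of printable ASCII (0x20-0x7E) at least min_len bytes long."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    return [run.decode("ascii") for run in re.findall(pattern, data)]
```

For example, `printable_runs(open("wordfile.doc", "rb").read())` lists the readable fragments of a file. The file name is hypothetical. Like `tr`, `strip_control` leaves bytes above 0x7F untouched.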
Descrambling a WORD.doc file: The British Iraq dossier 2003 2/2
Example: MS-Excel .XLS format
Are EXCEL .xls files suitable for data exchange?
the format is undocumented; to some extent it was reverse-engineered
does not support transversal interoperability
the format is regularly changed (Excel 95, 97, NT, 2000, ...)
does not support longitudinal interoperability
Prone to MS-Windows viruses
Limitation: max. 65,536 rows in a table (2^16)
Auto-conversion feature risky: some fields/columns are automatically changed to date-time format (see example next slides)
high risk of accidental data damage
Example: MS-Excel .XLS format accidental data damage
The Human Genome Project case 1/3
In 2004 scientists discovered that some gene names were being changed inadvertently to non-gene names. Citation:
A little detective work traced the problem to default date format conversions and floating-point format conversions in the very useful Excel program package. The date conversions affect at least 30 gene names; the floating-point conversions affect at least 2,000 if Riken identifiers are included. These conversions are irreversible; the original gene names cannot be recovered.
A default date conversion feature in Excel (Microsoft Corp., Redmond, WA) was altering gene names that it considered to look like dates. For example, the tumor suppressor DEC1 [Deleted in Esophageal Cancer 1] [3] was being converted to '1-DEC.'
Cited from:
B.R. Zeeberg, J. Riss, D.W. Kane, K.J. Bussey, E. Uchio, W.M. Linehan, J.C. Barrett and J.N. Weinstein, BMC Bioinformatics 2004, 5:80
http://dx.doi.org/10.1186/1471-2105-5-80
[Figures: The Human Genome Project case 2/3 and 3/3, examples of accidental data damage, from http://dx.doi.org/10.1186/1471-2105-5-80]
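One defensive habit follows from this case: read tabular text without automatic type detection, and flag values a spreadsheet might reinterpret before the file is ever opened in one. The Python sketch below uses two hypothetical heuristics:

- a month abbreviation followed by a number;
- digits-E-digits strings that parse as scientific notation.

These are not Excel's actual conversion rules. Python's `csv` module performs no type conversion, so it serves as the lossless reader.

```python
import csv
import io
import re

# Hypothetical heuristics, NOT Excel's real rules: month name + number
# (e.g. DEC1, SEPT2) and digits-E-digits strings that parse as floats.
_MONTHS = "JAN|FEB|MARCH|MAR|APRIL|APR|MAY|JUNE|JUN|JULY|JUL|AUG|SEPT|SEP|OCT|NOV|DEC"
_DATE_LIKE = re.compile(rf"^(?:{_MONTHS})-?\d{{1,2}}$", re.IGNORECASE)
_SCI_LIKE = re.compile(r"^\d+E[+-]?\d+$", re.IGNORECASE)


def at_risk(value: str) -> bool:
    """True if spreadsheet auto-conversion could plausibly alter this value."""
    v = value.strip()
    return bool(_DATE_LIKE.match(v) or _SCI_LIKE.match(v))


def read_as_text(csv_text: str) -> list[list[str]]:
    """csv.reader performs no type conversion: every cell stays a string."""
    return list(csv.reader(io.StringIO(csv_text)))


def risky_cells(rows: list[list[str]]) -> list[tuple[int, int, str]]:
    """List (row, column, value) for every cell flagged by at_risk()."""
    return [(r, c, cell) for r, row in enumerate(rows)
            for c, cell in enumerate(row) if at_risk(cell)]
```

The floating-point case shows why such damage is irreversible. In Python, `float("2310009E13")` yields `2.310009e+19`. Once an identifier of that shape (a hypothetical value here) has been stored as a number, the original string cannot be recovered.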
Suggestions for Office data interoperability
Text files: ASCII, HTML, RTF, XML, LaTeX
Postscript/PDF for read-only documents
Tables: CSV, xBase (dBase), XML
Databases: SQL92 (ASCII)
Bibliography: BibTeX
Use documented ASCII formats instead of undocumented binary formats (disk space is not an issue today)
Files can be compressed later (deflate compression, which is supported by all common compression tools and Web browsers)
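Here is a minimal Python sketch of this advice. It assumes a table held as a list of string rows, writes it as documented plain-text CSV, and then compresses it with gzip, which uses deflate. The function names are illustrative only.

```python
import csv
import gzip
import io


def table_to_csv_gz(rows: list[list[str]]) -> bytes:
    """Serialize rows as CSV text, then deflate-compress it (gzip container)."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return gzip.compress(buf.getvalue().encode("utf-8"))


def csv_gz_to_table(blob: bytes) -> list[list[str]]:
    """Inverse of table_to_csv_gz: decompress, then parse the CSV text."""
    text = gzip.decompress(blob).decode("utf-8")
    return list(csv.reader(io.StringIO(text)))
```

The compressed payload is still plain CSV, so any tool can read it after a standard `gunzip`. The round trip is lossless, including cells that contain commas.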
Automated conversion tools can be used to provide all formats. Formats and converters (examples):
- Text files (ASCII, HTML, RTF, XML), Postscript/PDF: OpenOffice.org [1], wvWare [2]
- Tables (CSV, xBase (dBase), XML): OpenOffice.org, xbase2pg [3]
- Databases (SQL92 ASCII): ODBC, xbase2pg
- Bibliography (BibTeX): Bibutils [4], Bibtex2html [5], (Endnote)
[1] http://OpenOffice.org (itself uses XML as its own standard format)
[2] http://wvware.sourceforge.net/
[3] http://www.klaban.torun.pl/prog/pg2xbase/
[4] http://www.scripps.edu/~cdputnam/software/bibutils/bibutils.html
[5] http://www.lri.fr/~filliatr/bibtex2html/
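Such conversions are often easy to script. This hedged illustration is not one of the tools listed above. The Python sketch converts a CSV table into a simple XML document using only the standard library. The element names `table`, `row` and `field` are a hypothetical schema chosen for the example.

```python
import csv
import io
import xml.etree.ElementTree as ET


def csv_to_xml(csv_text: str) -> str:
    """Convert CSV (first line = column names) to a hypothetical XML layout."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    table = ET.Element("table")
    for record in reader:
        row = ET.SubElement(table, "row")
        for name, value in zip(header, record):
            # Column names go into an attribute, so they need not be valid XML tag names.
            field = ET.SubElement(row, "field", name=name)
            field.text = value
    return ET.tostring(table, encoding="unicode")
```

Both representations are documented text formats, so either can be regenerated from the other without loss of the cell values.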
OASIS: Office data interoperability
Promotion of the Open Document Exchange Format: a new open standard format was proposed and implemented, the OASIS OpenDocument XML format
The OASIS OpenDocument format [1] is a vendor- and implementation-independent file format which guarantees freedom and independence
E.g., OpenOffice.org uses the OASIS OpenDocument format as its default format from version 2.0 onwards, as do KOffice, StarOffice and software from other vendors
The OASIS OpenDocument file format is one of the file formats recommended by the European Commission [2]
[1] http://www.oasis-open.org/committees/tc_home.php?wg_abbrev=office
[2] http://europa.eu.int/idabc/en/document/3439
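An OpenDocument file is a ZIP package of XML parts, among them a `mimetype` entry and `content.xml`, so it can be inspected with standard tools. The Python sketch below is a minimal illustration, not a full ODF reader. It reads the MIME type and the plain text of all paragraphs (`text:p` elements). It ignores styles, tables and special spacing elements.

```python
import xml.etree.ElementTree as ET
import zipfile

# XML namespace of ODF text elements such as <text:p>.
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def read_odf(source) -> tuple[str, list[str]]:
    """Return (MIME type, paragraph texts) of an ODF package (path or file object)."""
    with zipfile.ZipFile(source) as zf:
        mimetype = zf.read("mimetype").decode("ascii")
        root = ET.fromstring(zf.read("content.xml"))
    paragraphs = ["".join(p.itertext()) for p in root.iter(f"{{{TEXT_NS}}}p")]
    return mimetype, paragraphs
```

For a text document, e.g. a hypothetical `report.odt`, the MIME type is `application/vnd.oasis.opendocument.text`.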
Part II: GIS Interoperability
GIS Standards and Organizations
GIS data sets are more than geometry. Metadata:
- geographic reference
- colors, display attributes etc.
- history of data modifications
[Diagram: timeline 1990-2004 of GIS standards organizations: GRASS Interagency Steering Committee, Open GRASS Foundation (OGF), Open GIS Consortium (OGC), later renamed Open Geospatial Consortium (OGC), and ISO/TC 211; related standards: WMS etc., GML; http://www.opengeospatial.org]
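To make the point concrete, here is a hypothetical Python data structure, not any GIS library's actual model, that keeps the geometry together with the metadata listed above. An EPSG code stands in for the geographic reference. A lossless exchange has to carry every field, not just the coordinates.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class GeoDataset:
    """Hypothetical container: geometry plus the metadata a GIS data set carries."""
    geometry: list[tuple[float, float]]                     # vertex coordinates (x, y)
    epsg: int                                               # geographic reference, e.g. 32632 = WGS 84 / UTM zone 32N
    display: dict[str, str] = field(default_factory=dict)  # colors and other display attributes
    history: list[str] = field(default_factory=list)       # log of data modifications

    def modify(self, new_geometry: list[tuple[float, float]], note: str, when: date) -> None:
        """Replace the geometry and record the change, so the history travels with the data."""
        self.geometry = list(new_geometry)
        self.history.append(f"{when.isoformat()}: {note}")
```

A converter that writes out only `geometry` would silently drop the reference system, the display attributes and the modification history.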
De-facto standard for GIS formats: abstraction layer
GIS Interoperability: GDAL and OGR libraries
Data abstraction: GDAL (raster), OGR (vector)
http://www.gdal.org
[Diagram: the GDAL abstraction layer reads about 40 raster formats (e.g. ENVI, GeoTIFF, SAR, GRASS, ECW, HDF4, JPEG2000, MrSID, ArcGRID) and exposes common metadata (number of bands, color table, ..., coordinate system, projection), using EPSG codes and PROJ.4 for the coordinate systems]
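The idea of an abstraction layer fits in a few lines. Format-specific drivers register themselves, and callers receive the same metadata object whatever the input format. This is a toy Python illustration with two invented formats (`.hdr` text headers and `.bin` binary headers). It is not GDAL's API; GDAL's own Python bindings expose the real mechanism through `gdal.Open()`.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class RasterInfo:
    """Format-independent metadata returned by every (hypothetical) driver."""
    bands: int
    epsg: Optional[int]


_DRIVERS: dict[str, Callable[[bytes], RasterInfo]] = {}


def register(extension: str):
    """Decorator that adds a reader function to the driver registry."""
    def decorator(reader: Callable[[bytes], RasterInfo]):
        _DRIVERS[extension.lower()] = reader
        return reader
    return decorator


def open_raster(filename: str, data: bytes) -> RasterInfo:
    """Dispatch to the driver for the file extension; callers never see format details."""
    extension = filename.rsplit(".", 1)[-1].lower()
    if extension not in _DRIVERS:
        raise ValueError(f"no driver registered for .{extension}")
    return _DRIVERS[extension](data)


@register("hdr")
def _read_text_header(data: bytes) -> RasterInfo:
    # Invented text format, e.g. b"bands=3 epsg=4326"
    fields = dict(item.split("=", 1) for item in data.decode("ascii").split())
    epsg = int(fields["epsg"]) if "epsg" in fields else None
    return RasterInfo(bands=int(fields["bands"]), epsg=epsg)


@register("bin")
def _read_binary_header(data: bytes) -> RasterInfo:
    # Invented binary format: 1 byte band count, then 4-byte big-endian EPSG code (0 = none).
    epsg = int.from_bytes(data[1:5], "big")
    return RasterInfo(bands=data[0], epsg=epsg or None)
```

Adding support for another format means registering one more reader. The code that consumes `RasterInfo` stays unchanged.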
Data abstraction: OGR (vector)
[Diagram: the OGR abstraction layer reads about 20 vector formats (e.g. ArcCoverage, MITAB, Oracle, SHAPE, PostGIS, Geodatabase, DGN) and exposes common metadata (coordinate system, projection), using EPSG codes and PROJ.4 for the coordinate systems]
http://www.gdal.org/ogr/
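The EPSG codes and PROJ.4 in the diagram describe coordinate systems in two ways. An EPSG code is a numeric identifier for a coordinate reference system. PROJ.4 describes the same system as a string of `+key=value` parameters. The Python sketch below assumes a tiny hand-written lookup table with only two entries; real software queries the full EPSG database. It shows the mapping and parses a PROJ.4 string into a dictionary.

```python
from typing import Union

# Hand-written excerpt for illustration only; not a complete EPSG registry.
EPSG_TO_PROJ4 = {
    4326: "+proj=longlat +datum=WGS84 +no_defs",                 # WGS 84 geographic
    32632: "+proj=utm +zone=32 +datum=WGS84 +units=m +no_defs",  # WGS 84 / UTM zone 32N
}


def parse_proj4(definition: str) -> dict[str, Union[str, bool]]:
    """Split a PROJ.4 string into parameters; flags without '=' map to True."""
    params: dict[str, Union[str, bool]] = {}
    for token in definition.split():
        key, sep, value = token.lstrip("+").partition("=")
        params[key] = value if sep else True
    return params
```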
GIS Data formats and support question
GDAL Development: Raster formats
Direct funding:
- Atlantis (ENVISAT, MFF, HKV Blobs)
- eCognition Germany (FUJI BAS Format)
- Los Alamos Nat. Labs (FITS)
- OPeNDAP Inc. (OPeNDAP/DODS)
- PeopleSoft (ERDAS LAN)
- Safe Software (USGS SDTS, ISO 8211 support)
- Yukon Department of Environment (USGS DEM)
Public formats/Open documents/Reverse engineered:
- ERDAS Imagine (IMG)
- ERMAPPER (ECW)
- ESRI formats (ArcGrid)
- GDAL Virtual Format
- JasPer (JPEG2000); Kakadu (GeoJP2 interface for JPEG2000 = ISO/IEC 15444-1)
- LizardTech (MrSID, JPEG2000)
- NOAA (AVHRR data)
OGR Development: Vector formats
Direct funding:
- DM Solutions Group and GoMOOS (SQLite RDBMS, Comma Separated Values (CSV))
- OPeNDAP Inc. (OPeNDAP/DODS)
- Safe Software (FMEObjects)
- SRC, LLC (Oracle Spatial)
Public formats/Open documents/Reverse engineered:
- ESRI (SHAPE, ArcCoverage)
- GML
- IHO S-57
- MapInfo (TAB and MIF/MID)
- Microsoft (ODBC OGR)
- Microstation (DGN)
- MySQL (non-spatial data)
- OGDI Vectors (VMAP)
- OGR Virtual Format
- PostgreSQL/PostGIS
- SDTS
- UK Ordnance Survey (NTF)
- U.S. Census (TIGER)
[Diagram: OGR with OGC Simple Features conformance, next to the GRASS topological model]
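OGC Simple Features also defines Well-Known Text (WKT), a documented plain-text encoding for geometries that OGR can read and write. The minimal Python writer below covers only 2D points, line strings and polygons. It formats each coordinate with `repr`, so values round-trip without rounding.

```python
from typing import Iterable, Sequence, Tuple

Point = Tuple[float, float]


def _num(value: float) -> str:
    # repr() gives the shortest string that converts back to the same float.
    return repr(float(value))


def _coords(points: Iterable[Point]) -> str:
    return ", ".join(f"{_num(x)} {_num(y)}" for x, y in points)


def point_wkt(x: float, y: float) -> str:
    return f"POINT ({_num(x)} {_num(y)})"


def linestring_wkt(points: Sequence[Point]) -> str:
    return f"LINESTRING ({_coords(points)})"


def polygon_wkt(rings: Sequence[Sequence[Point]]) -> str:
    """rings[0] is the exterior ring, further rings are holes; all must be closed."""
    for ring in rings:
        if len(ring) < 4 or ring[0] != ring[-1]:
            raise ValueError("each ring needs >= 4 vertices with first == last")
    return "POLYGON (" + ", ".join(f"({_coords(r)})" for r in rings) + ")"
```

For example, `point_wkt(11.12, 46.07)` returns `POINT (11.12 46.07)`.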
GIS formats
Why so many formats? No big problem!
Application-specific requirements, which partially contradict each other:
high compression rate
small runtime storage requirements
coding without information loss
fast decoding
easy access to pixels
simple algorithm
Hardware-/CPU-independence
Good software can handle numerous formats.
Software patents and rights of third parties: future traps ?!
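The trade-off between compression rate and speed can be measured directly. This Python sketch uses zlib's deflate, the compression GRASS adopted in place of LZW. It compresses the same data at several levels and reports size and timings. The numbers depend on the data and the machine, so no expected output is given.

```python
import time
import zlib


def compare_levels(data: bytes, levels: tuple[int, ...] = (1, 6, 9)) -> list[tuple[int, int, float, float]]:
    """For each zlib level return (level, compressed size, compress seconds, decompress seconds)."""
    results = []
    for level in levels:
        t0 = time.perf_counter()
        packed = zlib.compress(data, level)
        t1 = time.perf_counter()
        unpacked = zlib.decompress(packed)
        t2 = time.perf_counter()
        if unpacked != data:  # deflate is lossless, so this should never happen
            raise RuntimeError(f"round trip failed at level {level}")
        results.append((level, len(packed), t1 - t0, t2 - t1))
    return results
```

Higher levels generally spend more time searching for matches to gain compression. This is one of the conflicting requirements that lead different formats to different choices.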
GIS formats and Software Patents
How software patents affect GIS users
LZW (Lempel-Ziv-Welch) compression: used in many raster formats (e.g. GIF)
Integrated into GRASS before it became patented, later replaced by zlib deflate
Unisys started to charge for usage after waiting some years
MrSID (Multi-resolution Seamless Image Database): wavelet-based image file format
three patents covering both the image compression and the on-the-fly image decompression technology
GDAL supports MrSID but requires an MrSID SDK license
ECW (ERMAPPER Compressed Wavelets): patent pending
source code released under the GPL is available (of patented code?)
JPEG 2000: situation not very clear
Public administration must take care to avoid patent and license traps.
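To show what the LZW algorithm itself does, here is a textbook LZW encoder and decoder in Python. It is a minimal sketch: codes are returned as a list of integers and the dictionary grows without bound. Real formats such as GIF or TIFF pack variable-width codes and use clear/reset codes.

```python
def lzw_encode(data: bytes) -> list[int]:
    """Textbook LZW: emit the code of the longest known prefix, then learn prefix + next byte."""
    dictionary = {bytes([i]): i for i in range(256)}
    w = b""
    out = []
    for byte in data:
        wc = w + bytes([byte])
        if wc in dictionary:
            w = wc
        else:
            out.append(dictionary[w])
            dictionary[wc] = len(dictionary)
            w = bytes([byte])
    if w:
        out.append(dictionary[w])
    return out


def lzw_decode(codes: list[int]) -> bytes:
    """Rebuild the same dictionary on the fly while decoding."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    w = dictionary[codes[0]]
    out = [w]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == len(dictionary):
            entry = w + w[:1]  # code not yet known: the "cScSc" special case
        else:
            raise ValueError(f"invalid LZW code {code}")
        out.append(entry)
        dictionary[len(dictionary)] = w + entry[:1]
        w = entry
    return b"".join(out)
```

For example, `lzw_encode(b"aaaaaaa")` returns `[97, 256, 257, 97]`: repeated patterns are replaced by references to dictionary entries learned earlier in the stream.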
Summary
The personal choice of application software/operating system should not affect the data exchange
longitudinal and transversal interoperability must be guaranteed
Only documented formats may be used
There is no excuse: start using interoperable formats today
GIS interoperability is in a better state than Office document interoperability
Interoperability awareness needs to be promoted: today and in the future
License of this document
Document home:
http://mpa.itc.it/gfoss04/neteler_gfoss04_interoperability2005.pdf
This work is licensed under a Creative Commons License.
http://creativecommons.org/licenses/by-sa/2.0/deed.en
Free GIS and Interoperability, 2004-2005 Markus Neteler
[ OpenOffice SXI file available upon request: neteler at itc it, neteler at osgeo org ]
License details: Attribution-ShareAlike 2.0
You are free:
- to copy, distribute, display, and perform the work
- to make derivative works
- to make commercial use of the work
Under the following conditions:
Attribution. You must give the original author credit.
Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under a license identical to this one.
For any reuse or distribution, you must make clear to others the license terms of this work.
Any of these conditions can be waived if you get permission from the copyright holder.
Your fair use and other rights are in no way affected by the above.
Markus Neteler ITC-irst 2004, 2005