33
Alberto García Universidad del País Vasco, Bilbao, SPAIN Jon Wakelin University of Cambridge XML in FORTRAN

Alberto García Universidad del País Vasco, Bilbao, SPAIN Jon Wakelin University of Cambridge XML in FORTRAN

Embed Size (px)

Citation preview

Page 1: Alberto García Universidad del País Vasco, Bilbao, SPAIN Jon Wakelin University of Cambridge XML in FORTRAN

Alberto GarcíaUniversidad del País Vasco, Bilbao, SPAIN

Jon WakelinUniversity of Cambridge

XML in FORTRAN

Page 2: Alberto García Universidad del País Vasco, Bilbao, SPAIN Jon Wakelin University of Cambridge XML in FORTRAN

•Aren’t there nicer alternatives in Python, Java, C++ ... ?

•If your code is in Fortran you might not want to take the extra step.

•Who said that Fortran is not nice?

•Can’t you just wrap some C parser routines in Fortran?

•No if you care about portability.

XML in Fortran?

Page 3: Alberto García Universidad del País Vasco, Bilbao, SPAIN Jon Wakelin University of Cambridge XML in FORTRAN

•Native parser in Fortran90

•SAX interface

•Higher-level XPath-like (stream) interface

•Portable and reasonably fast

•Small memory footprint

•DOM interface

Page 4: Alberto García Universidad del País Vasco, Bilbao, SPAIN Jon Wakelin University of Cambridge XML in FORTRAN

SAX paradigm<item id="003"> <description>Washing machine</description> <price currency="euro">1500.00</price></item>

Begin element: itemBegin element: descriptionPCDATA chunk: “Washing machine”End element: descriptionBegin element: pricePCDATA chunk: “1500.00”End element: priceEnd element: item

Events:

Page 5: Alberto García Universidad del País Vasco, Bilbao, SPAIN Jon Wakelin University of Cambridge XML in FORTRAN

SAX API in Fortranprogram simpleuse flib_sax

type(xml_t) :: fxml ! XML file object (opaque)integer :: iostat ! Return code (0 if OK)

call open_xmlfile("inventory.xml",fxml,iostat)if (iostat /= 0) stop "cannot open xml file"

call xml_parse(fxml, begin_element_handler=begin_element_print)

end program simple

Page 6: Alberto García Universidad del País Vasco, Bilbao, SPAIN Jon Wakelin University of Cambridge XML in FORTRAN

Event handlers

subroutine begin_element_print(name,attributes) character(len=*), intent(in) :: name type(dictionary_t), intent(in) :: attributes character(len=3) :: id integer :: status print *, "Start of element: ", name if (has_key(attributes,"id")) then call get_value(attributes,"id",id,status)

print *, " Id attribute: ", id endifend subroutine begin_element_print

Page 7: Alberto García Universidad del País Vasco, Bilbao, SPAIN Jon Wakelin University of Cambridge XML in FORTRAN

Attribute dictionary (Python-like)

<person name=”john” age=”35” />

attributes: { name : john ; key:value age : 35 }

has_key(attributes,”name”) (function)get_value(attributes,”name”,value,status)

number_of_entries(attributes) (function)print_dict(attributes)...

Page 8: Alberto García Universidad del País Vasco, Bilbao, SPAIN Jon Wakelin University of Cambridge XML in FORTRAN

<inventory><item id="003">

<description>Washing machine</description> <price currency="euro">1500.00</price> </item>

<item id="007">

<description>Microwave oven</description> <price currency="euro">300.00</price> </item> <item id="011">

<description>Dishwasher</description> <price currency="swedish crown">10000.00</price> </item>

</inventory>

Page 9: Alberto García Universidad del País Vasco, Bilbao, SPAIN Jon Wakelin University of Cambridge XML in FORTRAN

Parsing run

Start of element: inventory Start of element: item Id attribute: 003 Start of element: description Start of element: price Start of element: item Id attribute: 007 Start of element: description Start of element: price Start of element: item Id attribute: 011 Start of element: description Start of element: price

Here we handle only the “start element” event. But we can watch out for other events...

Page 10: Alberto García Universidad del País Vasco, Bilbao, SPAIN Jon Wakelin University of Cambridge XML in FORTRAN

module m_handlersuse flib_saxprivatepublic :: begin_element, end_element, pcdata_chunk!logical, private :: in_item, in_description, in_pricecharacter(len=40), private :: what, price, currency, id!contains !-----------------------------------------!subroutine begin_element(name,attributes)..subroutine end_element(name)..subroutine pcdata_chunk(chunk)..end module m_handlers

Page 11: Alberto García Universidad del País Vasco, Bilbao, SPAIN Jon Wakelin University of Cambridge XML in FORTRAN

subroutine begin_element(name,attributes) character(len=*), intent(in) :: name type(dictionary_t), intent(in) :: attributes integer :: status select case(name) case("item") in_item = .true. call get_value(attributes,"id",id,status) case("description") in_description = .true. case("price") in_price = .true. call get_value(attributes,"currency",currency,status)

end select end subroutine begin_element

Page 12: Alberto García Universidad del País Vasco, Bilbao, SPAIN Jon Wakelin University of Cambridge XML in FORTRAN

subroutine pcdata_chunk(chunk) character(len=*), intent(in) :: chunk

if (in_description) what = chunk if (in_price) price = chunk

end subroutine pcdata_chunk

The chunk of characters is assignedto a variable depending on context

We keep track of the contextusing logical variables

Page 13: Alberto García Universidad del País Vasco, Bilbao, SPAIN Jon Wakelin University of Cambridge XML in FORTRAN

subroutine end_element(name) character(len=*), intent(in) :: name select case(name) case("item") in_item = .false. write(unit=*,fmt="(5(a,1x))") trim(id), trim(what), ":", & trim(price), trim(currency) case("description") in_description = .false. case("price") in_price = .false.

end select end subroutine end_element

Page 14: Alberto García Universidad del País Vasco, Bilbao, SPAIN Jon Wakelin University of Cambridge XML in FORTRAN

003 Washing machine : 1500.00 euro 007 Microwave oven : 300.00 euro 011 Dishwasher : 10000.00 swedish crown

•We handle parsing events.

•Context is monitored using logical module variables.

•Data is assigned to user variables depending on the context.

•Actions taken depending on context.

•Everything is done in one pass!

•Small memory footprint.

Page 15: Alberto García Universidad del País Vasco, Bilbao, SPAIN Jon Wakelin University of Cambridge XML in FORTRAN

XML for scientific data<data> 8.90679398599 8.90729421510 8.90780189594 8.90831710494 8.90883991832 8.90937041202 8.90990866166 8.91045474255 8.91100872963 8.91157069732 8.91214071958 8.91271886986 ... 8.92100651514 8.92170571605 8.92241403816 8.92313153711 8.92385826683 8.92459427943 8.92533962491 8.92609435120 8.92685850416 8.92763212726 8.92841526149 8.92920794545</data>

real, dimension(10000) :: x ! array to hold datandata = 0...subroutine pcdata_chunk_handler(chunk) character(len=*), intent(in) :: chunk

if (in_data) call build_data_array(chunk,x,ndata) ...end subroutine pcdata_chunk_handler

Page 16: Alberto García Universidad del País Vasco, Bilbao, SPAIN Jon Wakelin University of Cambridge XML in FORTRAN

•build_data_array fills the array with numbers of the correct kind, on the fly.

•Updates the counter for number of elements.

•Multidimensional arrays: can use reshape afterwards.

call build_data_array(pcdata_chunk,x,ndata)

Page 17: Alberto García Universidad del País Vasco, Bilbao, SPAIN Jon Wakelin University of Cambridge XML in FORTRAN

Other features

•It checks well-formedness, reporting errors and giving the line and column where they occur.

•It understands the standard entities in PCDATA (&lt; &gt; , etc).

•Handles <![CDATA[ &5<<!!(unparsed)]]> objects and other constructs.

•It does not validate the document.

Page 18: Alberto García Universidad del País Vasco, Bilbao, SPAIN Jon Wakelin University of Cambridge XML in FORTRAN

Stream XPath interface

//a : Any occurrence of element 'a', at any depth. /a/*/b : Any 'b' which is a grand-child of 'a' ./a : A relative path (to the current element) a : (same as above) /a/b/./c : Same as /a/b/c (the dot (.) is a dummy) //* : Any element. //a/*//b : Any 'b' under any children of 'a'.

PATH: /inventory/item/description

<inventory><item id="003"> <description>Washing machine</description> <price currency="euro">1500.00</price></item> </inventory>

Page 19: Alberto García Universidad del País Vasco, Bilbao, SPAIN Jon Wakelin University of Cambridge XML in FORTRAN

program simpleuse flib_xpath

type(xml_t) :: fxmlinteger :: statuscharacter(len=100) :: what

call open_xmlfile("inventory.xml",fxml,status)!do call get_node(fxml,path="//description", & pcdata=what,status=status) if (status < 0) exit print *, "Appliance: ", trim(what)enddoend program simple

get_node: select the next element node with given path

Collect the PCDATA in variable what

Page 20: Alberto García Universidad del País Vasco, Bilbao, SPAIN Jon Wakelin University of Cambridge XML in FORTRAN

get_node(fxml,path,pcdata,attributes,status)

•/item Gets the next ‘item’ element, not all

•/item[3] Need to use get_node in a loop

•/item/price@currency (Need to look in attributes)

•/item/description/text() (Get and process pcdata)

•/item/price@currency=”euro” ....

XPath standard is not fully supported

Page 21: Alberto García Universidad del País Vasco, Bilbao, SPAIN Jon Wakelin University of Cambridge XML in FORTRAN

program eurosuse flib_xpath

type(xml_t) :: fxml

integer :: statuscharacter(len=100) :: price

call open_xmlfile("inventory.xml",fxml,status)!do call get_node(fxml,path="//price", & att_name="currency",att_value="euro", & pcdata=price,status=status) if (status < 0) exit print *, "Price (euro): ", trim(price)enddoend program euros

Selection based on attribute values

Page 22: Alberto García Universidad del País Vasco, Bilbao, SPAIN Jon Wakelin University of Cambridge XML in FORTRAN

Contexts: Encapsulation of parsing tasks

call open_xmlfile("inventory.xml",fxml,status)!do call mark_node(fxml,path="//item",status=status) if (status /= 0) exit ! No more items call get_item_info(fxml,what,price,currency) write(unit=*,fmt="(5a))") "Appliance: ", trim(what), & ". Price: ", trim(price), " ", & trim(currency) call sync_xmlfile(fxml)enddo

Page 23: Alberto García Universidad del País Vasco, Bilbao, SPAIN Jon Wakelin University of Cambridge XML in FORTRAN

subroutine get_item_info(context,what,price,currency)type(xml_t), intent(in) :: contextcharacter(len=*), intent(out) :: what, price, currency

call get_node(context,path="price", & attributes=attributes,pcdata=price,status=status) call get_value(attributes,"currency",currency,status) ! call get_node(context,path="description", pcdata=what,status=status)

end subroutine get_item_info

Routine to extract the price and description information from any element with the ‘item’

structure

Page 24: Alberto García Universidad del País Vasco, Bilbao, SPAIN Jon Wakelin University of Cambridge XML in FORTRAN

call sync_xmlfile(fxml)

•This version of XPATH is built on top of SAX: It inherits the stream paradigm.

•One has to be careful to go back to the appropriate place in the file to pick up the data within a given context.

•Alternative: digest all the information at once (use DOM as foundation)

Page 25: Alberto García Universidad del País Vasco, Bilbao, SPAIN Jon Wakelin University of Cambridge XML in FORTRAN

DOM API in FortranConceived by Jon Wakelin

Inventory

Item ItemItem

Description Washing Machine

Price

Children

Sibling

Text Node

Element

Page 26: Alberto García Universidad del País Vasco, Bilbao, SPAIN Jon Wakelin University of Cambridge XML in FORTRAN

type, public :: fnode type(string) :: nodeName type(string) :: nodeValue

integer :: nc = 0 integer :: nodeType = 0 type(fnode), pointer :: parentNode => null() type(fnode), pointer :: firstChild => null() type(fnode), pointer :: lastChild => null() type(fnode), pointer :: previousSibling => null() type(fnode), pointer :: nextSibling => null() type(fnode), pointer :: ownerDocument => null() type(fnamedNodeMap), pointer :: attributes => null() end type fnode

Data structure representing a node

Use variable strings module by Mart Rentmeester

Page 27: Alberto García Universidad del País Vasco, Bilbao, SPAIN Jon Wakelin University of Cambridge XML in FORTRAN

use flib_dom

type(fnode), pointer :: myDoc, myNode type(fnodeList), pointer :: myList

! Parsing

myDoc => parsefile("inventory.xml")

! Navigation of the tree

myList => getChildNodes(myDoc) print *, "Number of children of doc: ", getLength(myList)

myNode => item(myList, 0) print *, “First child’s name:” , getNodeName(myNode)

myList => getElementsByTagName(myDoc,”price”)

Page 28: Alberto García Universidad del País Vasco, Bilbao, SPAIN Jon Wakelin University of Cambridge XML in FORTRAN

! Modification of the tree

newNode => cloneNode(myNode) np => insertAfter(myDoc,myNode,newNode)

! Dumping of the tree

call dumpTree(myDoc) call xmlize(myDoc,”output.xml”)

•DOM keeps a copy of a representation of the whole document in memory.

•Flexibility versus cost.

Page 29: Alberto García Universidad del País Vasco, Bilbao, SPAIN Jon Wakelin University of Cambridge XML in FORTRAN

Use of Fortran derived types

<table units="nm" npts="100"> <data> 2.3 4.5 5.6 3.4 2.3 1.2 ... ... ... </data></table>

type table_t character(len=20) :: units integer :: npts real, dimension(:), pointer :: dataend type table_t

Write a parsing method for each type !

Page 30: Alberto García Universidad del País Vasco, Bilbao, SPAIN Jon Wakelin University of Cambridge XML in FORTRAN

do i=0, getLength(myList) - 1 np => item(myList,i) if (getNodeName(np) == "table") then call getTable(np,table) endif enddo

subroutine getTable(element,table)type(fnode), pointer :: elementtype(table_t), intent(out) :: table

type(fnode), pointer :: np

table%units = getAttribute(element,"units") table%npts = getAttribute(element,"npts") np => getFirstChild(element) call getData(np,table%data)

end subroutine getTable

Page 31: Alberto García Universidad del País Vasco, Bilbao, SPAIN Jon Wakelin University of Cambridge XML in FORTRAN

call xml_NewElement(xf,"radfunc") call xml_NewElement(xf,"data") call xml_AddAttribute(xf,"units","rydberg") call xml_AddArray(xf,a(1:100)) call xml_EndElement(xf,"data") call xml_EndElement(xf,"radfunc")

<radfunc> <data units=”rydberg”> -3.319003851720E-05 -6.679755632540E-05 -1.008278046670E-04 -1.701778290170E-04 -2.055084411070E-04 -2.412834575880E-04 -3.141891337090E-04 -3.513311850090E-04 -3.889404258000E-04 .... </data> </radfunc>

WXML A library to generate well-formed XML

Page 32: Alberto García Universidad del País Vasco, Bilbao, SPAIN Jon Wakelin University of Cambridge XML in FORTRAN

subroutine xml_AddTable(xf,table)type(fxml_t), intent(inout) :: xftype(table_t), intent(in) :: table

call xml_NewElement(xf,"table") call xml_AddAttribute(xf,"units",table%units) call xml_AddAttribute(xf,"npts",table%npts) call xml_AddArray(xf,table%data)

end subroutine xml_AddTable

Encapsulation

Real-world example: CML writing library

call cmlAddMolecule(....) (Implemented by Jon Wakelin)

Page 33: Alberto García Universidad del País Vasco, Bilbao, SPAIN Jon Wakelin University of Cambridge XML in FORTRAN

•Process XML files directly in Fortran

•Choice of interface (SAX / DOM / XPath)

•Reasonably fast and memory-efficient

•Support for scientific data handling.

•Free and open to contributions.

Summary

http://lcdx00.wm.lc.ehu.es/ag/xml