Improving the output capabilities of Stata with Open Document Format XML

Adam Jacobs

Dianthus Medical Limited

Stata’s 3-fold capabilities

Statistics

Graphics

Data management

But there is a 4th...

Text output

A recent clinical study:

- 92 pages of raw data listings

- 124 pages of descriptive data tabulations

- 3 pages of statistical analysis

All from a study in 12 healthy volunteers

Stata’s text output

Problems with Stata’s text output

No pagination

No formatting (or only limited formatting with SMCL)

Variable labels not always shown

No Unicode support

No tables of contents

etc.

Some examples...

So how did I do it?

Open Document Format

An open standard, approved by ISO

XML based

For a variety of office-type documents

Used by the popular open-source office suite OpenOffice.org

Here, we are just interested in word-processing documents

.odt files

A .odt file is the native file format of OpenOffice.org Writer

A zip file

Contains various files, the most important of which is content.xml

content.xml is simply a plain-text file

Stata is good at writing plain-text files!
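The point that an .odt file is just a zip archive with content.xml inside can be checked with a few lines of code. The sketch below uses Python purely for illustration (the talk's own code is Stata and is not shown here). The function names and the in-memory stand-in archive are hypothetical, and UTF-8 is assumed as the encoding of content.xml.

```python
import io
import zipfile


def list_members(odt_file):
    """Return the names of the files stored inside the .odt zip archive."""
    with zipfile.ZipFile(odt_file) as archive:
        return archive.namelist()


def read_content_xml(odt_file):
    """Return content.xml from an .odt file (path or binary file object) as text."""
    with zipfile.ZipFile(odt_file) as archive:
        # Assumption: content.xml is UTF-8 encoded, as ODF files normally are.
        return archive.read("content.xml").decode("utf-8")


# Hypothetical stand-in archive, built in memory so the sketch runs
# without a real document; a real .odt contains more files than this.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("content.xml", "<placeholder/>")

print(list_members(buffer))
print(read_content_xml(buffer))
```

Passing the path of a real .odt file instead of `buffer` should list its members in the same way.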

The Stata code

Creates the content.xml file by writing the data with appropriate XML tags

content.xml is added to the other files and zipped into a .odt file

.odt file can be opened directly with Writer
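The write-then-zip workflow can be sketched as follows. This is not the author's Stata code: it is a minimal Python illustration, and it assumes the smallest package that the ODF packaging rules describe (an uncompressed `mimetype` entry stored first, a `META-INF/manifest.xml`, and `content.xml`). The slides do not say which "other files" the real implementation adds, so `build_odt` and the templates below are hypothetical.

```python
import io
import zipfile
from xml.sax.saxutils import escape

MIMETYPE = "application/vnd.oasis.opendocument.text"

# Assumption: a minimal manifest listing only the package root and content.xml.
MANIFEST = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<manifest:manifest'
    ' xmlns:manifest="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"'
    ' manifest:version="1.2">\n'
    ' <manifest:file-entry manifest:full-path="/"'
    ' manifest:media-type="application/vnd.oasis.opendocument.text"/>\n'
    ' <manifest:file-entry manifest:full-path="content.xml"'
    ' manifest:media-type="text/xml"/>\n'
    '</manifest:manifest>\n'
)

CONTENT_TEMPLATE = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<office:document-content'
    ' xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"'
    ' xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"'
    ' office:version="1.2">\n'
    '<office:body><office:text>{body}</office:text></office:body>\n'
    '</office:document-content>\n'
)


def build_odt(paragraphs):
    """Return the bytes of a minimal .odt holding one text:p per paragraph."""
    # Escape &, < and > so that data values cannot break the XML.
    body = "".join(f"<text:p>{escape(p)}</text:p>" for p in paragraphs)
    content = CONTENT_TEMPLATE.format(body=body)
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # The mimetype entry goes first and is stored without compression.
        archive.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        archive.writestr("META-INF/manifest.xml", MANIFEST)
        archive.writestr("content.xml", content)
    return buffer.getvalue()
```

Writing the returned bytes to a file with a .odt extension (opened in binary mode) gives a document to try in Writer. A real implementation would probably also ship a styles.xml, since the table and paragraph styles referenced in content.xml have to be defined somewhere.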

Some examples...

Basics of XML

<company name="Dianthus Medical Limited">
  <employee role="speaker">
    <firstname>Adam</firstname>
    <lastname>Jacobs</lastname>
  </employee>
  <employee role="delegate">
    <firstname>Flavia</firstname>
    <lastname>White</lastname>
  </employee>
</company>
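To show how elements, attributes and nesting fit together, the company example can be parsed with a standard XML library. The sketch below is in Python for illustration only; `list_employees` is a hypothetical helper, and the attribute values are written with straight quotation marks, which XML requires.

```python
import xml.etree.ElementTree as ET

COMPANY_XML = """
<company name="Dianthus Medical Limited">
  <employee role="speaker">
    <firstname>Adam</firstname>
    <lastname>Jacobs</lastname>
  </employee>
  <employee role="delegate">
    <firstname>Flavia</firstname>
    <lastname>White</lastname>
  </employee>
</company>
"""


def list_employees(xml_text):
    """Return (role, firstname, lastname) for each employee element."""
    root = ET.fromstring(xml_text.strip())
    return [
        # role is an attribute; firstname and lastname are child elements.
        (e.get("role"), e.findtext("firstname"), e.findtext("lastname"))
        for e in root.findall("employee")
    ]


for role, first, last in list_employees(COMPANY_XML):
    print(f"{first} {last} ({role})")
```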

XML code for start of table

<table:table table:style-name="Table42">

<table:table-column table:style-name="TabCol13"/>

<table:table-column table:style-name="TabCol9"/>

<table:table-column table:style-name="TabCol8"/>

<table:table-column table:style-name="TabCol8"/>
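A fragment like the one above is easy to generate from a list of column style names. The following Python sketch is illustrative only (the talk's code is Stata). `table_open` and `TABLE_CLOSE` are hypothetical names, and the style names passed in are assumed to be defined elsewhere in the document.

```python
from xml.sax.saxutils import quoteattr


def table_open(table_style, column_styles):
    """Return the opening table tag followed by one column element per style."""
    lines = [f"<table:table table:style-name={quoteattr(table_style)}>"]
    lines += [
        f"<table:table-column table:style-name={quoteattr(style)}/>"
        for style in column_styles
    ]
    return "\n".join(lines)


# The table must be closed after its rows have been written.
TABLE_CLOSE = "</table:table>"

print(table_open("Table42", ["TabCol13", "TabCol9", "TabCol8", "TabCol8"]))
```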

XML code for table cells

<table:table-cell table:style-name="cell1211">
  <text:p text:style-name="Table_20_Contents">Mileage (mpg)</text:p>
</table:table-cell>
<table:table-cell table:style-name="cell1111">
  <text:p text:style-name="Table_20_Contents">N</text:p>
</table:table-cell>
<table:table-cell table:style-name="cell1111">
  <text:p text:style-name="Table_20_ContentsNumeric">52<text:s text:c="3"/></text:p>
</table:table-cell>
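Cells of this shape can be produced by a small helper that wraps a value in the cell and paragraph tags. The sketch below is a Python illustration, not the author's Stata or Mata code. `table_cell` and `table_row` are hypothetical names, and the `table:table-row` wrapper is added because ODF places cells inside rows, although the slide shows only the cells.

```python
from xml.sax.saxutils import escape, quoteattr


def table_cell(text, cell_style, paragraph_style="Table_20_Contents",
               trailing_spaces=0):
    """Return one table cell holding a single paragraph of escaped text."""
    # <text:s text:c="n"/> stands for n consecutive spaces, which would
    # otherwise be collapsed; the slide uses it after the number 52.
    spaces = f'<text:s text:c="{trailing_spaces}"/>' if trailing_spaces > 0 else ""
    return (
        f"<table:table-cell table:style-name={quoteattr(cell_style)}>"
        f"<text:p text:style-name={quoteattr(paragraph_style)}>"
        f"{escape(text)}{spaces}</text:p>"
        "</table:table-cell>"
    )


def table_row(cells):
    """Wrap already-built cell strings in a table row."""
    return "<table:table-row>" + "".join(cells) + "</table:table-row>"


row = table_row([
    table_cell("Mileage (mpg)", "cell1211"),
    table_cell("N", "cell1111"),
    table_cell("52", "cell1111", "Table_20_ContentsNumeric", trailing_spaces=3),
])
print(row)
```

Escaping the text matters for variable and value labels, which can contain characters such as `&` or `<`.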

Was this a lot of work?

123 kB of code

21 ado files

45 Mata functions

And not finished yet!

Any questions?
