Ing. Petr Aubrecht 10/17/2000 - labe.felk.cvut.cz

Tutorial of Sumatra Embedding

Ing. Petr Aubrecht

10/17/2000

Abstract

As data warehouses become more popular, the importance of data transformation grows. The amount and variety of data to be handled increases: data are loaded from miscellaneous sources, many kinds of errors in the data have to be processed, etc.

Today’s tools are usually very limited. They are often applicable to only one type of warehouse, and almost every one of them is platform-dependent.

We have developed a language which is platform-independent (it is written in pure C++). The only platform-dependent part is the database connection, because no platform-independent connection exists.

Our language is similar to C or Java (hence the name Sumatra), which are well known to programmers, so it is easy to learn. On the other hand, Sumatra is an “empty” language, i.e. it is possible to define user objects with special behavior.

Contents

1 Introduction

2 Sumatra basics
    2.1 Typical scenario
    2.2 Step 1—Analysis
    2.3 Step 2—Compiling Sumatra Core
    2.4 Step 3—Types Registration
        2.4.1 AutoDocumentation
    2.5 Step 4—Function Implementation
        2.5.1 Registration
        2.5.2 Definition
    2.6 Step 5—First Running Script
    2.7 Step 6—Data Types Definition (object)
        2.7.1 Substep a—Creating a new object
        2.7.2 Substep b—Methods
        2.7.3 Substep c—Attributes
        2.7.4 Substep d—Optional
    2.8 Step 7—Handling errors

3 Experimental usage


Chapter 1

Introduction

The idea of developing an “empty” language is based on viewing an application as a bag of objects directed by the user through a GUI (see Fig. 1.1).

A standard application of this type lacks the ability to be customized. The application itself provides option dialogues, rich and deep menus, many toolbars, and other tools that allow users to work with the application in the ways they want.

This can be solved by implementing a script (programming) language. All bigger applications are able to process scripts, using VBA [1], Perl, or Python. The modified structure of such an advanced application is shown in Fig. 1.2.

The problem with implementing these languages lies in their size and complexity (especially VBA). Moreover, the implementation takes a lot of time and human resources. Our solution is simple enough to be implemented in any existing application quickly.

As our test implementations show, an experienced implementor can make Sumatra fully operable, i.e. able to run the first meaningful script, in 1 hour. This is possible because of the simplicity and independence of the Sumatra core.

As already mentioned, the Sumatra language has been developed as an empty language. That means Sumatra contains only several basic types; others can be defined depending on the concrete usage.

[1] Visual Basic for Applications, Microsoft’s standard for all their applications (Word, Excel, PowerPoint, Access, Visio)


Figure 1.1: Application (diagram: the GUI and the internal objects inside the application)

Figure 1.2: Application with internal scripting language (diagram: the GUI, the script language with its script, and the internal objects inside the application)


Chapter 2

Sumatra basics

The dashed box in Fig. 1.2 (Script language) is in our case the Sumatra interpreter.

In a Sumatra implementation there are three types of objects [1]:

native objects are application objects.

Sumatra objects are special Sumatra objects. They don’t exist in C++. In fact, they are abstractions.

proxy objects connect native objects with Sumatra ones. They provide the information on how handling Sumatra objects affects the native ones.

In the following text we describe how to implement this language in a (practically arbitrary) application.

2.1 Typical scenario

In a typical case the application has a set of objects that are to be accessible from a script language. There can also be some global functions.

For each of these objects a proxy object has to be created, which associates the internal object with its image in Sumatra.
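The relationship between a native object and its proxy can be sketched in plain C++. This is only an illustration under stated assumptions: Counter, CounterProxy, and Call are hypothetical names that are not part of the Sumatra core, and real proxies are built from the Sumatra templates described later in this tutorial.

```cpp
#include <functional>
#include <map>
#include <stdexcept>
#include <string>

// Native object: an ordinary application class (hypothetical).
class Counter {
public:
    void Add(int n) { value_ += n; }
    int Get() const { return value_; }
private:
    int value_ = 0;
};

// Proxy object (hypothetical): maps script-visible method names
// to calls on the native object.
class CounterProxy {
public:
    explicit CounterProxy(Counter &native) : native_(native) {
        methods_["add"] = [this](int arg) { native_.Add(arg); return 0; };
        methods_["get"] = [this](int) { return native_.Get(); };
    }
    // The stored lambdas capture this, so copying is disabled.
    CounterProxy(const CounterProxy &) = delete;
    CounterProxy &operator=(const CounterProxy &) = delete;

    // What an interpreter would call for a script statement object.name(arg).
    int Call(const std::string &name, int arg) {
        auto it = methods_.find(name);
        if (it == methods_.end())
            throw std::runtime_error("unknown method: " + name);
        return it->second(arg);
    }
private:
    Counter &native_;
    std::map<std::string, std::function<int(int)>> methods_;
};
```

The point of the design is that the script side only ever sees names and values, while every effect on the application goes through the proxy.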

The structure of objects is shown in Fig. 2.1.

[1] Object in this context means both object and function.


Figure 2.1: Controlling project objects/functions from Sumatra (diagram: on the Sumatra side, Object and func; on the C++ side, the proxy objects TSumObject (val : TObject) and SumFunc, which lead to the native objects TObject and func)

2.2 Step 1—Analysis

First, the implementor has to select the objects to be made available in Sumatra. For each of these objects, methods [2] and data members [3] should be defined. Sumatra also allows the definition of operators (unary or binary) and default overcasts [4].

At this step the internal objects have to be clearly defined and attached to their images in Sumatra. This defines the way Sumatra script objects control the internal objects of an application.

If the implementor wants to use global functions (a commonly used one is Message(string), which pops up a message), their definitions need to be prepared as well.

2.3 Step 2—Compiling Sumatra Core

The Sumatra core is written in C++ using the standard STL [5]. Therefore it should compile without problems across platforms (both compilers and operating systems). Compilation has been successfully tested under Windows with Borland C++ 4.0 and 5.0 and Visual C++ 6.0.

[2] Method: a function belonging to an object, called using a dot: object.method(), e.g. datetime dt; dt.AddDay(1)

[3] Data member: a variable included in an object, e.g. datetime dt; dt.day=1;

[4] Overcast: assignment between variables of different types (int→double)

[5] STL stands for Standard Template Library, freely available from www.sun.com


The whole core can simply be added to the application’s project. If the IDE of the compiler allows creating groups, it is convenient to put the core files into one separate group.

The list of files may change from version to version. The files are always held in a single directory. This concept of separation is kept in all releases.

Notes for Visual C++:

• For the file SumatraGrammar.cpp, turn off the use of precompiled headers. Visual C++ looks for #include "stdafx.h" at the beginning of the file, but there is none.

• Enable Run-Time Type Information (RTTI). This is needed for execution safety: Sumatra often uses dynamic_cast<> instead of a C-like cast.
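Why RTTI matters for safety can be shown with a small sketch. The class names below (Expr, IntExpr, StringExpr) are hypothetical and only stand in for an interpreter's expression hierarchy; they are not the Sumatra classes.

```cpp
#include <string>

// Hypothetical expression hierarchy; the names are illustrative only.
struct Expr {
    virtual ~Expr() {}   // a virtual member makes the type polymorphic
};
struct IntExpr : Expr {
    int value;
    explicit IntExpr(int v) : value(v) {}
};
struct StringExpr : Expr {
    std::string value;
    explicit StringExpr(const std::string &v) : value(v) {}
};

// Returns the integer expression, or a null pointer when e has another type.
// A C-style cast would hand back a non-null pointer in both cases.
IntExpr *AsInt(Expr *e) {
    return dynamic_cast<IntExpr *>(e);
}
```

With RTTI disabled, dynamic_cast on polymorphic types cannot perform this run-time check, which is why the compiler option has to be switched on.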

2.4 Step 3—Types Registration

In order to allow Sumatra to work with user-defined types and functions, there has to be a way to deliver information about them to the Sumatra core.

The approach used in the Sumatra implementation is to define a global function SumatraRegisterTypes(), which is responsible for all the registration. After SumatraRegisterTypes has been executed, all types and functions the implementor wants to use in Sumatra scripts have to be registered.

Sumatra is accompanied by a set of templates: skeletons of the whole registration and of function and object definitions. First edit the files SumatraTypes.cpp and SumatraTypes.h. These files provide a simple template for the function SumatraRegisterTypes(). The implementor has to add to it the definitions of functions and the Register calls of the proxy objects.

An example of Sumatra types definition follows:

    void SumatraRegisterTypes()
    {
        // register basic types
        SumatraRegisterBasicTypes();

        // here is the place to register dependent types
        SumatraRegisterDate();

        // define Message function
        TSumFceDef *fMessage =
            new TSumFceDef("Message","int",SumFceMessage,
                           "Shows text message");
        fMessage->AddArg("text","string","the text of the message");
        SumatraFunctions.push_back(fMessage);

        // register user defined objects
        // NewObject::Register();
    }

In this example, SumatraRegisterBasicTypes() registers the types which are supported by the Sumatra core: int, double, and string. The next function, SumatraRegisterDate(), registers datetime. The Message function is declared as the only additional function. Note the descriptive text in the function declaration.
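The registration idea can be sketched independently of the core as a list of described definitions that an interpreter searches by name. All names below (ArgDef, FunctionDef, FunctionRegistry, FindFunction, RegisterExampleFunctions) are hypothetical stand-ins chosen for the illustration; the real TSumFceDef also carries a pointer to the proxy function, which is left out here.

```cpp
#include <string>
#include <vector>

// Hypothetical description of one argument.
struct ArgDef {
    std::string name, type, description;
};

// Hypothetical stand-in for a function definition record.
struct FunctionDef {
    std::string name, returnType, description;
    std::vector<ArgDef> args;

    FunctionDef(const std::string &n, const std::string &r,
                const std::string &d)
        : name(n), returnType(r), description(d) {}

    void AddArg(const std::string &n, const std::string &t,
                const std::string &d) {
        args.push_back(ArgDef{n, t, d});
    }
};

// The registry that an interpreter would search by function name.
using FunctionRegistry = std::vector<FunctionDef>;

const FunctionDef *FindFunction(const FunctionRegistry &reg,
                                const std::string &name) {
    for (const FunctionDef &f : reg)
        if (f.name == name) return &f;
    return nullptr;   // not registered
}

// Registration performed once, before any script runs.
FunctionRegistry RegisterExampleFunctions() {
    FunctionRegistry reg;
    FunctionDef message("Message", "int", "Shows text message");
    message.AddArg("text", "string", "the text of the message");
    reg.push_back(message);
    return reg;
}
```

Keeping the descriptions next to the definitions is what later allows documentation to be generated from the same data.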

2.4.1 AutoDocumentation

At the start of a Sumatra implementation, the definitions of objects and functions change a lot, which makes it difficult to keep the documentation up to date. To solve this problem, Sumatra provides the AutoDocument function. This function reads all type information and generates an HTML file containing the full documentation.

In the SumatraRegisterTypes example shown above, each function definition and each parameter can be given a description. All aspects of an object definition (the object itself, data members, methods) can be described as well. This information is afterwards used for the automatic generation of the documentation.

It is a good idea to make this function available from the application’s interface. A typical usage of this function is quite simple:

    SumatraRegisterTypes();
    AutoDocument("name-of-a-HTML-file");
    SumatraUnregisterClasses();
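How a documentation generator of this kind might work is sketched below. It is not the AutoDocument implementation: DocArg, DocFunction, and GenerateHtml are hypothetical names, the output layout is invented for the example, and HTML escaping is deliberately omitted.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical records holding the registered descriptions.
struct DocArg { std::string name, type, description; };
struct DocFunction {
    std::string name, returnType, description;
    std::vector<DocArg> args;
};

// Builds a minimal HTML page from the descriptions.
// No HTML escaping is done; a real generator would need it.
std::string GenerateHtml(const std::vector<DocFunction> &functions) {
    std::ostringstream out;
    out << "<html><body>\n";
    for (const DocFunction &f : functions) {
        // Heading with the signature: returnType name(type arg, ...)
        out << "<h2>" << f.returnType << " " << f.name << "(";
        for (std::size_t i = 0; i < f.args.size(); ++i) {
            if (i > 0) out << ", ";
            out << f.args[i].type << " " << f.args[i].name;
        }
        out << ")</h2>\n<p>" << f.description << "</p>\n<ul>\n";
        // One list item per described argument.
        for (const DocArg &a : f.args)
            out << "<li>" << a.name << ": " << a.description << "</li>\n";
        out << "</ul>\n";
    }
    out << "</body></html>\n";
    return out.str();
}
```

Because the page is produced from the registered data alone, it cannot drift away from what the scripts can actually call.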


2.5 Step 4—Function Implementation

2.5.1 Registration

Defining a function is much simpler than defining an object, therefore the implementation should start with it. It is easier to look for problems in a simple implementation than after fifteen objects have been written.

The easiest function to implement is Message(string). This function usually shows a message (it is system-dependent and therefore cannot be part of the core). Other implementations could save the messages and generate a report.

A function implementation consists of two parts. First, the function has to be declared (usually in SumatraRegisterTypes). The structure of a declaration is quite simple:

    TSumFceDef *fFunc = new TSumFceDef("name","rettype",FcePointer,
                                       "description");
    fFunc->AddArg("argname1","argtype1","argdescription1");
    fFunc->AddArg("argname2","argtype2","argdescription2");
    ...
    SumatraFunctions.push_back(fFunc);

where

name a name of the function (in Sumatra script), in our case "Message"

rettype a return (Sumatra) type of a function

FcePointer a name of proxy function, usually named SumFceXXXX

description a description of the function for AutoDocument

argnameX a name of the argument X

argtypeX a type of this argument

argdescriptionX a description of the argument for AutoDocument

Adding arguments is naturally optional.

An example of a function definition is shown in SumatraRegisterTypes in Section 2.4.


2.5.2 Definition

The definition of a new function is supported by the file NewFunction.cpp, which contains a template for a new function. All that is needed is to replace the text FUNCTION with the name of the new function and to fill in the body.

    void TSumFceFUNCTION(TSumExpr *xbase, TSumArgList *args,
                         TSumExpr *retval)
    {
        // sets ret to return object - don't forget to set it up!
        METHOD_RET(ret,TSumReturnType);
        // Get argument from args
        //GET_ARG(text,0,TSumArgType);
        // a body of function
        ???
        // setting up the return value
        ret->SetVal( ReturnedValue );
    }

Replace the ??? with the body of the function you want to define and set the return value. Because Sumatra doesn’t support a void data type, use int instead and set it to 0.

As an example of definition let us show the Message function:

    void SumFceMessage(TSumArgList *args,TSumExpr *retval)
    {
        METHOD_RET(ret,TSumInteger);
        GET_ARG(text,0,TSumString);
        AfxMessageBox(text->GetVal().c_str());
        ret->SetVal(0);
    }

This example applies to Visual C++. In Borland C++ Builder, ShowMessage should be used instead, and in GNU C++ the body of the function can use printf("%s",...).
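One way to keep the system-dependent part of Message out of the proxy function is to hide it behind a small interface. The sketch below is an assumption made for illustration: MessageSink, StreamSink, ReportSink, and MessageBody are hypothetical names and do not exist in Sumatra.

```cpp
#include <iostream>
#include <string>
#include <vector>

// Hypothetical interface for the system-dependent part of Message.
class MessageSink {
public:
    virtual ~MessageSink() {}
    virtual void Show(const std::string &text) = 0;
};

// Prints each message to a stream (standard output by default).
class StreamSink : public MessageSink {
public:
    explicit StreamSink(std::ostream &out = std::cout) : out_(out) {}
    void Show(const std::string &text) override { out_ << text << '\n'; }
private:
    std::ostream &out_;
};

// Saves the messages so that a report can be generated later.
class ReportSink : public MessageSink {
public:
    void Show(const std::string &text) override { lines_.push_back(text); }
    const std::vector<std::string> &Lines() const { return lines_; }
private:
    std::vector<std::string> lines_;
};

// Body of a Message-like function: forwards the text and returns 0,
// the value used in place of void.
int MessageBody(MessageSink &sink, const std::string &text) {
    sink.Show(text);
    return 0;
}
```

A GUI build would add a third sink that calls the message box of its toolkit, while the proxy function itself stays the same on every platform.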


2.6 Step 5—First Running Script

At this moment, Sumatra is able to run its first script, e.g. to show a message. Let’s consider the following scripts:

Message("Hello world!");

This is the favorite first program in every language.

    int i;
    for(i=0;i<5;i++) Message(IntToStr(i));

A slightly more complicated script, which counts from 0 to 4 and shows the numbers.

The type int is registered as a basic type, and so is the IntToStr function. The Message function has been registered during this tutorial. Everything is therefore prepared to run the script.

The executing part of Sumatra is covered by the SumatraSpace object. Before it is used, the SumatraRegisterTypes function has to be run; at the end it has to be followed by SumatraUnregisterClasses, which removes the registration of objects (and prevents memory leaks).

Let us assume that the C-string variable text contains the text of the script above. The simplest way to run the script is as follows:

    SumatraRegisterTypes(); // Prepare Sumatra types
    SumatraSpace sum;
    sum.simplerun(text);
    SumatraUnregisterClasses();

A more complex way to run scripts will be described later.
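Since registration always has to be paired with unregistration, the pair can be wrapped in a guard object. The following sketch assumes nothing about the core: RegisterTypesStub and UnregisterClassesStub are hypothetical stand-ins for SumatraRegisterTypes() and SumatraUnregisterClasses(), and RegistrationGuard and RunRegistered are names invented for the example.

```cpp
// Stand-ins for the real registration functions; here they only count,
// so that the sketch can be compiled without the Sumatra core.
static int g_registrations = 0;
void RegisterTypesStub() { ++g_registrations; }
void UnregisterClassesStub() { --g_registrations; }

// Registers in the constructor and unregisters in the destructor, so the
// clean-up also runs when the guarded scope is left by an exception.
class RegistrationGuard {
public:
    RegistrationGuard() { RegisterTypesStub(); }
    ~RegistrationGuard() { UnregisterClassesStub(); }
    RegistrationGuard(const RegistrationGuard &) = delete;
    RegistrationGuard &operator=(const RegistrationGuard &) = delete;
};

// Runs a callable between registration and unregistration.
template <typename Run>
void RunRegistered(Run run) {
    RegistrationGuard guard;
    run();   // e.g. create the script space and run the script here
}
```

In a real application the two stubs would simply be replaced by calls to the Sumatra functions.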

2.7 Step 6—Data Types Definition (object)

The most interesting feature of Sumatra is the ability to define objects. These objects appear in Sumatra like C++ objects and allow control of the “real”, project-dependent objects. This makes it possible to control the application’s behavior in a clear way.

The definition of a new object is simplified (like the creation of a function) by template files. These files are named NewObject.cpp and NewObject.h. Before use, they should be copied to a new location and renamed according to the name of the object.

During this part of the tutorial a new object will be developed: IntArray. This object will act as an array of integers.

2.7.1 Substep a—Creating a new object

First copy the template files (NewObject.*) to a new location and renamethem to IntArray.*.

Edit the files and replace the text “NewObject” with “IntArray”.

The template assumes that there is a native object IntArray, which this proxy object uses. In this case that is not true, therefore it has to be defined. Directly before the definition of the class TSumIntArray (which arose from the renaming), put the following text:

typedef vector<int> IntArray;

This defines a type IntArray. In Sumatra terminology, it acts as a native object. “vector” is an STL template class. Because the whole of Sumatra is written using the STL, it is safe to use it.

The choice of the underlying object is very important. The type is required to have the assignment (=) operator defined, because the object will be assigned during the execution of a script. This may be a problem for more complex structures/objects, thus it is recommended to provide only a pointer to them, with New and Delete methods for dynamic creation/deletion.
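The pointer recommendation can be illustrated with a small value type that wraps a non-copyable object. Connection and ConnectionPtr are hypothetical names used only for this sketch; the method names New and Delete follow the convention mentioned above.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical heavy native object that cannot be copied or assigned.
class Connection {
public:
    Connection() {}
    Connection(const Connection &) = delete;
    Connection &operator=(const Connection &) = delete;
    std::string alias;
};

// Value type suitable as the underlying type of a proxy: copying and
// assigning it only copies the pointer, never the Connection itself.
class ConnectionPtr {
public:
    void New() {
        if (ptr_) throw std::logic_error("New called twice");
        ptr_ = new Connection();
    }
    void Delete() {
        delete ptr_;
        ptr_ = nullptr;
    }
    Connection &Get() const {
        if (!ptr_) throw std::logic_error("New has not been called");
        return *ptr_;
    }
    bool IsNull() const { return ptr_ == nullptr; }
private:
    Connection *ptr_ = nullptr;   // not owned: the script must call Delete
};
```

The price of this design is that copies share one object: after Delete is called through one copy, the other copies must no longer be used.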

It is highly recommended to go through these files now to explore the options offered by the templates. It is essential to provide a description of the class in the Register method. All descriptions will appear in the automatically generated documentation. After reviewing the files, don’t forget to add IntArray.cpp to your project.

Now the whole project can be compiled. If the compiler reports any errors, correct them. The most common error is an incorrect header path.

2.7.2 Substep b—Methods

There are two ways to manipulate objects. Let us start with defining methods, as the more direct way.


The first method, needed to fill the array with values, is add. It will take an integer argument.

Each new method has to appear in two places: it has to be registered in the Register method of the proxy object and it has to be defined as a function.

The template contains a sample registration of a method:

    args.clear();
    args.push_back( TSumArgument("ParameterName",
        "ParameterType", "description of ParameterName"));
    ti.AddMethod("Method", "ReturnType", args,
        TSumNewObjectMethod, "description of Method");

Let’s replace the appropriate strings to register the add method:

    args.clear();
    args.push_back( TSumArgument("toadd", "int",
        "the integer number added"));
    ti.AddMethod("add", "int", args, TSumIntArrayadd,
        "adds a new integer into array");

Now we have to define the function providing the add method (again by editing the template):

    void TSumIntArrayadd(TSumExpr *xbase, TSumArgList *args,
                         TSumExpr *retval)
    {
        // set the base object to IntArray (corresponds to zero-th
        // parameter in C++)
        METHOD_BASE(base,TSumIntArray);
        // sets ret to point to return object - don't forget
        // to set it up!
        METHOD_RET(ret,TSumInteger);
        // Get argument from args
        GET_ARG(num,0,TSumInteger);
        // a body of method
        base->GetValRef().push_back(num->GetVal());
        // setting up the return value
        ret->SetVal(0);
    }


The only “interesting” part of the code is the one marked as the body of the method. base->GetValRef() returns a reference to the value of the object itself (in this case of type IntArray); num->GetVal() returns the value of the object num, an integer. The returned value is always zero.

Further methods of IntArray can be created in the same way: clear(), get(int position), set(int position, int value).
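The bodies of these further methods could look roughly as follows when written against the underlying vector alone. The function names (CheckPosition, IntArrayClear, IntArrayGet, IntArraySet) and the choice of std::out_of_range are assumptions made for the sketch; inside Sumatra the same logic would sit in proxy functions built from the template, and errors would more likely be reported through TSumException.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

typedef std::vector<int> IntArray;

// Throws when position is not a valid index of arr.
void CheckPosition(const IntArray &arr, int position) {
    if (position < 0 || static_cast<std::size_t>(position) >= arr.size())
        throw std::out_of_range("IntArray: position out of range");
}

// clear(): removes all elements; returns 0 in place of void.
int IntArrayClear(IntArray &arr) {
    arr.clear();
    return 0;
}

// get(int position): returns the element at position.
int IntArrayGet(const IntArray &arr, int position) {
    CheckPosition(arr, position);
    return arr[position];
}

// set(int position, int value): overwrites an existing element.
int IntArraySet(IntArray &arr, int position, int value) {
    CheckPosition(arr, position);
    arr[position] = value;
    return 0;
}
```

Checking the position before indexing matters here because a script, unlike compiled code, can pass any integer at run time.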

2.7.3 Substep c—Attributes

The implementation of attributes is even simpler. There are two types of attributes: those provided by the native object itself and those provided by the proxy object (calculated).

Let’s imagine that IntArray provides a size attribute (data member) holding the length of the array. The template shows two places concerning attributes: definition and registration (the same as for methods). All the work is done by two macros.

The first macro is dedicated to a definition. Edit the sample in a template:

DEF_ATTR_VAL(TSumIntArray,size,TSumInteger)

The second macro is in Register method:

DECLARE_ATTR_D(TSumIntArray,size,int,"length of an array");

That’s all.

On the other hand, not all attributes can be handled so easily, and this is such a case: IntArray doesn’t provide size as a data member, but as a method. Thus the macro DEF_ATTR_VAL will fail, reporting that IntArray has no size attribute.

The size attribute has to be provided by the proxy class. The previous example changes only slightly: rename DEF_ATTR_VAL to DEF_ATTR_CALC. Now the size is requested from the proxy object. The proxy class simulates the attribute by a pair of methods, set and get. They have to have the prototypes void setsize(int x) and int getsize():

    // size
    int getsize() {return val->size();}
    void setsize(int x) {throw
        TSumException("IntArray::size is read-only");}
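The effect of such a get/set pair can be tried out in isolation. In the sketch below IntArrayProxy, ReadSize, and WriteSize are hypothetical names, the native value is held directly rather than through the Sumatra base class, and std::logic_error stands in for TSumException.

```cpp
#include <stdexcept>
#include <vector>

typedef std::vector<int> IntArray;

// Hypothetical proxy holding the native value directly.
class IntArrayProxy {
public:
    IntArray val;

    // Calculated attribute "size": obtained from a method of the native object.
    int getsize() const { return static_cast<int>(val.size()); }

    // Writing is rejected because the attribute is read-only.
    void setsize(int) {
        throw std::logic_error("IntArray::size is read-only");
    }
};

// What a script statement such as "x = arr.size;" boils down to.
int ReadSize(const IntArrayProxy &arr) { return arr.getsize(); }

// What a script statement such as "arr.size = n;" boils down to.
void WriteSize(IntArrayProxy &arr, int n) { arr.setsize(n); }
```

Throwing from the setter keeps the attribute visible in scripts while turning an attempted write into a reported error instead of a silent no-op.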


2.7.4 Substep d—Optional

Default overcast

Although it is easy to implement methods for converting an object to other types, Sumatra also supports a default overcast. An example from C is the implicit conversion between int and double.

In the IntArray case let’s provide a default overcast to string.

An overcast is implemented by a conversion class. This class is a descendant of the target class (in this case of TSumString).

    class TSumOverIntArray2String : public TSumString
    {
    public:
        virtual ~TSumOverIntArray2String() {delete item;}
        TSumOverIntArray2String() {item=NULL;}
        TSumOverIntArray2String(TSumExpr *i) {item=i;}
        // factory used by the registration; creates this conversion class
        static TSumExpr *classFact(TSumExpr *e)
            {return new TSumOverIntArray2String(e);}
        virtual TSumExpr *Deref() {SetVal(GetVal()); return this;}
        // here calculate the conversion
        virtual string GetVal()
        {
            TSumIntArray *a =
                dynamic_cast<TSumIntArray*>(item->Deref());
            string ret="",delim="";
            for(int i=0;i<a->GetValRef().size();i++)
            {
                // the delimiter goes before every value except the first
                ret+= delim + itoa(a->GetValRef()[i]);
                delim=",";
            }
            return ret;
        }
    protected:
        TSumExpr *item;
    };

Like the other parts of a definition, the overcast has to be registered (in the Register method of the TSumIntArray class). The registration is as follows:


    ti.AddOvertype("string",TSumOverIntArray2String::classFact,
        "overcast to string - values separated by commas");
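The conversion at the heart of this overcast, joining integers with commas, can be checked on its own. JoinInts is a hypothetical helper written for this sketch; it uses std::to_string from standard C++ instead of any Sumatra or compiler-specific conversion routine.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Converts the values to text separated by commas.
std::string JoinInts(const std::vector<int> &values) {
    std::string result;
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (i > 0) result += ",";   // the delimiter goes between elements
        result += std::to_string(values[i]);
    }
    return result;
}
```

Placing the delimiter before every element except the first avoids both a leading and a trailing comma, and an empty array yields an empty string.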

Operators

The last simplification offered by Sumatra is operators. The following operators can be used:

+, -, *, /, ++, --, &&, ||, <, >, <=, >=, ==, !=, %, !

They are defined for the basic types (int, double) and have their usual meaning. Here we will discuss the use of the + operator for the concatenation of two arrays.

The task is again simplified by using macros:

    DEFINE_BINOP(TSumIntArray,+,Plus)

And in the Register method:

    DECLARE_BINOP_D(TSumIntArray,IntArray,+,IntArray,Plus,
        "concatenates two arrays")

If the right side of the operator has a different type (not IntArray + IntArray, but e.g. IntArray + int), the macros to use are DECLARE_BINOP2 and DEFINE_BINOP2, which have one more parameter (the right-side type). The simplicity of operator definition relies on C++ operators: if you use the operator + on IntArray, the operator + is expected to be defined on the underlying object (here IntArray → vector). Note that std::vector itself does not define operator +, so a suitable operator + for IntArray has to be available before the macro is used.
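Standard C++ does not define operator + for std::vector, so concatenation has to be supplied for the underlying type. The free functions below are a minimal sketch of what such operators might look like; whether the Sumatra macros pick them up exactly in this form is an assumption.

```cpp
#include <vector>

typedef std::vector<int> IntArray;

// Concatenation: returns the elements of left followed by those of right.
IntArray operator+(const IntArray &left, const IntArray &right) {
    IntArray result(left);
    result.insert(result.end(), right.begin(), right.end());
    return result;
}

// Variant with a different right-side type: appends a single integer.
IntArray operator+(const IntArray &left, int right) {
    IntArray result(left);
    result.push_back(right);
    return result;
}
```

Both operators return a new array and leave their operands unchanged, which matches the usual meaning of + for the basic types.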

2.8 Step 7—Handling errors

Error reporting in Sumatra is done through the TSumException class and C++ exception handling. An exception of this class is thrown every time Sumatra encounters a problem, either during grammar processing (syntax error) or during execution (e.g. division by zero).

Locating a syntax error is made much easier by the getposition() method of SumatraSpace. Let’s consider the following code:


    SumatraSpace sum;
    try
    {
        SumatraRegisterTypes(); // Prepare Sumatra types
        sum.simplerun(text);
        SumatraUnregisterClasses();
    }
    catch(TSumException &e)
    {
        // release the registration after an error as well
        SumatraUnregisterClasses();
        editor->SetCursor(sum.getposition());
        ShowMessage("Error: "+e.GetText());
    }
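If the editor expects a line and column rather than a single number, the reported position has to be converted. The sketch below assumes that the position is a zero-based character offset into the script text, which this tutorial does not state; LineColumn and OffsetToLineColumn are hypothetical names.

```cpp
#include <cstddef>
#include <string>

struct LineColumn {
    int line;     // 1-based line number
    int column;   // 1-based column number
};

// Converts a character offset into the script text to a line/column pair.
// Offsets past the end of the text are clamped to the end.
LineColumn OffsetToLineColumn(const std::string &text, std::size_t offset) {
    if (offset > text.size()) offset = text.size();
    LineColumn pos = {1, 1};
    for (std::size_t i = 0; i < offset; ++i) {
        if (text[i] == '\n') {
            ++pos.line;        // a newline starts the next line
            pos.column = 1;
        } else {
            ++pos.column;
        }
    }
    return pos;
}
```

For a two-line script the offset of the fifth character on the second line maps to line 2, column 5, which an editor can show directly to the user.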


Chapter 3

Experimental usage

PumpSumatra (the name stands for Sumatra for data pump purposes) has been developed under the GOAL project (A2 project, [A2S99]). It is built in Borland C++ Builder 5 (BCB). It has three objects concerning databases. All of them represent pointers to Borland’s objects, thus they have 1) Ptr at the end of their names and 2) a New method for creating a new (Borland) object and a Delete method for removing it. In a program, New has to be called first and Delete last. The object names are the same as in BCB (plus Ptr at the end).

TDatabasePtr Object representing a connection to a database. A program has to set the Alias property in order to select the data source, DatabaseName for its internal name, and Active to true to activate the connection.

TQueryPtr Represents one SQL query or a table. First, it is necessary to set the DatabaseName property to connect to a database (the same as in TDatabasePtr). Then the program puts an SQL command or query into the SQL property. For commands the ExecSQL method should be used; for queries (SELECT) the Active property should be set to true. Scrolling through a result set is possible using the FindFirst and FindNext methods. The GetFieldByName method is used for accessing fields.

TFieldPtr This object is used for manipulating a single field. It has the properties AsString, AsInteger, . . . for setting/getting values of the appropriate types.


Acknowledgment

The research on Sumatra has been carried out with the support of INCO–COPERNICUS 977091 GOAL (Geographical Information On-Line Analysis) and of the supporting grant MSMT CR OK387.


Bibliography

[A2S99] Miksovsky P., Kouba Z.: Application A2 Specification, Technical Report TR11, INCO–COPERNICUS 977091 GOAL

[Mel97] Melichar B., Holub J., Muzatko P.: Languages and Translations. 1. ed. Prague: CTU, 1997, ISBN 80-01-01692-7
