Java Transformation Object

Informatica's Velocity Methodology

Working with JAVA Transformation Object

Challenge

Occasionally, special processing of data is required that is not easy to accomplish using existing PowerCenter transformation objects. Transformation tasks such as looping through data 1 to x number of times are not native to the existing PowerCenter transformation objects. For these situations, the Java Transformation provides the ability to develop Java code with unlimited possibilities for transformation capabilities. This Best Practice addresses questions that are commonly raised about using the JTX and how to make effective use of it, and supplements the existing PowerCenter documentation on the JTX.

Description

The Java Transformation (JTX), introduced in PowerCenter 8.0, provides a uniform means of entering and maintaining program code written in Java to be executed for every record being processed during a session run. The Java code is maintained, entered, and viewed within the PowerCenter Designer tool.

Below is a summary of some typical questions about the JTX.

Is a JTX a passive or an active transformation?

A JTX can be either passive or active. When defining a JTX you must choose one or the other type. Once you make this choice you will not be able to change it without deleting the JTX, saving the repository, and recreating the object.

Hint: If you are working with a versioned repository, you will have to purge the deleted JTX from the repository before you can recreate it with the same name.

What parts of a typical Java class can be used in a JTX?

The following standard features can be used in a JTX:

- static initialization blocks can be defined on the tab Helper Code.
- import statements can be listed on the tab Import Packages.
- static variables of the Java class as a whole (e.g., counters for instances of this class) as well as non-static member variables (for every single instance) can be defined on the tab Helper Code.
- Auxiliary member functions or static functions may be declared and defined on the tab Helper Code.
- static final variables may be defined on the tab Helper Code. However, they are private by nature; no object of any other Java class will be able to utilize these.

© 2012 Informatica Corporation. All rights reserved.

Auxiliary functions (static and dynamic) can be defined on the tab Helper Code.
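As a rough illustration of the features listed above, the following sketch shows the kind of declarations that might be placed on the Helper Code tab. The wrapper class HelperCodeSketch and all names inside it are hypothetical stand-ins; in a real JTX the Designer generates the surrounding class, and the import lines would go on the Import Packages tab.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the class that the Designer generates around a JTX.
// Only the members below correspond to what would be typed on the Helper Code tab.
class HelperCodeSketch {
    // static final constant (illustrative name)
    static final String DELIMITER = ";";

    // static lookup table, filled once by the static initialization block
    static final Map<String, Integer> MONTHS = new HashMap<String, Integer>();
    static {
        MONTHS.put("JAN", 1);
        MONTHS.put("FEB", 2);
        MONTHS.put("MAR", 3);
    }

    // non-static member variable: one copy per object of this class
    int rowsSeen = 0;

    // auxiliary static function: returns -1 for unknown abbreviations
    static int monthNumber(String abbreviation) {
        Integer number = MONTHS.get(abbreviation.toUpperCase());
        return number == null ? -1 : number.intValue();
    }

    // auxiliary member function: returns the first delimited field and counts the row
    String firstField(String record) {
        rowsSeen++;
        return record.split(DELIMITER)[0];
    }
}
```

The static block runs once when the class is loaded, which makes it a natural place for work that does not depend on port values.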

Important Note:

Before starting a session that uses additional import clauses in the Java code, make sure that the environment variable CLASSPATH contains the necessary .jar files or directories. CLASSPATH must be set before the PowerCenter Integration Service is started.

All non-static member variables declared on the tab Helper Code are automatically available to every partition of a partitioned session without any precautions. In other words, one object of the respective Java class that is generated by PowerCenter will be instantiated for every single instance of the JTX and for every session partition. For example, if you utilize two instances of the same reusable JTX and have set the session to run with three partitions, then six individual objects of that Java class will be instantiated for this session run.
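The arithmetic in this example can be sketched outside PowerCenter as follows. JtxObject and PartitionSketch are hypothetical stand-ins for the generated class and for the Integration Service, and the sketch only models the object count and the per-object member variable; it makes no claim about how PowerCenter actually creates the objects.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the Java class generated for a JTX.
class JtxObject {
    int rowsSeen = 0;  // non-static member variable: private to each object

    void onInputRow() {
        rowsSeen++;
    }
}

// Hypothetical stand-in for the session: one object per JTX instance and partition.
class PartitionSketch {
    static List<JtxObject> instantiate(int jtxInstances, int partitions) {
        List<JtxObject> objects = new ArrayList<JtxObject>();
        for (int instance = 0; instance < jtxInstances; instance++) {
            for (int partition = 0; partition < partitions; partition++) {
                objects.add(new JtxObject());
            }
        }
        return objects;
    }

    public static void main(String[] args) {
        // two JTX instances, three partitions
        List<JtxObject> objects = instantiate(2, 3);
        System.out.println("objects created: " + objects.size());
        objects.get(0).onInputRow();
        // each object keeps its own row count
        System.out.println(objects.get(0).rowsSeen + " vs " + objects.get(1).rowsSeen);
    }
}
```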

What parts of a typical Java class cannot be utilized in a JTX?

The following standard features of Java are not available in a JTX:

- Standard and user-defined constructors
- Standard and user-defined destructors
- Any kind of direct user interface, be it a Swing GUI or a console-based user interface

What else cannot be done in a JTX?

One important note for a JTX is that you cannot retrieve, change, or utilize an existing DB connection in a JTX (such as a source connection, a target connection, or a relational connection to a LKP). If you would like to establish a database connection, use JDBC in the JTX. Make sure in this case that you provide the necessary parameters by other means.

How can I substitute constructors and the like in a JTX?

User-defined constructors are mainly used to pass certain initialization values to a Java class, values that you want to process only once. The only way to get this work done in a JTX is to pass those parameters into the JTX as normal ports and to define a boolean variable with an initial value of true on the Helper Code tab; for example, the variable might be named constructMissing. The very first block of the On Input Row code will then look like this:

    if (constructMissing)
    {
        // do whatever you would do in the constructor
        constructMissing = false;
    }


Interaction with users is mainly done to provide input values to some member functions of a class. This usually is not appropriate in a JTX because all input values should be provided by means of input records.

If there is a need to enable immediate interaction with a user for one, several, or all input records, use an inter-process communication (IPC) mechanism to establish communication between the Java class associated with the JTX and an environment available to a user. For example, if the actual check to be performed can only be determined at runtime, you might want to establish a JavaBeans communication between the JTX and the classes performing the actual checks. Beware, however, that this sort of mechanism causes great overhead and may subsequently decrease performance dramatically. In many cases, such requirements indicate that the analysis process and the mapping design process have not been executed optimally.

How do I choose between an active and a passive JTX?

Use the following guidelines to identify whether you need an active or a passive JTX in your mapping:

- As a general rule of thumb, a passive JTX will usually execute faster than an active JTX.
- If one input record equals one output record of the JTX, you will probably want to use a passive JTX.
- If you have to produce a varying number of output records per input record (for example, for some input values the JTX will generate one output record, for some values it will generate no output records, and for some values it will generate two or even more output records), you will have to utilize an active JTX. There is no other choice.
- If you have to accumulate one or more input records before generating one or more output records, you will have to utilize an active JTX. There is no other choice.
- If you have to do some initialization work before processing the first input record, then this fact in no way determines whether to utilize an active or a passive JTX.
- If you have to do some cleanup work after having processed the last input record, then this fact in no way determines whether to utilize an active or a passive JTX.

If you have to generate one or more output records after the last input record has been processed, then you have to use an active JTX. There is no other choice except changing the mapping accordingly to produce these additional records by other means.
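The case of a varying number of output records per input record can be sketched as follows. This is a standalone simulation, not JTX code: the class ActiveJtxSketch, the list outputRows, and the helper emitRow are hypothetical stand-ins, and in an actual active JTX the API call generateRow() would take the place of emitRow after the output ports have been assigned.

```java
import java.util.ArrayList;
import java.util.List;

// Standalone simulation of an active transformation that emits zero, one,
// or several output rows per input row.
class ActiveJtxSketch {
    // stand-in for the rows passed on to downstream transformations
    final List<String> outputRows = new ArrayList<String>();

    // stand-in for assigning an output port and generating a row
    void emitRow(String value) {
        outputRows.add(value);
    }

    // logic of the kind that might be placed on the On Input Row tab:
    // one output row per non-empty, comma-separated token
    void onInputRow(String inputValue) {
        if (inputValue != null) {
            for (String token : inputValue.split(",")) {
                String trimmed = token.trim();
                if (trimmed.length() > 0) {
                    emitRow(trimmed);
                }
            }
        }
    }
}
```

An input such as "a, b,,c" yields three output rows here, while an empty string yields none, which is exactly the behavior a passive JTX cannot provide.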

How do I set up a JTX and use it in a mapping?

As with most standard transformations, you can either define a reusable JTX or an instance directly within a mapping. The following example will describe how to define a JTX in a mapping. For this example assume that the JTX has one input port of data type String and three output ports of type String, Integer, and Smallint.


Note: As of version 8.1.1 the PowerCenter Designer is extremely sensitive regarding the port structure of a JTX; make sure you read and understand the Notes section below before designing your first JTX, otherwise you will encounter issues when trying to run a session associated with your mapping.

1. Click the button showing the java icon, then click on the background in the main window of the Mapping Designer. Choose whether to generate a passive or an active JTX (see How do I choose between an active and a passive JTX above). Remember, you cannot change this setting later.
2. Rename the JTX accordingly (e.g., rename it to JTX_SplitString).
3. Go to the Ports tab; define all input-only ports in the Input Group, and define all output-only and input-output ports in the Output Group. Make sure that every output-only and every input-output port is defined correctly.
4. Make sure you define the port structure correctly from the outset, as changing data types of ports after the JTX has been saved to the repository will not always work.
5. Click Apply.
6. On the Properties tab you may want to change certain properties. For example, the setting "Is Partitionable" is mandatory if this session will be partitioned. Follow the hints in the lower part of the screen form that explain the selection lists in detail.
7. Activate the tab Java Code. Enter code pieces where necessary. Be aware that all ports marked as input-output ports on the Ports tab are automatically processed as pass-through ports by the Integration Service. You do not have to (and should not) enter any code referring to pass-through ports. See the Notes section below for more details.
8. Click the Compile link near the lower right corner of the screen form to compile the Java code you have entered. Check the output window at the lower border of the screen form for all compilation errors and work through each error message encountered; then click Compile again. Repeat this step as often as necessary until you can compile the Java code without any error messages.
9. Click OK.
10. Only connect ports of the same data type to every input-only or input-output port of the JTX. Connect output-only and input-output ports of the JTX only to ports of the same data type in transformations downstream. If any downstream transformation expects a different data type than the type of the respective output port of the JTX, insert an EXP to convert data types. Refer to the Notes below for more detail.
11. Save the mapping.

Notes:

- The primitive Java data types available in a JTX that can be used for ports of the JTX to connect to other transformations are Integer, Double, and Date/Time. Date/time values are delivered to or by a JTX by means of a Java long value which indicates the difference of the respective date/time value to midnight, Jan 1st, 1970 (the so-called Epoch) in milliseconds; to interpret this value, utilize the appropriate methods of the Java class GregorianCalendar. Smallint values cannot be delivered to or by a JTX.
- The Java object data types available in a JTX that can be used for ports are String, byte arrays (for Binary ports), and BigDecimal (for Decimal values of arbitrary precision).
- In a JTX you check whether an input port has a NULL value by calling the function isNull("name_of_input_port"). If an input value is NULL, then you should explicitly set all depending output ports to NULL by calling setNull("name_of_output_port"). Both functions take the name of the respective input / output port as a string.
- You retrieve the value of an input port (provided this port is not NULL, see previous paragraph) simply by referring to the name of this port in your Java source code. For example, if you have two input ports i_1 and i_2 of type Integer and one output port o_1 of type String, then you might set the output value with a statement like this one:
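One plausible form of such a statement, combined with the NULL handling described above, is sketched below as a standalone simulation. The ports i_1, i_2, and o_1 are modeled as plain fields, the isNull and setNull methods here are simplified stand-ins for the JTX functions of the same name, and simulateNullInput is a purely hypothetical helper for trying the sketch out. The choice of adding the two inputs is an assumption made for illustration.

```java
import java.util.HashSet;
import java.util.Set;

// Standalone simulation of NULL checks and port assignment in a JTX.
class NullHandlingSketch {
    // stand-ins for the ports; in a JTX these names refer to the ports directly
    int i_1;
    int i_2;
    String o_1;

    private final Set<String> nullPorts = new HashSet<String>();

    // simplified stand-ins for the JTX functions isNull and setNull
    boolean isNull(String portName) {
        return nullPorts.contains(portName);
    }

    void setNull(String portName) {
        nullPorts.add(portName);
    }

    // hypothetical helper: marks an input port as NULL for this simulation
    void simulateNullInput(String portName) {
        nullPorts.add(portName);
    }

    // logic of the kind that might be placed on the On Input Row tab
    void onInputRow() {
        if (isNull("i_1") || isNull("i_2")) {
            // propagate NULL explicitly to the dependent output port
            setNull("o_1");
        } else {
            // assumed example: output the sum of both inputs as a string
            o_1 = String.valueOf(i_1 + i_2);
        }
    }
}
```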

- In contrast to a Custom Transformation, it is not possible to retrieve the names, data types, and/or values of pass-through ports except if these pass-through ports have been defined on the Ports tab in advance. In other words, it is impossible for a JTX to adapt to its port structure at runtime (which would be necessary, for example, for something like a Sorter JTX).

- If you have to transfer 64-bit values into a JTX, deliver them to the JTX by means of a string representing the 64-bit number and convert this string into a Java “long” variable using the static method Long.parseLong(). Likewise, to deliver a 64-bit integer from a JTX to downstream transformations, convert the “long” variable to a string which will be an output port of the JTX using a statement like this one:
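A minimal sketch of both conversions follows. The class LongPortSketch and its method names are hypothetical; in a JTX the two one-line conversions would be written directly in the On Input Row code, with the String arguments and results being the values of String ports.

```java
// Sketch of passing 64-bit integers through String ports.
class LongPortSketch {
    // String input port value -> Java long
    static long fromPort(String portValue) {
        return Long.parseLong(portValue.trim());
    }

    // Java long -> value for a String output port
    static String toPort(long value) {
        return Long.toString(value);
    }

    public static void main(String[] args) {
        long big = fromPort("123456789012");
        System.out.println(toPort(big + 1));
    }
}
```

Note that Long.parseLong() throws a NumberFormatException for text that is not a valid long, so upstream data quality matters here.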

- As of version 8.1.1, the PowerCenter Designer is very sensitive regarding data types of ports connected to a JTX. Supplying a JTX with data types that are not exactly the expected ones, or connecting output ports to other transformations expecting other data types (for example, a string instead of an integer), may cause the Designer to invalidate the mapping such that the only remedy is to delete the JTX, save the mapping, and re-create the JTX.
- Initialization Properties and Metadata Extensions can neither be defined nor retrieved in a JTX.
- The code entered on the Java Code sub-tab On Input Row is inserted into some other code; only this complete code constitutes the method execute() of the resulting Java class associated with the JTX (see output of the link "View Code" near the lower-right corner of the Java Code screen form). The same holds true for the code entered on the tabs On End Of Data and On Receiving Transaction with regard to their corresponding methods. This fact has a couple of implications which will be explained in more detail below.
- If you connect input and/or output ports to transformations with differing data types, you might get error messages during mapping validation. One such error message occurring quite often indicates that the byte code of the class cannot be retrieved from the repository. In this case, rectify port connections to all input and/or output ports of the JTX, edit the Java code (inserting one blank comment line usually suffices), and recompile the Java code.
- The JTX (Java Transformation) doesn't currently allow pass-through ports. Thus they have to be simulated by splitting them up into one input port and one output port; then the values of all input ports have to be assigned to the respective output port. The key here is that the input port of every pair of ports has to be in the Input Group while the respective output port has to be in the Output Group. If you do not do this, there is no warning in the Designer, but the JTX will not function correctly.

Where and how to insert what pieces of Java code into a JTX?

A JTX always contains a code skeleton that is generated by the Designer. Every piece of code written by a mapping designer is inserted into this skeleton at designated places. Because these code pieces do not constitute the sole content of the respective functions, there are certain rules and recommendations as to how to write such code.

As mentioned previously, a mapping designer can neither write his or her own constructor nor insert any code into the default constructor or the default destructor generated by the Designer. All initialization and cleanup work can be done in the following ways:

- as part of the static initialization block,
- by inserting code that in a standalone class would be part of the destructor into the tab On End Of Data,
- by inserting code that in a standalone class would be part of the constructor into the tab On Input Row.

The last case (constructor code being part of the On Input Row code) requires a little trick: constructor code is supposed to be executed once only, namely before the first method is called. In order to mimic this behavior, follow these steps:

1. On the tab Helper Code, define a boolean variable (e.g., constructorMissing) and initialize it to true.

2. At the beginning of the On Input Row code, insert code that looks like the following:

    if (constructorMissing)
    {
        // do whatever the constructor should have done
        constructorMissing = false;
    }

This will ensure that this piece of code is executed only once, namely directly before the very first input row is processed.
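To see the flag at work together with cleanup code, the following standalone simulation drives a hypothetical stand-in class through several rows. InitOnceSketch, initCount, buffer, and summary are illustrative names only; the methods onInputRow and onEndOfData merely mimic the tabs On Input Row and On End Of Data.

```java
// Standalone simulation of one-time initialization and end-of-data cleanup.
class InitOnceSketch {
    // declarations of the kind placed on the Helper Code tab
    private boolean constructorMissing = true;
    private StringBuilder buffer;
    int initCount = 0;      // counts how often the initialization ran
    String summary = "";    // filled by the cleanup code

    // mimics the On Input Row tab
    void onInputRow(String value) {
        if (constructorMissing) {
            // work that a constructor would have done
            buffer = new StringBuilder();
            initCount++;
            constructorMissing = false;
        }
        buffer.append(value).append(';');
    }

    // mimics the On End Of Data tab: work that a destructor would have done
    void onEndOfData() {
        if (buffer != null) {
            summary = buffer.toString();
            buffer = null;
        }
    }

    public static void main(String[] args) {
        InitOnceSketch jtx = new InitOnceSketch();
        jtx.onInputRow("a");
        jtx.onInputRow("b");
        jtx.onEndOfData();
        System.out.println(jtx.initCount + " initialization, summary " + jtx.summary);
    }
}
```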

The code pieces on the tabs On Input Row, On End Of Data, and On Receiving Transaction are embedded in other code. There is code that runs before the code entered here will execute, and there is more code to follow; for example, exceptions raised within code written by a developer will be caught here. As a mapping developer you cannot change this order, so you need to be aware of the following important implication.

Suppose you are writing a Java class that performs some checks on an input record and, if the checks fail, issues an error message and then skips processing to the next record. Such a piece of code might look like this:

    if (firstCheckPerformed(inputRecord) &&
        secondCheckPerformed(inputRecord))
    {
        logMessage("ERROR: one of the two checks failed!");
        return;
    }
    // else
    insertIntoTarget(inputRecord);
    countOfSucceededRows++;

This code will not compile in a JTX because it would lead to unreachable code. Why? Because the return at the end of the if statement might enable the respective function (in this case, the method will have the name execute()) to ignore the subsequent code that is part of the framework created by the Designer.

In order to make this code work in a JTX, change it to look like this:

    if (firstCheckPerformed(inputRecord) &&
        secondCheckPerformed(inputRecord))
    {
        logMessage("ERROR: one of the two checks failed!");
    }
    else
    {
        insertIntoTarget(inputRecord);
        countOfSucceededRows++;
    }

The same principle (never use return in these code pieces) applies to all three tabs On Input Row, On End Of Data, and On Receiving Transaction.

Another important point is that the code entered on the On Input Row tab is embedded in a try-catch block. So never include any try-catch code on this tab.

How fast does a JTX perform?

A JTX communicates with PowerCenter by means of JNI (Java Native Interface). This mechanism has been defined by Sun Microsystems in order to allow Java code to interact with dynamically linkable libraries. Though JNI has been designed to perform fast, it still creates some overhead to a session due to:

- the additional process switches between the PowerCenter Integration Service and the Java Virtual Machine (JVM) that executes as another operating system process
- Java not being compiled to machine code but to portable byte code (although this has been largely remedied in the past years due to the introduction of Just-In-Time compilers) which is interpreted by the JVM
- the inherent complexity of the genuine object model in Java (except for most sorts of number types and characters, everything in Java is an object that occupies space and execution time).

So it is obvious that a JTX cannot perform as fast as, for example, a carefully written Custom Transformation.

The rule of thumb is for a simple JTX to require approximately 50% more total running time than an EXP of comparable functionality. It can also be assumed that Java code utilizing several of the fairly complex standard classes will need even more total runtime when compared to an EXP performing the same tasks.

When should I use a JTX and when not?

As with any other standard transformation, a JTX has its advantages as well as disadvantages. The most significant disadvantages are:

- The Designer is very sensitive with regard to the data types of ports that are connected to the ports of a JTX. However, most of the troubles arising from this sensitivity can be remedied rather easily by simply recompiling the Java code.
- Working with long values representing dates and times within, for example, the GregorianCalendar can be extremely difficult to do and demanding in terms of runtime resources (memory, execution time). Date/time ports in PowerCenter are by far easier to use. So it is advisable to split up date/time ports into their individual components, such as year, month, and day, and to process these singular attributes within a JTX if needed.
- In general a JTX can reduce performance simply by the nature of the architecture. Only use a JTX when necessary.
- A JTX always has one input group and one output group. For example, it is impossible to write a Joiner as a JTX.

Significant advantages to using a JTX are:

- Java knowledge and experience are generally easier to find than comparable skills in other languages.
- Prototyping with a JTX can be very fast. For example, setting up a simple JTX that calculates the calendar week and calendar year for a given date takes approximately 10-20 minutes. Writing Custom Transformations (even for easy tasks) can take several hours.
- Not every data integration environment has access to a C compiler used to compile Custom Transformations in C. Because PowerCenter is installed with its own JDK, this problem will not arise with a JTX.
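The calendar week prototype mentioned above might look roughly like the following sketch. It assumes that the date arrives as a long holding milliseconds since the Epoch, that the value should be interpreted in UTC, and that ISO 8601 week rules apply (weeks start on Monday, the first week has at least four days); none of these choices come from PowerCenter itself. The class and method names are hypothetical, and getWeekYear() requires Java 7 or later.

```java
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.TimeZone;

// Sketch: derive calendar week and the year that week belongs to
// from a date/time value given as milliseconds since the Epoch.
class CalendarWeekSketch {
    // returns { week number, week year }
    static int[] weekAndYear(long epochMillis) {
        GregorianCalendar calendar = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        // assumed ISO 8601 week rules
        calendar.setFirstDayOfWeek(Calendar.MONDAY);
        calendar.setMinimalDaysInFirstWeek(4);
        calendar.setTimeInMillis(epochMillis);
        return new int[] {
            calendar.get(Calendar.WEEK_OF_YEAR),
            calendar.getWeekYear()
        };
    }

    public static void main(String[] args) {
        int[] result = weekAndYear(0L);
        System.out.println("week " + result[0] + " of " + result[1]);
    }
}
```

Using the week year rather than the plain calendar year avoids mismatches around New Year, where the first days of January can still belong to the last week of the previous year.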

In Summary

- If you need a transformation that adapts its processing behavior to its ports, a JTX is not the way to go. In such a case, write a Custom Transformation in C, C++, or Java to perform the necessary tasks. The CT API is considerably more complex than the JTX API, but it is also far more flexible.
- Use a JTX for development whenever a task cannot be easily completed using other standard options in PowerCenter (as long as performance requirements do not dictate otherwise).
- If performance measurements are slightly below expectations, try optimizing the Java code and the remainder of the mapping in order to increase processing speed.
