Chapter 8: Data Analysis


    Datan.1 Introduction

Simulations can be used to generate a number of different forms of output, as described in the Simulation Design chapter of the Modeling Concepts manual. These forms include several types of numerical data, animation, and detailed statistics provided by the OPNET debugger. In addition, because OPNET simulations support open interfaces to the C and C++ languages, and the host computer's operating system, simulation developers may generate proprietary forms of output ranging from messages printed in the console window, to generation of ASCII or binary files, and even live interactions with other programs. However, the most commonly used forms of output data are those that are directly supported by Simulation Kernel interfaces for collection, and by existing tools for viewing and analysis. Both animation data and numerical statistics fall into this category. Animation data is generated either by using automatic animation probes or by developing custom animations with the KPs of the Simulation Kernel's Anim package; the op_vuanim utility is then used to view the animations. Similarly, statistic data is generated by setting statistic probes, and/or by the KPs of the Kernel's Stat package; OPNET's Analysis Tool can then be used to view and manipulate the statistical data. The capabilities of the Analysis Tool are the subject of this chapter.


    Datan.2 General Editor Organization

The service provided by the Analysis Tool is to display information in the form of graphs. Graphs are presented within rectangular areas called analysis panels. Each analysis panel may have one or more graphs. A graph is the part of the analysis panel that can contain statistics. A number of different operations can be used to create graphs and analysis panels, all of which have as their basic purpose to display a new set of data or to transform an existing one. An analysis panel consists of a plotting area, with two numbered axes generally referred to as the horizontal axis (abscissa) and the vertical axis (ordinate). The plotting area can contain one or more graphs describing relationships between variables mapped to the two axes. For example, the graph in the panel below shows how the size of a queue varies as a function of time.

    Datan.2.1 Analysis Data: Statistics

The graphs displayed in analysis panels represent data sets referred to as statistics. Each statistic consists of a sequence of data points called entries. An entry in turn consists of two real numbers, called the entry's abscissa and ordinate.

Statistic Content

    entry index    entry abscissa    entry ordinate
    0              x0                y0
    1              x1                y1
    2              x2                y2
    3              x3                y3
    4              x4                y4
    5              x5                y5
    6              x6                y6

[Figure: Analysis Panel Components (an analysis panel containing a graph)]
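A minimal sketch of how such a statistic might be represented in C follows; the struct and field names here are hypothetical illustrations, not part of the OPNET API:

    #include <stdio.h>

    /* Hypothetical representation of a statistic: a sequence of
       entries, each pairing an abscissa with an ordinate.        */
    typedef struct {
        double abscissa;    /* e.g., simulation time */
        double ordinate;    /* e.g., queue size      */
    } Entry;

    typedef struct {
        const char *name;   /* statistic name        */
        Entry      *entries;
        int         length; /* number of entries     */
    } Statistic;

    /* Print each entry, mirroring the "Statistic Content" table above. */
    void statistic_print (const Statistic *s)
    {
        int i;
        for (i = 0; i < s->length; i++)
            printf ("%d\t%g\t%g\n", i,
                    s->entries[i].abscissa, s->entries[i].ordinate);
    }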


The relationship between the abscissa and ordinate variables is then described by the correspondence established by each of the entries. For a given entry this relationship can usually be read "When the abscissa variable takes on the value x, the ordinate variable takes on the value y", where x and y are the values stored in the entry. In the analysis panel, this entry may be represented by a point located at the intersection of the lines represented by the equations abscissa = x and ordinate = y, as shown below.

[Figure: Graphing of an Entry Based on its Abscissa and Ordinate (an entry with abscissa = 2.5 and ordinate = 3.5 plotted as a point)]

Since each statistic may consist of multiple entries, panels usually contain many points. The resulting graph describes the relationship between the abscissa and the ordinate not only in terms of the dependency at each point, but also by expressing the sensitivity of one variable to the other; in other words, graphs can give an indication of the effect that changing one variable has on the other. Usually, if one variable is considered to be varied intentionally, or is treated as a system input parameter, it is called an independent variable and is placed on the horizontal axis. The second variable is called a dependent variable and is mapped to the vertical axis.

[Figure: Graphing of Dependent Variable vs. Independent Variable (the independent variable "TTRT" is mapped to the horizontal axis and the dependent variable "Throughput" to the vertical axis; the graphed entries show the relationship between Throughput and TTRT)]



Statistics represented in the Analysis Tool are intrinsically discrete, even though they may represent relationships between variables of a continuous nature. The spacing of entries within a statistic can be arbitrary, and depends solely on the applications that generate the statistic data (usually a simulation). Irregular spacing of entries on the horizontal axis is a common case, since entries are often created as a function of unevenly spaced events in a simulation.

[Figure: Variable Density of Entry Spacing on Horizontal Axis (some entries have a denser distribution along the horizontal axis than others in the statistic)]

It is possible for a statistic to represent a mapping between abscissa and ordinate variables that is one-to-many. Thus, a statistic does not necessarily behave as a mathematical function. For instance, multiple ordinate values may occur for an abscissa value as a result of multiple events occurring at the same simulation time and each event generating an additional entry. One such situation is that of a queue that receives many packets at once to be enqueued; as each packet is enqueued, the queue size statistic increases by one, and if the queue is capable of accepting multiple packets at once, multiple values of this statistic will be coincident in time.

[Figure: Statistic with Multiple Entries Sharing Same Abscissa]



Several special entries are defined in order to represent features of statistics that do not correspond to ordinary numerical data. In some cases, these features are naturally incorporated into certain statistics based on their definitions; in other cases they result from transformations of ordinary statistical data by the operations of the Analysis Tool.

Undefined Value. This type of entry represents a point in the statistic where the system has no knowledge of or cannot compute the ordinate variable, or where it does not make sense to attribute a value to the ordinate variable, based on its definition. For example, the average queuing delay statistic for packets in a queue is considered undefined until at least one packet has entered the queue. Similarly, the signal-to-noise ratio measured at a receiver for transmissions arriving from a particular source is considered undefined while that source is not active. Undefined values also frequently result from mathematical operations that are not well-defined, such as dividing one zero-valued entry by another, or multiplying an entry with an infinite value by one with a zero value.

In a graph using the discrete draw style, undefined values are simply omitted from the graph.

In a graph using the linear draw style, the line is suspended until the next defined value. When a defined value is surrounded by undefined values, it appears as a point. The following example shows how undefined value entries appear as gaps in a line graph.

[Figure: Graph Discontinuities Caused by Undefined Entries (undefined entries cause a break in the statistic's graph)]


The default graph style changes from linear to discrete if a graph has ten or more disconnected lines or points. (This threshold may be lower for smaller data sets.) You can switch the graph to linear mode in this case, but the line will appear highly fractured.

Infinite Value. Like an undefined value entry, this type of entry may arise either from the definition of a statistic, or from numerical manipulations performed in the Analysis Tool. For example, the space available in a queue with unlimited capacity is a statistic that has a permanently infinite value. A common numerical manipulation that generates infinite value entries is dividing a nonzero value entry by one with a zero value. Negative infinite values are differentiated from positive infinite values.

Like an undefined value, an infinite value is omitted from a graph drawn with the discrete style. The linear style distinguishes an infinite value from an undefined value by drawing a line straight up or down (to positive or negative infinity) from the last finite value and then back again to the next finite value. A positive-infinity-to-negative-infinity transition appears as a straight top-to-bottom line. The following panel depicts a statistic for the function f(x) = 1 / x, containing a negative infinite value followed immediately by a positive infinite value.

[Figure: Example of Statistic Containing Infinite Entries]
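One plausible way to encode these special entries in C is with IEEE 754 NaN and infinities; the sketch below is illustrative only, not OPNET's internal representation:

    #include <math.h>
    #include <stdio.h>

    /* An undefined ordinate can be encoded as NaN, an infinite one
       as +/-INFINITY (both available via C99 <math.h>).            */
    int entry_is_undefined (double y) { return isnan (y); }
    int entry_is_infinite  (double y) { return isinf (y); }

    /* Discrete draw style: undefined and infinite entries are
       simply omitted from the plot.                               */
    void draw_discrete (const double *x, const double *y, int n)
    {
        int i;
        for (i = 0; i < n; i++)
            if (!entry_is_undefined (y[i]) && !entry_is_infinite (y[i]))
                printf ("point at (%g, %g)\n", x[i], y[i]);
    }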


End of statistic. Certain statistics can be thought of as approximations of continuous mappings between the abscissa and the ordinate variables. Because the tool is limited to a discrete representation, an interpolation method for estimating values between samples is sometimes needed. A particular case of this issue arises at the end of a statistic, when the last entry is recorded at a time that is strictly less than the ending time of a simulation. While no specific ordinate value can be given for the final abscissa, it is useful to know what this abscissa value is, both for plotting purposes, and possibly for users to make their own interpolations. The statistic generation mechanisms of the Simulation Kernel and the Analysis Tool retain this information by placing a special entry at the final abscissa position of each statistic. This entry is called an end-of-statistic entry.

    Datan.2.2 Data Sources

Analysis panels may be created by a number of different operations in the Analysis Tool. Since all panels must contain at least one statistic, these operations require a source of data on which to base the new panel. There are four possible sources of data, only some of which are applicable, depending on the operation.

    Datan.2.2.1 Output Vector Files

Output vector files are usually generated by simulations to store dynamic statistics (i.e., statistics that vary as a function of simulation time). However, they may also be generated via the External Model Access (Ema) interface, which is OPNET's general API (application program interface) for proprietary file formats. In both cases, output vector files consist of a set of vectors, and a directory that describes the vectors' locations within the file, in order to support fast access. Each vector has essentially the same content as a statistic, including a series of abscissa-ordinate entries. The abscissa variable generally represents simulation time in an output vector generated by a simulation; however, if Ema is used to build the file, the abscissa can represent any user-defined variable.
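The directory-plus-vectors organization can be pictured with a small C sketch; this layout is purely illustrative and is not the actual output vector file format:

    /* Illustrative directory record: one per vector, so that any
       vector's entries can be located without scanning the file. */
    typedef struct {
        char name[64];   /* vector (statistic) name                 */
        long offset;     /* byte offset of the vector's entries     */
        int  length;     /* number of abscissa-ordinate entry pairs */
    } VectorDirEntry;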

Vectors may be loaded into the Analysis Tool to serve as the basis of most of the available operations. The simplest vector-loading operation is called create vector panel, and allows one statistic to be viewed in a panel. Numerous filters can also be applied to the data. For details, refer to section Pt.9.11 View Results (Advanced) in the Editor Reference manual.

    Datan.2.2.2 Output Scalar Files

Statistic data collected in output vector files can be useful for characterizing certain aspects of a system's dynamic behavior or performance; however, each vector output file is collected during a single simulation run and therefore cannot capture the range of results that would occur for different configurations or operating conditions of the system. In addition, if a system's simulation incorporates stochastic behavior (a common case), the resulting output vector file is affected by the choice of random number seed. Therefore, there is a need to combine results from multiple simulations, allowing for a range of model parameters as well as a range of random number seeds. The combination of these results can serve to characterize the typical or expected behavior of the modeled system.

In order to support the collection of data over multiple simulations, OPNET provides a statistic collection mechanism that allows results to accumulate within a single file as new simulations are run. The type of statistic stored in this file is called a scalar statistic, and the file is referred to as an output scalar file. Output scalar files are usually generated and augmented by simulations as they complete; however, they may also be created via the External Model Access (Ema) interface, and via the utility program op_cvos. The file creation mechanism is transparent to the Analysis Tool, which is where scalar statistics are usually extracted and analyzed.

Scalar statistics are stored as individual real numbers in an output scalar file. Typically, each scalar statistic accumulates one additional value per simulation, although it is possible to accumulate multiple values in a single simulation. A scalar statistic can be thought of as a "summary" of some aspect of the system's behavior or performance, as evidenced during one particular simulation run. Scalar statistics can also represent system input or operating conditions, obtained either from model attributes or from measurements made during the simulation. Refer to the Simulation Design chapter of the Modeling Concepts manual for more information on how to generate output scalars.
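For instance, a process model might record one scalar per run at the end of simulation with the Kernel Procedure op_stat_scalar_write(), which is also mentioned later in this chapter. The statistic names and variables in this fragment are illustrative; the fragment assumes it runs inside a process model where the KP is available:

    /* At end of simulation, record summary scalars for this run.
       mean_delay and total_throughput are assumed to have been
       accumulated elsewhere in the process model (hypothetical). */
    op_stat_scalar_write ("Mean ETE Delay",   mean_delay);
    op_stat_scalar_write ("Total Throughput", total_throughput);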

In order to keep track of a set of scalar statistics as it grows from simulation to simulation, an output scalar file is organized into blocks that correspond to simulation runs. Within each block, the model name and random number seed are stored, as well as all of the scalar statistics generated during that particular simulation run. Each scalar statistic consists of a name, which can be an arbitrary string, and a double-precision floating point value. This abstract format is illustrated below:

Example Output Scalar File

    Model Name    Random Seed    Scalar Name    Scalar Value
    net_a         100            alpha          100.0
                                 beta           0.5
                                 gamma          16.0
    net_a         101            alpha          90.0
                                 beta           0.7
                                 gamma          10.0
    net_a         102            alpha          85.0
                                 beta           0.6
                                 gamma          16.3

In order for scalar data to be loaded into the Analysis Tool, the create scalar panel and create parametric scalar panel operations can be used to plot the scalar data of an existing scalar file.

Since scalar statistics do not depend on time, but on other quantities in the system, they cannot be plotted without choosing another variable with which a dependency can be expressed. The Analysis Tool therefore supports plotting of scalar statistics "against" one another. Plotting scalar Y against scalar X shows the possible values of scalar Y for individual values of scalar X. If there are several values of Y for a given value of X (e.g., in different simulations using distinct random number seeds), then several vertically "stacked" data points appear in the graph. The following example of a scalar panel in the Analysis Tool illustrates this stacking effect.

[Figure: Two Output Scalars Plotted Against Each Other (for each value of processing speed, several different values of Peak Queue Size have been recorded in the scalar file)]



Note that the relationship that is shown in a scalar plot is not necessarily due to an inherent dependency between the output scalars. The plot merely shows how the two quantities varied simultaneously over a series of experiments. The causal nature of the relationship between the two variables must be inferred by the user based on additional knowledge about the actual meaning of these variables. OPNET is not able to make such inferences in an automated fashion.

The Analysis Tool supports a second approach to visualizing scalar data which is useful when the relationship between three scalar variables is of interest. The supporting panel is called a parametric scalar panel, and the corresponding operation is create parametric scalar. In a parametric scalar panel, an abscissa variable and an ordinate variable play the same role as in an ordinary scalar panel. However, a third variable, called the "parameter", is used to separate the set of resulting points into distinct subsets. In each subset, the parameter has a fixed value which is distinct from the parameter's value in each of the other subsets. The result is a "family" of curves plotted in the panel, as shown in the following example.

[Figure: Three Output Scalars in Parametric Scalar Form (each curve corresponds to a distinct value of the "Packet Size" scalar parameter; for that value of "Packet Size" it shows the relationship between "Queue Size" and "Processing Speed")]


    Datan.2.2.3 ASCII Representations of Statistic Data

Graphical plotting is the main form of information display used in the Analysis Tool in order to visualize relationships between variables. However, in some cases, visual representation may be ambiguous due to limited screen space and resolution. For example, two values that are distinct but extremely close to each other may be interpreted as being equal, or the points that represent them may occlude each other. In addition, it is sometimes necessary to have knowledge of exact, or near-exact, values shown in the statistics. The Analysis Tool provides two operations to allow users to obtain more detailed knowledge of statistic contents:

    • The Statistic Data option

    • The General Statistic Information option

For instructions on both options, refer to the Project Editor chapter of the Editor Reference manual.

    Datan.2.2.4 Statistic Data Option

The most detailed view of a statistic's data can be obtained by using the Statistic Data option, which is provided by the Statistic Information operation. This option displays the explicit contents of the statistics that the panel includes. The statistics' lengths and axes labels are given, as well as each entry's abscissa and ordinate value. This operation applies to the visible portion of the panel, meaning that if a panel's axes bounds have been modified, or if the zoom operation has been used, less than the statistic's full content may be displayed. The following panel and editing pad illustrate the capability provided by the Statistic Data option.

[Figure: Analysis Panel and Equivalent Textual Representation (the Statistic Data option shows abscissa and ordinate values for each point in the statistic; the reported number of values does not include undefined entries or the end-of-statistic marker, while the length includes both; undesired values can be edited out and Build New Statistic used to create a new graph with only the relevant data)]

    Datan.2.2.5 General Statistic Information Option

In order to allow users to quickly obtain high-level information about the statistic(s) in an existing panel, the Analysis Tool provides the general statistic info option of the Statistic Info... operation. The information is provided on a per-statistic basis and applies to the portion of a statistic that falls within the abscissa range of the panel (though entries immediately preceding and immediately following the range may be included as well). Thus, if a panel's full vertical span has been reduced by editing the vertical or horizontal scales, or by zooming, less than the full content of the statistic will be taken into account. The following table explains the information provided by this option:

General Statistic Information

    length                 Number of entries in the statistic (including the
                           special entry for end-of-statistic).
    number of values       Number of entries in the statistic, not including
                           undefined values or the end-of-statistic marker.
    horizontal min, max    Minimum and maximum abscissa values.
    vertical min, max      Minimum and maximum ordinate values.
    initial value          Ordinate value of the first entry.
    final value            Ordinate value of the last entry.
    expected value         Average value of the ordinate variable treated as a
                           step function (i.e., using a sample-and-hold
                           interpretation of the data) and weighting each
                           entry by the abscissa interval until the next
                           entry; corresponds to the calculation performed by
                           the time-average filter.
    sample mean            Mean value of the entries' ordinates computed by
                           weighting all entries equally; corresponds to the
                           calculation performed by the mean filter.
    variance               Variance of the ordinate values; this is the mean
                           value of the squared deviation from the sample
                           mean.
    standard deviation     Square root of the variance; represents the
                           typical distance between an ordinate value and the
                           mean ordinate value.
    confidence intervals   Intervals estimated to contain the true mean of
                           the entries' ordinate values at five separate
                           levels of confidence; calculations are based on
                           the principles discussed later in this chapter for
                           confidence limits; these results are meaningful
                           only if the entries are independent measurements.
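As a sketch of the computations these fields imply, in C (the sample-and-hold weighting follows the description above; function names are illustrative, and at least two entries are assumed):

    /* Sample mean: all n ordinates weighted equally. */
    double sample_mean (const double *y, int n)
    {
        double sum = 0.0;
        int i;
        for (i = 0; i < n; i++)
            sum += y[i];
        return sum / n;
    }

    /* Expected value (time average): each ordinate weighted by the
       abscissa interval until the next entry (sample-and-hold).   */
    double time_average (const double *x, const double *y, int n)
    {
        double area = 0.0;
        int i;
        for (i = 0; i < n - 1; i++)
            area += y[i] * (x[i + 1] - x[i]);
        return area / (x[n - 1] - x[0]);
    }

    /* Variance: mean squared deviation from the sample mean. */
    double variance (const double *y, int n)
    {
        double m = sample_mean (y, n), dev = 0.0;
        int i;
        for (i = 0; i < n; i++)
            dev += (y[i] - m) * (y[i] - m);
        return dev / n;
    }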


    Datan.2.3 Exporting Vectors and Statistics

    Three interfaces allow output vectors and statistics to be exported to and fromOPNET’s environment:

  • The External Model Access (Ema) interface allows an output vector file to be created or data to be extracted from it. Refer to the External Model Access chapter of External Interfaces for details.

  • The Statistic Information operation displays statistic data as text. You can then use the edit pad operations to export the data to a text file. For details, refer to the Project Editor chapter of Editor Reference.

  • The Export Data to Spreadsheet operation converts the data to a text file that can be opened and converted by a spreadsheet program. For details, refer to the Project Editor chapter of the Editor Reference manual.



    Datan.2.4 Analysis Configurations and Template Statistics

The work done in the Analysis Tool can often involve many detailed operations to load, filter, and combine data and adjust its presentation. The result of this work is a collection of panels, graphs, and the statistics they contain, which is referred to as an analysis configuration. Obviously, it may be of interest to store these results in order to be able to recall them, either for reference to earlier simulations, or in order to make presentations of the data. The Analysis Tool supports saving the entire contents of the editor as an analysis configuration file (".ac" suffix), which can later be loaded back into the editor. Refer to the Analysis Tool chapter of the Editor Reference manual for more information on these operations.

    Template Statistics

Users of the Analysis Tool frequently find that they must execute the same operations in order to view data from different simulation runs. In other words, after each simulation, or set of simulations, the same statistics are loaded into the Analysis Tool, with only the content of those statistics changing. This leads to the notion that the specification for the manipulations and presentation of data can be saved independently from the data itself. The specifications can then be simply "applied" to data resulting from new simulations, in order to automatically obtain processed and displayed information. The Analysis Tool supports this capability with a feature called template statistics.

Each graph in an analysis panel can be given a special status called "template". A template graph contains no data (it is stripped of its data at the time that it becomes a template). However, it does contain all of the configuration information, such as the name of the original vectors or scalars that were used to create it, and the operations that might have been applied to that data. It also contains display information such as draw style and color. In other words, only the graph's entries (abscissa-ordinate pairs) are missing. The Analysis Tool provides several operations that support converting graphs from ordinary form to template form. Refer to the Project Editor chapter of the Editor Reference manual for more information on these operations.

The utility of a template graph is that it can again become an ordinary graph by using its configuration information to display new data that is "applied" to it. The new graph data need only match the graph's requirements, namely that the names of the original scalar or vector statistics be the same. Using this feature, the output statistics from many different simulations can be automatically processed and displayed in an identical manner without having to go through the individual steps required to generate each graph. The Analysis Tool supports applying data to template panels when the data is loaded from output files. In other words, when an output vector file or output scalar file is opened, the Analysis Tool provides the option to match the data against the template graphs' specifications and "fill in" the data if possible.


The Analysis Tool provides the flexibility of changing individual graphs to templates, allowing panels to simultaneously contain template and ordinary graphs. However, in typical cases, all of the graphs in a panel, and even in an analysis configuration, are converted to template form at the same time (the Analysis Tool provides a "global" operation to do this). A panel containing only template graphs is referred to as a template panel. An analysis configuration containing only template panels is a fairly common case, since it can provide a specification for all of the operations that may be done to the output file generated by a particular simulation model. Alternatively, the panels may contain "reference data" in ordinary graphs, as well as template graphs that are to be filled in by applying an output file to the analysis configuration. The following figures show examples of these cases.

[Figure: Successive Applications of OV Data to a Template Panel (four separate output vector (OV) files are applied to the template panel to generate graphs of the same statistic for four separate simulations; the (T) in a statistic label indicates that the statistic is in template form, and the (S) indicates that it is selected)]

[Figure: Application of OS Data to a Mixed Template Panel (the panel originally contains two statistics: "Simulated Queue Size" is a template, allowing its data to be supplied later, while "Measured Queue Size" is an ordinary statistic containing "reference data"; after the OS file is applied, "Simulated Queue Size" becomes an ordinary statistic, allowing the OS data to be compared to the "reference data")]

Note: Output files can be applied to graphs at any time and only modify those graphs that are selected and that match the data. This allows successive applications to be performed on the same analysis configuration in order to progressively fill in additional data.

    Datan.2.5 Data Presentation

The Analysis Tool offers a number of options with regard to the graphical presentation of a panel. These options never affect the data content of the panel, but only the manner in which the data is displayed. Access to the presentation options is via the edit panel properties and edit graph properties operations, which are activated by clicking with the right mouse button while the cursor is in a panel or graph, respectively.



    Graphs

The graphs shown previously in this chapter appear in analysis panels that contain a single graph. An analysis panel can have more than one graph, however, so long as all graphs can share the same horizontal axis. While the vertical axes may differ in a panel, all graphs in a panel must be able to use the same horizontal axis. Because of this, separate graphs in an analysis panel stack vertically, as shown below.

[Figure: Two Graphs In One Analysis Panel]

    You can create a panel with multiple graphs or add graphs to the analysis panellater.

    Panels

Graphs reside in an analysis panel. Clicking in the panel, as opposed to a graph, brings up the edit panel properties dialog box, which allows the appearance of the horizontal axis to be changed or the draw style for all statistics in the graph to be globally set. The Statistics Info... operation provides useful information about the statistics contained in the panel, including the data points themselves. The edit panel properties dialog box also allows additional graphs to be added to the panel.

    Drawing Style

Each graph within a panel can be assigned one of five possible graphical representations, called the graph's draw style. Each graph's draw style is controlled independently of the draw styles of other graphs. The five drawing styles are called discrete, linear, sample-hold, bar, and square-wave.



The discrete draw style provides the most direct view of the actual data content, since a single "dot" is used to represent each entry in the statistic (provided that the ordinate is not undefined). Since no attempt is made to attribute ordinate values to intermediate abscissa values, as is intrinsically done by the other draw styles, the discrete drawing style is most appropriate for graphs that represent a set of independent samples where intermediate values are not well defined. For example, a typical statistic resulting from measuring the end-to-end delay of each received packet at the time when it is received is plotted below. Though it may be of interest in some cases to use the linear draw style to emphasize a trend in the discrete points, estimating the delay value at times between the packet arrivals does not correspond to a measurement that could actually be taken.

[Figure: Statistic Plotted using Discrete Draw Style]

The linear draw style consists of drawing line segments between the points that are defined by a statistic's entries. One of the uses of this style is to represent intermediate points for which the statistic contains no samples, but which can be assumed to exist nonetheless. A common example of this is for panels containing scalar data, where each point represents the result collected by a simulation; the linear draw style can be used to "fill in" or approximate parts of the curve that lie between available data points, as shown in the following example.


Because the resulting graph is without breaks (except at undefined points), the linear draw style is also sometimes used simply to emphasize the trend in a statistic, even if the statistic is discrete in nature. An example of this is shown in the second panel below, which contains the statistic for the size of a queue (i.e., the number of packets it contains) as it varies over time.

[Figure: Scalar Data Plotted with Linear Draw Style]

[Figure: Discrete Variable Plotted using Linear Draw Style]

The sample-hold draw style is based on the notion that, between abscissa values, no new information is known about certain types of statistics, and therefore these statistics should be assumed to maintain their previous ordinate value. This interpretation of a statistic's discrete set of entries makes sense for many statistics collected in OPNET-based simulations. Any statistic that represents a counter of some type, such as a queue size, the number of packets received without errors, or the number of times a queue has overflowed, inherently maintains its value until a new sample is obtained.

[Figure: Counter Variable Plotted using Sample-hold Draw Style]
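Under the sample-hold reading, a statistic's value at an arbitrary time is the ordinate of the last entry at or before that time. A small illustrative C helper (names are hypothetical):

    /* Value of a sample-hold statistic at time t: the ordinate of
       the last entry with abscissa <= t (y[0] if t precedes all
       entries). x[] is assumed sorted in increasing order.        */
    double sample_hold_value (const double *x, const double *y,
                              int n, double t)
    {
        double v = y[0];
        int i;
        for (i = 0; i < n && x[i] <= t; i++)
            v = y[i];
        return v;
    }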


The bar draw style is essentially a simple extension of the sample-hold draw style, where the horizontal segment that is drawn at each entry is instead extended into a filled-in bar that reaches down to the horizontal axis. This is the traditional "bar chart", which is useful for expressing the weight associated with each recorded abscissa value. This style is therefore often used to represent histogram data and probability distributions.

[Figure: Probability Distribution Plotted using Bar Draw Style]

The square-wave draw style is similar to both the sample-hold draw style and the bar draw style. It is, in effect, a bar graph that is not filled in. Vertical lines connect each horizontal segment, but the horizontal segments do not extend to the abscissa.

[Figure: Data Plotted using Square Wave Draw Style]




    Datan.3 Computing Confidence in Simulation Results

As explained in the Simulation Design chapter of Modeling Concepts, system models that include stochastic behavior have results that are dependent on the initial seeding of the random number generator. Because a particular random seed selection can potentially result in anomalous, or non-representative, behavior, it is important for each model configuration to be exercised with several random number seeds, in order to be able to determine standard, or typical, behavior. The basic principle applied here is that if a typical behavior exists, and if many independent trials are performed, it is likely that a significant majority of these trials will fall within a close range of the standard.

One of the important issues a simulation designer must confront, when performing simulations incorporating stochastic processes, is that of deciding how many separately seeded trials to run in order to have sufficient data to make a statement about the system's typical behavior. Of course, the simulation designer can never be absolutely certain that the results obtained are representative, since it is possible to be "unlucky" in the sense that all, or most, of the chosen random seeds could lead to anomalous behavior. However, as more seeds are chosen, the possibility of this happening becomes more remote, particularly since, after all, standard behavior should be observed more frequently than anomalous behavior. This leads to the notion of attempting to achieve a certain level of confidence in the results obtained from a simulation study, by ensuring that a sufficient number of simulations are performed.

    Datan.3.1 Confidence Intervals

The field of statistics provides methods for calculating confidence in an estimate, based on a trial or series of random trials. The techniques that it provides are also frequently used in applied sciences, where field measurements are subject to error, and multiple measurements are taken to attempt to place a bound on the magnitude of that error. OPNET's Analysis Tool provides a basic capability in this area, by automatically calculating and displaying confidence intervals for statistics already contained within panels. This capability is supported by the show confidence interval checkbox in the edit graph properties dialog box.

The confidence intervals calculated by the Analysis Tool are for the mean ordinate value of a set of entries. For the purposes of this operation, entry sets are defined by collocation at the same abscissa. This approach to calculating confidence intervals is designed primarily to support confidence estimation for scalar data collected in multi-seed parametric experiments, where one or more input parameters are varied, and for each input parameter value, multiple random number seeds are used to obtain multiple output parameters. The type of statistic that results from this type of simulation study (prior to confidence interval calculation) is illustrated by the example below. The vertical "columns" of entries correspond to the multiple experiments run by varying the random seed and maintaining a fixed value for an input parameter.

[Figure: Statistic Consisting of Scalar Data from Multiple Simulation Runs (stacking of values along the same vertical line corresponds to multiple simulation runs with different random seeds)]


Suppose that a number of simulations of a system have been run with different random number seeds in order to obtain N samples of the statistic X. Even though X may take on many values, and X's precise distribution is unknown, it is possible to define a value µ, which is the true mean of the random variable X. One way to think of µ is as the mean value of an extremely large set of samples of X, if it were possible to run such a large number of simulations to obtain this sample set. The reason µ is interesting as the true mean of X is that it represents the typical behavior of the modeled system with regard to the statistic X.

Since it is not usually possible to run a very large number of simulations to determine µ (theoretically, an infinite number would be required), it is interesting to determine the degree of precision with which the mean value x̄ of an N-sample set approximates µ. This determines whether the value x̄ can be used with confidence to make statements about the typical behavior of the modeled system.

The fundamental principle used in establishing confidence in a measurement x̄ is called the central limit theorem. In order to understand this theorem, consider the experiment consisting of the collection of N samples of X, and the calculation of the average value x̄ of the N samples; in other words, x̄ is considered to be the result of one trial of the experiment. Then consider that this experiment may be performed many times, and that the resulting statistic X̄ has its own distribution. The central limit theorem states that regardless of X's actual distribution (this is an important generalization, since very little may be known about X), as the number of samples N grows large, the random variable X̄ has a distribution that approaches that of a normal random variable with mean µ, the same mean as the random variable X itself. The theorem further states that if the true variance of X is σ², then the variance of the statistic X̄ is σ²/N.



The utility of this theorem, with respect to establishing confidence in a measurement, lies in the fact that if sufficient samples are taken, a normal distribution, which has known properties, can be worked with, rather than the unknown distribution of X. Assuming that a sufficient sample size N has been chosen, the sample x̄ of the variable X̄ taken from a sequence of simulations can be placed on the hypothetical sampling distribution of X̄, as shown below.

[Figure: Sampling Distribution of X̄ (the density F(X̄) is centered at the true mean µ with standard deviation σ/√N; a randomly obtained sample x̄ of X̄ is marked on the horizontal axis)]

Because the distribution of X̄ is normal, the probability that the random sample x̄ falls within a particular distance of µ can be computed. Usually this distance is measured in terms of the number of standard deviations that separate the random sample from the mean. In this manner a "standardized normal variable"

$$z = \frac{\bar{x} - \mu}{\sigma_{\bar{X}}}$$

is defined, for which the standard deviation is unity and the mean is zero, as shown below.

[Figure: Sampling Distribution of z = (x̄ − µ) / σX̄ (a normal density F(z) with mean zero and unit standard deviation)]

If the positive value zα is defined such that Prob(−zα < z < zα) = α, then the following statement can be made by substituting for z (note: most standard statistics textbooks provide tables mapping α to zα, or equivalent variables). This statement can simply be thought of as defining the probability that x̄ is within a particular distance of µ, based on the fact that the distribution of X̄ is normal.

$$\mathrm{Prob}\left(\left|\frac{\bar{x} - \mu}{\sigma_{\bar{X}}}\right| < z_\alpha\right) = \alpha$$

Conversely, this statement can be thought of as assigning a probability to the condition that the true mean µ is within a particular distance of the random sample x̄, as shown below (note that the standard deviation of X̄ is expressed in terms of the standard deviation of X, based on the central limit theorem).

$$\mathrm{Prob}\left(\bar{x} - z_\alpha\frac{\sigma}{\sqrt{N}} < \mu < \bar{x} + z_\alpha\frac{\sigma}{\sqrt{N}}\right) = \alpha$$

This statement introduces the notion of a confidence interval for µ, which is defined to be the interval of real numbers [ΘL, ΘR] such that the probability that µ lies within this interval has a particular value α. This interval is referred to as the 100α percent confidence interval for µ. In other words, if α = 0.95, then the interval is called the 95% confidence interval for µ, because there is a 95% certainty level that the true mean of X lies within the interval's bounds. It is clear from the preceding inequalities that the confidence interval limits can be expressed as follows:

$$\Theta_L = \bar{x} - z_\alpha\frac{\sigma}{\sqrt{N}} \qquad \Theta_R = \bar{x} + z_\alpha\frac{\sigma}{\sqrt{N}}$$

From these definitions, it is clear that the confidence interval widens as the degree of confidence increases; this makes sense, since in order to achieve a high level of confidence that the true mean is within a particular interval, one may expect to make a less restrictive hypothesis about that interval; similarly, if one is willing to accept a lower degree of confidence, a more constraining hypothesis can be made about the interval. As an extreme example, one can be 100% confident that the value µ lies between negative and positive infinity. In practice, a few particular confidence levels are chosen, as shown in the following table:

    confidence level α      zα
    99%                     2.575
    98%                     2.327
    95%                     1.96
    90%                     1.645
    80%                     1.282

The expressions for a confidence interval on µ may also be viewed as providing information concerning the number of trials n that must be executed in order to achieve a particular degree of confidence that the error in estimating µ is less than a specified value. This can easily be seen by rearranging the above equations:

$$\mathrm{Prob}\left(\left|\bar{x} - \mu\right| < z_\alpha\frac{\sigma}{\sqrt{n}}\right) = \alpha$$

Since x̄ is the estimator for µ, the error is simply the absolute value of the difference between these two values. Then if e is the upper bound on the error with certainty α, the number of required samples n is given by:

$$e = z_\alpha\frac{\sigma}{\sqrt{n}} \quad\Rightarrow\quad n = \left(\frac{z_\alpha\sigma}{e}\right)^2$$
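As a worked example of the last formula (the numbers are illustrative, not from the manual): to be 95% confident (zα = 1.96, from the table above) that the estimate of µ is within e = 0.5 of the true mean when σ = 2.0, one needs n = (1.96 × 2.0 / 0.5)² ≈ 61.5, and therefore 62 seeded runs. A minimal C sketch of the corresponding interval computation, assuming σ is approximated by the sample standard deviation and N is large enough (≥ 30) for zα to apply:

    #include <math.h>

    /* Half-width z_alpha * sigma / sqrt(n) of the confidence interval
       for the true mean, estimated from the n samples in y[].        */
    double ci_half_width (const double *y, int n, double z_alpha)
    {
        double mean = 0.0, var = 0.0;
        int i;
        for (i = 0; i < n; i++)
            mean += y[i];
        mean /= n;
        for (i = 0; i < n; i++)
            var += (y[i] - mean) * (y[i] - mean);
        var /= n;                         /* sample variance            */
        return z_alpha * sqrt (var / n);  /* sigma_xbar = sigma/sqrt(n) */
    }

The 95% interval is then [x̄ − h, x̄ + h], where h = ci_half_width(y, n, 1.96).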

    Datan.3.2 Small Sample Confidence Estimates

Note that the above expressions for confidence limits on the mean µ rely on knowledge of the standard deviation σ of X. However, it is not necessarily the case that this quantity is known, since the actual distribution of X is not known. If the sample size N is sufficiently large (generally ≥ 30), the sample variance can be used in place of the true variance in order to compute confidence intervals.

For cases where the variance is unknown and the sample size is small, a method is used that is similar to the one described above, but is based on the T-distribution rather than the normal distribution. The T-distribution resembles the normal distribution in its characteristic "bell curve" shape. However, this distribution is based on the use of the sample variance rather than the assumed or known variance. It is therefore useful for simulation studies where fewer than 30 samples are used to estimate µ, which is actually a frequent case.

As the number of samples becomes large, the T-distribution begins to approximate the normal distribution. In calculating confidence intervals, the Analysis Tool uses the sample size 30 as a threshold to begin using the normal distribution. For all sample sets with fewer than 30 values, the T-distribution is used instead. The expressions for confidence limits in this case are similar, except that the true variance σ² is replaced by the sample variance s², and the constant tα is used rather than zα to represent areas under the distribution's curve, as shown below.

$$\Theta_L = \bar{x} - t_\alpha\frac{s}{\sqrt{N}} \qquad \Theta_R = \bar{x} + t_\alpha\frac{s}{\sqrt{N}}$$

Some common values of tα are provided in the following table. More extensive tables are available in standard statistics textbooks.

    confidence level α      tα, N = 3    tα, N = 5    tα, N = 10    tα, N = 20
    99%                     9.925        4.604        3.250         2.861
    98%                     6.965        3.747        2.821         2.539
    95%                     4.303        2.776        2.262         2.093
    90%                     2.920        2.132        1.833         1.729
    80%                     1.886        1.533        1.383         1.328

When the show confidence interval operation is applied to a panel, the statistics are shown such that entries aligned on the same abscissa are treated as groups. Each group is collapsed into a single entry whose ordinate is the mean of the group. The confidence interval for the mean of each group is calculated using the methods discussed above and is then displayed as a vertical bar centered at the mean and ending at the upper and lower confidence limits. For entries that are unique at a particular abscissa, no confidence intervals can be calculated. The lack of a confidence bar is indicated by a small circle surrounding the point of interest. This operation offers a choice of five confidence intervals: 80%, 90%, 95%, 98%, and 99%. An example of a graph with confidence limits appears below.

[Figure: Statistic with Confidence Intervals]

It is usually possible to control an abscissa variable mapped to the horizontal axis, such that it has the exact same value across multiple simulations. In this manner, the columns of scalar statistic values are perfectly vertical. However, in some cases, the independent parameter may have some variability due to the fact that it is specified on a stochastic basis. For example, consider a simulation study where the performance characteristics of a modeled computer system are to be measured as a function of the job load applied to the system. Suppose, in addition, that the job load is specified as a parameter of the simulation, but that the specified value is actually the average job load, which controls stochastic job generators within the model. In this case, even with the job load parameter maintained at a constant value, the actual job load would itself vary from simulation to simulation, as random number seeds are changed. Thus, if the actual job load is taken as the abscissa variable of a scalar versus scalar plot, the groups of points corresponding to a set of identically parameterized simulations will usually not be perfectly vertical.

Because the show confidence interval operation applies only to vertically aligned groups of entries, it is sometimes necessary to eliminate the abscissa coordinate variability of a small group of entries before using this operation. One possible approach that could be adopted in the computer system example, in order to ensure the alignment of scalar-based entries, is to record the prescribed average job load via the Kernel Procedure op_stat_scalar_write() rather than the actual measured average. In this manner, the exact same value would be obtained for this parameter, regardless of random number seed. As an alternative, the Analysis Tool provides an operation called group data points for this purpose. This operation provides a number of options for justifying entries parallel to the abscissa or ordinate axes and specifying whether the value to which they are aligned should be based on their minimum, maximum, or mean. The most common use of this operation is for cases such as the computer system example described above, where correction of abscissas is performed such that a group of entries becomes vertically aligned, with new abscissas equal to the mean of the previous abscissas. Refer to the Analysis Tool chapter of Editor Reference for more information on the use of the group data points operation.
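The mean-justification just described can be sketched in a few lines of C; this is illustrative code, not the tool's implementation:

    /* Vertically align a group of entries: replace each abscissa in
       x[first..last] with the group's mean abscissa.                */
    void group_align_mean (double *x, int first, int last)
    {
        double mean = 0.0;
        int i, n = last - first + 1;
        for (i = first; i <= last; i++)
            mean += x[i];
        mean /= n;
        for (i = first; i <= last; i++)
            x[i] = mean;
    }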

The Analysis Tool also computes confidence intervals. Although confidence intervals are computed regardless of the actual content of a panel, the provided result is only meaningful for data sets of a certain nature. In particular, the entries of the panel's statistic(s) must be considered independent samples of a random variable; this determination is left to the user, since the Analysis Tool has no information concerning the source of the statistic data. Under appropriate circumstances, the reported confidence limits for the five confidence levels 80%, 90%, 95%, 98%, and 99% may be used to provide an interval estimate for the mean value of a series of samples contained in a statistic. Refer to the Project Editor chapter of Editor Reference for details on using the Show Confidence Interval operation.


    Datan.4 Vector/Statistic Operations

In addition to the display of statistic data, the Analysis Tool provides a number of operations that can be used to transform this data in order to generate new statistics. Since vectors stored in output vector files have the same data content as statistics, these operations can also be applied directly to vectors. However, in order to simplify discussion, in this chapter all operations will be discussed in terms of their application to statistics.

    Datan.4.1 Histograms and Probability Profiles

Six operations are provided for the purpose of establishing a distribution of a sample set of collected values:

    • Probability Density (PDF)

    • Cumulative Distribution (CDF)

    • Probability Mass (PMF)

    • Histogram (Sample-Distribution)

    • Histogram (Time-Distribution)

    • Scatter Plot (Time-Join)

Each operation is unary (i.e., requires only one statistic as input) and produces a new single-statistic panel to hold its result when it completes. The computations performed by each of these operations are discussed in this section. Refer to the Project Editor chapter of Editor Reference for instructions on their use.

    Datan.4.1.1 Probability Density Function

The probability density function (PDF) operation can be thought of as a continuous equivalent of the PMF, described in section Datan.4.1.3. Like probability mass, probability density corresponds to the likelihood that the input statistic's ordinate lies within a specific range; however, density is evaluated proportionally to the interval of interest. Thus, if the statistic's ordinate value has a likelihood 0.1 of falling in a given interval with width d, and also has a likelihood 0.1 of falling in a second interval with width d/2, then the probability density in the second interval is twice as high. In other words, probability density is highest when a small set of possible values has a high associated probability mass.

The actual definition of a probability density function is based on the fact that its integral over a given interval yields the probability mass associated with that interval. The probability mass associated with an interval can also be obtained by computing the difference in the CDF at the upper and lower limits of the interval. As interval widths become infinitesimally small, it can be seen that the PDF is therefore the derivative of the CDF with respect to the outcome (i.e., ordinate) variable.


The relationship between a PDF and a CDF is in fact the basis for the method used by the Analysis Tool to compute PDFs. A CDF is first computed, as described in section Datan.4.1.2, and a differentiation is performed to construct a PDF. Since the original statistic data is necessarily discrete, differentiation is performed in an approximate manner by dividing the probability mass associated with an interval by the interval's width. In other words, the difference between two consecutive CDF values is divided by the difference in the corresponding ordinates. The resulting value is taken as the density associated with the interval and is placed at the interval's lower limit. Thus if a statistic contains two consecutive ordinate values y1 and y2, the PDF is computed as follows:

$$\mathrm{PDF}(y_1) = \frac{\mathrm{CDF}(y_2) - \mathrm{CDF}(y_1)}{y_2 - y_1}$$

An immediate consequence of this computation method is that PDFs can have extremely large values when the input statistic has distinct but closely spaced ordinate values, since the (y2 − y1) difference becomes small. Thus, if the input statistic's ordinate values are unevenly spaced (i.e., some very small differences exist, but also some significantly larger ones), PDFs can have a "spiky" or discontinuous appearance, with certain density values dwarfing others. In such cases, PDFs tend not to be as useful as the PMF or histogram operations.

[Figure: Results of PDF Operation for Regularly and Irregularly Spaced Ordinates (regularly spaced ordinates yield a smooth PDF; irregularly spaced ordinates yield a spiky PDF)]

A second consequence of this calculation is that the PDF contains one less entry than the CDF, due to the fact that no forward-looking difference can be calculated for the final (i.e., maximum) ordinate value.

Finally, the integral of the PDF statistic, which can be computed using the appropriate filter, produces a statistic that is identical to the CDF in its shape. However, the initial value of the CDF is lost in computing the PDF, meaning that the two statistics differ by a constant. This difference is particularly noticeable when the original statistic has a small number of distinct ordinate values, since the CDF's value for the minimum ordinate is at least the reciprocal of this number (i.e., this is the probability mass associated with the first ordinate value).
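A sketch of the PDF construction from a discrete CDF, per the formula above (illustrative C; y[] is assumed to hold the n distinct ordinates in increasing order, with their CDF values in cdf[]):

    /* Approximate PDF: the probability-mass difference between
       consecutive distinct ordinates, divided by their spacing,
       placed at the interval's lower limit. Writes n-1 entries,
       one fewer than the CDF, as noted above.                   */
    void pdf_from_cdf (const double *y, const double *cdf, int n,
                       double *pdf_y, double *pdf_val)
    {
        int i;
        for (i = 0; i < n - 1; i++) {
            pdf_y[i]   = y[i];
            pdf_val[i] = (cdf[i + 1] - cdf[i]) / (y[i + 1] - y[i]);
        }
    }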



    Datan.4.1.2 Cumulative Distribution Function

Like the probability mass function, the cumulative distribution function (CDF) of a statistic relates to the likelihood of occurrence of the statistic's ordinate values. However, rather than provide the probability mass of each ordinate's occurrence, the CDF shows the accumulated probability mass of all ordinates less than or equal to a particular ordinate, hence the term "cumulative". This form of presentation is useful when particular ordinate value thresholds are of interest. For example, it may be of interest to determine the likelihood of receiving a message whose delay exceeds a particular value, because under such conditions the packet cannot be of any practical use. In fact, in many cases, system performance requirements are stated in terms of maximum tolerances for such probabilities; i.e., "the probability of receiving a packet with delay ≥ 20 ms must be no greater than 0.1". The CDF's resulting statistic allows compliance with such a requirement to be readily determined by finding the threshold value on the horizontal axis and the corresponding probability on the vertical axis. In the example of this paragraph, a compliant system would be characterized by a CDF value of at least 0.90 at the 20 ms abscissa position. A possible CDF is shown below for a non-compliant system under the conditions of this example.

[Figure: Using a CDF to Determine Proportion of Entries Below a Threshold (roughly 65% of delay values are less than 0.02)]



The computation of a CDF resembles that of a PMF in the sense that proportions for each ordinate value in the original statistic are computed. The same weight, which is the reciprocal of the number of entries, is attributed to each entry. Thus if there are 100 entries, then each entry has a weight of 0.01, and if there are five entries whose ordinate value is y, then the ordinate y has a total probability mass of 0.05. The entries of the CDF are constructed by positioning the distinct ordinate values of the original statistic in increasing order on the abscissa, with one entry for each such value. The CDF value for the initial entry is simply the probability mass of the corresponding ordinate. The CDF value for the second entry is equal to the CDF value of the first entry augmented by the probability mass of its corresponding ordinate value, and so on. The CDF is essentially a running sum of the values of the PMF.

Two simple properties of the CDF result from the method of computation described above: (1) since each CDF value is computed by adding a positive probability mass to the previous value, CDFs are monotonically increasing; (2) since the sum of all probability masses must add up to unity, all CDFs must have a final value of 1.0. This also makes sense under the definition of the CDF, because one would expect the likelihood of obtaining an ordinate value less than or equal to the maximum ordinate value to simply be 1.0.
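A minimal sketch of this running-sum construction (plain Python, not the tool's actual implementation; the function name is hypothetical), with assertions checking the two properties noted above:

```python
from collections import Counter

def empirical_cdf(samples):
    """Build the CDF as a running sum of per-value probability masses.
    Returns one (ordinate value, cumulative probability) entry per
    distinct ordinate value, in increasing order."""
    counts = Counter(samples)
    weight = 1.0 / len(samples)          # each entry weighs 1/N
    entries, running = [], 0.0
    for value in sorted(counts):
        running += counts[value] * weight
        entries.append((value, running))
    return entries

cdf = empirical_cdf([3.0, 1.0, 2.0, 2.0, 3.0])
# Property 1: monotonically increasing.
assert all(c1 <= c2 for (_, c1), (_, c2) in zip(cdf, cdf[1:]))
# Property 2: final value is 1.0.
assert abs(cdf[-1][1] - 1.0) < 1e-12
print(cdf)   # approximately [(1.0, 0.2), (2.0, 0.6), (3.0, 1.0)]
```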

    Datan.4.1.3 Probability Mass Function

The Probability Mass Function (PMF) operation allows the distribution of a statistic's ordinate values to be obtained, much in the same manner as a sample-distribution histogram. While PMFs do count entries to measure the frequency with which ordinate values occur, counters are not maintained on a per-interval basis. Instead, each distinct ordinate value is counted separately, so that only entries with the exact same ordinate can combine to produce higher PMF values.



The counters used by the PMF operation to compute the frequency of each ordinate value are normalized with respect to the total number of entries in the original statistic. In other words, the resulting PMF represents the frequency of occurrence of a particular ordinate value as a proportion of the number of occurrences of all ordinate values. Thus, the measurement provided by a PMF can be thought of as the likelihood that an entry chosen at random among all the entries of the original statistic would have a particular ordinate value. For such a selection experiment, the likelihood of choosing a particular ordinate value is also sometimes called the probability mass of that outcome, hence the name of the operation.

The following set of data, and the accompanying statistic, illustrate the calculation of a PMF.


statistic "y" ordinate values (20 samples; abscissa values not shown):

    0.0   1.0   1.0   1.0   2.0   2.0   2.0   2.0   2.0   3.0
    3.0   3.0   4.0   4.0   4.0   5.0   5.0   6.0   6.0   7.0

[Figure: Calculation of Probability Mass Function — each interval indicates the probability mass of the abscissa at its left edge; for example, the value 2 represents 25% of the ordinate values of the variable y.]
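For concreteness, the calculation can be sketched on the data set above (illustrative Python; the pmf helper is hypothetical, not the Analysis Tool's implementation):

```python
from collections import Counter

def pmf(samples):
    """Probability mass function: each distinct ordinate value is counted
    separately and normalized by the total number of entries."""
    counts = Counter(samples)
    n = len(samples)
    return {value: counts[value] / n for value in sorted(counts)}

y = [0.0, 1.0, 1.0, 1.0, 2.0, 2.0, 2.0, 2.0, 2.0, 3.0,
     3.0, 3.0, 4.0, 4.0, 4.0, 5.0, 5.0, 6.0, 6.0, 7.0]
print(pmf(y)[2.0])   # 0.25 -- the value 2 carries 25% of the probability mass
```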


The fact that distinct ordinate values are not aggregated on the basis of intervals makes PMFs appropriate to apply to statistics that contain a relatively small number of discrete ordinate values. In such cases, sample-distribution histograms may be less appropriate than PMFs due to one primary problem: if the discrete values that are present are unevenly spaced, it may be difficult to choose a histogram interval width that provides for both good separation of the values and a reasonable number of intervals. For example, consider a statistic containing the three ordinate values 0.0, 0.001, and 1000.0. In order to treat the values distinctly, a sample-distribution histogram would require an interval width that is the smallest difference between consecutive values, or in this case 0.001. However, the highest value, 1000.0, can then only be encompassed with one million intervals, causing the sample-distribution histogram to produce an extremely large statistic.

Conversely, PMFs may not provide significant insight into the characteristics of statistics containing a very diverse set of ordinate values. This is due to the fact that each ordinate value is separately counted and that, as a result, little can be said about which ordinate region(s) exhibit the highest density in terms of the statistic's presence. In the extreme case, if each value in the original statistic is unique, then the resulting PMF is flat, assigning every value the same probability mass (the reciprocal of the number of entries), and providing almost no visually apparent information on the distribution of the values.

    Datan.4.1.4 Histogram (Sample-Distribution)

The sample-distribution histogram of a statistic reflects the distribution of its ordinate values over evenly spaced intervals of the vertical axis. The vertical axis is divided into N distinct intervals beginning at the lower bound and ending at the upper bound. By default, N is 100, but this value may vary according to a user-selected interval width. For each interval, the sample-distribution histogram operation then creates and initializes a separate counter to represent the frequency with which entries occur in that interval. Subsequently, the entire statistic is traversed and each entry analyzed; the counter whose interval contains the entry's ordinate value is incremented by one.

The statistic that results from this operation contains N entries corresponding to the N intervals; since these intervals divide the vertical axis of the original statistic, they appear on the horizontal axis of the new statistic, and the vertical axis corresponds to the frequencies of occurrence held in the N counters. Note from the description of this computation that abscissa values in the original statistic are not relevant to the sample-distribution histogram. As an example, consider computing a histogram for the following set of entries:

    abscissa "x"    ordinate "y"
    1.0             1.0
    2.0             4.0
    3.0             1.0
    4.0             2.0
    5.0             1.0
    6.0             3.0
    7.0             4.0
    8.0             6.0
    9.0             1.0
    10.0            0.0


The ordinate values of this statistic range from 0.0 to 6.0 and are all integers. The default setting of 100 intervals would create far more intervals than there are values, yielding an essentially empty histogram. An interval size of 1.0 is more sensible. The counting process performed by the sample-distribution histogram operation is summarized by the table below. Notice that intervals are inclusive of their lower bound, but not of their upper bound, so that they provide a complete partitioning of the vertical axis within its range, but do not overlap with each other.


    Interval          Frequency of Occurrence
    0.0 ≤ y < 1.0     1
    1.0 ≤ y < 2.0     4
    2.0 ≤ y < 3.0     1
    3.0 ≤ y < 4.0     1
    4.0 ≤ y < 5.0     2
    5.0 ≤ y < 6.0     0
    6.0 ≤ y < 7.0     1

The statistic resulting from the sample-distribution histogram operation is easily obtained from the frequency table above. The "y" label now appears on the horizontal axis and the vertical axis measures the frequency of occurrence. The profile of the histogram indicates the degree to which regions of the variable y's range are occupied by the original statistic. In general, this is viewed as directly related to the likelihood that a randomly selected sample of the variable y will fall within a particular interval. This is illustrated for the example data set by the following graph.

[Figure: Sample-Distribution Histogram for Y Variable — the interval at y = 1.0 contains 4 entries; the interval at y = 5.0 contains zero entries.]
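The counting procedure can be sketched as follows (plain Python, not the tool's implementation; the interval width, lower bound, and interval count are chosen to match the example, and intervals are lower-bound inclusive as described above):

```python
def sample_distribution_histogram(entries, width=1.0, lower=0.0, n_intervals=7):
    """One counter per half-open interval [lower + k*width, lower + (k+1)*width);
    abscissa values are ignored -- only ordinates are counted."""
    counters = [0] * n_intervals
    for _, y in entries:
        k = int((y - lower) // width)
        if 0 <= k < n_intervals:
            counters[k] += 1
    return counters

entries = [(1.0, 1.0), (2.0, 4.0), (3.0, 1.0), (4.0, 2.0), (5.0, 1.0),
           (6.0, 3.0), (7.0, 4.0), (8.0, 6.0), (9.0, 1.0), (10.0, 0.0)]
print(sample_distribution_histogram(entries))   # [1, 4, 1, 1, 2, 0, 1]
```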


For more complex input statistics containing a richer set of ordinate values, sample-distribution histograms can be interpreted as a density profile, showing where the ordinate values are concentrated. The following graph illustrates this interpretation of the sample-distribution histogram.

[Figure: Sample-Distribution Histogram Showing Density Profile of Input Statistic — the tallest intervals indicate where the ordinate values of the input statistic are most concentrated.]

    Datan.4.1.5 Histogram (Time-Distribution)

Time-distribution histograms resemble sample-distribution histograms in the sense that they establish a profile for the ordinate value of a statistic. The resulting profile shows how frequently the ordinate value of the statistic lies within specific ranges. Therefore, this operation divides the vertical axis into intervals in the same manner as the sample-distribution histogram. However, rather than use the number of entries falling within each interval as the measure of frequency, a time-distribution histogram is based on the "time spent" by the statistic within the intervals. In other words, ordinate values are still the basis for the histogram, but weighting of each entry is performed differently: sample-distribution histograms weight each entry with a coefficient of 1.0; time-distribution histograms weight each entry with the difference between its abscissa value and the abscissa value of the next entry.




Time-distribution histograms are computed in much the same way as sample-distribution histograms, except that for each interval, an accumulator is used to total the abscissa interval widths. The following example illustrates the computation procedure for a specific set of data. Note that the ordinate values are identical to those used in the example for the sample-distribution histogram in the previous section; only the abscissas are changed. Since sample-distribution histograms are not sensitive to abscissa values, the result for this statistic would be identical to the one shown in the previous section. Thus, the result shown here can be used to contrast the behavior of the two histogram methods.


    abscissa "x"    ordinate "y"
    1.0             1.0
    1.5             4.0
    3.0             1.0
    3.25            2.0
    4.0             1.0
    5.25            3.0
    7.0             4.0
    7.75            6.0
    8.0             1.0
    8.5             0.0
    9.0             end-of-statistic

The ordinate values of this statistic range from 0.0 to 6.0 and are all integers. The default setting of 100 intervals would create far more intervals than there are values, yielding an essentially empty histogram. An interval size of 1.0 is therefore more sensible. The calculation performed by the time-distribution histogram operation is summarized by the table below. For each interval an accumulator variable is maintained to compute the total abscissa span for which the statistic's ordinate falls within the interval. Notice that intervals are inclusive of their lower bound, but not of their upper bound, so that they provide a complete partitioning of the vertical axis within its range, but do not overlap with each other.

    Interval          Abscissa-Span Accumulator
    0.0 ≤ y < 1.0     0.5
    1.0 ≤ y < 2.0     2.5
    2.0 ≤ y < 3.0     0.75
    3.0 ≤ y < 4.0     1.75
    4.0 ≤ y < 5.0     2.25
    5.0 ≤ y < 6.0     0.0
    6.0 ≤ y < 7.0     0.25

[Figure: Time-Distribution Histogram for Y Variable]
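The accumulation procedure can be sketched as follows (illustrative Python, not the tool's implementation; the final abscissa, marked end-of-statistic above, closes the span of the last entry):

```python
def time_distribution_histogram(entries, end_x, width=1.0, lower=0.0, n_intervals=7):
    """One accumulator per interval; each entry is weighted by the abscissa
    span from its own abscissa to that of the next entry (or to end_x)."""
    accumulators = [0.0] * n_intervals
    xs = [x for x, _ in entries] + [end_x]
    for (x, y), x_next in zip(entries, xs[1:]):
        k = int((y - lower) // width)
        if 0 <= k < n_intervals:
            accumulators[k] += x_next - x
    return accumulators

entries = [(1.0, 1.0), (1.5, 4.0), (3.0, 1.0), (3.25, 2.0), (4.0, 1.0),
           (5.25, 3.0), (7.0, 4.0), (7.75, 6.0), (8.0, 1.0), (8.5, 0.0)]
print(time_distribution_histogram(entries, end_x=9.0))
# [0.5, 2.5, 0.75, 1.75, 2.25, 0.0, 0.25]
```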


    Comparing Time-Distribution and Sample-Distribution Histograms

For certain input statistics, a time-distribution histogram yields results that have a very similar profile to a sample-distribution histogram. In particular, for input statistics that have regularly spaced entries in terms of abscissa values, such as the example in the previous section, the shapes of the two histograms are identical (note that the values shown in the histogram are not necessarily identical, since abscissa values may be spaced by a value other than 1.0). However, for cases where abscissa values of entries are not regularly spaced, results can vary significantly between the two methods.

In general, a time-distribution histogram is most appropriate for statistics that measure a quantity representing state information. For these types of quantities, the entries of the statistic are merely samples of an underlying quantity that is defined at all times. Examples of such quantities include the size of a queue, and the average utilization of a channel, both measured over time. In the case of queue size, the time-distribution histogram represents the amount of time that the queue size actually holds a particular value, or falls within a range of values; similarly, for average channel utilization, a time-distribution histogram indicates the total duration during which this statistic falls in the selected intervals.




In contrast, sample-distribution histograms are more appropriate for instantaneously measured quantities such as delays associated with received messages, and error rates in transmitted packets. These statistics tend to characterize the occurrence of particular events, whose frequency of occurrence can be counted. Thus, the sample-distribution histogram is useful to indicate how many messages experienced a particular level of delay, regardless of when these delays were measured. However, a sample-distribution histogram applied to a queue size statistic actually only indicates how many times the queue size changed in order to arrive at a particular new size; this provides no definite information about how often one might expect to find the queue at a particular size.

    Datan.4.1.6 Scatter Plot (Time-Join)

The time-join operation of the Analysis Tool accepts two statistics and/or vectors as inputs in order to create a new statistic that shows the relationship (or lack thereof) between them. The ordinate variable of the first statistic that is selected becomes the abscissa variable of the new statistic; the ordinate variable of the second statistic is mapped into the ordinate variable of the new statistic. The abscissa variable of both statistics is assumed (but not verified) to be the same and becomes the implicit parameter used to relate the two input statistics.

This operation uses a simple mechanism to create the entries of the new statistic. For each entry in the first statistic, an entry with equal abscissa is searched for in the second statistic; if no exact match is found, then the nearest entry with a lesser abscissa value is selected. The ordinate values of each such pair of entries are used to form an entry for the new statistic (i.e., the ordinate of the first entry becomes the abscissa of the new entry, and the ordinate of the second entry becomes the ordinate of the new entry).
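The matching mechanism can be sketched as follows (illustrative Python using the standard bisect module; both inputs are assumed to be lists of (abscissa, ordinate) pairs sorted by abscissa, and skipping a first-statistic entry that has no equal-or-lesser match is an assumption of this sketch, not documented tool behavior):

```python
import bisect

def time_join(stat_a, stat_b):
    """For each entry of stat_a, find the entry of stat_b with an equal
    abscissa, or failing that the nearest lesser abscissa, and pair the
    two ordinates as (ordinate_a, ordinate_b) in the new statistic."""
    xs_b = [x for x, _ in stat_b]
    joined = []
    for x, y_a in stat_a:
        i = bisect.bisect_right(xs_b, x) - 1   # index of match or nearest lesser
        if i >= 0:                             # skip entries with no candidate
            joined.append((y_a, stat_b[i][1]))
    return joined

queue_size = [(0.0, 2.0), (1.0, 5.0), (2.0, 3.0)]
delay      = [(0.0, 0.01), (1.5, 0.04)]
print(time_join(queue_size, delay))   # [(2.0, 0.01), (5.0, 0.01), (3.0, 0.04)]
```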

The scatter plot statistic that results from this operation shows a possible correlation between the two input statistics based on their abscissa variables. In general, the resulting statistic is viewed using the discrete draw style and appears as a cloud of points. If the cloud appears relatively shapeless, with many ordinates for each abscissa and vice versa, then it can be assumed that there is no strong correlation between the two input statistics. Otherwise, the scatter plot statistic provides a mapping indicating either a dependency between the ordinate variables of the two input statistics, or a correlated dependency on one or more other factors. The following scatter plots provide examples of the operation's result for correlated and uncorrelated pairs of input statistics.

[Figure: Examples of Scatter Plots — in one panel the scattered variables are uncorrelated; in the other they are correlated.]


    Datan.4.2 Filter Operations

In addition to histogram and probability distribution functions, the Analysis Tool provides the ability to transform and combine statistic data with a variety of mathematical operators, including arithmetic, calculus, and statistical functions. Statistics and/or vectors may be fed through computational block diagrams called filters in order to generate and plot new statistics. Filters are developed using the Filter Editor. You apply a filter to a statistic in the Analysis Tool by selecting the filter from the filter pull-down menu of the View Results dialog box.

    Datan.4.2.1 Filter Models

A filter model is a specification for a computation that operates on one or more statistics in order to create exactly one new statistic. Abstractly, a filter can be thought of as a single system that has a defined set of inputs and an algorithm for computing its output. In addition to inputs and outputs, a filter also has associated parameters that may factor into the execution of its algorithm. Inputs and parameters are given names when a filter model is created in the Filter Editor.

[Figure: Abstract Model of a Filter — a filter block with inputs (input 0, input 1, …, input n), parameters (parameter 0, parameter 1, …, parameter n), and a single output.]

    Filter Model Structure

Internally, a filter has a hierarchical structure, meaning that it can be composed of other, subordinate filters. These filters can also be composed of other subordinate filters, and so on. A filter that is composed of other filters is referred to as a macro filter. Ultimately, at the lowest levels, all macro filter models must consist of predefined filters provided by OPNET. The available predefined filters are discussed later in this chapter.

[Figure: Hierarchical Structure of Filters]




In order to form macro filters, existing filter models are used to create subordinate filters, which are attached using filter connections. A filter connection is defined between the output of one filter and the input of another filter in order to specify the flow of statistic data. Each filter has only one output, but this output can support outgoing connections to any number of destination filters. However, each filter input can be the recipient of at most one connection.

Exactly one output of one subordinate filter must be left unattached when compiling a filter model. This output becomes the output of the encompassing macro filter. Thus, the subordinate filter with no connections attached to its output is the final subordinate filter to be executed; the output data is then made available to the encompassing filter or to the Analysis Tool in order to create a new panel.

Filter connections may not be used to create feedback paths within a filter model. Feedback paths are individual connections, or sequences of connections, that would create a flow of data such that a filter input would receive data more than once in a single filter execution. Feedback conditions are detected during the compilation process in the Filter Editor (refer to Filter Execution). Some examples of feedback paths are shown below.

[Figure: Examples of Feedback in Filter Models]
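One way such feedback could be detected is ordinary cycle detection on the connection graph, sketched below (illustrative Python; the graph representation, with connections keyed by destination input, is hypothetical and not the Filter Editor's actual data structure):

```python
def has_feedback(connections, filters):
    """Detect feedback paths: a cycle in the directed graph whose edges run
    from each filter's output to the filters it feeds."""
    adj = {f: [] for f in filters}           # src filter -> fed filters
    for (dst, _port), src in connections.items():
        adj[src].append(dst)
    WHITE, GREY, BLACK = 0, 1, 2
    color = {f: WHITE for f in filters}

    def dfs(node):
        color[node] = GREY
        for nxt in adj[node]:
            # An edge back into a GREY node closes a cycle.
            if color[nxt] == GREY or (color[nxt] == WHITE and dfs(nxt)):
                return True
        color[node] = BLACK
        return False

    return any(color[f] == WHITE and dfs(f) for f in filters)

# A two-filter loop: "a" feeds "b" and "b" feeds back into "a".
print(has_feedback({("b", 0): "a", ("a", 0): "b"}, ["a", "b"]))   # True
```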



In addition to preventing feedback paths, the Filter Editor also disallows circular inclusion of filter models within macro filters. In other words, a filter model may not appear at any level of depth within its own definition. Thus, if filter model A incorporates a subordinate filter with model B, and model B incorporates model C, then it would be illegal for models B or C to incorporate model A.

    Promotion of Filter Inputs and Parameters

Filter parameters and inputs can be "passed up" from subordinate filters to encompassing macro filters via a mechanism called promotion. Promotion for filter models is similar to promotion for modeling attributes in the Process, Node, and Network domains. For a discussion of the promotion concept as it applies to these domains, refer to the Modeling Framework chapter of Modeling Concepts. The promotion mechanism allows a property of a lower-level object or model to become the property of an encompassing, higher-level object.

In the case of a filter, a numeric parameter can be set to the promoted status in order to automatically become a parameter of the macro filter. That is, the promoted filter parameter will appear in the parameter menu of the macro filter when the latter is deployed as part of a higher-level macro filter, or when it is executed in the Analysis Tool.

The inputs of a subordinate filter can be promoted simply by leaving them unconnected. In the same manner as promoted parameters, promoted inputs automatically become inputs of the higher-level macro filter. The promotion of an input is apparent in that it appears in the connection input menu of the macro filter when this macro filter is used as a component in a higher-level macro filter model. In addition, promoted inputs are referred to in the Analysis Tool when the macro filter is executed, in order to prompt for the selection of an input statistic.

    Filter Execution

A filter can only be executed to operate on statistics in the Analysis Tool if the filter model has been compiled at some earlier time in the Filter Editor. Successful compilation is also required for a macro filter to be usable as a component in a still higher-level macro filter. When a filter model is compiled, all promoted inputs and parameters must be given names. These names serve to identify the inputs and parameters when the filter model is used, both for execution in the Analysis Tool, and for deployment in other filter models.




When a macro filter is executed, all unconnected filter inputs must be provided with either a statistic or a vector from an output vector file. This data, together with assignments for promoted parameters, constitutes the input of the filter's computation and is responsible for directly or indirectly triggering all computations of subordinate filters. The filter execution method follows the data-flow paradigm, meaning that each subordinate filter may only be executed once all of its inputs have received data, either directly from the encompassing filter's inputs, or from another subordinate filter. Once a subordinate filter is executed, the new statistic that results from its computation is transferred from its output to each of the connected filter inputs. This may in turn trigger the destination subordinate filters to be executed, provided that their other inputs have also received data.

Execution completes when all subordinate filters have executed. The final filter to execute must have no connection attached to its output. The output that it produces is instead made available to the Analysis Tool to incorporate into a new panel.
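The data-flow execution rule can be sketched as follows (illustrative Python; the filter-graph representation and all names are hypothetical, not the Filter Editor's actual data structures):

```python
def execute_macro_filter(filters, connections, external_inputs):
    """filters: dict name -> (num_inputs, function taking a list of inputs).
    connections: dict (dst_name, dst_port) -> src_name.
    external_inputs: dict (dst_name, dst_port) -> statistic data.
    A filter runs only once every one of its inputs has received data."""
    pending = dict(external_inputs)            # (name, port) -> data
    outputs, done = {}, set()
    while len(done) < len(filters):
        progressed = False
        for name, (n_inputs, func) in filters.items():
            if name in done:
                continue
            # Pull any newly available data across incoming connections.
            for port in range(n_inputs):
                src = connections.get((name, port))
                if src in outputs:
                    pending[(name, port)] = outputs[src]
            if all((name, p) in pending for p in range(n_inputs)):
                outputs[name] = func([pending[(name, p)] for p in range(n_inputs)])
                done.add(name)
                progressed = True
        if not progressed:
            raise ValueError("unsatisfiable inputs or feedback path")
    return outputs   # the filter with no outgoing connection is the macro output

# Example: (a + b) scaled by 2, with hypothetical "add" and "gain" stages.
filters = {"add": (2, lambda v: v[0] + v[1]), "gain": (1, lambda v: 2 * v[0])}
connections = {("gain", 0): "add"}
ext = {("add", 0): 3.0, ("add", 1): 4.0}
print(execute_macro_filter(filters, connections, ext)["gain"])   # 14.0
```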

    Datan.4.2.2 Predefined Filters

As mentioned earlier, all macro filter models are based at the lowest level on predefined filters that actually perform computations to generate new statistic data. The top-level macro filter and intermediate levels of macro filters merely serve to structure the user's approach to designing a filter model that performs a particular computation.

The predefined filters that are provided can perform a variety of computations, including arithmetic, basic calculus, and statistical operations. At this time, there is no open interface for defining new filter models other than macro filters; as a result, fundamentally new types of computations cannot be performed with the filter facility. In the ensuing discussion, the terms unary and binary are used to refer to filters that require one and two inputs, respectively. Also, a filter can be assumed to have no parameters, unless these parameters are explicitly mentioned.

The descriptions of some of the filters include equations and tables in order to explain the computations that are performed. These tables and equations make use of the following notations:

    Tα(x)    ordinate value of statistic Tα when the abscissa is x
    yα[n]    ordinate value of the entry with index n of statistic Tα
    xα[n]    abscissa value of the entry with index n of statistic Tα
             (Note: entry indices begin at zero.)


    Datan.4.2.2.1 Arithmetic Filters

    Adder Filter

The adder filter is a binary filter used to combine two input statistics T0 and T1 in order to generate a third statistic, Tout, which represents their sum.

Synopsis of Adder Filter

    input statistic(s)            parameters    output
    two interchangeable inputs    none          sum of the two input statistics

If the two input statistics T0 and T1 have exactly the same number of entries, and these entries are aligned with respect to their abscissa values, then Tout can be computed simply by adding ordinate values for entries of equal abscissa, as follows:

$$T_{out}(x) = T_0(x) + T_1(x)$$

However, if the input statistics are not initially perfectly aligned with respect to each other, then an abscissa alignment mechanism is automatically applied by this filter before adding is performed. Alignment consists of two steps, and is sketched in the code example after the rules table below:

1) T0 and T1 are truncated to the same minimum and maximum abscissa values, by removing entries as necessary. This essentially corresponds to finding the intersection of the two statistics on the horizontal axis.

2) The truncated statistics resulting from step 1 are augmented to ensure that each statistic contains entries at the same abscissas as the other; this involves inserting points into each of the statistics. For example, if the two statistics had no abscissa values in common, once augmented they would contain a number of entries equal to the sum of their original lengths. When an entry is inserted, its ordinate value is taken to be the ordinate value of the previous entry in the same statistic (i.e., it is assumed that the statistic's ordinate value remains constant until the next original entry).

Once alignment has completed, the two resulting statistics can be added directly, entry by entry. When adding entries, the rules presented in the following table are applied (note: because T0 and T1 are treated identically, the table's corresponding column headings can be inverted to address symmetric cases).

Entry Calculation Rules for Adder Filter (a, b = real constants, * = any value)

    T0           T1           Tout
    a            b            a + b
    a            +infinity    +infinity
    a            -infinity    -infinity
    *            undefined    undefined
    +infinity    +infinity    +infinity
    -infinity    -infinity    -infinity
    +infinity    -infinity    undefined
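The align-then-add procedure can be sketched as follows (illustrative Python; each statistic is assumed to be a list of (abscissa, ordinate) pairs sorted by abscissa, and the infinity/undefined special cases tabulated above are ignored for brevity):

```python
import bisect

def align(stat, abscissas):
    """Sample-and-hold resampling: for each requested abscissa, use the
    ordinate of the entry at, or nearest before, that abscissa."""
    xs = [x for x, _ in stat]
    return [(x, stat[bisect.bisect_right(xs, x) - 1][1]) for x in abscissas]

def adder(t0, t1):
    # Step 1: truncate both inputs to their common abscissa range.
    lo = max(t0[0][0], t1[0][0])
    hi = min(t0[-1][0], t1[-1][0])
    # Step 2: merge the abscissas of both inputs within that range.
    xs = sorted({x for x, _ in t0 + t1 if lo <= x <= hi})
    a0, a1 = align(t0, xs), align(t1, xs)
    # Add ordinates entry by entry.
    return [(x, y0 + y1) for (x, y0), (_, y1) in zip(a0, a1)]

t0 = [(0.0, 1.0), (2.0, 3.0), (4.0, 5.0)]
t1 = [(1.0, 10.0), (3.0, 20.0), (5.0, 30.0)]
print(adder(t0, t1))
# [(1.0, 11.0), (2.0, 13.0), (3.0, 23.0), (4.0, 25.0)]
```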



    Constant Shift Filter

The constant_shift filter is a unary filter used to operate on a single input statistic T0 in order to generate a second statistic, Tout, which is a translation of T0 by an amount ∆ along the direction of the vertical axis. The shift quantity ∆ is a real number specified as a parameter of the filter.

Synopsis of Constant Shift Filter

    input statistic(s)    parameters                                         output
    single input          shift: translation distance along vertical axis   translated version of input statistic

The Tout statistic has the same number of entries as T0, and the two statistics are aligned with each other with respect to the entries' abscissa values. Only the ordinate values differ, by the constant ∆, as follows:

$$y_{out}[n] = y_0[n] + \Delta$$

When computing the entries of Tout, the rules summarized in the following table are applied (note: the content of the T0 and ∆ columns may be interchanged to address symmetric cases).

Entry Calculation Rules for Constant Shift Filter (a, b = real constants, * = any value)

    T0           ∆            Tout
    a            b            a + b
    +infinity    a            +infinity
    -infinity    a            -infinity
    undefined    *            undefined
    +infinity    +infinity    +infinity
    -infinity    -infinity    -infinity
    -infinity    +infinity    undefined




    Gain Filter

The gain filter is a unary filter used to operate on a single input statistic T0 in order to generate a second statistic, Tout, which is a version of T0 scaled by a factor K along the direction of the vertical axis. The scaling factor, or gain, K is a real number specified as a parameter of the filter.

Synopsis of Gain Filter

    input statistic(s)    parameters                                  output
    single input          gain: scaling factor along vertical axis   scaled version of input statistic

The Tout statistic has the same number of entries as T0, and the two statistics are aligned with each other with respect to the entries' abscissa values. Only the ordinate values differ, by the factor K, as follows:

$$y_{out}[n] = y_0[n] \cdot K$$

When computing the entries of Tout, the rules summarized in the following table are applied (note: the content of the T0 and K columns may be interchanged to address symmetric cases).

Entry Calculation Rules for Gain Filter (a, b = real constants, * = any value)

    T0           K            Tout
    a            b            a · b
    +infinity    a > 0        +infinity
    +infinity    a < 0        -infinity
    ±infinity    a = 0        undefined
    -infinity    a > 0        -infinity
    -infinity    a < 0        +infinity
    undefined    *            undefined
    +infinity    +infinity    +infinity
    -infinity    -infinity    +infinity
    -infinity    +infinity    -infinity
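In effect these rules coincide with ordinary IEEE-754 floating-point behavior, with NaN playing the role of "undefined"; a quick illustrative check in Python (not tool code):

```python
import math

# NaN stands in for "undefined" in the rules tabulated above.
print(math.inf * 2.0)              # inf   (+infinity times a > 0)
print(math.inf * 0.0)              # nan   (±infinity times 0 is undefined)
print(-math.inf * -math.inf)       # inf   (-infinity times -infinity)
print(math.isnan(math.nan * 5.0))  # True  (undefined times anything)
```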



    Multiplier Filter

The multiplier filter is a binary filter used to combine two input statistics T0 and T1 in order to generate a third statistic, Tout, which represents their product by real-number multiplication.

Synopsis of Multiplier Filter

    input statistic(s)            parameters    output
    two interchangeable inputs    none          product of the two input statistics

If the two input statistics T0 and T1 have the same number of entries, and these entries are aligned with respect to their abscissa values, then Tout can be computed simply by multiplying ordinate values for entries of equal abscissa, as follows:

$$T_{out}(x) = T_0(x) \cdot T_1(x)$$

However, if the input statistics are not initially perfectly aligned with respect to each other, then an abscissa alignment mechanism is automatically applied by this filter before multiplication is performed. This alignment process is identical to that performed for the adder filter; a complete explanation of this mechanism appears in the corresponding section.



Once alignment has completed, the two resulting statistics can be multiplied directly, entry by entry. When multiplying entries, the rules presented in the following table are applied.