Trace generation: just some examples

Paraver specifies a trace format and the mechanisms by which the records and their encoded values are processed in the visualization. Every record specifies the object to which it refers (indicating application, task and thread) and the absolute time at which it happens. Each type of record carries some additional fields whose content can be chosen freely by the user. These fields are:
- State records include an integer value that is usually referred to as the state.
- Event records include a user event type and a user event value.
- Relation/Communication records include a communication tag and a communication size. The names reflect the common practice of encoding the tag and size of an MPI communication, but Paraver does not rely on these semantics.
The flexibility of this approach makes it possible to use Paraver for many types of analyses, and it is quite easy to implement instrumentation tools for many systems and purposes. The main issue in such instrumentation is how to encode the information in the fields available in the record formats. Special emphasis should be put on a proper selection of what to encode as state and what (and how) to encode as events. In our experience, a clean design of these encodings makes it possible to later carry out studies with Paraver that were not foreseen when the analysis was planned.
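As a rough illustration of the three record types described above, the following Python sketch models them as plain data classes. It is not the Paraver trace file syntax: the class and field names are hypothetical, and giving a state record a begin and an end time is an assumption made here for convenience.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectId:
    """Object a record refers to (hypothetical naming)."""
    application: int
    task: int
    thread: int


@dataclass(frozen=True)
class StateRecord:
    obj: ObjectId
    begin: int   # absolute time; the unit is chosen by the tool
    end: int     # assumption: a state covers an interval
    state: int   # free-form integer, its meaning is defined by the tool


@dataclass(frozen=True)
class EventRecord:
    obj: ObjectId
    time: int         # absolute time
    event_type: int   # user event type
    event_value: int  # user event value


@dataclass(frozen=True)
class CommRecord:
    sender: ObjectId
    receiver: ObjectId
    send_time: int
    recv_time: int
    tag: int    # usually an MPI tag, but any integer can be encoded
    size: int   # usually a message size, but any integer can be encoded


# A tool decides what the integers mean; these values are made up.
thread = ObjectId(application=1, task=2, thread=1)
running = StateRecord(thread, begin=0, end=100, state=1)
region_entry = EventRecord(thread, time=40, event_type=1000, event_value=3)
```

The point of the sketch is that every field beyond the object and the time is just an integer, so the design work lies in deciding what each integer should mean.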
In this section we briefly describe the encoding criteria of several instrumentation/trace generation tools that we distribute (see the Software Distribution section). For more detailed information about these tools refer to the Tool Documentation section.
All the tools described on this page also generate a Paraver configuration file with symbolic information (for instance, the function names) that makes it easier to relate the trace information to the application source code. Most of the tools allow explicit instrumentation (selected points, variables, hardware counters) and allow tracing to be stopped and resumed through calls to the tool library.
The programs must be executed on dedicated resources to avoid the large perturbations that OS scheduling may cause in the presence of multiple concurrent applications.
OpenMP
The OMPtrace tool instruments parallel codes that use the OpenMP programming model. This tool generates a Paraver trace file where the basic activity in an OpenMP program is recorded. Paraver state and flag records are emitted to reflect the evolution of the application behaviour. With Paraver the user can visualize the execution at thread, task or application level.
The major encoding choices are:
- States: record whether the thread is Idle (waiting for work), Scheduling (generating work/notifying termination), Running (executing application code) or doing I/O.
- Events:
  - Entry to and exit from a parallel region, tracing both the parallel loop/sections and the parallel directives.
  - Entry to and exit from a work-sharing construct (with a different value for do/sections and single section).
  - For each lock, an event type is reserved, and different values are emitted when trying to acquire, owning or releasing the lock.
  - Values of the selected hardware counters.
  - On IBM platforms, the entry to and exit from the user routines that include OpenMP directives. Additional routines can be traced using an environment variable.
Besides giving a qualitative graphical perception of program behavior, this encoding makes it possible to visualize and measure the load balance, the profile of parallelism achieved, the percentage of time spent inside a mutual exclusion, the conflicts in acquiring locks and the percentage of sequential parts, among others.
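To make one of these measurements concrete, the sketch below computes the time each thread spends in each state and a simple load balance figure. It is an illustration only: the numeric state codes, the tuple layout and the definition of load balance (average Running time divided by maximum Running time) are assumptions, not values taken from OMPtrace or Paraver.

```python
from collections import defaultdict

# Hypothetical state codes; the real values are defined by the tracing tool.
IDLE, RUNNING, SCHEDULING, IO = 0, 1, 2, 3


def time_per_state(states):
    """Accumulate time per thread and state.

    states: iterable of (thread, begin, end, state) tuples.
    Returns {thread: {state: total_time}}.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for thread, begin, end, state in states:
        totals[thread][state] += end - begin
    return totals


def load_balance(states):
    """Average Running time over maximum Running time (1.0 = balanced)."""
    totals = time_per_state(states)
    running = [per_state[RUNNING] for per_state in totals.values()]
    if not running or max(running) == 0:
        return 0.0
    return sum(running) / len(running) / max(running)


example = [
    (1, 0, 100, RUNNING),
    (2, 0, 50, RUNNING),
    (2, 50, 100, IDLE),   # thread 2 waits for work half of the time
]
balance = load_balance(example)
```

The same accumulation over a different state code (for instance Scheduling) would give the overhead of work generation, which is why the choice of what to encode as state matters.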
OMPtrace is currently available on SGI-IRIX and IBM SP machines.
MPI
The major encoding choices are:
- States: record whether the thread is Running, Waiting for Messages or doing I/O.
- Communication: the tag and size are set according to those in the calls. Physical communication is assumed to be identical to logical communication, as it is not possible through the MPI instrumentation to find out when the actual data transfer takes place.
- Events: tag the beginning and end of MPI operations, such as Barriers, Broadcast, AlltoAll, and all kinds of Send/Receive calls.
This instrumentation module provides the typical message passing visualization functionalities.
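The communication encoding can be sketched as follows: send and receive calls are paired, and the physical times are simply copied from the logical ones. This is a minimal illustration under stated assumptions, not the MPItrace implementation: the dictionary keys are hypothetical, and pairing calls first-in first-out per (source, destination, tag) is an assumption of this sketch.

```python
from collections import defaultdict, deque


def match_communications(sends, recvs):
    """Pair send and receive calls into communication records.

    sends: list of dicts with keys time, src, dst, tag, size.
    recvs: list of dicts with keys time, src, dst, tag.
    """
    pending = defaultdict(deque)
    for s in sorted(sends, key=lambda s: s["time"]):
        pending[(s["src"], s["dst"], s["tag"])].append(s)

    records = []
    for r in sorted(recvs, key=lambda r: r["time"]):
        queue = pending[(r["src"], r["dst"], r["tag"])]
        if not queue:
            continue  # unmatched receive: ignored in this sketch
        s = queue.popleft()
        records.append({
            "src": s["src"],
            "dst": s["dst"],
            "logical_send": s["time"],
            # The real transfer time is unknown, so physical == logical.
            "physical_send": s["time"],
            "logical_recv": r["time"],
            "physical_recv": r["time"],
            "tag": s["tag"],
            "size": s["size"],
        })
    return records


sends = [{"time": 10, "src": 0, "dst": 1, "tag": 7, "size": 1024}]
recvs = [{"time": 25, "src": 0, "dst": 1, "tag": 7}]
comms = match_communications(sends, recvs)
```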
MPItrace is currently available on SGI-IRIX, IBM-SP and Linux platforms.
OpenMP+MPI
The OMPItrace tool instruments parallel codes based on the OpenMP programming model and/or applications using the message passing (MPI) programming model. This tool generates a Paraver trace file where the basic activity of the program is recorded.
The major encoding choices are:
- States: record whether the thread is Idle (waiting for work), Running (application code), Scheduling (generating work/notifying termination), Waiting for Messages or doing I/O.
- Communication: the tag and size are set according to those in the calls. Physical communication is assumed to be identical to logical communication, as it is not possible through the MPI instrumentation interface to find out when the actual data transfer takes place.
- Events: tag the basic program activity. For example:
  - to mark the entry to a parallel region.
  - to mark the entry to a work-sharing construct.
  - to read the value of the hardware counters.
  - to tag the beginning and end of MPI operations, such as Barriers, Broadcast, AlltoAll, and all kinds of Send/Receive calls.
OMPItrace is currently available on SGI-IRIX and IBM-SP platforms.
The analysis and visualization of Java applications is based on two specific tools: JIS (Java Instrumentation Suite) and JACIT (Java Automatic Code Interposition Tool). They are complementary and can be used to get very detailed traces of the execution of Java bytecode without recompilation. The whole environment is especially intended for performance analysis of J2EE application servers, and has been successfully tested on WebSphere 4.x and JBoss 3.x.
JIS is available for Linux 2.4 and 2.5/2.6 platforms and JACIT is a cross-platform Java tool. The Java Instrumentation Suite (JIS) gets detailed information from all the levels involved in the execution of J2EE applications: System, JVM process, Middleware (i.e. J2EE application server) and User Application. This information is automatically generated as a Paraver tracefile. All the levels are correlated to offer a global view of the system execution. The information collected at each level of JIS is summarized below:
- System level: Thread scheduling information (extracted from inside the kernel scheduler) and detailed information of the system calls performed by the JVM process
- JVM level: Information about the Java threads (such as their names) is offered and related to the system threads. JVM monitors and raw monitors are also instrumented at this level. All information is extracted through the JVMPI (Java Virtual Machine Profiler Interface).
- Middleware level: Information from the middleware architecture components status is offered by this level, shown in the generated tracefile as Paraver events on boundaries of software components.
- Application level: User generated events can be produced from the Java application bytecode, that later will be displayed as Paraver events. A native C library is provided with JIS to allow Java applications to generate user level events on the Paraver trace produced by JIS, using the Java Native Interface (JNI).
The Java Automatic Code Interposition Tool (JACIT) is a cross-platform Java tool designed to ease the task of inserting probes into Java code. With a user-friendly graphical interface, JACIT allows pieces of Java code (including JNI calls to C or C++ libraries) to be inserted and executed before or after any method of existing Java bytecode, without the need for recompilation. As a possible use, the interposed code can consist of calls to a native library interface to JIS.
Performance counters
The infoPerfex tool relies on the SGI perfex tool and the hardware performance counters interface to generate a trace containing the values of the performance counters sampled at periodic intervals. infoPerfex can instrument running applications without having the source code.
The trace only contains events for a single thread in a single application. Several types of events may appear in a trace: system calls, context switches, bytes read, bytes written... and the two selected performance counters (cache misses, floating point operations, TLB misses...). For all of them the value field represents the actual count in the previous sampling interval.
The profile of the above types of events can be displayed with Paraver. This profile can provide useful information about periodic patterns, phases in the program... This is considerably more useful than having only the global total.
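The difference between a profile and a global total can be sketched in a few lines of Python. Each sample holds the count accumulated during the previous sampling interval, as described above; the function name, the tuple layout and the sample values are hypothetical and chosen only for illustration.

```python
def counter_rates(samples, start_time=0):
    """Turn per-interval counts into rates.

    samples: list of (time, count) sorted by time, where count is the
    number of occurrences in the interval that ends at that time.
    Returns a list of (interval_begin, interval_end, count_per_time_unit).
    """
    rates = []
    prev = start_time
    for time, count in samples:
        if time > prev:  # skip empty or out-of-order intervals
            rates.append((prev, time, count / (time - prev)))
        prev = time
    return rates


# Made-up cache miss counts sampled every 10 time units.
samples = [(10, 500), (20, 100), (30, 100)]
total = sum(count for _, count in samples)
profile = counter_rates(samples)
```

Here the total alone hides that most of the misses fall in the first interval, while the per-interval rates expose it as a distinct phase.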

System activity
The SCPUs tool instruments the operating system scheduling. It uses the /proc interface to obtain information about the existing processes. It can generate a Paraver trace file where the execution and scheduling of the processes is recorded.
SCPUs uses all the levels of the Paraver process model (thread, task and application) and it also records information about the activity of the different CPUs.
The trace contains two types of records:
- States: encode the application. The CPU view shows the application that is active on each processor. Parallel applications use the same state for all their threads/processes, so the whole application can be painted in the same color.
- Communication: represents the migration of a process between two processors. The tag field encodes the application/task pair to which the process belongs, and the size field encodes the thread/process number within the application.
With this encoding it is possible to measure the total number of process migrations, to visualize the migrations suffered by one application, to compute the total system utilization or to display the profile of processors allocated to one application.
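As an illustration of the first two measurements, the sketch below counts migrations from communication records. The tuple layout is hypothetical, and the tag is treated as an opaque key because the way the application/task pair is packed into it is not specified here.

```python
from collections import Counter


def count_migrations(comm_records):
    """Count process migrations in total and per application/task tag.

    comm_records: iterable of (time, src_cpu, dst_cpu, tag, size) tuples,
    where tag identifies the application/task pair and size holds the
    thread/process number within the application.
    """
    per_tag = Counter()
    for _time, src_cpu, dst_cpu, tag, _size in comm_records:
        if src_cpu != dst_cpu:  # a record between two CPUs is a migration
            per_tag[tag] += 1
    return sum(per_tag.values()), per_tag


# Made-up records: two migrations for tag 101, one for tag 202.
records = [
    (5, 0, 1, 101, 1),
    (9, 1, 0, 101, 1),
    (12, 2, 3, 202, 4),
]
total_migrations, migrations_per_tag = count_migrations(records)
```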
SCPUs is currently available on SGI-IRIX machines.
The NanosCompiler allows the instrumentation of parallel applications. The instrumentation is based on the generation of calls to an instrumentation library that gathers information from the hardware counters of the machine, records the execution status of each thread and inserts events related to the OpenMP directives.
The major encoding choices are:
Dimemas
The Dimemas simulator reconstructs the time behavior of a parallel application on a machine modelled by a set of performance parameters. Thus, performance experiments can be done easily. The supported target architecture classes include networks of workstations, single and clustered SMPs, distributed memory parallel computers, and even heterogeneous systems.
UTE translator
ute2paraver is a filter that translates UTE traces to the Paraver format. UTE is a tracing tool for IBM SP systems that obtains a fair amount of information about the activity of SP systems running MPI applications (or MPI+OpenMP). In addition to process activity, UTE records scheduling information.
AIX Trace translator
aix2prv is a filter that translates traces obtained with the IBM AIX trace facility to the Paraver format. The AIX trace facility makes it possible to collect very low-level information on process scheduling, system calls... for all the processes running on an SP node.
With this translator we can now use all the flexibility and potential of Paraver to analyze the low-level detail captured by the AIX trace facility.
MLP instrumentation
We have developed our own version of the MLP library and modified OMPItrace to intercept the MLP library calls, including information from the hardware counters related to memory accesses. We are currently studying the kind of information provided by these counters and how it can be used to analyze the effect of memory placement on the performance of MLP programs.
Tracedrive preprocessing





















