PAPI Programmer’s Reference Version 2.3


This document is a compilation of the reference material needed by a programmer to effectively use PAPI. It is identical to the material found in the PAPI man pages, but organized in a way that may be more portable and accessible. The information here is extensively hyperlinked, which makes it useful in electronic formats, but less useful in hardcopy format.

For other PAPI documentation, see also the PAPI User’s Guide and the PAPI Software Specification.

NAME

PAPI - Performance Application Programming Interface

SYNOPSIS

The PAPI Performance Application Programming Interface provides machine and operating system independent access to hardware performance counters found on most modern processors. Any of over 100 preset events can be counted through either a simple high level programming interface or a more complete low level interface from either C or Fortran. A list of the function calls in these interfaces is given below, with references to other pages for more complete details. For general information on the Fortran interface see: PAPIF

PAPI Presets

An extensive list of predefined events is implemented on all systems where they can be supported. For a list of these events, see: PAPI_presets

High Level Functions

A simple interface for single thread applications. Fully supported on both C and Fortran. See individual functions for details on usage.

PAPI_num_counters - get the number of hardware counters available on the system

PAPI_flops - simplified call to get Mflops/s, real and processor time

PAPI_accum_counters - add current counts to array and reset counters

PAPI_read_counters - copy current counts to array and reset counters

PAPI_start_counters - start counting hardware events

PAPI_stop_counters - stop counters and return current counts

Note that when using the high-level interface, calling PAPI_library_init to initialize the library is optional. If the library is not explicitly initialized, however, either PAPI_flops or PAPI_num_counters must be called before any other PAPI function.

Low Level Functions

Advanced interface for all applications and performance tools. Some functions may be implemented only for C or Fortran. See individual functions for details on usage and support.

PAPI_accum - accumulate and reset hardware events from an event set

PAPI_add_event - add single PAPI preset or native hardware event to an event set

PAPI_add_events - add array of PAPI preset or native hardware events to an event set

PAPI_add_pevent - reserved for future use

PAPI_cleanup_eventset - remove all PAPI events from an event set

PAPI_create_eventset - create a new empty PAPI event set

PAPI_describe_event - return description of event given the name or event code

PAPI_destroy_eventset - deallocates memory associated with an empty PAPI event set

PAPI_event_code_to_name - translate an integer PAPI event code into an ASCII PAPI preset name

PAPI_event_name_to_code - translate an ASCII PAPI preset name into an integer PAPI event code

PAPI_get_executable_info - get the executable’s address space information

PAPI_get_hardware_info - get information about the system hardware

PAPI_get_opt - query the option settings of the PAPI library or a specific event set

PAPIF_get_clockrate - get the processor clockrate in MHz. Fortran only.

PAPIF_get_domain - get the domain of the specified eventset. Fortran only.

PAPIF_get_granularity - get the granularity of the specified eventset. Fortran only.

PAPIF_get_preload - get the ’LD_PRELOAD’ environment equivalent. Fortran only.

PAPI_get_overflow_address - return the address at which overflow occurred for profiling

PAPI_get_real_cyc - return the total number of cycles since some arbitrary starting point

PAPI_get_real_usec - return the total number of microseconds since some arbitrary starting point

PAPI_get_virt_cyc - return the process cycles since some arbitrary starting point

PAPI_get_virt_usec - return the process microseconds since some arbitrary starting point

PAPI_label_event - return a short label for an event given the event code

PAPI_library_init - initialize the PAPI library

PAPI_list_events - list the events defined in an event set

PAPI_lock - lock the PAPI internal mutex variable

PAPI_multiplex_init - initialize multiplex support in the PAPI library

PAPI_num_hwctrs - return the number of physical hardware counters available

PAPI_overflow - set up an event set to begin registering overflows

PAPI_perror - return a copy of the error message corresponding to a specified error code

PAPI_profil - generate PC histogram data where hardware counter overflow occurs

PAPI_query_all_events_verbose - request detailed information on all PAPI events

PAPI_query_event - query if a PAPI event exists

PAPI_query_event_verbose - request detailed information on a PAPI event

PAPI_read - read hardware events from an event set with no reset

PAPI_rem_event - remove a hardware event from a PAPI event set

PAPI_rem_events - remove an array of hardware events from a PAPI event set

PAPI_rem_pevent - reserved for future use

PAPI_reset - reset the hardware event counts in an EventSet

PAPI_restore - restore the saved state of the PAPI library

PAPI_save - save the state of the PAPI library

PAPI_set_debug - set the current debug level for PAPI

PAPI_set_domain - set the default execution domain for new event sets

PAPIF_set_event_domain - set the execution domain for a specific event set. Fortran only.

PAPI_set_granularity - set the default granularity for new event sets

PAPI_set_opt - change the option settings of the PAPI library or a specific event set

PAPI_set_multiplex - convert a standard event set to a multiplexed event set

PAPI_shutdown - finish using PAPI and free all related resources

PAPI_sprofil - generate PC histogram data from multiple code regions where hardware counter overflow occurs

PAPI_start - start counting hardware events in an event set

PAPI_state - return the counting state of an event set

PAPI_stop - stop counting hardware events in an event set and return current events

PAPI_strerror - return a pointer to the error message corresponding to a specified error code

PAPI_thread_id - get the thread identifier of the current thread

PAPI_thread_init - initialize thread support in the PAPI library

PAPI_unlock - unlock the PAPI internal mutex variable

PAPI_write - write counter values into counters

AUTHORS

Philip J. Mucci
Kevin London
Dan Terpstra

SEE ALSO

The PAPI Web Site: http://icl.cs.utk.edu/projects/papi

NAME

PAPIF - Performance Application Programming Interface (Fortran)

SYNOPSIS

#include "fpapi.h"

call PAPIF_function_name(arg1,arg2,...,check)

Fortran Calling Interface

The PAPI library comes with a specific Fortran library interface. The Fortran interface covers the complete library with a few minor exceptions. Functions returning C pointers to structures, such as PAPI_get_opt and PAPI_get_executable_info, are either not implemented in the Fortran interface, or implemented with different calling semantics.

Semantics for specific functions in the Fortran interface are documented on the equivalent C man page. For example, the semantics and functionality of PAPIF_accum are covered in the PAPI_accum man page.

For most architectures the following relation holds between the pseudo-types listed and Fortran variable types.

Pseudo-type / Fortran type / Description
C_INT / INTEGER / Default Integer type
C_FLOAT / REAL / Default Real type
C_LONG_LONG / INTEGER*8 / Extended size integer
C_STRING / CHARACTER*(PAPI_MAX_STR_LEN) / Fortran string
C_INT FUNCTION / EXTERNAL INTEGER FUNCTION / Fortran function returning integer result
C_INT(*), C_FLOAT(*), C_LONG_LONG(*) / Array of corresponding type / C_TYPE(*) refers to an array of the corresponding Fortran type. The length of the array needed is context dependent. It may be e.g. PAPI_MAX_HWCTRS or PAPIF_num_counters.

Array arguments must be of sufficient size to hold the input to and output from the subroutine; otherwise behavior is unpredictable. The array length is indicated either by the accompanying argument or by internal PAPI definitions. For details, see the corresponding C routine.

Subroutines accepting C_STRING as an argument are on most implementations capable of reading the character string length as provided by Fortran. In these implementations the string is truncated or space padded as necessary. For other implementations the length of the character array is assumed to be of sufficient size. No character string longer than PAPI_MAX_STR_LEN is returned by the PAPIF interface.
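
The truncate-or-pad behavior described above can be sketched in C as follows. This is only an illustration of the general technique for filling a fixed-length Fortran CHARACTER buffer from a C string; the function name demo_to_fortran_string is hypothetical and is not part of PAPI, and the actual PAPIF wrappers may be implemented differently.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not a PAPI function.
 * Copies the NUL-terminated C string src into the Fortran CHARACTER
 * buffer dst of length dst_len. Fortran strings carry no terminator,
 * so a short source is padded with spaces and a long one is truncated.
 * Returns the number of characters copied from src. */
size_t demo_to_fortran_string(char *dst, size_t dst_len, const char *src)
{
    assert(dst != NULL && src != NULL);

    size_t n = strlen(src);
    if (n > dst_len)
        n = dst_len;                   /* truncate to the buffer length */

    memcpy(dst, src, n);               /* copy the characters that fit */
    memset(dst + n, ' ', dst_len - n); /* space-pad the remainder */
    return n;
}
```

In this sketch the caller supplies the buffer length, which corresponds to the case where the implementation can read the string length provided by Fortran. Where that length is not available, the only safe choice is a buffer of at least PAPI_MAX_STR_LEN characters.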

DIAGNOSTICS

The return code of the corresponding C routine is returned in the argument check in the Fortran interface.

SEE ALSO

The PAPI Web Site: http://icl.cs.utk.edu/projects/papi
The PAPI Interface: PAPI

NAME

PAPI_presets - PAPI predefined named events

SYNOPSIS

#include <papi.h>

DESCRIPTION

The PAPI library names a number of predefined events. This set is a collection of events typically found in many CPUs that provide performance counters. A PAPI preset event name is mapped onto one or more of the countable events on each hardware platform. On any particular platform, the preset can either be directly available as a single counter, derived using a combination of counters or unavailable.
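
The three cases above (direct, derived, unavailable) can be modeled with a small lookup structure. The sketch below is a simplified model, not PAPI's internal representation: all demo_ names are hypothetical, native counters are represented as plain array indices, and a derived preset is assumed to be a sum of native counts, although real derivations may combine counters in other ways.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MAX_TERMS 2   /* assumed limit on native counters per preset */

/* How a preset maps onto a given platform (hypothetical model). */
typedef enum { DEMO_UNAVAILABLE, DEMO_DIRECT, DEMO_DERIVED } demo_kind;

typedef struct {
    const char *name;             /* preset name, e.g. "PAPI_TOT_CYC" */
    demo_kind   kind;             /* direct, derived, or unavailable */
    int         native[DEMO_MAX_TERMS]; /* indices of native counters used */
    int         n_native;         /* number of valid entries in native[] */
} demo_preset;

/* Compute the value of preset p from an array of native counter values.
 * Returns 0 and stores the result in *out, or -1 if the preset is
 * unavailable on this (modeled) platform. */
int demo_preset_value(const demo_preset *p, const long long *native_counts,
                      long long *out)
{
    assert(p != NULL && native_counts != NULL && out != NULL);

    if (p->kind == DEMO_UNAVAILABLE)
        return -1;

    long long sum = 0;
    for (int i = 0; i < p->n_native; i++)
        sum += native_counts[p->native[i]];  /* one term if direct */

    *out = sum;
    return 0;
}
```

A direct preset uses a single entry in native[], a derived preset uses several, and an unavailable preset reports an error instead of a count. In a real program, PAPI_query_event is the call that answers whether a preset exists on the current platform.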

The PAPI preset events can be broken loosely into several categories, as shown in the table below:

Name / Description
Conditional Branching:
PAPI_BR_CN / Conditional branch instructions
PAPI_BR_INS / Branch instructions
PAPI_BR_MSP / Conditional branch instructions mispredicted
PAPI_BR_NTK / Conditional branch instructions not taken
PAPI_BR_PRC / Conditional branch instructions correctly predicted
PAPI_BR_TKN / Conditional branch instructions taken
PAPI_BR_UCN / Unconditional branch instructions
PAPI_BRU_IDL / Cycles branch units are idle
PAPI_BTAC_M / Branch target address cache misses
Cache Requests:
PAPI_CA_CLN / Requests for exclusive access to clean cache line
PAPI_CA_INV / Requests for cache line invalidation
PAPI_CA_ITV / Requests for cache line intervention
PAPI_CA_SHR / Requests for exclusive access to shared cache line
PAPI_CA_SNP / Requests for a snoop
Conditional Store:
PAPI_CSR_FAL / Failed store conditional instructions
PAPI_CSR_SUC / Successful store conditional instructions
PAPI_CSR_TOT / Total store conditional instructions
Floating Point Operations:
PAPI_FAD_INS / Floating point add instructions
PAPI_FDV_INS / Floating point divide instructions
PAPI_FLOPS / Floating point instructions per second
PAPI_FMA_INS / FMA instructions completed
PAPI_FML_INS / Floating point multiply instructions
PAPI_FNV_INS / Floating point inverse instructions
PAPI_FP_INS / Floating point instructions
PAPI_FP_STAL / Cycles the FP unit(s) are stalled
PAPI_FPU_IDL / Cycles floating point units are idle
PAPI_FSQ_INS / Floating point square root instructions
Instruction Counting:
PAPI_FUL_CCY / Cycles with maximum instructions completed
PAPI_FUL_ICY / Cycles with maximum instruction issue
PAPI_FXU_IDL / Cycles integer units are idle
PAPI_HW_INT / Hardware interrupts
PAPI_INT_INS / Integer instructions
PAPI_IPS / Instructions per second
PAPI_TOT_CYC / Total cycles
PAPI_TOT_IIS / Instructions issued
PAPI_TOT_INS / Instructions completed
PAPI_VEC_INS / Vector/SIMD instructions
Cache Access:
PAPI_L1_DCA / L1 data cache accesses
PAPI_L1_DCH / L1 data cache hits
PAPI_L1_DCM / Level 1 data cache misses
PAPI_L1_DCR / L1 data cache reads
PAPI_L1_DCW / L1 data cache writes
PAPI_L1_ICA / L1 instruction cache accesses
PAPI_L1_ICH / L1 instruction cache hits
PAPI_L1_ICM / Level 1 instruction cache misses
PAPI_L1_ICR / L1 instruction cache reads
PAPI_L1_ICW / L1 instruction cache writes
PAPI_L1_LDM / Level 1 load misses
PAPI_L1_STM / Level 1 store misses
PAPI_L1_TCA / L1 total cache accesses
PAPI_L1_TCH / L1 total cache hits
PAPI_L1_TCM / Level 1 cache misses
PAPI_L1_TCR / L1 total cache reads
PAPI_L1_TCW / L1 total cache writes
PAPI_L2_DCA / L2 data cache accesses
PAPI_L2_DCH / L2 data cache hits
PAPI_L2_DCM / Level 2 data cache misses
PAPI_L2_DCR / L2 data cache reads
PAPI_L2_DCW / L2 data cache writes
PAPI_L2_ICA / L2 instruction cache accesses
PAPI_L2_ICH / L2 instruction cache hits
PAPI_L2_ICM / Level 2 instruction cache misses
PAPI_L2_ICR / L2 instruction cache reads
PAPI_L2_ICW / L2 instruction cache writes
PAPI_L2_LDM / Level 2 load misses
PAPI_L2_STM / Level 2 store misses
PAPI_L2_TCA / L2 total cache accesses
PAPI_L2_TCH / L2 total cache hits
PAPI_L2_TCM / Level 2 cache misses
PAPI_L2_TCR / L2 total cache reads
PAPI_L2_TCW / L2 total cache writes
PAPI_L3_DCA / L3 data cache accesses
PAPI_L3_DCH / L3 data cache hits
PAPI_L3_DCM / Level 3 data cache misses
PAPI_L3_DCR / L3 data cache reads
PAPI_L3_DCW / L3 data cache writes
PAPI_L3_ICA / L3 instruction cache accesses
PAPI_L3_ICH / L3 instruction cache hits
PAPI_L3_ICM / Level 3 instruction cache misses
PAPI_L3_ICR / L3 instruction cache reads
PAPI_L3_ICW / L3 instruction cache writes
PAPI_L3_LDM / Level 3 load misses
PAPI_L3_STM / Level 3 store misses
PAPI_L3_TCA / L3 total cache accesses
PAPI_L3_TCH / L3 total cache hits
PAPI_L3_TCM / Level 3 cache misses
PAPI_L3_TCR / L3 total cache reads
PAPI_L3_TCW / L3 total cache writes
Data Access:
PAPI_LD_INS / Load instructions
PAPI_LST_INS / Load/store instructions completed
PAPI_LSU_IDL / Cycles load/store units are idle
PAPI_MEM_RCY / Cycles stalled waiting for memory reads
PAPI_MEM_SCY / Cycles stalled waiting for memory accesses
PAPI_MEM_WCY / Cycles stalled waiting for memory writes
PAPI_PRF_DM / Data prefetch cache misses
PAPI_RES_STL / Cycles stalled on any resource
PAPI_SR_INS / Store instructions
PAPI_STL_CCY / Cycles with no instructions completed
PAPI_STL_ICY / Cycles with no instruction issue
PAPI_SYC_INS / Synchronization instructions completed
TLB Operations:
PAPI_TLB_DM / Data translation lookaside buffer misses
PAPI_TLB_IM / Instruction translation lookaside buffer misses
PAPI_TLB_SD / Translation lookaside buffer shootdowns
PAPI_TLB_TL / Total translation lookaside buffer misses

AUTHOR

Nils Smeds

BUGS

The exact semantics of an event counter are platform dependent. PAPI preset names are mapped onto the available events so that each preset counts event types that are as similar as possible across platforms. Because of differences in hardware implementation, counts of a particular PAPI event obtained on different hardware platforms are not necessarily directly comparable.

SEE ALSO

PAPI, PAPI_query_event
The PAPI Web Site: http://icl.cs.utk.edu/projects/papi

NAME

PAPI_read, PAPI_accum - read hardware events, accumulate and reset hardware events from an event set

SYNOPSIS

C Interface

#include <papi.h>

int PAPI_read(int EventSet, long_long *values);

int PAPI_accum(int EventSet, long_long *values);

Fortran Interface

#include "fpapi.h"

PAPIF_read(C_INT EventSet, C_LONG_LONG(*) values, C_INT check)

PAPIF_accum(C_INT EventSet, C_LONG_LONG(*) values, C_INT check)

DESCRIPTION

PAPI_read() copies the counters of the indicated event set into the array values. The counters are left counting after the read.

PAPI_accum() adds the counters of the indicated event set into the array values. The counters are zeroed and left counting after the operation.
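
The difference between the two calls can be seen in a small simulation. The sketch below does not call PAPI: the demo_ functions and the demo_eventset type are hypothetical stand-ins that only mimic the semantics described above, with demo_tick playing the role of the hardware advancing the counters.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_NUM_EVENTS 2   /* assumed number of events in the set */

/* Hypothetical stand-in for an event set: one running count per event. */
typedef struct {
    long long counters[DEMO_NUM_EVENTS];
} demo_eventset;

/* Simulate the hardware counting: add delta[i] to each running counter. */
void demo_tick(demo_eventset *es, const long long *delta)
{
    assert(es != NULL && delta != NULL);
    for (int i = 0; i < DEMO_NUM_EVENTS; i++)
        es->counters[i] += delta[i];
}

/* Model of PAPI_read(): copy the counters into values and leave the
 * counters untouched, so they keep counting from their current value. */
void demo_read(const demo_eventset *es, long long *values)
{
    assert(es != NULL && values != NULL);
    for (int i = 0; i < DEMO_NUM_EVENTS; i++)
        values[i] = es->counters[i];
}

/* Model of PAPI_accum(): add the counters into values, then zero the
 * counters so the next interval starts from zero. */
void demo_accum(demo_eventset *es, long long *values)
{
    assert(es != NULL && values != NULL);
    for (int i = 0; i < DEMO_NUM_EVENTS; i++) {
        values[i] += es->counters[i];
        es->counters[i] = 0;
    }
}
```

In this model, repeated reads return a growing running total, while repeated accumulations return the same total by summing per-interval counts. One consequence is that the array passed to an accumulate call must be initialized by the caller, since its previous contents are added to rather than overwritten.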

ARGUMENTS

EventSet -- an integer handle for a PAPI Event Set as created by PAPI_create_eventset