ProbeVue in Action: Identifying a crashing Process

Recently, our monitoring reported a full /var file system on one of our systems. We detected that core files in the directory /var/adm/core had filled the file system. It quickly turned out that all core files came from perl. However, based on the core files we could not determine which perl script had caused the crash of perl. A look at the timestamps of the core files unfortunately showed no pattern:

-bash-4.4$ ls -ltr /var/adm/core
total 2130240
drwxr-xr-x    2 root     system          256 Jan 29 10:20 lost+found/
-rw-------    1 root     system    100137039 Jun 26 04:51 core.22610328.26025105.Z
-rw-------    1 root     system     99054991 Jun 26 06:21 core.21102892.26042104.Z
-rw-------    1 root     system     99068916 Jun 26 08:06 core.18153840.26060607.Z
-rw-------    1 root     system    100132866 Jun 26 08:21 core.19005848.26062105.Z
-rw-------    1 root     system     97986020 Jun 26 16:36 core.15270246.26143608.Z
-rw-------    1 root     system     99208958 Jun 26 22:21 core.22675838.26202106.Z
-rw-------    1 root     system     97557063 Jun 27 01:06 core.5505292.26230604.Z
-rw-------    1 root     system     98962499 Jun 27 10:06 core.8257960.27080603.Z
-rw-------    1 root     system     99804173 Jun 27 14:51 core.18940202.27125107.Z
-rw-------    1 root     system     99633676 Jun 28 03:21 core.17563960.28012107.Z
-rw-------    1 root     system     99116032 Jun 28 19:06 core.8651210.28170608.Z
-bash-4.4$

Also, the entries in the error report provided no information about the crashing perl script and how it was started.

-bash-4.4$ sudo errpt -j A924A5FC –a
...
---------------------------------------------------------------------------
LABEL:          CORE_DUMP
IDENTIFIER:     A924A5FC

Date/Time:       Wed May 29 15:21:25 CEST 2019
Sequence Number: 17548
Machine Id:      XXXXXXXXXXXX
Node Id:         XXXXXXXX
Class:           S
Type:            PERM
WPAR:            Global
Resource Name:   SYSPROC        

Description
SOFTWARE PROGRAM ABNORMALLY TERMINATED

Probable Causes
SOFTWARE PROGRAM

User Causes
USER GENERATED SIGNAL

        Recommended Actions
        CORRECT THEN RETRY

Failure Causes
SOFTWARE PROGRAM

        Recommended Actions
        RERUN THE APPLICATION PROGRAM
        IF PROBLEM PERSISTS THEN DO THE FOLLOWING
        CONTACT APPROPRIATE SERVICE REPRESENTATIVE

Detail Data
SIGNAL NUMBER
         11
USER'S PROCESS ID:
              13369662
FILE SYSTEM SERIAL NUMBER
           1
INODE NUMBER
                 69639
CORE FILE NAME
/var/adm/core/core.13369662.29132106
PROGRAM NAME
perl
STACK EXECUTION DISABLED
           0
COME FROM ADDRESS REGISTER

PROCESSOR ID
  hw_fru_id: 1
  hw_cpu_id: 19

ADDITIONAL INFORMATION

Unable to generate symptom string.
Too many stack elements.
-bash-4.4$

The only information that could be found was that the processes were terminated with signal 11 (SIGSEGV), that is, due to an access to an invalid memory address.

The question arose: how to determine which perl script was crashing and how it is started.

This should be a task for ProbeVue.

The sysproc provider, which generates an event in case of an exit of a process, seemed to be the right probe to use. The special built-in variable __exitinfo provides more detailed information about the exit, such as exit status or the signal number that terminated the process. This can be used to write the following clause:

1: @@sysproc:exit:*
2: when ( __exitinfo->signo == 11 )
3: {
4:         printf( "%llu:  %s\n" , __pid , __pname );
5:         ptree(10);
6: }

The 6 lines are briefly explained here:

  1. The probe point: provider is sysproc, event is exit, * means any process.
  2. By using the above predicate, the subsequent action block is executed only when the process was terminated with signal 11 (SIGSEGV).
  3. Start of the action block.
  4. Output of the PID and the program name of the process.
  5. The function ptree outputs the father, grandfather, etc. (up to 10 levels) of the process.
  6. End of the action block.

Unfortunately, no program arguments can be listed here, which in our case would have given the name of the perl script. But at least we get informations from the function ptree about the way the program was called, which is sufficient in some cases to ultimately identify the program.

For identification we would like to have the information about the arguments with which perl was called. This information is provided by the syscall provider for the system call execve with which the program is started. The probe point is thus syscall: *: execve: entry, since the arguments are known when entering the function. The signature of execve for ProbeVue looks like this:

int execve( char* , struct arg_t* args , char* );

Here, the first argument (provided by ProbeVue as __arg1) is the program name. The second argument is a structure with the arguments in question (provided by __arg2). The third argument gives access to environment variables, which is not important in our case. The structure struct arg_t looks like this for 5 arguments:

struct arg_t
{
        union
        {
                char* arg[5];
                int num[5];
        } u;
};

This structure and the signature of execve must be declared in the ProbeVue script before they can be used.

When accessing the arguments, there is another small problem: if the action block for our sample is addressed, we are in kernel mode, but the arguments themselves are addresses in the user mode of the process. The data (in this case character strings) must be copied out of the user address space. This is done by the function get_userstring.

For every execve we get the PID, the program name, the command and up to 5 arguments. This is implemented in the following program:

#! /usr/bin/probevue

struct arg_t
{
        union
        {
                char* arg[5];
                int num[5];
        } u;
};

int execve( char* , struct arg_t* args , char* );

@@syscall:*:execve:entry
{
        __auto String command[128];
        __auto String argument[128];
        __auto struct arg_t argv;
        copy_userdata( __arg2 , argv );
        command = get_userstring( __arg1 , -1 );
        argument = get_userstring( argv.u.arg[0] , -1 );
        printf( "%llu: %s called execve(%s) with arguments: %s " , __pid , __pname , command , argument )
;
        if ( argv.u.num[1] != 0 )
        {
                argument = get_userstring( argv.u.arg[1] , -1 );
                printf( "%s " , argument );
                if ( argv.u.num[2] != 0 )
                {
                        argument = get_userstring( argv.u.arg[2] , -1 );
                        printf( "%s " , argument );
                        if ( argv.u.num[3] != 0 )
                        {
                                argument = get_userstring( argv.u.arg[3] , -1 );
                                printf( "%s " , argument );
                                if ( argv.u.num[4] != 0 )
                                {
                                        argument = get_userstring( argv.u.arg[4] , -1 );
                                        printf( "%s " , argument );
                                }
                        }
                }
        }
        printf( "\n" );
}

@@sysproc:exit:*
when ( __exitinfo->signo == 11 )
{
        printf( "%llu:  %s\n" , __pid , __pname );
        ptree(10);
}

We called the script capture_segv.e and made it executable.

In theory, after startup the program should output all starting programs with PID, name and up to 5 arguments. In addition, an output occurs when a process is terminated with signal 11 (SIGSEGV). The corresponding PID can then be matched further up in the output and thus the program with arguments can be identified.

Unfortunately, the following small problem arises in practice: if a program is terminated very quickly after the execve, before ProbeVue can copy the arguments with get_userstring, get_userstring accesses a no longer existing address and the ProbeVue script is aborted. We bypassed this by simply starting the ProbeVue script over and over in an infinite loop:

# while true; do ./capture_segv.e >>/tmp/wait_for_segv ; done

We then ran the ProbeVue script for a few hours, until the next crash of the perl script. The file /tmp/wait_for_segv contained about 23,000 lines! We have listed only the relevant lines here:

# cat /tmp/wait_for_segv
…
8651210: ksh called execve(xxxx_hacheck.pl) with arguments: xxxx_hacheck.pl -c
8651210: ksh called execve(/var/opt/OV/bin/instrumentation/xxxx_hacheck.pl) with arguments: xxxx_hacheck
.pl -c
20054518: ksh called execve(/bin/proc2mon.pl) with arguments: proc2mon.pl
…
8651210:  perl

     PID              CMD
       1              init
                        |
                        V
9634196              ovcd
                        |
                        V
9765232              opcacta
                        |
                        V
8651210              perl    <=======
…

One can see that perl was started via the program opcacta, which was started by ovcd. These processes belong to HP OpenView that is in use here. At the top of the output one can see that the perl script /var/opt/OV/bin/instrumentation/xxxx_hacheck.pl has been started. So we found the script that generates the many core files.

The script was written recently and obviously needs to be re-examined and reworked.

With the help of ProbeVue a short script and several hours of waiting was enough to find the cause of the problem! ProbeVue is not only useful for investigating problems; also in performance monitoring, ProbeVue proves to be extremely helpful.

Resources of not activated LPARs and Memory Affinity

When an LPAR is shut down, resources such as processors, memory, and I/O slots are not automatically released by the LPAR. The resources remain assigned to the LPAR and are then reused on the next activation (with the current configuration). In the first part of the article Resources of not activated LPARs we had already looked at this.

(Note: In the example output, we use version 1.4 of the LPAR tool, but in all cases we show the underlying commands on the HMC command line, so you can try everything without using the LPAR tool.)

The example LPAR lpar1 was shut down, but currently still occupies 100 GB of memory:

linux $ lpar status lpar1
NAME   LPAR_ID  LPAR_ENV  STATE          PROFILE   SYNC  RMC       PROCS  PROC_UNITS  MEM     OS_VERSION
lpar1  39       aixlinux  Not Activated  standard  0     inactive  1      0.2         102400  Unknown
linux $

The following commands for the output above were executed on the corresponding HMC hmc01:

hmc01: lssyscfg -r lpar -m ms09 --filter lpar_names=lpar1
hmc01: lshwres -r mem -m ms09 --level lpar --filter lpar_names=lpar1
hmc01: lshwres -r proc -m ms09 --level lpar --filter lpar_names=lpar1

As the output shows, the LPAR lpar1 has still allocated its resources (processors, memory, I/O adapters).

In order to understand why deactivating an LPAR does not release the resources, you have to look at the “Memory Affinity Score”:

linux $ lpar lsmemopt lpar1
             LPAR_SCORE  
LPAR_NAME  CURR  PREDICTED
lpar1      100   0
linux $

HMC command line:

hmc01: lsmemopt -m ms09 -r lpar -o currscore –filter lpar_names=lpar1

The Memory Affinity Score describes how close processors and memory are, the closer the memory to the memory is, the better is the throughput to the memory. The command above indicates, with a value between 1 and 100, how big the affinity between processors and LPARs is. Our LPAR lpar1 currently has a value of 100, which means the best possible affinity of memory and processors. If the resources were freed when deactivating an LPAR, then the LPAR would lose this Memory Affinity Score. The next time you enable the LPAR, it then depends on the memory and processors available then how good the memory affinity will be then. We release the resources once:

linux $ lpar -d rmprocs lpar1 1
linux $

HMC command line:

hmc01: chhwres -m ms09 -r proc  -o r -p lpar1 --procs 1

No more score will be given, since the LPAR has no longer allocated any resources:

linux $ lpar lsmemopt lpar1
             LPAR_SCORE  
LPAR_NAME  CURR  PREDICTED
lpar1      none  none
linux $

HMC command line:

hmc01: lsmemopt -m ms09 -r lpar -o currscore –filter lpar_names=lpar1

Now we allocate resources again and look at the effect this has on memory affinity:

linux $ lpar applyprof lpar1 standard
linux $

HMC command line:

hmc01: chsyscfg -r lpar -m ms09 -o apply -p lpar1 -n standard

We again determine the Memory Affinity Score:

linux $ lpar lsmemopt lpar1
             LPAR_SCORE  
LPAR_NAME  CURR  PREDICTED
lpar1      53    0
linux $

HMC command line:

hmc01: lsmemopt -m ms09 -r lpar -o currscore –filter lpar_names=lpar1

The score is now only 53, the performance of the LPAR has become worse. Whether and how much this is noticeable, depends ultimately on the applications on the LPAR.

The fact that the resources are not released when deactivating an LPAR, thus guarantees the next time you activate (with the current configuration) the memory affinity remains the same and thus the performance should be the same.

If you release the resources of an LPAR (manually or automatically), then you have to realize that this has an effect on the LPAR if it is later activated again, because then the resources are reassigned and a worse (but possibly also a better) Memory Affinity Score can result.

Conversely, before activating a new LPAR you can also make sure that there is a good chance for a high memory affinity score for the new LPAR by releasing resources of inactive LPARs.

(Note: resource distribution can be changed and improved at runtime using the Dynamic Platform Optimizer DPO. DPO is supported as of POWER8.)

 

Resources of not activated LPARs

When an LPAR is shutdown, resources such as processors, memory, and I/O slots are not automatically released by the LPAR. The resources remain assigned to the LPAR and are reused on the next activation (with the current configuration).

The article will show how such resources are automatically released and, if desired, how to manually release resources of an inactive LPAR.

(Note: In the example output, we use version 1.4 of the LPAR tool, but in all cases we show the underlying commands on the HMC command line, so you can try everything without using the LPAR tool.)

The example LPAR lpar1 was shut down, but currently still occupies 100 GB of memory:

linux $ lpar status lpar1
NAME   LPAR_ID  LPAR_ENV  STATE          PROFILE   SYNC  RMC       PROCS  PROC_UNITS  MEM     OS_VERSION
lpar1  39       aixlinux  Not Activated  standard  0     inactive  1      0.2         102400  Unknown
linux $

The following commands for the output above were executed on the corresponding HMC hmc01:

hmc01: lssyscfg -r lpar -m ms09 --filter lpar_names=lpar1
hmc01: lshwres -r mem -m ms09 --level lpar --filter lpar_names=lpar1
hmc01: lshwres -r proc -m ms09 --level lpar --filter lpar_names=lpar1

The resource_config attribute of an LPAR indicates whether the LPAR has currently allocated resources (resource_config=1) or not (resource_config=0):

linux $ lpar status -F resource_config lpar1
1
linux $

Or on the HMC command line:

hmc01: lssyscfg -r lpar -m ms09 --filter lpar_names=lpar1 –F resource_config

The resources allocated by an not activated LPAR can be released in 2 different ways:

  1. Automatic: The resources used are needed by another LPAR, e.g. because memory is expanded dynamically or an LPAR is activated that does not have sufficient resources. In this case, resources are automatically removed from a not activated LPAR. We will show this below with an example.
  2. Manual: The allocated resources are explicitly released by the administrator. This is also shown below in an example.

First we show an example in which resources are automatically taken away from an not activated LPAR.

The managed system ms09 currently has about 36 GB free memory:

linux $ ms lsmem ms09
NAME  INSTALLED  FIRMWARE  CONFIGURABLE  AVAIL  MEM_REGION_SIZE
ms09  786432     33792     786432        36352  256
linux $

HMC command line:

hmc01: lshwres -r mem -m ms09 --level sys

We start an LPAR (lpar2) which was configured with 100 GB of RAM. The managed system has only 36 GB of RAM and is therefore forced to take resources from inactive LPARs in order to provide the required 100 GB. We start lpar2 with the profile standard and look at the memory relations:

linux $ lpar activate -b sms -p standard lpar2
linux $

HMC command line:

hmc01: chsysstate -m ms09 -r lpar -o on -n lpar2 -b sms -f standard

Overview of the storage relations of lpar1 and lpar2:

linux $ lpar status lpar\*
NAME   LPAR_ID  LPAR_ENV  STATE          PROFILE   SYNC  RMC       PROCS  PROC_UNITS  MEM     OS_VERSION
lpar1  4        aixlinux  Not Activated  standard  0     inactive  1      0.2         60160   Unknown
lpar2  8        aixlinux  Open Firmware  standard  0     inactive  1      0.2         102400  Unknown
linux $ ms lsmem ms09
NAME  INSTALLED  FIRMWARE  CONFIGURABLE  AVAIL  MEM_REGION_SIZE
ms09  786432     35584     786432        0      256
linux $

HMC command line:

hmc01: lssyscfg -r lpar -m ms09
hmc01: lshwres -r mem -m ms09 --level lpar
hmc01: lshwres -r proc -m ms09 --level lpar
hmc01: lshwres -r mem -m ms09 --level sys

The LPAR lpar2 has 100 GB RAM, the managed system has no more free memory and the memory allocated by LPAR lpar1 has been reduced to about 60 GB. Allocated resources from non-activated LPARs are automatically released, when needed and assigned to other LPARs.

But you can of course also release the resources manually. This is also shown briefly here. We are reducing the memory of LPAR lpar1 by 20 GB:

linux $ lpar -d rmmem lpar1 20480
linux $

HMC command line:

hmc01: chhwres -m ms09 -r mem  -o r -p lpar1 -q 20480

As stated, the allocated memory has been reduced by 20 GB:

linux $ lpar status lpar\*
NAME   LPAR_ID  LPAR_ENV  STATE          PROFILE   SYNC  RMC       PROCS  PROC_UNITS  MEM     OS_VERSION
lpar1  4        aixlinux  Not Activated  standard  0     inactive  1      0.2         39680   Unknown
lpar2  8        aixlinux  Open Firmware  standard  0     inactive  1      0.2         102400  Unknown
linux $ ms lsmem ms09
NAME  INSTALLED  FIRMWARE  CONFIGURABLE  AVAIL  MEM_REGION_SIZE
ms09  786432     35584     786432        20480  256
linux $

HMC command line:

hmc01: lssyscfg -r lpar -m ms09
hmc01: lshwres -r mem -m ms09 --level lpar
hmc01: lshwres -r proc -m ms09 --level lpar
hmc01: lshwres -r mem -m ms09 --level sys

The 20 GB are immediately available to the managed system as free memory. If you remove the entire memory or all processors (or processor units), then all resources of an inactive LPAR are released:

linux $ lpar -d rmmem lpar1 39680
linux $

HMC command line:

hmc01: chhwres -m ms09 -r mem  -o r -p lpar1 -q 39680

Here are the resulting memory relations:

linux $ lpar status lpar\*
NAME   LPAR_ID  LPAR_ENV  STATE          PROFILE   SYNC  RMC       PROCS  PROC_UNITS  MEM     OS_VERSION
lpar1  4        aixlinux  Not Activated  standard  0     inactive  0      0.0         0       Unknown
lpar2  8        aixlinux  Open Firmware  standard  0     inactive  1      0.2         102400  Unknown
linux $ ms lsmem ms09
NAME        INSTALLED  FIRMWARE  CONFIGURABLE  AVAIL  MEM_REGION_SIZE
ms09  786432     31232     786432        64512  256
linux $

HMC command line:

hmc01: lssyscfg -r lpar -m ms09
hmc01: lshwres -r mem -m ms09 --level lpar
hmc01: lshwres -r proc -m ms09 --level lpar
hmc01: lshwres -r mem -m ms09 --level sys

The LPAR lpar1 now has 0 processors, 0.0 processor units and 0 MB of memory! In addition, the resource_config attribute now has the value 0, which indicates that the LPAR no longer has any resources configured!

linux $ lpar status -F resource_config lpar1
0
linux $

HMC command line:

hmc01: lssyscfg -r lpar -m ms09 --filter lpar_names=lpar1 –F resource_config

Finally, the question arises as to why you should release resources manually if they are automatically released by the managed system when needed?

We will answer this question in a second article.