Error: 0509-022 Cannot load module

Recently, we have installed yum from the AIX Toolbox (www.ibm.com/support/pages/aix-toolbox-linux-applications-overview) on some of our systems. So far, we have installed RPM packages from Perzl (www.perzl.org). When upgrading to yum, some RPMs were updated with versions from the AIX Toolbox. For some of the RPMs, there were problems with dynamic libraries afterwards.

Here we show one of the issues with the curl program:

$ curl
exec(): 0509-036 Cannot load program curl because of the following errors:
        0509-022 Cannot load module /opt/freeware/lib/libcurl.a(libcurl.so.4).
        0509-150   Dependent module /opt/freeware/lib/libcrypto.a(libcrypto.so) could not be loaded.
        0509-152   Member libcrypto.so is not found in archive 
        0509-022 Cannot load module curl.
        0509-150   Dependent module /opt/freeware/lib/libcurl.a(libcurl.so.4) could not be loaded.
        0509-022 Cannot load module .
$

Curl was working before installing the RPM packages. The command ldd for listing dynamic dependencies provides a similar error message:

$ ldd /usr/bin/curl
/usr/bin/curl needs:
         /usr/lib/libc.a(shr.o)
         /opt/freeware/lib/libcurl.a(libcurl.so.4)
         /opt/freeware/lib/libz.a(libz.so.1)
         /unix
         /usr/lib/libcrypt.a(shr.o)
         /opt/freeware/lib/libcrypto.a(libcrypto.so)
ar: 0707-109 Member name libcrypto.so does not exist.
dump: /tmp/tmpdir31457524/extract/libcrypto.so: 0654-106 Cannot open the specified file.
         /opt/freeware/lib/libssl.a(libssl.so)
ar: 0707-109 Member name libssl.so does not exist.
dump: /tmp/tmpdir31457524/extract/libssl.so: 0654-106 Cannot open the specified file.
$

The file /opt/freeware/lib/libcrypto.a is an archive and the error message says that the shared object libcrypto.so does not exist in this archive. This can easily be checked using the command ar:

$ ar -X any tv /opt/freeware/lib/libcrypto.a
rwxr-xr-x     0/0     3036810 Apr 08 18:46 2014 libcrypto.so.1.0.1
rwxr-xr-x     0/0     3510308 Apr 08 18:41 2014 libcrypto.so.1.0.1
rw-r--r--     0/0     2012251 Apr 08 18:46 2014 libcrypto.so.0.9.7
rw-r--r--     0/0     2491620 Apr 08 18:46 2014 libcrypto.so.0.9.8
rwxr-xr-x     0/0     2921492 Apr 08 18:46 2014 libcrypto.so.1.0.0
rw-r--r--     0/0     2382757 Apr 08 18:46 2014 libcrypto.so.0.9.7
rw-r--r--     0/0     2923920 Apr 08 18:46 2014 libcrypto.so.0.9.8
rwxr-xr-x     0/0     3381316 Apr 08 18:46 2014 libcrypto.so.1.0.0
$

(The option “-X any” ensures that both 32– and 64-bit objects are displayed)

The shared library libcrypto.so is obviously not in the archive! The archive is part of Perzl’s OpenSSL package:

$ rpm -qf /opt/freeware/lib/libcrypto.a
openssl-1.0.1g-1.ppc
$

The OpenSSL package is the cause of the problems when calling curl and possibly other programs. Perzl’s packages have an OpenSSL package, which provides OpenSSL  at /opt/freeware. The IBM packages from the AIX Toolbox, however, use the operating system’s OpenSSL under /usr/lib. There is no OpenSSL RPM package in the AIX Toolbox. Perzl’s OpenSSL package should be removed.

# rpm –e openssl
#

Of course, if you have RPM packages (Perzl) that depend on OpenSSL, removing the OpenSSL RPM will fail:

# rpm -e openssl
error: Failed dependencies:
            openssl >= 0.9.8 is needed by (installed) openldap-2.4.23-0.3.ppc
#

In this case, there are 2 options: either uninstall the dependent package as well, or update the package with a version from the AIX toolbox. Here we briefly show the second case:

# yum update openldap
AIX_Toolbox                                                                                   | 2.9 kB  00:00:00    
AIX_Toolbox_71                                                                                | 2.9 kB  00:00:00    
AIX_Toolbox_noarch                                                                            | 2.9 kB  00:00:00    
Setting up Update Process
Resolving Dependencies
--> Running transaction check
---> Package openldap.ppc 0:2.4.23-0.3 will be updated
---> Package openldap.ppc 0:2.4.46-1 will be an update
...

Is this ok [y/N]: y

...

Updated:

  openldap.ppc 0:2.4.46-1                                                                                           

Complete!
#

Afterwards, the OpenSSL RPM can be uninstalled:

# rpm -e openssl
#

Now curl can be started again without a problem:

$ curl
curl: try 'curl --help' or 'curl --manual' for more information
$

Conclusion: When switching to the AIX Toolbox, any existing OpenSSL RPM should be uninstalled!

 

Virtual Processor Folding

A simple and direct way to see which processors are active and which processors are not available due to “virtual processor folding” is the kernel debugger kdb.

The following command displays the desired information:

# echo vpm | kdb
...
VSD Thread State.
CPU  CPPR VP_STATE FLAGS SLEEP_STATE  PROD_TIME: SECS   NSECS     CEDE_LAT

   0     0  ACTIVE      0 AWAKE        0000000000000000  00000000  00  
   1     0  ACTIVE      0 AWAKE        0000000000000000  00000000  00  
   2     0  ACTIVE      0 AWAKE        0000000000000000  00000000  00  
   3     0  ACTIVE      0 AWAKE        0000000000000000  00000000  00  
   4     0  DISABLED    0 AWAKE        0000000000000000  00000000  00  
   5    11  DISABLED    0 SLEEPING     000000005D9F15BE  0F77D2C6  02  
   6    11  DISABLED    0 SLEEPING     000000005D9F15BE  0F77D0C8  02  
   7    11  DISABLED    0 SLEEPING     000000005D9F15BE  217D3F61  02  

#

The output was created on a system with 2 processor cores and SMT4. CPUs 4-7 are currrently DISABLED, which is the second processor core.

Building your own AIX installp packages (Part 1)

This blog post will show you how to build installp packages on AIX. The use of the command mkinstallp is demonstrated by a simple example. More complex packages will be covered in a later article.

It is relatively easy to build installp packages under AIX. This requires the command /usr/sbin/mkinstallp. If the command is not installed, the package bos.adt.insttools must be installed. Building a package requires root privileges.

First of all, we create a directory somewhere, in which the package is to be built:

# mkdir pwrcmps.installp.example
#

The package should contain a small shell script:

# cat <<EOF >hello
#! /bin/ksh
print "hello world"
EOF
# chmod a+x hello
#

The script will later be installed under /usr/local/bin. Files to be installed must be in the same place relative to the build directory (pwrcmps.installp.example in our case), as later relative to the root directory. We therefore create the necessary directory structure and copy the hello script to the appropriate location:

# mkdir –p pwrcmps.installp.example/usr/local/bin
# cp hello pwrcmps.installp.example/usr/local/bin
#

We change to the build directory and start the command mkinstallp to build the package:

# cd pwrcmps.installp.example
# mkinstallp
Using /export/src/installp/pwrcmps.installp.example as the base package directory.
Cannot find /export/src/installp/pwrcmps.installp.example/.info. Attempting to create.
Using /export/src/installp/pwrcmps.installp.example/.info to store package control files.
Cleaning intermediate files from /export/src/installp/pwrcmps.installp.example/.info.

************************************************************
|            Beginning interactive package input           |
|   * - required; () - example value; [] - default value   |
************************************************************

* Package Name (xyz.net) []: pwrcmps.installp
* Package VRMF (1.0.0.0) []: 1.0.0.0
Update (Y/N) [N]:
Number of filesets in pwrcmps.installp (1) [1]:

Gathering info for new fileset (1 remaining)
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
* Fileset Name (pwrcmps.installp.rte) []: pwrcmps.installp.example
* Fileset VRMF (1.0.0.0) []: 1.0.0.0
* Fileset Description (some text) []: Example Fileset
Do you want to include a copyright file for this fileset? (Y/N) [N]:
Entering information for the USER part liblpp files
Do you want to include an installp pre_i/u script for this fileset? (Y/N) [N]:
Do you want to include an installp post_i/u script for this fileset? (Y/N) [N]:
Do you want to include a pre-deinstall Script for this fileset? (Y/N) [N]:
Do you want to include an installp pre_rm script for this fileset? (Y/N) [N]:
Do you want to include an installp config script for this fileset? (Y/N) [N]:
Would you like to create ROOT part? (Y/N) [N]:
Bosboot required (Y/N) [N]:
License agreement acceptance required (Y/N) [N]:
Include license files for pwrcmps.installp.example in this package (Y/N) [N]:
Do you want to specify Requisites using a file for this fileset? (Y/N) [N]:
Number of Requisites for pwrcmps.installp.example (1) [0]:
Number of filesystems requiring additional space for pwrcmps.installp.example [0]:

You should include any directories that you are creating in the file count.
(ie: For /usr/proj/myFile, enter 2; 1 for /usr/proj and 1 for /usr/proj/myFile)
Number of USR part files in pwrcmps.installp.example (1) [0]: 1
* 1 of 1. Directory or File Path (/usr/proj/myFile) []: /usr/local/bin/hello
Do you want to make this fileset relocatable? (Y/N) [N]:
Do you want to include an override inventory file for this fileset? (Y/N) [N]:
Do you want to add WPAR attributes to your fileset? (Y/N) [N]:


Using /export/src/installp/pwrcmps.installp.example/.info/pwrcmps.installp.template as the template file.
pwrcmps.installp 1.0.0.0 I
processing pwrcmps.installp.example
creating ./.info/liblpp.a
creating ./tmp/pwrcmps.installp.1.0.0.0.bff
#

The built end product can be found in the subdirectory tmp:

# ls –l tmp
total 8
-rw-r--r--    1 root     system         2560 Sep 25 09:49 pwrcmps.installp.1.0.0.0.bff
#

When creating the package, we have always confirmed the default values for simplicity. The product name is pwrcmps.installp and the file set name is pwrcmps.installp.example, each with version 1.0.0.0. All files and directories of the package must be listed explicitly with absolute path! This can be a bit impractical for a few hundred paths, but it can be simplified and automated by using scripts.

We install the newly generated package once, and check if our shell script is also installed:

# installp -ad tmp/pwrcmps.installp.1.0.0.0.bff all
+-----------------------------------------------------------------------------+
                    Pre-installation Verification...
+-----------------------------------------------------------------------------+
Verifying selections...done
Verifying requisites...done
Results...

SUCCESSES
---------
  Filesets listed in this section passed pre-installation verification
  and will be installed.

  Selected Filesets
  -----------------
  pwrcmps.installp.example 1.0.0.0            # Example Fileset

  << End of Success Section >>

+-----------------------------------------------------------------------------+
                   BUILDDATE Verification ...
+-----------------------------------------------------------------------------+
Verifying build dates...done
FILESET STATISTICS
------------------
    1  Selected to be installed, of which:
        1  Passed pre-installation verification
 ----
    1  Total to be installed

+-----------------------------------------------------------------------------+
                         Installing Software...
+-----------------------------------------------------------------------------+

installp:  APPLYING software for:
        pwrcmps.installp.example 1.0.0.0

Finished processing all filesets.  (Total time:  1 secs).

+-----------------------------------------------------------------------------+
                                Summaries:
+-----------------------------------------------------------------------------+

Installation Summary
--------------------
Name                        Level           Part        Event       Result
-------------------------------------------------------------------------------
pwrcmps.installp.example    1.0.0.0         USR         APPLY       SUCCESS   
# which hello
/usr/local/bin/hello
# hello
hello world
#

We are now extending our script by printing the message “hello world, how are you“.

# vi usr/local/bin/hello
…
print "hello world, how are you"
#

The package should now be rebuilt, before we want to increase the version to 1.1.0.0. Of course, we could start the command mkinstallp interactively again and enter all the necessary information again, but this is very time consuming and not necessary. The command mkinstallp supports the specification of a template file with all necessary information. When building the first version of our package, such a template file was generated under .info/pwrcmps.installp.template. We copy this template file directly into the build root directory and rename it to template:

# cp .info/pwrcmps.installp.template template
# cat template
Package Name: pwrcmps.installp
Package VRMF: 1.0.0.0
Update: N
Fileset
  Fileset Name: pwrcmps.installp.example
  Fileset VRMF: 1.0.0.0
  Fileset Description: Example Fileset
  USRLIBLPPFiles
  EOUSRLIBLPPFiles
  Bosboot required: N
  License agreement acceptance required: N
  Include license files in this package: N
  Requisites:
  USRFiles
    /usr/local/bin/hello
  EOUSRFiles
  ROOT Part: N
  ROOTFiles
  EOROOTFiles
  Relocatable: N
EOFileset
#

The template file contains the information that we gave interactively. In this template file we change the version number from 1.0.0.0 to 1.1.0.0:

# vi template
…
Package VRMF: 1.1.0.0
…
  Fileset VRMF: 1.1.0.0
…
#

We now try to build the version 1.1.0.0 by starting mkinstallp with the option -T (for template) and the template file template:

# mkinstallp -T template
Using /export/src/installp/pwrcmps.installp.example as the base package directory.
Using /export/src/installp/pwrcmps.installp.example/.info to store package control files.
Cleaning intermediate files from /export/src/installp/pwrcmps.installp.example/.info.
0503-844 mkinstallp: Cannot overwrite existing
        /export/src/installp/pwrcmps.installp.example/.info/pwrcmps.installp.template file.
#

An error message appears, the template file under .info can not be overwritten. The .info directory is created again and again during the build process and can therefore simply be deleted before the next build process is started:

# rm -rf .info
# mkinstallp -T template
Using /export/src/installp/pwrcmps.installp.example as the base package directory.
Cannot find /export/src/installp/pwrcmps.installp.example/.info. Attempting to create.
Using /export/src/installp/pwrcmps.installp.example/.info to store package control files.
Cleaning intermediate files from /export/src/installp/pwrcmps.installp.example/.info.

Using template as the template file.
0503-880 mkinstallp: This file /usr/local/bin/hello
        already exists as a system file.
pwrcmps.installp 1.1.0.0 I
processing pwrcmps.installp.example
creating ./.info/liblpp.a
creating ./tmp/pwrcmps.installp.1.1.0.0.bff
#

The new version of the package can be found under tmp as before:

# ls -l tmp
total 16
-rw-r--r--    1 root     system         2560 Sep 25 10:08 pwrcmps.installp.1.0.0.0.bff
-rw-r--r--    1 root     system         2560 Sep 25 10:35 pwrcmps.installp.1.1.0.0.bff
#

This makes it easy to make changes and then package them.

TCP connection aborts due to “max assembly queue depth”

Recently we had frequent program crashes from some Java client programs. The following was found in the Java stack trace:

[STACK] Caused by: java.io.IOException: Premature EOF
[STACK]           at sun.net.www.http.ChunkedInputStream.readAheadBlocking(Unknown Source)
[STACK]           at sun.net.www.http.ChunkedInputStream.readAhead(Unknown Source)
[STACK]           at sun.net.www.http.ChunkedInputStream.read(Unknown Source)
[STACK]           at java.io.FilterInputStream.read(Unknown Source)

The problem occurs in the ChunkedInputStream class in the readAheadBlocking method. In the source code of the method you will find:

558 /**
559 * If we hit EOF it means there's a problem as we should never
560 * attempt to read once the last chunk and trailers have been
561 * received.
562 */
563 if (nread < 0) {
564 error = true;
565 throw new IOException("Premature EOF");
566 }

The value nread becomes less than 0 when the end of the data stream is reached. This can happen if the other party closes the connection unexpectedly.

The server side in this case was an AIX system (AIX 7.1 TL5 SP3). A review of the TCP connections for dropdowns using netstat resulted in:

$ netstat -p tcp | grep drop
        361936 connections closed (including 41720 drops)
        74718 embryonic connections dropped
                0 connections dropped by rexmit timeout
                0 connections dropped due to persist timeout
                0 connections dropped by keepalive
        0 packets dropped due to memory allocation failure
        0 Connections dropped due to bad ACKs
        0 Connections dropped due to duplicate SYN packets
        1438 connections dropped due to max assembly queue depth
$

Thus, there were 1438 connection drops because of the maximum TCP assembly queue depth. The queue depth is configured via the new kernel parameter tcp_maxqueuelen, which was introduced as a fix for CVE-2018-6922 (see: The CVE-2018-6922 fix (FreeBSD vulnerability) and scp). The default value is 1000. With larger packet runtimes, the queue may overflow.

After increasing the kernel parameter tcp_maxqueuelen, no connection drops because of max assembly queue have occurred anymore.

 

ProbeVue in Action: Monitoring the Queue Depth of Disks

Disk and storage systems support Tagged Command Queuing, i.e. connected servers can send multiple I/O jobs to the disk or storage system without waiting for older I/O jobs to finish. The number of I/O requests you can send to a disk before you have to wait for older I/O requests to complete can be configured using the hdisk queue_depth attribute on AIX. For many hdisk types, the value 20 for the queue_depth is the default value. In general, most storage systems allow even greater values for the queue depth.

With the help of ProbeVue, the utilization of the disk queue can be monitored very easily.

Starting with AIX 7.1 TL4 or AIX 7.2 TL0, AIX supports the I/O Probe Manager. This makes it easy to trace events in AIX’s I/O stack. If an I/O is started by the disk driver, this is done via the iostart function in the kernel, the request is forwarded to the adapter driver and then passed to the storage system via the host bus adapter. Handling the response is done by the iodone function in the kernel. The I/O Probe Manager supports (among others) probe events at these locations:

@@io:disk:iostart:read:<filter>
@@io::disk:iostart:write:<filter>
@@io:disk:iodone:read:<filter>
@@io::disk:iodone:write:<filter>

As a filter, e.g. a hdisk name like hdisk2 can be specified. The probe points then only trigger events for the disk hdisk2. This allows to perform an action whenever an I/O for a hdisk begins or ends. This would allow to measure how long an I/O operation takes or just to count how many I/Os are executed. In our example, we were interested in the utilization of the disk queue, i.e. the number of I/Os sent to the disk which are not yet completed. The I/O Probe Manager has a built-in variable __diskinfo for the iostart and iodone I/O probe events with the following fields (https://www.ibm.com/support/knowledgecenter/en/ssw_aix_72/com.ibm.aix .genprogc / probevue_man_io.htm):

name          char*     Name of the disk.
…
queue_depth   int       The queue depth of the disk (value from ODM)
cmds_out      int       Number of outstanding I/Os
…

The cmds_out field indicates how many I/Os have already been sent to the disk for which the I/O has not yet been completed (response has not yet arrived at the server).

The following section of code determines the minimum, maximum, and average number of entries in the disk queue:

@@io:disk:iostart:*:hdisk0     // Only I/Os for hdisk0 are considered
{
   queue = __iopath->cmds_out; // Store number of outstanding I/Os in variable queue
   ++numIO;                    // Number of I/Os (used for calculating the average)
   avg += queue;               // Add number of outstanding I/Os to variable avg
   if ( queue < min )
      min = queue;             // Check if minimum
   if ( queue > max )
      max = queue;             // Check if maximum
}

The calculated values are then printed once per second using the interval probe manager:

@@interval:*:clock:1000
{
   if ( numIO == 0 )
      numIO = 1;    // Prevent division by 0 when calculating the average
   if ( min > max )
      min = max;
   printf( "%5d  %5d  %5d\n" , min , avg/numIO , max );
   min = 100000;   // Reset variables for the next interval
   avg = 0;
   max = 0;
   numIO = 0;
}

The full script is available for download on our website: ioqueue.e.

Here is a sample run of the script for the disk hdisk13:

# ./ioqueue.e hdisk13
  min    avg    max
    1      1      2
    1      1      9
    1      1      2
    1      1      8
    1      1      2
    1      1      2
    1      1      8
    1      1     10
    1      1      2
    1      1      1
    1      1     10
    1      1      2
    1      1     11
...

The script expects an hdisk as an argument, and then outputs once per second the values determined for the specified hdisk.

In the example output you can see that the maximum number of entries in the disk queue is 11. An increase of the attribute queue_depth therefore makes no sense from a performance perspective.

Here’s another example:

# ./ioqueue.e hdisk21
  min    avg    max
    9     15     20
   11     17     20
   15     19     20
   13     19     20
   14     19     20
   17     18     20
   18     18     19
   16     19     20
   13     18     20
   18     19     19
   17     19     20
   18     19     20
   17     19     19
...

In this case, the maximum value of 20 (the hdisk21 has a queue_depth of 20) is reached on a regular basis. Increasing the queue_depth can improve throughput in this case.

Of course, the sample script can be expanded in various ways; to determine the throughput in addition, or the waiting time of I/Os in the wait queue, or even the position and size of each I/O on the disk. This example just shows how easy it is to get information about I/Os using ProbeVue.

Full file system: df and du show different space usage

Full file systems occur in practice again and again, everyone knows this. Usually you search for large files or directories and check if older data can be deleted to make space again (but sometimes the file system will be simply enlarged without further investigation). In some cases, however, you can not find any larger files that could be deleted or you discover that file system space is seems to be gone, but you can not identify where this space is used. The command du then displays a smaller value for the file system space used than df. In the following, such an example is shown, as well as the hint how to identify where the filesystem-space is and how it can finally be recovered. AIX has a nice feature to offer that is not found in most other UNIX derivatives.

The file system /var/adm/log is 91% filled, currently 3.6 GB of the file system are in use:

# df -g  /var/adm/log
Filesystem    GB blocks      Free %Used    Iused %Iused Mounted on
/dev/varadmloglv      4.00      0.39   91%      456     1% /var/adm/log
#

A quick check with the command du shows that apparently much less space is occupied:

# du –sm /var/adm/log
950.21   /var/adm/log
#

The command “disk usage” shows only 950 MB occupied space! This is 2.7 GB less than the value from the df command. But where is the missing space?

The difference comes from files that have been deleted but are still open by at least one process. The entry for such files is removed from the associated directory, which makes the file inaccessible. Therefore the command du does not take thes files into account and comes up with a smaller value. As long as a process still has the deleted file in use, however, the associated blocks are not released in the file system, so df correctly displays these as occupied.

So there is at least one file in the file system /var/adm/log which has been deleted but is still open by a process. The question is how to identify the process and the file.

AIX provides an easy way to identify processes that have opened deleted files, the fuser command supports the -d option to list processes that have deleted files open:

# fuser -d /var/adm/log
/var/adm/log:  9110638
#

If you also use the -V option, you will also see information about the deleted files, such as the inode number and file size:

# fuser -dV /var/adm/log
/var/adm/log:
inode=119    size=2882647606   fd=12     9110638
#

The output shows that here the file with the inode number 119 with a size of approximately 2.8 GB was deleted, but is still opened by the process with the PID 9110638 over the file descriptor 12.

Using ps you can quickly find out which process it is:

# ps -ef|grep 9110638
    root  9110638  1770180   0   Nov 20      - 28:28 /usr/sbin/syslogd
    root  8193550  8849130   0 09:13:35  pts/2  0:00 grep 9110638
#

In our case it is the syslogd process. Presumably a log file was rotated via mv without informing the syslogd (refresh -s syslogd). We fix this shortly and check the file system again:

# refresh -s syslogd
0513-095 The request for subsystem refresh was completed successfully.
#
# df -g /var/adm/log
Filesystem    GB blocks      Free %Used    Iused %Iused Mounted on
/dev/varadmloglv      4.00      3.07   24%      455     1% /var/adm/log
#

The output shows that the file system blocks have now been released.

ProbeVue in Action: Identifying a crashing Process

Recently, our monitoring reported a full /var file system on one of our systems. We detected that core files in the directory /var/adm/core had filled the file system. It quickly turned out that all core files came from perl. However, based on the core files we could not determine which perl script had caused the crash of perl. A look at the timestamps of the core files unfortunately showed no pattern:

-bash-4.4$ ls -ltr /var/adm/core
total 2130240
drwxr-xr-x    2 root     system          256 Jan 29 10:20 lost+found/
-rw-------    1 root     system    100137039 Jun 26 04:51 core.22610328.26025105.Z
-rw-------    1 root     system     99054991 Jun 26 06:21 core.21102892.26042104.Z
-rw-------    1 root     system     99068916 Jun 26 08:06 core.18153840.26060607.Z
-rw-------    1 root     system    100132866 Jun 26 08:21 core.19005848.26062105.Z
-rw-------    1 root     system     97986020 Jun 26 16:36 core.15270246.26143608.Z
-rw-------    1 root     system     99208958 Jun 26 22:21 core.22675838.26202106.Z
-rw-------    1 root     system     97557063 Jun 27 01:06 core.5505292.26230604.Z
-rw-------    1 root     system     98962499 Jun 27 10:06 core.8257960.27080603.Z
-rw-------    1 root     system     99804173 Jun 27 14:51 core.18940202.27125107.Z
-rw-------    1 root     system     99633676 Jun 28 03:21 core.17563960.28012107.Z
-rw-------    1 root     system     99116032 Jun 28 19:06 core.8651210.28170608.Z
-bash-4.4$

Also, the entries in the error report provided no information about the crashing perl script and how it was started.

-bash-4.4$ sudo errpt -j A924A5FC –a
...
---------------------------------------------------------------------------
LABEL:          CORE_DUMP
IDENTIFIER:     A924A5FC

Date/Time:       Wed May 29 15:21:25 CEST 2019
Sequence Number: 17548
Machine Id:      XXXXXXXXXXXX
Node Id:         XXXXXXXX
Class:           S
Type:            PERM
WPAR:            Global
Resource Name:   SYSPROC        

Description
SOFTWARE PROGRAM ABNORMALLY TERMINATED

Probable Causes
SOFTWARE PROGRAM

User Causes
USER GENERATED SIGNAL

        Recommended Actions
        CORRECT THEN RETRY

Failure Causes
SOFTWARE PROGRAM

        Recommended Actions
        RERUN THE APPLICATION PROGRAM
        IF PROBLEM PERSISTS THEN DO THE FOLLOWING
        CONTACT APPROPRIATE SERVICE REPRESENTATIVE

Detail Data
SIGNAL NUMBER
         11
USER'S PROCESS ID:
              13369662
FILE SYSTEM SERIAL NUMBER
           1
INODE NUMBER
                 69639
CORE FILE NAME
/var/adm/core/core.13369662.29132106
PROGRAM NAME
perl
STACK EXECUTION DISABLED
           0
COME FROM ADDRESS REGISTER

PROCESSOR ID
  hw_fru_id: 1
  hw_cpu_id: 19

ADDITIONAL INFORMATION

Unable to generate symptom string.
Too many stack elements.
-bash-4.4$

The only information that could be found was that the processes were terminated with signal 11 (SIGSEGV), that is, due to an access to an invalid memory address.

The question arose: how to determine which perl script was crashing and how it is started.

This should be a task for ProbeVue.

The sysproc provider, which generates an event in case of an exit of a process, seemed to be the right probe to use. The special built-in variable __exitinfo provides more detailed information about the exit, such as exit status or the signal number that terminated the process. This can be used to write the following clause:

1: @@sysproc:exit:*
2: when ( __exitinfo->signo == 11 )
3: {
4:         printf( "%llu:  %s\n" , __pid , __pname );
5:         ptree(10);
6: }

The 6 lines are briefly explained here:

  1. The probe point: provider is sysproc, event is exit, * means any process.
  2. By using the above predicate, the subsequent action block is executed only when the process was terminated with signal 11 (SIGSEGV).
  3. Start of the action block.
  4. Output of the PID and the program name of the process.
  5. The function ptree outputs the father, grandfather, etc. (up to 10 levels) of the process.
  6. End of the action block.

Unfortunately, no program arguments can be listed here, which in our case would have given the name of the perl script. But at least we get informations from the function ptree about the way the program was called, which is sufficient in some cases to ultimately identify the program.

For identification we would like to have the information about the arguments with which perl was called. This information is provided by the syscall provider for the system call execve with which the program is started. The probe point is thus syscall: *: execve: entry, since the arguments are known when entering the function. The signature of execve for ProbeVue looks like this:

int execve( char* , struct arg_t* args , char* );

Here, the first argument (provided by ProbeVue as __arg1) is the program name. The second argument is a structure with the arguments in question (provided by __arg2). The third argument gives access to environment variables, which is not important in our case. The structure struct arg_t looks like this for 5 arguments:

struct arg_t
{
        union
        {
                char* arg[5];
                int num[5];
        } u;
};

This structure and the signature of execve must be declared in the ProbeVue script before they can be used.

When accessing the arguments, there is another small problem: if the action block for our sample is addressed, we are in kernel mode, but the arguments themselves are addresses in the user mode of the process. The data (in this case character strings) must be copied out of the user address space. This is done by the function get_userstring.

For every execve we get the PID, the program name, the command and up to 5 arguments. This is implemented in the following program:

#! /usr/bin/probevue

struct arg_t
{
        union
        {
                char* arg[5];
                int num[5];
        } u;
};

int execve( char* , struct arg_t* args , char* );

@@syscall:*:execve:entry
{
        __auto String command[128];
        __auto String argument[128];
        __auto struct arg_t argv;
        copy_userdata( __arg2 , argv );
        command = get_userstring( __arg1 , -1 );
        argument = get_userstring( argv.u.arg[0] , -1 );
        printf( "%llu: %s called execve(%s) with arguments: %s " , __pid , __pname , command , argument )
;
        if ( argv.u.num[1] != 0 )
        {
                argument = get_userstring( argv.u.arg[1] , -1 );
                printf( "%s " , argument );
                if ( argv.u.num[2] != 0 )
                {
                        argument = get_userstring( argv.u.arg[2] , -1 );
                        printf( "%s " , argument );
                        if ( argv.u.num[3] != 0 )
                        {
                                argument = get_userstring( argv.u.arg[3] , -1 );
                                printf( "%s " , argument );
                                if ( argv.u.num[4] != 0 )
                                {
                                        argument = get_userstring( argv.u.arg[4] , -1 );
                                        printf( "%s " , argument );
                                }
                        }
                }
        }
        printf( "\n" );
}

@@sysproc:exit:*
when ( __exitinfo->signo == 11 )
{
        printf( "%llu:  %s\n" , __pid , __pname );
        ptree(10);
}

We called the script capture_segv.e and made it executable.

In theory, after startup the program should output all starting programs with PID, name and up to 5 arguments. In addition, an output occurs when a process is terminated with signal 11 (SIGSEGV). The corresponding PID can then be matched further up in the output and thus the program with arguments can be identified.

Unfortunately, the following small problem arises in practice: if a program is terminated very quickly after the execve, before ProbeVue can copy the arguments with get_userstring, get_userstring accesses a no longer existing address and the ProbeVue script is aborted. We bypassed this by simply starting the ProbeVue script over and over in an infinite loop:

# while true; do ./capture_segv.e >>/tmp/wait_for_segv ; done

We then ran the ProbeVue script for a few hours, until the next crash of the perl script. The file /tmp/wait_for_segv contained about 23,000 lines! We have listed only the relevant lines here:

# cat /tmp/wait_for_segv
…
8651210: ksh called execve(xxxx_hacheck.pl) with arguments: xxxx_hacheck.pl -c
8651210: ksh called execve(/var/opt/OV/bin/instrumentation/xxxx_hacheck.pl) with arguments: xxxx_hacheck
.pl -c
20054518: ksh called execve(/bin/proc2mon.pl) with arguments: proc2mon.pl
…
8651210:  perl

     PID              CMD
       1              init
                        |
                        V
9634196              ovcd
                        |
                        V
9765232              opcacta
                        |
                        V
8651210              perl    <=======
…

One can see that perl was started via the program opcacta, which was started by ovcd. These processes belong to HP OpenView that is in use here. At the top of the output one can see that the perl script /var/opt/OV/bin/instrumentation/xxxx_hacheck.pl has been started. So we found the script that generates the many core files.

The script was written recently and obviously needs to be re-examined and reworked.

With the help of ProbeVue a short script and several hours of waiting was enough to find the cause of the problem! ProbeVue is not only useful for investigating problems; also in performance monitoring, ProbeVue proves to be extremely helpful.

Accessing the Update Access Key Expiration Date from AIX

As part of the introduction of POWER8 systems, IBM has also introduced the “Update Access Key”, which is necessary to perform firmware updates of the managed system. By default, newly delivered systems have an update access key that usually expires after 3 years. Thereafter, the Update Access Key can has to be extended every 6 month, but only if a maintenance contract exists(https://www.ibm.com/servers/eserver/ess/index.wss).

Of course, it is easy to find out when the current Update Access Key runs through the HMC, GUI or CLI. But you can also display the expiration date via the lscfg command from AIX:

In the case of AIX 7.1, this looks like this:

$ lscfg -vpl sysplanar0 | grep -p "System Firmware"
      System Firmware:
...
        Microcode Image.............SV860_138 SV860_103 SV860_138
        Microcode Level.............FW860.42 FW860.30 FW860.42
        Microcode Build Date........20180101 20170628 20180101
        Microcode Entitlement Date..20190825
        Hardware Location Code......U8284.22A.XXXXXXX-Y1
      Physical Location: U8284.22A.XXXXXXX-Y1

In the case of AIX 7.2, the output is slightly different:

$ lscfg -vpl sysplanar0 |grep -p "System Firmware"
      System Firmware:
...
        Microcode Image.............SV860_138 SV860_103 SV860_138
        Microcode Level.............FW860.42 FW860.30 FW860.42
        Microcode Build Date........20180101 20170628 20180101
        Update Access Key Exp Date..20190825
        Hardware Location Code......U8284.22A.XXXXXXX-Y1
      Physical Location: U8284.22A.XXXXXXX-Y1

Relevant are the lines “Microcode Entitlement Date” respectively “Update Access Key Exp Date“.

Special features of NFSv4 Mounts

Many AIX administrators use NFSv4 in the same way as NFSv3 and NFSv2 before. When exporting and mounting version 4 is specified, otherwise everything else is done as before. While this works in most cases, it also prevents the use of some interesting features of NFSv4.

The first significant difference between NFSv4 and its predecessors already shows up when mounting. NFSv2 and NFSv3 use a separate MOUNT protocol. For example, to run the following NFS mount:

clientv3 # mount aixnim:/export/data /mnt
clientv3 #

an RPC request is sent to the rpc.mountd on the NFS server to get an NFS filehandle for the file system/directory /export/data. All NFS operations must always specify an NFS filehandle that uniquely identifies a file or directory on the NFS server. The very first filehandle is requested via the MOUNT protocol.

In the case of NFSv4 this looks quite different, no separate MOUNT protocol is needed. Instead, a so-called root filehandle is used, which is defined by the NFS standard and need not be learned through a separate protocol. At the following NFSv4 mount

clientv4 # mount –o vers=4 aixnim:/export/data /mnt
clientv4 #

the client starts an NFS LOOKUP specifying the well-known (NFS default) root filehandle and relative path “export/data“, then the NFS server returns the associated filehandle. This can easily be tracked with the help of tcpdump, which we did not do here for reasons of space.

A side effect that is surprising for many administrators (and maybe not always understood) is that you can not see the filesystems exported with NFSv4 with “showmount -e“. This is simply because there is no MOUNT protocol for NFSv4. So you can not easily find out on the NFS client which file systems the NFS server has exported for NFSv4.

clientv4 # showmount -e aixnim
no exported file systems for aixnim
clientv4 #

The command “showmount -e” does not show exported file systems, although we were able to mount successfully using NFSv4. We’ll come back to that later.

The second significant difference is that for NFSv4 the NFS server generates a pseudo-file system for each NFSv4 client. This file system starts with the nfsroot directory (by default this is /) and contains all underlying directories and files exported to the client. The pseudo-file system is also created if there is no exported file system for the client!

For demonstration purposes, we have un-exported all file systems on our NFS server aixnim:

aixnim # lsnfsexp
aixnim #

Although nothing has been exported yet for the NFS client, the per client-generated pseudo-file system can be mounted using NFSv4:

clientv4 # mount -o vers=4 aixnim:/ /mnt
clientv4 # ls -il /mnt
total 0
clientv4 #

The mounted file system is of course empty, since nothing has been exported yet. We unmount the mounted file system again (not shown here) and export the directory /var/adm on the NFS server aixnim:

aixnim # mknfsexp -d /var/adm -v 4 -r clientv4
aixnim # lsnfsexp
/var/adm -vers=4,root=clientv4
aixnim #

We now mount the pseudo-file system / again:

clientv4 # mount -o vers=4 aixnim:/ /mnt
clientv4 #

In order to illustrate the differences to NFSv2 and NFSv3 more easily, we briefly show the useful command nfs4cl for the NFSv4 client:

clientv4 # nfs4cl showfs

Server      Remote Path          fsid                 Local Path        
--------    ---------------      ---------------      ---------------   
aixnim    /                    0:42949672964        /mnt              
clientv4 #

The command shows the pseudo-file system / from aixnim, which is mounted under /mnt. We now look briefly with the command ls in the directory /mnt

clientv4 # ls -il /mnt
total 1
    2 dr-xr-xr-x    2 root     system            3 May 21 07:34 var
clientv4 #

In the pseudo-file system generated by the NFS server, only the path component /var is visible. This path component is a prefix of the exported directory /var/adm. Other directories such as /opt or /usr are not visible in the pseudo-file system, because they are not prefixes of an exported path. We take a look at /mnt/var:

clientv4 # ls -il /mnt/var   
total 8
   32 drwxrwxr-x   15 root     adm            4096 May  2 11:30 adm
clientv4 #

Also under directory var, only the directory adm is visible, since only /var/adm is a prefix of an exported path. The pseudo-file system is of course unchangeable at the places that were not exported, as the attempt to create a file under /mnt/var shows:

clientv4 # touch /mnt/var/file
touch: /mnt/var/file cannot create
clientv4 #

Starting with /mnt/var/adm everything looks like we know from NFSv2 and NFSv3, one has access to the exported data:

clientv4 # ls -il /mnt/var/adm
total 704
  110 drw-r-----    2 root     system          256 May 20 14:33 SRC
4165 drwxrwxr-x    2 adm      adm             256 Apr 17 08:07 acct
   70 drwx------    4 root     system          256 Apr 17 07:50 config
4133 drwx------    2 root     system          256 Apr 17 08:03 corrals
...
4   33 -rw-rw-r--    1 adm      adm          337608 May 20 09:30 wtmp
clientv4 #

Now let’s look again at the output of the command “nfs4cl showfs“:

Clientv4 # nfs4cl showfs

Server      Remote Path          fsid                 Local Path        
--------    ---------------      ---------------      ---------------   
aixnim   /var                 0:42949672966        /mnt/var          
aixnim   /                    0:42949672964        /mnt        
clientv4 #

For each physical file system on the server, a separate pseudo-file system is created. The respective pseudo-file system grants access to exported directories of the underlying physical file system and generates read-only directories for path prefixes of exported directories.

On the NFS server aixnim we export the directory /usr/share for the client:

aixnim # mknfsexp -d /usr/share -v 4 -r clientv4
aixnim # lsnfsexp
/var/adm -vers=4,root=clientv4
/usr/share -vers=4,root=clientv4
aixnim #

On the client this time we do not umount and re-mount, but simply use ls to access the mount point /mnt again:

clientv4 # ls -il /mnt
total 2
    2 dr-xr-xr-x    2 root     system            3 May 21 08:13 usr
    2 dr-xr-xr-x    2 root     system            3 May 21 07:34 var
clientv4 #

The path prefix usr of the currently exported directory /usr/share appears on the client without having explicitly performed a mount. A look at /mnt/usr shows that the directory share appears as expected:

clientv4 # ls -il /mnt/usr

total 0

16390 drwxr-xr-x    8 bin      bin             256 Apr 17 08:31 share

clientv4 #

And under /mnt/usr/share are, as expected, the exported data:

clientv4 # ls -il /mnt/usr/share
total 24
74212 drwxr-xr-x    2 bin      bin             256 Apr 17 08:24 X11
49162 drwxr-xr-x    2 bin      bin             256 Nov  3 2015  dict
16391 drwxr-xr-x   12 bin      bin            4096 Apr 17 08:22 lib
17762 lrwxrwxrwx    1 root     system           26 Apr 17 08:31 locale -> /opt/freeware/share/locale
20653 drwxr-xr-x    5 root     system          256 Apr 24 15:46 lpp
16911 drwxr-xr-x   11 bin      bin            4096 May 20 14:25 man
45096 drwxr-xr-x    2 bin      bin            4096 Apr 17 08:03 modems
clientv4 #

The command “nfs4cl showfs” now shows 3 file systems:

Clientv4 # nfs4cl showfs

Server      Remote Path          fsid                 Local Path        
--------    ---------------      ---------------      ---------------   
aixnim /usr                 0:42949672965        /mnt/usr          
aixnim /var                 0:42949672966        /mnt/var          
aixnim /                    0:42949672964        /mnt              
clientv4 #

The last example shows that in the case of NFSv4 new file systems do not necessarily have to be manually mounted on the client. If further file systems are exported on the NFS server for an NFSv4 client, the client can simply access the new file system. Prerequisite, however, is that the path for the new file system is part of the client-mounted pseudo-file system. Since we had the entire pseudo-file system starting from the nfsroot mounted, this is trivially the case. But if we had only mounted /var from the NFS server, then /usr/share would not be part of the pseudo-file system of /var and a separate mount would have been necessary.

That additional file systems can be accessed without explicit mount, as just shown, comes from a third difference of NFSv4 to its predecessors. For NFSv2 and NFSv3, all filehandles are persistent, that is, immutable. NFSv4 introduced volatile filehandles in addition to persistent filehandles. The file handles used by the pseudo-file system are volatile. That means the client must expect that such a filehandle can change. This is the case if another path is exported on the NFS server, the filehandles for the path prefixes in the pseudo-file system change then, the client recognizes this after a short time and responds accordingly.

Finally, a small problem should be noted: we had 1 mount done, but the command “nfs4cl showfs” showed that 3 filesystems were involved. But there is only one file system mounted, as df shows:

clientv4 # df -g
Filesystem    GB blocks      Free %Used    Iused %Iused Mounted on
/dev/hd4           1.00      0.86   15%     8367     5% /
/dev/hd2           6.12      2.90   53%    65563     9% /usr
/dev/hd9var        2.00      1.81   10%     1770     1% /var
/dev/hd3           2.00      1.89    6%      452     1% /tmp
/dev/hd1           1.00      0.88   12%      454     1% /home
/dev/hd11admin      0.50      0.50    1%       15     1% /admin
/proc                 -         -    -        -      - /proc
/dev/hd10opt       2.00      0.94   54%    22460    10% /opt
/dev/livedump      1.00      1.00    1%        4     1% /var/adm/ras/livedump
/dev/varadmloglv      2.00      1.85    8%      275     1% /var/adm/log
aixnim:/       0.38      0.20   47%    12644    22% /mnt
clientv4 #

The file system mounted under /mnt is 0.38 GB in size. On the NFS server, /usr/share and /var/adm were exported, a df shows the following sizes:

aixnim # df –g / /usr/share /var/adm
Filesystem    GB blocks      Free %Used    Iused %Iused Mounted on
/dev/hd4           0.38      0.20   47%    12644    22% /
/dev/hd2           2.06      0.23   89%    39236    41% /usr
/dev/hd9var        0.50      0.45   10%      614     1% /var
aixnim #

Obviously, the values of the file system / of the NFS server are used on the client! Under /usr and thus also /usr/share would be available free space of close to 2 GB, but this is not reflected in the df output on the client. Of course it is difficult to specify values on the client, because multiple file systems are involved on the NFS server. The df command simply displays the data of the physical file system underlying the mounted pseudo-file system. In our case, this is the root file system of the NFS server. Here again you can use the command nfs4cl, which has a subcommand for displaying file system information similar to df:

clientv4 # nfs4cl showstat

Filesystem       512-blocks        Free  %Used       Iused %Iused  Mounted on
aixnim:/usr     4325376      482752    89%       39236    41% /mnt/usr  
aixnim:/var     1048576      947064    10%         614     1% /mnt/var  
aixnim:/      786432      417944    47%       12644    22% /mnt     
clientv4 #

This is identical to the values displayed by df on the NFS server.

But even with the standard df of AIX you can get this information, as the following output shows:

clientv4 # df -g /mnt /mnt/usr /mnt/usr/share /mnt/var /mnt/var/adm
Filesystem    GB blocks      Free %Used    Iused %Iused Mounted on
aixnim:/           0.38      0.20   47%    12644    22% /mnt
[NFSv4]            2.06      0.23   89%    39236    41% /mnt/usr
[NFSv4]            2.06      0.23   89%    39236    41% /mnt/usr/share
[NFSv4]            0.50      0.45   10%      614     1% /mnt/var
[NFSv4]            0.50      0.45   10%      614     1% /mnt/var/adm
clientv4 #

Of course there are a number of other differences of NFSv4 and previous versions, but they should not be discussed here. Maybe there will be another article on the subject.

The features presented above are especially useful for hierarchical mounts. With NFSv4, you only need to mount one filesystem and you need not worry about the order of mounts.

Understanding how the pseudo-file system works for the NFSv4 client helps determine the exported file systems on the client. Instead of using “showmount -e” as before with NFSv2 and NFSv3 (which does not get results with NFSv4), you can simply mount everything starting with / and then use cd and ls to find out what the NFS server has exported.