LPAR tool with test license until 15th September 2019

Version 1.3.0.2 of our LPAR tool is available in our download area. It includes a test license that is valid until 15th September 2019. The license is built directly into the binaries, so no license key needs to be entered. The included trial license allows use of the LPAR tool with up to 10 HMCs, 100 managed systems and 1000 LPARs.

ProbeVue in Action: Monitoring the Queue Depth of Disks

Disk and storage systems support Tagged Command Queuing, i.e. connected servers can send multiple I/O jobs to the disk or storage system without waiting for older I/O jobs to finish. The number of I/O requests you can send to a disk before you have to wait for older I/O requests to complete can be configured using the hdisk queue_depth attribute on AIX. For many hdisk types, the value 20 for the queue_depth is the default value. In general, most storage systems allow even greater values for the queue depth.

With the help of ProbeVue, the utilization of the disk queue can be monitored very easily.

Starting with AIX 7.1 TL4 and AIX 7.2 TL0, AIX supports the I/O Probe Manager, which makes it easy to trace events in the AIX I/O stack. When the disk driver starts an I/O, it does so via the iostart function in the kernel; the request is forwarded to the adapter driver and then passed to the storage system via the host bus adapter. The response is handled by the iodone function in the kernel. The I/O Probe Manager supports (among others) probe events at these locations:

@@io:disk:iostart:read:<filter>
@@io:disk:iostart:write:<filter>
@@io:disk:iodone:read:<filter>
@@io:disk:iodone:write:<filter>

The filter can be, for example, an hdisk name such as hdisk2. The probe points then only trigger events for the disk hdisk2. This makes it possible to perform an action whenever an I/O for an hdisk begins or ends, for example to measure how long an I/O operation takes or simply to count how many I/Os are executed. In our example, we were interested in the utilization of the disk queue, i.e. the number of I/Os sent to the disk which are not yet completed. For the iostart and iodone I/O probe events, the I/O Probe Manager provides a built-in variable __diskinfo with the following fields (https://www.ibm.com/support/knowledgecenter/en/ssw_aix_72/com.ibm.aix.genprogc/probevue_man_io.htm):

name          char*     Name of the disk.
…
queue_depth   int       The queue depth of the disk (value from ODM)
cmds_out      int       Number of outstanding I/Os
…

The cmds_out field indicates how many I/Os have been sent to the disk but are not yet completed (the response has not yet arrived at the server).

The following section of code determines the minimum, maximum, and average number of entries in the disk queue:

@@io:disk:iostart:*:hdisk0     // Only I/Os for hdisk0 are considered
{
   queue = __diskinfo->cmds_out; // Store number of outstanding I/Os in variable queue
   ++numIO;                    // Number of I/Os (used for calculating the average)
   avg += queue;               // Add number of outstanding I/Os to variable avg
   if ( queue < min )
      min = queue;             // Check if minimum
   if ( queue > max )
      max = queue;             // Check if maximum
}

The calculated values are then printed once per second using the interval probe manager:

@@interval:*:clock:1000
{
   if ( numIO == 0 )
      numIO = 1;    // Prevent division by 0 when calculating the average
   if ( min > max )
      min = max;
   printf( "%5d  %5d  %5d\n" , min , avg/numIO , max );
   min = 100000;   // Reset variables for the next interval
   avg = 0;
   max = 0;
   numIO = 0;
}

The full script is available for download on our website: ioqueue.e.

Here is a sample run of the script for the disk hdisk13:

# ./ioqueue.e hdisk13
  min    avg    max
    1      1      2
    1      1      9
    1      1      2
    1      1      8
    1      1      2
    1      1      2
    1      1      8
    1      1     10
    1      1      2
    1      1      1
    1      1     10
    1      1      2
    1      1     11
...

The script expects an hdisk as an argument, and then outputs once per second the values determined for the specified hdisk.

In the example output, the number of outstanding I/Os never exceeds 11. If the queue_depth of hdisk13 is set to the common default of 20 or higher, the queue never fills up, so increasing the queue_depth attribute would not improve performance.

Here’s another example:

# ./ioqueue.e hdisk21
  min    avg    max
    9     15     20
   11     17     20
   15     19     20
   13     19     20
   14     19     20
   17     18     20
   18     18     19
   16     19     20
   13     18     20
   18     19     19
   17     19     20
   18     19     20
   17     19     19
...

In this case, the maximum value of 20 (hdisk21 has a queue_depth of 20) is reached on a regular basis. Increasing the queue_depth can improve throughput in this case.
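The comparison of the two runs can be automated. The following Python sketch is not part of ioqueue.e; the function names and the idea of counting "saturated" intervals are assumptions made for illustration. It parses the three-column output shown above and reports the fraction of intervals in which the maximum reached the queue depth.

```python
def parse_ioqueue_output(text):
    """Return (min, avg, max) tuples from ioqueue.e style output.

    Lines that do not consist of exactly three integers (header, "...")
    are skipped.
    """
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 3 and all(f.isdigit() for f in fields):
            rows.append(tuple(int(f) for f in fields))
    return rows


def saturation_ratio(rows, queue_depth):
    """Fraction of intervals whose maximum reached queue_depth."""
    if not rows:
        return 0.0
    hits = sum(1 for _min, _avg, mx in rows if mx >= queue_depth)
    return hits / len(rows)


# Four intervals taken from the hdisk21 run above
sample = """  min    avg    max
    9     15     20
   11     17     20
   15     19     20
   18     18     19
"""

if __name__ == "__main__":
    rows = parse_ioqueue_output(sample)
    print(saturation_ratio(rows, queue_depth=20))
```

A ratio close to 1 suggests that the queue is regularly full; a ratio of 0, as in the hdisk13 run, suggests that a larger queue_depth would bring no benefit.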

Of course, the sample script can be extended in various ways, for example to also determine the throughput, the time I/Os spend in the wait queue, or even the position and size of each I/O on the disk. This example just shows how easy it is to get information about I/Os using ProbeVue.


Numbers: FC World Wide Names (WWNs)

Most of us know WWNs as 64-bit values written as 16 hexadecimal digits. It is less well known that there are several different WWN formats and that 128-bit WWNs exist as well. This article therefore briefly presents the different WWN formats.

The basic structure of 64-bit WWNs looks like this:

+---+----------------+
|NAA| NAME           |
+---+----------------+
4-bit 60-bit

The 4-bit NAA (Network Address Authority) field specifies the type of address and the format of the address.

There are a number of different possibilities for the 60-bit NAME field.
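As a small illustration of this basic structure, the following Python sketch splits a 64-bit WWN into its NAA and NAME parts. The function name split_naa and the acceptance of colon-separated input are assumptions made for this example.

```python
def split_naa(wwn_hex):
    """Split a 64-bit WWN into (NAA, NAME).

    wwn_hex: 16 hexadecimal digits, optionally separated by colons.
    Returns the 4-bit NAA and the 60-bit NAME as integers.
    """
    digits = wwn_hex.replace(":", "")
    if len(digits) != 16:
        raise ValueError("expected 16 hexadecimal digits")
    value = int(digits, 16)
    naa = value >> 60                # top 4 bits
    name = value & ((1 << 60) - 1)   # remaining 60 bits
    return naa, name


if __name__ == "__main__":
    naa, name = split_naa("10:00:00:50:76:05:32:6d")
    print(naa, f"{name:015x}")
```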

 

1. Format 1 Address (NAA = 0001)

+---+--------+------------------------+
|NAA|Reserved| 48-bit IEEE MAC Address|
+---+--------+------------------------+
4-bit 12-bit   48-bit

In the Reserved (12-bit) field, all bits must be set to 0!

Example:

1 000 00507605326d (To clarify the format, the fields are separated by spaces)

 

2. Format 2 Address (NAA = 0010)

+---+---------------+-----------------------+
|NAA|Vendor Assigned|48-bit IEEE MAC Address|
+---+---------------+-----------------------+
4-bit  12-bit         48-bit

The 12-bit “Vendor Assigned” field can be used arbitrarily by the manufacturer.

Example:

2 001 00507605326d (To clarify the format, the fields are separated by spaces)
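Formats 1 and 2 share the same layout and differ only in the meaning of the 12-bit field. A minimal Python sketch for decoding both is shown below; the function name and the returned tuple layout are assumptions made for illustration.

```python
def parse_format_1_or_2(wwn_hex):
    """Decode a Format 1 or Format 2 WWN (16 hexadecimal digits).

    Returns (naa, middle, mac), where middle is the 12-bit Reserved
    (Format 1) or Vendor Assigned (Format 2) field and mac is the
    48-bit IEEE MAC address in colon notation.
    """
    if len(wwn_hex) != 16:
        raise ValueError("expected 16 hexadecimal digits")
    value = int(wwn_hex, 16)
    naa = value >> 60
    if naa not in (1, 2):
        raise ValueError("not a Format 1 or Format 2 address")
    middle = (value >> 48) & 0xFFF
    if naa == 1 and middle != 0:
        raise ValueError("Reserved field must be 0 for Format 1")
    mac = value & 0xFFFFFFFFFFFF
    # Format the 6 bytes of the MAC address, most significant byte first
    mac_str = ":".join(f"{(mac >> shift) & 0xFF:02x}"
                       for shift in range(40, -8, -8))
    return naa, middle, mac_str


if __name__ == "__main__":
    print(parse_format_1_or_2("100000507605326d"))
    print(parse_format_1_or_2("200100507605326d"))
```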

 

3. Format 3 Address (NAA = 0011)

+---+-----------------+
|NAA|Vendor Assigned  |
+---+-----------------+
4-bit 60-bit

The field “Vendor Assigned” (60-bit) is assigned arbitrarily by the manufacturer. Thus, this type of address is not unique worldwide and therefore usually not found in practice.

Example:

3 0123456789abcde (To clarify the format, the fields are separated by spaces)

 

4. Format 4 Address (NAA = 0100)

+---+---------+--------------+
|NAA|Reserved | IPv4 Address |
+---+---------+--------------+
4-bit 28-bit     32-bit

The “IPv4 Address” (32-bit) field contains a 32-bit IPv4 address.

Example for IP 10.0.0.1:

4 0000000 0a000001 (To clarify the format, the fields are separated by spaces)
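The example can be reproduced with a few lines of Python. This is only a sketch; the function name format4_wwn is an assumption, and the conversion relies on the standard ipaddress module.

```python
import ipaddress


def format4_wwn(ip):
    """Build a Format 4 WWN (NAA = 0100) from a dotted IPv4 address."""
    addr = int(ipaddress.IPv4Address(ip))   # 32-bit integer value
    value = (0x4 << 60) | addr              # 28 reserved bits stay 0
    return f"{value:016x}"


if __name__ == "__main__":
    print(format4_wwn("10.0.0.1"))
```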

 

5. Format 5 Address (NAA = 0101)

+---+-------+-----------------+
|NAA| OUI   | Vendor Assigned |
+---+-------+-----------------+
4-bit 24-bit 36-bit

The OUI (24-bit) field contains the 24-bit ID assigned by the IEEE (Organizationally Unique Identifier).

The field “Vendor Assigned” (36-bit) can be assigned arbitrarily by the manufacturer.

Example:

5 005076 012345678 (To clarify the format, the fields are separated by spaces)
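Extracting the OUI from a Format 5 address is a matter of shifting and masking. The following Python sketch shows one way to do it; the function name and the choice to return hexadecimal strings are assumptions for illustration.

```python
def parse_format_5(wwn_hex):
    """Decode a Format 5 WWN into (oui, vendor_assigned) hex strings."""
    if len(wwn_hex) != 16:
        raise ValueError("expected 16 hexadecimal digits")
    value = int(wwn_hex, 16)
    if value >> 60 != 0x5:
        raise ValueError("not a Format 5 address")
    oui = (value >> 36) & 0xFFFFFF      # 24-bit OUI
    vendor = value & ((1 << 36) - 1)    # 36-bit vendor assigned part
    return f"{oui:06x}", f"{vendor:09x}"


if __name__ == "__main__":
    print(parse_format_5("5005076012345678"))
```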

 

6. Format 6 Address (NAA = 0110)

Format 6 addresses are 128-bit addresses and are often used for LUNs on the SAN.

+---+-------+---------------+-------------------------+
|NAA|  OUI  |Vendor Assigned|Vendor Assigned Extension|
+---+-------+---------------+-------------------------+
4-bit 24-bit  36-bit          64-bit

The OUI (24-bit) field contains the 24-bit ID assigned by the IEEE.

The field “Vendor Assigned” (36-bit) can be arbitrarily assigned by the manufacturer.

The field “Vendor Assigned Extension” (64-bit) can also be assigned arbitrarily by the manufacturer.

Example:

6 005076 012345678 0123456789abcdef (To clarify the format, the fields are separated by spaces)
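The 128-bit variant can be decoded in the same way as Format 5, with the 64-bit extension as an additional field. The Python sketch below assumes a 32-digit hexadecimal string as input; the function name is hypothetical.

```python
def parse_format_6(wwn_hex):
    """Decode a 128-bit Format 6 WWN.

    Returns (oui, vendor_assigned, extension) as hex strings.
    """
    if len(wwn_hex) != 32:
        raise ValueError("expected 32 hexadecimal digits")
    value = int(wwn_hex, 16)
    if value >> 124 != 0x6:
        raise ValueError("not a Format 6 address")
    oui = (value >> 100) & 0xFFFFFF            # 24-bit OUI
    vendor = (value >> 64) & ((1 << 36) - 1)   # 36-bit vendor assigned part
    extension = value & ((1 << 64) - 1)        # 64-bit extension
    return f"{oui:06x}", f"{vendor:09x}", f"{extension:016x}"


if __name__ == "__main__":
    print(parse_format_6("60050760123456780123456789abcdef"))
```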

 

7. IEEE EUI-64 Address (NAA=11)

In the case of this address format, the NAA field is shortened to only 2 bits, where NAA is 11.

+---+-------------+---------------+
|NAA|OUI shortened|Vendor Assigned|
+---+-------------+---------------+
2-bit 22-bit       40-bit

The “OUI shortened” field (22-bit) is a 22-bit shortened version of the IEEE-assigned 24-bit ID.

(The two least significant bits of the first byte are omitted and the remaining 6 bits are shifted 2 bits to the right to make room for the two NAA bits.)

The field “Vendor Assigned” (40-bit) can be arbitrarily assigned by the manufacturer.

These types of addresses are often used in the area of virtualization, e.g. when it comes to NPIV (N_Port ID Virtualization).

Example:

c05076 0123456789 (To clarify the format, the fields are separated by spaces)
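Because the NAA field is only 2 bits wide here, every 64-bit WWN whose first hexadecimal digit is c, d, e or f falls into this category. The following Python sketch decodes the fields described above; the function name and the integer return values are assumptions for illustration.

```python
def parse_eui64_mapped(wwn_hex):
    """Decode an IEEE EUI-64 style WWN (2-bit NAA = 11).

    Returns (oui_shortened, vendor_assigned) as integers:
    the 22-bit shortened OUI and the 40-bit vendor assigned part.
    """
    if len(wwn_hex) != 16:
        raise ValueError("expected 16 hexadecimal digits")
    value = int(wwn_hex, 16)
    if value >> 62 != 0b11:
        raise ValueError("top two bits are not 11")
    oui_shortened = (value >> 40) & ((1 << 22) - 1)
    vendor = value & ((1 << 40) - 1)
    return oui_shortened, vendor


if __name__ == "__main__":
    oui, vendor = parse_eui64_mapped("c050760123456789")
    print(f"{oui:06x} {vendor:010x}")
```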

 

 

Full file system: df and du show different space usage

Full file systems occur again and again in practice. Usually you search for large files or directories and check whether older data can be deleted to free up space (although sometimes the file system is simply enlarged without further investigation). In some cases, however, you cannot find any large files that could be deleted, or you discover that file system space seems to be gone but cannot identify where it is being used. The du command then reports less used space than df. The following example shows such a case, how to find out where the file system space has gone, and how it can finally be recovered. For this, AIX offers a nice feature that is not found in most other UNIX derivatives.

The file system /var/adm/log is 91% filled, currently 3.6 GB of the file system are in use:

# df -g  /var/adm/log
Filesystem    GB blocks      Free %Used    Iused %Iused Mounted on
/dev/varadmloglv      4.00      0.39   91%      456     1% /var/adm/log
#

A quick check with the command du shows that apparently much less space is occupied:

# du -sm /var/adm/log
950.21   /var/adm/log
#

The du ("disk usage") command shows only 950 MB of occupied space! This is about 2.7 GB less than the value from the df command. But where is the missing space?

The difference comes from files that have been deleted but are still held open by at least one process. The entry for such a file is removed from its directory, which makes the file inaccessible by name. The du command therefore does not take these files into account and reports a smaller value. As long as a process still has the deleted file open, however, the associated blocks are not released in the file system, so df correctly shows them as occupied.
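This behavior is not specific to AIX and can be reproduced on any POSIX system. The following Python sketch (the function name and the 64 KiB test size are arbitrary choices for illustration) creates a temporary file, deletes it while it is still open, and shows that the open file descriptor still refers to a file of full size with a link count of 0.

```python
import os
import tempfile


def demo_deleted_but_open(size=64 * 1024):
    """Delete a file while it is open and inspect it via its descriptor.

    Returns (path_exists, link_count, size_in_bytes) after the unlink.
    Assumes POSIX semantics for unlinking open files.
    """
    fd, path = tempfile.mkstemp()
    try:
        os.write(fd, b"\0" * size)
        os.unlink(path)              # remove the directory entry
        st = os.fstat(fd)            # the inode is still alive
        return os.path.exists(path), st.st_nlink, st.st_size
    finally:
        os.close(fd)                 # only now are the blocks released


if __name__ == "__main__":
    print(demo_deleted_but_open())
```

A tool that walks the directory tree, like du, can no longer see the file, while the file system still has to account for its blocks until the descriptor is closed.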

So there is at least one file in the file system /var/adm/log which has been deleted but is still open by a process. The question is how to identify the process and the file.

AIX provides an easy way to identify processes that have deleted files open: the fuser command supports the -d option, which lists exactly these processes:

# fuser -d /var/adm/log
/var/adm/log:  9110638
#

If you add the -V option, you will also see information about the deleted files, such as the inode number and file size:

# fuser -dV /var/adm/log
/var/adm/log:
inode=119    size=2882647606   fd=12     9110638
#

The output shows that the file with inode number 119 and a size of 2882647606 bytes (about 2.7 GB in the units used by df -g) was deleted, but is still held open by the process with PID 9110638 via file descriptor 12. This matches the missing 2.7 GB.
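A quick plausibility check confirms that this one file explains the whole difference between df and du. The Python sketch below only uses the numbers from the outputs above; the function name and the assumption that df -g and du -sm count in units of 1024 are made for illustration.

```python
GIB = 1024 ** 3


def df_du_gap_gb(df_total_gb, df_free_gb, du_used_mb):
    """Space (in GB as shown by df -g) that df counts but du does not."""
    df_used_gb = df_total_gb - df_free_gb
    du_used_gb = du_used_mb / 1024
    return df_used_gb - du_used_gb


if __name__ == "__main__":
    gap = df_du_gap_gb(4.00, 0.39, 950.21)   # values from df -g and du -sm
    deleted = 2882647606 / GIB                # size reported by fuser -dV
    print(f"gap: {gap:.2f} GB, deleted file: {deleted:.2f} GB")
```

Both values come out at roughly 2.68 GB, so no further deleted files need to be searched for.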

Using ps you can quickly find out which process it is:

# ps -ef|grep 9110638
    root  9110638  1770180   0   Nov 20      - 28:28 /usr/sbin/syslogd
    root  8193550  8849130   0 09:13:35  pts/2  0:00 grep 9110638
#

In our case it is the syslogd process. Presumably a log file was rotated via mv without notifying syslogd (refresh -s syslogd). We quickly make up for this and check the file system again:

# refresh -s syslogd
0513-095 The request for subsystem refresh was completed successfully.
#
# df -g /var/adm/log
Filesystem    GB blocks      Free %Used    Iused %Iused Mounted on
/dev/varadmloglv      4.00      3.07   24%      455     1% /var/adm/log
#

The output shows that the file system blocks have now been released.