secldapclntd: LDAP failed to bind to server

We recently had a minor outage on one of our systems. Users could no longer log in and users could no longer use a web GUI. The problem occurred sporadically and then disappeared again. During these times there were a lot of messages of the following types in the syslog:

Mar  4 07:56:05 aix01 daemon:err|error secldapclntd: LDAP failed to bind to server aixldapp11.
Mar  4 07:56:05 aix01 daemon:err|error secldapclntd: LDAP failed to bind to server aixldapp12.
Mar  4 07:56:10 aix01 daemon:err|error secldapclntd: LDAP failed to bind to server aixldapp11.
Mar  4 07:56:10 aix01 daemon:err|error secldapclntd: LDAP failed to bind to server aixldapp12.
...

These messages indicated that both LDAP servers were temporarily unavailable. Investigations of the logs on the two LDAP servers did not reveal any connection attempts that were rejected. An examination of the firewall logs also showed that the firewall did not block any packets to the LDAP servers.

We then opened a case at IBM and promptly received instructions back for setting up special LDAP tracing. However, the problem then no longer occurred and therefore the tracing could not determine the cause.

The system in question can be reached via the Internet and therefore the number of ephemeral ports has been restricted for security reasons. Only about 1500 ephemeral ports are allowed through the firewall and configured on AIX (tcp_ephemeral_high and tcp_ephemeral_low). We then simulated on a test system what happens when the available ephemeral ports are exhausted. The above messages came immediately as soon as the LDAP client tried to open a new connection.

One reason for the LDAP client error message “LDAP failed to bind to server” can be the unavailability of ephemeral ports!

The “label” Attribute for FC Adapters

As of AIX 7.2 TL4 and VIOS 3.1.1.10 there is a new attribute “label” for physical FC adapters. The administrator can set this attribute to any character string (maximum 255 characters). Even if the attribute is only informative, it can be extremely useful in PowerVM virtualization environments. If you have a large number of managed systems, it is not always clear to which FC fabric a certain FC port is connected. This can of course be looked up in the documentation of your systems, but it does involve a certain amount of effort. It is easier if you link this information directly with the FC adapters, which is exactly what the new “label” attribute allows in a simple way. On AIX:

# chdev -l fcs0 -U -a label="Fabric_1"
fcs0 changed
# lsattr -El fcs0 -a label -F value
Fabric_1
#

On virtual I/O servers, the attribute can also be set using the padmin account:

/home/padmin> chdev -dev fcs1 -attr label="Fabric_2" -perm
fcs1 changed
/home/padmin> lsdev -dev fcs1 -attr label                
value

Fabric_2
/home/padmin>

The attribute is also defined for older FC adapters.

If the “label” attribute is consistently used, it is always possible to determine online for each FC adapter to which fabric the adapter is connected to. This information only needs to be stored once for each FC adapter.

(Note: The “label” attribute is not implemented for AIX 7.1, at least not until 7.1 TL5 SP6.)

Correcting “wrong” LVM mirroring

Mirrored logical volume from practice

In Part III of our article series “AIX LVM: Mechanics of migratepv” we show how incorrect mirroring of logical volumes can be corrected online. All that is required is the migratelp and migratepv commands and a good understanding of how these commands work.

Here are the links to the articles in the series:

AIX LVM: Mechanics of migratepv (Part I)

AIX LVM: Mechanics of migratepv (Part II)

AIX LVM: Mechanics of migratepv (Part III)

HSCF0180E Operation failed for

When trying to update the system firmware of a managed system via the HMC Command Line, we encountered the following error message:

hmc01:~> updlic -o a -t all -l latest -m ms26 -r sftp -h X.X.X.X -u XXXXXXXX --passwd XXXXXXXX -d /firmware/system/01VL940_071_027
HSCF0180E Operation failed for ms26 (9009-22A*XXXXXXX).
Could not unpack the firmware update package.
Check the health and available disk space of the file system.
hmc01:~>

The error message that was displayed suggested checking the available space in the HMC file systems:

hmc01:~> lshmcfs
filesystem=/var,filesystem_size=7935,filesystem_avail=4955,temp_files_start_time=11/22/2018 12:59:00,temp_files_size=2011
filesystem=/dump,filesystem_size=60347,filesystem_avail=55935,temp_files_start_time=02/15/2021 10:21:00,temp_files_size=0
filesystem=/extra,filesystem_size=20030,filesystem_avail=15939,temp_files_start_time=none,temp_files_size=0
filesystem=/,filesystem_size=15615,filesystem_avail=4369,temp_files_start_time=02/15/2021 06:05:00,temp_files_size=4
hmc01:~>

Actually, the available space should be sufficient, but to be on the safe side, we’ve cleaned up a bit with the temporary files:

hmc01:~> chhmcfs -o f -d 5
hmc01:~>

The updlic command was unimpressed and returned the same error message.

The removal of some old firmware versions from the HMC‘s local disk repository was also unsuccessful:

hmc01:~> updlic -o p --ecnumber 01AL740
hmc01:~> updlic -o p --ecnumber 01AL770
hmc01:~> updlic -o p --ecnumber 01AM740
hmc01:~>

The error message was still the same. Obviously, contrary to the information in the error message, the problem had nothing to do with the available space on the HMC!

We then took a closer look at the downloaded firmware. We downloaded the firmware as an ISO file H75557812_01VL940_071_027.iso from the IBM website and then mounted it on our NIM server using the loopmount command:

aixnim:/root> loopmount -i /tmp/H75557812_01VL940_071_027.iso -o "-o ro -V cdrfs" -m /mnt
aixnim:/root> ls -l /mnt
total 528296
-rw-r--r--    1 102010979 213            1860 Feb 04 09:08 01VL940071_special_instructs.xml.special.note.xml
-rw-r-----    1 102010979 210            7290 Feb 04 09:08 01VL940_071_027.dd.xml
-rw-r--r--    1 102010979 213           95687 Feb 04 09:07 01VL940_071_027.html
-rw-r-----    1 102010979 210            2971 Feb 04 09:08 01VL940_071_027.pd.sdd
-rw-r-----    1 102010979 210           67338 Feb 04 09:08 01VL940_071_027.readme.txt
-rw-r-----    1 102010979 210       134969022 Feb 04 09:08 01VL940_071_027.rpm
-rw-r-----    1 102010979 210       135328848 Feb 04 09:08 01VL940_071_027.tar.gz
-rw-r-----    1 102010979 210            9442 Feb 04 09:08 01VL940_071_027.xml
aixnim:/root>

What we didn’t notice when copying the files was the lack of read permissions on other for most files.

After we had assigned read permissions for all files, the next update attempt was successful:

hmc01:~> updlic -o a -t all -l latest -m ms26 -r sftp -h X.X.X.X -u XXXXXX --passwd XXXXXXXX -d /firmware/system/01VL940_071_027 
HSCF0179W Operation was partially successful for ms26 (9009-22A*XXXXXXX).
The following deferred fixes are present in the fix pack.  Deferred fixes will be activated after the next IPL of the system.
An immediate IPL is not required, unless you want to activate one of the fixes below now.
..
hmc01:~>

LPAR-Tool 1.6.0.0 is available now

Version 1.6.0.0 of our LPAR tool is now available in our download area!

New features are:

  • Online monitoring of SEA client statistics (vios help seastat)
  • Online monitoring of virtual FC client adapters (vios help fcstat)
  • Display of historical processor and memory data (lpar help lsmem, lpar help lsproc)

In the article Monitoring SEA Traffic the possibilities of calling up SEA client statistics are shown.

Error: Mirror pools must be defined

The following error message occurred while creating a mirrored logical volume:

# mklv -c 2 -t jfs2 datavg01 10
0516-1814 lcreatelv: Mirror pools must be defined for each copy when strict mirror
        pools are enabled.
0516-822 mklv: Unable to create logical volume.
#

The reason for this is that the mirror pool strictness is set for the volume group. The article Mirror Pools: Understanding Mirror Pool Strictness examines and explains this in more detail.

Error: Every mirror pool must contain a copy of the logical volume

The attempt to create an unmirrored LV in a VG with mirror pools results in the following error:

# mklv -t jfs2 -p copy1=DC1 datavg01 10
0516-1829 mklv: Every mirror pool must contain a copy of
        the logical volume.
0516-822 mklv: Unable to create logical volume.
#

The cause is the VG‘s Mirror Pool Strictness attribute, which is set to ‘super-strict‘. The article Mirror Pools: Understanding Mirror Pool Strictness examines the importance of Mirror Pool Strictness using examples.

New Article Introduction to Mirror Pools

Many IT environments operate their systems in more than one data center. In order not to have any data loss in the event of a failure of an entire data center, the data is mirrored between 2 or more data centers. The mirroring can be realized by the storage (storage based mirroring) or by a volume manager (LVM in the case of AIX) on the server (host based mirroring). In the article Mirror Pools: An Introduction we look at mirroring using AIX Logical Volume Manager and Mirror Pools. The aim is to show how the correct mirroring of logical volumes can be implemented with the help of mirror pools. In larger environments with many physical volumes, maintaining correct mirroring without mirror pools is difficult and a challenge for the administrator.

The Impact of FC-Ports without a Link

FC ports that are not used and do not have a link should be deactivated, as these significantly extend the runtime of a series of commands and operations (e.g. LPM).

(Note: our LPAR tool is used in some examples, but the corresponding commands on the HMC or the virtual I / O server are always shown!)

Two 4-port FC adapters are in use on one of our virtual I / O servers (ms26-vio1):

$ lpar lsslot ms26-vio1
DRC_NAME                  DRC_INDEX  IOPOOL  DESCRIPTION
U78D3.001.XXXXXXX-P1-C49  21040015   none    PCIe3 x8 SAS RAID Internal Adapter 6Gb
U78D3.001.XXXXXXX-P1-C7   2103001C   none    PCIe3 4-Port 16Gb FC Adapter
U78D3.001.XXXXXXX-P1-C2   21010021   none    PCIe3 4-Port 16Gb FC Adapter
$
(HMC: lshwres -r io --rsubtype slot -m ms26 --filter lpar_names=ms26-vio1)

However, only 2 ports of the 8 ports are cabled:

$ vios lsnports ms26-vio1
NAME  PHYSLOC                     FABRIC  TPORTS  APORTS  SWWPNS  AWWPNS
fcs0  U78D3.001.XXXXXXX-P1-C2-T1  1       64      64      3072    3072
fcs4  U78D3.001.XXXXXXX-P1-C7-T1  1       64      64      3072    3072
$
(VIOS: lsnports)

When working with the virtual I / O server, it is noticeable, that some of the commands have an unexpectedly long runtime and sometimes hang for a long time. Some example commands are given below, along with the measured runtime:

(0)padmin@ms26-vio1:/home/padmin> time netstat –cdlistats
…
Error opening device: /dev/fscsi1
errno: 00000045

Error opening device: /dev/fscsi2
errno: 00000045

Error opening device: /dev/fscsi3
errno: 00000045

Error opening device: /dev/fscsi5
errno: 00000045

Error opening device: /dev/fscsi6
errno: 00000045

Error opening device: /dev/fscsi7
errno: 00000045

real    1m13.56s
user    0m0.03s
sys     0m0.10s
(0)padmin@ms26-vio1:/home/padmin>
(0)padmin@ms26-vio1:/home/padmin> time lsnports
name             physloc                        fabric tports aports swwpns  awwpns
fcs0             U78D3.001.XXXXXXX-P1-C2-T1          1     64     64   3072    3072
fcs4             U78D3.001.XXXXXXX-P1-C7-T1          1     64     64   3072    3072

real    0m11.61s
user    0m0.01s
sys     0m0.00s
(0)padmin@ms26-vio1:/home/padmin>
(0)padmin@ms26-vio1:/home/padmin> time fcstat fcs1

Error opening device: /dev/fscsi1
errno: 00000045

real    0m11.31s
user    0m0.01s
sys     0m0.01s
(4)padmin@ms26-vio1:/home/padmin>

LPM operations also take significantly longer, since all FC ports are examined when searching for suitable FC ports for the necessary NPIV mappings. This can lead to delays in the range of minutes before the migration is finally started.

In order to avoid these unnecessarily long runtimes, FC ports that are not wired should not be activated. The fscsi device has the attribute autoconfig, with the possible values defined and available. By default, the value available is used, which means that the kernel configures and activates the device, even if it has no link, which leads to the waiting times shown above. If the autoconfig attribute is set to defined, the fscsi device is not activated, it then remains in the defined state.

The following example shows how to reconfigure the fscsi1 device:

$ vios chdev ms26-vio1 fscsi1 autoconfig=defined
$
(VIOS: chdev -dev fscsi1 -attr autoconfig=defined)
$
$ vios rmdev ms26-vio1 fscsi1
$
(VIOS: rmdev -dev fscsi1 –ucfg)
$
$ vios lsdev ms26-vio1 fscsi1
NAME    STATUS   PHYSLOC                     PARENT  DESCRIPTION
fscsi1  Defined  U78D3.001.XXXXXXX-P1-C2-T2  fcs1    FC SCSI I/O Controller Protocol Device
$
(VIOS: lsdev -dev fscsi1)
$
$  vios lsattr ms26-vio1 fscsi1
ATTRIBUTE     VALUE      DESCRIPTION                            USER_SETTABLE
attach        none       How this adapter is CONNECTED          False
autoconfig    defined    Configuration State                    True
dyntrk        yes        Dynamic Tracking of FC Devices         True+
fc_err_recov  fast_fail  FC Fabric Event Error RECOVERY Policy  True+
scsi_id       Adapter    SCSI ID                                False
sw_fc_class   3          FC Class for Fabric                    True
$
(VIOS: lsdev -dev fscsi1 –attr)
$

With the autoconfig=defined attribute, the fscsi device remains defined even when the cfgmgr is run!

If one repeats the runtime measurement of the commands above, one can see that the runtime of the commands has already measurably improved:

(0)padmin@ms26-vio1:/home/padmin> time netstat –cdlistats
…
Error opening device: /dev/fscsi1
errno: 00000005

Error opening device: /dev/fscsi2
errno: 00000045

Error opening device: /dev/fscsi3
errno: 00000045

Error opening device: /dev/fscsi5
errno: 00000045

Error opening device: /dev/fscsi6
errno: 00000045

Error opening device: /dev/fscsi7
errno: 00000045

real    1m1.02s
user    0m0.04s
sys     0m0.10s
(0)padmin@ms26-vio1:/home/padmin>
(0)padmin@ms26-vio1:/home/padmin> time lsnports
name             physloc                        fabric tports aports swwpns  awwpns
fcs0             U78D3.001.XXXXXXX-P1-C2-T1          1     64     64   3072    3072
fcs4             U78D3.001.XXXXXXX-P1-C7-T1          1     64     64   3072    3072

real    0m9.70s
user    0m0.00s
sys     0m0.01s
(0)padmin@ms26-vio1:/home/padmin>
(0)padmin@ms26-vio1:/home/padmin> time fcstat fcs1

Error opening device: /dev/fscsi1
errno: 00000005

real    0m0.00s
user    0m0.02s
sys     0m0.00s
(4)padmin@ms26-vio1:/home/padmin>

The running time of the netstat command was shortened by 12 seconds, the lsnports command was about 2 seconds faster.

We now also set the autoconfig attribute to defined for all other unused FC ports:

$ for fscsi in fscsi2 fscsi3 fscsi5 fscsi6 fscsi7
> do
> vios chdev ms26-vio1 $fscsi autoconfig=defined
> vios rmdev ms26-vio1 $fscsi
> done
$

Now we repeat the runtime measurement of the commands again:

(0)padmin@ms26-vio1:/home/padmin> time netstat –cdlistats
…
Error opening device: /dev/fscsi1
errno: 00000005

Error opening device: /dev/fscsi2
errno: 00000005

Error opening device: /dev/fscsi3
errno: 00000005

Error opening device: /dev/fscsi5
errno: 00000005

Error opening device: /dev/fscsi6
errno: 00000005

Error opening device: /dev/fscsi7
errno: 00000005

real    0m0.81s
user    0m0.03s
sys     0m0.10s
(0)padmin@ms26-vio1:/home/padmin>
(0)padmin@ms26-vio1:/home/padmin> time lsnports         
name             physloc                        fabric tports aports swwpns  awwpns
fcs0             U78D3.001.XXXXXXX-P1-C2-T1          1     64     64   3072    3072
fcs4             U78D3.001.XXXXXXX-P1-C7-T1          1     64     64   3072    3072

real    0m0.00s
user    0m0.01s
sys     0m0.01s
(0)padmin@ms26-vio1:/home/padmin> time fcstat fcs1       

Error opening device: /dev/fscsi1
errno: 00000005

real    0m0.04s
user    0m0.00s
sys     0m0.00s
(4)padmin@ms26-vio1:/home/padmin>

The netstat command now takes less than 1 second, the lsnports command only 0.1 seconds.

It is therefore worthwhile to set the autoconfig attribute for unused FC ports to defined!

 

Don’t Forget to Update AIX-rpm

Recently, when installing Python3 from the AIX toolbox using YUM, we encountered the following error situation:

# yum install python3
...
--> Finished Dependency Resolution
Error: Package: python3-3.7.1-1.ppc (AIX_Toolbox)
           Requires: libssl.a(libssl.so.1.0.2)
Error: Package: python3-3.7.1-1.ppc (AIX_Toolbox)
           Requires: libcrypto.a(libcrypto.so.1.0.2)
You could try using --skip-broken to work around the problem
You could try running: rpm -Va --nofiles --nodigest
#

Python3 requires version 1.0.2 of libssl.so and libcrypto.so, but the two libraries cannot be found. These two libraries can either be provided via the RPM package openssl, in the appropriate version, or via AIX. When the AIX toolbox is used, the openssl package is typically not installed, and the AIX OpenSSL libraries are used. Therefore, in our case, the OpenSSL RPM package is not installed. The libraries are then made available by BFF packages of the AIX operating system.

The information about which libraries are made available by AIX is provided via the virtual RPM package AIX-rpm:

# rpm -q --provides AIX-rpm | egrep "lib(ssl|crypto)"
libcrypto.a(libcrypto.so)
libcrypto.a(libcrypto.so.0.9.8)
libcrypto.a(libcrypto.so.1.0.0)
libcrypto.a(libcrypto64.so)
libcrypto.a(libcrypto64.so.0.9.8)
libcrypto.a(libcrypto64.so.1.0.0)
libcrypto.so.0.9.7
libcrypto_compat.a(libcrypto.so)
libcrypto_compat.a(libcrypto.so.0.9.8)
libcrypto_compat.a(libcrypto64.so)
libcrypto_compat.a(libcrypto64.so.0.9.8)
libssl.a(libssl.so)
libssl.a(libssl.so.0.9.8)
libssl.a(libssl.so.1.0.0)
libssl.a(libssl64.so)
libssl.a(libssl64.so.0.9.8)
libssl.a(libssl64.so.1.0.0)
libssl3.a(libssl3.so)
libssl3.so
libssl_compat.a(libssl.so)
libssl_compat.a(libssl.so.0.9.8)
libssl_compat.a(libssl64.so)
libssl_compat.a(libssl64.so.0.9.8)
#

The output shows that versions 0.9.8 and 1.0.0 of the two required libraries are known to the AIX-rpm package, but the required version 1.0.2 is apparently not available. We take a quick look at the associated archive /usr/lib/libssl.a (or /usr/lib/libcrypto.a):

# ar -X any t /usr/lib/libssl.a
libssl.so
libssl.so.0.9.8
libssl.so.1.0.0
libssl.so.1.0.2
libssl.so
libssl.so.0.9.8
libssl.so.1.0.0
libssl.so.1.0.2
libssl64.so
libssl64.so.0.9.8
libssl64.so.1.0.0
#

The archive obviously also provides version 1.0.2. This information is not provided by the AIX-rpm RPM package. This is because a newer version of openssl.base has obviously been installed, but the AIX-rpm package has not been updated. The RPM package currently has the following version:

# rpm -q AIX-rpm
AIX-rpm-7.1.5.15-6.ppc
#

We update the package now with the command updtvpkg (update virtual package):

# updtvpkg
Please wait...
# 
# rpm -q AIX-rpm
AIX-rpm-7.1.5.30-7.ppc
#

The version number has now changed from 7.1.5.15-6 to 7.1.5.30-7! We can again display which libraries are provided by AIX via AIX rpm:

# rpm -q --provides AIX-rpm | egrep "lib(ssl|crypto)"
libcrypto.a(libcrypto.so)
libcrypto.a(libcrypto.so.0.9.8)
libcrypto.a(libcrypto.so.1.0.0)
libcrypto.a(libcrypto.so.1.0.2)
libcrypto.a(libcrypto64.so)
libcrypto.a(libcrypto64.so.0.9.8)
libcrypto.a(libcrypto64.so.1.0.0)
libcrypto.so
libcrypto.so.0.9.7
libcrypto.so.1.0.0
libcrypto_compat.a(libcrypto.so)
libcrypto_compat.a(libcrypto.so.0.9.8)
libcrypto_compat.a(libcrypto64.so)
libcrypto_compat.a(libcrypto64.so.0.9.8)
libssl.a(libssl.so)
libssl.a(libssl.so.0.9.8)
libssl.a(libssl.so.1.0.0)
libssl.a(libssl.so.1.0.2)
libssl.a(libssl64.so)
libssl.a(libssl64.so.0.9.8)
libssl.a(libssl64.so.1.0.0)
libssl.so
libssl.so.1.0.0
libssl3.a(libssl3.so)
libssl3.so
libssl_compat.a(libssl.so)
libssl_compat.a(libssl.so.0.9.8)
libssl_compat.a(libssl64.so)
libssl_compat.a(libssl64.so.0.9.8)
#

After the update of the virtual package AIX-rpm, the RPM database now also contains the version 1.0.2 of libssl.so and libcrypto.so.

The installation of Python3 is now successful:

# yum install python3
...
Is this ok [y/N]: y
...
Installed:
  python3.ppc 0:3.7.1-1                                                                                       

Dependency Installed:
  bzip2.ppc 0:1.0.6-3  expat.ppc 0:2.2.4-1  info.ppc 0:6.4-1  libffi.ppc 0:3.2.1-3  libstdc++.ppc 0:6.3.0-2
  ncurses.ppc 0:6.1-2

Dependency Updated:
  readline.ppc 0:7.0-5                                                                                        

Complete!
#

Therefore, do not forget to update AIX-rpm when updating BFF packages!