The LPAR tool is now available in version 1.3.0.1 in our download area. It includes a test license that is valid until 01.06.2019, so a separate trial license is no longer necessary for testing.
So: Download, install and get started!

Occasionally, a host key changes on a host, either manually or automatically through an update of OpenSSH. When you then log in to the host in question via ssh, you get the following message:
$ ssh aix01
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that a host key has just been changed.
The fingerprint for the RSA key sent by the remote host is
SHA256:xYglDF3cuHCCrxtbFUbpofpmhNs9MiO114vAT4qVX2M.
Please contact your system administrator.
Add correct host key in /home/as/.ssh/known_hosts to get rid of this message.
Offending RSA key in /home/as/.ssh/known_hosts:2
RSA host key for aix01 has changed and you have requested strict checking.
Host key verification failed.
$
Many administrators now use vi (or another editor) to remove the entry with the old host key from the known_hosts file. The line number of the offending entry is given in the output above: /home/as/.ssh/known_hosts:2 means the entry is on line 2 of the file.
It is much easier to remove the obsolete host key using the ssh-keygen command and the “-R” (remove) option:
$ ssh-keygen -R aix01
# Host aix01 found: line 2
/home/as/.ssh/known_hosts updated.
Original contents retained as /home/as/.ssh/known_hosts.old
$
The command creates a backup copy of the file with the extension “.old” and removes the obsolete entry. This is much easier than using an editor!
If you want to know whether a host key for a system already exists in the known_hosts file, there is the option “-F” (find) for this purpose:
$ ssh-keygen -F aix02
# Host aix02 found: line 49
aix02,192.168.178.49 ssh-rsa AAAAB3NzaC1yc2E...
$
The line number and the public host key for the system are shown.
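By the way, a system may also be listed in known_hosts under its IP address (as in the output above, where aix02 and 192.168.178.49 share one entry). The “-R” option also accepts an IP address, so an obsolete key can be removed by address in exactly the same way:
$ ssh-keygen -R 192.168.178.49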
Recently, cron jobs were no longer being started on one of our AIX systems. There was no entry in the error report, and syslog gave no indication of the problem either. The log of the cron daemon, however, contained a lot of messages:
# cat /var/adm/cron/log
...
! c queue max run limit reached Sat Feb 23 08:49:00 2019
! rescheduling a cron job Sat Feb 23 08:49:00 2019
...
On AIX, the maximum number of simultaneously active cron jobs is set to 100 by default. Obviously this number had been reached on our system. New jobs are then rescheduled, by default 60 seconds later. Both values can be configured via the file /var/adm/cron/queuedefs. The default of 100 is already quite high; reaching it usually indicates a problem.
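For illustration, a hypothetical queuedefs entry for the c queue (the queue used for crontab jobs) could look like this; the values are examples only, not a recommendation:
c.200j2n30w
Such an entry would allow up to 200 simultaneous cron jobs (200j), run them with a nice value of 2 (2n) and retry rescheduled jobs after 30 seconds (30w).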
The PID of the cron daemon was quickly found:
$ ps -ef|grep cron
root 6684924 1 0 Sep 26 - 8:03 /usr/sbin/cron
$
The currently active cron jobs run as cron's child processes. With the option “-T” of the ps command, we can quickly list all children:
$ ps -T 6684924
PID TTY TIME CMD
6684924 - 8:03 cron
3276876 - 0:00 |\--perl
9961588 - 0:00 | \--mount
12714002 - 0:07 | \--nfsmnthelp
3604516 - 0:00 |\--perl
20185130 - 0:00 | \--mount
10158264 - 0:35 | \--nfsmnthelp
4587542 - 0:00 |\--perl
...
It is immediately noticeable that the same lines repeat again and again: cron started a perl program over and over, which tried to mount a file system via NFS. The mount did not work (no answer from the NFS server), and each perl script hung. Since the script kept being restarted, at some point there were 100 active cron jobs, and from that moment on no further cron jobs were started. A quick count of the active perl processes confirms this:
$ ps -T 6684924 |grep perl |wc -l
100
$
There are exactly 100 perl processes started by cron. We terminate some of the hanging perl processes:
# kill 3276876 3604516 4587542
#
A look at the end of the cron log file shows that the jobs have been terminated; after a short while, the first newly started cron job appears:
# tail -f /var/adm/cron/log
...
Cron Job with pid: 3276876 Failed
Cron Job with pid: 3604516 Failed
Cron Job with pid: 4587542 Failed
mqm : CMD ( /appdata/mqm/admin/bin/checks/checkXmitMonitoring.sh >>/appdata/mqm/tracks/logs/scheduler/checkXmitMonitoring.fatal 2>&1 ) : PID ( 28442840 ) : Mon Feb 25 10:34:00 2019
...
We also terminate the remaining hanging processes and, to be on the safe side, restart the cron daemon by simply terminating it:
# kill 6684924
#
The cron daemon is automatically restarted thanks to an /etc/inittab entry:
# lsitab cron
cron:23456789:respawn:/usr/sbin/cron
#
Now that cron works again, the perl script that ultimately caused cron to hang should be examined. For scripts started via cron, it is generally advisable to check at startup whether a previous instance of the job is still running.
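A minimal sketch of such a check in ksh, using an assumed lock directory (mkdir is atomic, so two instances can never both acquire the lock):
#! /bin/ksh
# Hypothetical lock location - adjust as needed.
LOCKDIR=/var/run/myjob.lock
if mkdir "$LOCKDIR" 2>/dev/null; then
    # Remove the lock again when the script exits.
    trap 'rmdir "$LOCKDIR"' EXIT
    # ... the actual work of the cron job goes here ...
    :
else
    # A previous instance is still running (or hanging).
    print "myjob: previous instance still running, exiting" >&2
    exit 0
fi
With such a guard, the hanging NFS mount would have blocked a single job instead of filling the c queue with 100 copies.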
There are several ways under AIX to create missing home directories automatically at login. This is especially useful if user accounts are managed through LDAP or another naming service and are not created locally. A user newly created in LDAP initially has no home directory on the AIX LDAP client:
$ ssh new_user@aix01
...
Could not chdir to home directory /home/new_user: No such file or directory
$ pwd
/
$ exit
$
Probably the easiest way to create the home directory automatically at login is the attribute mkhomeatlogin in the file /etc/security/login.cfg. If the attribute is not set, the default is “false”:
# lssec -f /etc/security/login.cfg -s usw -a mkhomeatlogin
usw mkhomeatlogin=
#
The attribute can be set to true with the chsec command:
# chsec -f /etc/security/login.cfg -s usw -a mkhomeatlogin=true
# lssec -f /etc/security/login.cfg -s usw -a mkhomeatlogin
usw mkhomeatlogin=true
#
We try the login again:
$ ssh new_user@aix01
...
$ pwd
/home/new_user
$
A new home directory has been created for the user.
Managing service events is often neglected on HMCs. In this article we use a concrete example, an error with reference code #25B810, to show how to handle such events. Of course, our LPAR tool is used here.
First, let’s find all open service events:
$ hmc lssvcevents
TIME PROBLEM PMH HMC REFCODE STATE STATUS CALLHOME FAILING_MTMS TEXT
02/13/2019 23:02:31 7 - hmc01 #25B810 approved Open false 8231-E2B/06A084P File System alert event occurred...
02/16/2019 16:14:28 8 - hmc01 B3030001 approved Open false 8231-E2B/06A084P ACT04284I A Management Console connect failed
02/11/2019 16:12:43 37 - hmc02 B3030001 approved Open false 8231-E2B/06A084P ACT04284I A Management Console connect failed
02/11/2019 17:43:19 38 - hmc02 B3030001 approved Open false 8231-E2B/06A084P ACT04283I A connection to a FSP,BPA...
$
This article is about the problem with the number 7. The problem was noted on 13.02.2019 at 23:02:31 by the HMC with the name hmc01. The error code is #25B810. The problem is in the “Open” state, and a call home has not been triggered. The problem concerns the managed system with serial number 06A084P, a Power 710 (8231-E2B). The beginning of the error message can be found in the last column.
First, let’s look at the whole record of the problem by specifying the problem number and HMC:
$ hmc lssvcevents -p 7 hmc01
analyzing_hmc: hmc01
analyzing_mtms: 7042-CR8/21009CD
approval_state: approved
callhome_intended: false
created_time: 02/14/2019 04:11:31
duplicate_count: 0
eed_transmitted: false
enclosure_mtms: 8231-E2B/06A084P
event_severity: 0
event_time: 02/13/2019 23:02:31
failing_mtms: 8231-E2B/06A084P
files: iqyymrge.log/Consolidated system platform log,
iqyvpd.dat/Configuration information associated with the HMC,
actzuict.dat/Tasks performed,
iqyvpdc.dat/Configuration information associated with the HMC,
problems.xml/XML version of the problems opened on the HMC for the HMC and the server,
refcode.dat/list of reference codes associated with the hmc,
iqyylog.log/HMC firmware log information,
PMap.eed/Partition map, obtained from 'lshsc -w -c machine',
hmc.eed/HMC code level obtained from 'lshmc -V' and connection information obtained from 'lssysconn -r all',
sys.eed/Output of various system configuration commands,
8231-E2B_06A084P.VPD.xml/Configuration information associated with the managed system
first_time: 02/14/2019 04:11:31
last_time: 02/14/2019 04:11:31
problem_num: 7
refcode: #25B810
reporting_mtms: 8231-E2B/06A084P
reporting_name: p710
status: Open
sys_mtms: 8231-E2B/06A084P
sys_name: p710
sys_refcode: #25B810
text: File System alert event occurred on /home/ios/CM/DB. Free space is less than 10%, or there was an error querying the filesystem.
At the end of the record we find the unabbreviated error message. It concerns a file system with less than 10% free space. The path “/home/ios/CM/DB” points to a virtual I/O server. The virtual I/O servers in question are located on the managed system with serial number 06A084P:
$ ms show 06A084P
NAME SERIAL_NUM TYPE_MODEL HMCS
p710 06A084P 8231-E2B hmc01,hmc02
$
It is the managed system named p710, which includes the following virtual I/O servers:
$ vios -m p710 show
LPAR ID SERIAL LPAR_ENV MS HMCs
aixvio1 1 06A084P1 vioserver p710 hmc01,hmc02
$
A check of the error report on the Virtual I/O Server aixvio1 shows the following entry:
LABEL: VIO_ALERT_EVENT
IDENTIFIER: 0FD4CF1A
Date/Time: Wed Feb 13 22:02:31 CST 2019
Sequence Number: 98
Machine Id: 00F6A0844C00
Node Id: aixvio1
Class: O
Type: INFO
WPAR: Global
Resource Name: /home/ios/CM/DB
Description
Informational Message
Probable Causes
Asynchronous Event Occurred
Failure Causes
PROCESSOR
Recommended Actions
Check Detail Data
Detail Data
Alert Event Message
25b810
A File System alert event occurred on /home/ios/CM/DB. Free space is less than 10%, or there was an error querying the filesystem.
Diagnostic Analysis
Diagnostic Log sequence number: 19
Resource tested: sysplanar0
Menu Number: 25B810
Description:
File System alert event occurred on /home/ios/CM/DB. Free space is less than 10%, or there was an error querying the filesystem.
A quick check of the file system shows that the problem has already been resolved, and there is enough space:
$ df -g
Filesystem GB blocks Free %Used Iused %Iused Mounted on
...
/dev/hd1 0.25 0.16 35% 111 1% /home
...
$
So the problem does not exist anymore. Therefore, the service event on the HMC should also be closed, which we do now:
$ hmc chsvcevent -o close -p 7 hmc01
$
For review we list the open service events:
$ hmc lssvcevents
TIME PROBLEM PMH HMC REFCODE STATE STATUS CALLHOME FAILING_MTMS TEXT
02/16/2019 16:14:28 8 - hmc01 B3030001 approved Open false 8231-E2B/06A084P ACT04284I A Management Console connect failed
02/11/2019 16:12:43 37 - hmc02 B3030001 approved Open false 8231-E2B/06A084P ACT04284I A Management Console connect failed
02/11/2019 17:43:19 38 - hmc02 B3030001 approved Open false 8231-E2B/06A084P ACT04283I A connection to a FSP,BPA...
$
The event with the number 7 was closed successfully.
Service events are easy to manage with the LPAR tool!
Many AIX systems still use SDDPCM as their multipathing solution. However, IBM no longer supports SDDPCM on POWER9 hardware.
The following shows the migration from SDDPCM to AIX PCM. On our example system we have the following physical volumes:
$ lspv
hdisk0 00abcdefabcde000 datavg active
hdisk1 00abcdefabcde001 datavg active
hdisk2 none None
hdisk3 00abcdefabcde003 altinst_rootvg
hdisk4 00abcdefabcde004 rootvg active
$
The Physical Volumes are disks that are made available through an SVC:
$ lsdev -l hdisk0 -F uniquetype
disk/fcp/2145
$
The Path Control Module (PCM) in use is SDDPCM:
$ lsattr -El hdisk0 -a PCM -F value
PCM/friend/sddpcm
$
You can also see this when looking at the list of kernel extensions:
$ genkex | grep pcm
f1000000c012a000 af000 /usr/lib/drivers/sddpcmke
$
Which PCM driver is used for which disk type can easily be viewed with the command “manage_disk_drivers”:
$ manage_disk_drivers -l
Device Present Driver Driver Options
2810XIV AIX_AAPCM AIX_AAPCM,AIX_non_MPIO
DS4100 AIX_SDDAPPCM AIX_APPCM,AIX_SDDAPPCM
DS4200 AIX_SDDAPPCM AIX_APPCM,AIX_SDDAPPCM
DS4300 AIX_SDDAPPCM AIX_APPCM,AIX_SDDAPPCM
DS4500 AIX_SDDAPPCM AIX_APPCM,AIX_SDDAPPCM
DS4700 AIX_SDDAPPCM AIX_APPCM,AIX_SDDAPPCM
DS4800 AIX_SDDAPPCM AIX_APPCM,AIX_SDDAPPCM
DS3950 AIX_SDDAPPCM AIX_APPCM,AIX_SDDAPPCM
DS5020 AIX_SDDAPPCM AIX_APPCM,AIX_SDDAPPCM
DCS3700 AIX_APPCM AIX_APPCM
DCS3860 AIX_APPCM AIX_APPCM
DS5100/DS5300 AIX_SDDAPPCM AIX_APPCM,AIX_SDDAPPCM
DS3500 AIX_APPCM AIX_APPCM
XIVCTRL MPIO_XIVCTRL MPIO_XIVCTRL,nonMPIO_XIVCTRL
2107DS8K NO_OVERRIDE AIX_AAPCM,AIX_non_MPIO,NO_OVERRIDE
IBMFlash NO_OVERRIDE AIX_AAPCM,AIX_non_MPIO,NO_OVERRIDE
IBMSVC NO_OVERRIDE AIX_AAPCM,AIX_non_MPIO,NO_OVERRIDE
$
In our case (SVC disks), the last line (IBMSVC) is relevant. The current driver listed is NO_OVERRIDE; the possible alternatives are AIX_AAPCM (AIX PCM for active/active and ALUA systems) and AIX_non_MPIO (disks without multipathing). The value NO_OVERRIDE means that if no multipathing driver is explicitly specified, a multipathing driver is used if one is available; otherwise no multipathing driver is used. If more than one multipathing driver is available (in our case AIX PCM and SDDPCM), SDDPCM has priority.
In a subsequent blog entry, we will take a closer look at the possible values, as well as the point in AIX where the selection is made.
Before we change the driver for IBMSVC disks (a reboot is necessary), let's take a look at the attributes of our disks; here is an example for hdisk0:
$ lsattr -El hdisk0
PCM PCM/friend/sddpcm PCM True
...
algorithm load_balance Algorithm True+
...
queue_depth 120 Queue DEPTH True+
...
reserve_policy no_reserve Reserve Policy True+
...
$
Changing the driver causes the values of some attributes to be lost and replaced by the new driver's defaults. This is especially true for queue_depth (here: 120), reserve_policy (here: no_reserve) and the load-balancing policy (algorithm). The current values should be noted so that they can be restored after the conversion to the AIX PCM driver.
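To record the current values of all disks before the migration, a small loop helps. A minimal sketch (the output file /tmp/disk_attrs.before is just an example name):
# for pv in $(lspv | awk '{ print $1 }')
> do
>     echo $pv: $(lsattr -El $pv -a algorithm -a queue_depth -a reserve_policy -F "attribute value")
> done > /tmp/disk_attrs.before
#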
Switching to AIX PCM can be done with the command “manage_disk_drivers”. For this, the command is given the disk type (here IBMSVC) with the option “-d” and the desired driver (here AIX_AAPCM for the AIX PCM driver) with the option “-o”:
# manage_disk_drivers -d IBMSVC -o AIX_AAPCM
********************** ATTENTION *************************
For the change to take effect the system must be rebooted
#
The changed configuration can be listed directly with “manage_disk_drivers -l”:
$ manage_disk_drivers -l
Device Present Driver Driver Options
...
IBMSVC AIX_AAPCM AIX_AAPCM,AIX_non_MPIO,NO_OVERRIDE
$
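Incidentally, the change can be undone with the same syntax by selecting NO_OVERRIDE again (a reboot is again required for it to take effect):
# manage_disk_drivers -d IBMSVC -o NO_OVERRIDE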
To make the change, the system must now be rebooted:
# shutdown -r now
SHUTDOWN PROGRAM
Thu Feb 7 09:43:38 CET 2019
...
We execute the three commands from the beginning again (lspv, lsdev and lsattr):
$ lspv
hdisk0 00abcdefabcde000 datavg active
hdisk1 00abcdefabcde001 datavg active
hdisk2 none None
hdisk3 00abcdefabcde003 altinst_rootvg
hdisk4 00abcdefabcde004 rootvg active
$
The physical volumes are unchanged.
$ lsdev -l hdisk0 -F uniquetype
disk/fcp/mpioosdisk
$
The type of the disks has changed from disk/fcp/2145 to disk/fcp/mpioosdisk. This already indicates that the multipathing driver has changed.
$ lsattr -El hdisk0 -a PCM -F value
PCM/friend/fcpother
$
The Path Control Module (PCM) has also changed. The type is no longer sddpcm but fcpother. At first glance, that does not look like AIX PCM. However, a look at the corresponding driver shows immediately that AIX PCM is in use here:
$ lsdev -P -c PCM -s friend -t fcpother -F DvDr
aixdiskpcmke
$
The associated kernel extension aixdiskpcmke is also currently loaded and in use:
$ genkex | grep pcm
73e2000 57000 /usr/lib/drivers/aixdiskpcmke
$
Let’s take a look at the attributes of hdisk0 again. We expect changed values for some attributes here:
$ lsattr -El hdisk0
PCM PCM/friend/fcpother Path Control Module False
...
algorithm fail_over Algorithm True+
...
queue_depth 20 Queue DEPTH True+
...
reserve_policy single_path Reserve Policy True+
...
$
The value 120 for queue_depth has been lost and replaced by the default value 20. The reserve_policy has changed to single_path, and the load-balancing algorithm is now fail_over, i.e. only one path is used at a time.
We change the settings to a configuration that corresponds to the initial situation:
# chdev -P -l hdisk0 -a algorithm=shortest_queue -a queue_depth=120 -a reserve_policy=no_reserve
hdisk0 changed
#
Since the Physical Volume is in use, the setting can only be changed in the ODM and a further reboot is necessary.
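With several disks, the reconfiguration is best done in a loop. A minimal sketch, assuming all disks should get the same values (otherwise the previously recorded per-disk values must be applied individually):
# for pv in hdisk0 hdisk1 hdisk2 hdisk3 hdisk4
> do
>     chdev -P -l $pv -a algorithm=shortest_queue -a queue_depth=120 -a reserve_policy=no_reserve
> done
hdisk0 changed
hdisk1 changed
...
#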
After all disks have been reconfigured via the ODM, the system must be rebooted a second time:
# shutdown -r now
SHUTDOWN PROGRAM
Thu Feb 6 20:07:12 CET 2019
...
After the reboot SDDPCM can be uninstalled:
# installp -u devices.fcp.disk.ibm.mpio.rte devices.sddpcm.72.rte
+-----------------------------------------------------------------------------+
Pre-deinstall Verification...
+-----------------------------------------------------------------------------+
Verifying selections...done
Verifying requisites...done
Results...
SUCCESSES
...
0503-292 This update will not fully take effect until after a
system reboot.
* * * A T T E N T I O N * * *
System boot image has been updated. You should reboot the
system as soon as possible to properly integrate the changes
and to avoid disruption of current functionality.
installp: bosboot process completed.
+-----------------------------------------------------------------------------+
Summaries:
+-----------------------------------------------------------------------------+
Installation Summary
--------------------
Name Level Part Event Result
-------------------------------------------------------------------------------
devices.sddpcm.72.rte 2.7.1.1 ROOT DEINSTALL SUCCESS
devices.sddpcm.72.rte 2.7.1.1 USR DEINSTALL SUCCESS
devices.fcp.disk.ibm.mpio.r 1.0.0.25 USR DEINSTALL SUCCESS
#
The SDDPCM fileset, as well as the associated host-attachment fileset, were successfully uninstalled.
Since the SDDPCM driver was no longer loaded, and thus no changes were made to the kernel, another reboot should not actually be necessary. However, since the installp output explicitly recommends a prompt reboot, and since a reboot test with the final configuration is a good idea anyway, we reboot the system a third and final time:
# shutdown -r now
SHUTDOWN PROGRAM
Thu Feb 6 20:17:21 CET 2019
...
After the reboot, we check the disk attributes again:
$ lsattr -El hdisk0
PCM PCM/friend/fcpother Path Control Module False
...
algorithm shortest_queue Algorithm True+
...
queue_depth 120 Queue DEPTH True+
...
reserve_policy no_reserve Reserve Policy True+
...
$
The system now uses the AIX PCM driver for multipathing:
$ manage_disk_drivers -l
Device Present Driver Driver Options
...
IBMSVC AIX_AAPCM AIX_AAPCM,AIX_non_MPIO,NO_OVERRIDE
$
Migrating from SDDPCM to AIX PCM is pretty easy to do.
On one of our systems, the command “who -r” did not return run level information. No error message was shown:
$ who -r
$ echo $?
0
$
As a consequence, an install script terminated with an error, since it was not able to determine the run level.
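For illustration, such scripts typically extract the run level from the third field of the “who -r” output (a minimal sketch):
$ who -r | awk '{ print $3 }'
On a healthy system this prints the run level (here: 2); on the affected system it printed nothing, which made the install script fail.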
The information about the run level comes from the binary log file /etc/utmp. The run level is stored as the second record in this file. We assumed that /etc/utmp contained corrupt records.
The command /usr/sbin/acct/fwtmp (from the fileset bos.acct) can be used to convert binary utmp records to ASCII (and vice versa). The command expects the records to convert on standard input. In our case we got:
$ cat /etc/utmp | /usr/sbin/acct/fwtmp
system boot 2 0 0000 0000 1484666008 Tue Jan 17 16:13:28 CET 2017
root 0 804397248 0000 0000 0 \ufffd{\ufffd\ufffd Thu Jan 1 01:00:00 CET 1970
naudio 8 3473526 0000 0000 1484666008 Tue Jan 17 16:13:28 CET 2017
naudio2 8 3539068 0000 0000 1484666008 Tue Jan 17 16:13:28 CET 2017
...
The output above confirmed that the second record was corrupt, since it obviously did not contain the run level. A comparison with the entries from a working system showed what the correct records should look like:
system boot 2 0 0000 0000 1545044734 Mon Dec 17 12:05:34 2018
run-level 2 1 0 0062 0123 1545044734 Mon Dec 17 12:05:34 2018
First of all we made a copy of the corrupt /etc/utmp. Then we created an ASCII version using the above fwtmp command:
# cp /etc/utmp /etc/utmp.orig
# cat /etc/utmp | /usr/sbin/acct/fwtmp -X -L >/etc/utmp.ascii
#
The options -X and -L ensure that user and host names are not shortened.
Using an editor, we corrected the second entry based on the corresponding entry from the working system above. Then we corrected the timestamps by taking the values from the first entry. All in all, the corrected version looked like this:
system boot 2 0 0000 0000 1484666008 Tue Jan 17 16:13:28 CET 2017
run-level 2 1 0 0062 0123 1484666008 Tue Jan 17 16:13:28 CET 2017
naudio 8 3473526 0000 0000 1484666008 Tue Jan 17 16:13:28 CET 2017
...
Now we converted the corrected ASCII version back to the binary format and stored that version under /etc/utmp:
# cat /etc/utmp.ascii | /usr/sbin/acct/fwtmp -ic > /etc/utmp
#
Finally the command “who -r” worked again:
$ who -r
. run-level 2 Jan 17 16:13 2 0 S
$
The problem was resolved.
An English version of the user guide for the LPAR tool is available for download now.
The following error message showed up during the installation of an RPM package:
# rpm -U db4-4.7.25-2.aix5.1.ppc.rpm
/usr/sbin/rpm_share[440]: 36044986 Illegal instruction
rpm_share: 0645-007 ATTENTION: get_rpm_inst_root_list() returned an unexpected result.
rpm_share: 0645-007 ATTENTION: update_inst_root() returned an unexpected result.
The rpm command no longer works; even a rebuild of the RPM database is no longer possible:
# rpm --rebuilddb
/usr/sbin/rpm_share[470]: 22478966 Illegal instruction
Reinstalling the fileset rpm.rte fixes the problem:
# installp -acFXYd . rpm.rte
+-----------------------------------------------------------------------------+
    Pre-installation Verification...
+-----------------------------------------------------------------------------+
...
Installation Summary
--------------------
Name                        Level           Part        Event       Result
-------------------------------------------------------------------------------
rpm.rte                     4.13.0.3        USR         APPLY       SUCCESS
rpm.rte                     4.13.0.3        ROOT        APPLY       SUCCESS
Afterwards, the rpm command works again:
# rpm -qa
...
db4-4.7.25-2.ppc
...
AIX-rpm-7.1.5.15-7.ppc
Starting from 5 November 2018, the LPAR tool is officially available.
A version for Linux can be downloaded from the download area (versions for AIX and macOS will follow soon). A user guide is in preparation.
To test the LPAR tool free of charge, simply download it from the download area.
Have fun testing the LPAR tool!