Migration to AIX PCM combined with OS update using Alternate Disk Copy

On most AIX systems the SP or TL level is updated at regular intervals. It makes sense to perform the migration from SDDPCM to AIX PCM together with such an update. This saves time and avoids the additional reboots that a separate multipathing migration would require.

In our blog post “Migration from SDDPCM to AIX-PCM” we already showed the migration for standalone systems.

Here, the migration from SDDPCM to AIX-PCM will be shown as part of an OS update, using the Alternate Disk Copy method. The procedure is roughly the following:

  1. Unmirroring the rootvg to get a free disk for Alternate Disk Copy.
  2. Changing the Path Control Module (PCM) to AIX PCM.
  3. Creating the altinst_rootvg.
  4. Removing fixes in the altinst_rootvg.
  5. Performing the OS update on the altinst_rootvg.
  6. Installing fixes in the altinst_rootvg.
  7. Adding a firstboot script to set disk attributes.
  8. Changing the Path Control Module (PCM) back to SDDPCM on the rootvg.
  9. Booting from the altinst_rootvg.

On our example system, AIX 7.1 TL5 SP2 is installed and the disks are SVC disks connected via virtual FC adapters. SDDPCM is the currently active multipathing driver:

# oslevel -s
7100-05-02-1810
# lsdev -l hdisk0 -F uniquetype
disk/fcp/2145
# lsattr -El hdisk0 -a PCM -F value
PCM/friend/sddpcm
#

As stated in the blog post above, some disk attributes change when migrating to AIX PCM. Therefore, you should take a close look at the current attributes so that you can carry them over later (at least partially). By way of example, we only look at the attribute queue_depth, which currently has the value 120:

# lsattr -El hdisk0 -a queue_depth -F value
120
#
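
If more than one attribute is to be carried over, the saved values can be turned into chdev commands mechanically. The following is only a minimal sketch and not part of the procedure shown here: the function name gen_chdev_cmds and the KEEP list are hypothetical, and the input is assumed to be one "attribute value" pair per line, as produced for example by lsattr -El hdisk0 -F "attribute value".

```shell
#!/bin/sh
# Hypothetical helper: turn saved "attribute value" pairs into chdev commands.
# Input on stdin: one "attribute value" pair per line (assumed format).
# Only attributes listed in KEEP are emitted; the list is an assumption
# and has to be adapted to your own environment.
KEEP="queue_depth reserve_policy"

gen_chdev_cmds() {
    disk=$1
    while read -r attr value; do
        for k in $KEEP; do
            if [ "$attr" = "$k" ]; then
                # -P only updates the ODM; the change takes effect at the next boot
                echo "chdev -Pl $disk -a $attr=$value"
            fi
        done
    done
}
```

The generated lines are only printed, not executed, so they can be reviewed before they are used anywhere.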

Our system has a mirrored rootvg:

# lsvg -p rootvg
rootvg:
PV_NAME           PV STATE          TOTAL PPs   FREE PPs    FREE DISTRIBUTION
hdisk3            active            399         232         00..01..71..80..80
hdisk0            active            399         240         00..01..79..80..80
#

The system was booted from hdisk0:

# bootinfo -b
hdisk0
#

So we leave hdisk0 in the rootvg and remove hdisk3 from the rootvg to get a free disk for Alternate Disk Copy.

# unmirrorvg rootvg hdisk3
0516-1246 rmlvcopy: If hd5 is the boot logical volume, please run 'chpv -c <diskname>'
        as root user to clear the boot record and avoid a potential boot
        off an old boot image that may reside on the disk from which this
        logical volume is moved/removed.
0516-1804 chvg: The quorum change takes effect immediately.
0516-1144 unmirrorvg: rootvg successfully unmirrored, user should perform
        bosboot of system to reinitialize boot records.  Then, user must modify
        bootlist to just include:  hdisk0.
# reducevg rootvg hdisk3
# chpv -c hdisk3
# bootlist -m normal hdisk0
#

Before we create a copy of the rootvg using Alternate Disk Copy, we temporarily switch the system to AIX PCM, but without rebooting. When the altinst_rootvg is then created, the switch to AIX PCM is already in place in the altinst_rootvg!

# manage_disk_drivers -d IBMSVC -o AIX_AAPCM
********************** ATTENTION *************************
  For the change to take effect the system must be rebooted
#

At the end of the OS update, we then undo this change on the rootvg to have the original state with SDDPCM.

After these preparations we now start the alt_disk_copy command:

# alt_disk_copy -d hdisk3 -B
Calling mkszfile to create new /image.data file.
Checking disk sizes.
Creating cloned rootvg volume group and associated logical volumes.
Creating logical volume alt_hd5.
Creating logical volume alt_hd6.
Creating logical volume alt_hd8.
…
#

Some fixes are installed on the system, which we remove from the altinst_rootvg before the update:

# emgr -l
ID  STATE LABEL      INSTALL TIME      UPDATED BY ABSTRACT
=== ===== ========== ================= ========== ======================================
1    S    102m_ifix  10/14/18 10:48:18            IFIX for Openssl CVE on 1.0.2m       
2    S    IJ03121s0a 10/14/18 10:49:04            IJ03121 for AIX 7.1 TL5 SP00         
3    S    IJ05822s2a 10/14/18 10:49:18            a potential security issue exists    
…
#

Activation of the altinst_rootvg:

# alt_rootvg_op -W -d hdisk3
Waking up altinst_rootvg volume group ...
#

And removal of the fixes:

# INUCLIENTS=1 /usr/sbin/chroot /alt_inst /usr/sbin/emgr -r -n 3
+-----------------------------------------------------------------------------+
Efix Manager Initialization
+-----------------------------------------------------------------------------+
Initializing log /var/adm/ras/emgr.log ...
Accessing efix metadata ...
Processing efix label "IJ05822s2a" ...
…
Operation Summary
+-----------------------------------------------------------------------------+
Log file is /var/adm/ras/emgr.log

EFIX NUMBER       LABEL               OPERATION              RESULT           
===========       ==============      =================      ==============   
1                 IJ05822s2a          REMOVE                 SUCCESS          

Return Status = SUCCESS
# INUCLIENTS=1 chroot /alt_inst /usr/sbin/emgr -r -n 2
…
# INUCLIENTS=1 chroot /alt_inst /usr/sbin/emgr -r -n 1
…

(Note: please do not forget the variable INUCLIENTS, this signals that the operation is taking place in an alternate boot environment!)

Now we mount the LPP source for the OS update via NFS from our NIM server:

# mount aixnim:/export/nim/lpps/aix710503lpp /mnt
#

The OS update can now be done in the altinst_rootvg:

# alt_rootvg_op -C -b update_all -l /mnt
Installing optional filesets or updates into altinst_rootvg...
install_all_updates: Initializing system parameters.
install_all_updates: Log file is /var/adm/ras/install_all_updates.log
install_all_updates: Checking for updated install utilities on media.
…
installp:  * * * A T T E N T I O N ! ! !
        Software changes processed during this session require
        any diskless/dataless clients to which this SPOT is
        currently allocated to be rebooted.
install_all_updates: Log file is /var/adm/ras/install_all_updates.log
install_all_updates: Result = SUCCESS
#

Finally, we install some fixes. We first mount the directory /mnt with the fixes in the altinst_rootvg:

# mount -v namefs /mnt /alt_inst/mnt
# INUCLIENTS=1 chroot /alt_inst /usr/sbin/emgr -e /mnt/emgr/ppc/102p_fix.181127.epkg.Z
+-----------------------------------------------------------------------------+
Efix Manager Initialization
+-----------------------------------------------------------------------------+
Initializing log /var/adm/ras/emgr.log ...
Efix package file is: /mnt/emgr/ppc/102p_fix.181127.epkg.Z
…
EPKG NUMBER       LABEL               OPERATION              RESULT           
===========       ==============      =================      ==============   
1                 102p_fix            INSTALL                SUCCESS          

Return Status = SUCCESS
# INUCLIENTS=1 chroot /alt_inst /usr/sbin/emgr -e /mnt/emgr/ppc/IJ09621s3a.181001.epkg.Z
…
# INUCLIENTS=1 chroot /alt_inst /usr/sbin/emgr -e /mnt/emgr/ppc/IJ11545s0a.181127.epkg.Z
…
# umount /alt_inst/mnt
#

To set the desired disk attributes and uninstall SDDPCM we use a firstboot script:

# cat /alt_inst/etc/firstboot
#! /bin/ksh

print "INFO: adjust hdisk attributes"
chdev -Pl hdisk0 -a queue_depth=120

print "INFO: uninstalling SDDPCM"
installp -u devices.sddpcm.$(uname -v)$(uname -r).rte devices.fcp.disk.ibm.mpio.rte

print "INFO: perform reboot"
reboot

# chmod a+x /alt_inst/etc/firstboot
#

The script should, if used, be adapted to your own needs. There, you should customize all the desired disk attributes (queue_depth, reserve_policy, …). The sample script here is just to indicate what you could do!

The altinst_rootvg is now updated and converted to AIX PCM. We put the altinst_rootvg to sleep so that it can be booted.

# alt_rootvg_op -S -t
Putting volume group altinst_rootvg to sleep ...
forced unmount of /alt_inst/var/adm/ras/livedump
…
forced unmount of /alt_inst
Fixing LV control blocks...
Fixing file system superblocks...
#

(Note: please do not forget the option “-t”, this creates a new boot image!)

But before we boot from the altinst_rootvg, we change the multipathing driver back to SDDPCM on the rootvg!

# manage_disk_drivers -d IBMSVC -o NO_OVERRIDE
********************** ATTENTION *************************
  For the change to take effect the system must be rebooted
#

Finally we change the bootlist to altinst_rootvg (hdisk3):

# bootlist -m normal hdisk3
#

And last but not least we reboot:

# shutdown -r now

SHUTDOWN PROGRAM
Tue Apr 16 19:49:08 CEST 2019

Broadcast message from root@aix01 (tty) at 19:49:08 ...

PLEASE LOG OFF NOW ! ! !
System maintenance in progress.
All processes will be killed now.
…

-------------------------------------------------------------------------------
                                Welcome to AIX.
                   boot image timestamp: 19:45:08 04/16/2019
                 The current time and date: 19:51:11 04/16/2019
        processor count: 2;  memory size: 4096MB;  kernel size: 36847630
       boot device: /vdevice/vfc-client@3000000a/disk@5005076XXXXXXXXX:2
-------------------------------------------------------------------------------
…
Multi-user initialization completed
INFO: adjust hdisk attributes
hdisk0 changed
INFO: uninstalling SDDPCM
…
Installation Summary
--------------------
Name                        Level           Part        Event       Result
-------------------------------------------------------------------------------
devices.sddpcm.71.rte       2.7.1.1         ROOT        DEINSTALL   SUCCESS   
devices.sddpcm.71.rte       2.7.1.1         USR         DEINSTALL   SUCCESS   
devices.fcp.disk.ibm.mpio.r 1.0.0.25        USR         DEINSTALL   SUCCESS   
INFO: perform reboot
Rebooting . . .
…

AIX Version 7
Copyright IBM Corporation, 1982, 2018.
Console login:

(In the output you can see the actions of the firstboot script: changing disk attributes, uninstalling SDDPCM and rebooting.)

After logging in, we check the OS version, the multipathing driver in use and some disk attributes:

# oslevel -s
7100-05-03-1846
# lsdev -l hdisk0 -F uniquetype
disk/fcp/mpioosdisk
# lsattr -El hdisk0 -a PCM -F value
PCM/friend/fcpother
# lsattr -El hdisk0 -a queue_depth -F value
120
# genkex|grep pcm
         5ae0000    60000 /usr/lib/drivers/aixdiskpcmke
# lslpp -l|grep sddpcm
#
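
For several disks, the check of the PCM attribute can be wrapped in a small function. This is a sketch under assumptions, not part of the tested procedure: the function names are hypothetical, and any PCM value containing "sddpcm" is treated as "still on SDDPCM", while every other value is simply reported as migrated.

```shell
#!/bin/sh
# Hypothetical helper: classify the value of the PCM disk attribute.
# Assumption: a value containing "sddpcm" means SDDPCM is still active;
# every other value is treated as "not SDDPCM".
pcm_is_sddpcm() {
    case "$1" in
        *sddpcm*) return 0 ;;
        *)        return 1 ;;
    esac
}

# check_pcm DISK PCM_VALUE: print a one-line verdict, return 1 if still SDDPCM
check_pcm() {
    disk=$1
    pcm=$2
    if pcm_is_sddpcm "$pcm"; then
        echo "$disk: still on SDDPCM ($pcm)"
        return 1
    fi
    echo "$disk: migrated ($pcm)"
}
```

On the system itself the value would come from the command already shown, for example: check_pcm hdisk0 "$(lsattr -El hdisk0 -a PCM -F value)".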

We have successfully completed the migration from SDDPCM to AIX PCM together with an OS update. Using scripts this can be automated further.

We have tested this procedure for AIX 7.1 and AIX 7.2. Due to time constraints, we have not yet been able to test it with PowerHA.

Did you know that state and configuration change information is available on the HMC for about 2 months?

Status and configuration changes of LPARs and managed systems are stored on the HMCs for about 2 months. This can be used to find out when a managed system was shut down, when a service processor failover took place, or when the memory of an LPAR was expanded, provided the event is no more than 2 months old.

The status changes of a managed system can be listed with the command “lslparutil -r sys -m <managed-system> -s h --startyear 1970 --filter event_types=state_change”, or alternatively with the LPAR tool command “ms history <managed-system>”.

linux $ ms history ms04
TIME                  PRIMARY_STATE         DETAILED_STATE
03/14/2019 08:45:13   Started               None
03/14/2019 08:36:52   Not Available         Unknown
02/17/2019 01:51:55   Started               None
02/17/2019 01:44:00   Not Available         Unknown
02/12/2019 09:32:57   Started               None
02/12/2019 09:28:02   Started               Service Processor Failover
02/12/2019 09:27:07   Started               None
02/12/2019 09:24:42   Standby               None
02/12/2019 09:21:25   Starting              None
02/12/2019 09:22:59   Stopped               None
02/12/2019 09:21:58   Not Available         Unknown
02/12/2019 09:09:45   Stopped               None
02/12/2019 09:07:53   Stopping              None
linux $

Configuration changes (processor, memory) of a managed system can be displayed with “lslparutil -r sys -m <managed-system> -s h --startyear 1970 --filter event_types=config_change”, or alternatively again with the LPAR tool:

linux $ ms history -c ms02
                                PROCUNITS             MEMORY
TIME                  CONFIGURABLE  AVAILABLE  CONFIGURABLE  AVAILABLE  FIRMWARE
04/16/2019 12:15:51      20.0          5.05       1048576       249344     25856
04/11/2019 11:17:39      20.0          5.25       1048576       253696     25600
04/02/2019 13:24:35      20.0          4.85       1048576       249344     25856
03/29/2019 14:29:14      20.0          5.25       1048576       253696     25600
03/15/2019 15:37:08      20.0          4.85       1048576       249344     25856
03/15/2019 11:36:57      20.0          4.95       1048576       249344     25856
...
linux $

The same information can also be displayed for LPARs.

The last status changes of an LPAR can be listed with “lpar history <lpar>”:

linux $ lpar history lpar02
TIME                  PRIMARY_STATE         DETAILED_STATE
04/17/2019 05:42:43   Started               None
04/17/2019 05:41:24   Waiting For Input     Open Firmware
04/16/2019 12:01:54   Started               None
04/16/2019 12:01:29   Stopped               None
02/15/2019 11:30:48   Stopped               None
02/01/2019 12:23:34   Not Available         Unknown
02/01/2019 12:22:50   Relocating            None
...

This corresponds to the command “lslparutil -r lpar -m ms03 -s h --startyear 1970 --filter event_types=state_change,lpar_names=lpar02” on the HMC command line.

From the output it can be seen that the LPAR was relocated using LPM, was stopped and restarted, and has been in Open Firmware mode.

And finally, you can look at the last configuration changes of an LPAR on the HMC CLI with the command “lslparutil -r lpar -m ms03 -s h --startyear 1970 --filter event_types=config_change,lpar_names=lpar02”. The output of the LPAR tool is a bit clearer:

linux $ lpar history -c lpar02
TIME                  PROC_MODE  PROCS  PROCUNITS  SHARING  UNCAP_WEIGHT  PROCPOOL         MEM_MODE  MEM
04/23/2019 18:49:43   shared    1      0.7        uncap    10          DefaultPool      ded       4096
04/23/2019 18:49:17   shared    1      0.7        uncap    5           DefaultPool      ded       4096
04/23/2019 18:48:44   shared    1      0.3        uncap    5           DefaultPool      ded       4096
04/09/2019 08:04:25   shared    1      0.3        uncap    5           DefaultPool      ded       3072
03/14/2019 12:37:32   shared    1      0.1        uncap    5           DefaultPool      ded       3072
02/26/2019 09:34:28   shared    1      0.1        uncap    5           DefaultPool      ded       3072
02/20/2019 06:51:57   shared    1      0.3        uncap    5           DefaultPool      ded       3072
01/31/2019 08:12:58   shared    1      0.3        uncap    5           DefaultPool      ded       3072
..

From the output you can see that the number of processing units was changed several times, the uncapped weight was changed and the memory was expanded.

Changes of the last two months are available at any time!
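
The four lslparutil calls above differ only in the resource type, the managed system and the filter. As a small sketch, the command line can be assembled by a helper function; the name hmc_history_cmd is hypothetical, and the function only prints the command so that it can be checked before it is run on the HMC. The filter is printed in double quotes, which is harmless here and avoids surprises with the shell.

```shell
#!/bin/sh
# Hypothetical helper: print the lslparutil command for a history query.
# Usage: hmc_history_cmd RESOURCE MANAGED_SYSTEM EVENT_TYPE [LPAR_NAME]
#   RESOURCE   : sys or lpar
#   EVENT_TYPE : state_change or config_change
hmc_history_cmd() {
    res=$1
    ms=$2
    event=$3
    lpar=${4:-}
    filter="event_types=$event"
    if [ -n "$lpar" ]; then
        # restrict the events to a single LPAR
        filter="$filter,lpar_names=$lpar"
    fi
    printf 'lslparutil -r %s -m %s -s h --startyear 1970 --filter "%s"\n' \
        "$res" "$ms" "$filter"
}

hmc_history_cmd sys ms04 state_change
hmc_history_cmd lpar ms03 config_change lpar02
```

The two calls at the end print the commands for the state changes of the managed system ms04 and for the configuration changes of the LPAR lpar02 on ms03.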

Fixes and Alternate Disk Copy

When using the Alternate Disk Copy method for AIX updates, it sometimes happens that the installed fixes prevent a successful update. In this case the installed fixes can be removed directly in the altinst_rootvg. For this the emgr command can be called directly in the altinst_rootvg using the chroot command.

After creating the altinst_rootvg, e.g. with the alt_disk_copy command, the altinst_rootvg must be activated first:

# alt_rootvg_op -W -d hdisk3
Waking up altinst_rootvg volume group ...
#

The installed fixes can be listed as follows:

# /usr/sbin/chroot /alt_inst /usr/sbin/emgr -l

ID  STATE LABEL      INSTALL TIME      UPDATED BY ABSTRACT
=== ===== ========== ================= ========== ======================================
1    S    102m_ifix  10/14/18 10:48:18            IFIX for Openssl CVE on 1.0.2m       
2    S    IJ03121s0a 10/14/18 10:49:04            IJ03121 for AIX 7.1 TL5 SP00         
3    S    IJ05822s2a 10/14/18 10:49:18            a potential security issue exists    
…

When removing fixes in the altinst_rootvg, the environment variable INUCLIENTS is important. It signals the emgr command not to restart services and not to change devices dynamically. Without setting this variable, uninstalling some fixes will fail in the altinst_rootvg!

# INUCLIENTS=1 /usr/sbin/chroot /alt_inst /usr/sbin/emgr -r -n 3
+-----------------------------------------------------------------------------+
Efix Manager Initialization
+-----------------------------------------------------------------------------+
Initializing log /var/adm/ras/emgr.log ...
Accessing efix metadata ...
Processing efix label "IJ05822s2a" ...
...
Operation Summary
+-----------------------------------------------------------------------------+
Log file is /var/adm/ras/emgr.log

EFIX NUMBER       LABEL               OPERATION              RESULT           
===========       ==============      =================      ==============   
1                 IJ05822s2a          REMOVE                 SUCCESS          

Return Status = SUCCESS
# INUCLIENTS=1 chroot /alt_inst /usr/sbin/emgr -r -n 2
…
# INUCLIENTS=1 chroot /alt_inst /usr/sbin/emgr -r -n 1
…
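
With many fixes installed, the IDs for the removal can be taken from the emgr listing instead of being typed by hand. The following is a minimal sketch; the function name efix_ids_desc is hypothetical, and it assumes the listing format shown above, where each fix line starts with its numeric ID. The IDs are printed in descending order, mirroring the order 3, 2, 1 used above.

```shell
#!/bin/sh
# Hypothetical helper: extract the fix IDs from "emgr -l" output on stdin
# and print them in descending order, one per line.
# Assumption: fix lines start with a purely numeric ID in the first column;
# header, separator and other lines do not.
efix_ids_desc() {
    awk '$1 ~ /^[0-9]+$/ { print $1 }' | sort -rn
}

# Possible use in the activated altinst_rootvg (not executed here):
# for id in $(chroot /alt_inst /usr/sbin/emgr -l | efix_ids_desc); do
#     INUCLIENTS=1 chroot /alt_inst /usr/sbin/emgr -r -n "$id"
# done
```

The loop is left commented out on purpose: it removes all listed fixes without asking, so it should only be used once the list of IDs has been checked.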

Now there are no fixes in the way of an OS update!

After the OS update, new fixes can be installed in the altinst_rootvg in the same way before rebooting. We first mount the directory with the fixes under /alt_inst/mnt:

# mount aixnim:/export/nim/lpps/aix710503lpp /alt_inst/mnt
#

And then we install the fixes directly in the altinst_rootvg, again with the help of chroot and INUCLIENTS:

# INUCLIENTS=1 chroot /alt_inst /usr/sbin/emgr -e /mnt/emgr/ppc/102p_fix.181127.epkg.Z
+-----------------------------------------------------------------------------+
Efix Manager Initialization
+-----------------------------------------------------------------------------+
Initializing log /var/adm/ras/emgr.log ...
Efix package file is: /mnt/emgr/ppc/102p_fix.181127.epkg.Z
…
EPKG NUMBER       LABEL               OPERATION              RESULT           
===========       ==============      =================      ==============   
1                 102p_fix            INSTALL                SUCCESS          

Return Status = SUCCESS
# INUCLIENTS=1 chroot /alt_inst /usr/sbin/emgr -e /mnt/emgr/ppc/IJ09621s3a.181001.epkg.Z
…
# INUCLIENTS=1 chroot /alt_inst /usr/sbin/emgr -e /mnt/emgr/ppc/IJ11545s0a.181127.epkg.Z
…

Finally the altinst_rootvg has to be deactivated, and a new boot image should be created in the altinst_rootvg, otherwise the system will hang (depending on the fixes installed) when booting from altinst_rootvg!

# alt_rootvg_op -S -t
Putting volume group altinst_rootvg to sleep ...
forced unmount of /alt_inst/var/adm/ras/livedump
…
forced unmount of /alt_inst
Fixing LV control blocks...
Fixing file system superblocks...
#

(Note: The “-t” option forces the creation of a new boot image!)

Now, as usual, the boot list can be changed and the system can be rebooted after the update.

PowerVM: Do you know the Profile “last*valid*configuration”?

You may have wondered how and where the current configuration of an LPAR is stored. If the current configuration and the profile are not kept synchronized, they quickly drift apart. When an LPAR is shut down and deactivated, its last current configuration is retained. When the LPAR is activated again, this configuration is offered in the GUI as the “current configuration”, in addition to the profiles of the LPAR. If you select the current configuration, the LPAR has the same configuration after activation as it had before deactivation. For a newly created LPAR, however, this selection is not available on activation. The difference also shows on the HMC command line: an LPAR that has been activated before can be activated without specifying a profile, whereas a newly created LPAR can only be activated by specifying a profile. Let’s take a closer look.

(Short note: The commands on the HMC command line were executed directly on the HMC hmc01. In the example outputs with the LPAR tool, the commands were started from a Linux jump server. All commands are always shown with both variants!)

We have activated and booted the LPAR aix01 with the profile “standard”. We have not made any dynamic changes yet. We briefly look at the status of the LPAR and check if there is an RMC connection to the HMC:

hscroot@hmc01:~> lssyscfg -m p710 -r lpar --filter lpar_names=aix01 --header -F name lpar_env state curr_profile rmc_state os_version
name lpar_env state curr_profile rmc_state os_version
aix01 aixlinux Running standard active "AIX 7.1 7100-04-00-0000"
hscroot@hmc01:~>
linux $ lpar status aix01
NAME  ID      TYPE   STATUS  PROFILE    RMC   PROCS  PROCUNITS MEMORY  OS
aix01  5  aixlinux  Running  standard  active   1       -      3072    AIX 7.1 7100-04-00-0000
linux $

To see the effect of a dynamic change, let’s take a look at the actual state and the profile “standard”:

hscroot@hmc01:~> lshwres -m p710 -r mem --level lpar --filter lpar_names=aix01 -F curr_mem
3072
hscroot@hmc01:~> lssyscfg -m p710 -r prof --filter profile_names=standard,lpar_names=aix01 -F desired_mem
3072
hscroot@hmc01:~>
linux $ lpar mem aix01
      MEMORY            MEMORY           HUGEPAGES
NAME   MODE  AME   MIN   CURR   MAX   MIN  CURR  MAX
aix01  ded    -   2048   3072  8192    0     0    0
linux $ lpar -p standard mem aix01
      MEMORY            MEMORY           HUGEPAGES
NAME   MODE  AME   MIN   CURR   MAX   MIN  CURR  MAX
aix01  ded    -   2048   3072  8192    0     0    0
linux $

The LPAR currently has 3072 MB of main memory, which is also the value stored in the “standard” profile.

Now we add 1024 MB of main memory dynamically (DLPAR):

hscroot@hmc01:~> chhwres -m p710 -r mem -o a -p aix01 -q 1024
hscroot@hmc01:~>
linux $ lpar -d addmem aix01 1024
linux $

Now let’s look at the resulting memory resources of the LPAR:

hscroot@hmc01:~> lshwres -m p710 -r mem --level lpar --filter lpar_names=aix01 -F curr_mem
4096
hscroot@hmc01:~>
linux $ lpar mem aix01
     MEMORY            MEMORY          HUGEPAGES
NAME  MODE  AME   MIN   CURR   MAX   MIN  CURR  MAX
aix01  ded   -   2048   4096  8192    0     0    0
linux $

As expected, the LPAR now has 4096 MB of RAM. But what does the profile “standard” look like?

hscroot@hmc01:~> lssyscfg -m p710 -r prof --filter profile_names=standard,lpar_names=aix01 -F desired_mem
3072
hscroot@hmc01:~>
linux $ lpar -p standard mem aix01
     MEMORY            MEMORY          HUGEPAGES
NAME  MODE  AME   MIN   CURR   MAX   MIN  CURR  MAX
aix01  ded   -   2048   3072  8192    0     0    0
linux $

The profile has not changed; activating the LPAR with this profile would result in 3072 MB of main memory.

The current configuration is always saved in the special profile “last*valid*configuration”:

hscroot@hmc01:~> lssyscfg -m p710 -r prof --filter profile_names=last*valid*configuration,lpar_names=aix01 -F desired_mem
4096
hscroot@hmc01:~>
linux $ lpar -p last*valid*configuration mem aix01
     MEMORY            MEMORY          HUGEPAGES
NAME  MODE  AME   MIN   CURR   MAX   MIN  CURR  MAX
aix01  ded   -   2048   4096  8192    0     0    0
linux $

Here the value of 4096 MB is consistent with the currently available memory in the LPAR.
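
The comparison between the running value and a profile can also be scripted. The following is only a sketch: the function name report_mem_drift is hypothetical, and it simply compares two memory values in MB that have been queried beforehand, for example with the lshwres and lssyscfg commands shown above.

```shell
#!/bin/sh
# Hypothetical helper: compare the running memory of an LPAR with the
# value stored in a profile and report whether they differ.
# Usage: report_mem_drift CURRENT_MB PROFILE_MB PROFILE_NAME
report_mem_drift() {
    curr=$1
    prof=$2
    name=$3
    if [ "$curr" -eq "$prof" ]; then
        echo "profile $name in sync: $curr MB"
    else
        echo "profile $name differs: running $curr MB, profile $prof MB"
    fi
}

# Values taken from the example above: 4096 MB running, 3072 MB in "standard"
report_mem_drift 4096 3072 standard
```

For the special profile the same call with 4096 and 4096 would report that the profile is in sync.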

Every dynamic change to an LPAR is carried out on the LPAR via a DLPAR operation and is also recorded in the special profile! If a profile is synchronized manually or automatically, it is ultimately this special profile that is synchronized with the desired profile.

The existence and handling of the special profile “last*valid*configuration” also makes some LPM possibilities easier to understand. We will deal with this in a later blog post.

We want your feedback!

The new PowerCampus “LPAR tool” is available for download! It has been heavily revised and is written in C++. It supports output in various formats such as JSON and YAML!

The first 100 people to send feedback get two licenses (for 2 LPARs) for free! Forever!

So download it and give us feedback: just send an e-mail to info@powercampus.de!

The integrated test license supports one HMC and two complete managed systems without further registration! For an extended trial version for 4 HMCs and unlimited managed systems, just send an e-mail to info@powercampus.de.

Download “LPAR tool”: https://powercampus.de/en/download-2/