LPAR console using Virtual I/O Server

Typically, a console for an LPAR is launched via an HMC, via GUI or CLI (vtmenu or mkvterm). A console depends on the availability of an HMC. During an HMC update or problems with the HMC, you may not be able to connect to an LPAR console.

Relatively unknown is the ability to configure a console to an LPAR via a virtual I/O server. If the HMC is not available, then a console can be started via the virtual I/O server. No configuration is required on the client LPAR! By default, each client LPAR has 2 virtual serial server adapters (slots 0 and 1). If you configure an associated client adapter on a virtual I/O server, you can use it for a console connection.

On the virtual I/O server one needs only an unused virtual slot (here slot 45). The client LPAR has the LPAR ID 39. The virtual serial client adapter can be created with the following command:

hmc01 $ chhwres -m ms02 -r virtualio --rsubtype serial -o a -p ms02-vio1 -s 45 -a adapter_type=client,remote_lpar_name=aix02,remote_slot_num=0,supports_hmc=0
hmc01 $

Now you can always start a console for the LPAR via the virtual I/O server:

ms02-vio1 :/home/padmin> mkvt -id 39
AIX Version 7
Copyright IBM Corporation, 1982, 2018.
Console login: root
root's Password: XXXXXX


aix02  AIX 7.2         powerpc


Last unsuccessful login: Mon Mar 18 23:14:26 2019 on ssh from N.N.N.N
Last login: Wed Mar 27 20:19:22 2019 on /dev/pts/0 from M.M.M.M
[YOU HAVE NEW MAIL]
aix02:/root> hostname
aix02
aix02:/root>

The command mkvt on the virtual I/O server corresponds to the command mkvterm on the HMC. Here the desired partition must be specified by the LPAR-ID. Terminating the console works as usual with “~.“, Or if you are logged in via SSH on the virtual I/O server with “~~.“.

Alternatively, you can also end a console session with the command rmvt:

ms02-vio1:/home/padmin> rmvt -id 39
ms02-vio1:/home/padmin>

The following message appears in the console and the console is closed:

Virtual terminal has been disconnected.

$

With the LPAR tool, the console can of course be set up even easier. The virtual serial adapter on the virtual I/O server can be created with the command “lpar addserial“, a manual login to the HMC is not necessary for this to work:

$ lpar addscsi -c ms02-vio1 45 aix02 0
$

The “-c” option means “create client adapter”. The command also creates the adapter in the profile. The success of the action can be checked by “lpar vslots“, showing all virtual adapters of an LPAR:

$ lpar vslots ms02-vio1
SLOT  REQ  TYPE           DATA
0     1    serial/server  remote: -(any)/any status=unavailable hmc=1
1     1    serial/server  remote: -(any)/any status=unavailable hmc=1
2     0    eth            PVID=1 VLANS=- XXXXXXXXXXXX ETHERNET0
3     1    eth            TRUNK(1) IEEE PVID=1 VLANS=201 XXXXXXXXXXXXX ETHERNET0
...
45     0   serial/client  remote: aix02(39)/0 status=unavailable hmc=0
...
$

Starting the console then proceeds as usual by logging in as padmin on the virtual I/O server and the command mkvt.

Caution: The console session through the virtual I/O server should always be terminated when it is no longer needed. You can not terminate it from the HMC! Here is the attempt to start a console using the HMC, while the console is already active using the virtual I/O server:

$ lpar console aix02

Open in progress 

A terminal session is already open for this partition. 
Only one open session is allowed for a partition. 
Exiting.... 
Attempts to open the session failed. Please close the terminal and retry the open at a later time. 
If the problem persists, Please contact IBM support. 
Received end of file, Exiting.
Connection to X.X.X.X closed.
$

Even rmvterm does not help:

$ lpar rmvterm aix02
/bin/stty: standard input: Inappropriate ioctl for device
$

Conversely, no console can be started using the virtual I/O server if a console is active using the HMC:

ms02-vio1:/home/padmin> mkvt -id 39
Virtual terminal is already connected.

ms02-vio1:/home/padmin>

So always make sure that the console is terminated.

 

Which FC port is connected to which SAN fabric?

In larger environments with many managed systems and multiple SAN fabrics, it’s not always clear which SAN fabric an FC port belongs to despite good documentation. In many cases, the hardware is far from the screen, possibly even in a very different building or geographically farther away, so you can not just check the wiring on site.

This blog post will show you how to use Live Partition Mobility (LPM) to find all the FC ports that belong to a given SAN fabric.

We use the LPAR tool for the sake of simplicity, but you can also work with commands from the HMC CLI without the LPAR tool, so please continue reading even if the LPAR tool is not available!

In the following, we have named our SAN fabrics “Fabric1” and “Fabric2.” However, the procedure described below can be used with any number of SAN fabrics.

Since LPM is to be used, we first need an LPAR. We create the LPAR on one of our managed systems (ms09) with the LPAR tool:

$ lpar –m ms09 create fabric1
Creating LPAR fabric1:
done
Register LPAR
done
$

Of course you can also use the HMC GUI or the HMC CLI to create the LPAR. We named the new LPAR after our SAN Fabric “fabric1“. Every other name is just as good!

Next, our LPAR needs a virtual FC adapter mapped to an FC port of fabric “Fabric1“:

$ lpar –p standard addfc fabric1 10 ms09-vio1
fabric1 10 ms09-vio1 20
$

The LPAR tool has selected slot 20 for the VFC server adapter on VIOS ms09-vio1 and created the client adapter as well as the server adapter. Of course, client and server adapters can be created in exactly the same way via the HMC GUI or the HMC CLI. Since the LPAR is not active, the ‘-p standard‘ option specified that only the profile should be adjusted.

To map the VFC server adapter to a physical FC port, we need the vfchost adapter number on the VIOS ms09-vio1:

$ vios npiv ms09-vio1
VIOS       ADAPT NAME  SLOT  CLIENT OS      ADAPT   STATUS        PORTS
…
ms09-vio1  vfchost2    C20   (3)    unknown  -     NOT_LOGGED_IN  0
…
$

In slot 20 we have the vfchost2, so this must now be mapped to an FC port of fabric “Fabric1“. We map to the FC port fcs8, which we know to belong to fabric “Fabric1“. If we are wrong, we will see this shortly.

Let’s take a look at the WWPNs for the virtual FC Client Adapter:

$ lpar -p standard vslots fabric1
SLOT  REQ  TYPE           DATA
0     yes  serial/server  remote: (any)/any hmc=1
1     yes  serial/server  remote: (any)/any hmc=1
10    no   fc/client      remote: ms09-vio1(1)/20 c050760XXXXX00b0,c050760XXXXX00b1
$

Equipped with the WWPNs, we now ask our storage colleagues to create a small LUN for these WWPNs, which should only be visible in the fabric “Fabric1“. After the storage colleagues have created the LUN and adjusted the zoning accordingly, we activate our new LPAR in OpenFirmware mode and open a console:

$ lpar activate –p standard –b of fabric1

$ lpar console fabric1

Open in progress 

Open Completed.

IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM IBM
...

          1 = SMS Menu                          5 = Default Boot List
          8 = Open Firmware Prompt              6 = Stored Boot List

     Memory      Keyboard     Network     SCSI     Speaker  ok
0 >

Of course, this is also possible without problems with GUI or HMC CLI.

In OpenFirmware mode we start ioinfo and check if the small LUN is visible. If it is not visible, then the FC port fcs8 does not belong to the right fabric!

0 > ioinfo

!!! IOINFO: FOR IBM INTERNAL USE ONLY !!!
This tool gives you information about SCSI,IDE,SATA,SAS,and USB devices attached to the system

Select a tool from the following

1. SCSIINFO
2. IDEINFO
3. SATAINFO
4. SASINFO
5. USBINFO
6. FCINFO
7. VSCSIINFO

q - quit/exit

==> 6

FCINFO Main Menu
Select a FC Node from the following list:
 # Location Code           Pathname
-------------------------------------------------
 1. U9117.MMC.XXXXXXX7-V10-C10-T1  /vdevice/vfc-client@3000000a

q - Quit/Exit

==> 1

FC Node Menu
FC Node String: /vdevice/vfc-client@3000000a
FC Node WorldWidePortName: c050760XXXXXX0016
------------------------------------------
1. List Attached FC Devices
2. Select a FC Device
3. Enable/Disable FC Adapter Debug flags

q - Quit/Exit

==> 1

1. 500507680YYYYYYY,0 - 10240 MB Disk drive

Hit a key to continue...

FC Node Menu
FC Node String: /vdevice/vfc-client@3000000a
FC Node WorldWidePortName: c050760XXXXXX0016
------------------------------------------
1. List Attached FC Devices
2. Select a FC Device
3. Enable/Disable FC Adapter Debug flags

q - Quit/Exit

==> q

The LUN appears, the WWPN 500507680YYYYYYY is the WWPN of the corresponding storage port, which is unique worldwide and can only be seen in the fabric “Fabric1“!

Activating the LPAR in OpenFirmware mode has served two purposes, firstly to verify that the LUN is visible and our mapping to fcs8 was correct, secondly, the system now has the information which WWPNs need to be found during an LPM operation, so that the LPAR can be moved!

We deactivate the LPAR again.

$ lpar shutdown –f fabric1
$

If we now perform an LPM validation on the inactive LPAR, then a validation can only be successful on a managed system that has a virtual I/O server with a connection to the fabric “Fabric1“. Using a for loop, let’s try that for some managed systems:

$ for ms in ms10 ms11 ms12 ms13 ms14 ms15 ms16 ms17 ms18 ms19
do
echo $ms
lpar validate fabric1 $ms >/dev/null 2>&1
if [ $? -eq 0 ]
then
   echo connected
else
   echo not connected
fi
done

The command to validate on the HMC CLI is, migrlpar,.

Since we are not interested in validation messages, we redirect all validation messages to /dev/null.

Here’s the output of the for loop:

ms10
connected
ms11
connected
ms12
connected
ms13
connected
ms14
connected
ms15
connected
ms16
connected
ms17
connected
ms18
connected
ms19
connected

Obviously, all managed systems are connected to fabric “Fabric1“. That’s not very surprising, because they were cabled exactly like that.

It would be more interesting to know which FC port on the managed systems (Virtual I/O servers) are connected to the fabric “Fabric1“. To do this, we need a list of virtual I/O servers for each managed system and the list of NPIV-capable FC ports for each virtual I/O server.

The list of virtual I/O servers can be obtained easily with the following command:

$ vios -m ms11 list
ms11-vio1
ms11-vio2
$

On the HMC CLI you can use the command: lssyscfg -r lpar -m ms11 -F “name lpar_env”.

The NPIV-capable ports can be found out with the following command:

$ vios lsnports ms11-vio1
ms11-vio1       name             physloc                        fabric tports aports swwpns  awwpns
ms11-vio1       fcs0             U78AA.001.XXXXXXX-P1-C5-T1          1     64     60   2048    1926
ms11-vio1       fcs1             U78AA.001.XXXXXXX-P1-C5-T2          1     64     60   2048    2023
...
$

The command lsnports is used on the virtual I/O server. Of course you can do this without the LPAR tool.

With the LPM validation (and of course also with the migration) one can indicate which FC port on the target system is to be used, we show this here once with two examples:

$ lpar validate fabric1 ms10 virtual_fc_mappings=10/ms10-vio1///fcs0 >/dev/null 2>&1
$ echo $?
0
$ lpar validate fabric1 ms10 virtual_fc_mappings=10/ms10-vio1///fcs1 >/dev/null 2>&1
$ echo $?
1
$

The validation with target ms10-vio1 and fcs0 was successful, i.e. this FC port is attached to fabric “Fabric1“. The validation with targets ms10-vio1 and fcs1 was not successful, i.e. that port is not connected to the fabric “Fabric1“.

Here is the command that must be called on the HMC, if the LPAR tool is not used:

$ lpar -v validate fabric1 ms10 virtual_fc_mappings=10/ms10-vio1///fcs0
hmc02: migrlpar -m ms09 -o v -p fabric1 -t ms10 -v -d 5 -i 'virtual_fc_mappings=10/ms10-vio1///fcs0'
$

To find out all the FC ports that are connected to the fabric “Fabric1“, we need to loop through the managed systems to be checked, for each managed system we then need a loop across all VIOS of the managed system and finally a loop over each FC ports of the VIOS performing an LPM validation.

We have put things together in the following script. To make sure that it does not get too long, we have omitted some checks:

$ cat bin/fabric_ports
#! /bin/ksh
# Copyright © 2018, 2019 by PowerCampus 01 GmbH

LPAR=fabric1

STATE=$( lpar prop -F state $LPAR | tail -1 )

print "LPAR: $LPAR"
print "STATE: $STATE"

if [ "$STATE" != "Not Activated" ]
then
            print "ERROR: $LPAR must be in state 'Not Activated'"
            exit 1
fi

fcsCount=0
fcsSameFabricCount=0

for ms in $@
do
            print "MS: $ms"
            viosList=$( vios -m $ms list )

            for vios in $viosList
            do
                        rmc_state=$( lpar -m $ms prop -F rmc_state $vios | tail -1 )
                        if [ "$rmc_state" = "active" ]
                        then
                                    fcList=
                                    vios -m $ms lsnports $vios 2>/dev/null | \
                                    while read vio fcport rest
                                    do
                                               if [ "$fcport" != "name" ]
                                               then
                                                           fcList="${fcList} $fcport"
                                               fi
                                    done

                                    for fcport in $fcList
                                    do
                                               print -n "${vios}: ${fcport}: "
                                               lpar validate $LPAR $ms virtual_fc_mappings=10/${vios}///${fcport} </dev/null >/dev/null 2>&1
                                               case "$?" in
                                               0)
                                                           print "yes"
                                                           fcsSameFabricCount=$( expr $fcsSameFabricCount + 1 )
                                                           ;;
                                               *) print "no" ;;
                                               esac
                                               fcsCount=$( expr $fcsCount + 1 )
                                    done
                        else
                                    print "${vios}: RMC not active"
                        fi
            done
done

print "${fcsCount} FC-ports investigated"
print "${fcsSameFabricCount} FC-ports in same fabric"

$

As an illustration we briefly show a run of the script over some managed systems. We start the script with time to see how long it takes:

$ time bin/fabric_ports ms10 ms11 ms12 ms13 ms14 ms15 ms16 ms17 ms18 ms19
LPAR: fabric1
STATE: Not Activated
MS: ms10
ms10-vio3: RMC not active
ms10-vio1: fcs0: yes
ms10-vio1: fcs2: yes
ms10-vio1: fcs4: no
ms10-vio1: fcs6: no
ms10-vio2: fcs0: yes
ms10-vio2: fcs2: yes
ms10-vio2: fcs4: no
ms10-vio2: fcs6: no
MS: ms11
ms11-vio3: RMC not active
ms11-vio1: fcs0: no
ms11-vio1: fcs1: no
ms11-vio1: fcs2: no
ms11-vio1: fcs3: yes
ms11-vio1: fcs4: no
…
ms19-vio2: fcs2: no
ms19-vio2: fcs3: no
ms19-vio2: fcs0: no
ms19-vio2: fcs1: no
ms19-vio2: fcs4: no
ms19-vio2: fcs5: no
132 FC-ports investigated
17 FC-ports in same fabric

real       2m33.978s
user      0m4.597s
sys       0m8.137s
$

In about 150 seconds, 132 FC ports were examined (LPM validations performed). This means a validation took about 1 second on average.

We have found all the FC ports that are connected to the fabric “Fabric1“.

Of course, this can be done analogously for other fabrics.

A final note: not all ports above are cabled!

HMC Error #25B810

Managing and administrating service events is often forgotten on HMCs. In this article we want to use a concrete example, error with reference code #25B810, to show how to handle such events. Of course, our LPAR tool is used here.

First, let’s find all open service events:

$ hmc lssvcevents
TIME                 PROBLEM  PMH   HMC     REFCODE   STATE     STATUS  CALLHOME  FAILING_MTMS      TEXT                                         
02/13/2019 23:02:31  7        -     hmc01   #25B810   approved  Open    false     8231-E2B/06A084P  File System alert event occurred...          
02/16/2019 16:14:28  8        -     hmc01   B3030001  approved  Open    false     8231-E2B/06A084P  ACT04284I A Management Console connect failed
02/11/2019 16:12:43  37       -     hmc02   B3030001  approved  Open    false     8231-E2B/06A084P  ACT04284I A Management Console connect failed
02/11/2019 17:43:19  38       -     hmc02   B3030001  approved  Open    false     8231-E2B/06A084P  ACT04283I A connection to a FSP,BPA...  
$

This article is about the problem with the number 7. The problem was noted on 13.02.2019 at 23:02:31, and examined by the HMC with the name hmc01. The error code is #25B810. The problem is in the “open” state, a call home has not been triggered. For further information, please refer to the problem on the managed system with serial number 06A084P, a Power 710 (8231-E2B). The beginning of the error message can be found in the last column.

First, let’s look at the whole record of the problem by specifying the problem number and HMC:

$ hmc lssvcevents -p 7 hmc01
analyzing_hmc: hmc01
analyzing_mtms: 7042-CR8/21009CD
approval_state: approved
callhome_intended: false
created_time: 02/14/2019 04:11:31
duplicate_count: 0
eed_transmitted: false
enclosure_mtms: 8231-E2B/06A084P
event_severity: 0
event_time: 02/13/2019 23:02:31
failing_mtms: 8231-E2B/06A084P
files: iqyymrge.log/Consolidated system platform log,
iqyvpd.dat/Configuration information associated with the HMC,
actzuict.dat/Tasks performed,
iqyvpdc.dat/Configuration information associated with the HMC,
problems.xml/XML version of the problems opened on the HMC for the HMC and the server,
refcode.dat/list of reference codes associated with the hmc,
iqyylog.log/HMC firmware log information,
PMap.eed/Partition map, obtained from 'lshsc -w -c machine',
hmc.eed/HMC code level obtained from 'lshmc -V' and connection information obtained from 'lssysconn -r all',
sys.eed/Output of various system configuration commands,
8231-E2B_06A084P.VPD.xml/Configuration information associated with the managed system
first_time: 02/14/2019 04:11:31
last_time: 02/14/2019 04:11:31
problem_num: 7
refcode: #25B810
reporting_mtms: 8231-E2B/06A084P
reporting_name: p710
status: Open
sys_mtms: 8231-E2B/06A084P
sys_name: p710
sys_refcode: #25B810
text: File System alert event occurred on /home/ios/CM/DB. Free space is less than 10%, or there was an error querying the filesystem.

At the end of the issue we find the unabbreviated error message. It’s about a file system that has less than 10% free space. The path “/home/ios/CM/DB” indicates a virtual I/O server. The relevant virtual I/O servers are located on the managed system with the serial number 06A084P:

$ ms show 06A084P
NAME  SERIAL_NUM  TYPE_MODEL  HMCS        
p710  06A084P     8231-E2B    hmc01,hmc02
$

It is the managed system named, p710. The managed system includes the following virtual I/O servers:

$ vios -m p710 show
LPAR     ID  SERIAL    LPAR_ENV   MS    HMCs
aixvio1  1   06A084P1  vioserver  p710  hmc01,hmc02
$

A check of the error report on the Virtual I/O Server aixvio1 shows the following entry:

LABEL:          VIO_ALERT_EVENT
IDENTIFIER:     0FD4CF1A

Date/Time:       Wed Feb 13 22:02:31 CST 2019
Sequence Number: 98
Machine Id:      00F6A0844C00
Node Id:         aixvio1
Class:           O
Type:            INFO
WPAR:            Global
Resource Name:   /home/ios/CM/DB 

Description
Informational Message

Probable Causes
Asynchronous Event Occurred

Failure Causes
PROCESSOR

        Recommended Actions
        Check Detail Data

Detail Data
Alert Event Message
25b810
A File System alert event occurred on /home/ios/CM/DB. Free space is less than 10%, or there was an error querying the filesystem.

Diagnostic Analysis
Diagnostic Log sequence number: 19
Resource tested:        sysplanar0
Menu Number:            25B810
Description:


 File System alert event occurred on /home/ios/CM/DB. Free space is less than 10%, or there was an error querying the filesystem.

A quick check of the file system shows that the problem has already been resolved, and there is enough space:

$ df -g
Filesystem    GB blocks      Free %Used    Iused %Iused Mounted on
...
/dev/hd1           0.25      0.16   35%      111     1% /home
...
$ 

So the problem does not exist anymore. Therefore, the service event on the HMC should also be closed, which we do now:

$ hmc chsvcevent -o close -p 7 hmc01
$

For review we list the open service events:

$ hmc lssvcevents 
TIME                 PROBLEM  PMH   HMC     REFCODE   STATE     STATUS  CALLHOME  FAILING_MTMS      TEXT                                         
02/16/2019 16:14:28  8        -     hmc01   B3030001  approved  Open    false     8231-E2B/06A084P  ACT04284I A Management Console connect failed
02/11/2019 16:12:43  37       -     machmc  B3030001  approved  Open    false     8231-E2B/06A084P  ACT04284I A Management Console connect failed
02/11/2019 17:43:19  38       -     machmc  B3030001  approved  Open    false     8231-E2B/06A084P  ACT04283I A connection to a FSP,BPA...   
$ 

The event with the number 7 was closed successfully.

Service events are easy to manage with the LPAR tool!

LPAR tool is available now

Starting from  5th november 2018 the LPAR-Tool is officially available.

A version for Linux can be downloaded from the download area (versions for AIX and MacOS will follow soon). A user guide is in preparation.

To test the LPAR tool free of charge:

  1. Download the current version of the LPAR tool from the download area.
  2. Request a trial license.
  3. Install the LPAR tool.

Have fun testing the LPAR tool.

Ab 05.11.2018 ist das LPAR-Tool nun offiziell verfügbar!