“who -r” does not return run level

On one of our systems, the command “who -r” returned no run level information. It printed no error message and exited with status 0:

$ who -r
$ echo $?
0
$

As a consequence, an install script terminated with an error, since it was not able to determine the run level.
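A script that depends on the run level can guard against this failure mode, because “who -r” exits with status 0 even when it prints nothing. The following is a minimal sketch, not the logic of the install script mentioned above; the function name get_run_level is hypothetical, and the parsing assumes the output layout shown at the end of this article.

```shell
# Hypothetical helper: read `who -r` output on stdin and print the run level.
# Exits non-zero when no "run-level" entry is present, which is the case
# described above (empty output, exit status 0 from who itself).
get_run_level() {
    awk '
        {
            # Find the word "run-level" and print the field that follows it.
            for (i = 1; i < NF; i++) {
                if ($i == "run-level") { print $(i + 1); found = 1; exit }
            }
        }
        END { exit !found }
    '
}

# Possible use inside a script:
#   level=$(who -r | get_run_level) || { echo "cannot determine run level" >&2; exit 1; }
```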

The information about the run level comes from the binary log file /etc/utmp, where it is stored as the second record. We therefore suspected that /etc/utmp contained corrupt records.

The command /usr/sbin/acct/fwtmp (bos.acct) converts binary utmp records to ASCII (and vice versa). It reads the records to convert from standard input. In our case we got:

$ cat /etc/utmp | /usr/sbin/acct/fwtmp
                        system boot   2     0 0000 0000 1484666008                                  Tue Jan 17 16:13:28 CET 2017
root                                  0 804397248 0000 0000          0 \ufffd{\ufffd\ufffd                             Thu Jan  1 01:00:00 CET 1970
         naudio                       8 3473526 0000 0000 1484666008                                  Tue Jan 17 16:13:28 CET 2017
         naudio2                      8 3539068 0000 0000 1484666008                                  Tue Jan 17 16:13:28 CET 2017
...

The output above confirmed that the second record was corrupt, since it clearly did not contain the run level. Comparing with the entries from a working system showed what the correct records should look like:

                        system boot   2     0 0000 0000 1545044734                                  Mon Dec 17 12:05:34 2018
                        run-level 2   1     0 0062 0123 1545044734                                  Mon Dec 17 12:05:34 2018
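The comparison done by eye above can also be sketched as a small check on the fwtmp output. This is only an illustration: the function name second_record_is_run_level is hypothetical, and the check relies on the observation from this article that the run level is the second record.

```shell
# Hypothetical helper: read fwtmp ASCII output on stdin and succeed only
# if the second record is a run-level record.
second_record_is_run_level() {
    awk '
        NR == 2 { ok = ($0 ~ /run-level/) }
        END     { exit !ok }
    '
}

# Possible use:
#   /usr/sbin/acct/fwtmp < /etc/utmp | second_record_is_run_level \
#       || echo "second utmp record is not a run-level record" >&2
```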

First of all we made a copy of the corrupt /etc/utmp. Then we created an ASCII version using the above fwtmp command:

# cp /etc/utmp /etc/utmp.orig
# cat /etc/utmp | /usr/sbin/acct/fwtmp -X -L >/etc/utmp.ascii
#

The options -X and -L ensure that user and host names are not shortened.

In an editor, we replaced the corrupt second entry with the corresponding entry from the working system above. We then changed its timestamp fields to the values from the first entry. The corrected version was:

                        system boot   2     0 0000 0000 1484666008                                  Tue Jan 17 16:13:28 CET 2017
                        run-level 2   1     0 0062 0123 1484666008                                  Tue Jan 17 16:13:28 CET 2017
         naudio                       8 3473526 0000 0000 1484666008                                  Tue Jan 17 16:13:28 CET 2017
...
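The manual edit can also be expressed as a filter, since the corrected second record differs from the first only in the record name, the type and the two fields after the process ID. The sketch below assumes that the first record is the intact "system boot" record and that the second record is the one to replace. The function name rebuild_run_level_record is hypothetical, and the values 0062 and 0123 are simply those copied from the working system, so they may not fit every system.

```shell
# Sketch: read fwtmp ASCII output on stdin and rebuild record 2 from record 1.
# Assumption: record 1 is "system boot" and record 2 is the corrupt one.
rebuild_run_level_record() {
    awk '
        NR == 1 {
            print                               # keep the boot record as is
            rec = $0
            # Same width on both sides, so the column layout is preserved.
            sub(/system boot   2/, "run-level 2   1", rec)
            # Values taken from the working system shown earlier.
            sub(/ 0000 0000 /, " 0062 0123 ", rec)
            print rec                           # new record 2, same timestamps
            next
        }
        NR == 2 { next }                        # drop the corrupt record
        { print }                               # pass the rest through
    '
}

# Possible use:
#   rebuild_run_level_record < /etc/utmp.ascii > /etc/utmp.ascii.fixed
```

The result should still be reviewed by hand before it is converted back to the binary format.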

We then converted the corrected ASCII version back to the binary format and wrote the result to /etc/utmp:

# cat /etc/utmp.ascii | /usr/sbin/acct/fwtmp -ic > /etc/utmp
#

Finally, the command “who -r” worked again:

$ who -r
   .        run-level 2 Jan 17 16:13       2    0    S
$

The problem was resolved.