Console Monitoring Tools for SUSE Linux Enterprise Server

Novell Cool Solutions: Feature
By Andras Dosztal

Posted: 10 Jan 2007
 

This article covers command-line tools for finding errors, spotting bottlenecks, or just keeping an eye on your server. SUSE Linux Enterprise Server ships with several command-line tools that are suitable for these purposes. We discuss the following commands: top, iostat, sar, free, pmap, uptime, smartctl, and strace.

Packages

The following packages are needed:

  • sysstat (sysstat-6.0.2-16.4 in SLES 10)
  • smartmontools (smartmontools-5.33-20.2 in SLES 10)
  • coreutils (coreutils-5.93-22.2 in SLES 10)
  • strace (strace-4.5.14-15.2 in SLES 10)

top

Top is a commonly used tool for listing the processes that consume the most resources. It also displays a summary of CPU and memory usage. Example 1 shows sample top output (using the default fields). The most important interactive commands while top is running are the following:

  • h: Displays the help.
  • d: Sets the update interval; the default is 3 seconds.
  • k {PID}: Kills the process identified by PID.
  • F: Selects the sort order. A second screen appears where you can select the sort field.
  • f: Selects the fields to display. A second screen appears where you can select the fields.
top - 12:47:23 up  3:21,  1 user,  load average: 0.00, 0.00, 0.00
Tasks:  51 total,   2 running,  49 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.0%us,  0.0%sy,  0.0%ni, 99.7%id,  0.3%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:    646256k total,   298608k used,   347648k free,    56388k buffers
Swap:   530104k total,        0k used,   530104k free,   186056k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
    1 root      16   0   716  280  244 S  0.0  0.0   0:01.31 init
    2 root      34  19     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/0
    3 root      10  -5     0    0    0 S  0.0  0.0   0:00.13 events/0
    4 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 khelper
    5 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 kthread
    7 root      10  -5     0    0    0 S  0.0  0.0   0:00.02 kblockd/0
    8 root      20  -5     0    0    0 S  0.0  0.0   0:00.00 kacpid
  112 root      20   0     0    0    0 S  0.0  0.0   0:00.00 pdflush
  113 root      15   0     0    0    0 S  0.0  0.0   0:00.12 pdflush
  115 root      19  -5     0    0    0 S  0.0  0.0   0:00.00 aio/0
  114 root      25   0     0    0    0 S  0.0  0.0   0:00.00 kswapd0
  321 root      16  -5     0    0    0 S  0.0  0.0   0:00.00 cqueue/0
  322 root      10  -5     0    0    0 S  0.0  0.0   0:00.02 kseriod
  362 root      15  -5     0    0    0 S  0.0  0.0   0:00.00 kpsmoused
  763 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 reiserfs/0
  827 root      12  -4  1832  648  348 S  0.0  0.1   0:00.63 udevd
 1346 root      20   0     0    0    0 S  0.0  0.0   0:00.00 shpchpd_event

Example 1: top

The fields in this sample are:

  • PID: The Process ID of the running software.
  • USER: The user who is running the command.
  • PR: Priority of the process.
  • NI: Niceness level.
  • VIRT: Virtual memory size of the process, in kB. Includes code, data, and stack, as well as shared libraries and swapped-out pages.
  • RES: Usage of physical memory, in kB.
  • SHR: Memory shared with other processes, in kB.
  • S: State of the process. State can be D (uninterruptible sleep), S (Sleeping), R (Running), T (stopped or Traced) or Z (Zombie).
  • %CPU: CPU usage, in percent.
  • %MEM: Memory usage, in percent.
  • TIME+: CPU Time.
  • COMMAND: The name of the process.

You can exit top by pressing q or Ctrl-C.
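
Top can also run non-interactively: the -b option selects batch mode and -n sets the number of iterations, which makes the output usable in scripts. The following is a minimal sketch, not part of the original article; the function name count_states is hypothetical, and it assumes the default field layout shown in Example 1, where the state is the eighth column.

```shell
#!/bin/sh
# Count processes per state from top batch-mode output read on stdin.
# Assumption: default field layout as in Example 1 (state is column 8).
count_states() {
    awk '
        $1 == "PID" { in_list = 1; next }      # process list starts after the header row
        in_list && NF >= 12 { count[$8]++ }    # column 8 holds the state letter
        END { for (s in count) printf "%s %d\n", s, count[s] }
    ' | sort
}

# Usage (one snapshot of the process list):
#   top -b -n 1 | count_states
```

A large number of processes in state D or Z is usually worth a closer look.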

free

Free shows memory usage: the total, used, and free amounts of physical memory and swap space. The -b, -k, -m, and -g options show the output in bytes, kB, MB, or GB, respectively. Example 2 shows sample output of free.

server01:~ # free -m
             total       used       free     shared    buffers     cached
Mem:           631        291        339          0         55        181
-/+ buffers/cache:         54        576
Swap:          517          0        517
server01:~ #

Example 2: free
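
The "-/+ buffers/cache" line in Example 2 is the one to watch: it shows memory use with buffers and page cache counted as available, because the kernel can reclaim them when applications need memory. The sketch below recomputes that "used" figure from the Mem: line. The function name app_used_mem is hypothetical, and the column positions are an assumption based on the layout in Example 2; newer versions of free may print different columns.

```shell
#!/bin/sh
# Memory used by applications = used - buffers - cached, taken from the "Mem:" line.
# Assumption: columns are total, used, free, shared, buffers, cached (as in Example 2).
app_used_mem() {
    awk '$1 == "Mem:" { print $3 - $6 - $7 }'
}

# Usage:
#   free -k | app_used_mem
```

When fed rounded megabyte values, the result can differ by 1 from the figure free prints itself, because free rounds after computing in smaller units.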

uptime

Uptime has three often used functions (see Example 3):

  • Shows how long the computer has been running
  • Displays the number of logged in users
  • Shows system load. You can find more info about system load in this Wikipedia article.
server01:~ # uptime
  1:32pm  up   4:06,  3 users,  load average: 1.41, 0.52, 0.19
server01:~ #

Example 3: uptime

In this case, the system time is 1:32pm, the system has been running for 4 hours and 6 minutes, 3 users are logged in, and the load numbers are 1.41, 0.52, 0.19 (1-, 5- and 15-minute averages).
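
Comparing the three load numbers tells you whether the load is building up or easing off. As a rough sketch, the hypothetical helper functions below (load_averages and load_trend are names chosen for illustration) extract the averages from uptime output and compare the 1-minute value with the 15-minute value.

```shell
#!/bin/sh
# Print the three load averages from uptime output, separated by spaces.
load_averages() {
    sed -n 's/.*load averages*: *//p' | tr -d ','
}

# Report whether the 1-minute average is above the 15-minute average.
load_trend() {
    load_averages | awk '{ if ($1 > $3) print "rising"; else print "steady or falling" }'
}

# Usage:
#   uptime | load_trend
```

For the output in Example 3 (1.41 against 0.19) this reports "rising".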

pmap

Pmap shows the memory map of a process, including the files that back each mapping. With pmap you can track down processes that eat up memory.

server01:~ # pmap 1972
1972: acpid
START       SIZE     RSS   DIRTY PERM MAPPING
08048000     16K     16K      0K r-xp /sbin/acpid
0804c000      4K      4K      4K rw-p /sbin/acpid
0804d000    136K     20K     20K rw-p [heap]
b7dea000      4K      4K      4K rw-p [anon]
b7deb000   1124K    380K      0K r-xp /lib/libc-2.4.so
b7f04000      8K      8K      8K r--p /lib/libc-2.4.so
b7f06000      8K      8K      8K rw-p /lib/libc-2.4.so
b7f08000     12K      8K      8K rw-p [anon]
b7f11000      8K      8K      8K rw-p [anon]
b7f13000    104K     32K      0K r-xp /lib/ld-2.4.so
b7f2d000      8K      8K      8K rw-p /lib/ld-2.4.so
bfdb8000     88K      8K      8K rw-p [stack]
ffffe000      4K      0K      0K ---p [vdso]
Total:     1524K    504K     76K

268K writable-private, 1256K readonly-private, and 0K shared
server01:~ #

Example 4: pmap
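
For a process with many mappings it helps to sort them by resident size. The following sketch lists the largest mappings by RSS; top_mappings is a hypothetical name, and the code assumes the column layout of Example 4 (START, SIZE, RSS, DIRTY, PERM, MAPPING). Other pmap versions may print different columns.

```shell
#!/bin/sh
# List the N largest mappings by resident size (default: 3).
# Assumption: column 3 is RSS with a trailing "K", column 6 is the mapping name.
top_mappings() {
    awk '$1 ~ /^[0-9a-f]+$/ && NF >= 6 {
            rss = $3
            sub(/K$/, "", rss)      # strip the unit so sort can compare numbers
            print rss, $6
         }' | sort -rn | head -n "${1:-3}"
}

# Usage:
#   pmap 1972 | top_mappings 5
```

For the acpid process in Example 4, the largest entry is /lib/libc-2.4.so with 380K resident.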

smartctl

Smartctl displays statistics for hard disks; it is usable only when the hard drive is S.M.A.R.T. capable. The most important options for this command are:

  • -i: Displays general information about the hard drive (Example 5). Note: If the drive is SMART capable but the feature is turned off you can turn it on using the smartctl -s on {device} command.
server01:~ # smartctl -i /dev/hda
smartctl version 5.33 [i686-pc-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Device Model:     Maxtor 2B020H1
Serial Number:    B1HZYECE
Firmware Version: WAK21R90
User Capacity:    20,490,559,488 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   7
ATA Standard is:  ATA/ATAPI-7 T13 1532D revision 0
Local Time is:    Tue Jan  2 15:05:06 2007 CET
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

server01:~ #

Example 5: General SMART info

  • -c: Shows the hard drive's capabilities.
  • -H: Shows the drive's overall health self-assessment result.
  • -A: Displays the drive's attributes. This is very useful for spotting a hard drive that is going to fail. Example 6 shows output for an old disk:
server01:~ # smartctl -AH /dev/hda
smartctl version 5.33 [i686-pc-linux-gnu] Copyright (C) 2002-4 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG   VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  3 Spin_Up_Time            0x0027   233   232   063    Pre-fail  Always       -       6399
  4 Start_Stop_Count        0x0032   253   253   000    Old_age   Always       -       179
  5 Reallocated_Sector_Ct   0x0033   253   253   063    Pre-fail  Always       -       0
  6 Read_Channel_Margin     0x0001   253   253   100    Pre-fail  Offline      -       0
  7 Seek_Error_Rate         0x000a   253   252   000    Old_age   Always       -       0
  8 Seek_Time_Performance   0x0027   249   238   187    Pre-fail  Always       -       56532
  9 Power_On_Minutes        0x0032   251   251   000    Old_age   Always       -     1030h+44m
 10 Spin_Retry_Count        0x002b   253   252   157    Pre-fail  Always       -       0
 11 Calibration_Retry_Count 0x002b   253   252   223    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   253   253   000    Old_age   Always       -       254
192 Power-Off_Retract_Count 0x0032   253   253   000    Old_age   Always       -       43
193 Load_Cycle_Count        0x0032   253   253   000    Old_age   Always       -       313
194 Unknown_Attribute       0x0032   253   253   000    Old_age   Always       -       0
195 Hardware_ECC_Recovered  0x000a   253   252   000    Old_age   Always       -       13
196 Reallocated_Event_Count 0x0008   253   253   000    Old_age   Offline      -       0
197 Current_Pending_Sector  0x0008   253   253   000    Old_age   Offline      -       0
198 Offline_Uncorrectable   0x0008   253   253   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x0008   199   199   000    Old_age   Offline      -       0
200 Multi_Zone_Error_Rate   0x000a   253   252   000    Old_age   Always       -       0
201 Soft_Read_Error_Rate    0x000a   253   252   000    Old_age   Always       -       0
202 TA_Increase_Count       0x000a   253   252   000    Old_age   Always       -       0
203 Run_Out_Cancel          0x000b   253   252   180    Pre-fail  Always       -       0
204 Shock_Count_Write_Opern 0x000a   253   252   000    Old_age   Always       -       0
205 Shock_Rate_Write_Opern  0x000a   253   252   000    Old_age   Always       -       0
207 Spin_High_Current       0x002a   253   252   000    Old_age   Always       -       0
208 Spin_Buzz               0x002a   253   252   000    Old_age   Always       -       0
209 Offline_Seek_Performnce 0x0024   253   253   000    Old_age   Offline      -       0
 99 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
100 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0
101 Unknown_Attribute       0x0004   253   253   000    Old_age   Offline      -       0

server01:~ #

Example 6: SMART attributes
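
In the attribute table, an attribute counts as failed when its normalized VALUE is less than or equal to its THRESH. The sketch below scans smartctl -A output for such attributes. The function name failing_attributes is hypothetical, and the column positions are assumed from Example 6 (VALUE in column 4, THRESH in column 6).

```shell
#!/bin/sh
# Print the ID and name of every attribute whose VALUE is at or below THRESH.
# Assumption: table layout as in Example 6.
failing_attributes() {
    awk '$1 ~ /^[0-9]+$/ && NF >= 10 && ($4 + 0) <= ($6 + 0) { print $1, $2 }'
}

# Usage:
#   smartctl -A /dev/hda | failing_attributes
```

No output means that no attribute has reached its threshold, which is the case for the disk in Example 6. It is still worth watching the raw values of attributes such as Reallocated_Sector_Ct over time.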

Note 1: smartmontools works only with independent disks, software RAID, or 3ware RAID controllers. For other RAID controllers, please use the software supplied by the vendor.

Note 2: smartmontools has a daemon called smartd, which monitors hard disks continuously. Note from the YaST installer: To prevent system hangs from buggy devices, smartd is turned off by default. Please test smartd manually first and then turn it on via the Runlevel Editor or with /sbin/chkconfig --add smartd.

iostat

The iostat tool reports statistics about CPU and input/output rates of disks or partitions. The main options for this command are:

  • -c: Reports only CPU statistics.
  • -d: Reports only device utilization. Note: Cannot be used together with the -c option.
  • -p {device | ALL}: Display statistics for the partitions of a drive. Statistics for all block devices will be displayed if used with ALL.
  • -x: Display extended disk reports. Note: Cannot be used together with the -p option.

Command usage: iostat [options] [delay] [repeats]

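In Example 7, iostat displays usage for all partitions on /dev/hda 5 times at 1-second intervals, in kB. The first report shows averages since boot; each later report covers the preceding interval. It appears that /dev/hda2 is in use while the swap partition (/dev/hda1) is idle.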

server01:~ # iostat -d -k -p hda 1 5
Linux 2.6.16.21-0.8-default (server01)  01/03/07

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
hda               2.30        31.00         7.75    3271938     817668
hda2              3.86        30.84         2.49    3254805     262476
hda1              1.33         0.16         5.14      16805     542232

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
hda             315.84      1263.37         0.00       1276          0
hda2            315.84      1263.37         0.00       1276          0
hda1              0.00         0.00         0.00          0          0

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
hda             214.00       856.00         0.00        856          0
hda2            214.00       856.00         0.00        856          0
hda1              0.00         0.00         0.00          0          0

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
hda             273.00      1056.00      3608.00       1056       3608
hda2           1163.00      1052.00      3600.00       1052       3600
hda1              0.00         0.00         0.00          0          0

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
hda             315.00       472.00      2780.00        472       2780
hda2           1093.00       476.00      3896.00        476       3896
hda1              0.00         0.00         0.00          0          0

server01:~ #

Example 7: iostat
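
Because the first iostat report covers the whole time since boot, it is often left out when looking at current activity. As a minimal sketch, the hypothetical function avg_read_rate below averages the kB_read/s column for one device over all reports except the first. It assumes the column layout of Example 7.

```shell
#!/bin/sh
# Average kB_read/s for one device, ignoring the first (since-boot) report.
# Assumption: each report starts with a "Device" header; kB_read/s is column 3.
avg_read_rate() {
    awk -v dev="$1" '
        $1 ~ /^Device/ { report++; next }              # a header line starts a new report
        report > 1 && $1 == dev { sum += $3; n++ }     # skip report 1
        END { if (n) printf "%.2f\n", sum / n }
    '
}

# Usage:
#   iostat -d -k -p hda 1 5 | avg_read_rate hda2
```

For hda2 in Example 7, the four interval reports (1263.37, 856.00, 1052.00 and 476.00) average to 911.84 kB/s.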

strace

Strace is a diagnostic tool for debugging programs and scripts. By tracing a program as it runs, you can see every system call it makes.

The best way to understand strace is through an example. In Example 8 we create a text file called test.txt and trace the viewing of this file with cat. The file contains this text: ">>>> This is the test file's content <<<<" To do this, we enter the following command: strace cat test.txt

server01:~ # strace cat test.txt
execve("/bin/cat", ["cat", "test.txt"], [/* 55 vars */]) = 0
brk(0)                                  = 0x804d000
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7fa3000
access("/etc/ld.so.preload", R_OK)      = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY)      = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=28008, ...}) = 0
mmap2(NULL, 28008, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7f9c000
close(3)                                = 0
open("/lib/libc.so.6", O_RDONLY)        = 3
read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\300Y\1"..., 512) = 512
fstat64(3, {st_mode=S_IFREG|0755, st_size=1404242, ...}) = 0
mmap2(NULL, 1176988, PROT_READ|PROT_EXEC, MAP_PRIVATE|MAP_DENYWRITE, 3, 0) = 0xb7e7c000
madvise(0xb7e7c000, 1176988, MADV_SEQUENTIAL|0x1) = 0
mmap2(0xb7f95000, 16384, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_DENYWRITE, 3, 0x118) = 0xb7f95000
mmap2(0xb7f99000, 9628, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = 0xb7f99000
close(3)                                = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7e7b000
set_thread_area({entry_number:-1 -> 6, base_addr:0xb7e7b6b0, limit:1048575, seg_32bit:1, contents:0, read_exec_only:0, limit_in_pages:1, seg_not_present:0, useable:1}) = 0
mprotect(0xb7f95000, 8192, PROT_READ)   = 0
munmap(0xb7f9c000, 28008)               = 0
brk(0)                                  = 0x804d000
brk(0x806e000)                          = 0x806e000
open("/usr/lib/locale/locale-archive", O_RDONLY|O_LARGEFILE) = -1 ENOENT (No such file or directory)
open("/usr/share/locale/locale.alias", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=2528, ...}) = 0
mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7fa2000
read(3, "# Locale name alias data base.\n#"..., 4096) = 2528
read(3, "", 4096)                       = 0
close(3)                                = 0
munmap(0xb7fa2000, 4096)                = 0
open("/usr/lib/locale/en_US.UTF-8/LC_CTYPE", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/usr/lib/locale/en_US.utf8/LC_CTYPE", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=208464, ...}) = 0
mmap2(NULL, 208464, PROT_READ, MAP_PRIVATE, 3, 0) = 0xb7e48000
close(3)                                = 0
open("/usr/lib/gconv/gconv-modules.cache", O_RDONLY) = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=25404, ...}) = 0
mmap2(NULL, 25404, PROT_READ, MAP_SHARED, 3, 0) = 0xb7f9c000
close(3)                                = 0
fstat64(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 1), ...}) = 0
open("test.txt", O_RDONLY|O_LARGEFILE)  = 3
fstat64(3, {st_mode=S_IFREG|0644, st_size=42, ...}) = 0
read(3, ">>>> This is the test file\'s con"..., 4096) = 42
write(1, ">>>> This is the test file\'s con"..., 42>>>> This is the test file's content <<<<
) = 42
read(3, "", 4096)                       = 0
close(3)                                = 0
close(1)                                = 0
exit_group(0)                           = ?
Process 3382 detached
server01:~ #

Example 8: strace

The important lines are the execve call at the top and the open, read, and write calls on test.txt near the end. You can see that the /bin/cat command is executed (it uses shared objects such as libc.so.6), then cat opens the file called test.txt, reads its content, and writes it to standard output. In the last lines, strace tells us that the process has exited.

You may find strace hard to use in the beginning, but in some cases it is indispensable.
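
A full trace is long, and when you are hunting for an error the failed calls are often the interesting part. The following sketch summarizes the system calls that returned -1 in a saved trace. The function name failed_calls and the file name trace.log are hypothetical; the -o option of strace, which writes the trace to a file, is real.

```shell
#!/bin/sh
# Count failed system calls (return value -1) per call name and error code.
failed_calls() {
    awk '/= -1 E[A-Z]+ / {
            name = $0
            sub(/\(.*/, "", name)       # call name: text before the first "("
            err = $0
            sub(/.*= -1 /, "", err)     # drop everything up to the return value
            sub(/ .*/, "", err)         # keep only the error symbol, e.g. ENOENT
            count[name " " err]++
         }
         END { for (k in count) print count[k], k }' | sort -k2
}

# Usage:
#   strace -o trace.log cat test.txt
#   failed_calls < trace.log
```

For the trace in Example 8 this reports one failed access and two failed open calls, all with ENOENT. These are harmless here: the loader and the locale code simply probe for files that may or may not exist.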

Try these tools on your system: they are not hard to use, and you can rely on them when nothing else (e.g. graphical tools) works.

