Performing Job Management Tasks – Part II

Using top to Show Current System Activity

·  The “top” program offers a convenient interface in which you can monitor current process activity and also perform some basic management tasks.

·  If you want to know how much CPU time various processes are consuming relative to one another, or if you want to quickly discover which processes are consuming the most CPU time, this is the tool to use.

·  By default, top sorts its entries by CPU use and updates its display every few seconds.

·  This is a good tool for spotting runaway processes on an otherwise lightly loaded system. These processes almost always appear in the first position or two, and they consume an inordinate amount of CPU time.

·  The legitimate needs of different programs vary so much that it’s impossible to give a simple rule for judging when a process is consuming too much CPU time.

Example: type top at the command prompt.

→ Explanation of what you see:

-- In the upper five lines of the top interface, you can see information about the current system activity.

--- These lines act as a status indicator of current system performance. The most important information in the first line is the “load average”, which is given for the last minute, the last 5 minutes, and the last 15 minutes.

The load average reflects the average number of processes in the run queue, which is the queue where processes wait before they can be handled by the scheduler. (On Linux, processes in uninterruptible sleep, typically waiting for disk I/O, are counted as well.)

The scheduler is the kernel component that decides which process runs on which of the CPU cores in your server.

▪  Tip: As a rough estimate of whether your system can handle the workload, the number of processes waiting in the run queue should not stay higher than the total number of CPU cores in your server.
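The rule of thumb about load average and CPU cores can be written down as a small Python sketch. The function names load_per_core and is_overloaded are made up for this illustration, and the threshold of 1.0 per core is only the rough estimate from the tip, not a hard limit.

```python
import os


def load_per_core(load_average, cores):
    """Return the load average divided by the number of CPU cores."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    return load_average / cores


def is_overloaded(load_average, cores):
    """Rough rule: more waiting/running processes than cores means overload."""
    return load_per_core(load_average, cores) > 1.0


def current_status():
    """Check the 1-minute load average of this machine (Unix only)."""
    one_minute, _five, _fifteen = os.getloadavg()
    cores = os.cpu_count() or 1  # cpu_count() may return None
    return one_minute, cores, is_overloaded(one_minute, cores)
```

For example, a load average of 6.0 on a 4-core server gives 1.5 per core and would be flagged, while 2.0 on the same server gives 0.5 and would not.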

→ In the second line of the top window, you’ll see how many tasks your server is currently handling and what state these tasks are in. In this line, you’ll find four status indications:

▪  Running: The number of active processes in the last polling loop.

▪  Sleeping: The number of processes currently loaded in memory that haven’t shown any activity in the last polling loop.

▪  Stopped: The number of processes that have been suspended by a stop signal (for example, with Ctrl+Z in a shell). They stay in memory and can be resumed later with a continue signal.

▪  Zombie: The number of processes that are in a zombie state. A zombie is a process that has already terminated but still has an entry in the process table because its parent has not yet collected its exit status. It cannot be killed, because it is already dead.

--- Note: A lingering zombie process normally is the result of bad programming in the parent. Most zombies go away by themselves as soon as the parent collects them. If they don’t, terminating or restarting the parent usually clears them, because the orphaned zombies are then adopted and cleaned up by init (or systemd). A reboot is only a last resort.
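To make the four counters more concrete, here is a minimal Python sketch that tallies process states the way the task line does. The grouping of kernel state letters in STATE_GROUPS is an assumption for illustration (anything that is not running, stopped, or zombie is counted as sleeping); the function names are hypothetical. The state letter is taken from /proc/PID/stat, where it follows the command name in parentheses.

```python
from pathlib import Path

# Assumed mapping of kernel state letters to the four counters on the task line.
STATE_GROUPS = {"R": "running", "T": "stopped", "t": "stopped", "Z": "zombie"}


def summarize_states(states):
    """Count state letters per category; unknown letters count as sleeping."""
    counts = {"running": 0, "sleeping": 0, "stopped": 0, "zombie": 0}
    for letter in states:
        counts[STATE_GROUPS.get(letter, "sleeping")] += 1
    return counts


def read_proc_states(proc_root="/proc"):
    """Collect the state letter of every process listed under /proc (Linux)."""
    states = []
    for stat_file in Path(proc_root).glob("[0-9]*/stat"):
        try:
            text = stat_file.read_text()
        except OSError:
            continue  # the process exited while we were reading
        # The state is the first field after the ")" that closes the command name.
        states.append(text.rsplit(")", 1)[1].split()[0])
    return states
```

Calling summarize_states(read_proc_states()) on a Linux system should give numbers close to the task line in top, although the two are sampled at different moments.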

→ In the third line of top, you get an overview of the current processor activity. If you’re experiencing a problem (which typically shows up as a high load average), the CPU(s) line tells you what the CPUs in your server are doing. This line summarizes all the CPUs in your system.

▪  For a per-CPU overview of current activity, press the “1” key from the top interface.

▪  In the CPU(s) line, you’ll find the following information about CPU states:

⇒ us: The percentage of time your system is spending in user space, that is, handling user-related tasks.

⇒ sy: The percentage of time your system is working on kernel-related tasks in system space. On average, this should be (much) lower than the amount of time spent in user space.

⇒ ni: The percentage of time your system has spent on user tasks whose nice value has been changed.

⇒ id: The percentage of time the CPU has been idle.

⇒ wa: The percentage of time the CPU has been waiting for I/O requests to complete. This is a very common indicator of performance problems. If you see an elevated value here, storage is a likely bottleneck, and optimizing disk performance may make your system faster.

⇒ hi: The percentage of time the CPU has been handling hardware interrupts.

⇒ si: The percentage of time the CPU has been handling software interrupts.

⇒ st: The percentage of time that has been stolen from this CPU. You’ll see a nonzero value only if your server is itself a virtual machine: it is the time during which this machine had work ready to run but the hypervisor was using the physical CPU for something else, such as another virtual machine.
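The CPU states can also be read by a script. The following Python sketch parses one CPU(s) summary line into a dictionary. The sample line uses made-up values in the layout that current top versions print; the function names are hypothetical, and the parser assumes a locale that writes decimals with a dot.

```python
import re

# Hypothetical sample values, laid out like the CPU(s) line of top.
SAMPLE = "%Cpu(s):  3.1 us,  1.0 sy,  0.0 ni, 95.6 id,  0.2 wa,  0.0 hi,  0.1 si,  0.0 st"

# A number, an optional "%" (older layout), then one of the eight state names.
_STATE = re.compile(r"(\d+(?:\.\d+)?)\s*%?\s*(us|sy|ni|id|wa|hi|si|st)\b")


def parse_cpu_line(line):
    """Return a dict such as {"us": 3.1, "sy": 1.0, ...} for one summary line."""
    return {name: float(value) for value, name in _STATE.findall(line)}


def busy_percent(states):
    """Everything that is not idle time, rounded to one decimal place."""
    return round(100.0 - states["id"], 1)
```

With the sample line, parse_cpu_line(SAMPLE)["wa"] is 0.2, so I/O wait is not a concern in this made-up snapshot.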

→ You’ll find current information about memory usage in the last two lines of the top status area. The first line contains information about memory usage, and the second line has information about the usage of swap space.

The last item on the second line (cached) is really about memory usage, not swap. The following parameters show how memory is currently used:

→ Mem: The total amount of memory that is available to the Linux kernel.

→ used: The total amount of memory that is currently used.

→ free: The total amount of memory that is available for starting new processes.

→ buffers: The amount of memory that is used for buffers. Buffers hold essential system tables in memory, as well as data that still has to be committed to disk.

→ cached: The amount of memory that is currently used for cache.

▪  NOTE on caching: The Linux kernel tries to use system memory as efficiently as possible. To accomplish this goal, the kernel caches a lot. When a user requests a file from disk, it is first read from disk and then copied to RAM.

→ Once the file is copied to RAM, the kernel tries to keep it there as long as possible. This process is referred to as caching.

→ When the kernel needs memory that is currently allocated to cache for something else, it can claim this memory back immediately. The memory in buffers is related to cache.

→ Like cache, buffer memory can also be claimed back immediately by the kernel when needed.
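Because buffers and cache can be claimed back, the "free" figure alone understates how much memory is available for new processes. The Python sketch below adds them up from text in the format of /proc/meminfo. The sample numbers are invented, the function names are hypothetical, and the result is only a rough upper estimate, since not every cached page can be dropped at once. Newer kernels also report a MemAvailable field, which is usually a better figure when it is present.

```python
# Invented sample in the layout of /proc/meminfo (values in kB).
SAMPLE_MEMINFO = """\
MemTotal:        8000000 kB
MemFree:         1000000 kB
Buffers:          200000 kB
Cached:          2800000 kB
"""


def parse_meminfo(text):
    """Turn 'Name:   value kB' lines into a dict of integers."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[name.strip()] = int(fields[0])
    return values


def reclaimable_kb(info):
    """Memory in buffers and cache, which the kernel can claim back."""
    return info["Buffers"] + info["Cached"]


def effectively_free_kb(info):
    """Free memory plus what could be reclaimed from buffers and cache."""
    return info["MemFree"] + reclaimable_kb(info)
```

On a Linux system, the same functions can be fed the real file with parse_meminfo(open("/proc/meminfo").read()).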

-- The lower part of the top window shows a list of the most active processes at the moment (this list is refreshed at every display update).

→ If you notice that a process is very busy:

-- You can press the “k” key from within the top interface to terminate that process.

--- The top program will first ask for the PID of the process to which you want to send a signal (that is, the PID to kill).

--- After you enter this, it will ask which signal you want to send to that PID (the default is 15, SIGTERM) and then send it immediately.
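What the “k” command does, sending a signal to a PID, can be reproduced in a few lines of Python. This is a sketch under stated assumptions: send_signal and demo are names invented here, and the demo starts its own throwaway child process so that nothing important gets killed.

```python
import os
import signal
import subprocess
import sys


def send_signal(pid, sig=signal.SIGTERM):
    """Send a signal to a PID; SIGTERM (15) is the usual polite default."""
    os.kill(pid, sig)


def demo():
    """Start a sleeping child, terminate it, and return its exit status."""
    child = subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )
    send_signal(child.pid)
    # A negative return code means the child was ended by that signal number.
    return child.wait(timeout=10)
```

If a process ignores SIGTERM, signal.SIGKILL (9) can be passed as the second argument, at the cost of giving the process no chance to clean up.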

More “top” options (add on / 02162015)

“top” options:

Like many Linux commands, top accepts several options. The most useful are listed here:

-d delay This option specifies the delay between updates, in seconds. The default depends on the top version (commonly 3 seconds on current systems; some older versions used 5).

-p pid If you want to monitor specific processes, you can list them using this option. You'll need the PIDs, which you can obtain with ps, as described earlier. You can specify up to 20 PIDs by using this option multiple times, once for each PID.

-n iter You can tell top to display a certain number of updates (iter) and then quit. (Normally, top continues updating until you terminate the program.)

-b This option specifies batch mode, in which top doesn't use the normal screen-update commands. You might use this to log CPU use of targeted programs to a file, for instance.
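The four options combine naturally when logging from a script. The following Python sketch builds such a command line and, optionally, runs it. The helper names build_top_command and snapshot are hypothetical; only the -b, -n, -d, and -p options described above are used.

```python
import subprocess


def build_top_command(pids=(), iterations=1, delay=None):
    """Build an argument list for a batch-mode run of top."""
    if len(pids) > 20:
        raise ValueError("top accepts at most 20 PIDs with -p")
    command = ["top", "-b", "-n", str(iterations)]
    if delay is not None:
        command += ["-d", str(delay)]
    for pid in pids:
        command += ["-p", str(pid)]  # one -p per monitored process
    return command


def snapshot(pids=(), iterations=1, delay=None):
    """Run top in batch mode and return its output as text."""
    result = subprocess.run(
        build_top_command(pids, iterations, delay),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

For example, build_top_command([1, 42], iterations=3, delay=2) corresponds to typing top -b -n 3 -d 2 -p 1 -p 42 at the prompt, and the text returned by snapshot() can be appended to a log file.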

You can do more with top than watch it update its display. When it's running, you can enter any of several single-letter commands, some of which prompt you for additional information. These commands include the following:

h and ? These keystrokes display help information.

k You can kill a process with this command. The top program will ask for a PID number, and if it's able to kill the process, it will do so. (The upcoming section "Killing Processes" describes other ways to kill processes.)

q This command quits top.

r You can change a process's priority with this command. You'll have to enter the PID number and a new priority value. A positive value will decrease its priority, and a negative value will increase its priority, assuming it has the default 0 priority to begin with. Only root may increase a process's priority. The renice command (described shortly, in "Managing Process Priorities") is another way to accomplish this task.

s This command changes the display's update rate, which you'll be asked to enter (in seconds).

P This command sets the display to sort by CPU usage, which is the default.

M You can change the display to sort by memory usage with this command.
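The priority change behind the “r” command can be sketched in Python with the standard os module. The function name lower_priority is invented for this example. It only moves the nice value upward (lower priority), which any user may do for their own processes; moving it downward would need root, as noted above.

```python
import os


def lower_priority(pid=0, step=1):
    """Raise the nice value of a process by `step` (pid 0 = this process).

    Returns the nice value after the change. Nice values stop at 19,
    which is the lowest priority.
    """
    current = os.getpriority(os.PRIO_PROCESS, pid)
    target = min(19, current + step)
    os.setpriority(os.PRIO_PROCESS, pid, target)
    return os.getpriority(os.PRIO_PROCESS, pid)
```

Calling lower_priority(1234, 5) on a process with the default nice value of 0 would leave it at 5, the same result as entering 5 at the prompt of the “r” command (1234 is a placeholder PID).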

More commands are available in top (both command-line options and interactive commands) than can be summarized here; consult top's man page for more information.

-U (ps option) To list all processes owned by a user, give ps the -U option. Example: ps -U root

-G (ps option) This option displays processes based on the owning group. Example: ps -G qemu

Exercise:

·  Managing processes with ps and kill

·  Managing Processes from the Command Line