Wednesday 23 November 2011

Linux Load Average


In UNIX computing, the system load is a measure of the amount of work that a computer system performs. The load average represents the average system load over a period of time. It conventionally appears as three numbers, which represent the system load during the last one-, five-, and fifteen-minute periods.

load average: 0.00, 0.00, 0.00
How do you get this output?
To get your system’s Load Average, run the command uptime. It will show you the current time, how long your system has been powered on for, the number of users logged in, and finally the system’s load average.
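If you want those three numbers in a script rather than on screen, something like the following Python sketch could pull them out of a line of uptime output. The helper name parse_uptime_load is made up for this example, and the pattern assumes the "load average: a, b, c" layout shown above; other platforms or locales may format the line differently.

```python
import re

# Hypothetical helper, not part of any standard tool.
# Assumes the "load average: a, b, c" layout printed by uptime on Linux.
LOAD_RE = re.compile(r"load averages?:\s*([\d.]+),?\s+([\d.]+),?\s+([\d.]+)")


def parse_uptime_load(line):
    """Return the (1 min, 5 min, 15 min) load averages found in `line`."""
    match = LOAD_RE.search(line)
    if match is None:
        raise ValueError("no load average found in: %r" % (line,))
    return tuple(float(value) for value in match.groups())


# Sample line in the same format as the uptime output.
sample = "11:42:52 up 16 days, 8 min, 2 users, load average: 4.91, 5.35, 5.67"
print(parse_uptime_load(sample))  # (4.91, 5.35, 5.67)
```

In a real script you would feed it the text captured from running uptime instead of the sample string.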
What does it mean?
Put simply, it is the number of processes that are either using the CPU or blocked waiting for a resource, averaged over a certain time period.
Time periods:
Load average: 1min, 5min, 15min
What is a blocking process?
A blocking process is a process that is waiting for something to continue. Typically, a process is waiting for:
  • CPU
  • Disk I/O
  • Network I/O
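To make this a little more concrete, here is a rough Python sketch that counts how many processes are in a state that feeds into the load figure on Linux. It assumes the one-letter state codes shown in the STAT column of ps and top, where R means running or runnable (waiting for CPU) and D means uninterruptible sleep (usually waiting on I/O). The function name and the sample list are invented for illustration.

```python
from collections import Counter

# State letters as shown in the STAT column of ps/top (assumed Linux convention):
# R = running or runnable, D = uninterruptible sleep (typically I/O wait).
LOAD_STATES = ("R", "D")


def count_load_contributors(states):
    """Count state strings whose first letter is R or D."""
    # Only the first character matters; suffixes such as "+" or "s" are modifiers.
    counts = Counter(state[:1] for state in states)
    return sum(counts[letter] for letter in LOAD_STATES)


# Made-up sample: two runnable processes, one in uninterruptible sleep.
sample_states = ["S", "R", "D", "S", "Ss", "R+", "Z"]
print(count_load_contributors(sample_states))  # 3
```

A snapshot like this is only one instant; the load average smooths many such snapshots over time.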
What does a high load average mean?
A high load average typically means that your server is under-specified for what it is being used for, or that something has failed (like an externally mounted disk).
How do I diagnose a high load average?
Typically, a server with a high load average is unresponsive and slow — and you want to reduce the load and increase responsiveness. But how do you go about working out what is causing your high load?
Let's start with the simplest question: are we waiting for CPU? Run the Linux command top.

Check the CPU usage percentages in the summary lines at the top of the output. They show what percentage of its total time the CPU is spending on processing. If these numbers are constantly around 99-100%, then chances are the problem is related to your CPU, most likely that it is underpowered. Consider upgrading your CPU.
The next thing to look for is whether the CPU is waiting on I/O. Check the iowait figure in the same summary lines. If this number is high (above 80% or so), you have a problem: the CPU is spending a lot of time waiting on I/O. This could mean that you have a failing hard disk, a failing network card, or that your applications are trying to access data on either of them at a rate significantly higher than the throughput they are designed for.
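The percentages that top displays can also be worked out by hand from two readings of the "cpu" line in /proc/stat. The Python sketch below shows the arithmetic under a few assumptions: the field order (user, nice, system, idle, iowait, irq, softirq, steal) is the one documented for Linux, the function names are made up, and the two sample lines are invented numbers rather than real measurements.

```python
# Field order of the aggregate "cpu" line in /proc/stat (Linux, assumed).
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Turn a 'cpu ...' line into a dict of cumulative tick counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(value) for value in parts[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, values))


def cpu_percentages(before, after):
    """Percentage of time spent in each state between two samples."""
    first, second = parse_cpu_line(before), parse_cpu_line(after)
    deltas = {name: second[name] - first[name] for name in first if name in second}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("samples must be taken at different times")
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


# Invented samples: most of the interval was spent waiting on I/O.
before = "cpu  100 0 50 800 50 0 0 0"
after = "cpu  150 0 70 1000 780 0 0 0"
usage = cpu_percentages(before, after)
print(usage["user"], usage["iowait"], usage["idle"])  # 5.0 73.0 20.0
```

On a live system you would read /proc/stat twice, a few seconds apart, and pass the two "cpu" lines in.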

All Unix and Unix-like systems generate a metric of three "load average" numbers in the kernel. Users can easily query the current result from a UNIX shell by running the uptime command.
The w and top commands show the same three load average numbers, as do a range of graphical user interface utilities. In Linux, they can also be accessed by reading the /proc/loadavg file.
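Reading /proc/loadavg is probably the easiest route from a script. The file holds a single line: the three load averages, a "running/total" pair of scheduling entities, and the most recently used PID. Here is a small Python sketch that splits such a line apart; the LoadAvg type, the function name and the sample text are all made up for this example.

```python
from typing import NamedTuple


class LoadAvg(NamedTuple):
    one: float        # 1-minute load average
    five: float       # 5-minute load average
    fifteen: float    # 15-minute load average
    runnable: int     # currently runnable scheduling entities
    total: int        # total scheduling entities
    last_pid: int     # most recently created PID


def parse_loadavg(text):
    """Parse the single-line contents of /proc/loadavg."""
    one, five, fifteen, entities, last_pid = text.split()
    runnable, total = entities.split("/")
    return LoadAvg(float(one), float(five), float(fifteen),
                   int(runnable), int(total), int(last_pid))


# Invented sample in the /proc/loadavg layout.
load = parse_loadavg("0.75 1.70 2.10 1/220 3754\n")
print(load.one, load.five, load.fifteen)  # 0.75 1.7 2.1
```

On Linux you would pass in the result of open("/proc/loadavg").read(). Python's standard library also has os.getloadavg(), which returns just the three averages.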

Load Average

A common way to read these numbers is to picture a single-CPU system as a one-lane bridge, with processes as the cars crossing it:
  • 0.00 means there's no traffic on the bridge at all. In fact, anything between 0.00 and 1.00 means there's no backup, and an arriving car will just go right on.
  • 1.00 means the bridge is exactly at capacity. All is still good, but if traffic gets a little heavier, things are going to slow down.
  • Over 1.00 means there's backup. How much? Well, 2.00 means that there are two lanes' worth of cars in total: one lane's worth on the bridge, and one lane's worth waiting. 3.00 means there are three lanes' worth in total: one lane's worth on the bridge, and two lanes' worth waiting. And so on.
  • The load average indicates the average load on the CPU over a specific time period.
  • On Linux, load average is displayed for the last 1 minute, 5 minutes, and 15 minutes. This is helpful to see whether the overall load on the system is going up or down.
Example 1: a load average of "0.75 1.70 2.10" indicates that the load on the system is coming down. 0.75 is the load average over the last 1 minute, 1.70 over the last 5 minutes, and 2.10 over the last 15 minutes.
  • Please note that on Linux this load average is calculated by combining the number of processes in the run queue with the number of processes in the uninterruptible sleep state.
Example 2: one can interpret a load average of "1.73 0.50 7.98" on a single-CPU system as:
  • during the last minute, the CPU was overloaded by 73% (1 CPU with 1.73 runnable processes, so that 0.73 processes had to wait for a turn)
  • during the last 5 minutes, the CPU was underloaded by 50% (no processes had to wait for a turn)
  • during the last 15 minutes, the CPU was overloaded by 698% (1 CPU with 7.98 runnable processes, so that 6.98 processes had to wait for a turn)
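The arithmetic behind both examples fits in a few lines of Python. Treat this as a sketch only: the function names are invented, and the cpus parameter (dividing the load by the number of CPUs) is an assumption added here for multi-CPU machines, since the examples above cover only the single-CPU case.

```python
def load_trend(one, five, fifteen):
    """Describe whether load is falling, rising, or neither."""
    if one < five < fifteen:
        return "falling"   # recent load is lower than older load
    if one > five > fifteen:
        return "rising"    # recent load is higher than older load
    return "mixed"


def overload_percent(load, cpus=1):
    """Percent over (positive) or under (negative) capacity.

    Assumption: capacity is one runnable process per CPU.
    """
    return round((load / cpus - 1.0) * 100.0, 1)


print(load_trend(0.75, 1.70, 2.10))   # falling
print(load_trend(1.73, 0.50, 7.98))   # mixed
print(overload_percent(1.73))         # 73.0
print(overload_percent(0.50))         # -50.0
print(overload_percent(7.98))         # 698.0
```

A negative result means spare capacity, matching the "underloaded by 50%" reading of the 5-minute figure.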

Load on a server [top, w, uptime]

The top command shows the load on the server. It can also be used to find the processes and users that are causing the load. It reports the total number of processes, the number of sleeping processes, zombie processes, and so on.
Example:
root@server [~]$ top -cd3
11:32:03 up 15 days, 23:57, 2 users, load average: 4.95, 5.13, 5.94
220 processes: 219 sleeping, 1 running, 0 zombie, 0 stopped
CPU states: cpu user nice system irq softirq iowait idle
total 1.5% 1.6% 2.4% 0.0% 0.0% 0.0% 94.3%
cpu00 3.4% 2.8% 2.8% 0.0% 0.0% 0.0% 90.9%
cpu01 0.3% 3.1% 0.0% 0.0% 0.0% 0.0% 96.5%
cpu02 2.5% 0.3% 6.5% 0.0% 0.0% 0.0% 90.6%
cpu03 0.0% 0.3% 0.3% 0.0% 0.0% 0.0% 99.3%
Mem: 3104932k av, 2909432k used, 195500k free, 0k shrd, 284548k buff
1201588k active, 1558304k inactive
Swap: 3004112k av, 499936k used, 2504176k free 1015264k cached
PID USER PRI NI SIZE RSS SHARE STAT %CPU %MEM TIME CPU COMMAND
3754 root 16 0 1252 1252 896 R 1.4 0.0 0:01 2 top -cd3
3620 nobody 9 0 61460 45M 28768 S 0.6 1.4 2:23 0 /usr/local/apache/bin/httpd -DSSL
3604 mailnull 9 0 4204 4116 2816 S 0.2 0.1 0:00 0 /usr/sbin/exim -bd -q60m
29956 root 9 0 4684 3384 2640 S 0.1 0.1 0:31 0 /etc/authlib/authProg
1 root 8 0 468 440 416 S 0.0 0.0 0:34 2 init [3]
From the above example you can see the load average (here the 1-minute load average is 4.95), the total number of processes, the number of sleeping processes, and the CPU usage, along with the memory usage, swap usage, and the list of processes and their users.
The w command also shows the load and the users on the server. It gives a brief summary of the current time, the uptime of the server, the number of users, and the load average.
Example:
root@server [~]$ w
11:39:18 up 16 days, 4 min, 2 users, load average: 5.33, 5.37, 5.74
USER TTY FROM LOGIN@ IDLE JCPU PCPU WHAT
user1 pts/0 user - ip1 8:26am 3:13m 0.09s 0.00s sshd: user1 [priv]
user2 pts/3 user - ip2 11:09am 0.00s 0.13s 0.02s sshd: user2 [priv]
The uptime command gives basic information about the uptime and load of the server.
Example:
root@server [~]$ uptime
11:42:52 up 16 days, 8 min, 2 users, load average: 4.91, 5.35, 5.67
From the above example you can find the load average and the number of days the server has been running without a restart.
 
