In my recent article, "4 open source tools for Linux system monitoring," I discussed interactive tools that can be helpful in solving system problems. I also mentioned the sar command and how it is different from interactive commands. The sar command is one of my favorites when it comes to resolving problems. It is especially useful for those problems that seem to manifest themselves while one is not looking.
And, for when you really want to get down to the raw data, the /proc filesystem has everything you need.
SAR stands for System Activity Reporter. Its primary function is to collect system performance data for each day and store it in log files for later display. Data is collected as 10-minute averages, but more granular collection can also be configured. Data is retained for one month.
Installation and configuration
SAR is installed as part of the sysstat package in Red Hat-based distributions. (Check your distribution's man pages for installation details.)
dnf -y install sysstat
After installing sar as part of the sysstat package, usually nothing needs to be done to alter its configuration. Data is collected on every 10-minute mark of each hour. Note that the output of the sar command can be very wide, too wide to reproduce in this article, so you will need to follow along with your own data. If you have just installed the sysstat package, wait 30 minutes or so for enough data to be collected to use during the rest of this article. If you want to change the timing of data collection, you will have to edit the cron or systemd configuration files.
Up through Fedora 20, sar data collection and daily summary processing were controlled by the sysstat cron job in /etc/cron.d. Starting with Fedora 2X, sar no longer uses cron jobs to control its collection and daily summary activities; systemd has taken over those duties. Check in the /usr/lib/systemd/system directory for the sysstat service, summary, and collect files if you need to change anything. The files are small and self-explanatory.
The only time I have made any changes to the sar configuration is when I needed to collect data every minute instead of every 10 minutes to get a better handle on the exact time a particular problem was occurring. The sar data is stored in two files per day in the /var/log/sa directory. Collecting data more frequently than every 10 minutes can cause these files to grow very large.
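On systemd-based releases, one way to shorten the interval is a drop-in override for the collection timer, which leaves the packaged unit file untouched. The following is a minimal sketch, assuming the timer unit is named sysstat-collect.timer and is scheduled with an OnCalendar line (check your own system before relying on this). The function name write_collect_override is hypothetical.

```shell
# Hypothetical helper: write a systemd drop-in that changes how often
# sar data is collected. Assumes the packaged timer is named
# sysstat-collect.timer and schedules itself with OnCalendar=.
write_collect_override() {
    dropin_dir=$1   # e.g. /etc/systemd/system/sysstat-collect.timer.d
    minutes=$2      # collection interval in minutes, e.g. 1
    mkdir -p "$dropin_dir" || return 1
    # The empty OnCalendar= line clears the packaged schedule before
    # the new one is set.
    printf '[Timer]\nOnCalendar=\nOnCalendar=*:00/%s\n' "$minutes" \
        > "$dropin_dir/override.conf"
}
```

Run as root, it might be used as write_collect_override /etc/systemd/system/sysstat-collect.timer.d 1, followed by systemctl daemon-reload and systemctl restart sysstat-collect.timer. Removing the override.conf file and reloading returns to the packaged interval.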
In one place I worked, we had a problem that started and escalated so quickly that the default 10-minute interval was not helpful in determining which occurred first: CPU load, high disk activity, or something else. Using a 1-minute interval, we determined that not only was CPU activity high, but that it was preceded by a short interval of high network activity as well as high disk activity. Ultimately we determined that this was an unintentional denial-of-service (DoS) attack on the web server, complicated by the fact that there was too little RAM installed in the computer to handle the temporary overload. Adding 2 GB of RAM to the existing 2 GB resolved the problem, and further DoS attacks have not caused problems.
Examining collected data
The output from the sar command can be detailed, or you can choose to limit the data displayed. For example, enter the sar command with no options, which displays only aggregate CPU performance data. The sar command uses the current day by default, starting at midnight, so you should only see the CPU data for today.
On the other hand, using the sar -A command shows all of the data that has been collected for today. Enter the sar -A | less command now and page through the output to view the many types of data collected by SAR, including disk and network usage, CPU context switches (how many times per second the CPU switched from one program to another), page swaps, memory and swap space usage, and much more. Use the man page for the sar command to interpret the results and to get an idea of the many options available. Many of those options allow you to view specific data, such as network and disk performance.
I typically use the sar -A command because many of the types of data available are interrelated, and sometimes I find something that gives me a clue to a performance problem in a section of the output that I might not have looked at otherwise. The -A option displays all of the collected data types.
Look at the entire output of the sar -A | less command to get a feel for the type and amount of data displayed. Be sure to look at the CPU usage data as well as the processes started per second (proc/s) and context switches per second (cswch/s). If the number of context switches increases rapidly, that can indicate that running processes are being swapped off the CPU very frequently.
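Scanning a full day of samples for a jump in context switches is easier with a small filter. The sketch below reads sar output that contains the proc/s and cswch/s columns (for example, from sar -w) and prints only the samples above a chosen limit. It assumes cswch/s is the last column and that numbers use a period as the decimal separator. The function name high_cswch and the threshold are hypothetical.

```shell
# Hypothetical helper: read "sar -w" style output on stdin and print the
# samples whose cswch/s value (assumed to be the last column) exceeds
# the limit given as the first argument.
high_cswch() {
    awk -v limit="$1" '
        $1 == "Average:"                { next }  # skip the summary line
        $NF !~ /^[0-9]+([.][0-9]+)?$/   { next }  # skip headers and blank lines
        $NF + 0 > limit + 0             { print } # keep samples above the limit
    '
}
```

For example, sar -w | high_cswch 5000 would list only the intervals with more than 5000 context switches per second. A sensible limit depends entirely on the normal workload of the host.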
You can limit the total amount of data to the total CPU activity with the sar -u command. Try that and notice that you only get the composite CPU data, not the data for the individual CPUs. Also try the -r option for memory, and -S for swap space. These options can also be combined; the following command displays CPU, memory, and swap space data:
sar -urS
The -p option displays block device names for hard drives instead of the much more cryptic device identifiers, and -d displays only the block devices, that is, the hard drives. Issue the following command to view all of the block device data in a readable format using the names as they are found in the /dev directory:
sar -dp | less
If you want only data between certain times, you can use -s and -e to define the start and end times, respectively. The following command displays all CPU data, both individual and aggregate, for the time period between 7:50 AM and 8:11 AM today:
sar -P ALL -s 07:50:00 -e 08:11:00
Note that all times must be in 24-hour format. If you have multiple CPUs, each CPU is detailed individually, and the average for all CPUs is also given.
The next command uses the -n option to display network statistics for all interfaces:
sar -n ALL | less
Data for previous days
Data collected for previous days can also be examined by specifying the desired log file. The last two digits of each file name are the day of the month on which the data was collected. Assume that today's date is September 3 and you want to see the data for yesterday. The following command displays all collected data for September 2:
sar -A -f /var/log/sa/sa02 | less
You can use the command below, where DD is the day of the month for yesterday:
sar -A -f /var/log/sa/saDD | less
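Rather than working out DD by hand, the file name can be built from a day number. This is a small sketch that follows the saDD naming described above; the function name sa_file_for_day is hypothetical, and some systems may be configured to use a different naming scheme.

```shell
# Hypothetical helper: build the path of the sar data file for a given
# day of the month, following the /var/log/sa/saDD naming.
sa_file_for_day() {
    day=${1#0}   # drop a leading zero so 08 and 09 are not read as octal
    printf '/var/log/sa/sa%02d\n' "$day"
}
```

Assuming GNU date (for its -d option), yesterday's data could then be viewed with sar -A -f "$(sa_file_for_day "$(date -d yesterday +%d)")" | less.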
You can also use SAR to display (nearly) real-time data. The following command displays memory usage in 5-second intervals for 10 iterations:
sar -r 5 10
This is an interesting option for sar as it can provide a series of data points for a defined period of time that can be examined in detail and compared.
The /proc filesystem
All of this data for SAR and the system monitoring tools covered in my previous article must come from somewhere. Fortunately, all of that kernel data is easily available in the /proc filesystem. In fact, because the kernel performance data stored there is all in ASCII text format, it can be displayed using simple commands like cat so that the individual programs do not have to load their own kernel modules to collect it. This saves system resources and makes the data more accurate. SAR and the system monitoring tools I have discussed in my previous article all collect their data from the /proc filesystem.
Note that /proc is a virtual filesystem and only exists in RAM while Linux is running. It is not stored on the hard drive.
Even though I won't get into detail, the /proc filesystem also contains the live kernel tuning parameters and variables. Thus you can change the kernel tuning by simply changing the appropriate kernel tuning variable in /proc; no reboot is required.
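As a brief illustration, the tuning variables live under /proc/sys and can be read and written like ordinary files. The sketch below wraps that in two functions; the names get_tunable and set_tunable are hypothetical, and vm/swappiness is used only as a commonly available example.

```shell
# Hypothetical helpers: read or set a kernel tunable under /proc/sys.
# The optional last argument overrides the /proc location, which is
# handy for trying the functions against a scratch directory.
get_tunable() {
    cat "${2:-/proc}/sys/$1"
}

set_tunable() {
    # Writing to the real /proc/sys requires root, and the change
    # lasts only until the next reboot.
    printf '%s\n' "$2" > "${3:-/proc}/sys/$1"
}
```

For example, get_tunable vm/swappiness prints the current value, and set_tunable vm/swappiness 10 (as root) changes it on the running kernel.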
Change to the /proc directory and list the files there. You will see, in addition to the data files, a large number of numbered directories. Each of these directories represents a process, and the directory name is the Process ID (PID). You can delve into those directories to locate information about individual processes that might be of interest.
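To get a quick overview of those numbered directories, a short loop can pair each PID with the name of its process. This sketch assumes each process directory contains a comm file holding the command name, as it does on reasonably recent kernels; the function name list_proc_names is hypothetical.

```shell
# Hypothetical helper: print "PID name" for every numbered directory
# under /proc (or under another directory given as the first argument).
list_proc_names() {
    root=${1:-/proc}
    for dir in "$root"/[0-9]*; do
        # Processes can exit while the loop runs, so skip missing files.
        [ -r "$dir/comm" ] || continue
        printf '%s %s\n' "${dir##*/}" "$(cat "$dir/comm" 2>/dev/null)"
    done
}
```

Running list_proc_names | less gives a rough, unsorted process list built from nothing but files in /proc.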
To view this data, simply cat some of the following files:
cmdline: displays the kernel command line, including all parameters passed to it.
cpuinfo: displays information about the CPU(s), including flags, model name, stepping, and cache size.
meminfo: displays very detailed information about memory, including data that is not always displayed by other tools, such as active and inactive memory and the total virtual memory allocated and used.
ioports: lists the memory ranges and ports defined for various I/O devices.
You will see that, although the data is available in these files, much of it is not annotated in any way. That means you will have work to do to identify and extract the desired data. However, the monitoring tools discussed earlier already do that for the data they are designed to display.
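As an example of the kind of extraction involved, the following sketch pulls two values out of meminfo and turns them into a single percentage. It assumes the file has MemTotal and MemAvailable lines (MemAvailable is missing on older kernels); the function name mem_used_percent is hypothetical.

```shell
# Hypothetical helper: report the percentage of memory in use, computed
# from the MemTotal and MemAvailable lines of a meminfo-style file.
mem_used_percent() {
    awk '
        $1 == "MemTotal:"     { total = $2 }
        $1 == "MemAvailable:" { avail = $2 }
        END {
            # Print nothing if MemTotal was not found.
            if (total > 0) printf "%.1f\n", (total - avail) * 100 / total
        }
    ' "${1:-/proc/meminfo}"
}
```

Called with no argument, mem_used_percent reads /proc/meminfo directly; passing a file name lets you try it against a saved copy.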
There is so much more data in the /proc filesystem that the best way to learn more about it is to refer to the proc(5) man page, which contains detailed information about the various files found there.
Next time I will pull all this together and discuss how I have used these tools to solve problems.