The first steps should include basic system resource checks:
After logging in, a quick check is to run the w command in a terminal. It shows a few basic pieces of information: the uptime, the number of logged-in users and, most importantly, the average system load over the last 1, 5 and 15 minutes. The load should be read against the number of CPU cores: a value well above the core count means that processes are queuing for resources. If the load is very high, e.g. over 100, one or more processes are demanding far more CPU than is available in the system, which usually indicates that a process is misbehaving.
Sample output of w command:
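The load check described above can also be scripted. The following is a minimal sketch, not a definitive implementation: the function names and the threshold of 1.0 per CPU are assumptions chosen for illustration, and the load values come from Python's os.getloadavg(), which reports the same three averages as w.

```python
import os


def load_per_cpu(load_avg: float, cpu_count: int) -> float:
    """Return the load average divided by the number of logical CPUs."""
    if cpu_count < 1:
        raise ValueError("cpu_count must be at least 1")
    return load_avg / cpu_count


def is_overloaded(load_avg: float, cpu_count: int, threshold: float = 1.0) -> bool:
    """True when the per-CPU load exceeds the (assumed) threshold."""
    return load_per_cpu(load_avg, cpu_count) > threshold


if __name__ == "__main__":
    try:
        averages = os.getloadavg()  # 1, 5 and 15 minute load averages
    except (OSError, AttributeError):
        print("load average is not available on this platform")
    else:
        cpus = os.cpu_count() or 1
        for label, value in zip(("1 min", "5 min", "15 min"), averages):
            print(f"{label}: load {value:.2f}, "
                  f"per CPU {load_per_cpu(value, cpus):.2f}, "
                  f"overloaded: {is_overloaded(value, cpus)}")
```

Dividing by the CPU count makes the figure comparable between machines: a load of 8 is busy on a 4-core host but unremarkable on a 32-core one.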
We can check various system resources by using the following commands:
free -mh: On Unix-like operating systems, the free command displays the total amount of free and used physical and swap memory, as well as the buffers used by the kernel. When the system runs out of memory, the kernel invokes the OOM Killer ("Out of Memory Killer"), a kernel feature that kills processes based on their oom_score. In other words, the OOM Killer picks a process that has allocated a lot of memory and is the least important to the system, so the first ones to be killed are usually user applications with the most memory allocated.
Sample output of free -mh command:
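The figures that free reports come from /proc/meminfo on Linux, so the same check can be sketched in a script. This is a minimal example under stated assumptions: the function names are hypothetical, and the MemAvailable field is only present on reasonably recent Linux kernels.

```python
import os


def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo content into {field name: integer value}.

    Most values are in kiB; a few counters (e.g. HugePages_Total) have no unit.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def available_percent(meminfo: dict) -> float:
    """Share of total memory that the kernel estimates is still available."""
    return 100.0 * meminfo["MemAvailable"] / meminfo["MemTotal"]


if __name__ == "__main__":
    path = "/proc/meminfo"
    if os.path.exists(path):
        with open(path) as handle:
            info = parse_meminfo(handle.read())
        if info.get("MemTotal") and "MemAvailable" in info:
            print(f"available memory: {available_percent(info):.1f}%")
        else:
            print("MemTotal/MemAvailable not reported by this kernel")
    else:
        print(f"{path} not found (not a Linux system?)")
```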
A more detailed view with more precise data is available from a different command:
dstat -vn: It shows far more data: not only memory usage, but also CPU, disk and network bandwidth usage. It also lets you track what is happening with resources in close to real time. Sample output of dstat -vn:
df -i: Before checking the disk space used, it is good practice to check the number of free inodes on each filesystem, as all of them may be used up even before we run out of free disk space. Sample output of the df -i command:
df -h: It shows the amount of total, free and used disk space on all mounted partitions/drives, which can be helpful when determining what is causing the slowdown. Maintaining enough free space on a disk is crucial to keep the system running smoothly, e.g. having a root partition "/" full can destabilize the whole operating system.
Sample output of df -h command:
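Both the inode and the disk space checks can be sketched with os.statvfs, which exposes the same counters that df reads. The function names below are hypothetical, and the percentages may differ slightly from df's own columns, since df bases its figure on the space available to unprivileged users.

```python
import os


def used_percent(total: int, free: int) -> float:
    """Percentage of a resource in use, given its total and free counts."""
    if total <= 0:
        return 0.0  # some pseudo-filesystems report zero blocks or inodes
    return 100.0 * (total - free) / total


def filesystem_usage(path: str = "/") -> dict:
    """Return block and inode usage for the filesystem containing path."""
    stats = os.statvfs(path)
    return {
        "blocks_used_percent": used_percent(stats.f_blocks, stats.f_bfree),
        "inodes_used_percent": used_percent(stats.f_files, stats.f_ffree),
    }


if __name__ == "__main__":
    if hasattr(os, "statvfs"):
        usage = filesystem_usage("/")
        print(f"/: {usage['blocks_used_percent']:.1f}% of space used, "
              f"{usage['inodes_used_percent']:.1f}% of inodes used")
    else:
        print("os.statvfs is not available on this platform")
```

Checking both values together catches the case described above, where a filesystem still has free space but no free inodes left.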
iotop -aoP: a tool that shows the current disk read, disk write, swap-in and IO% figures for each process, together with the associated command and the user it runs under. It shows the processes with the most disk operations. When one of them performs, for example, too many writes to disk, it may slow down the whole system, because other disk operations are put on hold or queued behind this demanding process.
Sample output of the iotop -aoP command sorted by IO% (the percentage of time spent on input/output operations: the lower, the better, and 100% is the maximum for the system):
top/htop: real-time monitoring tools that cover various aspects of the system. They show many useful per-process parameters: PID, nice level, the exact command with any child processes it created, the owner of the process (the user it was started under), its running time, the percentage of CPU and memory used, the load on the system and a few more. They are a convenient way of finding which process demands the most CPU or memory, when it was started and by whom. With the -u USER switch, we can list only the processes owned by a specific user.
Sample output of top command:
cat /proc/mdstat: /proc/mdstat is a file maintained by the kernel that contains real-time information about software RAID arrays and their member devices. For a detailed view of an individual array, use: mdadm --detail /dev/md0 (instead of md0, use your device name taken from cat /proc/mdstat).
Sample output of cat /proc/mdstat:
A sample of detailed view using mdadm --detail /dev/md0:
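A degraded array can also be spotted automatically. The sketch below assumes the usual /proc/mdstat layout, in which each array header line (e.g. "md0 : active raid1 ...") is followed by a status line ending in a member map such as [UU], where an underscore marks a missing or failed member. The function name is hypothetical.

```python
import os
import re

ARRAY_LINE = re.compile(r"^(md\S*)\s*:")      # header line of an array
MEMBER_STATUS = re.compile(r"\[([U_]+)\]")    # member map such as [UU] or [U_]


def degraded_arrays(mdstat_text: str) -> list:
    """Return the names of arrays whose member map contains an underscore."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_LINE.match(line)
        if header:
            current = header.group(1)
            continue
        status = MEMBER_STATUS.search(line)
        if current and status and "_" in status.group(1):
            degraded.append(current)
            current = None  # report each array only once
    return degraded


if __name__ == "__main__":
    path = "/proc/mdstat"
    if os.path.exists(path):
        with open(path) as handle:
            names = degraded_arrays(handle.read())
        print("degraded arrays:", ", ".join(names) if names else "none")
    else:
        print(f"{path} not found (no md RAID support on this system)")
```

Any array name returned here is a good candidate for a closer look with mdadm --detail.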
smartctl: smartctl is a command-line utility that performs SMART tasks such as printing the SMART self-test and error logs, enabling and disabling SMART automatic testing, and initiating device self-tests. It can be used to determine the health of disks and the overall status of physical devices.
To see the overall SMART health assessment, run: smartctl -H /dev/sda (where /dev/sda is the path to the device).
Sample output of this command:
smartctl -a /dev/sda: It outputs a lot of detailed information, the most important part of which is usually the table of SMART attributes, showing the exact recorded values for each individual attribute.
Sample data from the smartctl -a /dev/sda command:
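The health summary can be extracted from the smartctl -H output in a script. This is a sketch only: it assumes the result appears on a line starting with "SMART overall-health self-assessment test result:" (ATA devices) or "SMART Health Status:" (SCSI devices), which may vary between smartmontools versions and device types. The function name is hypothetical, and running smartctl normally requires root privileges.

```python
import re
import shutil
import subprocess

# Assumed line formats for ATA and SCSI devices respectively.
HEALTH_LINE = re.compile(
    r"(?:SMART overall-health self-assessment test result"
    r"|SMART Health Status):\s*(\S+)"
)


def parse_health(output: str):
    """Return the health verdict (e.g. 'PASSED', 'OK') or None if absent."""
    match = HEALTH_LINE.search(output)
    return match.group(1) if match else None


if __name__ == "__main__":
    device = "/dev/sda"  # replace with the path to your device
    if shutil.which("smartctl"):
        result = subprocess.run(
            ["smartctl", "-H", device],
            capture_output=True, text=True, check=False,
        )
        print(f"{device}: {parse_health(result.stdout) or 'unknown'}")
    else:
        print("smartctl is not installed")
```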
drbdadm status: this command shows the status of DRBD (if DRBD is configured and in use):
Sample output showing proper functioning:
dmesg -xe: prints the kernel message buffer. It helps to pinpoint malfunctioning components such as device drivers or the devices themselves, among other things. Typical output contains a lot of information, so it is best piped to the less or more command (which allow scrolling text in the terminal window).
Sample output of the dmesg | more command:
Next, we need to investigate further:
If an application such as Workbench is running slowly, make sure that Redis is working properly and that no performance errors appear in its logs. The Redis log is usually found at: /var/log/redis/redis.log
Sample output of redis logs showing proper functioning ( cat /var/log/redis/redis.log ):
Garbage Collector (GC)
The garbage collector logs should be checked as well. If we find that, for example, the GC runs for 50 s out of every minute, or that GC log entries arrive many times per second, this may suggest that too little memory is allocated to the application. Note that if there is a major problem with the garbage collector, the logs will show 'Full GC' entries.
It can be checked using: grep GC catalina.out
Sample output of normal GC logs:
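To get a quick feel for how often the collector runs, the GC lines can be counted rather than read one by one. The sketch below is an illustration under assumptions: the function name is hypothetical, and the marker for a full collection is a parameter because its exact spelling depends on the JVM version and logging flags. The log path is taken from the command line.

```python
import sys


def count_gc_events(lines, full_marker: str = "Full GC"):
    """Return (all GC lines, full-collection lines) found in the log lines."""
    total = 0
    full = 0
    for line in lines:
        if "GC" in line:
            total += 1
            if full_marker in line:
                full += 1
    return total, full


if __name__ == "__main__":
    if len(sys.argv) > 1:
        # Usage: python gc_count.py /path/to/catalina.out
        with open(sys.argv[1], errors="replace") as handle:
            total, full = count_gc_events(handle)
        print(f"GC lines: {total}, full collections: {full}")
    else:
        print("pass the path to the log file as the first argument")
```

A high share of full collections, or a total that grows by many lines per second, matches the low-memory symptoms described above.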
Log into your PostgreSQL instance using psql and then switch to the postgres database using '\c postgres'. Note that your settings may differ, so if needed, use your own credentials and the proper switches (psql -U username -W -d postgres). Finally, run this query:
If there are no errors (or stuck queries) you should see output similar to this: