How to monitor NFS, Lustre File System Performance using Collectl
By Shaon, Gaea News Network
Tuesday, December 7, 2010
Collectl is a performance monitor for benchmarking and for watching a system’s general health. What sets it apart from other monitoring tools is that, where most tools focus on a small set of statistics and run either interactively or as a daemon, Collectl tries to do it all. The user is free to choose any combination of subsystems: buddyinfo, cpu, disk, inodes, infiniband, Lustre, memory, network, nfs, processes, quadrics, slabs, sockets and tcp. The tool runs on all of the major Linux distributions and is packaged in Fedora. Collectl provides an rpm for rpm-based distros such as CentOS, RHEL and Fedora; a source rpm and the source code are available for non-rpm distros such as Ubuntu. Its only requirement is perl. The tool has also won approval from industry leaders. We will now look at how to use Collectl to monitor NFS, Lustre and Gluster FS file system performance.
It is worth noting, however, that looking at detail data for more than one type of data at once can be cumbersome. In that case it is advisable to use --home, which gives the feel of a real-time display in a top-like format.
The following example was taken while writing a large file and running the Collectl command without arguments. The default view shows cpu, disk and network stats in brief format. The key advantage of this format is that spikes or other anomalies in the output are easy to spot.
#cpu sys inter ctxsw KBRead Reads KBWrit Writes netKBi pkt-in netKBo pkt-out
37 37 382 188 0 0 27144 254 45 68 3 21
25 25 366 180 20 4 31280 296 0 1 0 0
25 25 368 183 0 0 31720 275 2 20 0 1
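As a rough illustration of spotting spikes programmatically, the sketch below parses the brief-format sample above and flags rows that stand out from the column average. The function names and the `factor` threshold are hypothetical choices made for this example, not part of Collectl, and the parser assumes every field is a plain integer, as in this sample.

```python
# Brief-format sample copied from the listing above.
SAMPLE = """\
#cpu sys inter ctxsw KBRead Reads KBWrit Writes netKBi pkt-in netKBo pkt-out
37 37 382 188 0 0 27144 254 45 68 3 21
25 25 366 180 20 4 31280 296 0 1 0 0
25 25 368 183 0 0 31720 275 2 20 0 1
"""


def parse_brief(text):
    """Turn brief-format text into a list of {column: int} dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].lstrip("#").split()
    rows = []
    for ln in lines[1:]:
        values = [int(v) for v in ln.split()]  # assumes integer-only fields
        rows.append(dict(zip(header, values)))
    return rows


def find_spikes(rows, column, factor=2.0):
    """Return indexes of rows whose value exceeds factor * column mean.

    The threshold rule is an arbitrary choice for illustration.
    """
    mean = sum(r[column] for r in rows) / len(rows)
    return [i for i, r in enumerate(rows) if r[column] > factor * mean]


rows = parse_brief(SAMPLE)
print(find_spikes(rows, "netKBi"))  # first sample has the inbound burst
print(find_spikes(rows, "KBWrit"))  # disk writes are steady in this sample
```

A mean-based threshold is only one possible rule; with longer captures a rolling window or a percentile would probably serve better.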
Collectl can also display interrupts, memory usage and nfs activity with timestamps. The output lines are long, so it is advisable to keep the window wide enough.
# <-------Int--------><-----------Memory-----------><------NFS Totals------>
#Time Cpu0 Cpu1 Cpu2 Cpu3 Free Buff Cach Inac Slab Map Reads Writes Meta Comm
08:36:52 1001 66 0 0 2G 201M 609M 363M 219M 106M 0 0 5 0
08:36:53 999 1657 0 0 2G 201M 1G 918M 252M 106M 0 12622 0 2
08:36:54 1001 7488 0 0 1G 201M 1G 1G 286M 106M 0 20147 0 2
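The memory columns above use rounded sizes such as 2G and 201M, which get in the way of arithmetic. The sketch below converts them back to approximate byte counts. It assumes the suffixes are powers of 1024, which the listing itself does not confirm, and `to_bytes` and `parse_row` are hypothetical helper names.

```python
# Multipliers for the size suffixes; base 1024 is an assumption.
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

# Column names taken from the header line in the listing above.
COLUMNS = ("Time Cpu0 Cpu1 Cpu2 Cpu3 Free Buff Cach Inac Slab Map "
           "Reads Writes Meta Comm").split()


def to_bytes(field):
    """Convert a field such as '201M' or '2G' to an approximate byte count."""
    suffix = field[-1].upper()
    if suffix in UNITS:
        return int(field[:-1]) * UNITS[suffix]
    return int(field)  # plain number, no suffix


def parse_row(line):
    """Map one data row onto the column names; values stay as strings."""
    return dict(zip(COLUMNS, line.split()))


row = parse_row("08:36:53 999 1657 0 0 2G 201M 1G 918M 252M 106M 0 12622 0 2")
print(row["Time"], to_bytes(row["Buff"]), int(row["Writes"]))
```

Because the displayed values are already rounded, the converted numbers are only estimates; they are good enough for trend lines, not for exact accounting.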
The same information is available in verbose format as well. In the network summary below, the user can even see the network stall while it waits for the server to physically write the data.
# NETWORK SUMMARY (/sec)
# KBIn PktIn SizeIn MultI CmpI ErrIn KBOut PktOut SizeO CmpO ErrOut
08:46:35.002 3255 41000 81 0 0 0 112015 78837 1454 0 0
08:46:36.003 0 9 70 0 0 0 29 25 1174 0 0
08:46:37.003 0 2 70 0 0 0 0 2 134 0 0
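The stall in the network summary above can also be picked out mechanically. The sketch below flags intervals in which outbound traffic collapses relative to the highest rate seen so far. The `drop_ratio` cutoff and the function names are hypothetical; the column names come from the header in the listing.

```python
# Network summary rows copied from the listing above.
SAMPLE = """\
08:46:35.002 3255 41000 81 0 0 0 112015 78837 1454 0 0
08:46:36.003 0 9 70 0 0 0 29 25 1174 0 0
08:46:37.003 0 2 70 0 0 0 0 2 134 0 0
"""

COLUMNS = ["Time", "KBIn", "PktIn", "SizeIn", "MultI", "CmpI", "ErrIn",
           "KBOut", "PktOut", "SizeO", "CmpO", "ErrOut"]


def parse_network(text):
    """Parse verbose network rows into dicts, skipping header lines."""
    rows = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        row = {"Time": fields[0]}
        row.update({name: int(v) for name, v in zip(COLUMNS[1:], fields[1:])})
        rows.append(row)
    return rows


def stalled_intervals(rows, column="KBOut", drop_ratio=0.01):
    """Return timestamps where traffic falls below drop_ratio of the peak so far."""
    stalled = []
    peak = 0
    for row in rows:
        if peak and row[column] < peak * drop_ratio:
            stalled.append(row["Time"])
        peak = max(peak, row[column])
    return stalled


print(stalled_intervals(parse_network(SAMPLE)))
```

Comparing against the running peak rather than the previous sample keeps a long stall flagged for its whole duration, not just for its first interval.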
The last example shows a detail format in which multiple output lines are reported for each sample, one per interrupt.
# Int Cpu0 Cpu1 Cpu2 Cpu3 Type Device(s)
08:52:32.002 225 0 4 0 0 IO-APIC-level ioc0
08:52:32.002 000 1000 0 0 0 IO-APIC-edge timer
08:52:32.002 014 0 0 18 0 IO-APIC-edge ide0
08:52:32.002 090 0 0 0 15461 IO-APIC-level eth1
Since version 3.2.1, many key features of nfs monitoring have changed. The older algorithm, which restricted reports to the client or the server for a specific version, is gone. Reporting is now much more extensive and collects data for all types of nfs data, including nfs version 4. Collectl looks at the raw read/write fields for each set of nfs data, which hold the totals since boot. When both fields are 0, Collectl assumes there is no current nfs activity of that type. Once the user has mounted and used a filesystem, however, the counters retain a non-zero value until the next reboot. While this approach is not perfect, it greatly reduces the need for user filters. The focus is largely on the detailed output.
The results will vary from system to system. If the system is running mixed nfs versions and/or acting as both a client and a server, it can be handy to include only a subset of the activity in the summary reports. This is done with filters, using the --nfsfilt switch. If, for example, the user selects only clients, the data for those clients is reported in the brief, summary and detail formats. When recording to a raw file, however, Collectl records the data for all NFS client versions even if the user has specified a selection. When --nfsfilt is used, the selected values are shown in the brief and verbose headers during both collection and playback.
The data collected for NFS V2 and V3 is almost identical, in that V3 reports a superset of what V2 reports, the exceptions being root and wrcache. V4 reports more counters than either, but includes many of the same key ones as V3, so the detail format is standardized on the V3 counters. In this mode only one line is reported for each of the 6 types, with blank entries denoting the non-common fields.
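The since-boot heuristic described above can be sketched as follows. This is a simplified model rather than Collectl's actual code: the type labels, counter values and function names are all hypothetical, and each NFS type is reduced to a single (reads, writes) pair.

```python
def active_nfs_types(counters):
    """Return NFS types whose since-boot read or write counter is non-zero.

    `counters` maps a type label to a (reads, writes) tuple of totals since boot.
    """
    return [name for name, (reads, writes) in counters.items()
            if reads != 0 or writes != 0]


def per_interval(previous, current):
    """Activity during one interval is the difference between two snapshots."""
    return {name: (current[name][0] - previous[name][0],
                   current[name][1] - previous[name][1])
            for name in current}


# Hypothetical snapshots taken one interval apart.
before = {"client-v3": (100, 5000), "server-v3": (0, 0), "client-v4": (7, 0)}
after = {"client-v3": (100, 17622), "server-v3": (0, 0), "client-v4": (7, 0)}

print(active_nfs_types(after))
print(per_interval(before, after)["client-v4"])
```

The `client-v4` entry shows the imperfection mentioned in the text: its since-boot counters are non-zero, so it is treated as active, even though nothing changed during the interval.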
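To make the idea of standardizing on the V3 counters concrete, the sketch below projects one set of counters onto a fixed list of V3 columns and leaves non-common fields blank. The column list is only a subset of the NFSv3 procedure names, and the sample values and the `to_v3_row` helper are hypothetical. It does not reproduce Collectl's real output layout.

```python
# A subset of NFSv3 procedure names, used here as the common column set.
V3_COLUMNS = ["getattr", "setattr", "lookup", "access", "read", "write", "commit"]


def to_v3_row(counters):
    """Project one set of NFS counters onto the V3 columns.

    Counters missing from the input become blank strings; counters outside
    the V3 set (such as V2's root and wrcache) are dropped.
    """
    return [str(counters[name]) if name in counters else ""
            for name in V3_COLUMNS]


# Hypothetical V2 counters; V2 has no access or commit procedure.
v2_sample = {"getattr": 12, "setattr": 1, "root": 0, "lookup": 30,
             "read": 400, "wrcache": 0, "write": 250}

print(to_v3_row(v2_sample))
```

Using one column set for every type keeps the lines aligned, at the cost of hiding counters that only V2 or V4 report.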
Playing back data generated by older versions of Collectl
Collectl can work out what to do with data generated by older versions. If the user plays back data generated by a pre-3.2.1 version of Collectl, it determines the type of data contained in the file and sets --nfsfilt accordingly (the user cannot override it). The data found in the file is then displayed.
Playing back newer data with older versions of Collectl
Raw files created by version 3.2.1 of Collectl record nfs data in a format that earlier Collectl versions do not recognize. Any attempt to read such a file with an older version results in a table with all fields zero.
Lustre records a wealth of performance data, so much that it can be hard to tell which of it is relevant to the user. In most cases, though, if the server(s) have been configured and the user is content with simply monitoring them, all that is required is to specify -sl or -sL and Collectl takes care of the rest. It automatically detects which type of service(s) is currently active and then records or displays the appropriate data. If -sl is selected on a system without Lustre installed, Collectl issues a warning and disables that switch.
Controlling the display of Data
To allow the broadest flexibility possible, data collection and display are controlled by several complementary switches.
1. -s: The user can specify ‘-sl’ for summary-level data, ‘-sL’ for detail data, or combine them. Because client detail data may be presented at either the individual filesystem or the OST level, there is also an option to show OST-level details.
2. --lustopts: This switch selects in more detail the types of data to be collected/displayed. Collectl recognizes 5 values:
B - rpc buffer level data.
D - disk block statistics, which apply to both MDS and OSS servers. Note that this is specific to HP SFS; the data is not available in the open source version.
M - client metadata (note that this was the default prior to collectl V1.6.2).
O - for client details only, show results by OST
R - read_ahead statistics. Unlike the other options, which generate a lot of data, --lustopts R may be used with brief mode.
To let the user display whatever is needed, Collectl allows multiple values to be selected for --lustopts and tries to show the results appropriately. Experiment to find the combination that shows what you are looking for.
As always with Lustre, unless instructed otherwise, Collectl plays back recorded data using the parameters that were selected during collection.
Figure out Service Configuration Changes
Lustre services may change after Collectl has started. It is also quite possible for the user to get a message saying that the system type cannot be determined. Collectl periodically checks for configuration changes and automatically adjusts the data collected and any current displays, which can mean that the output format changes. If the user expects this and simply wants to force the type of output to stay consistent, --lustsvc may be used.
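When scripting around these switches, it can help to check a combination of values before launching anything. The sketch below validates a string against the five letters listed above and assembles an argument list. It never runs collectl and does not check whether Collectl accepts a given combination; `describe_lustopts` and `build_command` are hypothetical helpers.

```python
# The five values and their meanings, as listed in the text above.
LUSTOPTS = {
    "B": "rpc buffer level data",
    "D": "disk block statistics (HP SFS only)",
    "M": "client metadata",
    "O": "client details by OST",
    "R": "read_ahead statistics",
}


def describe_lustopts(value):
    """Return a description for each letter; reject letters not in the list."""
    unknown = sorted(set(value) - set(LUSTOPTS))
    if unknown:
        raise ValueError("unknown --lustopts value(s): " + "".join(unknown))
    return [LUSTOPTS[ch] for ch in value]


def build_command(subsys, lustopts):
    """Assemble an argument list such as collectl -sL --lustopts BR."""
    describe_lustopts(lustopts)  # raises ValueError on an unknown letter
    return ["collectl", "-s" + subsys, "--lustopts", lustopts]


print(build_command("L", "BR"))
print(describe_lustopts("MO"))
```

Keeping the command as a list instead of a single string means it can be handed to `subprocess.run` later without any shell quoting concerns.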
Changing the Default Recording/Display Behavior
The user may want specific control over the data that is recorded or displayed, deviating from the default behavior. One case is when Collectl starts before Lustre and cannot determine the system type. Another is when a system plays several roles by providing more than one service. A third is when the user has developed reports or graphs that expect data in a standard format and the collection is a subset (or superset) of that data.
To override the default behavior for the Lustre portion of the data, the user may use --lustsvc to specify the type of service(s) of interest. Collectl then honors that selection both when recording to a file and when displaying. When data is displayed for services on which no data was ever collected, zeros are printed for those services.
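The zero-filling behavior described above matters to downstream reports, because it keeps the column layout stable. The sketch below models it in a simplified way. The service labels, the sample numbers and the `fill_missing_services` helper are hypothetical and are not Collectl internals.

```python
def fill_missing_services(recorded, requested):
    """Return values for every requested service, zero-filling the rest.

    `recorded` maps a service label to a list of collected values;
    `requested` lists the services the report expects to see.
    """
    # Use the widest recorded row so that zero rows have a matching shape.
    width = max((len(v) for v in recorded.values()), default=1)
    return {svc: recorded.get(svc, [0] * width) for svc in requested}


# Hypothetical playback: only OSS data was collected, the report wants both.
recorded = {"oss": [120, 340]}
print(fill_missing_services(recorded, ["mds", "oss"]))
```

A report or graph that expects a fixed set of columns can then read every playback the same way, whichever services were actually active during collection.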