May 22, 2012

Swiss army knife for ONTAP performance measurement

I don’t hear much about the stats command from the NetApp user community. It’s a nifty little command that lets us monitor many performance metrics at a fairly granular level. Its output can also be customized, which can save much of the parsing and grepping that goes into routine filer performance monitoring and charting.

First looks:

san-a> stats
The following commands are available; for more information
type "stats help <command>"
explain             list                start               stop                
help                show                
 
     stats help - Display command help.
     stats list - List available objects, instances, counters or presets in system.
     stats explain - Explain available objects or counters in system.
     stats show - Show all or selected statistics with formatting.
     stats start - Start background statistic collection.
     stats stop - Stop all background operations and discard results.
 
san-a>
san-a> stats list
Usage:
stats list objects [-p <preset>] 
stats list instances [-p <preset>] [<object_name>]
stats list counters [-p <preset>] [<object_name>]
stats list presets
- List available objects, instances, counters or presets in system.
san-a>

stats uses objects, instances and counters to collect and report performance metrics. Objects are a fixed set of top-level categories (disk, volume, lun and so on). An object can have multiple instances: for example, disk is an object, and each disk in the system is an instance of it. For each instance, ONTAP can collect and report a variety of performance metrics. These individual metrics are called counters.
This hierarchy lets the stats command give a view that is as global or as specific as we need.

What objects are there?

san-a> stats list objects
Objects:
        system
        disk
        processor
        ifnet
        nfsv3
        target
        lun
        volume
        cifs
        fcp
        iscsi
        aggregate
        qtree
        quota
        vfiler
        ext_cache
        ext_cache_obj
        logical_replication_destination
        logical_replication_source
        dump
        ndmp
        hostadapter
san-a>

Some of these objects have multiple instances. To see the instances of an object, we can run:

san-a> stats list instances lun      
Instances for object name: lun
        /vol/luns/demolun7-C4aYk4Lggh9R
        /vol/luns/san-a-lin03-fcp-lun1-C4aYk4L-g3pl
        /vol/luns/san-a-lin03-lun1-C4aYk4Dlv0bh
        /vol/luns/san-a-lin03-lun2-C4aYk4Dlv0bp
        /vol/luns/san-a-lin03-lun0-C4aYk4Dlv0bY
        /vol/luns/demolun4-C4aYk4IzJb3b
        /vol/luns/san-a-lin03-fcp-lun0-C4aYk4F7X3ff
 
san-a>

For each of these instances, we can get a variety of performance metrics:

san-a> stats list counters lun
Counters for object name: lun
        read_ops
        write_ops
        other_ops
        read_data
        write_data
        queue_full
        avg_latency
        total_ops
 
san-a>
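If these listings feed a script, the names have to be separated from the prompt and header lines. Here is a minimal Python sketch, assuming the console text was captured with its indentation intact, as in the transcript above. The function name parse_stats_list and the SAMPLE string are illustrative only and are not part of ONTAP.

```python
def parse_stats_list(text):
    """Return the names listed in captured `stats list ...` output.

    Assumption: entries are the indented lines, while the prompt and the
    "Counters for object name: ..." header start at column 0.
    """
    names = []
    for line in text.splitlines():
        # Keep only indented lines that contain something besides whitespace.
        if line[:1].isspace() and line.strip():
            names.append(line.strip())
    return names


# Text copied from the transcript above.
SAMPLE = """\
san-a> stats list counters lun
Counters for object name: lun
        read_ops
        write_ops
        other_ops
        read_data
        write_data
        queue_full
        avg_latency
        total_ops

san-a>
"""

if __name__ == "__main__":
    for name in parse_stats_list(SAMPLE):
        print(name)
```

The same function should work on the objects and instances listings, since they use the same indented layout.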

Given this info, let’s go through some “progressive metrics extraction”, narrowing from a whole object to a single instance and then to a single counter:

san-a> stats show volume
volume:vol0:avg_latency:37.92us
volume:vol0:total_ops:14/s
volume:vol0:read_data:0b/s
volume:vol0:read_latency:0us
volume:vol0:read_ops:0/s
volume:vol0:write_data:263b/s
volume:vol0:write_latency:231.50us
volume:vol0:write_ops:2/s
volume:vol0:other_latency:5.66us
volume:vol0:other_ops:12/s
volume:luns:avg_latency:0us
volume:luns:total_ops:0/s
volume:luns:read_data:0b/s
volume:luns:read_latency:0us
volume:luns:read_ops:0/s
volume:luns:write_data:0b/s
volume:luns:write_latency:0us
volume:luns:write_ops:0/s
volume:luns:other_latency:0us
volume:luns:other_ops:0/s
volume:maska:avg_latency:0us
volume:maska:total_ops:0/s
volume:maska:read_data:0b/s
volume:maska:read_latency:0us
volume:maska:read_ops:0/s
volume:maska:write_data:0b/s
volume:maska:write_latency:0us
volume:maska:write_ops:0/s
volume:maska:other_latency:0us
volume:maska:other_ops:0/s
volume:maskb:avg_latency:0us
volume:maskb:total_ops:0/s
volume:maskb:read_data:0b/s
volume:maskb:read_latency:0us
volume:maskb:read_ops:0/s
volume:maskb:write_data:0b/s
volume:maskb:write_latency:0us
volume:maskb:write_ops:0/s
volume:maskb:other_latency:0us
volume:maskb:other_ops:0/s
volume:maskc:avg_latency:0us
volume:maskc:total_ops:0/s
volume:maskc:read_data:0b/s
volume:maskc:read_latency:0us
volume:maskc:read_ops:0/s
volume:maskc:write_data:0b/s
volume:maskc:write_latency:0us
volume:maskc:write_ops:0/s
volume:maskc:other_latency:0us
volume:maskc:other_ops:0/s
volume:oracledata:avg_latency:0us
volume:oracledata:total_ops:0/s
volume:oracledata:read_data:0b/s
volume:oracledata:read_latency:0us
volume:oracledata:read_ops:0/s
volume:oracledata:write_data:0b/s
volume:oracledata:write_latency:0us
volume:oracledata:write_ops:0/s
volume:oracledata:other_latency:0us
volume:oracledata:other_ops:0/s
volume:oraclebin:avg_latency:0us
volume:oraclebin:total_ops:0/s
volume:oraclebin:read_data:0b/s
volume:oraclebin:read_latency:0us
volume:oraclebin:read_ops:0/s
volume:oraclebin:write_data:0b/s
volume:oraclebin:write_latency:0us
volume:oraclebin:write_ops:0/s
volume:oraclebin:other_latency:0us
volume:oraclebin:other_ops:0/s
volume:oraclelogs:avg_latency:0us
volume:oraclelogs:total_ops:0/s
volume:oraclelogs:read_data:0b/s
volume:oraclelogs:read_latency:0us
volume:oraclelogs:read_ops:0/s
volume:oraclelogs:write_data:0b/s
volume:oraclelogs:write_latency:0us
volume:oraclelogs:write_ops:0/s
volume:oraclelogs:other_latency:0us
volume:oraclelogs:other_ops:0/s
volume:esx_vol1:avg_latency:0us
volume:esx_vol1:total_ops:0/s
volume:esx_vol1:read_data:0b/s
volume:esx_vol1:read_latency:0us
volume:esx_vol1:read_ops:0/s
volume:esx_vol1:write_data:0b/s
volume:esx_vol1:write_latency:0us
volume:esx_vol1:write_ops:0/s
volume:esx_vol1:other_latency:0us
volume:esx_vol1:other_ops:0/s
volume:aggr1_vol1_esx:avg_latency:83.33us
volume:aggr1_vol1_esx:total_ops:3/s
volume:aggr1_vol1_esx:read_data:0b/s
volume:aggr1_vol1_esx:read_latency:0us
volume:aggr1_vol1_esx:read_ops:0/s
volume:aggr1_vol1_esx:write_data:20480b/s
volume:aggr1_vol1_esx:write_latency:83.33us
volume:aggr1_vol1_esx:write_ops:3/s
volume:aggr1_vol1_esx:other_latency:0us
volume:aggr1_vol1_esx:other_ops:0/s
san-a> 
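Every line of this output has the shape object:instance:counter:value, which makes it easy to load into a nested structure. The following Python sketch shows one way to do that, assuming the output has been captured as text. parse_stats_show and SAMPLE are hypothetical names, and values are kept as raw strings such as "14/s".

```python
def parse_stats_show(text):
    """Build {object: {instance: {counter: value}}} from `stats show` output."""
    tree = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and console prompts such as "san-a> stats show volume".
        if not line or ">" in line.split(":", 1)[0]:
            continue
        try:
            # The object is the first field; counter and value are the last two.
            # Splitting from both ends tolerates a colon inside an instance name.
            obj, rest = line.split(":", 1)
            instance, counter, value = rest.rsplit(":", 2)
        except ValueError:
            continue  # not a four-field statistics line
        tree.setdefault(obj, {}).setdefault(instance, {})[counter] = value
    return tree


# A few lines copied from the transcript above.
SAMPLE = """\
san-a> stats show volume
volume:vol0:avg_latency:37.92us
volume:vol0:total_ops:14/s
volume:aggr1_vol1_esx:write_data:20480b/s
volume:aggr1_vol1_esx:write_ops:3/s
san-a>
"""

if __name__ == "__main__":
    stats = parse_stats_show(SAMPLE)
    print(stats["volume"]["vol0"]["total_ops"])
```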
 
 
san-a> stats show volume:aggr1_vol1_esx
volume:aggr1_vol1_esx:avg_latency:0us
volume:aggr1_vol1_esx:total_ops:0/s
volume:aggr1_vol1_esx:read_data:0b/s
volume:aggr1_vol1_esx:read_latency:0us
volume:aggr1_vol1_esx:read_ops:0/s
volume:aggr1_vol1_esx:write_data:0b/s
volume:aggr1_vol1_esx:write_latency:0us
volume:aggr1_vol1_esx:write_ops:0/s
volume:aggr1_vol1_esx:other_latency:0us
volume:aggr1_vol1_esx:other_ops:0/s
san-a> stats show volume:aggr1_vol1_esx:avg_latency
volume:aggr1_vol1_esx:avg_latency:0us

Or I can extract one particular counter across all of my volumes:

san-a> stats show volume:*:avg_latency
volume:vol0:avg_latency:34.90us
volume:luns:avg_latency:0us
volume:maska:avg_latency:0us
volume:maskb:avg_latency:0us
volume:maskc:avg_latency:0us
volume:oracledata:avg_latency:0us
volume:oraclebin:avg_latency:0us
volume:oraclelogs:avg_latency:0us
volume:esx_vol1:avg_latency:0us
volume:aggr1_vol1_esx:avg_latency:0us
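A single counter across all instances is a natural input for a quick ranking. Below is a rough Python sketch that sorts instances by one counter, highest first. It assumes every value is a number followed by a unit suffix (as in "34.90us") and that all rows share the same unit. rank_instances, split_value and SAMPLE are made-up names for illustration.

```python
import re

# A number followed by an optional unit suffix, e.g. "34.90us" or "2/s".
VALUE_RE = re.compile(r"^(-?\d+(?:\.\d+)?)(.*)$")


def split_value(value):
    """Split '34.90us' into (34.9, 'us')."""
    match = VALUE_RE.match(value)
    if match is None:
        raise ValueError("unrecognized value: %r" % value)
    return float(match.group(1)), match.group(2)


def rank_instances(text, counter):
    """Return [(instance, number, unit)] for one counter, largest first."""
    rows = []
    for line in text.splitlines():
        parts = line.strip().rsplit(":", 2)
        # Expect "<object>:<instance>", "<counter>", "<value>".
        if len(parts) != 3 or parts[1] != counter or ":" not in parts[0]:
            continue
        instance = parts[0].split(":", 1)[1]
        number, unit = split_value(parts[2])
        rows.append((instance, number, unit))
    # Assumption: units match across rows; mixed units would need normalizing.
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Lines copied from the transcript above.
SAMPLE = """\
san-a> stats show volume:*:avg_latency
volume:vol0:avg_latency:34.90us
volume:luns:avg_latency:0us
volume:maska:avg_latency:0us
volume:aggr1_vol1_esx:avg_latency:0us
"""

if __name__ == "__main__":
    for instance, number, unit in rank_instances(SAMPLE, "avg_latency"):
        print(instance, number, unit)
```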

Or I can select TWO counters across all volumes:

san-a> stats show volume:*:avg_latency volume:*:write_ops
volume:vol0:avg_latency:38.09us
volume:vol0:write_ops:2/s
volume:luns:avg_latency:0us
volume:luns:write_ops:0/s
volume:maska:avg_latency:0us
volume:maska:write_ops:0/s
volume:maskb:avg_latency:0us
volume:maskb:write_ops:0/s
volume:maskc:avg_latency:0us
volume:maskc:write_ops:0/s
volume:oracledata:avg_latency:0us
volume:oracledata:write_ops:0/s
volume:oraclebin:avg_latency:0us
volume:oraclebin:write_ops:0/s
volume:oraclelogs:avg_latency:0us
volume:oraclelogs:write_ops:0/s
volume:esx_vol1:avg_latency:0us
volume:esx_vol1:write_ops:0/s
volume:aggr1_vol1_esx:avg_latency:0us
volume:aggr1_vol1_esx:write_ops:0/s
san-a>
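Output with several counters per instance pivots naturally into one row per instance, which is handy for charting. Here is a small Python sketch that turns captured text like the above into CSV. pivot_to_csv and SAMPLE are illustrative names, and values are left as raw strings with their unit suffixes.

```python
import csv
import io


def pivot_to_csv(text):
    """Turn object:instance:counter:value lines into CSV, one row per instance."""
    table = {}      # instance -> {counter: value}
    counters = []   # column order, in order of first appearance
    for line in text.splitlines():
        parts = line.strip().rsplit(":", 2)
        # Skip prompts, blanks and anything that is not a statistics line.
        if len(parts) != 3 or ":" not in parts[0] or ">" in parts[0]:
            continue
        instance = parts[0].split(":", 1)[1]
        counter, value = parts[1], parts[2]
        if counter not in counters:
            counters.append(counter)
        table.setdefault(instance, {})[counter] = value

    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["instance"] + counters)
    for instance, values in table.items():
        # Leave a cell empty if a counter was not reported for this instance.
        writer.writerow([instance] + [values.get(c, "") for c in counters])
    return out.getvalue()


# Lines copied from the transcript above.
SAMPLE = """\
san-a> stats show volume:*:avg_latency volume:*:write_ops
volume:vol0:avg_latency:38.09us
volume:vol0:write_ops:2/s
volume:luns:avg_latency:0us
volume:luns:write_ops:0/s
san-a>
"""

if __name__ == "__main__":
    print(pivot_to_csv(SAMPLE), end="")
```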

This is ONLY a high-level overview of the stats command. The following links will help you understand it fully.

Further Reading:

  1. A quick guide to using the stats command
  2. stats – man page
  3. stats – who is your Daddy and what does he do?

Make stats part of your daily scripting regimen and you’ll find it a time saver and a Swiss Army knife.
