Metric Categories

A metric is some value that a resource exposes. An observation is a sample of a metric’s value at a point in time from a given resource. For example, a network interface has a bytes_received metric, and if we sample its value at 12:00 PM we have one observation. A metric has a dot-separated name, which defines it within a namespace hierarchy or tree. Metric names can only include letters, numbers, underscores, and dots.
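As a rough illustration of the naming rule above, the following sketch checks a candidate name and splits it into its path within the namespace tree. The function names are hypothetical, and the requirement that every dot-separated segment be non-empty is an assumption of this sketch rather than something stated above.

```python
import re

# Letters, digits, underscores, and dots only. Requiring each dot-separated
# segment to be non-empty is an assumption made for this sketch.
_SEGMENT = r"[A-Za-z0-9_]+"
METRIC_NAME = re.compile(rf"{_SEGMENT}(?:\.{_SEGMENT})*")


def is_valid_metric_name(name: str) -> bool:
    """Return True if `name` uses only the allowed characters."""
    return METRIC_NAME.fullmatch(name) is not None


def namespace_path(name: str) -> list:
    """Split a metric name into its position in the namespace tree."""
    return name.split(".")


print(is_valid_metric_name("mysql.innodb.pending_log_writes"))
print(namespace_path("os.cpu.idle_us"))
```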

Metrics at VividCortex take one of two forms: what we refer to as a gauge or as a derivative.

  • A gauge represents a scalar value, recorded at a specific time, such as the current value of the motherboard temperature or the number of users logged in. A VividCortex example would be mysql.innodb.pending_log_writes.

  • Some values are accumulated values, such as the number of network bytes received on an interface. We transform these into derivatives by subtracting the previously seen value from each new sample and dividing by the time between samples. A VividCortex example would be os.cpu.idle_us. These numbers often represent throughput or time, and (as in the below table) many derivative metrics are suffixed with tput or _us.
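The derivative calculation described above can be sketched as follows. The `Observation` type and the `derivative` function are hypothetical names used only for illustration, and the sketch does not handle counter resets (for example, after a reboot).

```python
from dataclasses import dataclass


@dataclass
class Observation:
    timestamp: float  # seconds since the epoch
    value: float      # raw accumulated value read from the resource


def derivative(prev: Observation, curr: Observation) -> float:
    """Per-second rate of change between two samples of a counter."""
    elapsed = curr.timestamp - prev.timestamp
    if elapsed <= 0:
        raise ValueError("observations must be in increasing time order")
    return (curr.value - prev.value) / elapsed


# Two samples of an accumulated bytes_received counter, 60 seconds apart.
first = Observation(timestamp=0.0, value=1_000_000)
second = Observation(timestamp=60.0, value=1_600_000)
print(derivative(first, second))  # 10000.0 bytes per second
```

A gauge needs no such step: each sample is stored as observed.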

We use some standard naming suffixes for common types of metrics:

Concept                              Suffix
Throughput or arrivals per second    tput
Read or write operations             reads / writes
Timing                               *_us (microseconds)
Timing                               *_s (seconds)
Utilization                          util (integer from 1 to 100)
Count                                count

These are the default top-level metric namespaces, with a brief description of each:

Name Description
agents Internal diagnostic metrics of agent behavior.
aws Metrics retrieved from Amazon CloudWatch.
host Database-agnostic data, such as user/connection/client statistics and query metrics.
mongo Metrics from MongoDB sources, such as serverStatus.
mysql Metrics from MySQL sources such as PERFORMANCE_SCHEMA.
os Disk, memory, CPU, ps, and networking information.
pgsql Metrics from PostgreSQL sources, such as pg_stat_statements.
redis Metrics from Redis sources such as INFO ALL.

Below is more information about each of the above metric categories. The complete list of metrics is extensive, and in many cases the metrics come directly from the output of built-in database commands (such as SHOW PROCESSLIST). We have linked to the relevant database-specific documentation where applicable. You can use the categories below to search for metrics on the Metrics page of VividCortex.


agents Metric Information

Metrics in the agents family of metrics contain diagnostic information about the VividCortex agent. Each plugin records different metrics about itself, and the metrics are organized by plugin name; for example, agents.vc_mysql_metrics.*.

aws Metric Information

Metrics in the aws.*-family of metrics come from CloudWatch, Amazon’s monitoring service, for MySQL RDS and PostgreSQL RDS. To enable CloudWatch for your Amazon instances, follow the instructions here.

We collect the count, sum, min, and max values as provided directly from Amazon.

Category Description
aws.aws_rds We collect the metrics on this page. The metric names are lowercased, with words separated by underscores, to match our standard. For example, the AWS metric BinLogDiskUsage becomes aws.bin_log_disk_usage.*
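The renaming described above (BinLogDiskUsage becoming aws.bin_log_disk_usage) can be approximated with a small conversion function. This is a sketch, not the agent's actual code: the function name is hypothetical, and the way runs of capitals such as "CPU" are handled is an assumption.

```python
import re

# Insert an underscore between a lowercase letter or digit and a capital,
# and before the last capital of an acronym that is followed by a lowercase
# letter. The acronym handling is an assumption of this sketch.
_WORD_BOUNDARY = re.compile(r"(?<=[a-z0-9])(?=[A-Z])|(?<=[A-Z])(?=[A-Z][a-z])")


def cloudwatch_to_metric_name(aws_name: str, prefix: str = "aws") -> str:
    """Convert a CloudWatch metric name to lowercase with underscores."""
    snake = _WORD_BOUNDARY.sub("_", aws_name).lower()
    return f"{prefix}.{snake}"


print(cloudwatch_to_metric_name("BinLogDiskUsage"))  # aws.bin_log_disk_usage
```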


host Metric Information

Children under host, and their descriptions, are as follows:

Category Description
host.auth VividCortex database login attempt metrics: failure, blocked, and other
host.callers Metrics related to queries, grouped by connected client. IP address bytes are underscore separated as the dot is a reserved character in metric names. Comes from the protocol decoder.

This data is used by the Profiler to allow ranking of ‘Hosts.’

These metrics are only available with the On-Host configuration.
host.connections Connection count throughput and connection time throughput.

These metrics are only available with the On-Host configuration.
host.dbs Metrics by database, such as affected_rows, data_length, index_total, row_count, total_length, etc.

This data is used by the Profiler to allow ranking of ‘Databases.’
host.queries Query data, grouped by q, p, e, or c (for query/prepare/execute/close) and query digest. Each metric is suffixed with .tput (except time_us and tput itself) because they are per-second derivative values. This information comes from the protocol decoder, in addition to PERFORMANCE_SCHEMA and PG_STAT_STATEMENTS.

This data is used by the Profiler to allow ranking of ‘Queries.’

host.queries.tagged.*, specifically, is used by the Profiler to allow ranking of ‘Query Tags.’ Query tags are only available with the On-Host configuration. More information about query tags is available in our Query Tags Documentation.
host.samples Contains the count of failed_rules per query digest. You can read more about the rules we apply to query samples here.
host.status Total bytes_sent and bytes_received.
host.tables Metrics by table: data_length, index_length, total_length, data_free, and row_count. Comes from INFORMATION_SCHEMA in MySQL and pg_statio_user_tables view in PostgreSQL.

This data is used by the Profiler to allow ranking of ‘Tables.’

host.tables metrics are disabled by default, as the capture of these metrics can be expensive.
host.totals Total metrics for all queries combined, for an entire host, regardless of database type. host.totals.queries.* includes totals for all query metrics, including affected_rows, errors, latency, rows_examined, etc.
host.users Metrics by user across databases and tables: affected_rows, errors.<code>, errors.no_good_index, no_index, slow, time_us, and tput. Each of these is suffixed with .tput (except time_us and tput itself) because they are per-second derivative values. Comes from the protocol decoder.

This data is used by the Profiler to allow ranking of ‘Users.’

These metrics are only available with the On-Host configuration.
host.verbs Metrics by query verb (ALTER, SELECT, etc.), such as affected_rows, no_index, rows_examined, slow, etc.

This data is used by the Profiler to allow ranking of ‘Query Verbs.’
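Because the dot is reserved in metric names, client IP addresses under host.callers are written with underscores. A minimal sketch of that encoding follows. The function names and the exact "host.callers.<address>.<metric>" layout are assumptions for illustration, and only IPv4 addresses are considered.

```python
def caller_segment(ip_address: str) -> str:
    """Encode an IPv4 address so it can appear inside a metric name."""
    return ip_address.replace(".", "_")


def caller_metric(ip_address: str, metric: str) -> str:
    # The "host.callers.<address>.<metric>" layout is assumed for illustration.
    return f"host.callers.{caller_segment(ip_address)}.{metric}"


print(caller_metric("10.0.0.15", "tput"))  # host.callers.10_0_0_15.tput
```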


mongo Metric Information

Metrics in the mongo.* family of metrics are captured by the vc-mongo-metrics plugin.

We capture nearly all of the metrics provided by the MongoDB command serverStatus. Additionally, we capture a number of metrics from the connPoolStats command as well.

VividCortex metric names follow the MongoDB hierarchy, normalized to our metric style of lowercase with underscores.
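To illustrate the normalization, the sketch below flattens a nested document shaped like serverStatus output into dotted, lowercase names. The function names are hypothetical and the sample document is made up. Skipping non-numeric values is an assumption, and keys that contain spaces or other punctuation are not handled here.

```python
import re


def to_snake(key: str) -> str:
    """Convert a camelCase key such as 'globalLock' to 'global_lock'."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", key).lower()


def flatten(doc: dict, prefix: str = "mongo.status") -> dict:
    """Walk a nested document and emit one metric per numeric leaf."""
    metrics = {}
    for key, value in doc.items():
        name = f"{prefix}.{to_snake(key)}"
        if isinstance(value, dict):
            metrics.update(flatten(value, name))
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            metrics[name] = value
        # Strings, booleans, and other types are skipped (an assumption).
    return metrics


sample = {
    "host": "db1",
    "opcountersRepl": {"insert": 3},
    "globalLock": {"currentQueue": {"total": 0}},
}
print(flatten(sample))
```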

Category Description
mongo.connpool Information about the open outgoing connections from the current database instance.
mongo.status.asserts Assertions raised since the MongoDB process started.
mongo.status.background_flushing mongod process’s periodic writes to disk.
mongo.status.connections Status of the connections.
mongo.status.dur mongod instance’s journaling-related operations and performance.
mongo.status.extra_info Additional information regarding the underlying system.
mongo.status.global_lock Reports on the database’s lock state.
mongo.status.locks For each lock <type>, data on lock <modes>.
mongo.status.mem System architecture of the mongod and current memory use.
mongo.status.metrics Various statistics that reflect the current use and state of a running mongod instance.
mongo.status.network MongoDB’s network use.
mongo.status.opcounters Operations by type since the mongod instance last started.
mongo.status.opcounters_repl Database replication operations by type since the mongod instance last started.
mongo.status.wired_tiger Metrics about the Wired Tiger storage engine.


mysql Metric Information

Metrics recorded in the mysql.* family of metrics are captured by the vc-mysql-metrics plugin. For information on what settings and permissions are required to capture the metrics listed, please see here.

Children under mysql, and their descriptions, are as follows:

Category Description
mysql.innodb InnoDB engine information, from selected portions of the return from SHOW ENGINE INNODB STATUS. For more information on this statement, see here.
mysql.mutex.innodb Information regarding InnoDB mutex wait times. This data comes from the MySQL Performance Schema, and is available for versions 5.7 and up. For more information about InnoDB mutex wait instruments, which must be enabled to capture this data in VividCortex, see here.
This data is used by the Profiler to allow ranking of InnoDB Mutexes.
mysql.processlist Process information from performance_schema.threads, INFORMATION_SCHEMA.processlist, or SHOW PROCESSLIST, in that order, depending on which we are able to query. More information about each child under mysql.processlist, such as callers, command, and query, can be found here.
This data is used by the Profiler to allow ranking of ‘MySQL Processlist’ metrics.
mysql.status Metrics built from MySQL’s server status variables, retrieved from SHOW GLOBAL STATUS. More information about each variable is here.
We keep the same variable name; for example, MySQL’s aborted_connects status variable becomes mysql.status.aborted_connects. The specific metric mysql.status.replication_delay.us is provided by SHOW SLAVE STATUS.
mysql.tables Metrics about the non-temporary tables that are open in the table cache, provided by SHOW OPEN TABLES. For more information about this command, see here.
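As an illustration of how mysql.status names relate to server status variables, the sketch below maps rows shaped like SHOW GLOBAL STATUS output (variable name, value) to metric names. The function name and sample rows are hypothetical. Lowercasing the variable name and skipping non-numeric values are assumptions based on the aborted_connects example above.

```python
def status_rows_to_metrics(rows):
    """Map (Variable_name, Value) rows to mysql.status.* metrics."""
    metrics = {}
    for variable_name, value in rows:
        try:
            number = float(value)
        except ValueError:
            # Non-numeric values (for example, cipher names) are skipped.
            continue
        metrics[f"mysql.status.{variable_name.lower()}"] = number
    return metrics


rows = [("Aborted_connects", "12"), ("Threads_running", "3"), ("Ssl_cipher", "")]
print(status_rows_to_metrics(rows))
```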


os Metric Information

To obtain the information contained within the os.* category of metrics, the vc-os-metrics plugin inspects the contents of the /proc virtual filesystem. The agent does not execute any commands, such as ps. For more information about /proc, see here.
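For a sense of what reading /proc involves, the sketch below parses the aggregate "cpu" line of /proc/stat and converts its tick counts to microseconds. It is not the plugin's code. The function name and the mapping of columns to os.cpu.*_us names are assumptions, as is the default of 100 ticks per second (on a live system the value is available from os.sysconf("SC_CLK_TCK")). The sample line is made up.

```python
# Column order of the aggregate "cpu" line in /proc/stat.
CPU_FIELDS = [
    "user", "nice", "system", "idle", "io_wait",
    "irq", "softirq", "steal", "guest", "guest_nice",
]


def parse_cpu_line(line: str, ticks_per_second: int = 100) -> dict:
    """Convert the 'cpu' line's tick counts to cumulative microseconds."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    us_per_tick = 1_000_000 // ticks_per_second
    return {
        f"os.cpu.{field}_us": int(ticks) * us_per_tick
        for field, ticks in zip(CPU_FIELDS, parts[1:])
    }


sample = "cpu  4705 150 1120 16250856 2300 0 17 0 0 0"
print(parse_cpu_line(sample)["os.cpu.user_us"])  # 47050000
```

The values produced this way are cumulative since boot; a per-second rate would come from differencing consecutive samples.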

Note that when installed using the Off-Host configuration, these metrics will be for the host where the agent is installed, not the database itself. For system metrics when monitoring RDS or Aurora, use the CloudWatch metrics.

Children under os, and their descriptions, are as follows:

Metric Description
os.cpu.* CPU statistics retrieved from /proc/stat. Metric names generally correspond to the meanings of the columns contained within /proc/stat suffixed with _us as they are microsecond time values (see below). More information about /proc/stat and the data it contains is available here. If you would like to trigger an alert on CPU utilization, see the table here.
os.cpu.user_us Time spent in user mode
os.cpu.nice_us Time spent in user mode with low priority
os.cpu.system_us Time spent in system mode
os.cpu.idle_us Time spent in the idle task
os.cpu.io_wait_us Time waiting for I/O to complete
os.cpu.irq_us Time servicing interrupts
os.cpu.softirq_us Time servicing softirqs
os.cpu.steal_us Stolen time, which is the time spent in other operating systems when running in a virtualized environment
os.cpu.guest_us Time spent running a virtual CPU for guest operating systems under the control of the Linux kernel
os.cpu.guest_nice_us Time spent running a niced guest (virtual CPU for guest operating systems under the control of the Linux kernel)
os.cpu.processes Number of forks since boot
os.cpu.intr Count of interrupts serviced since boot time
os.cpu.ctxt The number of context switches that the system underwent
os.cpu.util_pml VividCortex-computed processor utilization (per mille, out of 1000)
os.cpu.cores.count Number of cores
os.cpu.freq_mhz Speed of the processor; retrieved from /proc/cpuinfo
os.cpu.procs_blocked Number of processes blocked waiting for I/O to complete
os.cpu.procs_running Number of processes in runnable state
os.cpu.loadavg Retrieved from /proc/loadavg
os.disk.* Disk statistics retrieved from /proc/diskstats. There are metrics for each device individually (<volume_name>, below) as well as aggregate statistics.
os.disk.<volume_name>.total_io_us The total number of microseconds spent doing IO.
os.disk.<volume_name>.weighted_io_us Weighted number of microseconds spent doing IO.
os.disk.<volume_name>.ios_in_progress Number of operations currently in progress.
os.disk.<volume_name>.avg_ios_in_progress Weighted IO time / total IO time
os.disk.<volume_name>.tput This is the total number of reads and writes, combined.
os.disk.<volume_name>.read_sectors This is the total number of sectors read successfully.
os.disk.<volume_name>.write_sectors This is the total number of sectors written successfully.
os.disk.<volume_name>.read_us This is the total number of microseconds spent by all reads.
os.disk.<volume_name>.write_us This is the total number of microseconds spent by all writes.
os.disk.<volume_name>.read_tput This is the total number of reads completed successfully.
os.disk.<volume_name>.write_tput This is the total number of writes completed successfully.
os.disk.<volume_name>.read_merges Reads and writes which are adjacent to each other may be merged for efficiency. Thus two 4K reads may become one 8K read before it is ultimately handed to the disk, and so it will be counted (and queued) as only one I/O. This field lets you know how often this was done.
os.disk.<volume_name>.write_merges Reads and writes which are adjacent to each other may be merged for efficiency. Thus two 4K reads may become one 8K read before it is ultimately handed to the disk, and so it will be counted (and queued) as only one I/O. This field lets you know how often this was done.
os.mem.* Memory statistics retrieved from /proc/meminfo and /proc/vmstat. All metrics coming from /proc/meminfo begin with bytes_. Metrics corresponding to bytes paged in/out or swapped in/out begin with pages_.
os.mem.bytes_active The total amount of buffer or page cache memory, in bytes, that is in active use. This is memory that has been recently used and is usually not reclaimed for other purposes.
os.mem.bytes_active_anon Anonymous memory, in bytes, that has been used more recently and usually not swapped out.
os.mem.bytes_active_file Pagecache memory, in bytes, that has been used more recently and usually not reclaimed until needed.
os.mem.bytes_anonpages Size, in bytes, of non-file backed pages mapped into userspace page tables.
os.mem.bytes_buffers The amount of physical RAM, in bytes, used for file buffers.
os.mem.bytes_cached The amount of physical RAM, in bytes, used as cache memory.
os.mem.bytes_commited_as The total amount of memory, in bytes, estimated to complete the workload. This value represents the worst case scenario value, and also includes swap memory.
os.mem.bytes_commitlimit Based on the overcommit ratio (vm.overcommit_ratio), this is the total amount of memory, in bytes, currently available to be allocated on the system.
os.mem.bytes_dirty The total amount of memory, in bytes, waiting to be written back to the disk.
os.mem.bytes_free Total free memory, in bytes.
os.mem.bytes_inactive The total amount of buffer or page cache memory, in bytes, that is free and available. This is memory that has not been recently used and can be reclaimed for other purposes.
os.mem.bytes_inactive_anon Anonymous memory, in bytes, that has not been used recently and can be swapped out.
os.mem.bytes_inactive_file Pagecache memory, in bytes, that can be reclaimed without huge performance impact.
os.mem.bytes_kernelstack The memory, in bytes, the kernel stack uses. This is not reclaimable.
os.mem.bytes_mapped The total amount of memory, in bytes, that has been used to map devices, files, or libraries using mmap.
os.mem.bytes_mlocked Size, in bytes, of pages locked to memory using the mlock() system call. Mlocked pages are also Unevictable.
os.mem.bytes_pagetables Amount of memory, in bytes, dedicated to the lowest level of page tables. This can increase to a high value if a lot of processes are attached to the same shared memory segment.
os.mem.bytes_shmem Memory, in bytes, allocated as small pages shared memory.
os.mem.bytes_slab The total amount of memory, in bytes, used by the kernel to cache data structures for its own use.
os.mem.bytes_sreclaimable The part of the Slab that might be reclaimed, such as caches.
os.mem.bytes_sunreclaim The part of the Slab that can’t be reclaimed under memory pressure.
os.mem.bytes_total Total usable memory, in bytes.
os.mem.bytes_unevictable Size of unevictable pages, in bytes, that can’t be swapped out for a variety of reasons.
os.mem.bytes_used Total memory currently used, in bytes.
os.mem.bytes_writeback The total amount of memory, in bytes, actively being written back to the disk.
os.mem.compact_stalls Incremented every time a process stalls to run memory compaction so that a huge page is free for use.
os.mem.major_page_faults Number of major page faults.
os.mem.page_faults Number of page faults.
os.mem.pages_free Number of pages of free memory.
os.mem.pages_in Number of pages paged into memory, per second.
os.mem.pages_out Number of pages paged out of memory, per second.
os.mem.percent_available Calculated % of free memory.
os.mem.percent_used Calculated % of used memory.
os.net Networking statistics retrieved from /proc/net/dev. Metrics are the expected column names prefixed with rx_ and tx_ for receive and send, respectively. There are statistics for each kernel-identified network interface individually as well as aggregate statistics. This data is used by the Profiler to allow ranking of ‘Network Socket’ metrics.
os.netstat Networking statistics retrieved from /proc/net/snmp/ (for metric names ip, tcp, and udp) and /proc/net/netstat (for metric names ipext and tcpext).
os.ps Process-specific metrics retrieved by examining /proc/$PID/stat and /proc/$PID/io for each process.
This data is used by the Profiler to allow ranking by ‘Process.’
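As an example of how the os.disk metrics relate to /proc/diskstats, the sketch below parses one device line and derives avg_ios_in_progress as weighted IO time divided by total IO time, as defined in the table. The function names and the column-to-metric mapping are assumptions. The kernel reports times in milliseconds, so they are multiplied by 1000 here. The sample line is made up, and in practice the ratio would be taken over the change between two samples rather than over cumulative totals.

```python
def parse_diskstats_line(line: str) -> dict:
    """Map the first 11 statistics of a /proc/diskstats line to metrics."""
    parts = line.split()
    name = parts[2]  # parts[0] and parts[1] are the major and minor numbers
    s = [int(x) for x in parts[3:14]]
    p = f"os.disk.{name}"
    return {
        f"{p}.read_tput": s[0],
        f"{p}.read_merges": s[1],
        f"{p}.read_sectors": s[2],
        f"{p}.read_us": s[3] * 1000,
        f"{p}.write_tput": s[4],
        f"{p}.write_merges": s[5],
        f"{p}.write_sectors": s[6],
        f"{p}.write_us": s[7] * 1000,
        f"{p}.ios_in_progress": s[8],
        f"{p}.total_io_us": s[9] * 1000,
        f"{p}.weighted_io_us": s[10] * 1000,
    }


def avg_ios_in_progress(metrics: dict, volume: str) -> float:
    """Weighted IO time divided by total IO time; 0.0 when the disk was idle."""
    total = metrics[f"os.disk.{volume}.total_io_us"]
    weighted = metrics[f"os.disk.{volume}.weighted_io_us"]
    return weighted / total if total else 0.0


sample = "   8       0 sda 1200 30 96000 4500 800 20 64000 9000 0 6000 13500"
stats = parse_diskstats_line(sample)
print(avg_ios_in_progress(stats, "sda"))  # 2.25
```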


pgsql Metric Information

Metrics recorded in the pgsql.* category of metrics are captured by the vc-pgsql-metrics plugin.

Category Description
pgsql.locks Metrics for each PostgreSQL locktype, grouped by held or awaited. This data comes from the pg_locks table. For more information, see here.
pgsql.processlist.state count and time_us for each PostgreSQL state. This data comes from pg_stat_activity. For more information, see here.
pgsql.processlist.users count and time_us by user. This data comes from pg_stat_activity. For more information, see here.
pgsql.processlist.query count and time_us by query ID. This data comes from pg_stat_activity. For more information, see here.
pgsql.status Data from the pg_stat_database view. Metric names are the column names documented here; for example, pgsql.status.blks_read.

Also includes data from the pg_stat_bgwriter view, which is one row about the background writer process. Metric names are the column names documented here; for example, pgsql.status.buffers_clean.
pgsql.totals count and time_us totals for state, users, and query.


redis Metric Information

Metrics recorded in the redis.* family of metrics are captured by the vc-redis-metrics plugin by using INFO ALL. We record most of the metrics from the clients, memory, persistence, stats, replication, cpu, and cluster output sections. For more information, see here.

Metric Description
redis.status.aof_delayed_fsync Delayed fsync counter
redis.status.aof_enabled Flag indicating AOF logging is activated
redis.status.aof_pending_bio_fsync Number of fsync pending jobs in background I/O queue
redis.status.aof_pending_rewrite Flag indicating an AOF rewrite operation will be scheduled once the on-going RDB save is complete.
redis.status.aof_rewrite_in_progress Flag indicating an AOF rewrite operation is on-going
redis.status.aof_rewrite_scheduled Flag indicating an AOF rewrite operation will be scheduled once the on-going RDB save is complete.
redis.status.blocked_clients Number of clients pending on a blocking call (BLPOP, BRPOP, BRPOPLPUSH)
redis.status.client_biggest_input_buf Biggest input buffer among current client connections
redis.status.client_longest_output_list Longest output list among current client connections
redis.status.cluster_enabled Flag indicating Redis Cluster is enabled
redis.status.connected_clients Number of client connections (excluding connections from replicas)
redis.status.connected_slaves Number of connected replicas
redis.status.evicted_keys Number of evicted keys due to maxmemory limit
redis.status.expired_keys Total number of key expiration events
redis.status.instantaneous_ops_per_sec Number of commands processed per second
redis.status.keyspace_hits Number of successful lookups of keys in the main dictionary
redis.status.keyspace_misses Number of failed lookups of keys in the main dictionary
redis.status.latest_fork_usec Duration of the latest fork operation in microseconds
redis.status.loading_loaded_perc Number of bytes already loaded, expressed as a percentage
redis.status.loading_start_time Epoch-based timestamp of the start of the load operation
redis.status.loading Flag indicating if the load of a dump file is on-going
redis.status.mem_fragmentation_ratio Ratio between used_memory_rss and used_memory
redis.status.pubsub_channels Global number of pub/sub channels with client subscriptions
redis.status.pubsub_patterns Global number of pub/sub patterns with client subscriptions
redis.status.rdb_bgsave_in_progress Flag indicating an RDB save is on-going
redis.status.rdb_changes_since_last_save Number of changes since the last dump
redis.status.rejected_connections Number of connections rejected because of the maxclients limit
redis.status.repl_backlog_active Flag indicating replication backlog is active
redis.status.total_commands_processed Total number of commands processed by the server
redis.status.total_connections_received Total number of connections accepted by the server
redis.status.uptime_in_seconds Number of seconds since Redis server start
redis.status.used_cpu_sys_children System CPU consumed by the background processes
redis.status.used_cpu_sys System CPU consumed by the Redis server
redis.status.used_cpu_user_children User CPU consumed by the background processes
redis.status.used_cpu_user User CPU consumed by the Redis server
redis.status.used_memory_lua Number of bytes used by the Lua engine
redis.status.used_memory_peak Peak memory consumed by Redis (in bytes)
redis.status.used_memory_rss Number of bytes that Redis allocated as seen by the operating system.
redis.status.used_memory Total number of bytes allocated by Redis using its allocator.
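For reference, INFO output consists of "key:value" lines grouped under "# Section" headers. The sketch below turns such text into redis.status.* metrics. It is an illustration rather than the plugin's implementation: the function name is hypothetical, the sample text is made up, and skipping non-numeric values is an assumption.

```python
def parse_info(text: str) -> dict:
    """Turn INFO-style 'key:value' lines into redis.status.* metrics."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # blank lines and section headers carry no metrics
        key, _, value = line.partition(":")
        try:
            metrics[f"redis.status.{key}"] = float(value)
        except ValueError:
            continue  # version strings, keyspace summaries, and similar
    return metrics


sample = (
    "# Server\r\n"
    "redis_version:7.0.0\r\n"
    "# Memory\r\n"
    "used_memory:1048576\r\n"
    "used_memory_rss:1572864\r\n"
)
info = parse_info(sample)
# Ratio between used_memory_rss and used_memory, as described in the table.
print(info["redis.status.used_memory_rss"] / info["redis.status.used_memory"])  # 1.5
```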