Monitoring Replication

Currently, Database Performance Monitor supports monitoring replication for PostgreSQL and MySQL.

DPM can typically monitor replication with no additional configuration, using PostgreSQL’s built-in system administration functions and MySQL’s SHOW SLAVE STATUS. This allows us to reconstruct the replication topology and track replication issues. If you are using a MySQL environment without unique server IDs, see below for additional details.

Setting up an Alert

Using the Alerts page, you can be notified if the replication delay becomes too large.

To do this, set up a threshold alert triggered on one of these two metrics, depending on your database technology:

Database	Metric
PostgreSQL	`pgsql.status.replication_delay_us`
MySQL	`mysql.status.replication_delay_us`

These metrics are measured in microseconds, so a replication delay of “25 seconds” would be entered as “25000000.” Follow the instructions on the Alerts page for more details of setting up alerts, and do not hesitate to contact us with any questions.

Monitoring MySQL Without Unique Server IDs

If your infrastructure has multiple hosts with the same server id, then replication monitoring is still possible using the “heartbeat” approach. Here, DPM uses an auxiliary heartbeat table to reconstruct the replication topology by tracking sample data as it is replicated through the servers. If you wish to use this approach, you must create a vividcortex.heartbeat table on each server you wish to monitor, and you must give the agent user permission to access that table.

Here are the necessary statements to create ‘vividcortex.heartbeat’:

CREATE DATABASE vividcortex;
CREATE TABLE vividcortex.heartbeat (
  server_id int unsigned NOT NULL PRIMARY KEY,
  host_id int unsigned NOT NULL,
  ts double NOT NULL
);

The necessary privileges are as follows:

GRANT SELECT, INSERT, UPDATE ON vividcortex.heartbeat TO vividcortex@'%';

You can also use a database and table name different from vividcortex.heartbeat as long as its structure is the same. Complete configuration options for monitoring replication using a heartbeat table are described here. Add these options to the host’s /etc/vividcortex/global.conf (see our Configuration documentation for more details about configuration files).

Option	Value	Definition
repl-heartbeat	`true`	Enables monitoring replication delay with the heartbeat table.
repl-heartbeat-table	`database.table`	Use a different database and table than the default.
repl-heartbeat-ttl	`5m`	Maximum duration on what is considered a valid lag.

The last option is used as a safeguard against infinitely increasing lag if the heartbeat mechanism is interrupted. This can occur if replication topology has changed and the agent has not been restarted. Once the maximum is exceeded, we report the maximum. Restart the agent on the primary and secondary if this ocurrs. Again, keep in mind that we can typically monitor replication delay using the database’s built-in functionality and that a heartbeat table is not required by default.