Alerts

Mission-critical events can happen in your environment while you are away. Our alerting feature allows you to route alerts about those events to various destinations.

To view the Alerts page, click on the “Settings” icon in the left-hand navigation, and then click “Alerts” under Environment Settings:

[Image: Alerts Navigation]

We support two kinds of alerts: Event Alerts, which are generated when an event happens in your account, and Threshold Alerts, which are generated when a metric exceeds a designated threshold.

Event Alerts

Event Alerts are based on events that happen in your account. To help ensure that you are alerted only about events of high importance, we provide the following filters:

  • Type: what the event is about, e.g., “High Swap Activity” or “Disk Device Full”. Event type is also available as a filter in the Events Dashboard; check here for more information and a partial list of types.
  • Host name: refer to the Inventory Page for a list of possible host names.
  • Host tags: used to define groups of hosts. (See also how to filter by tag in the UI.)

The default settings will match on any event in your environment. If you find that these settings generate too many alerts, try applying filters. Filters are combined with an AND clause. For example:

if the host is “any” AND the category is “Host stopped sending data”

would match only “Host stopped sending data” events, from any host in your environment, and filter out everything else.
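The AND combination described above can be sketched in a few lines of Python. This is only an illustration of the matching logic, not VividCortex code: the `Event` and `EventFilter` classes, their field names, and the host names are hypothetical, and a filter left as `None` stands in for “any”.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    # Hypothetical event record; field names are assumptions for illustration.
    type: str
    host: str
    tags: frozenset = frozenset()


@dataclass
class EventFilter:
    # None means "any": that criterion is not applied.
    type: Optional[str] = None
    host: Optional[str] = None
    tag: Optional[str] = None

    def matches(self, event: Event) -> bool:
        # Every criterion that is set must hold (AND semantics).
        return (
            (self.type is None or event.type == self.type)
            and (self.host is None or event.host == self.host)
            and (self.tag is None or self.tag in event.tags)
        )


events = [
    Event("Host stopped sending data", "db-1"),
    Event("High Swap Activity", "db-1"),
    Event("Host stopped sending data", "db-2", frozenset({"production"})),
]

# Host is "any" AND type is "Host stopped sending data".
silent_hosts = EventFilter(type="Host stopped sending data")
matched_hosts = [e.host for e in events if silent_hosts.matches(e)]

# Adding a tag narrows the match further, because criteria are ANDed.
silent_prod = EventFilter(type="Host stopped sending data", tag="production")
matched_prod = [e.host for e in events if silent_prod.matches(e)]
```

With no criteria set, `EventFilter()` matches every event, which mirrors the default settings.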

Threshold Alerts

Threshold Alerts are based on metrics that exceed a designated threshold. An alert is triggered when the metric crosses the trigger value and stays beyond it for a defined duration. Once that duration has passed, the alert becomes active and a “Threshold Alert” event of the configured level is generated. No further alerts are generated while the original alert remains active. The active state is cleared once the metric reaches the reset value; from then on, new alerts can be generated. For a list of the metric categories, see here.

For clients using VictorOps or PagerDuty, when a threshold alert resets (i.e., the monitored metric returns to its reset value), we send a resolution event to close the open incident.

For example, consider a threshold alert with the following configuration:

[Image: Example Threshold Alert]

Let’s say os.cpu.loadavg spikes to a value of 7 and stays at that value for 5 seconds. This will generate a “Threshold Alert” event with level “critical”. Now consider that the value of os.cpu.loadavg drops to 3, and then rises back to 7 for another 5 seconds. No new alert will be generated since there is already an active alert.

Now consider that os.cpu.loadavg drops down to 1, then goes back up to 7 for another 5 seconds. A new “Threshold Alert” event will be generated at level “info” when the alert resolves, followed by a second “Threshold Alert” at level “critical” noting that the alert has triggered again. The second alert is generated because the metric went below the reset value, which cleared the active state.
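The trigger, duration, and reset behavior in this walkthrough can be modeled as a small state machine. The sketch below is an illustration under stated assumptions, not the product's implementation: the `ThresholdAlert` class is hypothetical, the trigger value of 5 and reset value of 2 are assumed (chosen only to be consistent with the narrative, where 3 does not reset the alert and 1 does), and the strict “greater than trigger” and “at or below reset” comparisons are also assumptions. Samples arrive once per second.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ThresholdAlert:
    trigger: float   # assumed: alert when the metric is above this value...
    duration: float  # ...continuously for this many seconds
    reset: float     # assumed: clear the alert once the metric is at or below this
    level: str = "critical"
    active: bool = False
    _above_since: Optional[float] = None

    def observe(self, ts: float, value: float) -> List[str]:
        """Feed one sample; return the levels of any events generated."""
        if self.active:
            # While active, no new alerts fire; only the reset value clears it.
            if value <= self.reset:
                self.active = False
                self._above_since = None
                return ["info"]  # resolution event
            return []
        if value > self.trigger:
            if self._above_since is None:
                self._above_since = ts
            if ts - self._above_since >= self.duration:
                self.active = True
                return [self.level]
        else:
            # Dropped back below the trigger before the duration elapsed.
            self._above_since = None
        return []


# 7 for 5 s, dip to 3, 7 again, drop to 1, then 7 for 5 s once more.
values = [7] * 6 + [3] * 2 + [7] * 6 + [1] + [7] * 6

alert = ThresholdAlert(trigger=5, duration=5, reset=2)
fired = []
for ts, value in enumerate(values):
    for level in alert.observe(ts, value):
        fired.append((ts, level))
```

Here `fired` ends up as `[(5, "critical"), (14, "info"), (20, "critical")]`: the dip to 3 produces nothing because the alert is still active, while the drop to 1 resolves it and allows the second critical alert.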

Notification Configurations

Once you’ve configured your alert trigger, you’ll need to specify how often it can be triggered and where notifications should go. The “Notify me” setting lets you specify a rate limit on how often an alert can be triggered, on a per-host basis. In the example below, the configured alert would fire at most once every 5 minutes for a given host, so unexpected behavior that would otherwise cause a flurry of alert activity is suppressed.

[Image: Example Notification Configuration]
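A per-host rate limit like the “Notify me” setting can be pictured as remembering the last notification time for each host. The following is a minimal sketch, assuming a hypothetical `PerHostRateLimiter` class and timestamps in seconds; whether the real feature measures the interval exactly this way is an assumption.

```python
from typing import Dict


class PerHostRateLimiter:
    """Allow at most one notification per host every `min_interval` seconds."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._last_sent: Dict[str, float] = {}

    def allow(self, host: str, now: float) -> bool:
        last = self._last_sent.get(host)
        if last is not None and now - last < self.min_interval:
            return False  # suppressed: this host was notified too recently
        self._last_sent[host] = now
        return True


limiter = PerHostRateLimiter(min_interval=300)  # 5 minutes

decisions = [
    limiter.allow("db-1", now=0),    # first alert for db-1 goes out
    limiter.allow("db-1", now=120),  # 2 minutes later: suppressed
    limiter.allow("db-2", now=120),  # a different host has its own limit
    limiter.allow("db-1", now=300),  # 5 minutes after the first: goes out
]
```

Because the limit is tracked per host, a noisy host does not suppress notifications for the others.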

When specifying integrations to send notifications to, note that multiple integrations can be added as destinations for each alert, so a single alert triggering in VividCortex can notify multiple groups. There is no limit on how many integrations may be specified as destinations, but at least one must be specified for an alert to be enabled.

Common Alert Scenarios

You may wish to set up an alert to notify you in the event of the following situations. To do so, simply set up the corresponding alert:

| Scenario | VividCortex Event Type |
| --- | --- |
| The database is down or unreachable. | Database Connection Error |
| The database is no longer sending metrics. | Host Stop Sending Data |
| Replication to a secondary has stopped. | Replication Stopped |
| Replication to a secondary has started. | Replication Started |
| The MySQL database has reached its maximum connections. | Max MySQL Connections Reached |
| Replication on a secondary is delayed. | See Monitoring Replication here |