DPM Performance Impact

Our agent is written by world-renowned performance experts and is designed to have negligible performance impact on the systems it measures. The agent has adjustable resource limits, and the supervisor will kill and restart it if it exceeds them.

What language is the agent written in?

Our agent is written in Google’s Go language. It is compiled to native machine code, and is high-performance, memory-efficient, and CPU-efficient. It is comparable to Java in this regard, and in many cases comparable to C and C++, but requires no external dependencies, so no specific libraries or runtime are required on your servers.

How many resources does the agent consume?

Our agent uses minimal resources.

  • Disk impact is essentially zero. Our agent doesn’t parse log files or perform other intrusive disk operations. It writes its own log files to /var/log/vividcortex, but otherwise does not store any data locally, in memory or on disk, so it will not exhaust system resources.
  • The agent rotates and caps its log files. By default, each plugin keeps 5 log files of 50 MB each (at most about 250 MB per plugin), but this is configurable if needed.
  • The agent uses a trivial amount of CPU, although network protocol decoding can use more on very busy servers. It typically uses 4% of a single CPU on our own (quite busy) servers, but on some servers it can use more than that. Note that if you test the agent on a server where you’re running a benchmark like Sysbench, you are typically creating a worst case (many very small queries), so you can consider this an upper bound on CPU consumption.
  • The plugins typically use 20-40 MB of memory. Please do not be concerned about the virtual memory size (VSZ) of the plugins; the resident set size (RSS) is the true memory usage. Please see the Go FAQ entry for more details on this topic.
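
To see the VSZ-versus-RSS difference for yourself on Linux, a small Go sketch like the one below can help. It parses the VmSize (virtual) and VmRSS (resident) fields from the text of /proc/<pid>/status. The function name parseMemKB is our own choice for illustration, and this is not code from the agent.

```go
package main

import (
	"bufio"
	"fmt"
	"os"
	"strconv"
	"strings"
)

// parseMemKB extracts the VmSize (virtual) and VmRSS (resident) values,
// in kB, from the contents of a Linux /proc/<pid>/status file.
// Fields that are missing or malformed are returned as 0.
// (Illustrative helper; the name is hypothetical.)
func parseMemKB(status string) (vszKB, rssKB int64) {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) < 2 {
			continue
		}
		n, err := strconv.ParseInt(fields[1], 10, 64)
		if err != nil {
			continue // non-numeric field such as "Name:"
		}
		switch fields[0] {
		case "VmSize:":
			vszKB = n
		case "VmRSS:":
			rssKB = n
		}
	}
	return vszKB, rssKB
}

func main() {
	// Reads this process's own status. On non-Linux systems the file
	// does not exist and the sketch only reports the error.
	data, err := os.ReadFile("/proc/self/status")
	if err != nil {
		fmt.Println("cannot read /proc/self/status:", err)
		return
	}
	vsz, rss := parseMemKB(string(data))
	fmt.Printf("VSZ: %d kB, RSS: %d kB\n", vsz, rss)
}
```

For a Go program, the first number is usually far larger than the second; the second is the one to watch.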

Bottom line: the agent should be essentially “free.” Unless you run your servers at 100% CPU utilization, it will not deprive anything else of needed resources. There are also measures we can take to reduce resource utilization further, for example by capturing only a fraction of query traffic; please contact us for help with this.
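
As a rough picture of what capturing only a fraction of traffic means, the Go sketch below keeps one event out of every n. It is purely hypothetical: the sampler type and its keep method are our own names, and the agent's actual sampling mechanism is not described here and may work differently.

```go
package main

import "fmt"

// sampler keeps one event out of every n. This is a hypothetical
// illustration, not the agent's actual sampling mechanism.
type sampler struct {
	n     int // keep 1 in n events; n <= 1 keeps everything
	count int // events seen since the last kept one
}

// keep reports whether the current event should be captured.
func (s *sampler) keep() bool {
	s.count++
	if s.count >= s.n {
		s.count = 0
		return true
	}
	return false
}

func main() {
	s := &sampler{n: 10}
	kept := 0
	for i := 0; i < 1000; i++ {
		if s.keep() {
			kept++
		}
	}
	fmt.Printf("captured %d of 1000 queries\n", kept)
}
```

The work done per captured query stays the same; only the number of queries that reach the expensive path shrinks.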

Does the agent impact query performance?

No. See above about resource impact, which is essentially zero. In addition, the agent does not intercept or delay queries or data in any way. It is not configured as a man-in-the-middle for network traffic or system calls; it is a passive observer. You can think of it as similar to a person standing on the side of the road counting the cars without interacting with them.

How much load is added to MySQL?

We don’t do expensive things inside your MySQL server. For example, we don’t poll SHOW FULL PROCESSLIST, which locks the server momentarily while it runs. (Tools that do this can severely impact your MySQL server.) We also don’t use intrusive commands such as SHOW TABLE STATUS or SHOW MUTEX STATUS. Some commands have safeguards; for example, we won’t run SHOW OPEN TABLES if your server’s table cache is large. DPM was designed and built by experts in MySQL performance, so we won’t cause problems on your critical servers.

How much load is added to PostgreSQL?

As with MySQL, we only retrieve data from commonly available status views such as pg_stat_activity. When running in an off-host configuration, we measure query activity from pg_stat_statements, provided the extension is available.

How much load is added to Redis?

Virtually none. The only commands we call are INFO and CONFIG GET *, once a second.
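
For context on what that once-a-second INFO call returns: the reply is plain text made of key:value lines grouped under "# Section" headers. The Go sketch below parses such a reply into a map. The function name parseInfo and the sample values are ours and made up for illustration; this is not the agent's code.

```go
package main

import (
	"fmt"
	"strings"
)

// parseInfo turns the text reply of the Redis INFO command into a map.
// Blank lines and "# Section" header lines are skipped.
// (Illustrative helper; the name is hypothetical.)
func parseInfo(reply string) map[string]string {
	out := make(map[string]string)
	for _, line := range strings.Split(reply, "\n") {
		line = strings.TrimSpace(line) // also removes the trailing \r
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		i := strings.IndexByte(line, ':')
		if i < 0 {
			continue // not a key:value line
		}
		out[line[:i]] = line[i+1:]
	}
	return out
}

func main() {
	// Hand-written sample in the INFO format; the values are made up.
	sample := "# Clients\r\nconnected_clients:12\r\n\r\n# Memory\r\nused_memory:1048576\r\n"
	info := parseInfo(sample)
	fmt.Println(info["connected_clients"], info["used_memory"])
}
```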

How much load is added to MongoDB?

We don’t run expensive things in your MongoDB server. The agent executes a serverStatus() call and a currentOp() call once a second, with an occasional buildInfo() when a server restart is detected.

As with other databases, we run explain() on sampled queries to gather additional useful data. To do this, we simply append .explain({"verbosity": "queryPlanner"}) to the sampled query and run it against the server. However, we run explain() only on find() statements, in order to avoid any unintended consequences. In addition, this is the least-verbose setting of explain(), and because we run it only on captured samples, this takes place only one or two times per second. The load from this is negligible.
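
The rule described above (append the explain suffix, but only for find() statements) can be sketched in Go as follows. This assumes, purely for illustration, that a sampled query is available as a shell-style string such as db.users.find({...}); the explainFor function and the findCall pattern are hypothetical names, and the agent's real representation of samples may differ.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// findCall matches shell-style queries of the form db.<collection>.find(
// Assumption: samples are plain strings in this shape.
var findCall = regexp.MustCompile(`^db\.[A-Za-z0-9_.]+\.find\(`)

// explainSuffix is the least-verbose explain mode quoted in the text.
const explainSuffix = `.explain({"verbosity": "queryPlanner"})`

// explainFor returns the explain form of a sampled query, or false if
// the query is not a find() and should therefore be left alone.
func explainFor(query string) (string, bool) {
	q := strings.TrimSuffix(strings.TrimSpace(query), ";")
	if !findCall.MatchString(q) {
		return "", false
	}
	return q + explainSuffix, true
}

func main() {
	for _, q := range []string{
		`db.users.find({"age": 30})`,
		`db.users.remove({"age": 30})`,
	} {
		if e, ok := explainFor(q); ok {
			fmt.Println("explain:", e)
		} else {
			fmt.Println("skipped:", q)
		}
	}
}
```

Anything that is not recognized as a find() is skipped rather than guessed at, which matches the cautious stance described above.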

Index stats are captured by running the listIndexes command on the collection queried.

Does fault detection add load?

No. Our fault detection algorithm is super-efficient and uses metrics that are already captured by the agent in the normal course of its operation. Each fault detection operation is just a few CPU instructions and operates in a few bytes of memory, which are reused. We could run fault detection tens of thousands of times per second and you’d never notice an increase in system load.

Can agents fall behind?

It’s possible for the agent to fall behind in decoding network traffic. If this happens, it harmlessly drops network packets and simply degrades to capture less data. We have spent a great deal of time and effort to ensure that our agent decodes traffic as cheaply as possible and handles partial or corrupted data as well as possible.
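
One common way to get this kind of "drop rather than block" behavior in Go is a non-blocking send on a bounded channel. The sketch below shows that general technique; the offer function and the queue size are our own illustration, and we are not claiming this is how the agent is implemented.

```go
package main

import "fmt"

// offer tries to hand a packet to the decoder queue without blocking.
// It returns false, meaning the packet was dropped, if the queue is full.
// (Hypothetical helper illustrating a non-blocking channel send.)
func offer(queue chan<- []byte, pkt []byte) bool {
	select {
	case queue <- pkt:
		return true
	default:
		return false
	}
}

func main() {
	queue := make(chan []byte, 2) // tiny buffer, no consumer, to force drops
	dropped := 0
	for i := 0; i < 5; i++ {
		if !offer(queue, []byte{byte(i)}) {
			dropped++
		}
	}
	fmt.Printf("queued %d, dropped %d\n", len(queue), dropped)
}
```

The capture side never waits on a slow decoder, so falling behind costs some data but does not stall anything.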