Query Samples

Our software sees (but does not transmit) every unencrypted query that enters your database server. We obfuscate all queries we observe by normalizing them into digests (the “query shape” for MongoDB), stripping out variable portions of the query including literal values that may contain Personally Identifiable Information (PII) and other sensitive data. This obfuscation takes place locally and not within the API, and we transmit the digested query to the application.

For the most part, we simply generate metrics about these digested queries. But to make this data much more useful, we probabilistically capture individual raw samples of queries, including literal values, so you can examine them. When a raw query sample is available for examination, we display a small circle on the query details sparkplot, one for each sample. Here’s a screenshot:

Top Queries

VividCortex also stores the checksum (first 64 bits of MD5) for each query, which is used throughout the application to assist with analysis (such as through metric names which include the query checksum).

This table illustrates how a query is digested and checksummed:

Item Example Usage
raw query SELECT * FROM db1.tbl1 WHERE SSN = '123-45-6789' Sampled on a periodic basis and displayed in the UI to facilitate inspection of queries.
digested query select * from db?.tbl? where ssn = ? Transmitted to the VividCortex API the first time the software see it. Used for aggregate metrics.
checksum 51c37c16602e541c Used as a key for metrics about queries that share the same digest (or query shape), e.g. host.queries.51c37c16602e541c.latency_us

Because we are capturing raw query samples and sending them to our application for your analysis, there is the potential for Personally Identifiable Information or other sensitive data to be transmitted to VividCortex. The options below allow you to limit or completely eliminate the transmission of query samples.

While we treat this data with the utmost care, you should be familiar with the ways we secure your private data from end to end.

Configuring Capture of Query Samples

In order to prevent the transmission of PII or other sensitive data to VividCortex, you can make use of a number of specifc settings (also called configuration flags) to customize your configuration according to your needs.

Disable all sampling

Sampling can be completely disabled.

Disabling sampling entirely prevents the agent from taking any query samples: no query text nor any information related to individual samples (such as connection ID, query latency, and EXPLAIN plans) will be captured.

Flag: disable-sampling : true

During installation, this option is “None” when asked “Configure capture of query samples.”

Disable query text

Samples can be captured, but the raw query text itself is discarded.

This option allows us to collect query metadata for individual queries, such as connection ID, query latency, and EXPLAIN plans, but does not transmit raw query samples (such as the “raw query” example in the table above). It is helpful to have samples even if the raw query is deleted because many of the sample details are stil useful. When using this option, we also take steps to remove PII which may be in the EXPLAIN plan. For MySQL, we do not capture the JSON representation of the EXPLAIN plan in order to avoid capturing the query text. For PostgreSQL, we filter out the Filter, Hash Cond, Index Cond, Merge Cond, and Recheck Cond fields.

Flag: disable-sampling-text : true

During installation, this option is “Meta” when asked “Configure capture of query samples.”

Disable explain plans

The execution of EXPLAIN plains can be fully disabled

If it is believed that the EXPLAIN plan might still contain PII, EXPLAIN functionality can be disabled entirely. This is most often used in conjunction with Disable query text(above), when it is valuable to capture query metadata but the EXPLAIN plan, like the raw query itself, may contain PII or other sensitive data.

Flag: enable-explains : false

Safelist/Blocklist queries

Queries can be safelisted or blocklisted by query text or regular expression.

This option allow us to: * collect all queries except for those matching a blocklist or * discard all queries except those matching a safelist.

For example, you can disable query samples for all queries using the above option, and then allow the capture of a limited set of queries that you know are safe.

Flags:

query-blacklist-pattern : This is a regular expression describing the queries that must not be sampled. These queries will be treated as if disable-sampling was enabled.

query-whitelist-pattern : This is a regular expression describing the queries that may be sampled. Use this in conjunction with disable-sampling to disable all samples except for those that match this regular expression. Alternatively, you may instead provide a comma-separated list of query checksums (found on the query details page) which are safe to query.

(Note: The syntax available for use in a regular expression can be found here).

Enabling settings

You can enable the above settings on a per-host basis using a VividCortex configuration file containing the option(s) you wish to enable. Alternatively, you can also contact our Customer Support team to have these options enabled for all hosts in an environment.

Add the option to the metrics plugin configuration file; they are named based on database technology. As an example, if you wished to disable the capture of query text for PostgreSQL, you would edit the file /etc/vividcortex/vc-pgsql-metrics.conf to include a line with "disable-sampling-text":"true":

{
(...)
"disable-sampling-text":"true"
}

You will need to create this file in /etc/vividcortex/ if it does not exist.

Please contact VividCortex Customer Support if you have any questions or concerns about query samples or the available options to configure this functionality.