We have been investigating how to mature our time series database (TSDB) architecture and what our options are. To that end, I have completed an assessment of two of the most popular TSDB options and also explored Wavefront. Our team depends heavily on open source, but Wavefront is particularly interesting since it was recently acquired by VMware. Here’s a quick rundown of the assessment.
Every comparison rests on some assumptions. Here are the major ones that I made during this effort.
Assumptions
1. There are currently three options that warranted investigation:
a. Wavefront by VMware (Much of the below does not apply to Wavefront since it is a SaaS offering.)
b. Prometheus
c. InfluxDB
Although popular, OpenTSDB was not investigated, since initial research suggests it is generally disliked compared to InfluxDB and Prometheus.
2. We will be using Telegraf as our agent of choice on remote systems for the collection and transmission of events.
3. All comparisons are under identical load. All graphs are shown with both servers receiving the same load via identical queries and ingestion. The load mimics 900 Telegraf agents, each posting metrics every 7 seconds, and is generated by a Telegraf imitator that I wrote in Go.
This research has raised a number of questions, not just about the use of time series data but, more generally, about its role in monitoring. Please see https://www.usenix.org/conference/srecon17americas/program/presentation/wilkinson for a very informative talk on the subject.
Review Criteria
Every comparison needs a set of predefined criteria on which to base decisions and testing. Our list will be: