Cube

Time Series Data Collection & Analysis

Cube is a system for collecting timestamped events and deriving metrics. By collecting events rather than metrics, Cube lets you compute aggregate statistics post hoc. It also enables richer analysis, such as quantiles and histograms of arbitrary event sets. Cube is built on MongoDB and available under the Apache License on GitHub.

Collecting Data

An event in Cube is simply a JSON object with a type, time, and arbitrary data. For example, to record an HTTP request sent to a web server, you might emit:

{
  "type": "request",
  "time": "2012-04-23T00:05:19.488Z",
  "data": {
    "path": "/cubism/",
    "duration": 294,
    "status": 200,
    "browser": {
      "os": "Mac",
      "name": "Chrome",
      "version": 20
    }
  }
}

Cube’s collector receives events and saves them to MongoDB. You can send events via UDP, HTTP POST, or WebSockets. Cube has built-in support for receiving events from collectd.
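As a minimal sketch of the HTTP POST route, the Python below wraps the example event in a JSON array and prepares a request without sending it. The endpoint `http://localhost:1080/1.0/event/put` and the array-wrapped body are assumptions about a default local collector, not something stated above, so check them against your own deployment. The helper name `build_event_request` is hypothetical.

```python
import json
import urllib.request

# Assumed collector endpoint; adjust host, port, and path to your deployment.
COLLECTOR_URL = "http://localhost:1080/1.0/event/put"


def build_event_request(events, url=COLLECTOR_URL):
    """Prepare (but do not send) an HTTP POST carrying a JSON array of events."""
    body = json.dumps(events).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


event = {
    "type": "request",
    "time": "2012-04-23T00:05:19.488Z",
    "data": {"path": "/cubism/", "duration": 294, "status": 200},
}
req = build_event_request([event])

# With a collector running, the request could be sent like so:
# urllib.request.urlopen(req)
```

Building the request separately from sending it keeps the payload easy to inspect or log before anything touches the network.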

Querying Events

Cube defines a simple language for querying events. For example, you can retrieve the times of the ten most recent request events with the path "/search" like so:

{
  "expression": "request.eq(path, '/search')",
  "limit": 10
}

You can intersect filters and customize which event fields are returned. To inspect the browsers of requests whose duration was more than 250ms but less than 500ms, change the expression:

request(browser).gt(duration, 250).lt(duration, 500)

Cube supports both HTTP GET and WebSockets for retrieving events. WebSockets are particularly useful for streaming events in realtime to many listeners; these listeners can implement realtime dashboards or alerts based on events.
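For the HTTP GET route, a query can be encoded as URL parameters. The sketch below assumes an evaluator listening at `http://localhost:1081/1.0/event/get` and parameters named after the JSON keys shown earlier (`expression`, `limit`); both the endpoint and the helper name `build_event_query` are illustrative assumptions rather than confirmed details.

```python
from urllib.parse import urlencode

# Assumed evaluator endpoint; adjust to your deployment.
EVALUATOR_URL = "http://localhost:1081/1.0/event/get"


def build_event_query(expression, limit=None, base=EVALUATOR_URL):
    """Return a GET URL for an event query, percent-encoding the expression."""
    params = {"expression": expression}
    if limit is not None:
        params["limit"] = limit
    return base + "?" + urlencode(params)


url = build_event_query("request.eq(path, '/search')", limit=10)
print(url)
```

Encoding matters here because expressions contain parentheses, commas, quotes, and slashes, none of which are safe to paste into a query string unescaped.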

Querying Metrics

You can also use Cube to group events by time, map to derived values, and reduce to aggregate metrics. The language for computing metrics is an extension of the event queries described above. For example, to count requests at five-minute intervals:

{
  "expression": "sum(request)",
  "start": "2012-04-23T00:00:00Z",
  "stop": "2012-04-24T00:00:00Z",
  "step": 300000
}

The first few results look like this:

{"time": "2012-04-23T00:00:00.000Z", "value": 257}
{"time": "2012-04-23T00:05:00.000Z", "value": 143}
{"time": "2012-04-23T00:10:00.000Z", "value": 223}
{"time": "2012-04-23T00:15:00.000Z", "value": 285}
{"time": "2012-04-23T00:20:00.000Z", "value": 263}
{"time": "2012-04-23T00:25:00.000Z", "value": 203}

Or, to count requests to the path "/search", change the expression:

sum(request.eq(path, "/search"))

To derive a value for events—rather than the default count—specify an expression in parentheses after the event type. For example, to compute the aggregate request load:

sum(request(duration))
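To make the group, map, and reduce steps concrete, here is a small in-memory Python sketch of what `sum(request)` and `sum(request(duration))` compute conceptually. It is not Cube's implementation: the function `sum_metric`, the sample events, and the choice to omit empty bins are all assumptions made for illustration.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
STEP_MS = 300000  # five minutes, matching the "step" used above


def parse_time(text):
    """Parse a UTC timestamp such as 2012-04-23T00:05:19.488Z."""
    parsed = datetime.strptime(text, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def sum_metric(events, event_type, field=None, step_ms=STEP_MS):
    """Group events into step-sized bins, map each to a value, and sum per bin.

    With field=None every event maps to 1, so the result is a count, as in
    sum(request). Otherwise each event maps to data[field], as in
    sum(request(duration)). Empty bins are simply omitted in this sketch.
    """
    bins = defaultdict(int)
    for event in events:
        if event["type"] != event_type:
            continue
        # Integer milliseconds since the epoch, avoiding float rounding.
        ms = (parse_time(event["time"]) - EPOCH) // timedelta(milliseconds=1)
        bin_start = EPOCH + timedelta(milliseconds=ms - ms % step_ms)
        bins[bin_start] += 1 if field is None else event["data"][field]
    return dict(sorted(bins.items()))


events = [
    {"type": "request", "time": "2012-04-23T00:05:19.488Z", "data": {"duration": 294}},
    {"type": "request", "time": "2012-04-23T00:07:02.000Z", "data": {"duration": 120}},
    {"type": "request", "time": "2012-04-23T00:11:45.250Z", "data": {"duration": 310}},
]
counts = sum_metric(events, "request")                  # like sum(request)
load = sum_metric(events, "request", field="duration")  # like sum(request(duration))
```

The first two sample events fall in the 00:05 bin and the third in the 00:10 bin, so the count and the summed duration differ only in the value each event contributes.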

Cube supports a variety of reducers and filters, and is readily extensible if you want to add your own. You can even compose arithmetic expressions, combining multiple event types and fields!

Cube automatically caches metrics to capped collections and employs pyramidal aggregation for most metrics, greatly improving performance. For example, if you ask for the number of events in a particular day, Cube can combine previously computed hourly sums instead of performing a full event scan.
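The idea behind pyramidal aggregation can be sketched in a few lines of Python: coarse sums are built from finer sums that have already been computed, rather than from the raw events. The function `roll_up` is a hypothetical illustration, not Cube's code, and the grouping factors are chosen only to fit the six sample values shown earlier.

```python
def roll_up(sums, factor):
    """Combine each run of `factor` consecutive fine-grained sums into one coarser sum."""
    if factor <= 0 or len(sums) % factor != 0:
        raise ValueError("length of sums must be a positive multiple of factor")
    return [sum(sums[i:i + factor]) for i in range(0, len(sums), factor)]


# The six five-minute request counts from the sample results above.
five_minute = [257, 143, 223, 285, 263, 203]

fifteen_minute = roll_up(five_minute, 3)   # two 15-minute sums
thirty_minute = roll_up(fifteen_minute, 2)  # one 30-minute sum

# The coarse value equals a direct sum over the finest level.
assert thirty_minute == [sum(five_minute)]
```

This works because a sum of sums equals the sum of the underlying values; the same holds for reducers such as min and max, but not for something like a median, which is presumably why only most metrics benefit.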

Want to learn more? See the source and documentation.