InfluxDB notes
See also Varied_databases#InfluxDB for context
Data model notes
In comparison to a more relational view...
- a database is a logical container for
- users
- retention policies
- time series data
- continuous queries
- a retention policy (RP) contains:
- replication factor (copies kept in the cluster) (autogen's default is 1)
- retention - how long to keep the data (min 1h) (autogen's default is infinite)
- shard group duration - how much time each shard group covers (min 1h) (autogen's default is 7d)
- measurements - each measurement is implicitly part of the retention policy you put it in
- each database can have one or more RPs
- you get a default called autogen (defaults mentioned above)
- you'll quickly notice them in addressing (testdb.autogen.measurementname), though you can ignore everything about them at first (see the sketch after this list for creating your own)
- a measurement is like a table, containing tags, fields and points
- series
- basically refers to a (measurement,tag) combination you'd likely use in querying -- see below
- tags are key-value pairs (column-like) you can add per datapoint that are
- part of a series' uniqueness, and are indexed
- basically, whatever you need for lookup
- limited to 64KB (and you probably don't want to get anywhere close to that without good reason)
- fields are key-value pairs (column-like) you can add per datapoint that are
- not part of its uniqueness, not indexed
- basically, the values you'll be storing and fetching
- types are float (64-bit), integer (64-bit signed), boolean, or string
- a field takes the type of the first value it gets (and currently cannot be changed except with some creativity[1]), so e.g. forcing integer (append i to the value) over float is sometimes necessary, e.g. to store large values without losing precision
- keep in mind that a field will not take different types over time, even if it might be fine, so being consistent per measurement is a good idea. You'll see errors in the log like field type conflict: input field "ping" on measurement "net" is type integer, already exists as type float
- strings possibly not limited to 64K? (I've seen conflicting information)
- but you probably don't want to use influxdb as a blob store if you want it to stay efficient
- you can check current types with SHOW FIELD KEYS
- (data) points
- always have a time(verify)
- there is always an index on time(verify)
- time precision is a specific detail you can control
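For example, setting up a database with its own retention policy, and checking field types later, might look like the following sketch (the database and RP names are just examples):
CREATE DATABASE testdb
CREATE RETENTION POLICY "one_year" ON "testdb" DURATION 52w REPLICATION 1 SHARD DURATION 1w DEFAULT
SHOW RETENTION POLICIES ON "testdb"
SHOW FIELD KEYS ON "testdb"
The DEFAULT clause makes this the RP that unqualified reads and writes on that database go to.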
Typical use of measurements, series, tags
Say you want to start keeping track of CPU use and are collecting it for various datacenters (various tutorials use an example like this).
You might have a
- specific database for this purpose (for admin reasons)
- retention policy mostly because you want monitoring stuff deleted after a year without thinking about it
- measurement called host_monitor
and want to enter a datapoint with
- tags like hostname=node4,datacenter=berlin,country=de
- fields like cpu0=88,cpu2=32
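Written via the line protocol (detailed below), that datapoint might look like this sketch (the i suffix forces integer fields, and the trailing nanosecond timestamp is optional):
host_monitor,hostname=node4,datacenter=berlin,country=de cpu0=88i,cpu2=32i 1434055562000000000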
You'll notice this is a pile of everything CPU-related.
Tags are usually structured with common uses in mind, often the coarsest and finest things you anticipate querying on - here you could e.g. filter and summarize per country, or pick out a particular host if needed (and you still need a combination of tags for that - hostnames are likely only unique within datacenters).
Series are basically the idea that each unique combination of (measurement,all_tags) represents a series.
There is no on-disk data structure for a series per se -- data you send in from different places will often imply unique series through having unique tags. To some degree series are more of a querying concept, and a storage concept only insofar as the indexing helps with that.(verify)
On point uniqueness
A point is unique in its (measurementname,tagset,timestamp) combination, so if you write when a point with that tuple already exists, the field sets are merged and conflicting field values are overwritten by the newer write.
Depending on the timestamp precision you hand into the ingest url, such overwriting...
- may never happen, if it's based on 'now'
- may happen but never be an issue, if the precision is much higher than the interval you send in
- may be something you do intentionally
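As a sketch of the intentional case, assuming a measurement called net in some database: writing these two lines with the same explicit timestamp
net,hostname=node4 rx=100i,tx=5i 1434055562000000000
net,hostname=node4 tx=7i,err=0i 1434055562000000000
leaves a single point at that timestamp with rx=100, tx=7, err=0 - the field sets were merged, and tx was overwritten by the later write.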
On timestamp precision
Timestamps are currently nanosecond resolution by default.
This can be reduced to microsecond, millisecond or second.
Lower-precision timestamps lead to
- a little more on-disk compression[2]
- a larger chance of overwriting data that ends up with the same timestamp (see previous point)
FIGUREOUT:
- Is that per database, series, datapoint at insertion time?
- Does it mix precision if you alter precision over time?
API
- /query queries, management
- /write ingest, takes line protocol
- /ping health and version
- /debug/pprof Generate profiles for troubleshooting
- /debug/requests Track HTTP client requests to the /write and /query endpoints
- /debug/vars Collect internal InfluxDB statistics
The line protocol[3] is a one-liner text presentation that looks like
measurement,tag_set field_set timestamp
where
- tag_set and field_set are comma-separated key=val pairs
- timestamp is nanosecond-precision Unix time
- (also optional; if omitted, the server assigns its own current timestamp, in UTC, but be aware of
- clock drift (so you likely want NTP)
- timezones (so have a serious think about using either client time or server time))
Clients may ease conversion of structured data to line protocol.
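As a sketch of using these endpoints with curl (database name, measurement and values are just examples; precision=s says the trailing timestamp is in seconds rather than nanoseconds):
curl -XPOST 'http://localhost:8086/write?db=testdb&precision=s' --data-binary 'host_monitor,hostname=node4,datacenter=berlin cpu0=88i,cpu2=32i 1434055562'
curl -G 'http://localhost:8086/query' --data-urlencode 'db=testdb' --data-urlencode 'q=SELECT * FROM "host_monitor" WHERE time > now() - 1h'
curl -i 'http://localhost:8086/ping'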
On shards and shard groups
Under the hood, data is stored in shards, shards are grouped in shard groups, shard groups are part of retention policies.
This is under-the-hood stuff you don't really need to know, though it may be useful to consider in that shard groups are related to
- the granularity with which old data is removed because of retention policy (it's dropped in units of shard groups - so never immediately)
- efficiency of the typical query, e.g. when most queries deal with just the most recent shard group, the older ones are essentially archived and rarely or never touched by IO
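If you're curious you can look at the current shards, and the shard group duration is something you can set per retention policy - a sketch, reusing the example names from above:
SHOW SHARDS
ALTER RETENTION POLICY "one_year" ON "testdb" SHARD DURATION 4w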
Config notes
Getting influx to log is great for debugging, but it's very verbose, and (unless all its clients POST instead of GET) it puts possibly-sensitive information in the system logs, in which case you probably want to set log-enabled = false in the [http] section of influxdb.conf.
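That is, in influxdb.conf:
[http]
  log-enabled = false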
Note that if you put an HTTP server/proxy in front, the same logging concern may apply there as well.
If apache, consider something like:
SetEnvIf Request_URI "/submit" dontlog
CustomLog /var/log/apache2/access.log combined env=!dontlog
Security notes
Host security
Because it's designed to be clustered, it serves on all interfaces by default (and names should be resolvable).
On a single-node installation you could choose to serve localhost-only via bind-address in the [http] section, which you'd set as
bind-address = "localhost:8086" # default is ":8086"
(There used to be an admin web interface on port 8083, but this has been removed[4]. You now probably want to use Chronograf.)
Auth
For similar reasons, by default there is no authentication[5]
- you may wish to firewall things at IP level
- if you want auth, you need to enable security and create users (see below)
- auth happens at HTTP request scope, e.g. for the API and CLI
- certain service endpoints are not authenticated
- can do HTTPS itself [6]
- you can get server certs and client certs - and use self-signed ones if you wish
- note that in a microservice style setup, you may wish to do this on the edge / ingest sides instead
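If you do want influxd itself to serve HTTPS, that is also configured in the [http] section of influxdb.conf - a sketch, with example certificate paths:
[http]
  https-enabled = true
  https-certificate = "/etc/ssl/influxdb.pem"
  https-private-key = "/etc/ssl/influxdb-key.pem"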
User authentication
Enable: set auth-enabled=true in the [http] section and restart
You can
- use basic auth
- hand in username and password in the URL or body
- use JSON Web Tokens
If nonlocal, it's recommended you use HTTPS, because all of these options are effectively plaintext.
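A minimal sketch (the username and password are examples): create an admin user, then hand credentials in with each request, e.g.
CREATE USER "admin" WITH PASSWORD 'supersecret' WITH ALL PRIVILEGES
curl -G 'http://localhost:8086/query' -u admin:supersecret --data-urlencode 'q=SHOW DATABASES'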
User authorisation
New non-admin users have no rights. They can be given
- READ,
- WRITE, or
- ALL (meaning READ and WRITE)
per database
New admin users have a lot more granularity, like
- CREATE DATABASE, and DROP DATABASE
- DROP SERIES and DROP MEASUREMENT
- CREATE RETENTION POLICY, ALTER RETENTION POLICY, and DROP RETENTION POLICY
- CREATE CONTINUOUS QUERY and DROP CONTINUOUS QUERY
- user management
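For example (user, password, and database names are just examples):
CREATE USER "collector" WITH PASSWORD 'supersecret'
GRANT WRITE ON "testdb" TO "collector"
CREATE USER "dashboards" WITH PASSWORD 'alsosecret'
GRANT READ ON "testdb" TO "dashboards"
SHOW GRANTS FOR "dashboards"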
Querying notes
Query languages
InfluxQL - an SQL-like language [7]
Flux - a more featured language [8]
InfluxQL examples (you may want to run the cli, e.g. influx -precision rfc3339 where that argument is for human-readable time formatting):
USE testdb
A simple query like
SELECT "eth0_rx", "eth0_tx" FROM "pc_monitor"
returns all the data we have for those fields in that measurement.
A query like
SELECT "cpu_used" FROM "pc_monitor" WHERE time > now() - 15m
returns the raw points in that window, but when you want a regular timeseries you often want to aggregate it into fixed intervals, like:
SELECT mean("cpu_used") FROM "pc_monitor" WHERE time > now() - 15m GROUP BY time(1m) fill(null)
This adds a time interval (1m), says what to do with multiple values within an interval (aggregate into the mean), and says what to do with empty intervals (fill(null)).
On fill(): GROUP BY time() creates regular intervals(verify), so it has to do something for intervals with no data. Options:
- null: return timestamp with null value (default)
- none: omit entry for time range
- previous: copy value from previous time interval
- linear: linear interpolation
Getting the most recent value
Consider:
SELECT last("cpu_used") FROM "pc_monitor" WHERE time > now() - 1h
Notes:
- last() aggregate is what it sounds like
- you want a time limit, to avoid scanning the entire time series just to get that one value
- you probably want that anyway, when you care to view something current
For a gauge, you may want a recent average, like:
SELECT mean(cpu_used) FROM "pc_monitor" WHERE time > now() - 5s group by time(5s) ORDER BY time desc
Notes:
- because this compares against now, the most recent interval that GROUP BY creates may not have a value in it yet, meaning you may get a null entry for it (see fill() above)
Other variants:
SELECT LAST(eth0_tx) FROM pc_monitor
SELECT LAST(field_name), * FROM test_result GROUP BY *
(GROUP BY * effectively separates by series)
SELECT LAST(*) FROM pc_monitor GROUP BY *
Practically similar to
SELECT * FROM "pc_monitor" WHERE time > now() - 15s ORDER BY time desc limit 1
Though you may like some averaging, like
SELECT mean(cpu_used) FROM "pc_monitor" WHERE time > now() - 5s group by time(5s) fill(none) ORDER BY time desc limit 1
Note that without the ORDER BY time desc limit 1 you'd probably get two time periods (at least until the group time is at least twice the selection time)
Dealing with null
See also:
Management notes
Deleting data
If this is about removing too-old data, the never-think-about-it approach is to set up retention policies.
...but yes, you can do things like:
DROP MEASUREMENT "net"
Notes:
- drops all data and series from the measurement
DROP SERIES FROM "net" WHERE hostname='myhostname'
Notes:
- drops all series that apply
DELETE FROM "net_monitor" WHERE hostname='myhostname' and time < now() - 1h
Notes:
- the delete granularity is effectively measurements (not tags) (verify)
- this won't delete the series, even if it removes all points
DROP SHARD shardid
Notes:
- you'd probably get the shard id from show shards
https://github.com/influxdata/influxdb/issues/8088
https://community.openhab.org/t/influxdb-clear-old-records/88442/4
Browsing data
Use the CLI, or something like chronograf or grafana.
There used to be an interface (perhaps [https://docs.influxdata.com/influxdb/cloud/query-data/execute-queries/data-explorer/ this]?)
CLI example:
> SHOW DATABASES
_internal
foo
> use foo
Using database foo
> show series
Backup and restore
influxd backup -database name -portable backup.data
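Restoring that backup (a sketch; -newdb lets you restore alongside an existing database with the original name):
influxd restore -portable -db name -newdb name_restored backup.data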
Storage size
Because of the compression done to older data, and the often-quite-compressible nature of time series, most monitoring needs don't really need to worry about space use.
This of course does scale with the amount of counters, and the time resolution of insertion.
For example, in a 70-day test with dozens of counters inserted at intervals between 2 and 300 seconds, space use sawtoothed up by about 200MB, from 500ish to 700ish MB (sawtoothed because the compression happens in stages).
Not nothing, and on embedded you probably still want rrdtool, but also nothing to worry about on, say, raspberries or small VPSes.
Chronograf notes
Separate install and binary, so it needs to be pointed at an InfluxDB instance.
https://docs.influxdata.com/chronograf/v1.8/introduction/installation/