Non-relational database notes



This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)



Most broadly, NoSQL usually means "a data store that chooses a different specialization from one of the top five RDBMSes".

Some common properties of NoSQL:

  • specializing to something that just doesn't fit a relational model very efficiently without a lot of extra work, like graphs, timeseries, arguably things like fulltext search

  • storing data which may be relational, but usually not queried as such, in particular...
  • distancing from schema'd data models
e.g. because in an RDBMS you would have to bake such structure in, while schemaless lets you be more flexible
note that schemaless often just means an implied, app-managed schema - and often more than one at a time
...gets you started faster
...only keeps working if your data requirements are simple, and if referential integrity is not that important, and usually you want to migrate data along with the schema
effectively puts any and all validation, and change validation, on the app, rather than the database.
Which is what makes it easier to change your effective schema (in large RDBMSes schema changes would often involve hour-long locks)
but also potentially harder to keep it valid over a longer term - your code needs to either migrate data, or know about many versions
So to some degree it's a "know the rules to know when and how to break them" thing
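The "implied app-managed schema" point above often takes the shape of migrate-on-read logic: the store holds documents written by several app versions, and the code has to recognize and upgrade each shape it may encounter. A minimal Python sketch (the document shapes and `_v` version field are made up for illustration):

```python
# Sketch: app-managed schema in a "schemaless" store.
# The store happily holds documents written by several app versions;
# the app must recognize and upgrade each shape it may encounter.

def migrate_user(doc):
    """Upgrade a user document to the current (v2) shape."""
    version = doc.get("_v", 1)
    if version == 1:
        # v1 stored a single "name" field; v2 splits it.
        first, _, last = doc.pop("name", "").partition(" ")
        doc["first_name"] = first
        doc["last_name"] = last
        doc["_v"] = 2
    return doc

old = {"name": "Ada Lovelace"}   # written by an old app version
new = migrate_user(old)
```

This is the flexibility and the cost in one place: nothing stops an old shape from being stored, so every reader has to carry this logic (or you run a one-off migration job).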

  • Code(rs) having more flexibility - and more responsibility.
You have to think hard about data modelling, you have to think hard about schema changes, and making sure all client behaviour cooperates.
This is true for any design, yes. Now it's just spread over more time and more parts of your app

  • no transactions (usually)
in part, the last two come from the fact they are damn hard to handle when you care about scaling so much (but not impossible!)
  • no referential integrity (for similar reasons)
  • doing more things without join (often by design)
assumes you do not typically want to resolve all references
often faster when you don't
often slower when you actually did (so if your data inherently fits the relational model better than all others, then an RDBMS is still the best choice)
  • often stores a denormalized form
This is often faster to fetch (nice)
also easily means duplication, and potentially conflicting data
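The denormalization tradeoff can be pictured with a toy Python sketch (the document shapes are hypothetical): embedding an author's name in every post makes a read one fetch, but a rename now has to touch every copy.

```python
# Sketch: denormalized documents duplicate data for read speed.
posts = {
    1: {"title": "Hello", "author": {"id": 7, "name": "Ada"}},
    2: {"title": "Again", "author": {"id": 7, "name": "Ada"}},
}

def rename_author(posts, author_id, new_name):
    # Normalized, this would be a single-row update in an authors table;
    # here every embedded copy must be found and rewritten, and any
    # copy you miss is silently stale (the conflicting-data problem).
    for post in posts.values():
        if post["author"]["id"] == author_id:
            post["author"]["name"] = new_name

rename_author(posts, 7, "Ada L.")
```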

  • (often hidden) CAP-style decisions
a big argument in itself -- e.g. the point that various software steps away from guarantees much too easily
Often eventual consistency instead of immediate/strict transactional consistency, particularly when they do replication/sharding.

Some of this argues that a lot of NoSQL is often better as a distributed cache, but sometimes worse as your primary store.

See also:

On database types


Relational databases

The old standby.

Highly structured, schema'd, with things like optional referential integrity.

These features are important when you want things highly controlled and highly verified, but they also fundamentally hold back the ability to scale.

Relational databases are still best at consistency management, better than most NoSQL. NoSQL typically scales better, though many still have hairy bits (even flaws) in their consistency management.

Key-value stores

You ask for a value for a given key. There is typically no structure to the data other than your interpretation after fetching it, so these lie very close to object stores and blob stores (the latter basically file stores without filesystem semantics).

When this fits your data and use, these are often low-bother ways to store a whole bunch of information, often with some corruption recovery.

If you have something in a RDBMS where you are actually mostly retrieving by primary key, and doing few or no joins (which may include things like simple data logging), you could use one of these instead.

Disk-backed or not

Various file-based stores (see e.g. File database notes) are effectively disk-based key-value stores.

Since they are often indexed, and cached in some way, you may easily think of them as hashmaps that happen to be large and stored on disk.
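Python's standard library ships exactly such a store, which makes for a minimal illustration:

```python
# Sketch: the stdlib dbm module is this sort of disk-backed hashmap
# (keys and values are bytes; the file persists across opens).
import dbm
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "example.db")

with dbm.open(path, "c") as db:   # "c" = create if missing
    db[b"user:1"] = b"Ada"

with dbm.open(path, "r") as db:   # reopen read-only: the data persisted
    value = db[b"user:1"]
```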

Used more than you may think; Berkeley DB is used in many embedded setups.

There has been some revival in this area. For example, Tokyo Cabinet is basically a modern, faster reimplementation of dbm (with some extensions, e.g. Tokyo Tyrant to make it networked, Tokyo Dystopia to add full-text search).

When not disk-backed, they are effectively in-memory caches (e.g. memcached), and sometimes also useful as a message broker (e.g. redis).

Document store

Key-value, where the value is structured data, often not following a strict schema.

It is also frequently possible to index on these fields.

Often presented as JSON (or XML, though XML databases can be considered a specific type of document store).

In contrast with e.g. relational data, documents are often altered individually, delivered as-is, and not heavily linked.
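The "index on fields" idea can be pictured as a map from field value to document ids, maintained alongside the documents. A toy sketch, not any particular engine's implementation:

```python
# Sketch: a secondary index over schemaless documents.
from collections import defaultdict

docs = {
    "d1": {"type": "post", "lang": "en"},
    "d2": {"type": "post", "lang": "de"},
    "d3": {"type": "page", "lang": "en"},   # documents need not share fields
}

def build_index(docs, field):
    """Map each value of `field` to the ids of documents holding it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        if field in doc:                    # absent field: simply not indexed
            index[doc[field]].add(doc_id)
    return index

lang_index = build_index(docs, "lang")
english_docs = lang_index["en"]             # a set lookup, not a scan
```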

Column store

Wide column store



Search engine

While generally considered an index built on a primary data store kept elsewhere, that also makes the searchable index the thing you actually use.

Plus there are projects that do both.

Storagey stuff - kv, document, and bigtable style

riak notes

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

Key-value store with a focus on concurrency and fault tolerance.

Pluggable backends, e.g. allowing use as just a memcache, or give it persistence.

Eventually consistent (with some strong-consistency experiments?(verify))


Ideally you fix the cluster size ahead of time. When you add nodes, contents are redistributed (verify)

Backends include

  • bitcask
all keys in hashtable in RAM (fast, but limiting the amount of items via available RAM)
file copy = hot backup (verify)
  • leveldb
keys stored on-disk
secondary indexes, so limited relational-style querying at decent performance
data compression
no hot backup
  • innostore
  • memory
objects in ram

It aims to distribute perfectly (and supports some other features by assumptions), which implies you have to fix your cluster size ahead of time

Pluggable backends mean you can have it persist (default) or effectively be a distributed memcache


Key-value store, distributed by using Raft consensus.


MongoDB notes


  • Weakly typed, document-oriented store
retrieved documents are maps
values can be lists
values can be embedded documents (maps)
  • searchable
on its fields, dynamic
query clauses together specify a single query operation (e.g. it always sorts before limiting [1])
supportable with indices [2]
field indexes - basic index
compound indexes - indexes a combination, e.g. first looking for a userid, then something per-userid
multikey indexes - allows matching by one of the values for a field
2d geospatial - 'within radius', basically
text search
indexes can be:
hash index - equality only, rather than the default sorted index (note: doesn't work on multi-key)
partial index - only index documents matching a filter
sparse index - only index documents that have the field

  • sharding, replication, and combination
replication is like master/slave w/failover, plus when the primary leaves a new primary gets elected. If it comes back it becomes a secondary to the new primary.
  • attach binary blobs
exact handling depends on your driver[3]
note: for storage of files that may be over 16MB, consider GridFS

  • Protocol/format is binary (BSON[4]) (as is the actual storage(verify))
sort of like JSON, but binary, and has some extra things (like a date type)
  • Not the fastest NoSQL variant in a bare-metal sense, but often a good functionality/scalability tradeoff
e.g. for various nontrivial queries
  • no transactions, but there are e.g. atomic update modifiers ("update this bunch of things at once")

CouchDB notes

(not to be confused with couchbase)

Document store with a REST-like interface.

Meant to be compatible with memcachedb, but with persistence.

  • structured documents (schemaless)
can attach binary blobs to documents
  • RESTful HTTP/JSON API (to write, query)
so you can get by with little or no middle tier (you'll need some client-side rendering)
  • shards its data
  • eventually consistent
  • ACIDity per document operation (not larger, so not great for inherently relational data)
no foreign keys, no transactions
  • MapReduce
  • Views
best fit for mapreduce tasks
  • Replication
because it's distributed, it's an eventually consistent thing - you have no guarantee of delivery, update order, or timeliness
which is nice for merging updates made remotely/offline (e.g. useful for mobile things)
and don't use it as a message queue, or other things where you want these guarantees
  • revisions
for acidity and conflict resolution, not in a store-forever way.
An update will conflict if someone did an update based on the same version -- as it should.
  • Couchapps,

document ~= row


  • view group = process
nice way to scale
  • sharding is a bit harder


  • not in views
  • if large, consider CDNs, a simpler nosql key-val store, etc.

See also:


PouchDB notes

Javascript analogue to CouchDB.

Made in part to allow storage in the browser while offline, and push it to CouchDB later, with minimal translation.

Couchbase notes

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

(previously known as Membase) (not to be confused with CouchDB)

CouchDB-like document store, plus a memcached-compatible interface

Differences to CouchDB include:

  • typing
  • optional immediate consistency for individual operations
  • allows LDAP auth
  • declarative query language
  • stronger consistency design


Column store



Document store with a push mechanism, to allow easier/better real-timeness than continuous polling/querying.


HBase notes

An implementation imitating Google's Bigtable, part of the Hadoop family (and built on top of HDFS).

See also:


See also:


Storagey stuff - graph style

This one is mostly about the way you model your data, and the operations you can do, and do with fair efficiency. Note that you can use e.g. key-value stores in graph-like ways, and when you don't use the fancier features, the two may functionally be hard to tell apart.


Apache Giraph



Cachey stuff

redis notes

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)


  • A typed key-value store
  • scalable through sharding [5]
  • types/structures include: counter, list, set, sorted set (hash-based), hash, bitarray
that you get a basic set of queries/operations for
  • allows transactions, which lets you do your own atomic updates when necessary
  • pub/sub

It is aimed at structuring data well enough that all operations can stay simple and fast, which is roughly why it scales well.

Modeling complex data structures is something that takes some informed design.

As it's not made for finding items by anything other than their id, indexing is also something you would design yourself. [6]
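One common DIY-indexing pattern is a set per attribute value, kept in sync with the primary records. Sketched here with plain Python dicts and sets standing in for redis hashes and sets (the key names are made up):

```python
# Sketch: hand-maintained secondary index, redis-style.
from collections import defaultdict

# Primary data: one "hash" per user id (as redis HSET would hold).
users = {
    "user:1": {"name": "Ada", "country": "uk"},
    "user:2": {"name": "Linus", "country": "fi"},
}

# Secondary index: one set per country (as redis SADD would maintain).
# The app must update this on every write to the primary data.
country_index = defaultdict(set)
for user_id, fields in users.items():
    country_index["country:" + fields["country"]].add(user_id)

# "Find users in the UK" is now a set lookup instead of scanning all users:
uk_users = country_index["country:uk"]
```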

memcached notes

These are primarily notes
It won't be complete in any sense.
It exists to contain fragments of useful information.
This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

memcached is a networked, in-memory, LRU-style key-value cache.

Its common use is probably caching data that was complex and/or high-latency to generate, and/or not very volatile.

Clients can talk to sets of servers (a client/protocol feature), which means that many clients can distribute values, and share a distributed cache (without platform-specific IPC stuff).

There is no access control; firewalls should be enough.


  • was made to have nothing that could respond slowly - all in memory, no locks, no complex queries (it's mainly a hashmap), no wildcard queries, no list-all, so basically nothing that blocks
  • was made to keep the most-used data:
it throws away data based on expiration timeouts and
when full, based on LRU (Least Recently Used) logic.
  • can be distributed
in the client, though, and with some practical footnotes

It is not:

  • storage. It's not backed by disk.
  • redundant. You are probably looking for a distributed filesystem if you are expecting that. (you can look at memcachedb and MogileFS, and there are many others)
  • a document store. Keys are limited to 250 characters and values to 1MB. (again: look at distributed filesystems, distributed data stores)
  • a transparent database proxy. You have to do the work of figuring what to cache, how to handle dependencies and invalidations

Originally developed for livejournal (by Danga Interactive) and released under a BSD-style license.

Daemon options

The main command line options:

-d            daemon
-m 2048       take up to 2048MB of memory
-l  bind to this IP 
-p 11211      ...and this port

The default when unspecified is 64MB, which may be too conservative, so distros tend to set something larger already.

(On 32-bit machines, you cannot give a single process more than, usually, 3GB or 2GB of memory (see 4GB of memory on a 32-bit machine). You can run multiple daemons, though.)

Some client, server, and interaction details

A single server is mostly just a simple and fast hashmap.

When given multiple servers, clients can choose to distribute objects among them based on the hash. You may also get control of the hash so that you can ensure related objects are stored on the same server (note that you'd want to do that consistently on all clients).

The client effectively adds another layer of hashing, in that it chooses the server to store an object on based on its hash and the available servers.

For this reason, for optimum cache hits, all clients should use the same client list, and use the same hashing method. It helps to have each be the same client implementation (also because some may have transparent serialization, that may not be compatible between everything you have).
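The server-selection step can be sketched as below. Note that real clients often use consistent hashing (e.g. ketama) rather than the plain modulo shown here, because modulo remaps almost all keys when the server list changes; the addresses are placeholders.

```python
# Sketch: client-side server selection, as memcached clients do.
# The server never sees the key distribution logic; it lives entirely
# in the client, which is why all clients must agree on it.
import hashlib

servers = ["10.0.0.1:11211", "10.0.0.2:11211", "10.0.0.3:11211"]

def pick_server(key, servers):
    # Hash the key, then map the hash onto the server list.
    digest = hashlib.md5(key.encode()).digest()
    index = int.from_bytes(digest[:4], "big") % len(servers)
    return servers[index]

target = pick_server("session:42", servers)
```

Clients with the same server list and the same hashing will agree on placement, which is exactly the "use the same client list and hashing method" requirement above.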

Client interfaces

There are APIs/clients for most major languages (see [7], [8], [9]), and you can implement your own by reading the protocol.

Exactly what the interface takes and returns varies. It may be dictionaries, persisted objects, bytestrings, etc.

Basic usage notes

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)


  • get: note that you can ask for multiple entries in a single request
  • gets fetches more information, including the value needed for the cas operation (see below)

Storage commands are:

  • set: creates entry if necessary
  • replace: like set, but only stores if the key already existed
  • add: like set, but only stores if the key was not already present
  • append: append to current cached value (only if it already existed)
  • prepend: prepend to current cached value (only if it already existed)
  • incr and decr increment and decrement a 64-bit integer. Entry must already exist (so e.g. set(key,'0') first). Interprets a non-integer value as 0.

  • cas: 'check-and-set': store, but only if no one else has updated it
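The cas idea can be modelled without memcached itself, using a token that changes on every write (a toy model; real memcached returns the token from gets and rejects a cas carrying a stale token):

```python
# Sketch of check-and-set: each stored value carries a token that
# changes on every write; a cas write succeeds only if the token
# still matches what the caller saw when it read the value.
store = {}        # key -> (token, value)
next_token = 0

def gets(key):
    return store[key]                  # (token, value)

def set_(key, value):
    global next_token
    next_token += 1
    store[key] = (next_token, value)

def cas(key, token, value):
    if store[key][0] != token:
        return False                   # someone updated it meanwhile
    set_(key, value)
    return True

set_("counter", 1)
token, value = gets("counter")
set_("counter", 99)                    # a concurrent writer sneaks in
ok = cas("counter", token, value + 1)  # fails; caller must re-read and retry
```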

You should write your interaction to minimize the number of round trips (i.e. the number of commands)


get_multi: fetch for many keys at once. Avoids latency overhead from doing multiple requests

flush_all: Clears the cache. Useful when developing, since you don't get to list items to clear.

The expiration time is interpreted either as

  • if <2592000 (which is 30 days in seconds): a delta time from the current server time
  • if larger than that: Unix (seconds-since-epoch) time
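That rule as a small sketch (the treatment of 0 as "never expires" follows memcached's documented behaviour):

```python
# Sketch: how memcached interprets an expiration value.
THIRTY_DAYS = 2592000  # 30 * 24 * 3600 seconds

def absolute_expiry(value, now):
    """Return the absolute Unix time at which an entry expires."""
    if value == 0:
        return None             # 0 means "never expires"
    if value < THIRTY_DAYS:
        return now + value      # a delta from current server time
    return value                # already an absolute Unix timestamp
```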

Undocumented/unofficial/debug features

Designing access models; Tricks

Rules of thumb:

  • Tackle the most obvious cases first. Usage probably follows 90-10 patterns. You can leave this to the LRU-ness of the cache, but in some cases you can avoid a bulk of nonsense that has to be managed from entering the cache.
  • Aside from the obvious networking and management costs, also consider serialization (marshalling) costs.

Things to consider:

  • Your setup may count on touching cache elements, but badly designed setups may mean a lot of touches per page view (or other overall product), that bottleneck your access to memcached (there are a few different ways to reduce touches)
  • It can help to layer your cache a little more. For example, fragments of pages may be constant, and could be cached. Some of this can also be cached in whatever end process you have, to lighten the load on memcached for things that have fairly simple/obvious/static use cases.

  • You may want to use cacti/munin/some other logging/graphing on certain stats while you are developing, to see long-term patterns, and to catch some obvious mistakes in, say, relative amounts of gets/sets.
  • You can't really control treatment of subsets of elements. That is, you can't say that certain elements should always be removed first. When you are using memcached for small-scale app caching, and not for application scaling, it may be useful to set up multiple daemons, to set up separate treatment per cache. (this does waste memory, but also note that you can easily set limits on the amount of memory to be used for each namespace this way)

(Faking) bulk invalidation

A common situation: you don't know the exact set of keys you want to invalidate, but do have a pattern, e.g. a prefix.

This doesn't exist directly, because this means a potentially slow wildcard query, and memcached was designed to only have queries that are always fast.

One way of working around this is to put a version in your key (this gets called namespacing, sounds fancier)

In other words, instead of removing, you're shifting to a new set of keys

  • the old ones are left in there, and will be pushed out by the LRU logic soon enough
  • keep in mind that this doesn't alter the store, it's just a different view from the/each client. Other clients will follow only if you either
make the same decision in all clients (can be annoying), or
put a "currently applicable version number" in the memcache too (and e.g. have clients fetch it every second or so), so that you can tell other clients to move on to new keys.
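The versioned-key trick sketched in Python, with a plain dict standing in for the memcached client (the prefix and key names are made up):

```python
# Sketch: faking bulk invalidation by putting a version in the key.
cache = {}   # stands in for the memcached client

def versioned_key(prefix, version, key):
    return f"{prefix}:v{version}:{key}"

version = 1
cache[versioned_key("user_html", version, "bob")] = "<fragment>"

# "Invalidate" everything under the prefix by moving to a new version.
# Old entries are never looked up again and age out via LRU.
version = 2
stale = versioned_key("user_html", 1, "bob") in cache        # still stored...
fresh = versioned_key("user_html", version, "bob") in cache  # ...but unseen
```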

Technical notes

See also

Searchy stuff


Elasticsearch notes

Built on Lucene, and seems to work out as a somewhat more complete offering than e.g. Solr.

Related to Logstash and Kibana (also Beats).

Logstash[10] is a log aggregator and processing pipeline, and due to plugins does so in a flexible/wide sense. It's popular around docker/kubernetes as well, alongside things like fluentd.


Message brokers / queues

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

See message broker notes

Time seriesy

Time series databases are often used to show near-realtime graphs of things that happen, while also being archives.

They are aimed at being efficient at range queries, and often have functionality that helps with things like downsampling and aggregation over time windows.


InfluxDB notes

InfluxDB is now part of a larger stack:

  • InfluxDB[12] - time series database
  • Telegraf[13] - agent used to ease collecting metrics, with some pluggable input/aggregation/processing things
  • Kapacitor[14] - streams/batch processing on the server side
  • Chronograf[15] - dashboard
also some interface to Kapacitor, e.g. for alerts
often compared to Grafana; initially simpler than that, but more similar now

Flux[16] refers to a query language used in some places.

InfluxDB can be distributed, and uses distributed consensus to stay synced.

Open-source, though some features (like distribution) are enterprise-only.

Data model notes
This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

In comparison to a more relational view...

  • database is a logical container for
retention policies
time series data
continuous queries

  • retention policy (RP) groups
replication factor (copies kept in the cluster) (autogen's default is 1)
retention - how long to keep the data (min 1h) (autogen's default is infinite)
shard group duration - how much data is stored in shards (min 1h) (autogen's default is 7d)
more practically:
each measurement is implicitly part of the retention policy you put it in
each database can have one or more RPs
you get a default called autogen (defaults mentioned above)
you'll quickly notice them in addressing (testdb.autogen.measurementname) though could ignore everything about them at first

  • a measurement is like a table, containing tags, fields and points
there is always an index on time
  • series
basically refers to a (measurement,tag) combination you'd likely use in querying -- see below
  • tags are key-value pairs (column-like) you can add per datapoint that are
part of a series' uniqueness, and are indexed
basically, whatever you need for lookup
limited to 64K (and you probably don't want to get close without good reason)
  • fields are key-value pairs (column-like) you can add per datapoint that are
not part of its uniqueness, and not indexed
basically, the values you store and read out, rather than look up by
types include float (64-bit), integer (64-bit), boolean, timestamp, or string
a field takes the type of the first value it gets (and currently cannot be changed except with some creativity[17]), so e.g. forcing integer (append i to the value) over float is sometimes necessary, e.g. to store large values without losing precision
strings possibly not limited to 64K? (I've seen conflicting information)
but you probably don't want to use influxdb as a blob store if you want it to stay efficient
you can check current type with show field keys
  • (data) points

  • time precision is another detail

Typical use of measurements, series, tags

Say you want to start keeping track of CPU use and are collecting it for various datacenters (various tutorials use an example like this).

You might have a

  • database for admin reasons
  • retention policy mostly because you want monitoring stuff deleted after a year without thinking about it
  • measurement called host_monitor

and want to enter a datapoint with

  • tags like hostname=node4,datacenter=berlin,country=de
  • fields like cpu0=88,cpu2=32

You'll notice this is a pile of everything CPU-related. The tags are structured with common uses in mind, often the coarsest and finest things you anticipate querying on - e.g. hostnames would be unique within a datacenter and can be addressed individually (and efficiently, because tags are indexed), but you can also e.g. average per country, or pick out a particular host if needed.

Series are basically the idea that each unique combination of (measurement,all_tags) represents a series.

Data you send in from different places will often imply unique series, through having unique tags, though to some degree they are more of a querying concept, and a storage one only insofar that the indexing helps that.(verify)

On point uniqueness

A point is unique by (measurementname, tagset, timestamp), so if you write a point when a record with that tuple already exists, field values are merged/overwritten.

Depending on the timestamp precision you hand into the ingest url, this

  • may never happen, if it's based on 'now'
  • may never be an issue, if the precision is much higher than the interval you send in
  • may be something you do intentionally
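The merge behaviour can be modelled as a dict keyed on that tuple (a toy model, not InfluxDB's actual storage):

```python
# Sketch: points keyed by (measurement, tags, timestamp); writing the
# same key again merges field values, with later writes winning per field.
points = {}

def write(measurement, tags, timestamp, fields):
    key = (measurement, tuple(sorted(tags.items())), timestamp)
    points.setdefault(key, {}).update(fields)

write("cpu", {"host": "node4"}, 1700000000, {"cpu0": 88, "cpu1": 12})
write("cpu", {"host": "node4"}, 1700000000, {"cpu1": 30, "cpu2": 55})
# The second write merged into the existing point rather than adding one.
```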

On timestamp precision

Timestamps are currently nanosecond resolution by default. This can be reduced to microsecond, millisecond or second.

Lower-precision timestamps lead to

  • overwriting data with the same timestamp (see previous point)


  • Is that per database, series, datapoint at insertion time?
  • Does it mix precision if you alter precision over time?


  • /query - queries, management
  • /write - ingest, takes line protocol
  • /ping - health and version
  • /debug/pprof - generate profiles for troubleshooting
  • /debug/requests - track HTTP client requests to the /write and /query endpoints
  • /debug/vars - collect internal InfluxDB statistics

The line protocol[18] is a one-liner text presentation that looks like

measurement,tag_set field_set timestamp


tag_set and field_set are comma-separated key=val pairs
timestamp is nanosecond-precision Unix time
(also optional; defaults to local timestamp, UTC, but be aware of
clock drift (so you likely want NTP)
timezones (so have a serious think about using either client time or server time))

Clients may ease conversion of structured data to line protocol.
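A sketch of such a conversion. This toy version adds the integer i suffix and quotes strings, but skips the escaping of commas, spaces and quotes in names and values that a real client must do:

```python
# Sketch: format one point as InfluxDB line protocol.
def to_line(measurement, tags, fields, timestamp_ns):
    tag_set = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))

    def fmt(v):
        if isinstance(v, bool):          # check bool before int (bool is an int)
            return "true" if v else "false"
        if isinstance(v, int):
            return f"{v}i"               # trailing i marks an integer field
        if isinstance(v, str):
            return f'"{v}"'
        return str(v)                    # floats need no suffix

    field_set = ",".join(f"{k}={fmt(v)}" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_set} {field_set} {timestamp_ns}"

line = to_line("host_monitor",
               {"hostname": "node4", "datacenter": "berlin"},
               {"cpu0": 88, "cpu1": 32},
               1700000000000000000)
# → host_monitor,datacenter=berlin,hostname=node4 cpu0=88i,cpu1=32i 1700000000000000000
```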

On shards and shard groups

Under the hood, data is stored in shards, shards are grouped in shard groups, shard groups are part of retention policies.

This is under-the-hood stuff you don't really need to know, though it may be useful to consider in that shard groups are related to

  • the granularity with which old data is removed because of retention policy (it's dropped in units of shard groups - so never immediately)
efficiency of the typical query, e.g. when most queries deal with just the most recent shard group and older ones are essentially archived, rarely or never touched by IO
Config notes
Security notes
Querying notes
Management notes

Chronograf notes




See Data_logging_and_graphing#Graphite_notes


Openstack projects related to storage:

  • SWIFT - Object Store. Distributed, eventually consistent
  • CINDER - Block Storage
  • MANILA - Shared Filesystems
  • KARBOR - Application Data Protection as a Service
  • FREEZER - Backup, Restore, and Disaster Recovery

See also