Database model types

Database related

More theoretical - thinking about databases:

Broad qualifiers, a.k.a. thinking about what you want from your database ·
Describing database systems
- Data consistency and versioning, and its concepts
- Database model types · Some databases, sorted by those types

Everyday-use notes

Block store

Block stores are access a large series of bytes, in moderately-sized blocks at a time.

It's the most 'do everything yourself' variant of storage.

There are almost no use cases where users or services want a block device directly. Yes, in theory some specific servers might be more efficient doing their own management of their very specific specific data structures on disk - but many of those choose to use large files the same way, presumably because it amounts to much the same, and takes over some of the management work.

May just mean it's exposing a large disk at low level - SAN tends to mean storage exposed this way.

In cloudy context, it tends to mean 'when you connect a disk, we figure out the backing details' (which may be SAN style under the hood).

Object store

Object stores often mean storing arbitrary-sized blobs of data, under arbitrary names.

(A blob store, today, usually refers to object stores (though also, 'Binary Large OBject' still has relation to doing this in databases where it was perhaps less of a fit)).

And yes, you could see this as a lightly specialized key-value, in that

very basic basic operations: mostly store, fetch, and delete, and maybe list

there may not be structure to the names - no meaningful interpretation by the store itself (less so than e.g. filenames)

but this varies

Differences to kv:

object stores may allow large objects (gigabytes or more), kv might limit values to a few kilobytes

roughly so that you use kv for short term, OLTP-ish stuff, and object stores for bulk storage and archiving stuff

object stores seem more likely to have list operation in some way, where kv tries to avoid it

object stores usually let you separately fetch metadata

kv often implicitly suggests you make that part of your data

object stores may have access control per item (and per moderate unit), kv often has none

object stores may avoid transactions (there's typically no need), and be more eventual-consistency, particularly in designs that want to scale well

as may kv.

A lot of these seem because object stores are often aimed at "dump a lot of files in here" (with less or no implied directory-style semantics of a filesystem)

For example, S3 is an object store much more than a kv store.

Filesystems

It's worth pointing out that a filesystem is a database.

Code (usually kernel code) manages all the data structures, it has strict semantics (e.g. POSIX's) that ensure things happen in an atomic, consistent way.

Yet it turns out a filesystem is a relatively poor database.

For a slew of reasons, perhaps summarized by that this bit of convenience specializes in nothing.

Reasons include:

many things are not transactional - meaning you need a lot of error handling in every program

e.g. a treewalk can find itself erroring out on something that it was just now told existed.

(This is where NoSQL wouldn't help much either - they rarely even try to be transactional. Even in relational where you often can get this, you still have to think about it)

some restrictions that are there in theory but hard to keep

say, filesystems tend to have a limitations like '255 characters in a filename' and '32667 characters in a path'.

so what happens when you're a few bytes under the latter limit, and you move a directory into a subdirectory so that directory doesn't break that limit, but files in it now do?

if that move is just "directory parent is now this other directory", this is often not checked - which means it will work until some API calls break

if we were to check this, we would need to do this for a lot of items.

Text encoding is usually... breakable.

In bytes-based unices, which are conventionally UTF-8, it's entirely possible to store invalid byte sequences.

In windows, which are mostly UTF-16, it's entirely possible to get invalid sequences.

I have one file on my windows drive that I haven't been able to delete in years. Explorer cannot do it. the CLI cannot do it. Probably something can, but I'll be damned if I can find even the reason it exists this way.

Languages that want to interface with both linux and windows have to do even more creative things

As such, APIs that want to be portable between filesystems have a task of wrapping different existing APIs creativity, and somehow deal with the fact that the underlying filesystem might well allow incorrect data.

Tree structure can be weird

for example, symlinks and links may mean that what you previously thought was a tree is actually an acyclic graph or even cyclic graph instead.

the last meaning that treewalking is technically infinite, except it will error out on length at some point

and before you think that's a *nix thing, windows has even more types of links

there are some practical cases of unnecessarily bad algorithmic complexity

consider e.g. an ls that goes through items and stat()s each. It is possible that is implemented in a way where each stat is linear time with the amount of directory entries (linked list or similar). That makes a simple directory listing quadratic time with the amount of entries

and trust me, windows is no stranger to being slow as molasses when viewing more than 20K or so things in a single directory.

also part of why distributed stores don't like to have a 'list' operation, or if they do, make it part of local organization they put on you (consider e.g. S3 buckets)

The POSIX interface (and the windows one too) is defined very serially.

Unnecessarily so, but thirty years ago it mostly worked fine, and now we can't change the semantics.

Consider that a treewalk on a POSIX style filesystem is tens to hundreds of thousands of syscalls, that also have to happen in absolutely strict sequence because the semantics require it. Not even the much smaller latency of SSDs will help much here.

which is why distributed filesystems will point out their own, non-standard API - to sidestep these issues

metadata isn't centralized / indexed, so most queries are inefficient

most writes go to two places on disk. On platter that was always bad for performance

Example:

I have one directory structure for a bunch of structured data - 500K files, 19K directories.

When the metadata is not cached in RAM, it takes eight minutes just to walk through and just list their names

That's just stats, directory-opens, getdents, because that is the only way to fetch that structure

Even when it is cached, it still takes three minutes

I have that same directory listing mirrored in a database.

It's milliseconds in total to do most things I care to do.

And a few hundred milliseconds for rare things.

Even a total dump only takes seconds.

Computer data storage - Partitioning and filesystems

Relational

The old standby.

Also great at structure, generally the best consistency managers we have, and allows things like relational integrity that can actually be enforced.

The consistency is not due to their relational nature, in fact that makes things a little harder. It's just that that was one of their design goals, because they used to be the only real choice in serious database.

That same consistency is what fundamentally puts a limit on their ability to scale. (even loosening the consistency management only makes them somewhat faster. It is their overall model that doesn't scale)

NoSQL scales better in part because its guarantees are simpler, and less strong.

NewSQL is an interesting new balance between the two, but comes with footnotes that you really should know.

The below won't go into established relational databases. It is assumed you know how to find information about Postgres, MySQL and such.

...though things like Apache Derby (previously IBM Cloudscape) can be interesting if you want to, basically, embed a small (~3.5MB) RDBMs within the same VM as your Java application. (though it can also be set up client-server style) Or, perhaps more likely, something does this to give you standard SQL OLTP without requiring you to install and set up something more complex.

Key-value stores

You ask for a value for a given key. Typically no structure to the data other than your interpretation after fetching it

...so they lie very close to object stores and blob stores (the last basically file stores without filesystem semantics).

When this fits your data and use, these are often low-bother ways to store a whole bunch of information, often with some corruption recovery.

If you have something in a RDBMS where you are actually mostly retrieving by primary key, and doing few or no joins (which may include things like simple data logging), you could use one of these instead.

Disk-backed or not Various file-based stores (see e.g. File database notes) are effectively disk-based key-value stores.

Since they are often indexed, and cached in some way, you may easily think of them as hashmaps that happen to be large and stored on disk.

Used more than you may think; Berkeley DB is used in many embedded setups.

There has been some revival in this area. For example, Tokyo Cabinet is basically a modern, faster reimplementation of the dbm. (with some extensions, e.g. Tokyo Tyrant to have it be networked, Tokyo Dystopia to add full-text search).

When not disk-backed, they are effectively in-memory caches (e.g. memcached), and sometimes also useful as message broker (e.g. redis).

Distributed

Whenever each value is completely independent from all others, you can spread these to roughly any scale you want, by putting then on any amount of machines and not even telling the others about it/

Once you want things like altering multiple things atomically, though, you get one of three answers:

haha no.
not with acceptable performance
'eventual consistency probably' is good enough, right?

Document store

key-value where the value is structured data, often not following a strict schema.

It is also frequently possible to index on these fields

Often presented as JSON (or XML, though XML databases can be considered a specific type of document store).

In contrast with e.g. relational data, documents are often altered individually, delivered as-is, and not heavily linked.

Distributed

You can often think of these as key-value stores with larger values and a nicer interface to change them.

In that documents are whole things and mostly independent - you get no relations the database knows/cares about, and any indexing mostly just implies a little overhead to your CRUD.

The difference is that it is easier to expect documents to act as rows, and maybe some transactions.

The answer to transactions is still often one of:

haha no
not with any speed
eventual consistency is good enough, right?

though you can often make all alterations to a single document happen at once

and some will make that 'eventual' sooner than others

https://en.wikipedia.org/wiki/Document-oriented_database

Column store

Regular RDBMses store data by rows, which makes sense when rows represent individual things you usually handle one by one.

A column-oriented DMBS basically store columns at a time, which makes sense when you're typically working on columns, e.g. primarily doing aggregations, summaries, statistics, or analycics on what works out as a column at a time (with less focus on data relation and structure).

It also tends not to need indices for these applications.

There aren't really NoSQL column stores I know of, but it's good context for wide column stores.

https://en.wikipedia.org/wiki/Column-oriented_DBMS

Wide column store

✎ This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

A wide column store, sometimes extensible record store, could be thought of a key-value store where you make the key more complex.

For example, if you happen to have tabular data, you might use keys like: (row,col) → data

And yes, for that example you could easily absolutely abuse a plain key-value store, by smushing row and column into a single key.

But wide column stores tend to add semantics, e.g. ways to work on related keys.

You can see this as a data model pasted on top of basic key-value storage, meaning the storage is easily distributed by nature - unless your data model breaks that.

This can be a flexible way of storing a few types of structured data. Consider:

modeling a single matrix-like table like (row,col) -> data

if it's very sparse it may save a lot of space

if it's very large you may find the inherent ability to distribute useful

but if the table is small and you usually need all of it this would just be a very inefficient way of storing it. If the values are very small and/or you usually need a lot of them, it's an particularly inefficient way of accessing it, you have to remember things about that table (like its size) separately, you can't necessarily fetch ranges very easily

modeling EAV-like storage and access (more scalably/naturally than putting it in relational)

(entity, attribute) -> value

...and more.

Real-world wide column stores may easily be made for a few specific cases, which makes it more organized or more efficient for a specific use.

For example, google BigTable works something like (row:string, column:string, time:int64) → string and adds some semantics ().

And consider e.g. Cassandra, which tries to be as table-like and SQL-ish as it can manage.

In particular, various have the idea of column families (making them more structured than e.g. EAV would suggest)

For example, bigtable chooses (data under a row key is ordered, read/write under a row key are atomic (helps related sites be on similar nodes, helping access speeds

columns are grouped into column families

which makes them more structured under the covers (and helps access efficiency)

Bigtable suggests you store and access things with keys like:

(row:str, column:str, time) -> data

For another example, cassandra has partition keys, clustering columns that helps direct how data is stored.

primary key is partition keys + clustering columns?

https://www.youtube.com/watch?v=4D39wJu5Too

From another view, each row still maps to many columns that could be selected. Part of the access efficiency comes from the fact we need only select the columns we are interested in, and this is part of the lookup, not the filtering.

Wide column stores are, by themselves, just extensible record stores, but various implementations add additional properties

Examples: Bigtable (proprietary) HBase Cassandra Cosmos DB (when using its Cassandra API)

Bigtable is a distributed data chunk storage.

While bigtable is built on things only available within google, implementations based on the whitepaper are, like Cassandra and HBase (on top of HDFS)

It aims to meet a moderately specific sets of requirements, which happened to be useful for google search, gmail, and such.

It aims to serving potentially large chunks of only loosely structured data, with low latency where possible/necessary.

It actually is a moderately specific a data/access model, with ordering that allows for a B+-tree-like distribution model across various nodes

The data model is described in the whitepaper as "a sparse multi-dimensional sorted map"

In other words, a map.

With some fancy implementation details.

In terms of uniqueness, the map is from a 3-tuple:

row key (string),

column key (string), and

timestamp (int64),

to a string value

row key (string)

(row,column,timestamp) -> value )

(Note that to Bigtable itself, these values are uninterpreted bytes. It is up to clients to give it a meaning, as necessary, e.g. serializing )

This is not the way you access Bigtable, and most real-world applications of Bigtable are neither quite like a map or like a row-column table. Both of those can be useful assisting mental models, but you are likely to use it quite differently. Note that columns are weak versions of RDBMS-style columns at best.

(Note that you can use the bigtable setup in more than one way, because there is no predefinement or enforcement type of schema, only one implied in use.)

Consider for example the white paper's example is to store, for each URL, the page contents and sites that linked to this one (data you might want want for some particular linkage experiment or so, and would generate from crawl data).

For example, if foo.example.com and bar.example.com have links to www.example.com, you would have stored items probably at least including the following:

('www.example.com', 'anchor:foo.example.com', 0 ) -> 'link name used from foo'
('www.example.com', 'anchor:bar.example.com', 0 ) -> 'link name used from bar'
('www.example.com', 'contents:', 0 ) -> 'whole_html_page'
('foo.example.com', 'contents:', 0 ) -> 'whole_html_page'
('bar.example.com', 'contents:', 0 ) -> 'whole_html_page'

In this case, anchor is the column family you are using to map from link URL to link name.

In comparison to RDMBS, you would have a few options, and only adding an anchors table to the schema is even vaguely sensible (a compounded list in a single column would not scale, and a column per site would be crazy because of the way that is stored)

In comparison to maps, well, you can indeed use a basic a map like the above, creating a key single key from the elements above.

Except, of course, Bigtable rows and columns have meaning, in terms of querying, management, access control, (and also in implementation details, such as that compression happens per column family), and such.

Mutations happen atomically at row level. You can open a row as if it were a record, and add/alter/delete columns from it, then apply (store).

Reading can be done in a few ways, but mostly done by column family/families in a row or in any row(verify), optionally with row/column/timestamp filters(verify).

Hybrid columnar

Timeseries

Graph

A graph in the mathematical sense, of a node-link structure.

Certain types of data are very handy to see this way - sometimes because it is a good fit, sometimes because it is structured but still free-form (so you don't end up with 100 relation tables and a headache of joins, and can be much faster when you typically look for very small parts of that data)

Search engine

While generally considered an index based on a primary data store elsewhere, that also makes that searchable index the thing you actually use.

Plus there are projects that do both.

Message brokers

Not a storage model, or NoSQL, they just turn up in the same places, because they are useful tools when sending messages with more than two fixed endpoints involved (be it processes, hosts, datacenters).