File database notes

These are primarily notes, not complete in any sense; they exist to contain fragments of useful information.

Simple file databases are useful to persist data with little required structure, and for goals that require storing (large amounts of) data on disk with pretty quick hash-based lookups.

Many such databases are based directly or indirectly on the old, simple, yet effective dbm, a (family of) database engine(s) that stores key-value mappings (both byte strings) in a file (sometimes two, splitting out some metadata/index(verify)).

Implementations usually use hashing for fast retrieval, allow rebucketing so that the hashing scales well, and use fixed-size buckets to allow relatively efficient modification.

They are also libraries rather than servers, so they are easy to embed and avoid network serving details.
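The dbm-style interface survives in Python's standard library `dbm` module (which wraps gdbm or ndbm where available, with a pure-Python fallback), which makes for a quick way to see the idea in action. A minimal sketch:

```python
import dbm

# Open (creating if needed) a dbm-style database file. The stdlib picks
# whichever backend is available (gdbm, ndbm, or the pure-Python dbm.dumb).
with dbm.open("example.db", "c") as db:
    db[b"greeting"] = b"hello"   # keys and values are byte strings
    db[b"count"] = b"42"

# Reopen read-only; the data persisted to disk between opens.
with dbm.open("example.db", "r") as db:
    print(db[b"greeting"])       # b'hello'
```

Note that, exactly as with the original dbm family, there is no structure beyond the key-to-value mapping; anything fancier is the application's job.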

dbm family

Within the dbm family, the ones that are interesting enough to use are probably roughly:

  • dbm, 'database manager' (1979, AT&T)
  • ndbm, 'new database manager' (1986, Berkeley), added support for the library to deal with multiple open databases
    • sdbm, a 1987 clone of ndbm, made for licensing reasons
  • gdbm, 'GNU database manager', also added arbitrary-length data
  • Berkeley DB (a.k.a. BDB and sometimes BSD DB), optional concurrent access, transactions and such (ACIDity), etc.

There are also continuations / successors of this idea, including

  • Tokyo Cabinet and related are basically a modern reimplementation of the dbm idea
  • tdb (trivial database library): made for the Samba suite, API like GDBM but allows multiple writers
  • tdbm: variant of ndbm with atomic transactions, in-memory databases
  • MDBM: a memory-mapped key-value database store, derived from sdbm and ndbm
  • QDBM

Berkeley DB notes

Berkeley DB (also known as BDB, and libdb) is basically a key-value map in a file. It is a library instead of a server, so can be embedded, and is used like that quite a bit.

For simpler (e.g. not-very-relational) ends it has lower and more predictable overhead than bulkier databases.

Technical notes

The low-level interface does not support concurrent write access from multiple processes.

It has some higher-level provisions for locking, transactions, logging, and such, but you have to choose to use them.

From different processes, you would use DBEnv(verify) to get BDB to use proper exclusion. Most features you have to explicitly ask for via options, which also control whether the env is safe e.g. for multithreading but not between processes, safe between several processes, etc. See things like DBEnv::open() (e.g. DB_THREAD, (lack of) DB_PRIVATE), and also notes on shared memory regions.

Interesting aspects/features:

  • it being a library means it runs in the app's address space, minimizing cross-process copying and required context switches
  • caching in shared memory
  • option for mmapped read-only access (without the cache)
  • option to keep database in memory rather than on disk
  • concurrent access:
    • write-ahead logging or MVCC
    • locking (fairly fine-grained)
    • transactions (ACID), and recovery
  • hot backups
  • Distribution:
    • replication
    • commits to multiple stores (XA interface), (since 2.5)

Both key and value are byte arrays; the application has to decide how it wishes to format and use data.
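BDB itself is usually used via its C API or third-party bindings, so as a sketch of this "the application decides the encoding" point, here is the same idea using Python's stdlib `struct` and `dbm` modules as a stand-in store: the application picks a binary record layout, and the database only ever sees bytes.

```python
import struct
import dbm

# Pack a (timestamp, counter) pair into a fixed-size binary record:
# little-endian 8-byte double followed by a 4-byte unsigned int.
record = struct.pack("<dI", 1700000000.0, 12345)

with dbm.open("packed.db", "c") as db:
    db[b"sensor:1"] = record          # the store sees only opaque bytes

with dbm.open("packed.db", "r") as db:
    ts, val = struct.unpack("<dI", db[b"sensor:1"])
print(ts, val)  # 1700000000.0 12345
```

The key layout (here a `sensor:1` style prefix) is equally up to the application; the library imposes nothing beyond bytes-to-bytes.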

Both key and value can be up to 2^32 bytes (4GB, though for keys that's usually not a great idea).
A database file can be up to 2^48 bytes (256TB, which is more than various current filesystem limits).

It uses a cache to avoid lookup slowness, and a write-back cache to be more write-efficient.

Format/access types

There are multiple types of access / file format. They provide mostly the same functionality (keyed access as well as iteration over the set); the difference is mostly in performance, and only when the data is large, since if all data fits in the cache this is a near-non-issue.

For larger data sets you should consider how each type fits the way you access your data.

If your keys do not order the entries, you should consider hash or btree. When keys are ordered record numbers, you should probably go with recno, a.k.a. record (fixed or variable-length records).

You can supply your own comparison and hash functions.

More details:

  • Hash (DB_HASH)
    • uses extended linear hashing; scales well and keeps minimal metadata
    • supports insert and delete by key equality
    • allows iteration, but in arbitrary order
  • B+tree (DB_BTREE)
    • ordered by keys (according to the comparison function defined at creation time. You can use this for access locality)
    • allows lookup by range
    • also keeps record numbers and allows access by them, but note that these change with changes in the tree, so are mostly useful for use by recno
  • recno (DB_RECNO)
    • ordered records
    • fast sequential access
    • also with key-based random access - it is actually built on B+tree but generates keys internally
  • queue (DB_QUEUE)
    • fixed-size records
    • fast sequential access

You can also open a BDB using DB_UNKNOWN, in which case the open call determines the type.

There are provisions to join databases on keys.

Versions and licenses

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

(Note: this is probably incorrect in parts -- it's hard to find good details)

Early versions came from a library for simple hash records, and later also B-tree records.

  • BDB 1.85 was a version for BSD 4.4, released in 1992 under a BSD-like license (that makes it behave like the GPL)

A Netscape request for further features led to the creation of Sleepycat Software in 1996.

  • BDB 2.1 was released in 1997, adding concurrent access and transactions

Sleepycat Software was acquired by Oracle Corporation in 2006. This seems to have had few licensing consequences for versions since then (verify)

Versions 2 and later, including the Oracle additions, are dual-licensed, under either:

  • The Sleepycat Commercial License: A purchased license that does not force redistribution
  • The Sleepycat Public License: the source of software that uses Berkeley DB must be distributed (note: excludes in-house development)
    • apparently, the case of language support/wrapping is a special one (not unlike the LGPL), in that for e.g. the Python and Perl interfaces, the software that uses BDB is the language binding, not any scripts that use that interface/BDB. This does not seem to apply in the case of C library use(verify).

In other words, you generally have three options:

  • use it freely, as long as the application is and stays entirely personal / internal to your business
  • distribute the source of the application that uses BDB
  • get a paid license

There are now three products:

  • Berkeley DB (the basic C library)
  • Berkeley DB Java, a Java version with options for object persistence
  • Berkeley DB XML, an XML database using XQuery/XPath (see e.g. [1])

Some additional licenses apply to the latter.

See also

SQLite notes

SQLite is a lightweight library-based database engine - no server, no configuration, and each database it works on is a single file.

This also means you can use it without requiring an admin to install and configure a database engine.

It's a little fancier than similar library-only databases. Its feature set includes:

  • support for most of SQL92
  • ACIDity (using file-based locking)
  • recovery via journaling
  • constraints
  • concurrent access
  • views (to a degree)
  • querying multiple databases (as in different files)

How well it scales depends on some choices:

  • general RDBMS scaling notes apply, e.g. well-chosen indices, using transactions rather than the default autocommit
  • some SQLite-specific notes apply, e.g. writes that would go to multiple databases are more efficient in a transaction, and some cache tweaks help
  • it doesn't deal as well with multiple clients
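The transactions-versus-autocommit point is easy to see with Python's stdlib sqlite3 module: batching many writes into one transaction means one sync at commit rather than one per statement, which is where most of the autocommit slowness comes from. A sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # an on-disk path works the same way
conn.execute("CREATE TABLE kv (k TEXT PRIMARY KEY, v TEXT)")

# The connection as a context manager wraps the block in one transaction:
# it commits on success and rolls back on an exception.
with conn:
    conn.executemany(
        "INSERT INTO kv VALUES (?, ?)",
        [(f"key{i}", f"value{i}") for i in range(1000)],
    )

print(conn.execute("SELECT COUNT(*) FROM kv").fetchone()[0])  # 1000
```

With an on-disk database, the same 1000 inserts issued individually in autocommit mode can easily be orders of magnitude slower.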

As the writers say: "SQLite is not designed to replace Oracle. It is designed to replace fopen()"

Cases where SQLite is interesting:

  • embedded things - SQLite and BDB have both been fairly common here for a while
  • application memory between runs, and other file-based persistence
    • the ACID thing is nice
    • the indexes are nice
  • structured file storage
    • so that you don't have to think about binary coding of complex data
  • interchange of nontrivial data between different programs / languages
    • (that doesn't need to be minimum-latency)
  • when accessed via a generic database interface, you can fairly easily switch between sqlite and a real RDBMS
    • can be useful during development
  • creative uses, such as local caching of data from a loaded central database
    • you can also have memory-only tables
  • simple dynamic websites
    • particularly if mostly read-only

Limitations, real or perceived:

  • many-client cases: while sqlite will function with multiple users, it will perform better with fewer users
    • ...though concurrency is much better since WAL, i.e. since version 3.7
  • its typing is non-strict. In programming language terms, it is dynamically and weakly typed, basically duck-typed
    • sometimes convenient, sometimes weird
  • no row-level locking (but then, that's not commonly the largest scaling issue)
  • no foreign key enforcement by default (can be worked around with triggers; more recent versions can enforce them when enabled via a PRAGMA(verify))
  • no permissions (...with the reason that it's fairly pointless if you can read the file anyway)
  • the less-common RIGHT JOIN and FULL OUTER JOIN are unimplemented (verify)
  • triggers are somewhat limited (verify)
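The non-strict typing is quickly demonstrated with Python's stdlib sqlite3: a declared column type is only an affinity hint, so values that can't be coerced are simply stored as they are.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (n INTEGER)")

conn.execute("INSERT INTO t VALUES (1)")                # stored as integer
conn.execute("INSERT INTO t VALUES ('not a number')")   # stored as text anyway
conn.execute("INSERT INTO t VALUES ('123')")            # coerced to integer 123

rows = list(conn.execute("SELECT n FROM t"))
for (n,) in rows:
    print(repr(n))  # 1, then 'not a number', then 123
```

The second insert would be a hard error in most RDBMSes; in SQLite it just means the column now holds mixed types. Whether that is convenient or weird depends on the use.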


String or BLOB exceeded size limit seems to mean that one or both of:

  • a query size exceeded SQLITE_MAX_SQL_LENGTH (1MB by default)
    • you can avoid this by using parametric values (binding) instead of literal query strings
  • a row, text field, or blob field exceeded SQLITE_MAX_LENGTH (1GB by default)

(possibly other limits)

Of course, you can set these while compiling. SQLITE_MAX_LENGTH may be at most 2^31-1, and SQLITE_MAX_SQL_LENGTH may be no larger than SQLITE_MAX_LENGTH (and possibly also no larger than 1GB?(verify) -- not that wanting a 2GB query is very sane...).
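The binding workaround is straightforward; with Python's stdlib sqlite3, a bound parameter is not counted against the SQL string length, so large values never hit SQLITE_MAX_SQL_LENGTH at all:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE blobs (data BLOB)")

# 2MB of data: inlined as a literal, this would exceed the default
# 1MB SQLITE_MAX_SQL_LENGTH; as a bound parameter it is fine
# (only SQLITE_MAX_LENGTH applies to the value itself).
big = b"x" * (2 * 1024 * 1024)
conn.execute("INSERT INTO blobs VALUES (?)", (big,))

print(conn.execute("SELECT length(data) FROM blobs").fetchone()[0])  # 2097152
```

Binding is also the right habit anyway, both for injection safety and to let repeated statements be reused.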

Tokyo (and Kyoto)

Tokyo Cabinet / Kyoto Cabinet

database library, so meant for single-process use.

Tokyo Cabinet (2007) (written in C) is an embedded key-value database, a successor to QDBM

  • on-disk B+ trees, hash tables, or fixed-length array
  • multithreaded
  • some transaction support
  • no real concurrent use
    • process-safe via exclusion control (via file locking), but only one writer can be connected at a time
    • threadsafe (meaning what, exactly?)

Kyoto Cabinet (2009) is intended to be its successor.

Written in C++, its code is simpler than Tokyo's, and it intends to work better around threads. (Single-threaded use seems a little slower than Tokyo.)


  • Tokyo may be a little faster
  • Tokyo may be a little more stable (at least in earlier days of Kyoto's development)
  • Kyoto may be simpler to use
  • Kyoto may be simpler to install

Tokyo Tyrant / Kyoto Tycoon

database server, so useful when you want some concurrent use.

Supports expiry, so it can act much like memcached.

Tokyo Dystopia

full-text search system

Tokyo Promenade

content management system built around cabinet.

Presentable in BBS, blog, and Wiki style


  • Ordered-map store
  • ACID via MVCC
  • concurrency



Common Data Form (CDF)

CDF, Common Data Form, is similar to HDF5, though without grouping(/hierarchy).

This makes it easier to use - when it applies.

(Note that netCDF-4 actually uses HDF5 under the covers.(verify))

If you want parallel access, you may want HDF5 instead.

Hierarchical Data Format (HDF)


Now typically meaning HDF5 (though HDF4 still sees some use).

Describes a data model, library, and file format.


  • hierarchical refers to the fact that, addressing-wise, it basically implements filesystem-like names within it
  • stores various array-like stuff, and is halfway clever about readout of parts from huge datasets
  • structured, potentially complex data
    • primarily cases where you have many items following the same structure, and often numerical data (but it can store others)
    • think time series, large sets of 2D or 3D imagery (think also meteorology), raster images, tables, other n-dimensional arrays
  • fast lookup on predictable items
    • offset lookups in arrays, B-tree indexes where necessary; consideration for strides, random access, and more
    • a little forethought helps, though
  • portable data (a settled binary format, and self-described files)
  • dealing with ongoing access to possibly-growing datasets
  • parallel IO has been considered
    • multiple applications accessing a dataset, parallelizing IO accesses, allowing use on clustered filesystems

See also


Other/similar include:

  • tdb, 'trivial database' (from Samba, and relatively recent), safe concurrent access
  • SQLite, more modern and scalable.
  • cdb (Constant DataBase), basically a fast on-disk associative array [3] [4]