File database notes

From Helpful
Revision as of 21:14, 14 December 2010 by Helpful (Talk | contribs)

(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search

For other database related things, see:

These are primarily notes
It won't be complete in any sense.
It exists to contain fragments of useful information.

Simple file databases are useful to persist data with little required structure, and for goals that require storing (large amounts of) data on disk with pretty quick hash-based lookups.

Many such databases are based directly or less directly on the old, simple, yet effective dbm, a (family of) database engine(s) that stores key-value mappings (both strings) in a file (sometimes two, splitting out some metadata/index(verify)).

Implementations usually hashing for fast retrieval, allows rebucketing to have the hashing scale well, and fixed-size buckets to allow relatively efficient modification.

They are also libraries, so easy to embed and avoid network serving details.

dbm family

Within the dbm family, the ones that are interesting enough to use are probably roughly:

  • dbm, 'database manager' (1979, AT&T)
  • ndbm, 'new database manager' (1986, Berkeley), added support for the library to deal with multiple open databases
  • gdbm, 'GNU database manager', also added arbitrary-length data
  • Berkeley DB (a.k.a. BDB and sometimes BSD DB), optional concurrent access, transactions and such (ACIDity), etc. For more history, see Berkeley DB notes below.

See also the various derivations and successors:


Berkeley DB notes

Berkeley DB (also known as BDB, and libdb) stores key-value maps in files. It is a library instead of a server, so can be embedded, and is used like that more often than many people are aware of.

For certain ends it is faster than other types of database, particularly if the client-server model and its implied overhead is not necessary, and/or the relational data model is not particularly fit for the data.

Technical notes

Both key and value are byte arrays; the application has to decide how it wishes to format data. Both key and value can be 232 bytes. A database file up to 248 bytes (256TB, which is more than most current filesystems support).

It uses a cache to avoid lookup slowness, and a write-back cache to be more write-efficient.

The low-level interface does not support concurrent write access from multiple processes, but it does itself have provisions for locking, transactions, logging, and such. You do have to choose to use them (if you don't, you get much simpler access to the database).

From different processes, you would use DBEnv(verify) to get BDB to use proper exclusion. Most features you have to explicitly ask for via options, which also control whether the env is safe e.g. for multithreading but not between processes, safe between for several processes, etc. See things like DBEnv::open() (e.g. DB_THREAD, (lack of) DB_PRIVATE), and also notes on shared memory regions.

Interesting aspects/features:

  • it being a library means it runs in the app's address space, minimizing cross-process copying and required context switches
  • caching in shared memory
  • option for mmapped read-only access (without the cache)
  • option to keep database in memory rather than on disk
  • concurrent access:
    • writeahead logging or MVCC
    • locking (fairly fine-grained)
    • transactions (ACID), and recovery
  • hot backups
  • Distribution:
    • replication
    • commits to multiple stores (XA interface), (since 2.5)

Format/access types

There are multiple types of access / file format. They provide mostly the same functionality (keyed access as well as iteration over the set); the difference mostly in performance, and only when the data is large, since if all data fits in the cache, this is an near-non-issue.

For larger data sets you should consider how each type fits the way you access your data.

If your keys do not order the entries, you should consider hash or btree. When keys are ordered record numbers, you should probably go with recno, a.k.a. record, (fixed or variable-length records).

You can supply your own comparison and hash functions.

More details:

  • Hash (DB_HASH)
    • uses extended linear hashing; scales well and keeps minimal metadata
    • supports insert and delete by key equality
    • allows iteration, but in arbirtary order
  • B+tree (DB_BTREE)
    • ordered by keys (according to the comparison function defined at creation time. You can use this for access locality)
    • allows lookup by range
    • also keeps record numbers and allows access by them, but note that these change with changes in the tree, so are mostly useful for use by recno:
  • recno (DB_RECNO)
    • ordered records
    • fast sequential access
    • also with key-based random access - it is actually built on B+tree but generates keys internally
  • queue (DB_QUEUE)
    • fixed-size records
    • fast sequential access

You can also open a BDB using DB_UNKNOWN, in which case the open call determines the type.

There are provisions to join databases on keys.

Versions and licenses

This article/section is a stub — probably a pile of half-sorted notes, is not well-checked so may have incorrect bits. (Feel free to ignore, fix, or tell me)

(Note: this is probably incorrect in parts -- it's hard to find good details)

Early versions came from a library for simple hash, and later b-tree records.

  • BDB 1.85 was a version for BSD 4.4, released in 1992 under a BSD-like license (that makes it behave like the GPL)

A Netscape request for further features led to the creation of Sleepycat Software in 1996

  • BDB 2.1 was released in 1997, adding concurrent access, transactions

Sleepycat Software was acquired by Oracle Corporation in 2006. This seems to have had no licensing consequences on versions since then (verify)

There are now three products:

  • Berkeley DB (the basic C library)
  • Berkeley DB Java, a Java version with options for object persistence
  • Berkeley DB XML, an XML database using XQuery/XPath (see e.g. [1])

Some additional licenses apply to the latter.

Versions 2 and later, including the Oracle additions, are dual-licensed, under either:

  • The Sleepycat Commercial License: A purchased license that does not force redistribution
  • The Sleepycat Public License: software that uses Berkeley DB must be distributed (note: excludes in-house development), but apparently, the case of language support/wrapping is a special one (similar to but not the same as LGPL) as in cases like python and perl interfaces, the software that uses BDB is the language, not any scripts that use BDB. This does not seem to apply in the case of C library use(verify).

In other words, you generally have three options:

  • use freely while the application is and stays entirely personal / internal to your business
  • you have to distribute the source of the application that uses BDB
  • you have to get a paid license

See also

SQLite notes

SQLite is a lightweight file-and-library-based (so serverless, and no-configuration) database engine. (more complex than other simple file-based databases; see also File database notes)

For its small size it has an interesting number of features, including support for most of SQL92, ACIDity (using file-based locking) recovery using simple journaling, constraints, views, querying multiple databases (as in differnt files) and other things that make it practically usable.

It scales fairly well, better than you may think when you hear 'file-based'; a few dozen concurrent connections will work, although it does not have the scalability guarantees or e.g. row-level locking that other things may have, but since larger database systems write everything just as much, that's not where the big difference is. (The difference lies more in things like index caching)

It can be useful to offload some logic that SQL can handle, to have potentially complex data storage instead of custom file formats, to have persistence in applications, to back a simple object-relational layer to the same effect, with just the dependency on the sqlite library. Since database files are independent it is interchangable between programs that use the same major version of the sqlite library, these databases can pretend to be file formats and act as a structured exchange format.

Some creative uses include local caching of data from a loaded central database (also because you can also have memory-only tables) and even inter-process data interchange.

Missing features include

  • foreign keys (can be worked around with triggers)
  • VIEW writing
  • permissions (...with the reason that it's fairly pointless if you can read the file anyway)

Partial features include:

  • ALTER TABLE means depending on the alteration you wish you may have to create a new table instead (this is true for various larger database systems too, they just do it more automatically)
  • joins: the less common RIGHT join and FULL OUTER join are unimplemented, the other joins are present
  • triggers


String or BLOB exceeded size limit seems to mean that one or both of:

  • a query size exceeded SQLITE_MAX_SQL_LENGTH (1MB by default) (note that you can avoid this by using parametric values (binding) instead of literal query strings)
  • a row, text field, or blob field exceeded SQLITE_MAX_LENGTH (1GB by default)

(possibly other limits)

If course, you can set these these while compiling. SQLITE_MAX_LENGTH may be 2^31-1, and SQLITE_MAX_SQL_LENGTH may be no larger than SQLITE_MAX_LENGTH (and possibly also no larger than 1GB?(verify) -- not that wanting a 2GB query is very sane...


Other/similar include:

  • tdb, 'trivial database' (from Samba, and relatively recent), safe concurrent access
  • SQLite, more modern and scalable.
  • cdb (Constant DataBase), basically a fast on-disk associative array [3] [4]