SQLite notes


For some introduction, see Some_databases,_sorted_by_those_types#SQLite

On text coding

On SQLite typing
This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

SQLite's own documentation on datatypes covers a lot of what you want to know here.


For contrast: where most RDBMSes are statically typed (they convert data to the column's type on storage, and fail if they can't)...

...SQLite is closer to dynamic typing, in that SQLite decides the storage type from the value you give it, and only somewhat cares about the column's declared type.


A little more precisely, the schema defines a type affinity, not a rigid type.

'type affinity' here meaning 'the value may still be of any type, but we prefer this one if that works'
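
To make that concrete: a minimal sketch using Python's standard sqlite3 module (the table name is made up), showing that the declared column type does not reject mismatched values:

  import sqlite3

  conn = sqlite3.connect(":memory:")
  conn.execute("CREATE TABLE t (n INTEGER)")                      # column with INTEGER affinity
  conn.execute("INSERT INTO t VALUES ('hello')")                  # most RDBMSes would refuse this
  print(conn.execute("SELECT n, typeof(n) FROM t").fetchone())    # ('hello', 'text')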

The declared SQL type maps to one of the affinities INTEGER, REAL, NUMERIC, TEXT, or BLOB; the values themselves are stored as one of the storage classes NULL, INTEGER, REAL, TEXT, or BLOB.

e.g. on disk, INTEGER values are stored in the smallest of a handful of fixed sizes (1, 2, 3, 4, 6, or 8 bytes) that fits; in memory all are handled as int64(verify)
TEXT is stored as UTF-8, UTF-16LE, or UTF-16BE (according to the database encoding)(verify)

For example(verify)

  • if you store text into a NUMERIC column, it will do something like (see the sketch after this list)
    • if it is not well formed as an integer or real, it is stored as TEXT
    • if it is well formed as a real, it is stored as REAL (which seems to be a float64, so any extra precision in the text is lost)
    • if it is well formed as an integer that fits in int64, it is stored as INTEGER, on disk in the smallest integer size that will hold it
    • if it is an integer too large for int64 to store, it falls back to REAL
    • hexadecimal text (e.g. '0x1A') does not count as well formed, so is stored as TEXT
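
A minimal sketch of those rules, using Python's standard sqlite3 module (the table is made up, and the expected output is my reading of the rules, so verify):

  import sqlite3

  conn = sqlite3.connect(":memory:")
  conn.execute("CREATE TABLE t (v NUMERIC)")
  conn.executemany("INSERT INTO t VALUES (?)",
                   [("123",), ("1.5",), ("0x1A",), ("hello",), ("9223372036854775808",)])
  for value, typ in conn.execute("SELECT v, typeof(v) FROM t"):
      print(repr(value), typ)
  # expected(verify): 123 integer, 1.5 real, '0x1A' text, 'hello' text,
  # and 9.223372036854776e+18 real (too large for int64, so stored as REAL)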


Assume that the schema is not used for conversions on the way out.

AFAICT, what you get out is whatever type got stored; it is the logic on insert that is interesting.

...at least, that's what dynamically typed languages tend to do with it. Statically typed languages (including SQLite's own C API) may get you to convert it, and/or make you test what type each value actually is(verify).
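
For instance, in Python (a sketch, same caveats as above), rows from one and the same column simply come back as whatever got stored:

  import sqlite3

  conn = sqlite3.connect(":memory:")
  conn.execute("CREATE TABLE t (v NUMERIC)")
  conn.executemany("INSERT INTO t VALUES (?)", [("12",), ("1.5",), ("hello",)])
  for (v,) in conn.execute("SELECT v FROM t"):
      print(type(v).__name__, repr(v))    # int 12, float 1.5, str 'hello'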



On concurrency

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

Say the writers: "SQLite is not designed to replace Oracle. It is designed to replace fopen()"


That said, it does about as much as it can with what the filesystem provides, which is pretty great on a single device.

...but keep in mind that filesystems vary. In particular, some network filesystems do not provide locking that is reliable enough for this.



Multiple threads

Threads are evil, avoid doing this.

But if you insist: SQLite can be compiled to be thread-safe, and usually is. You can query that compile option at runtime if you want to be sure. (The reason you can compile it with or without is that removing this safety is slightly faster, for cases you know will always be single-threaded.)

SQLite's documentation points out there is a further distinction between multi-threaded mode (where threads must not share a connection) and serialized mode (where sharing a connection appears to be fine).
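
A minimal sketch of checking how your build was compiled, from Python's standard sqlite3 module (the exact option strings may vary per build):

  import sqlite3

  conn = sqlite3.connect(":memory:")
  for (opt,) in conn.execute("PRAGMA compile_options"):
      if opt.startswith("THREADSAFE"):
          print(opt)    # THREADSAFE=1 is serialized, 2 is multi-thread, 0 is single-thread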


Some libraries make their own decisions - e.g. python3's sqlite3 will by default complain that SQLite objects created in a thread can only be used in that same thread, though that check seems overzealous (it can be disabled with check_same_thread=False).


Multiple processes

Multiple processes can read the same database.

Only one can write at a time. It is somewhat up to clients whether to error out immediately, or to wait and possibly time out(verify).

Limitations:

  • locking (and the journaling that relies on it) will not work over NFS, and there are issues with SMB / windows file sharing you probably want to know about
  • you should not carry an open connection through a fork


The safety with multiple processes is closely tied to the safety of atomic commits, in that both largely come down to details of the rollback journal, or the WAL.


The default is a rollback journal. It is kept as a separate file (good against corruption), and is the simpler mechanism.

WAL is more concurrent, but has a few more requirements.


The difference is not about speed, it is about concurrency (there are only some cases where more concurrency means more speed).


WAL may have less lock contention going on.

One important difference: with a rollback journal, cleanup happens as each writer's transaction commits; under WAL it becomes "the WAL file keeps growing until it is checkpointed back into the database, fully so when the last connection closes", and that last part is less guaranteed to happen regularly.


Also keep in mind that things like

  • keeping transactions open (e.g. autocommit off),
  • counting on garbage collection rather than an explicit close() for cursors,
  • database middleware doing its own thing (which can make a lot of sense from a "make every backend act the same" angle, but can make things worse here)

can all hold locks for longer than you might expect.


Say, in a nonsense benchmark I can have 10 concurrent writers at 30 requests/sec each with no issue...

...but in real code I can get it to break with two writers.


Note that if there is a SQL API layer on top that keeps transactions open (e.g. autocommit off), that rather changes what the lower levels can even give you.



The WAL needs to be re-incorporated into the main database, and this happens fully when the last connection to the database closes. With a very busy site you might find it takes a few seconds for all connections to close, but 100K hits per day should not be a problem.
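
If you would rather not rely on that, you can also checkpoint explicitly. A minimal sketch (the filename is a placeholder, and this assumes the database is in WAL mode):

  import sqlite3

  conn = sqlite3.connect("example.db")
  conn.execute("PRAGMA journal_mode=WAL")
  # ... writes happen here ...
  busy, log_pages, checkpointed = conn.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchone()
  print(busy, log_pages, checkpointed)    # busy=0 means the checkpoint ran to completion
  conn.close()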


journal_mode:

  • DELETE: rollback journal, deletes the file afterwards (default)
  • TRUNCATE: rollback journal, truncates the file (cheaper in some situations - fewer filesystem interactions)
  • PERSIST: rollback journal, zeroes out the header (cheaper in some situations)
  • MEMORY: rollback journal kept in RAM - no disk IO, but if we crash in the middle of a write, the database is likely to corrupt
  • WAL: write-ahead logging - more concurrent, but requires SQLite >= 3.7.0
  • OFF: no atomic commit or ROLLBACK, and the database is likelier to corrupt. Not recommended.
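
Switching is a PRAGMA; a minimal sketch with a placeholder filename (note that WAL, unlike the other modes, is recorded in the database file and so persists for later connections):

  import sqlite3

  conn = sqlite3.connect("example.db")
  print(conn.execute("PRAGMA journal_mode").fetchone())        # current mode, e.g. ('delete',)
  print(conn.execute("PRAGMA journal_mode=WAL").fetchone())    # returns the mode actually set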



"Why do I get "database is locked" all the time?"

Depends.

It often comes down to amount of interaction * average time spent in a transaction > timeout value - say, twenty clients each holding a transaction open for 0.3 seconds already adds up to around 6 seconds of serialized lock time, more than a 5-second timeout - but the actual reason why is often harder to discover.


With services in general, once your load is large enough that requests queue up, you have issues.


Given any concurrency, anything database-bound should be assumed to time out eventually, regardless of access technique, because 'database bound' means "queueing because we can't keep up", and the window where that queue fills up with one technique but not with another is narrow.


And yeah, you can delay the point at which that happens, for example by keeping each transaction as short as possible (try deciding what to send before, not after, you start a transaction).


Sure, in a good case you might get a three times higher commit rate with WAL enabled before things go south, but if your load was database-bound, the amount of erroring out will be much the same.

...or there may be no difference in commit rate at all, when the reason it was timing out had to do with inactivity rather than throughput.


That is because the timeout is defined in seconds of real time, so anything that leaves someone holding a lock while doing nothing (whether that is their own behaviour or caused by something external) is an issue.


As such, your "database is locked" error is more likely to be caused by

  • transactions that won't be committed until later
  • cursors left open (e.g. counting on the gc to clean them up)
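
Raising the busy timeout is the usual knob, but it only helps against waits shorter than that timeout - it does nothing against a connection that simply holds its transaction open. A minimal sketch (placeholder filename):

  import sqlite3

  conn = sqlite3.connect("example.db", timeout=30)    # Python-side: wait up to 30s for a lock
  conn.execute("PRAGMA busy_timeout = 30000")         # the same idea at the SQLite level, in ms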


The main one I ran into is "I have a big store that something is steadily inserting into", where having a separate process do a count sounds light enough - but once the store is multiple GBytes, listing its keys takes a second or two, which was right around the timeout I had set.


There can even be races that are arguably more like a thundering herd. In one benchmark, two processes hitting the same thing with zero waiting would have one block the other, while ten with a 0.00001-second sleep would (usually) not. ...usually, because a histogram of the timing showed that each process did most of its accesses in almost no time, and a few in something dangerously close to the timeout. It also showed that different processes got dissimilar amounts of accesses in (the total time each spent was similar, but the number of accesses was limited by the slower ones).



But more often, it's due to things like

  • "I turned autocommit off because a bulk commit is faster".
    It absolutely is. But it also tends to hold a transaction open for longer than the timeout (see the sketch below).
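
A minimal sketch of doing a bulk insert while keeping the transaction short: prepare the data first, then do one batched insert and commit promptly (the filename and table are placeholders):

  import sqlite3

  rows = [(i, "value %d" % i) for i in range(100000)]    # build the data *before* the transaction

  conn = sqlite3.connect("example.db")
  conn.execute("CREATE TABLE IF NOT EXISTS kv (k INTEGER, v TEXT)")
  with conn:                                             # one short transaction around the batch
      conn.executemany("INSERT INTO kv VALUES (?, ?)", rows)
  conn.close()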



See also:

Errors