SQLite notes: Difference between revisions

From Helpful
Jump to navigation Jump to search
Line 80: Line 80:


===On concurrency===
===On concurrency===
{{stub}}


Say the writers: "SQLite is not designed to replace Oracle. It is designed to replace fopen()"
Say the writers: "SQLite is not designed to replace Oracle. It is designed to replace fopen()"
Line 90: Line 89:




====Concurrency from distinct tasks====




Line 121: Line 121:




====Concurrent operations====
{{stub}}
<!--
How safety from multiple processes is ''relates'' to how proper atomic commits are done,
How safety from multiple processes is ''relates'' to how proper atomic commits are done,
namely in that it's largely details in the rollback journal, or WAL,  
namely in that it's largely details in the rollback journal, or WAL.
that help give it.
 
 
Note that, before we dig into underlying details,  
that as in general you can always unintentionally limit / restrict concurrency, e.g. by
 
* keeping transactions open (e.g. autocommit off),
 
* counting on garbage collection rather than explicit close() for cursors
 
* database middleware doing their own thing to make things worse
:: this can make a lot of sense in a "make everything act the same"
 
 
 
Comparison:
* in the absence of writing, both equally allow lots of concurrent reading
 
* WAL allows writes to collect while reading is still going on; rollback journal does not
 
* WAL is more about concurrency than speed
: there are only some cases where more concurrency means more speed




The default is using a '''rollback journal'''.
This keeps the original and replacement version of what we are touching,
in a separate file (good against corruption).
A commit, and deletion of this rollback journal (almost equivalent events)
can happen without much [[write hole]] trouble, and a rollback is also well defined.
This happens to be a fairly simple and fairly safe.




The alternative is the ''WAL'', also a separate file, that can collect multiple ongoing transactions.
The details of the two also dictate what operations can be concurrent or not.
It is more concurrent, but has a few more requirements.


The difference is less about speed, and more about concurrency
(there are only cases where more concurrency means more speed, sure).


WAL may have less lock contention going on.


One important difference is that, without the "one writer at most" (so that the commit can happen when it closes),
The default is using a '''rollback journal'''.
under WAL it becomes "the WAL file keep growing until a checkpoint (which seems to be when the last concurrent connection closes{{verify}})", and that is less guaranteed to happen quickly/regularly.
: This
:: copies the original version of what we are altering in a separate file
:: we alter the database file itself directly
:: and close=commit=remove that separate file.
: after a crash, the separate file is replayed to fix the database file
: rollback of an ongoing transaction is similar
: which means
:: you can have many concurrent readers
:: ...but reading is exclusive with writing






You can always unintentionally restrict concurrency, e.g. by
The alternative is ''WAL'' style journaling
: It
:: collects writes into a separate file (from multiple transactions)
:: the database is as of yet untouched (so separate from ongoing reads)
:: only writes into the database file at checkpoint time.
: which means
:: reads often don't block writes (as is easily true with rollback journal).
:: writes don't block reads
:: writes ''do'' still block other writes (there's only one WAL)
:: checkpointing (=writing WAL contents into the database) ''can'' be largely concurrent with reading ''but'' there are more hold-up edge cases.
 


* keeping transactions open (e.g. autocommit off),


* counting on garbage collection rather than explicit close() for cursors
:: writes are a little faster, because they happen once, not twice as with rollback journaling
:: reads can be slower because each reader must sometimes check whether the content they need is in the WAL
:: you can balance it towards somewhat faster reads, or writes, by controlling the WAL size


* database middleware doing their own thing to make things worse
:: this can make a lot of sense in a "make everything act the same"


As such, read-heavy workloads are probably more concurrent.


-->
<!--
<!--
Say, in a nonsense
I can have 10 concurrent at 30 requests/sec each no issue


Say, in a nonsense benchmark I can have 10 concurrent at 30 requests/sec each no issue
...but in real code I can get it to break with two processes and a low transaction rate
...but in real code I can get it to break with two processes and a low transaction rate



Revision as of 12:01, 24 January 2024

Database related

More theoretical - thinking about databases:

Everyday-use notes

For some introduction, see Some_databases,_sorted_by_those_types#SQLite

On text coding

On SQLite typing
This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

This page mentions a lot of what you want to know.


For contrast: where most RDBMSes are statically typed (converts data to the column's type on storage, fails if it can't)...

...SQLite is more like dynamic typing in that SQLite decides the type from the value you give it, and only somewhat cares about the according column's type.


A little more precisely, the schema defines a type affinity, not a rigid type.

'type affinity' here meaning 'the value may still be of any type, but we prefer this one if that works'

The SQL types map to the affinity of INTEGER, REAL, NUMERIC, TEXT, BLOB, or NULL

e.g. INTEGER is split into seven specific-sized things -- on disk, anyway; in memory all are loaded into int64(verify)
TEXT is stored as UTF-8, UTF-16LE, or UTF-16BE (according to the database encoding) (verify)

For example(verify)

  • if you store into a numeric column, it will do something like
    • if not well formed as an integer or real, it's stored as TEXT
    • if it's a real number (with digit) and store as REAL if (seems like float64 so if there was more precision in the text that will be lost)
    • if it's an integer, and find the smallest integer type that will store it
    • if it's an integer larger than int64 can store, it will try REAL
    • hex is stored as text


Assume that the schema is not used for conversions going out

AFAICT, what you get out is whatever type that got stored, and it is the logic on insert that is interesting.

...at least, that's what synamically typed languages tend to do with it. Statically typed languages (including SQLite's onw API) may get you to convert it, and/or makes you test for it(verify).



On concurrency

Say the writers: "SQLite is not designed to replace Oracle. It is designed to replace fopen()"


That said, it does about as much as it can with what the filesystem provides, which is pretty great on a single device.

...but keep in mind there are limits - and that filesystems vary. In particular some network filesystems do not do enough.


Concurrency from distinct tasks

Multiple threads

Standard message of threads are evil, avoid doing this.

Yet if you insist: SQLite can be compiled to be thread-safe, and usually is. You can query that compile option from the database at runtime if you want to be sure. (the reason you can compile with and without is that removing this safety is slightly faster for cases you know will always be single-threaded)

This page points out there is a further distinction between multi-threaded (where threads must not share a connection) and serialized (where that appears to be fine).


Some libraries may have their own decisions - e.g. python3's sqlite3 will complain SQLite objects created in a thread can only be used in that same thread, though that seems to be overzealous.


Multiple processes

Multiple processes can read the same database.

Only one can write at a time. It is somewhat up to clients to error out, or wait and/or time out(verify)

Limitations:

  • locking (and the journaling that relies on it) relies on some filesystem semantics
assume this will not work over NFS
and there are issues with SMB (a.k.a. windows network mounts) you probably want to know about.
  • You should not carry an open connection though a fork


Concurrent operations

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.


See also:

Errors