SQLite notes: Difference between revisions

From Helpful
Jump to navigation Jump to search
mNo edit summary
Line 331: Line 331:
  SELECT name ,SUM(pgsize)/1024 table_size  FROM "dbstat" GROUP BY name ORDER BY table_size desc;
  SELECT name ,SUM(pgsize)/1024 table_size  FROM "dbstat" GROUP BY name ORDER BY table_size desc;
(note: requires SQLITE_ENABLE_DBSTAT_VTAB)
(note: requires SQLITE_ENABLE_DBSTAT_VTAB)


-->
-->
===OPTIMIZE, ANALYZE, etc===
'''OPTIMIZE'''
Roughly speaking,
the query planner counts cases where (on a given connection) it would have been useful to have (any, or newer) ANALYZE results at hand;
optimize does an ANALYZE when
As such, it can be quite useful to do an OPTIMIZE before you close a connection -- if the table never changes this is usually a no-op,
and the times at which it ''does'' do work is probably good for subsequent access.
ANALYZE (by default) does a full scan of every index, which ''can'' be slow for larger data.
You limit the amount of work any one ANALYZE via {{inlinecode|PRAGMA analysis_limit}}.
You should conside this an ''approximate'' ANALYZE (may still be accurate if there was little work to be done, but don't count on this).
https://www.sqlite.org/lang_analyze.html
sqlite_stat1
: useful mainly to the query planner (contains information about tables and indices)
: allowed to be ''read'' (but not altered) by applications, which is sometimes useful
: https://www.sqlite.org/fileformat2.html#stat1tab
sqlite_stat4
: can be useful for the query planner (contains information about keys within an index, or primary key of a WITHOUT ROWID table)
sqlite_stat2
: Only a thing between 3.6.18 and 3.7.8; basically replaced by 3 and 4
sqlite_stat3
: can be useful for the query planner (contains information about the keys in an index)
: 4 reads 3, but does not write it;
: you can continue 3 deprecated if you use 4{{verify}}
https://www.sqlite.org/fileformat2.html#stat4tab
ANALYZE kv;
select * from sqlite_stat1;
kv|sqlite_autoindex_kv_1|433 1
sqlite> select count(*) from KV;
433
BUT note that
* that 433 is the size of the index, which we only know is useful when we e.g. know that comes from a PRIMARY KEY / UNIQUE, NOT NULL constraint
:
* before an ANALYZE, sqlite_stat1 will not exist
Apparently SQLite can only use one index for each table,
so EXPLAIN can be more important than it is for
to tell it's choosing the right one -- and in some cases you can push it towards a better index choice.
https://stackoverflow.com/questions/12947214/sqlite-analyze-breaks-indexes
List all indexes
* SELECT name  FROM sqlite_master  WHERE type='index';
* or, on the prompt, .indexes
SQLite automatically creates internal indexes for UNIQUE and PRIMARY KEY constraints, [https://sqlite.org/fileformat2.html#intschema]
: which will be called something like  sqlite_autoindex_table_''number''
There are also per-query temporary indexes created by the query optimizer [https://sqlite.org/optoverview.html#autoindex] when it estimates that creating that table in RAM is faster.
-->


===Errors===
===Errors===

Revision as of 17:06, 25 March 2024

Database related

More theoretical - thinking about databases:

Everyday-use notes

For some introduction, see Some_databases,_sorted_by_those_types#SQLite

On text coding

On SQLite typing
This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.

This page mentions a lot of what you want to know.


For contrast: where most RDBMSes are statically typed (converts data to the column's type on storage, fails if it can't)...

...SQLite is more like dynamic typing in that SQLite decides the type from the value you give it, and only somewhat cares about the according column's type.


A little more precisely, the schema defines a type affinity, not a rigid type.

'type affinity' here meaning 'the value may still be of any type, but we prefer this one if that works'

The SQL types map to the affinity of INTEGER, REAL, NUMERIC, TEXT, BLOB, or NULL

e.g. INTEGER is split into seven specific-sized things -- on disk, anyway; in memory all are loaded into int64(verify)
TEXT is stored as UTF-8, UTF-16LE, or UTF-16BE (according to the database encoding) (verify)

For example(verify)

  • if you store into a numeric column, it will do something like
    • if not well formed as an integer or real, it's stored as TEXT
    • if it's a real number (with digit) and store as REAL if (seems like float64 so if there was more precision in the text that will be lost)
    • if it's an integer, and find the smallest integer type that will store it
    • if it's an integer larger than int64 can store, it will try REAL
    • hex is stored as text


Assume that the schema is not used for conversions going out

AFAICT, what you get out is whatever type that got stored, and it is the logic on insert that is interesting.

...at least, that's what synamically typed languages tend to do with it. Statically typed languages (including SQLite's onw API) may get you to convert it, and/or makes you test for it(verify).



On concurrency

Say the writers: "SQLite is not designed to replace Oracle. It is designed to replace fopen()"


That said, it does about as much as it can with what the filesystem provides, which is pretty great on a single device.

...but keep in mind there are limits - and that filesystems vary. In particular some network filesystems do not do enough.


Concurrency from distinct tasks

Multiple threads

There is usually nothing to be gained from distinct threads accessing the same database, so maybe avoid doing this.

Yet if you insist: SQLite can be compiled to be thread-safe, and usually is. (the reason you can compile without is that removing this safety is slightly faster for cases you know will always be single-threaded)

You can query that compile option from the database at runtime if you want to be sure.

This page points out there is a further distinction between multi-threaded (where threads must not share a connection) and serialized (where that appears to be fine).


Some libraries may have their own decisions - e.g. python3's sqlite3 will complain SQLite objects created in a thread can only be used in that same thread, though that seems to be overzealous.


Multiple processes

Multiple processes can access the same database.

Assume that:

  • multiple can read concurrently
  • writes are not concurrent with other writes
  • writes are not concurrent with reads

It is somewhat up to clients to error out, or wait and/or time out(verify)


Things work a little better with a non-standard method, but it comes with more requirements.


Limitations:

  • locking (and the journaling that relies on it) relies on some filesystem semantics
assume this will not work properly over some network filesystems (e.g. NFS, some SMB, a.k.a. windows network mounts).
  • You should not carry an open connection though a fork


More on concurrent operations

This article/section is a stub — some half-sorted notes, not necessarily checked, not necessarily correct. Feel free to ignore, or tell me about it.


See also:


CLI admin

OPTIMIZE, ANALYZE, etc

OPTIMIZE

Roughly speaking, the query planner counts cases where (on a given connection) it would have been useful to have (any, or newer) ANALYZE results at hand; optimize does an ANALYZE when

As such, it can be quite useful to do an OPTIMIZE before you close a connection -- if the table never changes this is usually a no-op, and the times at which it does do work is probably good for subsequent access.


ANALYZE (by default) does a full scan of every index, which can be slow for larger data.

You limit the amount of work any one ANALYZE via PRAGMA analysis_limit. You should conside this an approximate ANALYZE (may still be accurate if there was little work to be done, but don't count on this).

https://www.sqlite.org/lang_analyze.html


sqlite_stat1

useful mainly to the query planner (contains information about tables and indices)
allowed to be read (but not altered) by applications, which is sometimes useful
https://www.sqlite.org/fileformat2.html#stat1tab

sqlite_stat4

can be useful for the query planner (contains information about keys within an index, or primary key of a WITHOUT ROWID table)

sqlite_stat2

Only a thing between 3.6.18 and 3.7.8; basically replaced by 3 and 4

sqlite_stat3

can be useful for the query planner (contains information about the keys in an index)
4 reads 3, but does not write it;
you can continue 3 deprecated if you use 4(verify)

https://www.sqlite.org/fileformat2.html#stat4tab


ANALYZE kv;

select * from sqlite_stat1;

kv|sqlite_autoindex_kv_1|433 1

sqlite> select count(*) from KV;

433

BUT note that

  • that 433 is the size of the index, which we only know is useful when we e.g. know that comes from a PRIMARY KEY / UNIQUE, NOT NULL constraint
  • before an ANALYZE, sqlite_stat1 will not exist


Apparently SQLite can only use one index for each table, so EXPLAIN can be more important than it is for to tell it's choosing the right one -- and in some cases you can push it towards a better index choice.


https://stackoverflow.com/questions/12947214/sqlite-analyze-breaks-indexes


List all indexes

  • SELECT name FROM sqlite_master WHERE type='index';
  • or, on the prompt, .indexes


SQLite automatically creates internal indexes for UNIQUE and PRIMARY KEY constraints, [1]

which will be called something like sqlite_autoindex_table_number

There are also per-query temporary indexes created by the query optimizer [2] when it estimates that creating that table in RAM is faster.

-->


Errors