issue_comments: 473709883

html_url: https://github.com/simonw/datasette/issues/419#issuecomment-473709883
issue_url: https://api.github.com/repos/simonw/datasette/issues/419
id: 473709883
node_id: MDEyOklzc3VlQ29tbWVudDQ3MzcwOTg4Mw==
user: 9599
created_at: 2019-03-17T20:09:47Z
updated_at: 2019-03-17T20:37:45Z
author_association: OWNER
issue: 421551434

Could I persist the last calculated count for a table and somehow detect if that table has been changed in any way by another process, hence invalidating the cached count (and potentially scheduling a new count)?

https://www.sqlite.org/c3ref/update_hook.html says that sqlite3_update_hook() can be used to register a handler invoked on almost all update/insert/delete operations to a specific table... except that it misses out on deletes triggered by ON CONFLICT REPLACE and only works for ROWID tables.

Also this hook is not exposed in the Python sqlite3 library - though it may be available using some terrifying ctypes hacks: https://stackoverflow.com/a/16920926
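A rough sketch of the kind of ctypes hack that answer describes - it reaches into the pysqlite_Connection struct to grab the raw sqlite3* pointer, so it depends on CPython implementation details and the exact layout could change between versions. The names here (on_change, etc.) are just placeholders; treat it as illustrative only:

```python
import ctypes
import ctypes.util
import sqlite3

# Load the SQLite shared library directly (assumes libsqlite3 can be found
# and is the same library the _sqlite3 extension module uses).
libsqlite3 = ctypes.CDLL(ctypes.util.find_library("sqlite3"))
libsqlite3.sqlite3_update_hook.restype = ctypes.c_void_p

# The raw sqlite3* pointer is the first field after PyObject_HEAD in the
# pysqlite_Connection struct - a CPython implementation detail.
class PySQLiteConnection(ctypes.Structure):
    _fields_ = [
        ("ob_refcnt", ctypes.c_ssize_t),
        ("ob_type", ctypes.c_void_p),
        ("db", ctypes.c_void_p),
    ]

# void callback(void *arg, int op, const char *db, const char *table, sqlite3_int64 rowid)
UPDATE_HOOK = ctypes.CFUNCTYPE(
    None, ctypes.c_void_p, ctypes.c_int,
    ctypes.c_char_p, ctypes.c_char_p, ctypes.c_int64,
)

@UPDATE_HOOK
def on_change(arg, op, db_name, table_name, rowid):
    # op is SQLITE_DELETE (9), SQLITE_INSERT (18) or SQLITE_UPDATE (23)
    print("change:", op, db_name.decode(), table_name.decode(), rowid)

conn = sqlite3.connect(":memory:")
db_pointer = PySQLiteConnection.from_address(id(conn)).db
# on_change must stay referenced (it is, at module level) or it gets GC'd
libsqlite3.sqlite3_update_hook(ctypes.c_void_p(db_pointer), on_change, None)

conn.execute("create table docs (id integer primary key, title text)")
conn.execute("insert into docs (title) values ('hello')")  # fires on_change
```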

So on further research, I think the answer is no: I should assume that it won't be possible to cache counts and magically invalidate the cache when the underlying file is changed by another process.

Instead I need to assume that counts will be an expensive operation.

As such, I can introduce a time limit on counts and use that anywhere a count is displayed. If the time limit is exceeded by the count(*) query I can show "many" instead.
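A sketch of how that time limit could work using set_progress_handler(), which the Python sqlite3 module does expose - the function name, the 20ms default and the 1000-instruction interval are arbitrary choices for illustration:

```python
import sqlite3
import time

def count_with_time_limit(conn, table, time_limit_ms=20):
    "Return the row count for `table`, or None if counting exceeded the limit."
    deadline = time.monotonic() + time_limit_ms / 1000

    def check_deadline():
        # A non-zero return value makes SQLite abort the running query
        return 1 if time.monotonic() > deadline else 0

    # Invoke check_deadline every 1000 SQLite virtual machine instructions
    conn.set_progress_handler(check_deadline, 1000)
    try:
        return conn.execute(
            "select count(*) from [{}]".format(table)
        ).fetchone()[0]
    except sqlite3.OperationalError:
        return None  # interrupted - display "many" instead
    finally:
        conn.set_progress_handler(None, 1000)
```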

That said... running count(*) against a table with 200,000 rows in it only takes about 3ms, so even a timeout of 20ms is likely to work fine for tables of around a million rows.

It would be really neat if I could generate a lower bound count in a limited amount of time. If I counted up to 4m rows before the timeout I could show "more than 4m rows". No idea if that would be possible though.
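One cheap approximation (not quite "count until the timeout", but bounded in cost) would be to cap how many rows the count is allowed to scan - a sketch, with count_with_cap and the 4m default purely illustrative:

```python
def count_with_cap(conn, table, cap=4_000_000):
    # The LIMIT inside the subquery means SQLite stops scanning after `cap`
    # rows, so the cost is bounded by the cap rather than the table size.
    n = conn.execute(
        "select count(*) from (select 1 from [{}] limit {})".format(table, cap)
    ).fetchone()[0]
    truncated = n >= cap  # True means "at least `cap` rows"
    return n, truncated
```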

Relevant: https://stackoverflow.com/questions/8988915/sqlite-count-slow-on-big-tables - reports of very slow counts on a 6GB database file. Consensus seems to be "yeah, that's just how SQLite is built" - though there was a suggestion that you can use select max(rowid) from table, provided you are certain there have been no deletions.

Also relevant: http://sqlite.1065341.n5.nabble.com/sqlite3-performance-on-select-count-very-slow-for-16-GB-file-td80176.html
