html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,issue,performed_via_github_app
https://github.com/simonw/datasette/issues/2058#issuecomment-1504426792,https://api.github.com/repos/simonw/datasette/issues/2058,1504426792,IC_kwDOBm6k_c5Zq7so,9599,2023-04-12T02:02:42Z,2023-04-12T02:02:42Z,OWNER,"I tightened up the benchmark (it was measuring the time taken to create the tables too) and got this: ![image](https://user-images.githubusercontent.com/9599/231328328-85ca35ac-a11b-46f4-b132-dae367103570.png)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1663399821,
https://github.com/simonw/datasette/issues/2058#issuecomment-1504328395,https://api.github.com/repos/simonw/datasette/issues/2058,1504328395,IC_kwDOBm6k_c5ZqjrL,9599,2023-04-12T00:28:38Z,2023-04-12T00:28:38Z,OWNER,"Here's a much better chart, which shows that MD5 performance unsurprisingly gets worse as the number of tables increases while `schema_version` remains constant: ![image](https://user-images.githubusercontent.com/9599/231316778-513bd99f-5ea4-495c-b86d-c572a7106369.png)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1663399821,
https://github.com/simonw/datasette/issues/2058#issuecomment-1504315697,https://api.github.com/repos/simonw/datasette/issues/2058,1504315697,IC_kwDOBm6k_c5Zqgkx,9599,2023-04-12T00:16:22Z,2023-04-12T00:27:12Z,OWNER,"I got ChatGPT (code execution alpha) to run a micro-benchmark for me. This was the conclusion: > The benchmark using `PRAGMA schema_version` is approximately 1.36 times faster than the benchmark using `hashlib.md5` for the case with 100 tables. For the case with 200 tables, the benchmark using `PRAGMA schema_version` is approximately 2.33 times faster than the benchmark using `hashlib.md5`. Here's the chart it drew me: ![image](https://user-images.githubusercontent.com/9599/231315366-3a12b6d3-08d7-419d-a1fd-36eb24da0d85.png) (It's a pretty rubbish chart though: it only took measurements at 100 and 200 tables and drew a line between the two. I should have told it to measure every 10 and plot that.) And the full transcript: https://gist.github.com/simonw/2fc46effbfbe49e6de0bcfdc9e31b235 The benchmark looks good enough on first glance that I don't feel the need to be more thorough with it. `PRAGMA schema_version` is faster, but not so fast that I feel like the MD5 hack is worth worrying about too much. I'm tempted to add something to the `/-/versions` page that tries to identify if this is a problem or not though.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1663399821,
https://github.com/simonw/datasette/issues/2058#issuecomment-1504298448,https://api.github.com/repos/simonw/datasette/issues/2058,1504298448,IC_kwDOBm6k_c5ZqcXQ,9599,2023-04-12T00:04:01Z,2023-04-12T00:04:01Z,OWNER,"Here's a potential workaround: when I store the schema versions, I could also store an MD5 hash of the full schema (`select group_concat(sql) from sqlite_master`). When I read the schema version with `PRAGMA schema_version` I could catch that exception and, if I see it, I could calculate that MD5 hash again as a fallback and use that to determine if the schema has changed instead. The performance overhead of this needs investigating - how much more expensive is `md5(... that SQL query result)` compared to just `PRAGMA schema_version`, especially on a database with a lot of tables?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1663399821,
https://github.com/simonw/datasette/issues/2058#issuecomment-1504295345,https://api.github.com/repos/simonw/datasette/issues/2058,1504295345,IC_kwDOBm6k_c5Zqbmx,9599,2023-04-12T00:01:42Z,2023-04-12T00:02:26Z,OWNER,"Here's the relevant code: https://github.com/simonw/datasette/blob/5890a20c374fb0812d88c9b0ef26a838bfa06c76/datasette/app.py#L421-L437 This function is called on almost every request (everything that subclasses `BaseView` at least - need to remember that for the refactor in #2053 etc): https://github.com/simonw/datasette/blob/5890a20c374fb0812d88c9b0ef26a838bfa06c76/datasette/views/base.py#L101-L103 It uses `PRAGMA schema_version` as a cheap way to determine if the schema has changed, in which case it needs to refresh the internal schema tables. This was already the cause of a subtle bug here: - #1231","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1663399821,
https://github.com/simonw/datasette/issues/2058#issuecomment-1504292145,https://api.github.com/repos/simonw/datasette/issues/2058,1504292145,IC_kwDOBm6k_c5Zqa0x,9599,2023-04-11T23:58:59Z,2023-04-11T23:58:59Z,OWNER,Asked on the SQLite Forum if anyone has seen this before: https://sqlite.org/forum/forumpost/793a2ed75b,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1663399821,
https://github.com/simonw/datasette/issues/2058#issuecomment-1504291892,https://api.github.com/repos/simonw/datasette/issues/2058,1504291892,IC_kwDOBm6k_c5Zqaw0,9599,2023-04-11T23:58:45Z,2023-04-11T23:58:45Z,OWNER,"I thought it might relate to the ""defensive mode"" issue described here: - https://github.com/simonw/sqlite-utils/issues/235 But I have since determined that the Datasette official Docker image does NOT run anything in defensive mode, so I don't think it's related to that.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",1663399821,