issue_comments

16 rows where issue = 1096558279 sorted by updated_at descending

user 2

  • simonw 9
  • fgregg 7

author_association 2

  • OWNER 9
  • CONTRIBUTOR 7

issue 1

  • create-index should run analyze after creating index · 16
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions issue performed_via_github_app
1009548580 https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1009548580 https://api.github.com/repos/simonw/sqlite-utils/issues/365 IC_kwDOCGYnMM48LH0k fgregg 536941 2022-01-11T02:43:34Z 2022-01-11T02:43:34Z CONTRIBUTOR

thanks so much! always a pleasure to see how you work through these things

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
create-index should run analyze after creating index 1096558279  
1009521921 https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1009521921 https://api.github.com/repos/simonw/sqlite-utils/issues/365 IC_kwDOCGYnMM48LBUB simonw 9599 2022-01-11T01:37:53Z 2022-01-11T01:37:53Z OWNER

I decided to go with making this opt-in, mainly for consistency with the other places where I added this feature (see #379 and #366).

You can now run the following:

sqlite-utils create-index mydb.db mytable mycolumn --analyze

And ANALYZE will be run on the index once it has been created.
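As a rough illustration of what the opt-in flag amounts to at the SQLite level, here is a minimal sketch using Python's standard `sqlite3` module rather than sqlite-utils itself. The in-memory database, the sample data and the index name `idx_mytable_mycolumn` are assumptions made for the example, not necessarily what the CLI generates.

```python
import sqlite3

# Assumption: an in-memory database standing in for mydb.db
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INTEGER PRIMARY KEY, mycolumn TEXT)")
conn.executemany(
    "INSERT INTO mytable (mycolumn) VALUES (?)",
    [(f"value-{i % 10}",) for i in range(1000)],
)

# Create the index, then collect statistics for just that index
conn.execute("CREATE INDEX idx_mytable_mycolumn ON mytable (mycolumn)")
conn.execute("ANALYZE idx_mytable_mycolumn")

# ANALYZE stores its results in the sqlite_stat1 table
stats = conn.execute("SELECT tbl, idx, stat FROM sqlite_stat1").fetchall()
print(stats)
```

The `stat` column holds the numbers the query planner consults when deciding whether the index is worth using.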

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
create-index should run analyze after creating index 1096558279  
1008275546 https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1008275546 https://api.github.com/repos/simonw/sqlite-utils/issues/365 IC_kwDOCGYnMM48GRBa fgregg 536941 2022-01-09T11:01:15Z 2022-01-09T13:37:51Z CONTRIBUTOR

i don’t want to be such a partisan for analyze, but the query planner deciding not to use an index based on information collected by analyze is not necessarily a bug; it could be the correct choice.

<s>the original poster in that stack overflow doesn’t say there’s a performance regression </s>

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
create-index should run analyze after creating index 1096558279  
1008229839 https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1008229839 https://api.github.com/repos/simonw/sqlite-utils/issues/365 IC_kwDOCGYnMM48GF3P simonw 9599 2022-01-09T04:51:44Z 2022-01-09T04:51:44Z OWNER

Found one report on Stack Overflow from 9 years ago of someone seeing broken performance after running ANALYZE, hard to say that's a trend and not a single weird edge-case though! https://stackoverflow.com/q/12947214/6083

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
create-index should run analyze after creating index 1096558279  
1008163585 https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1008163585 https://api.github.com/repos/simonw/sqlite-utils/issues/365 IC_kwDOCGYnMM48F1sB simonw 9599 2022-01-08T22:14:39Z 2022-01-09T03:03:07Z OWNER

The reason I'm hesitating on this is that I've not actually used ANALYZE at all in nearly five years of messing around with SQLite! So I'm nervous that there are surprise downsides I haven't thought of.

My hunch is that ANALYZE is only worth worrying about on much larger databases, in which case I'm OK supporting it as a thoroughly documented power-user feature rather than a default.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
create-index should run analyze after creating index 1096558279  
1008166084 https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1008166084 https://api.github.com/repos/simonw/sqlite-utils/issues/365 IC_kwDOCGYnMM48F2TE fgregg 536941 2022-01-08T22:32:47Z 2022-01-08T22:32:47Z CONTRIBUTOR

or using “pragma optimize”

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
create-index should run analyze after creating index 1096558279  
1008164786 https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1008164786 https://api.github.com/repos/simonw/sqlite-utils/issues/365 IC_kwDOCGYnMM48F1-y fgregg 536941 2022-01-08T22:24:19Z 2022-01-08T22:24:19Z CONTRIBUTOR

the out-of-date scenario you describe could be addressed by automatically adding an analyze to the insert or convert commands if they implicate an index
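A minimal sketch of that idea, written against Python's standard `sqlite3` module: after inserting, check whether the table has any indexes and, if it does, refresh the statistics. The helper names `table_has_indexes` and `insert_rows` are hypothetical and are not part of sqlite-utils.

```python
import sqlite3


def table_has_indexes(conn, table):
    # PRAGMA index_list returns one row per index defined on the table
    return len(conn.execute(f'PRAGMA index_list("{table}")').fetchall()) > 0


def insert_rows(conn, table, column, values):
    # Hypothetical helper: insert the values, then re-run ANALYZE
    # only if the table has at least one index
    conn.executemany(
        f'INSERT INTO "{table}" ("{column}") VALUES (?)',
        [(value,) for value in values],
    )
    if table_has_indexes(conn, table):
        conn.execute(f'ANALYZE "{table}"')


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, category TEXT)")
conn.execute("CREATE INDEX idx_records_category ON records (category)")
insert_rows(conn, "records", "category", [f"cat-{i % 5}" for i in range(500)])

stat = conn.execute(
    "SELECT stat FROM sqlite_stat1 WHERE idx = 'idx_records_category'"
).fetchone()[0]
print(stat)
```

The trade-off is that every insert into an indexed table pays for an extra ANALYZE pass, which may not be acceptable for very large tables.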

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
create-index should run analyze after creating index 1096558279  
1008164116 https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1008164116 https://api.github.com/repos/simonw/sqlite-utils/issues/365 IC_kwDOCGYnMM48F10U fgregg 536941 2022-01-08T22:18:57Z 2022-01-08T22:18:57Z CONTRIBUTOR

the table with the query that ran so badly was about 50k.

i think the scenario should not be worse than no stats.

i also did not know that sqlite was so different from postgres and needed an explicit analyze call.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
create-index should run analyze after creating index 1096558279  
1008163050 https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1008163050 https://api.github.com/repos/simonw/sqlite-utils/issues/365 IC_kwDOCGYnMM48F1jq simonw 9599 2022-01-08T22:10:51Z 2022-01-08T22:10:51Z OWNER

Is there a downside to having a sqlite_stat1 table if it has wildly incorrect statistics in it?

Imagine the following sequence of events:

  • User imports a few records, creating the table, using sqlite-utils insert
  • User runs sqlite-utils create-index ... which also creates and populates the sqlite_stat1 table
  • User runs insert again to populate several million new records

The user now has a database file with several million records and a statistics table that is wildly out of date, having been populated when they only had a few.

Will this result in surprisingly bad query performance compared to if that statistics table did not exist at all?

If so, I lean much harder towards ANALYZE as a strictly opt-in optimization, maybe with the --analyze option added to sqlite-utils insert too, to help users opt in to updating their statistics after running big inserts.
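The sequence of events above can be reproduced in a small sketch with Python's standard `sqlite3` module. It only shows that the statistics do not update themselves after later inserts; it does not measure whether stale statistics actually hurt query performance. The table, index and helper names are assumptions made for the example.

```python
import sqlite3


def index_stat(conn, index_name):
    # Hypothetical helper: return the stat string recorded for an index
    row = conn.execute(
        "SELECT stat FROM sqlite_stat1 WHERE idx = ?", (index_name,)
    ).fetchone()
    return row[0] if row else None


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, category TEXT)")

# Step 1: import a few records
conn.executemany(
    "INSERT INTO records (category) VALUES (?)", [("a",), ("b",), ("c",)]
)

# Step 2: create the index and populate sqlite_stat1
conn.execute("CREATE INDEX idx_records_category ON records (category)")
conn.execute("ANALYZE")
stat_before = index_stat(conn, "idx_records_category")

# Step 3: insert many more records without running ANALYZE again
conn.executemany(
    "INSERT INTO records (category) VALUES (?)",
    [(f"cat-{i % 50}",) for i in range(10_000)],
)
stat_stale = index_stat(conn, "idx_records_category")

# Running ANALYZE again brings the statistics up to date
conn.execute("ANALYZE")
stat_fresh = index_stat(conn, "idx_records_category")

print(stat_before, stat_stale, stat_fresh, sep=" | ")
```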

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
create-index should run analyze after creating index 1096558279  
1008161965 https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1008161965 https://api.github.com/repos/simonw/sqlite-utils/issues/365 IC_kwDOCGYnMM48F1St fgregg 536941 2022-01-08T22:02:56Z 2022-01-08T22:02:56Z CONTRIBUTOR

for options 2 and 3, i would worry about discoverability.

in other dbs it is not necessary to explicitly call analyze for most indices, e.g. for postgres:

The system regularly collects statistics on all of a table's columns. Newly-created non-expression indexes can immediately use these statistics to determine an index's usefulness.

i suppose i would propose raising a warning if the stats table is created that explains what is going on and informs users about a --no-analyze argument.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
create-index should run analyze after creating index 1096558279  
1008158357 https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1008158357 https://api.github.com/repos/simonw/sqlite-utils/issues/365 IC_kwDOCGYnMM48F0aV simonw 9599 2022-01-08T21:33:07Z 2022-01-08T21:33:07Z OWNER

The one thing that worries me a little bit about doing this by default is that it adds a surprising new table to the database - it may be confusing to users if they run create-index and their database suddenly has a new sqlite_stat1 table, see https://github.com/simonw/sqlite-utils/issues/366#issuecomment-1008157132

Options here are:

  • Do it anyway. People can tolerate a surprise table appearing when they create an index.
  • Only run ANALYZE if the user says sqlite-utils create-index ... --analyze
  • Use the --analyze option, but also automatically run ANALYZE if they create an index and the database they are working with already has a sqlite_stat1 table

I'm currently leaning towards that third option - @fgregg any thoughts?
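The third option could be sketched roughly as follows, using Python's standard `sqlite3` module. The `create_index` and `has_stat_table` helpers and the index naming scheme are hypothetical and do not reflect how sqlite-utils implements anything.

```python
import sqlite3


def has_stat_table(conn):
    # sqlite_stat1 is listed in sqlite_master once ANALYZE has been run
    row = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = 'sqlite_stat1'"
    ).fetchone()
    return row is not None


def create_index(conn, table, column, analyze=False):
    # Hypothetical helper: run ANALYZE if explicitly requested, or if the
    # database already has a sqlite_stat1 table
    index_name = f"idx_{table}_{column}"
    conn.execute(f'CREATE INDEX "{index_name}" ON "{table}" ("{column}")')
    if analyze or has_stat_table(conn):
        conn.execute(f'ANALYZE "{index_name}"')
    return index_name


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, a TEXT, b TEXT, c TEXT)")
conn.executemany(
    "INSERT INTO t (a, b, c) VALUES (?, ?, ?)",
    [(str(i), str(i % 3), str(i % 7)) for i in range(100)],
)

create_index(conn, "t", "a")                # no stats table yet, so no ANALYZE
no_stats_yet = not has_stat_table(conn)
create_index(conn, "t", "b", analyze=True)  # explicit opt-in creates sqlite_stat1
third = create_index(conn, "t", "c")        # analyzed because stats now exist
```

With this logic a database never gains a sqlite_stat1 table unless the user asked for one, but once it exists new indexes are covered by statistics as well.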

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
create-index should run analyze after creating index 1096558279  
1007643254 https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1007643254 https://api.github.com/repos/simonw/sqlite-utils/issues/365 IC_kwDOCGYnMM48D2p2 simonw 9599 2022-01-07T18:37:56Z 2022-01-07T18:37:56Z OWNER

Or I could leave off --no-analyze and tell people that if they want to add an index without running analyze they can execute the CREATE INDEX themselves.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
create-index should run analyze after creating index 1096558279  
1007642831 https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1007642831 https://api.github.com/repos/simonw/sqlite-utils/issues/365 IC_kwDOCGYnMM48D2jP simonw 9599 2022-01-07T18:37:18Z 2022-01-07T18:37:18Z OWNER

After implementing #366 I can make it so sqlite-utils create-index automatically runs db.analyze(index_name) afterwards, maybe with a --no-analyze option in case anyone wants to opt out of that for specific performance reasons.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
create-index should run analyze after creating index 1096558279  
1007636709 https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1007636709 https://api.github.com/repos/simonw/sqlite-utils/issues/365 IC_kwDOCGYnMM48D1Dl fgregg 536941 2022-01-07T18:28:33Z 2022-01-07T18:29:43Z CONTRIBUTOR

i added an index to one table with sqlite-utils, and then a query that used to take about 1 second started taking hundreds of seconds.

running analyze got me back to sub second speed.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
create-index should run analyze after creating index 1096558279  
1007634999 https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1007634999 https://api.github.com/repos/simonw/sqlite-utils/issues/365 IC_kwDOCGYnMM48D0o3 simonw 9599 2022-01-07T18:26:22Z 2022-01-07T18:26:22Z OWNER

I've not used the ANALYZE feature in SQLite at all before. Should probably add Python library methods for it.

Annoyingly I use the word "analyze" to mean something else in the CLI, for these features: #207 and #320.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
create-index should run analyze after creating index 1096558279  
1007633376 https://github.com/simonw/sqlite-utils/issues/365#issuecomment-1007633376 https://api.github.com/repos/simonw/sqlite-utils/issues/365 IC_kwDOCGYnMM48D0Pg simonw 9599 2022-01-07T18:24:07Z 2022-01-07T18:24:07Z OWNER

Relevant documentation: https://www.sqlite.org/lang_analyze.html

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
create-index should run analyze after creating index 1096558279  

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
, [performed_via_github_app] TEXT);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);
Powered by Datasette · Queries took 20.632ms · About: github-to-sqlite