github: issue_comments: 11 rows where issue = 1097091527 sorted by updated

11 rows where issue = 1097091527 sorted by updated_at descending

Search:

descending

id	html_url	issue_url	node_id	user	created_at	updated_at ▲	author_association	body	reactions	issue
1029402837	https://github.com/simonw/sqlite-utils/issues/369#issuecomment-1029402837	https://api.github.com/repos/simonw/sqlite-utils/issues/369	IC_kwDOCGYnMM49W3DV	simonw 9599	2022-02-03T21:07:35Z	2022-02-03T21:07:35Z	OWNER	Closing this - it was something I was curious about, but evidently not curious enough to actually do the work!	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Research how much of a difference analyze / sqlite_stat1 makes 1097091527
1008232075	https://github.com/simonw/sqlite-utils/issues/369#issuecomment-1008232075	https://api.github.com/repos/simonw/sqlite-utils/issues/369	IC_kwDOCGYnMM48GGaL	simonw 9599	2022-01-09T05:13:15Z	2022-01-09T05:13:56Z	OWNER	I think the query that will help solve this is: `explain query plan select * from ny_times_us_counties where state = 1 and county = 2` In this case, the query planner needs to decide if it should use the index for the `state` column or the index for the `county` column. That's where the statistics come into play. In particular: \| tbl \| idx \| stat \| \|----------------------\|---------------------------------\|---------------\| \| ny_times_us_counties \| idx_ny_times_us_counties_date \| 2092871 2915 \| \| ny_times_us_counties \| idx_ny_times_us_counties_fips \| 2092871 651 \| \| ny_times_us_counties \| idx_ny_times_us_counties_county \| 2092871 1085 \| \| ny_times_us_counties \| idx_ny_times_us_counties_state \| 2092871 37373 \| Those numbers are explained by this comment in the SQLite C code: https://github.com/sqlite/sqlite/blob/5622c7f97106314719740098cf0854e7eaa81802/src/analyze.c#L41-L55 There is normally one row per index, with the index identified by the name in the idx column. The tbl column is the name of the table to which the index belongs. In each such row, the stat column will be a string consisting of a list of integers. The first integer in this list is the number of rows in the index. (This is the same as the number of rows in the table, except for partial indices.) The second integer is the average number of rows in the index that have the same value in the first column of the index. So that table is telling us that using a value in the `county` column will filter down to an average of 1,085 rows, whereas filtering on the `state` column will filter down to an average of 37,373 - so clearly the `county` index is the better index to use here! Just one catch: against both my`covid.db` and my `covid-analyzed.db` databases the `county` index is picked for both of them - so SQLite is somehow guessing that `county` is a better index even though it doesn't have statistics for that.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Research how much of a difference analyze / sqlite_stat1 makes 1097091527
1008229341	https://github.com/simonw/sqlite-utils/issues/369#issuecomment-1008229341	https://api.github.com/repos/simonw/sqlite-utils/issues/369	IC_kwDOCGYnMM48GFvd	simonw 9599	2022-01-09T04:45:38Z	2022-01-09T04:47:11Z	OWNER	This is probably too fancy. I think maybe the way to do this is with `select * from [global-power-plants] where "country_long" = 'United Kingdom'` - then mess around with stats to see if I can get it to use the index or not based on them. Here's the explain for that: https://global-power-plants.datasettes.com/global-power-plants?sql=EXPLAIN+QUERY+PLAN+select+*+from+[global-power-plants]+where+%22country_long%22+%3D+%27United+Kingdom%27	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Research how much of a difference analyze / sqlite_stat1 makes 1097091527
1008227625	https://github.com/simonw/sqlite-utils/issues/369#issuecomment-1008227625	https://api.github.com/repos/simonw/sqlite-utils/issues/369	IC_kwDOCGYnMM48GFUp	simonw 9599	2022-01-09T04:25:38Z	2022-01-09T04:25:38Z	OWNER	`sql EXPLAIN QUERY PLAN select country_long, count() from [global-power-plants] group by country_long` https://global-power-plants.datasettes.com/global-power-plants?sql=EXPLAIN+QUERY+PLAN+select+country_long%2C+count%28%29+from+%5Bglobal-power-plants%5D+group+by+country_long SCAN TABLE global-power-plants USING COVERING INDEX "global-power-plants_country_long"	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Research how much of a difference analyze / sqlite_stat1 makes 1097091527
1008226862	https://github.com/simonw/sqlite-utils/issues/369#issuecomment-1008226862	https://api.github.com/repos/simonw/sqlite-utils/issues/369	IC_kwDOCGYnMM48GFIu	simonw 9599	2022-01-09T04:17:55Z	2022-01-09T04:17:55Z	OWNER	There are some clues as to what effect ANALYZE has in https://www.sqlite.org/optoverview.html Some quotes: SQLite might use a skip-scan on an index if it knows that the first one or more columns contain many duplication values. If there are too few duplicates in the left-most columns of the index, then it would be faster to simply step ahead to the next value, and thus do a full table scan, than to do a binary search on an index to locate the next left-column value. The only way that SQLite can know that there are many duplicates in the left-most columns of an index is if the ANALYZE command has been run on the database. Without the results of ANALYZE, SQLite has to guess at the "shape" of the data in the table, and the default guess is that there are an average of 10 duplicates for every value in the left-most column of the index. Skip-scan only becomes profitable (it only gets to be faster than a full table scan) when the number of duplicates is about 18 or more. Hence, a skip-scan is never used on a database that has not been analyzed. And Join reordering is automatic and usually works well enough that programmers do not have to think about it, especially if ANALYZE has been used to gather statistics about the available indexes, though occasionally some hints from the programmer are needed. And The various sqlite_statN tables contain information on how selective the various indexes are. For example, the sqlite_stat1 table might indicate that an equality constraint on column x reduces the search space to 10 rows on average, whereas an equality constraint on column y reduces the search space to 3 rows on average. In that case, SQLite would prefer to use index ex2i2 since that index is more selective.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Research how much of a difference analyze / sqlite_stat1 makes 1097091527
1008226487	https://github.com/simonw/sqlite-utils/issues/369#issuecomment-1008226487	https://api.github.com/repos/simonw/sqlite-utils/issues/369	IC_kwDOCGYnMM48GFC3	simonw 9599	2022-01-09T04:14:05Z	2022-01-09T04:14:05Z	OWNER	Didn't manage to spot a meaningful difference with that database either: `analyze % python3 -m timeit '__import__("sqlite3").connect("covid.db").execute("select fips, count() from [ny_times_us_counties] group by fips").fetchall()' 2 loops, best of 5: 101 msec per loop analyze % python3 -m timeit '__import__("sqlite3").connect("covid-analyzed.db").execute("select fips, count() from [ny_times_us_counties] group by fips").fetchall()' 2 loops, best of 5: 103 msec per loop` Maybe `select fips, count(*) from [ny_times_us_counties] group by fips` isn't a good query for testing this?	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Research how much of a difference analyze / sqlite_stat1 makes 1097091527
1008220270	https://github.com/simonw/sqlite-utils/issues/369#issuecomment-1008220270	https://api.github.com/repos/simonw/sqlite-utils/issues/369	IC_kwDOCGYnMM48GDhu	simonw 9599	2022-01-09T03:12:38Z	2022-01-09T03:13:15Z	OWNER	Basically no difference using this very basic benchmark: `analyze % python3 -m timeit '__import__("sqlite3").connect("global-power-plants.db").execute("select country_long, count() from [global-power-plants] group by country_long").fetchall()' 100 loops, best of 5: 2.39 msec per loop analyze % python3 -m timeit '__import__("sqlite3").connect("global-power-plants-analyzed.db").execute("select country_long, count() from [global-power-plants] group by country_long").fetchall()' 100 loops, best of 5: 2.38 msec per loop` I should try this against a much larger database. https://covid-19.datasettes.com/covid.db is 879MB.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Research how much of a difference analyze / sqlite_stat1 makes 1097091527
1008219844	https://github.com/simonw/sqlite-utils/issues/369#issuecomment-1008219844	https://api.github.com/repos/simonw/sqlite-utils/issues/369	IC_kwDOCGYnMM48GDbE	simonw 9599	2022-01-09T03:08:09Z	2022-01-09T03:08:09Z	OWNER	``` analyze % sqlite-utils global-power-plants-analyzed.db 'analyze' [{"rows_affected": -1}] analyze % sqlite-utils tables global-power-plants-analyzed.db [{"table": "global-power-plants"}, {"table": "global-power-plants_fts"}, {"table": "global-power-plants_fts_data"}, {"table": "global-power-plants_fts_idx"}, {"table": "global-power-plants_fts_docsize"}, {"table": "global-power-plants_fts_config"}, {"table": "sqlite_stat1"}] analyze % sqlite-utils rows global-power-plants-analyzed.db sqlite_stat1 -t tbl idx stat global-power-plants_fts_config global-power-plants_fts_config 1 1 global-power-plants_fts_docsize 33643 global-power-plants_fts_idx global-power-plants_fts_idx 199 40 1 global-power-plants_fts_data 136 global-power-plants "global-power-plants_owner" 33643 4 global-power-plants "global-power-plants_country_long" 33643 202 ```	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Research how much of a difference analyze / sqlite_stat1 makes 1097091527
1008219588	https://github.com/simonw/sqlite-utils/issues/369#issuecomment-1008219588	https://api.github.com/repos/simonw/sqlite-utils/issues/369	IC_kwDOCGYnMM48GDXE	simonw 9599	2022-01-09T03:06:42Z	2022-01-09T03:06:42Z	OWNER	``` analyze % sqlite-utils indexes global-power-plants.db -t table index_name seqno cid name desc coll key global-power-plants "global-power-plants_owner" 0 12 owner 0 BINARY 1 global-power-plants "global-power-plants_country_long" 0 1 country_long 0 BINARY 1 global-power-plants_fts_idx sqlite_autoindex_global-power-plants_fts_idx_1 0 0 segid 0 BINARY 1 global-power-plants_fts_idx sqlite_autoindex_global-power-plants_fts_idx_1 1 1 term 0 BINARY 1 global-power-plants_fts_config sqlite_autoindex_global-power-plants_fts_config_1 0 0 k 0 BINARY 1 ```	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Research how much of a difference analyze / sqlite_stat1 makes 1097091527
1008219484	https://github.com/simonw/sqlite-utils/issues/369#issuecomment-1008219484	https://api.github.com/repos/simonw/sqlite-utils/issues/369	IC_kwDOCGYnMM48GDVc	simonw 9599	2022-01-09T03:05:44Z	2022-01-09T03:05:44Z	OWNER	I'll start by running some experiments against the 11MB database file from https://global-power-plants.datasettes.com/global-power-plants.db	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Research how much of a difference analyze / sqlite_stat1 makes 1097091527
1008219191	https://github.com/simonw/sqlite-utils/issues/369#issuecomment-1008219191	https://api.github.com/repos/simonw/sqlite-utils/issues/369	IC_kwDOCGYnMM48GDQ3	simonw 9599	2022-01-09T03:03:53Z	2022-01-09T03:03:53Z	OWNER	Refs: - #366 - #365	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Research how much of a difference analyze / sqlite_stat1 makes 1097091527

Advanced export

JSON shape: default, array, newline-delimited, object

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
, [performed_via_github_app] TEXT);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);