github: issue_comments: 5 rows where author_association = "NONE" and issue = 512996469 sorted by updated

5 rows where author_association = "NONE" and issue = 512996469 sorted by updated_at descending

id	html_url	issue_url	node_id	user	created_at	updated_at	author_association	body	reactions	issue
550649607	https://github.com/simonw/datasette/issues/607#issuecomment-550649607	https://api.github.com/repos/simonw/datasette/issues/607	MDEyOklzc3VlQ29tbWVudDU1MDY0OTYwNw==	zeluspudding 8431341	2019-11-07T03:38:10Z	2019-11-07T03:38:10Z	NONE	I just got FTS5 working and it is incredible! The lookup time for returning all rows where company name contains "Musk" from my table of 16,428,090 rows has dropped from `13,340.019` ms to `15.6`ms. Well below the 100ms latency for the "real time autocomplete" feel (which doesn't currently include the http call). So cool! Thanks again for the pointers and awesome datasette!	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Ways to improve fuzzy search speed on larger data sets? 512996469
548060038	https://github.com/simonw/datasette/issues/607#issuecomment-548060038	https://api.github.com/repos/simonw/datasette/issues/607	MDEyOklzc3VlQ29tbWVudDU0ODA2MDAzOA==	zeluspudding 8431341	2019-10-30T18:47:57Z	2019-10-30T18:47:57Z	NONE	Hi Simon, thanks for the pointer! Feeling good that I came to your conclusion a few days ago. I did hit a snag with figuring out how to compile a special version of SQLite for my Windows machine (which I only realized I needed to do after running your command `sqlite-utils enable-fts mydatabase.db items name description`). I'll try to solve that problem next week and report back here with my findings (if you know of a good tutorial for compiling on Windows, I'm all ears). Either way, I'll try to close this issue out in the next two weeks. Thanks again!	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Ways to improve fuzzy search speed on larger data sets? 512996469
546752311	https://github.com/simonw/datasette/issues/607#issuecomment-546752311	https://api.github.com/repos/simonw/datasette/issues/607	MDEyOklzc3VlQ29tbWVudDU0Njc1MjMxMQ==	zeluspudding 8431341	2019-10-28T00:37:10Z	2019-10-28T00:37:10Z	NONE	UPDATE: According to tips suggested in Squeezing Performance from SQLite: Indexes? Indexes! I have added an index to my large table and benchmarked query speeds in the case where I want to return `all rows`, `rows exactly equal to 'Musk Elon'` and `rows like 'musk'`. Indexing reduced query time for each of those measures and dramatically reduced the time to return `rows exactly equal to 'Musk Elon'` as shown below: table: edgar_idx rows: 16,428,090 rows indexed: False Return all rows where company name exactly equal to Musk Elon query: select rowid, * from edgar_idx where "company" = :p0 order by rowid limit 101 query time: Query took 21821.031ms Return all rows where company name contains Musk query: select rowid, * from edgar_idx where "company" like :p0 order by rowid limit 101 query time: Query took 20505.029ms Return everything query: select rowid, * from edgar_idx order by rowid limit 101 query time: Query took 7985.011ms indexed: True Return all rows where company name exactly equal to Musk Elon query: select rowid, * from edgar_idx where "company" = :p0 order by rowid limit 101 query time: Query took 30.0ms Return all rows where company name contains Musk query: select rowid, * from edgar_idx where "company" like :p0 order by rowid limit 101 query time: Query took 13340.019ms Return everything query: select rowid, * from edgar_idx order by rowid limit 101 query time: Query took 2190.003ms So indexing reduced query time for an exact match to "Musk Elon" from almost `22 seconds` to `30.0ms`. That's amazing and truly promising! However, an autocomplete feature relies on fuzzy / incomplete matching, which is more similar to the `contains 'musk'` query... Unfortunately, that takes 13 seconds even after indexing. So the hunt for a fast fuzzy / autocomplete search capability persists.	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Ways to improve fuzzy search speed on larger data sets? 512996469
546723302	https://github.com/simonw/datasette/issues/607#issuecomment-546723302	https://api.github.com/repos/simonw/datasette/issues/607	MDEyOklzc3VlQ29tbWVudDU0NjcyMzMwMg==	zeluspudding 8431341	2019-10-27T18:59:55Z	2019-10-27T19:00:48Z	NONE	Ultimately, I need to serve searches like this to multiple users (at times concurrently). Given the size of the database I'm working with, can anyone comment as to whether I should be storing this in something like MySQL or Postgres rather than SQLite? I know there's been much defense of SQLite being performant but I wonder if those arguments break down as the database size increases. For example, if I scroll to the bottom of that linked page, where it says Checklist For Choosing The Right Database Engine, here's how I answer those questions: Is the data separated from the application by a network? → choose client/server Yes Many concurrent writers? → choose client/server Not exactly. I may have many concurrent readers but almost no concurrent writers. Big data? → choose client/server No, my database is less than 40 GB and won't approach a terabyte in the next decade. So is SQLite still a good idea here?	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Ways to improve fuzzy search speed on larger data sets? 512996469
546722281	https://github.com/simonw/datasette/issues/607#issuecomment-546722281	https://api.github.com/repos/simonw/datasette/issues/607	MDEyOklzc3VlQ29tbWVudDU0NjcyMjI4MQ==	zeluspudding 8431341	2019-10-27T18:46:29Z	2019-10-27T19:00:40Z	NONE	Update: I've created a table of only unique names. This reduces the search space from over 16 million to just about 640,000. Interestingly, it takes less than 2 seconds to create this table using Python. Performing the same search that we did earlier for `elon musk` takes nearly a second - much faster than before but still not speedy enough for an autocomplete feature (which usually needs to return results within 100ms to feel "real time"). Any ideas for cutting the search time nearly tenfold?	{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 }	Ways to improve fuzzy search speed on larger data sets? 512996469
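
The comments above contrast a `like` query, which stayed slow even after indexing, with FTS5, which brought the lookup down to about 15 ms. The sketch below illustrates the two query styles side by side using Python's built-in `sqlite3` module. It is not the commenter's actual setup: the sample company names, the `edgar_fts` table name, and the helper functions are all assumptions made for illustration, and it assumes the SQLite library bundled with your Python build was compiled with FTS5 (the Windows snag mentioned above suggests that is not always the case).

```python
import sqlite3


def build_demo(names):
    """Build an in-memory table of company names plus an FTS5 index over it."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edgar_idx (company TEXT)")
    conn.executemany(
        "INSERT INTO edgar_idx (company) VALUES (?)", [(n,) for n in names]
    )
    # Hypothetical FTS5 table mirroring the company column, keyed by the same rowid.
    conn.execute("CREATE VIRTUAL TABLE edgar_fts USING fts5(company)")
    conn.execute(
        "INSERT INTO edgar_fts (rowid, company) SELECT rowid, company FROM edgar_idx"
    )
    return conn


def like_search(conn, term):
    """Substring match; a leading wildcard means every row has to be examined."""
    sql = (
        "SELECT rowid, company FROM edgar_idx "
        "WHERE company LIKE ? ORDER BY rowid LIMIT 101"
    )
    return conn.execute(sql, ("%" + term + "%",)).fetchall()


def fts_search(conn, term):
    """Token-prefix match via FTS5, the style usually wanted for autocomplete."""
    # Quote the term as an FTS5 string and append * to make it a prefix query.
    query = '"' + term.replace('"', '""') + '"*'
    sql = (
        "SELECT rowid, company FROM edgar_fts "
        "WHERE edgar_fts MATCH ? ORDER BY rowid LIMIT 101"
    )
    return conn.execute(sql, (query,)).fetchall()


# Made-up sample data, not rows from the real 16-million-row table.
conn = build_demo(
    ["Musk Elon", "Tesla Inc", "Musk Kimbal", "Muskegon Holdings", "Dusk Media"]
)
like_rows = like_search(conn, "musk")
fts_rows = fts_search(conn, "musk")
```

One difference worth keeping in mind: FTS5 matches whole tokens or token prefixes, so `musk` finds "Muskegon Holdings" but would not find a name where "musk" appears in the middle of a word, whereas `LIKE '%musk%'` would. For autocomplete that trade-off is often acceptable.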

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id]),
   [performed_via_github_app] TEXT
);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);
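
As a rough illustration of how the filter in this page's title maps onto the schema above, the following sketch rebuilds a trimmed-down `issue_comments` table in memory and runs the equivalent query. Only a subset of the columns is included, three of the rows reuse ids and timestamps from the listing above, and the other two rows (ids 1 and 2) are hypothetical, added to show that the filter excludes them. The `filtered_comment_ids` helper is likewise an assumption, not part of Datasette.

```python
import sqlite3

# Trimmed subset of the schema above; the remaining columns are omitted for brevity.
SCHEMA = """
CREATE TABLE [issue_comments] (
   [id] INTEGER PRIMARY KEY,
   [updated_at] TEXT,
   [author_association] TEXT,
   [issue] INTEGER
);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
"""

ROWS = [
    (550649607, "2019-11-07T03:38:10Z", "NONE", 512996469),
    (546723302, "2019-10-27T19:00:48Z", "NONE", 512996469),
    (546722281, "2019-10-27T19:00:40Z", "NONE", 512996469),
    (1, "2019-10-28T00:00:00Z", "OWNER", 512996469),  # hypothetical row
    (2, "2019-10-29T00:00:00Z", "NONE", 999),         # hypothetical row
]


def filtered_comment_ids(conn, issue, association="NONE"):
    """Return comment ids for one issue and association, newest update first."""
    sql = (
        "SELECT id FROM issue_comments "
        "WHERE author_association = ? AND issue = ? "
        "ORDER BY updated_at DESC"
    )
    return [row[0] for row in conn.execute(sql, (association, issue))]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany("INSERT INTO issue_comments VALUES (?, ?, ?, ?)", ROWS)
ids = filtered_comment_ids(conn, 512996469)
```

Because the timestamps are ISO 8601 strings in a single time zone, plain text ordering on `updated_at` gives chronological order.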