
issue_comments


7 rows where author_association = "NONE" and user = 8431341 sorted by updated_at descending



issue 2

  • Ways to improve fuzzy search speed on larger data sets? 5
  • Authentication (and permissions) as a core concept 2

user 1

  • zeluspudding · 7

author_association 1

  • NONE · 7
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions issue performed_via_github_app
626991001 https://github.com/simonw/datasette/issues/699#issuecomment-626991001 https://api.github.com/repos/simonw/datasette/issues/699 MDEyOklzc3VlQ29tbWVudDYyNjk5MTAwMQ== zeluspudding 8431341 2020-05-11T22:06:34Z 2020-05-11T22:06:34Z NONE

Very nice! Thank you for sharing that :+1: :) Will try it out!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Authentication (and permissions) as a core concept 582526961  
626807487 https://github.com/simonw/datasette/issues/699#issuecomment-626807487 https://api.github.com/repos/simonw/datasette/issues/699 MDEyOklzc3VlQ29tbWVudDYyNjgwNzQ4Nw== zeluspudding 8431341 2020-05-11T16:23:57Z 2020-05-11T16:24:59Z NONE

Authorization: Bearer xxx auth for API keys is a plus plus for me. I looked into just adding this into your Flask logic but learned this project doesn't use Flask. Interesting 🤔

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Authentication (and permissions) as a core concept 582526961  
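A minimal sketch of the bearer-token check this comment asks for; the function name, header dict, and token store are hypothetical illustrations, not Datasette's actual auth design:

```python
# Hypothetical sketch: validate an "Authorization: Bearer xxx" header
# against a set of known API tokens. Not Datasette's real auth code.
def check_bearer(headers: dict, valid_tokens: set) -> bool:
    auth = headers.get("Authorization", "")
    scheme, _, token = auth.partition(" ")
    # The auth scheme name is case-insensitive per RFC 7235.
    return scheme.lower() == "bearer" and token in valid_tokens
```

In a real ASGI app this check would run in middleware before the request reaches the table views.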
550649607 https://github.com/simonw/datasette/issues/607#issuecomment-550649607 https://api.github.com/repos/simonw/datasette/issues/607 MDEyOklzc3VlQ29tbWVudDU1MDY0OTYwNw== zeluspudding 8431341 2019-11-07T03:38:10Z 2019-11-07T03:38:10Z NONE

I just got FTS5 working and it is incredible! The lookup time for returning all rows where company name contains "Musk" from my table of 16,428,090 rows has dropped from 13,340.019ms to 15.6ms. Well below the 100ms latency needed for a "real time autocomplete" feel (a figure which doesn't yet include the HTTP call).

So cool! Thanks again for the pointers and awesome datasette!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Ways to improve fuzzy search speed on larger data sets? 512996469  
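The FTS5 setup behind that speedup can be sketched with Python's stdlib sqlite3 module. The table and column names below are illustrative stand-ins, not the commenter's actual 16M-row schema:

```python
import sqlite3

# Stand-in for the large company-names table described in the comment.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO companies (name) VALUES (?)",
    [("Musk Elon",), ("Musk Kimbal",), ("Bezos Jeff",)],
)

# External-content FTS5 index over the name column.
conn.execute(
    "CREATE VIRTUAL TABLE companies_fts USING fts5("
    "name, content='companies', content_rowid='id')"
)
conn.execute("INSERT INTO companies_fts(rowid, name) SELECT id, name FROM companies")

# A prefix query ("musk*") gives the autocomplete-style matching,
# served from the full-text index instead of a table scan.
rows = conn.execute(
    "SELECT c.name FROM companies_fts f "
    "JOIN companies c ON c.id = f.rowid "
    "WHERE companies_fts MATCH ?",
    ("musk*",),
).fetchall()
```

This assumes the bundled SQLite was compiled with FTS5, which is exactly the snag discussed in the next comment.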
548060038 https://github.com/simonw/datasette/issues/607#issuecomment-548060038 https://api.github.com/repos/simonw/datasette/issues/607 MDEyOklzc3VlQ29tbWVudDU0ODA2MDAzOA== zeluspudding 8431341 2019-10-30T18:47:57Z 2019-10-30T18:47:57Z NONE

Hi Simon, thanks for the pointer! Feeling good that I came to your conclusion a few days ago. I did hit a snag figuring out how to compile a special version of SQLite for my Windows machine (which I only realized I needed to do after running your command sqlite-utils enable-fts mydatabase.db items name description).

I'll try to solve that problem next week and report back here with my findings (if you know of a good tutorial for compiling on Windows, I'm all ears). Either way, I'll try to close this issue out in the next two weeks. Thanks again!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Ways to improve fuzzy search speed on larger data sets? 512996469  
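On the Windows snag: before compiling SQLite by hand, it can be worth probing whether the SQLite already bundled with your Python ships FTS5. A small stdlib-only check (not part of sqlite-utils; just a diagnostic sketch):

```python
import sqlite3

def fts5_available() -> bool:
    """Probe whether the bundled SQLite build includes the FTS5 module."""
    conn = sqlite3.connect(":memory:")
    try:
        # Creating an fts5 virtual table fails if the module is missing.
        conn.execute("CREATE VIRTUAL TABLE fts_probe USING fts5(x)")
        return True
    except sqlite3.OperationalError:
        return False
    finally:
        conn.close()
```

If this returns True, no custom compile is needed for enable-fts to work.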
546752311 https://github.com/simonw/datasette/issues/607#issuecomment-546752311 https://api.github.com/repos/simonw/datasette/issues/607 MDEyOklzc3VlQ29tbWVudDU0Njc1MjMxMQ== zeluspudding 8431341 2019-10-28T00:37:10Z 2019-10-28T00:37:10Z NONE

UPDATE: Following tips suggested in Squeezing Performance from SQLite: Indexes? Indexes! I have added an index to my large table and benchmarked query speeds in three cases: returning all rows, rows exactly equal to 'Musk Elon', and rows like 'musk'. Indexing reduced query time for each of those measures and dramatically reduced the time to return rows exactly equal to 'Musk Elon', as shown below:

table: edgar_idx
rows: 16,428,090

indexed: False

  • Return all rows where company name exactly equal to "Musk Elon"
    query: select rowid, * from edgar_idx where "company" = :p0 order by rowid limit 101
    query time: 21821.031ms
  • Return all rows where company name contains "Musk"
    query: select rowid, * from edgar_idx where "company" like :p0 order by rowid limit 101
    query time: 20505.029ms
  • Return everything
    query: select rowid, * from edgar_idx order by rowid limit 101
    query time: 7985.011ms

indexed: True

  • Return all rows where company name exactly equal to "Musk Elon"
    query: select rowid, * from edgar_idx where "company" = :p0 order by rowid limit 101
    query time: 30.0ms
  • Return all rows where company name contains "Musk"
    query: select rowid, * from edgar_idx where "company" like :p0 order by rowid limit 101
    query time: 13340.019ms
  • Return everything
    query: select rowid, * from edgar_idx order by rowid limit 101
    query time: 2190.003ms

So indexing reduced query time for an exact match to "Musk Elon" from almost 22 seconds to 30.0ms. That's amazing and truly promising! However, an autocomplete feature relies on fuzzy / incomplete matching, which is more similar to the contains 'musk' query... Unfortunately, that takes 13 seconds even after indexing. So the hunt for a fast fuzzy / autocomplete search capability persists.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Ways to improve fuzzy search speed on larger data sets? 512996469  
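The pattern in those benchmarks — equality dropping to milliseconds while "contains" stays slow — follows from how B-tree indexes work: a LIKE pattern with a leading wildcard cannot use the index, so SQLite scans every row. A toy sketch using EXPLAIN QUERY PLAN (table and data are stand-ins for the real 16M-row table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edgar_idx (id INTEGER PRIMARY KEY, company TEXT)")
conn.executemany(
    "INSERT INTO edgar_idx (company) VALUES (?)",
    [("Musk Elon",), ("Bezos Jeff",)],
)
conn.execute('CREATE INDEX idx_company ON edgar_idx ("company")')

# The equality lookup is satisfied directly by the index...
eq_plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM edgar_idx WHERE company = 'Musk Elon'"
).fetchall()

# ...but LIKE '%musk%' starts with a wildcard, so the B-tree index is
# unusable and the plan falls back to a full table scan -- which is why
# the "contains Musk" query stayed at ~13 seconds even after indexing.
like_plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM edgar_idx WHERE company LIKE '%musk%'"
).fetchall()
```

A full-text index (FTS5), rather than a plain column index, is the standard answer for the "contains" case.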
546723302 https://github.com/simonw/datasette/issues/607#issuecomment-546723302 https://api.github.com/repos/simonw/datasette/issues/607 MDEyOklzc3VlQ29tbWVudDU0NjcyMzMwMg== zeluspudding 8431341 2019-10-27T18:59:55Z 2019-10-27T19:00:48Z NONE

Ultimately, I need to serve searches like this to multiple users (at times concurrently). Given the size of the database I'm working with, can anyone comment on whether I should be storing this in something like MySQL or Postgres rather than SQLite? I know there's been much defense of SQLite being performant, but I wonder if those arguments break down as the database size increases.

For example, if I scroll to the bottom of that linked page, where it says Checklist For Choosing The Right Database Engine, here's how I answer those questions:

  • Is the data separated from the application by a network? → choose client/server Yes
  • Many concurrent writers? → choose client/server Not exactly. I may have many concurrent readers but almost no concurrent writers.
  • Big data? → choose client/server No, my database is less than 40 GB and won't approach a terabyte in the next decade.

So is SQLite still a good idea here?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Ways to improve fuzzy search speed on larger data sets? 512996469  
546722281 https://github.com/simonw/datasette/issues/607#issuecomment-546722281 https://api.github.com/repos/simonw/datasette/issues/607 MDEyOklzc3VlQ29tbWVudDU0NjcyMjI4MQ== zeluspudding 8431341 2019-10-27T18:46:29Z 2019-10-27T19:00:40Z NONE

Update: I've created a table of only unique names. This reduces the search space from over 16 million rows to just about 640,000. Interestingly, it takes less than 2 seconds to create this table using Python. Performing the same search that we did earlier for "elon musk" takes nearly a second - much faster than before but still not speedy enough for an autocomplete feature (which usually needs to return results within 100ms to feel "real time").

Any ideas for slashing the search speed nearly ten-fold?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Ways to improve fuzzy search speed on larger data sets? 512996469  
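The unique-names approach described above can be sketched as follows; the table and column names are illustrative, not the commenter's actual schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edgar_idx (id INTEGER PRIMARY KEY, company TEXT)")
conn.executemany(
    "INSERT INTO edgar_idx (company) VALUES (?)",
    [("Musk Elon",), ("Musk Elon",), ("Musk Elon",), ("Bezos Jeff",)],
)

# Collapse the full table down to distinct names (16M -> ~640k in the
# comment's case), then index the much smaller lookup table.
conn.execute(
    "CREATE TABLE unique_companies AS SELECT DISTINCT company FROM edgar_idx"
)
conn.execute("CREATE INDEX idx_unique_company ON unique_companies (company)")

unique_count = conn.execute(
    "SELECT count(*) FROM unique_companies"
).fetchone()[0]
```

The later comments in this thread show the remaining ten-fold gap was eventually closed with FTS5 rather than further shrinking the search space.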

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
, [performed_via_github_app] TEXT);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);
Powered by Datasette · Queries took 31.768ms · About: github-to-sqlite