issues

2 rows where user = 10843208 sorted by updated_at descending

Suggested facets: created_at (date), updated_at (date)

type 2

  • issue 1
  • pull 1

repo 2

  • datasette 1
  • sqlite-utils 1

state 1

  • open 2
id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association pull_request body repo type active_lock_reason performed_via_github_app reactions draft state_reason
1733198948 I_kwDOCGYnMM5nToRk 555 Filter table by a large bunch of ids redraw 10843208 open 0     1 2023-05-31T00:29:51Z 2023-06-14T22:01:57Z   NONE  

Hi! This might be a question about both SQLite and sqlite-utils, and you are probably more experienced with them than I am.

I have a large bunch of ids, and I'm wondering what the best way to query them is in terms of performance and, if possible, simplicity.

The naive approach would be something like select * from table where rowid in (?, ?, ?...), but that wouldn't scale once there are more than 1k ids.
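One way to keep the naive IN (...) approach workable is to split the ids into batches, so that no single statement exceeds SQLite's bound-parameter limit (999 by default before SQLite 3.32.0, 32766 from that version on). The sketch below uses only the standard-library sqlite3 module; the helper names `chunked` and `select_by_rowids` and the batch size of 500 are assumptions made for illustration, not part of sqlite-utils.

```python
import sqlite3
from itertools import islice


def chunked(iterable, size):
    """Yield successive lists of at most `size` items from `iterable`."""
    it = iter(iterable)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def select_by_rowids(conn, table, ids, chunk_size=500):
    """Yield (rowid, *columns) rows whose rowid is in `ids`, one batch at a time.

    `table` is interpolated into the SQL, so it must come from trusted code,
    never from user input.
    """
    for batch in chunked(ids, chunk_size):
        placeholders = ", ".join("?" for _ in batch)
        sql = f'SELECT rowid, * FROM "{table}" WHERE rowid IN ({placeholders})'
        yield from conn.execute(sql, batch)
```

This runs one query per batch, so it trades a little round-trip overhead for not having to create any extra table.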

Another approach might be creating a temp table, or in-memory db table, insert all ids in that table and then join with the target one.

I failed to attach an in-memory db, both with sqlite-utils and with plain SQL via execute(), so the closest I've got is something like:

    def filter_existing_video_ids(video_ids):
        db = get_db()  # contains a "videos" table
        db.execute("CREATE TEMPORARY TABLE IF NOT EXISTS tmp (video_id TEXT NOT NULL PRIMARY KEY)")
        db["tmp"].insert_all([{"video_id": video_id} for video_id in video_ids])
        for row in db["tmp"].rows_where("video_id not in (select video_id from videos)"):
            yield row["video_id"]
        db["tmp"].drop()

That kinda worked, but I couldn't find an option in sqlite-utils's create_table() to mark a table as temporary. Also, the tmp table is not dropped at the end, not even by calling .drop(), despite being created with the TEMPORARY keyword. From what I've read, I believe it should be dropped automatically once the connection/session ends.
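For comparison, here is a minimal sketch of the same temp-table idea written against the standard-library sqlite3 module rather than sqlite-utils. The function name `filter_missing_video_ids` and the temp table name `tmp_ids` are hypothetical; the `videos` table and its `video_id` column are taken from the snippet above.

```python
import sqlite3


def filter_missing_video_ids(conn, video_ids):
    """Return the ids in `video_ids` that have no match in videos.video_id."""
    # TEMP tables live in the connection-private "temp" schema.
    conn.execute(
        "CREATE TEMP TABLE IF NOT EXISTS tmp_ids "
        "(video_id TEXT NOT NULL PRIMARY KEY)"
    )
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.executemany(
                "INSERT OR IGNORE INTO tmp_ids (video_id) VALUES (?)",
                ((video_id,) for video_id in video_ids),
            )
            rows = conn.execute(
                "SELECT video_id FROM tmp_ids "
                "WHERE video_id NOT IN (SELECT video_id FROM videos)"
            ).fetchall()
    finally:
        # Drop explicitly so the table does not linger on a long-lived connection.
        conn.execute("DROP TABLE IF EXISTS temp.tmp_ids")
    return [row[0] for row in rows]
```

The sketch returns a list instead of yielding. In a generator, statements placed after the last yield only run once the caller exhausts the iterator, which may be one reason a trailing drop() appears to have no effect; putting the DROP in a finally block sidesteps that.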

sqlite-utils 140912432 issue    
{
    "url": "https://api.github.com/repos/simonw/sqlite-utils/issues/555/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
   
1734786661 PR_kwDOBm6k_c5R0fcK 2082 Catch query interrupted on facet suggest row count redraw 10843208 open 0     0 2023-05-31T18:42:46Z 2023-05-31T18:45:26Z   FIRST_TIME_CONTRIBUTOR simonw/datasette/pulls/2082

Just like facet's suggest() is trapping QueryInterrupted for facet columns, we also need to trap get_row_count(), which can reach timeout if database tables are big enough.

I've included get_columns() inside the block as well, since it is just another query, even though it's a really cheap one and might never raise the exception.
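The general pattern of trapping an interrupted query can be sketched outside Datasette with the standard-library sqlite3 module. This is not the code from the pull request: the helper name `run_with_time_limit` is hypothetical, and the sketch assumes the time limit is enforced through a SQLite progress handler, which makes the query fail with an "interrupted" OperationalError that the caller then treats as "no result".

```python
import sqlite3
import time


def run_with_time_limit(conn, sql, time_limit_ms=50):
    """Run `sql` and return all rows, or None if it exceeded the time limit."""
    deadline = time.perf_counter() + time_limit_ms / 1000

    def handler():
        # A non-zero return value tells SQLite to abort the running query.
        return 1 if time.perf_counter() >= deadline else 0

    # Call the handler roughly every 1000 SQLite virtual machine instructions.
    conn.set_progress_handler(handler, 1000)
    try:
        return conn.execute(sql).fetchall()
    except sqlite3.OperationalError as ex:
        if "interrupted" in str(ex):
            return None  # timed out: the caller can skip this step
        raise
    finally:
        conn.set_progress_handler(None, 1000)  # remove the handler again
```

A caller that only needs the row count for an optional feature, such as suggesting facets, can treat None as "unknown" and carry on rather than failing the whole request.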


:books: Documentation preview :books:: https://datasette--2082.org.readthedocs.build/en/2082/

datasette 107914493 pull    
{
    "url": "https://api.github.com/repos/simonw/datasette/issues/2082/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
0  

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [pull_request] TEXT,
   [body] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT,
   [active_lock_reason] TEXT,
   [performed_via_github_app] TEXT,
   [reactions] TEXT,
   [draft] INTEGER,
   [state_reason] TEXT
);
CREATE INDEX [idx_issues_repo]
    ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
    ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
    ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
    ON [issues] ([user]);
Powered by Datasette · Queries took 65.326ms · About: github-to-sqlite