issue_comments

6 rows where issue = 707478649 sorted by updated_at descending

user 2

  • simonw 5
  • Florents-Tselai 1

author_association 2

  • OWNER 5
  • NONE 1

issue 1

  • Progress bar for sqlite-utils insert 6
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions issue performed_via_github_app
956041692 https://github.com/simonw/sqlite-utils/issues/173#issuecomment-956041692 https://api.github.com/repos/simonw/sqlite-utils/issues/173 IC_kwDOCGYnMM44_Anc Florents-Tselai 2118708 2021-11-01T08:42:24Z 2021-11-01T08:42:24Z NONE

> I know how to build this for CSV and TSV - I can read them via a file wrapper that counts how many bytes it has seen.
>
> Not sure how to do it for JSON though. Maybe I could provide it just for newline-delimited JSON? Again I can measure progress based on how many bytes have been read.

I was thinking about this while inserting a stream of ~40M line-delimited JSON docs. Wouldn't a --total-expected flag work?

That's how tqdm does it
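
For illustration, here is a minimal sketch of progress reporting against a caller-supplied total, which is roughly what tqdm's `total` argument does for iterables of unknown length. The `--total-expected` flag is only a proposal in this thread, and the function name `with_row_progress` and its parameters are hypothetical, not part of sqlite-utils.

```python
def with_row_progress(rows, total_expected, report_every=1, out=None):
    """Yield rows unchanged, writing a progress line every `report_every` rows.

    `total_expected` is the caller's estimate (the proposed --total-expected
    value); it is an assumption, so the percentage is capped at 100.
    """
    if total_expected <= 0:
        raise ValueError("total_expected must be positive")
    for count, row in enumerate(rows, start=1):
        yield row
        if out is not None and count % report_every == 0:
            pct = min(100.0, 100.0 * count / total_expected)
            out.write(f"{count}/{total_expected} rows ({pct:.0f}%)\n")
```

Because the total comes from the user rather than from the input, this approach would also work when rows arrive on standard input.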

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Progress bar for sqlite-utils insert 707478649  
714758139 https://github.com/simonw/sqlite-utils/issues/173#issuecomment-714758139 https://api.github.com/repos/simonw/sqlite-utils/issues/173 MDEyOklzc3VlQ29tbWVudDcxNDc1ODEzOQ== simonw 9599 2020-10-22T20:57:56Z 2020-10-22T20:57:56Z OWNER

I could use ijson to provide a progress bar for JSON arrays too. I'd prefer to keep that as an optional dependency though, since sqlite-utils is a library dependency for many other projects and it would be using ijson purely for the CLI component.

Here's how to iterate through a list of objects being read from a file:

    import ijson

    # Stream objects one at a time from a top-level JSON array
    with open("/tmp/list.json", "rb") as f:
        for obj in ijson.items(f, "item"):
            ...

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Progress bar for sqlite-utils insert 707478649  
698578959 https://github.com/simonw/sqlite-utils/issues/173#issuecomment-698578959 https://api.github.com/repos/simonw/sqlite-utils/issues/173 MDEyOklzc3VlQ29tbWVudDY5ODU3ODk1OQ== simonw 9599 2020-09-24T20:44:35Z 2020-09-24T20:50:19Z OWNER

I'm using a click.File() at the moment: https://github.com/simonw/sqlite-utils/blob/5a63b9e88c5887432eb1d7df39f304ea55038437/sqlite_utils/cli.py#L496

I'll need to change that to be something that I can easily measure progress through. Also I should change its name - json_file is a bad name when it sometimes handles csv or tsv instead.

It looks like the argument provided by click.File doesn't provide a way to read the size of the file, so I need to switch that out for a file path instead. https://click.palletsprojects.com/en/7.x/api/#click.Path
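
A rough sketch of why a path helps: with a path in hand the total size is known before reading starts, so each chunk can be reported as a fraction of the whole. This uses only the standard library rather than click, and `read_chunks_with_fraction` is a hypothetical name, not code from sqlite-utils.

```python
import os


def read_chunks_with_fraction(path, chunk_size=64 * 1024):
    """Yield (chunk, fraction_done) pairs for the file at `path`."""
    # The size is available up front because we have a path, not just a stream.
    total = os.path.getsize(path)
    done = 0
    with open(path, "rb") as fp:
        while True:
            chunk = fp.read(chunk_size)
            if not chunk:
                break
            done += len(chunk)
            yield chunk, done / total
```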

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Progress bar for sqlite-utils insert 707478649  
698579389 https://github.com/simonw/sqlite-utils/issues/173#issuecomment-698579389 https://api.github.com/repos/simonw/sqlite-utils/issues/173 MDEyOklzc3VlQ29tbWVudDY5ODU3OTM4OQ== simonw 9599 2020-09-24T20:45:29Z 2020-09-24T20:45:29Z OWNER

Relevant code: https://github.com/simonw/sqlite-utils/blob/5a63b9e88c5887432eb1d7df39f304ea55038437/sqlite_utils/cli.py#L550-L560

Changing that to track progress through NL-JSON, CSV and TSV shouldn't be too hard.
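
As a sketch of what that tracking might look like for newline-delimited JSON, assuming the file is opened in binary mode and its total size is already known: each line's byte length is added to a running count before the line is parsed. The name `iter_nl_json` is hypothetical and this is not the code at the link above.

```python
import json


def iter_nl_json(fp, total_bytes):
    """Yield (doc, fraction_done) for each non-blank line of a binary NL-JSON stream."""
    done = 0
    for raw in fp:
        done += len(raw)  # count bytes, including the newline
        if raw.strip():  # skip blank lines but still count their bytes
            yield json.loads(raw), done / total_bytes
```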

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Progress bar for sqlite-utils insert 707478649  
698577508 https://github.com/simonw/sqlite-utils/issues/173#issuecomment-698577508 https://api.github.com/repos/simonw/sqlite-utils/issues/173 MDEyOklzc3VlQ29tbWVudDY5ODU3NzUwOA== simonw 9599 2020-09-24T20:41:18Z 2020-09-24T20:41:18Z OWNER

I know how to build this for CSV and TSV - I can read them via a file wrapper that counts how many bytes it has seen.

Not sure how to do it for JSON though. Maybe I could provide it just for newline-delimited JSON? Again I can measure progress based on how many bytes have been read.
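
One possible shape for such a wrapper, as a sketch only: an iterable that reads raw lines from a binary file, counts their bytes, and hands decoded text to the standard `csv` module. The class name `CountingLines` and the UTF-8 default are assumptions for illustration.

```python
import csv
import io


class CountingLines:
    """Iterate decoded lines of a binary file while counting bytes consumed."""

    def __init__(self, fp, encoding="utf-8"):
        self.fp = fp
        self.encoding = encoding
        self.bytes_read = 0

    def __iter__(self):
        for raw in self.fp:
            self.bytes_read += len(raw)
            yield raw.decode(self.encoding)


# Usage: csv.DictReader accepts any iterable of text lines.
data = b"id,name\n1,Cleo\n2,Pancakes\n"
lines = CountingLines(io.BytesIO(data))
rows = list(csv.DictReader(lines))
```

For TSV the same wrapper could be passed to `csv.DictReader(lines, delimiter="\t")`. Dividing `bytes_read` by the file size gives the fraction to show in a progress bar.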

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Progress bar for sqlite-utils insert 707478649  
697577646 https://github.com/simonw/sqlite-utils/issues/173#issuecomment-697577646 https://api.github.com/repos/simonw/sqlite-utils/issues/173 MDEyOklzc3VlQ29tbWVudDY5NzU3NzY0Ng== simonw 9599 2020-09-23T15:48:51Z 2020-09-23T15:48:51Z OWNER

This can only work when it's reading from a file, not when it's reading from standard input.
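
A hedged sketch of how that distinction could be detected: ask the operating system whether the file object is backed by a regular file, and fall back to "no total" otherwise. The helper name `total_size_or_none` is hypothetical.

```python
import os
import stat


def total_size_or_none(fp):
    """Return the size in bytes if `fp` is backed by a regular file, else None."""
    try:
        st = os.fstat(fp.fileno())
    except (AttributeError, OSError, ValueError):
        # No usable file descriptor (e.g. an in-memory stream).
        return None
    # Pipes and terminals are not regular files, so no total is available.
    return st.st_size if stat.S_ISREG(st.st_mode) else None
```

A caller would show a progress bar only when this returns a number, and read silently when it returns None, as it would for piped standard input.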

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Progress bar for sqlite-utils insert 707478649  

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
, [performed_via_github_app] TEXT);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);
Powered by Datasette · Queries took 19.635ms · About: github-to-sqlite