
issue_comments


8 rows where issue = 808008305 sorted by updated_at descending


All 8 comments are by simonw (OWNER), on issue: --sniff option for sniffing delimiters
Comment 778827570 · simonw (OWNER) · 2021-02-14T19:24:20Z · https://github.com/simonw/sqlite-utils/issues/230#issuecomment-778827570

Here's the implementation in Python: https://github.com/python/cpython/blob/63298930fb531ba2bb4f23bc3b915dbf1e17e9e1/Lib/csv.py#L204-L225
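From the caller's side, that implementation boils down to this (a minimal sketch; the sample data here is made up):

```python
import csv

# A delimiter-ambiguous sample: Sniffer guesses the dialect from
# quoting patterns and character frequencies, as in the linked code.
sample = "name;age;city\nAlice;30;Oslo\nBob;25;Bergen\n"
dialect = csv.Sniffer().sniff(sample)
print(dialect.delimiter)  # ";"
```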

Comment 778824361 · simonw (OWNER) · 2021-02-14T18:59:22Z · https://github.com/simonw/sqlite-utils/issues/230#issuecomment-778824361

I think I've got it. I can use io.BufferedReader() to get an object I can run .peek(2048) on, then wrap THAT in io.TextIOWrapper:

```python
encoding = encoding or "utf-8"
buffered = io.BufferedReader(json_file, buffer_size=4096)
decoded = io.TextIOWrapper(buffered, encoding=encoding, line_buffering=True)
if pk and len(pk) == 1:
    pk = pk[0]
if csv or tsv:
    if sniff:
        # Read first 2048 bytes and use that to detect
        first_bytes = buffered.peek(2048)
        print('first_bytes', first_bytes)
```
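That approach can be exercised standalone; here's a sketch with io.BytesIO standing in for a non-seekable binary stream like stdin.buffer:

```python
import csv
import io

# io.BytesIO stands in for a non-seekable binary stream (e.g. stdin.buffer).
raw = io.BytesIO(b"id\tname\n1\tAlice\n2\tBob\n")
buffered = io.BufferedReader(raw)

# peek() returns buffered bytes WITHOUT advancing the stream...
sample = buffered.peek(2048).decode("utf-8")
dialect = csv.Sniffer().sniff(sample)

# ...so the TextIOWrapper still sees the data from the very start.
decoded = io.TextIOWrapper(buffered, encoding="utf-8")
rows = list(csv.reader(decoded, dialect))

print(dialect.delimiter == "\t")  # True
print(rows[0])  # ['id', 'name']
```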

Comment 778821403 · simonw (OWNER) · 2021-02-14T18:38:16Z · https://github.com/simonw/sqlite-utils/issues/230#issuecomment-778821403

There are two code paths here that matter:

  • For a regular file, I can read the first 2048 bytes, then .seek(0) before continuing. That's easy.
  • stdin is harder. I need to read and buffer the first 2048 bytes, then pass csv.reader() an object that will replay that chunk and then play the rest of stdin.

I'm a bit stuck on the second one. Ideally I could use something like itertools.chain(), but I can't find an equivalent for file-like objects.
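One possible workaround (a sketch, not necessarily what sqlite-utils settled on): csv.reader() accepts any iterable of strings, not just file objects, so the buffered chunk can be split into lines and chained in front of the remainder of the stream:

```python
import csv
import io
from itertools import chain

# io.StringIO stands in for a non-seekable text stream such as stdin.
stream = io.StringIO("a,b\n1,2\n3,4\n")

# Read and keep the first chunk for sniffing.
first_chunk = stream.read(2048)
dialect = csv.Sniffer().sniff(first_chunk)

# csv.reader() only needs an iterable of lines, so replay the chunk's
# lines and then continue with whatever is left in the stream.
# (Caveat: a chunk boundary can split a line in half; here the whole
# sample fits in the first chunk, so that doesn't arise.)
replayed = chain(first_chunk.splitlines(keepends=True), stream)
parsed = list(csv.reader(replayed, dialect))
print(parsed)  # [['a', 'b'], ['1', '2'], ['3', '4']]
```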

Comment 778818639 · simonw (OWNER) · 2021-02-14T18:22:38Z · https://github.com/simonw/sqlite-utils/issues/230#issuecomment-778818639

Maybe I shouldn't be using StreamReader at all - https://www.python.org/dev/peps/pep-0400/ suggests it should be deprecated in favour of io.TextIOWrapper. I'm using StreamReader because of this line: https://github.com/simonw/sqlite-utils/blob/726219c3503e77440975cd15b74d006639feb0f8/sqlite_utils/cli.py#L667-L668
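Side by side, the codecs.StreamReader approach and the io.TextIOWrapper replacement that PEP 400 recommends decode identically (a minimal sketch):

```python
import codecs
import io

data = b"caf\xc3\xa9\n"  # "café" encoded as UTF-8

# Old style: a codecs.StreamReader, as built by codecs.getreader()
old_text = codecs.getreader("utf-8")(io.BytesIO(data)).read()

# PEP 400 style: io.TextIOWrapper decodes the same way, and it
# composes with io.BufferedReader (which is what enables .peek()).
new_text = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8").read()

print(old_text == new_text == "café\n")  # True
```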

Comment 778817494 · simonw (OWNER) · 2021-02-14T18:16:06Z · https://github.com/simonw/sqlite-utils/issues/230#issuecomment-778817494

Types involved:

```
(Pdb) type(json_file.raw)
<class '_io.FileIO'>
(Pdb) type(json_file)
<class 'encodings.utf_8.StreamReader'>
```

Comment 778816333 · simonw (OWNER) · 2021-02-14T18:08:44Z · https://github.com/simonw/sqlite-utils/issues/230#issuecomment-778816333

No, you can't .seek(0) on stdin:

```
File "/Users/simon/Dropbox/Development/sqlite-utils/sqlite_utils/cli.py", line 678, in insert_upsert_implementation
    json_file.raw.seek(0)
OSError: [Errno 29] Illegal seek
```
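A stream reports whether .seek() is legal via .seekable(), which is the portable way to guard against that OSError (a sketch, using os.pipe() to simulate piped stdin):

```python
import io
import os

# A pipe behaves like piped stdin: reads work, seeks raise OSError.
read_fd, write_fd = os.pipe()
os.write(write_fd, b"hello\n")
os.close(write_fd)

pipe = open(read_fd, "rb", buffering=0)  # raw FileIO over the pipe
pipe_seekable = pipe.seekable()
pipe.close()

# An in-memory buffer (or a regular file) is seekable, so .seek(0) is safe.
print(pipe_seekable)                      # False
print(io.BytesIO(b"hello\n").seekable())  # True
```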

Comment 778815740 · simonw (OWNER) · 2021-02-14T18:05:03Z · https://github.com/simonw/sqlite-utils/issues/230#issuecomment-778815740

The challenge here is how to read the first 2048 bytes and then reset the incoming file.

The Python docs example looks like this:

```python
with open('example.csv', newline='') as csvfile:
    dialect = csv.Sniffer().sniff(csvfile.read(1024))
    csvfile.seek(0)
    reader = csv.reader(csvfile, dialect)
```

Here's the relevant code in sqlite-utils: https://github.com/simonw/sqlite-utils/blob/726219c3503e77440975cd15b74d006639feb0f8/sqlite_utils/cli.py#L671-L679

The challenge is going to be having the --sniff option work with the progress bar. Here's how file_progress() works: https://github.com/simonw/sqlite-utils/blob/726219c3503e77440975cd15b74d006639feb0f8/sqlite_utils/utils.py#L106-L113

If file.raw is stdin, can I do the equivalent of csvfile.seek(0) on it?

Comment 778812684 · simonw (OWNER) · 2021-02-14T17:45:16Z · https://github.com/simonw/sqlite-utils/issues/230#issuecomment-778812684

Running this could take any CSV (or TSV) file and automatically detect the delimiter. If no header row is detected, it could add unknown1, unknown2 headers:

```
sqlite-utils insert db.db data file.csv --sniff
```

(Using --sniff would imply --csv)

This could be called --sniffer instead but I like --sniff better.
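csv.Sniffer also exposes a has_header() method that the "no header row detected" behaviour could build on; the unknown1, unknown2 naming below just mirrors the proposal above (a sketch):

```python
import csv

sniffer = csv.Sniffer()

with_header = "name,age\nAlice,30\nBob,25\n"
without_header = "1,2\n3,4\n5,6\n"

has1 = sniffer.has_header(with_header)     # True: 'age' isn't numeric like 30/25
has2 = sniffer.has_header(without_header)  # False: first row looks like the rest

# When no header is detected, invent unknown1, unknown2, ... names.
first_row = next(csv.reader(without_header.splitlines()))
headers = ["unknown{}".format(i + 1) for i in range(len(first_row))]
print(has1, has2, headers)  # True False ['unknown1', 'unknown2']
```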



CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
, [performed_via_github_app] TEXT);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);
Powered by Datasette · Queries took 64.993ms · About: github-to-sqlite