
issue_comments: 752257666

html_url: https://github.com/simonw/datasette/issues/1160#issuecomment-752257666
issue_url: https://api.github.com/repos/simonw/datasette/issues/1160
id: 752257666
user: 9599
created_at: 2020-12-29T22:09:18Z
updated_at: 2020-12-29T22:09:18Z
author_association: OWNER

Figuring out the API design

I want to support different file formats, and to be able to parse them into tables either as a stream or in one go, depending on whether the format supports streaming.
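
One possible shape for that, as a purely illustrative sketch (the `Format` class, the `can_stream` attribute and the `rows()` method are hypothetical names, not anything that exists in datasette):

    from abc import ABC, abstractmethod
    from typing import BinaryIO, Dict, Iterable


    class Format(ABC):
        # True if rows can be yielded while the upload is still being read;
        # False if the whole file is needed first (e.g. a JSON array)
        can_stream: bool = False

        @abstractmethod
        def rows(self, fh: BinaryIO) -> Iterable[Dict]:
            "Yield one dict per record, keyed by column name."
            ...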

Ideally I want to be able to pull the first 1,024 bytes in order to detect the format, then replay those bytes later when the full parse starts. I'm considering this a stretch goal though.
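
A minimal sketch of that peek-and-replay idea, assuming the upload arrives as an iterator of byte chunks (`peek_bytes` is a hypothetical helper, not part of datasette):

    import itertools


    def peek_bytes(chunks, size=1024):
        # chunks: any iterator of bytes objects (for example the request body).
        # Returns (head, replayable): head is up to `size` bytes for format
        # detection, replayable yields the original stream in full, including
        # the bytes that were already pulled off for the peek.
        chunks = iter(chunks)
        consumed = []
        buffered = b""
        while len(buffered) < size:
            chunk = next(chunks, None)
            if chunk is None:
                break
            consumed.append(chunk)
            buffered += chunk
        return buffered[:size], itertools.chain(consumed, chunks)

Format detection can then inspect head, while the parser later consumes replayable and still sees every byte of the original upload.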

CSV is easy to parse as a stream - here’s how sqlite-utils does it:

    dialect = "excel-tab" if tsv else "excel"
    with file_progress(json_file, silent=silent) as json_file:
        # json_file is the incoming file object - despite the name it holds CSV or TSV here
        reader = csv_std.reader(json_file, dialect=dialect)
        # First row supplies the column headers
        headers = next(reader)
        # Generator: each remaining row becomes a dict, read lazily as it is consumed
        docs = (dict(zip(headers, row)) for row in reader)

Problem: a single db.insert_all() call could block the write connection for a long time on a big set of rows. Probably easiest to split the records into batches first, then insert one batch at a time, each inside its own db.execute_write_fn() call.
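
A rough sketch of that batching approach, assuming `db` is a `datasette.database.Database` and `docs` is the generator of dicts from the CSV snippet above (the `batches()` helper and the batch size of 100 are illustrative choices, not part of either library):

    import itertools

    import sqlite_utils


    def batches(iterable, size=100):
        # Lazily yield lists of up to `size` items from any iterable
        iterator = iter(iterable)
        while True:
            batch = list(itertools.islice(iterator, size))
            if not batch:
                return
            yield batch


    async def insert_in_batches(db, table_name, docs, batch_size=100):
        for batch in batches(docs, batch_size):
            def write_batch(conn, batch=batch):
                # Runs on Datasette's dedicated write thread with a sqlite3
                # connection; sqlite-utils handles the actual INSERTs
                sqlite_utils.Database(conn)[table_name].insert_all(batch)

            await db.execute_write_fn(write_batch, block=True)

Each execute_write_fn() call only occupies the write connection for a single batch, so other queued writes can run in between batches.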
