issue_comments: 688508510

html_url: https://github.com/simonw/sqlite-utils/pull/146#issuecomment-688508510
issue_url: https://api.github.com/repos/simonw/sqlite-utils/issues/146
id: 688508510
node_id: MDEyOklzc3VlQ29tbWVudDY4ODUwODUxMA==
user: 9599
created_at: 2020-09-07T20:56:03Z
updated_at: 2020-09-07T20:56:24Z
author_association: OWNER
issue: 688668680

body:

The problem with this approach is that it requires us to consume the entire iterator before we can start inserting rows into the table - here on line 1052:

https://github.com/simonw/sqlite-utils/blob/bb131793feac16bc7181ab997568f941b0220ef2/sqlite_utils/db.py#L1047-L1054
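In outline, the shape being described looks something like this (an illustrative approximation only, not the actual db.py code):

```python
def insert_all_eager(records):
    # Illustrative sketch only - not the real sqlite-utils implementation.
    # The list() call consumes the whole iterator up front, so every record
    # must fit in memory before the first INSERT can run.
    records = list(records)
    all_columns = set()
    for record in records:
        all_columns.update(record.keys())
    # ... only at this point could batching and INSERT statements begin ...
```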

I designed .insert_all() to avoid doing this, because I want to be able to pass it an iterator (or, more likely, a generator) that could produce potentially millions of records. Working in batches of 100 records at a time means the Python process never needs to pull millions of records into memory at once.
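As a minimal sketch of that streaming behaviour (not the actual .insert_all() implementation, just the general technique), batches can be pulled off the iterator with itertools.islice so that only one batch of records is ever held in memory at a time:

```python
import itertools

def chunks(records, size=100):
    # Yield lists of up to `size` records without ever materializing
    # the whole iterable.
    iterator = iter(records)
    while True:
        chunk = list(itertools.islice(iterator, size))
        if not chunk:
            return
        yield chunk

def stream_of_records():
    # Stand-in generator - this could just as easily yield millions of rows.
    for i in range(1_000_000):
        yield {"id": i, "value": i * 2}

for batch in chunks(stream_of_records()):
    pass  # insert this batch of at most 100 rows, then move on to the next
```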

db-to-sqlite is one example of a tool that relies on that characteristic: https://github.com/simonw/db-to-sqlite/blob/63e4ee972f292de13bb11767c0fb64b35339d954/db_to_sqlite/cli.py#L94-L106

So we need to solve this issue without consuming the entire iterator with a records = list(records) call.

I think one way to do this is to execute the chunks one at a time, watch for an exception indicating that we sent too many parameters, and then adjust the chunk size down and try again.
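A hedged sketch of that idea (the helper name and structure here are hypothetical, not necessarily the fix that would land): build the multi-row INSERT for a chunk, catch SQLite's "too many SQL variables" OperationalError, then halve the chunk and retry.

```python
import sqlite3

def insert_chunk(conn, table, columns, rows):
    # Hypothetical helper, not sqlite-utils internals: attempt a multi-row
    # INSERT; if SQLite reports too many variables, split the chunk in half
    # and retry each half.
    placeholders = ", ".join(
        "({})".format(", ".join("?" for _ in columns)) for _ in rows
    )
    sql = "INSERT INTO [{}] ({}) VALUES {}".format(
        table, ", ".join("[{}]".format(c) for c in columns), placeholders
    )
    params = [row.get(col) for row in rows for col in columns]
    try:
        conn.execute(sql, params)
    except sqlite3.OperationalError as e:
        if "too many SQL variables" not in str(e) or len(rows) == 1:
            raise
        mid = len(rows) // 2
        insert_chunk(conn, table, columns, rows[:mid])
        insert_chunk(conn, table, columns, rows[mid:])
```

Halving on failure means the effective chunk size converges quickly on something SQLite will accept, without needing to know SQLITE_MAX_VARIABLE_NUMBER up front.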
