issues

5 rows where state = "closed" and user = 96218 sorted by updated_at descending


Facets:
  • type (2 values): pull 3 · issue 2
  • state (1 value): closed 5
  • repo (1 value): sqlite-utils 5
Columns: id, node_id, number, title, user, state, locked, assignee, milestone, comments, created_at, updated_at, closed_at, author_association, pull_request, body, repo, type, active_lock_reason, performed_via_github_app, reactions, draft, state_reason
id: 697030843 · node_id: MDExOlB1bGxSZXF1ZXN0NDgzMDI3NTg3 · number: 156 · title: Typos in tests · user: simonwiles (96218) · state: closed · locked: 0 · comments: 1 · created_at: 2020-09-09T18:00:58Z · updated_at: 2020-09-09T18:24:50Z · closed_at: 2020-09-09T18:21:23Z · author_association: CONTRIBUTOR · pull_request: simonw/sqlite-utils/pulls/156

One of these is my fault, and the other is one I just happened to come across. They're harmless, but might as well be fixed.

repo: sqlite-utils (140912432) · type: pull
reactions:
{
    "url": "https://api.github.com/repos/simonw/sqlite-utils/issues/156/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
draft: 0
id: 688659182 · node_id: MDU6SXNzdWU2ODg2NTkxODI= · number: 145 · title: Bug when first record contains fewer columns than subsequent records · user: simonwiles (96218) · state: closed · locked: 0 · comments: 2 · created_at: 2020-08-30T05:44:44Z · updated_at: 2020-09-08T23:21:23Z · closed_at: 2020-09-08T23:21:23Z · author_association: CONTRIBUTOR

insert_all() selects the maximum batch size based on the number of fields in the first record. If the first record has fewer fields than subsequent records (and alter=True is passed), this can result in SQL statements with more than the maximum permitted number of host parameters. This situation is perhaps unlikely to occur, but could happen if the first record had, say, 10 columns, so that batch_size (based on SQLITE_MAX_VARIABLE_NUMBER = 999) would be 99. If the next 98 rows had 11 columns, the resulting SQL statement for the first batch would have 10 * 1 + 11 * 98 = 1088 host parameters (and subsequent batches, if the data were consistent from there on, would have 99 * 11 = 1089).
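To make that arithmetic concrete, here is a small back-of-the-envelope calculation (illustration only, not sqlite-utils code; the variable names are mine and the numbers follow the example above):

```python
# Illustrative arithmetic only; the names below are not part of sqlite-utils' API.
SQLITE_MAX_VARIABLE_NUMBER = 999

first_record_columns = 10
batch_size = SQLITE_MAX_VARIABLE_NUMBER // first_record_columns  # 999 // 10 = 99

# With alter=True, the other 98 rows in the first batch carry 11 columns each,
# so the multi-row INSERT for that batch has to bind:
first_batch_params = 1 * 10 + 98 * 11  # 1088 host parameters
later_batch_params = 99 * 11           # 1089 host parameters

assert first_batch_params > SQLITE_MAX_VARIABLE_NUMBER
assert later_batch_params > SQLITE_MAX_VARIABLE_NUMBER
```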

I suspect this bug is masked somewhat by the fact that, although:

SQLITE_MAX_VARIABLE_NUMBER ... defaults to 999 for SQLite versions prior to 3.32.0 (2020-05-22) or 32766 for SQLite versions after 3.32.0.

it is commonly increased at compile time. Debian-based systems, for example, seem to ship with a version of SQLite compiled with SQLITE_MAX_VARIABLE_NUMBER set to 250,000, and I believe this is the case for Homebrew installations too.
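If you want to check what your local build actually allows, one quick way (a sketch; getlimit() requires Python 3.11+, and MAX_VARIABLE_NUMBER only appears in compile_options when it was overridden at build time) is:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Effective per-statement limit on host parameters (Python 3.11+ only).
print(conn.getlimit(sqlite3.SQLITE_LIMIT_VARIABLE_NUMBER))

# Compile-time options; a MAX_VARIABLE_NUMBER=... entry shows up here only if
# it was set explicitly when this SQLite build was compiled.
for (option,) in conn.execute("PRAGMA compile_options"):
    if "MAX_VARIABLE_NUMBER" in option:
        print(option)
```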

A test for this issue might look like this:

```python
import sqlite3


def test_columns_not_in_first_record_should_not_cause_batch_to_be_too_large(fresh_db):
    # sqlite on homebrew and Debian/Ubuntu etc. is typically compiled with
    # SQLITE_MAX_VARIABLE_NUMBER set to 250,000, so we need to exceed this value
    # to trigger the error on these systems.
    THRESHOLD = 250000
    extra_columns = 1 + (THRESHOLD - 1) // 99
    records = [
        {"c0": "first record"},  # one column in first record -> batch_size = 100
        # fill out the batch with 99 records with enough columns to exceed THRESHOLD
        *[
            dict([("c{}".format(i), j) for i in range(extra_columns)])
            for j in range(99)
        ],
    ]
    try:
        fresh_db["too_many_columns"].insert_all(records, alter=True)
    except sqlite3.OperationalError:
        raise
```

The best solution, I think, is simply to process all the records when determining columns, column types, and the batch size. In my tests this doesn't seem to be particularly costly at all, and cuts out a lot of complications (including obviating my implementation of #139 at #142). I'll raise a PR for your consideration.
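As a rough sketch of that approach (my illustration, not the actual sqlite-utils implementation), scanning all records up front to pick the column set and batch size might look like:

```python
# Hypothetical helper for illustration; not part of sqlite-utils' API.
def plan_insert(records, max_vars=999):
    """Collect the full column set and a batch size that stays under max_vars."""
    all_columns = []
    for record in records:
        for key in record:
            if key not in all_columns:
                all_columns.append(key)  # preserve first-seen column order
    # Each row in a multi-row INSERT binds one parameter per column, so size
    # batches against the full column set rather than the first record's.
    batch_size = max(1, max_vars // len(all_columns))
    return all_columns, batch_size


columns, batch_size = plan_insert(
    [{"c0": "first record"}] + [{"c0": 1, "c1": 2} for _ in range(200)]
)
print(columns, batch_size)  # ['c0', 'c1'] 499
```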

repo: sqlite-utils (140912432) · type: issue
reactions:
{
    "url": "https://api.github.com/repos/simonw/sqlite-utils/issues/145/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed
id: 688668680 · node_id: MDExOlB1bGxSZXF1ZXN0NDc1ODc0NDkz · number: 146 · title: Handle case where subsequent records (after first batch) include extra columns · user: simonwiles (96218) · state: closed · locked: 0 · comments: 5 · created_at: 2020-08-30T07:13:58Z · updated_at: 2020-09-08T23:20:37Z · closed_at: 2020-09-08T23:20:37Z · author_association: CONTRIBUTOR · pull_request: simonw/sqlite-utils/pulls/146

Addresses #145.

I think this should do the job. If it meets with your approval I'll update this PR to include an update to the documentation -- I came across this bug while preparing a PR to update the documentation around batch_size in any event.

repo: sqlite-utils (140912432) · type: pull
reactions:
{
    "url": "https://api.github.com/repos/simonw/sqlite-utils/issues/146/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
draft: 0
id: 688386219 · node_id: MDExOlB1bGxSZXF1ZXN0NDc1NjY1OTg0 · number: 142 · title: insert_all(..., alter=True) should work for new columns introduced after the first 100 records · user: simonwiles (96218) · state: closed · locked: 0 · comments: 3 · created_at: 2020-08-28T22:22:57Z · updated_at: 2020-08-30T07:28:23Z · closed_at: 2020-08-28T22:30:14Z · author_association: CONTRIBUTOR · pull_request: simonw/sqlite-utils/pulls/142

Closes #139.

repo: sqlite-utils (140912432) · type: pull
reactions:
{
    "url": "https://api.github.com/repos/simonw/sqlite-utils/issues/142/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
draft: 0
id: 686978131 · node_id: MDU6SXNzdWU2ODY5NzgxMzE= · number: 139 · title: insert_all(..., alter=True) should work for new columns introduced after the first 100 records · user: simonwiles (96218) · state: closed · locked: 0 · comments: 7 · created_at: 2020-08-27T06:25:25Z · updated_at: 2020-08-28T22:48:51Z · closed_at: 2020-08-28T22:30:14Z · author_association: CONTRIBUTOR

Is there a way to make .insert_all() work properly when new columns are introduced outside the first 100 records (with or without the alter=True argument)?

I'm using .insert_all() to bulk insert ~3-4k records at a time and it is common for records to need to introduce new columns. However, if new columns are introduced after the first 100 records, sqlite_utils doesn't even raise the OperationalError: table ... has no column named ... exception; it just silently drops the extra data and moves on.

It took me a while to find this little snippet in the documentation for .insert_all() (it's not mentioned under Adding columns automatically on insert/update):

The column types used in the CREATE TABLE statement are automatically derived from the types of data in that first batch of rows. Any additional or missing columns in subsequent batches will be ignored.

I tried changing the batch_size argument to the total number of records, but it seems only to affect the number of rows that are committed at a time, and has no influence on this problem.

Is there a way around this that you would suggest? It seems like it should raise an exception at least.
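To make the reported behaviour concrete, here is a sketch of the scenario (the silent drop described above applies to the sqlite-utils versions current at the time of this issue; after the fixes referenced on this page in #142 and #146 the new column should be added instead):

```python
import sqlite_utils

# In-memory database purely for illustration.
db = sqlite_utils.Database(memory=True)

records = [{"id": i, "name": "row {}".format(i)} for i in range(150)]
# A column that first appears well after the first 100 records:
records[120]["extra"] = "only present from record 121 onwards"

db["items"].insert_all(records, alter=True)

# Per the report, older versions derived the schema from the first batch only,
# so "extra" was silently dropped; inspect the columns to see what happened.
print(db["items"].columns_dict)
```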

repo: sqlite-utils (140912432) · type: issue
reactions:
{
    "url": "https://api.github.com/repos/simonw/sqlite-utils/issues/139/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed


CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [pull_request] TEXT,
   [body] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT,
   [active_lock_reason] TEXT,
   [performed_via_github_app] TEXT,
   [reactions] TEXT,
   [draft] INTEGER,
   [state_reason] TEXT
);
CREATE INDEX [idx_issues_repo]
                ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
                ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
                ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
                ON [issues] ([user]);
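For reference, the filtered view at the top of this page ("5 rows where state = "closed" and user = 96218 sorted by updated_at descending") corresponds to a query along these lines, shown here run with Python's sqlite3 against a local database built by github-to-sqlite (the github.db filename is an assumption):

```python
import sqlite3

# "github.db" is an assumed filename for a database created by github-to-sqlite.
conn = sqlite3.connect("github.db")
conn.row_factory = sqlite3.Row

rows = conn.execute(
    """
    SELECT number, title, type, state, created_at, updated_at, closed_at
    FROM issues
    WHERE state = 'closed' AND user = 96218
    ORDER BY updated_at DESC
    """
).fetchall()

for row in rows:
    print(dict(row))
```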