issues


2 rows where type = "issue" and user = 167893 sorted by updated_at descending


id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association pull_request body repo type active_lock_reason performed_via_github_app reactions draft state_reason
1578790070 I_kwDOCGYnMM5eGmy2 527 `Table.convert()` skips falsey values mcarpenter 167893 closed 0     5 2023-02-10T00:00:52Z 2023-05-09T21:15:05Z 2023-05-08T21:03:24Z CONTRIBUTOR  

Summary

By design, Table.convert() does not attempt conversion of falsey values (None, "", 0, ...). This is surprising (directly contradicts the docstring) and convert() may quietly skip cells where the user assumed a conversion would take place.

Example

Increment a column of integers by one

```python
from sqlite_utils import Database

db = Database(memory=True)
table = db['table']
col = 'x'
table.insert_all([{col: 0}, {col: 1}])
print(table.get(1))  # 0
print(table.get(2))  # 1
print()

table.convert(col, lambda x: x+1)
print(table.get(1))  # got 0, expected 1 ⚠⚠⚠
print(table.get(2))  # got 2, expected 2
```

Another example might be, say, transforming cells containing empty string to NULL.

Discussion

This was, I think, a pragmatic choice so that consumers can skip writing guard clauses for these falsey values (particularly from the CLI). But this surprising undocumented behavior can lead to incorrect data. I don't think this is a good trade-off between convenience and correctness.

Without this convenience, users will have to write guard clauses into their conversion expressions (or adapt the called function to do the same), so `fn(value) if value else value` instead of `fn(value)`. This is more typing, and sometimes I will forget, and there will be errors. (But they will be noisy errors, which is a good thing.)
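The difference between the two forms can be seen without a database. The sketch below is a plain-Python model of the behavior described above, not the sqlite-utils implementation; `apply_guarded`, `apply_unguarded`, `increment` and `blank_to_none` are hypothetical names chosen for illustration.

```python
def apply_guarded(fn, value):
    """Convert only truthy values; None, "", 0 and so on pass through untouched."""
    return fn(value) if value else value


def apply_unguarded(fn, value):
    """Convert every value, including falsey ones."""
    return fn(value)


def increment(x):
    return x + 1


def blank_to_none(s):
    # Map the empty string to None (NULL in SQLite); leave other values alone.
    return None if s == "" else s


guarded_ints = [apply_guarded(increment, v) for v in [0, 1]]
unguarded_ints = [apply_unguarded(increment, v) for v in [0, 1]]
guarded_text = [apply_guarded(blank_to_none, v) for v in ["", "a"]]
unguarded_text = [apply_unguarded(blank_to_none, v) for v in ["", "a"]]

print(guarded_ints)    # [0, 2]  (the 0 was silently skipped)
print(unguarded_ints)  # [1, 2]
print(guarded_text)    # ['', 'a']  (the empty string was silently skipped)
print(unguarded_text)  # [None, 'a']
```

With the implicit guard, neither the increment of `0` nor the empty-string-to-NULL conversion can ever take effect, and nothing reports that a value was skipped.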

Such a change will certainly inconvenience some existing consumers; there will be some breakage. But I think this is worth it to avoid quietly not converting some values by default, which can lead to quietly bad data.

I have a PR that I will attach, please take a look and see what you think.

sqlite-utils 140912432 issue    
{
    "url": "https://api.github.com/repos/simonw/sqlite-utils/issues/527/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed
1575131737 I_kwDOCGYnMM5d4ppZ 525 Repeated calls to `Table.convert()` fail mcarpenter 167893 closed 0     4 2023-02-07T22:40:47Z 2023-05-08T21:59:41Z 2023-05-08T21:54:02Z CONTRIBUTOR  

Summary

When using the API, repeated calls to Table.convert() do not work correctly: every conversion quietly uses the callable (function, lambda) from the first call to convert(), even when later calls pass a different callable.

Example

```python
from sqlite_utils import Database

db = Database(memory=True)
table = db['table']
col = 'x'
table.insert_all([{col: 1}])
print(table.get(1))

table.convert(col, lambda x: x*2)
print(table.get(1))

def zeroize(x):
    return 0

# zeroize = lambda x: 0
# zeroize.__name__ = 'zeroize'

table.convert(col, zeroize)
print(table.get(1))
```

Output: `{'x': 1}` `{'x': 2}` `{'x': 4}`

Expected: `{'x': 1}` `{'x': 2}` `{'x': 0}`

Explanation

This is some relevant documentation.

  • Table.convert() takes a Callable to perform data conversion on a column
  • The Callable is passed to Database.register_function()
  • Database.register_function() uses the callable's __name__ attribute for registration
  • (Aside: all lambdas have a __name__ of <lambda>: I thought this was the problem, and it was close, but not quite)
  • However, convert() first wraps the callable in a local function, convert_value()
  • Consequently register_function() sees name convert_value for all invocations from convert()
  • register_function() silently ignores registrations using the same name, retaining only the first such registration
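The chain of events in the list above can be reproduced with a toy model. This is a simplified sketch, not the sqlite-utils source: `FunctionRegistry` and the free function `convert` are hypothetical stand-ins that only mimic the described behavior (registration keyed by name and arity, duplicates silently ignored, every callable wrapped in a local `convert_value`).

```python
import inspect


class FunctionRegistry:
    """Toy registry: keeps the first callable seen for each (name, arity)."""

    def __init__(self):
        self.functions = {}

    def register(self, fn, name=None):
        fn_name = name or fn.__name__
        arity = len(inspect.signature(fn).parameters)
        key = (fn_name, arity)
        if key in self.functions:
            # Duplicate registration is silently ignored; the old callable wins.
            return self.functions[key]
        self.functions[key] = fn
        return fn


def convert(registry, values, fn):
    # Every caller's fn is wrapped in a local function with the same
    # __name__, so the registry sees "convert_value" each time.
    def convert_value(v):
        return fn(v)

    registered = registry.register(convert_value)
    return [registered(v) for v in values]


registry = FunctionRegistry()
first = convert(registry, [1], lambda x: x * 2)
second = convert(registry, first, lambda x: 0)

print(first)   # [2]
print(second)  # [4]  (the doubling lambda ran again instead of the zeroing one)
```

The second call never reaches its own lambda: the wrapper's name collides with the one registered by the first call, so the first wrapper (and the callable it closed over) is reused.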

There's a mismatch between the comments and the code: https://github.com/simonw/sqlite-utils/blob/fc221f9b62ed8624b1d2098e564f525c84497969/sqlite_utils/db.py#L404

but actually the existing function is returned/used instead (as the "registering custom sql functions" doc I linked above says too). Seems like this can be rectified to match the comment?

Suggested fix

I think there are four things:

1. The call to register_function() from convert() should have an explicit name= parameter (to continue using convert_value() and the progress bar).
2. For functions, this name can be the real function name. (I understand the sqlite api needs a name, and it's nice if those are recognizable names where possible). For lambdas would 'lambda-{uuid}' or similar be acceptable?
3. register_function() really should throw an error on repeated attempts to register a duplicate (function, arity)-pair.
4. A test? I haven't looked at the test framework here, but it seems this should be testable.
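The first three suggestions could look roughly like the sketch below. It uses the standard library's `sqlite3` module directly rather than sqlite-utils, so it says nothing about how the library would implement them; `sql_name_for` and `register_once` are hypothetical helpers, and the `lambda_<hex>` naming (underscore rather than hyphen, so the name can be used unquoted in SQL) is an assumption.

```python
import sqlite3
import uuid


def sql_name_for(fn):
    """Use the real function name when it is a plain identifier;
    otherwise (e.g. '<lambda>') generate a unique name."""
    name = getattr(fn, "__name__", "")
    if name.isidentifier():
        return name
    return "lambda_" + uuid.uuid4().hex


def register_once(conn, registered, fn, arity=1):
    """Register fn under an explicit name and refuse duplicates loudly."""
    name = sql_name_for(fn)
    key = (name, arity)
    if key in registered:
        raise ValueError(f"function {name!r} with arity {arity} already registered")
    conn.create_function(name, arity, fn)
    registered.add(key)
    return name


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.execute("INSERT INTO t VALUES (1)")
registered = set()

double = register_once(conn, registered, lambda x: x * 2)
conn.execute(f"UPDATE t SET x = {double}(x)")


def zeroize(x):
    return 0


zero = register_once(conn, registered, zeroize)
conn.execute(f"UPDATE t SET x = {zero}(x)")

value = conn.execute("SELECT x FROM t").fetchone()[0]
print(value)  # 0
```

Because each callable gets its own SQL-level name, the second conversion runs `zeroize` rather than the earlier lambda, and an accidental re-registration raises instead of being ignored.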

See also

  • 458

sqlite-utils 140912432 issue    
{
    "url": "https://api.github.com/repos/simonw/sqlite-utils/issues/525/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed


CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [pull_request] TEXT,
   [body] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
, [active_lock_reason] TEXT, [performed_via_github_app] TEXT, [reactions] TEXT, [draft] INTEGER, [state_reason] TEXT);
CREATE INDEX [idx_issues_repo]
                ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
                ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
                ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
                ON [issues] ([user]);
Powered by Datasette · Queries took 33.513ms · About: github-to-sqlite