home / github / issues

Menu
  • Search all tables
  • GraphQL API

issues: 1199158210

This data as json

id node_id number title user state locked assignee milestone comments created_at updated_at closed_at author_association pull_request body repo type active_lock_reason performed_via_github_app reactions draft state_reason
1199158210 I_kwDOCGYnMM5HebPC 423 .extract() doesn't set foreign key when extracted columns contain NULL value 37447552 closed 0     1 2022-04-10T20:05:30Z 2022-08-27T14:45:04Z 2022-08-27T14:45:04Z NONE  

I've run into an issue with extract and I don't believe this is the intended behaviour.

I'm working with a database with music listening information. Currently it has one large table listens that contains all information. I'm trying to normalize the database by extracting relevant columns to separate tables (artists, tracks, albums). Not every track has an album.

A simplified demonstration with just track_title and album_title columns: ```ipython In [1]: import sqlite_utils

In [2]: db = sqlite_utils.Database(memory=True)

In [3]: db["listens"].insert_all([ ...: {"id": 1, "track_title": "foo", "album_title": "bar"}, ...: {"id": 2, "track_title": "baz", "album_title": None} ...: ], pk="id") Out[3]: <Table listens (id, track_title, album_title)> ```

The track in the first row has an album, the second track doesn't. Now I extract album information into a separate column: ```ipython In [4]: db["listens"].extract(columns=["album_title"], table="albums", fk_column="album_id") Out[4]: <Table listens (id, track_title, album_id)>

In [5]: list(db["albums"].rows) Out[5]: [{'id': 1, 'album_title': 'bar'}, {'id': 2, 'album_title': None}]

In [6]: list(db["listens"].rows) Out[6]: [{'id': 1, 'track_title': 'foo', 'album_id': 1}, {'id': 2, 'track_title': 'baz', 'album_id': None}] ```

This behaves as expected -- the album table contains entries for both the existing album and the NULL album. The listens table has a foreign key only for the first row (since the album in the second row was empty).

Now I want to extract the track information as well. Album information belongs to the track so I want to extract both columns to a new table. ```ipython In [7]: db["listens"].extract(columns=["track_title", "album_id"], table="tracks", fk_column="track_id") Out[7]: <Table listens (id, track_id)>

In [8]: list(db["tracks"].rows) Out[8]: [{'id': 1, 'track_title': 'foo', 'album_id': 1}, {'id': 2, 'track_title': 'baz', 'album_id': None}]

In [9]: list(db["listens"].rows) Out[9]: [{'id': 1, 'track_id': 1}, {'id': 2, 'track_id': None}] ```

Extracting to the tracks table worked fine (both tracks are present with correct columns). However, the listens table only has a foreign key to the newly created tracks for the first row, the foreign key in the second row is NULL.

Changing the order of extracts doesn't help.

I poked around in the source a bit and I believe this line (essentially comparing NULL = NULL) is the problem, but I don't know enough about SQL to create a reliable fix myself.

140912432 issue    
{
    "url": "https://api.github.com/repos/simonw/sqlite-utils/issues/423/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
  completed

Links from other tables

  • 1 row from issues_id in issues_labels
  • 1 row from issue in issue_comments
Powered by Datasette · Queries took 1.267ms · About: github-to-sqlite