issue_comments


7 rows where issue = 1383646615 sorted by updated_at descending

Issue: Ability to merge databases and tables (#491) · 7 comments
chapmanjacobd · CONTRIBUTOR · created 2022-10-01T03:18:36Z, updated 2023-06-14T22:14:24Z · https://github.com/simonw/sqlite-utils/issues/491#issuecomment-1264218914

> some good concrete use-cases in mind

I actually found myself wanting something like this over the past couple of days. The use case was merging databases with slightly different schemas but the same table names.

Here is a full script:

```python
import argparse
from pathlib import Path

from sqlite_utils import Database


def connect(args, conn=None, **kwargs) -> Database:
    db = Database(conn or args.database, **kwargs)
    with db.conn:
        db.conn.execute("PRAGMA main.cache_size = 8000")
    return db


def parse_args() -> argparse.Namespace:
    parser = argparse.ArgumentParser()
    parser.add_argument("database")
    parser.add_argument("dbs_folder")
    parser.add_argument("--db", "-db", help=argparse.SUPPRESS)
    parser.add_argument("--verbose", "-v", action="count", default=0)
    args = parser.parse_args()

    if args.db:
        args.database = args.db
    Path(args.database).touch()
    args.db = connect(args)

    return args


def merge_db(args, source_db):
    source_db = str(Path(source_db).resolve())

    s_db = connect(argparse.Namespace(database=source_db, verbose=args.verbose))
    for table in s_db.table_names():
        data = s_db[table].rows
        args.db[table].insert_all(data, alter=True, replace=True)

    args.db.conn.commit()


def merge_directory():
    args = parse_args()
    source_dbs = list(Path(args.dbs_folder).glob("*.db"))
    for s_db in source_dbs:
        merge_db(args, s_db)


if __name__ == "__main__":
    merge_directory()
```
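For readers without sqlite-utils installed, the core of that merge loop can be sketched with only the standard library. This is a minimal sketch, not the script above: the `media` table and its columns are hypothetical, and `INSERT OR REPLACE` stands in for `insert_all(..., replace=True)`.

```python
import sqlite3

# Two throwaway in-memory databases standing in for source and destination.
src = sqlite3.connect(":memory:")
dest = sqlite3.connect(":memory:")

src.execute("CREATE TABLE media (path TEXT PRIMARY KEY, title TEXT)")
src.executemany("INSERT INTO media VALUES (?, ?)",
                [("/a.mp4", "A"), ("/b.mp4", "B")])

# Mirror of the merge loop: recreate each source table in the destination,
# then copy its rows with INSERT OR REPLACE.
tables = [r[0] for r in
          src.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]
for name in tables:
    ddl = src.execute(
        "SELECT sql FROM sqlite_master WHERE name = ?", (name,)).fetchone()[0]
    dest.execute(ddl)
    rows = src.execute(f"SELECT * FROM {name}").fetchall()
    if not rows:
        continue
    placeholders = ",".join("?" * len(rows[0]))
    dest.executemany(
        f"INSERT OR REPLACE INTO {name} VALUES ({placeholders})", rows)
```

Unlike the real script, this sketch does not `alter=True` the destination schema, so it only works when table definitions already match.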

edit: I've made some improvements to this and put it on PyPI:

```
$ pip install xklb
$ lb merge-db -h
usage: library merge-dbs DEST_DB SOURCE_DB ... [--only-target-columns] [--only-new-rows] [--upsert] [--pk PK ...] [--table TABLE ...]

Merge-DBs will insert new rows from source dbs to target db, table by table. If primary key(s) are provided,
and there is an existing row with the same PK, the default action is to delete the existing row and insert
the new row, replacing all existing fields.

Upsert mode will update matching PK rows such that if a source row has a NULL field and
the destination row has a value then the value will be preserved instead of changed to the source row's NULL value.

Ignore mode (--only-new-rows) will insert only rows which don't already exist in the destination db.

Test first by using temp databases as the destination db.
Try out different modes / flags until you are satisfied with the behavior of the program

    library merge-dbs --pk path (mktemp --suffix .db) tv.db movies.db

Merge database data and tables

    library merge-dbs --upsert --pk path video.db tv.db movies.db
    library merge-dbs --only-target-columns --only-new-rows --table media,playlists --pk path audio-fts.db audio.db

    library merge-dbs --pk id --only-tables subreddits reddit/81_New_Music.db audio.db
    library merge-dbs --only-new-rows --pk subreddit,path --only-tables reddit_posts reddit/81_New_Music.db audio.db -v

positional arguments:
  database
  source_dbs
```

Also if you want to dedupe a table based on a "business key" which isn't explicitly your primary key(s) you can run this:

```
$ lb dedupe-db -h
usage: library dedupe-dbs DATABASE TABLE --bk BUSINESS_KEYS [--pk PRIMARY_KEYS] [--only-columns COLUMNS]

Dedupe your database (not to be confused with the dedupe subcommand)

It should not need to be said but *backup* your database before trying this tool!

Dedupe-DB will help remove duplicate rows based on non-primary-key business keys

    library dedupe-db ./video.db media --bk path

If --primary-keys is not provided, table metadata primary keys will be used.
If --only-columns is not provided, all non-primary and non-business-key columns will be upserted.

positional arguments:
  database
  table

options:
  -h, --help            show this help message and exit
  --skip-0
  --only-columns ONLY_COLUMNS
                        Comma-separated column names to upsert
  --primary-keys PRIMARY_KEYS, --pk PRIMARY_KEYS
                        Comma-separated primary keys
  --business-keys BUSINESS_KEYS, --bk BUSINESS_KEYS
                        Comma-separated business keys
```
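The essence of deduping on a business key can be expressed in one SQL statement: keep the first row per key, delete the rest. A sketch (hypothetical `media` table, `path` as the business key, `id` as the real primary key; this is not the xklb implementation, which also upserts columns between duplicates):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE media (id INTEGER PRIMARY KEY, path TEXT, title TEXT)")
db.executemany("INSERT INTO media (path, title) VALUES (?, ?)",
               [("/a.mp4", "A"), ("/a.mp4", "A-dupe"), ("/b.mp4", "B")])

# Keep the lowest-id row for each business key, delete later duplicates.
db.execute("""
    DELETE FROM media WHERE id NOT IN (
        SELECT MIN(id) FROM media GROUP BY path
    )
""")
count = db.execute("SELECT COUNT(*) FROM media").fetchone()[0]
```

As the help text warns, run anything like this against a backup first.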

eyeseast · CONTRIBUTOR · 2022-09-26T22:31:58Z · https://github.com/simonw/sqlite-utils/issues/491#issuecomment-1258712931

Right. The backup command will copy tables completely, but in the case of conflicting table names, the destination gets overwritten silently. That might not be what you want here.

simonw · OWNER · 2022-09-26T22:12:45Z · https://github.com/simonw/sqlite-utils/issues/491#issuecomment-1258697384

That feels like a slightly different command to me - maybe `sqlite-utils backup data.db data-backup.db`? It doesn't have any of the mechanics for merging tables together. Could be a useful feature separately though.

eyeseast · CONTRIBUTOR · 2022-09-26T19:22:14Z · https://github.com/simonw/sqlite-utils/issues/491#issuecomment-1258508215

This might be fairly straightforward using SQLite's backup utility: https://docs.python.org/3/library/sqlite3.html#sqlite3.Connection.backup
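The linked `sqlite3.Connection.backup` API is a one-call copy of an entire database. A minimal sketch (in-memory databases here; real use would connect to file paths):

```python
import sqlite3

src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE t (x)")
src.execute("INSERT INTO t VALUES (1)")
src.commit()

dest = sqlite3.connect(":memory:")  # e.g. sqlite3.connect("data-backup.db")
with dest:
    # Copies every table, index, trigger and view from src into dest.
    src.backup(dest)

value = dest.execute("SELECT x FROM t").fetchone()[0]
```

Note that `backup` replaces the destination database wholesale, which is exactly the silent-overwrite behaviour flagged in the follow-up comment below.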

simonw · OWNER · 2022-09-26T18:36:23Z · https://github.com/simonw/sqlite-utils/issues/491#issuecomment-1258450447

This is also the kind of feature that would need to express itself in both the Python library and the CLI utility.

simonw · OWNER · 2022-09-26T18:35:50Z · https://github.com/simonw/sqlite-utils/issues/491#issuecomment-1258449887

This is a really interesting idea.

I'm nervous about needing to set the rules for how duplicate tables should be merged though. This feels like a complex topic - one where there isn't necessarily an obviously "correct" way of doing it, but where different problems that people are solving might need different merging approaches.

Likewise, merging isn't just a database-to-database thing at that point - I could see a need for merging two tables using similar rules to those used for merging two databases.

So I think I'd want to have some good concrete use-cases in mind before trying to design how something like this should work. Will leave this thread open for people to drop those in!

chapmanjacobd · CONTRIBUTOR · created 2022-09-24T04:50:59Z, updated 2022-09-24T04:52:08Z · https://github.com/simonw/sqlite-utils/issues/491#issuecomment-1256858763

Instead of outputting binary data to stdout, the interface might be better like this:

    sqlite-utils merge animals.db cats.db dogs.db

similar to zip, ogr2ogr, etc.

Actually, I think this might already be possible with ogr2ogr. I don't believe spatial data is a requirement, though it might add an ogc_id column or something:

    cp cats.db animals.db
    ogr2ogr -append animals.db dogs.db
    ogr2ogr -append animals.db another.db
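SQLite itself can do the same append without ogr2ogr, via `ATTACH DATABASE`. A sketch, assuming both databases already have an identically-shaped table (the `pets` table here is hypothetical):

```python
import os
import sqlite3
import tempfile

tmp = tempfile.mkdtemp()
cats_path = os.path.join(tmp, "cats.db")

# Build a tiny source database on disk.
cats = sqlite3.connect(cats_path)
cats.execute("CREATE TABLE pets (name TEXT)")
cats.execute("INSERT INTO pets VALUES ('felix')")
cats.commit()
cats.close()

# Destination database with a matching table.
animals = sqlite3.connect(os.path.join(tmp, "animals.db"))
animals.execute("CREATE TABLE pets (name TEXT)")
animals.execute("INSERT INTO pets VALUES ('rex')")

# ATTACH the source, then append its rows table by table.
animals.execute("ATTACH DATABASE ? AS src", (cats_path,))
animals.execute("INSERT INTO pets SELECT * FROM src.pets")
animals.commit()
total = animals.execute("SELECT COUNT(*) FROM pets").fetchone()[0]
```

Like `-append`, this handles identical schemas only; diverging schemas would still need the alter/merge rules discussed above.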



CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
, [performed_via_github_app] TEXT);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);
Powered by Datasette · About: github-to-sqlite