
issue_comments


8 rows where author_association = "OWNER", issue = 642572841 ("Database page loads too slowly with many large tables (due to table counts)") and user = 9599 (simonw), sorted by updated_at descending


Comment 905900807 · simonw (OWNER) · 2021-08-25T21:51:10Z
https://github.com/simonw/datasette/issues/859#issuecomment-905900807

10-20 minutes to populate _internal! How many databases and tables is that for?

I may have to rethink the _internal mechanism entirely. One possible alternative would be for the Datasette homepage to just show a list of available databases (maybe only if there are more than X connected) and then load in their metadata only the first time they are accessed.

I need to get my own stress testing rig set up for this.
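Roughly the kind of lazy loading that could work here - a sketch only, with made-up names rather than Datasette's actual internals:

    import asyncio
    import sqlite3

    class LazyDatabase:
        # Hypothetical wrapper: introspect a database's tables only the
        # first time something asks for them, not at startup
        def __init__(self, path):
            self.path = path
            self._tables = None
            self._lock = asyncio.Lock()

        async def tables(self):
            async with self._lock:
                if self._tables is None:
                    conn = sqlite3.connect(self.path)
                    try:
                        self._tables = [
                            row[0]
                            for row in conn.execute(
                                "select name from sqlite_master where type = 'table'"
                            )
                        ]
                    finally:
                        conn.close()
            return self._tables

The homepage would then only need the list of attached database names, and the expensive introspection would happen per-database on first visit.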

Comment 648234787 · simonw (OWNER) · 2020-06-23T15:22:51Z
https://github.com/simonw/datasette/issues/859#issuecomment-648234787

I wonder if this is a SQLite caching issue then?

Datasette has a configuration option for this, but I haven't spent much time experimenting with it, so I don't know how much of an impact it can have: https://datasette.readthedocs.io/en/stable/config.html#cache-size-kb
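That setting presumably maps onto SQLite's cache_size pragma, which makes it easy to experiment with directly - a quick sketch (the 8000 KB figure is arbitrary):

    import sqlite3

    conn = sqlite3.connect("many-cols.db")
    # A negative cache_size means "this many KiB" rather than a page count -
    # this is the cache size that Datasette's cache_size_kb setting controls
    conn.execute("PRAGMA cache_size = -8000")
    # Repeated queries on this connection should now benefit from the larger cache
    print(conn.execute("select count(*) from sqlite_master").fetchone()[0])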

Comment 648163272 · simonw (OWNER) · 2020-06-23T13:52:23Z
https://github.com/simonw/datasette/issues/859#issuecomment-648163272

I'm chunking inserts at 100 at a time right now: https://github.com/simonw/sqlite-utils/blob/4d9a3204361d956440307a57bd18c829a15861db/sqlite_utils/db.py#L1030

I think the performance is more down to using Faker to create the test data - generating millions of entirely fake, randomized records takes a fair bit of time.
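For anyone trying to reproduce this, the chunk size is the batch_size argument to sqlite-utils' insert_all - something like this sketch, with made-up table and field names:

    import sqlite_utils
    from faker import Faker

    fake = Faker()
    db = sqlite_utils.Database("fake-data.db")

    def fake_records(n):
        # Generating the fake data dominates the runtime, not the inserts
        for _ in range(n):
            yield {"name": fake.name(), "address": fake.address()}

    # Rows are inserted in chunks of 100 per INSERT statement
    db["people"].insert_all(fake_records(100_000), batch_size=100)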

Comment 647894903 · simonw (OWNER) · 2020-06-23T04:07:59Z
https://github.com/simonw/datasette/issues/859#issuecomment-647894903

Just to check: are you seeing the problem on the database page (https://latest.datasette.io/fixtures) or on the table page (https://latest.datasette.io/fixtures/compound_three_primary_keys)?

If it's the table page then the problem may well be #862.

Comment 647890619 · simonw (OWNER) · 2020-06-23T03:48:21Z
https://github.com/simonw/datasette/issues/859#issuecomment-647890619

    sqlite-generate many-cols.db --tables 2 --rows 200000 --columns 50

Looks like that will take 35 minutes to run (it's not a particularly fast tool).

Comment 647890378 · simonw (OWNER) · 2020-06-23T03:47:19Z
https://github.com/simonw/datasette/issues/859#issuecomment-647890378

I generated a 600MB database using sqlite-generate just now - with 100 tables at 100,000 rows and 3 tables at 1,000,000 rows - and performance of the database page was fine: it rendered in 250ms.

Those tables only had 4 columns each though.

You said "200k+, 50+ rows in a couple of tables" - does that mean 50+ columns? I'll try with larger numbers of columns and see what difference that makes.
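One way to reproduce that 250ms measurement is to time the raw HTTP response - a sketch that assumes a local Datasette serving the generated file on port 8001 (the database name here is made up):

    import time
    import httpx

    start = time.monotonic()
    response = httpx.get("http://localhost:8001/many-tables")
    elapsed_ms = (time.monotonic() - start) * 1000
    print(response.status_code, f"{elapsed_ms:.0f}ms")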

Comment 647189948 · simonw (OWNER) · 2020-06-21T22:30:12Z (updated 2020-06-21T22:30:43Z)
https://github.com/simonw/datasette/issues/859#issuecomment-647189948

I'll write a little script which generates a 300MB SQLite file containing a bunch of tables with lots of randomly generated rows, to help test this.

Having a tool like that which can generate larger databases with different gnarly performance characteristics will be useful for other performance work too.
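A minimal version of that script could be as simple as this - the sizes and schema here are arbitrary, just enough to produce a few hundred MB of random data:

    import random
    import sqlite3
    import string

    conn = sqlite3.connect("stress-test.db")
    for t in range(100):
        conn.execute(f"create table t{t} (id integer primary key, value text)")
        conn.executemany(
            f"insert into t{t} (value) values (?)",
            (
                ("".join(random.choices(string.ascii_letters, k=50)),)
                for _ in range(100_000)
            ),
        )
        conn.commit()
    conn.close()

100 tables of 100,000 rows at ~50 characters each works out to roughly 500MB of random data.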

Comment 647189666 · simonw (OWNER) · 2020-06-21T22:26:55Z
https://github.com/simonw/datasette/issues/859#issuecomment-647189666

This makes a lot of sense. I implemented the mechanism for the index page because I have my own instance of Datasette that was running slow - but it had a dozen database files attached to it. I've not run into this with a single giant database file, but it absolutely makes sense that the same optimization is needed for the database page too.
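The index page mechanism is essentially "count the rows, but give up quickly on huge tables". One way to sketch that idea with sqlite3's progress handler - an illustration, not Datasette's actual implementation:

    import sqlite3

    def count_with_limit(conn, table, max_ops=100_000):
        # Abort the count after a budget of SQLite VM operations, so one
        # giant table can't stall the whole page render
        conn.set_progress_handler(lambda: 1, max_ops)
        try:
            return conn.execute(f"select count(*) from [{table}]").fetchone()[0]
        except sqlite3.OperationalError:
            return None  # too expensive to count quickly - show "many" instead
        finally:
            conn.set_progress_handler(None, 0)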



CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
, [performed_via_github_app] TEXT);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);