issues: 642572841

id: 642572841
node_id: MDU6SXNzdWU2NDI1NzI4NDE=
number: 859
title: Database page loads too slowly with many large tables (due to table counts)
user: 3243482
state: open
locked: 0
assignee:
milestone:
comments: 21
created_at: 2020-06-21T14:23:17Z
updated_at: 2021-08-25T21:59:55Z
closed_at:
author_association: CONTRIBUTOR
pull_request:
body:

Hey, I have a database in which I save HTML from a couple of web scrapers. There are around 200k+ and 50+ rows in a couple of tables, and the SQLite file weighs around 600MB.

The app runs on a VPS with a 2-core CPU and 4GB RAM, and refreshing the database page regularly takes more than 10 seconds. I suspected that counting table rows was the culprit, but manually running select count(*) from table_name for the largest table finishes in under a second.
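To see where the time goes across all tables rather than only the largest one, each count can be timed separately. The following is a minimal sketch using only Python's standard sqlite3 module; the helper name time_table_counts is hypothetical and is not part of Datasette.

```python
import sqlite3
import time


def time_table_counts(conn):
    """Return {table: (row_count, seconds)} for every table on the connection.

    Hypothetical helper for diagnosis only; not part of Datasette.
    """
    tables = [
        row[0]
        for row in conn.execute(
            "select name from sqlite_master where type = 'table'"
        )
    ]
    timings = {}
    for table in tables:
        start = time.perf_counter()
        # Same query shape as the manual check described above.
        # Assumes table names do not contain a closing square bracket.
        count = conn.execute(
            "select count(*) from [{}]".format(table)
        ).fetchone()[0]
        timings[table] = (count, time.perf_counter() - start)
    return timings
```

Calling time_table_counts(sqlite3.connect("data.db")), where data.db is a placeholder path, and summing the seconds gives a rough lower bound on what the database page spends on counts.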

I've looked at the source code. There's a check on the index page for mutable databases larger than 100MB: https://github.com/simonw/datasette/blob/799c5d53570d773203527f19530cf772dc2eeb24/datasette/views/index.py#L15

but this check is not performed for the database page. I've manually crippled the Database::table_counts method:

```py
async def table_counts(self, limit=10):
    if not self.is_mutable and self.cached_table_counts is not None:
        return self.cached_table_counts
    # Try to get counts for each table, $limit timeout for each count
    counts = {}
    for table in await self.table_names():
        try:
            # table_count = (
            #     await self.execute(
            #         "select count(*) from [{}]".format(table),
            #         custom_time_limit=limit,
            #     )
            # ).rows[0][0]
            counts[table] = 10  # table_count
        # In some cases I saw "SQL Logic Error" here in addition to
        # QueryInterrupted - so we catch that too:
        except (QueryInterrupted, sqlite3.OperationalError, sqlite3.DatabaseError):
            counts[table] = None
    if not self.is_mutable:
        self.cached_table_counts = counts
    return counts
```

Now the page loads in <100ms.

Is it possible to apply the size check to the database page too?
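As a rough illustration of what such a check could look like, here is a minimal standalone sketch using plain sqlite3. The names COUNT_DB_SIZE_LIMIT and table_counts_if_small are assumptions made for this example, and the 100MB value simply mirrors the figure mentioned above; this is not Datasette's actual implementation.

```python
import sqlite3

# Assumed threshold for this sketch, mirroring the 100MB figure above.
COUNT_DB_SIZE_LIMIT = 100 * 1024 * 1024


def table_counts_if_small(conn, db_size, is_mutable=True,
                          size_limit=COUNT_DB_SIZE_LIMIT):
    """Return {table: row_count}, or {table: None} when counting is skipped.

    Counts are skipped only for mutable databases larger than size_limit.
    Hypothetical helper; db_size is the file size in bytes.
    """
    tables = [
        row[0]
        for row in conn.execute(
            "select name from sqlite_master where type = 'table'"
        )
    ]
    if is_mutable and db_size > size_limit:
        # Too large to count on every page load: report unknown counts.
        return {table: None for table in tables}
    return {
        table: conn.execute(
            "select count(*) from [{}]".format(table)
        ).fetchone()[0]
        for table in tables
    }
```

The db_size argument could come from os.path.getsize() on the database file. Returning None for skipped tables matches how the method above already represents counts that could not be computed.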

/-/versions output
{
    "python": {
        "version": "3.8.0",
        "full": "3.8.0 (default, Oct 28 2019, 16:14:01) \n[GCC 8.3.0]"
    },
    "datasette": {
        "version": "0.44"
    },
    "asgi": "3.0",
    "uvicorn": "0.11.5",
    "sqlite": {
        "version": "3.22.0",
        "fts_versions": [
            "FTS5",
            "FTS4",
            "FTS3"
        ],
        "extensions": {
            "json1": null
        },
        "compile_options": [
            "COMPILER=gcc-7.4.0",
            "ENABLE_COLUMN_METADATA",
            "ENABLE_DBSTAT_VTAB",
            "ENABLE_FTS3",
            "ENABLE_FTS3_PARENTHESIS",
            "ENABLE_FTS3_TOKENIZER",
            "ENABLE_FTS4",
            "ENABLE_FTS5",
            "ENABLE_JSON1",
            "ENABLE_LOAD_EXTENSION",
            "ENABLE_PREUPDATE_HOOK",
            "ENABLE_RTREE",
            "ENABLE_SESSION",
            "ENABLE_STMTVTAB",
            "ENABLE_UNLOCK_NOTIFY",
            "ENABLE_UPDATE_DELETE_LIMIT",
            "HAVE_ISNAN",
            "LIKE_DOESNT_MATCH_BLOBS",
            "MAX_SCHEMA_RETRY=25",
            "MAX_VARIABLE_NUMBER=250000",
            "OMIT_LOOKASIDE",
            "SECURE_DELETE",
            "SOUNDEX",
            "TEMP_STORE=1",
            "THREADSAFE=1"
        ]
    }
}
repo: 107914493
type: issue
reactions:
{
    "url": "https://api.github.com/repos/simonw/datasette/issues/859/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
   
