issue_comments

10 rows where author_association = "CONTRIBUTOR" and issue = 1400374908 sorted by updated_at descending

Columns: id · html_url · issue_url · node_id · user · created_at · updated_at ▲ · author_association · body · reactions · issue · performed_via_github_app
1272357976 https://github.com/simonw/datasette/issues/1836#issuecomment-1272357976 https://api.github.com/repos/simonw/datasette/issues/1836 IC_kwDOBm6k_c5L1qRY fgregg 536941 2022-10-08T16:56:51Z 2022-10-08T16:56:51Z CONTRIBUTOR

When you are running from Docker, you will always want to run with `mode=ro`, because the same thing that is causing duplication in the inspect layer will cause duplication in the final container's read/write layer when `datasette serve` runs.
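
A minimal sketch of what `mode=ro` means at the SQLite connection level (this is not datasette's code, and the file path is a placeholder): reads work normally, and any attempt to write is refused, so the database file itself is never touched.

```python
# Sketch only: behaviour of a mode=ro SQLite connection.
# "/app/data.db" is a placeholder path, not taken from this thread.
import sqlite3

conn = sqlite3.connect("file:/app/data.db?mode=ro", uri=True)
print(conn.execute("select count(*) from sqlite_master").fetchone())  # reads are fine

try:
    conn.execute("create table scratch (x)")  # any write is rejected
except sqlite3.OperationalError as err:
    print(err)  # "attempt to write a readonly database"
```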

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
docker image is duplicating db files somehow 1400374908  
1271103097 https://github.com/simonw/datasette/issues/1836#issuecomment-1271103097 https://api.github.com/repos/simonw/datasette/issues/1836 IC_kwDOBm6k_c5Lw355 fgregg 536941 2022-10-07T04:43:41Z 2022-10-07T04:43:41Z CONTRIBUTOR

@simonw, should I open up a new issue for investigating the differences between `immutable=1` and `mode=ro`, and possibly switching to `mode=ro`? Or would you like to keep that conversation in this issue?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
docker image is duplicating db files somehow 1400374908  
1271100651 https://github.com/simonw/datasette/issues/1836#issuecomment-1271100651 https://api.github.com/repos/simonw/datasette/issues/1836 IC_kwDOBm6k_c5Lw3Tr fgregg 536941 2022-10-07T04:38:14Z 2022-10-07T04:38:14Z CONTRIBUTOR

Yes, and I also think that this is causing the apparent memory problems in #1480. When the container starts up, it will perform some operation on the database in immutable mode which apparently makes some small change to the db file. If that's so, then the db files will be copied to the read/write layer, which counts against Cloud Run's memory allocation!

Running a test of that now.

This completely addressed #1480.
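
One way to run that kind of test (a hypothetical sketch, not the exact check used here) is to hash the file, open it with `immutable=1`, run a query, and hash it again:

```python
# Hypothetical check: does opening with immutable=1 and querying change the file bytes?
import hashlib
import pathlib
import sqlite3
import sys

path = pathlib.Path(sys.argv[1])
before = hashlib.sha256(path.read_bytes()).hexdigest()

conn = sqlite3.connect(f"file:{path}?immutable=1", uri=True)
conn.execute("select count(*) from sqlite_master").fetchone()
conn.close()

after = hashlib.sha256(path.read_bytes()).hexdigest()
print("file changed" if before != after else "file unchanged")
```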

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
docker image is duplicating db files somehow 1400374908  
1271020193 https://github.com/simonw/datasette/issues/1836#issuecomment-1271020193 https://api.github.com/repos/simonw/datasette/issues/1836 IC_kwDOBm6k_c5Lwjqh fgregg 536941 2022-10-07T02:15:05Z 2022-10-07T02:21:08Z CONTRIBUTOR

When I hack the connect method to open non-mutable files with `mode=ro` instead of `immutable=1` (https://github.com/simonw/datasette/blob/eff112498ecc499323c26612d707908831446d25/datasette/database.py#L79),

then:

```bash
870 B   RUN /bin/sh -c datasette inspect nlrb.db --inspect-file inspect-data.json
```

the `datasette inspect` layer is only the size of the JSON file!
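
The change being described boils down to which query parameter goes into the SQLite URI. A sketch of the idea (not the actual patch to datasette/database.py):

```python
# Sketch only: the helper name connect_readonly is hypothetical, not from datasette.
import sqlite3

def connect_readonly(path, use_immutable=False):
    # immutable=1 promises SQLite the file will never change (no locking,
    # no journal/WAL handling); mode=ro is an ordinary read-only open.
    param = "immutable=1" if use_immutable else "mode=ro"
    return sqlite3.connect(f"file:{path}?{param}", uri=True)
```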

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
docker image is duplicating db files somehow 1400374908  
1271008997 https://github.com/simonw/datasette/issues/1836#issuecomment-1271008997 https://api.github.com/repos/simonw/datasette/issues/1836 IC_kwDOBm6k_c5Lwg7l fgregg 536941 2022-10-07T02:00:37Z 2022-10-07T02:00:49Z CONTRIBUTOR

Yes, and I also think that this is causing the apparent memory problems in #1480. When the container starts up, it will perform some operation on the database in immutable mode which apparently makes some small change to the db file. If that's so, then the db files will be copied to the read/write layer, which counts against Cloud Run's memory allocation!

Running a test of that now.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
docker image is duplicating db files somehow 1400374908  
1271003212 https://github.com/simonw/datasette/issues/1836#issuecomment-1271003212 https://api.github.com/repos/simonw/datasette/issues/1836 IC_kwDOBm6k_c5LwfhM fgregg 536941 2022-10-07T01:52:04Z 2022-10-07T01:52:04Z CONTRIBUTOR

And if we try immutable mode, which is how things are opened by `datasette inspect`, we duplicate the files!!!

```python
# test_sql_immutable.py
import sqlite3
import sys

db_name = sys.argv[1]
conn = sqlite3.connect(f'file:/app/{db_name}?immutable=1', uri=True)
cur = conn.cursor()
cur.execute('select count(*) from filing')
print(cur.fetchone())
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
docker image is duplicating db files somehow 1400374908  
1270992795 https://github.com/simonw/datasette/issues/1836#issuecomment-1270992795 https://api.github.com/repos/simonw/datasette/issues/1836 IC_kwDOBm6k_c5Lwc-b fgregg 536941 2022-10-07T01:29:15Z 2022-10-07T01:50:14Z CONTRIBUTOR

Fascinatingly, telling Python to open SQLite in read-only mode makes this layer have a size of 0.

```python
# test_sql_ro.py
import sqlite3
import sys

db_name = sys.argv[1]
conn = sqlite3.connect(f'file:/app/{db_name}?mode=ro', uri=True)
cur = conn.cursor()
cur.execute('select count(*) from filing')
print(cur.fetchone())
```

That's quite weird, because setting the file permissions to read-only didn't do anything. (On reflection, that chmod isn't doing anything because the Dockerfile commands are run as root.)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
docker image is duplicating db files somehow 1400374908  
1270988081 https://github.com/simonw/datasette/issues/1836#issuecomment-1270988081 https://api.github.com/repos/simonw/datasette/issues/1836 IC_kwDOBm6k_c5Lwb0x fgregg 536941 2022-10-07T01:19:01Z 2022-10-07T01:27:35Z CONTRIBUTOR

Okay, some progress!! Running some SQL against a database file causes that file to get duplicated, even if it doesn't apparently change the file.

Make a little test script like this:

```python
# test_sql.py
import sqlite3
import sys

db_name = sys.argv[1]
conn = sqlite3.connect(f'file:/app/{db_name}', uri=True)
cur = conn.cursor()
cur.execute('select count(*) from filing')
print(cur.fetchone())
```

then

```docker
RUN python test_sql.py nlrb.db
```

produced a layer that's the same size as nlrb.db!!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
docker image is duplicating db files somehow 1400374908  
1270936982 https://github.com/simonw/datasette/issues/1836#issuecomment-1270936982 https://api.github.com/repos/simonw/datasette/issues/1836 IC_kwDOBm6k_c5LwPWW fgregg 536941 2022-10-07T00:52:41Z 2022-10-07T00:52:41Z CONTRIBUTOR

It's not that the inspect command is somehow changing the db files. If I set them to read-only, the "inspect" layer still has the same very large size.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
docker image is duplicating db files somehow 1400374908  
1270923537 https://github.com/simonw/datasette/issues/1836#issuecomment-1270923537 https://api.github.com/repos/simonw/datasette/issues/1836 IC_kwDOBm6k_c5LwMER fgregg 536941 2022-10-07T00:46:08Z 2022-10-07T00:46:08Z CONTRIBUTOR

I thought it was maybe to do with reading through all the files, but that does not seem to be the case.

If I make a little test file like:

```python
# test_read.py
import hashlib
import sys
import pathlib

HASH_BLOCK_SIZE = 1024 * 1024

def inspect_hash(path):
    """Calculate the hash of a database, efficiently."""
    m = hashlib.sha256()
    with path.open("rb") as fp:
        while True:
            data = fp.read(HASH_BLOCK_SIZE)
            if not data:
                break
            m.update(data)

    return m.hexdigest()

inspect_hash(pathlib.Path(sys.argv[1]))
```

then a line in the Dockerfile like

```docker
RUN python test_read.py nlrb.db && echo "[]" > /etc/inspect.json
```

just produces a layer of 3 B.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
docker image is duplicating db files somehow 1400374908  

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
, [performed_via_github_app] TEXT);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);
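
For reference, the filtered view at the top of this page corresponds to a query along these lines (a sketch that assumes a local copy of the github-to-sqlite database; the filename github.db is made up):

```python
# Sketch: reproduce this page's filter against a local github-to-sqlite database.
import sqlite3

conn = sqlite3.connect("file:github.db?mode=ro", uri=True)
rows = conn.execute(
    """
    select id, user, created_at, updated_at, body
    from issue_comments
    where author_association = ? and issue = ?
    order by updated_at desc
    """,
    ("CONTRIBUTOR", 1400374908),
).fetchall()
for comment_id, user, created, updated, body in rows:
    print(comment_id, updated, body[:60])
```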