issue_comments

11 rows where issue = 1175854982 (Research: how much overhead does the n=1 time limit have?), sorted by updated_at descending

simonw (OWNER) · 2022-03-21T21:55:45Z · https://github.com/simonw/datasette/issues/1679#issuecomment-1074459746

I'm going to change the original logic to set n=1 for times that are <= 20ms - and update the comments to make it more obvious what is happening.
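
As a sketch, the proposed rule could look something like this (the helper name is hypothetical and the thresholds are the ones discussed in this thread; Datasette's actual logic lives inline rather than in a helper):

```python
def choose_progress_handler_n(ms):
    """Pick how often the progress handler fires for a given time limit.

    Hypothetical helper illustrating the proposed rule: only time limits
    of 20ms or less pay for a check on every VM instruction.
    """
    if ms <= 20:
        # Very tight limits need per-instruction checks to be enforceable
        return 1
    # The benchmarks below show a handler every 1000 ops is nearly free
    # and still measures at sub-0.1ms granularity
    return 1000


assert choose_progress_handler_n(20) == 1
assert choose_progress_handler_n(50) == 1000
```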

simonw (OWNER) · 2022-03-21T21:53:47Z · https://github.com/simonw/datasette/issues/1679#issuecomment-1074458506

Oh interesting, it turns out there is ONE place in the code that sets the ms to less than 20 - this test fixture: https://github.com/simonw/datasette/blob/4e47a2d894b96854348343374c8e97c9d7055cf6/tests/fixtures.py#L224-L226

simonw (OWNER) · 2022-03-21T21:48:02Z · https://github.com/simonw/datasette/issues/1679#issuecomment-1074454687

Here's another microbenchmark that measures how many nanoseconds it takes to run 1,000 vmops:

```python
import sqlite3
import time

db = sqlite3.connect(":memory:")

i = 0
out = []


def count():
    global i
    i += 1000
    out.append((i, time.perf_counter_ns()))


db.set_progress_handler(count, 1000)

print("Start:", time.perf_counter_ns())
all = db.execute("""
with recursive counter(x) as (
  select 0
  union
  select x + 1 from counter
)
select * from counter limit 10000;
""").fetchall()
print("End:", time.perf_counter_ns())

print()
print("So how long does it take to execute 1000 ops?")

prev_time_ns = None
for i, time_ns in out:
    if prev_time_ns is not None:
        print(time_ns - prev_time_ns, "ns")
    prev_time_ns = time_ns
```

Running it:

```
% python nanobench.py
Start: 330877620374821
End: 330877632515822

So how long does it take to execute 1000 ops?
47290 ns
49573 ns
48226 ns
45674 ns
53238 ns
47313 ns
52346 ns
48689 ns
47092 ns
87596 ns
69999 ns
52522 ns
52809 ns
53259 ns
52478 ns
53478 ns
65812 ns
```

87596ns is 0.087596ms, so even a measurement rate of every 1000 ops is easily fine-grained enough to capture differences of less than 0.1ms.

If anything I could bump that default 1000 up, and I can definitely eliminate the `if ms < 50` branch entirely.

simonw (OWNER) · 2022-03-21T21:38:27Z · https://github.com/simonw/datasette/issues/1679#issuecomment-1074446576

OK here's a microbenchmark script:

```python
import sqlite3
import timeit

db = sqlite3.connect(":memory:")
db_with_progress_handler_1 = sqlite3.connect(":memory:")
db_with_progress_handler_1000 = sqlite3.connect(":memory:")

db_with_progress_handler_1.set_progress_handler(lambda: None, 1)
db_with_progress_handler_1000.set_progress_handler(lambda: None, 1000)


def execute_query(db):
    cursor = db.execute("""
    with recursive counter(x) as (
      select 0
      union
      select x + 1 from counter
    )
    select * from counter limit 10000;
    """)
    list(cursor.fetchall())


print("Without progress_handler")
print(timeit.timeit(lambda: execute_query(db), number=100))

print("progress_handler every 1000 ops")
print(timeit.timeit(lambda: execute_query(db_with_progress_handler_1000), number=100))

print("progress_handler every 1 op")
print(timeit.timeit(lambda: execute_query(db_with_progress_handler_1), number=100))
```

Results:

```
% python3 bench.py
Without progress_handler
0.8789225700311363
progress_handler every 1000 ops
0.8829826560104266
progress_handler every 1 op
2.8892734259716235
```

So running every 1000 ops makes almost no difference at all, but running every single op is a 3.2x performance degradation.

simonw (OWNER) · 2022-03-21T21:28:58Z · https://github.com/simonw/datasette/issues/1679#issuecomment-1074439309

David Raymond solved it there: https://sqlite.org/forum/forumpost/330c8532d8a88bcd

> Don't forget to step through the results. All .execute() has done is prepared it.
>
> `db.execute(query).fetchall()`

Sure enough, adding that gets the VM steps number up to 190,007, which is close enough that I'm happy.
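
A minimal sketch of the earlier counting script with that fix applied (same recursive-CTE query; the exact step count will vary by SQLite version):

```python
import sqlite3

db = sqlite3.connect(":memory:")

steps = 0


def count():
    global steps
    steps += 1


# Fire the progress handler on every virtual machine instruction
db.set_progress_handler(count, 1)

# .fetchall() is what actually steps through the statement; execute()
# alone only prepares it, which is why the earlier run counted just 24
db.execute("""
with recursive counter(x) as (
  select 0
  union
  select x + 1 from counter
)
select * from counter limit 10000;
""").fetchall()

print(steps)  # roughly 190,000 instead of 24
```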

simonw (OWNER) · 2022-03-21T19:48:59Z · https://github.com/simonw/datasette/issues/1679#issuecomment-1074347023

Posed a question about that here: https://sqlite.org/forum/forumpost/de9ff10fa7

simonw (OWNER) · 2022-03-21T19:42:08Z · https://github.com/simonw/datasette/issues/1679#issuecomment-1074341924

Here's the Python-C implementation of set_progress_handler: https://github.com/python/cpython/blob/4674fd4e938eb4a29ccd5b12c15455bd2a41c335/Modules/_sqlite/connection.c#L1177-L1201

It calls `sqlite3_progress_handler(self->db, n, progress_callback, ctx);`

https://www.sqlite.org/c3ref/progress_handler.html says:

> The parameter N is the approximate number of virtual machine instructions that are evaluated between successive invocations of the callback X

So maybe VM-steps and virtual machine instructions are different things?

simonw (OWNER) · 2022-03-21T19:37:08Z · https://github.com/simonw/datasette/issues/1679#issuecomment-1074337997

This is weird:

```python
import sqlite3

db = sqlite3.connect(":memory:")

i = 0


def count():
    global i
    i += 1


db.set_progress_handler(count, 1)

db.execute("""
with recursive counter(x) as (
  select 0
  union
  select x + 1 from counter
)
select * from counter limit 10000;
""")

print(i)
```

Outputs `24`. But if you try the same thing in the SQLite console:

```
sqlite> .stats vmstep
sqlite> with recursive counter(x) as (
   ...>   select 0
   ...>   union
   ...>   select x + 1 from counter
   ...> )
   ...> select * from counter limit 10000;
...
VM-steps: 200007
```

simonw (OWNER) · 2022-03-21T19:31:10Z · https://github.com/simonw/datasette/issues/1679#issuecomment-1074332718

How long does it take for SQLite to execute 1000 opcodes anyway?

simonw (OWNER) · 2022-03-21T19:30:44Z · https://github.com/simonw/datasette/issues/1679#issuecomment-1074332325

So it looks like even for facet suggestion n is always 1000 - it's never reduced to n=1.

simonw (OWNER) · 2022-03-21T19:30:05Z · https://github.com/simonw/datasette/issues/1679#issuecomment-1074331743

https://github.com/simonw/datasette/blob/1a7750eb29fd15dd2eea3b9f6e33028ce441b143/datasette/app.py#L118-L122 sets it to 50ms for facet suggestion but that's not going to pass `ms < 50`:

```python
Setting(
    "facet_suggest_time_limit_ms",
    50,
    "Time limit for calculating a suggested facet",
),
```
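
For context, here is a sketch of the time limit helper this thread is discussing, paraphrased from the behaviour described above rather than copied from Datasette's source (note how a 50ms limit never takes the `ms < 50` branch):

```python
import time
from contextlib import contextmanager


@contextmanager
def sqlite_timelimit(conn, ms):
    deadline = time.perf_counter() + (ms / 1000)
    # n is how many SQLite virtual machine instructions run between
    # successive checks of the deadline
    n = 1000
    if ms < 50:
        # 50 is not < 50, so the facet suggestion limit stays at n=1000
        n = 1

    def handler():
        # A non-zero return value tells SQLite to abort the query
        if time.perf_counter() >= deadline:
            return 1

    conn.set_progress_handler(handler, n)
    try:
        yield
    finally:
        conn.set_progress_handler(None, 1000)
```

So with `facet_suggest_time_limit_ms` at its default of 50, facet suggestion gets the cheap every-1000-ops handler, never the expensive n=1 one.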


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
, [performed_via_github_app] TEXT);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);