
issue_comments


2 rows where issue = 1077628073 sorted by updated_at descending




Comment 991805516
  • html_url: https://github.com/simonw/datasette/issues/1550#issuecomment-991805516
  • issue_url: https://api.github.com/repos/simonw/datasette/issues/1550
  • node_id: IC_kwDOBm6k_c47HcBM
  • user: simonw (9599)
  • created_at: 2021-12-11T23:43:24Z
  • updated_at: 2021-12-11T23:43:24Z
  • author_association: OWNER

I built a tiny Starlette app to experiment with this a bit:

```python
import asyncio
import janus
from starlette.applications import Starlette
from starlette.responses import JSONResponse, HTMLResponse, StreamingResponse
from starlette.routing import Route
import sqlite3
from concurrent import futures

executor = futures.ThreadPoolExecutor(max_workers=10)


async def homepage(request):
    return HTMLResponse(
        """
<html>
<head><title>SQL CSV Server</title>
<style>body { width: 40rem; font-family: helvetica; margin: 2em auto; }</style>
<body>
<h1>SQL CSV Server</h1>
<form action="/csv">
  <label style="display: block">SQL query:
  <textarea style="width: 90%; height: 20em" name="sql"></textarea>
  </label>
</form>
</head>
"""
    )


def run_query_in_thread(sql, sync_q):
    db = sqlite3.connect("../datasette/covid.db")
    cursor = db.cursor()
    cursor.arraysize = 100  # Default is 1 apparently?
    cursor.execute(sql)
    columns = [d[0] for d in cursor.description]
    sync_q.put([columns])
    # Now start putting batches of rows
    while True:
        rows = cursor.fetchmany()
        if rows:
            sync_q.put(rows)
        else:
            break
    # Let queue know we are finished
    sync_q.put(None)


async def csv_query(request):
    sql = request.query_params["sql"]

    queue = janus.Queue()
    loop = asyncio.get_running_loop()

    async def csv_generator():
        loop.run_in_executor(None, run_query_in_thread, sql, queue.sync_q)
        while True:
            rows = await queue.async_q.get()
            if rows is not None:
                for row in rows:
                    yield ",".join(map(str, row)) + "\n"
                queue.async_q.task_done()
            else:
                # Cleanup
                queue.close()
                await queue.wait_closed()
                break

    return StreamingResponse(csv_generator(), media_type="text/plain")


app = Starlette(
    debug=True,
    routes=[
        Route("/", homepage),
        Route("/csv", csv_query),
    ],
)
```

But... if I run this in a terminal window:

```
/tmp % wget 'http://127.0.0.1:8000/csv?sql=select+*+from+ny_times_us_counties'
```

it takes about 20 seconds to run and returns a 50MB file - but while it is running no other requests can be served by that server - not even the homepage! So something is blocking the event loop.

Maybe I should be using `fut = loop.run_in_executor(None, run_query_in_thread, sql, queue.sync_q)` and then awaiting `fut` somewhere, like in the Janus documentation? Don't think that's needed though. Needs more work to figure out why this is blocking.
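For reference, a minimal sketch of that Janus-documented pattern, holding on to the future and awaiting it once the queue is drained (untested against this server, and whether it fixes the blocking is an open question):

```python
async def csv_generator():
    # Keep the future so exceptions raised in run_query_in_thread can
    # propagate once the worker signals completion - the shape shown
    # in the janus documentation.
    fut = loop.run_in_executor(None, run_query_in_thread, sql, queue.sync_q)
    while True:
        rows = await queue.async_q.get()
        if rows is None:
            break
        for row in rows:
            yield ",".join(map(str, row)) + "\n"
        queue.async_q.task_done()
    await fut  # surface any exception from the worker thread
    queue.close()
    await queue.wait_closed()
```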

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Research option for returning all rows from arbitrary query (1077628073)
Comment 991761635
  • html_url: https://github.com/simonw/datasette/issues/1550#issuecomment-991761635
  • issue_url: https://api.github.com/repos/simonw/datasette/issues/1550
  • node_id: IC_kwDOBm6k_c47HRTj
  • user: simonw (9599)
  • created_at: 2021-12-11T19:39:01Z
  • updated_at: 2021-12-11T19:39:01Z
  • author_association: OWNER

I wonder if this could work for public instances too with some kind of queuing mechanism?

I really need to use benchmarking to figure out what the right number of maximum SQLite connections is. I'm just guessing at the moment.
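A rough, hypothetical harness for that benchmarking might time a batch of identical queries across different thread-pool sizes; the database path and query below are placeholders borrowed from the experiment above, not anything decided in this issue:

```python
import sqlite3
import time
from concurrent import futures

DB_PATH = "../datasette/covid.db"  # placeholder path from the experiment above
SQL = "select count(*) from ny_times_us_counties"  # placeholder query


def run_query(_):
    # Open one connection per query; SQLite connections should not be
    # shared across threads by default.
    db = sqlite3.connect(DB_PATH)
    try:
        return db.execute(SQL).fetchall()
    finally:
        db.close()


# Time 50 identical queries at each pool size to see where throughput
# stops improving.
for max_workers in (2, 5, 10, 20):
    with futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        start = time.perf_counter()
        list(pool.map(run_query, range(50)))
        print(f"{max_workers:>2} workers: {time.perf_counter() - start:.2f}s")
```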

reactions:
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue: Research option for returning all rows from arbitrary query (1077628073)


CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
, [performed_via_github_app] TEXT);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);
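For reference, the filter shown at the top of this page ("2 rows where issue = 1077628073 sorted by updated_at descending") corresponds to roughly this query against the schema above; a sketch using Python's sqlite3, where the filename `github.db` is an assumption:

```python
import sqlite3

# "github.db" is an assumed filename for the github-to-sqlite database.
db = sqlite3.connect("github.db")
rows = db.execute(
    "select id, user, created_at, body from issue_comments "
    "where issue = ? order by updated_at desc",
    (1077628073,),
).fetchall()
for comment_id, user, created_at, body in rows:
    print(comment_id, user, created_at, body[:80])
```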