issue_comments

13 rows where issue = 749283032 (register_output_renderer() should support streaming data), sorted by updated_at descending

id html_url issue_url node_id user created_at updated_at author_association body reactions issue performed_via_github_app
1399341761 https://github.com/simonw/datasette/issues/1101#issuecomment-1399341761 https://api.github.com/repos/simonw/datasette/issues/1101 IC_kwDOBm6k_c5TaELB simonw 9599 2023-01-21T22:07:19Z 2023-01-21T22:07:19Z OWNER

Idea for supporting streaming with the `register_output_renderer` hook:

```python
@hookimpl
def register_output_renderer(datasette):
    return {
        "extension": "test",
        "render": render_demo,
        "can_render": can_render_demo,
        "render_stream": render_demo_stream,  # This is new
    }
```

So there's a new `"render_stream"` key which can be returned, which if present means that the output renderer supports streaming.

I'll play around with the design of that function signature in:

  • #1999
  • #1062
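
For a sense of what that might look like: a minimal sketch of one possible render_demo_stream, written as an async generator. The signature and the async-iterator rows parameter are assumptions, not the eventual design:

```python
# Hypothetical sketch only: the (datasette, columns, rows) signature and the
# async-generator style are assumptions, not a settled Datasette design.
async def render_demo_stream(datasette, columns, rows):
    yield "\t".join(columns) + "\n"  # header line first
    async for row in rows:  # assumes rows arrives as an async iterator
        yield "\t".join(str(value) for value in row) + "\n"
```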

Reactions: 👍 1

1105642187 https://github.com/simonw/datasette/issues/1101#issuecomment-1105642187 https://api.github.com/repos/simonw/datasette/issues/1101 IC_kwDOBm6k_c5B5sLL eyeseast 25778 2022-04-21T18:59:08Z 2022-04-21T18:59:08Z CONTRIBUTOR

Ha! That was your idea (and a good one).

But it's probably worth measuring to see what overhead it adds. It did require both passing in the database and making the whole thing async.

Just timing the queries themselves:

  1. Using `AsGeoJSON(geometry) as geometry` takes 10.235 ms
  2. Leaving as binary takes 8.63 ms

Looking at the network panel:

  1. Takes about 200 ms for the fetch request
  2. Takes about 300 ms

I'm not sure how best to time the GeoJSON generation, but it would be interesting to check. Maybe I'll write a plugin to add query times to response headers.

The other thing to consider with async streaming is that it might be well-suited for a slower response. When I have to get the whole result and send a response in a fixed amount of time, I need the most efficient query possible. If I can hang onto a connection and get things one chunk at a time, maybe it's ok if there's some overhead.
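
One way to get that measurement - a minimal sketch of timing a query and surfacing it in a response header. The helper name and the x-query-time-ms header are invented for illustration:

```python
import time

# Sketch: time a single query and return the elapsed milliseconds,
# so a renderer can attach it to the response. Names are hypothetical.
async def timed_execute(db, sql, params=None):
    start = time.perf_counter()
    results = await db.execute(sql, params)
    elapsed_ms = (time.perf_counter() - start) * 1000
    return results, elapsed_ms

# Usage inside a renderer (illustrative):
#   results, ms = await timed_execute(db, "SELECT AsGeoJSON(geometry) FROM places")
#   return Response(body, headers={"x-query-time-ms": f"{ms:.2f}"})
```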

1105615625 https://github.com/simonw/datasette/issues/1101#issuecomment-1105615625 https://api.github.com/repos/simonw/datasette/issues/1101 IC_kwDOBm6k_c5B5lsJ simonw 9599 2022-04-21T18:31:41Z 2022-04-21T18:32:22Z OWNER

The datasette-geojson plugin is actually an interesting case here, because of the way it converts SpatiaLite geometries into GeoJSON: https://github.com/eyeseast/datasette-geojson/blob/602c4477dc7ddadb1c0a156cbcd2ef6688a5921d/datasette_geojson/__init__.py#L61-L66

```python
if isinstance(geometry, bytes):
    results = await db.execute(
        "SELECT AsGeoJSON(:geometry)", {"geometry": geometry}
    )
    return geojson.loads(results.single_value())
```

That actually seems to work really well as-is, but it does worry me a bit that it ends up having to execute an extra `SELECT` query for every single returned row - especially in streaming mode where it might be asked to return 1m rows at once.

My PostgreSQL/MySQL engineering brain says that this would be better handled by doing a chunk of these (maybe 100) at once, to avoid the per-query overhead - but with SQLite that might not be necessary.

At any rate, this is one of the reasons I'm interested in "iterate over this sequence of chunks of 100 rows at a time" as a potential option here.

Of course, a better solution would be for datasette-geojson to have a way to influence the SQL query before it is executed, adding an `AsGeoJSON(geometry)` clause to it - so that's something I'm open to as well.
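
The batching idea is sketchable too - chunks of (say) 100 geometries converted in a single SELECT. The helper name and the generated SQL here are assumptions, not datasette-geojson code:

```python
# Hypothetical sketch of converting geometries 100 at a time in a single
# SELECT, instead of one query per row. All names here are illustrative.
async def geometries_to_geojson(db, geometries, batch_size=100):
    converted = []
    for i in range(0, len(geometries), batch_size):
        batch = geometries[i : i + batch_size]
        # Build "SELECT AsGeoJSON(:g0), AsGeoJSON(:g1), ..." for this batch
        selects = ", ".join(f"AsGeoJSON(:g{n})" for n in range(len(batch)))
        params = {f"g{n}": geom for n, geom in enumerate(batch)}
        results = await db.execute(f"SELECT {selects}", params)
        converted.extend(results.first())  # one row, len(batch) columns
    return converted
```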

1105608964 https://github.com/simonw/datasette/issues/1101#issuecomment-1105608964 https://api.github.com/repos/simonw/datasette/issues/1101 IC_kwDOBm6k_c5B5kEE simonw 9599 2022-04-21T18:26:29Z 2022-04-21T18:26:29Z OWNER

I'm questioning whether the mechanisms should be separate at all now - a single response rendering is really just a case of a streaming response that only pulls the first N records from the iterator.

It probably needs to be an `async for` iterator, which I've not worked with much before. Good opportunity to learn.

This actually gets a fair bit more complicated due to the work I'm doing right now to improve the default JSON API:

  • #1709

I want to do things like make faceting results optionally available to custom renderers - which is a separate concern from streaming rows.

I'm going to poke around with a bunch of prototypes and see what sticks.

1105588651 https://github.com/simonw/datasette/issues/1101#issuecomment-1105588651 https://api.github.com/repos/simonw/datasette/issues/1101 IC_kwDOBm6k_c5B5fGr eyeseast 25778 2022-04-21T18:15:39Z 2022-04-21T18:15:39Z CONTRIBUTOR

What if you split rendering and streaming into two things:

  • `render` is a function that returns a response
  • `stream` is a function that sends chunks, or yields chunks passed to an ASGI send callback

That way current plugins still work, and streaming is purely additive. A stream function could get a cursor or iterator of rows, instead of a list, so it could more efficiently handle large queries.
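
For illustration, the split might register like this - the "stream" key and every signature below are hypothetical, not an existing Datasette contract:

```python
from datasette import hookimpl
from datasette.utils.asgi import Response

# Sketch of the proposed split; the "stream" key and these signatures
# are assumptions, not a settled design.

async def render_tsv(columns, rows):
    # Existing-style renderer: materialize everything, return one Response
    lines = ["\t".join(columns)]
    lines.extend("\t".join(str(v) for v in row) for row in rows)
    return Response("\n".join(lines), content_type="text/tab-separated-values")

async def stream_tsv(columns, rows, send):
    # Streaming variant: push chunks through an ASGI-style send callable
    await send("\t".join(columns) + "\n")
    async for row in rows:
        await send("\t".join(str(v) for v in row) + "\n")

@hookimpl
def register_output_renderer(datasette):
    return {
        "extension": "tsv",
        "render": render_tsv,   # current plugins keep working
        "stream": stream_tsv,   # streaming is purely additive
    }
```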

1105571003 https://github.com/simonw/datasette/issues/1101#issuecomment-1105571003 https://api.github.com/repos/simonw/datasette/issues/1101 IC_kwDOBm6k_c5B5ay7 simonw 9599 2022-04-21T18:10:38Z 2022-04-21T18:10:46Z OWNER

Maybe the simplest design for this is to add an optional `can_stream` key to the contract:

```python
@hookimpl
def register_output_renderer(datasette):
    return {
        "extension": "tsv",
        "render": render_tsv,
        "can_render": lambda: True,
        "can_stream": lambda: True,
    }
```

When streaming, a new parameter could be passed to the render function - maybe `chunks` - which is an iterator/generator over a sequence of chunks of rows.

Or it could use the existing `rows` parameter but treat that as an iterator?

869812567 https://github.com/simonw/datasette/issues/1101#issuecomment-869812567 https://api.github.com/repos/simonw/datasette/issues/1101 MDEyOklzc3VlQ29tbWVudDg2OTgxMjU2Nw== simonw 9599 2021-06-28T16:06:57Z 2021-06-28T16:07:24Z OWNER

Relevant blog post: https://simonwillison.net/2021/Jun/25/streaming-large-api-responses/ - including notes on efficiently streaming formats with some kind of separator in between the records (regular JSON).

Some export formats are friendlier for streaming than others. CSV and TSV are pretty easy to stream, as is newline-delimited JSON.

Regular JSON requires a bit more thought: you can output a `[` character, then output each row in a stream with a comma suffix, then skip the comma for the last row and output a `]`. Doing that requires peeking ahead (looping two at a time) to verify that you haven't yet reached the end.

Or... Martin De Wulf pointed out that you can output the first row, then output every other row with a preceding comma, which avoids the whole "iterate two at a time" problem entirely.
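
That second trick is short enough to sketch directly (a plain generator; the function name is illustrative):

```python
import json

# Sketch of the "first row, then comma-prefixed rows" trick for streaming
# a regular JSON array without any look-ahead.
def stream_json_array(rows):
    yield "["
    first = True
    for row in rows:
        if not first:
            yield ","
        yield json.dumps(row)
        first = False
    yield "]"
```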

869191854 https://github.com/simonw/datasette/issues/1101#issuecomment-869191854 https://api.github.com/repos/simonw/datasette/issues/1101 MDEyOklzc3VlQ29tbWVudDg2OTE5MTg1NA== eyeseast 25778 2021-06-27T16:42:14Z 2021-06-27T16:42:14Z CONTRIBUTOR

This would really help with this issue: https://github.com/eyeseast/datasette-geojson/issues/7

755134771 https://github.com/simonw/datasette/issues/1101#issuecomment-755134771 https://api.github.com/repos/simonw/datasette/issues/1101 MDEyOklzc3VlQ29tbWVudDc1NTEzNDc3MQ== simonw 9599 2021-01-06T07:28:01Z 2021-01-06T07:28:01Z OWNER

With this structure it will become possible to stream non-newline-delimited JSON array-of-objects too - the `stream_rows()` method could output `[` first, then each row followed by a comma, then `]` after the very last row.

755133937 https://github.com/simonw/datasette/issues/1101#issuecomment-755133937 https://api.github.com/repos/simonw/datasette/issues/1101 MDEyOklzc3VlQ29tbWVudDc1NTEzMzkzNw== simonw 9599 2021-01-06T07:25:48Z 2021-01-06T07:26:43Z OWNER

Idea: instead of returning a dictionary, `register_output_renderer` could return an object. The object could have the following properties:

  • `.extension` - the extension to use
  • `.can_render(...)` - says if it can render this
  • `.can_stream(...)` - says if streaming is supported
  • async `.stream_rows(rows_iterator, send)` - method that loops through all rows and uses `send` to send them to the response in the correct format

I can then deprecate the existing dict return type for 1.0.
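
Such an object might look roughly like this - a sketch of the contract described above, with the class name and method bodies invented for illustration:

```python
# Hypothetical sketch of the object-based renderer contract.
class TSVRenderer:
    extension = "tsv"

    def can_render(self, columns, rows):
        return True

    def can_stream(self, columns, rows):
        return True

    async def stream_rows(self, rows_iterator, send):
        # Loop through every row, pushing formatted chunks to the response
        async for row in rows_iterator:
            await send("\t".join(str(value) for value in row) + "\n")
```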

Reactions: 👍 2

755128038 https://github.com/simonw/datasette/issues/1101#issuecomment-755128038 https://api.github.com/repos/simonw/datasette/issues/1101 MDEyOklzc3VlQ29tbWVudDc1NTEyODAzOA== simonw 9599 2021-01-06T07:10:22Z 2021-01-06T07:10:22Z OWNER

Yet another use-case for this: I want to be able to stream newline-delimited JSON in order to better import into Pandas:

```python
pandas.read_json(
    "https://latest.datasette.io/fixtures/compound_three_primary_keys.json?_shape=array&_nl=on",
    lines=True
)
```

732544590 https://github.com/simonw/datasette/issues/1101#issuecomment-732544590 https://api.github.com/repos/simonw/datasette/issues/1101 MDEyOklzc3VlQ29tbWVudDczMjU0NDU5MA== simonw 9599 2020-11-24T02:22:55Z 2020-11-24T02:22:55Z OWNER

The trick I'm using here is to follow the `next_url` in order to paginate through all of the matching results. The loop calls the `data()` method multiple times, once for each page of results: https://github.com/simonw/datasette/blob/4bac9f18f9d04e5ed10f072502bcc508e365438e/datasette/views/base.py#L304-L307
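
The same pattern seen from a client's perspective - a sketch using httpx, assuming Datasette's paginated JSON shape with its rows and next_url keys:

```python
import httpx

# Sketch: follow next_url until it is null, collecting every page of rows.
async def fetch_all_rows(url):
    rows = []
    async with httpx.AsyncClient() as client:
        while url:
            data = (await client.get(url)).json()
            rows.extend(data["rows"])
            url = data.get("next_url")  # null/absent on the final page
    return rows
```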

732543700 https://github.com/simonw/datasette/issues/1101#issuecomment-732543700 https://api.github.com/repos/simonw/datasette/issues/1101 MDEyOklzc3VlQ29tbWVudDczMjU0MzcwMA== simonw 9599 2020-11-24T02:20:30Z 2020-11-24T02:20:30Z OWNER

Current design: https://docs.datasette.io/en/stable/plugin_hooks.html#register-output-renderer-datasette

```python
@hookimpl
def register_output_renderer(datasette):
    return {
        "extension": "test",
        "render": render_demo,
        "can_render": can_render_demo,  # Optional
    }
```

Where `render_demo` looks something like this:

```python
async def render_demo(datasette, columns, rows):
    db = datasette.get_database()
    result = await db.execute("select sqlite_version()")
    first_row = " | ".join(columns)
    lines = [first_row]
    lines.append("=" * len(first_row))
    for row in rows:
        lines.append(" | ".join(row))
    return Response(
        "\n".join(lines),
        content_type="text/plain; charset=utf-8",
        headers={"x-sqlite-version": result.first()[0]},
    )
```

Meanwhile here's where the CSV streaming mode is implemented: https://github.com/simonw/datasette/blob/4bac9f18f9d04e5ed10f072502bcc508e365438e/datasette/views/base.py#L297-L380

```sql
CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
, [performed_via_github_app] TEXT);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);
```