issue_comments

15 rows where author_association = "OWNER", created_at is on date 2018-05-16 and user = 9599, sorted by updated_at descending
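The equivalent SQL, reconstructed from that filter description (Datasette's generated query may differ in quoting and parameter details):

select *
from issue_comments
where author_association = 'OWNER'
  and date(created_at) = '2018-05-16'
  and "user" = 9599
order by updated_at desc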

id html_url issue_url node_id user created_at updated_at author_association body reactions issue performed_via_github_app
389570841 https://github.com/simonw/datasette/issues/266#issuecomment-389570841 https://api.github.com/repos/simonw/datasette/issues/266 MDEyOklzc3VlQ29tbWVudDM4OTU3MDg0MQ== simonw 9599 2018-05-16T15:54:49Z 2018-06-15T07:41:09Z OWNER

At the most basic level, this will work based on an extension. Most places you currently put a .json extension should also allow a .csv extension.

By default this will return the exact results you see on the current page (default max will remain 1000).

Streaming all records

Where things get interesting is streaming mode. This will be an option which returns ALL matching records as a streaming CSV file, even if that ends up being millions of records.

I think the best way to build this will be on top of the existing mechanism used to efficiently implement keyset pagination via _next= tokens.

Expanding foreign keys

For tables with foreign key references it would be useful if the CSV format could expand those references to include the labels from label_column - maybe via an additional ?_expand=1 option.

When expanding, each foreign key column will be shown twice:

rowid,city_id,city_id_label,state
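Continuing that header with invented example rows (city_id keeps the raw foreign key value; city_id_label carries the label from label_column):

1,3,San Francisco,CA
2,7,Los Angeles,CA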
Export to CSV 323681589  
389626715 https://github.com/simonw/datasette/issues/266#issuecomment-389626715 https://api.github.com/repos/simonw/datasette/issues/266 MDEyOklzc3VlQ29tbWVudDM4OTYyNjcxNQ== simonw 9599 2018-05-16T18:50:46Z 2018-05-16T18:50:46Z OWNER

I’d recommend using the Windows-1252 encoding for maximum compatibility, unless you have any characters not in that set, in which case use UTF8 with a byte order mark. Bit of a pain, but some programs (e.g. various versions of Excel) don’t read UTF8. frankieroberto https://twitter.com/frankieroberto/status/996823071947460616

There is software that consumes CSV and doesn't speak UTF8!? Huh. Well I can't just use Windows-1252 because I need to support the full UTF8 range of potential data - maybe I should support an optional ?_encoding=windows-1252 argument simonw https://twitter.com/simonw/status/996824677245857793
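A minimal Python sketch of how an optional ?_encoding= argument could be honoured once the CSV text has been rendered to a str (the parameter name comes from the tweet above; the helper itself is invented):

import codecs

def encode_csv(csv_text, encoding="utf-8"):
    # Invented helper: turn rendered CSV text into response bytes.
    if encoding == "utf-8":
        # Prefix a byte order mark so older Excel versions detect UTF-8.
        return codecs.BOM_UTF8 + csv_text.encode("utf-8")
    # For narrower encodings like windows-1252, replace unmappable
    # characters rather than erroring out mid-download.
    return csv_text.encode(encoding, errors="replace")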

Export to CSV 323681589  
389608473 https://github.com/simonw/datasette/issues/266#issuecomment-389608473 https://api.github.com/repos/simonw/datasette/issues/266 MDEyOklzc3VlQ29tbWVudDM4OTYwODQ3Mw== simonw 9599 2018-05-16T17:52:35Z 2018-05-16T17:54:11Z OWNER

There are some code examples in this issue which should help with the streaming part: https://github.com/channelcat/sanic/issues/1067

Also https://github.com/channelcat/sanic/blob/master/docs/sanic/streaming.md#response-streaming
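A minimal sketch of the pattern from those Sanic docs as they stood in this era (treat the exact write semantics as an assumption; later Sanic releases made response.write awaitable):

from sanic import Sanic
from sanic.response import stream

app = Sanic(__name__)

@app.route("/table.csv")
async def table_csv(request):
    async def stream_csv(response):
        # Emit the response body incrementally instead of buffering it.
        response.write("id,name\r\n")
        response.write("1,example\r\n")
    return stream(stream_csv, content_type="text/csv; charset=utf-8")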

Export to CSV 323681589  
389592566 https://github.com/simonw/datasette/issues/266#issuecomment-389592566 https://api.github.com/repos/simonw/datasette/issues/266 MDEyOklzc3VlQ29tbWVudDM4OTU5MjU2Ng== simonw 9599 2018-05-16T17:01:29Z 2018-05-16T17:02:21Z OWNER

Let's provide a CSV Dialect definition too: https://frictionlessdata.io/specs/csv-dialect/ - via https://twitter.com/drewdaraabrams/status/996794915680997382
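For example, a dialect description roughly matching Python's default excel output might look like this (key names follow the frictionlessdata spec; the values shown mirror the csv module defaults, and the exact wrapper/version details should be checked against the spec itself):

{
    "csvddfVersion": 1.2,
    "delimiter": ",",
    "quoteChar": "\"",
    "doubleQuote": true,
    "lineTerminator": "\r\n",
    "header": true
}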

Export to CSV 323681589  
389579762 https://github.com/simonw/datasette/issues/266#issuecomment-389579762 https://api.github.com/repos/simonw/datasette/issues/266 MDEyOklzc3VlQ29tbWVudDM4OTU3OTc2Mg== simonw 9599 2018-05-16T16:21:12Z 2018-05-16T16:21:12Z OWNER

I basically want someone to tell me which arguments I can pass to Python's csv.writer() function that will result in the least complaints from people who try to parse the results :) https://twitter.com/simonw/status/996786815938977792
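For reference, csv.writer() with no arguments already uses the "excel" dialect, which is what most consumers expect; a minimal sketch:

# Python's default "excel" dialect: comma delimiter, double-quote
# quoting (QUOTE_MINIMAL) and \r\n line endings.
import csv
import io

out = io.StringIO()
writer = csv.writer(out, dialect="excel")
writer.writerow(["id", "body"])
writer.writerow([1, 'contains "quotes", commas and\nnewlines'])
print(out.getvalue())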

Export to CSV 323681589  
389579363 https://github.com/simonw/datasette/issues/266#issuecomment-389579363 https://api.github.com/repos/simonw/datasette/issues/266 MDEyOklzc3VlQ29tbWVudDM4OTU3OTM2Mw== simonw 9599 2018-05-16T16:20:06Z 2018-05-16T16:20:06Z OWNER

I started a thread on Twitter discussing various CSV output dialects: https://twitter.com/simonw/status/996783395504979968 - I want to pick defaults which will work as well as possible for whatever tools people might be using to consume the data.

Export to CSV 323681589  
389572201 https://github.com/simonw/datasette/issues/266#issuecomment-389572201 https://api.github.com/repos/simonw/datasette/issues/266 MDEyOklzc3VlQ29tbWVudDM4OTU3MjIwMQ== simonw 9599 2018-05-16T15:58:43Z 2018-05-16T16:00:47Z OWNER

This will likely be implemented in the BaseView class, which needs to know how to spot the .csv extension, call the underlying JSON generating function and then return the columns and rows as correctly formatted CSV.

https://github.com/simonw/datasette/blob/9959a9e4deec8e3e178f919e8b494214d5faa7fd/datasette/views/base.py#L201-L207

This means it will take ALL arguments that are available to the .json view. It may ignore some (e.g. _facet= makes no sense since CSV tables don't have space to show the facet results).

In streaming mode, things will behave a little bit differently - in particular, if _stream=1 then _next= will be forbidden.

It can't include a Content-Length header because we don't know how many bytes the response will be.

CSV output will throw an error if the endpoint's JSON doesn't have rows and columns keys, e.g. /-/inspect.json.

So the implementation (sketched in code at the end of this comment)...

  • looks for the .csv extension
  • internally fetches the .json data instead
  • if no _stream, it just transposes that JSON to CSV with the correct content type header
  • if _stream=1, it checks for _next= and throws an error if one was provided
  • otherwise, it fetches the first page and emits the CSV header and first set of rows
  • then it starts async looping, emitting more CSV rows and following the _next= internal reference until done

I like that this takes advantage of efficient pagination. It may not work so well for views which use offset/limit though.

It won't work at all for custom SQL because custom SQL doesn't support _next= pagination. That's fine.

For views, the easiest fix is to cut off after the first X000 records. That seems OK. The view JSON would need to include a property that the mechanism can identify.
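A rough sketch of the flow described in the list above, using the Sanic streaming pattern from earlier in this thread; fetch_json_page(), CsvWriter and the argument handling are all invented for illustration:

# Invented sketch of the .csv code path described above.
async def csv_view(request):
    stream_mode = request.args.get("_stream")
    if stream_mode and request.args.get("_next"):
        raise ValueError("_next= may not be combined with _stream=1")

    async def write_csv(response):
        writer = CsvWriter(response)  # hypothetical wrapper around csv.writer
        page = await fetch_json_page(request.args)  # same arguments as the .json view
        writer.writerow(page["columns"])  # CSV header row
        while True:
            for row in page["rows"]:
                writer.writerow(row)
            token = page.get("next")
            if not stream_mode or not token:
                break  # non-stream mode stops after the first page
            page = await fetch_json_page(request.args, _next=token)

    # No Content-Length header: the total size is unknown up front.
    return stream(write_csv, content_type="text/csv; charset=utf-8")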

Export to CSV 323681589  
389566147 https://github.com/simonw/datasette/issues/265#issuecomment-389566147 https://api.github.com/repos/simonw/datasette/issues/265 MDEyOklzc3VlQ29tbWVudDM4OTU2NjE0Nw== simonw 9599 2018-05-16T15:41:42Z 2018-05-16T15:41:42Z OWNER

An official demo instance of Datasette dedicated to this use-case would be useful, especially if it was automatically deployed by Travis for every commit to master that passes the tests.

Maybe there should be a permanent version of it deployed for each released version too?

Add links to example Datasette instances to appropiate places in docs 323677499  
389563719 https://github.com/simonw/datasette/issues/263#issuecomment-389563719 https://api.github.com/repos/simonw/datasette/issues/263 MDEyOklzc3VlQ29tbWVudDM4OTU2MzcxOQ== simonw 9599 2018-05-16T15:34:46Z 2018-05-16T15:34:46Z OWNER

The underlying mechanics for the _extras mechanism described in #262 may help with this.

Facets should not execute for ?shape=array|object 323671577  
389562708 https://github.com/simonw/datasette/issues/255#issuecomment-389562708 https://api.github.com/repos/simonw/datasette/issues/255 MDEyOklzc3VlQ29tbWVudDM4OTU2MjcwOA== simonw 9599 2018-05-16T15:32:12Z 2018-05-16T15:32:12Z OWNER

This is now landed in master, ready for the next release.

Facets 322477187  
389546040 https://github.com/simonw/datasette/issues/255#issuecomment-389546040 https://api.github.com/repos/simonw/datasette/issues/255 MDEyOklzc3VlQ29tbWVudDM4OTU0NjA0MA== simonw 9599 2018-05-16T14:47:34Z 2018-05-16T14:47:34Z OWNER

Latest demo - now with multiple columns: https://datasette-suggested-facets-demo.now.sh/sf-trees-02c8ef1/Street_Tree_List?_facet=qCaretaker&_facet=qCareAssistant&_facet=qLegalStatus

Facets 322477187  
389536870 https://github.com/simonw/datasette/pull/258#issuecomment-389536870 https://api.github.com/repos/simonw/datasette/issues/258 MDEyOklzc3VlQ29tbWVudDM4OTUzNjg3MA== simonw 9599 2018-05-16T14:22:31Z 2018-05-16T14:22:31Z OWNER

The principal benefit provided by the hash URLs is that Datasette can set a far-future cache expiry header on every response. This is particularly useful for JavaScript API work as it makes fantastic use of the browser's cache. It also means that if you are serving your API from behind a caching proxy like Cloudflare you get a very high cache hit rate.

An option to serve without persistent hashes would also need to turn off the cache headers.

Maybe the option should support both? If you hit a page with the hash in the URL you still get the cache headers, but hits to the URL without the hash serve uncached content directly.
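A minimal sketch of that both-ways behaviour, assuming a hypothetical helper that is told whether the request URL carried the hash:

# Hypothetical sketch: cache headers chosen per request style.
FAR_FUTURE = 365 * 24 * 60 * 60  # one year, in seconds

def cache_control(url_contained_hash):
    if url_contained_hash:
        # A hashed URL always serves the same bytes, so cache aggressively.
        return "max-age=%d, public" % FAR_FUTURE
    # The hash-less URL may serve new data after the next deploy.
    return "no-cache"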

Add new metadata key persistent_urls which removes the hash from all database urls 322741659  
389397457 https://github.com/simonw/datasette/issues/255#issuecomment-389397457 https://api.github.com/repos/simonw/datasette/issues/255 MDEyOklzc3VlQ29tbWVudDM4OTM5NzQ1Nw== simonw 9599 2018-05-16T05:20:04Z 2018-05-16T05:20:04Z OWNER

Maybe suggested_facets should only be calculated for the HTML view.

Facets 322477187  
389386919 https://github.com/simonw/datasette/issues/255#issuecomment-389386919 https://api.github.com/repos/simonw/datasette/issues/255 MDEyOklzc3VlQ29tbWVudDM4OTM4NjkxOQ== simonw 9599 2018-05-16T03:57:47Z 2018-05-16T03:58:30Z OWNER

I updated that demo to demonstrate the new foreign key label expansions: https://datasette-suggested-facets-demo.now.sh/sf-trees-02c8ef1/Street_Tree_List?_facet=qLegalStatus

Facets 322477187  
389386142 https://github.com/simonw/datasette/pull/258#issuecomment-389386142 https://api.github.com/repos/simonw/datasette/issues/258 MDEyOklzc3VlQ29tbWVudDM4OTM4NjE0Mg== simonw 9599 2018-05-16T03:51:13Z 2018-05-16T03:51:13Z OWNER

The URL does persist across deployments already, in that you can use the URL without the hash and it will redirect to the current location. Here's an example of that: https://san-francisco.datasettes.com/sf-trees/Street_Tree_List.json

This also works if you attempt to hit the incorrect hash, e.g. if you have deployed a new version of the database with an updated hash. The old hash will redirect, e.g. https://san-francisco.datasettes.com/sf-trees-c4b972c/Street_Tree_List.json

If you serve Datasette from an HTTP/2 proxy (I've been using Cloudflare for this) you won't even have to pay the cost of the redirect - Datasette sends a Link: <URL>; rel=preload header with those redirects, which causes Cloudflare to push out the redirect target as part of that HTTP/2 request. You can fire up the Chrome DevTools to watch this happen.

https://github.com/simonw/datasette/blob/2b79f2bdeb1efa86e0756e741292d625f91cb93d/datasette/views/base.py#L91

All of that said... I'm not at all opposed to this feature. For consistency with other Datasette options (e.g. --cors) I'd prefer to do this as an optional argument to the datasette serve command - something like this:

datasette serve mydb.db --no-url-hash
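The redirect-plus-preload trick described above boils down to two headers on the 302; a hedged sketch (the real implementation is at the base.py link above, and these helper names are invented - it returns plain status/header data since the response API is framework-specific):

# Invented sketch of the hash redirect described in this comment.
def hash_redirect(db_name, db_hash, rest_of_path):
    # e.g. "/sf-trees-02c8ef1/Street_Tree_List.json"
    location = "/%s-%s%s" % (db_name, db_hash, rest_of_path)
    return 302, {
        "Location": location,
        # Hints an HTTP/2 proxy such as Cloudflare to push the redirect
        # target alongside the 302, saving the extra round trip.
        "Link": "<%s>; rel=preload" % location,
    }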
Add new metadata key persistent_urls which removes the hash from all database urls 322741659  

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
, [performed_via_github_app] TEXT);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);