issue_comments

19 rows where issue = 725184645 sorted by updated_at descending

user 1

  • simonw 19

issue 1

  • Better way of representing binary data in .csv output · 19

author_association 1

  • OWNER 19
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions issue performed_via_github_app
719094027 https://github.com/simonw/datasette/issues/1034#issuecomment-719094027 https://api.github.com/repos/simonw/datasette/issues/1034 MDEyOklzc3VlQ29tbWVudDcxOTA5NDAyNw== simonw 9599 2020-10-30T00:11:17Z 2020-10-30T00:11:17Z OWNER

Demos:

https://latest.datasette.io/fixtures/binary_data.csv?_size=max

rowid,data
1,http://latest.datasette.io/fixtures/binary_data/1.blob?_blob_column=data
2,http://latest.datasette.io/fixtures/binary_data/2.blob?_blob_column=data
3,

https://latest.datasette.io/fixtures.csv?sql=select+rowid%2C+data+from+binary_data+order+by+rowid+limit+1001&_size=max

rowid,data
1,http://latest.datasette.io/fixtures.blob?sql=select+rowid%2C+data+from+binary_data+order+by+rowid+limit+1001&_size=max&_blob_column=data&_blob_hash=f3088978da8f9aea479ffc7f631370b968d2e855eeb172bea7f6c7a04262bb6d
2,http://latest.datasette.io/fixtures.blob?sql=select+rowid%2C+data+from+binary_data+order+by+rowid+limit+1001&_size=max&_blob_column=data&_blob_hash=b835b0483cedb86130b9a2c280880bf5fadc5318ddf8c18d0df5204d40df1724
3,

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Better way of representing binary data in .csv output 725184645  
719050754 https://github.com/simonw/datasette/issues/1034#issuecomment-719050754 https://api.github.com/repos/simonw/datasette/issues/1034 MDEyOklzc3VlQ29tbWVudDcxOTA1MDc1NA== simonw 9599 2020-10-29T22:04:52Z 2020-10-29T22:04:52Z OWNER

I'm going to link to the new .blob representation, using the new ?_blob_hash=xxx argument to ensure that the content served is the expected binary blob.
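
A minimal sketch of the check that a hash argument like this makes possible, assuming the value is a SHA-256 hex digest of the blob content. The 64-character values in the demo URLs in this thread are consistent with that, but the thread does not name the algorithm. `blob_matches_hash` is a hypothetical helper, not part of Datasette.

```python
import hashlib


def blob_matches_hash(blob: bytes, expected_hex: str) -> bool:
    """Return True if the SHA-256 hex digest of blob equals expected_hex.

    Assumption: the hash is SHA-256; the thread does not confirm this.
    """
    return hashlib.sha256(blob).hexdigest() == expected_hex


# Example: hash a blob, then confirm that only the same bytes match it.
blob = b"\x15\x1c\x02\xc7"
digest = hashlib.sha256(blob).hexdigest()
same = blob_matches_hash(blob, digest)
different = blob_matches_hash(b"some other bytes", digest)
```

A client that follows a link from the CSV could run the same comparison on the bytes it downloads before trusting them.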

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Better way of representing binary data in .csv output 725184645  
716078777 https://github.com/simonw/datasette/issues/1034#issuecomment-716078777 https://api.github.com/repos/simonw/datasette/issues/1034 MDEyOklzc3VlQ29tbWVudDcxNjA3ODc3Nw== simonw 9599 2020-10-25T01:25:11Z 2020-10-25T01:25:11Z OWNER

SQLite actually has APIs that could help here: https://www.sqlite.org/c3ref/column_database_name.html - for any given SQL query they identify the origin/table/column that is the source of each resulting column.

Those aren't exposed in the Python sqlite3 module though, so using them could be extremely tricky.
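
To illustrate the gap, here is a small sketch of what the standard `sqlite3` module does report for a query. `cursor.description` gives the result column name (or alias) only, with no table or origin column. The table name and alias are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table binary_data (data blob)")

# Alias the column so the result name differs from the source column name.
cursor = conn.execute("select data as d from binary_data")

# Each entry is a 7-tuple; only the first item (the name or alias) is set,
# the remaining six are None. Nothing here says the value came from
# binary_data.data.
description = cursor.description
column_names = [entry[0] for entry in description]
```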

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Better way of representing binary data in .csv output 725184645  
716078605 https://github.com/simonw/datasette/issues/1034#issuecomment-716078605 https://api.github.com/repos/simonw/datasette/issues/1034 MDEyOklzc3VlQ29tbWVudDcxNjA3ODYwNQ== simonw 9599 2020-10-25T01:22:22Z 2020-10-25T01:22:22Z OWNER

For arbitrary CSV the only solution I can think of is to embed the base64 value.
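
A minimal sketch of that fallback using only the standard library: any `bytes` value is written as base64 text, everything else is passed through. `rows_to_csv` is a hypothetical helper for illustration, not Datasette's implementation.

```python
import base64
import csv
import io


def rows_to_csv(headers, rows):
    """Serialize rows to CSV text, base64-encoding any bytes values."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(headers)
    for row in rows:
        writer.writerow(
            [
                base64.b64encode(value).decode("ascii")
                if isinstance(value, bytes)
                else value
                for value in row
            ]
        )
    return out.getvalue()


csv_text = rows_to_csv(
    ["rowid", "data"],
    [(1, b"\x15\x1c\x02\xc7"), (2, None)],
)
```

A consumer has to know which columns are base64 in order to decode them, since nothing in the CSV itself marks them.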

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Better way of representing binary data in .csv output 725184645  
716078512 https://github.com/simonw/datasette/issues/1034#issuecomment-716078512 https://api.github.com/repos/simonw/datasette/issues/1034 MDEyOklzc3VlQ29tbWVudDcxNjA3ODUxMg== simonw 9599 2020-10-25T01:21:11Z 2020-10-25T01:21:11Z OWNER

What should happen for CSV export of arbitrary SQL queries, where there's no obvious BLOB to link to?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Better way of representing binary data in .csv output 725184645  
716078420 https://github.com/simonw/datasette/issues/1034#issuecomment-716078420 https://api.github.com/repos/simonw/datasette/issues/1034 MDEyOklzc3VlQ29tbWVudDcxNjA3ODQyMA== simonw 9599 2020-10-25T01:20:00Z 2020-10-25T01:20:00Z OWNER

That documentation: https://docs.datasette.io/en/latest/internals.html#absolute-url-request-path

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Better way of representing binary data in .csv output 725184645  
716077541 https://github.com/simonw/datasette/issues/1034#issuecomment-716077541 https://api.github.com/repos/simonw/datasette/issues/1034 MDEyOklzc3VlQ29tbWVudDcxNjA3NzU0MQ== simonw 9599 2020-10-25T01:09:38Z 2020-10-25T01:10:04Z OWNER

I should turn datasette.absolute_url(...) into a documented internal API on https://docs.datasette.io/en/stable/internals.html#datasette-class

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Better way of representing binary data in .csv output 725184645  
716077508 https://github.com/simonw/datasette/issues/1034#issuecomment-716077508 https://api.github.com/repos/simonw/datasette/issues/1034 MDEyOklzc3VlQ29tbWVudDcxNjA3NzUwOA== simonw 9599 2020-10-25T01:09:17Z 2020-10-25T01:09:17Z OWNER

Here's how those absolute next_url values are generated: https://github.com/simonw/datasette/blob/5db7ae3ce165ded57c7fb1cfbdb3258b1cf06c10/datasette/views/table.py#L774-L776

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Better way of representing binary data in .csv output 725184645  
716077436 https://github.com/simonw/datasette/issues/1034#issuecomment-716077436 https://api.github.com/repos/simonw/datasette/issues/1034 MDEyOklzc3VlQ29tbWVudDcxNjA3NzQzNg== simonw 9599 2020-10-25T01:08:35Z 2020-10-25T01:08:42Z OWNER

This is actually a bit tricky to implement, for a few reasons:

  • Need to generate a full URL, including the https://host/ bit. I've done this for next_url in the JSON output before, thankfully.
  • This only makes sense for CSV output for tables. If it's the CSV output of an arbitrary query there's no /db/table/-/blob/pk/column.blob page for me to link to.
  • Need to generate those /.../-/blob/... URLs for the data that is being output as CSV.
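
A rough sketch of the URL construction these points describe, following the /db/table/-/blob/pk/column.blob pattern mentioned above. `blob_url` and its parameters are hypothetical, and plain percent-encoding of each path component is an assumption. Datasette's own routing and escaping rules may differ.

```python
from urllib.parse import quote


def blob_url(base_url, database, table, pk, column):
    """Build an absolute URL for one BLOB cell (hypothetical helper).

    base_url is the scheme-and-host part, e.g. "https://host/".
    """
    # Percent-encode every component so "/" or "?" in a name cannot
    # change the structure of the path.
    parts = [quote(str(part), safe="") for part in (database, table, pk, column)]
    path = "/{}/{}/-/blob/{}/{}.blob".format(*parts)
    return base_url.rstrip("/") + path


url = blob_url("https://example.com/", "fixtures", "binary_data", 1, "data")
```
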
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Better way of representing binary data in .csv output 725184645  
713277810 https://github.com/simonw/datasette/issues/1034#issuecomment-713277810 https://api.github.com/repos/simonw/datasette/issues/1034 MDEyOklzc3VlQ29tbWVudDcxMzI3NzgxMA== simonw 9599 2020-10-21T03:40:50Z 2020-10-25T01:01:23Z OWNER

Blocked awaiting #1036 (update: now unblocked)

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Better way of representing binary data in .csv output 725184645  
713191819 https://github.com/simonw/datasette/issues/1034#issuecomment-713191819 https://api.github.com/repos/simonw/datasette/issues/1034 MDEyOklzc3VlQ29tbWVudDcxMzE5MTgxOQ== simonw 9599 2020-10-20T23:12:58Z 2020-10-20T23:12:58Z OWNER

Enzo has a great solution here: https://twitter.com/enzo_mdd/status/1318685442976436226

"Or maybe an option for a url. This keeps the CSV small but allows scripts to download binary data as needed."

In #1036 I'm planning on adding a way for users to access BLOB data. I can include that URL in the CSV output.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Better way of representing binary data in .csv output 725184645  
713176082 https://github.com/simonw/datasette/issues/1034#issuecomment-713176082 https://api.github.com/repos/simonw/datasette/issues/1034 MDEyOklzc3VlQ29tbWVudDcxMzE3NjA4Mg== simonw 9599 2020-10-20T22:27:33Z 2020-10-20T22:27:33Z OWNER

This feels good to me - it's consistent with how other features in Datasette work, and it means users who need the binary data in CSV (for whatever reason) can get it if they want to.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Better way of representing binary data in .csv output 725184645  
713175741 https://github.com/simonw/datasette/issues/1034#issuecomment-713175741 https://api.github.com/repos/simonw/datasette/issues/1034 MDEyOklzc3VlQ29tbWVudDcxMzE3NTc0MQ== simonw 9599 2020-10-20T22:26:45Z 2020-10-20T22:26:45Z OWNER

New idea: since binary in CSV doesn't make sense anyway, emulate Datasette's HTML UI default and output this:

id,title,data
1,Some title,<Binary data: 14 bytes>
2,Other title,<Binary data: 57 bytes>

Then allow users to add ?_base64=1 to the URL to get base64 instead: https://twitter.com/simonw/status/1318679950635888641
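
The idea can be sketched as a per-cell rendering function: a placeholder by default, base64 when the option is on. `render_csv_cell` and its `use_base64` flag are hypothetical names standing in for however the ?_base64=1 argument would be wired up.

```python
import base64


def render_csv_cell(value, use_base64=False):
    """Return the text to write into a CSV cell for value."""
    if isinstance(value, bytes):
        if use_base64:
            # Opt-in: the full binary content, encoded as ASCII text.
            return base64.b64encode(value).decode("ascii")
        # Default: a human-readable placeholder showing only the size.
        return "<Binary data: {} bytes>".format(len(value))
    return value


placeholder = render_csv_cell(b"\x00" * 14)
encoded = render_csv_cell(b"\x00" * 14, use_base64=True)
```

The default keeps the file small and readable, while the opt-in path still lets a script recover the exact bytes.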

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Better way of representing binary data in .csv output 725184645  
713174690 https://github.com/simonw/datasette/issues/1034#issuecomment-713174690 https://api.github.com/repos/simonw/datasette/issues/1034 MDEyOklzc3VlQ29tbWVudDcxMzE3NDY5MA== simonw 9599 2020-10-20T22:23:50Z 2020-10-20T22:23:50Z OWNER

Or... default to <Binary data: 7 bytes> and support a ?_base64=1 option which outputs in base64 instead.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Better way of representing binary data in .csv output 725184645  
713174341 https://github.com/simonw/datasette/issues/1034#issuecomment-713174341 https://api.github.com/repos/simonw/datasette/issues/1034 MDEyOklzc3VlQ29tbWVudDcxMzE3NDM0MQ== simonw 9599 2020-10-20T22:22:53Z 2020-10-20T22:23:14Z OWNER

An even easier option: do what the Datasette UI does and output <Binary data: 7 bytes> for that CSV cell, as seen on https://latest.datasette.io/fixtures/binary_data

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Better way of representing binary data in .csv output 725184645  
713172901 https://github.com/simonw/datasette/issues/1034#issuecomment-713172901 https://api.github.com/repos/simonw/datasette/issues/1034 MDEyOklzc3VlQ29tbWVudDcxMzE3MjkwMQ== simonw 9599 2020-10-20T22:19:10Z 2020-10-20T22:20:28Z OWNER

I could go with the same format as datasette-render-binary but using 0x00 as the format for the hex bytes.

0x15 0x1C 0x02 0xC7 JFIF 0x00 0x01

Problem with this is that it's ambiguous: if the ASCII characters 0x15 occur in the text they will be indistinguishable from those hex bytes.

But since representing binary data in CSV fundamentally doesn't make sense I'm not sure if that really matters.
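
A small sketch of this format and of the ambiguity, assuming a rule of "printable ASCII is kept as text, every other byte becomes 0xNN". `render_binary` is a hypothetical function and is not taken from datasette-render-binary.

```python
import string

# Bytes shown as literal text; everything else (including spaces) is hex.
PRINTABLE = set(string.ascii_letters + string.digits + string.punctuation)


def render_binary(data: bytes) -> str:
    """Render bytes as space-separated tokens of text runs and 0xNN bytes."""
    tokens = []
    run = ""
    for byte in data:
        char = chr(byte)
        if char in PRINTABLE:
            run += char
        else:
            if run:
                tokens.append(run)
                run = ""
            tokens.append("0x{:02X}".format(byte))
    if run:
        tokens.append(run)
    return " ".join(tokens)


rendered = render_binary(b"\x15\x1c\x02\xc7JFIF\x00\x01")

# The ambiguity: one raw byte 0x15 and the four ASCII characters "0x15"
# produce exactly the same output.
from_raw_byte = render_binary(b"\x15")
from_ascii_text = render_binary(b"0x15")
```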

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Better way of representing binary data in .csv output 725184645  
712582699 https://github.com/simonw/datasette/issues/1034#issuecomment-712582699 https://api.github.com/repos/simonw/datasette/issues/1034 MDEyOklzc3VlQ29tbWVudDcxMjU4MjY5OQ== simonw 9599 2020-10-20T04:36:04Z 2020-10-20T04:36:14Z OWNER

Asked for ideas on Twitter: https://twitter.com/simonw/status/1318409558805467136

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Better way of representing binary data in .csv output 725184645  
712581994 https://github.com/simonw/datasette/issues/1034#issuecomment-712581994 https://api.github.com/repos/simonw/datasette/issues/1034 MDEyOklzc3VlQ29tbWVudDcxMjU4MTk5NA== simonw 9599 2020-10-20T04:33:28Z 2020-10-20T04:33:28Z OWNER

The datasette-render-binary plugin does this, which I really like - but without the different coloured fonts I'm not sure how readable it would be as just plain text:

Really the goal here is to find the most human-friendly option, so that people looking at the output have a vague idea what's going on. That's why I'm not leaping at the chance to use base64.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Better way of representing binary data in .csv output 725184645  
712580976 https://github.com/simonw/datasette/issues/1034#issuecomment-712580976 https://api.github.com/repos/simonw/datasette/issues/1034 MDEyOklzc3VlQ29tbWVudDcxMjU4MDk3Ng== simonw 9599 2020-10-20T04:29:23Z 2020-10-20T04:29:23Z OWNER

Most obvious option is base64. Any other potential solutions I'm missing?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Better way of representing binary data in .csv output 725184645  

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
, [performed_via_github_app] TEXT);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);
Powered by Datasette · Queries took 25.092ms · About: github-to-sqlite