

issues


4 rows where state = "open" and user = 649467 (mhalle), sorted by updated_at descending



Facets:

  • type (2 values): issue 3 · pull 1
  • repo (2 values): datasette 3 · sqlite-utils 1
  • state (1 value): open 4
issue #248 · support for Apache Arrow / parquet files I/O
  • id: 836829560 · node_id: MDU6SXNzdWU4MzY4Mjk1NjA=
  • user: mhalle (649467) · state: open · locked: 0 · comments: 1
  • created_at: 2021-03-20T14:59:30Z · updated_at: 2021-10-28T23:46:48Z
  • author_association: NONE · repo: sqlite-utils (140912432) · type: issue

body:

I just started looking at Apache Arrow using pyarrow for import and export of tabular datasets, and it looks quite compelling. It might be worth looking at for sqlite-utils and/or datasette.

As a test, I took a random JSONL data dump of a dataset I have with floats, strings, and ints, and converted it to Arrow's Parquet format using the naive pyarrow.parquet.write_table() call, which relies on automatic type inference. It compressed down to 7% of the original size, and converting a 26MB JSON file to Parquet was effectively instantaneous. Parquet files are portable and can be imported directly into pandas and other analytics software.

The only hang-up is the automatic type inference of the naive reader. It's great for general laziness and for parsing JSON columns (it correctly interpreted a table of mine with a JSON array). However, I did get an exception for a string column where most entries looked integer-like but a few values weren't: the reader tried to coerce all of them to integers, even though the JSON type is string. Since the writer optionally takes a schema, it shouldn't be too hard to build one from the sqlite column types. With some additional hinting, you could even get datetime and JSON columns, which are native Arrow types.

Somewhat tangentially, someone even wrote a SQLite virtual-table extension for Parquet: https://cldellow.com/2018/06/22/sqlite-parquet-vtable.html
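
A minimal sketch of that schema-driven conversion, assuming a newline-delimited dump named data.jsonl with hypothetical columns (pyarrow's JSON reader only accepts newline-delimited input, and an explicit schema keeps integer-looking strings as strings):

# Sketch: JSONL -> Parquet with an explicit schema instead of relying on
# pyarrow's automatic type inference. File and column names are hypothetical.
import pyarrow as pa
import pyarrow.json as paj
import pyarrow.parquet as pq

schema = pa.schema([
    ("id", pa.int64()),
    ("score", pa.float64()),
    ("label", pa.string()),  # stays string even if most values look numeric
])

table = paj.read_json(
    "data.jsonl",
    parse_options=paj.ParseOptions(explicit_schema=schema),
)
pq.write_table(table, "data.parquet")  # compressed columnar output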

reactions:
{
    "url": "https://api.github.com/repos/simonw/sqlite-utils/issues/248/reactions",
    "total_count": 1,
    "+1": 1,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
   
issue #1225 · More flexible formatting of records with CSS grid
  • id: 808771690 · node_id: MDU6SXNzdWU4MDg3NzE2OTA=
  • user: mhalle (649467) · state: open · locked: 0 · comments: 0
  • created_at: 2021-02-15T19:28:17Z · updated_at: 2021-02-15T19:28:35Z
  • author_association: NONE · repo: datasette (107914493) · type: issue

body:

In several applications I've been experimenting with alternate formatting of datasette query results. Lately I've found that CSS grids work very well and seem quite general for formatting rows. In CSS I use grid templates to define the layout of each record and the regions for each field, hiding the fields I don't want. It's pretty flexible, looks good, and is a great basis for highly responsive layouts.

I initially thought I'd only use this feature for record detail views, but now I use it for index views as well.

However, there are some limitations:

  • With the existing table templates, you can change the display property on the enclosing table, tbody, and tr to make them grid-like (convert table and tbody to display: block and tr to display: grid), but that feels hacky.
  • More significantly, it's very nice to have the column name available when rendering each record, to display headers/field labels. The existing templates don't expose it, so a custom _table template is necessary.
  • I don't know whether any plugins are sensitive to whether data is rendered as a table or not, since I'm not completely clear on how plugins get their data.
  • Regardless, you need custom CSS to take full advantage of grids. I don't have a proposal for how to integrate them more deeply.

It would be helpful to at least have an official example or test that uses a grid layout for records, to make sure nothing in datasette breaks with it.
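
As one hedged illustration of the custom-CSS point above: Datasette's extra_css_urls plugin hook can inject a stylesheet without replacing templates. The stylesheet path below is a hypothetical asset, not part of the issue:

# Sketch: a minimal Datasette plugin that injects a stylesheet, e.g. one
# defining CSS grid templates for record layout. The path is hypothetical.
from datasette import hookimpl

@hookimpl
def extra_css_urls():
    return ["/static/grid-records.css"]

This only covers loading the CSS; exposing column names per record would still need a custom _table template, as the issue notes.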

reactions:
{
    "url": "https://api.github.com/repos/simonw/datasette/issues/1225/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
   
pull #1008 · Add json_loads and json_dumps jinja2 filters
  • id: 718395987 · node_id: MDExOlB1bGxSZXF1ZXN0NTAwNzk4MDkx
  • user: mhalle (649467) · state: open · locked: 0 · comments: 1
  • created_at: 2020-10-09T20:11:34Z · updated_at: 2020-12-15T02:30:28Z
  • author_association: FIRST_TIME_CONTRIBUTOR · pull_request: simonw/datasette/pulls/1008
  • repo: datasette (107914493) · type: pull · draft: 0
reactions:
{
    "url": "https://api.github.com/repos/simonw/datasette/issues/1008/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
issue #1003 · from_json jinja2 filter
  • id: 718238967 · node_id: MDU6SXNzdWU3MTgyMzg5Njc=
  • user: mhalle (649467) · state: open · locked: 0 · comments: 4
  • created_at: 2020-10-09T15:30:58Z · updated_at: 2020-10-09T17:17:07Z
  • author_association: NONE · repo: datasette (107914493) · type: issue

body:

When JSON fields are rendered in a jinja2 template, it is handy to be able to manipulate them as data (e.g., iterate over an array of values).

Ansible has a "from_json" filter, which just calls json.loads. It's trivial to implement as a datasette plugin, but it seems generally useful. Does it make sense to add it directly to the app?
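
A minimal sketch of the plugin form, using Datasette's prepare_jinja2_environment hook (the hook is real; the filter registration below is illustrative):

# Sketch: register a from_json filter that simply wraps json.loads,
# mirroring Ansible's filter of the same name.
import json
from datasette import hookimpl

@hookimpl
def prepare_jinja2_environment(env):
    env.filters["from_json"] = json.loads

In a template, a JSON column value could then be iterated directly, e.g. {% for item in value | from_json %} … {% endfor %}.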

reactions:
{
    "url": "https://api.github.com/repos/simonw/datasette/issues/1003/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
   


CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [pull_request] TEXT,
   [body] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT,
   [active_lock_reason] TEXT,
   [performed_via_github_app] TEXT,
   [reactions] TEXT,
   [draft] INTEGER,
   [state_reason] TEXT
);
CREATE INDEX [idx_issues_repo] ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone] ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee] ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user] ON [issues] ([user]);
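
For reference, a minimal sketch that reproduces this page's filter directly against the underlying database (the file name github.db is an assumption):

# Sketch: open issues by user 649467, newest update first, matching the
# query shown at the top of this page. "github.db" is a hypothetical path.
import sqlite3

conn = sqlite3.connect("github.db")
rows = conn.execute(
    """
    SELECT number, title, type, updated_at
    FROM issues
    WHERE state = 'open' AND user = 649467
    ORDER BY updated_at DESC
    """
).fetchall()
for number, title, type_, updated_at in rows:
    print(number, title, type_, updated_at)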