issue_comments
10 rows where author_association = "OWNER", issue = 447469253 and user = 9599 sorted by updated_at descending
id | html_url | issue_url | node_id | user | created_at | updated_at ▲ | author_association | body | reactions | issue | performed_via_github_app |
---|---|---|---|---|---|---|---|---|---|---|---|
1264769569 | https://github.com/simonw/datasette/issues/485#issuecomment-1264769569 | https://api.github.com/repos/simonw/datasette/issues/485 | IC_kwDOBm6k_c5LYtoh | simonw 9599 | 2022-10-03T00:04:42Z | 2022-10-03T00:04:42Z | OWNER | I love these tips - tools that can compile a simple machine learning model to a SQL query! Would be pretty cool if I could bundle a model in Datasette itself as a big in-memory SQLite SQL query: |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Improvements to table label detection 447469253 | |
1264737290 | https://github.com/simonw/datasette/issues/485#issuecomment-1264737290 | https://api.github.com/repos/simonw/datasette/issues/485 | IC_kwDOBm6k_c5LYlwK | simonw 9599 | 2022-10-02T21:29:59Z | 2022-10-02T21:29:59Z | OWNER | To clarify: the feature this issue is talking about relates to the way Datasette automatically displays foreign key relationships, for example on this page: https://github-to-sqlite.dogsheep.net/github/commits Each of those columns is a foreign key to another table. The link text that is displayed there comes from the "label column" that has either been configured or automatically detected for that other table. I wonder if this could be handled with a tiny machine learning model that's trained to help pick the best label column? Inputs to that model could include:
Output would be the most likely label column, or some indicator that no likely candidates had been found. My hunch is that this would be better solved using a few extra heuristics rather than by training a model, but it does feel like an interesting opportunity to experiment with a tiny ML model. Asked for tips about this on Twitter: https://twitter.com/simonw/status/1576680930680262658 |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Improvements to table label detection 447469253 | |
497116074 | https://github.com/simonw/datasette/issues/485#issuecomment-497116074 | https://api.github.com/repos/simonw/datasette/issues/485 | MDEyOklzc3VlQ29tbWVudDQ5NzExNjA3NA== | simonw 9599 | 2019-05-29T21:29:16Z | 2019-05-29T21:29:16Z | OWNER | Another good rule of thumb: look for text fields with a unique constraint? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Improvements to table label detection 447469253 | |
496367866 | https://github.com/simonw/datasette/issues/485#issuecomment-496367866 | https://api.github.com/repos/simonw/datasette/issues/485 | MDEyOklzc3VlQ29tbWVudDQ5NjM2Nzg2Ng== | simonw 9599 | 2019-05-28T05:14:06Z | 2019-05-28T05:14:06Z | OWNER | I'm going to generate statistics for every TEXT column. Any column with more than 90% distinct rows (compared to the total count of rows) will be a candidate for the label. I will then pick the candidate column with the shortest average length. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Improvements to table label detection 447469253 | |
496283728 | https://github.com/simonw/datasette/issues/485#issuecomment-496283728 | https://api.github.com/repos/simonw/datasette/issues/485 | MDEyOklzc3VlQ29tbWVudDQ5NjI4MzcyOA== | simonw 9599 | 2019-05-27T18:44:07Z | 2019-05-27T18:44:07Z | OWNER | This code now lives in a method on the new |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Improvements to table label detection 447469253 | |
496039483 | https://github.com/simonw/datasette/issues/485#issuecomment-496039483 | https://api.github.com/repos/simonw/datasette/issues/485 | MDEyOklzc3VlQ29tbWVudDQ5NjAzOTQ4Mw== | simonw 9599 | 2019-05-26T23:22:53Z | 2019-05-26T23:22:53Z | OWNER | Comparing these two SQL queries (the one with union and the one without) using explain: So I'm going to use the one without the union. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Improvements to table label detection 447469253 | |
496039267 | https://github.com/simonw/datasette/issues/485#issuecomment-496039267 | https://api.github.com/repos/simonw/datasette/issues/485 | MDEyOklzc3VlQ29tbWVudDQ5NjAzOTI2Nw== | simonw 9599 | 2019-05-26T23:19:38Z | 2019-05-26T23:20:10Z | OWNER | Thinking about that union query: I imagine doing this with union could encourage multiple full table scans. Maybe this query would only do one? https://latest.datasette.io/fixtures?sql=select%0D%0A++count+%28distinct+name%29+as+count_distinct_column_1%2C%0D%0A++avg%28length%28name%29%29+as+avg_length_column_1%2C%0D%0A++count%28distinct+address%29+as+count_distinct_column_2%2C%0D%0A++avg%28length%28address%29%29+as+avg_length_column_2%0D%0Afrom+roadside_attractions
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Improvements to table label detection 447469253 | |
495085021 | https://github.com/simonw/datasette/issues/485#issuecomment-495085021 | https://api.github.com/repos/simonw/datasette/issues/485 | MDEyOklzc3VlQ29tbWVudDQ5NTA4NTAyMQ== | simonw 9599 | 2019-05-23T06:27:57Z | 2019-05-26T23:15:51Z | OWNER | I could attempt to calculate the statistics needed for this in a time limited SQL query something like this one: https://latest.datasette.io/fixtures?sql=select+%27name%27+as+column%2C+count+%28distinct+name%29+as+count_distinct%2C+avg%28length%28name%29%29+as+avg_length+from+roadside_attractions%0D%0A++union%0D%0Aselect+%27address%27+as+column%2C+count%28distinct+address%29+as+count_distinct%2C+avg%28length%28address%29%29+as+avg_length+from+roadside_attractions
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Improvements to table label detection 447469253 | |
496038601 | https://github.com/simonw/datasette/issues/485#issuecomment-496038601 | https://api.github.com/repos/simonw/datasette/issues/485 | MDEyOklzc3VlQ29tbWVudDQ5NjAzODYwMQ== | simonw 9599 | 2019-05-26T23:08:41Z | 2019-05-26T23:08:41Z | OWNER | The code currently assumes the primary key is called "id" or "pk" - improving it to detect the primary key using database introspection should work much better. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Improvements to table label detection 447469253 | |
495083670 | https://github.com/simonw/datasette/issues/485#issuecomment-495083670 | https://api.github.com/repos/simonw/datasette/issues/485 | MDEyOklzc3VlQ29tbWVudDQ5NTA4MzY3MA== | simonw 9599 | 2019-05-23T06:21:52Z | 2019-05-23T06:22:36Z | OWNER | If a table has more than two columns we could do a better job of guessing the label column. A few potential tricks:
|
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
Improvements to table label detection 447469253 |
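
The heuristic described in the 2019-05-28 comment (collect statistics for every TEXT column, keep columns where more than 90% of rows are distinct, then pick the one with the shortest average length) could be sketched roughly as follows. This is only an illustration using Python's standard `sqlite3` module, not Datasette's actual implementation: the function name `guess_label_column`, the `threshold` parameter, and the column aliases are all assumptions made for this example. It uses the single-query form (no `union`) that the comments settle on.

```python
import sqlite3


def guess_label_column(conn, table, threshold=0.9):
    """Hypothetical helper: return the likely label column, or None."""
    # Column 1 of PRAGMA table_info is the name, column 2 the declared type.
    info = conn.execute(f"PRAGMA table_info([{table}])").fetchall()
    text_columns = [row[1] for row in info if row[2].upper() == "TEXT"]
    if not text_columns:
        return None

    # Build one query that gathers both statistics for every TEXT column,
    # in the style of the non-union query from the comments.
    parts = []
    for i, col in enumerate(text_columns):
        parts.append(f"count(distinct [{col}]) as count_distinct_{i}")
        parts.append(f"avg(length([{col}])) as avg_length_{i}")
    sql = f"select count(*) as total, {', '.join(parts)} from [{table}]"
    row = conn.execute(sql).fetchone()

    total = row[0]
    if not total:
        return None  # an empty table has no statistics to compare

    candidates = []
    for i, col in enumerate(text_columns):
        distinct, avg_length = row[1 + 2 * i], row[2 + 2 * i]
        # avg_length is NULL when every value in the column is NULL.
        if avg_length is not None and distinct / total > threshold:
            candidates.append((avg_length, col))

    # Shortest average length wins; None signals "no likely candidate".
    return min(candidates)[1] if candidates else None
```

The table name is interpolated directly into the SQL, so a sketch like this is only appropriate for trusted table names. The comments also mention running the statistics query under a time limit, which this example does not attempt.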
CREATE TABLE [issue_comments] ( [html_url] TEXT, [issue_url] TEXT, [id] INTEGER PRIMARY KEY, [node_id] TEXT, [user] INTEGER REFERENCES [users]([id]), [created_at] TEXT, [updated_at] TEXT, [author_association] TEXT, [body] TEXT, [reactions] TEXT, [issue] INTEGER REFERENCES [issues]([id]) , [performed_via_github_app] TEXT); CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]); CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
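
One of the comments above suggests detecting the primary key through database introspection instead of assuming it is called "id" or "pk". A minimal sketch of that idea, assuming SQLite's `PRAGMA table_info` (whose sixth column gives each column's 1-based position within the primary key, or 0 if it is not part of it), might look like this. The function name `detect_primary_keys` is hypothetical and this is not necessarily how Datasette does it.

```python
import sqlite3


def detect_primary_keys(conn, table):
    """Hypothetical helper: list primary key columns in key order."""
    rows = conn.execute(f"PRAGMA table_info([{table}])").fetchall()
    # row[1] is the column name; row[5] is its position in the primary key
    # (0 means the column is not part of the key).
    keyed = sorted((row[5], row[1]) for row in rows if row[5])
    return [name for _, name in keyed]
```

For a table declared like `issue_comments` above, with `[id] INTEGER PRIMARY KEY`, this should return a single-element list containing `id`. Tables without an explicit primary key yield an empty list.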