issues

5 rows where repo = 248903544 and state = "open" sorted by updated_at descending

type 2

  • issue 3
  • pull 2

state 1

  • open · 5

repo 1

  • hacker-news-to-sqlite · 5

id node_id number title user state locked assignee milestone comments created_at updated_at ▲ closed_at author_association pull_request body repo type active_lock_reason performed_via_github_app reactions draft state_reason
1641117021 PR_kwDODtX3eM5M66op 6 Add permalink virtual field to items table xavdid 1231935 open 0     1 2023-03-26T22:22:38Z 2023-03-29T18:38:52Z   FIRST_TIME_CONTRIBUTOR dogsheep/hacker-news-to-sqlite/pulls/6

I added a virtual column (no storage overhead) to the output that links back to the source. It works nicely out of the box with Datasette.

I got bit a bit by https://github.com/simonw/sqlite-utils/issues/411, so I went with a manual table_xinfo and creating the table via execute. Happy to adjust if that issue moves, but this seems like it works.
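
As a rough illustration of the virtual-column idea described above, here is a minimal sketch using Python's built-in sqlite3 module. The items table layout and the permalink expression are assumptions made for illustration, not the code from the pull request, and generated columns need SQLite 3.31 or newer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical, trimmed-down items table. The permalink column is a
# VIRTUAL generated column: it is computed on read and takes no storage.
conn.execute(
    """
    CREATE TABLE items (
        id INTEGER PRIMARY KEY,
        type TEXT,
        title TEXT,
        permalink TEXT GENERATED ALWAYS AS
            ('https://news.ycombinator.com/item?id=' || id) VIRTUAL
    )
    """
)
conn.execute(
    "INSERT INTO items (id, type, title) VALUES (?, ?, ?)",
    (27941108, "story", "Example story"),
)

permalink = conn.execute(
    "SELECT permalink FROM items WHERE id = ?", (27941108,)
).fetchone()[0]

# PRAGMA table_xinfo also lists generated columns, which plain table_info
# omits. The last field ("hidden") is nonzero for generated columns.
hidden_flags = {row[1]: row[6] for row in conn.execute("PRAGMA table_xinfo(items)")}

print(permalink)
print(hidden_flags)
```

Because the column is virtual, existing rows pick it up without being rewritten.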

I also added my best-guess instructions for local development on this package. I'm shooting in the dark, so feel free to replace with how you work on it locally.

hacker-news-to-sqlite 248903544 pull    
{
    "url": "https://api.github.com/repos/dogsheep/hacker-news-to-sqlite/issues/6/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
0  
1353418822 PR_kwDODtX3eM497MOV 5 The program fails when the user has no submissions fernand0 2467 open 0     0 2022-08-28T17:25:45Z 2022-08-28T17:25:45Z   FIRST_TIME_CONTRIBUTOR dogsheep/hacker-news-to-sqlite/pulls/5

Tested with:

 hacker-news-to-sqlite user hacker-news.db fernand0

Result:

 Traceback (most recent call last):
   File "/home/ftricas/.pyenv/versions/3.10.6/bin/hacker-news-to-sqlite", line 8, in <module>
     sys.exit(cli())
   File "/home/ftricas/.pyenv/versions/3.10.6/lib/python3.10/site-packages/click/core.py", line 1130, in __call__
     return self.main(*args, **kwargs)
   File "/home/ftricas/.pyenv/versions/3.10.6/lib/python3.10/site-packages/click/core.py", line 1055, in main
     rv = self.invoke(ctx)
   File "/home/ftricas/.pyenv/versions/3.10.6/lib/python3.10/site-packages/click/core.py", line 1657, in invoke
     return _process_result(sub_ctx.command.invoke(sub_ctx))
   File "/home/ftricas/.pyenv/versions/3.10.6/lib/python3.10/site-packages/click/core.py", line 1404, in invoke
     return ctx.invoke(self.callback, **ctx.params)
   File "/home/ftricas/.pyenv/versions/3.10.6/lib/python3.10/site-packages/click/core.py", line 760, in invoke
     return __callback(*args, **kwargs)
   File "/home/ftricas/.pyenv/versions/3.10.6/lib/python3.10/site-packages/hacker_news_to_sqlite/cli.py", line 27, in user
     submitted = user.pop("submitted", None) or []
 AttributeError: 'NoneType' object has no attribute 'pop'

There is a style problem with the patch (I'm not sure what to do about it): with the new initialization (submitted = []), the part

 or []

is not needed. Maybe there is a more adequate way of doing this.
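
One possible shape for such a guard is sketched below. The helper name split_user is hypothetical and this is not the patch from the pull request; it only assumes, as the traceback shows, that the user record can be None when it reaches the pop call.

```python
def split_user(user):
    """Separate the list of submitted item IDs from a user record.

    Returns (user_without_submitted, submitted_ids). A missing user
    (None) yields (None, []) instead of raising AttributeError.
    """
    if user is None:
        return None, []
    user = dict(user)  # copy so the caller's dict is not mutated
    # "or []" still covers a record whose "submitted" key is absent or null
    submitted = user.pop("submitted", None) or []
    return user, submitted


missing = split_user(None)
no_submissions = split_user({"id": "fernand0", "karma": 1})
with_submissions = split_user({"id": "simonw", "submitted": [3, 2, 1]})
```

With the early return in place, the "or []" fallback keeps a purpose: it handles a user that exists but has no "submitted" list.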

hacker-news-to-sqlite 248903544 pull    
{
    "url": "https://api.github.com/repos/dogsheep/hacker-news-to-sqlite/issues/5/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
0  
1205867842 I_kwDODtX3eM5H4BVC 4 Retrieve the top-level story for a comment telotortium 1755789 open 0     0 2022-04-15T20:25:39Z 2022-04-15T20:25:39Z   NONE  

I think that each comment inserted into the database should include a column onstory that contains the ID of the story on which the comment was made. This is exactly equivalent to the link after "on:" at the top of an HN comment page (example). We could do this either by directly retrieving the HTML page and using Beautiful Soup to find that link, or alternatively recurse up the tree in the Firebase API using the parent field (probably using functools.lru_cache in case a person has commented a bunch of times on the same story).
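
The second option, walking up the parent field with a cache, could look roughly like the sketch below. It is an illustration only: fetch_item is a hypothetical stand-in backed by an in-memory dict rather than a real API call, and the item shapes are simplified assumptions.

```python
from functools import lru_cache

# Made-up items standing in for API responses: one story with a small
# comment tree underneath it.
FAKE_ITEMS = {
    1: {"id": 1, "type": "story"},
    2: {"id": 2, "type": "comment", "parent": 1},
    3: {"id": 3, "type": "comment", "parent": 2},
    4: {"id": 4, "type": "comment", "parent": 2},
}

fetch_count = 0


def fetch_item(item_id):
    """Hypothetical fetcher; a real one would make an HTTP request."""
    global fetch_count
    fetch_count += 1
    return FAKE_ITEMS[item_id]


@lru_cache(maxsize=None)
def top_level_story(item_id):
    """Follow parent links until reaching an item with no parent."""
    item = fetch_item(item_id)
    parent = item.get("parent")
    if parent is None:
        return item["id"]
    return top_level_story(parent)


# Comments 3 and 4 share ancestor 2, so the second lookup reuses the
# cached result for 2 and only fetches item 4 itself.
onstory = [top_level_story(3), top_level_story(4)]
```

Caching on the item ID means every ancestor is resolved once, which helps most when someone has commented many times in the same thread.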

hacker-news-to-sqlite 248903544 issue    
{
    "url": "https://api.github.com/repos/dogsheep/hacker-news-to-sqlite/issues/4/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
   
952189173 MDU6SXNzdWU5NTIxODkxNzM= 3 Use HN algolia endpoint to retrieve trees simonw 9599 open 0     3 2021-07-25T03:35:27Z 2021-07-25T18:41:17Z   MEMBER  

The trees command currently has to make a request for every single comment. Algolia have an endpoint that bundles the entire thread together into a single request.

https://hn.algolia.com/api/v1/items/ID

Here's an example that loads quickly, with about 50 comments: https://hn.algolia.com/api/v1/items/27941108

It doesn't appear to use pagination at all - if a thread is big then the response is big.
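
Since the whole thread arrives as one nested document, it would need flattening before insertion into a table. The sketch below assumes each node carries its replies in a "children" list; the sample data is made up and the function name flatten_thread is hypothetical. It uses an explicit stack rather than recursion so that very deep threads do not hit Python's recursion limit.

```python
def flatten_thread(item):
    """Yield every node of a nested thread as a flat dict, parents first."""
    stack = [item]
    while stack:
        node = stack.pop()
        children = node.get("children") or []
        # Emit the node without its nested replies
        yield {key: value for key, value in node.items() if key != "children"}
        # Reverse so the first child is popped (and emitted) first
        stack.extend(reversed(children))


# Made-up sample in the assumed shape
sample_thread = {
    "id": 1,
    "type": "story",
    "children": [
        {"id": 2, "type": "comment", "children": [
            {"id": 3, "type": "comment", "children": []},
        ]},
        {"id": 4, "type": "comment", "children": []},
    ],
}

flat_rows = list(flatten_thread(sample_thread))
flat_ids = [row["id"] for row in flat_rows]
```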

I ran this search to find some stories with more than 1000 comments: https://hn.algolia.com/api/v1/search?tags=story&numericFilters=num_comments%3E=1000

Here's one: https://news.ycombinator.com/item?id=25015967 with 4759 comments. Hitting the API takes 41s and returns 3.7 MB of JSON!

 wget 'https://hn.algolia.com/api/v1/items/25015967'  0.03s user 0.04s system 0% cpu 41.368 total
 /tmp % ls -lah 25015967
 -rw-r--r--  1 simon  wheel   3.7M Jul 24 20:31 25015967

hacker-news-to-sqlite 248903544 issue    
{
    "url": "https://api.github.com/repos/dogsheep/hacker-news-to-sqlite/issues/3/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
   
952179830 MDU6SXNzdWU5NTIxNzk4MzA= 2 Command for fetching Hacker News threads from the search API simonw 9599 open 0     4 2021-07-25T02:00:45Z 2021-07-25T03:12:57Z   MEMBER  

I want to be able to fetch every item for a domain, e.g. https://news.ycombinator.com/from?site=simonwillison.net
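
One way such a command might build its requests is sketched below. It only constructs the URL and makes no network call. The helper name domain_search_url is hypothetical, and the choice of the restrictSearchableAttributes=url and page parameters is an assumption about how a per-domain query could be expressed against the search endpoint, not something this issue specifies.

```python
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://hn.algolia.com/api/v1/search"


def domain_search_url(domain, page=0):
    """Build a search URL for stories whose URL matches the given domain."""
    params = {
        "query": domain,
        "tags": "story",
        # Assumption: limit matching to the story URL field
        "restrictSearchableAttributes": "url",
        "page": page,
    }
    return SEARCH_ENDPOINT + "?" + urlencode(params)


url = domain_search_url("simonwillison.net")
print(url)
```

A real command would presumably keep incrementing page until a response comes back with no hits.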

hacker-news-to-sqlite 248903544 issue    
{
    "url": "https://api.github.com/repos/dogsheep/hacker-news-to-sqlite/issues/2/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
   

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [pull_request] TEXT,
   [body] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT,
   [active_lock_reason] TEXT,
   [performed_via_github_app] TEXT,
   [reactions] TEXT,
   [draft] INTEGER,
   [state_reason] TEXT
);
CREATE INDEX [idx_issues_repo]
                ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
                ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
                ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
                ON [issues] ([user]);
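
The filter shown at the top of this page (repo = 248903544 and state = "open", sorted by updated_at descending) can be reproduced against this schema. The sketch below uses Python's sqlite3 module with a trimmed-down copy of the table; the closed row is made up for illustration, and the exact SQL that Datasette generates may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Trimmed-down version of the issues table, keeping only the columns
# the query needs
conn.execute(
    """
    CREATE TABLE issues (
        id INTEGER PRIMARY KEY,
        number INTEGER,
        title TEXT,
        state TEXT,
        updated_at TEXT,
        repo INTEGER,
        type TEXT
    )
    """
)
conn.execute("CREATE INDEX idx_issues_repo ON issues (repo)")

rows = [
    (1641117021, 6, "Add permalink virtual field to items table",
     "open", "2023-03-29T18:38:52Z", 248903544, "pull"),
    (952179830, 2, "Command for fetching Hacker News threads from the search API",
     "open", "2021-07-25T03:12:57Z", 248903544, "issue"),
    # Made-up closed row, to show the state filter doing something
    (1, 1, "Hypothetical closed issue",
     "closed", "2020-01-01T00:00:00Z", 248903544, "issue"),
]
conn.executemany("INSERT INTO issues VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# ISO 8601 timestamps stored as TEXT sort correctly as plain strings
open_numbers = [
    row[0]
    for row in conn.execute(
        "SELECT number FROM issues WHERE repo = ? AND state = ? "
        "ORDER BY updated_at DESC",
        (248903544, "open"),
    )
]
print(open_numbers)
```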
Powered by Datasette · Queries took 36.877ms · About: github-to-sqlite