issue_comments

9 rows where issue = 964322136 sorted by updated_at descending

user (3 distinct values)

  • simonw 7
  • tannewt 1
  • knowledgecamp12 1

author_association (2 distinct values)

  • OWNER 7
  • NONE 2

issue (1 distinct value)

  • Manage /robots.txt in Datasette core, block robots by default · 9
Columns, in the order they appear in each row below: id, html_url, issue_url, node_id, user, created_at, updated_at, author_association, body, reactions, issue, performed_via_github_app
985982668 https://github.com/simonw/datasette/issues/1426#issuecomment-985982668 https://api.github.com/repos/simonw/datasette/issues/1426 IC_kwDOBm6k_c46xObM knowledgecamp12 95520595 2021-12-04T07:11:29Z 2021-12-04T07:11:29Z NONE

You can generate an XML sitemap using online tools such as https://tools4seo.site/xml-sitemap-generator.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Manage /robots.txt in Datasette core, block robots by default 964322136  
974711959 https://github.com/simonw/datasette/issues/1426#issuecomment-974711959 https://api.github.com/repos/simonw/datasette/issues/1426 IC_kwDOBm6k_c46GOyX tannewt 52649 2021-11-20T21:11:51Z 2021-11-20T21:11:51Z NONE

I think another thing would be to make /pages/robots.txt work. That way you can use Jinja to generate whatever robots.txt you want. I'm using it to allow the main index and the pages it links to to be crawled (but not the database pages directly).

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Manage /robots.txt in Datasette core, block robots by default 964322136  
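As a quick illustration of the Jinja idea in the comment above (the template text, variable name and disallow prefixes are made up for the example, and this is not a Datasette API):

```python
# Illustrative only: render a robots.txt body from a Jinja template, as the
# comment above suggests. The disallow prefixes are placeholders.
from jinja2 import Template

ROBOTS_TEMPLATE = Template(
    "User-agent: *\n"
    "{% for prefix in disallow %}Disallow: {{ prefix }}\n{% endfor %}"
)

print(ROBOTS_TEMPLATE.render(disallow=["/content/", "/fixtures/"]))
```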
902263367 https://github.com/simonw/datasette/issues/1426#issuecomment-902263367 https://api.github.com/repos/simonw/datasette/issues/1426 IC_kwDOBm6k_c41x3JH simonw 9599 2021-08-19T21:33:51Z 2021-08-19T21:36:28Z OWNER

I was worried about whether it's possible to allow access to /fixtures but deny access to /fixtures?sql=...

From various answers on Stack Overflow it looks like this should handle that:

User-agent: *
Disallow: /fixtures?

I could use this for tables too - it may well be OK to access table index pages while still avoiding pagination, facets etc. I think this should block both query strings and row pages while allowing the table page itself:

User-agent: *
Disallow: /fixtures/searchable?
Disallow: /fixtures/searchable/*

Could even accompany that with a sitemap.xml that explicitly lists all of the tables - which would mean adding sitemaps to Datasette core too.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Manage /robots.txt in Datasette core, block robots by default 964322136  
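A rough way to sanity-check which paths those rules would block: robots.txt Disallow values are prefix matches against the path plus query string (so the trailing * adds nothing), which a plain startswith() check approximates. This is a simplified checker, not a full robots.txt parser:

```python
# Simplified check of the rules proposed above: robots.txt Disallow rules are
# prefix matches against the URL path (plus query string), so startswith()
# is enough for these particular patterns.
DISALLOW_PREFIXES = [
    "/fixtures?",             # block arbitrary SQL queries against the database
    "/fixtures/searchable?",  # block query strings (facets, pagination) on the table
    "/fixtures/searchable/",  # block individual row pages under the table
]

def blocked(path_and_query):
    return any(path_and_query.startswith(prefix) for prefix in DISALLOW_PREFIXES)

for example in [
    "/fixtures",                         # database index page: allowed
    "/fixtures?sql=select+1",            # arbitrary query: blocked
    "/fixtures/searchable",              # table page itself: allowed
    "/fixtures/searchable?_facet=text",  # facets / pagination: blocked
    "/fixtures/searchable/1",            # row page: blocked
]:
    print(example, "blocked" if blocked(example) else "allowed")
```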
902260338 https://github.com/simonw/datasette/issues/1426#issuecomment-902260338 https://api.github.com/repos/simonw/datasette/issues/1426 IC_kwDOBm6k_c41x2Zy simonw 9599 2021-08-19T21:28:25Z 2021-08-19T21:29:40Z OWNER

Actually it looks like you can send a sitemap.xml to Google using an unauthenticated GET request to:

https://www.google.com/ping?sitemap=FULL_URL_OF_SITEMAP

According to https://developers.google.com/search/docs/advanced/sitemaps/build-sitemap

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Manage /robots.txt in Datasette core, block robots by default 964322136  
902260799 https://github.com/simonw/datasette/issues/1426#issuecomment-902260799 https://api.github.com/repos/simonw/datasette/issues/1426 IC_kwDOBm6k_c41x2g_ simonw 9599 2021-08-19T21:29:13Z 2021-08-19T21:29:13Z OWNER

Bing's equivalent is: https://www.bing.com/webmasters/help/Sitemaps-3b5cf6ed

http://www.bing.com/ping?sitemap=FULL_URL_OF_SITEMAP

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Manage /robots.txt in Datasette core, block robots by default 964322136  
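A minimal sketch of those two pings from Python, using only the standard library; the sitemap URL is a placeholder, and the endpoints are the ones quoted in the two comments above:

```python
# Send the unauthenticated sitemap "ping" described above to Google and Bing.
# SITEMAP_URL is a placeholder for wherever your sitemap.xml is deployed.
from urllib.parse import quote
from urllib.request import urlopen

SITEMAP_URL = "https://example.com/sitemap.xml"

for endpoint in (
    "https://www.google.com/ping?sitemap=",
    "http://www.bing.com/ping?sitemap=",
):
    with urlopen(endpoint + quote(SITEMAP_URL, safe="")) as response:
        print(endpoint, response.status)
```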
895522818 https://github.com/simonw/datasette/issues/1426#issuecomment-895522818 https://api.github.com/repos/simonw/datasette/issues/1426 IC_kwDOBm6k_c41YJgC simonw 9599 2021-08-09T20:34:10Z 2021-08-09T20:34:10Z OWNER

At the very least Datasette should serve a blank /robots.txt by default - I'm seeing a ton of 404s for it in the logs.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Manage /robots.txt in Datasette core, block robots by default 964322136  
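Serving a robots.txt without touching Datasette core is already possible from a plugin; a minimal sketch assuming Datasette's register_routes plugin hook (the rules returned here are just an example, not a recommendation for the default):

```python
# Sketch of a one-file plugin that serves /robots.txt so crawlers stop
# getting 404s. Uses Datasette's register_routes plugin hook.
from datasette import hookimpl
from datasette.utils.asgi import Response

async def robots_txt(request):
    # An empty Disallow line would allow everything; "Disallow: /" blocks all crawling.
    return Response.text("User-agent: *\nDisallow: /")

@hookimpl
def register_routes():
    return [(r"^/robots\.txt$", robots_txt)]
```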
895510773 https://github.com/simonw/datasette/issues/1426#issuecomment-895510773 https://api.github.com/repos/simonw/datasette/issues/1426 IC_kwDOBm6k_c41YGj1 simonw 9599 2021-08-09T20:14:50Z 2021-08-09T20:19:22Z OWNER

https://twitter.com/mal/status/1424825895139876870

> True pinging google should be part of the build process on a static site :)

That's another aspect of this: if you DO want your site crawled, teaching the datasette publish command how to ping Google when a deploy has gone out could be a nice improvement.

Annoyingly it looks like you need to configure an auth token of some sort in order to use their API though, which is likely too much hassle to be worth building into Datasette itself: https://developers.google.com/search/apis/indexing-api/v3/using-api

```
curl -X POST https://indexing.googleapis.com/v3/urlNotifications:publish \
  -d '{"url": "https://careers.google.com/jobs/google/technical-writer", "type": "URL_UPDATED"}' \
  -H "Content-Type: application/json"

{
  "error": {
    "code": 401,
    "message": "Request is missing required authentication credential. Expected OAuth 2 access token, login cookie or other valid authentication credential. See https://developers.google.com/identity/sign-in/web/devconsole-project.",
    "status": "UNAUTHENTICATED"
  }
}
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Manage /robots.txt in Datasette core, block robots by default 964322136  
895509536 https://github.com/simonw/datasette/issues/1426#issuecomment-895509536 https://api.github.com/repos/simonw/datasette/issues/1426 IC_kwDOBm6k_c41YGQg simonw 9599 2021-08-09T20:12:57Z 2021-08-09T20:12:57Z OWNER

I could try out the X-Robots-Tag HTTP header too: https://developers.google.com/search/docs/advanced/robots/robots_meta_tag#xrobotstag

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Manage /robots.txt in Datasette core, block robots by default 964322136  
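For the header approach, a plugin could inject it into every response; here is a sketch using Datasette's asgi_wrapper plugin hook (the header value, and applying it to every response, are just for illustration):

```python
# Sketch: add "X-Robots-Tag: noindex" to every response via an ASGI wrapper.
# A real plugin would likely make the value configurable and skip some paths.
from datasette import hookimpl

@hookimpl
def asgi_wrapper(datasette):
    def wrap(app):
        async def add_x_robots_tag(scope, receive, send):
            async def wrapped_send(event):
                if event["type"] == "http.response.start":
                    headers = list(event.get("headers", []))
                    headers.append((b"x-robots-tag", b"noindex"))
                    event = dict(event, headers=headers)
                await send(event)
            await app(scope, receive, wrapped_send)
        return add_x_robots_tag
    return wrap
```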
895500565 https://github.com/simonw/datasette/issues/1426#issuecomment-895500565 https://api.github.com/repos/simonw/datasette/issues/1426 IC_kwDOBm6k_c41YEEV simonw 9599 2021-08-09T20:00:04Z 2021-08-09T20:00:04Z OWNER

A few options for how this would work:

  • datasette ... --robots allow
  • datasette ... --setting robots allow

Options could be:

  • allow - allow all crawling
  • deny - deny all crawling
  • limited - allow access to the homepage and the index pages for each database and each table, but disallow crawling any further than that

The "limited" mode is particularly interesting. Could even make it the default, but I think that may be a bit too confusing. Idea would be to get the key pages indexed but use nofollow to discourage crawlers from indexing individual row pages or deep pages like https://datasette.io/content/repos?_facet=owner&_facet=language&_facet_array=topics&topics__arraycontains=sqlite#facet-owner.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Manage /robots.txt in Datasette core, block robots by default 964322136  
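To make the three options concrete, here is a sketch of the robots.txt each mode might produce; none of these names are Datasette APIs, and the "limited" rules simply reuse the Disallow pattern from the earlier comment:

```python
# Hypothetical sketch of what the proposed allow / deny / limited modes could
# emit. table_paths would come from Datasette's own knowledge of its tables.
def robots_txt(mode, table_paths=()):
    if mode == "allow":
        return "User-agent: *\nDisallow:\n"    # allow all crawling
    if mode == "deny":
        return "User-agent: *\nDisallow: /\n"  # deny all crawling
    # "limited": index pages stay crawlable, but query strings and row pages
    # underneath each table are disallowed.
    lines = ["User-agent: *"]
    for path in table_paths:
        lines.append(f"Disallow: {path}?")
        lines.append(f"Disallow: {path}/")
    return "\n".join(lines) + "\n"

print(robots_txt("limited", ["/content/repos", "/fixtures/searchable"]))
```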

Table schema:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
, [performed_via_github_app] TEXT);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);
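This page is just a query against that table; the equivalent against a local github-to-sqlite database (the github.db filename is an assumption) would be:

```python
# Reproduce this page's rows: comments on issue 964322136, newest update first.
import sqlite3

conn = sqlite3.connect("github.db")  # assumed github-to-sqlite output file
rows = conn.execute(
    """
    select id, user, created_at, updated_at, author_association
    from issue_comments
    where issue = ?
    order by updated_at desc
    """,
    (964322136,),
)
for comment_id, user_id, created_at, updated_at, association in rows:
    print(comment_id, user_id, created_at, updated_at, association)
```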