html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,issue,performed_via_github_app
https://github.com/simonw/datasette/issues/1439#issuecomment-1068461449,https://api.github.com/repos/simonw/datasette/issues/1439,1068461449,IC_kwDOBm6k_c4_r22J,9599,2022-03-15T20:51:26Z,2022-03-15T20:51:26Z,OWNER,I'm happy with this now that I've landed Tilde encoding in #1657.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-1065988403,https://api.github.com/repos/simonw/datasette/issues/1439,1065988403,IC_kwDOBm6k_c4_ibEz,9599,2022-03-13T00:06:38Z,2022-03-13T00:07:19Z,OWNER,"If I want to reserve `-` as a character that CAN be used in URLs, the only remaining character that might make sense for escape sequences is `~` - based on this last line of characters that are exempt from percent-encoding:
```python
_ALWAYS_SAFE = frozenset(b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
                         b'abcdefghijklmnopqrstuvwxyz'
                         b'0123456789'
                         b'_.-~')
```
So I'd add both `-` and `_` back to the safe list, but use `~` to escape `.` and `/` and suchlike.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-1065987808,https://api.github.com/repos/simonw/datasette/issues/1439,1065987808,IC_kwDOBm6k_c4_ia7g,9599,2022-03-13T00:02:32Z,2022-03-13T00:02:32Z,OWNER,"OK, this has broken a lot more than I expected it would.
Turns out `-` is a very common character in existing Datasette database names!
https://datasette.io/-/databases for example has two:
```json
[
{
""name"": ""docs-index"",
""path"": ""docs-index.db"",
""size"": 1007616,
""is_mutable"": false,
""is_memory"": false,
""hash"": ""0ac6c3de2762fcd174fd249fed8a8fa6046ea345173d22c2766186bf336462b2""
},
{
""name"": ""dogsheep-index"",
""path"": ""dogsheep-index.db"",
""size"": 5496832,
""is_mutable"": false,
""is_memory"": false,
""hash"": ""d1ea238d204e5b9ae783c86e4af5bcdf21267c1f391de3e468d9665494ee012a""
}
]
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-1060870237,https://api.github.com/repos/simonw/datasette/issues/1439,1060870237,IC_kwDOBm6k_c4_O5hd,9599,2022-03-07T16:19:22Z,2022-03-07T16:19:22Z,OWNER,"I didn't need to do any of the fancy regular expression routing stuff after all, since the new dash encoding format avoids using `/` so a simple `[^/]+` can capture the correct segments from the URL.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-1060044007,https://api.github.com/repos/simonw/datasette/issues/1439,1060044007,IC_kwDOBm6k_c4_Lvzn,9599,2022-03-06T21:38:15Z,2022-03-06T21:38:15Z,OWNER,"Test: https://github.com/simonw/datasette/blob/d2e3fe3facf0ed0abf8b00cd54463af90dd6904d/tests/test_utils.py#L651-L666
One big advantage to this scheme is that redirecting old links to `%2F` pages (e.g. https://fivethirtyeight.datasettes.com/fivethirtyeight/twitter-ratio%2Fsenators) is easy - if you see a `%` in the `raw_path`, redirect to that page with the `%` replaced by `-`.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-1059903309,https://api.github.com/repos/simonw/datasette/issues/1439,1059903309,IC_kwDOBm6k_c4_LNdN,9599,2022-03-06T06:17:51Z,2022-03-06T06:17:51Z,OWNER,"Suggestion from a conversation with Seth Michael Larson: it would be neat if plugins could easily integrate with whatever scheme this ends up using, maybe with the `/db/table/-/plugin-name` standardized pattern or similar.
Making it easy for plugins to do the right, consistent thing is a good idea.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-1059864154,https://api.github.com/repos/simonw/datasette/issues/1439,1059864154,IC_kwDOBm6k_c4_LD5a,9599,2022-03-06T00:59:04Z,2022-03-06T00:59:04Z,OWNER,"Needs more testing, but this seems to work for decoding the percent-escaped-with-dashes format: `urllib.parse.unquote(s.replace('-', '%'))`","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-1059855418,https://api.github.com/repos/simonw/datasette/issues/1439,1059855418,IC_kwDOBm6k_c4_LBw6,9599,2022-03-06T00:00:53Z,2022-03-06T00:04:18Z,OWNER,"```python
_ESCAPE_SAFE = frozenset(
    b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    b'abcdefghijklmnopqrstuvwxyz'
    b'0123456789_'
)
# I removed b'.-~')

class Quoter(dict):
    # Keeps a cache internally, via __missing__
    def __missing__(self, b):
        # Handle a cache miss. Store quoted string in cache and return.
        res = chr(b) if b in _ESCAPE_SAFE else '-{:02X}'.format(b)
        self[b] = res
        return res

quoter = Quoter().__getitem__
''.join([quoter(char) for char in b'foo/bar.csv'])
# 'foo-2Fbar-2Ecsv'
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-1059854864,https://api.github.com/repos/simonw/datasette/issues/1439,1059854864,IC_kwDOBm6k_c4_LBoQ,9599,2022-03-05T23:59:05Z,2022-03-05T23:59:05Z,OWNER,"OK, for that percentage thing: the Python core implementation of URL percentage escaping deliberately ignores two of the characters we want to escape: `.` and `-`:
https://github.com/python/cpython/blob/6927632492cbad86a250aa006c1847e03b03e70b/Lib/urllib/parse.py#L780-L783
```python
_ALWAYS_SAFE = frozenset(b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
                         b'abcdefghijklmnopqrstuvwxyz'
                         b'0123456789'
                         b'_.-~')
```
It also defaults to skipping `/` (passed as a `safe=` parameter to various things).
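A quick check of those defaults with the standard library (a minimal sketch; the sample name is made up for illustration):

```python
from urllib.parse import quote

name = 'foo/bar-baz.csv'

# Default call: '/' is in safe= and '.', '-' are always safe, so nothing changes.
print(quote(name))           # foo/bar-baz.csv

# safe='' escapes the slash, but '.' and '-' still pass through untouched.
print(quote(name, safe=''))  # foo%2Fbar-baz.csv
```

So `safe=''` alone is not enough here: `.` and `-` would still need separate handling.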
I'm going to try borrowing and modifying the core of the Python implementation: https://github.com/python/cpython/blob/6927632492cbad86a250aa006c1847e03b03e70b/Lib/urllib/parse.py#L795-L814
```python
class _Quoter(dict):
    """"""A mapping from bytes numbers (in range(0,256)) to strings.

    String values are percent-encoded byte values, unless the key < 128, and
    in either of the specified safe set, or the always safe set.
    """"""
    # Keeps a cache internally, via __missing__, for efficiency (lookups
    # of cached keys don't call Python code at all).
    def __init__(self, safe):
        """"""safe: bytes object.""""""
        self.safe = _ALWAYS_SAFE.union(safe)

    def __repr__(self):
        return f""<Quoter {dict(self)!r}>""

    def __missing__(self, b):
        # Handle a cache miss. Store quoted string in cache and return.
        res = chr(b) if b in self.safe else '%{:02X}'.format(b)
        self[b] = res
        return res
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-1059853526,https://api.github.com/repos/simonw/datasette/issues/1439,1059853526,IC_kwDOBm6k_c4_LBTW,9599,2022-03-05T23:49:59Z,2022-03-05T23:49:59Z,OWNER,"I want to try regular percentage encoding, except that it also encodes both the `-` and the `.` characters, AND it uses `-` instead of `%` as the encoding character.
Should check what it does with emoji too.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-1059851259,https://api.github.com/repos/simonw/datasette/issues/1439,1059851259,IC_kwDOBm6k_c4_LAv7,9599,2022-03-05T23:35:47Z,2022-03-05T23:35:59Z,OWNER,"This [comment from glyph](https://twitter.com/glyph/status/1500244937312329730) got me thinking:
> Have you considered replacing % with some other character and then using percent-encoding?
What happens if a table name includes a `%` character and that ends up getting mangled by a misbehaving proxy?
I should consider `%` in the escaping system too. And maybe go with that suggestion of using percent-encoding directly but with a different character.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-1059850369,https://api.github.com/repos/simonw/datasette/issues/1439,1059850369,IC_kwDOBm6k_c4_LAiB,9599,2022-03-05T23:28:56Z,2022-03-05T23:28:56Z,OWNER,"Lots of great conversations about the dash encoding implementation on Twitter: https://twitter.com/simonw/status/1500228316309061633
@dracos helped me figure out a simpler regex: https://twitter.com/dracos/status/1500236433809973248
`^/(?P[^/]+)/(?P[^\/\-\.]*|\-/|\-\.|\-\-)*(?P\.\w+)?$`
![image](https://user-images.githubusercontent.com/9599/156903088-c01933ae-4713-4e91-8d71-affebf70b945.png)
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-1059836599,https://api.github.com/repos/simonw/datasette/issues/1439,1059836599,IC_kwDOBm6k_c4_K9K3,9599,2022-03-05T21:52:10Z,2022-03-05T21:52:10Z,OWNER,Blogged about this here: https://simonwillison.net/2022/Mar/5/dash-encoding/,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-1045069481,https://api.github.com/repos/simonw/datasette/issues/1439,1045069481,IC_kwDOBm6k_c4-Sn6p,9599,2022-02-18T19:34:41Z,2022-03-05T21:32:22Z,OWNER,"I think I got format extraction working! https://regex101.com/r/A0bW1D/1
^/(?P[^/]+)/(?P(?:[^\/\-\.]*|(?:\-/)*|(?:\-\.)*|(?:\-\-)*)*?)(?:(?\w+))?$
I had to make that crazy inner one even more complicated to stop it from capturing `.` that was not part of `-.`.
(?:[^\/\-\.]*|(?:\-/)*|(?:\-\.)*|(?:\-\-)*)*
Visualized:
So now I have a regex which can extract out the dot-encoded table name AND spot if there is an optional `.format` at the end:
If I end up using this in Datasette it's going to need VERY comprehensive unit tests and inline documentation.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-1059822391,https://api.github.com/repos/simonw/datasette/issues/1439,1059822391,IC_kwDOBm6k_c4_K5s3,9599,2022-03-05T19:50:12Z,2022-03-05T19:50:12Z,OWNER,I'm going to move this work to a PR.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-1059822151,https://api.github.com/repos/simonw/datasette/issues/1439,1059822151,IC_kwDOBm6k_c4_K5pH,9599,2022-03-05T19:48:35Z,2022-03-05T19:48:35Z,OWNER,Those new docs: https://github.com/simonw/datasette/blob/d1cb73180b4b5a07538380db76298618a5fc46b6/docs/internals.rst#dash-encoding,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-1059802318,https://api.github.com/repos/simonw/datasette/issues/1439,1059802318,IC_kwDOBm6k_c4_K0zO,9599,2022-03-05T17:34:33Z,2022-03-05T17:34:33Z,OWNER,"Wrote documentation:
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-1053973425,https://api.github.com/repos/simonw/datasette/issues/1439,1053973425,IC_kwDOBm6k_c4-0lux,9599,2022-02-28T07:40:12Z,2022-02-28T07:40:12Z,OWNER,"If I make this change it will break existing links to one of the oldest Datasette demos: http://fivethirtyeight.datasettes.com/fivethirtyeight/avengers%2Favengers
A plugin that fixes those by redirecting them on 404 would be neat.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-1049126151,https://api.github.com/repos/simonw/datasette/issues/1439,1049126151,IC_kwDOBm6k_c4-iGUH,9599,2022-02-23T19:17:01Z,2022-02-23T19:17:01Z,OWNER,Actually the relevant code looks to be: https://github.com/simonw/datasette/blob/7d24fd405f3c60e4c852c5d746c91aa2ba23cf5b/datasette/views/base.py#L481-L498,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-1049124390,https://api.github.com/repos/simonw/datasette/issues/1439,1049124390,IC_kwDOBm6k_c4-iF4m,9599,2022-02-23T19:15:00Z,2022-02-23T19:15:00Z,OWNER,"I'll start by modifying this function: https://github.com/simonw/datasette/blob/458f03ad3a454d271f47a643f4530bd8b60ddb76/datasette/utils/__init__.py#L732-L749
Later I want to move this to the routing layer to split out `format` automatically, as seen in the regexes here: https://github.com/simonw/datasette/issues/1439#issuecomment-1045069481","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-1049114724,https://api.github.com/repos/simonw/datasette/issues/1439,1049114724,IC_kwDOBm6k_c4-iDhk,9599,2022-02-23T19:04:40Z,2022-02-23T19:04:40Z,OWNER,I'm going to try dash encoding for table names (and row IDs) in a branch and see how I like it.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-1045269544,https://api.github.com/repos/simonw/datasette/issues/1439,1045269544,IC_kwDOBm6k_c4-TYwo,9599,2022-02-18T22:19:29Z,2022-02-18T22:19:29Z,OWNER,"Note that I've ruled out using `Accept: application/json` to return JSON because it turns out Cloudflare and potentially other CDNs ignore the `Vary: Accept` header entirely:
- https://github.com/simonw/datasette/issues/1534","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-1045134050,https://api.github.com/repos/simonw/datasette/issues/1439,1045134050,IC_kwDOBm6k_c4-S3ri,9599,2022-02-18T20:25:04Z,2022-02-18T20:25:04Z,OWNER,Here's a useful modern spec for how existing URL percentage encoding is supposed to work: https://url.spec.whatwg.org/#percent-encoded-bytes,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-1045131086,https://api.github.com/repos/simonw/datasette/issues/1439,1045131086,IC_kwDOBm6k_c4-S29O,9599,2022-02-18T20:22:13Z,2022-02-18T20:22:47Z,OWNER,"Should it encode `%` symbols too, since they have a special meaning in URLs and we can't guarantee that every single web server / proxy out there will round-trip them safely using percentage encoding? If so, would need to pick a different encoding character for them. Maybe `%` becomes `-p` - and in that case `/` could become `-s` too.
Is it worth expanding dash-encoding outside of just `/` and `-` and `.` though? Not sure.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-1045117304,https://api.github.com/repos/simonw/datasette/issues/1439,1045117304,IC_kwDOBm6k_c4-Szl4,9599,2022-02-18T20:09:22Z,2022-02-18T20:09:22Z,OWNER,Adopting this could result in supporting database files with surprising characters in their filename too.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-1045108611,https://api.github.com/repos/simonw/datasette/issues/1439,1045108611,IC_kwDOBm6k_c4-SxeD,9599,2022-02-18T20:02:19Z,2022-02-18T20:08:34Z,OWNER,"One other potential variant:
```python
def dash_encode(s):
    return s.replace(""-"", ""-dash-"").replace(""."", ""-dot-"").replace(""/"", ""-slash-"")

def dash_decode(s):
    return s.replace(""-slash-"", ""/"").replace(""-dot-"", ""."").replace(""-dash-"", ""-"")
```
Except this has bugs - it doesn't round-trip safely, because a sequence like `-dash-slash-` is ambiguous: is that a `-dash-` or a `-slash-`?
```pycon
>>> dash_encode(""/db/table-.csv.csv"")
'-slash-db-slash-table-dash--dot-csv-dot-csv'
>>> dash_decode('-slash-db-slash-table-dash--dot-csv-dot-csv')
'/db/table-.csv.csv'
>>> dash_encode('-slash-db-slash-table-dash--dot-csv-dot-csv')
'-dash-slash-dash-db-dash-slash-dash-table-dash-dash-dash--dash-dot-dash-csv-dash-dot-dash-csv'
>>> dash_decode('-dash-slash-dash-db-dash-slash-dash-table-dash-dash-dash--dash-dot-dash-csv-dash-dot-dash-csv')
'-dash/dash-db-dash/dash-table-dash--dash.dash-csv-dash.dash-csv'
```
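The failure can be reproduced with a much shorter input. A minimal sketch, reusing the two functions above unchanged; the `round_trips` helper and the sample strings are only for illustration:

```python
def dash_encode(s):
    return s.replace('-', '-dash-').replace('.', '-dot-').replace('/', '-slash-')


def dash_decode(s):
    return s.replace('-slash-', '/').replace('-dot-', '.').replace('-dash-', '-')


def round_trips(s):
    # True when decoding the encoded form gives back the original string.
    return dash_decode(dash_encode(s)) == s


print(round_trips('/db/table-.csv.csv'))    # True
print(round_trips('-slash-'))               # False
print(dash_decode(dash_encode('-slash-')))  # -dash/dash-
```

Decoding runs three independent replace passes, so the `-slash-` pass can match text that straddles two encoded dashes.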
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-1045111309,https://api.github.com/repos/simonw/datasette/issues/1439,1045111309,IC_kwDOBm6k_c4-SyIN,9599,2022-02-18T20:04:24Z,2022-02-18T20:05:40Z,OWNER,"This made me worry that my current `dash_decode()` implementation had unknown round-trip bugs, but thankfully this works OK:
```pycon
>>> dash_encode(""/db/table-.csv.csv"")
'-/db-/table---.csv-.csv'
>>> dash_encode('-/db-/table---.csv-.csv')
'---/db---/table-------.csv---.csv'
>>> dash_decode('---/db---/table-------.csv---.csv')
'-/db-/table---.csv-.csv'
>>> dash_decode('-/db-/table---.csv-.csv')
'/db/table-.csv.csv'
```
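To go beyond a single example, a brute-force check can be run over every short string built from the special characters. This is a sketch, assuming the `--` / `-.` / `-/` variant of the functions quoted elsewhere in this thread; the `failing_inputs` helper, the alphabet and the length limit are arbitrary choices:

```python
from itertools import product


def dash_encode(s):
    return s.replace('-', '--').replace('.', '-.').replace('/', '-/')


def dash_decode(s):
    return s.replace('-/', '/').replace('-.', '.').replace('--', '-')


def failing_inputs(alphabet='a-./', max_length=6):
    # Collect every string up to max_length that does not survive a round trip.
    failures = []
    for length in range(max_length + 1):
        for chars in product(alphabet, repeat=length):
            s = ''.join(chars)
            if dash_decode(dash_encode(s)) != s:
                failures.append(s)
    return failures


print(failing_inputs())  # []
```

This works because in an encoded string every `/` and `.` is always preceded by its own escape dash, and the remaining dashes always come in pairs.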
The regex still works against that double-encoded example too:
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-1045099290,https://api.github.com/repos/simonw/datasette/issues/1439,1045099290,IC_kwDOBm6k_c4-SvMa,9599,2022-02-18T19:56:18Z,2022-02-18T19:56:30Z,OWNER,"> ```python
> def dash_encode(s):
>     return s.replace(""-"", ""--"").replace(""."", ""-."").replace(""/"", ""-/"")
>
> def dash_decode(s):
>     return s.replace(""-/"", ""/"").replace(""-."", ""."").replace(""--"", ""-"")
> ```
I think **dash-encoding** (new name for this) is the right way forward here.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-1045024276,https://api.github.com/repos/simonw/datasette/issues/1439,1045024276,IC_kwDOBm6k_c4-Sc4U,9599,2022-02-18T19:01:42Z,2022-02-18T19:55:24Z,OWNER,"> Maybe I should use `-/` to encode forward slashes too, to defend against any ASGI servers that might not implement `raw_path` correctly.
```python
def dash_encode(s):
    return s.replace(""-"", ""--"").replace(""."", ""-."").replace(""/"", ""-/"")

def dash_decode(s):
    return s.replace(""-/"", ""/"").replace(""-."", ""."").replace(""--"", ""-"")
```
```pycon
>>> dash_encode(""foo/bar/baz.csv"")
'foo-/bar-/baz-.csv'
>>> dash_decode('foo-/bar-/baz-.csv')
'foo/bar/baz.csv'
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-1045095348,https://api.github.com/repos/simonw/datasette/issues/1439,1045095348,IC_kwDOBm6k_c4-SuO0,9599,2022-02-18T19:53:48Z,2022-02-18T19:53:48Z,OWNER,"> Ugh, one disadvantage I just spotted with this: Datasette already has a `/-/versions.json` convention where ""system"" URLs are namespaced under `/-/` - but that could be confused under this new scheme with the `-/` escaping sequence.
>
> And I've thought about adding `/db/-/special` and `/db/table/-/special` URLs in the past too.
I don't think this matters. The new regex does indeed capture that kind of page:
But Datasette goes through configured route regular expressions in order - so I can have the regex that captures `/db/-/special` routes listed before the one that captures tables and formats.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-1045081042,https://api.github.com/repos/simonw/datasette/issues/1439,1045081042,IC_kwDOBm6k_c4-SqvS,9599,2022-02-18T19:44:12Z,2022-02-18T19:51:34Z,OWNER,"```python
def dot_encode(s):
    return s.replace(""."", "".."").replace(""/"", ""./"")

def dot_decode(s):
    return s.replace(""./"", ""/"").replace("".."", ""."")
```
No need for hyphen encoding in this variant at all, which simplifies things a bit.
(Update: this is flawed, see https://github.com/simonw/datasette/issues/1439#issuecomment-1045086033)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-1045086033,https://api.github.com/repos/simonw/datasette/issues/1439,1045086033,IC_kwDOBm6k_c4-Sr9R,9599,2022-02-18T19:47:43Z,2022-02-18T19:51:11Z,OWNER,"- https://datasette.io/-/asgi-scope/db/./db./table-..csv..csv
- https://til.simonwillison.net/-/asgi-scope/db/./db./table-..csv..csv
Do both of those survive the round-trip to populate `raw_path` correctly?
No! In both cases the `/./` bit goes missing.
It looks like this might even be a client issue - `curl` shows me this:
```
~ % curl -vv -i 'https://datasette.io/-/asgi-scope/db/./db./table-..csv..csv'
* Trying 216.239.32.21:443...
* Connected to datasette.io (216.239.32.21) port 443 (#0)
* ALPN, offering http/1.1
* TLS 1.2 connection using TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256
* Server certificate: datasette.io
* Server certificate: R3
* Server certificate: ISRG Root X1
> GET /-/asgi-scope/db/db./table-..csv..csv HTTP/1.1
```
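That rewritten request line is consistent with what RFC 3986 calls dot-segment removal, which URL clients may apply before sending a request. A simplified sketch of that normalization (the `remove_dot_segments` helper is hypothetical, handles only absolute paths, and is not curl's actual code):

```python
def remove_dot_segments(path):
    # Simplified take on RFC 3986 section 5.2.4 for absolute paths:
    # drop '.' segments and let '..' remove the previous segment.
    output = []
    for segment in path.split('/')[1:]:
        if segment == '.':
            continue
        if segment == '..':
            if output:
                output.pop()
            continue
        output.append(segment)
    return '/' + '/'.join(output)


print(remove_dot_segments('/-/asgi-scope/db/./db./table-..csv..csv'))
# /-/asgi-scope/db/db./table-..csv..csv
```

Only whole `.` segments are dropped, which is why `db.` and `table-..csv..csv` survive. curl has a `--path-as-is` option that turns this normalization off, but other clients and proxies may not offer an equivalent.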
So `curl` decided to turn `/-/asgi-scope/db/./db./table` into `/-/asgi-scope/db/db./table` before even sending the request.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-1045082891,https://api.github.com/repos/simonw/datasette/issues/1439,1045082891,IC_kwDOBm6k_c4-SrML,9599,2022-02-18T19:45:32Z,2022-02-18T19:45:32Z,OWNER,"```pycon
>>> dot_encode(""/db/table-.csv.csv"")
'./db./table-..csv..csv'
>>> dot_decode('./db./table-..csv..csv')
'/db/table-.csv.csv'
```
I worry that web servers might treat `./` in a special way though.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-1045077590,https://api.github.com/repos/simonw/datasette/issues/1439,1045077590,IC_kwDOBm6k_c4-Sp5W,9599,2022-02-18T19:41:37Z,2022-02-18T19:42:41Z,OWNER,"Ugh, one disadvantage I just spotted with this: Datasette already has a `/-/versions.json` convention where ""system"" URLs are namespaced under `/-/` - but that could be confused under this new scheme with the `-/` escaping sequence.
And I've thought about adding `/db/-/special` and `/db/table/-/special` URLs in the past too.
Maybe change this system to use `.` as the escaping character instead of `-`?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-1045075207,https://api.github.com/repos/simonw/datasette/issues/1439,1045075207,IC_kwDOBm6k_c4-SpUH,9599,2022-02-18T19:39:35Z,2022-02-18T19:40:13Z,OWNER,"> And if for some horrific reason you had a table with the name `/db/table-.csv.csv` (so `/db/` was the first part of the actual table name in SQLite) the URLs would look like this:
>
> * `/db/%2Fdb%2Ftable---.csv-.csv` - the HTML version
> * `/db/%2Fdb%2Ftable---.csv-.csv.csv` - the CSV version
> * `/db/%2Fdb%2Ftable---.csv-.csv.json` - the JSON version
Here's what those look like with the updated version of `dot_dash_encode()` that also encodes `/` as `-/`:
- `/db/-/db-/table---.csv-.csv` - HTML
- `/db/-/db-/table---.csv-.csv.csv` - CSV
- `/db/-/db-/table---.csv-.csv.json` - JSON
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-1045059427,https://api.github.com/repos/simonw/datasette/issues/1439,1045059427,IC_kwDOBm6k_c4-Sldj,9599,2022-02-18T19:26:25Z,2022-02-18T19:26:25Z,OWNER,"With this new pattern I could probably extract out the optional `.json` format string as part of the initial route capturing regex too, rather than the current `table_and_format` hack.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-1045055772,https://api.github.com/repos/simonw/datasette/issues/1439,1045055772,IC_kwDOBm6k_c4-Skkc,9599,2022-02-18T19:23:33Z,2022-02-18T19:25:42Z,OWNER,"I want a match for this URL:
/db/table-/with-/slashes-.csv
Maybe this:
^/(?P[^/]+)/(?P([^/]*|(\-/)*|(\-\.)*|(\.\.)*)*$)
Here we are matching a sequence of:
([^/]*|(\-/)*|(\-\.)*|(\-\-)*)*
So a combination of not-slashes OR `-/` OR `-.` OR `--` sequences:
^/(?P[^/]+)/(?P([^/]*|(\-/)*|(\-\.)*|(\-\-)*)*$)
Try that with non-capturing bits:
^/(?P[^/]+)/(?P(?:[^/]*|(?:\-/)*|(?:\-\.)*|(?:\-\-)*)*$)
`(?:[^/]*|(?:\-/)*|(?:\-\.)*|(?:\-\-)*)*` visualized is:
Here's the explanation on regex101.com https://regex101.com/r/CPnsIO/1
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-1045032377,https://api.github.com/repos/simonw/datasette/issues/1439,1045032377,IC_kwDOBm6k_c4-Se25,9599,2022-02-18T19:06:50Z,2022-02-18T19:06:50Z,OWNER,"How does URL routing for https://latest.datasette.io/fixtures/table%2Fwith%2Fslashes.csv work?
Right now it's https://github.com/simonw/datasette/blob/7d24fd405f3c60e4c852c5d746c91aa2ba23cf5b/datasette/app.py#L1098-L1101
That's not going to capture the dot-dash encoding version of that table name:
```pycon
>>> dot_dash_encode(""table/with/slashes.csv"")
'table-/with-/slashes-.csv'
```
Probably needs a fancy regex trick like a negative lookbehind assertion or similar.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-1045027067,https://api.github.com/repos/simonw/datasette/issues/1439,1045027067,IC_kwDOBm6k_c4-Sdj7,9599,2022-02-18T19:03:26Z,2022-02-18T19:03:26Z,OWNER,"(If I make this change it may break some existing Datasette installations when they upgrade - I could try and build a plugin for them which triggers on 404s and checks to see if the old format would return a 200 response, then returns that.)","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-1031141849,https://api.github.com/repos/simonw/datasette/issues/1439,1031141849,IC_kwDOBm6k_c49dfnZ,9599,2022-02-07T07:11:11Z,2022-02-07T07:11:11Z,OWNER,"I added a Link header to solve this problem for the JSON version in:
- #1533 ","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-900715375,https://api.github.com/repos/simonw/datasette/issues/1439,900715375,IC_kwDOBm6k_c41r9Nv,9599,2021-08-18T00:15:28Z,2021-08-18T00:15:28Z,OWNER,"Maybe I should use `-/` to encode forward slashes too, to defend against any ASGI servers that might not implement `raw_path` correctly.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-900714630,https://api.github.com/repos/simonw/datasette/issues/1439,900714630,IC_kwDOBm6k_c41r9CG,9599,2021-08-18T00:13:33Z,2021-08-18T00:13:33Z,OWNER,"The documentation should definitely cover how table names become URLs, in case any third party code needs to be able to calculate this themselves.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-900712981,https://api.github.com/repos/simonw/datasette/issues/1439,900712981,IC_kwDOBm6k_c41r8oV,9599,2021-08-18T00:09:59Z,2021-08-18T00:12:32Z,OWNER,"So given the original examples, a table called `table.csv` would have the following URLs:
- `/db/table-.csv` - the HTML version
- `/db/table-.csv.csv` - the CSV version
- `/db/table-.csv.json` - the JSON version
And if for some horrific reason you had a table with the name `/db/table-.csv.csv` (so `/db/` was the first part of the actual table name in SQLite) the URLs would look like this:
- `/db/%2Fdb%2Ftable---.csv-.csv` - the HTML version
- `/db/%2Fdb%2Ftable---.csv-.csv.csv` - the CSV version
- `/db/%2Fdb%2Ftable---.csv-.csv.json` - the JSON version","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-900711967,https://api.github.com/repos/simonw/datasette/issues/1439,900711967,IC_kwDOBm6k_c41r8Yf,9599,2021-08-18T00:08:09Z,2021-08-18T00:08:09Z,OWNER,"Here's an alternative I just made up which I'm calling ""dot dash"" encoding:
```python
def dot_dash_encode(s):
    return s.replace(""-"", ""--"").replace(""."", ""-."")

def dot_dash_decode(s):
    return s.replace(""-."", ""."").replace(""--"", ""-"")
```
And some examples:
```python
for example in (
    ""hello"",
    ""hello.csv"",
    ""hello-and-so-on.csv"",
    ""hello-.csv"",
    ""hello--and--so--on-.csv"",
    ""hello.csv."",
    ""hello.csv.-"",
    ""hello.csv.--"",
):
    print(example)
    print(dot_dash_encode(example))
    print(example == dot_dash_decode(dot_dash_encode(example)))
    print()
```
Outputs:
```
hello
hello
True
hello.csv
hello-.csv
True
hello-and-so-on.csv
hello--and--so--on-.csv
True
hello-.csv
hello---.csv
True
hello--and--so--on-.csv
hello----and----so----on---.csv
True
hello.csv.
hello-.csv-.
True
hello.csv.-
hello-.csv-.--
True
hello.csv.--
hello-.csv-.----
True
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-900709703,https://api.github.com/repos/simonw/datasette/issues/1439,900709703,IC_kwDOBm6k_c41r71H,9599,2021-08-18T00:03:09Z,2021-08-18T00:03:09Z,OWNER,"But... what if I invent my own escaping scheme?
I actually did this once before, in https://github.com/simonw/datasette/commit/9fdb47ca952b93b7b60adddb965ea6642b1ff523 - while I was working on porting Datasette to ASGI in https://github.com/simonw/datasette/issues/272#issuecomment-494192779 because ASGI didn't yet have the `raw_path` mechanism.
I could bring that back - it looked like this:
```
""table/and/slashes"" => ""tableU+002FandU+002Fslashes""
""~table"" => ""U+007Etable""
""+bobcats!"" => ""U+002Bbobcats!""
""U+007Etable"" => ""UU+002B007Etable""
```
But I didn't particularly like it - it was quite verbose.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-900705226,https://api.github.com/repos/simonw/datasette/issues/1439,900705226,IC_kwDOBm6k_c41r6vK,9599,2021-08-17T23:50:32Z,2021-08-17T23:50:47Z,OWNER,"An alternative solution would be to use some form of escaping for the characters that form the name of the table.
The obvious way to do this would be URL-encoding - but it doesn't hold for `.` characters. The hex for that is `%2E` but watch what happens with that in a URL:
```
# Against Cloud Run:
curl -s 'https://datasette.io/-/asgi-scope/foo/bar%2Fbaz%2E' | rg path
'path': '/-/asgi-scope/foo/bar/baz.',
'raw_path': b'/-/asgi-scope/foo/bar%2Fbaz.',
'root_path': '',
# Against Vercel:
curl -s 'https://til.simonwillison.net/-/asgi-scope/foo/bar%2Fbaz%2E' | rg path
'path': '/-/asgi-scope/foo/bar%2Fbaz%2E',
'raw_path': b'/-/asgi-scope/foo/bar%2Fbaz%2E',
'root_path': '',
```
Surprisingly in this case Vercel DOES keep it intact, but Cloud Run does not.
It's still no good though: I need a solution that works on Vercel, Cloud Run and every other potential hosting provider too.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,
https://github.com/simonw/datasette/issues/1439#issuecomment-900699670,https://api.github.com/repos/simonw/datasette/issues/1439,900699670,IC_kwDOBm6k_c41r5YW,9599,2021-08-17T23:34:23Z,2021-08-17T23:34:23Z,OWNER,"The challenge comes down to telling the difference between the following:
- `/db/table` - an HTML table page
- `/db/table.csv` - the CSV version of `/db/table`
- `/db/table.csv` - no this one is actually a database table called `table.csv`
- `/db/table.csv.csv` - the CSV version of `/db/table.csv`
- `/db/table.csv.csv.csv` and so on...","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",973139047,