I wish ..."",
""points"": null,
""parent_id"": 27941108,
""story_id"": 27941108
}
```","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",952189173,
https://github.com/dogsheep/hacker-news-to-sqlite/issues/3#issuecomment-886142671,https://api.github.com/repos/dogsheep/hacker-news-to-sqlite/issues/3,886142671,IC_kwDODtX3eM400XbP,9599,2021-07-25T03:51:05Z,2021-07-25T03:51:05Z,MEMBER,"Prototype:
curl 'https://hn.algolia.com/api/v1/items/27941108' \
| jq '[recurse(.children[]) | del(.children)]' \
| sqlite-utils insert hn.db items - --pk id
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",952189173,
https://github.com/dogsheep/hacker-news-to-sqlite/issues/2#issuecomment-886140431,https://api.github.com/repos/dogsheep/hacker-news-to-sqlite/issues/2,886140431,IC_kwDODtX3eM400W4P,9599,2021-07-25T03:12:57Z,2021-07-25T03:12:57Z,MEMBER,"I'm going to build a general-purpose `hacker-new-to-sqlite search ...` command, where one of the options is to search within the URL.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",952179830,
https://github.com/dogsheep/hacker-news-to-sqlite/issues/2#issuecomment-886136224,https://api.github.com/repos/dogsheep/hacker-news-to-sqlite/issues/2,886136224,IC_kwDODtX3eM400V2g,9599,2021-07-25T02:08:29Z,2021-07-25T02:08:29Z,MEMBER,"Prototype:
curl ""https://hn.algolia.com/api/v1/search_by_date?query=simonwillison.net&restrictSearchableAttributes=url&hitsPerPage=1000"" | \
jq .hits | sqlite-utils insert hn.db items - --pk objectID --alter","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",952179830,
https://github.com/dogsheep/hacker-news-to-sqlite/issues/2#issuecomment-886135922,https://api.github.com/repos/dogsheep/hacker-news-to-sqlite/issues/2,886135922,IC_kwDODtX3eM400Vxy,9599,2021-07-25T02:06:20Z,2021-07-25T02:06:20Z,MEMBER,"https://hn.algolia.com/api/v1/search_by_date?query=simonwillison.net&restrictSearchableAttributes=url looks like it does what I want.
https://hn.algolia.com/api/v1/search_by_date?query=simonwillison.net&restrictSearchableAttributes=url&hitsPerPage=1000 - returns 1000 at once.
Otherwise you have to paginate using `&page=2` etc - up to `nbPages` pages.
https://www.algolia.com/doc/api-reference/api-parameters/hitsPerPage/ says 1000 is the maximum.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",952179830,
https://github.com/dogsheep/hacker-news-to-sqlite/issues/2#issuecomment-886135562,https://api.github.com/repos/dogsheep/hacker-news-to-sqlite/issues/2,886135562,IC_kwDODtX3eM400VsK,9599,2021-07-25T02:01:11Z,2021-07-25T02:01:11Z,MEMBER,"That page doesn't have an API but does look easy to scrape.
The other option here is the HN Search API powered by Algolia, documented at https://hn.algolia.com/api","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",952179830,
https://github.com/dogsheep/healthkit-to-sqlite/issues/12#issuecomment-879477586,https://api.github.com/repos/dogsheep/healthkit-to-sqlite/issues/12,879477586,MDEyOklzc3VlQ29tbWVudDg3OTQ3NzU4Ng==,9599,2021-07-13T23:50:06Z,2021-07-13T23:50:06Z,MEMBER,"Unfortunately I don't think updating the database is practical, because the export doesn't include unique identifiers which can be used to update existing records and create new ones. Recreating from scratch works around that limitation.
I've not explored workouts with SpatiaLite but that's a really good idea.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",727848625,
https://github.com/dogsheep/github-to-sqlite/issues/64#issuecomment-861042050,https://api.github.com/repos/dogsheep/github-to-sqlite/issues/64,861042050,MDEyOklzc3VlQ29tbWVudDg2MTA0MjA1MA==,9599,2021-06-14T22:45:42Z,2021-06-14T22:45:42Z,MEMBER,I'm definitely interested in supporting events in this tool - see #14.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",920636216,
https://github.com/dogsheep/github-to-sqlite/issues/64#issuecomment-861041597,https://api.github.com/repos/dogsheep/github-to-sqlite/issues/64,861041597,MDEyOklzc3VlQ29tbWVudDg2MTA0MTU5Nw==,9599,2021-06-14T22:44:54Z,2021-06-14T22:44:54Z,MEMBER,Have you found a way to access events in GraphQL? I can only see way to access a timeline of events for a single issue or a single pull request. See also https://github.community/t/get-event-equivalent-for-v4/13600/2,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",920636216,
https://github.com/dogsheep/github-to-sqlite/pull/59#issuecomment-844250232,https://api.github.com/repos/dogsheep/github-to-sqlite/issues/59,844250232,MDEyOklzc3VlQ29tbWVudDg0NDI1MDIzMg==,9599,2021-05-19T16:08:10Z,2021-05-19T16:08:10Z,MEMBER,Thanks for catching this.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",771872303,
https://github.com/dogsheep/github-to-sqlite/pull/61#issuecomment-844249385,https://api.github.com/repos/dogsheep/github-to-sqlite/issues/61,844249385,MDEyOklzc3VlQ29tbWVudDg0NDI0OTM4NQ==,9599,2021-05-19T16:07:06Z,2021-05-19T16:07:06Z,MEMBER,Thanks!,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",797108702,
https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790695126,https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5,790695126,MDEyOklzc3VlQ29tbWVudDc5MDY5NTEyNg==,9599,2021-03-04T15:20:42Z,2021-03-04T15:20:42Z,MEMBER,"I'm not sure why but my most recent import, when displayed in Datasette, looks like this:
Sorting by `id` in the opposite order gives me the data I would expect - so it looks like a bunch of null/blank messages are being imported at some point and showing up first due to ID ordering.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",813880401,
https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790693674,https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5,790693674,MDEyOklzc3VlQ29tbWVudDc5MDY5MzY3NA==,9599,2021-03-04T15:18:36Z,2021-03-04T15:18:36Z,MEMBER,"I imported my 10GB mbox with 750,000 emails in it, ran this tool (with a hacked fix for the blob column problem) - and now a search that returns 92 results takes 25.37ms! This is fantastic.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",813880401,
https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790669767,https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5,790669767,MDEyOklzc3VlQ29tbWVudDc5MDY2OTc2Nw==,9599,2021-03-04T14:46:06Z,2021-03-04T14:46:06Z,MEMBER,"Solution could be to pre-process that string by splitting on `(` and dropping everything afterwards, assuming that the `(...)` bit isn't necessary for correctly parsing the date.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",813880401,
https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790668263,https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5,790668263,MDEyOklzc3VlQ29tbWVudDc5MDY2ODI2Mw==,9599,2021-03-04T14:43:58Z,2021-03-04T14:43:58Z,MEMBER,"I added this code to output a message ID on errors:
```diff
print(""Errors: {}"".format(num_errors))
print(traceback.format_exc())
+ print(""Message-Id: {}"".format(email.get(""Message-Id"", ""None"")))
continue
```
Having found a message ID that had an error, I ran this command to see the context:
rg --text --context 20 '44F289B0.000001.02100@SCHWARZE-DWFXMI' ~/gmail.mbox
This was for the following error:
```
File ""/Users/simon/Dropbox/Development/google-takeout-to-sqlite/google_takeout_to_sqlite/utils.py"", line 102, in get_mbox
message[""date""] = get_message_date(email.get(""Date""), email.get_from())
File ""/Users/simon/Dropbox/Development/google-takeout-to-sqlite/google_takeout_to_sqlite/utils.py"", line 178, in get_message_date
datetime_tuple = email.utils.parsedate_tz(mail_date)
File ""/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/email/_parseaddr.py"", line 50, in parsedate_tz
res = _parsedate_tz(data)
File ""/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/email/_parseaddr.py"", line 69, in _parsedate_tz
data = data.split()
AttributeError: 'Header' object has no attribute 'split'
```
Here's what I spotted in the `ripgrep` output:
```
177133570:Message-Id: <44F289B0.000001.02100@SCHWARZE-DWFXMI>
177133571-Date: Mon, 28 Aug 2006 08:14:08 +0200 (Westeurop�ische Sommerzeit)
177133572-X-Mailer: IncrediMail (5002253)
```
So it could it be that `_parsedate_tz` is having trouble with that `Mon, 28 Aug 2006 08:14:08 +0200 (Westeurop�ische Sommerzeit)` string.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",813880401,
https://github.com/dogsheep/google-takeout-to-sqlite/issues/6#issuecomment-790384087,https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/6,790384087,MDEyOklzc3VlQ29tbWVudDc5MDM4NDA4Nw==,9599,2021-03-04T07:22:51Z,2021-03-04T07:22:51Z,MEMBER,#3 also mentions the conflicting version with other tools.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",821841046,
https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790380839,https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5,790380839,MDEyOklzc3VlQ29tbWVudDc5MDM4MDgzOQ==,9599,2021-03-04T07:17:05Z,2021-03-04T07:17:05Z,MEMBER,"Looks like you're doing this:
```python
elif message.get_content_type() == ""text/plain"":
body = message.get_payload(decode=True)
```
So presumably that decodes to a unicode string?
I imagine the reason the column is a `BLOB` for me is that `sqlite-utils` determines the column type based on the first batch of items - https://github.com/simonw/sqlite-utils/blob/09c3386f55f766b135b6a1c00295646c4ae29bec/sqlite_utils/db.py#L1927-L1928 - and I got unlucky and had something in my first batch that wasn't a unicode string.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",813880401,
https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790379629,https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5,790379629,MDEyOklzc3VlQ29tbWVudDc5MDM3OTYyOQ==,9599,2021-03-04T07:14:41Z,2021-03-04T07:14:41Z,MEMBER,"Confirmed: removing the `len()` call does not speed things up, so it's reading through the entire file for some other purpose too.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",813880401,
https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790378658,https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5,790378658,MDEyOklzc3VlQ29tbWVudDc5MDM3ODY1OA==,9599,2021-03-04T07:12:48Z,2021-03-04T07:12:48Z,MEMBER,"It looks like the `body` is being loaded into a BLOB column - so in Datasette default it looks like this:
If I `datasette install datasette-render-binary` and then try again I get this:
It would be great if we could store the `body` as unicode text instead. May have to do something clever to decode it based on some kind of charset header?","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",813880401,
https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790373024,https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5,790373024,MDEyOklzc3VlQ29tbWVudDc5MDM3MzAyNA==,9599,2021-03-04T07:01:58Z,2021-03-04T07:04:06Z,MEMBER,"I got 9 warnings that look like this:
```
Errors: 1
Traceback (most recent call last):
File ""/Users/simon/Dropbox/Development/google-takeout-to-sqlite/google_takeout_to_sqlite/utils.py"", line 103, in get_mbox
message[""date""] = get_message_date(email.get(""Date""), email.get_from())
File ""/Users/simon/Dropbox/Development/google-takeout-to-sqlite/google_takeout_to_sqlite/utils.py"", line 167, in get_message_date
datetime_tuple = email.utils.parsedate_tz(mail_date)
File ""/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/email/_parseaddr.py"", line 50, in parsedate_tz
res = _parsedate_tz(data)
File ""/Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.7/lib/python3.7/email/_parseaddr.py"", line 69, in _parsedate_tz
data = data.split()
AttributeError: 'Header' object has no attribute 'split'
```
It would be useful if those warnings told me the message ID (or similar) of the affected message so I could grep for it in the `mbox` and see what was going on.
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",813880401,
https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790372621,https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5,790372621,MDEyOklzc3VlQ29tbWVudDc5MDM3MjYyMQ==,9599,2021-03-04T07:01:18Z,2021-03-04T07:01:18Z,MEMBER,"I'm not sure if it would work, but there is an alternative pattern for showing a progress bar against a really large file that I've used in `healthkit-to-sqlite` - you set the progress bar size to the size of the file in bytes, then update a counter as you read the file.
https://github.com/dogsheep/healthkit-to-sqlite/blob/3eb2b06bfe3b4faaf10e9cf9dfcb28e3d16c14ff/healthkit_to_sqlite/cli.py#L24-L57 and https://github.com/dogsheep/healthkit-to-sqlite/blob/3eb2b06bfe3b4faaf10e9cf9dfcb28e3d16c14ff/healthkit_to_sqlite/utils.py#L4-L19 (the `progress_callback()` bit) is where that happens.
It can be a bit of a convoluted pattern, and I'm not at all sure it would work for `mbox` files since it looks like that library has other reasons it needs to do a file scan rather than streaming it through one chunk of bytes at a time. So I imagine this would not work here.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",813880401,
https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790370485,https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5,790370485,MDEyOklzc3VlQ29tbWVudDc5MDM3MDQ4NQ==,9599,2021-03-04T06:57:25Z,2021-03-04T06:57:48Z,MEMBER,"The command takes quite a while to start running, presumably because this line causes it to have to scan the WHOLE file in order to generate a count:
https://github.com/dogsheep/google-takeout-to-sqlite/blob/a3de045eba0fae4b309da21aa3119102b0efc576/google_takeout_to_sqlite/utils.py#L66-L67
I'm fine with waiting though. It's not like this is a command people run every day - and without that count we can't show a progress bar, which seems pretty important for a process that takes this long.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",813880401,
https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790369076,https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5,790369076,MDEyOklzc3VlQ29tbWVudDc5MDM2OTA3Ng==,9599,2021-03-04T06:54:46Z,2021-03-04T06:54:46Z,MEMBER,"The Rich-powered progress bar is pretty:
![rich](https://user-images.githubusercontent.com/9599/109923307-71f69200-7c73-11eb-9ee2-8f0a240f3994.gif)
","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",813880401,
https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790312268,https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5,790312268,MDEyOklzc3VlQ29tbWVudDc5MDMxMjI2OA==,9599,2021-03-04T05:48:16Z,2021-03-04T05:48:16Z,MEMBER,"Wow, my mbox is a 10.35 GB download!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",813880401,
https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-786925280,https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5,786925280,MDEyOklzc3VlQ29tbWVudDc4NjkyNTI4MA==,9599,2021-02-26T22:23:10Z,2021-02-26T22:23:10Z,MEMBER,"Thanks!
I requested my Gmail export from takeout - once that arrives I'll test it against this and then merge the PR.","{""total_count"": 1, ""+1"": 1, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",813880401,
https://github.com/dogsheep/evernote-to-sqlite/pull/10#issuecomment-777839351,https://api.github.com/repos/dogsheep/evernote-to-sqlite/issues/10,777839351,MDEyOklzc3VlQ29tbWVudDc3NzgzOTM1MQ==,9599,2021-02-11T22:37:55Z,2021-02-11T22:37:55Z,MEMBER,"I've merged these changes by hand now, thanks!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",770712149,
https://github.com/dogsheep/evernote-to-sqlite/issues/7#issuecomment-777827396,https://api.github.com/repos/dogsheep/evernote-to-sqlite/issues/7,777827396,MDEyOklzc3VlQ29tbWVudDc3NzgyNzM5Ng==,9599,2021-02-11T22:13:14Z,2021-02-11T22:13:14Z,MEMBER,My best guess is that you have an older version of `sqlite-utils` installed here - the `replace=True` argument was added in version 2.0. I've bumped the dependency in `setup.py`.,"{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",743297582,
https://github.com/dogsheep/evernote-to-sqlite/issues/9#issuecomment-777821383,https://api.github.com/repos/dogsheep/evernote-to-sqlite/issues/9,777821383,MDEyOklzc3VlQ29tbWVudDc3NzgyMTM4Mw==,9599,2021-02-11T22:01:28Z,2021-02-11T22:01:28Z,MEMBER,"Aha! I think I've figured out what's going on here.
The CData blocks containing the notes look like this:
`