issue_comments
12 rows where author_association = "NONE" and issue = 813880401 sorted by updated_at descending
This data as json, CSV (advanced)
Suggested facets: reactions, created_at (date), updated_at (date)
issue 1
- WIP: Add Gmail takeout mbox import · 12 ✖
id | html_url | issue_url | node_id | user | created_at | updated_at ▲ | author_association | body | reactions | issue | performed_via_github_app |
---|---|---|---|---|---|---|---|---|---|---|---|
888075098 | https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-888075098 | https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 | IC_kwDODFE5qs407vNa | maxhawkins 28565 | 2021-07-28T07:18:56Z | 2021-07-28T07:18:56Z | NONE |
I did some investigation into this issue and made a fix here. The problem was that some messages (like gchat logs) don't have a @simonw While looking into this I found something unexpected about how sqlite_utils handles upserts if the pkey column is |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Add Gmail takeout mbox import 813880401 | |
885098025 | https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-885098025 | https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 | IC_kwDODFE5qs40wYYp | UtahDave 306240 | 2021-07-22T17:47:50Z | 2021-07-22T17:47:50Z | NONE | Hi @maxhawkins , I'm sorry, I haven't had any time to work on this. I'll have some time tomorrow to test your commits. I think they look great. I'm great with your commits superseding my initial attempt here. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Add Gmail takeout mbox import 813880401 | |
885094284 | https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-885094284 | https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 | IC_kwDODFE5qs40wXeM | maxhawkins 28565 | 2021-07-22T17:41:32Z | 2021-07-22T17:41:32Z | NONE | I added a follow-up commit that deals with emails that don't have a |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Add Gmail takeout mbox import 813880401 | |
885022230 | https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-885022230 | https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 | IC_kwDODFE5qs40wF4W | maxhawkins 28565 | 2021-07-22T15:51:46Z | 2021-07-22T15:51:46Z | NONE | One thing I noticed is this importer doesn't save attachments along with the body of the emails. It would be nice if those got stored as blobs in a separate attachments table so attachments can be included while fetching search results. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Add Gmail takeout mbox import 813880401 | |
884672647 | https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-884672647 | https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 | IC_kwDODFE5qs40uwiH | maxhawkins 28565 | 2021-07-22T05:56:31Z | 2021-07-22T14:03:08Z | NONE | How does this commit look? https://github.com/maxhawkins/google-takeout-to-sqlite/commit/72802a83fee282eb5d02d388567731ba4301050d It seems that Takeout's mbox format is pretty simple, so we can get away with just splitting the file on lines begining with I was able to load a 12GB takeout mbox without the program using more than a couple hundred MB of memory during the import process. It does make us lose the progress bar, but maybe I can add that back in a later commit. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Add Gmail takeout mbox import 813880401 | |
849708617 | https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-849708617 | https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 | MDEyOklzc3VlQ29tbWVudDg0OTcwODYxNw== | maxhawkins 28565 | 2021-05-27T15:01:42Z | 2021-05-27T15:01:42Z | NONE | Any updates? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Add Gmail takeout mbox import 813880401 | |
791530093 | https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-791530093 | https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 | MDEyOklzc3VlQ29tbWVudDc5MTUzMDA5Mw== | UtahDave 306240 | 2021-03-05T16:28:07Z | 2021-03-05T16:28:07Z | NONE |
@maxhawkins a limitation of the python mbox module is it loads the entire mbox into memory. I did find another approach to this problem that didn't use the builtin python mbox module and created a generator so that it didn't have to load the whole mbox into memory. I was hoping to use standard library modules, but this might be a good reason to investigate that approach a bit more. My worry is making sure a custom processor handles all the ins and outs of the mbox format correctly. Hm. As I'm writing this, I thought of something. I think I can parse each message one at a time, and then use an mbox function to load each message using the python mbox module. That way the mbox module can still deal with the specifics of the mbox format, but I can use a generator. I'll give that a try. Thanks for the feedback @maxhawkins and @simonw. I'll give that a try. @simonw can we hold off on merging this until I can test this new approach? |
{ "total_count": 3, "+1": 3, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Add Gmail takeout mbox import 813880401 | |
791089881 | https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-791089881 | https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 | MDEyOklzc3VlQ29tbWVudDc5MTA4OTg4MQ== | maxhawkins 28565 | 2021-03-05T02:03:19Z | 2021-03-05T02:03:19Z | NONE | I just tried to run this on a small VPS instance with 2GB of memory and it crashed out of memory while processing a 12GB mbox from Takeout. Is it possible to stream the emails to sqlite instead of loading it all into memory and upserting at once? |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Add Gmail takeout mbox import 813880401 | |
790391711 | https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790391711 | https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 | MDEyOklzc3VlQ29tbWVudDc5MDM5MTcxMQ== | UtahDave 306240 | 2021-03-04T07:36:24Z | 2021-03-04T07:36:24Z | NONE |
Ah, that's good to know. I think explicitly creating the tables will be a great improvement. I'll add that. Also, I noticed after I opened this PR that the Thanks for the feedback. I should have time tomorrow to put together some improvements. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Add Gmail takeout mbox import 813880401 | |
790389335 | https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-790389335 | https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 | MDEyOklzc3VlQ29tbWVudDc5MDM4OTMzNQ== | UtahDave 306240 | 2021-03-04T07:32:04Z | 2021-03-04T07:32:04Z | NONE |
The wait is from python loading the mbox file. This happens regardless if you're getting the length of the mbox. The mbox module is on the slow side. It is possible to do one's own parsing of the mbox, but I kind of wanted to avoid doing that. |
{ "total_count": 1, "+1": 1, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Add Gmail takeout mbox import 813880401 | |
784638394 | https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-784638394 | https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 | MDEyOklzc3VlQ29tbWVudDc4NDYzODM5NA== | UtahDave 306240 | 2021-02-24T00:36:18Z | 2021-02-24T00:36:18Z | NONE | I noticed that @simonw is using black for formatting. I ran black on my additions in this PR. |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Add Gmail takeout mbox import 813880401 | |
783794520 | https://github.com/dogsheep/google-takeout-to-sqlite/pull/5#issuecomment-783794520 | https://api.github.com/repos/dogsheep/google-takeout-to-sqlite/issues/5 | MDEyOklzc3VlQ29tbWVudDc4Mzc5NDUyMA== | UtahDave 306240 | 2021-02-23T01:13:54Z | 2021-02-23T01:13:54Z | NONE | Also, @simonw I created a test based off the existing tests. I think it's working correctly |
{ "total_count": 0, "+1": 0, "-1": 0, "laugh": 0, "hooray": 0, "confused": 0, "heart": 0, "rocket": 0, "eyes": 0 } |
WIP: Add Gmail takeout mbox import 813880401 |
Advanced export
JSON shape: default, array, newline-delimited, object
CREATE TABLE [issue_comments] ( [html_url] TEXT, [issue_url] TEXT, [id] INTEGER PRIMARY KEY, [node_id] TEXT, [user] INTEGER REFERENCES [users]([id]), [created_at] TEXT, [updated_at] TEXT, [author_association] TEXT, [body] TEXT, [reactions] TEXT, [issue] INTEGER REFERENCES [issues]([id]) , [performed_via_github_app] TEXT); CREATE INDEX [idx_issue_comments_issue] ON [issue_comments] ([issue]); CREATE INDEX [idx_issue_comments_user] ON [issue_comments] ([user]);
user 2