html_url,issue_url,id,node_id,user,created_at,updated_at,author_association,body,reactions,issue,performed_via_github_app https://github.com/simonw/sqlite-utils/issues/319#issuecomment-905024066,https://api.github.com/repos/simonw/sqlite-utils/issues/319,905024066,IC_kwDOCGYnMM418ZJC,66709385,2021-08-24T22:41:39Z,2021-08-24T22:41:39Z,NONE,"I'm happy with this functionality left the way you describe. In my case the data is homogeneous but other cases would work just by being consistent on the encoding. Thanks a lot, Simon!","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",976399638, https://github.com/simonw/sqlite-utils/issues/319#issuecomment-905021010,https://api.github.com/repos/simonw/sqlite-utils/issues/319,905021010,IC_kwDOCGYnMM418YZS,66709385,2021-08-24T22:33:42Z,2021-08-24T22:33:42Z,NONE,"Oh, I misread. Yes, some files will not be valid UTF-8. I'd emit a warning and continue (not adding that file), but if you want to get more elaborate you could let the user define a policy for what to do: skip the file, index the binary content, or use a conversion policy like the ones available in Python's decode. From https://stackoverflow.com/questions/24616678/unicodedecodeerror-in-python-when-reading-a-file-how-to-ignore-the-error-and-ju : - 'ignore' ignores errors. Note that ignoring encoding errors can lead to data loss. - 'replace' causes a replacement marker (such as '?') to be inserted where there is malformed data. - 'surrogateescape' will represent any incorrect bytes as code points in the Unicode Private Use Area ranging from U+DC80 to U+DCFF. These private code points will then be turned back into the same bytes when the surrogateescape error handler is used when writing data. This is useful for processing files in an unknown encoding. - 'xmlcharrefreplace' is only supported when writing to a file. 
Characters not supported by the encoding are replaced with the appropriate XML character reference &#nnn;. - 'backslashreplace' (also only supported when writing) replaces unsupported characters with Python’s backslashed escape sequences.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",976399638, https://github.com/simonw/sqlite-utils/issues/319#issuecomment-905003381,https://api.github.com/repos/simonw/sqlite-utils/issues/319,905003381,IC_kwDOCGYnMM418UF1,66709385,2021-08-24T21:56:49Z,2021-08-24T21:56:49Z,NONE,"I was thinking that an approach could be making FILE_COLUMNS a generator (_get_file_columns(mode)) or you can just have a different set of columns (is there something else that makes sense to be changed on the text scenario?). About UTF-8 I was referring to the encoding to use when reading files. This can be difficult to auto-detect but I believe that UTF-8 is pretty much the standard for text files.","{""total_count"": 0, ""+1"": 0, ""-1"": 0, ""laugh"": 0, ""hooray"": 0, ""confused"": 0, ""heart"": 0, ""rocket"": 0, ""eyes"": 0}",976399638,
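
The decoding policy discussed in the comments above could be sketched roughly as follows. This is only an illustration under stated assumptions: `read_text` and its `"skip"` policy name are hypothetical and are not part of sqlite-utils. The other policy values are passed straight through to Python's built-in `bytes.decode` error handlers.

```python
import warnings


def read_text(data: bytes, errors: str = "skip"):
    """Decode bytes as UTF-8 under a caller-chosen policy.

    Hypothetical helper for illustration only (not a sqlite-utils API).
    errors="skip" warns and returns None, so the caller can leave the
    file out. Any other value is handed to bytes.decode unchanged, e.g.
    "ignore", "replace" or "surrogateescape".
    """
    if errors == "skip":
        try:
            return data.decode("utf-8")
        except UnicodeDecodeError as ex:
            warnings.warn(f"not valid UTF-8, skipping: {ex}")
            return None
    return data.decode("utf-8", errors=errors)


# Latin-1 encoded bytes: the trailing 0xE9 is not valid UTF-8 here.
bad = b"caf\xe9"

dropped = read_text(bad, "ignore")            # invalid byte is discarded
marked = read_text(bad, "replace")            # invalid byte becomes U+FFFD
escaped = read_text(bad, "surrogateescape")   # invalid byte becomes U+DCE9

# surrogateescape is reversible: encoding with the same handler
# restores the original bytes.
round_trip = escaped.encode("utf-8", errors="surrogateescape")
```

With `"skip"` the caller gets `None` plus a warning and can omit the file, which matches the "throw a warning and continue" suggestion. `"surrogateescape"` is the only option shown that keeps the original bytes recoverable.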