issue_comments: 905021933

html_url issue_url id node_id user created_at updated_at author_association body reactions issue performed_via_github_app
https://github.com/simonw/sqlite-utils/issues/319#issuecomment-905021933 https://api.github.com/repos/simonw/sqlite-utils/issues/319 905021933 IC_kwDOCGYnMM418Ynt 9599 2021-08-24T22:36:04Z 2021-08-24T22:36:04Z OWNER

> Oh, I misread. Yes, some files will not be valid UTF-8. I'd throw a warning and continue (not adding that file), but if you want to get more elaborate you could let users define a policy for what to do: skip the file, index the binary content, or use a conversion policy like the ones available in Python's `decode`.

I thought about supporting those different policies (with something like `--errors ignore`) but I feel like that's getting a little bit too deep into the weeds. Right now if you try to import an invalid file the behaviour is the same as for the `sqlite-utils insert` command (I added the same detailed error message):

```
Error: Could not read file '/Users/simon/Dropbox/Development/sqlite-utils/data.txt' as text

'utf-8' codec can't decode byte 0xe3 in position 83: invalid continuation byte

The input you provided uses a character encoding other than utf-8.

You can fix this by passing the --encoding= option with the encoding of the file.

If you do not know the encoding, running 'file filename.csv' may tell you.

It's often worth trying: --encoding=latin-1
```

If someone has data that can't be translated to valid text using a known encoding, I'm happy leaving them to have to insert it into a `BLOB` column instead.
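
For illustration, here is a minimal sketch in plain Python of what the decode policies discussed above do, and of the `BLOB` fallback. It is not sqlite-utils code: the `decode_or_none` helper, the sample bytes, and the `files` table are all hypothetical, and it uses only the standard library's `bytes.decode` and `sqlite3`.

```python
import sqlite3


def decode_or_none(raw: bytes, encoding: str = "utf-8", errors: str = "strict"):
    """Return the decoded text, or None if decoding raises UnicodeDecodeError.

    Hypothetical helper for this sketch only.
    """
    try:
        return raw.decode(encoding, errors=errors)
    except UnicodeDecodeError:
        return None


# "São Paulo" encoded as latin-1: the 0xe3 byte is not valid UTF-8 here.
raw = b"S\xe3o Paulo"

strict = decode_or_none(raw)                      # strict policy: decoding fails
ignored = decode_or_none(raw, errors="ignore")    # drops the undecodable byte
replaced = decode_or_none(raw, errors="replace")  # substitutes U+FFFD
latin1 = decode_or_none(raw, encoding="latin-1")  # correct encoding: no loss

# Fallback when no text encoding applies: keep the raw bytes in a BLOB column.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE files (path TEXT PRIMARY KEY, content BLOB)")
conn.execute("INSERT INTO files VALUES (?, ?)", ("data.txt", raw))
stored = conn.execute("SELECT content FROM files").fetchone()[0]
```

The `ignore` and `replace` policies always produce text but silently lose information, which is one argument for failing loudly and asking for an explicit `--encoding` instead.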

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
976399638  