issues


4 rows where user = 1448859 sorted by updated_at descending


Facets:
  • state: closed 3 · open 1
  • repo: sqlite-utils 3 · datasette 1
  • type: issue 4
#582: Handling CSV/file input that contains NUL bytes
  • id: 1839344979 · node_id: I_kwDOCGYnMM5toi1T
  • user: betatim (1448859) · state: open · locked: 0 · comments: 0 · author_association: NONE
  • created_at: 2023-08-07T12:24:14Z · updated_at: 2023-08-07T12:24:14Z

I was using sqlite-utils to create a DB from a CSV and it turns out the CSV contains a NUL byte.

When the processing reaches the line that contains the NUL byte, an exception is raised.

I'm wondering if there is something that can be done in sqlite-utils to say "skip lines with encoding errors" or some such. I think it isn't super straightforward, though, as the exception comes from inside the csv module, which does all the parsing.

Concretely, the file is KernelVersions.csv from https://www.kaggle.com/datasets/kaggle/meta-kaggle

This is the command and output:

```
$ sqlite-utils insert --csv kaggle.db kaggle KernelVersions.csv
[------------------------------------]   0%
[#####################---------------]  60%  00:04:24
Traceback (most recent call last):
  File "/home/foobar/miniconda/envs/meta-kaggle/bin/sqlite-utils", line 10, in <module>
    sys.exit(cli())
  File "/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/click/core.py", line 1128, in __call__
    return self.main(*args, **kwargs)
  File "/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/click/core.py", line 1053, in main
    rv = self.invoke(ctx)
  File "/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/click/core.py", line 1659, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/click/core.py", line 1395, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/click/core.py", line 754, in invoke
    return __callback(*args, **kwargs)
  File "/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/sqlite_utils/cli.py", line 1223, in insert
    insert_upsert_implementation(
  File "/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/sqlite_utils/cli.py", line 1085, in insert_upsert_implementation
    db[table].insert_all(
  File "/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/sqlite_utils/db.py", line 3198, in insert_all
    chunk = list(chunk)
  File "/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/sqlite_utils/db.py", line 3742, in fix_square_braces
    for record in records:
  File "/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/sqlite_utils/cli.py", line 1071, in <genexpr>
    docs = (decode_base64_values(doc) for doc in docs)
  File "/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/sqlite_utils/cli.py", line 1068, in <genexpr>
    docs = (verify_is_dict(doc) for doc in docs)
  File "/home/foobar/miniconda/envs/meta-kaggle/lib/python3.10/site-packages/sqlite_utils/cli.py", line 1003, in <genexpr>
    docs = (dict(zip(headers, row)) for row in reader)
_csv.Error: line contains NUL
```
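Until sqlite-utils grows such an option, a common workaround is to filter the NUL bytes out of the stream before the csv module sees them. A minimal sketch, not part of sqlite-utils itself; the file and table names are taken from the report above:

```python
import csv
import sqlite_utils

# Sketch: csv.reader accepts any iterable of strings, so a generator
# that strips NUL bytes can sit between the file and the parser.
# errors="replace" additionally papers over other decoding errors.
db = sqlite_utils.Database("kaggle.db")
with open("KernelVersions.csv", newline="", encoding="utf-8", errors="replace") as f:
    reader = csv.reader(line.replace("\0", "") for line in f)
    headers = next(reader)
    db["kaggle"].insert_all(dict(zip(headers, row)) for row in reader)
```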

repo: sqlite-utils (140912432) · type: issue

reactions:
{
    "url": "https://api.github.com/repos/simonw/sqlite-utils/issues/582/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
   
#441: Combining `rows_where()` and `search()` to limit which rows are searched
  • id: 1257724585 · node_id: I_kwDOCGYnMM5K91qp
  • user: betatim (1448859) · state: closed · locked: 0 · comments: 4 · author_association: NONE
  • created_at: 2022-06-02T06:01:55Z · updated_at: 2022-06-14T21:57:57Z · closed_at: 2022-06-14T21:54:38Z

What is the right way to limit a full text search query to some rows of a table?

For example, I have a table with the columns title, content, and owner (each row represents a document). The owner column is a username. It feels right to store all documents in one table instead of having one table per owner, in particular because I'd like to full-text search all documents, only the documents owned by one user, or the documents owned by a set of users.

I tried to combine `.rows_where("owner = ?", "1234")` and `.search()` from the Table class, but I don't think that is meant to work. I discovered `.search_sql()` as a way to generate the FTS SQL statement. By hand I can edit it to add an `AND [original].[owner] = :owner` to the where clause. This seems to do what I want.

My two questions:

1. Is adding an `AND ...` to the where clause actually the right thing to do, or should I be doing something else (my SQL skills are low)?
2. Is there a built-in way in sqlite-utils to achieve this?

Right now I am thinking I will make my own version of `search_sql()` that generates a query containing an additional `owner = :owner` for my particular use case; a sketch of that follows below.
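For illustration, a sketch of both routes: the hand-edited query described above, and the `where=` parameter that later sqlite-utils releases added (the issue is closed as completed). The database and table names here are hypothetical:

```python
import sqlite_utils

db = sqlite_utils.Database("docs.db")  # hypothetical database
table = db["documents"]                # hypothetical table with an owner column

# Hand-edited variant: take the SQL that search_sql() generates and
# splice an owner condition into its where clause. This assumes the
# generated SQL contains the literal text "match :query".
sql = table.search_sql(columns=["title", "content", "owner"])
sql = sql.replace(
    "match :query",
    "match :query\n    and [original].[owner] = :owner",
)
rows = db.execute(sql, {"query": "datasette", "owner": "1234"}).fetchall()

# Recent sqlite-utils versions support this directly via where= and
# where_args= on search(), the built-in route this issue asked about:
hits = list(table.search("datasette", where="owner = :owner", where_args={"owner": "1234"}))
```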

Bonus question: is this generally useful/something to add to sqlite-utils, or too niche?

repo: sqlite-utils (140912432) · type: issue

reactions:
{
    "url": "https://api.github.com/repos/simonw/sqlite-utils/issues/441/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed
#969: Add --tar option to "datasette publish heroku"
  • id: 705057955 · node_id: MDU6SXNzdWU3MDUwNTc5NTU=
  • user: betatim (1448859) · state: closed · locked: 0 · milestone: Datasette 0.50 (5971510) · comments: 3 · author_association: NONE
  • created_at: 2020-09-20T06:54:53Z · updated_at: 2020-10-08T23:55:59Z · closed_at: 2020-10-08T23:30:59Z

This issue is about how best to pass additional options to the tools used for publishing datasettes. A concrete example is wanting to pass the --tar flag to the heroku CLI tool. I think there are at least two options for doing this: documenting, for each publishing tool, how to set flags via environment variables (if possible), or building a mechanism that lets users pass additional flags through datasette.

When using `datasette publish heroku binder-launches.db --extra-options="--config facet_time_limit_ms:35000 --config sql_time_limit_ms:35000" --name=binderlytics --install=datasette-vega` to publish https://binderlytics.herokuapp.com/, the following error happens:

```
 ›   Warning: heroku update available from 7.42.1 to 7.43.0.
 ›   Warning: heroku update available from 7.42.1 to 7.43.0.
 ›   Warning: heroku update available from 7.42.1 to 7.43.0.
Setting WEB_CONCURRENCY and restarting ⬢ binderlytics... done, v13
WEB_CONCURRENCY: 1
 ›   Warning: heroku update available from 7.42.1 to 7.43.0.
 ▸    Couldn't detect GNU tar. Builds could fail due to decompression errors
 ▸    See https://devcenter.heroku.com/articles/platform-api-deploying-slugs#create-slug-archive
 ▸    Please install it, or specify the '--tar' option
 ▸    Falling back to node's built-in compressor
buffer.js:358
  throw new ERR_INVALID_OPT_VALUE.RangeError('size', size);
  ^

RangeError [ERR_INVALID_OPT_VALUE]: The value "3303763968" is invalid for option "size"
    at Function.alloc (buffer.js:367:3)
    at new Buffer (buffer.js:281:19)
    at Readable.<anonymous> (/Users/thead/.local/share/heroku/node_modules/archiver-utils/index.js:39:15)
    at Readable.emit (events.js:322:22)
    at endReadableNT (/Users/thead/.local/share/heroku/node_modules/readable-stream/lib/_stream_readable.js:1010:12)
    at processTicksAndRejections (internal/process/task_queues.js:84:21) {
  code: 'ERR_INVALID_OPT_VALUE'
}
```

After installing GNU tar with `brew install gnu-tar` and modifying datasette/publish/heroku.py to include `--tar=/path/to/gnu-tar`, publishing works.
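A sketch of the kind of local patch described here, not the actual datasette code; the `heroku builds:create` invocation and the hard-coded app name are assumptions based on the error output above:

```python
import shutil
from subprocess import call

# Hypothetical patch sketch: locate GNU tar (installed as "gtar" by
# `brew install gnu-tar` on macOS) and pass it to the heroku CLI via
# --tar, as its warning message suggests.
gnu_tar = shutil.which("gtar")
cmd = ["heroku", "builds:create", "-a", "binderlytics"]
if gnu_tar:
    cmd += ["--tar", gnu_tar]
call(cmd)
```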

I think the problem occurs once your heroku slug reaches a certain size. At least when I add only a few hundred entries to the datasette, the error does not occur.

datasette version 0.49.1, OSX 10.14.6 (18G103)

repo: datasette (107914493) · type: issue

reactions:
{
    "url": "https://api.github.com/repos/simonw/datasette/issues/969/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed
#97: Adding a "recreate" flag to the `Database` constructor
  • id: 593751293 · node_id: MDU6SXNzdWU1OTM3NTEyOTM=
  • user: betatim (1448859) · state: closed · locked: 0 · comments: 4 · author_association: NONE
  • created_at: 2020-04-04T05:41:10Z · updated_at: 2020-04-15T14:29:31Z · closed_at: 2020-04-13T03:52:29Z

I have a script that imports data into a sqlite DB. When I re-run that script I'd like to remove the existing sqlite DB, instead of adding to it. The pragmatic answer is to add the check and file deletion to my script.

However, I thought it would be easy and useful for others to add a recreate=True flag to `db = sqlite_utils.Database("binder-launches.db")`. After taking a look at the code I am not so sure any more. This is because the argument could be a URL (or "connection string") like "file:///tmp/foo.db". I don't know what the equivalent of os.path.exists() is for a connection string, or how to detect that something is a connection string and raise an error like "can't use recreate=True and conn_string at the same time".

Does anyone have an idea/suggestion where to start investigating?
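One possible starting point, sketched under the assumption that only plain filesystem paths can be safely deleted and recreated (the issue was later closed as completed, and current sqlite-utils accepts `Database(path, recreate=True)`); the wrapper function name is hypothetical:

```python
import os
import sqlite_utils

# Sketch: treat anything that isn't a plain string path (e.g. a
# "file:..." URI or an existing connection object) as un-recreatable
# and raise, otherwise delete the file before opening it.
def open_database(path_or_conn, recreate=False):
    if recreate:
        if not isinstance(path_or_conn, str) or path_or_conn.startswith("file:"):
            raise ValueError("can't use recreate=True with a connection string")
        if os.path.exists(path_or_conn):
            os.remove(path_or_conn)
    return sqlite_utils.Database(path_or_conn)

db = open_database("binder-launches.db", recreate=True)
```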

repo: sqlite-utils (140912432) · type: issue

reactions:
{
    "url": "https://api.github.com/repos/simonw/sqlite-utils/issues/97/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
state_reason: completed

Table schema:

CREATE TABLE [issues] (
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [number] INTEGER,
   [title] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [state] TEXT,
   [locked] INTEGER,
   [assignee] INTEGER REFERENCES [users]([id]),
   [milestone] INTEGER REFERENCES [milestones]([id]),
   [comments] INTEGER,
   [created_at] TEXT,
   [updated_at] TEXT,
   [closed_at] TEXT,
   [author_association] TEXT,
   [pull_request] TEXT,
   [body] TEXT,
   [repo] INTEGER REFERENCES [repos]([id]),
   [type] TEXT
, [active_lock_reason] TEXT, [performed_via_github_app] TEXT, [reactions] TEXT, [draft] INTEGER, [state_reason] TEXT);
CREATE INDEX [idx_issues_repo]
                ON [issues] ([repo]);
CREATE INDEX [idx_issues_milestone]
                ON [issues] ([milestone]);
CREATE INDEX [idx_issues_assignee]
                ON [issues] ([assignee]);
CREATE INDEX [idx_issues_user]
                ON [issues] ([user]);
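For reference, the view above corresponds to a query along these lines, run against the SQLite file behind this Datasette instance. A sketch; the database file name github.db is an assumption:

```python
import sqlite3

# Reproduce "4 rows where user = 1448859 sorted by updated_at descending"
# directly against the underlying database (file name assumed).
conn = sqlite3.connect("github.db")
rows = conn.execute(
    "select number, title, state, updated_at from issues "
    "where user = ? order by updated_at desc",
    (1448859,),
).fetchall()
for number, title, state, updated_at in rows:
    print(f"#{number} [{state}] {title} (updated {updated_at})")
```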
Powered by Datasette · Queries took 32.08ms · About: github-to-sqlite