
issues: 530513784


id: 530513784
node_id: MDExOlB1bGxSZXF1ZXN0MzQ3MTc5MDgx
number: 644
title: Validate metadata json on startup
user: 6025893
state: closed
locked: 0
comments: 1
created_at: 2019-11-30T00:32:15Z
updated_at: 2021-07-28T17:58:45Z
closed_at: 2021-07-28T17:58:45Z
author_association: CONTRIBUTOR
pull_request: simonw/datasette/pulls/644
body:

This PR adds a sanity check that builds a marshmallow schema on the fly from the structure of the database(s) at startup, then validates the metadata JSON against it.

If the metadata is invalid, startup raises a descriptive error, e.g.:

marshmallow.exceptions.ValidationError: {'databases': {'fixtures': {'tables': {'not_a_table': ['Unknown field.']}}}}
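
The idea behind the check can be illustrated without marshmallow. The following is a minimal sketch, not the PR's implementation: it uses only the standard library, reads table names from SQLite's sqlite_master, and reports unknown databases and tables in a nested dict shaped like the error above. The names allowed_tables and validate_metadata and the facetable table are hypothetical.

```python
import sqlite3


def allowed_tables(conn):
    """Return the names of the tables and views in a SQLite database."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type IN ('table', 'view')"
    ).fetchall()
    return {name for (name,) in rows}


def validate_metadata(metadata, connections):
    """Return a nested dict of errors; an empty dict means nothing was flagged.

    `connections` maps a database name to an open sqlite3 connection.
    This is a stand-in for a generated schema, not marshmallow itself.
    """
    errors = {}
    for db_name, db_meta in metadata.get("databases", {}).items():
        if db_name not in connections:
            errors.setdefault("databases", {})[db_name] = ["Unknown field."]
            continue
        known = allowed_tables(connections[db_name])
        for table_name in db_meta.get("tables", {}):
            if table_name not in known:
                table_errors = (
                    errors.setdefault("databases", {})
                    .setdefault(db_name, {})
                    .setdefault("tables", {})
                )
                table_errors[table_name] = ["Unknown field."]
    return errors


# Hypothetical in-memory database with a single table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE facetable (id INTEGER PRIMARY KEY)")

metadata = {
    "databases": {"fixtures": {"tables": {"facetable": {}, "not_a_table": {}}}}
}
errors = validate_metadata(metadata, {"fixtures": conn})
print(errors)
```

Only the not_a_table entry is reported, because facetable exists in the database the metadata refers to.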

Closes #260


This was intended to be fairly self-contained, but while working on it I hit problems getting the tests to pass as part of the full test suite: my tests passed in isolation but failed during a full run. That's when the worms started coming out of the can :bug:. After some sleuthing, it turned out to be the result of several intersecting issues:

  • There are certain events in the application lifecycle where the metadata schema can be modified after it is loaded, e.g.: https://github.com/simonw/datasette/blob/a562f2965552fb2dbbbd74df245c9965ee23d886/datasette/app.py#L299-L320 This means that what goes in isn't always exactly what comes out when you call /-/metadata.
  • Because the test fixtures use session scope for performance reasons, a unit test that mutates the metadata can affect other unit tests that run after it using the same fixture.
  • Because self._metadata was set with a simple assignment, self._metadata = metadata, it held a reference to the test fixture data, so operating on self._metadata actually modified the test fixture METADATA. As a result, METADATA had different content depending on when it was read during the test suite run, which was somewhat unexpected.
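
The aliasing problem in the last bullet can be reproduced in a few lines. This is a minimal sketch under stated assumptions: the App class, its copy_input flag, and hide_table are hypothetical stand-ins, not Datasette's code. It contrasts storing the caller's dict directly with storing a deep copy of it.

```python
import copy


class App:
    """Hypothetical stand-in for an application object that stores metadata."""

    def __init__(self, metadata, copy_input):
        # copy_input=False mirrors a plain assignment (shared reference);
        # copy_input=True stores an independent deep copy instead.
        self._metadata = copy.deepcopy(metadata) if copy_input else metadata

    def hide_table(self, database, table):
        # An in-place update of the stored metadata after it has been loaded.
        tables = self._metadata["databases"][database].setdefault("tables", {})
        tables.setdefault(table, {})["hidden"] = True


def fresh_fixture():
    """Build a new copy of the hypothetical fixture data."""
    return {"databases": {"fixtures": {"tables": {}}}}


aliased_input = fresh_fixture()
App(aliased_input, copy_input=False).hide_table("fixtures", "secret")

copied_input = fresh_fixture()
App(copied_input, copy_input=True).hide_table("fixtures", "secret")

print(aliased_input)  # the caller's dict now contains the "secret" table
print(copied_input)   # the caller's dict is unchanged
```

With plain assignment, the mutation is visible through the caller's own dict, which is how a shared fixture can change underneath later tests.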

As such, I've added some band-aids in 3552024 and 6859fd8:

  • Switching the metadata object to a deepcopy of the input prevents us from directly mutating the input fixture.
  • I've switched some of the tests to use a fixture with function scope instead of session scope where necessary, so they work on a clean copy that hasn't been mutated by other tests, while keeping session scope in most cases for performance.
  • I haven't really addressed the fact that the metadata object sometimes gets mutated in place, so the object served from /-/metadata isn't necessarily exactly the same as the file you fed in on init. I'm not sure how much of a problem that is. The way the tests were written makes me think it was unexpected, but getting into it feels like too much scope creep for this PR, so it's probably best addressed as a separate issue.
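
The trade-off between session and function scope can be sketched with pytest as follows. This is an illustration only, assuming pytest is installed; METADATA here is a small stand-in dict, and make_metadata, shared_metadata, and fresh_metadata are hypothetical names rather than fixtures from the Datasette test suite.

```python
import copy

import pytest

# Hypothetical stand-in for the suite's METADATA fixture data.
METADATA = {"title": "Fixtures", "databases": {"fixtures": {"tables": {}}}}


def make_metadata():
    """Return a fresh, independent copy of the fixture data."""
    return copy.deepcopy(METADATA)


@pytest.fixture(scope="session")
def shared_metadata():
    # Built once per test run: fast, but in-place changes leak into later tests.
    return make_metadata()


@pytest.fixture  # the default scope is "function"
def fresh_metadata():
    # Rebuilt for every test: slower, but each test starts from a clean copy.
    return make_metadata()


def test_can_mutate_safely(fresh_metadata):
    fresh_metadata["databases"]["fixtures"]["tables"]["added_later"] = {}
    assert "added_later" not in METADATA["databases"]["fixtures"]["tables"]


def test_sees_clean_copy(fresh_metadata):
    # Unaffected by the mutation in the previous test.
    assert fresh_metadata == METADATA
```

Tests that only read the metadata can keep using the session-scoped fixture; only tests that mutate it need the function-scoped one.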

repo: 107914493
type: pull
reactions:
{
    "url": "https://api.github.com/repos/simonw/datasette/issues/644/reactions",
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
draft: 0
