home / github

Menu
  • Search all tables
  • GraphQL API

issue_comments

Table actions
  • GraphQL API for issue_comments

51 rows where issue = 849978964 sorted by updated_at descending

✖
✖

✎ View and edit SQL

This data as json, CSV (advanced)

Suggested facets: created_at (date), updated_at (date)

user 1

  • simonw 51

issue 1

  • Show column metadata plus links for foreign keys on arbitrary query results · 51 ✖

author_association 1

  • OWNER 51
id html_url issue_url node_id user created_at updated_at ▲ author_association body reactions issue performed_via_github_app
1235752140 https://github.com/simonw/datasette/issues/1293#issuecomment-1235752140 https://api.github.com/repos/simonw/datasette/issues/1293 IC_kwDOBm6k_c5JqBTM simonw 9599 2022-09-02T17:34:09Z 2022-09-02T17:34:09Z OWNER

Accidentally closed.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Show column metadata plus links for foreign keys on arbitrary query results 849978964  
901475812 https://github.com/simonw/datasette/issues/1293#issuecomment-901475812 https://api.github.com/repos/simonw/datasette/issues/1293 IC_kwDOBm6k_c41u23k simonw 9599 2021-08-18T22:41:19Z 2021-08-18T22:41:19Z OWNER

Maybe I split this out into a separate Python library that gets tested against every SQLite release I can possibly try it against, and then bakes out the supported release versions into the library code itself?

I'm going to do this, and call the Python library sqlite-explain.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Show column metadata plus links for foreign keys on arbitrary query results 849978964  
899915829 https://github.com/simonw/datasette/issues/1293#issuecomment-899915829 https://api.github.com/repos/simonw/datasette/issues/1293 IC_kwDOBm6k_c41o6A1 simonw 9599 2021-08-17T01:02:35Z 2021-08-17T01:02:35Z OWNER

New approach: this time I'm building a simplified executor for the bytecode operations themselves. python def execute_operations(operations, max_iterations = 100, trace=None): trace = trace or (lambda *args: None) registers: Dict[int, Any] = {} cursors: Dict[int, Tuple[str, Dict]] = {} instruction_pointer = 0 iterations = 0 result_row = None while True: iterations += 1 if iterations > max_iterations: break operation = operations[instruction_pointer] trace(instruction_pointer, dict(operation)) opcode = operation["opcode"] if opcode == "Init": if operation["p2"] != 0: instruction_pointer = operation["p2"] continue else: instruction_pointer += 1 continue elif opcode == "Goto": instruction_pointer = operation["p2"] continue elif opcode == "Halt": break elif opcode == "OpenRead": cursors[operation["p1"]] = ("database_table", { "rootpage": operation["p2"], "connection": operation["p3"], }) elif opcode == "OpenEphemeral": cursors[operation["p1"]] = ("ephemeral", { "num_columns": operation["p2"], "index_keys": [], }) elif opcode == "MakeRecord": registers[operation["p3"]] = ("MakeRecord", { "registers": list(range(operation["p1"] + operation["p2"])) }) elif opcode == "IdxInsert": record = registers[operation["p2"]] cursors[operation["p1"]][1]["index_keys"].append(record) elif opcode == "Rowid": registers[operation["p2"]] = ("rowid", { "table": operation["p1"] }) elif opcode == "Sequence": registers[operation["p2"]] = ("sequence", { "next_from_cursor": operation["p1"] }) elif opcode == "Column": registers[operation["p3"]] = ("column", { "cursor": operation["p1"], "column_offset": operation["p2"] }) elif opcode == "ResultRow": p1 = operation["p1"] p2 = operation["p2"] trace("ResultRow: ", list(range(p1, p1 + p2)), registers) result_row = [registers.get(i) for i in range(p1, p1 + p2)] elif opcode == "Integer": registers[operation["p2"]] = ("Integer", operation["p1"]) elif opcode == "String8": registers[operation["p2"]] = ("String", operation["p4"]) instruction_pointer += 1 return {"registers": registers, "cursors": cursors, "result_row": result_row} Results are promising! ``` execute_operations(db.execute("explain select 'hello', 55, rowid, * from searchable").fetchall())

{'registers': {1: ('String', 'hello'), 2: ('Integer', 55), 3: ('rowid', {'table': 0}), 4: ('rowid', {'table': 0}), 5: ('column', {'cursor': 0, 'column_offset': 1}), 6: ('column', {'cursor': 0, 'column_offset': 2}), 7: ('column', {'cursor': 0, 'column_offset': 3})}, 'cursors': {0: ('database_table', {'rootpage': 32, 'connection': 0})}, 'result_row': [('String', 'hello'), ('Integer', 55), ('rowid', {'table': 0}), ('rowid', {'table': 0}), ('column', {'cursor': 0, 'column_offset': 1}), ('column', {'cursor': 0, 'column_offset': 2}), ('column', {'cursor': 0, 'column_offset': 3})]} Here's what happens with a union across three tables: execute_operations(db.execute(f""" explain select data as content from binary_data union select pk as content from complex_foreign_keys union select name as content from facet_cities """}).fetchall())

{'registers': {1: ('column', {'cursor': 4, 'column_offset': 0}), 2: ('MakeRecord', {'registers': [0, 1, 2, 3]}), 3: ('column', {'cursor': 0, 'column_offset': 1}), 4: ('column', {'cursor': 3, 'column_offset': 0})}, 'cursors': {3: ('ephemeral', {'num_columns': 1, 'index_keys': [('MakeRecord', {'registers': [0, 1]}), ('MakeRecord', {'registers': [0, 1]}), ('MakeRecord', {'registers': [0, 1, 2, 3]})]}), 2: ('database_table', {'rootpage': 44, 'connection': 0}), 4: ('database_table', {'rootpage': 24, 'connection': 0}), 0: ('database_table', {'rootpage': 42, 'connection': 0})}, 'result_row': [('column', {'cursor': 3, 'column_offset': 0})]} `` Note how the result_row refers to cursor 3, which is an ephemeral table which had three different sets ofMakeRecord` index keys assigned to it - indicating that the output column is NOT from the same underlying table source.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Show column metadata plus links for foreign keys on arbitrary query results 849978964  
898961535 https://github.com/simonw/datasette/issues/1293#issuecomment-898961535 https://api.github.com/repos/simonw/datasette/issues/1293 IC_kwDOBm6k_c41lRB_ simonw 9599 2021-08-14T21:37:24Z 2021-08-14T21:37:24Z OWNER

Did some more research into building SQLite custom versions via pysqlite3 - here's what I figured out for macOS (which should hopefully work for Linux too): https://til.simonwillison.net/sqlite/build-specific-sqlite-pysqlite-macos

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Show column metadata plus links for foreign keys on arbitrary query results 849978964  
898936068 https://github.com/simonw/datasette/issues/1293#issuecomment-898936068 https://api.github.com/repos/simonw/datasette/issues/1293 IC_kwDOBm6k_c41lK0E simonw 9599 2021-08-14T17:44:54Z 2021-08-14T17:44:54Z OWNER

Another interesting query to consider: https://latest.datasette.io/fixtures?sql=explain+select+*+from++pragma_table_info%28+%27123_starts_with_digits%27%29

That one shows VColumn instead of Column.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Show column metadata plus links for foreign keys on arbitrary query results 849978964  
898933865 https://github.com/simonw/datasette/issues/1293#issuecomment-898933865 https://api.github.com/repos/simonw/datasette/issues/1293 IC_kwDOBm6k_c41lKRp simonw 9599 2021-08-14T17:27:16Z 2021-08-14T17:28:29Z OWNER

Maybe I split this out into a separate Python library that gets tested against every SQLite release I can possibly try it against, and then bakes out the supported release versions into the library code itself?

Datasette could depend on that library. The library could be released independently of Datasette any time a new SQLite version comes out.

I could even run a separate git scraper repo that checks for new SQLite releases and submits PRs against the library when a new release comes out.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Show column metadata plus links for foreign keys on arbitrary query results 849978964  
898913629 https://github.com/simonw/datasette/issues/1293#issuecomment-898913629 https://api.github.com/repos/simonw/datasette/issues/1293 IC_kwDOBm6k_c41lFVd simonw 9599 2021-08-14T16:14:12Z 2021-08-14T16:14:12Z OWNER

I would feel a lot more comfortable about all of this if I had a robust mechanism for running the Datasette test suite against multiple versions of SQLite itself.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Show column metadata plus links for foreign keys on arbitrary query results 849978964  
898913554 https://github.com/simonw/datasette/issues/1293#issuecomment-898913554 https://api.github.com/repos/simonw/datasette/issues/1293 IC_kwDOBm6k_c41lFUS simonw 9599 2021-08-14T16:13:40Z 2021-08-14T16:13:40Z OWNER

I think I need to care about the following:

  • ResultRow and Column for the final result
  • OpenRead for opening tables
  • OpenEphemeral then MakeRecord and IdxInsert for writing records into ephemeral tables

Column may reference either a table (from OpenRead) or an ephemeral table (from OpenEphemeral).

That might be enough.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Show column metadata plus links for foreign keys on arbitrary query results 849978964  
898788262 https://github.com/simonw/datasette/issues/1293#issuecomment-898788262 https://api.github.com/repos/simonw/datasette/issues/1293 IC_kwDOBm6k_c41kmum simonw 9599 2021-08-14T01:22:26Z 2021-08-14T01:51:08Z OWNER

Tried a more complicated query: sql explain select pk, text1, text2, [name with . and spaces] from searchable where rowid in (select rowid from searchable_fts where searchable_fts match escape_fts(:search)) order by text1 desc limit 101 Here's the explain: ``` sqlite> explain select pk, text1, text2, [name with . and spaces] from searchable where rowid in (select rowid from searchable_fts where searchable_fts match escape_fts(:search)) order by text1 desc limit 101 ...> ; addr opcode p1 p2 p3 p4 p5 comment


0 Init 0 41 0 00 Start at 41
1 OpenEphemeral 2 6 0 k(1,-B) 00 nColumn=6
2 Integer 101 1 0 00 r[1]=101; LIMIT counter 3 OpenRead 0 32 0 4 00 root=32 iDb=0; searchable 4 Integer 16 3 0 00 r[3]=16; return address 5 Once 0 16 0 00
6 OpenEphemeral 3 1 0 k(1,) 00 nColumn=1; Result of SELECT 1 7 VOpen 1 0 0 vtab:7FCBCA72BE80 00
8 Function0 1 7 6 unknown(-1) 01 r[6]=func(r[7]) 9 Integer 5 4 0 00 r[4]=5
10 Integer 1 5 0 00 r[5]=1
11 VFilter 1 16 4 00 iplan=r[4] zplan='' 12 Rowid 1 8 0 00 r[8]=rowid
13 MakeRecord 8 1 9 C 00 r[9]=mkrec(r[8]) 14 IdxInsert 3 9 8 1 00 key=r[9]
15 VNext 1 12 0 00
16 Return 3 0 0 00
17 Rewind 3 33 0 00
18 Column 3 0 2 00 r[2]=
19 IsNull 2 32 0 00 if r[2]==NULL goto 32 20 SeekRowid 0 32 2 00 intkey=r[2]
21 Column 0 1 10 00 r[10]=searchable.text1 22 Sequence 2 11 0 00 r[11]=cursor[2].ctr++ 23 IfNotZero 1 27 0 00 if r[1]!=0 then r[1]--, goto 27 24 Last 2 0 0 00
25 IdxLE 2 32 10 1 00 key=r[10]
26 Delete 2 0 0 00
27 Rowid 0 12 0 00 r[12]=rowid
28 Column 0 2 13 00 r[13]=searchable.text2 29 Column 0 3 14 00 r[14]=searchable.name with . and spaces 30 MakeRecord 10 5 16 00 r[16]=mkrec(r[10..14]) 31 IdxInsert 2 16 10 5 00 key=r[16]
32 Next 3 18 0 00
33 Sort 2 40 0 00
34 Column 2 4 15 00 r[15]=[name with . and spaces] 35 Column 2 3 14 00 r[14]=text2
36 Column 2 0 13 00 r[13]=text1
37 Column 2 2 12 00 r[12]=pk
38 ResultRow 12 4 0 00 output=r[12..15] 39 Next 2 34 0 00
40 Halt 0 0 0 00
41 Transaction 0 0 35 0 01 usesStmtJournal=0 42 Variable 1 7 0 :search 00 r[7]=parameter(1,:search) 43 Goto 0 1 0 00
`` Here theResultRowis for registers12..15- but those all refer toColumnrecords in2- where2is the firstOpenEphemeraldeclared right at the start. I'm having enormous trouble figuring out how that ephemeral table gets populated by the other operations in a way that would let me derive which columns end up in theResultRow`.

Frustratingly SQLite seems to be able to figure that out just fine, see the column of comments on the right hand side - but I only get those in the sqlite3 CLI shell, they're not available to me with SQLite when called as a library from Python.

Maybe the key to that is this section: 27 Rowid 0 12 0 00 r[12]=rowid 28 Column 0 2 13 00 r[13]=searchable.text2 29 Column 0 3 14 00 r[14]=searchable.name with . and spaces 30 MakeRecord 10 5 16 00 r[16]=mkrec(r[10..14]) 31 IdxInsert 2 16 10 5 00 key=r[16] MakeRecord:

Convert P2 registers beginning with P1 into the record format use as a data record in a database table or as a key in an index. The Column opcode can decode the record later.

P4 may be a string that is P2 characters long. The N-th character of the string indicates the column affinity that should be used for the N-th field of the index key.

The mapping from character to affinity is given by the SQLITE_AFF_ macros defined in sqliteInt.h.

If P4 is NULL then all index fields have the affinity BLOB.

The meaning of P5 depends on whether or not the SQLITE_ENABLE_NULL_TRIM compile-time option is enabled:

  • If SQLITE_ENABLE_NULL_TRIM is enabled, then the P5 is the index of the right-most table that can be null-trimmed.

  • If SQLITE_ENABLE_NULL_TRIM is omitted, then P5 has the value OPFLAG_NOCHNG_MAGIC if the MakeRecord opcode is allowed to accept no-change records with serial_type 10. This value is only used inside an assert() and does not affect the end result.

IdxInsert:

Register P2 holds an SQL index key made using the MakeRecord instructions. This opcode writes that key into the index P1. Data for the entry is nil.

If P4 is not zero, then it is the number of values in the unpacked key of reg(P2). In that case, P3 is the index of the first register for the unpacked key. The availability of the unpacked key can sometimes be an optimization.

If P5 has the OPFLAG_APPEND bit set, that is a hint to the b-tree layer that this insert is likely to be an append.

If P5 has the OPFLAG_NCHANGE bit set, then the change counter is incremented by this instruction. If the OPFLAG_NCHANGE bit is clear, then the change counter is unchanged.

If the OPFLAG_USESEEKRESULT flag of P5 is set, the implementation might run faster by avoiding an unnecessary seek on cursor P1. However, the OPFLAG_USESEEKRESULT flag must only be set if there have been no prior seeks on the cursor or if the most recent seek used a key equivalent to P2.

This instruction only works for indices. The equivalent instruction for tables is Insert.

IdxLE:

The P4 register values beginning with P3 form an unpacked index key that omits the PRIMARY KEY or ROWID. Compare this key value against the index that P1 is currently pointing to, ignoring the PRIMARY KEY or ROWID on the P1 index.

If the P1 index entry is less than or equal to the key value then jump to P2. Otherwise fall through to the next instruction.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Show column metadata plus links for foreign keys on arbitrary query results 849978964  
898760808 https://github.com/simonw/datasette/issues/1293#issuecomment-898760808 https://api.github.com/repos/simonw/datasette/issues/1293 IC_kwDOBm6k_c41kgBo simonw 9599 2021-08-13T23:03:01Z 2021-08-13T23:03:01Z OWNER

Another idea: strip out any order by clause to try and keep this simpler. I doubt that's going to cope with complex nested queries though.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Show column metadata plus links for foreign keys on arbitrary query results 849978964  
898760020 https://github.com/simonw/datasette/issues/1293#issuecomment-898760020 https://api.github.com/repos/simonw/datasette/issues/1293 IC_kwDOBm6k_c41kf1U simonw 9599 2021-08-13T23:00:28Z 2021-08-13T23:01:27Z OWNER

New theory: this is all about SorterOpen and SorterInsert. Consider the following with extra annotations at the end of the lines after the --: ``` addr opcode p1 p2 p3 p4 p5 comment


0 Init 0 25 0 00 Start at 25
1 SorterOpen 2 5 0 k(1,B) 00 -- New SORTER in r2 with 5 slots 2 OpenRead 0 43 0 7 00 root=43 iDb=0; facetable 3 OpenRead 1 42 0 2 00 root=42 iDb=0; facet_cities 4 Rewind 0 16 0 00
5 Column 0 6 3 00 r[3]=facetable.neighborhood 6 Function0 1 2 1 like(2) 02 r[1]=func(r[2..3]) 7 IfNot 1 15 1 00
8 Column 0 5 4 00 r[4]=facetable.city_id 9 SeekRowid 1 15 4 00 intkey=r[4]
10 Column 1 1 6 00 r[6]=facet_cities.name 11 Column 0 4 7 00 r[7]=facetable.state 12 Column 0 6 5 00 r[5]=facetable.neighborhood 13 MakeRecord 5 3 9 00 r[9]=mkrec(r[5..7]) 14 SorterInsert 2 9 5 3 00 key=r[9]-- WRITES record from r9 (line above) into sorter in r2 15 Next 0 5 0 01
16 OpenPseudo 3 10 5 00 5 columns in r[10] 17 SorterSort 2 24 0 00 -- runs the sort, not relevant to my goal 18 SorterData 2 10 3 00 r[10]=data -- "Write into register P2 (r10) the current sorter data for sorter cursor P1 (sorter 2)" 19 Column 3 2 8 00 r[8]=state
20 Column 3 1 7 00 r[7]=facet_cities.name 21 Column 3 0 6 00 r[6]=neighborhood 22 ResultRow 6 3 0 00 output=r[6..8] 23 SorterNext 2 18 0 00
24 Halt 0 0 0 00
25 Transaction 0 0 35 0 01 usesStmtJournal=0 26 String8 0 2 0 %bob% 00 r[2]='%bob%' 27 Goto 0 1 0 00
```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Show column metadata plus links for foreign keys on arbitrary query results 849978964  
898576097 https://github.com/simonw/datasette/issues/1293#issuecomment-898576097 https://api.github.com/repos/simonw/datasette/issues/1293 IC_kwDOBm6k_c41jy7h simonw 9599 2021-08-13T16:19:57Z 2021-08-13T16:19:57Z OWNER

I think I need to look out for OpenPseudo and, when that occurs, take a look at the most recent SorterInsert and use that to find the MakeRecord and then use the MakeRecord to figure out the columns that went into it.

After all of that I'll be able to resolve that "table 3" reference.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Show column metadata plus links for foreign keys on arbitrary query results 849978964  
898572065 https://github.com/simonw/datasette/issues/1293#issuecomment-898572065 https://api.github.com/repos/simonw/datasette/issues/1293 IC_kwDOBm6k_c41jx8h simonw 9599 2021-08-13T16:13:16Z 2021-08-13T16:13:16Z OWNER

Aha! That MakeRecord line says r[5..7] - and r5 = neighborhood, r6 = facet_cities.name, r7 = facetable.state

So if the MakeRecord defines what goes into that pseudo-table column 2 of that pseudo-table would be state - which is what we want.

This is really convoluted. I'm no longer confident I can get this to work in a sensible way, especially since I've not started exploring what complex nested tables with CTEs and sub-selects do yet.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Show column metadata plus links for foreign keys on arbitrary query results 849978964  
898569319 https://github.com/simonw/datasette/issues/1293#issuecomment-898569319 https://api.github.com/repos/simonw/datasette/issues/1293 IC_kwDOBm6k_c41jxRn simonw 9599 2021-08-13T16:09:01Z 2021-08-13T16:10:48Z OWNER

Need to figure out what column 2 of that pseudo-table is.

I think the answer is here:

4 Rewind 0 16 0 00 5 Column 0 6 3 00 r[3]=facetable.neighborhood 6 Function0 1 2 1 like(2) 02 r[1]=func(r[2..3]) 7 IfNot 1 15 1 00 8 Column 0 5 4 00 r[4]=facetable.city_id 9 SeekRowid 1 15 4 00 intkey=r[4] 10 Column 1 1 6 00 r[6]=facet_cities.name 11 Column 0 4 7 00 r[7]=facetable.state 12 Column 0 6 5 00 r[5]=facetable.neighborhood 13 MakeRecord 5 3 9 00 r[9]=mkrec(r[5..7]) 14 SorterInsert 2 9 5 3 00 key=r[9] 15 Next 0 5 0 01 16 OpenPseudo 3 10 5 00 5 columns in r[10] I think the OpenPseduo line puts five columns in r[10] - and those five columns are the five from the previous block - maybe the five leading up to the MakeRecord call on line 13.

In which case column 2 would be facet_cities.name - assuming we start counting from 0.

But the debug code said "r[8]=state".

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Show column metadata plus links for foreign keys on arbitrary query results 849978964  
898567974 https://github.com/simonw/datasette/issues/1293#issuecomment-898567974 https://api.github.com/repos/simonw/datasette/issues/1293 IC_kwDOBm6k_c41jw8m simonw 9599 2021-08-13T16:07:00Z 2021-08-13T16:07:00Z OWNER

So this line: 19 Column 3 2 8 00 r[8]=state Means "Take column 2 of table 3 (the pseudo-table) and store it in register 8"

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Show column metadata plus links for foreign keys on arbitrary query results 849978964  
898564705 https://github.com/simonw/datasette/issues/1293#issuecomment-898564705 https://api.github.com/repos/simonw/datasette/issues/1293 IC_kwDOBm6k_c41jwJh simonw 9599 2021-08-13T16:02:12Z 2021-08-13T16:04:06Z OWNER

More debug output: table_rootpage_by_register={0: 43, 1: 42} names_and_types_by_rootpage={42: ('facet_cities', 'table'), 43: ('facetable', 'table')} table_id=0 cid=6 column_register=3 table_id=0 cid=5 column_register=4 table_id=1 cid=1 column_register=6 table_id=0 cid=4 column_register=7 table_id=0 cid=6 column_register=5 table_id=3 cid=2 column_register=8 table_id=3 cid=2 column_register=8 KeyError 3 table = names_and_types_by_rootpage[table_rootpage_by_register[table_id]][0] names_and_types_by_rootpage={42: ('facet_cities', 'table'), 43: ('facetable', 'table')} table_rootpage_by_register={0: 43, 1: 42} table_id=3 columns_by_column_register[column_register] = (table, cid) column_register=8 = (table='facetable', cid=2) table_id=3 cid=1 column_register=7 KeyError 3 table = names_and_types_by_rootpage[table_rootpage_by_register[table_id]][0] names_and_types_by_rootpage={42: ('facet_cities', 'table'), 43: ('facetable', 'table')} table_rootpage_by_register={0: 43, 1: 42} table_id=3 columns_by_column_register[column_register] = (table, cid) column_register=7 = (table='facetable', cid=1) table_id=3 cid=0 column_register=6 KeyError 3 table = names_and_types_by_rootpage[table_rootpage_by_register[table_id]][0] names_and_types_by_rootpage={42: ('facet_cities', 'table'), 43: ('facetable', 'table')} table_rootpage_by_register={0: 43, 1: 42} table_id=3 columns_by_column_register[column_register] = (table, cid) column_register=6 = (table='facetable', cid=0) result_registers=[6, 7, 8] columns_by_column_register={3: ('facetable', 6), 4: ('facetable', 5), 6: ('facet_cities', 1), 7: ('facetable', 4), 5: ('facetable', 6)} all_column_names={('facet_cities', 0): 'id', ('facet_cities', 1): 'name', ('facetable', 0): 'pk', ('facetable', 1): 'created', ('facetable', 2): 'planet_int', ('facetable', 3): 'on_earth', ('facetable', 4): 'state', ('facetable', 5): 'city_id', ('facetable', 6): 'neighborhood', ('facetable', 7): 'tags', ('facetable', 8): 'complex_array', ('facetable', 9): 'distinct_some_null'} Those KeyError are happening here because of a lookup in table_rootpage_by_register for table_id=3 - but table_rootpage_by_register only has keys 0 and 1.

It looks like that 3 actually corresponds to the OpenPseudo table from here:

16 OpenPseudo 3 10 5 00 5 columns in r[10] 17 SorterSort 2 24 0 00 18 SorterData 2 10 3 00 r[10]=data 19 Column 3 2 8 00 r[8]=state 20 Column 3 1 7 00 r[7]=facet_cities.name 21 Column 3 0 6 00 r[6]=neighborhood 22 ResultRow 6 3 0 00 output=r[6..8]

Python code:

`python def columns_for_query(conn, sql, params=None): """ Given a SQLite connectionconnand a SQL querysql, returns a list of(table_name, column_name)`` pairs corresponding to the columns that would be returned by that SQL query.

Each pair indicates the source table and column for the returned column, or
``(None, None)`` if no table and column could be derived (e.g. for "select 1")
"""
if sql.lower().strip().startswith("explain"):
    return []
opcodes = conn.execute("explain " + sql, params).fetchall()
table_rootpage_by_register = {
    r["p1"]: r["p2"] for r in opcodes if r["opcode"] == "OpenRead"
}
print(f"{table_rootpage_by_register=}")
names_and_types_by_rootpage = dict(
    [(r[0], (r[1], r[2])) for r in conn.execute(
        "select rootpage, name, type from sqlite_master where rootpage in ({})".format(
            ", ".join(map(str, table_rootpage_by_register.values()))
        )
    )]
)
print(f"{names_and_types_by_rootpage=}")
columns_by_column_register = {}
for opcode_row in opcodes:
    if opcode_row["opcode"] in ("Rowid", "Column"):
        addr, opcode, table_id, cid, column_register, p4, p5, comment = opcode_row
        print(f"{table_id=} {cid=} {column_register=}")
        try:
            table = names_and_types_by_rootpage[table_rootpage_by_register[table_id]][0]
            columns_by_column_register[column_register] = (table, cid)
        except KeyError as e:
            print("  KeyError")
            print("   ", e)
            print("    table = names_and_types_by_rootpage[table_rootpage_by_register[table_id]][0]")
            print(f"    {names_and_types_by_rootpage=} {table_rootpage_by_register=} {table_id=}")
            print("    columns_by_column_register[column_register] = (table, cid)")
            print(f"    {column_register=} = ({table=}, {cid=})")
            pass
result_row = [dict(r) for r in opcodes if r["opcode"] == "ResultRow"][0]
result_registers = list(range(result_row["p1"], result_row["p1"] + result_row["p2"]))
print(f"{result_registers=}")
print(f"{columns_by_column_register=}")
all_column_names = {}
for (table, _) in names_and_types_by_rootpage.values():
    table_xinfo = conn.execute("pragma table_xinfo({})".format(table)).fetchall()
    for column_info in table_xinfo:
        all_column_names[(table, column_info["cid"])] = column_info["name"]
print(f"{all_column_names=}")
final_output = []
for register in result_registers:
    try:
        table, cid = columns_by_column_register[register]
        final_output.append((table, all_column_names[table, cid]))
    except KeyError:
        final_output.append((None, None))
return final_output

```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Show column metadata plus links for foreign keys on arbitrary query results 849978964  
898554859 https://github.com/simonw/datasette/issues/1293#issuecomment-898554859 https://api.github.com/repos/simonw/datasette/issues/1293 IC_kwDOBm6k_c41jtvr simonw 9599 2021-08-13T15:46:18Z 2021-08-13T15:46:18Z OWNER

So it looks like the bug is in the code that populates columns_by_column_register.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Show column metadata plus links for foreign keys on arbitrary query results 849978964  
898554427 https://github.com/simonw/datasette/issues/1293#issuecomment-898554427 https://api.github.com/repos/simonw/datasette/issues/1293 IC_kwDOBm6k_c41jto7 simonw 9599 2021-08-13T15:45:32Z 2021-08-13T15:45:32Z OWNER

Some useful debug output: table_rootpage_by_register={0: 43, 1: 42} names_and_types_by_rootpage={42: ('facet_cities', 'table'), 43: ('facetable', 'table')} result_registers=[6, 7, 8] columns_by_column_register={3: ('facetable', 6), 4: ('facetable', 5), 6: ('facet_cities', 1), 7: ('facetable', 4), 5: ('facetable', 6)} all_column_names={('facet_cities', 0): 'id', ('facet_cities', 1): 'name', ('facetable', 0): 'pk', ('facetable', 1): 'created', ('facetable', 2): 'planet_int', ('facetable', 3): 'on_earth', ('facetable', 4): 'state', ('facetable', 5): 'city_id', ('facetable', 6): 'neighborhood', ('facetable', 7): 'tags', ('facetable', 8): 'complex_array', ('facetable', 9): 'distinct_some_null'} The result_registers should each correspond to the correct entry in columns_by_column_register but they do not.

Python code: `python def columns_for_query(conn, sql, params=None): """ Given a SQLite connectionconnand a SQL querysql, returns a list of(table_name, column_name)`` pairs corresponding to the columns that would be returned by that SQL query.

Each pair indicates the source table and column for the returned column, or
``(None, None)`` if no table and column could be derived (e.g. for "select 1")
"""
if sql.lower().strip().startswith("explain"):
    return []
opcodes = conn.execute("explain " + sql, params).fetchall()
table_rootpage_by_register = {
    r["p1"]: r["p2"] for r in opcodes if r["opcode"] == "OpenRead"
}
print(f"{table_rootpage_by_register=}")
names_and_types_by_rootpage = dict(
    [(r[0], (r[1], r[2])) for r in conn.execute(
        "select rootpage, name, type from sqlite_master where rootpage in ({})".format(
            ", ".join(map(str, table_rootpage_by_register.values()))
        )
    )]
)
print(f"{names_and_types_by_rootpage=}")
columns_by_column_register = {}
for opcode in opcodes:
    if opcode["opcode"] in ("Rowid", "Column"):
        addr, opcode, table_id, cid, column_register, p4, p5, comment = opcode
        try:
            table = names_and_types_by_rootpage[table_rootpage_by_register[table_id]][0]
            columns_by_column_register[column_register] = (table, cid)
        except KeyError:
            pass
result_row = [dict(r) for r in opcodes if r["opcode"] == "ResultRow"][0]
result_registers = list(range(result_row["p1"], result_row["p1"] + result_row["p2"]))
print(f"{result_registers=}")
print(f"{columns_by_column_register=}")
all_column_names = {}
for (table, _) in names_and_types_by_rootpage.values():
    table_xinfo = conn.execute("pragma table_xinfo({})".format(table)).fetchall()
    for column_info in table_xinfo:
        all_column_names[(table, column_info["cid"])] = column_info["name"]
print(f"{all_column_names=}")
final_output = []
for register in result_registers:
    try:
        table, cid = columns_by_column_register[register]
        final_output.append((table, all_column_names[table, cid]))
    except KeyError:
        final_output.append((None, None))
return final_output

```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Show column metadata plus links for foreign keys on arbitrary query results 849978964  
898545815 https://github.com/simonw/datasette/issues/1293#issuecomment-898545815 https://api.github.com/repos/simonw/datasette/issues/1293 IC_kwDOBm6k_c41jriX simonw 9599 2021-08-13T15:31:53Z 2021-08-13T15:31:53Z OWNER

My hunch here is that registers or columns are being reused in a way that makes my code break - my code is pretty dumb, there are places in it where maybe the first mention of a register wins instead of the last one?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Show column metadata plus links for foreign keys on arbitrary query results 849978964  
898541972 https://github.com/simonw/datasette/issues/1293#issuecomment-898541972 https://api.github.com/repos/simonw/datasette/issues/1293 IC_kwDOBm6k_c41jqmU simonw 9599 2021-08-13T15:26:06Z 2021-08-13T15:29:06Z OWNER

ResultRow:

The registers P1 through P1+P2-1 contain a single row of results. This opcode causes the sqlite3_step() call to terminate with an SQLITE_ROW return code and it sets up the sqlite3_stmt structure to provide access to the r(P1)..r(P1+P2-1) values as the result row.

Column:

Interpret the data that cursor P1 points to as a structure built using the MakeRecord instruction. (See the MakeRecord opcode for additional information about the format of the data.) Extract the P2-th column from this record. If there are less that (P2+1) values in the record, extract a NULL.

The value extracted is stored in register P3.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Show column metadata plus links for foreign keys on arbitrary query results 849978964  
898541543 https://github.com/simonw/datasette/issues/1293#issuecomment-898541543 https://api.github.com/repos/simonw/datasette/issues/1293 IC_kwDOBm6k_c41jqfn simonw 9599 2021-08-13T15:25:26Z 2021-08-13T15:25:26Z OWNER

But the debug output here seems to be saying what we want it to say: 17 SorterSort 2 24 0 00 18 SorterData 2 10 3 00 r[10]=data 19 Column 3 2 8 00 r[8]=state 20 Column 3 1 7 00 r[7]=facet_cities.name 21 Column 3 0 6 00 r[6]=neighborhood 22 ResultRow 6 3 0 00 output=r[6..8] We want to get back neighborhood, facet_cities.name, state.

Why then are we seeing [('facet_cities', 'name'), ('facetable', 'state'), (None, None)]?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Show column metadata plus links for foreign keys on arbitrary query results 849978964  
898540260 https://github.com/simonw/datasette/issues/1293#issuecomment-898540260 https://api.github.com/repos/simonw/datasette/issues/1293 IC_kwDOBm6k_c41jqLk simonw 9599 2021-08-13T15:23:28Z 2021-08-13T15:23:28Z OWNER

SorterInsert:

Register P2 holds an SQL index key made using the MakeRecord instructions. This opcode writes that key into the sorter P1. Data for the entry is nil.

SorterData:

Write into register P2 the current sorter data for sorter cursor P1. Then clear the column header cache on cursor P3.

This opcode is normally use to move a record out of the sorter and into a register that is the source for a pseudo-table cursor created using OpenPseudo. That pseudo-table cursor is the one that is identified by parameter P3. Clearing the P3 column cache as part of this opcode saves us from having to issue a separate NullRow instruction to clear that cache.

OpenPseudo:

Open a new cursor that points to a fake table that contains a single row of data. The content of that one row is the content of memory register P2. In other words, cursor P1 becomes an alias for the MEM_Blob content contained in register P2.

A pseudo-table created by this opcode is used to hold a single row output from the sorter so that the row can be decomposed into individual columns using the Column opcode. The Column opcode is the only cursor opcode that works with a pseudo-table.

P3 is the number of fields in the records that will be stored by the pseudo-table.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Show column metadata plus links for foreign keys on arbitrary query results 849978964  
898536181 https://github.com/simonw/datasette/issues/1293#issuecomment-898536181 https://api.github.com/repos/simonw/datasette/issues/1293 IC_kwDOBm6k_c41jpL1 simonw 9599 2021-08-13T15:17:20Z 2021-08-13T15:20:33Z OWNER

Documentation for MakeRecord: https://www.sqlite.org/opcode.html#MakeRecord

Running explain inside sqlite3 provides extra comments and indentation which make it easier to understand: ``` sqlite> explain select neighborhood, facet_cities.name, state ...> from facetable ...> join facet_cities ...> on facetable.city_id = facet_cities.id ...> where neighborhood like '%bob%'; addr opcode p1 p2 p3 p4 p5 comment


0 Init 0 15 0 00 Start at 15
1 OpenRead 0 43 0 7 00 root=43 iDb=0; facetable 2 OpenRead 1 42 0 2 00 root=42 iDb=0; facet_cities 3 Rewind 0 14 0 00
4 Column 0 6 3 00 r[3]=facetable.neighborhood 5 Function0 1 2 1 like(2) 02 r[1]=func(r[2..3]) 6 IfNot 1 13 1 00
7 Column 0 5 4 00 r[4]=facetable.city_id 8 SeekRowid 1 13 4 00 intkey=r[4]
9 Column 0 6 5 00 r[5]=facetable.neighborhood 10 Column 1 1 6 00 r[6]=facet_cities.name 11 Column 0 4 7 00 r[7]=facetable.state 12 ResultRow 5 3 0 00 output=r[5..7] 13 Next 0 4 0 01
14 Halt 0 0 0 00
15 Transaction 0 0 35 0 01 usesStmtJournal=0 16 String8 0 2 0 %bob% 00 r[2]='%bob%' 17 Goto 0 1 0 00
Compared with: sqlite> explain select neighborhood, facet_cities.name, state ...> from facetable ...> join facet_cities ...> on facetable.city_id = facet_cities.id ...> where neighborhood like '%bob%' order by neighborhood ...> ; addr opcode p1 p2 p3 p4 p5 comment


0 Init 0 25 0 00 Start at 25
1 SorterOpen 2 5 0 k(1,B) 00
2 OpenRead 0 43 0 7 00 root=43 iDb=0; facetable 3 OpenRead 1 42 0 2 00 root=42 iDb=0; facet_cities 4 Rewind 0 16 0 00
5 Column 0 6 3 00 r[3]=facetable.neighborhood 6 Function0 1 2 1 like(2) 02 r[1]=func(r[2..3]) 7 IfNot 1 15 1 00
8 Column 0 5 4 00 r[4]=facetable.city_id 9 SeekRowid 1 15 4 00 intkey=r[4]
10 Column 1 1 6 00 r[6]=facet_cities.name 11 Column 0 4 7 00 r[7]=facetable.state 12 Column 0 6 5 00 r[5]=facetable.neighborhood 13 MakeRecord 5 3 9 00 r[9]=mkrec(r[5..7]) 14 SorterInsert 2 9 5 3 00 key=r[9]
15 Next 0 5 0 01
16 OpenPseudo 3 10 5 00 5 columns in r[10] 17 SorterSort 2 24 0 00
18 SorterData 2 10 3 00 r[10]=data
19 Column 3 2 8 00 r[8]=state
20 Column 3 1 7 00 r[7]=facet_cities.name 21 Column 3 0 6 00 r[6]=neighborhood 22 ResultRow 6 3 0 00 output=r[6..8] 23 SorterNext 2 18 0 00
24 Halt 0 0 0 00
25 Transaction 0 0 35 0 01 usesStmtJournal=0 26 String8 0 2 0 %bob% 00 r[2]='%bob%' 27 Goto 0 1 0 00
`` So actually it looks like theSorterSort` may be key to understanding this.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Show column metadata plus links for foreign keys on arbitrary query results 849978964  
898527525 https://github.com/simonw/datasette/issues/1293#issuecomment-898527525 https://api.github.com/repos/simonw/datasette/issues/1293 IC_kwDOBm6k_c41jnEl simonw 9599 2021-08-13T15:08:03Z 2021-08-13T15:08:03Z OWNER

Am I going to need to look at the ResultRow and its columns but then wind back to that earlier MakeRecord and its columns?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Show column metadata plus links for foreign keys on arbitrary query results 849978964  
898524057 https://github.com/simonw/datasette/issues/1293#issuecomment-898524057 https://api.github.com/repos/simonw/datasette/issues/1293 IC_kwDOBm6k_c41jmOZ simonw 9599 2021-08-13T15:06:37Z 2021-08-13T15:06:37Z OWNER

Comparing the explain for the two versions of that query - one with the order by and one without:

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Show column metadata plus links for foreign keys on arbitrary query results 849978964  
898519924 https://github.com/simonw/datasette/issues/1293#issuecomment-898519924 https://api.github.com/repos/simonw/datasette/issues/1293 IC_kwDOBm6k_c41jlN0 simonw 9599 2021-08-13T15:03:36Z 2021-08-13T15:03:36Z OWNER

Weird edge-case: adding an order by changes the order of the columns with respect to the information I am deriving about them.

Without order by this gets it right:

With order by:

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Show column metadata plus links for foreign keys on arbitrary query results 849978964  
898517872 https://github.com/simonw/datasette/issues/1293#issuecomment-898517872 https://api.github.com/repos/simonw/datasette/issues/1293 IC_kwDOBm6k_c41jktw simonw 9599 2021-08-13T15:00:50Z 2021-08-13T15:00:50Z OWNER

The primary key column (or rowid) often resolves to an index record in the sqlite_master table, e.g. the second row in this:

type | name | tbl_name | rootpage | sql -- | -- | -- | -- | -- table | simple_primary_key | simple_primary_key | 2 | CREATE TABLE simple_primary_key ( id varchar(30) primary key, content text ) index | sqlite_autoindex_simple_primary_key_1 | simple_primary_key | 3 |  

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Show column metadata plus links for foreign keys on arbitrary query results 849978964  
898506647 https://github.com/simonw/datasette/issues/1293#issuecomment-898506647 https://api.github.com/repos/simonw/datasette/issues/1293 IC_kwDOBm6k_c41jh-X simonw 9599 2021-08-13T14:43:19Z 2021-08-13T14:43:19Z OWNER

Work will continue in PR #1434.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Show column metadata plus links for foreign keys on arbitrary query results 849978964  
813134386 https://github.com/simonw/datasette/issues/1293#issuecomment-813134386 https://api.github.com/repos/simonw/datasette/issues/1293 MDEyOklzc3VlQ29tbWVudDgxMzEzNDM4Ng== simonw 9599 2021-04-05T01:20:28Z 2021-08-13T00:42:30Z OWNER

... that output might also provide a better way to extract variables than the current mechanism using a regular expression, by looking for the Variable opcodes.

[UPDATE: it did indeed do that, see #1421]

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Show column metadata plus links for foreign keys on arbitrary query results 849978964  
898066466 https://github.com/simonw/datasette/issues/1293#issuecomment-898066466 https://api.github.com/repos/simonw/datasette/issues/1293 IC_kwDOBm6k_c41h2gi simonw 9599 2021-08-13T00:40:24Z 2021-08-13T00:40:24Z OWNER

It figures out renamed columns too:

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Show column metadata plus links for foreign keys on arbitrary query results 849978964  
898065948 https://github.com/simonw/datasette/issues/1293#issuecomment-898065948 https://api.github.com/repos/simonw/datasette/issues/1293 IC_kwDOBm6k_c41h2Yc simonw 9599 2021-08-13T00:38:58Z 2021-08-13T00:38:58Z OWNER

Trying to run explain select * from facetable fails with an error in my prototype, because it tries to execute explain explain select * from facetable - so I need to spot that error and ignore it.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Show column metadata plus links for foreign keys on arbitrary query results 849978964  
898065011 https://github.com/simonw/datasette/issues/1293#issuecomment-898065011 https://api.github.com/repos/simonw/datasette/issues/1293 IC_kwDOBm6k_c41h2Jz simonw 9599 2021-08-13T00:36:30Z 2021-08-13T00:36:30Z OWNER

https://latest.datasette.io/fixtures?sql=explain+select+*+from+paginated_view will be an interesting test query - because paginated_view is defined like this:

sql CREATE VIEW paginated_view AS SELECT content, '- ' || content || ' -' AS content_extra FROM no_primary_key;

So this will help test that the mechanism isn't confused by output columns that are created through a concatenation expression.

Here's what it does for that:

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Show column metadata plus links for foreign keys on arbitrary query results 849978964  
898063815 https://github.com/simonw/datasette/issues/1293#issuecomment-898063815 https://api.github.com/repos/simonw/datasette/issues/1293 IC_kwDOBm6k_c41h13H simonw 9599 2021-08-13T00:33:17Z 2021-08-13T00:33:17Z OWNER

Improved version of that function: python def columns_for_query(conn, sql): """ Given a SQLite connection ``conn`` and a SQL query ``sql``, returns a list of ``(table_name, column_name)`` pairs, one per returned column. ``(None, None)`` if no table and column could be derived. """ rows = conn.execute('explain ' + sql).fetchall() table_rootpage_by_register = {r['p1']: r['p2'] for r in rows if r['opcode'] == 'OpenRead'} names_by_rootpage = dict( conn.execute( 'select rootpage, name from sqlite_master where rootpage in ({})'.format( ', '.join(map(str, table_rootpage_by_register.values())) ) ) ) columns_by_column_register = {} for row in rows: if row['opcode'] in ('Rowid', 'Column'): addr, opcode, table_id, cid, column_register, p4, p5, comment = row table = names_by_rootpage[table_rootpage_by_register[table_id]] columns_by_column_register[column_register] = (table, cid) result_row = [dict(r) for r in rows if r['opcode'] == 'ResultRow'][0] registers = list(range(result_row["p1"], result_row["p1"] + result_row["p2"])) all_column_names = {} for table in names_by_rootpage.values(): table_xinfo = conn.execute('pragma table_xinfo({})'.format(table)).fetchall() for row in table_xinfo: all_column_names[(table, row["cid"])] = row["name"] final_output = [] for r in registers: try: table, cid = columns_by_column_register[r] final_output.append((table, all_column_names[table, cid])) except KeyError: final_output.append((None, None)) return final_output It works!

```diff diff --git a/datasette/templates/query.html b/datasette/templates/query.html index 75f7f1b..9fe1d4f 100644 --- a/datasette/templates/query.html +++ b/datasette/templates/query.html @@ -67,6 +67,8 @@

</form>

+extra_column_info: {{ extra_column_info }} + {% if display_rows %}

This data as {% for name, url in renderers.items() %}{{ name }}{{ ", " if not loop.last }}{% endfor %}, CSV

<table class="rows-and-columns"> diff --git a/datasette/views/database.py b/datasette/views/database.py index 7c36034..02f8039 100644 --- a/datasette/views/database.py +++ b/datasette/views/database.py @@ -10,6 +10,7 @@ import markupsafe from datasette.utils import ( await_me_maybe, check_visibility, + columns_for_query, derive_named_parameters, to_css_class, validate_sql_select, @@ -248,6 +249,8 @@ class QueryView(DataView): query_error = None + extra_column_info = None + # Execute query - as write or as read if write: if request.method == "POST": @@ -334,6 +337,10 @@ class QueryView(DataView): database, sql, params_for_query, truncate=True, **extra_args ) columns = [r[0] for r in results.description] + + # Try to figure out extra column information + db = self.ds.get_database(database) + extra_column_info = await db.execute_fn(lambda conn: columns_for_query(conn, sql)) except sqlite3.DatabaseError as e: query_error = e results = None @@ -462,6 +469,7 @@ class QueryView(DataView): "show_hide_text": show_hide_text, "show_hide_hidden": markupsafe.Markup(show_hide_hidden), "hide_sql": hide_sql, + "extra_column_info": extra_column_info, } return ( ```
{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Show column metadata plus links for foreign keys on arbitrary query results 849978964  
898056013 https://github.com/simonw/datasette/issues/1293#issuecomment-898056013 https://api.github.com/repos/simonw/datasette/issues/1293 IC_kwDOBm6k_c41hz9N simonw 9599 2021-08-13T00:12:09Z 2021-08-13T00:12:09Z OWNER

Having added column metadata in #1430 (ref #942) I could also include a definition list at the top of the query results page exposing the column descriptions for any columns, using the same EXPLAIN mechanism.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Show column metadata plus links for foreign keys on arbitrary query results 849978964  
813480043 https://github.com/simonw/datasette/issues/1293#issuecomment-813480043 https://api.github.com/repos/simonw/datasette/issues/1293 MDEyOklzc3VlQ29tbWVudDgxMzQ4MDA0Mw== simonw 9599 2021-04-05T16:16:17Z 2021-04-05T16:16:17Z OWNER

https://latest.datasette.io/fixtures?sql=explain+select+*+from+paginated_view will be an interesting test query - because paginated_view is defined like this:

sql CREATE VIEW paginated_view AS SELECT content, '- ' || content || ' -' AS content_extra FROM no_primary_key; So this will help test that the mechanism isn't confused by output columns that are created through a concatenation expression.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Show column metadata plus links for foreign keys on arbitrary query results 849978964  
813445512 https://github.com/simonw/datasette/issues/1293#issuecomment-813445512 https://api.github.com/repos/simonw/datasette/issues/1293 MDEyOklzc3VlQ29tbWVudDgxMzQ0NTUxMg== simonw 9599 2021-04-05T15:11:40Z 2021-04-05T15:11:40Z OWNER

Here's some older example code that works with opcodes from Python, in this case to output indexes used by a query: https://github.com/plasticityai/supersqlite/blob/master/supersqlite/idxchk.py

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Show column metadata plus links for foreign keys on arbitrary query results 849978964  
813438771 https://github.com/simonw/datasette/issues/1293#issuecomment-813438771 https://api.github.com/repos/simonw/datasette/issues/1293 MDEyOklzc3VlQ29tbWVudDgxMzQzODc3MQ== simonw 9599 2021-04-05T14:58:48Z 2021-04-05T14:58:48Z OWNER

I may need to do something special for rowid columns - there is a RowId opcode that might come into play here.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Show column metadata plus links for foreign keys on arbitrary query results 849978964  
813164282 https://github.com/simonw/datasette/issues/1293#issuecomment-813164282 https://api.github.com/repos/simonw/datasette/issues/1293 MDEyOklzc3VlQ29tbWVudDgxMzE2NDI4Mg== simonw 9599 2021-04-05T03:42:26Z 2021-04-05T03:42:36Z OWNER

Extracting variables with this trick appears to work OK, but you have to pass the correct variables to the explain select... query. Using defaultdict seems to work there:

```pycon

rows = conn.execute('explain select * from repos where id = :id', defaultdict(int)) [dict(r) for r in rows if r['opcode'] == 'Variable'] [{'addr': 2, 'opcode': 'Variable', 'p1': 1, 'p2': 1, 'p3': 0, 'p4': ':id', 'p5': 0, 'comment': None}] ```

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Show column metadata plus links for foreign keys on arbitrary query results 849978964  
813162622 https://github.com/simonw/datasette/issues/1293#issuecomment-813162622 https://api.github.com/repos/simonw/datasette/issues/1293 MDEyOklzc3VlQ29tbWVudDgxMzE2MjYyMg== simonw 9599 2021-04-05T03:34:24Z 2021-04-05T03:40:35Z OWNER

This almost works, but throws errors with some queries (anything with a rowid column for example) - it needs a bunch of test coverage. python def columns_for_query(conn, sql): rows = conn.execute('explain ' + sql).fetchall() table_rootpage_by_register = {r['p1']: r['p2'] for r in rows if r['opcode'] == 'OpenRead'} names_by_rootpage = dict( conn.execute( 'select rootpage, name from sqlite_master where rootpage in ({})'.format( ', '.join(map(str, table_rootpage_by_register.values())) ) ) ) columns_by_column_register = {} for row in rows: if row['opcode'] == 'Column': addr, opcode, table_id, cid, column_register, p4, p5, comment = row table = names_by_rootpage[table_rootpage_by_register[table_id]] columns_by_column_register[column_register] = (table, cid) result_row = [dict(r) for r in rows if r['opcode'] == 'ResultRow'][0] registers = list(range(result_row["p1"], result_row["p1"] + result_row["p2"] - 1)) all_column_names = {} for table in names_by_rootpage.values(): table_xinfo = conn.execute('pragma table_xinfo({})'.format(table)).fetchall() for row in table_xinfo: all_column_names[(table, row["cid"])] = row["name"] final_output = [] for r in registers: try: table, cid = columns_by_column_register[r] final_output.append((table, all_column_names[table, cid])) except KeyError: final_output.append((None, None)) return final_output

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Show column metadata plus links for foreign keys on arbitrary query results 849978964  
813134637 https://github.com/simonw/datasette/issues/1293#issuecomment-813134637 https://api.github.com/repos/simonw/datasette/issues/1293 MDEyOklzc3VlQ29tbWVudDgxMzEzNDYzNw== simonw 9599 2021-04-05T01:21:59Z 2021-04-05T01:21:59Z OWNER

http://www.sqlite.org/draft/lang_explain.html says:

Applications should not use EXPLAIN or EXPLAIN QUERY PLAN since their exact behavior is variable and only partially documented.

I'm going to keep exploring this though.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Show column metadata plus links for foreign keys on arbitrary query results 849978964  
813134227 https://github.com/simonw/datasette/issues/1293#issuecomment-813134227 https://api.github.com/repos/simonw/datasette/issues/1293 MDEyOklzc3VlQ29tbWVudDgxMzEzNDIyNw== simonw 9599 2021-04-05T01:19:31Z 2021-04-05T01:19:31Z OWNER

| addr | opcode | p1 | p2 | p3 | p4 | p5 | comment | |--------|---------------|------|------|------|-----------------------|------|-----------| | 0 | Init | 0 | 47 | 0 | | 00 | | | 1 | OpenRead | 0 | 51 | 0 | 15 | 00 | | | 2 | Integer | 15 | 2 | 0 | | 00 | | | 3 | Once | 0 | 15 | 0 | | 00 | | | 4 | OpenEphemeral | 2 | 1 | 0 | k(1,) | 00 | | | 5 | VOpen | 1 | 0 | 0 | vtab:3E692C362158 | 00 | | | 6 | String8 | 0 | 5 | 0 | CPAD_2020a_SuperUnits | 00 | | | 7 | SCopy | 7 | 6 | 0 | | 00 | | | 8 | Integer | 2 | 3 | 0 | | 00 | | | 9 | Integer | 2 | 4 | 0 | | 00 | | | 10 | VFilter | 1 | 15 | 3 | | 00 | | | 11 | Rowid | 1 | 8 | 0 | | 00 | | | 12 | MakeRecord | 8 | 1 | 9 | C | 00 | | | 13 | IdxInsert | 2 | 9 | 8 | 1 | 00 | | | 14 | VNext | 1 | 11 | 0 | | 00 | | | 15 | Return | 2 | 0 | 0 | | 00 | | | 16 | Rewind | 2 | 46 | 0 | | 00 | | | 17 | Column | 2 | 0 | 1 | | 00 | | | 18 | IsNull | 1 | 45 | 0 | | 00 | | | 19 | SeekRowid | 0 | 45 | 1 | | 00 | | | 20 | Column | 0 | 2 | 11 | | 00 | | | 21 | Function0 | 1 | 10 | 9 | like(2) | 02 | | | 22 | IfNot | 9 | 45 | 1 | | 00 | | | 23 | Column | 0 | 14 | 13 | | 00 | | | 24 | Function0 | 1 | 12 | 9 | intersects(2) | 02 | | | 25 | Ne | 14 | 45 | 9 | | 51 | | | 26 | Column | 0 | 14 | 9 | | 00 | | | 27 | Function0 | 0 | 9 | 15 | asgeojson(1) | 01 | | | 28 | Rowid | 0 | 16 | 0 | | 00 | | | 29 | Column | 0 | 1 | 17 | | 00 | | | 30 | Column | 0 | 2 | 18 | | 00 | | | 31 | Column | 0 | 3 | 19 | | 00 | | | 32 | Column | 0 | 4 | 20 | | 00 | | | 33 | Column | 0 | 5 | 21 | | 00 | | | 34 | Column | 0 | 6 | 22 | | 00 | | | 35 | Column | 0 | 7 | 23 | | 00 | | | 36 | Column | 0 | 8 | 24 | | 00 | | | 37 | Column | 0 | 9 | 25 | | 00 | | | 38 | Column | 0 | 10 | 26 | | 00 | | | 39 | Column | 0 | 11 | 27 | | 00 | | | 40 | RealAffinity | 27 | 0 | 0 | | 00 | | | 41 | Column | 0 | 12 | 28 | | 00 | | | 42 | Column | 0 | 13 | 29 | | 00 | | | 43 | Column | 0 | 14 | 30 | | 00 | | | 44 | ResultRow | 15 | 16 | 0 | | 00 | | | 45 | Next | 2 | 17 | 0 | | 00 | | | 46 | Halt | 0 | 0 | 0 | | 00 | | | 47 | Transaction | 0 | 0 | 265 | 0 | 01 | | | 48 | Variable | 1 | 31 | 0 | :freedraw | 00 | | | 49 | Function0 | 1 | 31 | 7 | geomfromgeojson(1) | 01 | | | 50 | String8 | 0 | 10 | 0 | %mini% | 00 | | | 51 | Variable | 1 | 32 | 0 | :freedraw | 00 | | | 52 | Function0 | 1 | 32 | 12 | geomfromgeojson(1) | 01 | | | 53 | Integer | 1 | 14 | 0 | | 00 | | | 54 | Goto | 0 | 1 | 0 | | 00 | |

Essential documentation for understanding that output: https://www.sqlite.org/opcode.html

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Show column metadata plus links for foreign keys on arbitrary query results 849978964  
813134072 https://github.com/simonw/datasette/issues/1293#issuecomment-813134072 https://api.github.com/repos/simonw/datasette/issues/1293 MDEyOklzc3VlQ29tbWVudDgxMzEzNDA3Mg== simonw 9599 2021-04-05T01:18:37Z 2021-04-05T01:18:37Z OWNER

Had a fantastic suggestion on the SQLite forum: it might be possible to get what I want by interpreting the opcodes output by explain select ....

Copying the reply I posted to this thread:

That's really useful, thanks! It looks like it might be possible for me to reconstruct where each column came from using the explain select output.

Here's a complex example: https://calands.datasettes.com/calands?sql=explain+select%0D%0A++AsGeoJSON%28geometry%29%2C+*%0D%0Afrom%0D%0A++CPAD_2020a_SuperUnits%0D%0Awhere%0D%0A++PARK_NAME+like+%27%25mini%25%27+and%0D%0A++Intersects%28GeomFromGeoJSON%28%3Afreedraw%29%2C+geometry%29+%3D+1%0D%0A++and+CPAD_2020a_SuperUnits.rowid+in+%28%0D%0A++++select%0D%0A++++++rowid%0D%0A++++from%0D%0A++++++SpatialIndex%0D%0A++++where%0D%0A++++++f_table_name+%3D+%27CPAD_2020a_SuperUnits%27%0D%0A++++++and+search_frame+%3D+GeomFromGeoJSON%28%3Afreedraw%29%0D%0A++%29&freedraw=%7B%22type%22%3A%22MultiPolygon%22%2C%22coordinates%22%3A%5B%5B%5B%5B-122.42202758789064%2C37.82280243352759%5D%2C%5B-122.39868164062501%2C37.823887203271454%5D%2C%5B-122.38220214843751%2C37.81846319511331%5D%2C%5B-122.35061645507814%2C37.77071473849611%5D%2C%5B-122.34924316406251%2C37.74465712069939%5D%2C%5B-122.37258911132814%2C37.703380457832374%5D%2C%5B-122.39044189453125%2C37.690340943717715%5D%2C%5B-122.41241455078126%2C37.680559803205135%5D%2C%5B-122.44262695312501%2C37.67295135774715%5D%2C%5B-122.47283935546876%2C37.67295135774715%5D%2C%5B-122.52502441406251%2C37.68382032669382%5D%2C%5B-122.53463745117189%2C37.6892542140253%5D%2C%5B-122.54699707031251%2C37.690340943717715%5D%2C%5B-122.55798339843751%2C37.72945260537781%5D%2C%5B-122.54287719726564%2C37.77831314799672%5D%2C%5B-122.49893188476564%2C37.81303878836991%5D%2C%5B-122.46185302734376%2C37.82822612280363%5D%2C%5B-122.42889404296876%2C37.82822612280363%5D%2C%5B-122.42202758789064%2C37.82280243352759%5D%5D%5D%5D%7D

It looks like the opcodes I need to inspect are OpenRead, Column and ResultRow.

OpenRead tells me which tables are being opened - the p2 value (in this case 51) corresponds to the rootpage column in sqlite_master here: https://calands.datasettes.com/calands?sql=select+*+from+sqlite_master - it gets assigned to the register in p1.

The Column opcodes tell me which columns are being read - p1 is that table reference, and p2 is the cid of the column within that table.

The ResultRow opcode then tells me which columns are used in the results. 15 16 means start at the 15th and then read the next 16 columns.

I think this might work!

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Show column metadata plus links for foreign keys on arbitrary query results 849978964  
813116177 https://github.com/simonw/datasette/issues/1293#issuecomment-813116177 https://api.github.com/repos/simonw/datasette/issues/1293 MDEyOklzc3VlQ29tbWVudDgxMzExNjE3Nw== simonw 9599 2021-04-04T23:31:00Z 2021-04-04T23:31:00Z OWNER

Sadly it doesn't do what I need. This query should only return one column, but instead I get back every column that was consulted by the query:

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Show column metadata plus links for foreign keys on arbitrary query results 849978964  
813115607 https://github.com/simonw/datasette/issues/1293#issuecomment-813115607 https://api.github.com/repos/simonw/datasette/issues/1293 MDEyOklzc3VlQ29tbWVudDgxMzExNTYwNw== simonw 9599 2021-04-04T23:25:15Z 2021-04-04T23:25:15Z OWNER

Oh wow, I just spotted https://github.com/macbre/sql-metadata

Uses tokenized query returned by python-sqlparse and generates query metadata. Extracts column names and tables used by the query. Provides a helper for normalization of SQL queries and tables aliases resolving.

It's for MySQL, PostgreSQL and Hive right now but maybe getting it working with SQLite wouldn't be too hard?

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Show column metadata plus links for foreign keys on arbitrary query results 849978964  
813115414 https://github.com/simonw/datasette/issues/1293#issuecomment-813115414 https://api.github.com/repos/simonw/datasette/issues/1293 MDEyOklzc3VlQ29tbWVudDgxMzExNTQxNA== simonw 9599 2021-04-04T23:23:34Z 2021-04-04T23:23:34Z OWNER

The other approach I considered for this was to have my own SQL query parser running in Python, which could pick apart a complex query and figure out which column was sourced from which table. I dropped this idea because it felt that the moment select * came into play a pure parsing approach wouldn't work - I'd need knowledge of the schema in order to resolve the *.

A Python parser approach might be good enough to handle a subset of queries - those that don't use select * for example - and maybe that would be worth shipping? The feature doesn't have to be perfect for it to be useful.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Show column metadata plus links for foreign keys on arbitrary query results 849978964  
813114933 https://github.com/simonw/datasette/issues/1293#issuecomment-813114933 https://api.github.com/repos/simonw/datasette/issues/1293 MDEyOklzc3VlQ29tbWVudDgxMzExNDkzMw== simonw 9599 2021-04-04T23:19:22Z 2021-04-04T23:19:22Z OWNER

I asked about this on the SQLite forum: https://sqlite.org/forum/forumpost/0180277fb7

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Show column metadata plus links for foreign keys on arbitrary query results 849978964  
813113653 https://github.com/simonw/datasette/issues/1293#issuecomment-813113653 https://api.github.com/repos/simonw/datasette/issues/1293 MDEyOklzc3VlQ29tbWVudDgxMzExMzY1Mw== simonw 9599 2021-04-04T23:10:49Z 2021-04-04T23:10:49Z OWNER

One option I've not fully explored yet: could I write my own custom SQLite C extension which exposes this functionality as a callable function?

Then I could load that extension and run a SQL query something like this:

select database, table, column from analyze_query(:sql_query) Where analyze_query(...) would be a fancy virtual table function of some sort that uses the underlying sqlite3_column_database_name() C functions to analyze the SQL query and return details of what it would return.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Show column metadata plus links for foreign keys on arbitrary query results 849978964  
813113403 https://github.com/simonw/datasette/issues/1293#issuecomment-813113403 https://api.github.com/repos/simonw/datasette/issues/1293 MDEyOklzc3VlQ29tbWVudDgxMzExMzQwMw== simonw 9599 2021-04-04T23:08:48Z 2021-04-04T23:08:48Z OWNER

Worth noting that adding limit 0 to the query still causes it to conduct the permission checks, hopefully while avoiding doing any of the actual work of executing the query: pycon In [20]: db.execute('select * from compound_primary_key join facetable on facetable.rowid = compound_primary_key.rowid limit 0').fetchall() ...: args (21, None, None, None, None) kwargs {} args (20, 'compound_primary_key', 'pk1', 'main', None) kwargs {} args (20, 'compound_primary_key', 'pk2', 'main', None) kwargs {} args (20, 'compound_primary_key', 'content', 'main', None) kwargs {} args (20, 'facetable', 'pk', 'main', None) kwargs {} args (20, 'facetable', 'created', 'main', None) kwargs {} args (20, 'facetable', 'planet_int', 'main', None) kwargs {} args (20, 'facetable', 'on_earth', 'main', None) kwargs {} args (20, 'facetable', 'state', 'main', None) kwargs {} args (20, 'facetable', 'city_id', 'main', None) kwargs {} args (20, 'facetable', 'neighborhood', 'main', None) kwargs {} args (20, 'facetable', 'tags', 'main', None) kwargs {} args (20, 'facetable', 'complex_array', 'main', None) kwargs {} args (20, 'facetable', 'distinct_some_null', 'main', None) kwargs {} args (20, 'facetable', 'pk', 'main', None) kwargs {} args (20, 'compound_primary_key', 'ROWID', 'main', None) kwargs {} Out[20]: []

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Show column metadata plus links for foreign keys on arbitrary query results 849978964  
813113218 https://github.com/simonw/datasette/issues/1293#issuecomment-813113218 https://api.github.com/repos/simonw/datasette/issues/1293 MDEyOklzc3VlQ29tbWVudDgxMzExMzIxOA== simonw 9599 2021-04-04T23:07:25Z 2021-04-04T23:07:25Z OWNER

Here are all of the available constants: pycon In [3]: for k in dir(sqlite3): ...: if k.startswith("SQLITE_"): ...: print(k, getattr(sqlite3, k)) ...: SQLITE_ALTER_TABLE 26 SQLITE_ANALYZE 28 SQLITE_ATTACH 24 SQLITE_CREATE_INDEX 1 SQLITE_CREATE_TABLE 2 SQLITE_CREATE_TEMP_INDEX 3 SQLITE_CREATE_TEMP_TABLE 4 SQLITE_CREATE_TEMP_TRIGGER 5 SQLITE_CREATE_TEMP_VIEW 6 SQLITE_CREATE_TRIGGER 7 SQLITE_CREATE_VIEW 8 SQLITE_CREATE_VTABLE 29 SQLITE_DELETE 9 SQLITE_DENY 1 SQLITE_DETACH 25 SQLITE_DONE 101 SQLITE_DROP_INDEX 10 SQLITE_DROP_TABLE 11 SQLITE_DROP_TEMP_INDEX 12 SQLITE_DROP_TEMP_TABLE 13 SQLITE_DROP_TEMP_TRIGGER 14 SQLITE_DROP_TEMP_VIEW 15 SQLITE_DROP_TRIGGER 16 SQLITE_DROP_VIEW 17 SQLITE_DROP_VTABLE 30 SQLITE_FUNCTION 31 SQLITE_IGNORE 2 SQLITE_INSERT 18 SQLITE_OK 0 SQLITE_PRAGMA 19 SQLITE_READ 20 SQLITE_RECURSIVE 33 SQLITE_REINDEX 27 SQLITE_SAVEPOINT 32 SQLITE_SELECT 21 SQLITE_TRANSACTION 22 SQLITE_UPDATE 23

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Show column metadata plus links for foreign keys on arbitrary query results 849978964  
813113175 https://github.com/simonw/datasette/issues/1293#issuecomment-813113175 https://api.github.com/repos/simonw/datasette/issues/1293 MDEyOklzc3VlQ29tbWVudDgxMzExMzE3NQ== simonw 9599 2021-04-04T23:07:01Z 2021-04-04T23:07:01Z OWNER

A more promising route I found involved the db.set_authorizer method. This can be used to log the permission checks that SQLite uses, including checks for permission to access specific columns of specific tables. For a while I thought this could work!

```pycon

def print_args(args, *kwargs): ... print("args", args, "kwargs", kwargs) ... return sqlite3.SQLITE_OK

db = sqlite3.connect("fixtures.db") db.execute('select * from compound_primary_key join facetable on rowid').fetchall() args (21, None, None, None, None) kwargs {} args (20, 'compound_primary_key', 'pk1', 'main', None) kwargs {} args (20, 'compound_primary_key', 'pk2', 'main', None) kwargs {} args (20, 'compound_primary_key', 'content', 'main', None) kwargs {} args (20, 'facetable', 'pk', 'main', None) kwargs {} args (20, 'facetable', 'created', 'main', None) kwargs {} args (20, 'facetable', 'planet_int', 'main', None) kwargs {} args (20, 'facetable', 'on_earth', 'main', None) kwargs {} args (20, 'facetable', 'state', 'main', None) kwargs {} args (20, 'facetable', 'city_id', 'main', None) kwargs {} args (20, 'facetable', 'neighborhood', 'main', None) kwargs {} args (20, 'facetable', 'tags', 'main', None) kwargs {} args (20, 'facetable', 'complex_array', 'main', None) kwargs {} args (20, 'facetable', 'distinct_some_null', 'main', None) kwargs {} `` Those20values (where 20 isSQLITE_READ`) looked like they were checking permissions for the columns in the order they would be returned!

Then I found a snag:

pycon In [18]: db.execute('select 1 + 1 + (select max(rowid) from facetable)') args (21, None, None, None, None) kwargs {} args (31, None, 'max', None, None) kwargs {} args (20, 'facetable', 'pk', 'main', None) kwargs {} args (21, None, None, None, None) kwargs {} args (20, 'facetable', '', None, None) kwargs {} Once a subselect is involved the order of the 20 checks no longer matches the order in which the columns are returned from the query.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Show column metadata plus links for foreign keys on arbitrary query results 849978964  
813112546 https://github.com/simonw/datasette/issues/1293#issuecomment-813112546 https://api.github.com/repos/simonw/datasette/issues/1293 MDEyOklzc3VlQ29tbWVudDgxMzExMjU0Ng== simonw 9599 2021-04-04T23:02:45Z 2021-04-04T23:02:45Z OWNER

I've done various pieces of research into this over the past few years. Capturing what I've discovered in this ticket.

The SQLite C API has functions that can help with this: https://www.sqlite.org/c3ref/column_database_name.html details those. But they're not exposed in the Python SQLite library.

Maybe it would be possible to use them via ctypes? My hunch is that I would have to re-implement the full sqlite3 module with ctypes, which sounds daunting.

{
    "total_count": 0,
    "+1": 0,
    "-1": 0,
    "laugh": 0,
    "hooray": 0,
    "confused": 0,
    "heart": 0,
    "rocket": 0,
    "eyes": 0
}
Show column metadata plus links for foreign keys on arbitrary query results 849978964  

Advanced export

JSON shape: default, array, newline-delimited, object

CSV options:

CREATE TABLE [issue_comments] (
   [html_url] TEXT,
   [issue_url] TEXT,
   [id] INTEGER PRIMARY KEY,
   [node_id] TEXT,
   [user] INTEGER REFERENCES [users]([id]),
   [created_at] TEXT,
   [updated_at] TEXT,
   [author_association] TEXT,
   [body] TEXT,
   [reactions] TEXT,
   [issue] INTEGER REFERENCES [issues]([id])
, [performed_via_github_app] TEXT);
CREATE INDEX [idx_issue_comments_issue]
                ON [issue_comments] ([issue]);
CREATE INDEX [idx_issue_comments_user]
                ON [issue_comments] ([user]);
Powered by Datasette · Queries took 64.501ms · About: github-to-sqlite
  • Sort ascending
  • Sort descending
  • Facet by this
  • Hide this column
  • Show all columns
  • Show not-blank rows