features-and-implementation2 (8407B)
1 On verification 2 =============== 3 4 * Some guarantees can be enforced statically using the type system. 5 They are in a way proven to be enforeced in the expanded code, 6 but in order to write verified compilers, it would be necessary to 7 prove that the type indeed enforces the desired property, for all 8 possible macro inputs. 9 10 Advantage: these proofs need to be written only once for the graph macro, 11 and can be relied on by all compilers written using these static guarantees. 12 * Some guarantees can be enforced statically "by construction", 13 and the design patterns used to enforce them are guaranteed to be 14 correctly applied because the code is macro-generated. 15 In order to write verified compilers, it would be necessary to 16 prove that the macro-generated code always enforces the desired property. 17 18 Advantage: these proofs need to be written only once for the graph macro, 19 and can be relied on by all compilers written using these static guarantees. 20 * Some guarantees can be enforced at run-time. 21 This allows catching bugs early, and ensures no incorrect output can occur 22 when these guarantees are broken. It does not however ensure that the compiler 23 will never break these guarantees. It is therefore still possible to ship a 24 broken compiler which fails on valid inputs. 25 26 Drawback: it is necessary for every compiler written using such guards to prove 27 that the guards are never raised. 28 29 30 Features 31 ======== 32 33 * GC: 34 * Group nodes by SCC 35 * Two kinds of "indices" pointers: 36 * index within the current strongly-connected component 37 * direct pointer to a descendant SCC + index within it 38 * To limit the waste when there is an SCC not containing nodes of a given type, 39 do not use a fixed tuple of vectors of nodes. Instead, use a "list with possible omissions". 40 Problem: the size of the fully-expanded type in TR is O(2ⁿ) for these. 41 Either: 42 * use casts 43 * pay the compile-time price of a huge type (not feasible with large N) 44 * pay the run-time price of bloated components (feasible but costly with large N) 45 * use a case→ function, which returns the empty vector as a fallback. With pure functions, this could be doable. 46 * Problem: generating all the possible functions means O(2^n) templates for closures that may be used at 47 run-time 48 * Otherwise, generate one closure over 1 array, one closure over 2 arrays, one over 3 arrays, etc. 49 and also close over some markers indicating to which "real" arrays these correspond. 50 This means O(2n) space usage worst-case. 51 Problem again: the construction function for such a case→ while minimizing the number of values closed on 52 has a large O(2^n) type too. 53 * typed clojure has typed heterogeneous maps https://github.com/clojure/core.typed/wiki/Types#heterogeneous-maps 54 * In theory it could be possible to give hints to the GC, 55 but Racket's GC does not offer that possibility. 56 57 GC Implementation: 58 * Must be optional 59 * Initially, all nodes within a single component, all indices are of the "slef + index" kind. 60 * Use tarjan's algorithm 61 * group nodes by SCC 62 Update references to nodes in the same SCC, using just an index 63 Update references to nodes in "lower" SCCs, using a pointer to the SCC + an index. 64 * Due to the higher memory cost (+ 1 reference for references to other SCCs), 65 and the poor set of options for implementation (cast, exponential size of type, or large memory usage) 66 we will not implement the GC. 67 68 We argue that in most cases it would not be useful, as compilers will normally work on the whole graph, and 69 not on a subgraph. In the rare case where only part of a graph is needed, applying the graph-level identity 70 transformation to the nodes of interest would effectively copy them and the nodes reachable from these new roots, 71 thereby allowing the original graph to be freed. 72 73 * Invariants 74 * Scope of the invariants: 75 * Input contracts, types and structural properties 76 * Output contracts, types and structural properties 77 * Transformation invariants (relate the input and output graphs) 78 * Time of verification: 79 * Run-time 80 Should be easy enough to implement: Add a define-graph-invariant form, recognize these options, and check 81 them at run-time. Implemented as some sorts of contracts. 82 * Compile-time 83 * Node types 84 * Statically-enforced structural invariants at the level of node types 85 (e.g. disallow cycles in the types, to ensure there are no cycles in the instance) 86 * Macros can be used to generate code which is known to be correct 87 e.g. paths 88 * Concern: static "by construction" guarantees may interfere with each other, if they fill in some nodes, 89 e.g. a "no cycles starting from this node" constraint would not work as expected if a "backwards link" 90 is filled in afterwards. We probably need to hardcode a set of constraints which know about eachother 91 and about the potential interactions. 92 * Conserve well-scopedness within a transition: pass in nodes flagged with a ∀ type, and check 93 that the output contains that flag. 94 * PHOAS 95 * Specification 96 * Invariants specified in the graph type 97 * Transformation invariants specified on the graph creation code 98 * Checks (run-time or the various compile-time mechanisms) are specified in the graph creation code 99 The graph creation code must enforce all invariants within the type 100 That way, it is guaranteed that any instance of the graph type satisfies its invariants, either by 101 construction, or as a guard at the end of the construction. 102 103 * Automatic generation of mappings 104 * when there is no mapping taking an old node as an input, a mapping is automatically generated. 105 The mapping simply translates the old node to the new node type, and recursively transforms its fields, 106 traversing lists etc. 107 * When there is a mapping with more than one argument, then no mapping is auto-generated for that input node type, 108 and instead calls to the mapping must be explicit (i.e. can't return the old node type). 109 * This means that we have a mechanism before the actual core graph macro, which checks and decides which mappings 110 to auto-generate. 111 * We also have a mechanism for auto-calling transformations on input node types 112 * Possibility to escape this, in order to actually insert a reference to the old graph? 113 * Some notation in the type, to indicate that it's "protected" in some way? 114 * Some wrapper indicating that it should merely be unwrapped, not processed? (I think that's better). 115 116 * Structural node equality 117 * Would be nice to coalesce nodes which are equal? (i.e. hash consing) 118 so that any two nodes which are equal? within the same graph have the same index. 119 I suppose this would be rather costly: O(n log n) comparisons, with each comparison potentially 120 costly, in principle. If we cached results, we could achieve a better run-time in practice, and 121 perhaps a better theoretical complexity if we handled cycles ourselves. 122 * The general algorithm when there are no unordered sets is deterministic finite automaton minimization 123 * The general algorithm when there are unordered sets is nondeterministic finite automaton minimization 124 * We could cache all calls to equal? on two nodes for a limited dynamic scope. 125 * If we have this, then we can, for comparisons within the same graph, quickly return #t or #f based on eq? 126 * Alpha-equivalence comparisons 127 (i.e. ignore the actual values of some fields, except for constraining the shape of the graph) 128 Possibility: create values which are unique within the graph, but are ignored when comparing 129 nodes from different graphs. 130 I'm not sure whether plainly erasing the information won't be enough, for a weaker form of alpha-equivalence? 131 * Alpha-equivalence can't easily be implemented atop equal?, dropping it for now. 132 It's not extremely useful anyway, as thanks to the full-blown graph representation, we are likely to 133 discard names early on (keeping them just for printing / debuggin purposes). 134 135 136 137 138 139 140 * Coloring: the existing graph library for Racket has some coloring algorithms: 141 http://docs.racket-lang.org/graph/index.html#%28def._%28%28lib._graph%2Fmain..rkt%29._coloring%2Fgreedy%29%29 142 Maybe we can build a wrapper for those?