features-and-implementation2 - www - Unnamed repository; edit this file 'description' to name the repository.

features-and-implementation2 (8407B)

1 On verification
2 ===============
3
4 * Some guarantees can be enforced statically using the type system.
5 They are in a way proven to be enforeced in the expanded code,
6 but in order to write verified compilers, it would be necessary to
7 prove that the type indeed enforces the desired property, for all
8 possible macro inputs.
9
10 Advantage: these proofs need to be written only once for the graph macro,
11 and can be relied on by all compilers written using these static guarantees.
12 * Some guarantees can be enforced statically "by construction",
13 and the design patterns used to enforce them are guaranteed to be
14 correctly applied because the code is macro-generated.
15 In order to write verified compilers, it would be necessary to
16 prove that the macro-generated code always enforces the desired property.
17
18 Advantage: these proofs need to be written only once for the graph macro,
19 and can be relied on by all compilers written using these static guarantees.
20 * Some guarantees can be enforced at run-time.
21 This allows catching bugs early, and ensures no incorrect output can occur
22 when these guarantees are broken. It does not however ensure that the compiler
23 will never break these guarantees. It is therefore still possible to ship a
24 broken compiler which fails on valid inputs.
25
26 Drawback: it is necessary for every compiler written using such guards to prove
27 that the guards are never raised.
28
29
30 Features
31 ========
32
33 * GC:
34 * Group nodes by SCC
35 * Two kinds of "indices" pointers:
36 * index within the current strongly-connected component
37 * direct pointer to a descendant SCC + index within it
38 * To limit the waste when there is an SCC not containing nodes of a given type,
39 do not use a fixed tuple of vectors of nodes. Instead, use a "list with possible omissions".
40 Problem: the size of the fully-expanded type in TR is O(2ⁿ) for these.
41 Either:
42 * use casts
43 * pay the compile-time price of a huge type (not feasible with large N)
44 * pay the run-time price of bloated components (feasible but costly with large N)
45 * use a case→ function, which returns the empty vector as a fallback. With pure functions, this could be doable.
46 * Problem: generating all the possible functions means O(2^n) templates for closures that may be used at
47 run-time
48 * Otherwise, generate one closure over 1 array, one closure over 2 arrays, one over 3 arrays, etc.
49 and also close over some markers indicating to which "real" arrays these correspond.
50 This means O(2n) space usage worst-case.
51 Problem again: the construction function for such a case→ while minimizing the number of values closed on
52 has a large O(2^n) type too.
53 * typed clojure has typed heterogeneous maps https://github.com/clojure/core.typed/wiki/Types#heterogeneous-maps
54 * In theory it could be possible to give hints to the GC,
55 but Racket's GC does not offer that possibility.
56
57 GC Implementation:
58 * Must be optional
59 * Initially, all nodes within a single component, all indices are of the "slef + index" kind.
60 * Use tarjan's algorithm
61 * group nodes by SCC
62 Update references to nodes in the same SCC, using just an index
63 Update references to nodes in "lower" SCCs, using a pointer to the SCC + an index.
64 * Due to the higher memory cost (+ 1 reference for references to other SCCs),
65 and the poor set of options for implementation (cast, exponential size of type, or large memory usage)
66 we will not implement the GC.
67
68 We argue that in most cases it would not be useful, as compilers will normally work on the whole graph, and
69 not on a subgraph. In the rare case where only part of a graph is needed, applying the graph-level identity
70 transformation to the nodes of interest would effectively copy them and the nodes reachable from these new roots,
71 thereby allowing the original graph to be freed.
72
73 * Invariants
74 * Scope of the invariants:
75 * Input contracts, types and structural properties
76 * Output contracts, types and structural properties
77 * Transformation invariants (relate the input and output graphs)
78 * Time of verification:
79 * Run-time
80 Should be easy enough to implement: Add a define-graph-invariant form, recognize these options, and check
81 them at run-time. Implemented as some sorts of contracts.
82 * Compile-time
83 * Node types
84 * Statically-enforced structural invariants at the level of node types
85 (e.g. disallow cycles in the types, to ensure there are no cycles in the instance)
86 * Macros can be used to generate code which is known to be correct
87 e.g. paths
88 * Concern: static "by construction" guarantees may interfere with each other, if they fill in some nodes,
89 e.g. a "no cycles starting from this node" constraint would not work as expected if a "backwards link"
90 is filled in afterwards. We probably need to hardcode a set of constraints which know about eachother
91 and about the potential interactions.
92 * Conserve well-scopedness within a transition: pass in nodes flagged with a ∀ type, and check
93 that the output contains that flag.
94 * PHOAS
95 * Specification
96 * Invariants specified in the graph type
97 * Transformation invariants specified on the graph creation code
98 * Checks (run-time or the various compile-time mechanisms) are specified in the graph creation code
99 The graph creation code must enforce all invariants within the type
100 That way, it is guaranteed that any instance of the graph type satisfies its invariants, either by
101 construction, or as a guard at the end of the construction.
102
103 * Automatic generation of mappings
104 * when there is no mapping taking an old node as an input, a mapping is automatically generated.
105 The mapping simply translates the old node to the new node type, and recursively transforms its fields,
106 traversing lists etc.
107 * When there is a mapping with more than one argument, then no mapping is auto-generated for that input node type,
108 and instead calls to the mapping must be explicit (i.e. can't return the old node type).
109 * This means that we have a mechanism before the actual core graph macro, which checks and decides which mappings
110 to auto-generate.
111 * We also have a mechanism for auto-calling transformations on input node types
112 * Possibility to escape this, in order to actually insert a reference to the old graph?
113 * Some notation in the type, to indicate that it's "protected" in some way?
114 * Some wrapper indicating that it should merely be unwrapped, not processed? (I think that's better).
115
116 * Structural node equality
117 * Would be nice to coalesce nodes which are equal? (i.e. hash consing)
118 so that any two nodes which are equal? within the same graph have the same index.
119 I suppose this would be rather costly: O(n log n) comparisons, with each comparison potentially
120 costly, in principle. If we cached results, we could achieve a better run-time in practice, and
121 perhaps a better theoretical complexity if we handled cycles ourselves.
122 * The general algorithm when there are no unordered sets is deterministic finite automaton minimization
123 * The general algorithm when there are unordered sets is nondeterministic finite automaton minimization
124 * We could cache all calls to equal? on two nodes for a limited dynamic scope.
125 * If we have this, then we can, for comparisons within the same graph, quickly return #t or #f based on eq?
126 * Alpha-equivalence comparisons
127 (i.e. ignore the actual values of some fields, except for constraining the shape of the graph)
128 Possibility: create values which are unique within the graph, but are ignored when comparing
129 nodes from different graphs.
130 I'm not sure whether plainly erasing the information won't be enough, for a weaker form of alpha-equivalence?
131 * Alpha-equivalence can't easily be implemented atop equal?, dropping it for now.
132 It's not extremely useful anyway, as thanks to the full-blown graph representation, we are likely to
133 discard names early on (keeping them just for printing / debuggin purposes).
134
135
136
137
138
139
140 * Coloring: the existing graph library for Racket has some coloring algorithms:
141 http://docs.racket-lang.org/graph/index.html#%28def._%28%28lib._graph%2Fmain..rkt%29._coloring%2Fgreedy%29%29
142 Maybe we can build a wrapper for those?

	www Unnamed repository; edit this file 'description' to name the repository.
	Log \| Files \| Refs \| README