www

Unnamed repository; edit this file 'description' to name the repository.
Log | Files | Refs | README

features-and-implementation2 (8407B)


      1 On verification
      2 ===============
      3 
      4 * Some guarantees can be enforced statically using the type system.
      5   They are in a way proven to be enforeced in the expanded code,
      6   but in order to write verified compilers, it would be necessary to
      7   prove that the type indeed enforces the desired property, for all
      8   possible macro inputs.
      9   
     10   Advantage: these proofs need to be written only once for the graph macro,
     11   and can be relied on by all compilers written using these static guarantees.
     12 * Some guarantees can be enforced statically "by construction",
     13   and the design patterns used to enforce them are guaranteed to be
     14   correctly applied because the code is macro-generated.
     15   In order to write verified compilers, it would be necessary to
     16   prove that the macro-generated code always enforces the desired property.
     17 
     18   Advantage: these proofs need to be written only once for the graph macro,
     19   and can be relied on by all compilers written using these static guarantees.
     20 * Some guarantees can be enforced at run-time.
     21   This allows catching bugs early, and ensures no incorrect output can occur
     22   when these guarantees are broken. It does not however ensure that the compiler
     23   will never break these guarantees. It is therefore still possible to ship a
     24   broken compiler which fails on valid inputs.
     25 
     26   Drawback: it is necessary for every compiler written using such guards to prove
     27   that the guards are never raised.
     28 
     29 
     30 Features
     31 ========
     32 
     33 * GC:
     34   * Group nodes by SCC
     35   * Two kinds of "indices" pointers:
     36     * index within the current strongly-connected component
     37     * direct pointer to a descendant SCC + index within it
     38   * To limit the waste when there is an SCC not containing nodes of a given type,
     39     do not use a fixed tuple of vectors of nodes. Instead, use a "list with possible omissions".
     40     Problem: the size of the fully-expanded type in TR is O(2ⁿ) for these.
     41     Either:
     42     * use casts
     43     * pay the compile-time price of a huge type (not feasible with large N)
     44     * pay the run-time price of bloated components (feasible but costly with large N)
     45     * use a case→ function, which returns the empty vector as a fallback. With pure functions, this could be doable.
     46       * Problem: generating all the possible functions means O(2^n) templates for closures that may be used at
     47         run-time
     48       * Otherwise, generate one closure over 1 array, one closure over 2 arrays, one over 3 arrays, etc.
     49         and also close over some markers indicating to which "real" arrays these correspond.
     50         This means O(2n) space usage worst-case.
     51         Problem again: the construction function for such a case→ while minimizing the number of values closed on
     52         has a large O(2^n) type too.
     53     * typed clojure has typed heterogeneous maps https://github.com/clojure/core.typed/wiki/Types#heterogeneous-maps
     54   * In theory it could be possible to give hints to the GC,
     55     but Racket's GC does not offer that possibility.
     56 
     57 GC Implementation:
     58 * Must be optional
     59 * Initially, all nodes within a single component, all indices are of the "slef + index" kind.
     60 * Use tarjan's algorithm
     61 * group nodes by SCC
     62   Update references to nodes in the same SCC, using just an index
     63   Update references to nodes in "lower" SCCs, using a pointer to the SCC + an index.
     64 * Due to the higher memory cost (+ 1 reference for references to other SCCs),
     65   and the poor set of options for implementation (cast, exponential size of type, or large memory usage)
     66   we will not implement the GC.
     67   
     68   We argue that in most cases it would not be useful, as compilers will normally work on the whole graph, and
     69   not on a subgraph. In the rare case where only part of a graph is needed, applying the graph-level identity
     70   transformation to the nodes of interest would effectively copy them and the nodes reachable from these new roots,
     71   thereby allowing the original graph to be freed.
     72 
     73 * Invariants
     74   * Scope of the invariants:
     75     * Input contracts, types and structural properties
     76     * Output contracts, types and structural properties
     77     * Transformation invariants (relate the input and output graphs)
     78   * Time of verification:
     79     * Run-time
     80       Should be easy enough to implement: Add a define-graph-invariant form, recognize these options, and check
     81       them at run-time. Implemented as some sorts of contracts.
     82     * Compile-time
     83       * Node types
     84       * Statically-enforced structural invariants at the level of node types
     85         (e.g. disallow cycles in the types, to ensure there are no cycles in the instance)
     86       * Macros can be used to generate code which is known to be correct
     87         e.g. paths 
     88         * Concern: static "by construction" guarantees may interfere with each other, if they fill in some nodes,
     89           e.g. a "no cycles starting from this node" constraint would not work as expected if a "backwards link"
     90           is filled in afterwards. We probably need to hardcode a set of constraints which know about eachother
     91           and about the potential interactions.
     92       * Conserve well-scopedness within a transition: pass in nodes flagged with a ∀ type, and check
     93         that the output contains that flag.
     94       * PHOAS
     95   * Specification
     96     * Invariants specified in the graph type
     97     * Transformation invariants specified on the graph creation code
     98     * Checks (run-time or the various compile-time mechanisms) are specified in the graph creation code
     99       The graph creation code must enforce all invariants within the type
    100       That way, it is guaranteed that any instance of the graph type satisfies its invariants, either by
    101       construction, or as a guard at the end of the construction.
    102 
    103 * Automatic generation of mappings
    104   * when there is no mapping taking an old node as an input, a mapping is automatically generated.
    105     The mapping simply translates the old node to the new node type, and recursively transforms its fields,
    106     traversing lists etc.
    107   * When there is a mapping with more than one argument, then no mapping is auto-generated for that input node type,
    108     and instead calls to the mapping must be explicit (i.e. can't return the old node type).
    109   * This means that we have a mechanism before the actual core graph macro, which checks and decides which mappings
    110     to auto-generate.
    111   * We also have a mechanism for auto-calling transformations on input node types
    112   * Possibility to escape this, in order to actually insert a reference to the old graph?
    113     * Some notation in the type, to indicate that it's "protected" in some way?
    114     * Some wrapper indicating that it should merely be unwrapped, not processed? (I think that's better).
    115 
    116 * Structural node equality
    117   * Would be nice to coalesce nodes which are equal? (i.e. hash consing)
    118     so that any two nodes which are equal? within the same graph have the same index.
    119     I suppose this would be rather costly: O(n log n) comparisons, with each comparison potentially
    120     costly, in principle. If we cached results, we could achieve a better run-time in practice, and
    121     perhaps a better theoretical complexity if we handled cycles ourselves.
    122     * The general algorithm when there are no unordered sets is deterministic finite automaton minimization
    123     * The general algorithm when there are unordered sets is nondeterministic finite automaton minimization
    124     * We could cache all calls to equal? on two nodes for a limited dynamic scope.
    125     * If we have this, then we can, for comparisons within the same graph, quickly return #t or #f based on eq?
    126   * Alpha-equivalence comparisons
    127     (i.e. ignore the actual values of some fields, except for constraining the shape of the graph)
    128     Possibility: create values which are unique within the graph, but are ignored when comparing
    129                  nodes from different graphs.
    130     I'm not sure whether plainly erasing the information won't be enough, for a weaker form of alpha-equivalence?
    131     * Alpha-equivalence can't easily be implemented atop equal?, dropping it for now.
    132       It's not extremely useful anyway, as thanks to the full-blown graph representation, we are likely to
    133       discard names early on (keeping them just for printing / debuggin purposes).
    134 
    135 
    136 
    137 
    138 
    139 
    140 * Coloring: the existing graph library for Racket has some coloring algorithms:
    141   http://docs.racket-lang.org/graph/index.html#%28def._%28%28lib._graph%2Fmain..rkt%29._coloring%2Fgreedy%29%29
    142   Maybe we can build a wrapper for those?