G: definition and design

Detail is two things. One is the definition of a detail, which we've just seen. The other is this thing called G: the global function.

G is a logically universal, global, computational, complete, stateless namespace. None of these words has a solid definition and a couple are downright meaningless.

So there are two ways to understand G. One is to skip ahead to the computational spec, which is not terribly complex, but which may not mean much by itself. The other is to read this explanation, which defines what I mean by "namespace," "complete," "stateless," "computational," "global," and "logically universal."

Namespace

A namespace is a mapping from names to values. For example, the Web is a namespace. Names in the Web are URLs; values are documents.

G has no special types for names or values. A G name (or "gname," but we often just say "name") can be any detail. So can a G value ("gvalue").

The common syntax for a gvalue is just *(gname), where gname is any detail in the common syntax.

(The Web analogy doesn't work here: [] is not a link syntax, like a Web href. It's an evaluation syntax. In other words, in general, in a language using this syntax for G, you can replace the text *(gname) with the syntax for the actual gvalue, and the program will work the same.)

Stateless

Another way of saying G is a namespace is to say that it's a "pure function" from detail to detail. In some ways this is an obvious duality, but in some ways it adds information: the proper dual of "pure function" is not "namespace," but "stateless namespace."

In other words, G does not and cannot change.

There is also an analogy to filesystem-like storage models with immutable data objects - for example, source code control systems with static version files - but the analogy is less exact, because creation and delete operations in these systems are stateful.

(The Web analogy remains broken here. In HTTP terms, G deletes the layer of indirection between resource and entity. It's hard to imagine the Web if the document behind a URL could not change. However, this doesn't mean G cannot model data on a network, just that it has to do it differently.)

Complete

G is complete: every detail is a valid gname. There is no such thing as an invalid, missing, or undefined gname.

In fact, G contains all information. More precisely, it is the set of all true statements. Or at least, all the statements that everyone who uses G is willing to agree on.

For this reason, a gname-gvalue pair is sometimes called a "gstatement", or just "statement". The task of normalizing information into statements is homologous to normalization in database theory.

Obviously, no computer can store infinite information, or even be able to access it on demand (for example, many gnames refer to information that will be available only in the future). Therefore any real interpreter of G must have a concept of known and unknown statements. In other words, it must have state.

However, a stable application must not depend on this state. In general, an application that evaluates an unknown statement will return control to its interpreter - in other words, crash.

For example, testing whether a gname is known or querying over all known statements, both common and natural operations in database models, are extremely unkosher (and in general impossible to express) in Detail.

(So how can Detail replace a conventional DB? In general, the code winds up pretty much the same; we just have to go through more lawyering to specify it. For instance, a database in Detail is generally defined as a functionally derived view of a time series of events, ie transactions, where every point in time has a list of events at that time. So one might query whether the set of all statements defined by events between two points in time includes some gname; if the two points are zero and now, the result is equivalent to a conventional key existence test.)

G lives in an odd place between storage systems and pure functional languages. An interpreter cannot modify or even create a statement in G; when it transitions a name from unknown to known status, we say it "learns" the name's value. But where the interpreter has the authority to declare the truth of an arbitrary statement, or one that is arbitrary as long as it satisfies some integrity constraint, learning is the practical equivalent of a storage operation.

And an interpreter can unlearn a statement, too. What it can't do is unlearn it and then relearn the name with a different value. This would be equivalent to a modification operation, and of course violate the statelessness of G.

If this issue still seems fuzzy, it may just be too heavy for a good introduction. It will be cleared up when we state the formal definitions of interpreters and hosts.

Computational

Anyone who's typed a URL with "cgi-bin" in the path knows the concept of a computational namespace; as does anyone who has ever typed an expression on a Lisp command line.

Since G is a pure function, one obvious thing it can do is to use the gname or some part of it as the input to a program - or just evaluate the gname (or some part of it) as a program itself. Of course, these amount to the same thing.

We reserve part of the G namespace (to be precise, all names which are pairs whose heads are blank) as a definition of computation. Compute space is recursive: it can use any information in G, and is as one might expect Turing-complete.

Compute space is defined precisely in the next spec. Briefly, the basic tool of computation in G is a function called mix, which builds a new detail from an expression and an argument.

Expression mixing is a rather mechanical template expansion. It's no more or less powerful than lambda-calculus (the most common theoretical abstraction of computing). In pure mathematical terms it may be slightly more complex. However, because it does not use higher-order functions, it should be easier for unsophisticated programmers to follow. There is some evidence for this from the popularity of a variety of template languages in Web server programming. Mixing also has no fundamental concept of variables or scope, which may make program transformation somewhat easier - but this remains to be seen.

Global

G, like the Web, is global: a name has the same value on every host. Or at least if it doesn't, one of the hosts is corrupt.

There are three ways to assure global consistency. We can standardize a central library of static statements that all hosts know, either at installation or via secure upgrade. We can standardize a computational namespace that all hosts compute equivalently (when the computation halts with success). And we can divide a namespace into subspaces by the addresses of hosts, where each host is responsible for its own internal consistency.

These models correspond to the shape of the head of a gname. Gnames whose head is blank are (as we've seen) computational. Library constants have an integer head. And if the name head is a pair, it is a host address. (If the gname is not a pair: integer gnames are also in library space; the value of a blank gname is itself.)

The first design goal of library space is to attack the complexity of the computational space by moving as much functionality as possible into static definitions. In other words, we want to minimize the amount of normative text in the definition of G.

Experience shows us that language, subroutine, and protocol specifications written in English do not scale well. In particular, the number of Turing-complete languages in which multiple implementations have coexisted without interop bugs is certainly in the single digits and may just be zero. By restricting its use of English to a page of text in a semiformal grammar, G hopes to give it a shot.

For example, integer multiply is not defined in compute space. Compute space has English definitions of integer add and negate, and semiformal definitions which enable recursion and branching. This is enough to write an expression for integer multiply. We state this expression as the value of a simple (ie, versionless) name in library space.

Of course this expression is a horrible thing, at least in terms of computational complexity. In practice it is never actually evaluated, except perhaps in some kind of extreme test mode. Rather, an interpreter recognizes it by name and runs its own internal multiply code. Of course, since G is immutable, it does not need to track the dependency.

In other words, the library expression for multiply is just a specification, but a perfectly precise and stable one. It is easy to test against this sort of specification: just disable the optimization.

Of course, not all library expressions work this way. Some may just be useful bits of code. And some may be sort of in between. The interpreter, in other words, has freedom of implementation.

Likewise, not all library names are simple. Versioning, both with names and with numbers, is a necessity for anything much more complicated than integer multiply. The difference between this and the common meaning of "library versioning" is that, of course, use paths are not automatically updated. If you want a layer of indirection, you have to ask for it.

At a higher level, it's worth pondering the difference between the G library model and other approaches to public software. The library has aspects of an open source repository (like CPAN), a specification authority (like IETF), an assignment authority (like IANA), and a system distribution (like Debian). It (or any other stateless namespace) could also be used as a PKI root, an eternity service, etc.

Of course the library is not strictly necessary. We could keep all shared data in host address spaces, and simply depend on trust relationships between hosts. For example, this is how library upgrades have to work. But it would be a bit cumbersome to have a host address as part of the name of integer multiply.

Logically universal

Universal space is a part of compute space that is only rarely computable. A name in universal space is a universally qualified equation.

To be precise, universal space is the space of names of the form (_ $name $value). The value of (_ $name $value) is . (yin - ie, true) if and only if for every detail $argument, **(. $name $argument) (ie, the value whose name is the mix of $name and $argument) equals *(. $value $argument) (ie, the mix of $name and $argument).

Intuitively, universal space means that G includes the truth or falsehood of just about any generalization over G.

How does a G interpreter resolve names in universal space? Since there is no such thing as a perfect theorem prover, it can't do so perfectly. On the other hand, since real computers are finite, there is no such thing as a perfect interpreter of mix space, either - let alone the rest of G. Therefore any correctness specification for an interpreter of G has to include the possibility of computational failure, and it is no problem to throw in another partly unavailable subspace.

In practice, there are a number of ways we use universal space:

First, we can standardize a universal statement on all host address spaces, which constrains the set of statements a host can learn during its lifetime, to some subset that we canonize as correct. This will indirect through local data to define each host's own permanent integrity constraints, the permanent constraints define time-varying constraints, etc.

Second, in universal space we often find simple expressions that are very difficult to compute. For example, it is easy to define whether an expression is a valid interpreter of G, but very hard to write a procedure that actually produces a proof or its converse. (And even harder, of course, to write another procedure that proves that one correct.)

Essentially, this gives G a definition of trusted code. G can express simple universal specifications of high-level system software concepts: an interpreter, a compiler for a specified instruction set, a type signature checker, an OS-style kernel, etc, etc.

With more standardized universals, we can link these specifications into standard space, so that we might trust (for example) that any detail named *(good interpreter $iname) is a correct interpreter. Among other things, this gives Detail a structured way to extend its own trusted code base.

Of course, universals do not help us actually implement these system software components. They just specify some high-level interfaces which are simple and very stable. They cannot (in general) be used to formally verify a component, but they are likely to be helpful in testing, especially interoperability testing.

[Ed.: it is hard to talk about universals without sounding fuzzy, and it will continue to be hard until I have a good set of examples. More to come here.]