|
|
|
||||
![]() |
|||||||||||||||||||||||||||||||||
Dance/multitude concurrent computation This invention computes by constructing a lattice of states. Every lattice of states corresponding to correct execution satisfies the temporal logic formula comprising a Definitive Axiomatic Notation for Concurrent Execution (DANCE) program. This invention integrates into a state-lattice computational model: a polymorphic, strong type system; visibility-limiting domains; first-order assertions; and logic for proving a program correctness. This invention includes special hardware means for elimination of cache coherency among other things, necessary to construct state-lattices concurrently. The model of computation for the DANCE language consisting of four, interrelated, logical systems describing state-lattices, types, domains, and assertions. A fifth logical system interrelates the other four systems allowing proofs of program correctness. The method of the present invention teaches programs as temporal formulas satisfied by lattices of states corresponding to correct execution. State-lattices that are short and bushy allow application of many processors simultaneously thus reducing execution time.
Primary Examiner: Harrell; Robert B. Attorney, Agent or Firm: Haugen and Nikolai, P.A. 1. A method of operating a plurality of computing means comprising the steps of: (a) transforming a temporal logic formula into machine executable form; (b) loading the machine executable form for the temporal logic formula and data to be manipulated into said plurality of computing means; and (c) constructing a lattice to satisfy said temporal logic formula, the lattice having a plurality of nodes, each node representing a state value that is a value of a program variable at a given moment, the plurality of nodes being interconnected by directed arcs, each directed arc representing a change in state affected by a computation associated with that arc. 2. The method of claim 1 wherein the state values are elements of strong types, including either constructed types or types defined using existential and bounded universal quantification. 3. The method of claim 1 wherein visibility of said program variables in a particular state is restricted to domains. 4. The method of claim 1 in which the state values may occupy computing means located in disparate geographical locations utilizing remote procedure call semantics. 5. The method of claim 1 and further including the step of proving correctness of the lattice construction automatically. 6. The method of claim 1 and further including the step of proving correctness of the lattice construction manually. 7. A computing system comprising: (a) a plurality of computing means operatively coupled by an interconnection network means for communicating with one another, the network means including means for performing global memory access routing, request combining and response decombining; (b) each of the computing means having an instruction set including combinable operations and write back capability; (c) the computing means further having a virtual memory address translation mechanism for enforcing access constraints for segments or pages of memory, said constraints being selected from the group consisting of: (i) memory shared, but no caching allowed, (ii) memory protected and temporary exclusive caching with processor indicated write back, (iii) discretionary memory access controls with exclusive demanded authorization user access rights, (iv) mandatory memory access controls in which a user may read values at an assigned security level or lower, write values at said assigned security level or higher, and only invoke combinable operations on values at the assigned security level. BACKGROUND OF THE INVENTION 1. Field of the Invention This invention relates to a method for the execution of plural computer programs being executed by a multiplicity of processors in a parallel configuration, and more specifically to performing multiple state transitions simultaneously. 2. Discussion of the Prior Art New Parallel Language Needed I surveyed ways to program parallel machines. Despite contentions to the contrary, I do not consider SIMD (single-instruction, multiple-datastream) machines like the Connection Machine (CM-2) to be parallel processors. SIMD machines are restricted to performing the same operation in many cells and thus can perform only a narrow range of applications. Most applications require much data-dependent branching that degrades performance on SIMD machines, such as the Cray YMP-8 and CM-2 alike. FORTRAN, with a vectorizing compiler, is appropriate for SIMD machines. Every attempt to wring efficient, massive parallelism from conventional FORTRAN programs has failed. Only programs expressed with a language embodying an inherently parallel model of computation can execute efficiently in parallel. The challenge to those wishing to exact optimum performance from highly parallel processors is to coordinate activities in processing nodes that are doing different things. Modifications to special-purpose languages like LISP and PROLOG to incorporate futures and guards, respectively, are problematic, addressing restricted application domains of largely academic interest. Then there are the many extensions of sequential languages. Such extensions fall into two classes: "processes" and "doall". Ada is perhaps the best example of a process-based language. Each Ada task (i.e. process) represents a single thread of control that cooperates with other tasks through message passing. Communicating Sequential Processes described in "Communicating Sequential Processes," C. A. R. Hoare, Communications of the ACM, Vol. 21, No. 8, August 1978 forms the underlying model of computation for message-passing, process-based languages. However, deciding whether an arbitrary collection of processes is deadlock-free is undecidable and proving that a particular collection of processes is deadlock-free is onerous. Furthermore, such pairwise communication and synchronization is unnecessarily restrictive thus limiting the degree of parallelism. Finally, a programmer writes (essentially) a plurality of sequential programs, rather than a single, parallel program. The "doall" is just a FORTRAN do-loop in which all the iterations are instead performed concurrently; again a collection of sequential programs. Current, Conventional, Concurrent Computing Traditional, sequential computing causes a sequence of states to be created so that the last state holds the data desired. FIG. 1 illustrates sequential computing starting at an initial state (101), proceeding through intermediate states (103), and terminating in a final state (102). The values of program variables at a particular instant of time comprise a state, depicted as circles. In state diagrams, time always flows down the page, that is, the state at the top of the figure occurs first followed by the state(s) connected by a line. Broken lines indicate repetition or "more of the same." In "parallel" execution several sequences of state transitions occur concurrently. FIG. 2 depicts parallel execution in doall style. The state diagram shows state transitions occurring as "parallel" sequences. The conception of parallel processing as a collection of sequential executions characterizes the prior art that this invention supersedes. Two mechanisms allow communication between sequential processes: message-passing or shared-memory. Load balancing and problem partitioning have produced many academic papers for either message-passing or shared-memory paradigms--nearly all yielding poor performance. Mutual exclusion (MutEx) via message passing (i.e. for distributed systems, no shared state) is horribly inefficient. If the application needs MutEx often, performance will suffer. "An Algorithm for Mutual Exclusion in Computer Networks," Glenn Ricart & Ashok K. Agrawala, Communications of the ACM, Vol. 24, No. 1, January 1981, for a discussion of prior art O(N) request/reply algorithm, which really needs O(N.sup.2) messages when everyone wants the lock. "A.sqroot.N Algorithm for Mutual Exclusion in Decentralized Systems," Mamoru Maekawa, ACM Transactions on Computer Systems, Vol. 3, No. 2, May 1985, reduces messages to O(.check mark.N) with an elegant multidimensional geometry. Sanders' generalized MutEx algorithm in more convoluted terms in her article, "The Information Structure of Distributed Mutual Exclusion Algorithms," Beverly A. Sanders, ACM Transactions on Computer Systems, Vol. 5, No. 1, August 1987. She considers deadlock freedom (absence of possibility that computation ceases because every process is waiting to acquire a lock held by a different process), but the basic model which she proposes needs further augmentation to detect and recover from deadlock. Of course, only a wordy explanation substitutes for correctness proof. Therefore the desired interference freedom provided by MutEx is not achieved. Raymond in "A Tree-Based Algorithm for Distributed Mutual Exclusion," ACM Transactions on Computer Systems, Vol. 7, No. 1, February 1989, further reduced the number of messages to O(logN) by increasing the latency, which could easily become a significant burden if processes need the lock briefly. Controlling access to shared variables is so difficult that designers have resorted to the message-passing model in which all state variables are local to some process and processes interact exclusively, via exchange of messages. Inevitably, two different kinds of system-wide operations barrier synchronization and remote state acquisition are required by the algorithm or application. They are notoriously inefficient. Barrier synchronization ensures that all the parallel computations in one phase are completed before the next phase is begun. Remote state acquisition occurs when the application's data cannot be partitioned disjointly so that all processes use values exclusively from its partition held in local memory. For example, advanced magnetic resonance imaging (MRI) research seeks to analyze brain scans and call to the attention of the radiologist any anomalies. Three-dimensional fast Fourier transform (FFT) is necessary for filtering and extraction of interesting features. The data consists of a 1 k by 1 k by 1 k array of floating point intensities. The first phase of the algorithm maps nicely by putting a 1 k by 1 k plane in each of 1 k processors. However integrating the third dimension finds the data in the worst possible distribution--every computation needs values that are possessed by each of the 1 k processors| Barriers are an artifact of the popular yet problematic process paradigm. A static collection of sequential processes alternately work independently and wait for laggards. FIG. 3 depicts four processes (201) synchronized with barriers (204). Each process is actively computing (202) and then wait idly (203) until all other processes reach the barrier. The best known algorithms for barrier-synchronization take O(logP) time (with a large constant) for both message-passing and shared-memory models. This corresponds with the depth of a tree using either messages or flags. A large body of academic work has been produced for both message-passing and parallel, random-access machine (PRAM) models, i.e. shared memory. Barrier synchronization in a message-passing model requires O(logP) time where P is number of processors, with a big constant proportional to message transmission latency. See "Algorithms for Scalable Synchronization on Shared-Memory Multiprocessors," John M. Mellor-Crummey and Michael L. Scott, ACM Transactions on Computer Systems, Vol. 9, No. 1, February 1991. Consider an ideally parallelizable application that requires synchronization. A single processor would require N steps (time, T.sub.8) to perform the application with L iterations of a single loop each requiring N/L steps. When parallelized, barrier synchronization is required at the end of each iteration taking logP steps. Real systems take many steps for each level of the synchronization tree. So the parallel time for each iteration, T.sub.i is N/(L*P)+logP and the whole application T.sub.p, is T.sub.i *L or N/P+L*logP steps. Speedup, S, is the ratio of sequential time to parallel time, T.sub.s /T.sub.p. Efficiency, E, is the speedup obtained divided by the number of processors used, S/P. In this case, E=T.sub.s /T.sub.p *P=N/(N+P*L*logP).
______________________________________
Total work to be done
T.sub.s = N steps
(sequential)
Number of loop iterations
L
Sequential time per iteration
N/L
Parallel time per iteration
T.sub.j = N/(L*P) + logP
Total parallel time
T.sub.p = T.sub.i *L = N/P + L*logP
Speedup S = T.sub.s /T.sub.p = N*P/(N + P*L*logP)
S = P/(1 + (L/N)*P*logP)
Efficiency E = S/P = N/(N + P*L*logP)
E = 1/(1 + (L/N)*P*logP) - O(1/logP)
______________________________________
When the number of processors is small, N>>P*L*logP, the efficiency can be quite good, close to 1. But as more processors are used (N.apprxeq.P) the barrier synchronization overhead begins to dominate even when only a single barrier is needed. Therefore, an application needing barrier synchronization on any conceivable message-passing architecture (N.apprxeq.P), the minimum time of execution will be O(logP), the best speedup possible is O(P/logP), the efficiency can be no better than O(1/logP). Little wonder that efficiencies of even 10% are hard to achieve on parallel processors like CM-2, CM-5, NCube, Paragon, and T3D. They can perform no better than their theoretical model.
______________________________________
Order parallel time T.sub.p = O(logP)
Order speedup S = O(P/logP)
Order efficiency E = O(1/logP)
______________________________________
Parallel Random-Access Machines (PRAMs) suffer similar fate. When most people think of "shared-memory" they imagine a PRAM--a bunch of sequential processors that can read and write the same memory locations. Many PRAM variations exist. Whether the memory locations can be accessed concurrently or exclusively and what happens for concurrent accesses constitute most of the PRAM model differences. Even unrealistic operations like storing the median or geometric mean of all concurrent writes still inflicts an O(logP) penalty for barrier synchronization same as message-passing. Sequential algorithms have been traditionally analyzed using Turing machines. Both the time (number of steps) and the space (number of tape squares) are used to measure the complexity of a computation. Additionally whether the Turing machine has a unique next action (deterministic) or a set of possible next actions (nondeterministic) substantially affects the complexity measure of a computation. The following paragraphs summarize three chapters of a recent book on parallel algorithms entitled "Synthesis of Parallel Algorithms", John H. Reif, editor, Morgan Kaufmann, 1993, that I believe is state-of-the-art computer science theory for shared-memory computers. The following sections quote heavily those authors (Faith E. Fich, Raymond Greenlaw, and Phillip B. Gibbons) and will be indented to indicate quotation. PRAMs attempt to model the computational behavior of a shared-memory parallel processor. The alternative is shared-nothing, message-passing parallel processors, which are popular these days. PRAMs have many variations and have a large body of published scholarship. Fich, Greenlaw, and Gibbons cover the main kinds of PRAMs, which I deem represent the conventional wisdom about how to think about parallel computation. This paradigm has, so far, not yielded efficient, general-purpose, parallel processing. Thus the motive for the present invention. Fich's chapter begins with a concise definition of a "synchronous" PRAM. "The PRAM is a synchronous model of parallel computation in which processors communicate via shared memory. It consists of m shared memory cells M.sub.1 . . . , M.sub.m and p processors P.sub.1, . . . , P.sub.p. Each processor is a random access machine (RAM) with a private local memory. During every step of a computation a processor may read from one shared memory cell perform a local operation, and write to one memory cell. Reads, local operations, and writes are viewed as occurring during three separate phases. This simplifies analysis and only changes the running time of an algorithm by a constant factor." (Underline mine.) Throughout Fich's paper she assumes a synchronous PRAM: single-instruction, multiple-datastream (SIMD). This allows her to ignore synchronization costs and hassles in her analyses. The synchronous PRAM cannot model MIMD machines with their independent, data-dependent branching is and uncertain timing. Having all processors read-execute-write each step sidesteps sticky issues like what happens when a memory cell is read and written simultaneously. Furthermore the execute part of a step can perform any calculation whatsoever| So the lower bounds derived for various algorithms concentrate solely on shared-memory accesses while totally ignoring the real computation--data transformation in the execute part of the step. Fich describes "forking" PRAM in which new processors are created with a fork operation (as in Unix). A forking PRAM can dynamically describe parallel activities at run-time. (Regular PRAMs have a fixed set of processes/processors.) But a forking PRAM still is a collection of sequential execution streams, not a single, parallel program. "A forking PRAM is a PRAM in which a new processor is created when an existing processor executes a fork operation. In addition to creating the new processor, this operation specifies the task that the new processor is to perform (starting at the next time step)." After defining a forking PRAM Fich never mentions it again. But she does make claims for PRAMs about "programmer's view," "ignoring lower level architectural details such as . . . synchronization," and "performance of algorithms on the PRAM can be a good predictor of their relative performance on real machines," that I completely disagree with. Since PRAMs exemplify the process paradigm (collection of sequential executions) they may adequately predict performance of algorithms and architectures similarly afflicted (all machines built so far), but certainly not what's possible. Most distinctions between PRAMs involve memory access restrictions. The most restrictive is called exclusive-read exclusive-write (EREW). Ensuring that two processors don't write to the same memory cell simultaneously is trivial with a synchronous PRAM since all operations are under central control. The other four memory access restrictions are: concurrent-read exclusive write (CREW), COMMON, ARBITRARY, and PRIORITY. The last three restrictions differ in how concurrent writes are handled. With COMMON, values written at the same time to the same cell must have the same value. With ARBITRARY, any value written in the same step may be stored in the memory cell. With PRIORITY, the value written by the highest priority process (lowest process number) is stored in the memory cell. Obviously an EREW program would run just fine on a CREW machine; the concurrent read hardware would not be used. The five memory restrictions Fich uses nest nicely in a power hierarchy: PRIORITY>ARBITRARY>COMMON>CREW>EREW Fich then spends much of her chapter showing how many steps it takes one PRAM to simulate a single step of another. Proving that EREW can simulate a step of PRIORITY using O(log p) steps and p*m memory cells takes Fich two pages of wordy prose. Essentially the simulation takes three phases: Phase 1: Each processor that wants to write marks its dedicated cell from the p EREW cells used for each PRIORITY cell. Phase 2: By evaluating (in parallel) a binary tree of all processors that want to write the highest priority processor is found. Phase 3: The winning processor writes the "real" memory cell. Since the height of the binary tree is O(log p) it takes O(log p) steps for an EREW PRAM to emulate a PRIORITY PRAM. Much of Fich's chapter is devoted to convoluted schemes by which one model simulates another without mentioning any applications that benefit from a more powerful and expensive model. Gibbons finally provides a reasonable PRAM model in the very last chapter of this 1000 page book. None of the other authors constrain themselves to reasonable models, thus limiting the value of their work. In Gibbons' PRAM model: "Existing MIMD machines are asynchronous, i.e., the processors are not constrained to operate in lock-step. Each processor can proceed through its program at its own speed, constrained by the progress of other processors only at explicit synchronization points. However in the (synchronous) PRAM model, all processors execute a PRAM program in lock-step with one another. Thus in order to safely execute a PRAM program on an asynchronous machine, there must be a synchronization point after each PRAM instruction. Synchronizing after each instruction is inherently inefficient since the ability of the machine to run asynchronously is not fully exploited and there is a (potentially large) overhead in performing the synchronization. Therefore our first modification to the PRAM model will be to permit the processors to run asynchronously and then charge for any needed synchronization." (underline mine) Additionally shared memory accesses are charged more than local accesses: "To keep the model simple we will use a single parameter, d, to quantify the communication delay to memory. This parameter is intended to capture the ratio of the median time for global memory accesses to the median time for a local operation." I invented the Layered class of multistage interconnection networks ›U.S. Pat. No. 4,833,468, LAYERED NETWORK, Larson et al., May 23, 1989! specifically to keep d small. For the Multitude architecture (interconnected by a Layered network) the ratio of time for global memory accesses to local memory accesses is about 3 to 4. This compares with a ratio of local memory access time to cache hit time of 20 to 30. Gibbons' asynchronous PRAM model uses four types of instructions: global read, global write, local operation, and synchronization step. The cost measures are:
______________________________________
Instruction Cost
______________________________________
local operation 1
global read or write d
k global reads or writes
d + k - 1
synchronization barrier
B
______________________________________
The parameter B=B(p), the time to synchronize all the processors, is a nondecreasing function of the number of processors, p, used by the program. In the Asynchronous PRAM model, the parameters are assumed to obey the following constraints: 2.ltoreq.d.ltoreq.B.ltoreq.p. However a reasonable assumption for modeling most machines is that B(p).di-elect cons.O(d/log p) or B(p).di-elect cons.O(d/log p/log d). Here again is evidence of a logP performance hit for barrier synchronization. A synchronous EREW PRAM can be directly adapted to run on an asynchronous PRAM by inserting two synchronization barriers, one after the read phase and write phase of the synchronous PRAM time step. Thus a single step of a synchronous PRAM can be simulated in 2B+2d+1 time on an asynchronous PRAM. Therefore a synchronous EREW PRAM algorithm running in t time using p processors will take t(2B+2d+1) time on an asynchronous PRAM--a significant penalty. Gibbons shows how to maintain the O(Bt) running time but use only p/B processors by bunching the operations of B synchronous processors into a single asynchronous processor thus getting more work done for each barrier synchronization. Gibbons transforms an EREW algorithm for the venerable "all prefix sums" operation for an asynchronous PRAM yielding an O(B log n/log B) time algorithm which isn't too bad. Gibbons presents asynchronous versions of descaling lemmas (performing the same application with fewer processors) which tack on a term for the barrier synchronizations. Still he uses many barriers in order to obtain behavior similar to synchronous PRAMs. Only when applications are synchronized just when they need it, and those needed synchronizations are really fast, B.apprxeq.d.apprxeq.1, will scaled-up parallel processing be possible, which is the contribution of this invention and Layered networks. Gibbons considers several algorithms that run in O(B log n/log B) time. In particular, he proves an interesting theorem about summing numbers: Given n numbers stored one per global memory location, and the following four types of instructions: L:=G, L:=L+L, G:=L and "barrier," where L is a local cell and G is a global cell, then the sum of n numbers on an Asynchronous PRAM with this instructions set requires .OMEGA.(B log n/log B) regardless of the number of processors. This is interesting because Multitude can sum n integers in O(1) time| Gibbons concludes with "subset synchronization" in which only a subset of processors in the machine need synchronize. Since the barriers involve fewer processors, and thus are shorter, summing numbers runs marginally faster. So the best PRAM model I could find charges PRAM algorithms for their "cheating," but still offers no insights about how to efficiently compute on an MIMD machine. Temporal Logic Moszkowski in his Executing Temporal Logic Programs ›Cambridge University Press, 1986! presented a temporal-logic programming language. His language, Tempura, allows description of parallel execution (unsatisfactorily) within a single program expression. The foundation for Tempura is interval temporal logic. A Tempura interval is a non-empty sequence of states. A state is determined by the values of program variables at a particular instant of time. The sequence of states that form an interval provide a discrete notion of time. The semantics of Tempura were much different than any I had previously encountered. Instead of programs being a step-by-step recipe for how to do the computation, they were logical formulas that are "satisfied" by constructing a "model" that makes the formula "true." Unfortunately the models constructed are simply sequences of states--the process model underlying PRAMs. The semantics of what is referred to herein as DANCE programs, although also temporal logic formula based, are satisfied by constructing models with a different and superior mathematical structure that allows efficient execution on a plurality of computational engines. It is this mathematical structure that forms the core of this invention. BACKGROUND--MATHEMATICAL FOUNDATIONS In this section of the specification, we explain the notation used throughout this document and some necessary elementary notions from mathematical logic. Although these notations are fairly standard, this presentation is adapted from Apt and Olderog's book ›Verification of Sequential and Concurrent Programs, Krzysztof R. Apt, and Ernst-Ruidiger Olderog, Springer-Verlag, 1991!. Sets It is assumed that the reader is familiar with the notion of a set, a collection of elements. Finite sets may be specified by enumerating their elements between curly brackets. For example {T, F} denotes the set consisting of the Boolean constants T (true) and F (false). When enumerating the elements of a set, I sometimes use ". . . " as a notation. For example, {1, . . . , n} denotes the set consisting of the natural numbers from 1 to n where the upper bound, n, is a natural number that is not further specified. More generally, sets are specified by referring to some property of their elements: {x.vertline.P} denotes the set consisting of all elements x that satisfy the property P. The bar, .vertline., can be read "such that." For example, {x.vertline.x is an integer and x is divisible by 2} denotes the infinite set of all even integers. We write a A to denote that a is an element of the set A, and b A to denote that b is not an element of A. Often it is convenient to refer to a given set when defining a new set. We write {x.ANG. A.vertline.P} as an abbreviation for {x.vertline.x.di-elect cons.A and P}. Some sets have standard names .o slashed. denotes the empty set, Z denotes the set of all integers, and N.sub.0 denotes the set of all natural numbers including zero. In a set one does not distinguish repetitions of elements. Thus {T, F} and {T, T, F} are the same set. Similarly, the order of elements is irrelevant. In general, two sets A and B are equal if-and-only-if the have the same elements; in symbols: A=B. Let A and B be sets. Then A.OR right.B (and B.OR left.A) denotes that A is a subset of B, A.andgate.B denotes the intersection of A and B, A.orgate.B denotes the union of A and B, and A-B denotes the set difference of A and B. Symbolically, A.OR right.B if a.di-elect cons.B for every a.di-elect cons.A, A.andgate.B={a.vertline.a.di-elect cons.A and a.di-elect cons.B}, A.orgate.B={a.vertline.a.di-elect cons.A or a.di-elect cons.B}, A-B={a.vertline.a.di-elect cons.A and a.epsilon slash.B}. Note that A=B if both A.OR right.B and B.OR right.A. A and B are disjoint if they have no element in common that is A.andgate.B=.o slashed.. The definitions of intersection and union can be generalized to more than two sets. Let A.sub.i be a set for every element i of some other set J. Then .andgate..sub.i.di-elect cons.J A.sub.i ={a.vertline.a.di-elect cons.A.sub.i for all i.di-elect cons.J} .andgate..sub.i.di-elect cons.J A.sub.i ={a.vertline.a.di-elect cons.A.sub.i for some i.di-elect cons.J} For a finite set A, card(A) denotes the cardinality, or the number of elements of A. For a non-empty, finite set A.OR right.Z, min(A) denotes the minimum of all integers in A. Tuples In sets the repetition of elements and their order is irrelevant. If ordering matters, we use ordered pairs and tuples. For elements a and b, not necessarily distinct, (a,b) is an ordered pair or simply pair. Then a and b are called components of (a,b). Two pairs (a,b) and (c,d) are identical (a,b)=(c,d), if-and-only-if a=c and b=d. More generally, let n be any natural number, n.di-elect cons.N.sub.0. Then if a1, . . . , an are any n elements then (a1, . . . , an) is an n-tuple. The element ai, where i {1, . . . , n} is called the i-th component of (a1, . . . , an). An n-tuple (a1, . . . , an) is equal to an m-tuple (b1, . . . , bm) if-and-only-if n=m and ai=bi for all i {1, . . ., n}. More generally, let n be any natural number, n.di-elect cons.N.sub.0. Then if a.sub.1, . . . , a.sub.n are any n elements, then (a.sub.1, . . . , a.sub.n) is an n-tuple. The element a.sub.i, where i.di-elect cons.{1, . . . , n} is called the i-th component of (a.sub.1, . . . , a.sub.n). An n-tuple (a.sub.1, . . . , a.sub.n) is equal to an m-tuple (b.sub.1, . . . , b.sub.m) if-and-only-if n=m and a.sub.i =b.sub.i for all i.di-elect cons.{1, . . . , n}. Note that 2-tuples are pairs. Additionally, a 0-tuple is written as (), and a 1-tuple as (a) for any element a. The Cartesian product A.times.B of sets A and B consists of all pairs (a,b) with a.di-elect cons.A and b.di-elect cons.B. The n-fold Cartesian product A.sub.1 x . . . xA.sub.n of sets A.sub.1, . . . , A.sub.n consists of all n-tuples (a.sub.1, . . . , a.sub.n) with a.sub.i .di-elect cons.A.sub.i for i.di-elect cons.{1, . . . , n}. If all A.sub.i are the same set A, then the n-fold Cartesian product Ax . . . xA is also written A.sup.n. Relations A binary relation R between sets A and B is a subset of the Cartesian product A.times.B, that is, R.OR right.A.times.B. If A=B then R is called a relation on A. For example, {(a,1),(b,2),(c,2)} is a binary relation between {a,b,c} and {1,2}. More generally, for any natural number n an n-ary relation R between A.sub.1, . . . , A.sub.n is a subset of the n-fold Cartesian product A.sub.1 x . . . xA.sub.n, that is, R.OR right.A.sub.1 x . . . xA.sub.n. Note that 2-ary relations are the same as binary relations. Instead of 1-ary and 3-ary relations one usually uses unary and ternary instead. Consider a relation R on a set A. R is called reflexive if (a,a).di-elect cons.R for all a.di-elect cons.A; it is called irreflexive if (a,a).epsilon slash.R for all a.di-elect cons.A. R is called symmetric if for all a,b.di-elect cons.A whenever (a,b).di-elect cons.R then also (b,a).di-elect cons.R; it is called antisymmetric if for all a,b.di-elect cons.A whenever (a,b).di-elect cons.R and (b,a).di-elect cons.R, then a=b. R is called transitive if for all a,b,c.ANG.A whenever (a,b).di-elect cons.R and (b,c).di-elect cons.R, then also (a,c).di-elect cons.R. The transitive, reflexive closure R* of a relation R on a set A is the smallest transitive and reflexive relation on A that contains R as a subset. The transitive, irreflexive closure R.sup.+ of a relation R on a set A is the smallest transitive and irreflexive relation on A that contains R as a subset. The transitive, irreflexive closure is necessary because the graphs considered later do not have self-loops. The "plus" superscript may not be standard. The relational composition R.sub.1 .smallcircle.R.sub.2 of relations R.sub.1 and R.sub.2 on a set A is defined as follows: R.sub.1 .smallcircle.R.sub.2 ={(a,c).vertline.there exists b.di-elect cons.A with (a,b).di-elect cons.R.sub.1 and (b,c).di-elect cons.R.sub.2 }. For any natural number n the n-fold relational composition R.sup.n of of a relation R on a set A is defined inductively as follows: R.sup.0 ={(a,a).vertline.a.di-elect cons.A}, R.sup.n+1 R=R.sub.n .smallcircle.R. Note that R*=.orgate..sub.n.di-elect cons.No R.sup.n and that R*=.orgate..sub.n.di-elect cons.N0-{0} R.sup.n =R*-R.sup.0 where R.sup.1 =R. Membership of pairs in a binary relation R is mostly written in infix notation, so instead of (a,b).di-elect cons.R one usually writes aRb. Any binary relation R.OR right.A.times.B has an inverse R.sup.-1 .OR right.B.times.A defined as: bR.sup.-1 a iff aRb. Functions Let A and B be sets. A function or mapping from A to B is a binary relation .function. between A and B with the following special property: for each element a.di-elect cons.A there is exactly one element b.di-elect cons.B with a.function.b. Mostly we use prefix notation for function application and write .function.(a)=b instead of a.function.b. For some functions, however, we use postfix notation and write a.function.=b. In both cases b is called the value of .function. applied to the argument a. To indicate that .function. is a function from A to B we write .function.:A.fwdarw.B. The set A is called the domain of .function. and the set B is the co-domain of .function.. Consider a function .function.:A.fwdarw.B and some set X.OR right.A. Then the restriction of .function. to X is denoted by .function.›X! and defined as the Intersection of .function. (which is a subset of A.times.B) with X.times.B: .function.›X!=.function..andgate.(X.times.B). We are sometimes interested in functions with special properties. A function .function.:A.fwdarw.B is called one-to-one or injective if .function.(a.sub.1)=.function.(a.sub.2) for any two distinct elements a.sub.1, a.sub.2 .di-elect cons.A; it is called onto or subjective if for every element b.di-elect cons.B there exits an element a.di-elect cons.A with .function.(a)=b; it is called bijective if it is both injective and surjective. Consider a function whose domain is a Cartesian product, say .function.:A.sub.1 x . . . xA.sub.n .fwdarw.B. Then it is customary to drop one pair of parentheses when applying .function. to an element (a.sub.1, . . . , a.sub.n).di-elect cons.A.sub.1 x . . . xA.sub.n. We also say .function. if an n-ary function. Consider a function whose domain and co-domain coincide, say .function.:A.fwdarw.A. An element a.di-elect cons.A is called a fixed point of .function. if .function.(a)=a. Sequences In the following let A be a set. A sequence of elements from A of length n.gtoreq.0 is a function .function.:{1, . . . , n}.fwdarw.A. We write a sequence .function. by listing the values of .function. without punctuation in the order of ascending arguments, that is, as a.sub.1 . . . a.sub.n where a.sub.1 =.function.(1), . . . , a.sub.n =.function.(n). Then a.sub.1, i.di-elect cons.{1, . . . , n} is referred to as the i-th element in the sequence a.sub.1 . . . a.sub.n. A finite sequence is a sequence of any length n.gtoreq.0. A sequence of length 0 is called the empty sequence and is denoted by .epsilon.. We also allow countably infinite sequences. An infinite sequence of elements from A is a function .xi.: N.sub.0 .fwdarw.A. To exhibit the general form of an infinite sequence .xi. we typically write .xi.:a.sub.0 a.sub.1 a.sub.2 . . . if a.sub.i =.xi.(i) for all i.di-elect cons.N.sub.0. Then i is called the index of the element a.sub.i. Consider now relations R.sub.1, R.sub.2, . . . on a set A. For any finite sequence a.sub.0 . . . a.sub.n of elements from A with a.sub.0 R.sub.1 a.sub.i, a.sub.1 R.sub.2 a.sub.2, . . . , a.sub.n-1 R.sub.n a.sub.n we write a finite chain a.sub.0 R.sub.1 a.sub.1 R.sub.2 a.sub.2 . . . a.sub.n-1 R.sub.n a.sub.n. For example, using the relations = and < on Z, we may write a.sub.0 =a.sub.1 We apply this notion also to infinite sequences. Thus for any infinite sequence a.sub.0 a.sub.1 a.sub.2 . . . of elements of A with a.sub.0 R.sub.1 a.sub.1, a.sub.1 R.sub.2 a.sub.2, a.sub.2 R.sub.3 a.sub.3, . . . we write an infinite chain a.sub.0 R.sub.1 a.sub.1 R.sub.2 a.sub.2 R.sub.3 a.sub.3 . . . Such a chain stabilizes if from some index n onwards all elements a.sub.i with i.gtoreq.n are identical a.sub.0 R.sub.1 a.sub.1 R.sub.2 a.sub.2 . . . a.sub.n-1 R.sub.n a.sub.n =a.sub.n+1 =a.sub.n+2 =. . . Apt and Olderog use chains to describe the computations of programs. This invention does not. Chains represent the sequential paradigm of computation that permeates computer science. It is the fundamental limiting factor for both the kind of programs Apt and Olderog can handle, and the efficiency with which those programs can be executed. Strings A set of symbols is often called an alphabet. A string over an alphabet A is a finite sequence of symbols from A. For example 1+2 is a string over the alphabet {1,2,+}. The syntactic objects considered in this treatise are strings. We shall introduce several classes of strings: expressions, assertions, temporal predicates, programs, and correctness formulas. We use .tbd. for the syntactic identity of strings. For example 1+2=1+2 but not 1+2=2+1. The symbol = will be used for the semantic equality of objects. Thus 1+2=2+1. The concatenation of strings s.sub.1 and s.sub.2 yields the string s.sub.1 s.sub.2 formed by first writing s.sub.1 and then s.sub.2 without intervening spaces. A string t is called a substring of a string s if there exist strings s.sub.1 and s.sub.2 such that s=s.sub.1 ts.sub.2. Since s.sub.1 and s.sub.2 may be empty, s is always a substring of itself. Partial Orders A partial order is a pair (A,) consisting of a set A and reflexive, antisymmetric, and transitive relation on A. If xy for some x,y.di-elect cons.A, we say that x is less than or equal to y or y is greater than or equal to x. Since is reflexive, xx. Sometimes we consider irreflexive partial orders. These are pairs (A, ) consisting of a set A and an irreflexive and transitive relation on A. Consider now a partial order (A, ). Let a.di-elect cons.A and X.OR right.A. Then a is called the least element of X if a.di-elect cons.X and ax for all x.di-elect cons.X. The element a is called an upper bound of X if xa for all x.di-elect cons.X. Note that upper bound of X need not be elements of X. Let U be the set of all upper bounds of X. Then a is called the least upper bound of X if a is the least element of U. Similarly for greatest lower bound. A partial order is called complete if A contains a least element and if for every ascending chain a.sub.0 a.sub.1 a.sub.2 . . . of elements from A the set {a.sub.0, a.sub.1, a.sub.2, . . . } has a least upper bound. Graphs A graph is a pair (V, E) where V is a finite set of vertices {v.sub.1, v.sub.2, . . . , v.sub.n } and E is a finite set of edges where each edge is a pair of vertices {(v.sub.1, v.sub.2), (v.sub.1, v.sub.3), . . . , (b.sub.i, v.sub.j)}. All graphs we consider are directed in that the order of vertices within the pair describing an edge is significant. The set of edges E forms a relation on the set of vertices: E.OR right.V.times.V. The transitive, irreflexive closure of E, namely E.sup.+, is especially important. In the temporal logic created for DANCE (i.e. this invention) E.sup.+ is a partial order that corresponds to the concept of precedes. A graph (V, E) is a lattice if E.sup.+ is an irreflexive partial order, it has a least element 1.di-elect cons.V, and an upper bound u.di-elect cons.V. All executions of DANCE programs will create a lattice of states. Lattices may be combined to form new lattices in two ways. Consider two lattices (V.sub.1, E.sub.1) and (V.sub.2, E.sub.2) that have no vertices in common, V.sub.1 .andgate.V.sub.2 =.o slashed., least elements l.sub.1 .di-elect cons.V.sub.1 and l.sub.2 .di-elect cons.V.sub.2, and upper bounds u.sub.1 .di-elect cons.V.sub.1 and u.sub.2 .di-elect cons.V.sub.2. Their sequential combination may be performed as follows: substitute u.sub.1 for l.sub.2 in V.sub.2 and E.sub.2 and form the union of the vertices and edges, (V.sub.1 .orgate.V.sub.2, E.sub.1 .orgate.E.sub.2). Their insertion combination may be performed as follows: choose an edge e.di-elect cons.E.sub.1, e=(v.sub.i, v.sub.j), remove it from E.sub.1, substitute v.sub.i for l.sub.2 and v.sub.j for u.sub.2 in V.sub.2 and E.sub.2 and form the union of the vertices and edges, (V.sub.1 .orgate.V.sub.2, E.sub.1 .orgate.E.sub.2). We assert, without proof, that sequential and insertion combination of lattices results in a lattice. Proofs A mathematical proof is a sequence of statements formulas or expressions, each of which is an axion of the logical system used, a given assumption, or a theorem derived by an inference rule from statements earlier in the sequence. The last theorem is usually the desired truth to be established such as the program meets its specifications and therefore is correct. We will exhibit proofs in the following form:
______________________________________
(1) theorem 1
›explanation why theorem 2 is derived from theorem 1!
(2) theorem 2
.
.
(n-1) theorem n-1
›explanation why theorem n is derived from theorems 1 to n-1!
(n) theorem n
______________________________________
Sometimes the parenthetical theorem numbers or obvious explanations may be omitted. The symbol iff will be used for if-and-only-if. The symbol .E-backward. will be used for there exists. The symbol .A-inverted. will be used for for all. Induction Proofs frequently rely on induction. Usually this involves proving some property P that has a natural number parameter, say n. To prove that P holds for all n.di-elect cons.N.sub.0, it suffices to proceed by induction on n, organizing the proof as follows: induction basis: Prove that P holds for n=0. Induction step: Prove that P holds for n+1 from the induction hypothesis that P holds for n. The induction principle for natural numbers is based on the fact that the natural numbers can be constructed by beginning with the number 0 and repeatedly adding 1. By allowing more general construction methods, one obtains the principle of structural induction, enabling the use of more than one case at the induction basis and the induction step. Grammars To facilitate parsing as well as defining attributes to be synthesized or inherited, the grammar for DANCE has been written in Larson normal form or LNF. LNF production rules: Non-terminals are enclosed in angle brackets: A production consists of a single non-terminal followed by ::= and then an expression in LNF describing what can be substituted for the non-terminal. Production expressions have have only three forms: 1) A single terminal. 2) A pair of non-terminals. 3) A single nonterminal. Literals are strings enclosed by quotation marks e.g. "MODULE"; the quote character itself is represented by """. For convenience, the language subset that is used for explaining this invention (simple DANCE) will use a more liberal notation. Productions within <> Literals within "" One or More .sup.+ Zero or more * Choice I Grouping ›! Zero or One () Data, States, and Meaning A model is a triple (T, S, ) containing a set of data types T, a set of domains containing states S, and an interpretation giving meaning to every symbol used in the language. For now, consider the data type domain, T, to be the integers. A state is a function mapping variables to values in T. For n state s in S and a variable X, let s ››X!! denote X's value in state s. Each k-place function symbol f has an interpretation s››f!! which is a function mapping k elements in T to a single value: s››f!! ›(T.sup.k .fwdarw.T)! What this means is that every function symbol (like "plus") has a "meaning" (like ››plus!!) that is a mapping from zero or more numbers (in this case two) to a single number. s››plus!! ›(T.sup.2 .fwdarw.T)! The line above doesn't say how "plus" works, just that it takes two numbers and returns one. Meanings aren't rigorously defined in this application for common arithmetic functions. Rather than develop axioms for number theory, the meanings of function symbols rely on intuition. Moszkowski uses temporal predicates to affect side effects (cause state change). This may cause confusion since for correctness proofs we need assertions made with first-order predicate calculus. Keep the distinction clear between the text that represents executable program (temporal predicates) and text used to assert some property of state variables (assertions). Interpretations of predicate symbols are similar but map to truth values: s››p!! ›(T.sup.k .fwdarw.{true, false}) Commands in the DANCE language are temporal predicates. Program execution tries to construct a model that makes all the temporal predicates "true." A model is a lattice or states that satisfies the temporal formula represented by program text. OBJECTS AND ADVANTAGES Several objects and advantages of my invention, the model of computation devised for the Definitive, Axiomatic Notation for Concurrent Execution (DANCE) computer programming language are: Execution of many parts of the same program simultaneously, such that, hundreds, thousands, or more processors may be efficiently used; program correctness may be proved; special data structures may be created that allow many elements within the data structure to be simultaneously accessed without locking, blocking, or semaphores; the meaning of every language construct is precisely and mathematically defined; and during program execution, processors may simultaneously self-schedule or post work for other processors to do. Expanding and explaining the objects and advantages: "Execution of many parts of the same program simultaneously" The fundamental concern of this invention is reducing the time of program execution by using may processors (i.e. speedup). Existing computer architectures and computational models underlying parallel programming languages encounter fundamental, theoretical limitations of speedup. I invented the "Layered" class of multistage interconnection networks to permit construction of parallel computers comprising thousands of processors connected with a Layered network to create a fundamentally different computer architecture I called "Multitude." Unfortunately, the models of computation embodied explicitly or implicitly in programming languages adapted for parallel computers (e.g. Ada) would not allow full exploitation of the Multitude architecture computer's power and efficiency. Therefore, I invented a new computational model that allows a programmer to describe, and the machine to utilize, maximal opportunities for simultaneous execution of many parts of the same program. I endeavored to create an embodiment, the DANCE language, whose form is familiar (patterned after Ada) but whose semantics (i.e. meaning, or computational model) are radically different. "hundreds, thousands, or more processors may be efficiently used" Techniques that allow a dozen or two processors to cooperate effectively generally do not work for hundreds or thousands of processors. That is, the techniques don't "scale." Efficiency is of utmost importance; keeping the processors busy (almost) always doing something useful economically employs expensive equipment. Machines that use 512 processors to deliver speedup around 25 have not been, and will not be, purchased by savvy, private-sector users. Speedup greater than one thousand for a broad range of applications requires all aspects of an application to scale from description in an inherently-parallel language down to the topology of the network physically interconnecting the processors. "program correctness may be proved" Poor quality software is endemic today. No other product's flaws are so tolerated by society. Buggy programs inevitably result from the "group hack" methodology: code-test-fix-test-fix-release-get bug reports from users-fix-test-release-get more bug reports-etc.-etc.-etc. Testing can only show the presence of errors, not their absence. Only when a proof that a program meets its specifications has been exhibited and confirmed can we be sure that a program does what we say it does and nothing more. Correctness proofs are even more important for parallel programs; debuggers that rely on a single locus of control to step through program executions cannot be extended for debugging of parallel programs. Furthermore, parallel programs must assure noininterference--one part of the execution doesn't adversely modify the values of variables needed by another part of the execution. This invention is precise, mathematical semantics for parallel programming languages which necessarily permits formal correctness proofs. "special data structures may be created that allow many elements within the data structure to be simultaneously accessed without locking, blocking, or semaphores" Achieving efficiency requires existing applications to be reprogrammed using data structures that can be accessed in parallel. Constructs that ensure exclusive use of some object by forcing other users to wait inevitably impede execution thus destroying efficiency. Many such constructs have been proposed or tried: critical sections, monitors, semaphores, locks, spin waiting, snoopy caches, message passing, etc. This invention incorporates semantics for combinable operations in which many or all of the processors can access the same variable at the same clock cycle yielding results as if the accesses occurred one-at-a-time. Provision by Layered networks of combinable operations for Multitude architecture computers is crucial for efficient program execution. Automatic conversion of existing, sequential programs for parallel execution is futile. Efficient parallel execution demands data structures, and their method of access, be radically different from their sequential counterparts. "the meaning of every language construct is precisely and mathematically defined" The semantics of most programming languages are defined in a "natural" language, usually English. Therefore a program compiled by different compilers for the same machine may not behave the same, though both compilers are "correct." In reality the compiler defines language semantics; compiler bugs are language features. Only when language and program semantics are precisely (formally, mathematically) defined can an implementation be deemed "correct." Furthermore, proof of program correctness demands formal language semantics. "during program execution, processors may simultaneously self-schedule or post work for other processors to do" When an idle processor can fetch its own work to do simultaneously with many other processors processors, can be kept busy doing useful work. Self-scheduling is essential for good efficiency. In addition anonymous processor scheduling greatly facilitates fault-tolerance; permanent failure of a processor, switch, or memory requires checkpoint/restart of the program and on-the-fly hardware reconfiguration to allow graceful degradation. Clever, but brittle, mapping schemes of applications onto architectures generally implode with the loss of a single processor or link. Additional objects and advantages such as elimination of cache coherency, concurrent memory allocation cache write-back and others will become apparent from the specification and drawings. DESCRIPTION OF DRAWINGS FIG. 1 is a diagram showing convention sequential execution and has one edge from every node except the final state and one edge to every node except the initial state; the final state is reachable from any state in the graph and any node is reachable from the initial state; FIG. 2 is a diagram showing convention parallel execution as several (or many) sequential executions; FIG. 3 is a diagram illustrating barrier synchronization which ensures that every process has completed a particular computation before any process may start its next computation; FIG. 4 represents a simple execution in which the values of b and a may be computed concurrently; FIG. 5 diagrammatically show sequential intervals. Sequential intervals may be represented with either text or a graph; FIG. 6 illustrates a simple state lattice in accordance with the invention. FIG. 7 diagrammatically illustrates an arbitrary interval. The interval, i, has multiple, concurrent computations; FIG. 8 represents a sequential composition in which two intervals are composed sequentially by fusing the "end" of one interval with the "start" of another; FIG. 9 illustrates a simple parallel composition in which two intervals may be composed in parallel by fusing their "start" states together and fusing their "end" states together; FIG. 10 illustrates universal quantification: Universal, temporal quantification creates many concurrent computations each with a different value for the quantified variable, in this case X; FIG. 11 illustrates diagrammatically an assignmentiIterval which functions to evaluate an expression using the values contained in its "start" node; the end node has the new value as the state of the assigned variable, in this case X; FIG. 12 represents a conventional concurrently accessible queue: Using fetchadd to determine the array index for either get or put allows many intervals to access the queue concurrently; FIG. 13 illustrates the inserting of records into a linked list: Two records can be inserted into the same place at the same time using a swap operation; FIG. 14 illustrates a conventional linked-list queue: Using swap to manipulate pointers allows concurrent access to a dynamically-sized queue; FIG. 15 is a graph of a single node showing an initial state condition. The "first" node contains the values for variables at the beginning of the program, before any computation occurs; FIG. 16 illustrates the initial state of procedure "sort". Because the actual parameter names (what the caller calls the variables) and the formal parameter names (used within a procedure) usually differ we show two nodes rather than just one. Since there is no state change the two nodes can be collapsed into one; FIG. 17 graphically shows an interval to satisfy "sort": Four subintervals are inserted into the lattice for "sort" to first partition the data (51), followed by sorts on the upper (54) and lower (57) halves, concurrent with storing the pivot (60); FIG. 18 illustrates an interval to satisfy "partition"; FIG. 19 shows concurrent intervals satisfying the alternative command within "partition"; FIGS. 20, 21 and 22 when arranged as shown in FIG. 22A illustrate as an examplee a lattice satisfying a temporal logic formula for sorting members of the Swedish Bikini Team; FIG. 23 shows schematically the Multitude architecture which connects processors with a layered network; FIG. 24 shows an exemplary processing node in a Multitude architecture computer; FIG. 25 represents diagrammatically a physical address comprised of node number and memory address; FIG. 26 illustrates the memory of each node partitioned into global, local, and spread; FIG. 27 shows a main memory word as comprising are 84 bits wide; FIG. 28 represents the translation of segmented address to an effective virtual address; FIG. 29 represents a translation of an effective virtual address to a physical address; FIG. 30 shows a free-space list for blocks of 2 n bytes; FIG. 31 shows the segment capability bit assignment; FIG. 32 illustrates page descriptor fields; FIG. 33 represents the static linking with link record; FIG. 34 shows a lattice of states satisfying a DANCE temporal formula; FIG. 35 illustrates hidden registers; FIG. 36 shows that MMU translates a segmented address to a physical address; FIG. 37 shows how a User Sergment Table defines Segmented-to-EVA translation; FIG. 38 illustrates TLB hit address translation; FIG. 39 illustrates TLB hit address translation with page descriptor; FIG. 40 illustrates that a cache line holding four words needs 335 bits; FIG. 41 shows that cache hits are determined by encoding virtual address to select cache lines, and then compare that user, segment, and offset match; FIG. 42 is a diagram showing nesting of domains in proof of noninterference; FIG. 43 shows count faces; and FIG. 44 is an instruction-set architecture table for Multitude architecture processors. It describes the functionality of every instruction and defines it in terms of temporal logic. SUMMARY Efficient, massively-parallel computing requires work scheduling that keeps every processor busy almost all the time. The dismal 3% to 5% efficiency found by the Army High-Performance Computing Research Center at the University of Minnesota (AHPCRC) across a broad range of applications results from inherent limitations in currently-available massively-parallel computers. This is not for lack of trying. A new class of multistage interconnection networks to provide rapid, voluminous communication between RISC processors is disclosed in my earlier U.S. Pat. No. 4,833,468 entitled "LAYERED NETWORK". The shared-memory, multiprocessor architecture based on my "Layered" class of networks has been dubbed "Multitude". Although Multitude nicely supports Ada ›ANSI/MIL-STD-1815A! tasking and rendezvous are both difficult to use and retain an inherent sequential paradigm. Therefore, I tried to define a new language, "Samantha", that would allow programmers to define applications without circumscribing opportunities for parallel execution. The objective of efficient computing with 1000s of processors has daunted computer architects for two decades. Achieving the objective requires major changes in the operation of virtually every aspect of hardware and software, with the important exception of fabrication technology. Sub-micron CMOS from any of a dozen silicon foundries will do fine. The trick to making a faster machine with slower parts is to buy more of those parts for less money, and then actually use a much bigger fraction of the parts doing something useful. Additionally, as fabrication technology improves, it can readily be incorporated without changing the architecture or the software. The sequential paradigm is so powerful that it is difficult to think otherwise. It is natural to think of parallel computing as a plurality of sequential execution streams. Sometimes they cooperate by exchanging messages; sometimes they cooperate by sharing state variables; sometimes they wait until all other processors have finished a phase before proceeding to the next phase. Sequential languages have been augmented with means to fork off nearly identical copies of themselves, with hints how to use the data in parallel, and with libraries of pre-compiled procedures for message passing or shared-memory access. In all of these cases the paradigm is still replicated sequential control flow. Virtually all of today's computer science theory deals with sequentiality or replicated sequentiality. Huge tomes of algorithms have been published and studied that are either sequential control flows or replicated sequential control flows. Regular expressions are recognized by finite-state machines; programs are parsed using a push-down automation coupled with a handle-recognizing finite-state machine. The compiler-construction tool, yacc, known and generally loathed by computer science students can only parse a subset of context-free grammars: LALR(1). The very definition of the LALR(1) class of languages has sequentiality built into it--those languages which can be parsed by push-down automata coupled with a handle-recognizing finite-state machines. The pull of sequentiality is so strong many people will not be able set it aside to consider a different way of computing. DESCRIPTION OF A PREFERRED EMBODIMENT OF THE INVENTION A mathematical model of computation is presented with the salient advantage that the computation may be performed by hundreds or thousands of microprocessors simultaneously and efficiently. Five mathematical systems interrelate in special ways. The five systems are: temporal logic, domains, type theory, assertions and proofs. Representing programs as special temporal logic formulae that are satisfied (executed) by lattices of state transitions form the core of this invention and provide the simultaneity and efficiency desired. Domains contain variable-value bindings that restrict visibility of variables allowing necessary proofs of non-interference, but without the locks, semaphores, or messages that hobble the prior art attempts. The type theory adapts an elegant, lambda-calculus-based, polymorphic, object-oriented type system for the temporal logic execution espoused here. Assertions in first-order predicate calculus define specifications and allowable intermediate states for programs. Since only proofs can assure program correctness, a proof system called BPL associates inference rules and/or weakest-precondition predicate transformers with each language production. A proof of soundness for BPL is presented; a logic is sound if it infers true fact from true facts. Computational Model A model of computation provides a mathematical foundation for the meaning of programs and their execution. Hopcraft and Ullman ›Introduction to Automata Theory, Languages, and Computation, Addison-Wesley, 1979! present a series of increasingly complex sequential machines and their models. Hopcraft and Ullman show correspondences between the machines and classes of languages recognized by those machines. State transitions are defined for an input character, the current state, and perhaps some internal data structure such as a stack. All the machines are sequential. Hoare ›"Communicating Sequential Processes," C. A. R. Hoare, Communications of the ACM, Vol. 21, No. 8, August 1978! expands the sequential model by allowing multiple sequential machines. Others use models such as dataflow (Petri nets), logic programming (Horn clauses), or functional (lambda calculus) without much greater utility. None of these models of computation adequately expresses parallel execution. Therefore I invented the state-lattice temporal logic model for DANCE. Temporal Logic Moszkowski in his Executing Temporal Logic Programs treats programs as temporal logic formulae. His language, "Tempura", allows description of traditional, process-based, parallel execution within a single program expression. The foundation for Tempura is interval temporal logic. A Tempura interval is a non-empty sequence of states. A state is determined by the values of program variables at a particular instant of time. The sequence of states that form an interval provide a discrete notion of time. The formal model for DANCE presented here differs significantly from the formal model for Tempura: States form a lattice not a sequence. The domains visible at a particular state are both restricted and explicit. Later a sophisticated, polymorphic type system will be substituted for the simple integer types presented here. Again later a first-order predicate calculus for describing sets of states very similar to those used by Apt/Dijkstra/Gries/Hoare for states in a guarded command language program is presented. A logical system for reasoning about the behavior of programs in terms of interval temporal logic lattices forms another significant, original contribution of this exposition. Each DANCE interval may be thought of as a non-empty lattice of states partially ordered by their occurrence in time. A state is the values of a set of variables at an instant in time. The set of variables visible by a particular state is its domain. An interval can have many intervening states between any pair of states. An interval may also be split but the subintervals must have a starting and ending state in common (i.e. every subinterval is itself a lattice, combinations of lattice must form lattices). Intervals are strung end-to-end by sequential composition invoked by the semicolons in the above program. Intervals are strung side-by-side by parallel composition invoked the ampersand. Consider the following DANCE program fragment: ##EQU1## First `a` is assigned the value ten. Then the values of `b` and `a` are computed in parallel. Finally `a` is assigned the sum of `a` and `b`. In this paper, interval lattices will be depicted as a set of circles (or stretched circles) connected by undirected edges. However those edges do have direction in that time flows down the page. FIG. 4 illustrates the lattice corresponding to the previous example. In state 301 both a and b have no value. State 302 has a assigned 10. States 303 and 304 are independently determined and may occur simultaneously. State 305 is the synchronization point for the independent computations of 303 and 304. Finally state 306 is computed. The heavy lines between states like 307 are state transitions. The actual computation occurs during state transitions like 307 and are shown as arcs between the nodes holding state. Please note that this model explicitly rejects the notion of some "universal" sequence of states comprised of states of individual processes in "some" sequential order. Such models although popular provide poor paradigms for programming parallel processors. Commands in the DANCE language are really temporal predicates. Program execution tries to construct a model that makes all the temporal predicates true. A model is a lattice of states that satisfies the temporal formula represented by program text. The use of the word "model" here refers to creation of a lattice of states that satisfies a particular temporal logic formula embodied in a DANCE program. The "model" of computation for DANCE refers to the general method of constructing lattices of states concurrently. Expressions and Predicates This section starts to use all the mathematical machinery in the "Background--Mathematical Foundations" section hereof to define the meaning of DANCE programs, which is the model of execution that forms the core of this invention. Obviously, one could merely change the grammar and symbols used in the DANCE language, but if the underlying meaning of such languages and programs is based on temporal logic formulae satisfied by lattices of states, it would not depart from the spirit and scope of the invention. Expressions in DANCE are built inductively as follows: Variable names or constants X, Y, Z, 5, 1509, 0 Functions: f(e.sub.1, . . . , e.sub.k), where f is a function symbol defined by M, k.gtoreq.0, and e.sub.1, . . . , e.sub.k are expressions. DANCE grammar for expressions: The following are expressions: X+Y (X+Y)/Z somefunction(X Y Z) Formulas (constructions of temporal predicates) are built inductively as follows: Predicates p(e.sub.1, . . . , e.sub.k), where p is a predicate symbol defined by M, k.gtoreq.0, and e.sub.1, . . . , e.sub.k are expressions. Like functions, we assume a basic set of relations complete with syntax. Logical connectives: w.sub.1 & w.sub.2 and w.sub.1, w.sub.2 where w.sub.1 and w.sub.2 are predicates. The ampersand corresponds to "and." The semicolon corresponds to "after." One more logical connective (bounded universal quantification) will be added in a following section. Parentheses may be used for grouping expressions. Predicates may be grouped with BEGIN-END. DANCE programs are temporal logic formulas. Execution of a program seeks to construct an interval (e.g. model) that makes the formula true. DANCE grammar for temporal logic formulas: The following are examples of formulas: someprocedure(X Y Z) dothis ; beforethat dothis & together BEGIN dothis ; beforethat END & someprocedure(X Y Z) Intervals From the states in S, intervals are constructed from S+, the set of all nonempty, finite lattices of states. A state is the values assigned to variables at a particular point in time. FIG. 4 depicts a lattice of states of which 305 is a state. For example, suppose the state variables are X and Y and states s, t, and u are: s=(X=1, Y=2) t=(X=0, Y=3) u=(X=0, Y=6) Sequential intervals can be represented as a string of state names enclosed in <>. Some examples are ::=<(X=1, Y=2)> FIG. 5 shows these sequential intervals in graph form. DANCE expands the definition of an interval to a finite lattice instead of a sequence of states. This provides the mathematical machinery upon which correct, efficient, parallel execution will be based. FIG. 6 depicts a simple lattice interval containing states s, t, u, v, w, and x. A textual representation could be: ##STR1## but notation good for sequential intervals becomes cumbersome for DANCE lattices so a graphical representation will be used instead. A lattice of states, L, is a graph whose nodes are states, S.sub.L, and arcs are state transitions, T.sub.L, that must have the following properties: An irreflexive partial order relation, .sub.L, such that a .sub.L b=.E-backward.n>0: .E-backward.t.sub.1, . . . , t.sub.n .di-elect cons.T.sub.L : ›t.sub.1 =(a,s.sub.1) and t.sub.n =(s.sub.n-1, b) and .A-inverted.1 A least upper bound (lub) and greatest lower bound (glb) functions are defined for any pair of states in S.sub.L as
______________________________________ lub(a,b)=c .congruent. a .sub.L c
and b .congruent..sub.L c and not .E-backward. d .di-elect cons. S.sub.L
: ›a .sub.L d and b .sub.L d and d .sub.L c!. glb(a,b)=c .congruent. c .sub.L
a and c .sub.L b and not .E-backward. d .di-elect cons. S.sub.L : ›d .sub.L
a and d .sub.L b and c .sub.L d! ______________________________________
| |||||||||||||||||||||||||||||||||