Eli System Administration Guide
Odin is designed according to a client/server architecture. The server is responsible for maintaining the cache, and the client is responsible for interacting with the user and executing those tools not built into Odin. Normally the client and the server run as separate processes, possibly on different machines, communicating via network sockets.
When you give an
The client connects to the server via the socket. It continues to interact with you, accepting input and delivering reports. Whenever you make a request, the client packages that request and forwards it to the server. If the server needs to invoke a tool that is not built into Odin, then it sends an appropriate command line to the client to be executed.
Each client may have a colon-separated list of machines as the value of its
When the client terminates, either because you have responded to a prompt with C-d or because the end of the input file has been reached in a batch run, it informs the server of this termination.
A server is associated with a specific cache. The first client started on that cache creates the server by forking (see Client execution). Subsequent clients started on that cache do not fork new servers, but connect to the existing server via the socket already established for the cache. When all clients connected to the server have terminated, the server itself terminates and closes the socket.
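The server lifetime described above can be modeled in a few lines. This is only an illustrative sketch, not Odin's implementation: the class names `CacheServer` and `ServerRegistry`, and the use of an in-memory dictionary in place of real processes and sockets, are assumptions made for the example.

```python
class CacheServer:
    """Toy model of the server for one cache (hypothetical name)."""

    def __init__(self, cache_dir):
        self.cache_dir = cache_dir
        self.clients = set()
        self.running = True

    def connect(self, client_id):
        if not self.running:
            raise RuntimeError("server has already terminated")
        self.clients.add(client_id)

    def disconnect(self, client_id):
        self.clients.discard(client_id)
        # When the last connected client terminates, the server terminates too.
        if not self.clients:
            self.running = False


class ServerRegistry:
    """Stands in for 'is there already a server for this cache?' (assumption)."""

    def __init__(self):
        self.servers = {}

    def start_client(self, cache_dir, client_id):
        server = self.servers.get(cache_dir)
        if server is None or not server.running:
            # First client on this cache: create a new server.
            server = CacheServer(cache_dir)
            self.servers[cache_dir] = server
        # Every client, first or not, connects to the cache's single server.
        server.connect(client_id)
        return server
```

In this model, two clients started on the same cache share one server object, and a client started after that server has terminated gets a fresh one.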
The server for a cache receives all requests involving that cache, and is responsible for ensuring that requests involving specific files are sequenced so that all requests are effectively atomic.
Tools built into Odin are also executed by the server. If a client request requires execution of an external tool, the server builds a command line invoking the tool and passes it to the first client that is not currently executing such a command line. Thus the server may involve a client in running tools necessary to satisfy requests from other clients.
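The selection rule for external tools can be sketched as follows. The function name `pick_client` and the representation of clients as strings are assumptions for illustration. The text does not say what the server does when every client is busy, so this sketch simply returns `None` in that case.

```python
def pick_client(clients, busy):
    """Return the first client, in connection order, that is not
    currently executing a command line, or None if all are busy.

    clients: list of client identifiers in connection order (assumption)
    busy:    set of client identifiers currently running a command
    """
    for client in clients:
        if client not in busy:
            return client
    return None


# A request from "alice" may be served by running the tool on "bob"
# if "alice" is already executing a command line.
clients = ["alice", "bob"]
busy = {"alice"}
chosen = pick_client(clients, busy)
```

Here `chosen` is `"bob"`, which illustrates how one client can end up running a tool on behalf of another client's request.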
When a client is started on a cache (see Client execution), it checks whether the cache contains a file named `SOCKET'. `SOCKET' is normally a text file containing a single line giving the name of the machine on which the server is running and the port number to which it is listening (but see Socket implementation). If `SOCKET' is not present, then the client creates it and forks a new server for the cache.
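The lookup a client performs at startup might be sketched like this. The exact layout of the line in `SOCKET' is not given here, so the sketch assumes a machine name and a port number separated by whitespace. The function name `read_socket_file` is hypothetical, and the sketch covers only the text-file case.

```python
import os


def read_socket_file(cache_dir):
    """Return (host, port) from the cache's SOCKET file, or None if the
    file does not exist (meaning no server is running for this cache).

    Assumes a single line of the form "<host> <port>"; a line that does
    not match this layout raises ValueError.
    """
    path = os.path.join(cache_dir, "SOCKET")
    try:
        with open(path) as f:
            line = f.readline()
    except FileNotFoundError:
        return None
    host, port = line.split()
    return host, int(port)
```

A client following the description above would connect to the returned address when the result is not `None`, and otherwise create the file and fork a server.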
The client creates `SOCKET' by opening it with the
The simplest execution scenario involves a single person working on a single computer. There may be a number of caches, representing different projects, but normally the user will work with only one at a time. One client process and one server process will be running whenever the user is actually interacting with Odin.
A slightly more complex situation involves a team, all of whom work on the same machine. Again, the team may be working on several different projects so there may be several caches. More than one member of the team may, however, be working with a particular cache at the same time. In that case, there will be one server process associated with each active cache, and one client process associated with each active user. The server process will mediate all of the client requests on a particular cache to guarantee atomicity. If two clients' requests have products in common, each of those products will be built only once, because the server knows that a product that is already up to date need not be rebuilt.
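The "built only once" behavior can be illustrated with a deliberately simplified model. The class `BuildServer` and its fields are hypothetical, and the sketch ignores dependency tracking and staleness checks, which a real server would need: it only records which products have already been brought up to date.

```python
class BuildServer:
    """Toy model of one server mediating requests for a shared cache."""

    def __init__(self):
        self.up_to_date = set()   # products already built
        self.build_log = []       # (client, product) for every real build

    def request(self, client, products):
        # Requests are handled one at a time, so each is effectively atomic.
        for product in products:
            if product not in self.up_to_date:
                self.build_log.append((client, product))
                self.up_to_date.add(product)


server = BuildServer()
server.request("alice", ["x.o", "y.o"])
server.request("bob", ["y.o", "z.o"])   # y.o is shared with alice's request
built = [product for _, product in server.build_log]
```

After both requests, `built` contains `y.o` exactly once, even though both clients asked for it.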
Students working on individual projects, using a collection of networked
computers with a shared file system, can treat the situation as though they
were running on a single machine.
A small problem is that if they do not specify a cache explicitly with the
The most complex scenario is exemplified by students working in teams in a lab with multiple machines and a shared file system. Presumably each team member is sitting at a different machine. One student will start a client, resulting in a server running on that student's machine. Now a second student starts a client on a different machine, but referring to the project's common cache. At that point, one server and two clients are running. The server and one of the clients are both running on the first student's machine, and the other client is running on the second student's machine.
Having clients run on different machines improves performance.
Remember that the client is responsible for executing tools that are not
built into Odin, so running clients on different machines allows such tools
to operate in parallel.
Since much of the cost of a given derivation is the cost of running tools,
this is an important benefit.
Another way of executing tools on different machines is to use a
An Odin server makes very heavy use of cache files, many of which are quite small. Experience has shown that there is a serious performance degradation when the server must access the cache over the network. Thus it is useful to run an Odin server on the computer where the files are actually stored. The educational computer laboratory again provides a good example: the shared file system is often stored on a central computer that serves the information to the workstations in the laboratory. One could start a session for the desired cache on the central computer, and simply leave that session open. Subsequent sessions for that cache, started on workstations, would use the server running on the central computer.
Odin supports two socket implementations:
unix internet (
The socket implementation is reflected in two cache files,
If the socket implementation is
`ENV' stores the values of important environment variables
(including that of
Interprocess communication failures in Eli can occur in two contexts:
communication between client and server, and
communication between a client and a child running a command.
A failure to communicate between client and server almost always results in
the error report
     make clean
     ODIN_LOCALIPC=1 make
Please report all instances in which setting
If you are able to build the system initially without setting
A failure to communicate between a client and a child running a command
usually results in the system simply locking up.
Lockups could also conceivably occur in the socket communication,
so it is important to try to eliminate that possibility.
If the parameter
If the lockups are intermittent, be certain to make several test runs with