Eli   Documents

General Information

 o Eli: Translator Construction Made Easy
 o Global Index
 o Frequently Asked Questions
 o Typical Eli Usage Errors

Tutorials

 o Quick Reference Card
 o Guide For new Eli Users
 o Release Notes of Eli
 o Tutorial on Name Analysis
 o Tutorial on Scope Graphs
 o Tutorial on Type Analysis
 o Typical Eli Usage Errors

Reference Manuals

 o User Interface
 o Eli products and parameters
 o LIDO Reference Manual
 o Typical Eli Usage Errors

Libraries

 o Eli library routines
 o Specification Module Library

Translation Tasks

 o Lexical analysis specification
 o Syntactic Analysis Manual
 o Computation in Trees

Tools

 o LIGA Control Language
 o Debugging Information for LIDO
 o Graphical ORder TOol

 o FunnelWeb User's Manual

 o Pattern-based Text Generator
 o Property Definition Language
 o Operator Identification Language
 o Tree Grammar Specification Language
 o Command Line Processing
 o COLA Options Reference Manual

 o Generating Unparsing Code

 o Monitoring a Processor's Execution

Administration

 o System Administration Guide

Mail Home

Solutions of common problems

Previous Chapter Next Chapter Table of Contents


Computing a Hash Value

This C module computes a 32-bit hash of the contents of specified memory. Every bit of the memory contents affects every bit of the hash. The probability that the same hash will be computed for different memory contents is very low.

The module is instantiated by

   $/Tech/Hash.specs

All entities exported by this module can be used in specifications of type .lido, .init, and .finl. They can also be used in C modules if the interface file hash.h is imported there.

The module exports the following entities:

ub1
A typedef identifier defining an unsigned byte. The memory over which a hash is to be computed is made up of a set of contiguous blocks of ub1 values.
ub4
A typedef identifier defining an unsigned 32-bit value. The hash computed is a single ub4 value.
ub4 hash(ub1 *block, size_t length, ub4 previous)
A function computing a hash value. The `block' argument is a pointer to a contiguous sequence of `length' unsigned bytes; `previous' is the hash computed from other contiguous sequences in the specified memory area. (If `block' is the first sequence of the area, `previous' should be `0'.) The result of hash is a hash of all of the contiguous sequences considered so far.

In the simplest case, we need to compute a hash for the contents of a single contiguous block of memory. For example, here is a call to compute a hash of a single string pointed to by `str':

hash(str, strlen(str), 0)

A more complex situation is when there are several related areas of memory, and we need to compute a single hash for all of them. Suppose that there was a pair of strings, pointed to by `str1' and `str2', which constituted a conceptual unit. Here are two ways to compute a single hash for the pair:

hash(str1, strlen(str1), hash(str2, strlen(str2), 0))
hash(str2, strlen(str2), hash(str1, strlen(str1), 0))
Either of these sequences would be perfectly satisfactory, but the resulting values would differ.

When computing the hash of an array or structure, it is important to realize that there may be padding with unknown content involved. Consider the following variable declaration:

struct{ char c; int i; } foo, bar;
On some machines, the compiler may insert three bytes of padding between the end of field `c' and the beginning of field `i'. There is no guarantee that these three bytes will be initialized in a particular way. Thus `hash(foo, sizeof(foo), 0)' and `hash(bar, sizeof(bar), 0)' may not yield the same result when `foo' and `bar' have identical field values; the hash also depends on the contents of the padding. Unless you know that there is no padding present, or that padding is always intialized in the same way, the only safe approach is to hash the fields as separate memory areas:

hash (foo.c, sizeof(foo.c), hash(foo.i, sizeof(foo.i), 0))


Previous Chapter Next Chapter Table of Contents