In the Beginning Was Information
Authors: Werner Gitt
Tags: #RELIGION / Religion & Science, #SCIENCE / Study & Teaching
Theorem 5: Shannon’s definition of information exclusively concerns the statistical properties of sequences of symbols; meaning is completely ignored.
It follows that this concept of information is unsuitable for evaluating the information content of meaningful sequences of symbols. We now realize that an appreciable extension of Shannon’s information theory is required to significantly evaluate information and information processing in both living and inanimate systems. The concept of information and the five levels required for a complete description are illustrated in Figure 12. This diagram can be regarded as a nonverbal description of information. In the following greatly extended description and definition, where real information is concerned, Shannon’s theory is only useful for describing the statistical level (see chapter 5).
Figure 12: The concept of information and the five levels required for a complete description.
4.2 The Second Level of Information: Syntax
When considering the book B mentioned earlier, it is obvious that the letters do not appear in random sequences. Combinations like "the," "car," "father," etc. occur frequently, but we do not find other possible combinations like "xcy," "bkaln," or "dwust." In other words:
• Only certain combinations of letters are allowed (agreed-upon) English words. Other conceivable combinations do not belong to the language. It is also not a random process when words are arranged in sentences; the rules of grammar must be adhered to.
Both the construction of words and the arrangement of words in sentences to form information-bearing sequences of symbols are subject to quite specific rules based on deliberate conventions [9] for each and every language.
Definition 3: Syntax is meant to include all structural properties of the process of setting up information. At this second level, we are only concerned with the actual sets of symbols (codes) and the rules governing the way they are assembled into sequences (grammar and vocabulary) independent of any meaning they may or may not have.
Note: It has become clear that this level consists of two parts, namely:
A) Code: Selection of the set of symbols used.
B) The syntax proper: inter-relationships among the symbols.
A) The Code: The System of Symbols Used for Setting Up Information
A set of symbols is required for the representation of information at the syntax level. Most written languages use letters, but a very wide range of conventions exists: Morse code, hieroglyphics, international flag codes, musical notes, various data processing codes, genetic codes, figures made by gyrating bees, pheromones (scents) released by insects, and hand signs used by deaf-mute persons.
Several questions are relevant: What code should be used? How many symbols are available? What criteria are used for constructing the code? What mode of transmission is suitable? How could we determine whether an unknown system is a code or not?
The number of symbols: The number of different symbols q employed by a coding system can vary greatly, and depends strongly on the purpose and the application. In computer technology, only two switch positions are recognized, so binary codes were created which consist of only two different symbols. Quaternary codes, consisting of four different symbols, are involved in all living organisms. The reason why four symbols represent an optimum in this case is discussed in chapter 6. The various alphabet systems used by different languages consist of 20 to 35 letters, and this number of letters is sufficient for representing all the sounds of the language concerned. Chinese writing is not based on elementary sounds; instead, pictures are employed, every one of which represents a single word, so that the number of different symbols is very large. Some examples of coding systems with the required number of symbols are:
– Binary code (q = 2 symbols, all electronic DP codes)
– Ternary code (q = 3, not used)
– Quaternary code (q = 4, e.g., the genetic code consisting of four letters: A, C, G, T)
– Quinary code (q = 5)
– Octal code (q = 8 octal digits: 0, 1, 2, …, 7)
– Decimal code (q = 10 decimal digits: 0, 1, 2, …, 9)
– Hexadecimal code [10] (q = 16 HD digits: 0, 1, 2, …, E, F)
– Hebrew alphabet (q = 22 letters)
– Greek alphabet (q = 24 letters)
– Latin alphabet (q = 26 letters: A, B, C, …, X, Y, Z)
– Braille (q = 26 letters)
– International flag code (q = 26 different flags)
– Russian alphabet (q = 32 Cyrillic letters)
– Japanese Katakana writing (q = 50 symbols representing different syllables)
– Chinese writing (q > 50,000 symbols)
– Hieroglyphics (in the time of Ptolemy: q = 5,000 to 7,000; Middle Kingdom, 12th Dynasty: q = approximately 800)
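As a rough illustration of how the symbol count q limits what a code can express at the purely statistical level, the following sketch computes q^n, the number of distinct sequences of n symbols, and log2 q, the maximum number of bits one symbol can carry when all q symbols are equally likely. The function names and the selection of codes are illustrative assumptions, and the figures say nothing about syntax or meaning.

```python
import math

# Symbol counts q taken from the list above (a selection only).
SYMBOL_COUNTS = {
    "binary": 2,
    "quaternary (A, C, G, T)": 4,
    "decimal": 10,
    "hexadecimal": 16,
    "Latin alphabet": 26,
}


def sequences_of_length(q: int, n: int) -> int:
    """Number of distinct sequences of n symbols drawn from q symbols."""
    return q ** n


def bits_per_symbol(q: int) -> float:
    """Maximum statistical information per symbol in bits (log2 q).

    This maximum is reached only if all q symbols are equally likely.
    """
    return math.log2(q)


for name, q in SYMBOL_COUNTS.items():
    print(
        f"{name}: q = {q}, "
        f"sequences of length 3 = {sequences_of_length(q, 3)}, "
        f"max bits per symbol = {bits_per_symbol(q):.2f}"
    )
```

For the quaternary case, three symbols give 4^3 = 64 distinct sequences, and each symbol carries at most 2 bits.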
Criteria for selecting a code: Coding systems are not created arbitrarily, but they are optimized according to criteria depending on their use, as is shown in the following examples:
The choice of code depends on the mode of communication. If a certain mode of transmission has been adopted for technological reasons depending on some physical or chemical phenomenon or other, then the code must comply with the relevant requirements. In addition, the ideas of the sender and the recipient must be in tune with one another to guarantee certainty of transmission and reception (see Figures 14 and 15). The most complex setups of this kind are again found in living systems. Various existing types of special message systems are reviewed below:
– Natural spoken languages used by humans
– Mating and warning calls of animals (e.g., songs of birds and whales)
– Mechanical transducers (e.g., loudspeakers, sirens, and fog horns)
– Musical instruments (e.g., piano and violin)
– Written languages
– Technical drawings (e.g., for constructing machines and buildings, and electrical circuit diagrams)
– Technical flashing signals (e.g., identifying flashes of lighthouses)
– Flashing signals produced by living organisms (e.g., fireflies and luminous fishes)
– Flag signals
– Punched cards, mark sensing
– Universal product code, postal bar codes
– Hand movements, as used by deaf-mute persons, for example
– Body language (e.g., mating dances and aggressive stances of animals)
– Facial expressions and body movements (e.g., mime, gesticulation, and deaf-mute signs)
– Dancing motions (bee gyrations)
– Braille writing
– Music rolls, barrel of a barrel organ
– Magnetic tape
– Magnetic disk
– Magnetic card
– Telephone
– Radio and TV
– Genetic code (DNA, chromosomes)
– Hormonal system
– Scents emitted by gregarious insects (pheromones)
– Nervous system
How can a code be recognized? In the case of an unknown system, it is not always easy to decide whether one is dealing with a real code or not. Having discussed hieroglyphics as an example, we now state and explain the conditions required for a code. The following are necessary conditions (NC), all three of which must be fulfilled simultaneously for a given set of symbols to be a code:
NC 1: A uniquely defined set of symbols is used.
NC 2: The sequence of the individual symbols must be irregular.
Examples:
–.– – –.– * – – * * . – .. – (aperiodic)
qrst werb ggtzut
Counter examples:
– – –...– – –...– – –...– – –... (periodic)
– – – – – – – – – – – – – – (the same symbol constantly repeated)
r r r r r r r r r r r r r r r r r r r
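One possible way to make NC 2 concrete is to test whether a sequence is merely a short unit repeated over and over. The sketch below is a heuristic under stated assumptions: "irregular" is read here as "not periodic", spaces are ignored, and the names smallest_period and looks_irregular are hypothetical. A finite sample can never prove irregularity; the check only detects obvious periodicity.

```python
def smallest_period(seq: str) -> int:
    """Return the smallest p with seq[i] == seq[i + p] for every valid i.

    If no shorter period exists, the length of seq is returned.
    """
    n = len(seq)
    for p in range(1, n):
        if all(seq[i] == seq[i + p] for i in range(n - p)):
            return p
    return n


def looks_irregular(seq: str, min_repeats: int = 2) -> bool:
    """Heuristic for NC 2 (assumption: irregular means not periodic).

    Returns False if the sequence, with spaces removed, consists of at
    least min_repeats repetitions of a shorter unit, or if it is empty.
    """
    symbols = seq.replace(" ", "")
    if not symbols:
        return False
    return smallest_period(symbols) * min_repeats > len(symbols)


# Examples modeled on the sequences above (ASCII "-" and "." used for clarity).
print(looks_irregular("qrst werb ggtzut"))        # aperiodic
print(looks_irregular("---...---...---...---..."))  # periodic
print(looks_irregular("r r r r r r r r r r"))      # one symbol repeated
```

The min_repeats threshold is a design choice: it prevents a sequence from being classified as periodic merely because its first and last symbols happen to coincide.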
NC 3: The symbols appear in clearly distinguishable structures (e.g., rows, columns, blocks, or spirals).
In most cases a fourth condition is also required:
NC 4: At least some symbols must occur repeatedly.
Examples:
Maguf bitfeg fetgur justig amus telge.
Der grüne Apfel fällt vom Baum.
The people are living in houses.
It is difficult to construct meaningful sentences without using some letters more than once.
[11]
Such sentences are often rather grotesque, for example:
Get nymph; quiz sad brow; fix luck (i, u used twice, j, v omitted).
In a competition held by the Society for the German Language, long single words with no repeated letters were submitted. The winning word, consisting of 24 letters, was: Heizölrückstoßabdämpfung. (Note that a and ä, for example, are regarded as different letters because they represent different sounds.)
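The letter counts quoted for NC 4 can be checked mechanically. The following is a minimal sketch that counts letters while ignoring case, spaces, and punctuation, and treats umlauts and ß as letters in their own right, as the text does. The helper names letter_counts, repeated_letters, and missing_letters are hypothetical.

```python
import unicodedata
from collections import Counter

LATIN_ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def letter_counts(text: str) -> Counter:
    """Count letters, ignoring case, spaces, and punctuation.

    NFC normalization keeps characters such as "ä" as single letters.
    """
    normalized = unicodedata.normalize("NFC", text).lower()
    return Counter(ch for ch in normalized if ch.isalpha())


def repeated_letters(text: str) -> list:
    """Letters that occur more than once, in sorted order."""
    return sorted(ch for ch, n in letter_counts(text).items() if n > 1)


def missing_letters(text: str, alphabet: str = LATIN_ALPHABET) -> list:
    """Letters of the given alphabet that do not occur in the text."""
    return sorted(set(alphabet) - set(letter_counts(text)))


sentence = "Get nymph; quiz sad brow; fix luck"
print(repeated_letters(sentence))   # ['i', 'u']
print(missing_letters(sentence))    # ['j', 'v']

word = "Heizölrückstoßabdämpfung"
print(len(letter_counts(word)), repeated_letters(word))  # 24 []
```

An ordinary sentence such as "The people are living in houses." yields several repeated letters, which is the usual situation that NC 4 describes.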
There is only one sufficient condition (SC) for establishing whether a given set of symbols is a code:
SC 1: It can be decoded successfully and meaningfully (e.g., hieroglyphics and the genetic code).
There are also sufficient conditions for showing that we are NOT dealing with a code system. A sequence of symbols cannot be a code, if:
a) it can be explained fully on the level of physics and chemistry, i.e., when its origin is exclusively of a material nature. Example: The periodic signals received in 1967 by the British astronomers J. Bell and A. Hewish were thought to be coded messages from space sent by "little green men." It was, however, eventually established that this "message" had a purely physical origin, and a new type of star was discovered: pulsars.
or
b) it is known to be a random sequence (e.g., when its origin is known or communicated). This conclusion also holds when the sequence randomly contains valid symbols from any other code.
Example 1:
Randomly generated characters: AZTIG KFD MAUER DFK KLIXA WIFE TSAA. Although the German word "MAUER" and the word "WIFE" may be recognized, this is not a code according to our definition, because we know that it is a random sequence.
Example 2:
In the Kornberg synthesis (1955) a DNA polymerase resulted when an enzyme reacted with Coli bacteria. After a considerable time, two kinds of strings were found:
... TATATATATATATATATATATATAT ...
... ATATATATATATATATATATATATA ...
... GGGGGGGGGGGGGGGGGGGGGG ...
... CCCCCCCCCCCCCCCCCCCCCCCC ...
Although both types of strings together contained all the symbols employed in the genetic code, they were nevertheless devoid of information, since necessary condition (NC) 2 is not fulfilled.
The fundamentals of the "code" theme were already established by the author in the out-of-print book having the same name as the present one [G5, German title: Am Anfang war die Information]. A code always represents a mental concept and, according to our experience, its assigned meaning always depends on some convention. It is thus possible to determine at the code level already whether any given system originated from a creative mental concept or not.