

Cryptologia
Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/ucry20

The Voynich Manuscript: Evidence of the Hoax Hypothesis
Andreas Schinner

Available online: 22 Mar 2007

To cite this article: Andreas Schinner (2007): The Voynich Manuscript: Evidence of the Hoax Hypothesis, Cryptologia, 31:2, 95-107

To link to this article: http://dx.doi.org/10.1080/01611190601133539


The Voynich Manuscript: Evidence of the Hoax Hypothesis

ANDREAS SCHINNER

Abstract In this article, I analyze the Voynich manuscript, using random walk mapping and token/syllable repetition statistics. The results significantly tighten the boundaries for possible interpretations; they suggest that the text has been generated by a stochastic process rather than by encoding or encryption of language. In particular, the so-called Chinese theory now appears less convincing.

Keywords hoax hypothesis, statistical analysis, stochastic process, Voynich manuscript

Introduction

The Voynich manuscript (the VMS) is a handwritten codex of about 250 pages, ink on vellum, appearing on stylistic grounds to date from around 1500. It contains illustrations of mostly unidentifiable plants, astronomical or astrological diagrams, and naked nymphs bathing in strange arrangements of pools or tubs connected by complex systems of pipes. The most striking feature, however, is the text, written in an elegant, unique script that has so far defied any commonly accepted translation. Information about the VMS, its possible history, and attempts at explanation can be found in various places [4, 7, 14, 15]. Only a brief summary will be given here.

Interpretations of the VMS can roughly be divided into three classes:

• Cipher text hypothesis. The VMS contains natural language text (given the origin of the manuscript, most probably Latin or German) that has been encrypted.

• Plain text hypothesis. The VMS text is plain text in a natural, not yet identified language that either did not possess an alphabet of its own at the beginning of the 16th century, or whose writing system appeared too complex to a medieval scholar. The word length statistics make East Asian languages, in particular Chinese, the most promising candidates for this (Chinese theory). Alternatively, the script could also have been invented together with an artificial language.

• Hoax hypothesis. The VMS contains no meaningful text at all. In this context, the word "hoax" should be associated with a broad spectrum of possibilities, ranging from intentional forgery for monetary gain to the work of an idiot savant, interpreted by medieval scholars as a revelation of arcane lore.

These three classes are not completely distinct. For example, the VMS could contain a message hidden steganographically in a set of otherwise meaningless strings. This theory is especially difficult to prove or disprove; the best argument against it known so far is a psychological one: the basic principle of steganography is to hide the mere existence of a message, and the worst place to hide a genuine secret is an apparently mysterious book.

Address correspondence to Dr. Andreas Schinner, Institut für Experimentalphysik, Abteilung für Atom- und Oberflächenphysik, Johannes Kepler Universität, Altenberger Straße 69, 4040 Linz, Austria. E-mail: [email protected]

Cryptologia, 31:95–107, 2007. Copyright Taylor & Francis Group, LLC. ISSN: 0161-1194 print. DOI: 10.1080/01611190601133539

It is one of the most striking features of the VMS that even modern computer-aided analysis has so far been unable to rule out a single one of these interpretations definitively. Instead, arguments for and against all three viewpoints can be given: since statistical properties characteristic of natural languages are also present in the VMS text, the encryption method used, if any, should not be too complex; additionally, around 1500, cryptology was still in its infancy. Despite these facts, all attempts at decipherment by modern cryptanalysts have failed. On the other hand, the text shows several exotic linguistic features, such as frequent word repetitions and preferred positions for certain letters within a line; this appears to be incompatible with the plain text hypothesis, even in the artificial language version.

Consequently, there are attractions in the hoax hypothesis. However, the VMS text is obviously not composed of simple random strings, and it shows rich language-like structure. It seemed unlikely that a medieval hoaxer (or even an early 20th century forger) could create such a convincing facsimile language within reasonable time. The work by Gordon Rugg [11] has shown that this need not necessarily be true: an algorithm feasible even with medieval technology (the table-and-grille method) makes it possible for a single person to generate a text as long and complex as the VMS within approximately three months. This, however, is just a possibility and far from a proof of the hoax hypothesis. Furthermore, the table-and-grille method as investigated so far does not explain all of the statistical text properties of the VMS. The three competing explanation classes are thus still of roughly equal relevance.

In this article, statistical investigations of the VMS are presented that provide additional restrictions on possible solutions. Mapping the text to a random walk uncovers characteristic long-range correlations not present in normal human writings; they fit a stochastic process with memory effects better than a sequence of tokens chosen according to linguistic rules. Furthermore, the distribution of gaps between two similar or selected tokens, respectively, also differs qualitatively from normal texts; its mathematical properties indicate the presence of very unusual random effects. Possible implications of these results for the interpretation of the VMS are discussed in the conclusions section.

Throughout this article, the following usual conventions are used: the term token denotes any string of characters separated by spaces or line start or end; a word is a type of token regardless of its frequency in the text. For characters or tokens from the VMS script the European Voynich Alphabet (EVA) is used [15]; the letters (or sequences of letters) are written in italics and put in angle brackets (for example, the notorious most frequent VMS token will be transcribed as ⟨daiin⟩).

Finally, the analysis presented in this article is based on the various text samples listed in Table 1.

Random Walk Model

According to Kokol, Podgorelec, Zorman, Kokol, and Njivar [8], long-range power law correlations are present in a wide variety of information encoding systems, ranging from human writings (natural languages and computer programs) to DNA sequences. To some extent, they characterize the information content and complexity of communication.

A useful method to study correlations in character strings is based on mapping the symbol sequence to a stochastic process that, especially in linguistic literature, is frequently called Brownian walk. This terminology is somewhat misleading, since Brownian motion can be described as the scaling limit of a so-called random walk: in the theory of stochastic processes [2] it is characterized by independent steps that all have the same probability distribution, i.e., are uncorrelated. On the other hand, in statistical physics, for example, the expression random walk with memory is sometimes used to describe a situation in which the stochastic process generating the steps is of Markovian or even non-Markovian type. In the following, random walk should be understood in this generalized sense.

As a first step it is necessary to encode the characters of the texts under investigation as bit sequences. It has been shown that the actual definition of this code table has negligible influence on the quantities of interest, as long as all (or at least almost all) possible bit patterns are used [12]. Since the VMS contains no punctuation signs, they are removed from the other texts too; upper case characters are converted to lower case. Thus the remaining character set consists of the letters a–z, the German umlauts ä, ö, ü, and the German sz ligature ß; empty spaces are ignored. These 30 characters can be represented by a 5-bit code.

The bits of the resulting binary string then define the steps ±1 of a random walk. Let

$$\Delta y(l, l_0) = y(l + l_0) - y(l_0) \tag{1}$$

be the walk displacement between step numbers $l_0$ and $l + l_0$. Then

$$F^2(l) = \langle \Delta y^2 \rangle - \langle \Delta y \rangle^2 \tag{2}$$

describes the variance of the mean displacement. The angle brackets denote averaging over all $l_0$. For pure (uncorrelated) random walks of infinite length, where the steps are Bernoulli trials with probability p, one easily obtains:

$$F^2(l) = 4p(1 - p)\, l \tag{3}$$

In general, F(l) will behave asymptotically as $F(l) \propto l^{\alpha}$, where an exponent $\alpha \neq 0.5$ indicates the presence of long-range correlations.
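The mapping and Eqs. (1)–(3) can be illustrated with a short Python sketch. It is not the program used for this article: the concrete 5-bit code table (characters numbered 1 to 30 in alphabetical order), the function names, and the helper `bernoulli_walk` are assumptions made for illustration only.

```python
import math
import random

# Assumed code table: the 30 characters named in the text, numbered 1..30.
ALPHABET = "abcdefghijklmnopqrstuvwxyzäöüß"
CODE = {ch: format(i + 1, "05b") for i, ch in enumerate(ALPHABET)}


def text_to_steps(text):
    """Lower-case the text, drop everything outside the alphabet
    (spaces, punctuation), and map each bit to a step of +1 or -1."""
    bits = "".join(CODE[ch] for ch in text.lower() if ch in CODE)
    return [1 if b == "1" else -1 for b in bits]


def fluctuation(steps, l):
    """F(l) from Eq. (2): standard deviation of the displacement over
    l steps, averaged over all possible start positions l0."""
    y = [0]
    for s in steps:
        y.append(y[-1] + s)  # walk position after each step
    disp = [y[l0 + l] - y[l0] for l0 in range(len(y) - l)]
    mean = sum(disp) / len(disp)
    mean_sq = sum(d * d for d in disp) / len(disp)
    return math.sqrt(max(mean_sq - mean * mean, 0.0))


def bernoulli_walk(n, p=0.5, seed=0):
    """Hypothetical helper: n uncorrelated steps, +1 with probability p."""
    rng = random.Random(seed)
    return [1 if rng.random() < p else -1 for _ in range(n)]
```

For an uncorrelated walk with p = 0.5, Eq. (3) predicts F(l) = √l, so `fluctuation(bernoulli_walk(20000), 16)` should come out close to 4.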

Table 1. Text sources used in this article

Text                   Text part   Language    Number of tokens   Number of words
Voynich manuscript¹    All²        Unknown     36,000             7000
Vulgate Bible          5%³         Latin       25,000             6000
Luther Bible           5%³         German      35,000             4000
Alice in Wonderland    All         English     26,000             3000
Chinese Bible          Genesis     Mandarin⁴   34,000             2000

¹ Majority vote version of interlinear EVA transcription 1.6e6 [15]; ² or particular sections of it, see Table 2; ³ percentages are counted from top of document; ⁴ in pin-yin romanization with all tones removed.


Particular care has to be taken when evaluating Eq. (2) for a walk of finite length N to avoid finite size effects: as $l \to N - 1$ the sample size available for calculating the averages (i.e., the number of possible $l_0$ values) tends to 1; consequently, $F(l) \to 0$. In the calculations presented here l is limited to a maximal value of N/10.

The resulting F(l) on applying this method to the VMS and other texts is shown in Figure 1. Previous investigations by Kokol et al. [8] of various human writings have demonstrated that for natural language texts (almost independent of the language used) the asymptotic exponent α of F(l) does not notably differ from 0.5, while for computer program source codes significant deviations are observed. As far as the normal language samples are concerned, the present results confirm this.

Most interestingly, the VMS text shows completely different behavior: a crossover point exists where the random process with α ≈ 0.5 turns into one with an asymptotic exponent α ≈ 0.85, indicating the presence of memory effects in the underlying stochastic process. The principal structure of F(l) remains the same for single sections of the VMS, as presented in Table 2: the asymptotic exponents for parts of the VMS are somewhat lower (between 0.7 and 0.8) than for the whole text; the difference is mainly due to the relatively high sensitivity of α to a reduction of the walk length. Two facts are especially noteworthy: (i) the crossover point $l_{co} \approx 360 = 72$ characters × 5 bits of the whole text fits well to the average line length; (ii) this value also holds approximately for sections that are associated with Currier's language A [3], while for sections written in language B $l_{co}$ is significantly higher (by approximately a factor of 3).
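How a crossover value follows from the fitted asymptote can be sketched as follows. The sketch assumes that the asymptotic branch is fitted as F(l) ≈ a·l^α by least squares on a log-log scale (the fitting procedure itself is not specified in the text) and that the crossover is the point where this branch meets the uncorrelated branch l^0.5, as stated in footnote 2 of Table 2. The function names are illustrative.

```python
import math


def fit_power_law(ls, fs):
    """Least-squares fit of log F = log a + alpha * log l.
    Returns the pair (a, alpha)."""
    xs = [math.log(l) for l in ls]
    ys = [math.log(f) for f in fs]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    alpha = sxy / sxx
    return math.exp(my - alpha * mx), alpha


def crossover(a, alpha):
    """Solve l**0.5 == a * l**alpha for l (requires alpha != 0.5)."""
    return a ** (1.0 / (0.5 - alpha))
```

With the values listed for the whole text in Table 2 (a = 0.131, α = 0.846), `crossover` gives a value of about 356, in agreement with the tabulated crossover.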

Figure 1. Root mean square fluctuation of the random walk displacement for the VMS and normal language texts. Inset: VMS curve (full line) with low and high l asymptotic behavior, respectively (dashed lines).

It appears that in the VMS significant correlations exist between tokens separated by more than an average text line, while within a line the text behaves randomly (like ordinary human writings). To inspect this more closely, the step (or bit) autocorrelation function

$$C(l) = \langle n(l + l_0)\, n(l_0) \rangle - \langle n(l_0) \rangle^2 \tag{4}$$

and its corresponding cumulative distribution function

$$C_c(l) = \frac{1}{l} \sum_{k=1}^{l} C(k) \tag{5}$$

are useful quantities. n(k) denotes the value (0 or 1) of the bit at position k in the binary string generating the random walk. As demonstrated in Figure 2, positive correlations build up in the VMS within approximately l < 400 that are stronger by an order of magnitude than in ordinary text. These correlations decay after some thousand steps. Such positive correlations are typical for a stochastic process in which the probability of a particular random event is increased by previous occurrences of this event.
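Eqs. (4) and (5) translate almost literally into code. The following Python sketch assumes that both averages in Eq. (4) are taken over the same set of start positions for which the shifted bit still lies inside the sequence; this finite-length convention and the function names are choices made here, not taken from the article.

```python
def autocorrelation(bits, l):
    """C(l) = <n(l0 + l) n(l0)> - <n(l0)>^2 for a sequence of 0/1 bits,
    averaged over all l0 with l0 + l inside the sequence."""
    m = len(bits) - l
    mean = sum(bits[:m]) / m
    prod = sum(bits[i] * bits[i + l] for i in range(m)) / m
    return prod - mean * mean


def cumulative_autocorrelation(bits, l):
    """C_c(l) = (1/l) * sum of C(k) for k = 1..l, cf. Eq. (5)."""
    return sum(autocorrelation(bits, k) for k in range(1, l + 1)) / l
```

The direct double loop costs on the order of l·N operations per point, which is acceptable for a sketch but slow for lags of several thousand steps; a production version would more likely use an FFT-based correlation.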

Table 2. Random walk asymptotic displacement variance

VMS section      Folios      Walk length   a¹      α¹      l_co²   Script³
All              1r–116v     954456        0.131   0.846   356     A, B
Herbal           1r–66v      272896        0.243   0.768   196     A
Astrological     67r–73v     74721         0.396   0.659   339     ?
Biological       75r–84v     172096        0.161   0.762   1065    B
Pharmaceutical   87r–102v    99176         0.314   0.706   277     A
Recipes          103r–116v   282536        0.182   0.738   1285    B

¹ $F(l) \to a\, l^{\alpha}$ for $l \gg 1$, see Eq. (2) and text; ² crossover l-value: $l_{co}^{0.5} = a\, l_{co}^{\alpha}$; ³ Currier language [3] that is dominant in this section.

Figure 2. Cumulative step autocorrelation function $C_c(l)$, cf. Eq. (5) (smoothed by 100 points adjacent averaging); full line: VMS, dashed line: Vulgate Bible. Inset: autocorrelation function C(l), cf. Eq. (4), for l between 1000 and 1030; full line: VMS, gray shaded area: Vulgate Bible.



A classical model for such a system, often applied to cascade processes like particle-induced electron emission [1], is the so-called Pólya process. It is based on the Pólya urn scheme, where on drawing a ball of a particular color from an urn a specific number of balls of the same color are put into the urn, increasing the probability of drawing this color again [5] (spurious contagion). In the scaling limit of large step numbers l the resulting distribution is the so-called Pólya distribution, also known as negative binomial distribution

$$P_n = \frac{\Gamma(n + 1/b)}{n!\, \Gamma(1/b)} \, \frac{(b\mu)^n}{(1 + b\mu)^{n + 1/b}} \tag{6}$$

In the present context $P_n$ is the probability that in a walk of length $l \to \infty$ the number of up-steps is equal to n. Mean and variance of $P_n$ are given by

$$\langle n \rangle = \mu \tag{7}$$

$$\sigma^2 = \mu (1 + b\mu) \tag{8}$$

The parameter b describes the cascading strength of the process: for $b \to 0$ the random steps are uncorrelated and Eq. (6) turns into a Poisson distribution, while for b = 1 the so-called Yule-Furry process (also known as simple birth process) is recovered [2].

Since $\mu \propto l$, it follows from Eq. (8) that an underlying Pólya process results in the asymptotic behavior $F(l) \propto \sqrt{b}\, l^{1}$ of the random walk model. In order to reproduce the observed α ≈ 0.85 from Figure 1, an l-dependence of b is necessary. Strictly speaking, the underlying process is then no longer a pure Pólya process, since with non-constant b Eq. (6) no longer satisfies the Kolmogorov equations exactly. Due to the rather weak variation of $b \propto l^{-0.3}$, however, it still remains a useful approximation.

The actual representation of the random walk in the form of the VMS text can be used to estimate the true distribution $P_n(l)$. Unfortunately, in particular for large l (which represents the interesting case), the sample size is too small to identify the distribution with compelling evidence (mainly because b is small). The data, however, do not contradict the hypothesis Eq. (6).
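The moments quoted in Eqs. (7) and (8) can be checked numerically. The sketch below assumes the standard negative binomial form of the Pólya distribution with mean mu and cascading strength b > 0; the function names and the truncation of the sums at `n_max` are illustrative choices.

```python
import math


def polya_pmf(n, mu, b):
    """Negative binomial probability with mean mu and parameter b > 0,
    evaluated through log-gamma functions for numerical stability."""
    r = 1.0 / b
    log_p = (math.lgamma(n + r) - math.lgamma(r) - math.lgamma(n + 1)
             + n * math.log(b * mu) - (n + r) * math.log1p(b * mu))
    return math.exp(log_p)


def moments(mu, b, n_max=2000):
    """Mean and variance from the probabilities, summed up to n_max."""
    probs = [polya_pmf(n, mu, b) for n in range(n_max)]
    mean = sum(n * p for n, p in enumerate(probs))
    var = sum(n * n * p for n, p in enumerate(probs)) - mean ** 2
    return mean, var
```

For example, `moments(5.0, 0.5)` should reproduce a mean of 5 and a variance of 5 · (1 + 0.5 · 5) = 17.5 up to rounding, and for very small b the probabilities approach those of a Poisson distribution.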

The unusual shape of F(l) for the VMS has a major impact on possible interpretations. In particular, the Chinese hypothesis appears not to be compatible with it. The impression that a non-Markovian stochastic process, where the step probability depends on the long-term history, may play a key role in the interpretation of the VMS will be reinforced in the following sections.

Similar Tokens Repetition Distance Distribution

In a previous work by G. Landini [9] the repetition distance distributions of the most frequent tokens in the VMS (⟨daiin⟩), Alice in Wonderland ("the"), and the Vulgate Bible ("et"), respectively, have been investigated, i.e., the probability distribution of the number of other tokens between two occurrences of the particular one (iso-word gap). The result did not show a characteristic difference between the VMS and the normal texts, apart from the well-known enigmatic VMS feature that common words, in particular ⟨daiin⟩, quite frequently appear in sequences and consequently have non-vanishing probability for zero repetition distance.

As will be demonstrated in this section, it is more instructive to investigate the repetition distance of two similar rather than exactly matching tokens. From the many well-known string distance metrics the rather straightforward Levenshtein distance [6] will be used here. More sophisticated methods of calculating string distances tend to be optimized for human writings, which appears problematic in the VMS context of unknown language and meaning (if any). The Levenshtein distance of two character strings is an integer ranging from 0 (exact match) to the maximum of the two string lengths (no similarity), denoting the number of elementary edit operations necessary to make both strings equal. Mapping this number to the interval [0,100] yields a percentage of dissimilarity for two tokens.
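A minimal Python sketch of this dissimilarity measure is given below. The Levenshtein distance is the standard dynamic programming algorithm; the normalization by the longer of the two string lengths is an assumption suggested by the stated range of the distance, since the text does not spell out how the mapping to [0,100] is done.

```python
def levenshtein(s, t):
    """Number of single-character insertions, deletions, and
    substitutions needed to turn s into t (row-by-row DP)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        cur = [i]
        for j, ct in enumerate(t, start=1):
            cur.append(min(prev[j] + 1,                # delete cs
                           cur[j - 1] + 1,             # insert ct
                           prev[j - 1] + (cs != ct)))  # substitute
        prev = cur
    return prev[-1]


def dissimilarity(s, t):
    """Levenshtein distance as a percentage of the longer string
    length (assumed normalization); 0 means identical tokens."""
    longest = max(len(s), len(t))
    return 100.0 * levenshtein(s, t) / longest if longest else 0.0
```

With this convention the tokens `"daiin"` and `"dain"` differ by one deletion out of five characters, i.e., a dissimilarity of 20%.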

In Figure 3, the similar token repetition distance distribution $P_n$ for the VMS compared with normal texts is presented. Here n denotes the number of other tokens between two similar ones, i.e., n = 0 corresponds to the situation of two alike tokens in immediate vicinity. Two words are considered similar if their dissimilarity as defined above is less than or equal to 30%; it turns out that the precise value (±10%) of this threshold changes $P_n$ only quantitatively, not qualitatively. The most striking feature is the almost mathematically perfect smooth shape of the VMS curve for $n \to 0$, while the other text sample data display the expected irregular behavior and tend to zero (or at least small values). As noted previously, this simply expresses the effect that writers normally try to avoid word repetitions. It is especially noteworthy that even the Chinese text lies closer to the European languages than to the VMS, although the higher tendency of common-word repetition sequences in Asian languages is a frequent argument in favor of the Chinese theory. The remaining text samples listed in Table 1 have been omitted from Figure 3 just to avoid confusion by too many markers; their behavior is comparable to that of the Vulgate Bible.
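The counting behind such a distribution can be sketched in Python as follows. The similarity test is passed in as a function, so any criterion (for instance a 30% Levenshtein dissimilarity threshold) can be plugged in. Looking only at the next similar token after each position and limiting the search to `max_gap` intervening tokens are assumptions of this sketch, not details given in the text.

```python
from collections import Counter


def repetition_distance_distribution(tokens, similar, max_gap=200):
    """Estimate P_n: for every token, find the next token that is
    similar to it and record the number n of tokens in between."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        stop = min(len(tokens), i + max_gap + 2)
        for j in range(i + 1, stop):
            if similar(tok, tokens[j]):
                counts[j - i - 1] += 1
                break
    total = sum(counts.values())
    if not total:
        return {}
    return {n: c / total for n, c in sorted(counts.items())}
```

A usage example with exact matching as the similarity test would be `repetition_distance_distribution("a b a a c b".split(), lambda s, t: s == t)`, which finds one repetition each at distances 0, 1, and 3.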

Figure 3. Similar tokens repetition distance distribution (maximal dissimilarity 30%) of the VMS, compared with Vulgate Bible and the pin-yin text. Inset: VMS result and fit using Eq. (12) (a = 3.5618, b = 0.1534, q = 0.9885).

Let us consider an infinite random text consisting of N words occurring with probabilities $\lambda_k$, k = 1, …, N. The chance for a particular word k to reappear for the next time exactly after n other tokens follows a geometric distribution $\lambda_k (1 - \lambda_k)^n$. The total token repetition distance distribution is then given by

$$P_n = \sum_{k=1}^{N} \lambda_k^2 (1 - \lambda_k)^n \tag{9}$$

The geometric distribution has its maximum at n = 0 and decreases monotonically, a behavior also true for the VMS data in Figure 3.

The fact that normal texts as well as the VMS obey Zipf's first law [10] suggests the approximation $\lambda_k \propto 1/k$. As a rough estimate for small n, the discrete index k may be replaced by a continuous variable j, turning the sum Eq. (9) into an integral. Setting $\lambda_j = c/j$ with an upper cutoff $j_m$ to ensure convergence of the $\lambda_j$-norm, and under the reasonable assumption c

Selected Tokens Repetition Distance Distribution

In the previous section the probability for n other tokens separating two arbitrarily selected but similar ones (with respect to Levenshtein string metric distance) has been investigated. Although the unusual behavior of the VMS text in contrast to normal human writings is clearly visible, the statistical details are somewhat obscured due to the nature of the problem: the geometric distribution characteristic of random sequences is expanded to power-law behavior by the summation Eq. (9). Furthermore, the concept of similarity as well as finite sample size effects add extra random noise.

In this section the problem will be modified slightly: what is the probability for two tokens sharing a particular property to be separated by n tokens that do not possess this property? Such a property may be the occurrence of a particular letter within a token, or a special word structure. This type of question appears especially promising since it is a well-known fact that VMS words possess a rich variety of characteristic structural details (crust-mantle-core decomposition [14]).

The symbol ⟨q⟩ in the VMS appears almost always in word-initial position. It has been speculated that it might be a prefix meaning "and", rather than part of the remaining token (much like the Latin suffix -que). In Figure 5, the repetition distance distribution of tokens beginning with ⟨q⟩ is plotted, compared with that of the token "und" (the German word for "and") in the Luther Bible. Again, the VMS result yields a surprisingly simple and smooth curve, qualitatively different from that associated with the normal text. A more detailed analysis of the data shows that $P_n$ can be excellently fitted by a mixture of two geometric distributions:

$$P_n = a\, p_1 (1 - p_1)^n + (1 - a)\, p_2 (1 - p_2)^n \tag{13}$$
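Both the empirical gap distribution and the model of Eq. (13) are easy to express in code. The Python sketch below is illustrative only: the function names are chosen here, and the selection property is supplied as an arbitrary predicate such as "token starts with q".

```python
from collections import Counter


def gap_distribution(tokens, has_property):
    """Empirical P_n: relative frequency of n non-matching tokens
    between two consecutive tokens that satisfy has_property."""
    hits = [i for i, tok in enumerate(tokens) if has_property(tok)]
    gaps = Counter(b - a - 1 for a, b in zip(hits, hits[1:]))
    total = sum(gaps.values())
    if not total:
        return {}
    return {n: c / total for n, c in sorted(gaps.items())}


def geometric_mixture(n, a, p1, p2):
    """Model of Eq. (13): weight a on a geometric distribution with
    success probability p1, weight 1 - a on one with p2."""
    return a * p1 * (1 - p1) ** n + (1 - a) * p2 * (1 - p2) ** n
```

For a made-up token list such as `"qokeedy daiin qokaiin chedy shedy qol qoky".split()` and the predicate `lambda t: t.startswith("q")`, the gaps between consecutive hits are 1, 2, and 0. The model probabilities of `geometric_mixture` sum to one over all n ≥ 0 for any 0 ≤ a ≤ 1 and 0 < p1, p2 ≤ 1.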

Figure 4. Similar tokens repetition distance distribution (maximal dissimilarity 30%) of the VMS, compared with Vulgate Bible and token scrambled versions of both texts. The lines just connect the markers to guide the eye.

A mixture of two probability distributions indicates the presence of two independent subpopulations in the statistical data. Eq. (13) is, for example, produced by the following random process: use two dice with success probabilities $p_1$ and $p_2$, respectively. Throw a die until success (failure means not adding the ⟨q⟩ prefix to a token in the sequence); then continue with either die 1 or die 2, depending on a random decision with probability a.
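The two-dice process can be simulated directly. The following Python sketch is a plain Monte Carlo illustration under the reading that die 1 is chosen with probability a before each run of throws; the function name, the seed handling, and this reading of the random decision are assumptions made here.

```python
import random


def simulate_gaps(a, p1, p2, n_gaps, seed=0):
    """Generate n_gaps gap lengths: choose die 1 (probability a) or
    die 2, then count the failures before the first success."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_gaps):
        p = p1 if rng.random() < a else p2  # random decision between dice
        failures = 0
        while rng.random() >= p:            # throw until success
            failures += 1
        gaps.append(failures)
    return gaps
```

The relative frequencies of the simulated gap lengths should approach Eq. (13) for large samples; for instance, the expected mean gap is a(1 − p1)/p1 + (1 − a)(1 − p2)/p2.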

However, this should only be seen as an example algorithm; the mechanisms behind the text generation process must be somewhat more complex, as has been demonstrated in the previous sections. In this context it is especially noteworthy that Eq. (13) is also compatible with (i.e., is a good approximation to) the situation of a stochastic process with varying step probability, which is gradually decreased from $p_1$ to $p_2$ on failure events and reset to $p_1$ on success. This provides another link to spurious contagion processes like the Pólya scheme discussed previously.

The ⟨q⟩ prefix is just a single aspect of the fairly complex VMS word grammar. However, the behavior expressed by Eq. (13) is found throughout a wide variety of token selection conditions; a few examples are listed in Table 3. Most interestingly, the crossover point between the two geometric distributions (i.e., the real value n for which both terms of Eq. (13) contribute equally) is in most cases close to the average number of tokens per line. For a token scrambled version of the VMS, however, Eq. (13) is reduced to a single geometric distribution, as is expected in agreement with the previous analysis.
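The crossover point can be obtained in closed form by setting the two terms of Eq. (13) equal and solving for n. A short Python sketch (the function name is chosen here for illustration):

```python
import math


def mixture_crossover(a, p1, p2):
    """Real-valued n at which a*p1*(1-p1)**n equals
    (1-a)*p2*(1-p2)**n; requires p1 != p2 and 0 < a < 1."""
    return (math.log((1 - a) * p2 / (a * p1))
            / math.log((1 - p1) / (1 - p2)))
```

With the fit parameters of the ⟨q⟩ prefix listed in Table 3 (a = 0.50275, p1 = 0.28531, p2 = 0.10482) this evaluates to approximately 4.5.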

On the other hand, for normal texts two possible results have been found so far: if the selection criterion is weak and linguistically (almost) irrelevant, then the result will be a single geometric distribution (straight random result); an example is the selection of all tokens in an English text that contain the letter "e". If, however, the condition is correlated with semantic (sub-)structures or at least nontrivial token parts, the result more or less resembles the Luther Bible curve in Figure 5. As in the previous sections, the behavior of the Chinese text does not differ significantly.

Figure 5. Repetition distance distribution of VMS tokens beginning with EVA ⟨q⟩ (full squares), and the token "und" in the Luther Bible (open circles), respectively. Full line: fit of the geometric distribution mixture Eq. (13) with parameters a = 0.50275, p1 = 0.28531, p2 = 0.10482.

Conclusions

Concerning the VMS enigma, investigations that emphasize the peculiar structural properties of the VMS text in contrast to normal language are of special interest. All methods of analysis used in the present article fall into this category.

Interpreting normal texts as bit sequences yields deviations of little significance from a true (uncorrelated) random walk. For the VMS, this only holds on a small scale of approximately the average line length; beyond that, positive correlations build up: the presence/absence of a symbol appears to increase/decrease the tendency towards another occurrence. The Pólya urn scheme is an example of such behavior; it does not, however, reproduce the VMS data exactly and should be seen as a first approximation only.

Nevertheless, this result has important implications for the possible solutions of the VMS riddle. Encryption tends to destroy correlations in a text rather than building them up. The method, however, could be a more complex variant of a word game, like the children's secret language "Opish" (where the syllable "op" is added before each vowel); in this case the effective information content of the VMS would at least be rather low. The result appears incompatible with the plain text hypothesis. Even in an artificial language correlations tend to be contextual, i.e., on the small scale of a few sentences.

Thus, the hoax hypothesis may provide the most convincing explanation for the data. A variant of the table-and-grille method is still a promising candidate, if the table is filled with syllables selected with the help of some lottery algorithm producing the observed statistical effects. The source of the positive correlations might as well be (or partly be) a psychological one: the creator of the table could unconsciously have written them into it while trying to distribute the syllables equally (the human mind is extremely poor at generating random numbers). An additional problem, however, arises upon reusing a table with different grilles: the variance Eq. (2) is very sensitive to correlations created by overlapping slot patterns, leading to significant structures in F(l) for large l. To avoid this behavior, which is not observed for the VMS text, only about 4 to 6 (of the 27 possible) 3 × 3 grilles can be used with a particular 39 × 40 table. It is unlikely that the creator of the VMS excluded the forbidden grille layouts by mere luck, but perhaps out of aesthetic (symmetry) considerations? On the other hand, the table-and-grille scheme need not necessarily contain the (whole) truth about the VMS generation process, even if the hoax hypothesis finally turns out to be correct.

The token repetition statistics also emphasize the strangeness of the VMS language. Again the results differ significantly from comparative text samples, indicating that the VMS language is more closely related to a stochastic process than to human communication. Of particular interest is the mixture of two geometric distributions Eq. (13) that almost perfectly describes the gap distribution of tokens with, for example, a particular prefix. Such exact statistical properties of complex systems are either trivial (as in the case of purely random aspects) or express an underlying principle. Since Eq. (13) contains a crossover between two terms it most probably is not trivial (pure randomness would have yielded a single geometric distribution).

Table 3. Some examples for the parameter fits of Eq. (13)

Selection condition       a         p1        p2        x_C²
Token begins with ⟨q⟩     0.50275   0.28531   0.10482   4.5
Token contains ⟨cCh⟩¹     0.58879   0.12189   0.03196   17.4
Token contains ⟨che⟩      0.90501   0.16838   0.03489   25.7
Token contains ⟨she⟩      0.74403   0.12395   0.02811   24.6
Token ends with ⟨aiin⟩    0.93027   0.12892   0.02001   37.8

¹ C stands for a gallows character (⟨f⟩, ⟨k⟩, ⟨p⟩, ⟨t⟩); ² crossover point: $x_C = \ln\!\left[\frac{(1 - a)\, p_2}{a\, p_1}\right] \Big/ \ln\!\left[\frac{1 - p_1}{1 - p_2}\right]$.

Another exact property of the VMS is already well known: the word length distribution follows almost exactly a binomial distribution. This fact has been a strong argument in favor of the Chinese theory [13] since East Asian languages, in particular Chinese, also show this feature. The present investigations, however, let the Chinese theory appear much less promising; instead, the mathematically exact shape of the VMS word distribution may be seen as additional evidence of an underlying stochastic process (a binomial distribution describes the sum of independent random summands).

It must be emphasized that the present study is not a proof of the hoax hypothesis, nor can it definitively rule out either of the two other main theory classes. It gives, however, some hints on the most promising direction for future investigations. In the text so far I have tried to avoid writing down my personal opinion about the VMS where it goes beyond the presentation of facts and the inevitable basic interpretation of statistics (I am aware of how easily statistics can be misinterpreted following prejudice). From my viewpoint, the VMS is a cleverly set psychological trap still active after five centuries, reflecting the analyst's expectations and hopes like a mirror without containing meaningful information itself. It has been created using algorithmic methods, implicitly or explicitly involving some degree of randomness.

A frequent argument against the hoax hypothesis is that, even utilizing something like the table-and-grille method, the effort for a hoaxer would have been disproportionately high: to defraud Emperor Rudolf II of Bohemia (the possible first buyer of the VMS) a much simpler concept should have been sufficient. As always with psychological arguments there is the intrinsic danger of projecting a value system. Perhaps the VMS is the once-in-a-lifetime masterpiece of a habitual forger, or simply a special kind of artwork, created with no immoral motivation: around 1980 the Italian architect and industrial designer Luigi Serafini wrote and illustrated his famous Codex Seraphinianus (most probably inspired by the VMS) that looks like the visual encyclopedia of an extraterrestrial world, and is written in an incomprehensible language with a strange curvilinear script. Obviously there is some artistic or even philosophical attraction in the creation of a phantasmagoric book that has no inherent meaning and, therefore, can take on any one.

Acknowledgment

The author wishes to thank M. A. Labi for stimulating discussions and proofreading the manuscript.


About the Author

Dr. Andreas Schinner is a theoretical physicist, performing freelance research at the Johannes Kepler University in Linz, Austria. His main area of scientific interest is theoretical solid state physics, particularly particle beam interactions with matter. He is also working as a self-employed software developer.

References

1. Benka, O., A. Schinner, and T. Fink. 1995. Distribution of the Number of Emitted Electrons for MeV H⁺-, and He²⁺-ion Impacts on Metals, Phys. Rev. A, 51(3):2281–2284.

2. Cox, D. R. and H. D. Miller. 1965. The Theory of Stochastic Processes. London: Methuen & Co Ltd.

3. Currier, P. H. 1976. Some important new statistical findings. Proceedings of a Seminar held on 30 November 1976 in Washington DC. Edited by M. E. D'Imperio. Privately printed pamphlet, 30 November 1976. ftp://ftp.funet.fi/pub/doc/religion/occult/necro-nornicon/voynich/currier.paper. Last accessed/web document update: 20 Feb 2007.

4. D'Imperio, M. E. 1978. The Voynich Manuscript: An Elegant Enigma. Laguna Hills, CA: Aegean Park Press.

5. Feller, W. 1957. An Introduction to Probability Theory and its Applications. Vol. 1, New York: Wiley.

6. Gilleland, M. 2002. Levenshtein Distance in Three Flavors. http://ww.merriampark.-com/Id.htm last accessed 20 Feb 2007.

7. Kennedy, G. and R. Churchill. 2005. The Voynich Manuscript: The Unsolved Riddle of an Extraordinary Book Which has Defied Interpretation for Centuries. London: Orion Publishing Group Ltd.

8. Kokol, P., V. Podgorelec, M. Zorman, T. Kokol, and T. Njivar. 1999. Computer and Natural Language Texts: A Comparison Based on Long-Range Correlations, Journal of the American Society for Information Science, 50:1295–1301.

9. Landini, G. 2000. Zipf's laws in the Voynich Manuscript, http-document, currently no longer available in the Internet.

10. Landini, G. 2001. Evidence of Linguistic Structure in the Voynich Manuscript Using Spectral Analysis, Cryptologia, 25(4):275–295.

11. Rugg, G. 2004. An Elegant Hoax? A Possible Solution to the Voynich Manuscript, Cryptologia, 28(1):31–46.

12. Schenkel, A., J. Zhang, and Y. Zhang. 1993. Long Range Correlations in Human Writings, Fractals, 1(1):47–55.

13. Stolfi, J. 2002. Chinese theory Redux: Comparing the VMS and East Asian word length distributions. http://www.ic.unicamp.br/~stolfi/voynich/02-01-18-chinese-redux/ last accessed 20 Feb 2007.

14. Stolfi, J. 2003. Voynich manuscript Stuff. http://www.ic.unicamp.br/~stolfi/voynich/ last accessed 20 Feb 2007.

15. Zandbergen, R. 2003. The Voynich manuscript. http://www.voynich.nu/ last accessed 20 Feb 2007.

