Crosswords and Information Theory

    Peter Andreasen

December 17, 2000

Abstract

A gentle introduction to the wonders of the information theoretical concept of entropy through elementary calculation of the number of crosswords.

What is a crossword, really?

Most people have solved a crossword puzzle or played Scrabble. The existence of word-games like those is not to be taken for granted, though. As we are going to see, the existence of crosswords is entirely at the mercy of the underlying language. In fact, there is a connection between the information theoretic concept "entropy" and the possibility of creating crossword puzzles!

Before we proceed, we need to get a few definitions straight. First, what is a crossword? Let us take a look at one:

    g e m □
    a r e □
    m a t h
    e □ e □

What we see are rows and columns of words (single letters are accepted as words) separated by white squares.[1]

Now, the words in a crossword need not be English as in the example above. We might want to create a Danish crossword or we might even want to have the columns and rows be quotes from Shakespeare's sonnets. To be able to handle such complex rules for the creation of crosswords we make the following definition.

[1] It is in the white squares you will normally find the hints needed to solve the puzzle, and in most crosswords the topmost row and the leftmost column are filled with these hints. For simplicity we will make no assumptions about the placement of the white squares.

Definition 1. A language L is a set of sequences of letters from an alphabet A (say, the letters 'a' through 'z' and the symbol '□'). A crossword of size n is a matrix with the dimensions $n \times n$ where all of the rows are sequences (of length n) from L and all of the columns are sequences (of length n) from L.

So if we want to make a really sophisticated crossword, we may let L be all the possible quotes from Shakespeare. In that case we should use an alphabet which included the letters as well as space and the various punctuation symbols. If we wanted to make a 'classic' crossword, we would have L be equal to any sequence you can make by taking words from a dictionary and gluing them together with one or more □'s in between. In this case the alphabet A would just be the letters and the special symbol □.
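Definition 1 and the 'classic' construction are easy to express in code. The following Python sketch is only an illustration (the tiny word list, the use of '.' for the blank square, and the function names are our own choices, not taken from anywhere else): a row or column is valid when every maximal run of letters between blanks is a dictionary word, and a grid is a crossword when all rows and all columns are valid.

    # Sketch of Definition 1 for the 'classic' language.
    BLANK = "."                                   # stands in for the blank (white) square
    WORDS = {"i", "e", "h", "gem", "are", "era", "game", "mete", "math"}  # toy dictionary

    def is_valid_sequence(seq):
        """True if every maximal run of letters in seq is a dictionary word."""
        return all(run in WORDS for run in seq.split(BLANK) if run)

    def is_crossword(grid):
        """True if all rows and all columns of the square grid are sequences from L."""
        rows = ["".join(r) for r in grid]
        cols = ["".join(c) for c in zip(*grid)]
        return all(is_valid_sequence(s) for s in rows + cols)

    grid = [list("gem."), list("are."), list("math"), list("e.e.")]
    print(is_crossword(grid))  # True for the example grid shown above

Single letters such as 'e' and 'h' are included in the toy word list because single letters are accepted as words here.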

How many are there?

It is obvious that very small crosswords are easily constructed, especially crosswords which have only one row or one column. It is also easy to create a few very big but very dull crosswords: if you keep alternating the rows between 'I□I□I□…' and '□I□I□I…', you certainly get a valid (and as big as you like) 'classic English' crossword. So we want to not only consider the existence of big crosswords, but also check whether there are many different ones. We are going to calculate the number of big crosswords now.

Assume we have chosen an alphabet A and a language L over which the crosswords must be made. We use the notation $|A|$ for the number of letters and symbols in the alphabet. Let us introduce the following number as well,

$$\gamma_L(n) = \text{number of sequences from } L \text{ of length } n.$$


So for constructing a square crossword of size $n \times n$ over the language L, there are $\gamma_L(n)$ possible choices for the first row. We will now use a small trick and for a moment employ a bit of probability theory: if we picked an absolutely random sequence of n letters from A, what is the chance that we got a 'valid' row, that is, a sequence from L? The answer is

$$\frac{\gamma_L(n)}{|A|^n}$$

because there are $\gamma_L(n)$ valid sequences and $|A|^n$ possible sequences. An example: suppose we wanted to create a normal English crossword. In my dictionary, there is 1 word (namely 'I') of length 1 and there are 49 words of length 2. Valid sequences of length 2 are '□□', '□I', 'I□', and then the 49 words of length 2. A total of 52 sequences, that is, $\gamma_L(2) = 52$. The total number of possible sequences of length 2 is $|A|^2 = 27 \cdot 27 = 729$ (note that even though we only have 26 letters, the size of A is 27, because we need the symbol □ as well). And thus the probability of getting a valid sequence of length 2 would be $52/729 \approx 0.07$, in this example.
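The arithmetic of this little example is easy to check in a couple of lines of Python (the dictionary counts are the ones quoted above, not a real word list):

    # 1 one-letter word ('I') and 49 two-letter words, as counted in the text.
    alphabet_size = 27            # 26 letters plus the blank square
    gamma_2 = 3 + 49              # two blanks, blank+I, I+blank, and the 49 two-letter words
    total_2 = alphabet_size ** 2  # all length-2 sequences over A
    print(gamma_2, total_2, gamma_2 / total_2)   # 52 729 0.0713...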

However, that was the probability of just one valid row. What about the rest? The probability of all n rows being valid equals the above probability multiplied with itself n times,[2]

$$\left(\frac{\gamma_L(n)}{|A|^n}\right)^n = \frac{\gamma_L(n)^n}{|A|^{n^2}}.$$

Now for the columns the situation is identical. And because the columns are as high as the rows are wide, the result is the same: the probability of all n columns being valid (that is, from L) equals

$$\frac{\gamma_L(n)^n}{|A|^{n^2}}.$$

Now we may calculate the probability of a randomly selected matrix of $n \times n$ letters from A being in fact a crossword: we want both its rows and its columns to be valid, so we multiply:

$$\left(\frac{\gamma_L(n)^n}{|A|^{n^2}}\right)^2 = \frac{\gamma_L(n)^{2n}}{|A|^{2n^2}}.$$

[2] This is basic probability theory. It is comparable to when we say that the probability of a coin landing heads up equals $\frac{1}{2}$ and then proceed to calculate the probability of two heads in a row as $\frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$. We multiply the probabilities when we want the probability of both events.

We now return to our original question: how many (big) crosswords are there? Well, we know the probability of a randomly selected matrix of $n \times n$ letters being a crossword, and there are a total of $|A|^{n \cdot n} = |A|^{n^2}$ possible $n \times n$ matrices, so we may write[3]

$$N_n = |A|^{n^2} \cdot \frac{\gamma_L(n)^{2n}}{|A|^{2n^2}} = \frac{\gamma_L(n)^{2n}}{|A|^{n^2}}.$$

This makes $N_n$ our symbol for the number of crosswords of size $n \times n$.
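With the toy numbers from the example above we can already see what this formula predicts for tiny crosswords (a quick sketch of the probabilistic estimate derived here, not an exact enumeration; the counts $\gamma_L(2) = 52$ and $|A| = 27$ are the ones assumed earlier):

    # N_n = gamma_L(n)^(2n) / |A|^(n^2), evaluated for n = 2.
    gamma_2, A, n = 52, 27, 2
    N_2 = gamma_2 ** (2 * n) / A ** (n * n)
    print(N_2)   # roughly 13.8: about a dozen distinct 2x2 'classic English' crosswords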

Explosive numbers

To get to the core of the matter, we need to do a bit of mathematical wizardry, so now is the time to wear your pointed hat! First, we apply the logarithm[4] to $N_n$:

$$\log N_n = 2n \log \gamma_L(n) - n^2 \log|A| = 2n^2 \left(\frac{\log \gamma_L(n)}{n} - \frac{\log|A|}{2}\right) = 2n^2 \log|A| \left(\frac{\log_{|A|} \gamma_L(n)}{n} - \frac{1}{2}\right).$$

The special symbol $\log_{|A|}$ is simply the logarithm to the base $|A|$, that is, $|A|^{\log_{|A|} x} = x$. Recall that we are interested in the number $N_n$ when n grows large. In the expression above, the first factor,

$$2n^2 \log|A|,$$

just grows towards infinity as n does the same. The second fraction,

$$\alpha_n = \frac{\log_{|A|} \gamma_L(n)}{n},$$

is more interesting (so we name it $\alpha_n$). The value of $\gamma_L(n)$ must be between 0 and $|A|^n$ (that should be clear from the definition of $\gamma_L(n)$). So (assuming $\gamma_L(n) > 0$) we see that $\log_{|A|} \gamma_L(n)$ is between 0 and n. Thus, when n grows large, the value of $\alpha_n$ stays between 0 and 1. Let us assume that $\alpha_n$ in fact converges to some number $\alpha_L$ between 0 and 1. We may now conclude that if $\alpha_L > \frac{1}{2}$ the number $N_n$ grows beyond all bounds as n does, while if $\alpha_L < \frac{1}{2}$ it shrinks towards zero, so that big crosswords essentially do not exist.

[3] This is another application of basic probability theory: the number of valid crosswords is calculated as the probability of a random matrix being valid times the total number of possible matrices.

[4] Recall that taking the logarithm of a product yields a sum ($\log ab = \log a + \log b$), the logarithm of a fraction yields a difference ($\log a/b = \log a - \log b$) and the logarithm of a power turns into a product ($\log a^b = b \log a$).

There is nothing special about two dimensions; we might as well consider d-dimensional ($d \geq 3$) crosswords.

The probability of one dimension (think: row) of the crossword being valid equals

$$\left(\frac{\gamma_L(n)}{|A|^n}\right)^{n^{d-1}} = \frac{\gamma_L(n)^{n^{d-1}}}{|A|^{n^d}}.$$

This is almost the same result as before, but note the exponent $n^{d-1}$. In the case $d = 3$, where we might imagine the crossword as a cube made up of 'sticks' of sequences from L, the exponent corresponds to the fact that in each dimension there are $n^2$ sticks. The probability of all dimensions (think: rows and columns) being valid equals

$$\left(\frac{\gamma_L(n)^{n^{d-1}}}{|A|^{n^d}}\right)^d = \frac{\gamma_L(n)^{d n^{d-1}}}{|A|^{d n^d}}.$$

Again, this should come as no shock. The total number of possible crosswords (think: any matrix) is multiplied with the probability and we find:

$$N_n^{(d)} = |A|^{n^d} \cdot \frac{\gamma_L(n)^{d n^{d-1}}}{|A|^{d n^d}} = \frac{\gamma_L(n)^{d n^{d-1}}}{|A|^{n^d (d-1)}}.$$

Applying the logarithm yields

$$\log N_n^{(d)} = d n^{d-1} \log \gamma_L(n) - n^d (d-1) \log|A|$$

and reorganizing the terms,

$$\log N_n^{(d)} = d n^d \log|A| \left(\frac{\log_{|A|} \gamma_L(n)}{n} - \frac{d-1}{d}\right). \qquad (1)$$

The second fraction in the above expression is recognized from before. We recall that the number $\alpha_L$ is used to denote the limiting value of the fraction as n becomes very big. We find that if, say, $d = 3$, the value of $\alpha_L$ must be at least $\frac{2}{3}$ if we want to have many, big crosswords. As the dimension of the crosswords grows, the language L must have a larger and larger $\alpha_L$-value to sustain the notion of many crosswords.
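In other words, formula (1) turns into a simple sign test: the number of d-dimensional crosswords explodes or collapses according to whether $\alpha_L$ exceeds $\frac{d-1}{d}$. A tiny sketch (assuming the limiting value $\alpha_L$ is already known for the language at hand):

    def many_big_crosswords(alpha, d):
        """Sign of the bracket in formula (1): True means log N grows without bound."""
        return alpha > (d - 1) / d

    for d in (2, 3, 4):
        print(d, (d - 1) / d, many_big_crosswords(0.7, d))
    # thresholds 0.5, 0.667, 0.75: a language with alpha = 0.7 supports d = 2 and 3, but not 4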

It seems like $\alpha_L$ expresses something fundamental about the language L. So information theorists have a name for that value:

Definition 2. Let L be a language. The entropy of L is defined as

$$\tilde{H}(L) = \lim_{n \to \infty} \frac{\log_{|A|} \gamma_L(n)}{n}.$$

We recognize the entropy as the same thing as we know as $\alpha_L$. The little symbol above $\tilde{H}$ is there to remind us that this is a special kind of entropy: the theory leading up to this definition is not as concise and rigid as many information theorists would want. But we should not feel nothing has been accomplished: our entropy captures some very deep aspects of the concept.
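To get a feel for the definition, we can estimate the fraction inside the limit for the 'classic' language of the earlier example. The Python sketch below uses only the toy counts assumed before (1 one-letter word and 49 two-letter words; a real dictionary has words of every length) together with a simple recursion for $\gamma_L(n)$ of our own making: a valid row either ends with a blank square, or ends with a word that either fills the whole row or is preceded by a blank.

    from math import log

    A = 27                 # 26 letters plus the blank square
    W = {1: 1, 2: 49}      # W[k] = number of words of length k (toy counts from the text)

    def gamma(n, memo={0: 1}):
        """Number of valid 'classic' rows of length n."""
        if n not in memo:
            total = gamma(n - 1)                        # row ends with a blank square
            total += W.get(n, 0)                        # row is a single word of length n
            for k, count in W.items():
                if k < n:                               # row ends with a word of length k,
                    total += count * gamma(n - k - 1)   # preceded by a blank square
            memo[n] = total
        return memo[n]

    for n in (2, 5, 10, 20):
        alpha_n = log(gamma(n), A) / n
        print(n, gamma(n), round(alpha_n, 3))   # gamma(2) = 52, as found in the text

For this impoverished toy dictionary the fraction drifts down towards roughly 0.43, below the critical value $\frac{1}{2}$, so a dictionary containing only one- and two-letter words would not sustain many big two-dimensional crosswords.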

We may wonder what happens if, say, $\tilde{H}(L) = \frac{1}{4}$. What kind of crosswords are possible? Why, crosswords of dimension d up to $\frac{4}{3}$ of course! If $d = \frac{4}{3}$ we have $(d-1)/d = \frac{1}{4}$. How to visualize a crossword in 1.333 dimensions is probably better left as an exercise to the reader!

A recapitulation

Let us briefly examine what we have learnt so far: we introduced the concept of a language, which is nothing but a set of sequences of letters. We have then made a satisfactory definition of what a crossword over a language is. Using elementary combinatorics and probability theory we have calculated the number of valid crosswords of size $n \times n$ (or, in the case of other dimensions, size $n^d$). This number depends on the size of the alphabet, $|A|$, as well as the special function $\gamma_L(n)$. We then observed that there are essentially two different cases: in the first case ($\alpha_L < \frac{d-1}{d}$) this number shrinks towards zero as the size grows; in the second case ($\alpha_L > \frac{d-1}{d}$) the same number becomes infinitely big as the size grows.

This calls for a reformulation of our initial question: while we opened this paper asking about the existence of crosswords, we are now tempted to ask: "Given a language L, what is the greatest dimension d for which there are many (big) crosswords?". This move encourages us to consider non-integer values of d, and thus we have definitely left the realm of ordinary crossword puzzles, and maybe the real world as well! The answer to the new question is related to the entropy, as we have just seen. In fact, combining the definition of entropy and formula (1) shows

$$\tilde{H}(L) = \frac{d_0 - 1}{d_0} \qquad \text{and} \qquad d_0 = \frac{1}{1 - \tilde{H}(L)},$$

where $d_0$ is exactly the largest dimension where it is possible to create (many) crosswords over L.
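The relation between entropy and the maximal dimension is easy to tabulate (a small sketch; the sample entropy values are just illustrations):

    def max_dimension(H):
        """d0 = 1 / (1 - H~(L)), the largest dimension with many big crosswords."""
        return float("inf") if H >= 1 else 1 / (1 - H)

    for H in (0.25, 0.5, 2/3, 1.0):
        print(H, max_dimension(H))   # 4/3, 2, 3, and infinity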

Note how $d_0$ may be arbitrarily big, even infinite if the entropy equals 1. How is that for a crossword puzzle! Actually, if $\tilde{H}(L) = 1$ it is quite trivial to create crossword puzzles (in any dimension). An example of such a language L is the one made up of every integer. The alphabet A is just the digits, and $\gamma_L(n) = |A|^n = 10^n$ (because any sequence of length n which is made up from digits is a valid number), so clearly $\tilde{H}(L) = 1$.

We have arrived at the concept of entropy by a quite unusual method. Aside from (hopefully) some pedagogical advantages there are other reasons for picking this approach: we now have an entropy concept defined on any language or, which is the same, any set of sequences made up of letters from A. This is not true for the traditional entropy, which is introduced via the concept of information sources (which are also known as stochastic processes, and are based on a quite technical probability theoretic framework). In addition, while our entropy, in its current form, does not handle random languages (e.g. the language of all possible sequences of 0's and 1's created by flipping a coin), it is possible to refine our definitions to cover these important cases as well, and indeed to generalize the probability theoretic entropy.

Entropy

Something should probably be said about why entropy is so central to information theory. But where to start and where to end! We will look at only two aspects, one of a somewhat philosophical nature and the other of a very practical nature.

Entropy is often said to be a measure of how 'complex' or even 'chaotic' things are. This corresponds nicely with the observations given above: a language made up of every possible integer is devoid of any form or structure. Anything goes. It is impossible to distinguish between a sequence from the language and a sequence of completely random digits. This language, as explained above, has entropy 1. On the other hand, a language made up of sequences of only one letter, say 'a', is completely structured. No room for choices. The function $\gamma_L(n)$ is constantly 1 regardless of the value of n. This corresponds to the case where $\tilde{H}(L) = 0$.

The other use of entropy which we will touch upon is data compression. We mention the following theorem in sketch form:

Theorem 1 (Shannon). Let L be a language over A. There exists an encoding such that any sequence $x \in L$ of length n may be encoded into a sequence of no more than $n(\tilde{H}(L) + \varepsilon)$ letters. This holds for any positive number $\varepsilon$, however small, provided the length n of x is large enough.

This formulation only captures the essence of Shannon's theorem. What is important is the order in which things happen: first we choose the value $\varepsilon$ as small as we want it. This determines how "close" to the entropy we want our encoding to be. Then the theorem tells us that there exists a number N and a code so that any sequence $x \in L$ which is at least N letters long can be encoded into just $|x|(\tilde{H}(L) + \varepsilon)$ letters. So if the entropy of L is $\frac{1}{2}$ we can compress long sequences from L by a factor 2.
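One informal way to see why such codes can exist, in the counting spirit of this paper (an illustration of ours, not the theorem's actual proof), is to encode a sequence simply by its index in a list of all length-n members of L; writing that index down takes about $\log_{|A|} \gamma_L(n) = n \alpha_n$ letters. The sketch below does this for a made-up language, binary strings with no two adjacent 1's, whose entropy is $\log_2$ of the golden ratio, about 0.69.

    from math import log

    def members(n):
        """All binary strings of length n with no two adjacent 1's."""
        seqs = [""]
        for _ in range(n):
            seqs = [s + b for s in seqs for b in "01"
                    if not (b == "1" and s.endswith("1"))]
        return seqs

    n = 20
    L_n = members(n)                              # gamma_L(n) sequences, a Fibonacci number
    bits_per_letter = log(len(L_n), 2) / n
    print(len(L_n), round(bits_per_letter, 3))    # 17711 and about 0.71 bits per letter

Encoding a 20-letter sequence by its position in this list needs only about 0.71 bits per letter instead of 1, and the rate keeps creeping down towards the entropy as n grows.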

This concludes our tour. The connection between the complexity of a language and the ability to create crosswords may not come as a surprise. But that this connection leads directly to entropy, the cornerstone of information theory, is, at the very least, rather neat.

Notes

This section contains some notes about the history of the results. It is probably most interesting to readers already familiar with the concepts in this paper. The idea of linking crossword puzzles and entropy is, in fact, as old as [Shannon, 1948], from which we quote the last paragraph of section 7:

    The redundancy of a language is related to the existence of crossword puzzles. If the redundancy is zero any sequence of letters is a reasonable text in the language and any two-dimensional array of letters forms a crossword puzzle. If the redundancy is too high the language imposes too many constraints for large crossword puzzles to be possible. A more detailed analysis shows that if we assume the constraints imposed by the language are of a rather chaotic and random nature, large crossword puzzles are just possible when the redundancy is 50%. If the redundancy is 33%, three-dimensional crossword puzzles should be possible, etc.

For a more detailed discussion as well as a bit of history on the result see the last part of [Immink et al., 1998]. The entropy $\tilde{H}$ introduced in this paper is related to the Hartley entropy and Hausdorff dimensions of 'nice' subsets of $A^\infty$ (considered as subsets of [0, 1]). Or, if one considers arbitrary subsets of $A^\infty$, the entropy $\tilde{H}$ might be interpreted as a form of the box counting dimension, see e.g. [Falconer, 1990]. The connection between entropy and Hausdorff dimension is described in [Billingsley, 1965] and interesting results in this direction can be found in [Ryabko, 1986].

References

Billingsley, P. Ergodic Theory and Information. John Wiley & Sons, 1965.

Falconer, K. Fractal Geometry: Mathematical Foundations and Applications. John Wiley & Sons, 1990.

Immink, K. A. S., P. H. Siegel and J. K. Wolf. Codes for digital recorders. IEEE Trans. Inform. Theory, 44(6):2260–2299, 1998.

Ryabko, B. Y. Noiseless coding of combinatorial sources, Hausdorff dimension, and Kolmogorov complexity. Problems of Inform. Trans., 22(3):170–179, 1986.

Shannon, C. E. A mathematical theory of communication. Technical report, Bell System, 1948.
