
Artificial Intelligence

CIS 342

The College of Saint Rose

David Goldschmidt, Ph.D.

March 6, 2009

Crossword Puzzle Construction

Given:
– Dictionary of valid words and phrases
– Empty crossword grid

Problem:
– Fill the crossword grid such that all words both across and down are valid
– Assign clues

Crossword Puzzle Construction

Depth-First Search (DFS)
– Fill in words until a solution is found or a dead-end is encountered
– Backtrack from dead-ends
– Questions:
  Where do we start?
  What word do we fill in next?
  What backtracking strategies do we use?
  How do we avoid repetition (boring puzzles)?
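The fill-and-backtrack loop above can be sketched in a few lines. This is a minimal illustration, not the lecture's implementation: the slot representation, the `fill` function, and the six-word toy dictionary are all assumptions made for the example, and slots are simply filled in the order given.

```python
from typing import Dict, List, Optional, Set, Tuple

Cell = Tuple[int, int]  # (row, column)


def fill(slots: List[List[Cell]], words: List[str],
         grid: Optional[Dict[Cell, str]] = None,
         used: Optional[Set[str]] = None) -> Optional[Dict[Cell, str]]:
    """Depth-first fill: return a letter for every cell, or None at a dead-end."""
    grid = {} if grid is None else grid
    used = set() if used is None else used
    if not slots:
        return grid  # every slot holds a word: solution found
    slot, rest = slots[0], slots[1:]
    for word in words:
        if word in used or len(word) != len(slot):
            continue
        # A word fits if it agrees with every letter already in the grid.
        if all(grid.get(cell, ch) == ch for cell, ch in zip(slot, word)):
            new_grid = {**grid, **dict(zip(slot, word))}
            result = fill(rest, words, new_grid, used | {word})
            if result is not None:
                return result
        # Dead-end below this word: fall through and try the next one.
    return None


# Hypothetical 3x3 open grid: three across slots followed by three down slots.
across = [[(r, c) for c in range(3)] for r in range(3)]
down = [[(r, c) for r in range(3)] for c in range(3)]
toy_words = ["are", "ben", "cab", "cot", "ore", "ten"]
solution = fill(across + down, toy_words)
rows = ["".join(solution[(r, c)] for c in range(3)) for r in range(3)]
```

Because the down slots come last, a bad across choice is only discovered late; that is exactly the cost the questions on this slide are probing.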

Crossword Puzzle Construction

Optimize the DFS:
– Add longer (most constrained) words first
– Associate weights with words in dictionary based on frequency of letters
  Friendly crossword puzzle words include letters: S, R, E, T, D, A, I, L
  Unfriendly crossword puzzle words include letters: J, Q, X, Z, F, V, W
  e.g. quiz, fix, jazz, quaff, xylophone, wax
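One possible way to turn the friendly and unfriendly letter lists into word weights is sketched below. The scoring rule (+1 per friendly letter, -1 per unfriendly letter) and the function names are assumptions for illustration; the slide only says that weights are based on letter frequency.

```python
from typing import List

FRIENDLY = set("SRETDAIL")
UNFRIENDLY = set("JQXZFVW")


def friendliness(word: str) -> int:
    """Hypothetical weight: +1 per friendly letter, -1 per unfriendly letter."""
    score = 0
    for ch in word.upper():
        if ch in FRIENDLY:
            score += 1
        elif ch in UNFRIENDLY:
            score -= 1
    return score


def rank_candidates(words: List[str]) -> List[str]:
    """Try the friendliest words first."""
    return sorted(words, key=friendliness, reverse=True)


def slot_order(slot_lengths: List[int]) -> List[int]:
    """Indices of slots, longest (most constrained) first."""
    return sorted(range(len(slot_lengths)), key=lambda i: slot_lengths[i], reverse=True)


ranked = rank_candidates(["jazz", "tidal", "quiz", "stared"])
```

A simple additive score like this one treats every friendly letter equally; a real constructor might weight letters by actual frequency instead.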

[Figure: one genetic algorithm cycle on a population of six 4-bit chromosomes, X1 to X6]
– Generation i: fitness values f range from 14 to 56
– Crossover: parent pairs (X6i, X2i), (X1i, X5i) and (X2i, X5i) exchange bits
– Mutation: individual bits of the offspring are flipped
– Generation (i + 1): fitness values f range from 44 to 56

Crossword Puzzle Construction

Genetic Algorithm (GA)
– Evolve a solution by crossovers and mutations through many generations
– Initial population of crossword grids:
  Random letters?
  Random letters based on Scrabble® frequencies?
  Random words from dictionary?
– Fitness of each grid is number of valid words
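A rough sketch of such a GA follows, using the slide's fitness (number of valid words) on a small open grid. Everything else is an assumption chosen for brevity: the grid has no black squares, the initial population uses random dictionary words as rows, crossover is single-point on the row-by-row letter string, the fitter half of each generation survives, and the mutation rate of 0.05 is arbitrary.

```python
import random
import string
from typing import List, Set

Grid = List[str]  # one string per row; assumes an open grid with no black squares


def fitness(grid: Grid, dictionary: Set[str]) -> int:
    """Number of across (row) and down (column) entries that are valid words."""
    columns = ["".join(col) for col in zip(*grid)]
    return sum(entry in dictionary for entry in list(grid) + columns)


def crossover(a: Grid, b: Grid, rng: random.Random) -> Grid:
    """Single-point crossover on the grids read row by row."""
    width = len(a[0])
    flat_a, flat_b = "".join(a), "".join(b)
    point = rng.randrange(1, len(flat_a))
    child = flat_a[:point] + flat_b[point:]
    return [child[i:i + width] for i in range(0, len(child), width)]


def mutate(grid: Grid, rate: float, rng: random.Random) -> Grid:
    """Replace each letter with a random one with probability `rate`."""
    return ["".join(rng.choice(string.ascii_lowercase) if rng.random() < rate else ch
                    for ch in row) for row in grid]


def evolve(dictionary: Set[str], size: int, pop_size: int = 30,
           generations: int = 200, seed: int = 0) -> Grid:
    rng = random.Random(seed)
    words = sorted(w for w in dictionary if len(w) == size)
    # Initial population: random dictionary words as rows.
    population = [[rng.choice(words) for _ in range(size)] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda g: fitness(g, dictionary), reverse=True)
        parents = population[:pop_size // 2]  # keep the fitter half
        children = [mutate(crossover(rng.choice(parents), rng.choice(parents), rng), 0.05, rng)
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=lambda g: fitness(g, dictionary))
```

Nothing here guarantees a fully valid grid; the GA only pushes the count of valid words upward, which is why the choice of initial population matters.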

Solving Crossword Puzzles

Given:
– Crossword grid
– Clues

Problem:
– Fill the grid such that all words correctly answer the given clues

Solving Crossword Puzzles

Obtain candidate answers for each clue
– Assign a confidence value to each candidate
– Are we guaranteed to have the correct answer?

Place candidate answers in grid until a solution is found or a dead-end occurs
– Which backtracking strategies should we use?

Solving Crossword Puzzles

PROVERB (Duke University, 1999)
– Modules provide candidate answers from dictionaries, encyclopedias, movie databases, etc.
– Module sources a Crossword Puzzle Database of exactly 5142 previously solved puzzles
  Pivotal in PROVERB’s success
– Another module generates all combinations of letters (ouch!)

Solving Crossword Puzzles

Google CruciVerbalist (GCV)

Solving Crossword Puzzles

GCV solved 13x13 puzzle with 68 clues
– Many clues are fill-in-the-blank or pop-culture clues
– Candidate answers obtained from Google results page (top 50)
– Solved using 559 Google queries
– Queries yielded 68 correct answers
  44 correct answers had highest confidence

Solving Crossword Puzzles

Clue Preprocessing

Categorize clues based on text and type of clues:
– Fill-in-the-blank clues
– Synonyms/Antonyms
– “Type of” (or “Kind of”) clues
– Abbreviations
– Clues with “and” or “or”
– Singular or plural
– Number of words in answer

Clue Preprocessing

Translate clues to Google-friendly forms
– “To ___ is human”
  “To * is human”
  “To * * is human”
– “Mary ___ little lamb” (2 words)
  “Mary * * little lamb”
– “___ to Joy” by Beethoven
  “* to Joy” by Beethoven
  “* * to Joy” by Beethoven
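The blank-to-wildcard translation above is mechanical enough to sketch. The function name `blank_queries` and its parameters are hypothetical; the only behaviour taken from the slide is that a blank becomes one `*` per answer word, trying one and two words when the answer length in words is unknown.

```python
import re
from typing import List, Optional

BLANK = re.compile(r"_{2,}")  # a run of underscores marks the blank


def blank_queries(clue: str, answer_words: Optional[int] = None,
                  max_words: int = 2) -> List[str]:
    """Replace the blank with one '*' wildcard per answer word."""
    counts = [answer_words] if answer_words else range(1, max_words + 1)
    return [BLANK.sub(" ".join("*" * n), clue) for n in counts]


queries = blank_queries("To ___ is human")
```

Passing `answer_words=2` for a clue marked "(2 words)" yields only the two-wildcard form, as in the "Mary ___ little lamb" example.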

Clue Preprocessing

Translate clues to Google-friendly forms
– Diplomacy
  synonyms of Diplomacy
– Not dry
  opposite of dry
  antonyms of dry
– Joy
  synonyms of Joy
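As a small sketch of the synonym and antonym rules on this slide: a clue starting with "Not" is turned into "opposite of" and "antonyms of" queries, and anything else into a "synonyms of" query. The function name is hypothetical, and treating every other clue as a synonym clue is a simplification.

```python
from typing import List


def synonym_antonym_queries(clue: str) -> List[str]:
    """Build keyword queries for synonym-style and 'Not ...' clues."""
    words = clue.split()
    if len(words) > 1 and words[0].lower() == "not":
        target = " ".join(words[1:])
        return [f"opposite of {target}", f"antonyms of {target}"]
    return [f"synonyms of {clue}"]


not_dry = synonym_antonym_queries("Not dry")
```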

Clue Preprocessing

Translate clues to Google-friendly forms
– Type of dancing [or Kind of dancing]
  * dancing
– Second sight (abbr.)
  Second sight
  abbreviations of Second sight
– Superman’s admirer
  admirer of Superman
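The three rewrites on this slide can be expressed as pattern rules. The sketch below is one assumed way to do it with regular expressions; the function name and the exact patterns are illustrative, and clues matching none of the rules are passed through unchanged.

```python
import re
from typing import List


def special_form_queries(clue: str) -> List[str]:
    """Rewrite 'Type of X', 'X (abbr.)' and possessive clues as keyword queries."""
    m = re.match(r"(?:type|kind) of (.+)$", clue, re.IGNORECASE)
    if m:
        return [f"* {m.group(1)}"]
    m = re.match(r"(.+?)\s*\(abbr\.\)$", clue, re.IGNORECASE)
    if m:
        return [m.group(1), f"abbreviations of {m.group(1)}"]
    m = re.match(r"(.+?)['’]s (.+)$", clue)  # straight or curly apostrophe
    if m:
        return [f"{m.group(2)} of {m.group(1)}"]
    return [clue]


dancing = special_form_queries("Type of dancing")
```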

Clue Preprocessing

Translate clues to Google-friendly forms
– Couldn’t move
  Could not move
  Could opposite of move
  Could antonyms of move
– Knight or Danson
  Knight Danson

Clue Preprocessing

Translate clues to Google-friendly forms
– Bosley and Arnold
  Bosley Arnold
  Append an ‘s’
– Henson, and others [or Henson, and namesakes]
  Henson
  Append an ‘s’
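The "and"/"or" rules can be sketched as a function that returns the keyword query plus a flag saying whether an 's' should be appended to each candidate answer. The names `conjunction_query` and `finalize` are hypothetical, and the patterns are assumptions that cover only the clue shapes shown on these slides.

```python
import re
from typing import Tuple


def conjunction_query(clue: str) -> Tuple[str, bool]:
    """Return (query, pluralize) for 'X or Y', 'X and Y' and 'X, and others' clues."""
    m = re.match(r"(.+?),? and (?:others|namesakes)$", clue)
    if m:
        return m.group(1), True
    m = re.match(r"(.+) and (.+)$", clue)
    if m:
        return f"{m.group(1)} {m.group(2)}", True
    m = re.match(r"(.+) or (.+)$", clue)
    if m:
        return f"{m.group(1)} {m.group(2)}", False
    return clue, False


def finalize(candidate: str, pluralize: bool) -> str:
    """Append an 's' to a candidate answer when the clue asked for a plural."""
    return candidate + "s" if pluralize else candidate


query, plural = conjunction_query("Bosley and Arnold")
```

Here "and" clues ask for a plural answer shared by all the names, while "or" clues ask for a single shared answer, which is why only the former set the flag.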

Results of Google-Querying


GCV excels at solving fill-in-the-blank and pop-culture clues
– Why?

Though results are encouraging, using keyword-based searching is limited
– Why?

Populating the Crossword Grid

Use a Depth-First Search (DFS) algorithm:
– Fill in the crossword grid based on confidence values of candidate words
– At each iteration:
  Select candidate word with highest confidence value amongst clues not yet placed
  Attempt to fit candidate word into grid
– Halt when a solution is found or a dead-end occurs
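The confidence-driven DFS can be sketched as follows. This is an illustrative version only: the slot and candidate data structures, the function names, and the toy clues are assumptions, and dead-ends are handled with plain chronological backtracking, which revisits some placements and is far from efficient.

```python
from typing import Dict, FrozenSet, List, Optional, Tuple

Cell = Tuple[int, int]
Candidates = Dict[str, List[Tuple[str, float]]]  # clue id -> [(word, confidence)]


def fits(word: str, cells: List[Cell], grid: Dict[Cell, str]) -> bool:
    """True if the word has the right length and agrees with letters already placed."""
    return len(word) == len(cells) and all(
        grid.get(cell, ch) == ch for cell, ch in zip(cells, word))


def solve(slots: Dict[str, List[Cell]], candidates: Candidates,
          grid: Optional[Dict[Cell, str]] = None,
          placed: FrozenSet[str] = frozenset()) -> Optional[Dict[Cell, str]]:
    grid = {} if grid is None else grid
    if len(placed) == len(slots):
        return grid  # every clue has a word: solution found
    # Candidates for clues not yet placed, highest confidence first.
    options = sorted(
        ((conf, clue, word)
         for clue, cands in candidates.items() if clue not in placed
         for word, conf in cands),
        reverse=True)
    for _conf, clue, word in options:
        if fits(word, slots[clue], grid):
            new_grid = {**grid, **dict(zip(slots[clue], word))}
            result = solve(slots, candidates, new_grid, placed | {clue})
            if result is not None:
                return result
    return None  # dead-end


# Hypothetical two-clue puzzle: 1-Across and 1-Down share their first square.
slots = {"1A": [(0, 0), (0, 1), (0, 2)], "1D": [(0, 0), (1, 0), (2, 0)]}
candidates = {"1A": [("cat", 0.9), ("dab", 0.4)], "1D": [("dog", 0.8)]}
result = solve(slots, candidates)
one_across = "".join(result[c] for c in slots["1A"])
```

In this toy case the highest-confidence candidate ("cat") leads to a dead-end, so the search backs out and ends with "dab" and "dog".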

Populating the Crossword Grid

When a dead-end occurs, what do we do?
– Backtrack: Remove last word placed in grid
  Disadvantages?
– Backjump: Identify culprit and remove all words back to culprit word
  Disadvantages?

Populating the Crossword Grid

When a dead-end occurs, what do we do?
– Extricating Backjump: Identify and remove the culprit
  Disadvantages?
– How do we identify the culprit?

Extricating Backjumping

Assign weights to the squares of the grid
– Square weights correspond to confidence values of candidate words placed
– e.g. Place TWAIN with confidence value of 10 at 5-Across

Extricating Backjumping

Weights of interlocking words are multiplied

Extricating Backjumping

Define grid weight of a word as the sum of each individual square weight

– e.g. TWAIN = 100, NOW = 72

Extricating Backjumping

When a dead-end occurs, the culprit is the word with the lowest grid weight
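The weighting scheme can be checked with a short sketch. The slides give TWAIN a confidence value of 10 and grid weights of TWAIN = 100 and NOW = 72; the confidence value of 6 for NOW, and NOW crossing TWAIN at the shared N, are assumptions made here because they reproduce those numbers (10 + 10 + 10 + 10 + 60 = 100 and 60 + 6 + 6 = 72). Function and variable names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Cell = Tuple[int, int]
Placement = Tuple[List[Cell], float]  # (cells covered, confidence value)


def grid_weights(placed: Dict[str, Placement]) -> Dict[str, float]:
    """Grid weight of each word: the sum of its square weights."""
    # Square weight: product of the confidence values of all words on that square.
    square: Dict[Cell, float] = defaultdict(lambda: 1.0)
    for cells, conf in placed.values():
        for cell in cells:
            square[cell] *= conf
    return {word: sum(square[cell] for cell in cells)
            for word, (cells, _conf) in placed.items()}


def culprit(placed: Dict[str, Placement]) -> str:
    """The word with the lowest grid weight."""
    weights = grid_weights(placed)
    return min(weights, key=weights.get)


placed = {
    "TWAIN": ([(0, c) for c in range(5)], 10),  # across, confidence from the slides
    "NOW": ([(r, 4) for r in range(3)], 6),     # down from the N; confidence assumed
}
weights = grid_weights(placed)
```

Multiplying at crossings rewards words that interlock with other confident words, so an isolated low-confidence word is the first to be removed.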

A Sampling of Crossword Puzzles
– New York Times
– TV Guide #42
– TV Guide #63
– Mensa Kids Puzzle #3

Results of Grid Solving

Limitations of Keyword-Based Search

Google and GCV use keyword-based tricks to artificially improve result sets
– Word frequency & proximity to other words
– Additional keywords to help direct queries to good candidate answers, e.g. synonyms of
– Grammatical and structural rearrangements

Lack of precision in keyword-based search
– Irrelevant results in candidate answer lists
– Confidence values based on word frequency produce many false positives
– Correct answer is often buried in other mediocre (and incorrect!) candidates

Limitations of Keyword-Based Search

In Conclusion....

Other uses of the Web as an automated information source?
– Keyword-based search is insufficient
– Lacks the means for machine-interpretable information
– Semantic Web

