© 2011 Macmillan Publishers Limited. All rights reserved

Word Play - Mathematics of Humanities


Erez Lieberman Aiden is standing on the sun deck of his town house, rocking back and forth on the balls of his bare feet as he belts out a blessing. The Hebrew words echo across the quiet courtyards of Harvard University in Cambridge, Massachusetts. The sky has turned indigo as the light and warmth leak away from this day in late April. Shalom aleichem, he sings. Peace be upon you.

Lieberman Aiden — molecular biologist, applied mathematician and, at 31 years old, the precocious doyen of the emerging field known as the digital humanities — could do with a little peace. The cries of his 10-month-old son have abated — for the moment — and he has had just enough time to throw on a pair of frayed black trousers and a shiny synthetic pullover before his guests arrive. A five o’clock shadow darkens the terrain between his thick goatee and unkempt hair. The night before, he caught a late train back from Princeton University in New Jersey, where he, the geeky scientist, had the delicate task of informing a room of erudite historians that his efforts at mining a database of 5 million books, about 4% of all those ever published, had made much of what they do trivially easy. The scrupulous tracking of ideas across history, for instance — work that has consumed entire careers — can be done in seconds with tools that Lieberman Aiden and his colleagues have invented.

Yet his role as evangelist for change in the humanities — or doomsday prophet, depending on your point of view — is just one of the many parts played by Lieberman Aiden. He is also: the inventor of a groundbreaking protocol that reveals how DNA can be tightly wound and yet untangled enough to orchestrate life; the chief executive of iShoe, a company that is testing sensor-stuffed shoe inserts to help the elderly with their balance; and the co-founder, with his wife, of Bears Without Borders, which sends thousands of stuffed animals to children in the developing world. (Barely concealed in the couple’s basement are mounds of donated animals awaiting delivery.) In pouring his energies into all the projects that excite him, Lieberman Aiden doesn’t transcend disciplinary boundaries so much as ignore them. And although he is still technically a postdoctoral researcher at Harvard, Lieberman Aiden seems to publish the results of those projects almost exclusively on the covers of Science and Nature; hung in the stairwell below the sun deck, he has framed blow-ups of the magazine covers to prove it.

But that is work, and this is Shabbat dinner, the start of the Jewish Sabbath: a time for rest. The light switches will remain untouched, leaving the house illuminated through the night; the hot plate in the kitchen, on which the meal is being warmed, is on a timer. Three candles have been lit, one for each member of the household. Lieberman Aiden sings unabashedly in a hearty baritone that is not at all like his reedy, excitable speaking voice. He gazes at his wife, Aviva Presser Aiden, who grins back at him, holding her sweater tight to herself in the chilly night air. She too has reason to rest contentedly. The week before, she learned that she had won a US$100,000 grant from the Bill & Melinda Gates Foundation in Seattle, Washington, to build a microbial fuel cell that could charge mobile phones in Africa. The project means a year-long break from her studies at Harvard Medical School in Boston, where she is adding an MD to her PhD in genetics.

It is only by comparison with this academic power-couple that the other dinner guests — two young, self-assured Harvard physics graduates — look a bit lost, but that probably has more to do with their unfamiliarity with the Shabbat rituals. They flip through the Hebrew prayer books and try to follow along. But Lieberman Aiden, who in his 20s toyed with becoming a rabbi, has no need for the book. These are the texts that he has studied for years. These are the words he knows best.

NEWS FEATURE
WORD PLAY
By mining a database of the world’s books, Erez Lieberman Aiden is attempting to automate much of humanities research. But is the field ready to be digitized?
BY ERIC HAND
Photograph: Sam Ogden
23 JUNE 2011 | VOL 474 | NATURE | 437

READING VERY NOT-CAREFULLY

As a reader with a finite amount of time, Lieberman Aiden likes to say, you pretty much have two choices. You can read a small number of books very carefully. Or you can read lots of books “very, very not-carefully”. Most humanities scholars abide by the former approach. In a process known as close-reading, they seek out original sources in archives, where they underline, annotate and cross-reference the text in efforts to identify and interpret authors’ intentions, historical trends and linguistic evolution. It’s the approach Lieberman Aiden followed for a 2007 paper in Nature (ref. 1). Sifting through old grammar books, he and his colleagues identified 177 verbs that were irregular in the era of Old English (around AD 800) and studied their conjugation in Middle English (around AD 1200), then in the English used today. They found that less-commonly used verbs regularized much more quickly than commonly used ones: ‘wrought’ became ‘worked’, but ‘went’ has not become ‘goed’. The study gave Lieberman Aiden a first-hand lesson in how painstaking a traditional humanities approach could be.

But what if, Lieberman Aiden wondered, you could read every book ever written ‘not-carefully’? You could then show how verbs are conjugated not just at isolated moments in history, but continuously through time, as the culture evolves. Studies could take in more data, faster. As he began thinking about this question, Lieberman Aiden realized that ‘reading’ books in this way was precisely the ambition of the Google Books project, a digitization of some 18 million books, most of them published since 1800. In 2007, he ‘cold e-mailed’ members of the Google Books team, and was surprised to get a face-to-face meeting with Peter Norvig, Google’s director of research, just over a week later. “It went well,” Lieberman Aiden says, in an understatement.

Working with Google and his chief collaborator, 29-year-old Harvard psychology postdoc Jean-Baptiste Michel, Lieberman Aiden built a software tool called the n-grams viewer to chart the frequency of phrases across a corpus of 500 billion words. A ‘one-gram’ plots the frequency of a single word such as ‘feminism’ over time; a ‘two-gram’ shows the frequency of a contiguous phrase, such as ‘touch base’ (see ‘Think outside the box’).
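The idea of charting a one-gram or two-gram over time can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the software built with Google: the function names `ngrams` and `relative_frequency`, the whitespace tokenization and the three-sentence `TOY_CORPUS` are all invented here for the example. For each year, the sketch counts how often a phrase occurs and divides by the total number of n-grams of the same length published that year.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-word phrases in a list of tokens."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def relative_frequency(corpus, phrase):
    """Map each year to the share of that year's n-grams equal to `phrase`.

    `corpus` is an iterable of (year, text) pairs; n is taken from the
    number of words in `phrase`. Tokenization is a plain lower-cased
    whitespace split, which is an assumption made for brevity.
    """
    n = len(phrase.split())
    target = phrase.lower()
    hits = Counter()
    totals = Counter()
    for year, text in corpus:
        # N-grams are built per text, so no phrase spans two books.
        grams = ngrams(text.lower().split(), n)
        totals[year] += len(grams)
        hits[year] += grams.count(target)
    return {year: hits[year] / totals[year]
            for year in sorted(totals) if totals[year]}


# Hypothetical toy corpus: (publication year, text) pairs made up for illustration.
TOY_CORPUS = [
    (1990, "we will touch base on monday"),
    (1990, "the base of the statue"),
    (2000, "touch base with me and touch base with her"),
]

if __name__ == "__main__":
    print(relative_frequency(TOY_CORPUS, "touch base"))
```

In the toy corpus, ‘touch base’ accounts for one of the nine two-grams from 1990 and two of the eight from 2000, so the charted share rises between the two years. A real corpus of billions of words would need precomputed counts rather than a rescan of the text for every query.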

Google unveiled the tool on 16 December 2010, the same day that Lieberman Aiden and his colleagues published a paper in Science (ref. 2) describing how the tool could be used, for example, to identify the verb that has regularized the fastest: ‘chid’ and ‘chode’ to ‘chided’ in some 200 years (see ‘The fastest verb on the planet’). “We found ‘found’ 200,000 times more often than we finded ‘finded’,” they wrote, with characteristic playfulness. “In contrast, ‘dwelt’ dwelt in our data only 60 times as often as ‘dwelled’ dwelled.” Interspersed between the jokes were real discoveries — many of which had nothing to do with verbs. By comparing German and English texts from the first half of the twentieth century, the team showed that the Nazi regime suppressed mention of the Jewish artist Marc Chagall, and that the n-grams tool could be used to identify artists, writers or activists whose suppression had hitherto been unknown. Lieberman Aiden and Michel called their approach culturomics, a reference to the genomics-like scale of the book database, and a nod to the future, when they hope that more of the media that underpin culture — newspapers, blogs, art, music — will be folded in.

In the first 24 hours after its launch, the n-grams viewer (ngrams.googlelabs.com) received more than one million hits. Dan Cohen, director of the Roy Rosenzweig Center for History and New Media at George Mason University in Fairfax, Virginia, calls the tool a “gateway drug” for the digital humanities, a field that has been gaining pace and funding in the past few years (see ‘A discipline goes digital’). The name is an umbrella term for approaches that include not just the assembly of large-scale databases of media and other cultural data, but also the willingness of humanities scholars to develop the algorithms to engage with them. “These tools are revolutionizing the way we work and the kinds of arguments we can make,” says Dan Edelstein, a historian at Stanford University in California, who has used mapping software to show unexpected patterns in the way that Voltaire’s letters spread through Europe during the Enlightenment.

Yet some humanities researchers in the traditional camp complain that their field can never be encapsulated by the frequency charts of words and phrases produced by the n-grams tool. “I think saying all books equal the DNA of human experience — I think that’s a very dangerous parallel,” says Cohen. How do you factor in the cultural contributions of furniture, or dance, or ticket stubs at a movie hall, he asks. What about all the books that were never published? Or the culture as experienced by the world’s vast illiterate populations?

Other scholars have deep reservations about the digital humanities movement as a whole — especially if it will come at the expense of traditional approaches. “You can’t help but worry that this is going to sweep the deck of all money for humanities everywhere else,” says Anthony Grafton, a historian at Princeton and president of the American Historical Association, who uses a giant, geared wooden reading wheel to help him manage his oversized, Renaissance texts. He wants researchers to hold onto the power that comes with intimately knowing their primary sources, right down to the scribbled notes in the margin that would elude the book scanners. “You don’t want to give up what is your own core activity,” he says.

FOLLOWING TRADITION

Back at the Aiden house, the Shabbat dinner guests have all laved their hands with a glass of water and returned to the sun deck for matzo-ball soup. Lieberman Aiden explains some of the trepidation he felt when he and Michel talked to the historians at Princeton about their work. “I was a little bit nervous going in,” he says. “I really thought that we were going to get denounced at one point.”

Although Lieberman Aiden and Michel are sensitive to the feelings of traditional humanities scholars, they are also too young, restless and deeply ambitious to slow their own pursuits. Lieberman Aiden says that the influence of technology on the humanities is already past a tipping point. The tools and methods that it provides, he says, will be impossible for researchers to ignore. And yet he doesn’t think that the old approaches will ever disappear. “I think you should use the best methods available — and all of them,” he says. “And I think that includes carefully reading texts and trying to get behind what authors think.”

Daniel Koll, one of the dinner guests, shyly interrupts. “Erez? Do you think you’re maybe partly influenced in that kind of thinking by your religious upbringing? From my limited outsider’s perspective, Judaism has a very strong interpretive component. There is no single authority on a text, and so on.” He wonders whether Lieberman Aiden, like any good humanities scholar, enjoys wrestling with the ambiguities of religious texts as much as he enjoys cool, hard data.

Clearly, the answer is yes — why else would his host have spent a year of his life at Yeshiva University in New York, studying the Talmud and Jewish case law? But Lieberman Aiden, who prefers to talk about other people and their


HEAVY-DUTY DATA
The computer-storage space required to support projects in the digital humanities is now starting to rival that of big-science projects.

BIG SCIENCE

GENBANK: 530 GIGABYTES
This database, which stores publicly available sequenced DNA, included 127 billion bases at the latest count.

SLOAN DIGITAL SKY SURVEY: 50 TERABYTES
The survey, begun in 1998 using a 2.5-metre telescope in New Mexico, has discovered nearly half-a-billion asteroids, stars, galaxies and quasars.

LARGE HADRON COLLIDER: 13 PETABYTES (2010)
The proton collider, near Geneva, Switzerland, generates about 15 petabytes of data per year — even after rejecting 99.9995% of collisions.

BIG HUMANITIES

CULTUROMICS N-GRAMS VIEWER: 300 GIGABYTES (English only)
The string of letters in this corpus of 5 million books is 1,000 times longer than the human genome.

YEAR OF SPEECH: 1 TERABYTE
This database includes recordings from telephone conversations, broadcast news, talk shows and US Supreme Court arguments.

UNIVERSITY OF SOUTHERN CALIFORNIA SHOAH ARCHIVE: 200 TERABYTES
This archive stores 52,000 videotaped interviews with Holocaust survivors from 56 countries.

1 petabyte = 1,024 terabytes = 1,048,576 gigabytes

A DISCIPLINE GOES DIGITAL
The humanities mine cultural databases

The digital humanities — the use of algorithms to search for meaning in databases of text and other media — have been around for decades. Some trace the field’s origins to Roberto Busa, an Italian priest who, in the late 1940s, teamed up with IBM to produce a searchable index of the works of thirteenth-century theologian Thomas Aquinas.

But the field has taken on new life in recent years. Journals have sprouted up and professional societies are blooming. Some universities are now requiring graduate students in the humanities to take statistics and computer-science courses. Funding — far harder to come by in the humanities than in the sciences — flows slightly more generously to those willing to adopt the new methods. This year, the US National Endowment for the Humanities, in collaboration with the National Science Foundation and research institutions in Canada and Britain, plans to hand out 20 grants in the digital humanities, worth a total of US$6 million.

Many researchers in the digital humanities use textual databases composed primarily of books — as Erez Lieberman Aiden does in his ‘culturomics’ project (see ‘Heavy-duty data’). Franco Moretti, a literary scholar at Stanford University in California, has shown that genres of fiction — Gothic novels, for example, or romance — have a textual ‘fingerprint’ that is apparent even in simple frequency counts of nouns, verbs and prepositions. “These genres are different at every scale,” he says, “not only in the huge scene of being held captive by a Count.”

Some researchers are busy digitizing other forms of cultural data. John Coleman, a phonetician at the University of Oxford, UK, is putting 5 million spoken words — about 3 months of speech, end to end — into a database, down to the level of the individual phonemes. Collected largely as recordings made with Sony Walkmen in the 1990s, it contains all sorts of things typically ignored by linguists: neologisms, slurring and sub-verbal honks and snorts. Coleman is already learning how conversation partners take pacing cues from each other, and how pitch of voice reflects attitude. And, he says, he can prove that women and men talk at the same speed. The linguistics textbooks, he says, “are going to have to be rewritten”.

Ichiro Fujinaga, a music technologist at McGill University in Montreal, Canada, is trying to do something similar for music. In a project known as SALAMI (Structural Analysis of Large Amounts of Music Information), Fujinaga is finding the common structural patterns (such as verse–chorus) in 350,000 pieces of music from all over the world. With more than 7,000 hours of Grateful Dead recordings in the database, he says, his team will be able to answer the all-important question: “Did the guitar solos get extended over the years or did they get shorter?” E.H.

[Figure: THE FASTEST VERB ON THE PLANET. Rarely used verbs regularize quickly; the n-grams viewer reveals that ‘chide’ has changed fastest of all. Lines plotted: ‘chided’ and ‘chid/chode’, 1800–2000. Source: ref. 2.]

[Figure: THINK OUTSIDE THE BOX. Text analysis using the n-grams viewer shows the infiltration of corporate speak into the English language. Lines plotted: ‘think outside the box’, ‘incentivize’, ‘strategize’, ‘synergize’, ‘touch base’ and ‘goal oriented’, 1900–2000. Source: Google n-grams viewer.]

Axis label recovered from the charts: frequency (% of total n-grams × 10^−5), plotted against year.


ideas rather than himself, provides an indirect answer by way of history. He tells the story of Isaac Casaubon, a sixteenth-century Protestant scholar, who undermined the presumed Egyptian provenance of a set of religious texts by identifying a reference to a Greek play on words — something that could only have been written hundreds of years later. “That point is as objective an interpretive remark as any remark a scientist might make,” says Lieberman Aiden. “So the methods of humanists are very, very formidable. And I think the degree of insecurity they have over whether these methods are here to stay is not really befitting.”

TWO CULTURES

From the day he was born in a New York City hospital, Lieberman Aiden was steeped in the cultures of both language and technology. The son of a Hungarian mother and a Romanian father, both émigrés by way of Israel, Lieberman Aiden grew up in a community of Satmar Jews, a branch of Hasidic Orthodox Judaism. English was his third language, after Hungarian and Hebrew, and by the age of nine, he was helping his father, a self-taught inventor, with the English contracts for the family saw-manufacturing business. Lieberman Aiden studied at a religious high school in Brooklyn, but soon found that video games held more allure. In his second year there, he found himself flunking classes, and his addiction to X-COM: UFO Defense was consuming so much time that he eventually had to quit, cold turkey. “It was an amazing game, actually,” he says, ruefully.

Lieberman Aiden soon found more edifying outlets for his energies: he was allowed to skip school one day a week to study in a molecular-biology lab at Brooklyn College, and he began his own computer-repair business. The family was quite secular by Hasidic standards, going to synagogue only for the High Holidays of Rosh Hashana and Yom Kippur. One day in high school, he went to Burger King for his usual bacon cheeseburger, and decided to respect kosher rules by forgoing the bacon — not realizing that mixing dairy and beef in the cheeseburger itself was not kosher at all.

Lieberman Aiden went to Princeton as an undergraduate, where he wasn’t content to study just maths and physics, but also fulfilled all the requirements for a philosophy degree. And even as he took five, six, seven classes a term, he managed to squeeze in creative-writing courses, specializing in haiku poetry. While at Yeshiva University after Princeton, he taught maths on the side to pay for a master’s degree in history, and completed the first year of rabbinical studies. “He’s a non-conformist by design and he revels in it,” says his Talmudic study partner, Avi Bossewitch. And yet, he says, “he’s the least arrogant person I’ve ever met”.

The allure of science eventually proved too strong. Lieberman Aiden left Yeshiva to begin a PhD at the Broad Institute of the Massachusetts Institute of Technology (MIT) and Harvard in Cambridge, under the supervision of famed geneticist Eric Lander. But even while mastering molecular biology, he put his maths skills to use. He realized that a knot-free shape described in a 120-year-old maths paper — a fractal globule — could describe the way that the 2-metre-long human genome folds up into the cell nucleus, a space one million times smaller. He then developed a protocol to prove that this was true. The results, published in the first of his papers to make it onto the cover of Science (ref. 3), showed that the fractal globule allowed widely separated sections of DNA to unfold and interact. “There are no bounds to what he can be interested in,” says Lander, who, like others, suspects that culturomics might eventually be merely a sideline for Lieberman Aiden in his ascent in mathematical biology.

It was during his time in the Lander lab that he met Aviva Presser, a shy young woman from Los Angeles who was also working towards her PhD. They married in 2005, with Lander giving one of the blessings. Rather than one of them taking the other’s name, they decided to share a common new name: Aiden, which means Eden in Hebrew and, in Gaelic, little fire.

Those fires stoked, they were faced with another naming issue last June: what to call their son. The working title, in utero, was Snedley Balagan (Balagan means ‘fiasco’ in Hebrew). But they soon settled on Gabriel Galileo Aiden. Presser Aiden says that no one believes them when they say that they had no idea that their son’s initials formed the DNA code for a sweet amino acid.

WORK AND PLAY

Towards the end of the Shabbat dinner, Gabriel decides to wake up, adding to Presser Aiden’s obvious exhaustion. But her husband doesn’t want to miss out on the highlight of the night, an Aiden-family staple: Dessert Face-off, in which each guest competitively designs a slice of brownie within a theme. Given that guests Koll and his girlfriend, Larissa Zhou, are interested in molecular gastronomy, Lieberman Aiden decides on the theme of food science. A box of edible Betty Crocker decorations is laid out.

Koll turns his brownie into the cross-section of a wok. Zhou transforms hers into a pig — definitely not kosher. But Lieberman Aiden’s construct is perplexing: he has created a starry night-time skyscape, with multicoloured sprinkles as stars and the Milky Way. What does this have to do with food science? “Well, you know,” he says, a little proud of himself, “gastronomy is just one letter away from astronomy.”

Gabriel has returned to bed, and Presser Aiden looks ready to follow, but her husband isn’t quite finished yet. As midnight approaches, Lieberman Aiden is holding forth on the mathematical beauty of ramen noodles — as depicted in the 2008 film Kung Fu Panda — and remarking on a piece of maths software, KnotPlot, that could help his wife to braid some really radical challah bread. The guests finally do Presser Aiden a favour and excuse themselves to go home.

But Lieberman Aiden just might keep revving and riffing through the night: he can work for 70–80 hours straight, fuelled by Diet Coke and junk food. He has big plans for culturomics, as he and Michel add more languages, books and other media to the n-grams database. He also has new projects to think over, such as one currently under way with Ed Boyden, a young, acclaimed neurobiologist at MIT, in which the pair are developing a way to detect the genes expressed in thousands of individual cells at a time. But tonight and tomorrow he’ll keep his computer off, although it is not religious conviction that makes him abide by the Sabbath rules. Doing so forces him to detach, clear his mind and go for walks in the park with his wife and son.

And yet the boundary between work and play — just like that between the sciences and the humanities — is not one that Lieberman Aiden respects. That might just be what makes him successful, says Lander. For centuries, the best science has come from the most playful scientists, he says. Think of Watson and Crick shirking the lab in favour of tennis; think of Einstein and his wild-haired bike rides.

“What do children do?” says Lander. “They learn, they’re curious, they’re stimulated. The problem is, at some point, many people get in a rut. They’re not really interested in learning more. They’re not able to be fascinated and delighted by everything around them. Erez — he hasn’t lost the playfulness.” ■

SEE EDITORIAL P.420

Eric Hand is a reporter for Nature inWashington DC.

1. Lieberman, E., Michel, J.-B., Jackson, J., Tang, T. & Nowak, M. A. Nature 449, 713–716 (2007).
2. Michel, J.-B. et al. Science 331, 176–182 (2010).
3. Lieberman-Aiden, E. et al. Science 326, 289–293 (2009).
