
PATRICK JONAS

EVERY year, the Massachusetts Institute of Technology's (MIT) Technology Review magazine publishes its annual list of the top 10 emerging technologies.

This year, there are Indian names on the list.

One of the inventions that made the list is the result of work done by a team of researchers led by Prof Vivek Pai at Princeton University. It is called HashCache and has drawn recognition as a revolutionary way to expand Internet access around the world.

“HashCache is a system that stores Web content to disk, and reuses it when possible, so that your Internet access appears to be much faster,” Prof Pai told tabla!.

It has the potential to expand Internet use in developing regions around the world, taking advantage of cheaper components like large disks.

To understand how HashCache works, see the report below.

There is another Indian face in Prof Pai's team. Mr Anirudh Badam, a graduate student at Princeton, leads the project and works closely with Prof Pai. The other members of the team are Prof Kyoung Soo Park, now at the University of Pittsburgh; Prof Larry Peterson, the department chair at Princeton; and research scientist Marc Fiuczynski, who helps arrange and coordinate the test deployments in various parts of the world.

The new system is currently being tested at the Kokrobitey Institute in Ghana and Obafemi Awolowo University in Nigeria.

Prof Pai was born in Mumbai and moved to the United States with his parents in 1970, when he was just two years old. He is married to an Indian and often visits India with his wife.

“Part of the motivation for looking at technologies for the developing world was my experience when I was last in India, trying to get my in-laws' computer working again,” said Prof Pai.

Added Mr Badam: “I was very excited and enthusiastic right from the day Vivek first proposed that we find a solution to this challenging problem. The result of our efforts in this direction is HashCache, and we are all very excited about it.”

The Hyderabad native is a third-year PhD student at Princeton. He moved to the US in 2006 after completing his degree in computer science at the Indian Institute of Technology in Chennai.

Mr Badam has been interested in high-performance servers since his days at the IIT. “Being from a developing country, I have had first-hand experience of using slow and intermittent Internet connections. Never until I came to the US did I realise what good Internet actually is. This was also a motivating reason behind me being very enthused to solve this problem,” Mr Badam told tabla!.

He added that Prof Pai was previously involved in developing the world's fastest Web server (Flash Web Server) and later in the development of the world's fastest Web proxy cache (iMimic Web Proxy Cache). “So, once I got the opportunity to work with him, there was no looking back,” he said.

Technology Review presented the list in New Delhi on March 2 at the inaugural EmTech India conference, which Prof Pai unfortunately could not attend. “We recently had a new baby, and I'm in the middle of the school semester, so my travel is a bit constrained at the moment,” he said.

What about Mr Badam? He is busy working on new technologies at Princeton. When tabla! wrote to him at 5pm (Singapore time), he replied immediately. He was in the lab, at 4am in Princeton!


HIMACHAL PRADESH

Schools dump painter Hussain

THE Himachal Pradesh Board of School Education, on the recommendation of a committee, has decided to do away with a chapter on noted painter M.F. Hussain, reported DNA.

Board chairman Chaman Lal Gupta said the chapter on Hussain, “which has nothing to inspire students”, would be replaced with one on painter Sobha Singh and former Indian president A.P.J. Abdul Kalam. He said Singh would make more sense to students in Himachal Pradesh because he was based in the state and incorporated Himachali culture in his paintings.

GUJARAT

Reliance creates refinery giant

IN THE largest ever merger of business units in India, Reliance Industries (RIL) on March 2 decided to merge its subsidiary Reliance Petroleum Ltd (RPL) into itself, through a share swap of one RIL share for 16 RPL shares.

RIL's chief financial officer Alok Agarwal told the media that, with a combined refining capacity of 1.24 million barrels per day, the merger will create the world's largest refinery complex at any single location.

TAMIL NADU

GPS for Chennai buses

THE Metropolitan Transport Corporation is fitting 600 buses with GPS (Global Positioning System). By the end of this month, 300 of them will be fitted, and the rest will be fitted by the end of April. The GPS will let passengers waiting at bus stops receive updates on the expected arrival times of buses.

Most of these buses will be operated on Anna Salai and Poonamallee High Road, reported The Hindu.

WE ASKED Prof Vivek Pai to explain in simple terms how HashCache works. This is his explanation:

“Let's say that you want to visit a Web page, like http://www.cs.princeton.edu/~vivek.

To your browser, that page actually looks like many different pieces, which have names like http://www.cs.princeton.edu/~vivek/index.html and http://www.cs.princeton.edu/~vivek/nsg.css.

Each of those pieces is called an object. So, to build that page, your browser sends multiple requests, one for each object. If another colleague at your paper also visited the page, his browser would also send that same sequence of requests. All that traffic would flow from the US to Singapore twice.

What a cache does is store each object as it comes in, and then check whether a request can be satisfied from the objects it has stored, instead of re-fetching them all the way from the US.
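The caching idea can be made concrete with a minimal Python sketch. The cache keeps each object as it arrives and answers repeat requests locally. This is an illustration only: an in-memory dictionary stands in for the disk, and `fetch_from_origin` is a hypothetical stand-in for the slow trip to the server in the US. Two readers loading the same two-object page cause only two trips across the long link instead of four.

```python
from typing import Callable, Dict


class SimpleCache:
    """Toy Web cache: keeps every object it fetches and serves repeats locally.

    `fetch_from_origin` is a hypothetical callback standing in for the trip
    to the faraway server; it is not part of any real proxy's API.
    """

    def __init__(self, fetch_from_origin: Callable[[str], bytes]) -> None:
        self.fetch_from_origin = fetch_from_origin
        self.store: Dict[str, bytes] = {}  # URL -> stored object body
        self.origin_fetches = 0  # how many requests crossed the long link

    def get(self, url: str) -> bytes:
        # Satisfy the request locally if this object is already stored.
        if url in self.store:
            return self.store[url]
        # Otherwise fetch it from the origin once and keep a copy.
        self.origin_fetches += 1
        body = self.fetch_from_origin(url)
        self.store[url] = body
        return body


# The objects that make up the example page.
PAGE_OBJECTS = [
    "http://www.cs.princeton.edu/~vivek/index.html",
    "http://www.cs.princeton.edu/~vivek/nsg.css",
]

cache = SimpleCache(lambda url: f"contents of {url}".encode())
for _reader in ("you", "your colleague"):
    for url in PAGE_OBJECTS:
        cache.get(url)

print(cache.origin_fetches)  # 2: each object crossed the link only once
```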

The way that caches typically operate is that they have to store each object somewhere on disk. Since it's hard to work with long names like http://www.cs.princeton.edu/~vivek/index.html, the cache reduces each name to a number, say 125371242. This process is called hashing. However, it then has to keep in RAM some information that says that this particular file is located at a given location on the disk. That portion is called the index. As the size of your disk grows, the size of this index also grows.
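The conventional design can be sketched as follows. The sketch assumes SHA-1 truncated to 64 bits as the hash function and uses a Python byte array to stand in for the disk. The article does not say which hash real caches use, and the class and method names are hypothetical. The point to notice is `self.index`: it lives in RAM and gains one entry for every object written to disk, so a bigger disk holding more objects means a bigger index.

```python
import hashlib
from typing import Dict, Optional, Tuple


def url_hash(url: str) -> int:
    """Reduce a long URL to a fixed-size number (here, 64 bits of SHA-1)."""
    digest = hashlib.sha1(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class IndexedDiskCache:
    """Conventional design: objects live on 'disk', but an in-RAM index
    records where each one was written."""

    def __init__(self) -> None:
        self.disk = bytearray()  # stand-in for the disk: an append-only log
        # The index, kept in RAM: hash -> (offset on disk, length in bytes).
        # Simplification: two URLs with the same hash would overwrite each
        # other's entry; with 64-bit hashes that is rare.
        self.index: Dict[int, Tuple[int, int]] = {}

    def put(self, url: str, body: bytes) -> None:
        offset = len(self.disk)
        self.disk.extend(body)
        self.index[url_hash(url)] = (offset, len(body))

    def get(self, url: str) -> Optional[bytes]:
        entry = self.index.get(url_hash(url))
        if entry is None:
            return None  # miss: no record of this object
        offset, length = entry
        return bytes(self.disk[offset:offset + length])


# Storing more objects means more index entries held in RAM.
cache = IndexedDiskCache()
for i in range(1000):
    cache.put(f"http://example.com/object{i}", b"x" * 100)
print(len(cache.index))  # 1000
```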

What HashCache does is to get rid of this index. Instead, once it calculates the hash value for a file, it uses that same number to determine what location on the disk it should use to store that file.
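The core idea Prof Pai describes can be sketched in the same style. This is a simplified, hypothetical illustration, not HashCache's actual code. It assumes fixed-size slots, one object per slot, and simple overwriting when two objects hash to the same slot. It also assumes each object's URL is stored next to its body on the simulated disk, so a lookup can confirm it found the right object without consulting any table in RAM.

```python
import hashlib
from typing import List, Optional, Tuple

SLOT_SIZE = 4096  # assumed fixed size of each disk slot, in bytes


def url_hash(url: str) -> int:
    """Reduce a URL to a 64-bit number (SHA-1 is an assumed choice)."""
    digest = hashlib.sha1(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class IndexlessCache:
    """The hash value itself picks the disk slot, so no RAM index is needed.

    Simplification: each slot holds one object, and a new object whose hash
    lands on an occupied slot simply overwrites the old one.
    """

    def __init__(self, num_slots: int) -> None:
        self.num_slots = num_slots
        # Stand-in for the disk: a fixed array of slots, each holding
        # (url, body) so a lookup can confirm it found the right object.
        self.disk: List[Optional[Tuple[str, bytes]]] = [None] * num_slots

    def slot_for(self, url: str) -> int:
        # The same number produced by hashing decides where the object lives.
        return url_hash(url) % self.num_slots

    def put(self, url: str, body: bytes) -> None:
        if len(body) > SLOT_SIZE:
            raise ValueError("object larger than one slot; not handled in this sketch")
        self.disk[self.slot_for(url)] = (url, body)

    def get(self, url: str) -> Optional[bytes]:
        entry = self.disk[self.slot_for(url)]
        # Check the stored URL: a different object may have claimed this slot.
        if entry is not None and entry[0] == url:
            return entry[1]
        return None


cache = IndexlessCache(num_slots=1024)
home = "http://www.cs.princeton.edu/~vivek/index.html"
cache.put(home, b"<html>home page</html>")
print(cache.get(home))  # b'<html>home page</html>'
```

In this sketch, the only RAM the lookup needs is the slot count and the arithmetic `hash % num_slots`, however many objects are stored. The price is that colliding objects evict one another. Handling that well is presumably among the tricks and complications Prof Pai mentions, and this sketch does not attempt it.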

There are some tricks involved and some complications, but that's the basic idea. By eliminating this mapping, HashCache does not need to keep a lot of information in RAM. Since it's easier to buy lots of disks than lots of RAM, HashCache allows you to build caches using much cheaper systems.”

HashCache is hot

Princeton University invention makes Internet access faster

Cache them if you can... (from left) Mr Badam, Prof Peterson, Prof Pai and Mr Fiuczynski. PHOTO: PRINCETON UNIVERSITY

How you can cache in...

“Once I got the opportunity to work with Prof Pai, there was no looking back.” – Mr Anirudh Badam

regional roundup

INDIA tabla! March 6, 2009 Page 6
