64
Providing Internet Search for Low-Connectivity Regions Bill Thies Computer Science and Artificial Intelligence Laboratory Massachusetts Institute of Technology Princeton University March 6, 2007

Providing Internet Search for Low-Connectivity Regions Bill Thies

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Providing Internet Search forLow-Connectivity Regions

Bill Thies

Computer Science and Artificial Intelligence LaboratoryMassachusetts Institute of Technology

Princeton UniversityMarch 6, 2007

Global Development Challenges

Yes.

• 1.2 billion people live on less than a dollar a day

• 1.1 billion lack access to clean water

• 800 million people are going hungry

• 8,000 people die of HIV/AIDS every day

• … child mortality, primary education, maternal health …

• Can technology research make a difference?

Yes.

Impact of Technology: Examples

• Before computers: Green Revolution (1940s – 1960s)– High-yielding seed varieties + irrigation, pesticides, fertilizer

– Cereal production doubled in dev. countries (1961-1985)

• Cell phones– Saves costly trip to town

• In Bangladesh, 3-10% of monthly income saved per call

– Farmers get better prices; plan shipments better

– Families facilitate remittances with relatives

• Software industry– Macroeconomic impact in India

Source: N. Cohen, “What Works: Grameen Telecom’s Village Phones”

Technology Researchfor the Developing World

• Most research falls in intersection (II)– Invented for developed areas, later applied in developing

areas

– Examples: computers, cell phones, Internet, …

– Many people are working in this area

Research BenefittingDevelopedWorld

Research BenefittingDevelopingWorld

I II III

Technology Researchfor the Developing World

• How to target developing world (III)?1. Identify trend, constraint or opportunity that is present in

developing world, but not in developed world2. Invent technology to exploit or compensate for the trend3. Result would not have been invented in developed world

• Strive for deep, novel, long-term research innovations– Not merely engineering– Fewer people working in this area large potential impact

Research BenefittingDevelopedWorld

Research BenefittingDevelopingWorld

I II III

In This Talk: Three Directions

1. Internet search

2. Data compression

3. Personable computing (future work)

In This Talk: Three Directions

1. Internet search

2. Data compression

3. Personable computing (future work)

Percentage of Country’s Population Online

Internet Users Worldwide

Copyright 2004, Matthew Zook. Data Source: ClickZ Stats.

Barriers to Internet Access• Infrastructure

– Limited phone lines– Low-bandwidth international links– Unreliable power supplies

• High costs– Computer unaffordable or unavailable– ISP, telephone costs can exceed local wage– Exacerbated by slow connections

• Social barriers– Illiterate or non-technical users– Lack of local content

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

Nigeria

Malawi

Kenya

Papau N

ew G

uinnea

Sri Lan

kaViet

nam

Nepal

Macedo

niaPakis

tanBan

gladesh

Ecuado

rGeorg

iaMexic

o

Bosnia

and H

erzeg

ovina

Kazahk

stan

Zimba

bweAlban

iaInd

ones

iaInd

iaPhillip

innes

South

Africa

Iran

Roman

iaBraz

ilColom

biaCroa

tiaArge

ntina

Bulgaria

Australi

a

United S

tates

Austria

Czeck

Rep

ublic

Saudi A

rabia

Cost of Dial-up Internet Accessas a Fraction of Household Income

240% 140%

Sources: ISP Websites 2005, UNDP Development Report 2004, WorldBank 2003

• Monthly Internet access: 30 hours – Or unlimited rate (if cheaper)

• Monthly household incomederived from GNP per capita

– Ignore richest 20% of population– Household size: 2.5

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

Nigeria

Malawi

Kenya

Papau N

ew Guin

neaSri L

anka

VietnamNepa

lMac

edonia

Pakist

anBang

ladesh

Ecuado

rGeorg

iaMex

ico

Bosnia

and H

erzeg

ovina

Kazahk

stan

Zimbab

weAlba

niaInd

ones

iaInd

iaPhillip

innes

South A

frica

Iran

Romania

Brazil

Colombia

Croatia

Argenti

naBulga

riaAustr

alia

United S

tates

Austria

Czeck

Rep

ublic

Saudi A

rabia

Full Internet

Email only

Cost of Dial-up Internet Accessas a Fraction of Household Income

240% 140%

Sources: ISP Websites 2005, UNDP Development Report 2004, WorldBank 2003

• Monthly Internet access: 30 hours – Or unlimited rate (if cheaper)

• Monthly household incomederived from GNP per capita

– Ignore richest 20% of population– Household size: 2.5

0%

10%

20%

30%

40%

50%

60%

70%

80%

90%

100%

Nigeria

Malawi

Kenya

Papau

New

Guin

nea

Sri Lan

kaViet

nam

Nepal

Maced

onia

Pakist

anBan

glade

shEcu

ador

Georgi

aMex

ico

Bosnia

and H

erzeg

ovina

Kazah

kstan

Zimba

bwe

Albania

Indon

esia

India

Phillip

innes

South

Africa

Iran

Roman

iaBraz

ilColo

mbiaCroa

tiaArge

ntina

Bulgari

aAus

tralia

United

States

Austria

Czeck

Rep

ublic

Saudi

Arabia

Full Internet

Email onlyor at night

Cost of Dial-up Internet Accessas a Fraction of Household Income

240% 140%

Sources: ISP Websites 2005, UNDP Development Report 2004, WorldBank 2003

• Monthly Internet access: 30 hours – Or unlimited rate (if cheaper)

• Monthly household incomederived from GNP per capita

– Ignore richest 20% of population– Household size: 2.5

TEK: Email-Based Search

Solution has two components:

1. Transfer all data through email, not http- Connect only to send/receive email, not to browse web

2. TEK Server optimizes for bandwidth requirements

TEK: “Time Equals Knowledge”

user

ISP

Email

WorldWideWeb

TEKServer

TEK Client

• Implemented as an HTTP Proxy Server bundled with a custom version of Firefox

• When offline, users can:– Search and browse old results as if connected

– Enqueue queries for new results or missing pages

• When online, users can:– Send pending queries

– Receive new results (attached to standard emails)

TEKServer

WorldWideWebuser

ISP

Email

TEK Server

user

ISP

EmailTEK

Server

WorldWideWeb

• Queries Google or Scirus for relevant pages

• Returns filtered content of ˜20 pages to user– Remove images– Remove junk HTML (JavaScript, colors, meta tags, etc.)– Convert PDF, PS to HTML

• Compress pages, send as single attachment– Overall size reduction: 5-10X on HTML, 10X on PDF

• Maintain server image of client page cache– Avoid sending duplicate pages

Usability of TEK• Viewing results offline: quick, reliable

– Establish local database of shared information

• Connection time is shorter– In school: time-share Internet line with voice

• Manageable amount of information

TEK Users

• TEK available on SourceForge and via free CD

• Most active users in partner organizations– People’s First Network - United Villages

People’s First Network

• Solomon Islands served by HF Radio Network

• Email only

Source: http://www.peoplefirst.net.sb/General/PFnet_Update.htm

People’s First Network• TEK installed: $0.65 per query from kiosk

– Compare to $0.25 per email, $0.65 to type one page– $1.30 / hour for operator assistance browsing results– Contributes to kiosk sustainability

• Many applications reported1. Farmers – information on diseases; networking

2. Teachers – environmental impact of local logging3. Pastors – downloading sermons4. Entrepreneurs – download / sell lyrics5. General – health, education, sports, entertainment

Subsistence farmers on Rennell have obtained advice concerning taro diseases affecting their crop. Via the 'TEK-websearch' facility, one group of farmers was able to access detailed technical information about vanilla farming and to communicate with a specialist from the Kastom Gaden Association. -- Chand et al., PFNet Case Study, 2005

United Villages• Store-and-forward connectivity via Mobile Access Point

United Villages• Store-and-forward connectivity via Mobile Access Point

Mobile access point

United Villages

Source: www.firstmilesolutions.com

• Store-and-forward connectivity via Mobile Access Point– India, Costa Rica, Cambodia, Rwanda, Paraguay

• TEK provides only Web access

Applications in Developed Countries

• Airplanes– Tenzing supplies email-only connection ($10-$20)

• Continental Airlines, United Airlines, US Airways

– ~2.4kbs satellite link for entire plane1

• Mobile phones

• ISPs charge for bandwidth (Australia)

• Conservative religious sects2

• Anxiety about browser security2

1 http://www.pcworld.com/news/article/0,aid,114216,00.asp2 anecdotes from EmailWeb users

In This Talk

1. Internet search

2. Data compression

3. Personable computing (future work)

Cost of Storage

$0

$1

$10

$100

$1,000

$10,000

$100,000

$1,000,000

1980 1985 1990 1995 2000 2005 2010

Year

Cos

t per

GB

$0.10

1GB of Bandwidth

(USA)

1GB of Bandwidth(Papau New Guinnea)

How much more expensiveto download than to store?- USA: 3x- PNG: >1000x

State-Based Compression

TEKServer

http://www.hivnetnordic.org/index.shtmhttp://www.acnl.net/Basic_HIV_&_AIDS_Info.htmhttp://www.info.gov.hk/health/aids/http://www.cdc.gov/hiv/graphics/women.htmhttp://www.med.unsw.edu.au/nchecr/http://www.utopia-asia.com/aids.htmhttp://www.sfaf.org/aids101/hiv_testing.htmlhttp://www.gaytoronto.com/casey/

http://www.aidsinfo.nih.gov/

http://www.hivnetnordic.org/index.shtmhttp://www.acnl.net/Basic_HIV_&_AIDS_Info.htmhttp://www.info.gov.hk/health/aids/http://www.cdc.gov/hiv/graphics/women.htmhttp://www.med.unsw.edu.au/nchecr/http://www.utopia-asia.com/aids.htmhttp://www.sfaf.org/aids101/hiv_testing.htmlhttp://www.gaytoronto.com/casey/

• Idea: use client-side storage reduce bandwidth requirements

• If server knows everything stored on client, can it improve compression of search results?

State-Based Compression• General problem:

If two parties share a large dictionary, can they reduce communication bandwidth?

• In general: no– info content (index) = info content (entry)

• In practice: maybe– Space of inputs is not uniformly populated

• E.g., many images are text, bullets, smileys, patterns

– Lossy: send index of closest match in dictionary– Lossless: send exact diff from dictionary entry

Photo Mosaics

• Mosaic: picture made of other pictures

1. Break image into cells

2. Match each cell against image library• Use wavelet decomposition for perceptual match

Mosaic Compression(Samidh Chakrabarti 2002)

• Idea: server constructs mosaic from client images– Send pointers to image components, not image data

• Experiments– Image library: 4096 images from Wikipedia

– Use shareware PhotoMosaic software (BlackDog)

78

2315

Wikipedia JPEG: 46 Kb Mosaic: 2.0 Kb(22X smaller)

2.0 Kb JPEG Mosaic: 2.0 Kb

2.0 Kb GIF Mosaic: 2.0 Kb

Compressing Landscapes

JPEG Image: 52 Kb Mosaic: 1.6 Kb(33X smaller)

Compressing Landscapes

1.6 Kb GIF Mosaic: 1.6 Kb

Importance of Small Images• Most bandwidth spent on small images!

File Size (Kb)

100%

80%

60%

40%

20%

0%Fra

ctio

n of

Tot

al B

andw

idth

10 20 30 40 50 60 70 80 90

- Source: Chakrabarti’02- 42,684 images from sites in Google programming contest- 5,540 images from 1,000 most popular sites (ZDNet)

60% of bandwidthon images < 10Kb

Text recognition?Icon substitution?

Compressing Logos

Original GIF: 4 Kb Mosaic: 0.8 Kb(5X Smaller)

Compressing Logos

Mosaic: 0.8 Kb(5X Smaller)

3.7X Smaller GIF: 0.8Kb

• Many avenues for improvement– What is the best image library?

– Impact of smoothing, rotation, diffs?

– Edge detection + texture mapping• Lossy compression of edges

• Random noise for realism

• In current form, perhaps useful as a preview– 5-33X smaller than JPEG

– More entertaining than ALT tag or blurry picture

What’s the Verdict?

In This Talk

1. Internet search

2. Data compression

3. Personable computing (future work)

Personal Computing Trends

0.001

0.01

0.1

1

10

100

1980 1985 1990 1995 2000 2005Year

Dev

ices

per

100

Peo

ple

Personal computers (India) Personal computers (USA) Personal computers (Ghana)

Cell phones (India) Cell phones (USA) Cell phones (Ghana)

Source: International Telecommunications Union

Personal Computing Trends

0.001

0.01

0.1

1

10

100

1980 1985 1990 1995 2000 2005Year

Dev

ices

per

100

Peo

ple

Personal computers (India) Personal computers (USA) Personal computers (Ghana)

Cell phones (India) Cell phones (USA) Cell phones (Ghana)

Source: International Telecommunications Union

Personal Computing Trends

0.001

0.01

0.1

1

10

100

1980 1985 1990 1995 2000 2005Year

Dev

ices

per

100

Peo

ple

Personal computers (India) Personal computers (USA) Personal computers (Ghana)

Cell phones (India) Cell phones (USA) Cell phones (Ghana)

Source: International Telecommunications Union

Personal Computing Trends

0.001

0.01

0.1

1

10

100

1980 1985 1990 1995 2000 2005Year

Dev

ices

per

100

Peo

ple

Personal computers (India) Personal computers (USA) Personal computers (Ghana)

Cell phones (India) Cell phones (USA) Cell phones (Ghana)

1.4 billion people have a cell phone,but not a computer

Source: International Telecommunications Union

Cell Phones will Replace Computersas De-Facto Personal Device

• Trend applies everywhere in the world

• But developing countries will get there first,and they have unique constraints:– Devices must stay cheap

– Devices must cater to illiterate

“Personable computing”

Storage

Keyboad

Mouse

Networking

Storage

Memory

CPU

Display

Base Station

Storage

Enhanced Functionality at “Zero” Cost?

Cell Phone

Keyboad

Mouse

Networking

Storage

Memory

CPU

Display

Computer:

Storage

Keyboad

Mouse

Networking

Storage

Memory

CPU

Display

Base Station

Storage

Enhanced Functionality at “Zero” Cost?

Cell Phone

Keyboad

Mouse

Networking

Storage

Memory

CPU

Display

Computer:

Storage

Keyboad

Mouse

Networking

Storage

Memory

CPU

Display

Base Station

Storage

Enhanced Functionality at “Zero” Cost?

Cell Phone

Keyboad

Mouse

Networking

Storage

Memory

CPU Television

Display

Computer:

Storage

Keyboad

Mouse

Networking

Storage

Memory

CPU

Display

Base Station

Storage

Enhanced Functionality at “Zero” Cost?

Cell Phone

Keyboad

Mouse

Networking

Storage

Memory

CPU Television

Display

Computer:

Storage

Keyboad

Mouse

Networking

Storage

Memory

CPU

Display

Base Station

Enhanced Functionality at “Zero” Cost?

Cell Phone

Keyboad

Mouse

Networking

Television

Display

CPU

Computer:

Storage

Keyboad

Mouse

Networking

Storage

Memory

CPU

Display

Base Station

Enhanced Functionality at “Zero” Cost?

Cell Phone

Keyboad

Mouse

Networking

Television

Display

Memory

CPU

Computer:

Storage

Keyboad

Mouse

Networking

Storage

Memory

CPU

Display

Base Station

Enhanced Functionality at “Zero” Cost?

Cell Phone

Keyboad

Mouse

Networking

Storage

Television

Display

Memory

CPU

Storage

Computer:

Storage

Keyboad

Mouse

Networking

Storage

Memory

CPU

Display

Base Station

Enhanced Functionality at “Zero” Cost?

Cell Phone

Keyboad

Mouse

Networking

Storage

Television

Display

Memory

CPU

Storage

Computer:

Storage

Keyboad

Mouse

Networking

Storage

Memory

CPU

Display

Base Station

Enhanced Functionality at “Zero” Cost?

Cell Phone

Keyboad

Mouse

Networking

Storage

Television

Display

Pay with time

Memory

CPU

Storage

Computer:

Storage

Keyboad

Mouse

Networking

Storage

Memory

CPU

Display

Base Station

Enhanced Functionality at “Zero” Cost?

Cell Phone

Keyboad

Mouse

Networking

Storage

Television

Display

Pay with time

Memory

CPU

Storage

Computer:

Storage

Keyboad

Mouse

Networking

Storage

Memory

CPU

Display

Base Station

Enhanced Functionality at “Zero” Cost?

Cell Phone

Keyboad

Mouse

Networking

Storage

Optional

Television

Display

Pay with time

Memory

CPU

Storage

Application: Scalable Distance Education

• Content distribution is scalable today– Broadcast TV

• Need to scale feedback & evaluation– Self-Guided Tutorials

– Evaluation Tools

– Tests

– Chat Room

• Research aspects:– User interfaces

– Distributed file systems

– Heterogeneous systems

Television

Base StationCell Phone

TV Station

Programs

Feedback

Catering to the Illiterate

• Voice becomes the natural user interface– Even for literate, voice-driven devices cheaper

• Unlike developed world, voice used for content generation as well as content retrieval

• Ideas:– Design an audio wiki (audio web?) for local content

• Automatically adapts to local language

– Index and search voice databases without text intermediates

– Social networking from your cell phone?

Related Work• Delay-tolerant networking

– TierStore (Demmer, Brewer et al.) - SeNDT (Geraghty et al.)

– ZebraNet (Martonosi et al.) - EmailWeb (Griswold)

– Postmanet, Digital Study Hall (Wang et al.) - Interplanetary Internet

• Extreme compression– Space-optimized texture maps (Balmelli et al.)

• Low-cost computing– FonePlus (Microsoft)

– PCtvt (Reddy)

– One Laptop per Child (Negroponte et al.)

• User interfaces for emerging markets– Camera-equipped mobile phone (Parikh)

– Text-free user interfaces (Medhi et al., MSR India)

– Multi-mouse (Pawar et al., MSR India)

Conclusions• Technology research for the developing world:

1. Identify unique trends and opportunities

2. Address with new and appropriate technology

• Community of researchers is small but growing!

Distributed computer (Cell/TV/Base Station)

Cheap cell phones in place of PCs

Audio interfaces and content generation

Illiterate users

State-based compressionStorage cheaper than bandwidth

Email-based searchEmail cheaper than Web access

Research DirectionTrend / Opportunity

TEK Team• Current

– Prof. Saman Amarasinghe - Bill Thies

• Previous participants– Libby Levison (PI)– Samidh Chakrabarti– Tazeen Mahtab– Genevieve Cuevas– Saad Shakhshir– Janelle Prevost– Mark Halsey

– Marjorie Cheng– Hongfei Tian– Damon Berry– Bihn Vo– Sheldon Chan– Sid Henderson– Alexandro Artola