An Experiment To Characterize Videos On The Web Soam Acharya Brian Smith Cornell University MMCN...

An Experiment To Characterize Videos On The Web

Soam Acharya

Brian Smith

Cornell University

MMCN 1998

Overview

• Designed and implemented an experiment to search and analyze videos on the web

• 22500 HTML documents

• 57000 movies

• 100 Gbytes of data

Why?

• Codec Designers

• Network Engineers

• Other Multimedia Researchers

• MM file systems

• Webmasters

Questions We Asked

• How many movies are out there?
  – Not all that many. We found 57,000.

• What are their basic properties?
  – 90% last 45 seconds or less. Their median size is 1.1 Mbytes.

• What compression formats are popular?
  – QuickTime accounts for about 53%, followed by MPEG (30%) and AVI (17%).

• How well do the formats compare?
  – MPEG compresses best. QuickTime and AVI are similar.

• Are standard modem rates enough?
  – No. Rates of 28.8 - 128 Kilobits/sec (Kbps) are useless for real-time download and display of movies.

Roadmap

• Data Collection Methodology

• Analysis

• Results

• Conclusion

• Future Work

• Open Questions

Data Collection Methodology

• Hunting Phase
  – get links to movies

• Gathering Phase
  – download movies and gather raw statistics

• Sifting Phase
  – eliminate outliers

Hunting Phase: Early April 1997

• Milked AltaVista for documents dated January 1995 - March 1997

• Looked for MPEG, QuickTime, AVI

• No streaming video formats

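Gathering Phase: mid April 1997 - May 1997

• LDG: movie link distributor/gatherer

• LP: link processor

[Figure: the LDG holds a list of page URLs (http://www.eg.com/movie.html, http://www.cnn.com/pepe.html, …..) and is connected to link processors LP0, LP1 and LP2. Labeled steps at LP1: 1. http://www.eg.com/movie.html; 2. movie.html; 3. my.mov; 4. summary statistics. Hosts shown: www.eg.com, www.vid.com.]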
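The slides do not say how a link processor pulled movie links out of a fetched page. As a rough sketch of that step only, the following scans a page's anchor tags and keeps links whose extension suggests one of the three formats studied. The extension table, the class name `MovieLinkParser` and the helper `find_movie_links` are assumptions made for illustration, not the authors' code.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumed mapping from file extension to format (not taken from the slides).
MOVIE_EXTENSIONS = {
    ".mpg": "MPEG", ".mpeg": "MPEG",
    ".mov": "QuickTime", ".qt": "QuickTime",
    ".avi": "AVI",
}


def classify(url):
    """Return the assumed format for a URL, or None if it is not a movie link."""
    path = urlparse(url).path.lower()
    for ext, fmt in MOVIE_EXTENSIONS.items():
        if path.endswith(ext):
            return fmt
    return None


class MovieLinkParser(HTMLParser):
    """Collects (absolute URL, format) pairs from <a href=...> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.movie_links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page that contained them.
                url = urljoin(self.base_url, value)
                fmt = classify(url)
                if fmt:
                    self.movie_links.append((url, fmt))


def find_movie_links(base_url, html_text):
    """Hypothetical helper: parse one page and return its movie links."""
    parser = MovieLinkParser(base_url)
    parser.feed(html_text)
    return parser.movie_links
```

A link processor in this sketch would call `find_movie_links` on each page handed out by the LDG, then download each returned URL and report summary statistics.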

Sifting Phase

• Processed 100 Gbytes of data and 57,000 titles
  – used mpegstat and modified xanim

• Kept only titles that pass every filter below; counts in braces are the titles each filter eliminated (9,500 in total)

• 4 < frames/sec < 40 {5000 titles}

• duration > 0.5 seconds {1000 titles}

• 0.6 < aspect ratio < 1.667 {1000 titles}

• bitrate < 10 Mbps {1000 titles}
  – bitrate = (movie size)/(movie duration)

• duplicate URL detection {1500 titles}
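The sifting rules above can be sketched as a single filter. This is a minimal illustration, not the authors' tool: the `Title` record and its field names are hypothetical, frame rate is assumed to be frames divided by duration, and 10 Mbps is taken as 10,000,000 bits/sec. The duration check runs first here only to avoid dividing by zero.

```python
from dataclasses import dataclass


@dataclass
class Title:
    """Hypothetical per-movie record; field names are assumptions."""
    url: str
    size_bytes: int
    duration_s: float
    frames: int
    width: int
    height: int


def bitrate_bps(t):
    # bitrate = (movie size) / (movie duration), expressed in bits per second
    return t.size_bytes * 8 / t.duration_s


def passes_sift(t):
    """Apply the duration, frame rate, aspect ratio and bitrate bounds."""
    if t.duration_s <= 0.5 or t.height <= 0:
        return False
    fps = t.frames / t.duration_s
    if not 4 < fps < 40:
        return False
    if not 0.6 < t.width / t.height < 1.667:
        return False
    return bitrate_bps(t) < 10_000_000  # assumed meaning of 10 Mbps


def sift(titles):
    """Keep the first title seen for each URL that passes all bounds."""
    seen_urls = set()
    kept = []
    for t in titles:
        if t.url in seen_urls or not passes_sift(t):
            continue
        seen_urls.add(t.url)
        kept.append(t)
    return kept
```

Applying the bounds before or after duplicate detection gives the same surviving set, although the per-filter counts on the slide depend on the order used.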

Analysis

• 47500 titles remained
  – 53% QuickTime, 30% MPEG, 17% AVI

• Can be divided into two categories
  – Distributions: by date, fps, size, duration, aspect ratio, bitrate
  – Comparing movie formats against each other

Movie Growth

[Figure: number of movies per month, Jan-94 through Apr-97; vertical axis from 0 to 3500.]

Breakdown of Movie Growth By Type

[Figure: number of movies per month for QuickTime, MPEG and AVI, Jan-94 through Apr-97; vertical axis from 0 to 2000.]

FPS Distribution

[Figure: number of movies by frame rate (4 to 31 frames/sec) for AVI, MPEG and QuickTime; vertical axis from 0 to 12000.]

Movie Size (In bytes)

• 70% of movies are 2 Mbytes or less

• Median movie size is about 1.1 Mbytes

90% of the movies are 45 sec or less, 50% < 15 sec

Overall Duration Distribution

[Figure: number of movies by length in seconds, in bins from 5 to 115; vertical axis from 0 to 12000.]

Aspect Ratio

• 74% of all files had an aspect ratio of 1.333
  – 320 x 240
  – 160 x 120

• 89% had aspect ratios of 1.2 - 1.5

• Movie Bitrate = movie size / movie duration
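To make the bitrate definition concrete, here is a small sketch that computes a movie's average bitrate and the time needed to download it over a given link. It assumes 1 Mbyte = 10^6 bytes and 1 Kbps = 1000 bits/sec; the function names are illustrative only.

```python
def bitrate_kbps(size_bytes, duration_s):
    """Average bitrate = movie size / movie duration, in Kbits/sec."""
    return size_bytes * 8 / duration_s / 1000


def download_seconds(size_bytes, link_kbps):
    """Time to transfer the whole file over a link of the given rate."""
    return size_bytes * 8 / (link_kbps * 1000)


def plays_in_real_time(size_bytes, duration_s, link_kbps):
    """True if the link can deliver the movie at least as fast as it plays."""
    return bitrate_kbps(size_bytes, duration_s) <= link_kbps
```

For a hypothetical 1.1 Mbyte clip lasting 15 seconds (two figures from the deck combined purely for illustration; the slides do not say they describe the same movie), the average bitrate is about 587 Kbps, and the download takes about 306 seconds at 28.8 Kbps or about 69 seconds at 128 Kbps, far longer than the clip itself.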

Overall Average Bitrate Distribution

[Figure: number of movies (frequency) and cumulative % by average bitrate in Kbits/sec; bins run from 28.8 to 7000, plus "More"; frequency axis from 0 to 6000.]

So Far ...

• Distributions: by date, fps, size, duration, aspect ratio, bitrate

• Comparing movie formats

AVI/QuickTime Comparison

Video Codec          AVI    QuickTime
Radius Cinepak       43%    60%
Intel Indeo R3.2     25%    2%
Microsoft Video 1    26%    0%
Apple Video-RPZA     0%     22%

• 25% of AVI, 33% of QuickTime: video only

               AVI         QuickTime
Audio Codec    PCM         PCM
               MS-ADPCM    TWOS

How To Compare Compression?

• Bits/pixel = (video size in bits) / (width * height * # of frames)

        Mean    Median (bits/pixel)
AVI     2.51    2.14
QT      2.16    1.82
MPEG    0.72    0.51
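The bits/pixel measure is simple enough to sketch directly. The code below assumes the video size is given in bytes with any audio track already excluded, and the `summarize` helper is a hypothetical way to obtain a mean and median per format from a list of per-movie values.

```python
from statistics import mean, median


def bits_per_pixel(video_bytes, width, height, frames):
    """Bits/pixel = (video size in bits) / (width * height * # of frames)."""
    return video_bytes * 8 / (width * height * frames)


def summarize(values_by_format):
    """Map each format name to (mean, median) of its bits/pixel values."""
    return {fmt: (mean(v), median(v)) for fmt, v in values_by_format.items()}
```

Normalizing by pixel count and frame count lets movies of different resolutions and lengths be compared on one scale, which is why a lower value indicates stronger compression.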

MPEG Bits/pixel Distribution

• Size of I : P : B frames ~ 5 : 2 : 1

• 90% of MPEG files were video only

Frame Type    Mean bits/pixel    Median bits/pixel
I             1.25               1.10
P             0.76               0.54
B             0.31               0.19

MPEG Frame Patterns

Frame Pattern      % Distribution    Mean bits/pixel
I                  27.1              1.17
IBBPBB             15.7              0.7
IBBPBBPBBPBBPBB    10.4              0.31
IBBPBBPBBPBB       8.1               0.5
IBBBPBBBPBBB       4.4               0.66
IPBBIBB            4.2               0.39
IIP                3.5               0.7

• 80% of MPEG files have some recurring frame pattern
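The slides do not state how a recurring pattern was detected. One plausible approach, sketched here under that caveat, is to look for the shortest unit of frame types that repeats to cover the whole sequence, allowing a partial repetition at the end. The function name and the "at least two occurrences" rule are assumptions.

```python
def shortest_repeating_unit(frame_types):
    """Return the shortest prefix whose repetition reproduces frame_types.

    frame_types is a string such as "IBBPBBIBBPBB". A trailing partial
    repetition is allowed. Returns None if no unit occurs at least twice.
    """
    n = len(frame_types)
    for size in range(1, n // 2 + 1):
        unit = frame_types[:size]
        repeats = -(-n // size)  # ceiling division
        if (unit * repeats)[:n] == frame_types:
            return unit
    return None
```

For example, a stream containing only I frames reduces to the unit "I", and a stream that cycles through IBBPBB reduces to "IBBPBB" even if the file ends partway through a cycle.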

Recap

• Number of movies coming online: exponential, then flat

• MPEG higher fps, QuickTime/AVI lower

• Median size of movies: 1.1 Mbytes

• 90% of movies last 45 seconds or less

• 1.333 is the most common aspect ratio

• 28.8 - 128 Kbps modem rates useless for real-time downloads

• Radius Cinepak is widely used by QuickTime and AVI

• MPEG compresses better than QuickTime and AVI

• 80% of MPEGs have some sort of recurring pattern

Conclusion

• Existing compression technologies not enough for transmission over standard modems
  – explains rise of streaming video technologies
  – users cope by making file sizes, duration smaller
  – but not by throttling the bitrate
  – perceptual threshold?

Future Work

• How do videos age?

• Another study to confirm findings
  – Brewster Kahle, www.archive.org

• Develop tools to automate the process

Open Questions

• What are video access patterns on the Web?

• How to analyze streaming video files?