1
PhD Thesis Defense – 9th April, 2013
Novel Applications for Emerging Markets Using Television as a Ubiquitous Device
CTIF, Department of Electronic Systems
Arpan Pal, Principal Scientist and Head of Research, TCS Innovation Labs, Kolkata
Supervisor: Prof. Ramjee Prasad
2
Motivation - TV as a Ubiquitous Computing Device
“Ubiquitous computing enhances computer use by making many computers available throughout the physical environment, but making them effectively invisible to the user” – Mark Weiser
Can TV be the Ubiquitous Screen for the Home?
• It is available
• It is easy to use
Market Facts (India) – Source: IMRB, ITU, IMAI (2009, 2010, 2011)
 PC – low penetration: cost, skill and usability issues
 Mobile screen – good for social networking, not for information-heavy content
 Tablets and large-screen smartphones – still costly
 Television – low cost of ownership, large screen real-estate, familiarity of usage

Device | Penetration (%) | Internet (%)
PC | 6.1% | 4.2% (Home)
Mobile | 61.4% | 3.2%
Television | 60% (Household) | ??
Home Infotainment Platform
• Internet Browser
• Media Player (Audio, Video, Image)
• Video Chat, SMS
• Remote Healthcare
• Distance Education
 Pilot deployment in India and the Philippines
3
User Study
A user survey with HIP (the pilot version of “Dialog”, launched by Tata Teleservices Ltd.) was conducted among urban users in India by TNS.
Sample – 50 middle-class and lower-middle-class families, involving 50 working adults and 50 students (12–18 years of age).
Confidence score measure – % of respondents responding with a score of 4 or 5.
4
User Study Findings
 Slow to very slow internet connection affecting the user experience
• Video chat could not be tested – how to do acceptable-quality video chat over low-QoS networks?
 Browser – ease-of-use issues and little liking for TV-Internet blending
• How to make the experience acceptable through TV-Internet mash-ups?
 Preference for the remote control among non-computer-savvy users, and dislike of the QWERTY layout for the on-screen keyboard
• How to design a better text-entry method using the remote control?
 Additional multimedia content security requirements for remote healthcare and distance education applications
• How to build low-computational-complexity access control and DRM schemes capable of running on the constrained low-cost platform?
Problem Statement
Improving the user experience of TV as a low-cost Internet-access device through
 QoS-aware Video Transmissions
 Low-complexity Video Security
 TV context-aware Intelligent TV-Internet Mash-ups
 TV Remote-based on-screen keyboard for text entry

Challenges
 Resource-Constrained Platform
 Non-Computer-Savvy Users
 Low-QoS Network
Video Chat over low-QoS Networks
[Diagram: ICT infrastructure limitations (low-QoS network, resource-constrained platform) and user-experience requirements feed the challenges addressed by the low-cost infotainment platform]
6
Problem Definition
 Video Chat
• Bandwidth-hungry, with real-time packet delivery requirements
 2G wireless networks (CDMA 1xRTT / GPRS)
• Poor bandwidth and high latency – fluctuates with time and place
 Need for an adaptive-rate-control-based video chat system that gives preference to audio

State-of-the-Art Analysis
 Network condition estimation and adaptation reported on WiFi and IP networks, not on 2G wireless, where the latency is higher
 Little work reported on low-computational-complexity rate control algorithms for streaming video
 Little work reported on optimal design of audio/video streaming systems addressing latency and perceptual quality
7
Proposed System
Contribution
 Network Sensing
• An experimental-heuristics-based mapping of effective bandwidth to probe-packet delay
 Adaptation
• A low-complexity video rate control algorithm for H.264 CBR – automatic switching between frame/MB as the basic unit for quantization, based on video complexity
• An adaptive fragmentation scheme with frame-level sequencing that minimizes packet-delay-based discards, prioritizes audio and improves perceived video quality

[Block diagram: Video Chat Application on a Middleware Framework – encode/decode rate control, video/audio encode and decode, network sensing, fragmentation/re-assembly – over the underlying network stack (UDP/IP)]
8
Results
[Plots: PSNR (in dB) for “Grandma” @ 30 kbps and “Akiyo” @ 25 kbps]

Network Sensing – measured effective bandwidth:
Network Type | Mean (kbps) | Stdev (kbps)
ADSL-ADSL | 596.14 | 203.45
Modem-Modem | 26.96 | 19.23
Modem-ADSL | 18.13 | 3.21

Adaptive Rate Control (QCIF, 5 fps) and Adaptive Fragmentation
 Feedback from 20 users on the overall experience using a) a standard RTP-based system and b) the proposed system
 100% reported better audio quality and better perceived video quality
Low-Complexity Video Security
 Watermarking for DRM (Education)

[Diagram: the resource-constrained platform and socially value-adding apps drive the user-experience requirements and challenges for the low-cost infotainment platform]
10
Problem Definition
Digital Watermarking
 Requirement
• Needs to be imperceptible and robust at the same time
• Needs to have low computational complexity
 State-of-the-Art
• Initial and classical works are on MPEG-2, not on H.264
• Reported H.264 watermarking systems have high computational overhead
• Reported works lack perceptual quality analysis and attack-robustness analysis

Video Encryption
 Requirement
• Needs to have low computational complexity, yet adequate security
 State-of-the-Art
• Uncompressed-domain encryption – high decryption computational overhead
• Reported H.264 video encryption works – no focus on computational complexity
• No work reported on video quality assessment after encryption/decryption
11
Digital Watermarking – Proposed System
Contribution
 Robust, imperceptible, yet low-computational-complexity H.264 watermarking system
• Hash-based integrity check
• Watermark embedded by reusing the quantizer
• Embedding location carefully chosen for imperceptibility
 Evaluation methodology for watermark attacks and perceptual video quality
• Peak-Signal-to-Noise-Ratio (PSNR) based imperceptibility
• Evaluation against 10 known attacks
• Mean-Opinion-Score (MOS) based post-attack measurement of Video Quality, Retrieved Image Quality and Retrieved Text Quality

[Block diagram: H.264 encoder pipeline (best prediction mode and block size selection for Intra (I) / Inter (P), transform/quantization QT and inverse Q-1T-1, reorder, entropy encoder, NAL output, de-blocking filter, reference and reconstructed frames F′n-1 / F′n) with watermark embedding and extraction hooks; input video in YUV 4:2:0 format; the watermark payload is text and an image, recovered as retrieved text and image]
12
Digital Watermarking – Results
Computational Complexity
Operation | No. of operations per GOP
ADD | 2779
MULTIPLY | 3564
DIVIDE | 1980
MODULO | 3564
CONDITIONAL | 7524
MEMORY I/O | 1584

Function | CPU mega-cycles per GOP (1 second)
Watermark Embedding | 6.8
Watermark Extraction | 3.8

Perceptual Video Quality after Attacks + Retrieved Image / Text Quality (14 streams, 20 users)
Attack | Video Quality | Image Quality | Text Quality | Overall Performance against Attack
AA | Excellent | Excellent | Excellent | Excellent
CAA | Poor | Medium | Bad | Attack degrades video quality
FFA | Poor | Poor | Bad | Attack degrades video quality
GCA | Poor | Good | Good | Attack degrades video quality
GA | Bad | Bad | Bad | Attack degrades video quality
HEA | Poor | Good | Poor | Attack degrades video quality
LEA | Poor | Good | Bad | Attack degrades video quality
NLFA | Poor | Medium | Bad | Attack degrades video quality
RsA | Excellent | Excellent | Excellent | Excellent
RoA | Poor | Excellent | Excellent | Attack degrades video quality
Context-aware Intelligent TV-Internet Mash-ups
 TV Channel Identity as Context
 Textual Context in Static TV Pages
 Textual Context Embedded in Broadcast Video

[Diagram: non-computer-savvy users and the resource-constrained, low-cost platform shape the user-experience requirements and challenges for television as the ubiquitous access device]
14
Requirement (Analog TV Context)
[Block diagram: RF into a DTH/cable set-top box; its A/V output goes into HIP, where video capture and context extraction feed an information mash-up engine connected to the Internet; the mash-up graphics are alpha-blended with the video and sent to the television via A/V out]

 Channel Identity – EPG, viewership rating
 Text on Static Pages – return path on DTH
 Text on Broadcast Video – news mash-up
15
Problem Definition
 TV Channel Identity
• Audio-watermarking and audio-signature based approaches need content modification or non-real-time offline processing
• TV channel logo detection based approaches: reported works use PCA and ICA – computationally intensive, and they work only on static, opaque, rectangular logos, not on non-rectangular / transparent / dynamic logos
 Text on Broadcast Video
• The challenge is identifying text against a dynamically changing video background
• Pixel-domain and compressed-domain methods; region-based (Connected Component (CC) based and Edge Based (EB))
• Different methods work for different kinds of text against different video backgrounds – need for a hybrid approach (region and texture, CC and EB)
• Text area localization and pre-processing remain the biggest challenges
 Text on Static Pages
• Noisy data with fixed fonts; efficient pre-processing is the main challenge
16
TV Channel Identity – Results
 110 channels tested: recall r = 0.96 and precision p = 0.95
• Channel logos with a very small number of pixels are missed in 1% of cases
• The remaining 3% of misses – moving or changed channel logos
• 3% false positives from small-size logos – removed from the template set
• 2% false positives due to highly transparent logos

[Figures: TV channel identity recall and precision; text detection on static pages]

Textual Context from Broadcast TV
• 20 news channels (5 min duration each)
• Recall of 100% and precision of 78% for text localization and OCR
• Precision improves to 88.57% after heuristics-based keyword spotting
• Almost 100% precision after Google-dictionary-based post-processing
Novel On-screen Keyboard
[Diagram: non-computer-savvy users and the low-cost infotainment platform drive the user-experience requirements and challenges for text entry on television as the ubiquitous access device]
18
Problem Definition
Requirements
• Provide a cost-effective, easy-to-use text-entry mechanism for accessing services like internet, email and short message service (SMS) from the television
• A full-fledged separate wireless keyboard (Bluetooth or RF) is costly
• Explore the option of using infra-red remotes with an on-screen keyboard on the TV screen

State-of-the-Art
 Traditional “QWERTY” on-screen keyboards require a large number of keystrokes to navigate, which makes them cumbersome to use
 Available on-screen keyboards do not address usability for non-computer-savvy users
 Most are designed for cursor-based systems, not for traditional key-based infra-red remotes, which have a relatively slow response time on key press
 Insufficient user study and modeling of on-screen layouts
19
Proposed System
Contribution
 A novel formulation of the on-screen layout
• Significantly reduces the number of keystrokes while typing (19 for QWERTY, 9 for the proposed layout)
 Formal methodology for user-study evaluation
• The popular Keystroke-Level Model (KLM) and Goals-Operators-Methods-Selection rules (GOMS) model used for formal evaluation
 Extension of the standard KLM operator set to model remote-based operations and hierarchical layouts
• A finger-movement operator replacing the standard pointing-device operator

[Figure: hierarchical layout cells Aa, Ab, …, Ag, Ah]
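The KLM-GOMS prediction described above can be sketched as a sum of per-operator times; the operator values below are illustrative placeholders, not the calibrated values from the thesis's user study:

```python
# Illustrative KLM-style operator times in seconds (placeholder values,
# NOT the calibrated ones from the user study).
OPERATOR_TIME = {
    "K": 0.28,   # key press on the remote
    "F": 0.40,   # finger movement between remote keys
                 # (the operator that replaces the pointing-device operator)
    "M": 1.35,   # mental preparation
}

def predicted_time(op_sequence):
    """Predicted task time = sum of the times of the operators executed."""
    return sum(OPERATOR_TIME[op] for op in op_sequence)
```

With such a model, two layouts can be compared by encoding the operator sequence each one needs for the same text and comparing the predicted totals, which is how the KLM-GOMS predictions in the results tables are obtained.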
20
Results -User Study 1 (Basic Benchmarking)
Users: 25 users (diverse age / keyboard exposure)
Tasks: users were asked to type “The quick brown fox jumps over a lazy dog” in SMS/Email and “www.google.com” as a URL

[Figures: the QWERTY layout, Layout 1 and Layout 2; QWERTY vs. Layout 1 comparison]

Layouts | % improvement (experiment) | % improvement (predicted from KLM-GOMS)
Layout 1 over QWERTY | 44.23 | 45.75
Layout 2 over QWERTY | 45 | 46.75
Layout 2 over Layout 1 | 2 | 1.84

Layouts | % improvement (experiment) | % improvement (predicted from KLM-GOMS)
Layout 1 over QWERTY | 42.2 | 35.23
Layout 2 over QWERTY | 43.4 | 37.2
Layout 2 over Layout 1 | 3.18 | 2
21
Conclusions
 Motivation – why television: validated through market data from India
 Background – TCS Home Infotainment Platform (HIP): Internet Browser, Media Player, SMS, Video Chat, Remote Healthcare, Distance Education
 Requirement Analysis – field study challenges: slow internet speed, non-computer-savvy users, resource-constrained platform
 How to improve the user experience of TV as a low-cost Internet-access device
• Network-adaptive video rate control and packet fragmentation protocols for an improved video chat experience
• Low-computational-complexity video watermarking and encryption for secure multimedia content sharing
• TV context-aware intelligent TV-Internet mash-ups for improving the Internet browsing experience on TV
• Remote-based on-screen keyboard for improved text entry on TV using the remote control
22
Publications
1. Arpan Pal, M. Prashant, Avik Ghose, Chirabrata Bhaumik, “Home Infotainment Platform – A Ubiquitous Access Device for Masses”, Proceedings on Ubiquitous Computing and Multimedia Applications (UCMA), Miyazaki, Japan, March 2010.
2. Dhiman Chattopadhyay, Aniruddha Sinha, T. Chattopadhyay, Arpan Pal, “Adaptive Rate Control for H.264 Based Video Conferencing Over a Low Bandwidth Wired and Wireless Channel”, IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB), Bilbao, Spain, May 2009.
3. Arpan Pal and T. Chattopadhyay, “A Novel, Low-Complexity Video Watermarking Scheme for H.264”, Texas Instruments Developers Conference, Dallas, Texas, March 2007.
4. T. Chattopadhyay and Arpan Pal, “Two fold video encryption technique applicable to H.264 AVC”, IEEE International Advance Computing Conference (IACC), Patiala, India, March 2009.
5. T. Chattopadhyay, Aniruddha Sinha, Arpan Pal, Debabrata Pradhan, Soumali Roychowdhury, “Recognition of Channel Logos From Streamed Videos for Value Added Services in Connected TV”, IEEE International Conference for Consumer Electronics (ICCE), Las Vegas, USA, January 2011.
6. T. Chattopadhyay, Arpan Pal, Utpal Garain, “Mash up of Breaking News and Contextual Web Information: A Novel Service for Connected Television”, Proceedings of 19th International Conference on Computer Communications and Networks (ICCCN), Zurich, Switzerland, August 2010.
7. T. Chattopadhyay, Aniruddha Sinha, Arpan Pal, “TV Video Context Extraction”, IEEE Trends and Developments in Converging Technology towards 2020 (TENCON 2011), Bali, Indonesia, November 21-24, 2011.
8. Arpan Pal, Chirabrata Bhaumik, Debnarayan Kar, Somnath Ghoshdastidar, Jasma Shukla, “A Novel On-Screen Keyboard for Hierarchical Navigation with Reduced Number of Key Strokes”, IEEE International Conference on Systems, Man and Cybernetics (SMC), San Antonio, Texas, October 2009.
9. Arpan Pal, Debatri Chatterjee, Debnarayan Kar, “Evaluation and Improvements of on-screen keyboard for Television and Set-top Box”, IEEE International Symposium for Consumer Electronics (ISCE), Singapore, June 2011.
23
Publications (contd…)
10. Arpan Pal, M. Prashant, Avik Ghose, Chirabrata Bhaumik, “Home Infotainment Platform – A Ubiquitous Access Device for Masses”, Book Chapter in Springer Communications in Computer and Information Science, Volume 75, 2010, Pages 11-19. DOI: 10.1007/978-3-642-13467-8.
11. Arpan Pal, Ramjee Prasad, Rohit Gupta, “A low-cost Connected TV platform for Emerging Markets–Requirement Analysis through User Study”, Engineering Science and Technology: An International Journal (ESTIJ), ISSN: 2250-3498, Vol.2, No.6, December 2012.
12. T. Chattopadhyay and Arpan Pal, “Watermarking for H.264 Video”, EE Times Design, Signal Processing Design Line, November 2007.
13. Arpan Pal, Aniruddha Sinha and Tanushyam Chattopadhyay, “Recognition of Characters from Streaming Videos”, Book Chapter in book: Character Recognition, Edited by Minoru Mori, Sciyo Publications, ISBN: 978-953-307-105-3, September 2010.
14. Arpan Pal, Tanushyam Chattopadhyay, Aniruddha Sinha and Ramjee Prasad, “The Context-aware Television using Logo Detection and Character Recognition”, (Submitted) Springer Journal of Pattern Analysis and Applications
15. Debatri Chatterjee, Aniruddha Sinha, Arpan Pal, Anupam Basu, “An Iterative Methodology to Improve TV Onscreen Keyboard Layout Design Through Evaluation of User Study”, Journal of Advances in Computing, Vol.2, No.5, October 2012, Scientific and Academic Publishing (SAP), p-ISSN: 2163-2944, e-ISSN: 2163-2979.
24
Future Work
 Adaptive Video Chat – network sensing via IP statistics, multicast support
 Low-complexity Video Security – extending the security analysis, audio watermarking
 Internet-TV Mash-up – extending to second screens (mobiles, tablets), seamless transfer of streaming content between TV and mobile/tablet, integrating social media into broadcast TV
 On-screen Keyboard – predictive keyboard integration, incorporating voice interfaces and gesture controls; exploring the use of mobile phones as the TV remote control
 Generic – optimizing the contradicting triad of requirements (cost, features and performance) in the face of ever-improving hardware functionality
25
Learning from the CTIF-GISFI PhD Program
Access to excellent and state-of-the-art technical course content from Aalborg University
 Very useful research soft-skill courses
• Problem-based Learning
• Professional Networking
• Qualitative Research Approaches
• Innovation Methodology
• Writing Scientific Papers
Technical Support from CTIF faculty
Culture of Practical Problem based Research
Attending relevant conferences and EU FP7 related programs
Networking with experts in related areas
28
Introduction
Objective
 Improving the user experience of TV as a low-cost Internet-access device for the masses in emerging markets like India

• Motivation
• Background
• User Study Based Requirement Analysis
• Contributions
1) Improve the Quality of Experience (QoE) of video chat under poor network conditions using network sensing
2) Provide computationally efficient yet sufficiently secure algorithms for access control and digital rights management (DRM) of sensitive multimedia content
3) Improve the experience of browsing the Internet on TV by intelligently understanding the context of the TV program being watched and blending related information from the Internet (TV-Internet mash-up)
4) Improve the experience of text entry on TV using the remote control through a novel design of an on-screen keyboard layout
29
Thesis Organization
[Diagram – thesis organization: ICT infrastructure limitations (resource-constrained platform, low-QoS network, non-computer-savvy users) and user-experience requirements/challenges map to the scientific contributions – QoS-aware video transmission, low-complexity video encryption and watermarking algorithms, TV context-aware intelligent TV-Internet mash-ups, and an improved on-screen keyboard layout for text entry on TV using the remote – across Chapters 2 to 6, with the engineering contribution, a multimedia framework for quick application development on the low-cost infotainment platform (television as a ubiquitous display device, applications of social value-add), in Appendix A]
30
Background - Home Infotainment Platform (HIP)

[Diagram: HIP device with A/V in and A/V out connections]

• Internet Browser
• Media Player (Audio, Video, Image)
• Video Chat, SMS
• Remote Healthcare
• Distance Education
 Pilot deployment in India and the Philippines
31
HIP Deployment
 TCS’ Home Infotainment Platform has launched successfully in India and the Philippines.
• Tata Teleservices Ltd. launched HIP under the brand name “Dialog” in Tamil Nadu and West Bengal.
• In the Philippines it is launched through Smart Communications under the brand name “SmartBro SurfTV”.
32
HIP Applications
Chipset - TI Da-Vinci DM6446 (297 MHz ARM9 core and 594 MHz DSP)
RAM - 256MB DDR
Flash - 64MB NAND
OS - Embedded Linux 2.6.x
Browser - Opera for devices 9.6 with Flash
Multimedia Codecs
 Video – H.264, MPEG-1, MPEG-2, MPEG-4
 Audio – MP3, AMR-NB, AAC, OGG, FLAC
 Image – JPEG
33
Socially Value-adding Apps – Healthcare and Education
[Diagram: at the health center / home, an ECG, blood pressure monitor, pulse oximeter and digital stethoscope connect to HIP; patient records flow over the network to an expert doctor]
34
HIP Application Development Framework
[Diagram: SRC-PROC-SINK pipeline framework with control APIs – sources (network, microphone, camera, A/V in, storage, USB), processing blocks (compress, decompress, multiplex, demux, render, blend) and sinks (network, VGA, TV video, TV audio, headphone, storage) underneath the applications]

Application | SRC | PROC | SINK
Video from Internet | Network | Demux – Decompress | TV Video / Audio
Media player | Storage | Demux – Decompress | TV Video / Audio
Video Chat (Far View) | Network | Demux – Decompress | TV Video / Headphone
Video Chat (Near View) | Camera and Microphone | Compress – Multiplex | Network
Internet Browser | Network | Render | TV Video
Remote Healthcare | USB | Compress – Multiplex | Network
Distance Education – Lecture Recording | A/V in | Demux – Decompress, Remux – Compress | Storage
Lecture Playback | Storage | Demux – Decompress | TV Video / Audio
QA and Course Guide | Network | Demux – Parser | TV Video / Audio
35
Sensing of Network Condition
 T = average(RTT(P1, t1i), RTT(P2, t2i))
 Heuristic mapping of the effective bandwidth of the network, based on experimentation on a real network (CDMA 1xRTT using Tata Docomo Photon) at different times of day and different places

[Diagram: the transmitter end sends probe packets P1 at t1i and P2 at t2i and measures RTT(P1, t1i) and RTT(P2, t2i)]

T (msec) | BWeff (kbps)
T < 300 | 50
300 < T < 800 | 13
800 < T < 1600 | 4
1600 < T < 1900 | 2
T > 1900 | 1
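The mapping above can be sketched as a small lookup function (a minimal sketch; the function name is illustrative, and T is the mean of the two probe-packet RTTs as defined above):

```python
def effective_bandwidth_kbps(rtt1_ms, rtt2_ms):
    """Map the mean probe-packet round-trip time T (in msec) to the
    experimentally derived effective-bandwidth band BWeff (in kbps)."""
    t = (rtt1_ms + rtt2_ms) / 2.0   # T = average of the two probe RTTs
    if t < 300:
        return 50
    elif t < 800:
        return 13
    elif t < 1600:
        return 4
    elif t < 1900:
        return 2
    return 1
```

The boundary cases (T exactly 300, 800, 1600 or 1900 msec) are not specified on the slide; the sketch resolves each to the lower band.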
36
Rate Control in Audio and Video
 Audio Codec (AMR-NB)
• If effective bandwidth > 4 kbps, bit rate = 5.15 kbps; else bit rate = 4.75 kbps
 Video Codec (H.264)
• For complex scenes, frame-level-only control exhausts the bit budget in the initial frames as MAD increases
• MB-level control makes sure the bit budget is not exhausted, because the whole frame is not complex throughout
• Estimation of video scene complexity – based on bit rate and the Mean Absolute Difference (MAD) prediction model
• The threshold for frame n, T(n), is defined as 80% of MADavg(n); threshold selection was done over 20 test sequences of different resolutions using classical decision theory
• If MADcb > T(n) for at least one MB in a frame, the frame is declared complex and the MB is chosen as the basic unit for that frame
• If MADcb <= T(n) for all MBs, the frame is chosen as the basic unit
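The basic-unit decision can be sketched as below (a minimal sketch; function and argument names are illustrative, with the per-MB predicted MAD values and the running average MAD as inputs):

```python
def choose_basic_unit(mad_per_mb, mad_avg):
    """Pick the rate-control basic unit for a frame.

    The threshold T(n) is 80% of the running-average MAD. If any
    macroblock's predicted MAD exceeds it, the frame is treated as
    complex and the macroblock (MB) becomes the basic unit; otherwise
    the whole frame is used as the basic unit."""
    threshold = 0.8 * mad_avg           # T(n) = 80% of MADavg(n)
    if any(mad > threshold for mad in mad_per_mb):
        return "MB"
    return "frame"
```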
37
Adaptive Packetization of Encoded Video/Audio
• Fragment size N = (fr + H) = 1440 bytes (optimal size found through experimentation)
• Transmission interval dt = (fr + H) / (1000 * BWeff)
• A 9-byte header is added to each video fragment:
 Frame type (1 byte) – I (Independent) or P (Predictive) frame
 Total sub-sequence number (1 byte) – total number of fragments in a video frame
 Sub-sequence number (1 byte) – fragment number
 Sequence number (4 bytes) – video frame number
 Video payload size (2 bytes) – video bytes in the current fragment
• Drop a frame iff a newer frame's fragment arrives before all the fragments of the current frame (improvement over RTP)
• Do not discard packets based on transit delay – the probability of receiving good packets on a slow network is increased (improvement over TCP)
• 20 msec AMR-NB audio frames – M (= 10, matching 5 fps video) audio frames aggregated and sent in a single UDP fragment
• DTX (Discontinuous Transmission) enabled in the AMR-NB encoder (VAD) – if the silence period exceeds D seconds, audio transmission is discontinued:
 For a good channel (BWeff > 4 kbps), D > 10 seconds
 For a bad channel (BWeff <= 4 kbps), D = 3 to 5 seconds
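The 9-byte header and the fragmentation step can be sketched as follows (a sketch, assuming the 1440-byte fragment size includes the header, per N = fr + H; the function name is illustrative):

```python
import struct

FRAGMENT_SIZE = 1440                  # optimal total fragment size (fr + H)
HEADER_FMT = "!BBBIH"                 # frame type, total sub-seq, sub-seq,
                                      # sequence number, payload size = 9 bytes
HEADER_SIZE = struct.calcsize(HEADER_FMT)
MAX_PAYLOAD = FRAGMENT_SIZE - HEADER_SIZE

def fragment_frame(frame, frame_no, frame_type):
    """Split one encoded video frame into fragments of at most 1440
    bytes, each prefixed with the 9-byte header described above."""
    chunks = [frame[i:i + MAX_PAYLOAD]
              for i in range(0, len(frame), MAX_PAYLOAD)]
    total = len(chunks)
    return [struct.pack(HEADER_FMT, frame_type, total, idx, frame_no, len(c)) + c
            for idx, c in enumerate(chunks)]
```

Each fragment would then be sent over UDP at interval dt = (fr + H) / (1000 * BWeff), so the send rate tracks the sensed effective bandwidth.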
39
Watermarking Algorithm Flow
[Flow chart: only IDR frames are watermarked; on even IDRs, the previous GOP is hashed and the message size checked, the embedding locations (image + data) are found, and the actual message is embedded; on the alternate IDRs, the embedding locations (data) are found and the hash number of the previous GOP is embedded as the watermark for integrity checking; non-IDR frames continue unmodified]
40
Digital Watermarking – Algorithm Details
 Embed information in the corresponding coefficient (10th or 15th bit, depending on the bit index being odd or even)

[Flow chart: depending on whether the message is image or data, the diagonal or ab-diagonal sub-band (SB) is selected for embedding; other sub-bands are skipped]

• An HxW logo image in binary format and K bytes of text data.
• The total number of bits to embed, N = HxW + K*8, is stored in an N-element binary array (wn).
• wn is quantized using the same quantization parameter (qp) used in H.264; the quantized values are stored in an array (wqn).
• For each wqn, find the embedding location inside the image; the image-location-mapped wqn is denoted M(u,v), where (u,v) is the position in the DCT domain.
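The construction of the bit array wn can be sketched as below (the function name and the MSB-first bit order are assumptions; the subsequent qp quantization and the M(u,v) DCT-position mapping are not sketched here):

```python
def watermark_bits(logo_bits, text_bytes):
    """Build the N = H*W + K*8 bit array wn: the flattened HxW binary
    logo followed by the K text bytes expanded to bits, MSB first."""
    bits = [b for row in logo_bits for b in row]   # H*W logo bits
    for byte in text_bytes:                        # K*8 text bits
        bits.extend((byte >> i) & 1 for i in range(7, -1, -1))
    return bits
```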
41
Digital Watermarking - Attacks
[Block diagram: the watermarked stream (.264 with WM) is decoded without the WM detector to .YUV, attacked, re-encoded by an H.264 encoder without WM embedding, then decoded with the WM detector; the retrieved image/text is compared against the original, the video quality is compared, and a report is generated]

Attacks tested:
1. Averaging attack (AA)
2. Circular averaging attack (CAA)
3. Rotate attack (RoA)
4. Resize attack (RsA)
5. Frequency filtering attack (FFA)
6. Non-linear filtering attack (NLFA)
7. Gaussian attack (GA)
8. Gamma correction attack (GCA)
9. Histogram equalization attack (HEA)
10. Laplacian attack (LEA)
42
Watermarking Perceptual Video Quality after Attack
• Ten quality measures – Average Absolute Difference (AAD), Mean Square Error (MSE), Normalised Mean Square Error (NMSE), Laplacian Mean Square Error (LMSE), Signal-to-Noise Ratio (SNR), Peak Signal-to-Noise Ratio (PSNR), Image Fidelity (IF), Structural Content (SC), Global Sigma Signal-to-Noise Ratio (GSSNR), Histogram Similarity (HS)
• Three pairs of videos – a) two identical videos (high quality), b) two completely different videos (poor quality) and c) an original video and its compressed/decompressed version (average quality)
• Weighted quality metric W_VAL = ((AAD + GSSNR + LMSE + MSE + PSNR)*3 + HS + IF + NMSE + SC + SNR)
• 14 test video streams were subjected to the different attacks and W_VAL calculated; 20 users were asked to judge the attacked and original watermarked videos perceptually
• Judgement based on human vision psychology (HVS) – converted to a fuzzy Mean Opinion Score (MOS) based parameter Cqual:
 IF (W_VAL >= 90), Cqual = Excellent
 ELSEIF (W_VAL >= 80), Cqual = Good
 ELSEIF (W_VAL >= 75), Cqual = Average
 ELSEIF (W_VAL >= 70), Cqual = Bad
 ELSE Cqual = Poor
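A sketch of the metric and its decision logic. The slide does not say how the ten measures are normalised before being summed, so the 0-100 per-measure scale and the division by the total weight of 20 (5 measures at weight 3 plus 5 at weight 1) are assumptions made here to keep W_VAL on the 0-100 scale the thresholds imply:

```python
HEAVY = ("AAD", "GSSNR", "LMSE", "MSE", "PSNR")   # weight 3
LIGHT = ("HS", "IF", "NMSE", "SC", "SNR")         # weight 1

def w_val(scores):
    """Weighted quality metric over per-measure scores assumed to be
    normalised to 0-100; division by the total weight (20) is an
    assumption, since the slide omits the normalisation step."""
    total = sum(scores[m] * 3 for m in HEAVY) + sum(scores[m] for m in LIGHT)
    return total / 20.0

def c_qual(w):
    """Fuzzy MOS label Cqual from the decision thresholds above."""
    if w >= 90:
        return "Excellent"
    if w >= 80:
        return "Good"
    if w >= 75:
        return "Average"
    if w >= 70:
        return "Bad"
    return "Poor"
```

The labels reproduce the results table: e.g. W_VAL = 100 (AA, RsA) gives Excellent, 71 (GA) gives Bad, and 52 (CAA) gives Poor.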
43
Watermarking Retrieved Image and Text Quality
Image
• Normalized deviation parameter (de) from the Euclidean distance d
• Bit error (be) – % of bits differing between the retrieved and original binary image
• Crossing-count error (ce) – difference in the 0-to-1 crossing counts of the original and retrieved binary images
• Final error in the retrieved image: e = (be + ce + de)/3 (the mean of the three error measures, consistent with the results table)
• MOS-based decision logic for the quality of the retrieved image (Cimg):
 IF e < 0.5, Cimg = Excellent
 ELSEIF e < 5, Cimg = Good
 ELSEIF e < 10, Cimg = Medium
 ELSEIF e < 15, Cimg = Bad
 ELSE Cimg = Poor
Text
• Compute the mean error (te) of the Hamming distance and the Levenshtein distance
• MOS-based retrieved text quality Ctxt:
 IF te < 0.5, Ctxt = Excellent
 ELSEIF te < 1, Ctxt = Good
 ELSEIF te < 3, Ctxt = Medium
 ELSEIF te < 5, Ctxt = Bad
 ELSE Ctxt = Poor
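Both decision chains can be sketched directly; the mean e = (be + ce + de)/3 is inferred here from the results tables that follow, where it reproduces the tabulated e column exactly:

```python
def image_error(bit_err, crossing_err, deviation):
    """Final retrieved-image error e: mean of the bit error (be),
    crossing-count error (ce) and normalized deviation (de)."""
    return (bit_err + crossing_err + deviation) / 3.0

def c_img(e):
    """MOS-based retrieved-image quality label Cimg."""
    for bound, label in ((0.5, "Excellent"), (5, "Good"),
                         (10, "Medium"), (15, "Bad")):
        if e < bound:
            return label
    return "Poor"

def c_txt(levenshtein, hamming):
    """Mean error te of the Levenshtein (L) and Hamming (H) distances,
    mapped to the MOS-based retrieved-text quality label Ctxt."""
    te = (levenshtein + hamming) / 2.0
    for bound, label in ((0.5, "Excellent"), (1, "Good"),
                         (3, "Medium"), (5, "Bad")):
        if te < bound:
            return label
    return "Poor"
```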
44
Watermarking Perceptual Video Quality - Results
Attack W_VAL Cqual
AA 100 Excellent
CAA 52 Poor
FFA 25 Poor
GCA 27 Poor
GA 71 Bad
HEA 27 Poor
LEA 28 Poor
NLFA 25 Poor
RsA 100 Excellent
RoA 37 Poor
[Figure: the original logo and original video frame alongside the retrieved logos after the AA, FFA, GCA, NLFA, HEA, LEA, RoA and RsA attacks]
45
Watermarking Decision Logic on Perceptual Quality after Attacks
Video Quality | Retrieved Image Quality | Retrieved Text Quality | Overall Measure of Goodness
Excellent or Good | Excellent | Excellent | Excellent
Excellent or Good | Excellent | Good | Good
Excellent or Good | Good | Excellent | Good
Excellent or Good | Good | Good | Good
Excellent or Good | Medium | Medium | Medium
Excellent or Good | Bad or Poor | Medium | Bad
Excellent or Good | Medium | Bad or Poor | Bad
Excellent or Good | Bad or Poor | Bad or Poor | Poor
Medium, Bad or Poor | Any | Any | Attack degrades video quality
46
Watermarking Results on Retrieved Quality after Attacks
Attack | be | ce | de | e | Image Quality (Cimg)
AA | 0.000 | 0.000 | 0.000 | 0.000 | Excellent
CAA | 5.469 | 9.896 | 3.448 | 6.271 | Medium
FFA | 5.469 | 10.938 | 55.172 | 23.860 | Poor
GCA | 0.781 | 1.563 | 3.448 | 1.931 | Good
GA | 4.948 | 9.896 | 24.138 | 12.994 | Bad
HEA | 1.563 | 1.563 | 3.448 | 2.191 | Good
LEA | 1.823 | 2.083 | 0.000 | 1.302 | Good
NLFA | 5.729 | 10.417 | 13.793 | 9.980 | Medium
RsA | 0.000 | 0.000 | 0.000 | 0.000 | Excellent
RoA | 0.781 | 0.521 | 0.000 | 0.434 | Excellent

Attack | L | H | te | Text Quality (Ctxt)
AA | 0 | 0 | 0.000 | Excellent
CAA | 6 | 1 | 3.5 | Bad
FFA | 6 | 1 | 3.5 | Bad
GCA | 0 | 1 | 0.5 | Good
GA | 5 | 1 | 3 | Bad
HEA | 6 | 7 | 6.5 | Poor
LEA | 4 | 5 | 4.5 | Bad
NLFA | 6 | 1 | 3.5 | Bad
RsA | 0 | 0 | 0.000 | Excellent
RoA | 0 | 0 | 0.000 | Excellent
47
Encryption – Proposed System
Contribution
 Low-computational-complexity two-stage H.264 video encryption algorithm
• Separate header encryption
• Reuse of the Flexible Macroblock Ordering (FMO) of H.264/AVC as the encryption operator
 Analysis of the effect of the encryption-decryption chain on video quality
• Important from the end-user experience perspective
• PSNR used as the quality measure

[Flow chart: the encoder loop (encode next frame / slice / MB until the end of the sequence) is modified so that FMO supplies the next MB number from a key-based look-up table before proceeding to the next macroblock]
48
Encryption – Two Stage Algorithm
[Diagram: NALU parsing – a NALU is read as control data (SPS, PPS) or video data (macroblocks of IDR / P frames); the encrypted stream permutes the SPS PPS IDR P ... ordering]

Header Encryption
• First encrypt the SPS (Sequence Parameter Set), PPS (Picture Parameter Set) and IDR
• Encode the first frame using a conventional H.264 encoder and take a 16-bit key (KU)
• Take the length of the IDR (lIDR) – a 16-bit number for QCIF resolution
• Define the encryption key KP using a hash function of lIDR and KU

Modify macroblock ordering using a key-based look-up
• Use KP as the seed to generate a random sequence Le (values between 0 and 97); this is used for the first GOP. For subsequent GOPs, the KP of the previous GOP is used as the KU for a new KP.
• MBs in an IDR frame are encoded in the order specified by the look-up table Le.
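The key chaining and the key-dependent permutation can be sketched as below. SHA-256 and Python's PRNG are stand-ins: the slide only specifies "a Hash function of lIDR and KU" and a KP-seeded random sequence, so the concrete hash and shuffle are assumptions:

```python
import hashlib
import random

def mb_order(key_u, idr_length, n_mb=98):
    """Derive the per-GOP key KP from the user key KU and the IDR
    length lIDR, then use KP to seed a key-dependent macroblock
    permutation (the FMO look-up table Le over positions 0..n_mb-1).
    For the next GOP, the returned KP would be fed back in as KU."""
    digest = hashlib.sha256(f"{key_u}:{idr_length}".encode()).digest()
    key_p = int.from_bytes(digest[:4], "big")   # 32-bit key KP (assumed width)
    rng = random.Random(key_p)                  # KP as the PRNG seed
    order = list(range(n_mb))
    rng.shuffle(order)                          # key-dependent MB ordering Le
    return key_p, order
```

The decryptor, holding the same KU, regenerates the identical permutation and inverts it; no extra data beyond the key needs to be transmitted.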
49
Encryption – Results
Security Analysis
 Brute-force attack – 98 macroblock positions permuted by the look-up table give 98! ≈ 9.42 x 10^153 possible orders
 In practice this is restricted by the 32-bit key used to generate the MB order – the order can be generated in 2^32 ways, requiring on average half that number of attempts to decrypt
 The key is changed every GOP – the proposed method is robust enough in comparison to other reported similar methods
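The brute-force arithmetic can be verified with a short sketch:

```python
import math

# 98 macroblock positions permuted by the FMO look-up give 98! orders.
BRUTE_FORCE_ORDERS = math.factorial(98)   # ~9.42 x 10^153

# The permutation, however, is derived from a 32-bit key, so an
# exhaustive attacker searches the key space, not all orders.
KEY_SPACE = 2 ** 32                       # 4,294,967,296 candidate keys
AVG_ATTEMPTS = KEY_SPACE // 2             # expected attempts to hit the key
```

So the effective security is bounded by the 32-bit key, not by the astronomically larger permutation count; the per-GOP key change is what limits how much ciphertext any one key exposes.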
Computational Complexity Analysis
Operation | # per GOP | % Increase
ADD | 2*24*97 + 3 = 4659 | 0.004
MULTIPLICATION | 5*24*97 = 11,640 | 0.200
DIVISION | 5 | 0
MODULO | 4*24*97 + 4 = 9316 | 0

Memory overhead:
Resolution (w x h) | Picture size in MBs | Memory (bytes)
QCIF (176x144) | 99 | 198
CIF (352x288) | 396 | 792
VGA (640x480) | 1200 | 2400
SDTV-525 (720x480) | 1350 | 2700
SDTV-625 (720x576) | 1620 | 3240

Video Quality Analysis
Video Sequence | Size/frame without encryption (bytes) | Size/frame with encryption (bytes) | PSNR of Y without encryption (dB) | PSNR of Y with encryption (dB)
Claire | 155.125 | 158.16 | 39.03 | 39.03
Foreman | 668.975 | 700.3 | 35.14 | 35.16
Hall monitor | 264.82 | 268.705 | 36.97 | 36.96
50
Textual Context from Broadcast TV – Requirement
Example: on-screen breaking news “ANDHRA CM’S MISSING CHOPPER MYSTERY”
 Keyword spotting – Andhra, CM, Missing Chopper
 Search for an RSS feed containing related information
 Search through any engine for related information
 Display related information on TV, e.g. “Missing Chopper Found, CM Dead”
51
Textual Context in Static Pages – Proposed System for DTH
Contribution
 Pre-processing and enhancement
• Noise removal through low-pass filtering on Y
• Resolution enhancement through interpolation-based zooming
 Binarization and touching-character segmentation
• Adaptive-thresholding-based binarization
• Touching-character segmentation using width outlier detection
 Use of standard OCR tools like GOCR and Tesseract

[Pipeline: a-priori ROI mapping → pre-processing for noise removal and image enhancement → binarization and touching-character segmentation → OCR using standard engines]
52
TV Channel Identity – Results
110 channels tested: recall r = 0.96 and precision p = 0.95.
• Channel logos with a very small number of pixels are missed in 1% of cases.
• The remaining 3% of misses are due to moving or changed channel logos.
• 3% of false positives come from a small-size logo that was removed from the template set.
• 2% of false positives are due to highly transparent logos.
Recall and Precision Measures
Original Channel | Detected As
Zee Trendz       | DD Ne
Zee Punjabi      | TV9 Gujarati
DD News          | DD Ne
Nick             | DD Ne
Nepal 1          | Zee Cinema
Module            | Time (msec)
YUV to HSV        | 321.09
ROI mapping       | 0.08
Mean SAD matching | 293.65
Correlation       | 847.55
Changed Logo Examples | False Positive Examples
Computational Complexity
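The timing table above names a mean-SAD matching step. A minimal sketch of such a matcher, assuming the logo ROI and the stored template are same-sized grayscale arrays (function name illustrative):

```python
import numpy as np

def mean_sad(roi, template):
    """Mean sum-of-absolute-differences between a candidate logo ROI
    and a stored template; a lower score means a better match.
    Sketch of the matching step named in the timing table."""
    diff = roi.astype(np.int32) - template.astype(np.int32)
    return np.abs(diff).mean()
```

In a full system the channel with the lowest mean-SAD score against its template would be reported, with correlation (the slower step in the table) used as a confirmation stage.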
53
Textual Context from Broadcast TV– Proposed System for News Mash-up from Internet
Breaking News Heuristics
• Breaking news always comes in capital letters.
• The font size of breaking news is larger than that of the ticker text.
• It tends to appear on the central to central-bottom part of the screen.

Processing pipeline: localization of suspected text regions → text region confirmation using temporal consistency → binarization → text recognition → keyword selection

Contribution
• An improved method for text region localization and screen layout segmentation
• Pre-processing techniques on the text region (same as the previous section)
• A heuristics-based keyword spotting algorithm followed by Google's built-in dictionary-based correction
54
Image | Output of GOCR | Output of Tesseract | After Proposed Algorithms (GOCR) | After Proposed Algorithms (Tesseract)
(a) | Sta_ring Govind_. Reem_ _n. RajpaI Yadav. Om Puri. | Starring Guvinda, Rcema Sen, Raipal Yadav, Om Puri. | Starring Govind_. Reem_ _n. RajpaI Yadav. Om Puri. | Starring Guvinda. Reema Sen, Raipal Yadav. Om Puri.
(b) | _____ ___ ___ _________ ____ __ __ | Pluww SMS thu fnlluwmg (adn In 56633 | ___ SMS th_ folIcmng cod_ to S__ | Planta SMS tha Iullmmng mda tn 56633
(c) | SmS YR SH to | SMS YR SH in 56633 | SmS YR SH to _____ | SMS YR SH to 56533
(d) | _m_ BD to _____ | SMS BD to 56633 | SMS BD to S____ | SMS BD to 56633
(e) | AM t___o_,_b _q____ | AM 2048eb 141117 | AM tOa_gb _q____ | AM 2048eb 141117
(f) | _M_= _ _A___ to Sd___ | SMS: SC 34393 tn 56533 | _M_= _ _A___ to Sd___ | SMS: SC34393 tn 56633
(g) | _W _ ' _b _ Ib_lb _a | W6.} 048abl;lbwzIb1a | ___ __Y_b yIbw_Ib_a | WP 2048ab Mlbwzlb 1 a
(h) | ADD Ed_J to S____ | ADD Eau to $6633 | ADD Ed_J to S____ | ADD Edu to 56633
(i) | AIC STAlUSlS/OUO_ t_;OS;t_ | AIC STATUS25/02/09 1 9:05:1 4 | mlC S_ATUSlS/OUO_ t_;OS=tA | A/C STATUS 25/02/09 1 9:05:14
(j) | _ ________'__ | Sub ID 1005681893 | WbID_OOS_B_B__ | Sub ID 1005681893
Screenshots of candidate ROIs | Accuracy of Text Detection
OCR Example Results
Source: Tata Sky DTH Service in India
Textual Context in Static Pages – Results
55
Textual Context from Broadcast TV - Text Region Localization
• Filter out low-contrast components using intensity-based thresholding (output Vcont).
• Count the number of black pixels in each row of Vcont; let the count in the i-th row be cntblack(i).
• Compute the average number of black pixels per row, avgblack = (1/ht) * Σ_{i=1..ht} cntblack(i), where ht is the height of the frame.
• Compute the absolute variation in the i-th row as av(i) = |cntblack(i) – avgblack|.
• Compute the average absolute variation, aav = (1/ht) * Σ_{i=1..ht} av(i).
• Compute the threshold for marking the textual region from these statistics.
• Mark all pixels in the i-th row of Vcont as white if the row fails the threshold test.

Confirmation of the Text Regions Using Temporal Consistency
• Assumption: text in the breaking news persists for some time.
• Vcont sometimes contains noise because of high-contrast regions in the video frame.
• In a typical 30 FPS video sequence, one frame is displayed for 33 msec.
• Assuming breaking news persists for at least 2 seconds, all regions not persistently present for more than 2 seconds can be filtered out.
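The row-profile statistics above can be sketched as follows. The exact threshold formula is not reproduced on the slide, so avgblack + aav is used here as an assumed stand-in:

```python
import numpy as np

def mark_text_rows(vcont):
    """Row-profile text localization following the slide's steps:
    count black pixels per row of the contrast-filtered frame Vcont,
    compute the average (avgblack) and the average absolute variation
    (aav), and keep rows whose black-pixel count exceeds a threshold.
    The threshold form avgblack + aav is an ASSUMPTION."""
    cnt_black = (vcont == 0).sum(axis=1)   # black pixels per row
    avg_black = cnt_black.mean()           # average over the ht rows
    av = np.abs(cnt_black - avg_black)     # absolute variation per row
    aav = av.mean()                        # average absolute variation
    threshold = avg_black + aav            # assumed threshold form
    return cnt_black > threshold           # True = candidate text row
```

Rows flagged here would then be checked for persistence over roughly 60 frames (2 seconds at 30 FPS) by the temporal-consistency stage.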
56
Textual Context from Broadcast TV - Post-processing Heuristics
• Operate the OCR only in upper case.
• If the number of words in a text line is above a heuristically obtained threshold, consider it a candidate text region.
• If multiple such text lines are obtained, choose a line near the bottom.
• Remove the stop words (a, an, the, for, of, etc.) and correct the remaining words using a dictionary.
• Concatenate the remaining words to generate the search string for the internet search engine.
The selected keywords can be given to internet search engines through Web APIs to fetch related news, which can then be blended on top of the TV video to create a mash-up between TV and the Web.

Since search engines like Google already provide word correction, dictionary-based correction of keywords can be eliminated.
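The keyword-selection heuristics above can be sketched as a small function. The stop-word list and minimum word count are illustrative choices, not values from the thesis:

```python
# Illustrative stop-word list (the thesis' actual list is not shown)
STOP_WORDS = {"A", "AN", "THE", "FOR", "OF", "IN", "ON", "TO", "AND"}

def build_query(ocr_lines, min_words=2):
    """Keyword selection following the slide's heuristics: keep lines
    with enough words, prefer the line nearest the bottom, drop stop
    words, and concatenate the rest into a search string."""
    candidates = [ln for ln in ocr_lines if len(ln.split()) >= min_words]
    if not candidates:
        return ""
    line = candidates[-1]                  # line nearest the bottom
    words = [w for w in line.upper().split() if w not in STOP_WORDS]
    return " ".join(words)
```

The returned string would be handed to a search engine's Web API, relying on the engine's own spelling correction rather than a local dictionary.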
57
Textual Context from Broadcast TV – Results
Accuracy of Text Localization
• Experimental results show a recall rate of 100% and a precision of 78%.
• The low precision results from tuning the parameters and threshold values so that the probability of false negatives (misses) is minimized.
• The final precision can only be judged after applying the text recognition and keyword selection algorithms.

Accuracy of Text Recognition
• In false-positive regions, the OCR output contains many special characters.
• Candidate texts with a special-character-to-alphabet ratio > 1 are therefore discarded.
• Moreover, the proposed keyword detection method concentrates on capital letters, so only words in all capitals are kept under consideration.
• With these filters, the character-level accuracy of the selected OCR improves to 86.57%.
Accuracy of Information Retrieval
• The limitations of the OCR module could be overcome with a strong dictionary or language model.
• In the proposed method this constraint is bypassed, as the Google search engine itself has such a module.
• The OCR output is simply given to the Google search engine, which in turn offers the option with the actual text.
58
Proposed Algorithm for Keyboard Layout
Algorithm
• Total no. of character cells = T
• Total no. of rows of key-blocks = R
• Total no. of columns of key-blocks = C
• Total no. of cells in a key-block = 4
• T = R x C x 4; max. keystrokes in the worst case K = (R + C + 1).
• Hence, the desired solution boils down to finding R and C for which K is minimum.

Flowchart steps:
1. Input T.
2. numCharsSqrt = square root of T; sqRootInt = ceiling of numCharsSqrt.
3. If sqRootInt is even, N = sqRootInt; otherwise N = sqRootInt + 1.
4. N2 = N – 2; C = N / 2.
5. If (N x N2) >= T, then R = N2 / 2; otherwise R = C.
6. Output R, C.

Example
• "QWERTY" – T = 54: 4 rows, 14 columns, (14 + 4 = 18) keystrokes max.
• "PROPOSED" – T = 54: sqRootInt = 8, N = 8, N2 = 6, C = 4; N x N2 = 48 < 54, so R = C = 4. Max 9 keystrokes.
• Final layout used – T = 48: sqRootInt = 7, N = 8, N2 = 6, C = 4; N x N2 = 48 = T, so R = N2/2 = 3. Finally, R = 3 and C = 4. Max 8 keystrokes.
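The flowchart steps above translate directly into a short function, reproducing both worked examples from the slide:

```python
import math

def keyboard_layout(T):
    """Given total character cells T (4 cells per key-block), find the
    rows R and columns C of key-blocks per the slide's flowchart,
    minimising worst-case keystrokes K = R + C + 1."""
    n = math.ceil(math.sqrt(T))
    if n % 2 != 0:
        n += 1                 # force N even so N/2 is an integer
    n2 = n - 2
    c = n // 2
    r = n2 // 2 if n * n2 >= T else c
    return r, c

keyboard_layout(54)   # -> (4, 4): max 4 + 4 + 1 = 9 keystrokes
keyboard_layout(48)   # -> (3, 4): max 3 + 4 + 1 = 8 keystrokes
```

Both results match the slide: T = 54 gives R = C = 4 (9 keystrokes worst case) and the final T = 48 layout gives R = 3, C = 4 (8 keystrokes worst case).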
59
Onscreen Keyboard Results – User Study 1
Survey charts: QWERTY vs. Layout-1 (before practice) | QWERTY vs. Layout-1 (after practice)

1. Does the on-screen keyboard provide enough assistance?
2. How is the ease of use of the on-screen keyboard?
60
Results – User Study 2 (KLM-GOMS Modelling)
Simple Text Entry

Layouts                | % improvement (experiment) | % improvement (predicted from KLM-GOMS)
Layout 1 over QWERTY   | 44.23                      | 45.75
Layout 2 over QWERTY   | 45                         | 46.75
Layout 2 over Layout 1 | 2                          | 1.84

Complete Email Typing and Sending Task

Layouts                | % improvement (experiment) | % improvement (predicted from KLM-GOMS)
Layout 1 over QWERTY   | 42.2                       | 35.23
Layout 2 over QWERTY   | 43.4                       | 37.2
Layout 2 over Layout 1 | 3.18                       | 2

Study setup:
• A total of 20 users.
• Simple text entry and email sending tasks.
• Six phrase sets selected randomly from the standard MacKenzie phrase set.
• Users were given an initial familiarization phrase and then asked to enter six phrases in one go.
• The time taken by each user and the number of keystrokes required to type each phrase were recorded.
61
Onscreen Keyboard Results – User Study 2
P – redefined as the total time taken to find a key and move the focus to select the block containing that key: Layout 1 – 1.77 sec, Layout 2 – 1.73 sec, QWERTY – 1.10 sec. H – not used. New parameter F – the time required for finger movement – 0.22 sec.
KLM-GOMS Operators

Operator | Description                            | Time (sec)
P        | Pointing with a pointing device        | 1.10
K        | Key or button press                    | 0.20
H        | Move from mouse to keyboard and back   | 0.40
M        | Mental preparation and thinking time   | 1.35

Per-operation times

Operation                            | Layout-1 (sec) | Layout-2 (sec) | QWERTY (sec)
Open/close onscreen keyboard layout  | 0.4            | 0.4            | 0.4
Find any key                         | 1.07           | 1.03           | 1.1
Move focus to select a key           | 0.7            | 0.7            | 2.6
Move finger to the corner keys       | 0.2            | 0.2            | 0.2
Enter a character using keyboard     | 2.17           | 2.13           | 4.0

KLM-GOMS sub-goals for Email Task

Sub-goal                  | Layout 1 (sec) | Layout 2 (sec) | QWERTY (sec)
Open browser              | 0.5            | 0.5            | 0.5
Open gmail server & login | 45.3           | 44.5           | 82.1
Compose mail              | 68.0           | 65.8           | 115.1
Dispatch                  | 0.4            | 0.4            | 0.4
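As a consistency check, the per-character entry times for Layouts 1 and 2 in the operations table equal the sum of their sub-operation times (find key + move focus + finger move + key press). A small sketch, with the table values hard-coded:

```python
# KLM-GOMS sub-operation times (sec) taken from the operations table
OPS = {
    "layout1": {"find": 1.07, "focus": 0.7, "finger": 0.2, "press": 0.2},
    "layout2": {"find": 1.03, "focus": 0.7, "finger": 0.2, "press": 0.2},
}

def char_time(layout):
    """Per-character entry time as the sum of the table's sub-operations.
    Matches the 'Enter a character' row for Layouts 1 and 2."""
    return round(sum(OPS[layout].values()), 2)

char_time("layout1")  # -> 2.17
char_time("layout2")  # -> 2.13
```

Multiplying such per-character times by the keystroke counts of each sub-goal is how KLM-GOMS yields the predicted task times compared against the experiments on the previous slide.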