1
PhD Thesis Defense – 9th April, 2013
Novel Applications for Emerging Markets Using Television as a Ubiquitous Device
CTIF, Department of Electronic Systems
Arpan Pal, Principal Scientist and Head of Research, TCS Innovation Labs, Kolkata
Supervisor: Prof. Ramjee Prasad
2
Motivation - TV as a Ubiquitous Computing Device
“Ubiquitous computing enhances computer use by making many computers available throughout the physical environment, but making them effectively invisible to the user” – Mark Weiser
Can TV be the Ubiquitous Screen for the Home?
• It is available
• It is easy to use
Market Facts (India) – Source: IMRB, ITU, IMAI (2009, 2010, 2011)
 PC – low penetration: cost, skill and usability issues
 Mobile screen – good for social networking, not for information-heavy content
 Tablets and large-screen smartphones – still costly
 Television – low cost of ownership, large screen real-estate, familiarity of usage

Device | Penetration (%) | Internet (%)
PC | 6.1% | 4.2% (Home)
Mobile | 61.4% | 3.2%
Television | 60% (Household) | ??
Home Infotainment Platform
• Internet Browser
• Media Player (Audio, Video, Image)
• Video Chat, SMS
• Remote Healthcare
• Distance Education
 Pilot deployment in India and the Philippines
3
User Study
A user survey with HIP (the pilot version of “Dialog”, launched by Tata Teleservices Ltd.) was conducted among urban users in India by TNS.
Sample – 50 middle-class and lower-middle-class families, involving 50 working adults and 50 students (12–18 years of age).
Confidence score measure – % of respondents responding with a score of 4 or 5.
4
User Study Findings
 Slow to very slow internet connection affecting the user experience
• Video chat could not be tested – how to do acceptable-quality video chat over low-QoS networks?
 Browser – ease-of-use issues and little liking for TV-Internet blending
• How to make the experience acceptable through TV-Internet mash-ups?
 Preference for the remote control among non-computer-savvy users, and dislike of the QWERTY layout for the on-screen keyboard
• How to design a better text-entry method using the remote control?
 Additional multimedia content security requirements for remote healthcare and distance education applications
• How to build low-computational-complexity access control and DRM schemes capable of running on the constrained low-cost platform?
Problem Statement
Improving the user experience of TV as a low-cost Internet-access device through
 QoS-aware Video Transmissions
 Low-complexity Video Security
 TV context-aware Intelligent TV-Internet Mash-ups
 TV Remote-based on-screen keyboard for text entry

Challenges
 Resource-Constrained Platform
 Non-Computer-Savvy Users
 Low-QoS Network
Video Chat over low-QoS Networks
[Diagram: ICT infrastructure limitations (low-QoS network, resource-constrained platform) and user-experience requirements feed the challenges addressed by the low-cost infotainment platform]
6
Problem Definition
 Video Chat
• Bandwidth-hungry, with real-time packet delivery requirements
 2G wireless networks (CDMA 1xRTT / GPRS)
• Poor bandwidth and high latency – fluctuates with time and place
 Need for an adaptive-rate-control-based video chat system that gives preference to audio

State-of-the-Art Analysis
 Network condition estimation and adaptation reported on WiFi and IP networks, not on 2G wireless, where the latency is higher
 Little work reported on low-computational-complexity rate control algorithms for streaming video
 Little work reported on optimal design of audio/video streaming systems addressing latency and perceptual quality
7
Proposed System
Contribution
 Network Sensing
• An experimental-heuristics-based mapping of effective bandwidth to probe-packet delay
 Adaptation
• A low-complexity video rate control algorithm for H.264 CBR – automatic switching between frame/MB as the basic unit for quantization, based on video complexity
• An adaptive fragmentation scheme with frame-level sequencing that minimizes packet-delay-based discards, prioritizes audio and improves perceived video quality

[Block diagram: Video Chat Application on a Middleware Framework – encode/decode rate control, video/audio encode and decode, network sensing, fragmentation/re-assembly – over the underlying network stack (UDP/IP)]
8
Results
[Plots: PSNR (in dB) for “Grandma” @ 30 kbps and “Akiyo” @ 25 kbps]

Network Sensing – measured effective bandwidth:
Network Type | Mean (kbps) | Stdev (kbps)
ADSL-ADSL | 596.14 | 203.45
Modem-Modem | 26.96 | 19.23
Modem-ADSL | 18.13 | 3.21

Adaptive Rate Control (QCIF, 5 fps) and Adaptive Fragmentation
 Feedback from 20 users on the overall experience using a) a standard RTP-based system and b) the proposed system
 100% reported better audio quality and better perceived video quality
Low-Complexity Video Security
 Watermarking for DRM (Education)

[Diagram: the resource-constrained platform and socially value-adding apps drive the user-experience requirements and challenges for the low-cost infotainment platform]
10
Problem Definition
Digital Watermarking
 Requirement
• Needs to be imperceptible and robust at the same time
• Needs to have low computational complexity
 State-of-the-Art
• Initial and classical works are on MPEG-2, not on H.264
• Reported H.264 watermarking systems have high computational overhead
• Reported works lack perceptual quality analysis and attack-robustness analysis

Video Encryption
 Requirement
• Needs to have low computational complexity, yet adequate security
 State-of-the-Art
• Uncompressed-domain encryption – high decryption computational overhead
• Reported H.264 video encryption works – no focus on computational complexity
• No work reported on video quality assessment after encryption/decryption
11
Digital Watermarking – Proposed System
Contribution
 Robust, imperceptible, yet low-computational-complexity H.264 watermarking system
• Hash-based integrity check
• Watermark embedded by reusing the quantizer
• Embedding location carefully chosen for imperceptibility
 Evaluation methodology for watermark attacks and perceptual video quality
• Peak-Signal-to-Noise-Ratio (PSNR) based imperceptibility
• Evaluation against 10 known attacks
• Mean-Opinion-Score (MOS) based post-attack measurement of Video Quality, Retrieved Image Quality and Retrieved Text Quality

[Block diagram: H.264 encoder pipeline (best prediction mode and block size selection for Intra (I) / Inter (P), transform/quantization QT and inverse Q-1T-1, reorder, entropy encoder, NAL output, de-blocking filter, reference and reconstructed frames F′n-1 / F′n) with watermark embedding and extraction hooks; input video in YUV 4:2:0 format; the watermark payload is text and an image, recovered as retrieved text and image]
12
Digital Watermarking – Results
Computational Complexity
Operation | No. of operations per GOP
ADD | 2779
MULTIPLY | 3564
DIVIDE | 1980
MODULO | 3564
CONDITIONAL | 7524
MEMORY I/O | 1584

Function | CPU mega-cycles per GOP (1 second)
Watermark Embedding | 6.8
Watermark Extraction | 3.8

Perceptual Video Quality after Attacks + Retrieved Image / Text Quality (14 streams, 20 users)
Attack | Video Quality | Image Quality | Text Quality | Overall Performance against Attack
AA | Excellent | Excellent | Excellent | Excellent
CAA | Poor | Medium | Bad | Attack degrades video quality
FFA | Poor | Poor | Bad | Attack degrades video quality
GCA | Poor | Good | Good | Attack degrades video quality
GA | Bad | Bad | Bad | Attack degrades video quality
HEA | Poor | Good | Poor | Attack degrades video quality
LEA | Poor | Good | Bad | Attack degrades video quality
NLFA | Poor | Medium | Bad | Attack degrades video quality
RsA | Excellent | Excellent | Excellent | Excellent
RoA | Poor | Excellent | Excellent | Attack degrades video quality
Context-aware Intelligent TV-Internet Mash-ups
 TV Channel Identity as Context
 Textual Context in Static TV Pages
 Textual Context Embedded in Broadcast Video

[Diagram: non-computer-savvy users and the resource-constrained, low-cost platform shape the user-experience requirements and challenges for television as the ubiquitous access device]
14
Requirement (Analog TV Context)
[Block diagram: RF into a DTH/cable set-top box; its A/V output goes into HIP, where video capture and context extraction feed an information mash-up engine connected to the Internet; the mash-up graphics are alpha-blended with the video and sent to the television via A/V out]

 Channel Identity – EPG, viewership rating
 Text on Static Pages – return path on DTH
 Text on Broadcast Video – news mash-up
15
Problem Definition
 TV Channel Identity
• Audio-watermarking and audio-signature based approaches need content modification or non-real-time offline processing
• TV channel logo detection based approaches: reported works use PCA and ICA – computationally intensive, and they work only on static, opaque, rectangular logos, not on non-rectangular / transparent / dynamic logos
 Text on Broadcast Video
• The challenge is identifying text against a dynamically changing video background
• Pixel-domain and compressed-domain methods; region-based (Connected Component (CC) based and Edge Based (EB))
• Different methods work for different kinds of text against different video backgrounds – need for a hybrid approach (region and texture, CC and EB)
• Text area localization and pre-processing remain the biggest challenges
 Text on Static Pages
• Noisy data with fixed fonts; efficient pre-processing is the main challenge
16
TV Channel Identity – Results
 110 channels tested: recall r = 0.96 and precision p = 0.95
• Channel logos with a very small number of pixels are missed in 1% of cases
• The remaining 3% of misses – moving or changed channel logos
• 3% false positives from small-size logos – removed from the template set
• 2% false positives due to highly transparent logos

[Figures: TV channel identity recall and precision; text detection on static pages]

Textual Context from Broadcast TV
• 20 news channels (5 min duration each)
• Recall of 100% and precision of 78% for text localization and OCR
• Precision improves to 88.57% after heuristics-based keyword spotting
• Almost 100% precision after Google-dictionary-based post-processing
Novel On-screen Keyboard
[Diagram: non-computer-savvy users and the low-cost infotainment platform drive the user-experience requirements and challenges for text entry on television as the ubiquitous access device]
18
Problem Definition
Requirements
• Provide a cost-effective, easy-to-use text-entry mechanism for accessing services like internet, email and short message service (SMS) from the television
• A full-fledged separate wireless keyboard (Bluetooth or RF) is costly
• Explore the option of using infra-red remotes with an on-screen keyboard on the TV screen

State-of-the-Art
 Traditional “QWERTY” on-screen keyboards require a large number of keystrokes to navigate, which makes them cumbersome to use
 Available on-screen keyboards do not address usability for non-computer-savvy users
 Most are designed for cursor-based systems, not for traditional key-based infra-red remotes, which have a relatively slow response time on key press
 Insufficient user study and modeling of on-screen layouts
19
Proposed System
Contribution
 A novel formulation of the on-screen layout
• Significantly reduces the number of keystrokes while typing (19 for QWERTY, 9 for the proposed layout)
 Formal methodology for user-study evaluation
• The popular Keystroke-Level Model (KLM) and Goals-Operators-Methods-Selection rules (GOMS) model used for formal evaluation
 Extension of the standard KLM operator set to model remote-based operations and hierarchical layouts
• A finger-movement operator replacing the standard pointing-device operator

[Figure: hierarchical layout cells Aa, Ab, …, Ag, Ah]
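The KLM-GOMS prediction described above can be sketched as a sum of per-operator times; the operator values below are illustrative placeholders, not the calibrated values from the thesis's user study:

```python
# Illustrative KLM-style operator times in seconds (placeholder values,
# NOT the calibrated ones from the user study).
OPERATOR_TIME = {
    "K": 0.28,   # key press on the remote
    "F": 0.40,   # finger movement between remote keys
                 # (the operator that replaces the pointing-device operator)
    "M": 1.35,   # mental preparation
}

def predicted_time(op_sequence):
    """Predicted task time = sum of the times of the operators executed."""
    return sum(OPERATOR_TIME[op] for op in op_sequence)
```

With such a model, two layouts can be compared by encoding the operator sequence each one needs for the same text and comparing the predicted totals, which is how the KLM-GOMS predictions in the results tables are obtained.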
20
Results -User Study 1 (Basic Benchmarking)
Users: 25 users (diverse age / keyboard exposure)
Tasks: users were asked to type “The quick brown fox jumps over a lazy dog” in SMS/Email and “www.google.com” as a URL

[Figures: the QWERTY layout, Layout 1 and Layout 2; QWERTY vs. Layout 1 comparison]

Layouts | % improvement (experiment) | % improvement (predicted from KLM-GOMS)
Layout 1 over QWERTY | 44.23 | 45.75
Layout 2 over QWERTY | 45 | 46.75
Layout 2 over Layout 1 | 2 | 1.84

Layouts | % improvement (experiment) | % improvement (predicted from KLM-GOMS)
Layout 1 over QWERTY | 42.2 | 35.23
Layout 2 over QWERTY | 43.4 | 37.2
Layout 2 over Layout 1 | 3.18 | 2
21
Conclusions
 Motivation – why television: validated through market data from India
 Background – TCS Home Infotainment Platform (HIP): Internet Browser, Media Player, SMS, Video Chat, Remote Healthcare, Distance Education
 Requirement Analysis – field study challenges: slow internet speed, non-computer-savvy users, resource-constrained platform
 How to improve the user experience of TV as a low-cost Internet-access device
• Network-adaptive video rate control and packet fragmentation protocols for an improved video chat experience
• Low-computational-complexity video watermarking and encryption for secure multimedia content sharing
• TV context-aware intelligent TV-Internet mash-ups for improving the Internet browsing experience on TV
• Remote-based on-screen keyboard for improved text entry on TV using the remote control
22
Publications
1. Arpan Pal, M. Prashant, Avik Ghose, Chirabrata Bhaumik, “Home Infotainment Platform – A Ubiquitous Access Device for Masses”, Proceedings on Ubiquitous Computing and Multimedia Applications (UCMA), Miyazaki, Japan, March 2010.
2. Dhiman Chattopadhyay, Aniruddha Sinha, T. Chattopadhyay, Arpan Pal, “Adaptive Rate Control for H.264 Based Video Conferencing Over a Low Bandwidth Wired and Wireless Channel”, IEEE International Symposium on Broadband Multimedia Systems and Broadcasting (BMSB), Bilbao, Spain, May 2009.
3. Arpan Pal and T. Chattopadhyay, “A Novel, Low-Complexity Video Watermarking Scheme for H.264”, Texas Instruments Developers Conference, Dallas, Texas, March 2007.
4. T. Chattopadhyay and Arpan Pal, “Two fold video encryption technique applicable to H.264 AVC”, IEEE International Advance Computing Conference (IACC), Patiala, India, March 2009.
5. T. Chattopadhyay, Aniruddha Sinha, Arpan Pal, Debabrata Pradhan, Soumali Roychowdhury, “Recognition of Channel Logos From Streamed Videos for Value Added Services in Connected TV”, IEEE International Conference for Consumer Electronics (ICCE), Las Vegas, USA, January 2011.
6. T. Chattopadhyay, Arpan Pal, Utpal Garain, “Mash up of Breaking News and Contextual Web Information: A Novel Service for Connected Television”, Proceedings of 19th International Conference on Computer Communications and Networks (ICCCN), Zurich, Switzerland, August 2010.
7. T. Chattopadhyay, Aniruddha Sinha, Arpan Pal, “TV Video Context Extraction”, IEEE Trends and Developments in Converging Technology towards 2020 (TENCON 2011), Bali, Indonesia, November 21-24, 2011.
8. Arpan Pal, Chirabrata Bhaumik, Debnarayan Kar, Somnath Ghoshdastidar, Jasma Shukla, “A Novel On-Screen Keyboard for Hierarchical Navigation with Reduced Number of Key Strokes”, IEEE International Conference on Systems, Man and Cybernetics (SMC), San Antonio, Texas, October 2009.
9. Arpan Pal, Debatri Chatterjee, Debnarayan Kar, “Evaluation and Improvements of on-screen keyboard for Television and Set-top Box”, IEEE International Symposium for Consumer Electronics (ISCE), Singapore, June 2011.
23
Publications (contd…)
10. Arpan Pal, M. Prashant, Avik Ghose, Chirabrata Bhaumik, “Home Infotainment Platform – A Ubiquitous Access Device for Masses”, Book Chapter in Springer Communications in Computer and Information Science, Volume 75, 2010, Pages 11-19. DOI: 10.1007/978-3-642-13467-8.
11. Arpan Pal, Ramjee Prasad, Rohit Gupta, “A low-cost Connected TV platform for Emerging Markets–Requirement Analysis through User Study”, Engineering Science and Technology: An International Journal (ESTIJ), ISSN: 2250-3498, Vol.2, No.6, December 2012.
12. T. Chattopadhyay and Arpan Pal, “Watermarking for H.264 Video”, EE Times Design, Signal Processing Design Line, November 2007.
13. Arpan Pal, Aniruddha Sinha and Tanushyam Chattopadhyay, “Recognition of Characters from Streaming Videos”, Book Chapter in book: Character Recognition, Edited by Minoru Mori, Sciyo Publications, ISBN: 978-953-307-105-3, September 2010.
14. Arpan Pal, Tanushyam Chattopadhyay, Aniruddha Sinha and Ramjee Prasad, “The Context-aware Television using Logo Detection and Character Recognition”, (Submitted) Springer Journal of Pattern Analysis and Applications
15. Debatri Chatterjee, Aniruddha Sinha, Arpan Pal, Anupam Basu, “An Iterative Methodology to Improve TV Onscreen Keyboard Layout Design Through Evaluation of User Study”, Journal of Advances in Computing, Vol.2, No.5, October 2012, Scientific and Academic Publishing (SAP), p-ISSN: 2163-2944, e-ISSN: 2163-2979.
24
Future Work
 Adaptive Video Chat – network sensing via IP statistics, multicast support
 Low-complexity Video Security – extending the security analysis, audio watermarking
 Internet-TV Mash-up – extending to second screens (mobiles, tablets), seamless transfer of streaming content between TV and mobile/tablet, integrating social media into broadcast TV
 On-screen Keyboard – predictive keyboard integration, incorporating voice interfaces and gesture controls; exploring the use of mobile phones as the TV remote control
 Generic – optimizing the contradicting triad of requirements (cost, features and performance) in the face of ever-improving hardware functionality
25
Learning from the CTIF-GISFI PhD Program
Access to excellent and state-of-the-art technical course content from Aalborg University
 Very useful research soft-skill courses
• Problem-based Learning
• Professional Networking
• Qualitative Research Approaches
• Innovation Methodology
• Writing Scientific Papers
Technical Support from CTIF faculty
Culture of Practical Problem based Research
Attending relevant conferences and EU FP7 related programs
Networking with experts in related areas
28
Introduction
Objective
 Improving the user experience of TV as a low-cost Internet-access device for the masses in emerging markets like India

• Motivation
• Background
• User Study Based Requirement Analysis
• Contributions
1) Improve the Quality of Experience (QoE) of video chat under poor network conditions using network sensing
2) Provide computationally efficient yet sufficiently secure algorithms for access control and digital rights management (DRM) of sensitive multimedia content
3) Improve the experience of browsing the Internet on TV by intelligently understanding the context of the TV program being watched and blending related information from the Internet (TV-Internet mash-up)
4) Improve the experience of text entry on TV using the remote control through a novel design of an on-screen keyboard layout
29
Thesis Organization
[Diagram – thesis organization: ICT infrastructure limitations (resource-constrained platform, low-QoS network, non-computer-savvy users) and user-experience requirements/challenges map to the scientific contributions – QoS-aware video transmission, low-complexity video encryption and watermarking algorithms, TV context-aware intelligent TV-Internet mash-ups, and an improved on-screen keyboard layout for text entry on TV using the remote – across Chapters 2 to 6, with the engineering contribution, a multimedia framework for quick application development on the low-cost infotainment platform (television as a ubiquitous display device, applications of social value-add), in Appendix A]
30
Background - Home Infotainment Platform (HIP)

[Diagram: HIP device with A/V in and A/V out connections]

• Internet Browser
• Media Player (Audio, Video, Image)
• Video Chat, SMS
• Remote Healthcare
• Distance Education
 Pilot deployment in India and the Philippines
31
HIP Deployment
 TCS’ Home Infotainment Platform has launched successfully in India and the Philippines.
• Tata Teleservices Ltd. launched HIP under the brand name “Dialog” in Tamil Nadu and West Bengal.
• In the Philippines it is launched through Smart Communications under the brand name “SmartBro SurfTV”.
32
HIP Applications
Chipset - TI Da-Vinci DM6446 (297 MHz ARM9 core and 594 MHz DSP)
RAM - 256MB DDR
Flash - 64MB NAND
OS - Embedded Linux 2.6.x
Browser - Opera for devices 9.6 with Flash
Multimedia Codecs
 Video – H.264, MPEG-1, MPEG-2, MPEG-4
 Audio – MP3, AMR-NB, AAC, OGG, FLAC
 Image – JPEG
33
Socially Value-adding Apps – Healthcare and Education
[Diagram: at the health center / home, an ECG, blood pressure monitor, pulse oximeter and digital stethoscope connect to HIP; patient records flow over the network to an expert doctor]
34
HIP Application Development Framework
[Diagram: SRC-PROC-SINK pipeline framework with control APIs – sources (network, microphone, camera, A/V in, storage, USB), processing blocks (compress, decompress, multiplex, demux, render, blend) and sinks (network, VGA, TV video, TV audio, headphone, storage) underneath the applications]

Application | SRC | PROC | SINK
Video from Internet | Network | Demux – Decompress | TV Video / Audio
Media player | Storage | Demux – Decompress | TV Video / Audio
Video Chat (Far View) | Network | Demux – Decompress | TV Video / Headphone
Video Chat (Near View) | Camera and Microphone | Compress – Multiplex | Network
Internet Browser | Network | Render | TV Video
Remote Healthcare | USB | Compress – Multiplex | Network
Distance Education – Lecture Recording | A/V in | Demux – Decompress, Remux – Compress | Storage
Lecture Playback | Storage | Demux – Decompress | TV Video / Audio
QA and Course Guide | Network | Demux – Parser | TV Video / Audio
35
Sensing of Network Condition
 T = average(RTT(P1, t1i), RTT(P2, t2i))
 Heuristic mapping of the effective bandwidth of the network, based on experimentation on a real network (CDMA 1xRTT using Tata Docomo Photon) at different times of day and different places

[Diagram: the transmitter end sends probe packets P1 at t1i and P2 at t2i and measures RTT(P1, t1i) and RTT(P2, t2i)]

T (msec) | BWeff (kbps)
T < 300 | 50
300 < T < 800 | 13
800 < T < 1600 | 4
1600 < T < 1900 | 2
T > 1900 | 1
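The mapping above can be sketched as a small lookup function (a minimal sketch; the function name is illustrative, and T is the mean of the two probe-packet RTTs as defined above):

```python
def effective_bandwidth_kbps(rtt1_ms, rtt2_ms):
    """Map the mean probe-packet round-trip time T (in msec) to the
    experimentally derived effective-bandwidth band BWeff (in kbps)."""
    t = (rtt1_ms + rtt2_ms) / 2.0   # T = average of the two probe RTTs
    if t < 300:
        return 50
    elif t < 800:
        return 13
    elif t < 1600:
        return 4
    elif t < 1900:
        return 2
    return 1
```

The boundary cases (T exactly 300, 800, 1600 or 1900 msec) are not specified on the slide; the sketch resolves each to the lower band.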
36
Rate Control in Audio and Video
 Audio Codec (AMR-NB)
• If effective bandwidth > 4 kbps, bit rate = 5.15 kbps; else bit rate = 4.75 kbps
 Video Codec (H.264)
• For complex scenes, frame-level-only control exhausts the bit budget in the initial frames as MAD increases
• MB-level control makes sure the bit budget is not exhausted, because the whole frame is not complex throughout
• Estimation of video scene complexity – based on bit rate and the Mean Absolute Difference (MAD) prediction model
• The threshold for frame n, T(n), is defined as 80% of MADavg(n); threshold selection was done over 20 test sequences of different resolutions using classical decision theory
• If MADcb > T(n) for at least one MB in a frame, the frame is declared complex and the MB is chosen as the basic unit for that frame
• If MADcb <= T(n) for all MBs, the frame is chosen as the basic unit
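The basic-unit decision can be sketched as below (a minimal sketch; function and argument names are illustrative, with the per-MB predicted MAD values and the running average MAD as inputs):

```python
def choose_basic_unit(mad_per_mb, mad_avg):
    """Pick the rate-control basic unit for a frame.

    The threshold T(n) is 80% of the running-average MAD. If any
    macroblock's predicted MAD exceeds it, the frame is treated as
    complex and the macroblock (MB) becomes the basic unit; otherwise
    the whole frame is used as the basic unit."""
    threshold = 0.8 * mad_avg           # T(n) = 80% of MADavg(n)
    if any(mad > threshold for mad in mad_per_mb):
        return "MB"
    return "frame"
```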
37
Adaptive Packetization of Encoded Video/Audio
• Fragment size N = (fr + H) = 1440 bytes (optimal size found through experimentation)
• Transmission interval dt = (fr + H) / (1000 * BWeff)
• A 9-byte header is added to each video fragment:
 Frame type (1 byte) – I (Independent) or P (Predictive) frame
 Total sub-sequence number (1 byte) – total number of fragments in a video frame
 Sub-sequence number (1 byte) – fragment number
 Sequence number (4 bytes) – video frame number
 Video payload size (2 bytes) – video bytes in the current fragment
• Drop a frame iff a newer frame's fragment arrives before all the fragments of the current frame (improvement over RTP)
• Do not discard packets based on transit delay – the probability of receiving good packets on a slow network is increased (improvement over TCP)
• 20 msec AMR-NB audio frames – M (= 10, matching 5 fps video) audio frames aggregated and sent in a single UDP fragment
• DTX (Discontinuous Transmission) enabled in the AMR-NB encoder (VAD) – if the silence period exceeds D seconds, audio transmission is discontinued:
 For a good channel (BWeff > 4 kbps), D > 10 seconds
 For a bad channel (BWeff <= 4 kbps), D = 3 to 5 seconds
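The 9-byte header and the fragmentation step can be sketched as follows (a sketch, assuming the 1440-byte fragment size includes the header, per N = fr + H; the function name is illustrative):

```python
import struct

FRAGMENT_SIZE = 1440                  # optimal total fragment size (fr + H)
HEADER_FMT = "!BBBIH"                 # frame type, total sub-seq, sub-seq,
                                      # sequence number, payload size = 9 bytes
HEADER_SIZE = struct.calcsize(HEADER_FMT)
MAX_PAYLOAD = FRAGMENT_SIZE - HEADER_SIZE

def fragment_frame(frame, frame_no, frame_type):
    """Split one encoded video frame into fragments of at most 1440
    bytes, each prefixed with the 9-byte header described above."""
    chunks = [frame[i:i + MAX_PAYLOAD]
              for i in range(0, len(frame), MAX_PAYLOAD)]
    total = len(chunks)
    return [struct.pack(HEADER_FMT, frame_type, total, idx, frame_no, len(c)) + c
            for idx, c in enumerate(chunks)]
```

Each fragment would then be sent over UDP at interval dt = (fr + H) / (1000 * BWeff), so the send rate tracks the sensed effective bandwidth.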
39
Watermarking Algorithm Flow
[Flow chart: only IDR frames are watermarked; on even IDRs, the previous GOP is hashed and the message size checked, the embedding locations (image + data) are found, and the actual message is embedded; on the alternate IDRs, the embedding locations (data) are found and the hash number of the previous GOP is embedded as the watermark for integrity checking; non-IDR frames continue unmodified]
40
Digital Watermarking – Algorithm Details
 Embed information in the corresponding coefficient (10th or 15th bit, depending on the bit index being odd or even)

[Flow chart: depending on whether the message is image or data, the diagonal or ab-diagonal sub-band (SB) is selected for embedding; other sub-bands are skipped]

• An HxW logo image in binary format and K bytes of text data.
• The total number of bits to embed, N = HxW + K*8, is stored in an N-element binary array (wn).
• wn is quantized using the same quantization parameter (qp) used in H.264; the quantized values are stored in an array (wqn).
• For each wqn, find the embedding location inside the image; the image-location-mapped wqn is denoted M(u,v), where (u,v) is the position in the DCT domain.
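The construction of the bit array wn can be sketched as below (the function name and the MSB-first bit order are assumptions; the subsequent qp quantization and the M(u,v) DCT-position mapping are not sketched here):

```python
def watermark_bits(logo_bits, text_bytes):
    """Build the N = H*W + K*8 bit array wn: the flattened HxW binary
    logo followed by the K text bytes expanded to bits, MSB first."""
    bits = [b for row in logo_bits for b in row]   # H*W logo bits
    for byte in text_bytes:                        # K*8 text bits
        bits.extend((byte >> i) & 1 for i in range(7, -1, -1))
    return bits
```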
41
Digital Watermarking - Attacks
[Block diagram: the watermarked stream (.264 with WM) is decoded without the WM detector to .YUV, attacked, re-encoded by an H.264 encoder without WM embedding, then decoded with the WM detector; the retrieved image/text is compared against the original, the video quality is compared, and a report is generated]

Attacks tested:
1. Averaging attack (AA)
2. Circular averaging attack (CAA)
3. Rotate attack (RoA)
4. Resize attack (RsA)
5. Frequency filtering attack (FFA)
6. Non-linear filtering attack (NLFA)
7. Gaussian attack (GA)
8. Gamma correction attack (GCA)
9. Histogram equalization attack (HEA)
10. Laplacian attack (LEA)
42
Watermarking Perceptual Video Quality after Attack
• Ten quality measures – Average Absolute Difference (AAD), Mean Square Error (MSE), Normalised Mean Square Error (NMSE), Laplacian Mean Square Error (LMSE), Signal-to-Noise Ratio (SNR), Peak Signal-to-Noise Ratio (PSNR), Image Fidelity (IF), Structural Content (SC), Global Sigma Signal-to-Noise Ratio (GSSNR), Histogram Similarity (HS)
• Three pairs of videos – a) two identical videos (high quality), b) two completely different videos (poor quality) and c) an original video and its compressed/decompressed version (average quality)
• Weighted quality metric W_VAL = ((AAD + GSSNR + LMSE + MSE + PSNR)*3 + HS + IF + NMSE + SC + SNR)
• 14 test video streams were subjected to the different attacks and W_VAL calculated; 20 users were asked to judge the attacked and original watermarked videos perceptually
• Judgement based on human vision psychology (HVS) – converted to a fuzzy Mean Opinion Score (MOS) based parameter Cqual:
 IF (W_VAL >= 90), Cqual = Excellent
 ELSEIF (W_VAL >= 80), Cqual = Good
 ELSEIF (W_VAL >= 75), Cqual = Average
 ELSEIF (W_VAL >= 70), Cqual = Bad
 ELSE Cqual = Poor
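A sketch of the metric and its decision logic. The slide does not say how the ten measures are normalised before being summed, so the 0-100 per-measure scale and the division by the total weight of 20 (5 measures at weight 3 plus 5 at weight 1) are assumptions made here to keep W_VAL on the 0-100 scale the thresholds imply:

```python
HEAVY = ("AAD", "GSSNR", "LMSE", "MSE", "PSNR")   # weight 3
LIGHT = ("HS", "IF", "NMSE", "SC", "SNR")         # weight 1

def w_val(scores):
    """Weighted quality metric over per-measure scores assumed to be
    normalised to 0-100; division by the total weight (20) is an
    assumption, since the slide omits the normalisation step."""
    total = sum(scores[m] * 3 for m in HEAVY) + sum(scores[m] for m in LIGHT)
    return total / 20.0

def c_qual(w):
    """Fuzzy MOS label Cqual from the decision thresholds above."""
    if w >= 90:
        return "Excellent"
    if w >= 80:
        return "Good"
    if w >= 75:
        return "Average"
    if w >= 70:
        return "Bad"
    return "Poor"
```

The labels reproduce the results table: e.g. W_VAL = 100 (AA, RsA) gives Excellent, 71 (GA) gives Bad, and 52 (CAA) gives Poor.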
43
Watermarking Retrieved Image and Text Quality
Image
• Normalized deviation parameter (de) from the Euclidean distance d
• Bit error (be) – % of bits differing between the retrieved and original binary image
• Crossing-count error (ce) – difference in the 0-to-1 crossing counts of the original and retrieved binary images
• Final error in the retrieved image: e = (be + ce + de)/3 (the mean of the three error measures, consistent with the results table)
• MOS-based decision logic for the quality of the retrieved image (Cimg):
 IF e < 0.5, Cimg = Excellent
 ELSEIF e < 5, Cimg = Good
 ELSEIF e < 10, Cimg = Medium
 ELSEIF e < 15, Cimg = Bad
 ELSE Cimg = Poor
Text
• Compute the mean error (te) of the Hamming distance and the Levenshtein distance
• MOS-based retrieved text quality Ctxt:
 IF te < 0.5, Ctxt = Excellent
 ELSEIF te < 1, Ctxt = Good
 ELSEIF te < 3, Ctxt = Medium
 ELSEIF te < 5, Ctxt = Bad
 ELSE Ctxt = Poor
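Both decision chains can be sketched directly; the mean e = (be + ce + de)/3 is inferred here from the results tables that follow, where it reproduces the tabulated e column exactly:

```python
def image_error(bit_err, crossing_err, deviation):
    """Final retrieved-image error e: mean of the bit error (be),
    crossing-count error (ce) and normalized deviation (de)."""
    return (bit_err + crossing_err + deviation) / 3.0

def c_img(e):
    """MOS-based retrieved-image quality label Cimg."""
    for bound, label in ((0.5, "Excellent"), (5, "Good"),
                         (10, "Medium"), (15, "Bad")):
        if e < bound:
            return label
    return "Poor"

def c_txt(levenshtein, hamming):
    """Mean error te of the Levenshtein (L) and Hamming (H) distances,
    mapped to the MOS-based retrieved-text quality label Ctxt."""
    te = (levenshtein + hamming) / 2.0
    for bound, label in ((0.5, "Excellent"), (1, "Good"),
                         (3, "Medium"), (5, "Bad")):
        if te < bound:
            return label
    return "Poor"
```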
44
Watermarking Perceptual Video Quality - Results
Attack W_VAL Cqual
AA 100 Excellent
CAA 52 Poor
FFA 25 Poor
GCA 27 Poor
GA 71 Bad
HEA 27 Poor
LEA 28 Poor
NLFA 25 Poor
RsA 100 Excellent
RoA 37 Poor
[Figure: the original logo and original video frame alongside the retrieved logos after the AA, FFA, GCA, NLFA, HEA, LEA, RoA and RsA attacks]
45
Watermarking Decision Logic on Perceptual Quality after Attacks
Video Quality | Retrieved Image Quality | Retrieved Text Quality | Overall Measure of Goodness
Excellent or Good | Excellent | Excellent | Excellent
Excellent or Good | Excellent | Good | Good
Excellent or Good | Good | Excellent | Good
Excellent or Good | Good | Good | Good
Excellent or Good | Medium | Medium | Medium
Excellent or Good | Bad or Poor | Medium | Bad
Excellent or Good | Medium | Bad or Poor | Bad
Excellent or Good | Bad or Poor | Bad or Poor | Poor
Medium, Bad or Poor | Any | Any | Attack degrades video quality
46
Watermarking Results on Retrieved Quality after Attacks
Attack | be | ce | de | e | Image Quality (Cimg)
AA | 0.000 | 0.000 | 0.000 | 0.000 | Excellent
CAA | 5.469 | 9.896 | 3.448 | 6.271 | Medium
FFA | 5.469 | 10.938 | 55.172 | 23.860 | Poor
GCA | 0.781 | 1.563 | 3.448 | 1.931 | Good
GA | 4.948 | 9.896 | 24.138 | 12.994 | Bad
HEA | 1.563 | 1.563 | 3.448 | 2.191 | Good
LEA | 1.823 | 2.083 | 0.000 | 1.302 | Good
NLFA | 5.729 | 10.417 | 13.793 | 9.980 | Medium
RsA | 0.000 | 0.000 | 0.000 | 0.000 | Excellent
RoA | 0.781 | 0.521 | 0.000 | 0.434 | Excellent

Attack | L | H | te | Text Quality (Ctxt)
AA | 0 | 0 | 0.000 | Excellent
CAA | 6 | 1 | 3.5 | Bad
FFA | 6 | 1 | 3.5 | Bad
GCA | 0 | 1 | 0.5 | Good
GA | 5 | 1 | 3 | Bad
HEA | 6 | 7 | 6.5 | Poor
LEA | 4 | 5 | 4.5 | Bad
NLFA | 6 | 1 | 3.5 | Bad
RsA | 0 | 0 | 0.000 | Excellent
RoA | 0 | 0 | 0.000 | Excellent
47
Encryption – Proposed System
Contribution
 Low-computational-complexity two-stage H.264 video encryption algorithm
• Separate header encryption
• Reuse of the Flexible Macroblock Ordering (FMO) of H.264/AVC as the encryption operator
 Analysis of the effect of the encryption-decryption chain on video quality
• Important from the end-user experience perspective
• PSNR used as the quality measure

[Flow chart: the encoder loop (encode next frame / slice / MB until the end of the sequence) is modified so that FMO supplies the next MB number from a key-based look-up table before proceeding to the next macroblock]
48
Encryption – Two Stage Algorithm
[Diagram: NALU parsing – a NALU is read as control data (SPS, PPS) or video data (macroblocks of IDR / P frames); the encrypted stream permutes the SPS PPS IDR P ... ordering]

Header Encryption
• First encrypt the SPS (Sequence Parameter Set), PPS (Picture Parameter Set) and IDR
• Encode the first frame using a conventional H.264 encoder and take a 16-bit key (KU)
• Take the length of the IDR (lIDR) – a 16-bit number for QCIF resolution
• Define the encryption key KP using a hash function of lIDR and KU

Modify macroblock ordering using a key-based look-up
• Use KP as the seed to generate a random sequence Le (values between 0 and 97); this is used for the first GOP. For subsequent GOPs, the KP of the previous GOP is used as the KU for a new KP.
• MBs in an IDR frame are encoded in the order specified by the look-up table Le.
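The key chaining and the key-dependent permutation can be sketched as below. SHA-256 and Python's PRNG are stand-ins: the slide only specifies "a Hash function of lIDR and KU" and a KP-seeded random sequence, so the concrete hash and shuffle are assumptions:

```python
import hashlib
import random

def mb_order(key_u, idr_length, n_mb=98):
    """Derive the per-GOP key KP from the user key KU and the IDR
    length lIDR, then use KP to seed a key-dependent macroblock
    permutation (the FMO look-up table Le over positions 0..n_mb-1).
    For the next GOP, the returned KP would be fed back in as KU."""
    digest = hashlib.sha256(f"{key_u}:{idr_length}".encode()).digest()
    key_p = int.from_bytes(digest[:4], "big")   # 32-bit key KP (assumed width)
    rng = random.Random(key_p)                  # KP as the PRNG seed
    order = list(range(n_mb))
    rng.shuffle(order)                          # key-dependent MB ordering Le
    return key_p, order
```

The decryptor, holding the same KU, regenerates the identical permutation and inverts it; no extra data beyond the key needs to be transmitted.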
49
Encryption – Results
Security Analysis
 Brute-force attack – 98 macroblock positions permuted by the look-up table give 98! ≈ 9.42 x 10^153 possible orders
 In practice this is restricted by the 32-bit key used to generate the MB order – the order can be generated in 2^32 ways, requiring on average half that number of attempts to decrypt
 The key is changed every GOP – the proposed method is robust enough in comparison to other reported similar methods
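The brute-force arithmetic can be verified with a short sketch:

```python
import math

# 98 macroblock positions permuted by the FMO look-up give 98! orders.
BRUTE_FORCE_ORDERS = math.factorial(98)   # ~9.42 x 10^153

# The permutation, however, is derived from a 32-bit key, so an
# exhaustive attacker searches the key space, not all orders.
KEY_SPACE = 2 ** 32                       # 4,294,967,296 candidate keys
AVG_ATTEMPTS = KEY_SPACE // 2             # expected attempts to hit the key
```

So the effective security is bounded by the 32-bit key, not by the astronomically larger permutation count; the per-GOP key change is what limits how much ciphertext any one key exposes.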
Computational Complexity Analysis
Operation | # per GOP | % Increase
ADD | 2*24*97 + 3 = 4659 | 0.004
MULTIPLICATION | 5*24*97 = 11,640 | 0.200
DIVISION | 5 | 0
MODULO | 4*24*97 + 4 = 9316 | 0

Memory overhead:
Resolution (w x h) | Picture size in MBs | Memory (bytes)
QCIF (176x144) | 99 | 198
CIF (352x288) | 396 | 792
VGA (640x480) | 1200 | 2400
SDTV-525 (720x480) | 1350 | 2700
SDTV-625 (720x576) | 1620 | 3240

Video Quality Analysis
Video Sequence | Size/frame without encryption (bytes) | Size/frame with encryption (bytes) | PSNR of Y without encryption (dB) | PSNR of Y with encryption (dB)
Claire | 155.125 | 158.16 | 39.03 | 39.03
Foreman | 668.975 | 700.3 | 35.14 | 35.16
Hall monitor | 264.82 | 268.705 | 36.97 | 36.96
50
Textual Context from Broadcast TV – Requirement
Example: on-screen breaking news “ANDHRA CM’S MISSING CHOPPER MYSTERY”
 Keyword spotting – Andhra, CM, Missing Chopper
 Search for an RSS feed containing related information
 Search through any engine for related information
 Display related information on TV, e.g. “Missing Chopper Found, CM Dead”
51
Textual Context in Static Pages – Proposed System for DTH
Contribution
 Pre-processing and enhancement
• Noise removal through low-pass filtering on Y
• Resolution enhancement through interpolation-based zooming
 Binarization and touching-character segmentation
• Adaptive-thresholding-based binarization
• Touching-character segmentation using width outlier detection
 Use of standard OCR tools like GOCR and Tesseract

[Pipeline: a-priori ROI mapping → pre-processing for noise removal and image enhancement → binarization and touching-character segmentation → OCR using standard engines]
52
TV Channel Identity – Results
110 channels tested: recall r = 0.96 and precision p = 0.95.
• Channel logos with a very small number of pixels are missed in 1% of cases.
• The remaining 3% of misses are due to moving or changed channel logos.
• 3% of false positives come from a small-size logo that was removed from the template set.
• 2% of false positives are due to highly transparent logos.
Recall and Precision Measures
Original Channel | Detected As
Zee Trendz       | DD Ne
Zee Punjabi      | TV9 Gujarati
DD News          | DD Ne
Nick             | DD Ne
Nepal 1          | Zee Cinema
Module            | Time (msec)
YUV to HSV        | 321.09
ROI mapping       | 0.08
Mean SAD matching | 293.65
Correlation       | 847.55
Changed Logo Examples | False Positive Examples
Computational Complexity
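The timing table above names a mean-SAD matching step. A minimal sketch of such a matcher, assuming the logo ROI and the stored template are same-sized grayscale arrays (function name illustrative):

```python
import numpy as np

def mean_sad(roi, template):
    """Mean sum-of-absolute-differences between a candidate logo ROI
    and a stored template; a lower score means a better match.
    Sketch of the matching step named in the timing table."""
    diff = roi.astype(np.int32) - template.astype(np.int32)
    return np.abs(diff).mean()
```

In a full system the channel with the lowest mean-SAD score against its template would be reported, with correlation (the slower step in the table) used as a confirmation stage.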
53
Textual Context from Broadcast TV– Proposed System for News Mash-up from Internet
Breaking News Heuristics
• Breaking news always comes in capital letters.
• The font size of breaking news is larger than that of the ticker text.
• It tends to appear on the central to central-bottom part of the screen.

Processing pipeline: localization of suspected text regions → text region confirmation using temporal consistency → binarization → text recognition → keyword selection

Contribution
• An improved method for text region localization and screen layout segmentation
• Pre-processing techniques on the text region (same as the previous section)
• A heuristics-based keyword spotting algorithm followed by Google's built-in dictionary-based correction
54
Image | Output of GOCR | Output of Tesseract | After Proposed Algorithms (GOCR) | After Proposed Algorithms (Tesseract)
(a) | Sta_ring Govind_. Reem_ _n. RajpaI Yadav. Om Puri. | Starring Guvinda, Rcema Sen, Raipal Yadav, Om Puri. | Starring Govind_. Reem_ _n. RajpaI Yadav. Om Puri. | Starring Guvinda. Reema Sen, Raipal Yadav. Om Puri.
(b) | _____ ___ ___ _________ ____ __ __ | Pluww SMS thu fnlluwmg (adn In 56633 | ___ SMS th_ folIcmng cod_ to S__ | Planta SMS tha Iullmmng mda tn 56633
(c) | SmS YR SH to | SMS YR SH in 56633 | SmS YR SH to _____ | SMS YR SH to 56533
(d) | _m_ BD to _____ | SMS BD to 56633 | SMS BD to S____ | SMS BD to 56633
(e) | AM t___o_,_b _q____ | AM 2048eb 141117 | AM tOa_gb _q____ | AM 2048eb 141117
(f) | _M_= _ _A___ to Sd___ | SMS: SC 34393 tn 56533 | _M_= _ _A___ to Sd___ | SMS: SC34393 tn 56633
(g) | _W _ ' _b _ Ib_lb _a | W6.} 048abl;lbwzIb1a | ___ __Y_b yIbw_Ib_a | WP 2048ab Mlbwzlb 1 a
(h) | ADD Ed_J to S____ | ADD Eau to $6633 | ADD Ed_J to S____ | ADD Edu to 56633
(i) | AIC STAlUSlS/OUO_ t_;OS;t_ | AIC STATUS25/02/09 1 9:05:1 4 | mlC S_ATUSlS/OUO_ t_;OS=tA | A/C STATUS 25/02/09 1 9:05:14
(j) | _ ________'__ | Sub ID 1005681893 | WbID_OOS_B_B__ | Sub ID 1005681893
Screenshots of candidate ROIs | Accuracy of Text Detection
OCR Example Results
Source: Tata Sky DTH Service in India
Textual Context in Static Pages – Results
55
Textual Context from Broadcast TV - Text Region Localization
• Filter out low-contrast components using intensity-based thresholding (output Vcont).
• Count the number of black pixels in each row of Vcont; let the count in the i-th row be cntblack(i).
• Compute the average number of black pixels per row, avgblack = (1/ht) * Σ_{i=1..ht} cntblack(i), where ht is the height of the frame.
• Compute the absolute variation in the i-th row as av(i) = |cntblack(i) – avgblack|.
• Compute the average absolute variation, aav = (1/ht) * Σ_{i=1..ht} av(i).
• Compute the threshold for marking the textual region from these statistics.
• Mark all pixels in the i-th row of Vcont as white if the row fails the threshold test.

Confirmation of the Text Regions Using Temporal Consistency
• Assumption: text in the breaking news persists for some time.
• Vcont sometimes contains noise because of high-contrast regions in the video frame.
• In a typical 30 FPS video sequence, one frame is displayed for 33 msec.
• Assuming breaking news persists for at least 2 seconds, all regions not persistently present for more than 2 seconds can be filtered out.
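The row-profile statistics above can be sketched as follows. The exact threshold formula is not reproduced on the slide, so avgblack + aav is used here as an assumed stand-in:

```python
import numpy as np

def mark_text_rows(vcont):
    """Row-profile text localization following the slide's steps:
    count black pixels per row of the contrast-filtered frame Vcont,
    compute the average (avgblack) and the average absolute variation
    (aav), and keep rows whose black-pixel count exceeds a threshold.
    The threshold form avgblack + aav is an ASSUMPTION."""
    cnt_black = (vcont == 0).sum(axis=1)   # black pixels per row
    avg_black = cnt_black.mean()           # average over the ht rows
    av = np.abs(cnt_black - avg_black)     # absolute variation per row
    aav = av.mean()                        # average absolute variation
    threshold = avg_black + aav            # assumed threshold form
    return cnt_black > threshold           # True = candidate text row
```

Rows flagged here would then be checked for persistence over roughly 60 frames (2 seconds at 30 FPS) by the temporal-consistency stage.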
56
Textual Context from Broadcast TV - Post-processing Heuristics
• Operate the OCR only in upper case.
• If the number of words in a text line is above a heuristically obtained threshold, consider it a candidate text region.
• If multiple such text lines are obtained, choose a line near the bottom.
• Remove the stop words (a, an, the, for, of, etc.) and correct the remaining words using a dictionary.
• Concatenate the remaining words to generate the search string for the internet search engine.
The selected keywords can be given to internet search engines through Web APIs to fetch related news, which can then be blended on top of the TV video to create a mash-up between TV and the Web.

Since search engines like Google already provide word correction, dictionary-based correction of keywords can be eliminated.
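The keyword-selection heuristics above can be sketched as a small function. The stop-word list and minimum word count are illustrative choices, not values from the thesis:

```python
# Illustrative stop-word list (the thesis' actual list is not shown)
STOP_WORDS = {"A", "AN", "THE", "FOR", "OF", "IN", "ON", "TO", "AND"}

def build_query(ocr_lines, min_words=2):
    """Keyword selection following the slide's heuristics: keep lines
    with enough words, prefer the line nearest the bottom, drop stop
    words, and concatenate the rest into a search string."""
    candidates = [ln for ln in ocr_lines if len(ln.split()) >= min_words]
    if not candidates:
        return ""
    line = candidates[-1]                  # line nearest the bottom
    words = [w for w in line.upper().split() if w not in STOP_WORDS]
    return " ".join(words)
```

The returned string would be handed to a search engine's Web API, relying on the engine's own spelling correction rather than a local dictionary.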
57
Textual Context from Broadcast TV – Results
Accuracy of Text Localization
• Experimental results show a recall rate of 100% and a precision of 78%.
• The low precision results from tuning the parameters and threshold values so that the probability of false negatives (misses) is minimized.
• The final precision can only be judged after applying the text recognition and keyword selection algorithms.

Accuracy of Text Recognition
• In false-positive regions, the OCR output contains many special characters.
• Candidate texts with a special-character-to-alphabet ratio > 1 are therefore discarded.
• Moreover, the proposed keyword detection method concentrates on capital letters, so only words in all capitals are kept under consideration.
• With these filters, the character-level accuracy of the selected OCR improves to 86.57%.
Accuracy of Information Retrieval
• The limitations of the OCR module could be overcome with a strong dictionary or language model.
• In the proposed method this constraint is bypassed, as the Google search engine itself has such a module.
• The OCR output is simply given to the Google search engine, which in turn offers the option with the actual text.
58
Proposed Algorithm for Keyboard Layout
Algorithm
• Total no. of character cells = T
• Total no. of rows of key-blocks = R
• Total no. of columns of key-blocks = C
• Total no. of cells in a key-block = 4
• T = R x C x 4; max. keystrokes in the worst case K = (R + C + 1).
• Hence, the desired solution boils down to finding R and C for which K is minimum.

Flowchart steps:
1. Input T.
2. numCharsSqrt = square root of T; sqRootInt = ceiling of numCharsSqrt.
3. If sqRootInt is even, N = sqRootInt; otherwise N = sqRootInt + 1.
4. N2 = N – 2; C = N / 2.
5. If (N x N2) >= T, then R = N2 / 2; otherwise R = C.
6. Output R, C.

Example
• "QWERTY" – T = 54: 4 rows, 14 columns, (14 + 4 = 18) keystrokes max.
• "PROPOSED" – T = 54: sqRootInt = 8, N = 8, N2 = 6, C = 4; N x N2 = 48 < 54, so R = C = 4. Max 9 keystrokes.
• Final layout used – T = 48: sqRootInt = 7, N = 8, N2 = 6, C = 4; N x N2 = 48 = T, so R = N2/2 = 3. Finally, R = 3 and C = 4. Max 8 keystrokes.
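The flowchart steps above translate directly into a short function, reproducing both worked examples from the slide:

```python
import math

def keyboard_layout(T):
    """Given total character cells T (4 cells per key-block), find the
    rows R and columns C of key-blocks per the slide's flowchart,
    minimising worst-case keystrokes K = R + C + 1."""
    n = math.ceil(math.sqrt(T))
    if n % 2 != 0:
        n += 1                 # force N even so N/2 is an integer
    n2 = n - 2
    c = n // 2
    r = n2 // 2 if n * n2 >= T else c
    return r, c

keyboard_layout(54)   # -> (4, 4): max 4 + 4 + 1 = 9 keystrokes
keyboard_layout(48)   # -> (3, 4): max 3 + 4 + 1 = 8 keystrokes
```

Both results match the slide: T = 54 gives R = C = 4 (9 keystrokes worst case) and the final T = 48 layout gives R = 3, C = 4 (8 keystrokes worst case).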
59
Onscreen Keyboard Results – User Study 1
Survey charts: QWERTY vs. Layout-1 (before practice) | QWERTY vs. Layout-1 (after practice)

1. Does the on-screen keyboard provide enough assistance?
2. How is the ease of use of the on-screen keyboard?
60
Results – User Study 2 (KLM-GOMS Modelling)
Simple Text Entry

Layouts                | % improvement (experiment) | % improvement (predicted from KLM-GOMS)
Layout 1 over QWERTY   | 44.23                      | 45.75
Layout 2 over QWERTY   | 45                         | 46.75
Layout 2 over Layout 1 | 2                          | 1.84

Complete Email Typing and Sending Task

Layouts                | % improvement (experiment) | % improvement (predicted from KLM-GOMS)
Layout 1 over QWERTY   | 42.2                       | 35.23
Layout 2 over QWERTY   | 43.4                       | 37.2
Layout 2 over Layout 1 | 3.18                       | 2

Study setup:
• A total of 20 users.
• Simple text entry and email sending tasks.
• Six phrase sets selected randomly from the standard MacKenzie phrase set.
• Users were given an initial familiarization phrase and then asked to enter six phrases in one go.
• The time taken by each user and the number of keystrokes required to type each phrase were recorded.
61
Onscreen Keyboard Results – User Study 2
P – redefined as the total time taken to find a key and move the focus to select the block containing that key: Layout 1 – 1.77 sec, Layout 2 – 1.73 sec, QWERTY – 1.10 sec. H – not used. New parameter F – the time required for finger movement – 0.22 sec.
KLM-GOMS Operators

Operator | Description                            | Time (sec)
P        | Pointing with a pointing device        | 1.10
K        | Key or button press                    | 0.20
H        | Move from mouse to keyboard and back   | 0.40
M        | Mental preparation and thinking time   | 1.35

Per-operation times

Operation                            | Layout-1 (sec) | Layout-2 (sec) | QWERTY (sec)
Open/close onscreen keyboard layout  | 0.4            | 0.4            | 0.4
Find any key                         | 1.07           | 1.03           | 1.1
Move focus to select a key           | 0.7            | 0.7            | 2.6
Move finger to the corner keys       | 0.2            | 0.2            | 0.2
Enter a character using keyboard     | 2.17           | 2.13           | 4.0

KLM-GOMS sub-goals for Email Task

Sub-goal                  | Layout 1 (sec) | Layout 2 (sec) | QWERTY (sec)
Open browser              | 0.5            | 0.5            | 0.5
Open gmail server & login | 45.3           | 44.5           | 82.1
Compose mail              | 68.0           | 65.8           | 115.1
Dispatch                  | 0.4            | 0.4            | 0.4
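As a consistency check, the per-character entry times for Layouts 1 and 2 in the operations table equal the sum of their sub-operation times (find key + move focus + finger move + key press). A small sketch, with the table values hard-coded:

```python
# KLM-GOMS sub-operation times (sec) taken from the operations table
OPS = {
    "layout1": {"find": 1.07, "focus": 0.7, "finger": 0.2, "press": 0.2},
    "layout2": {"find": 1.03, "focus": 0.7, "finger": 0.2, "press": 0.2},
}

def char_time(layout):
    """Per-character entry time as the sum of the table's sub-operations.
    Matches the 'Enter a character' row for Layouts 1 and 2."""
    return round(sum(OPS[layout].values()), 2)

char_time("layout1")  # -> 2.17
char_time("layout2")  # -> 2.13
```

Multiplying such per-character times by the keystroke counts of each sub-goal is how KLM-GOMS yields the predicted task times compared against the experiments on the previous slide.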