Audio Based Networking M.Tech Dissertation Submitted in the partial fulfillment of the requirements for the degree of Master of Technology by Mahima Roll No : 143050066 under the guidance of Prof. Bhaskaran Raman Department of Computer Science and Engineering Indian Institute of Technology Bombay June,2016

Audio Based Networking - cse.iitb.ac.in


Audio Based Networking

M.Tech Dissertation

Submitted in the partial fulfillment of the requirements for the degree of

Master of Technology

by

Mahima
Roll No : 143050066

under the guidance of

Prof. Bhaskaran Raman

Department of Computer Science and Engineering

Indian Institute of Technology Bombay

June, 2016


Dissertation Approval

This dissertation entitled “Audio Based Networking”, submitted by Mahima (Roll No: 143050066), is approved for the partial fulfilment of the requirement of Master of Technology in Computer Science and Engineering from Indian Institute of Technology Bombay.

Prof. Bhaskaran Raman
Dept. of CSE, IIT Bombay
Supervisor

Prof. Kameswari Chebrolu
Dept. of CSE, IIT Bombay
Examiner

Prof. Mythili Vutukuru
Dept. of CSE, IIT Bombay
Examiner

Chairperson

Date : / /2016

Place :


Declaration

I declare that this written submission represents my ideas in my own words and where others’ ideas or words have been included, I have adequately cited and referenced the original sources. I also declare that I have adhered to all principles of academic honesty and integrity and have not misrepresented or fabricated or falsified any idea/data/fact/source in my submission. I understand that any violation of the above will be cause for disciplinary action by the Institute and can also evoke penal action from the sources which have thus not been properly cited or from whom proper permission has not been taken when needed.

Date : / /2016 Mahima

Place: IIT Bombay, Mumbai Roll No: 143050066


Acknowledgment

I would like to thank Prof. Bhaskaran Raman for all the effort he put in, his reviews of my work, and his guidance throughout the project. I would also like to thank Prof. Mythili Vutukuru and Deepthi Bhushan chander for their valuable suggestions and help.


Abstract

Today, smart phones are the most prevalent devices among people, which is why most new implementations are proposed for these devices. Sharing information between these devices is easy and common, and a number of techniques exist through which information can be shared over the air. Audio (sound) waves can also be used as a medium to transfer information between these devices. Existing implementations in this direction have data rates in multiples of tens of bits/second. Sometimes, however, there is a need to share short information such as a URL or an identification number, and for such short data this small data rate is sufficient. The technique which I propose has the advantage of using the built-in microphones and speakers of the smart phone and transfers data without any authentication procedure. Some work has been done in the related field, but it is unable to work in a noisy environment. I propose a design implementation for smart phones which is robust to environmental noise. This report covers the factors that affect the design and its final data rates. These factors include the operating environment, frequencies for transmission, modulation technique, encoding techniques, distance between sender and receiver etc. The design attains a maximum data rate of 19.6 bits/second in a noisy environment and 25.3 bits/second in lab conditions. These best data rates are obtained with 8-FSK. It uses Viterbi codes for error correction at the physical layer and CRC for error detection at the link layer. MAC layer techniques and their feasibility for the design are also described in detail. These techniques include CSMA, CDMA and pure ALOHA. Hardware and environmental dependencies of the implementation and limitations of the proposed design are also explained in the report.


Contents

1 Introduction
  1.1 Problem Statement
  1.2 Outline of report

2 Previous Work
  2.1 Data transfer Implementations
  2.2 Localisation

3 Preliminaries
  3.1 Terms and Definitions
    3.1.1 Frame Loss Rate
    3.1.2 Error Rate
    3.1.3 Throughput
    3.1.4 FFT (Fast Fourier Transformation)
  3.2 Frequency band selection
    3.2.1 Set up and tests
    3.2.2 Results

4 Physical layer
  4.1 Physical Layer: Design Parameters
    4.1.1 Overview
    4.1.2 Synchronisation
    4.1.3 Modulation Techniques
    4.1.4 Error Correcting Codes
  4.2 Error Rate and Throughput Analysis
    4.2.1 Experimental Setup
    4.2.2 Analysis
    4.2.3 Error rate vs number of data bits per frame
    4.2.4 Comparison of no encoding, Hamming encoding and Viterbi encoding
  4.3 Performance at physical layer
    4.3.1 Performance for Laptops
    4.3.2 Performance for smart phones

5 Link Layer
  5.1 Cyclic Redundancy Check
    5.1.1 8-bit CRC
    5.1.2 Procedure
  5.2 Media Access control
    5.2.1 CSMA
    5.2.2 CDMA
    5.2.3 Slotted ALOHA
    5.2.4 Pure ALOHA
    5.2.5 Experimental set up

6 Application Layer

7 Conclusion and Future work
  7.1 Conclusion
  7.2 Future Work


List of Figures

3.1 Frequency band test set up
3.2 Frequency band test results

4.1 Physical layer at sender
4.2 Physical layer at receiver
4.3 Preamble detection rate vs number of bits in preamble
4.4 Preamble detection rate vs number of samples for preamble
4.5 Effect of samples per data bit on error rate
4.6 Effect of Number of bits per frame on error rate
4.7 Comparison of encoding techniques for throughput (SS-2048, Data bit-2048 samples)
4.8 Comparison of encoding techniques for throughput (SS-4096, Data bit-2048 samples)
4.9 Comparison of encoding techniques for throughput (SS-8192, Data bit-8192 samples)
4.10 Frame format at physical layer

5.1 Packet at link layer
5.2 CRC simulation results for even number of bits in error in 28 bit frame
5.3 8-bit CRC simulation results for even number of bits in error in 28 bit frame
5.4 Xperia Z3 receptivity- No Transmission vs Transmission at distance 1 meter
5.5 Xperia Z3 receptivity- No Transmission vs Transmission at distance 5 meters
5.6 Comparison between microphone reception capabilities of two smart phones
5.7 Set up for pure ALOHA experiments
5.8 Frame inter arrival time distribution
5.9 Pure Aloha results with Samsung S3 as receiver
5.10 Pure Aloha results with LG Nexus as receiver

6.1 Distribution of 28 data bits in first frame
6.2 Distribution of 28 data bits in second frame
6.3 Screenshot at sender
6.4 Screenshot at receiver


List of Tables

2.1 Data transfer implementations
2.2 Implementations for localization

4.1 4-FSK implementation
4.2 8-FSK implementation
4.3 Finite state machine for Viterbi decoding
4.4 Comparison of no encoding, Hamming and Viterbi for error rates (SS-2048, Data bit-2048 samples)
4.5 Comparison of no encoding, Hamming and Viterbi for error rates (SS-4096, Data bit-2048 samples)
4.6 Comparison of no encoding, Hamming and Viterbi for error rates (SS-8192, Data bit-8192 samples)
4.7 Dependence of throughput upon distance

5.1 Comparison of Smart phones for reception of inaudible frequencies (16kHz-18.5kHz)

6.1 GSM information


Chapter 1

Introduction

1.1 Problem Statement

Smart phones are an integral part of our lives, since numerous kinds of techniques are implemented on these devices. Transferring information between two phones, or broadcasting among phones, is a very common and essential need. There are a number of ways to transfer information, and each is preferable depending upon various parameters such as energy consumption, distance between devices, number of devices involved, types of devices in use, hardware in use etc. Similarly, sound waves can also be a way of transferring information between these devices, using their built-in speakers and microphones. Sound waves can be used for connection-oriented as well as connectionless communication. This report proposes a design for broadcasts (connectionless communication) using sound waves from a smart phone’s speaker, which are then received by the microphones of other nearby devices.

Secure transfer using sound waves has been implemented for NFC (near field communication) between smart phones in [1], which uses the audible band of frequencies. The work proposed in [2] uses inaudible frequencies above 18 kHz for transmission among devices (other than smart phones) placed within a radius of 8 meters. No existing work describes feasibility on smart phones at distances greater than 1 meter in a noisy environment. This report proposes a design for smart phones which is robust against a noisy environment; its data rates are measured after implementing it in both lab and noisy environments. The sender transmits at its maximum volume in all of the experiments described in the report. Experiments in the noisy environment are carried out in two ways:

• Experiments are done in city buses between 5:30 pm and 7:30 pm in Mumbai

• In a room, recordings of city buses (recorded between 5:30 pm and 7:30 pm) are played near the receiver’s microphone from three devices, maintaining the same SNR as that of the recording

The design does not involve authentication for communication and hence can be used for broadcasts. A feasibility check for such a design requires examining parameters such as environmental interference, generation and reception of inaudible frequencies, and hardware dependencies. The design involves a preamble and start symbol for frame synchronisation, modulation techniques, error correcting codes, CRC and MAC layer protocols.

1.2 Outline of report

This report describes preamble design, behaviour of encoding techniques in the lab environment, CRC selection and feasibility of prevalent MAC protocols. Data rates for the two environments differ, since transmissions are affected in the noisy channel. The maximum data rate achieved for smart phones in the noisy environment is 19.6 data bits/second for 56 bits of transmission. The best results in the noisy channel are with modulation using 8-FSK and Viterbi encoding; that is, out of the 56 bits, 28 bits carry redundant information. Dependency of throughput upon the distance between sender and receiver is analysed. On top of this, the link layer adds an 8-bit CRC for error detection in the frame, which further reduces the data bits to 20 bits. The report further discusses feasibility of the MAC protocols CSMA, CDMA and pure Aloha. Due to hardware dependencies, CSMA cannot be used as the MAC protocol for the design. CDMA cannot be used as the MAC technique because of the limited available bandwidth and restrictions imposed by FFT. Experimental results with pure Aloha follow the graph of theoretical pure Aloha throughput. Experimental results for pure Aloha at different channel loads are presented in graphs and a table. Overall, it can be a way to transmit a few bytes (3 to 4 bytes) among smart phones where there is no need for a connection establishment phase. For example, the application is successfully tested for broadcasting GSM cell ID, operator name and RSSI (received signal strength indicator) in two frames from a smart phone. Advantages of adding a start symbol, comparison of modulation techniques for the design, results for smart phones in both lab and noisy environments, and CPU time and power consumption of the application can be found in report [3].


Chapter 2

Previous Work

A sound tone can be used as an alternative to existing transmission techniques where bandwidth requirements are not large, up to a few kilohertz. Smart phones with a sampling rate of 44 kHz are capable of communicating with sound of frequencies up to 21 kHz. Higher frequencies can be generated and recorded by devices with higher audio sampling rates. For inaudible transmission, one can use the inaudible spectrum of sound, but these higher frequencies suffer from frequency selectivity. Most works that implement communication with inaudible sound waves use frequencies above 16 kHz. We tested the audibility of frequencies greater than 15 kHz: frequencies above 17 kHz were not audible to us, but frequencies up to 19 kHz were audible to children.

Frequency band of our interest has implementations for:

• Data transfer

• Localisation

2.1 Data transfer Implementations

Data transfer implementations use frequencies from 1 kHz to 21 kHz. According to discussions in previous work, this band can be classified into two categories:

1. Audible band (Below 15 kHz)

2. Inaudible band (Above 15 kHz)

Works proposed for audible bands are for distances less than 1 m, as these transmissions create disturbance in the surroundings. [4] uses a 10 kHz frequency with on-off keying for transmissions up to a distance of 30 cm. It gives a data rate of 251 bits/second with Harman Kardon HK206 speakers and internal laptop microphones. Results are for an office environment with two occupants. [5] uses the audio frequency band 1200-3100 Hz to transfer an IP address and port number from a laptop to a phone or from a phone to a laptop. In this scenario, a person holds both of the devices, and the maximum data rate obtained is up to 32 bits/second in a medium noise environment. This kind of environment is created by people walking around, chatting constantly, and occasionally opening or closing doors. The noise level is around -32 dB. According to the reported results, transmissions are disturbed when the surroundings have loud music, and 2-3 attempts are required to transmit information successfully. [1] uses the frequency band of 6-7 kHz and gives a data rate of 2.4 kbps at 10 cm separation using 8-PSK with OFDM. This quite high rate is due to near field communication and the use of OFDM in modulation. The experimental environment is not specified in [1], as transmissions over a distance of 30 cm are not interfered with by environmental noise. Both of these implementations for distances of up to 30 cm ([4] and [1]) use CRC for error detection. [5] employs forward error correction with Reed-Solomon encoding to avoid retransmissions.

To avoid disturbances due to audible transmissions, some implementations use a band of frequencies which they claim to be inaudible. [4] uses a 21.2 kHz frequency with on-off keying to transmit over a distance of 3.4 meters and gives a data rate of 8 bits/second. Experiments are performed in an office environment. [2] uses frequencies 18 kHz-20 kHz and gives a data rate of 35.8 bits/second for a distance of 8 meters. This implementation uses piezoelectric speakers (BNM0026) and piezoelectric receivers, and the reported data rates are for an indoor environment. The data rate is achieved with 16-FSK modulation. [6] proposes a counting technique based on audio waves. The band used for this purpose is 15-20 kHz, and it is divided into 98 frequencies. Each user uses their phone to transmit one of these frequencies (as its identifier), and the implementation is tested over 5 meters in noisy environments. The noisy environments for the experiments are a bus stop and a running bus. A user is capable of transmitting 5 tones/sec, where each tone corresponds to a particular frequency. [7] presents a design for smart phones to transfer information over a distance of 1 meter. The data rate obtained is 10 bits/second with the BFSK modulation technique. [7] and [6] use CRC for error detection (lab environment, max volume).

Tables 2.1 and 2.2 show details of different implementations of audio networking:

| Paper | Modulation technique | Frequency range | Distance | Data rate | Error control |
|---|---|---|---|---|---|
| A System for Audio Signaling Based NAT Traversal [5] | MFSK | 1200 Hz-3100 Hz (two frequency bands, higher and lower) | | 4 bits/sec-32 bits/sec (depends on sound-card in use) | FEC |
| Inaudible Dual Tone Data Transmission for Home Appliances [2] | 2-tone MFSK | 18 kHz-20 kHz (band of 16 frequencies) | i) 4 m; ii) 8 m | 35.8 bps | – |
| Low Cost Crowd Counting using Audio Tones [6] | MFSK | 15 kHz-20 kHz (98 frequencies) | 5 m (single hop) | 5 tones/sec (1 tone = 1 frequency) | |
| Context-Aware Computing with Sound [4] | i) DTMF; ii) OOK; iii) Inaudible OOK | i) –; ii) 10 kHz; iii) 21.2 kHz | i) 3 m; ii) 30 cm; iii) 3.4 m | i) 20 bits/sec; ii) 251 bits/sec; iii) 8 bits/sec | CRC |
| Dhwani: Secure Peer-to-Peer Acoustic NFC [1] | i) OFDM + BPSK; ii) OFDM + QPSK; iii) OFDM + 8-PSK | 6-7 kHz (sub carriers of 171 Hz) | 10 cm | i) 800 bps; ii) 1.6 kbps; iii) 2.4 kbps | 24 bit CRC |
| BattMan: Acoustic short-range communication leveraging ultrasound [7] | BFSK | 16.5 kHz, 17.5 kHz | 1 m | 10 bits/sec | 64 bit CRC over packet header and packet |

Table 2.1: Data transfer implementations

2.2 Localisation

[8] proposes the use of ambient sound to determine the proximity between devices. This proposal is for an indoor environment where each device records the sound around it. The paper describes a mechanism to determine a fingerprint of the recorded sound. Proximity between devices is determined by the similarity between their fingerprints. The recording device for the experiments is a Zoom H2. Results show 80% accuracy in detecting proximity (not absolute). [2] presents an implementation for transmitting a room ID using audio waves. In this way, the recorder's microphone captures the audio, which is decoded to find the device's location within a building. Walls separating the rooms help to prevent interference between signals. [9] presents an audio fingerprinting mechanism, “Acoustic background spectrum” (ABS), for the recorded sound. It uses a power spectrogram to calculate the fingerprint of the recorded sound. The ABS fingerprint of ambient sound can be used by a phone to determine its location to the resolution of a single room.


| Paper | Environment | Type of sound | H/W used | Sampling rate | Accuracy |
|---|---|---|---|---|---|
| A wearable, ambient sound-based approach for infrastructure less fuzzy proximity estimation [8] | indoor | ambient sound | Zoom H2 | 8000 Hz | i) 80%; ii) 46% |
| Indoor Localization without Infrastructure using the Acoustic Background Spectrum [9] | indoor (but not quite noisy) | persistent sound | i) Zoom H4n recorder; ii) Apple iPhone microphone | i) 96 kHz; ii) 44 kHz | i) 69%; ii) 69% combined with Wi-Fi localization |
| Inaudible Dual Tone Data Transmission for Home Appliances [2] | Indoor | Inaudible beacons | Smart phones | 44 kHz | 95% |

Table 2.2: Implementations for localization

From the existing works it can be concluded that audible sound waves are used for transmissions over distances less than or equal to 1 meter. This type of transmission is sensitive to surrounding noise. For communication over several meters, inaudible sound waves should be used. In previous work, inaudible frequencies are used either for NFC (near field communication) or, when used for transfer over a few meters, mostly in silent environments. Existing works for localisation using smart phones are also implemented only in indoor environments. For noisy environments, the distance coverage of these frequencies with smart phones has not been determined. Such a design can be tested and proposed according to the data rate it obtains. We present a data transfer technique for smart phones that uses inaudible sound waves. This design is expected to work in a noisy environment.


Chapter 3

Preliminaries

3.1 Terms and Definitions

3.1.1 Frame Loss Rate

Frame loss rate is the percentage of the undetected frames out of the total frames sent.

\[ \text{Loss rate} = \frac{\text{Number of lost frames}}{\text{Total frames sent}} \times 100 \]

3.1.2 Error Rate

\[ \text{Error rate} = \frac{\text{Number of frames in error}}{\text{Total frames sent} - \text{Number of lost frames}} \times 100 \]

A frame is in error if the bit sequence in the received frame does not exactly match the bit sequence in the sent frame.

3.1.3 Throughput

Throughput is defined as the amount of information transferred from sender to receiver in a given period of time.

\[ \text{Throughput} = \frac{\text{Number of frames not in error} \times \text{bits per frame}}{\text{Total frames sent} \times \text{one frame duration}} \]
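The three definitions above can be sketched as small helper functions. This is only an illustration: the function and parameter names are hypothetical, and the sample numbers in the usage note are made up rather than taken from the experiments.

```python
def frame_loss_rate(frames_sent, frames_lost):
    """Percentage of sent frames that the receiver never detected."""
    return 100.0 * frames_lost / frames_sent


def error_rate(frames_sent, frames_lost, frames_in_error):
    """Percentage of detected frames whose bits do not match the sent bits."""
    detected = frames_sent - frames_lost
    return 100.0 * frames_in_error / detected


def throughput(frames_ok, bits_per_frame, frames_sent, frame_duration_s):
    """Correctly received bits per second of total transmission time."""
    return frames_ok * bits_per_frame / (frames_sent * frame_duration_s)
```

For instance, with 100 frames sent, 10 lost, 18 in error and 72 received correctly, at 28 bits per frame and 1.5 seconds per frame, these give a loss rate of 10%, an error rate of 20% and a throughput of 13.44 bits/second.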

3.1.4 FFT (Fast Fourier Transformation)

FFT is used to identify the frequency with the strongest component in the given number of samples. The FFT implementation used is part of “Project Nayuki” [10] and computes the discrete Fourier transform of a vector. It implements the non-recursive Cooley-Tukey radix-2 FFT algorithm, which uses a divide and conquer approach. It accepts only sample counts that are powers of 2 and has a time complexity of O(N log N).
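A minimal sketch of a non-recursive radix-2 Cooley-Tukey FFT, and of picking the strongest frequency from its output, is given below. It is not the Project Nayuki code; the names `fft_radix2` and `dominant_frequency` are hypothetical, and the sampling rate is passed in by the caller.

```python
import cmath


def fft_radix2(samples):
    """Iterative radix-2 Cooley-Tukey FFT; input length must be a power of 2."""
    n = len(samples)
    if n == 0 or n & (n - 1):
        raise ValueError("number of samples must be a power of 2")
    a = [complex(s) for s in samples]
    # Reorder the input into bit-reversed index order.
    j = 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]
    # Combine sub-transforms of size 2, 4, 8, ... up to n.
    size = 2
    while size <= n:
        half = size // 2
        w_step = cmath.exp(-2j * cmath.pi / size)
        for start in range(0, n, size):
            w = 1 + 0j
            for k in range(half):
                u = a[start + k]
                v = a[start + k + half] * w
                a[start + k] = u + v
                a[start + k + half] = u - v
                w *= w_step
        size *= 2
    return a


def dominant_frequency(samples, sample_rate):
    """Frequency (Hz) of the strongest bin below the Nyquist frequency."""
    spectrum = fft_radix2(samples)
    n = len(samples)
    best_bin = max(range(1, n // 2), key=lambda k: abs(spectrum[k]))
    return best_bin * sample_rate / n
```

The frequency resolution is the sampling rate divided by the window length, so a 2048-sample window at 44100 samples per second resolves frequencies to about 21.5 Hz.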

3.2 Frequency band selection

To proceed, the frequency band for the proposed design has to be determined. Testing must be done in both less noisy and noisy environments. This will help us decide the common frequency band that can be used for transmissions over 5 to 10 meters in both environments.

3.2.1 Set up and tests

Two smart phones were used for the experiments, one as sender and the other as receiver. The sender and receiver are kept the same in every experiment. Figure 3.1 represents the experimental set up. The sender uses the application “Frequency Generator”, which can generate frequencies up to 20 kHz, and these frequencies are recorded by the application “Easy Recorder”, which is able to sample audio at 44 kHz. In this set up, “Xperia Z3” is used as sender and “Samsung Grand Neo” is used as recorder. For the less noisy environment, testing is done on the roof of an IIT Bombay hostel, and a city bus is chosen for noisy environment testing.


[Diagram: sender and receiver placed x meters apart]

Figure 3.1: Frequency band test set up

Frequencies in the range 16-20 kHz are tested. In each environment, the distance is varied from 1 meter to 10 meters for each of the inaudible frequencies and their detection rates are identified. Recordings are analysed using the “Audacity” software.

3.2.2 Results

Test results show that the received power of the frequencies decreases as the distance between sender and receiver increases. In the less noisy environment, 20 kHz could be detected only up to 3 meters of separation between sender and receiver. So, the frequency range 16-19 kHz is tested in the noisy environment. In the less noisy environment, the frequencies in this band were detectable up to 8 meters of separation. Average signal power and average noise power (in dB) at sender-receiver separations of 1 meter to 5 meters for frequencies 16 kHz, 17 kHz, 18 kHz and 19 kHz are shown in Figure 3.2. Noise power is defined as the power of all frequencies (in the band 16 kHz-19 kHz) except the transmitted one. In the noisy environment, every frequency except 19 kHz was detectable up to a separation of 5 meters. Hence, 19 kHz is eliminated from the frequency band used for further experiments. Although 16 kHz-17 kHz were audible to adults, the frequency band is limited and the design is intended for noisy environments, so frequencies in 16 kHz-17 kHz are included in the frequency band of our design.

[Plot: average signal power and average noise power (dB, axis from -80 to 20) against sender-receiver distance (1 to 5 meters) for 16 kHz, 17 kHz, 18 kHz and 19 kHz]

Figure 3.2: Frequency band test results


Chapter 4

Physical layer

4.1 Physical Layer: Design Parameters

Transmission and reception take place using the speaker and microphone respectively. The sender transmits data bits as sound waves, and the receiver captures these waves and regenerates the data bits. First, the transmission and reception of data bits should be synchronised; then these data bits should be correctly regenerated by the receiver. Our goal is to maximise the data rate of the transmission and minimise error rates.

4.1.1 Overview

• Sender

Figure 4.1: Physical layer at sender

• Receiver

Figure 4.2: Physical layer at receiver

4.1.2 Synchronisation

For successful reception, sender and receiver need to be synchronised on the transmitted frame. For synchronised reception, each frame has a fixed preamble at its start.


Preamble

The sender sends a frame starting with the preamble; the second half of the preamble is a replica of the first half. Whenever the receiver tries to get a frame, it finds the start of the frame by detecting the preamble. To minimise the effect of environmental noise in the audible band, a filter is first applied to the received data, and preamble detection then operates on this filtered data. Preamble detection internally employs “Pearson’s auto-correlation” over a window equal in size to the total number of samples used for the preamble. This auto-correlation window is a sliding window, sliding over the complete recorded frame. The formula used for correlation is:

\[ \mathrm{correlation}(x, y) = \frac{\sum_{i=1}^{pr/2} x_i y_i}{\sqrt{\sum_{i=1}^{pr/2} x_i^2 \cdot \sum_{i=1}^{pr/2} y_i^2}} \]

where x and y are windows, each containing half of the number of samples in the preamble. These two windows move over the two halves of the preamble. Here, “pr” is the number of samples in the preamble.
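As a rough sketch of this step, the correlation between the two halves of a window, and its sliding application over a recording, could look as follows. The function names are hypothetical. The formula is used as written above, without mean subtraction, and the straightforward loop costs on the order of (recording length × pr) operations, so a practical receiver would likely update the sums incrementally.

```python
import math


def half_correlation(window):
    """Normalised correlation between the first and second half of a window."""
    half = len(window) // 2
    x = window[:half]
    y = window[half:2 * half]
    numerator = sum(a * b for a, b in zip(x, y))
    denominator = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    # An all-zero half carries no information; report zero correlation.
    return numerator / denominator if denominator else 0.0


def sliding_correlation(recording, pr):
    """Correlation value for every window of pr samples in the recording."""
    return [half_correlation(recording[start:start + pr])
            for start in range(len(recording) - pr + 1)]
```

When the window sits exactly on a preamble whose second half repeats the first, the two halves are identical and the value reaches 1; other offsets give smaller values.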

The preamble detector feeds these correlation values to the peak detection algorithm, which finds peaks in the correlation values after applying expected moving average on them. The peak detection algorithm uses a threshold to classify a value as a peak and returns a number of peaks as a result. Finally, it selects the maximum of these peaks and records the corresponding index as the start index of the preamble. The number of bits in the preamble is varied for 44100 samples, with two modulation techniques, FSK and ASK, to find the best combination. Results are given in Figure 4.3.
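The peak selection described above can be sketched as below. The smoothing method is an assumption: an exponential moving average is used as a stand-in for the moving average mentioned in the text. The smoothing factor, the threshold value, the local-maximum rule and all function names are likewise hypothetical.

```python
def smooth(values, alpha=0.5):
    """Exponential moving average (assumed smoothing; alpha is illustrative)."""
    smoothed = []
    average = None
    for value in values:
        average = value if average is None else alpha * value + (1 - alpha) * average
        smoothed.append(average)
    return smoothed


def find_peaks(values, threshold):
    """Indices of local maxima whose value reaches the threshold."""
    return [i for i in range(1, len(values) - 1)
            if values[i] >= threshold
            and values[i] > values[i - 1]
            and values[i] >= values[i + 1]]


def preamble_start(correlations, threshold=0.4, alpha=0.5):
    """Index of the highest smoothed peak, or None if no peak qualifies."""
    smoothed = smooth(correlations, alpha)
    peaks = find_peaks(smoothed, threshold)
    if not peaks:
        return None
    return max(peaks, key=lambda i: smoothed[i])
```

Note that any causal moving average lowers and slightly delays sharp peaks, so the threshold has to be chosen with the smoothing in mind.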

[Plot: preamble detection rate (%) against number of bits in preamble (0 to 140), for FSK and ASK]

Figure 4.3: Preamble detection rate vs number of bits in preamble

110 bits are fixed for the preamble with ASK (frequency 16 kHz). To minimise the number of samples in the preamble, different sample counts such as 44100, 22050, 14700 and 11025 were tried. Results are presented in Figure 4.4. Among the three choices shown in red in the graph, the one with the maximum throughput is the optimal choice. Preambles with 110 bits and 90 bits do not have good detection with fewer than 22050 samples. However, one can go for 14700 samples with 70 bits. To determine the optimal choice, let us assume that a frame has at least 22050 samples for data bits. So,

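1. With 70 bits, that is, 14700 samples for the preamble (frame duration (14700 + 22050)/44100 ≈ 0.83 second)

\[ \text{Throughput} = \frac{1\ \text{frame}}{0.83\ \text{second}} \times 0.63 = 0.756\ \text{frames/second} \]

2. With 110 bits, that is, 22050 samples for the preamble (frame duration (22050 + 22050)/44100 = 1 second)

\[ \text{Throughput} = \frac{1\ \text{frame}}{1\ \text{second}} \times 0.81 = 0.81\ \text{frames/second} \]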
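The arithmetic behind this comparison can be checked with a short sketch. It assumes 44100 samples per second, so that the frame duration follows from the sample counts: the 14700-sample preamble gives the 0.83 second frame and the 22050-sample preamble gives the 1 second frame. The detection rates 0.63 and 0.81 are the values quoted above; the function name is hypothetical.

```python
SAMPLE_RATE = 44100  # samples per second (assumed)


def frames_per_second(preamble_samples, data_samples, detection_rate):
    """Detected frames per second for a frame of preamble plus data samples."""
    frame_duration_s = (preamble_samples + data_samples) / SAMPLE_RATE
    return detection_rate / frame_duration_s


# 70-bit preamble: 14700 + 22050 samples, about 0.83 second per frame.
short_preamble = frames_per_second(14700, 22050, 0.63)
# 110-bit preamble: 22050 + 22050 samples, exactly 1 second per frame.
long_preamble = frames_per_second(22050, 22050, 0.81)
print(round(short_preamble, 3), round(long_preamble, 3))
```

The longer preamble costs more air time per frame, but its higher detection rate more than compensates under these numbers.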

The relative gap between these throughputs widens as the number of samples for data bits increases. So, 110 bits with 22050 samples is the best choice for the preamble. After trying random preambles and observing their detection, a particular pattern with good detection capability is fixed as the preamble. It has 56 0’s and 54 1’s. This preamble is used for all further experiments. The first 55 bits of the fixed preamble are:

1010010011011110000001000100011101110101100101001011101 (4.1)
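Since the second half of the preamble replicates the first half, the full 110-bit pattern can be rebuilt from sequence (4.1). The short sketch below does this and counts the bits; the constant and function names are hypothetical.

```python
# First 55 bits of the preamble, copied from sequence (4.1).
FIRST_HALF = "1010010011011110000001000100011101110101100101001011101"


def build_preamble(first_half=FIRST_HALF):
    """Full preamble: the first half followed by an identical second half."""
    return first_half + first_half


preamble = build_preamble()
print(len(preamble), preamble.count("0"), preamble.count("1"))
```

The counts agree with the 56 zeros and 54 ones stated for the full preamble.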


[Plot: preamble detection rate (%) against samples per preamble (44100, 22050, 14700, 11025), for preamble lengths of 110 bits and 70 bits]

Figure 4.4: Preamble detection rate vs number of samples for preamble

Preamble+Start Symbol

Synchronisation using the chosen preamble was still poor because of false peaks detected by the algorithm. Environmental noise affects the correlation output and produces false peaks in the data following the preamble. Since the receiver was selecting the maximum peak, there was a chance that the selected peak was a false one.

So, instead of relying completely on the correlation output, the sender inserts a “start symbol” of frequency 18 kHz for a fixed number of samples just after the preamble. Now, instead of taking the maximum peak, the receiver applies an FFT after each detected peak index, over a window equal to the length of the start symbol. The window starts at (peak index + 22050). If the dominant frequency component in that window is 18 kHz, then that peak index is considered the start of the preamble. The starting sample of the data bits is then:

Peak index selected + 22050 + start symbol duration in samples

Different numbers of samples are tried for the start symbol. The difference in frame success rate between a preamble alone and a preamble with a start symbol can be found in report [3].
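A minimal sketch of the start-symbol check follows. To stay dependency-free it evaluates a single DFT bin per candidate frequency instead of a full FFT. The sampling rate, the candidate frequency list and all function names are assumptions made for illustration.

```python
import cmath
import math

SAMPLE_RATE = 44100       # assumed sampling rate (samples/second)
PREAMBLE_SAMPLES = 22050  # preamble length used in the text
START_FREQ = 18000        # start symbol frequency in Hz


def tone(freq, count):
    """Generate `count` samples of a unit-amplitude sine at `freq` Hz."""
    return [math.sin(2 * math.pi * freq * n / SAMPLE_RATE) for n in range(count)]


def tone_power(samples, freq):
    """Power of one frequency component (single DFT bin standing in for an FFT)."""
    acc = sum(s * cmath.exp(-2j * math.pi * freq * n / SAMPLE_RATE)
              for n, s in enumerate(samples))
    return abs(acc) ** 2 / len(samples)


def select_peak(signal, peak_indices, start_len,
                candidates=(16000, 17000, 18000, 18500)):
    """Return the first peak whose following window is dominated by the start tone."""
    for peak in peak_indices:
        begin = peak + PREAMBLE_SAMPLES
        window = signal[begin:begin + start_len]
        if len(window) < start_len:
            continue  # not enough samples left to hold a start symbol
        dominant = max(candidates, key=lambda f: tone_power(window, f))
        if dominant == START_FREQ:
            return peak
    return None


def data_start(peak, start_len):
    """First sample of the data bits for an accepted peak."""
    return peak + PREAMBLE_SAMPLES + start_len
```

A false peak is rejected because the window that follows it falls on data tones rather than on the 18 kHz start symbol.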

4.1.3 Modulation Techniques

Data bits are generated and then sent into the environment after being modulated onto a frequency. These frequencies are demodulated at the receiver side using the FFT (Fast Fourier Transform). Different modulation techniques are tried to maximise throughput. The sender transmits the modulated bits just after the start symbol. The receiver tries to demodulate the transmitted signal after detecting the starting index of these bits. Starting index detection is carried out using both the preamble and the start symbol.

OOK (On-off Keying)

A tone of 17 kHz is sent for bit '1' and nothing is sent for bit '0'. The receiver applies an FFT over the samples for each data bit and compares the returned power level at 17 kHz against a fixed threshold. If the power value for the current window of samples is greater than the threshold, the bit is regenerated as '1', otherwise as '0'.

FSK (Frequency Shift Keying)

BFSK, 4-FSK, 8-FSK and 16-FSK are all tried for modulation.

• Binary Frequency Shift Keying: The sender sends 17 kHz for bit '0' and 18.5 kHz for bit '1'. The receiver detects the preamble with the help of the start symbol and applies an FFT from the start of the data bits. The FFT window is varied according to the number of samples per bit. The receiver compares the power levels returned by the FFT for 17 kHz and 18.5 kHz, and the frequency with the greater power decides the next bit as '0' or '1' respectively.

• 4-FSK (4-ary Frequency Shift Keying): Four frequencies, 16.5 kHz, 17 kHz, 17.5 kHz and 18.5 kHz, are used for this implementation. Bit patterns and their corresponding frequencies are given in Table 4.1.


| Bit pattern | Frequency |
|---|---|
| 00 | 16.5 kHz |
| 01 | 17 kHz |
| 11 | 17.5 kHz |
| 10 | 18.5 kHz |

Table 4.1: 4-FSK implementation

Frequencies are sent according to the table above. Demodulation in 4-FSK is the same as in BFSK, except that the power values of these four frequency components are compared and two bits are generated for the one with the maximum power value.

• 8-FSK (8-ary Frequency Shift Keying): Bit patterns and their corresponding frequencies are given in Table 4.2.

| Bit pattern | Frequency |
|---|---|
| 000 | 16494 Hz |
| 001 | 17250 Hz |
| 010 | 17750 Hz |
| 011 | 18497 Hz |
| 100 | 18250 Hz |
| 101 | 16250 Hz |
| 110 | 17506 Hz |
| 111 | 17011 Hz |

Table 4.2: 8-FSK implementation

• 16-FSK (16-ary Frequency Shift Keying): 16 frequencies with a gap of 100 Hz are used, one for each 4-bit pattern. More details are given in report [3].

Start symbol length and number of samples per data bit are varied to calculate throughput, and the results are included in report [3].
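The FSK schemes above share one modulation and demodulation pattern, sketched here for the 4-FSK mapping of Table 4.1. The sampling rate and function names are assumptions, and a single DFT bin per candidate frequency is used in place of a full FFT so that the sketch needs only the standard library.

```python
import cmath
import math

SAMPLE_RATE = 44100  # assumed sampling rate (samples/second)
FSK4 = {"00": 16500, "01": 17000, "11": 17500, "10": 18500}  # Table 4.1


def modulate(bits, samples_per_symbol, table=FSK4):
    """Map each group of bits to a sine tone of the tabulated frequency."""
    k = len(next(iter(table)))  # bits carried by one symbol
    out = []
    for i in range(0, len(bits), k):
        freq = table[bits[i:i + k]]
        out.extend(math.sin(2 * math.pi * freq * n / SAMPLE_RATE)
                   for n in range(samples_per_symbol))
    return out


def tone_power(samples, freq):
    """Power of one frequency component (single DFT bin standing in for an FFT)."""
    acc = sum(s * cmath.exp(-2j * math.pi * freq * n / SAMPLE_RATE)
              for n, s in enumerate(samples))
    return abs(acc) ** 2 / len(samples)


def demodulate(samples, samples_per_symbol, table=FSK4):
    """Pick, for every symbol window, the bit pattern whose tone has maximum power."""
    bits = []
    for start in range(0, len(samples) - samples_per_symbol + 1, samples_per_symbol):
        window = samples[start:start + samples_per_symbol]
        bits.append(max(table, key=lambda pattern: tone_power(window, table[pattern])))
    return "".join(bits)


recovered = demodulate(modulate("00011110", 512), 512)
```

Passing a two-entry table such as `{"0": 17000, "1": 18500}` gives the BFSK case with the same code.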

4.1.4 Error Correcting Codes

Error correcting codes are used to detect and correct errors at the physical layer. To minimise the error rate, encoded bits are sent instead of the original data bits. Encoding adds redundant information which helps to regenerate the data bits with more accuracy. However, this redundant information decreases the data rate. For this purpose, two different types of codes, block codes and convolutional codes, are tried.

Hamming Codes:

Hamming codes are categorised under block codes. A Hamming code calculates parity bits over data bits. The (7,4) Hamming code is tested for performance.

Encoding: A block of 7 bits contains 4 data bits and 3 parity bits for these data bits. Parity bits are inserted at positions 1, 2 and 4 in the block, where the position of the LSB is 0. This block of 7 bits is transmitted for 4 data bits. Let the data bits be d4d3d2d1; then the parity bits are calculated as:

$$p_1 = d_1 \oplus d_2 \oplus d_4 \tag{4.2}$$

$$p_2 = d_1 \oplus d_3 \oplus d_4 \tag{4.3}$$

$$p_3 = d_2 \oplus d_3 \oplus d_4 \tag{4.4}$$


The block after adding parity bits is given as:

$$d_4\, d_3\, p_3\, d_2\, p_2\, p_1\, d_1$$

Decoding: The receiver extracts the 4 data bits and 3 parity bits from the received block. It calculates a second set of parity bits from the extracted data bits and compares the two sets. If,

1. p1, p2 and p3 all match the calculated ones, then there is no error.

2. p1 and p2 do not match the calculated ones but p3 does, then d1 is in error.

3. p1 and p3 do not match the calculated ones but p2 does, then d2 is in error.

4. p2 and p3 do not match the calculated ones but p1 does, then d3 is in error.

5. none of p1, p2 and p3 match the calculated ones, then d4 is in error.

Error detection and correction by Hamming

• If at most two bits are in error, the errors can be detected,

• If a single bit is in error and that bit is a data bit, it can be corrected and,

• If the error is in a parity bit, the decoding rules above do not correct it; the data bits themselves are unaffected in that case.
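The encoding and decoding rules above can be sketched directly. The function names are hypothetical; the block layout and parity equations are those given in the text.

```python
def parity_bits(d1, d2, d3, d4):
    """Parity bits p1, p2, p3 as in equations (4.2) to (4.4)."""
    return d1 ^ d2 ^ d4, d1 ^ d3 ^ d4, d2 ^ d3 ^ d4


def encode(d4, d3, d2, d1):
    """Build the 7-bit block d4 d3 p3 d2 p2 p1 d1 (MSB first)."""
    p1, p2, p3 = parity_bits(d1, d2, d3, d4)
    return [d4, d3, p3, d2, p2, p1, d1]


def decode(block):
    """Recover d4 d3 d2 d1, correcting a single data-bit error if one is found."""
    d4, d3, p3, d2, p2, p1, d1 = block
    c1, c2, c3 = parity_bits(d1, d2, d3, d4)
    mismatch = (p1 != c1, p2 != c2, p3 != c3)
    if mismatch == (True, True, False):
        d1 ^= 1
    elif mismatch == (True, False, True):
        d2 ^= 1
    elif mismatch == (False, True, True):
        d3 ^= 1
    elif mismatch == (True, True, True):
        d4 ^= 1
    # A single mismatch means a parity bit was hit; the data bits are kept as received.
    return [d4, d3, d2, d1]
```

For any single flipped bit in a block, `decode` returns the original four data bits; with two flipped bits the mismatch pattern points at the wrong position, so the result is not reliable.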

Viterbi Codes:

The Viterbi code is categorised under convolutional codes (Viterbi is the name of the algorithm used to decode it). Instead of operating on blocks of bits, these codes operate on a stream of bits. In this encoding, the original data bits are never sent by the sender. Instead, for each bit position, two encoded bits are generated from two combinations of three successive bits (the current bit and the two before it), and these encoded bits are sent. The number of encoded bits is therefore twice the number of data bits.

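Encoding: Let the data bits be:

$$d_n d_{n-1} \ldots d_1 d_0 \tag{4.5}$$

Encoded bits will be as follows:

$$e_{2n} = d_n \oplus d_{n-1} \oplus d_{n-2} \tag{4.6}$$

$$e_{2n+1} = d_n \oplus d_{n-2} \tag{4.7}$$

for n=0,

$$e_{2n} = d_n, \quad e_{2n+1} = d_n \tag{4.8}$$

for n=1,

$$e_{2n} = d_n \oplus d_{n-1}, \quad e_{2n+1} = d_n \tag{4.9}$$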

Decoding: The decoder traces the input according to a finite state machine (FSM). The FSM has four states, two inputs ('0' and '1') and four outputs of two bits each. This FSM is constructed according to the encoding implemented on the sender side.

The cost of the initial state is taken as zero. We consider '00' as the initial state. Each successive pair of encoded bits is compared with the output bits at the current state for both input bits '0' and '1'. The comparison is the Hamming distance, given as HammingDistance0 and HammingDistance1. Let nextState0 be the next state for input '0' and nextState1 the next state for input '1' at the current state. The next state cost is calculated as:

cost[nextState0] = cost[currentState] + HammingDistance0

cost[nextState1] = cost[currentState] + HammingDistance1

The algorithm then searches all possible next states from nextState0 and nextState1. At every intermediate step, when there are multiple paths to a state, it keeps the one with minimum cost, discards the others and then proceeds. At the end, the input bits along the path with minimum cost are taken as the data bits. The finite state machine used for Viterbi decoding is given in Table 4.3.


| Current State | Input=0: Next state/Output | Input=1: Next state/Output |
|---|---|---|
| 00 | 00/00 | 10/11 |
| 01 | 00/11 | 10/00 |
| 10 | 01/10 | 11/01 |
| 11 | 01/01 | 11/10 |

Table 4.3: Finite state machine for Viterbi decoding

Error detection and Correction

• Since the FSM has four states (two bits of memory), a frame with two bits in error can be corrected.

• Encoded bits for which multiple paths have the same minimum cost may be decoded wrongly.

• Errors are always detected, and Viterbi decoding always gives the decoded output with the minimum number of errors.
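The encoder of equations (4.6) and (4.7) and the cost-based decoding described above can be sketched together. The function names are hypothetical; a state is written as (d_{n-1}, d_{n-2}), so the state labelled 10 in the FSM table is `(1, 0)`. No tail bits are added, which is an assumption of this sketch.

```python
def encode(bits):
    """Two encoded bits per data bit, following equations (4.6) and (4.7)."""
    s1 = s2 = 0  # d_{n-1} and d_{n-2}, zero before the first bit
    out = []
    for d in bits:
        out += [d ^ s1 ^ s2, d ^ s2]
        s1, s2 = d, s1
    return out


def viterbi_decode(encoded):
    """Keep, for every state, the cheapest path; return the inputs of the best one."""
    inf = float("inf")
    cost = {(0, 0): 0, (0, 1): inf, (1, 0): inf, (1, 1): inf}  # start in state 00
    paths = {state: [] for state in cost}
    for i in range(0, len(encoded), 2):
        pair = encoded[i:i + 2]
        new_cost = {state: inf for state in cost}
        new_paths = {state: [] for state in cost}
        for (s1, s2), c in cost.items():
            if c == inf:
                continue  # state not reachable yet
            for d in (0, 1):
                expected = (d ^ s1 ^ s2, d ^ s2)
                distance = (expected[0] != pair[0]) + (expected[1] != pair[1])
                nxt = (d, s1)
                if c + distance < new_cost[nxt]:
                    new_cost[nxt] = c + distance
                    new_paths[nxt] = paths[(s1, s2)] + [d]
        cost, paths = new_cost, new_paths
    best = min(cost, key=cost.get)
    return paths[best]
```

Because ties are broken by whichever path is examined first, inputs with several minimum-cost paths may decode to a different bit sequence than the one sent.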

4.2 Error Rate and Throughput Analysis

4.2.1 Experimental Setup

Experiments are performed using laptops in a lab environment. The sender is a “Dell Studio 1580” and the receiver is a “Lenovo-G450”. Sender and receiver are kept 2 meters apart. Each result is computed from data collected over 500 runs. Runs are automated using a client-server socket connection.

4.2.2 Analysis

Error correcting codes are used to minimise the error rate by adding some redundancy to the frame. We tried two error correcting codes. These two codes use different approaches to correct errors and result in different throughputs. For the two encoding techniques, error rates and throughputs are calculated and presented in tables and graphs respectively. The graphs also compare throughput with error correcting codes against throughput without any error correcting code.

Error rate vs number of samples/bit

If the start of the preamble is detected some samples away from its exact starting sample, the windows for the data bits are shifted. In this case, if the number of samples per data bit is decreased, the chance of detecting a frequency other than the transmitted one as the dominant component increases, which increases the error rate. It can be seen from Figure 4.5 that, for a fixed start symbol, the error rate always increases as the number of samples per data bit decreases.

Figure 4.5: Effect of samples per data bit on error rate (x-axis: samples in start symbol, 16384, 8192, 4096, 2048, 1024; y-axis: error rate in %; series: 16384, 8192, 4096, 2048 and 1024 samples/bit)

For optimal throughput we have two choices,

1. start symbol length of 4096 samples and data bit length of 2048 samples

2. start symbol length of 2048 samples and data bit length of 2048 samples

4.2.3 Error rate vs number of data bits per frame

As the number of data bits per frame is increased while keeping the samples for the start symbol and the samples per data bit constant, the error rate also increases. This increase in error rate is linear. This is because the probability of a frame being in error increases with the number of bits it carries. The pattern can be seen in Figure 4.6.

Figure 4.6: Effect of number of bits per frame on error rate (x-axis: start symbol_data bit duration in samples, 4096_2048 and 2048_2048; y-axis: error rate in %; series: 4, 8, 12 and 16 bits/frame)
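The link between frame length and frame error probability can be illustrated with a simple model. It assumes independent bit errors with a fixed per-bit error probability, which is an assumption made for illustration and not something measured in these experiments; the function name is hypothetical.

```python
def frame_error_rate(bit_error_rate, bits_per_frame):
    """Probability that at least one bit of the frame is wrong (independent errors)."""
    return 1 - (1 - bit_error_rate) ** bits_per_frame


# Example with an arbitrary 10% bit error probability.
rates = [frame_error_rate(0.1, bits) for bits in (4, 8, 12, 16)]
```

Under this model the frame error rate rises with every added bit, which matches the direction of the measured trend.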

4.2.4 Comparison of no encoding, Hamming encoding and Viterbi encoding:

1. Samples in Start symbol=2048 and Samples in Data bit=2048 + FSK

• Error Rates: The pattern of error rates can be observed from Table 4.4. As described above, for both encoding techniques as well as for no encoding, the error rate increases linearly as the number of bits per frame is increased.

| Number of Bits | No encoding | Hamming | Viterbi |
|---|---|---|---|
| 4 | 31.07% | 25.51% | 28.27% |
| 8 | 46.63% | 42.13% | 35.89% |
| 12 | 58.08% | 49.42% | 46.71% |
| 16 | 63.54% | 51.12% | 46.81% |

Table 4.4: Comparison of no encoding, Hamming and Viterbi for error rates (SS-2048, Data bit-2048 samples)

Error rates are higher if neither of the two encodings is used. For Viterbi and Hamming there is no uniform pattern for comparing error rates: with 4 bits per frame Hamming performs better, but as the number of bits per frame increases further Viterbi produces fewer errors. The difference in error rates between the two techniques is about 3-6%.

• Throughput: Figure 4.7 shows that even though error rates are lower with Viterbi and Hamming, throughput is still higher with “no encoding” up to 12 bits per frame. This is because the difference in error rates between “with encoding” and “without encoding” is not large, and encoding requires redundancy.

Maximum throughput is for 16 bits per frame with Hamming codes = 4.158 bits/sec


Figure 4.7: Comparison of encoding techniques for throughput (SS-2048, Data bit-2048 samples) (x-axis: number of data bits in frame, 4, 8, 12, 16; y-axis: throughput in bps; series: No Encoding, Hamming, Viterbi)

There is not much difference in error rates between the two encoding techniques, and Hamming codes insert less redundancy than Viterbi codes. So, Hamming codes perform better than Viterbi codes.

2. Samples in Start symbol=4096 and Samples in Data bit=2048 + FSK

• Error Rates: Transmission with no encoding has higher error rates than either of the encoding schemes in this case also. There is no uniform pattern of error rates for comparing the two encoding techniques, and the difference between them is about 3-5% in each row.

| Number of Bits | No encoding | Hamming | Viterbi |
|---|---|---|---|
| 4 | 29.89% | 21.73% | 24.78% |
| 8 | 44.77% | 29.91% | 26.47% |
| 12 | 57.05% | 35.26% | 39.69% |
| 16 | 74.78% | 50.64% | 55.67% |

Table 4.5: Comparison of no encoding, Hamming and Viterbi for error rates (SS-4096, Data bit-2048 samples)

• Throughput: The behaviour of the two encoding techniques with respect to throughput is the same as in the previous case.

Figure 4.8: Comparison of encoding techniques for throughput (SS-4096, Data bit-2048 samples) (x-axis: number of data bits in frame, 4, 8, 12, 16; y-axis: throughput in bps; series: No Encoding, Hamming, Viterbi)

Throughput of no encoding is less than that of Hamming code because of the large difference in their respective error rates.

$$\text{Throughput}_{\text{Hamming}} > \text{Throughput}_{\text{no encoding}}$$


Maximum throughput for 12 bits/frame with Hamming codes = 4.637 bits/sec

If the number of bits per frame is increased beyond 12, throughput starts to drop.

3. Samples in Start symbol=8192 and Samples in Data bit=8192 + ASK

• Error Rates: In this case, error rates for Viterbi encoding are significantly higher than those for the Hamming codes.

$$\text{Error Rate}_{\text{no encoding}} > \text{Error Rate}_{\text{Viterbi}} > \text{Error Rate}_{\text{Hamming}}$$

| Number of Bits | No encoding | Hamming | Viterbi |
|---|---|---|---|
| 4 | 42.13% | 9.52% | 34.81% |
| 8 | 61.25% | 26.54% | 54.04% |
| 12 | 70.59% | 42.68% | 56.92% |
| 16 | 76.98% | 46.94% | 64.03% |

Table 4.6: Comparison of no encoding, Hamming and Viterbi for error rates (SS-8192, Data bit-8192 samples)

• Throughput: Hamming codes perform better than the other two categories, as their error rates are comparatively low. Since Viterbi has bit overheads and its error rates are also high, its throughput falls below that of no encoding.

Maximum throughput for 4 bits/frame with Hamming codes = 1.722 bits/sec

Because the number of samples per bit is higher here, throughput drops for every encoding technique as the number of bits per frame is increased, irrespective of the error rates.

Figure 4.9: Comparison of encoding techniques for throughput (SS-8192, Data bit-8192 samples) (x-axis: number of data bits in frame, 4, 8, 12, 16; y-axis: throughput in bps; series: No Encoding, Hamming, Viterbi)

4.3 Performance at physical layer

4.3.1 Performance for Laptops

Performance is first evaluated for Laptops. Devices used for experiments are

• Sender- DELL STUDIO 1580

• Receiver- Lenovo G450

Experiments carried out in the lab environment show that the 8-FSK modulation technique gives the best results for our design. The related graphs can be found in report [3]. It can also be seen that it is better to use error correction codes, since error rates are quite high without them. Of the two encoding schemes tried, Hamming codes give better results than Viterbi.


Maximum throughput in Lab environment with 4-FSK: 6 bits/second

The same experiments are repeated for a noisy environment. The noisy environment is created by playing a bus recording (made at the time of peak traffic) from three devices (three phones: Sony Xperia Z3, Nokia Lumia 630, Samsung Grand) near the receiver's microphone. Results for the noisy environment can be found in report [3]. These results show that Viterbi encoding gives better throughput than Hamming encoding, which can be a consequence of the large number of transmission errors in a noisy environment.

Maximum throughput in noisy environment with 4-FSK: 5.2 bits/second

4.3.2 Performance for smart phones

To test the behaviour of the design on smart phones, an Android application is designed and its behaviour is tested in a noisy environment. The noisy environment is created by playing the same recording from three devices (laptop: DELL STUDIO; two phones: Nokia Lumia 630, LG Nexus 5). Devices used for experiments are

• Sender- Samsung S5

• Receiver- Xperia Z3

Preamble length was the most crucial factor behind low throughput. The preamble length is reduced from 0.5 sec to 84 msec using an 11-bit Barker sequence. Graphs drawn from the results for smart phones can be found in [3]. Using this preamble, the maximum throughput achieved for smart phones is 19.6 bits/second with 56 transmitted bits (28 data bits + 28 redundant bits). The frame decided at the physical layer is given in Figure 4.10. Application power consumption statistics can be found in report [3].

Preamble (3675 samples) | Encoded bits ((2048*56) samples)

Figure 4.10: Frame format at physical layer

We performed experiments in a bus and varied the distance between sender and receiver. For these experiments the sender is Samsung S5 and the receiver is LG Nexus. Throughput values at different distances are shown in Table 4.7.

| Distance | Throughput |
|---|---|
| 1 | 22.90% |
| 2 | 16.96% |
| 3 | 10.29% |
| 4 | 7.25% |
| 5 | 3.14% |

Table 4.7: Dependence of throughput upon distance


Chapter 5

Link Layer

5.1 Cyclic Redundancy Check

Previous results show that for our design throughput with “Forward Error Correction” is greater than throughput without it. Retransmissions are not included in our implementation, but an error detection code is still needed along with encoding to validate the decoded data. CRC is used for this purpose as it is quite a powerful technique and not computation intensive. Earlier experiments show that throughput is maximum for 56 bits per frame, that is, 28 data bits.

5.1.1 8-bit CRC

• Theoretically, an 8-bit CRC has an error detection rate of 99.6094%.

• Good 5-bit CRC polynomials have good detection up to a message length of 11 bits, while 6-bit CRC polynomials are good up to a message length of 25 bits [11].

• A good 8-bit CRC polynomial guarantees detection of any combination of 3 bits in error, up to a message length of 119 bits [11].

• An 8-bit CRC is used, as the frame length decided at the physical layer carries 28 data bits. So, each transmission uses 8 bits of CRC over 20 data bits.

Data bits (20 bits) | CRC (8 bits)

Figure 5.1: Packet at link layer

• The (20 data bits + 8 CRC bits) are further encoded using the Viterbi algorithm.

We used the generator polynomial $x^8 + x^2 + x + 1$, also known as ATM-8. It has the above properties of a good 8-bit CRC polynomial. The bit pattern corresponding to the generator polynomial is “100000111”.

1. Our generator polynomial is not divisible by “x”, which guarantees that all burst errors of a length less than the degree of the polynomial, that is, 8, are detected.

2. Our generator polynomial is divisible by “x+1”, which ensures that all errors with an odd number of bits in error will be detected.

3. Simulation with this generator polynomial shows that 100% of errors are detected if an odd number of bits is in error. Simulation results for an even number of bits in error are presented in Figure 5.2, where these detection rates are compared with those of a 5-bit CRC generator and a 6-bit CRC generator.

4. A comparison of simulation results for two 8-bit CRC polynomials, 0x83 and 0xEA, is presented in Figure 5.3.


Figure 5.2: CRC simulation results for even number of bits in error in 28 bit frame (x-axis: number of bits in error; y-axis: error detection rate in percentage; series: 5-bit CRC generator (0x15), 6-bit CRC generator (0x23), 8-bit CRC generator (0x83))

Figure 5.3: 8-bit CRC simulation results for even number of bits in error in 28 bit frame (x-axis: number of bits in error, 2 to 28; y-axis: error detection rate in percentage; series: 8-bit CRC generator 0x83, 8-bit CRC generator 0xEA)

5.1.2 Procedure

Append 8 zeros to the 20-bit message. Then repeatedly XOR the 9 bits of this extended message that start at the leading '1' with the 9-bit pattern of the generator polynomial, shifting along the message until its end. When the end of the 28-bit message is reached, the remainder of the last XOR operation replaces the appended zeros, and this new message is transmitted by the sender.

At the receiver side, the same 8-bit CRC procedure is applied to the received 28 bits and it is checked whether the final remainder is zero. If it is zero, the frame is assumed to be received correctly; otherwise the frame is assumed to have a transmission error. No back channel is used in the design, so a frame in error is simply discarded by the receiver.

After inclusion of CRC in the frame, throughput at the link layer is 14 data bits/second.
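The procedure can be sketched with bit strings and the generator pattern “100000111” from the text. The function names are hypothetical, and the long division is written for clarity rather than speed.

```python
GENERATOR = "100000111"  # x^8 + x^2 + x + 1


def crc_remainder(bits):
    """Remainder of the XOR long division of a bit string by the generator."""
    work = [int(b) for b in bits]
    gen = [int(b) for b in GENERATOR]
    for i in range(len(work) - len(gen) + 1):
        if work[i] == 1:  # only XOR where the current leading bit is 1
            for j, g in enumerate(gen):
                work[i + j] ^= g
    return "".join(str(b) for b in work[-(len(gen) - 1):])


def add_crc(message):
    """Sender side: append 8 zeros, divide, and replace the zeros by the remainder."""
    return message + crc_remainder(message + "0" * 8)


def check_crc(frame):
    """Receiver side: the frame is accepted when the remainder is all zeros."""
    return crc_remainder(frame) == "0" * 8


frame = add_crc("10110011100011110000")  # 20 data bits become a 28-bit frame
```

A receiver following the design above would simply drop any frame for which `check_crc` returns `False`.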

5.2 Media Access control

When multiple devices try to transmit in the same band of frequencies on a shared medium, their transmissions interfere with each other, which can increase error rates. So, a protocol is needed according to which a device gets access to the medium for its own transmission, in order to reduce collisions on the shared channel. Multiple transmissions in the same band of inaudible frequencies also face collisions, so a feasible protocol is required for these devices to access the channel.


5.2.1 CSMA

A sender can avoid collision of a frame on the shared medium if it starts its transmission only when the channel is free. To find out whether the channel is free, it should first sense the channel for transmissions.

Our design does not incorporate authentication between sender and receiver, so the sole purpose is to minimise collisions among multiple broadcasts originating from different devices. To implement CSMA, all devices must share some threshold value for the energy. Each sender that is ready to transmit checks whether the energy on the channel is below that threshold. If it is, the sender concludes that there are no transmissions on the channel and starts its own transmission; otherwise it concludes that the channel is in use by some other sender and goes into a waiting state.
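The sensing rule can be sketched as an energy comparison. Everything specific here is an assumption for illustration: the sampling rate, the probe frequencies chosen inside the 16 kHz to 18.5 kHz band, the function names and the threshold value.

```python
import cmath
import math

SAMPLE_RATE = 44100  # assumed sampling rate (samples/second)
PROBE_FREQS = (16000, 16500, 17000, 17500, 18000, 18500)  # assumed probes in the band


def band_energy(samples):
    """Average power over the probe frequencies (single DFT bins, not a full FFT)."""
    total = 0.0
    for freq in PROBE_FREQS:
        acc = sum(s * cmath.exp(-2j * math.pi * freq * n / SAMPLE_RATE)
                  for n, s in enumerate(samples))
        total += abs(acc) ** 2 / len(samples)
    return total / len(PROBE_FREQS)


def channel_is_free(samples, threshold):
    """Carrier sense: the channel counts as free when the band energy is below threshold."""
    return band_energy(samples) < threshold


silence = [0.0] * 512
busy = [math.sin(2 * math.pi * 17000 * n / SAMPLE_RATE) for n in range(512)]
```

The rule depends entirely on the threshold, so it only works if one value suits every device.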

• First, to check the feasibility of CSMA, device behaviour towards ongoing transmissions is examined. The average energy of inaudible frequencies is calculated in each transmitted frame for a number of transmissions. This data is collected for both cases: when there are no transmissions on the channel and when there are transmissions on the channel.

• To collect data for the scenario with transmissions on the channel, a frame with a duration of one second is transmitted. The receiver averages the received energy values of inaudible frequencies over the number of samples in the transmission.

• A running bus is taken as the noisy environment. Results are taken between 5:30 pm and 7:30 pm over three days. The behaviour of five phones is checked. Figures 5.4 and 5.5 show the average energy for the Sony Xperia Z3 in the time domain when there are no transmissions and when there is a transmission at a distance of 1 meter and 5 meters respectively. It can be seen that it is easy to differentiate between energy values on the channel with and without a transmission if the distance between sender and receiver is 1 meter, but not if it is 5 meters.

Figure 5.4: Xperia Z3 receptivity- No Transmission vs Transmission at distance 1 meter (x-axis: transmission number; y-axis: average energy per frame; series: transmission at 1 meter, no transmission)

Infeasibility of CSMA

CSMA can be implemented only if a fixed threshold is applicable to any pair of devices, so that a device can use it to sense transmissions on the channel. It is not possible to fix a common threshold across all devices because the energy values depend on the hardware. Two factors which affect this threshold are:

1. For the same frequency, the energy of the audio produced by different smart phones may vary. For example, Samsung S5 is able to generate inaudible frequencies at high energy values while LG Nexus generates them at low energy values. This difference exists even between different models from the same company. Since smart phone speakers are intended to produce the audible range of frequencies, their behaviour when generating inaudible frequencies is more variable.

2. Receptivity of smart phone microphones towards transmitted frequencies. The receiver uses its microphone to receive the transmitted information. Devices reproduce the received signal at reception, using amplifiers built into the microphones. Generally, microphones regenerate sound in the range 20 Hz-20 kHz, but sound in the frequency range 2 kHz-8 kHz is reproduced at peak. This regeneration is device specific and depends on the microphone.

For example, LG Nexus and Xperia Z3 have good regeneration for higher frequencies while Moto E does not. This can be seen from the results given in Table 5.1, for all of which the sender is Samsung S5. According to our observations, phones with high prices generally have good microphone sensitivity for higher frequencies. Figure 5.6 compares, in the frequency domain, the received energy values of Xperia Z3 and LG Nexus for an audio signal transmitted by Samsung S5.

Figure 5.5: Xperia Z3 receptivity- No Transmission vs Transmission at distance 5 meters (x-axis: transmission number; y-axis: average energy per frame; series: transmission at 5 meters, no transmission)

| Receiver | No Transmission | Transmission @5m |
|---|---|---|
| Samsung S3 | 18279.55 | 96772.38 |
| Samsung A3 | 450.77 | 1429.43 |
| Sony Xperia Z3 | 61841.48 | 189334.72 |
| Moto E | 238.67 | 866.11 |
| Nexus | 281738 | 305177 |

Table 5.1: Comparison of Smart phones for reception of inaudible frequencies (16kHz-18.5kHz)

Figure 5.6: Comparison between microphone reception capabilities of two smart phones (x-axis: frequency in Hz, 15000 to 23000; y-axis: energy (square of amplitude), scale 1e7; series: LG Nexus, Xperia Z3)


The above discussion shows that it is not possible to set a common threshold for detecting the energy of a transmission present on the channel. Next we tried to find a factor common among devices, such as a ratio between the average energy level of audible frequencies and that of inaudible frequencies that is the same across devices. That does not work out, since smart phone microphones do not regenerate audible and inaudible frequencies equally.

5.2.2 CDMA

Another option is CDMA, in which a number of transmitters can transmit information simultaneously in the same frequency band over a shared channel without any interference. Each sender chooses a code out of a set of codes. All codes are of the same length and should be mutually orthogonal. These codes have the property of good auto-correlation and weak cross-correlation. The number of users that can be supported on the channel depends on the length of the code in use. CDMA is used in cellular communication to differentiate users present in the region under the same base station.

• Following the requirements of CDMA, Walsh codes are a good option, with the properties of high auto-correlation and zero cross-correlation. The correlation of a code with a shifted version of itself (at a shift greater than zero) is low. A Walsh code set of n-bit code length can support n users with different codes over the same channel.

• The code rate is higher than the data rate. Data for transmission is combined with the code using an XOR operation at the code rate, and the newly generated bit sequence is modulated and transmitted.

• A receiver checks the demodulated signal against a locally generated Walsh code of the desired user and decides whether the signal is from the intended user.

Infeasibility of CDMA

We tried to use Walsh codes to implement CDMA and tested their feasibility in our design. A 4-bit Walsh code set is generated using a Hadamard matrix. Results are tested for two senders and one receiver, and each sender has its own code.

1. To test CDMA, the duration of each code bit (chip) is kept at 2048 samples, so the data bit duration is 2048*4 samples.

2. For a single data bit, the receiver demodulates four consecutive chips and matches the four decoded chips against the chips in the locally generated code. If a chip in the decoded code matches the corresponding chip in the local code it returns 1, else it returns 0. For a data bit, if this matching returns 1 for more than half of the total number of chips, the data bit is taken as 1, else as 0. This behaviour is tested in a silent environment first. With the above set up the receiver was able to distinguish transmissions from different senders.

3. It is not good to keep the bit duration as long as 2048*4 samples, as data rates are already low. So the data bit duration is kept at 2048 samples only and the chip duration is taken as 512 samples. But at the receiver side this many samples are not enough to demodulate the frequencies correctly and hence CDMA does not work out with FSK.

4. CDMA with ASK and CDMA with PSK are other possible implementations. Details of the ASK implementation can be found in [3]. Results show that the transmitted audio signal power is affected by the operating environment and is device specific. These reasons make ASK difficult to use. Implementation of PSK requires perfect frame synchronisation between sender and receiver, that is, the receiver should detect the start of the data frame such that the phase can be determined correctly. According to experimental analysis, the start of the frame is detected at some offset, so the receiver demodulates the wrong phase.

The above analysis shows that CDMA cannot be used as the MAC technique for our implementation because of the existing physical layer constraints, the modulation technique and the number of samples dedicated per bit.
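The code generation, spreading and chip matching discussed above can be sketched as follows. The function names are hypothetical. The bit polarity is an assumption of this sketch: with XOR spreading a 0 bit leaves the code unchanged, so a majority of matching chips is read as 0 here.

```python
def hadamard(n):
    """Sylvester construction of an n x n Hadamard matrix (n a power of two)."""
    h = [[1]]
    while len(h) < n:
        h = [row + row for row in h] + [row + [-x for x in row] for row in h]
    return h


def walsh_codes(n):
    """Rows of the Hadamard matrix mapped to bits (+1 -> 0, -1 -> 1)."""
    return [[0 if x == 1 else 1 for x in row] for row in hadamard(n)]


def spread(data_bits, code):
    """XOR every data bit with each chip of the code (chip rate = len(code) x bit rate)."""
    return [d ^ c for d in data_bits for c in code]


def despread(chips, code):
    """Majority vote over the chips of each data bit against the local code."""
    n = len(code)
    bits = []
    for i in range(0, len(chips), n):
        block = chips[i:i + n]
        matches = sum(1 for received, local in zip(block, code) if received == local)
        bits.append(0 if matches > n / 2 else 1)  # assumed polarity, see lead-in
    return bits


codes = walsh_codes(4)
chips = spread([1, 0, 1, 1], codes[2])
```

The majority vote tolerates one wrong chip out of four per data bit.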


5.2.3 Slotted ALOHA

A slotted ALOHA implementation needs a device to monitor transmissions. In our design, no device can have such a capability, as each entity broadcasts when it has something to send without establishing a connection. Each device can independently choose to broadcast whenever it wants. So, all the devices are at the same level and no centralised monitoring can be implemented.

5.2.4 Pure ALOHA

Pure ALOHA is the simplest MAC protocol. According to pure ALOHA:

• If a sender has data to send, it sends immediately

• A frame of one sender can collide with a frame of another sender if they are on the same channel and their transmissions overlap at any point in time

• If a sender generates two consecutive frames with a time gap of less than the frame length, the two frames of the same sender will collide

• At any time, a sender can either transmit or generate a frame

• Any number of transmitters may try to transmit at a given point in time

To carry out experiments, there are some assumptions:

1. All senders have frames of equal length

2. Frames arrive at each sender according to a Poisson process. λ is the arrival rate of frames for each sender and can lie in the range 0 to 1 (both inclusive). Load on the channel is determined by multiplying the number of senders by the arrival rate.

Load on channel = Number of transmitters ∗ arrival rate (5.1)

3. In the experiments the receiver does not transmit frames; it only listens to the transmissions

4. Poisson arrival of frames is maintained by generating exponentially distributed inter-arrival times of frames at the sender, using the Java expression

Math.log(randomGenerator.nextDouble())/(-lambda) (5.2)

where lambda is the mean arrival rate of frames. Figure 5.8 shows the distribution of inter-arrival times with a frame arrival rate of 0.5.

5. As a back channel is not implemented, frames that are not received, or are received in error, are not retransmitted
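Expression (5.2) is the inverse-transform way of drawing exponentially distributed gaps, and it can be mirrored in a short sketch. The function names and the seed are assumptions. The sketch uses `1 - random()` so that the logarithm is never taken of zero.

```python
import math
import random


def inter_arrival_times(lam, count, seed=None):
    """Exponentially distributed gaps with mean 1/lam (frames form a Poisson process)."""
    rng = random.Random(seed)
    # rng.random() lies in [0, 1), so 1 - rng.random() lies in (0, 1].
    return [-math.log(1.0 - rng.random()) / lam for _ in range(count)]


def channel_load(num_transmitters, arrival_rate):
    """Equation (5.1): load on the channel."""
    return num_transmitters * arrival_rate


gaps = inter_arrival_times(0.5, 20000, seed=1)
mean_gap = sum(gaps) / len(gaps)  # should be close to 1 / 0.5 = 2
```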

5.2.5 Experimental set up

All pure ALOHA experiments are carried out with two senders and one receiver. The receiver is placed at a distance of 2 meters from each of the two senders. A distance of 1.5 meters is maintained between the senders. Samsung S3 Neo and Samsung S5 are the two senders for the experiments. Experiments are repeated for two receivers, LG Nexus and Samsung S3.

Analysis

The two senders transmit with different energies at inaudible frequencies, as can be inferred from the CSMA results for the two phones. The two receivers have different microphone sensitivities, so the physical layer throughput differs between them. The experimental results follow the curve of the theoretical pure ALOHA results. The total errors of an experiment include the physical layer errors for that sender-receiver pair. Figures 5.9 and 5.10 show the behaviour of our design in the pure ALOHA set up.

Figure 5.7: Set up for pure ALOHA experiments (Sender 1 and Sender 2, each placed 2 meters from the Receiver)

Figure 5.8: Frame inter arrival time distribution

Receiver: Samsung S3 Neo (physical layer error rate = 27%)
Maximum theoretical throughput: 13.42 (at load 0.5)
Maximum throughput for sender S3: 10.76 (at load 0.5)
Maximum throughput for sender S5: 13.64 (at load 0.5)

Receiver: Samsung S5 (physical layer error rate = 19%)
Maximum theoretical throughput: 14.7 (at load 0.5)
Maximum throughput for sender S3: 13.27 (at load 0.5)
Maximum throughput for sender S5: 14.12 (at load 0.5)

Throughputs are calculated from the physical layer error rates using the formula:

Throughput on MAC = Experimental throughput / Success rate on physical layer (5.3)

The senders were chosen according to availability, so the two transmit with different energies. This is reflected in the higher throughput values of the S5 compared with the S3 Neo for each of the receivers. The results shown here are for loads 0.1, 0.3, 0.5, 0.7 and 0.9 with two senders. For both devices, the load at which throughput is maximum is 0.5, which agrees with the theory. Throughput increases as the load increases from 0 to 0.5 and then starts decreasing. The experimental throughput values do not match the theoretical ones exactly but follow the same pattern.
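For reference, the standard theoretical result for pure ALOHA is S = G·e^(-2G), where G is the offered load and S the fraction of successful frames; it peaks at G = 0.5 with S of about 18.4%. The sketch below evaluates this curve at the loads used in the experiments. Scaling the peak by a physical layer success rate of 1 - 0.27 gives about 13.4%, which is close to the theoretical maximum reported for the receiver with 27% error rate; treating this scaling as the way the reported value was obtained is an assumption, and the class and method names are illustrative only.

```java
// Evaluates the standard pure ALOHA throughput formula S = G * exp(-2G).
class PureAlohaTheory {
    /** Fraction of successful frames at offered load G. */
    static double throughput(double load) {
        return load * Math.exp(-2.0 * load);
    }

    /** Ideal throughput scaled by the physical layer success rate (assumed model). */
    static double scaledThroughput(double load, double physicalErrorRate) {
        return throughput(load) * (1.0 - physicalErrorRate);
    }

    public static void main(String[] args) {
        double[] loads = {0.1, 0.3, 0.5, 0.7, 0.9};
        for (double g : loads) {
            System.out.printf("G=%.1f  S=%.2f%%  scaled (27%% error)=%.2f%%%n",
                    g, 100.0 * throughput(g), 100.0 * scaledThroughput(g, 0.27));
        }
    }
}
```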


Figure 5.9: Pure Aloha results with Samsung S3 as receiver (x-axis: channel load, 0.1 to 0.9; y-axis: success rate in percent; curves: Sender-S3 Neo, Sender-S5, Theoretical)

Figure 5.10: Pure Aloha results with LG Nexus as receiver (x-axis: channel load, 0.1 to 0.9; y-axis: success rate in percent; curves: Sender-S3 Neo, Sender-S5, Theoretical)


Chapter 6

Application Layer

At the application layer, we implemented the design on the Android operating system. Our implementation enables sharing of GSM information among smart phones using sound waves. A device extracts its GSM information by turning on its GSM. When a device has GSM information, it broadcasts this information periodically using its built-in speakers. Devices that want to access this information record the transmitted signal through their built-in microphones. These devices do not need pairing to share the information. Broadcasting enables a number of devices to receive the transmitted information at the same time. The GSM information shared among devices is:

Information      Length
Cell ID          16 bits
RSSI             6 bits
Operator name    4 bits

Table 6.1: GSM information

We have finalised 28 data bits per frame, of which 8 bits are used for CRC. To fit the GSM information into this frame structure, we need two frames. The distribution of the information over the two frames is given in figures 6.1 and 6.2. The sender inserts a 4-bit sequence number at the start of both frames so that its two frames can be identified. The sender chooses this sequence number randomly, so there is a chance that two senders choose the same sequence number. The sender broadcasts the two frames back to back.

Sequence number (4 bits) | Cell ID (16 bits) | CRC (8 bits)

Figure 6.1: Distribution of 28 data bits in first frame

Sequence number (4 bits) | RSSI (6 bits) | Operator Name (4 bits) | CRC (8 bits)

Figure 6.2: Distribution of 28 data bits in second frame

The receiver receives a frame and, if the frame is not in error, extracts its sequence number and waits for the next error-free frame. It extracts the sequence number of this second frame and matches it against the sequence number of the first frame:

• If the two sequence numbers match, the information is extracted from the frames according to the distribution of information bits in the two frames

• Otherwise, it discards the first frame, treats the second frame as the new first frame, and repeats the procedure
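The receiver procedure above can be sketched as follows. This is a minimal sketch, not the Android implementation: the class and field names are hypothetical, frames are assumed to have already passed the CRC check, the payload of the second frame is assumed to hold RSSI in its upper 6 bits and the operator name code in its lower 4 bits, and the receiver is assumed to start looking for a fresh pair after a successful match.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of pairing two error-free frames by their 4-bit sequence number.
class FramePairing {
    /** An error-free frame: sequence number plus payload bits (CRC already removed). */
    static final class Frame {
        final int sequenceNumber;
        final int payload;
        Frame(int sequenceNumber, int payload) {
            this.sequenceNumber = sequenceNumber;
            this.payload = payload;
        }
    }

    /** GSM information decoded from one matched pair of frames. */
    static final class GsmInfo {
        final int cellId;       // 16 bits, from the first frame
        final int rssi;         // 6 bits, from the second frame
        final int operatorCode; // 4 bits, from the second frame
        GsmInfo(int cellId, int rssi, int operatorCode) {
            this.cellId = cellId;
            this.rssi = rssi;
            this.operatorCode = operatorCode;
        }
    }

    /** Walks through error-free frames in arrival order and decodes matching pairs. */
    static List<GsmInfo> pair(List<Frame> errorFreeFrames) {
        List<GsmInfo> decoded = new ArrayList<>();
        Frame first = null;
        for (Frame frame : errorFreeFrames) {
            if (first == null) {
                first = frame; // wait for the next error-free frame
            } else if (first.sequenceNumber == frame.sequenceNumber) {
                int cellId = first.payload & 0xFFFF;
                int rssi = (frame.payload >> 4) & 0x3F; // assumed bit layout
                int operatorCode = frame.payload & 0xF; // assumed bit layout
                decoded.add(new GsmInfo(cellId, rssi, operatorCode));
                first = null; // assumed: start over after a successful match
            } else {
                first = frame; // discard the old first frame, keep the newer one
            }
        }
        return decoded;
    }

    public static void main(String[] args) {
        List<Frame> frames = new ArrayList<>();
        frames.add(new Frame(3, 0x0001));        // unmatched frame, will be discarded
        frames.add(new Frame(5, 0x1234));        // first frame carrying the cell ID
        frames.add(new Frame(5, (20 << 4) | 3)); // second frame: RSSI 20, operator 3
        for (GsmInfo info : pair(frames)) {
            System.out.println(info.cellId + " " + info.rssi + " " + info.operatorCode);
        }
    }
}
```

Because sequence numbers are only 4 bits and chosen randomly, frames from two different senders can still be paired by mistake, as noted above.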

Screenshots of the running application are presented in figures 6.3 and 6.4. The sender is a Samsung S5 and the receiver is an LG Nexus 5. In this manner, a device is able to determine its cell ID, RSSI and operator name just by receiving sound broadcasts. The power consumption at the receiving side for accessing the GSM information using the application and by turning on its own GSM is given in [3]. Power consumption at the sender is considerably lower than at the receiver because the sender side requires less processing.

Figure 6.3: Screenshot at sender

Figure 6.4: Screenshot at receiver


Chapter 7

Conclusion and Future work

7.1 Conclusion

Experiments carried out on the design show that, in a noisy environment, a throughput of 19.6 data bits/second can be achieved at the physical layer for a frame containing 28 data bits. The design is sensitive to environmental noise, as can be seen from the difference in data rates obtained in the two environments: in a lab environment, for a frame of 28 data bits, we achieved a throughput of 23.2 bits/second at the physical layer. The experimental set up has an Xperia Z3 as receiver and a Samsung S5 as transmitter at a separation of 2 meters. Results may differ if either the sender or the receiver is exchanged for another device. For example, throughput is 21.1 data bits/second in the lab environment and 17.3 data bits/second in the noisy environment if the receiver is a Samsung S3 instead of the Xperia Z3. Our frequency band is not wide, which restricts the choice of modulation techniques, e.g. 16-FSK gives low throughput because the gap between frequencies is small. The achieved throughput depends on the distance between sender and receiver: it decreases from 22 bits/second to 3 bits/second if we increase the distance from 1 meter to 5 meters. The packet loss rate increases as we increase the distance between sender and receiver.

In a noisy environment transmission errors are very frequent, so we used error correcting codes to improve throughput. Of the two codes we tried, Viterbi codes gave better results. The design does not have any retransmission strategy, but each frame includes an 8-bit CRC to validate the received frame. According to simulation results, the 8-bit CRC has a 99.25% error detection rate for a frame of 28 bits. With an 8-bit CRC over 20 data bits, we achieved a throughput of 14 data bits/second at the link layer. Smart phones have good reception of inaudible frequencies, but receptivity differs across them depending on the quality of their microphones. This restricts the use of CSMA, because no common threshold on signal power can be set across devices to distinguish transmission from no transmission. Implementation of CDMA with 8-FSK is difficult since it is not possible to have a chip size of less than 2048 samples. Experiments with ASK and PSK show that these are not feasible ways to modulate the data. So, CDMA is infeasible as a MAC technique for our design. The results of the pure Aloha experiments follow the theoretical results for pure Aloha. Aloha results differ across devices depending on their physical layer throughput.
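The frame validation step (an 8-bit CRC over 20 data bits) can be sketched as below. The generator polynomial is not stated here, so the sketch assumes x^8 + x^2 + x + 1 (0x07) with zero initial value purely for illustration; the class and method names are hypothetical as well.

```java
// Sketch of validating a 28-bit frame made of 20 data bits and an 8-bit CRC.
class Crc8Sketch {
    static final int POLY = 0x07; // x^8 + x^2 + x + 1, an assumed polynomial

    /** Computes an 8-bit CRC over the lowest bitCount bits of data, most significant bit first. */
    static int crc8(int data, int bitCount) {
        int crc = 0;
        for (int i = bitCount - 1; i >= 0; i--) {
            int inputBit = (data >> i) & 1;
            int topBit = (crc >> 7) & 1;
            crc = (crc << 1) & 0xFF;
            if ((topBit ^ inputBit) == 1) {
                crc ^= POLY;
            }
        }
        return crc;
    }

    /** Builds a 28-bit frame: 20 data bits followed by their 8-bit CRC. */
    static int buildFrame(int data20) {
        int data = data20 & 0xFFFFF;
        return (data << 8) | crc8(data, 20);
    }

    /** Returns true if the CRC recomputed over the data bits matches the received CRC. */
    static boolean isValid(int frame28) {
        int data = (frame28 >>> 8) & 0xFFFFF;
        int receivedCrc = frame28 & 0xFF;
        return crc8(data, 20) == receivedCrc;
    }

    public static void main(String[] args) {
        int frame = buildFrame(0x5A3C7);
        System.out.println(isValid(frame));            // intact frame
        System.out.println(isValid(frame ^ (1 << 9))); // frame with one flipped data bit
    }
}
```

A receiver built this way would simply drop any frame for which isValid returns false, which matches the absence of a retransmission strategy.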

We achieved higher data rates than existing implementations that use smart phones, except the implementations for NFC. Power consumption is 8 mW at the sender and 295 mW at the receiver.

7.2 Future Work

• We are not able to use fewer than 2048 samples for a data bit. Possible reasons are:
(1) Lack of perfect frame synchronisation
(2) The FFT algorithm we used cannot detect the dominating frequency component in fewer than 2048 samples
If this limitation of the design can be overcome, data rates can be improved.

• If frame synchronisation can be improved, CDMA can be suggested as a MAC technique for our design


Bibliography

[1] Rajalakshmi Nandakumar, Krishna Kant Chintalapudi, and Venkata N Padmanabhan. Dhwani: Secure Peer-to-Peer Acoustic NFC. In ACM SIGCOMM, pages 63–74, 2013.

[2] Sungsil Park, Youngsoo Do, Jaesung Park, and Dongsoo S Kim. Inaudible Dual Tone Data Transmission for Home Appliances. pages 131–134, 2014.

[3] Nisha. Sound tone communication-nisha. http://www.cse.iitb.ac.in/synerg/lib/exe/fetch.php?media=public:students:nisham:nisha-report-mtp2.pdf, 2015.

[4] Anil Madhavapeddy, David Scott, and Richard Sharp. Context-Aware Computing with Sound. Proceedings of The International Conference on Ubiquitous Computing (UbiComp ’03), pages 315–332, 2003.

[5] Ashish Patro, Yadi Ma, Fatemah Panahi, Jordan Walker, and Suman Banerjee. A system for Audio signalling based NAT traversal. In 2011 3rd International Conference on Communication Systems and Networks, COMSNETS 2011, 2011.

[6] Pravein Govindan Kannan, Seshadri Padmanabha Venkatagiri, Mun Choon Chan, Akhihebbal L Ananda, and Li-shiuan Peh. Low cost crowd counting using audio tones. Proceedings of the 10th ACM Conference on Embedded Network Sensor Systems - SenSys ’12, page 155, 2012.

[7] Amit Panghal. A project report, BattMan : Acoustic short-range communication leveraging ultrasound.

[8] Martin Wirz, Daniel Roggen, and Gerhard Troster. A wearable, ambient sound-based approach for infrastructureless fuzzy proximity estimation. In Proceedings - International Symposium on Wearable Computers, ISWC, 2010.

[9] Stephen P Tarzia, Peter A. Dinda, Robert P Dick, and Gokhan Memik. Indoor localization without infrastructure using the acoustic background spectrum. Proceedings of the 9th international conference on Mobile systems, applications, and services (MobiSys ’11), page 155, 2011.

[10] MIT. Nayuki-fft. http://www.nayuki.io/res/free-small-fft-in-multiple-languages/Fft.java, 2014.

[11] Philip Koopman. Cyclic redundancy code (crc) polynomial selection for embedded networks. http://repository.cmu.edu/cgi/viewcontent.cgi?article=1672&context=isr, 2004.
