Tutorial February 2007 Formatting media for OTA streaming and download to Sony Ericsson phones

Tutorial | Formatting media for OTA streaming and download

Preface

About this tutorial

This tutorial has been authored by Ian Simpson, a magazine journalist and freelance technology writer based in the U.K. Ian has a strong background in 3D animation and rendering, and to a lesser extent sound engineering. His personal interests include game design, football, motorsports (FIA world rally), a wide range of music, classic films, some very light C/C++ coding, and a fairly random selection of technologies.

This document introduces the broader concepts of digital multimedia and networking, along with specific information on formatting and providing media with network-based delivery to Sony Ericsson phones. Practical examples of the processes involved are included.

This document may benefit:

• Content providers.
• Software developers.
• Operators and service providers.

The competence level required to follow this document extends only to basic computer skills, as all concepts are fully explained. Prior experience with editing digital media or networking is advantageous.

This document is published by Sony Ericsson Mobile Communications AB, without any warranty*. Improvements and changes to this text necessitated by typographical errors, inaccuracies of current information or improvements to programs and/or equipment, may be made by Sony Ericsson Mobile Communications AB at any time and without notice. Such changes will, however, be incorporated into new editions of this document. Printed versions are to be regarded as temporary reference copies only.

*All implied warranties, including without limitation the implied warranties of merchantability or fitness for a particular purpose, are excluded. In no event shall Sony Ericsson or its licensors be liable for incidental or consequential damages of any nature, including but not limited to lost profits or commercial loss, arising out of the use of the information in this document.

This Tutorial is published by:

Sony Ericsson Mobile Communications AB, SE-221 88 Lund, Sweden

Phone: +46 46 19 40 00
Fax: +46 46 19 41 00
www.sonyericsson.com/

© Sony Ericsson Mobile Communications AB, 2007. All rights reserved. You are hereby granted a license to download and/or print a copy of this document. Any rights not expressly granted herein are reserved.

First edition (February 2007)
Publication number: EN/LZT 108 9401 R1A

Sony Ericsson Developer World

On www.sonyericsson.com/developer, developers find documentation and tools such as phone White papers, Developers guidelines for different technologies, SDKs (Software Development Kits) and relevant APIs (Application Programming Interfaces). The Web site also contains discussion forums monitored by the Sony Ericsson Developer Support team, an extensive Knowledge base, Tips and tricks, example code and news.

Sony Ericsson also offers technical support services to professional developers. For more information about these professional services, visit the Sony Ericsson Developer World Web site.

Document conventions

Products

Sony Ericsson phones are referred to in this document using generic names. Please note that while other Sony Ericsson phones not listed here do support the same containers, compression codecs and connection speeds, the exact parameters needed to encode media for those phones may differ. As such it is recommended to verify successful playback on those phones when creating media for them.

The following phones are covered by this document:

Generic name (series)    Sony Ericsson phones
W950                     W950i, W958c
W900                     W900i
W880                     W880i, W888c
W850                     W850i, W850c
W830                     W830i, W830c
W810                     W810i, W810c, W810a
W710                     W710i, W710c
W800                     W800i, W800c
W610                     W610i, W610c
W600                     W600i
W550                     W550i, W550c
W300                     W300i, W300c
K800                     K800i, K800c
K790                     K790i, K790c
K810                     K810i, K818c
K750                     K750i, K750c
K610                     K610i, K610c, K618i
K550                     K550i, K550c
M600                     M600i, M608c
P990                     P990i, P990c
Z710                     Z710i, Z710c
Z610                     Z610i

Terminology

Digital multimedia    Media file(s) containing video and/or audio data streams
Data stream           An encoded video or audio track
Pixel resolution      The number of pixels used in an image, as width by height
Aspect ratio          The ratio of pixels in width to pixels in height
Frame rate            The number of video frames displayed every second
Sample rate           The number of audio samples per second, measured in Hz or kHz
Stereo sound          Audio consisting of two channels, left and right
Mono sound            Audio consisting of a single channel
Audio normalization   The process of adding volume gain to the audio track up to the point just before distortion or clipping occurs
Codec                 COmpressor/DECompressor, an algorithm that compresses and decompresses data streams
Encoding              Compressing a data stream through a codec
Decoding              Decompressing a data stream through a codec
Transcoding           Encoding an already compressed data stream to a different codec
Lossy compression     Compression that discards some data when encoding a data stream
Bit rate              The number of bits per second assigned to a compressed data stream, typically measured in kilobits per second (Kbps)

Container             The package that video and audio streams are placed in, to ensure synchronized playback
Multiplexing          The act of placing data streams inside a container, also referred to as muxing
OTA                   Over-The-Air, refers to voice or data communication using a cellular network
IP address            The numeric address of a device on a TCP/IP network, often shortened to IP
Private IP address    An IP address that is non-routable to wider networks such as the Internet
Public IP address     An IP address that is routable and valid on wider networks such as the Internet
TCP packet            Transmission Control Protocol, a type of IP network data packet
UDP packet            User Datagram Protocol, a type of IP network data packet
NAT                   Network Address Translation, a method of connecting private networks to public ones
GPRS                  General Packet Radio Service
EDGE                  Enhanced Data rates for GSM Evolution, also known as Enhanced GPRS
UMTS                  Universal Mobile Telecommunications System
HTTP                  Hyper-Text Transfer Protocol, a protocol used by Web servers
RTSP                  Real-Time Streaming Protocol, a protocol used by media streaming servers

Trademarks and acknowledgements

WALKMAN® and Vegas™ are trademarks or registered trademarks of Sony Corporation.

Microsoft® and Windows® are either registered trademarks or trademarks of Microsoft Corporation in the United States and/or other countries.

Apple, QuickTime and Final Cut are trademarks of Apple Computer, Inc., registered in the U.S. and other countries.

Premiere™, After Effects™ and Audition™ are trademarks or registered trademarks of Adobe Systems Incorporated.

Nero is a registered trademark of Nero AG.

3GPP is a trademark or registered trademark of ETSI.

Other product and company names mentioned herein may be the trademarks of their respective owners.

Document history

Change history

2007-02-19 Version R1A First version published on Developer World

Contents

Technical overview
  Background information on digital multimedia
    Components of digital multimedia
    Resolution
    Compression
    Containers
  Background information on TCP/IP networking
    Addresses
    Ports
    Packets
    Network Address Translation
  Background information on OTA data connections
    GPRS
    EDGE
    UMTS
Tutorial
  Formatting media for OTA delivery to Sony Ericsson phones.
    Supported containers
    Audio codec
    Video Codec
    Recommended video and audio stream parameters for video content
    The process of formatting multimedia video
    Recommended audio stream parameters for audio-only content
    Other considerations
  Content delivery methods
    OTA downloads using HTTP
    OTA real-time streaming using RTSP
    The importance of correct hint tracks within media delivered using RTSP
    Other considerations when delivering media OTA
  Setting up an RTSP test environment
    Software setup
    Establishing RTSP connectivity with your server and troubleshooting
  Practical examples of authoring video media for OTA download and streaming
    Notes on software used
    Environment and software set-up
    Creating the video project
    Importing the source video and formatting track properties
    Exporting the formatted video project
    Transcoding streams to MPEG-4/AAC within a 3GP file
    Transcoding streams to H.264/AAC within a 3GP file
  Practical example of authoring audio-only media for OTA download and streaming
    Notes on software used
    Environment and software setup
    Normalizing the audio volume
    Encoding to HE-AAC
    Inserting a correct hint track for the audio stream
    Adding metadata for artist and title information and changing extension
Conclusion

Technical overview

Background information on digital multimedia

This section is intended to inform those with no prior knowledge of the concepts and terms surrounding digital multimedia, where relevant to the creation of multimedia content for Sony Ericsson phones.

The term digital video refers to video stored in a digital format. The most obvious example is a DVD movie, but it also covers many other desktop computer formats such as WMV, AVI, MP4, MOV, and MPG. The term digital audio refers to audio stored in a digital format. A prime example is any audio CD, but many other formats exist such as MP3, AAC, and AMR.

Components of digital multimedia

Digital multimedia generally consists of one or two major elements: a video track and/or an audio track. These tracks, or components of the digital media, are termed data streams.

Although video media typically contains one data stream for video and another for audio, it is also possible for more than two data streams to be present. An example of this is the commentary feature found on many DVDs, where the second of two audio data streams can be chosen to accompany the video stream.

Audio media usually consists of a single data stream.

Resolution

Both visible and audible information are naturally analogue mediums. This analogue information must be sampled to be stored digitally. The number of samples dictates the resolution of the digital copy, with more samples improving the resolution and definition. Within the scope of this document and its purpose, there are only four digital sample resolutions that are relevant.

The first is the visual resolution of a video stream, measured in pixels as width by height, comparable to the resolution of your computer display. Higher pixel resolution permits more detailed images. The visual resolution also dictates the aspect ratio of the video. Aspect ratio is the ratio of width to height, typically 16:9 for widescreen video, and 4:3 for standard video.
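As a small illustration of how pixel resolution determines aspect ratio, the sketch below reduces a width-by-height resolution to its simplest ratio. The function name aspect_ratio and the sample resolutions are illustrative choices only and do not come from any Sony Ericsson tool.

```python
from math import gcd


def aspect_ratio(width: int, height: int) -> str:
    """Reduce a pixel resolution (width by height) to its simplest ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    divisor = gcd(width, height)
    return f"{width // divisor}:{height // divisor}"


print(aspect_ratio(640, 480))   # 4:3
print(aspect_ratio(1280, 720))  # 16:9
```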

The second is the temporal resolution of a video stream, the frame rate, measured in frames per second (fps). It affects the perceived smoothness of motion in the video. With the aid of motion blur and similar effects that make use of how our eyes perceive movement, 24 fps is the minimum to produce perfectly fluid motion on any scene, and is the standard for feature films. Slightly lower frame rates are still quite smooth to the eye for many scenes, down to around 20 fps, below which a staggered effect on motion is ever more easily perceived by the viewer.

The third resolution of importance is the temporal resolution of an audio stream. It is termed sample rate, and is measured in kilohertz (kHz). It governs the maximum sound frequency (as in pitch) the digital audio stream can contain. To reproduce a sound in a digital copy, the sample rate must be at least double the frequency of the sound. 44.1 kHz and 48 kHz are the main sample rate standards in multimedia video, therefore covering frequencies beyond the upper audible range of 20 kHz.
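The "at least double" rule can be checked with a few lines of arithmetic. This is a minimal sketch; the function names are illustrative, and the 8 kHz figure is the AMR sample rate mentioned later in this document.

```python
def max_frequency_khz(sample_rate_khz: float) -> float:
    """Highest sound frequency a given sample rate can reproduce."""
    return sample_rate_khz / 2


def min_sample_rate_khz(frequency_khz: float) -> float:
    """Lowest sample rate needed to reproduce a given sound frequency."""
    return frequency_khz * 2


print(max_frequency_khz(48))    # 24.0, above the 20 kHz audible limit
print(max_frequency_khz(8))     # 4.0, enough for speech
print(min_sample_rate_khz(20))  # 40
```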

Finally there is also the directional resolution of an audio stream, measured by the number of channels present in the audio. A single channel is a mono recording, while two channels afford stereo sound. Increasing the number of channels further, to 4 or more, allows fully directional surround sound; for example, 5.1 surround sound uses five full-range channels plus one low-frequency effects channel in a digital audio stream.

Compression

To reduce storage requirements and improve portability, data streams are usually compressed. Each stream is passed through a compression algorithm, referred to as a codec (COmpressor/DECompressor). Separate codecs are used for video and audio streams, as each stream has quite different characteristics.

The act of compressing a stream is known as encoding. In order to play back a compressed stream, it must be decoded. Therefore the playback hardware must support decoding of the compression codec used. Converting an already compressed stream from one codec to another codec is termed transcoding.

Compression is usually “lossy”, meaning that some data is lost and the compressed stream is not a bit-perfect copy of the original. Whether the effect of these “lost” bits of data is perceptible upon playback depends greatly on the type and amount of compression used. Data loss from high compression levels can introduce perceptible noise and other distortions into a stream, and these noticeable effects are termed compression artefacts.

Compression amounts are set by the amount of data assigned to the stream, typically measured in kilobits per second (Kbps), and referred to as bit rate (as in the rate of bits assigned).
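To give a feel for how much data a bit rate setting removes, the sketch below compares the bit rate of uncompressed audio with a compressed bit rate. The 16 bits per sample figure is an assumption (it is the audio CD value and is not stated in this document), and the function name is illustrative.

```python
def raw_pcm_kbps(sample_rate_hz: int, bits_per_sample: int, channels: int) -> float:
    """Bit rate of uncompressed audio in kilobits per second (1 Kbps = 1000 bits/s)."""
    return sample_rate_hz * bits_per_sample * channels / 1000


# Assumption: 16 bits per sample, as used on audio CDs.
raw = raw_pcm_kbps(44100, 16, 2)
compressed = 128  # Kbps, a common setting for compressed stereo music

print(raw)                          # 1411.2
print(f"{raw / compressed:.1f}:1")  # 11.0:1
```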

Compression codecs often have subfeatures that extend or improve their performance. It is important to note that some playback devices do not support certain features of a codec. To make the compatibility of these features easier to manage and understand, most codecs have profiles and/or levels describing which features and parameters can be used.

For example MPEG-4 video can be encoded in Simple or Advanced Simple profiles in levels of 0 to 5, with Simple Profile level 1 being used for video on phones, and Advanced Simple profile levels used on most Web video content. Another example is WMV video, which comes in profiles such as WMV8, WMV9, and WMV9.1.

Video codecs include MPEG-1 and MPEG-2 as used for DVDs, and the many extensions and implementations of MPEG-4 such as WMV9, H.264, DivX, and so on. Common audio codecs include MP3, OGG, AAC, WMA, and AMR.

When compressing audio streams, the chosen bit rate is used for every second of the audio file, and this is known as Constant Bit Rate (CBR) compression. Some audio codecs also offer Variable Bit Rate (VBR) compression, where more bits are spent on complicated moments in the audio, and fewer bits are used for silent or less complex moments. By using VBR compression the overall sound quality is improved without increasing file size.
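Because a CBR stream spends the same number of bits on every second, its size can be estimated directly from bit rate and duration. The following is a rough sketch that assumes 1 Kbps means 1000 bits per second and ignores container overhead and metadata; the function name is illustrative.

```python
def cbr_size_bytes(bit_rate_kbps: float, duration_seconds: float) -> float:
    """Approximate size of a constant bit rate stream, excluding container overhead."""
    total_bits = bit_rate_kbps * 1000 * duration_seconds
    return total_bits / 8


# A three-minute track at 64 Kbps
print(cbr_size_bytes(64, 180))  # 1440000.0 bytes, roughly 1.4 MB
```

A VBR stream of the same average bit rate would come out at about the same size, but with the bits distributed unevenly over time.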

Containers

The separate data streams of video media are almost always multiplexed (also referred to as “muxed”) together into one file, facilitating delivery and playback of both video and audio streams as a single object. The data streams are packaged in what is called a container file. You likely already know several container types as filename extensions, such as AVI which is an Audio Video Interleave container file.

The container also serves to present the streams to playback devices so that they remain synchronized on playback, by linking positions in one stream to positions in the other, that is, by interleaving them. Containers also present the streams in a standardized structure, with an index of the contained streams and other assorted metadata. Once you know which compression codecs are supported within certain containers, simply recognizing the container type provides a good indication of the likely codecs used on the data streams inside, without reading the stream data from the container.

For example, MP4 files are the MPEG-4 IsoMedia container, and from that we can assume that the video stream uses a form of MPEG-4 compression with AAC compression for audio. Similarly WMV files are Advanced Systems Format (ASF) containers using WMV video streams and WMA audio streams. AVI files are Audio Video Interleave containers that use Video For Windows compatible codecs for video streams and generally WAV or MP3 audio streams.
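The guesswork described above can be expressed as a simple lookup from filename extension to likely contents. This is only a sketch of the reasoning: the table holds just the three examples given in the text, the names LIKELY_CONTENTS and guess_streams are hypothetical, and a real tool would need to inspect the file itself to be sure.

```python
import os

# extension -> (container, likely video stream, likely audio stream)
LIKELY_CONTENTS = {
    ".mp4": ("MPEG-4 IsoMedia", "MPEG-4 video", "AAC audio"),
    ".wmv": ("Advanced Systems Format (ASF)", "WMV video", "WMA audio"),
    ".avi": ("Audio Video Interleave", "Video For Windows compatible video", "WAV or MP3 audio"),
}


def guess_streams(filename: str):
    """Return the likely container and codecs for a filename, or None if unknown."""
    extension = os.path.splitext(filename)[1].lower()
    return LIKELY_CONTENTS.get(extension)


print(guess_streams("holiday.MP4"))
print(guess_streams("notes.txt"))  # None
```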

Digital audio files consisting of a single data stream do not always require a container, as no interleaving is needed for a single data stream. In the case of MP3 audio files the metadata is often simply appended to the stream itself. Other formats such as AAC and AMR audio must be placed within an IsoMedia container to support the inclusion of metadata such as artist and title information.

Background information on TCP/IP networking

This section is intended to inform those with no prior knowledge of TCP/IP networking, where relevant to OTA delivery of media content.

TCP/IP networking is the underlying technology that forms networks such as a Local Area Network (LAN), private networks and the wider Internet (public networks).

Addresses

In order to have a presence on a network a device must have its own address, known as an IP address. IP addresses are formed of four numbers separated by periods, for example 192.168.110.120. Each number in the address must have a value between 0 and 255 for the address to be valid.

Each number in the address defines a range. Addresses within some ranges are known as private addresses, and cannot directly communicate with other addresses beyond their immediate local network. For example the 192.168.x.x address range is private, and devices using IP addresses in this range cannot directly use the Internet or other external networks. Because private addresses cannot directly communicate beyond their immediate local network, it is possible for the same address to be used elsewhere on another network by another device.
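The validity and private-range rules above can be checked with Python's standard ipaddress module. This is a minimal sketch; the function name describe_address is illustrative, and 8.8.8.8 is used only as a well-known example of a public address. Note that the module's is_private flag also covers a few other reserved ranges beyond those discussed here.

```python
import ipaddress


def describe_address(text: str) -> str:
    """Classify a dotted IPv4 address as 'invalid', 'private' or 'public'."""
    try:
        address = ipaddress.IPv4Address(text)
    except ValueError:
        # Raised when a number is outside 0 to 255 or the format is wrong
        return "invalid"
    return "private" if address.is_private else "public"


print(describe_address("192.168.110.120"))  # private
print(describe_address("8.8.8.8"))          # public
print(describe_address("192.168.110.256"))  # invalid
```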

Ports

Each IP address has a range of ports for sending and receiving data. These ports are numbered 1 to 65535, with ports below 1024 typically reserved for serving data, and those above 1024 used for requesting data. Specific ports below 1024 are associated with specific services.

For example, when you visit a Web site such as http://www.sonyericsson.com, the following occurs: Firstly your computer finds the IP address associated with the name www.sonyericsson.com, and then sends a request from your own IP address on a port such as 1036, to port 80 at the IP address of the Web site.

The request is sent specifically to port 80 as this is the port number on which HTTP servers (Web servers) listen for requests. Similarly email servers listen on port 25, and FTP servers listen on port 21.
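The port conventions described in this section can be summarized in a few lines. This sketch only encodes the rules of thumb given in the text; the names SERVICE_PORTS and port_role are illustrative, and treating port 1024 itself as a requesting port is an assumption.

```python
# Service ports named in the text
SERVICE_PORTS = {"HTTP": 80, "email": 25, "FTP": 21}


def port_role(port: int) -> str:
    """Return the typical role of a port number according to the convention above."""
    if not 1 <= port <= 65535:
        raise ValueError("port must be between 1 and 65535")
    return "serving" if port < 1024 else "requesting"


print(port_role(SERVICE_PORTS["HTTP"]))  # serving
print(port_role(1036))                   # requesting
```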

Packets

Data that is sent and received on a TCP/IP network is encapsulated within small packets. For example when you receive a Web page, that page is sent in the form of many small packets of data that are then reconstructed into the text and images. These packets generally come in one of two types: TCP (Transmission Control Protocol), or UDP (User Datagram Protocol).

The main difference between TCP and UDP packets is the level of communication involved. For TCP connections, the requesting address and serving address perform a “handshake” to establish that they are both ready for data to be sent and received. For every packet of data sent, the receiving address replies with an acknowledgement that the packet was successfully received.

UDP connections instead simply send data packets from one address and port to another address and port with no built-in form of acknowledgement. This allows for data to be sent faster, and for latency to be lower (as the sending address does not need to wait for acknowledgement of reception before continuing to send the next data packet).

Generally speaking TCP is used more often, as it allows for more reliable data exchanges. For example, TCP is used for Web pages and email. When data speed and fast response times are needed more than overall data integrity, UDP is used. UDP is often employed for services such as real-time media streaming and network gaming.

Network Address Translation

As mentioned, IP addresses are formed from a group of four numbers, each between 0 and 255. This allows for just over 4 billion unique addresses (256 to the power of 4). However the number of existing computers, servers, cell phones, and other IP-capable devices is even higher than 4 billion. At the same time, the number of public IP addresses belonging to any network operator is certainly less than the number of their subscribers wishing to use Internet access on their device. This means it is not possible to assign a unique Internet-capable IP address to every device that wants Internet access.
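The size of the address space quoted above follows directly from the address format, as this short calculation shows.

```python
# Each of the four numbers in an address can take 256 values (0 to 255).
VALUES_PER_NUMBER = 256
NUMBERS_PER_ADDRESS = 4

total_addresses = VALUES_PER_NUMBER ** NUMBERS_PER_ADDRESS
print(total_addresses)  # 4294967296, just over 4 billion
```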

The solution to this problem is a technology known as NAT (Network Address Translation). This involves assigning an IP address in a private range to a device that requires Internet access, which by itself cannot reach the Internet. Direct communication is limited to the local network it is connected to.

Another device on the local network hosts a NAT service. A NAT host essentially translates requests from private IP addresses to its own public IP address, forwards the request to the Internet, and then translates the reply it receives on its own public IP address back to the private IP address that made the original request.

For example, when a device with a private IP address requests www.sonyericsson.com, the request is directed through the NAT host on the local network, which notes the private IP that requested the Web site and translates the request to its own public IP address. It then passes the request to the Web server at www.sonyericsson.com, which sees the request as originating from the NAT host's IP address, and not the unreachable private IP address of the originating device. The Web server then replies with the Web page data to the NAT host, and the NAT host having noted which private IP address requested the Web site, then forwards this reply to the correct private IP address.
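The translation steps in this example can be sketched as a small table kept by the NAT host. This is a simplified model and not how any particular NAT product is implemented: tracking requests by an assigned public port is an assumption, the class name NatHost is hypothetical, and the public addresses shown are placeholders from ranges reserved for documentation.

```python
class NatHost:
    """Toy model of a NAT host that remembers which private address made each request."""

    def __init__(self, public_ip: str):
        self.public_ip = public_ip
        self._next_port = 1024  # assumed starting point for outgoing ports
        self._table = {}        # public port -> (private IP, private port)

    def outbound(self, private_ip, private_port, dest_ip, dest_port):
        """Note the requester, then rewrite the request to come from the public IP."""
        public_port = self._next_port
        self._next_port += 1
        self._table[public_port] = (private_ip, private_port)
        return (self.public_ip, public_port, dest_ip, dest_port)

    def inbound(self, public_port):
        """Find the private address a reply should be forwarded to, if any."""
        return self._table.get(public_port)


nat = NatHost("198.51.100.7")  # placeholder public address
request = nat.outbound("192.168.110.120", 1036, "203.0.113.80", 80)
print(request)                  # the Web server sees the NAT host's address
print(nat.inbound(request[1]))  # ('192.168.110.120', 1036)
```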

NAT works well for common TCP-based requests such as Web pages and emails, removing the need to assign a scarce public IP address to every device that needs Internet access.

Background information on OTA data connections

This section is intended to inform those with no prior knowledge of the various data connections available to Sony Ericsson phones. Relevant details for OTA streaming and download of media are highlighted.

There are a number of data technologies on 2.5G and 3G networks. It is important to note that regardless of the connection type, data is often given a lower priority than voice on most networks. This fact along with uncertain network conditions makes it critical to allow for imperfect connection speeds when delivering media OTA.

GPRS

GPRS (General Packet Radio Service) is the most common form of data transmission currently in use on 2.5G networks. GPRS connections consist of time slots, with most Sony Ericsson phones supporting 4 download slots and 1 upload slot for GPRS connections.

The majority of 2.5G networks provide 13.4 Kbps per time slot, and with 4 time slots available, this allows for a maximum download rate of 53.6 Kbps. In practice the maximum rate seen on a GPRS network can often be the full 53 Kbps.

EDGE

EDGE (Enhanced Data Rates for GSM Evolution), also known as Enhanced GPRS, is a faster form of data transmission for 2.5G networks. Its usage is not as widespread as GPRS but it has strong support in the Americas and many parts of Europe. EDGE is not available on all Sony Ericsson phones, but those with EDGE capability support it with 4 download time slots and 2 upload time slots.

The time slot transmission speed for EDGE can vary between 8.8 Kbps and 59.2 Kbps, which with 4 download slots allows for a download rate of roughly 35 to 236 Kbps. In practice, on the majority of networks, 50 to 160 Kbps is more realistic.
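The GPRS and EDGE figures quoted in these two sections are simply the per-slot rate multiplied by the number of download slots. A minimal sketch of that arithmetic, with an illustrative function name:

```python
DOWNLOAD_SLOTS = 4


def max_download_kbps(slots: int, kbps_per_slot: float) -> float:
    """Theoretical maximum download rate for a number of time slots."""
    return slots * kbps_per_slot


print(round(max_download_kbps(DOWNLOAD_SLOTS, 13.4), 1))  # GPRS: 53.6
print(round(max_download_kbps(DOWNLOAD_SLOTS, 8.8), 1))   # EDGE, slowest slots: 35.2
print(round(max_download_kbps(DOWNLOAD_SLOTS, 59.2), 1))  # EDGE, fastest slots: 236.8
```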

UMTS

UMTS (Universal Mobile Telecommunications System) is the technology used for 3G networks in Europe and many other parts of the world. UMTS connections are only available on Sony Ericsson phones that support 3G.

UMTS networks generally allow for download rates up to 384 Kbps. In practice the rate seen is often closer to 256 Kbps. Additionally, some network operators are known to artificially restrict their UMTS data speeds to only 128 Kbps.
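To see what these connection speeds mean for OTA downloads, the sketch below estimates how long a file takes to arrive at a given rate. It is a rough estimate only: it assumes 1 KB is 1000 bytes, ignores protocol overhead, and assumes the rate stays constant. The function name and the 500 KB file size are illustrative.

```python
def download_seconds(size_kilobytes: float, rate_kbps: float) -> float:
    """Estimated time to download a file at a steady rate, ignoring overhead."""
    total_bits = size_kilobytes * 1000 * 8
    return total_bits / (rate_kbps * 1000)


FILE_SIZE_KB = 500  # hypothetical media file

print(download_seconds(FILE_SIZE_KB, 256))  # UMTS at a typical rate: 15.625 seconds
print(download_seconds(FILE_SIZE_KB, 128))  # UMTS restricted by the operator: 31.25 seconds
print(download_seconds(FILE_SIZE_KB, 53.6)) # GPRS at its maximum rate: over a minute
```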

Tutorial

Formatting media for OTA delivery to Sony Ericsson phones.

This section contains specific details and recommendations on creating media content for Sony Ericsson phones, delivered through mobile networks. This can be through OTA download or real-time streaming playback using RTSP.

Supported containers

Sony Ericsson phones support the IsoMedia container in both its 3GP and MP4 forms. MP4 containers support MPEG-4 video streams and several types of AAC audio stream. 3GP is a simplified version of the container that is used primarily for low bandwidth applications such as streaming content. It adds support for AMR audio tracks and H.263 video, and also supports LC-AAC.

It is recommended to use the 3GP form of the IsoMedia container for OTA delivery of video and audio media. This container supports MPEG-4 video, as well as AMR and LC-AAC audio, in profiles most suitable for network delivery.

For software developers, the relevant ISO specifications for MPEG-4 multimedia content in relation to Sony Ericsson phones are:

• MP4 container – MPEG-4 Parts 12 & 14 / ISO 14496-12 & 14496-14
• MPEG-4 Visual – MPEG-4 Part 2 / ISO 14496-2
• MPEG-4 Audio – MPEG-4 Part 3 / ISO 14496-3
• MPEG-4 Advanced Video Coding (AVC) – MPEG-4 Part 10 / ISO 14496-10

Audio codec

AMR (Adaptive Multi-Rate) audio compression is the preferred codec for OTA delivery of video media at low bitrates. Developed as a voice codec, it is ideally suited to the low bit rates available. AMR audio performs well at low bitrates in part due to its relatively low sample rate of 8 kHz, which can still reproduce speech well. However, for music-oriented video and musical audio files the AAC codec is more suitable, as it can use a higher sampling rate.

Advanced Audio Coding (AAC) compression has a standard profile known as Low Complexity, or LC-AAC. As the name suggests, it is a computationally cheap form of compression, requiring little processor time. Originally designed for the MPEG-2 format as a more efficient replacement for MP3, it offers higher sound quality than MP3 compression at the same bit rate. AAC offers a second profile called High Efficiency, HE-AAC, or AAC+, that further improves sound quality at bitrates at or below 64 Kbps. Recently a third AAC profile has been developed, known as High Efficiency version 2, HE-AACv2, or eAAC+, which takes the improvements of HE-AAC even further at low bitrates.

For OTA delivery of audio in an AAC format within video files, LC-AAC is the recommended AAC profile to use. Its low processor utilization and consistent processor load at constant bit rates allows processing power to be better spent on the more demanding task of video stream decoding. Audio streams within video files should be encoded at a constant bit rate (CBR encoding).

Several implementations of the LC-AAC codec exist, with varying levels of optimization and performance offered at equal bitrates. For most multimedia content, any standards compliant LC-AAC implementation is sufficient. However, all content, and especially music content, benefits from the higher quality audio encoding offered by the optimized Apple™ or Nero LC-AAC codecs.

When encoding audio-only content such as a music audio file, the other AAC profiles become more relevant. At bit rates of 96 Kbps or higher, LC-AAC provides the highest quality audio. At lower bit rates the two high-efficiency profiles become preferable, when supported by a phone.

High-efficiency (HE-AAC, AAC+) is a more complex encoding method that includes spectral band replication (SBR) alongside AAC compression. HE-AAC at 64 Kbps can provide similar quality to 128 Kbps MP3 while using half the bit rate, making HE-AAC a very efficient codec for streaming and OTA delivery purposes.

Finally there is the third and most recent AAC profile, enhanced-high-efficiency (HE-AACv2, eAAC+), which uses standard AAC compression in combination with SBR and parametric stereo (PS). HE-AACv2 further improves efficiency such that 48 Kbps provides audio quality comparable to 128 Kbps MP3. As the newest form of AAC compression it is currently supported on only a minority of phones.
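The profile guidance above can be summarised as a small decision function. The following is only a sketch: the function name, the profile labels, and the 48 Kbps cut-off for preferring HE-AACv2 are assumptions made for illustration, chosen to match the bit rates quoted in this section rather than any Sony Ericsson API.

```python
# Hypothetical helper: thresholds follow the guidance in this section only.
LC_AAC, HE_AAC, HE_AAC_V2 = "LC-AAC", "HE-AAC", "HE-AACv2"


def choose_aac_profile(bit_rate_kbps, supported_profiles, audio_only=True):
    """Suggest an AAC profile for a target bit rate and a phone's capabilities."""
    supported = set(supported_profiles)
    # Audio inside video files: keep the decoder load low with LC-AAC.
    if not audio_only:
        return LC_AAC
    # At 96 Kbps and above, LC-AAC gives the highest quality.
    if bit_rate_kbps >= 96:
        return LC_AAC
    # Assumed cut-off: prefer HE-AACv2 at 48 Kbps and below when available.
    if bit_rate_kbps <= 48 and HE_AAC_V2 in supported:
        return HE_AAC_V2
    if HE_AAC in supported:
        return HE_AAC
    # Fall back to the most widely supported profile.
    return LC_AAC


everything = {LC_AAC, HE_AAC, HE_AAC_V2}
print(choose_aac_profile(64, everything))
print(choose_aac_profile(48, {LC_AAC}))
```

The fallback to LC-AAC reflects that it is the profile with the broadest phone support; a real service would take the supported set from its own device capability data.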

Video codec

MPEG-4 video streams most commonly use one of two visual profiles, Simple and Advanced Simple. The Simple profile is designed for low bandwidth applications and portable devices. Sony Ericsson phones support the Simple visual profile only.

The Simple profile has four levels defined for it (Level 0 to Level 3). These levels suggest video resolution, frame rate, bit rate, and other values. If you adhere to the strict definition of these levels, Sony Ericsson phones only support Level 0. With some customization of the suggested values, Sony Ericsson phones can also support modified levels up to Level 3.

There are many implementations of the MPEG-4 visual codec, with quite extreme differences in encoding quality. Available MPEG-4 codecs include DivX, XviD, 3ivx, Apple MPEG-4, Nero MPEG-4, and many standard implementations found in various video editing software.

Many MPEG-4 implementations offer the choice of single pass (1 pass) or 2 pass encoding. In 2 pass encoding, the source video stream to be transcoded is read twice. The first pass inspects the overall complexity of the video. On the second pass the encoder assigns a higher bit rate to the more visually complex passages of the video, and a lower bit rate to simple passages. Whilst this is certainly a more efficient use of the available bit rate, 2 pass encoding is not suitable for OTA delivery purposes, due to the inconsistent bit rates it produces.


Although any standard MPEG-4 visual codec that supports the Simple profile can be used to encode video streams for Sony Ericsson phones, not every codec gives optimal results. The recommended MPEG-4 visual codec is XviD, a very mature implementation with specific optimizations for low bit rate encoding in the Simple visual profile. It offers significant quality gains compared to other codecs in low bit rate applications such as mobile devices.

It is important to note that MPEG-4 video on Sony Ericsson phones is only supported in the Simple profile. A great deal of online video content also uses MPEG-4 video codecs, such as the above-mentioned XviD, or DivX. However, such content is likely to be MPEG-4 visual video encoded using the Advanced Simple profile, which is unsupported on Sony Ericsson phones. This means that although these files may share the same codec, they need to be transcoded to MPEG-4 in the Simple profile for use on phones.

Additionally, more recent Sony Ericsson phones include support for the MPEG-4 AVC (Advanced Video Coding) codec. This is also known as the H.264 video codec. This codec improves upon the efficiency of the MPEG-4 visual codec, allowing for improved image quality at the same bitrates compared to MPEG-4 visual. Like MPEG-4 visual, H.264 compression has multiple profiles. It is important to note that Sony Ericsson phones support H.264 in its Baseline profile only.

Recommended video and audio stream parameters for video content

The table below shows suggested values when encoding video and audio streams for OTA delivery of video content on Sony Ericsson phones.

GPRS networks (nominal 53 Kbps bandwidth)

| Parameter | QCIF resolution | High quality QCIF resolution | QVGA resolution |
| --- | --- | --- | --- |
| Video codec | MPEG-4 Visual Simple | H.264 Baseline | N/A |
| Video resolution | 176x144 (QCIF) | 176x144 (QCIF) | N/A |
| Video frame rate (fps) | 8 | 10 | N/A |
| Video bit rate (Kbps) | 28 | 28 | N/A |
| Audio codec | AMR | AMR | N/A |
| Audio bit rate (Kbps) | 8 | 8 | N/A |
| Audio sample rate (kHz) | 8 | 8 | N/A |
| Audio channels | Mono | Mono | N/A |
| Total bit rate (Kbps) | 36 | 36 | N/A |
| Compatible phones (GPRS, EDGE, UMTS) | W950, W900, W880, W850, W830, W810, W800, W710, W610, W600, W550, W300, K810, K800, K790, K750, K610, K550, M600, P990, Z710, Z610 | W950, W880, W850, W830, W710, K810, K800, K790, K610, Z710, Z610 | N/A |
| Notes | Compatible for streaming and download with all phones and network speeds. Low quality image and audio. | Fewer compression artefacts. | N/A |

EDGE and Basic UMTS networks (nominal 128 Kbps bandwidth)

| Parameter | QCIF resolution | High quality QCIF resolution | QVGA resolution |
| --- | --- | --- | --- |
| Video codec | MPEG-4 Visual Simple | H.264 Baseline | MPEG-4 Visual Simple |
| Video resolution | 176x144 (QCIF) | 176x144 (QCIF) | 320x240 (QVGA) |
| Video frame rate (fps) | 15 | 15 | 12 |
| Video bit rate (Kbps) | 70 | 70 | 70 |
| Audio codec | LC-AAC | LC-AAC | LC-AAC |
| Audio bit rate (Kbps) | 16 | 16 | 16 |
| Audio sample rate (kHz) | 22.05 | 22.05 | 22.05 |
| Audio channels | Mono | Mono | Mono |
| Total bit rate (Kbps) | 86 | 86 | 86 |
| Compatible phones (EDGE, UMTS) | W950, W900, W880i, W850, W830, W810, W710, W610, W600, W300, K800, K790, K610, K500, M600, P990, Z710, Z610 | W950, W880i, W850, W830, W710, W610, K810, K800, K790, K610, K550, Z710, Z610 | W950, W900, W880i, W850, W830, K810, K800, K790, M600, P990 |
| Notes | Compatible for streaming with all EDGE/UMTS phones and networks. Improved image and audio quality over GPRS values. | Further improved image and audio quality. | Better image details with improved audio quality. |

Full UMTS networks (nominal 256 Kbps bandwidth)

| Parameter | QCIF resolution | High quality QCIF resolution | QVGA resolution |
| --- | --- | --- | --- |
| Video codec | MPEG-4 Visual Simple | H.264 Baseline | MPEG-4 Visual Simple |
| Video resolution | 176x144 (QCIF) | 176x144 (QCIF) | 320x240 (QVGA) |
| Video frame rate (fps) | 25 | 25 | 25 |
| Video bit rate (Kbps) | 140 | 140 | 140 |
| Audio codec | LC-AAC | LC-AAC | LC-AAC |
| Audio bit rate (Kbps) | 32 | 32 | 32 |
| Audio sample rate (kHz) | 22.05 | 22.05 | 22.05 |
| Audio channels | Mono | Mono | Mono |
| Total bit rate (Kbps) | 172 | 172 | 172 |
| Compatible phones (UMTS) | W950, W900, W880i, W850, W830, K810, K800, K790, K610, M600, P990, Z610 | W950, W880i, W850, W830, K810, K800, K790, K610, Z610 | W950, W900, W880i, W850, W830, K810, K800, K790, M600, P990 |
| Notes | Compatible with full speed UMTS networks on all 3G phones. Improved image quality, motion, and clearer audio. | Further improved image quality with fluid motion and clearer audio. | Better image detail with fluid motion and clearer audio. |

The first column suggests encoding parameters suitable for all phones compatible with each connection speed. The second column suggests parameters to use for phones that support the more efficient (and therefore higher quality) H.264 video codec, at each connection speed. The third and final column suggests parameters for phones that support the higher QVGA display resolution (providing greater image detail), at each connection speed.

The parameters provided are intended to permit real-time streaming at each connection speed, as well as being loose guidelines for OTA downloads. However, because OTA downloads do not require real-time delivery of the data, it is also possible to use a higher bit rate than the connection speed itself allows. For example a video encoded in QCIF/MPEG-4 for real-time streaming on EDGE networks could also be made available as a non-real-time OTA download to all QCIF/MPEG-4 capable phones on GPRS connections.

By the same token, download times should be considered for the connection speeds on which OTA downloads are offered. A video file encoded for full speed UMTS streaming (172 Kbps) and offered as an OTA download takes roughly three to four times its playback duration to download over a GPRS connection.

The process of formatting multimedia video

Typically, there are five stages involved in formatting multimedia video for OTA delivery to Sony Ericsson phones. The process is normally as follows:

1. Edit the video resolution to suit the target phone.
2. Encode the video stream to MPEG-4.
3. Encode the audio stream to LC-AAC or AMR.
4. Multiplex both streams into a 3GP container file.
5. Optionally add a hint track for each data stream, in order to facilitate OTA real-time streaming.

With multimedia video software that supports the 3GP container as an output format, such as Apple Quicktime™ Pro or Final Cut™ Studio, both streams can be transcoded to the required codecs together and multiplexing is performed automatically, with hint track creation available as an option. Stages 2, 3, 4, and 5 are essentially condensed into one stage.

However, for most video composition software, AVI is the recommended output container for the MPEG-4 video stream. Similarly, although some video software supports AAC compression as an output format, for most software it is suggested to export the audio stream to WAV and then encode it to LC-AAC with an external codec or other audio software. Finally, extracting the MPEG-4 video from the AVI, multiplexing it with the audio stream, and adding hint tracks can be achieved with tools that support 3GP container authoring.

When Apple Quicktime™ Pro is used to encode the data streams and multiplex them into a 3GP container, it is recommended to export the multimedia video from the video composition software in the MOV format, which Quicktime Pro can open easily.
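The two workflows described in this section (an integrated tool that exports 3GP directly, versus separate export, encoding, and multiplexing tools) can be written down as a simple planning function. This is a minimal sketch for illustration only: the function name and the wording of each stage are hypothetical, and no particular tool is invoked.

```python
def plan_stages(tool_exports_3gp, for_streaming):
    """Return the ordered formatting stages for one video, as plain descriptions."""
    stages = ["edit the video to suit the target phone"]
    if tool_exports_3gp:
        # Integrated tools condense encoding, multiplexing and hinting.
        export = "encode both streams and multiplex to 3GP in one export"
        if for_streaming:
            export += ", with hint tracks enabled"
        stages.append(export)
        return stages
    # Separate tools: intermediate AVI and WAV files are produced first.
    stages += [
        "export the MPEG-4 video stream to AVI",
        "export the audio stream to WAV",
        "encode the WAV to LC-AAC or AMR with an external encoder",
        "extract the MPEG-4 video stream from the AVI",
        "multiplex the video and audio streams into a 3GP container",
    ]
    if for_streaming:
        stages.append("add a hint track for each data stream")
    return stages


for number, stage in enumerate(plan_stages(False, True), start=1):
    print(number, stage)
```

Hint tracks appear only when streaming is intended, since they are optional for files offered purely as downloads.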

Recommended audio stream parameters for audio-only content

The table below shows suggested values when encoding audio streams for OTA delivery of audio content on Sony Ericsson phones.

GPRS networks (nominal 53 Kbps bandwidth)

| Parameter | High compatibility | Better compression efficiency | Excellent compression efficiency |
| --- | --- | --- | --- |
| Audio codec | LC-AAC | HE-AAC | HE-AACv2 |
| Audio bit rate (Kbps) | 40 | 40 | 40 |
| Audio sample rate (kHz) | 44.1 | 44.1 | 44.1 |
| Audio channels | Mono | Mono | Stereo |
| Compatible phones (GPRS, EDGE, UMTS) | W950, W900, W880, W850, W830, W810, W800, W710, W610, W600, W550, W300, K810, K800, K790, K750, K610, K550, M600, P990, Z710, Z610 | W950, W900, W880, W850, W830, W810, W710, W610, W600, W550, W300, K810, K800, K790, K610, K550, M600, P990, Z710, Z610 | W950, W880, W850, W830, W710, W610, K810, K800, K790, K610, K550, M600, P990, Z710, Z610 |
| Notes | Compatible for streaming and download with all phones and network speeds. | Improved frequency range. | Improved frequency range and stereo sound. |

EDGE and Basic UMTS networks (nominal 128 Kbps bandwidth)

| Parameter | High compatibility | Better compression efficiency | Excellent compression efficiency |
| --- | --- | --- | --- |
| Audio codec | LC-AAC | HE-AAC | HE-AACv2 |
| Audio bit rate (Kbps) | 96 | 64 | 48 |
| Audio sample rate (kHz) | 44.1 | 44.1 | 44.1 |
| Audio channels | Stereo | Stereo | Stereo |
| Compatible phones (EDGE, UMTS) | W950, W900, W880i, W850, W830, W810, W710, W610, W600, W300, K810, K800, K790, K610, K550, M600, P990, Z710, Z610 | W950, W900, W880i, W850, W830, W810, W710, W610, W600, W300, K810, K800, K790, K610, K550, M600, P990, Z710, Z610 | W950, W880i, W850, W830, W710, K810, K800, K790, K610, K550, M600, P990, Z710, Z610 |
| Notes | Compatible for streaming with all EDGE/UMTS capable phones and networks. Suitable for OTA download to GPRS phones. | Maintains quality at a lower bit rate. Suitable for EDGE/UMTS capable phones and networks only. | Maintains quality at even lower bit rate. |

Full UMTS networks (nominal 256 Kbps bandwidth)

| Parameter | High compatibility | Better compression efficiency | Excellent compression efficiency |
| --- | --- | --- | --- |
| Audio codec | LC-AAC | N/A | N/A |
| Audio bit rate (Kbps) | 128 (with VBR encoding) | N/A | N/A |
| Audio sample rate (kHz) | 44.1 | N/A | N/A |
| Audio channels | Stereo | N/A | N/A |
| Compatible phones (UMTS) | W950, W900, W880i, W850, W830, K810, K800, K790, K610, M600, P990, Z610 | N/A | N/A |
| Notes | Compatible for streaming with all full speed UMTS capable phones and networks. Suitable for OTA download to EDGE phones. Superb audio quality. | N/A | N/A |

The GPRS section of the table also indicates which AAC profiles each phone supports.

The suggested parameters are intended to permit real-time streaming at each connection speed. Using a higher bit rate than the connection speed supports is possible for OTA downloads, as real-time data delivery is not required. This allows for CD quality audio to be delivered over GPRS connections. However, the higher quality media is provided at the expense of extended download time on a GPRS connection.

Other considerations

The following is general advice to achieve better results when formatting multimedia content for Sony Ericsson phones.

Sony Ericsson phone screens have a 4:3 aspect ratio. The visual resolution of the source media may be of a different aspect ratio, such as widescreen 16:9. Whilst it is possible to add black borders to the top and bottom of the video to preserve the aspect ratio, due to the limited size of phone screens it is recommended to instead crop pixels from the sides of the video so it becomes 4:3 aspect. Most media content is designed to permit this without losing significant areas of the image, for applications such as 4:3 aspect television broadcasts.
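The side crop described above is simple arithmetic, sketched below. The function name and the returned tuple layout are assumptions for illustration; video editing software normally performs this calculation for you through a crop preset.

```python
def crop_to_aspect(width, height, target_w=4, target_h=3):
    """Return (x_offset, y_offset, new_width, new_height) of a centred crop."""
    # Integer cross-multiplication avoids floating point comparison errors.
    if width * target_h > height * target_w:
        # Source is wider than the target: trim equally from both sides.
        new_width = height * target_w // target_h
        new_height = height
    else:
        # Source is as narrow as, or narrower than, the target: trim top and bottom.
        new_width = width
        new_height = width * target_h // target_w
    x_offset = (width - new_width) // 2
    y_offset = (height - new_height) // 2
    return x_offset, y_offset, new_width, new_height


# A 16:9 source of 1280x720 keeps its full height and loses 160 pixels per side.
print(crop_to_aspect(1280, 720))  # (160, 0, 960, 720)
```

The cropped frame is then resized to the phone resolution as a separate step.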

It is best to start with sources of a higher visual resolution than the target phone. Through intelligent down-sampling to the phone resolution, a lot of detail can be preserved compared to beginning with a low visual resolution source. It is advised to use an anti-aliasing filter when resizing the visual resolution of the media. Most video composition software applies an anti-aliasing filter automatically.

Phones have limited audio amplification power, so it is useful to make the audio stream as loud as possible, without compressing dynamic range. This can be done through a technique known as peak normalization, which increases the volume of the entire audio stream until the loudest moment is at the absolute maximum volume. Most audio editing software contains this feature, and other tools are also available to achieve it.
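As a minimal sketch of peak normalization, the function below applies one gain factor to every sample so that the loudest sample reaches full scale. It assumes samples are floating point values in the range -1.0 to 1.0; the function name is hypothetical, and real audio software operates on complete files rather than Python lists.

```python
def peak_normalize(samples, full_scale=1.0):
    """Scale all samples by one gain so the largest magnitude equals full_scale."""
    peak = max((abs(sample) for sample in samples), default=0.0)
    if peak == 0:
        return list(samples)  # Silence or empty input: nothing to scale.
    gain = full_scale / peak
    # A single gain keeps the ratios between samples, so the dynamic
    # range is not compressed.
    return [sample * gain for sample in samples]


print(peak_normalize([0.1, -0.5, 0.25]))  # [0.2, -1.0, 0.5]
```

Because the gain is set by the single loudest sample, one stray peak limits how much the rest of the stream can be raised.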

Finally, when authoring 3GP media for the purpose of real-time streaming, it is critical that the file contains correct hint tracks for any video or audio streams within the file. Detailed information on this requirement can be found in the next section of this document. Conversely, if you have no intention of streaming the formatted media files, and intend only to offer them as OTA downloads, then omitting the hint tracks entirely can be beneficial. Hint tracks add to the file size by an appreciable fraction, so removing the hint tracks can reduce file size by 5-15% depending on the file contents. This in turn can reduce data usage and download time for OTA downloads.

Content delivery methods

This section describes available methods for delivering media content to Sony Ericsson phones through OTA network connections.

OTA downloads using HTTP

Providing media as a download through HTTP (Hyper-Text Transfer Protocol) is the most compatible way to deliver content. HTTP downloads can be served using any Web server, such as IIS or Apache. The media files can be provided to users as a link on any HTML or WML Web page. Users must wait for the media to be fully transferred before it can be played.

HTTP uses solely TCP packets for communication between the server and client, so this method of delivery is compatible with any NAT implementation used by network operators for client access to the wider Internet.


Third party content providers who wish to deliver their content via the Internet are strongly advised to offer HTTP downloads of media, to ensure their service is reachable by all users.

One large advantage in delivering media via HTTP downloads is that the content can be saved on the phone for later viewing. To protect content from undesired redistribution by the user, Sony Ericsson phones support DRM (Digital Rights Management) for media files. Content can be DRM protected using the Sony Ericsson DRM Packager software, which is available together with its accompanying developer guide at http://developer.sonyericsson.com.

OTA real-time streaming using RTSP

By utilizing RTSP (Real Time Streaming Protocol), media can be played in real-time as it arrives at the phone. It should be noted that media delivered through RTSP is not saved for later viewing on the client phone (and thus does not require DRM protection). An RTSP server is needed to deliver media through RTSP, with recommended implementations including:

• Apple Darwin Streaming Server
• Real Helix Server Unlimited
• PacketVideo pvServer

RTSP utilizes both TCP and UDP packets. Initial connection to the server from a client phone occurs on port 554 (by default) using TCP. This TCP connection is kept open by the client to send commands such as pausing, playback, and seeking within the streamed media. The data streams within the media are sent by the server to the client phone using UDP packets. During the initial TCP connection to the server, the client phone nominates four UDP ports on which to receive the media data streams. Typically these are ports 3456 to 3459.
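In RTSP the client nominates its UDP ports in the Transport header of a SETUP request, one port pair per stream (one port for RTP data and the next for RTCP control), which is how a file with one video and one audio stream comes to use four ports. The sketch below builds such requests as text. It is illustrative only: the function names and the trackID URL suffix are assumptions (track naming varies between servers), and the Session header that a real client adds after the first SETUP is omitted.

```python
def build_setup_request(track_url, cseq, rtp_port):
    """Build an RTSP SETUP request asking for RTP over UDP on a client port pair."""
    if rtp_port % 2 != 0:
        raise ValueError("expected an even RTP port; RTCP uses the next port up")
    return (
        f"SETUP {track_url} RTSP/1.0\r\n"
        f"CSeq: {cseq}\r\n"
        f"Transport: RTP/AVP;unicast;client_port={rtp_port}-{rtp_port + 1}\r\n"
        "\r\n"
    )


def setup_requests(media_url, track_ids, first_port=3456, first_cseq=3):
    """One SETUP request per stream, each with its own consecutive port pair."""
    requests = []
    for index, track_id in enumerate(track_ids):
        track_url = f"{media_url}/trackID={track_id}"  # hypothetical track naming
        requests.append(
            build_setup_request(track_url, first_cseq + index, first_port + 2 * index)
        )
    return requests


for request in setup_requests("rtsp://myrtsphost.com/myvideo.3gp", [1, 2]):
    print(request)
```

With the default first port of 3456, the two requests cover ports 3456 to 3459, matching the typical range mentioned above.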

Because of the use of UDP packets, any NAT implementation working between the server and client phone must be capable of forwarding UDP packets correctly, or be RTSP aware. If a NAT implementation or a firewall exists between the server and client that is incompatible with RTSP, then the client phone successfully creates the TCP connection with the server, but the UDP packets containing the media streams are prevented from reaching the client phone.

Unfortunately the majority of NAT implementations and/or firewalls currently used by network operators for client Internet access do not correctly forward the UDP packets an Internet-based RTSP server sends. As such, media delivery by RTSP is currently most dependable for network providers alone. Network providers can provide RTSP servers for their media content on the local network side of their NAT implementation, thus avoiding use of the NAT host for their RTSP connections, and ensuring successful stream reception.

Third party content providers that wish to deliver media using Internet-based RTSP servers can still do so, but only client phones using a minority of network providers (those with RTSP compatible NAT implementations and/or firewalls) are able to receive streams successfully. For this reason, any third party content providers are strongly recommended to offer HTTP downloads alongside RTSP streams.

If you are a network provider and would like to provide Internet-based RTSP support for your customers, but find a NAT or firewall upgrade to be unviable or uneconomical, there is another option: recent Sony Ericsson phones support the use of streaming proxy servers. An RTP/RTSP proxy server can be run alongside your NAT configuration, as an alternative route for RTSP served data streams to pass from the public Internet to client phones successfully. Providing access to third party media streams by implementing RTP/RTSP proxies could easily generate a significant increase in revenue from customer data usage, for very little investment.


When NAT implementations or firewalls do not pose a problem, RTSP is a powerful and useful tool in delivering content. Real-time playback without waiting for content to fully download can be very popular with phone users. RTSP streams can be provided as links within HTML and WML Web pages for client phones, using the rtsp:// prefix to the server and file address, for example “rtsp://myrtsphost.com/myvideo.3gp”.

The importance of correct hint tracks within media delivered using RTSP

3GP media files must contain a hint track for each data stream, for the file to be successfully streamed via RTSP. For phones to correctly receive and play the streamed audio and video data, the hint tracks must be valid and accurate for each data stream. These hint tracks are only necessary within files delivered using RTSP, and are not necessary for HTTP delivery.

Hint tracks are found within the container file just like audio and video stream tracks. The hint track for each data stream within a file should accurately describe the “payload” of the data stream, as in the type of data within the stream.

Often these hint tracks are automatically generated by the encoding or multiplexing software when a 3GP media file is created, or their creation may be provided as an option. However, not all software generates hint tracks suitable for streaming to phones. Some software creates a generic hint track for a data stream, which may allow the media file to be streamed to a more resilient desktop computer playback client, but the same file does not successfully stream to a less complex phone client, due to the unspecific generic payload descriptor.

As phone clients require accurate payload descriptions within hint tracks to ensure successful RTSP streaming playback, the following table shows the required payload descriptors for the data streams found in 3GP media files for Sony Ericsson phones.

The most common problems found with 3GP files that do not stream correctly are either a complete lack of hint tracks, or equally often that AAC audio tracks have been described as 'mpeg4-generic'. The practical example of formatting music audio for OTA delivery, found later in this document, shows how to correct a generic hint track descriptor.

| Data stream compression codec | Required hint track payload descriptor |
| --- | --- |
| AMR | AMR |
| MPEG-4 Visual Simple Profile | MP4V-ES |
| H.264 Baseline | H264 |
| AAC (LC / HE / HEv2) | MP4A-LATM |
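The table above can be turned into a simple check. The sketch below assumes the payload descriptor is read from an SDP `a=rtpmap` line, such as the one an RTSP server returns when it describes a stream; where the line is obtained from is not covered here, and the function and dictionary names are hypothetical.

```python
# Required payload descriptors, copied from the table above.
REQUIRED_PAYLOAD = {
    "AMR": "AMR",
    "MPEG-4 Visual Simple Profile": "MP4V-ES",
    "H.264 Baseline": "H264",
    "AAC": "MP4A-LATM",
}


def payload_from_rtpmap(line):
    """Extract the encoding name from a line like 'a=rtpmap:96 MP4A-LATM/44100'."""
    prefix = "a=rtpmap:"
    if not line.startswith(prefix):
        raise ValueError(f"not an rtpmap line: {line!r}")
    _payload_type, encoding = line[len(prefix):].split(None, 1)
    # The encoding name is the part before the clock rate.
    return encoding.split("/", 1)[0]


def hint_track_ok(codec, rtpmap_line):
    """True when the described payload matches the one required for the codec."""
    found = payload_from_rtpmap(rtpmap_line)
    # Encoding names are compared without regard to case.
    return found.upper() == REQUIRED_PAYLOAD[codec].upper()


print(hint_track_ok("AAC", "a=rtpmap:96 MP4A-LATM/44100"))      # True
print(hint_track_ok("AAC", "a=rtpmap:96 mpeg4-generic/44100"))  # False
```

The second call shows the generic AAC descriptor being rejected, which is the most common fault described above.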


Other considerations when delivering media OTA

When presenting media for users to download or stream to their phone over network connections, it is advised to visibly label links with the total file size of the media. In the case of OTA download this gives the user an idea of how long the media takes to download. More importantly, due to the cost associated with data usage on many networks, this informs the user of the likely costs they could incur with their operator when streaming or downloading the content.
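The file size shown on a link, and the download time it implies, can be estimated from the total bit rate and the playing time. The following sketch assumes that 1 Kbps is 1000 bits per second and 1 KB is 1000 bytes, and it ignores container and hint track overhead; the function names are hypothetical.

```python
def payload_size_kb(total_bit_rate_kbps, duration_s):
    """Approximate media size in kilobytes for a given bit rate and playing time."""
    return total_bit_rate_kbps * duration_s / 8


def download_time_s(total_bit_rate_kbps, duration_s, link_kbps):
    """Seconds needed to download the media at a sustained link speed."""
    return total_bit_rate_kbps * duration_s / link_kbps


def link_label(title, total_bit_rate_kbps, duration_s):
    """Text for a download link that states the approximate file size."""
    size = payload_size_kb(total_bit_rate_kbps, duration_s)
    return f"{title} ({size:.0f} KB)"


# A 60 second clip at 86 Kbps (the EDGE recommendation for video content).
print(link_label("Example clip", 86, 60))      # Example clip (645 KB)
print(round(download_time_s(86, 60, 53)))      # 97 seconds at nominal GPRS speed
```

At the nominal GPRS speed, a 172 Kbps file takes about 3.2 times its playing time to download; actual throughput may be lower than the nominal figure, so these values are best treated as lower bounds.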

Another consideration is the choice of bit rates used for content. While higher quality media has its advantages, the added bit rate may be a concern given the user's connection speed and data costs. The common solution for content providers is to offer both high quality and low quality versions of the media side by side. This affords users a choice depending upon their data cost and network conditions, and serves to make the media viable to a broader range of users.

Another factor related to the overall size of content made available is the length of the media. Some users are happy to download or stream larger media files such as full length music videos and music audio tracks, but it is recommended to provide media of shorter lengths also. Again this relates to data usage costs as well as the time required to wait for OTA downloads to finish before playing of the media is possible.

The actual content itself should be given some consideration for video media, especially when selecting media for GPRS network speed delivery. The small phone screen combined with heavy compression can make distant shots very difficult to follow. Video media with plenty of close-up shots or a plainly visible subject provide the best entertainment value at lower resolutions and bit rates.

Finally, it is recommended to add some form of metadata (such as an artist and/or title entry) to audio-only media such as music. Not only does this allow better integration of downloaded audio media into the music library on the user phone, but it also provides visible identification of currently streaming audio content in the media player during RTSP playback. It is also recommended to change the 3GP compliant container file extension from .3gp to .m4a for audio-only media, to permit better integration with the music applications on phones.
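When many audio-only files are prepared at once, the extension change can be scripted. This is a small sketch with a hypothetical function name; it only computes the new file name, and leaves the actual renaming and any metadata editing to other tools.

```python
from pathlib import Path


def audio_only_name(path):
    """Return the path with a .3gp extension changed to .m4a; other files are unchanged."""
    original = Path(path)
    if original.suffix.lower() == ".3gp":
        return original.with_suffix(".m4a")
    return original


print(audio_only_name("track01.3gp").name)  # track01.m4a
```

This should be applied only to files known to contain no video stream, since the .m4a extension is recommended above for audio-only media.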

Setting up an RTSP test environment

This section briefly discusses the procedure of evaluating RTSP media delivery using an RTSP server, and common issues that arise.

Software setup

The recommended RTSP server to use when evaluating RTSP content delivery is Darwin Streaming Server by Apple (DSS). It supports multiple operating systems, is free to download, and supports streaming of 3GP files containing all codecs used by Sony Ericsson phones. Please consult the relevant licenses for any software before using it for commercial and/or personal purposes.

Darwin Streaming Server requires one of the following operating systems:


• Microsoft® Windows® NT5 (2000, XP, 2003)
• Red Hat Fedora Core 4 Linux
• Mac OS X 10.3.9 or later

Use of the administrator utilities also requires that a working Perl installation is present.

DSS can be downloaded from the following Web site: http://developer.apple.com/opensource/server/streaming/index.html

Due to the multiple operating systems supported, and the complexities of individual network environments, detailed instructions on the setup of DSS are beyond the scope of this guide. Support can be found in the Streaming-Server-Users mailing list at apple.com, and a detailed explanation of configuration options can be found in the DSS configuration file (streamingserver.xml). However, the default configuration should permit basic testing of RTSP streaming on your network, using the default RTSP port of 554.

Establishing RTSP connectivity with your server and troubleshooting

It is advised to begin testing connectivity using the sample files supplied with DSS, and with local network desktop clients. This minimizes the chances of media, network, or client error during initial setup of your DSS host.

The Apple Quicktime desktop client is ideal for initial testing. Firstly ensure the DSS server is a running process on the server machine. On the desktop client using Quicktime, try to open “rtsp://<server>/sample_50kbit.3gp”, where <server> is your DSS hostname or IP address. Should this step fail then inspect your DSS configuration and/or firewall configuration.
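Before opening the sample in a player, the TCP side of the connection can be checked with a short script that sends an RTSP OPTIONS request and reads the status code of the reply. The sketch below is an assumption-laden illustration: the function names are hypothetical, only the first reply line is examined, and a successful reply says nothing about whether the UDP media streams can reach the client.

```python
import socket
from urllib.parse import urlsplit

DEFAULT_RTSP_PORT = 554


def rtsp_endpoint(url):
    """Return (host, port) for an rtsp:// URL, using port 554 when none is given."""
    parts = urlsplit(url)
    if parts.scheme != "rtsp" or not parts.hostname:
        raise ValueError(f"not an rtsp:// URL: {url!r}")
    return parts.hostname, parts.port or DEFAULT_RTSP_PORT


def build_options_request(url, cseq=1):
    """Build a minimal RTSP OPTIONS request."""
    return f"OPTIONS {url} RTSP/1.0\r\nCSeq: {cseq}\r\n\r\n"


def parse_status(response_text):
    """Return the numeric status code from the first line of an RTSP response."""
    fields = response_text.split("\r\n", 1)[0].split(" ", 2)
    if len(fields) < 2 or not fields[0].startswith("RTSP/"):
        raise ValueError("not an RTSP response")
    return int(fields[1])


def probe(url, timeout=5.0):
    """Connect over TCP, send OPTIONS, and return the server's status code."""
    host, port = rtsp_endpoint(url)
    with socket.create_connection((host, port), timeout=timeout) as connection:
        connection.sendall(build_options_request(url).encode("ascii"))
        reply = connection.recv(4096).decode("ascii", errors="replace")
    return parse_status(reply)
```

Calling `probe("rtsp://<server>/sample_50kbit.3gp")` with your own DSS hostname or IP address in place of `<server>` should return 200 when the server is reachable; a connection error or timeout points to the DSS or firewall configuration.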

Establishing connectivity from the intended client network can also be done with a desktop client. Unless you are a network provider the intended client network will most likely be the Internet.

Once connectivity has been established from the intended client network with a desktop client, OTA connectivity from a phone can be tested using the same rtsp:// address for the supplied sample, using the Web browser in the phone to launch the link.

You may be faced with the result of desktop connectivity but no phone connectivity. The first step of troubleshooting should be to run a Web server on your DSS host and establish that HTTP connectivity from the phone is possible. This demonstrates that the host is reachable from OTA networks.

The next troubleshooting step is to ensure UDP connectivity from a desktop client using the intended client network (typically the Internet). By default most desktop client software, including Quicktime, can use TCP tunnelling to overcome UDP failure in RTSP streaming. This method places the RTP data streams inside TCP packets instead of UDP packets. This adds appreciable bandwidth overhead and is therefore not a viable solution for current OTA connection speeds on phones, so TCP tunnelling is not present in phone client software.

To disable TCP tunnelling in Quicktime, you must go to the Advanced tab of the Quicktime Preferences, and change the Transport Method from Automatic to Custom, and then set the Transport Protocol to UDP. Once this is set, you can then attempt to connect to your DSS host from the intended client network and establish the state of UDP connectivity to your DSS host.


If this step fails, you should investigate your DSS host firewall configuration to ensure outbound UDP is permitted, and check that the network route to the intended client network is free from NAT hosts and further firewalls.

If the step is successful from a desktop client, then the network operator in use with your phone most likely has an RTSP incompatible NAT implementation or firewall. In order to access your RTSP server using OTA networks, you may be able to contact your network provider for the address of any existing RTP proxy server on their network. Otherwise, you need to use an RTSP compatible network provider to reach your DSS host from a phone.

To find an RTSP compatible cellular network provider in your area, you can consult the Streaming-Server-Users mailing list at apple.com.

Practical examples of authoring video media for OTA download and streaming

This is an example of how to format a multimedia video for EDGE/UMTS network delivery to phones supporting the QCIF (176x144) visual resolution. The source multimedia video SEexample.avi is included in the archive with this document. The source media is of different resolution and aspect ratio in order to address the issues of correcting them within this example. Steps to ensure RTSP compatibility of the final media are included.

The parameters used in formatting the video are taken directly from the table in “Recommended video and audio stream parameters for video content” on page 16, specifically the EDGE/UMTS and QCIF resolution combinations. Substitution of parameters from the table for other network speed and resolution combinations can be easily made once you are familiar with the processes involved.

Notes on software used

Sony Vegas™ 6.0 has been selected to illustrate resizing video pixel resolution, normalizing audio stream volume, as well as altering video aspect ratio and frame rate. Apple® Quicktime® Pro has been selected as the tool used to encode both MPEG-4 and AAC data streams, and for multiplexing these streams into a 3GP IsoMedia container suitable for phone playback. Please consult the relevant licenses for any software before using it for commercial and/or personal purposes.

The video editing steps are easily translated to other commercial video composition software such as Adobe™ Premiere™ and After Effects™.

Similarly the audio normalization and encoding can be achieved with audio software such as Adobe Audition, Nero SoundTrax, and many others.

The data stream encoding can be alternatively achieved by external encoders such as the XviD MPEG-4 encoder and Nero AAC encoder. Separate IsoMedia container authoring software such as GPAC Mp4box can be used to place streams inside a 3GP container and add valid hint tracks for streaming.


Other software such as Apple Final Cut Studio can accomplish all processes involved within a single application.

Environment and software set-up

You must be using an NT 5 operating system, such as Microsoft Windows 2000 or Windows XP, with all current service packs.

The following software is required and must be installed on your computer:

• Sony Vegas 6.0 – for editing video properties.
• Apple Quicktime Pro 7.0 – for encoding, muxing, and hinting 3GP content.

At this point, select a folder on your computer as a location for saving files, for example C:\videos. Whatever location you select will be referred to in this document as the working directory. Place a copy of SEexample.avi in this directory.

Creating the video project

Load Vegas 6.0 and create a new project using the menu File – New…. In the New Project dialog set the following options on the Video tab:

• Width: 176
• Height: 144
• Frame Rate: 15.00
• Field order: none (progressive scan)
• Pixel aspect: 1.0 (square)

You may optionally save these settings as a project template by entering a new name in the Template setting, and then clicking the Save icon.

On the Audio tab ensure the following options are set:

• Master bus mode: Stereo
• Number of stereo buses: 1
• Sample rate (Hz): 44,100
• Bit depth: 16

Click the Ok button to create the new project.

Importing the source video and formatting track properties

Return to the menu and select File – Import – Media…. Browse to your working directory and open the SEexample.avi file.

Select SEexample.avi in the Project Media window and drag it to the track list (the empty panel on the left, above the Project Media window).

The video and audio tracks within SEexample.avi are added to the project, and a series of thumbnails illustrating the video track contents appears in the track view.

Right click one of these thumbnails and select Video Event Pan/Crop…. The Event Pan/Crop dialog is shown.

A Preset option appears at the very top of the dialog. Set the following:

• Preset: 4:3 Standard TV aspect ratio

The dotted video frame shown in the dialog now has a 4:3 aspect ratio and is centred on the middle of the video. This makes better use of the limited display size on the phone. Close the dialog.
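The centred 4:3 crop that the preset applies can be reasoned about with a little arithmetic. The following is a minimal sketch in Python, not a description of how Vegas works internally. The function name and the 720x405 source size are hypothetical, since the dimensions of SEexample.avi are not stated in this tutorial.

```python
def centred_crop(src_w, src_h, aspect_w=4, aspect_h=3):
    """Return (x, y, w, h) of the largest centred rectangle with the given aspect ratio."""
    # Compare src_w/src_h with aspect_w/aspect_h by cross-multiplying,
    # which avoids floating point rounding.
    if src_w * aspect_h > src_h * aspect_w:
        # Source is wider than the target: keep the full height, trim the sides.
        h = src_h
        w = src_h * aspect_w // aspect_h
    else:
        # Source is taller than (or equal to) the target: keep the full width.
        w = src_w
        h = src_w * aspect_h // aspect_w
    # Centre the rectangle inside the source frame.
    x = (src_w - w) // 2
    y = (src_h - h) // 2
    return x, y, w, h


# Hypothetical widescreen source frame of 720x405 pixels.
print(centred_crop(720, 405))
```

For a widescreen source the crop removes equal strips from the left and right edges, which is why the subject in the middle of the frame fills more of the small phone display.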

On the track view you also see the audio track displayed as a wave graph underneath the video track thumbnails. Right click the wave graph of the audio track and select Properties…

In the Properties dialog that appears, check the Normalize option. Click Ok to return to the main window. This normalizes the audio volume level, making the best use of the limited volume on phones.

Exporting the formatted video project

On the menu select File – Render as… and a new dialog appears.

In the Render As dialog, browse to your working directory, and set the following:

• Save as type: Quicktime 6 (*.mov)
• File name: SEexample.mov

Click Save to begin rendering the video. Once this is complete, you may close Vegas 6.0, optionally saving the project for future reference.

Transcoding streams to MPEG-4/AAC within a 3GP file

This step requires the use of Quicktime 7 Pro. Open the file that was just created by Vegas, SEexample.mov, using the Quicktime 7 player.

Using the menu go to File – Export... or press CTRL-E. The Save exported file as dialog appears.

Set the following option:

• Export: Movie to 3G

Now click the Options… button and the MPEG-4 Export Settings dialog appears. Set the following option:

• File Format: 3GPP

Ensure that the topmost drop down menu is set to Video and set the following options:

• Video Format: MPEG-4
• Data Rate: 70 Kbps
• Image Size: Current
• Frame Rate: Current
• Key Frame: Every - 100 frames

Change the topmost drop down menu to Audio and set the following options:

• Audio Format: AAC-LC (Music)
• Data Rate: 16 Kbps
• Channels: Mono
• Output Sample Rate: 22.050 kHz
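To see what these settings imply, the combined stream rate, an approximate file size, and the key frame spacing can be worked out as below. This is a rough sketch only: it assumes 1 Kbps means 1000 bits per second, it ignores container and hint track overhead, and the 60 second clip length is a hypothetical value because the duration of SEexample.avi is not stated here.

```python
VIDEO_KBPS = 70           # video data rate from the export settings
AUDIO_KBPS = 16           # audio data rate from the export settings
FRAME_RATE = 15.0         # "Current" frame rate, as set in the Vegas project
KEY_FRAME_INTERVAL = 100  # one key frame every 100 frames


def total_kbps(video_kbps=VIDEO_KBPS, audio_kbps=AUDIO_KBPS):
    """Combined media data rate in Kbps (assumes no container overhead)."""
    return video_kbps + audio_kbps


def estimated_size_bytes(duration_s, kbps):
    """Approximate payload size, assuming 1 Kbps = 1000 bits per second."""
    return int(duration_s * kbps * 1000 / 8)


def key_frame_spacing_s(interval_frames=KEY_FRAME_INTERVAL, fps=FRAME_RATE):
    """Seconds between key frames for a given interval and frame rate."""
    return interval_frames / fps


rate = total_kbps()
print("Combined rate (Kbps):", rate)
print("Approx. bytes for a hypothetical 60 s clip:", estimated_size_bytes(60, rate))
print("Seconds between key frames:", round(key_frame_spacing_s(), 2))
```

The key frame spacing matters for streaming because a player can normally only start or resume decoding cleanly at a key frame.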

To add the correct hint tracks for RTSP streaming, change the topmost drop down menu to Streaming and check the Enable streaming option so that it is selected.

Click the Ok button to return to the Save exported file as dialog. Browse to your working directory and enter a filename of SEexample.3gp.

Click the Save button. Once the process is complete, the file SEexample.3gp is ready for OTA streaming playback on EDGE/UMTS network phones, or OTA download on any network to all phones.
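If you want to confirm that an exported file really is an IsoMedia container, you can read its top-level boxes and look at the brands declared in the ftyp box. The sketch below assumes the standard IsoMedia box layout (a 4-byte big-endian size followed by a 4-byte type). The sample bytes are synthetic and the brand strings in them are placeholders, because the exact brands written by Quicktime are not stated in this tutorial. It does not inspect hint tracks, which are nested deeper inside the moov box.

```python
import struct


def iter_top_level_boxes(data):
    """Yield (box_type, payload) for each top-level box in IsoMedia bytes."""
    pos = 0
    while pos + 8 <= len(data):
        size, box_type = struct.unpack(">I4s", data[pos:pos + 8])
        header = 8
        if size == 1:
            # A size of 1 means a 64-bit size follows the type field.
            if pos + 16 > len(data):
                break
            size = struct.unpack(">Q", data[pos + 8:pos + 16])[0]
            header = 16
        elif size == 0:
            # A size of 0 means the box runs to the end of the data.
            size = len(data) - pos
        if size < header:
            break  # malformed box, stop rather than loop forever
        yield box_type.decode("latin-1"), data[pos + header:pos + size]
        pos += size


def read_brands(data):
    """Return (major_brand, compatible_brands) from the ftyp box, or None."""
    for box_type, payload in iter_top_level_boxes(data):
        if box_type == "ftyp" and len(payload) >= 8:
            major = payload[0:4].decode("latin-1")
            # Bytes 4..8 hold the minor version; brands follow in 4-byte units.
            compatible = [payload[i:i + 4].decode("latin-1")
                          for i in range(8, len(payload) - 3, 4)]
            return major, compatible
    return None


# Synthetic example data (placeholder brands), followed by an empty moov box.
sample = (struct.pack(">I4s", 24, b"ftyp") + b"3gp4" + struct.pack(">I", 0x200)
          + b"3gp4" + b"isom" + struct.pack(">I4s", 8, b"moov"))
print(read_brands(sample))
```

To check a real export, read the start of the file instead, for example with `open("SEexample.3gp", "rb").read(4096)`, and pass those bytes to `read_brands`.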

Transcoding streams to H.264/AAC within a 3GP file

This step requires the use of Quicktime 7 Pro. Open the file that was just created by Vegas, SEexample.mov, using the Quicktime 7 player.

Using the menu, go to File – Export... or press CTRL-E. The Save exported file as dialog appears.

Set the following option:

• Export: Movie to 3G

Now click the Options… button and the MPEG-4 Export Settings dialog appears. Set the following option:

• File Format: 3GPP

Ensure that the topmost drop down menu is set to Video and set the following options:

• Video Format: H.264
• Data Rate: 70 Kbps
• Image Size: Current
• Frame Rate: Current
• Key Frame: Every - 100 frames

Change the topmost drop down menu to Audio and set the following options:

• Audio Format: AAC-LC (Music)
• Data Rate: 16 Kbps
• Channels: Mono
• Output Sample Rate: 22.050 kHz

To add the correct hint tracks for RTSP streaming, change the topmost drop down menu to Streaming and check the Enable streaming option so that it is selected.

Click the Ok button to return to the Save exported file as dialog. Browse to your working directory and enter a filename of SEexample-H264.3gp.

Click the Save button. Once the process is complete, the file SEexample-H264.3gp is ready for OTA streaming playback on EDGE/UMTS networks to H.264 codec compatible phones, or OTA download on any network to all H.264 codec compatible phones.

Practical example of authoring audio-only media for OTA download and streaming

This section shows an example of formatting uncompressed WAV audio for use on Sony Ericsson phones, using each of the recommended formats in the previous section. The source audio file is included with this document (music.wav). The music example is stereo PCM WAV audio at 44.1 kHz.

The parameters used in formatting the audio are taken directly from the table in “Recommended audio stream parameters for audio-only content” on page 19, specifically the GPRS and “better compression efficiency” combination. Parameters for other network speed and compression efficiency combinations can easily be substituted from the table once you are familiar with the processes involved.

Notes on software used

Sony Sound Forge 8.0 has been selected to illustrate the process of normalizing audio volume and encoding to MP3 format. The Nero AAC reference encoder has been selected for encoding to various AAC profiles. Neroaactag.exe is used for insertion of ID tag metadata within encoded files. Apple Quicktime is used to correct the payload descriptor of the hint track for streaming purposes. Please consult the relevant licenses for any software before using it for commercial and/or personal purposes.

Command line tools are used in this example, although fully featured software could be used instead in a production environment. For example, audio normalization and compression can be achieved in audio software such as Nero SoundTrax.

For batch compression of digital audio with the preservation of metadata text, software such as dBpoweramp can be used.

Software developers may be interested to know that source code is available for some of the tools.

Environment and software setup

You must be using an NT 5 operating system, such as Microsoft Windows 2000 or Windows XP, with all current service packs.

The following software is needed to complete this example:

• Sony Sound Forge 8.0 – for audio normalization and MP3 file creation
• Apple Quicktime Pro 7.0 – for generating a correct hint track
• Nero Digital Audio AAC codec – a command line reference AAC/AAC+ codec – http://www.nero.com/nerodigital/eng/Nero_Digital_Audio.html

At this point select a folder on your computer as a location for saving files, for example C:\audio. Whatever location you select is referred to in this document as the working directory.

Place copies of neroaactag.exe and NeroAacEnc.exe into the working directory. If you know that your CPU supports the SSE2 instruction set, use NeroAacEnc_SSE2.exe instead for faster AAC encoding times.

Place a copy of music.wav in the working directory.
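Before encoding, it can be useful to confirm that a source file matches the format described above (stereo PCM at 44.1 kHz). The following is a minimal sketch using Python's standard `wave` module. The helper names are hypothetical, and the example builds a tiny silent WAV in memory so that it runs without music.wav being present. The bit depth of music.wav is not stated in this tutorial, so it is reported but not checked.

```python
import io
import wave


def describe_wav(source):
    """Return (channels, sample_rate_hz, bits_per_sample) for a PCM WAV file."""
    with wave.open(source, "rb") as wav:
        return wav.getnchannels(), wav.getframerate(), wav.getsampwidth() * 8


def matches_source_format(info):
    """True if the audio is stereo at 44.1 kHz, as described for music.wav."""
    channels, rate, _bits = info
    return channels == 2 and rate == 44100


# Self-contained demonstration: write ten frames of 16-bit stereo silence.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as out:
    out.setnchannels(2)
    out.setsampwidth(2)      # 2 bytes per sample = 16 bits
    out.setframerate(44100)
    out.writeframes(b"\x00\x00\x00\x00" * 10)
buffer.seek(0)

info = describe_wav(buffer)
print(info, matches_source_format(info))
```

To inspect the real file, pass its path instead of the in-memory buffer, for example `describe_wav(r"C:\audio\music.wav")`.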

Sound Forge 8.0 should be installed on the computer.

Normalizing the audio volume

For this step you must open Sound Forge 8.0. Using the menu select File – Open… and browse to your working directory. Open the music.wav file.

Using the menu, go to Process – Normalize… and in the dialog that appears, ensure Normalize using: is set to Peak level. Click Ok to normalize the volume level.
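Peak level normalization scales the whole recording by a single gain so that its loudest sample reaches a chosen ceiling. The sketch below illustrates the idea on a short list of 16-bit sample values. It is not a description of Sound Forge's implementation, and the default target of full scale (32767) is an assumption, since the target level is not stated in this tutorial.

```python
def peak_normalize(samples, target_peak=32767):
    """Scale integer PCM samples so the loudest one reaches target_peak."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return list(samples)  # silence (or no samples): nothing to scale
    gain = target_peak / peak
    # Apply the same gain to every sample, then clamp to the 16-bit range.
    return [max(-32768, min(32767, round(s * gain))) for s in samples]


# Hypothetical quiet recording whose loudest sample is only at about half scale.
quiet = [0, 4000, -16000, 12000]
print(peak_normalize(quiet))
```

Because every sample receives the same gain, the relative dynamics of the recording are unchanged; only the overall level rises.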

On the menu select File – Save to save the normalized version of music.wav to your working directory.

Encoding to HE-AAC

First, open a command prompt and set the current directory to the working directory.

On your Windows Start Menu, select Programs – Accessories – Command Prompt to open the command window. Enter the command:

CD /d <your working directory>

where <your working directory> is the path of your working directory, for example "CD /d C:\audio".

The following steps create copies of the audio content in the available AAC profiles. Encoding to HE-AAC is achieved using the NeroAacEnc.exe command line encoder, with the following command:

Neroaacenc.exe -cbr 64000 -he -if music.wav -of music_HEAAC.m4a

This encodes at a constant bit rate of 64 Kbps with the HE-AAC profile selected, and writes the audio to a file called music_HEAAC.m4a. Although neroaacenc.exe has a -hinttrack argument, this performs hinting with an “mpeg4-generic” payload descriptor, which is unsuitable for streaming to phones. In the next step we use Apple Quicktime to add correct hint tracks and place the audio stream into a 3GP compliant container.

To encode to 96 Kbps LC-AAC instead of HE-AAC we would use the command:

Neroaacenc.exe -cbr 96000 -lc -if music.wav -of music_LCAAC.m4a

And to encode to 48 Kbps HE-AACv2 the following command would be used:

Neroaacenc.exe -cbr 48000 -hev2 -if music.wav -of music_HEAACv2.m4a
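When several profiles have to be produced, the three commands above can be generated from one small table. The sketch below only builds the argument lists and prints them; it uses no encoder options other than those shown in this section. The table, the function name, and the output name music_LCAAC.m4a for the LC-AAC variant are hypothetical choices made for this illustration.

```python
# Profile name -> (encoder flag, constant bit rate in bits per second, output file)
PROFILES = {
    "HE-AAC": ("-he", 64000, "music_HEAAC.m4a"),
    "LC-AAC": ("-lc", 96000, "music_LCAAC.m4a"),   # hypothetical output name
    "HE-AACv2": ("-hev2", 48000, "music_HEAACv2.m4a"),
}


def encoder_args(profile, source="music.wav", encoder="NeroAacEnc.exe"):
    """Build the argument list for one profile, mirroring the commands above."""
    flag, bitrate, output = PROFILES[profile]
    return [encoder, "-cbr", str(bitrate), flag, "-if", source, "-of", output]


for name in PROFILES:
    # Print each command as it would be typed at the command prompt.
    print(" ".join(encoder_args(name)))
```

Each list can be passed to `subprocess.run(...)` from within the working directory if you prefer to drive the encoder from a script rather than typing the commands by hand.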

Inserting a correct hint track for the audio stream

This step requires the use of Quicktime 7 Pro. Open the file that was just created by Neroaacenc.exe, music_HEAAC.m4a, using the Quicktime 7 player.

Using the menu go to File – Export... or press CTRL-E. The Save exported file as dialog appears.

Set the following option:

• Export: Movie to 3G

Now click the Options… button and the MPEG-4 Export Settings dialog appears. Set the following option:

• File Format: 3GPP

Ensure the topmost drop down menu is set to Video and set the following option:

• Video Format: None

Change the topmost drop down menu to Audio and set the following option:

• Audio Format: Pass through

To add the correct hint tracks for RTSP streaming, change the topmost drop down menu to Streaming and check the Enable streaming option so that it is selected.

Click the Ok button to return to the Save exported file as dialog. Browse to your working directory and enter a filename of music_HEAAC_h.3gp.

Click the Save button. Once the process is complete, the file music_HEAAC_h.3gp is ready for OTA streaming playback on EDGE/UMTS networks with all HE-AAC compatible phones, or OTA download on any network with all HE-AAC codec compatible phones. However, to improve music player integration the following step is recommended.

Adding metadata for artist and title information and changing extension

Return to the command window we opened earlier, and ensure your working directory is still the current directory (the current directory is written to the left of the cursor).

Enter the following command to add artist and title tags to the 3GP container metadata:

NeroAacTag.exe music_HEAAC_h.3gp -meta:artist="OTA Tutorial" -meta:title="HE-AAC Example"

Although we are using the 3GP form of the IsoMedia container, giving the output file an M4A extension will cause phones to recognize the file as music, which will improve the way the Walkman® Player or Music Player on the phone works with the file compared to files with the 3GP filename extension.

Enter the following command to rename the file extension to .m4a:

Rename music_HEAAC_h.3gp music_HEAAC_h.m4a
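The tagging and renaming steps can also be scripted. The sketch below builds the NeroAacTag.exe argument list using only the options shown above, and renames a file to the .m4a extension with Python's `pathlib`. The function names are hypothetical, and the demonstration renames an empty placeholder file in a temporary directory rather than a real encoded file.

```python
import tempfile
from pathlib import Path


def tag_args(path, artist, title, tagger="NeroAacTag.exe"):
    """Build the tagging argument list, mirroring the command shown above."""
    return [tagger, str(path), f"-meta:artist={artist}", f"-meta:title={title}"]


def as_m4a(path):
    """Rename a file so that it carries the .m4a extension; return the new path."""
    source = Path(path)
    target = source.with_suffix(".m4a")
    source.rename(target)
    return target


print(tag_args("music_HEAAC_h.3gp", "OTA Tutorial", "HE-AAC Example"))

# Demonstration with an empty placeholder file in a temporary directory.
with tempfile.TemporaryDirectory() as workdir:
    placeholder = Path(workdir) / "music_HEAAC_h.3gp"
    placeholder.write_bytes(b"")
    renamed = as_m4a(placeholder)
    print(renamed.name)
```

When the argument list is passed to `subprocess.run(...)`, the artist and title values do not need the surrounding quotation marks that the command prompt requires.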

The file music_HEAAC_h.m4a now shows correct artist and title information when managed with phone music applications, and when streamed via RTSP.

Conclusion

Content providers, operators and service providers, and other interested parties, have the fundamental information required for successfully formatting multimedia video content to an impressive level of quality, for playback on Sony Ericsson phones through OTA delivery mechanisms.

This extends to the exact specifications and practical details of data stream compression and using suitable containers. Information is also provided on the requirements, limitations, and benefits of each OTA delivery method. Optional extra processes and suggestions can be employed to further enrich the level of quality at which the multimedia video content is delivered.

Those with no prior experience of formatting digital multimedia video or networking will hopefully find the background information sufficiently enlightening to understand the more specific requirements of Sony Ericsson phones, and to feel confident about the task of formatting and delivering multimedia video for them.

Software developers can access the noted ISO standards that define the container formats and compression algorithms employed. Developers can also refer to the available source code of the tools employed in this document as successful examples of practical implementations of those standards.
