MapReduce Jobs For Video Conversion
Ankur Gupta, Harish Kumar Narware, Harsh Agrawal, Sourabh Gupta
11/23/09
Agenda
• Motivation
• Introduction
• Why MapReduce?
• What is FFmpeg?
• Project Description
• Challenges Faced
• Load Balancing (Optimization)
• Practical Use
Motivation
• MapReduce is a software framework introduced by Google.
• It supports distributed computing on large datasets on clusters of computers.
• The framework is inspired by the map and reduce functions commonly used in functional programming.
• For example, MapReduce can sort a petabyte of data in only a few hours.
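To make the map/reduce idea concrete, here is a small sketch using Java streams (plain Java, not Hadoop): map turns each element into a new value, and reduce folds those values into a single result. The class and method names are our own, chosen only for illustration.

```java
import java.util.List;

final class MapReduceIdea {
    // map: each word -> its length; reduce: add all the lengths together
    static int totalLength(List<String> words) {
        return words.stream()
                .map(String::length)        // map step
                .reduce(0, Integer::sum);   // reduce step, starting from 0
    }

    public static void main(String[] args) {
        System.out.println(totalLength(List.of("map", "reduce", "hadoop"))); // prints 15
    }
}
```

MapReduce applies the same two steps, but runs the map calls in parallel across a cluster and groups intermediate results by key before reducing them.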
Introduction
• In this project, we have to convert a huge number of video files from one format to another.
• We use the MapReduce framework.
• We also use the open-source video converter FFmpeg.
• The data is read from and stored in HDFS.
Why MapReduce?
• We need MapReduce because the number of video files to be converted is huge.
• Using the parallelism provided by MapReduce, we can complete the task in less time.
• Distributed computing also makes better use of the available resources.
What is FFmpeg?
• FFmpeg is a complete, cross-platform solution to record, convert and stream audio and video.
• FFmpeg is free software, licensed under the LGPL or GPL.
• FFmpeg can be downloaded (for example, from its SVN repository) via the following link:
http://ffmpeg.org/download.html
Project Description
• Video files in a particular format, say AVI, are stored in HDFS.
• We accept an input file containing the HDFS locations of the video files and the format into which each file has to be converted.
• The video conversion is done in the Map phase.
• In the Map phase, we first download the input video file from HDFS to the local file system using the FileSystem API (copyToLocalFile()).
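The exact layout of the input file is not given here. Assuming one record per line, with the HDFS path followed by whitespace and then the target format, a map task could parse each record roughly like this. ConversionRecord, parse and the line layout are hypothetical, not the authors' code.

```java
import java.util.Locale;

final class ConversionRecord {
    final String sourcePath;   // HDFS location of the video file
    final String targetFormat; // desired output format, e.g. "mp4"

    ConversionRecord(String sourcePath, String targetFormat) {
        this.sourcePath = sourcePath;
        this.targetFormat = targetFormat;
    }

    // Assumed layout: "<hdfs-path><whitespace><target-format>".
    // Splitting at the LAST space or tab keeps paths that contain spaces intact.
    static ConversionRecord parse(String line) {
        String trimmed = line.trim();
        int sep = Math.max(trimmed.lastIndexOf(' '), trimmed.lastIndexOf('\t'));
        if (sep <= 0) {
            throw new IllegalArgumentException("Expected '<path> <format>': " + line);
        }
        String path = trimmed.substring(0, sep).trim();
        String format = trimmed.substring(sep + 1).toLowerCase(Locale.ROOT);
        if (format.startsWith(".")) {
            format = format.substring(1); // treat ".mp4" and "mp4" the same
        }
        return new ConversionRecord(path, format);
    }
}
```

If the records were read line by line (as with Hadoop's TextInputFormat), each line would arrive as the map value, and sourcePath would be the HDFS path handed to copyToLocalFile().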
Project Description (cont.)
• We use FFmpeg to convert this file into the given format.
• The new file is then uploaded back into HDFS using copyFromLocalFile(), into the same directory and with the same name, but with the extension of the new video format.
• The HDFS path of the new file is then emitted as the output of the Map task.
• No Reduce phase is needed.
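The naming rule above (same directory, same name, new extension) can be sketched as a small string helper. This is an illustrative assumption about how the name is derived; in an actual job the resulting string would typically be wrapped in Hadoop's Path before calling copyFromLocalFile().

```java
final class OutputPaths {
    // /videos/movie.avi + "mp4" -> /videos/movie.mp4
    static String convertedPath(String sourcePath, String targetFormat) {
        int slash = sourcePath.lastIndexOf('/');
        int dot = sourcePath.lastIndexOf('.');
        // Treat the dot as an extension separator only if it lies in the file name,
        // not in a directory name such as /data/v1.2/clip
        String base = (dot > slash) ? sourcePath.substring(0, dot) : sourcePath;
        return base + "." + targetFormat;
    }

    public static void main(String[] args) {
        System.out.println(convertedPath("hdfs://nn:8020/videos/movie.avi", "mp4"));
    }
}
```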
Commands
• Configuration config = new Configuration();
• FileSystem hdfs = FileSystem.get(config);
• hdfs.copyToLocalFile(srcPath, dstPath);
• copyToLocalFile copies the file at srcPath in HDFS to dstPath on the local file system.
• hdfs.copyFromLocalFile(srcPath, dstPath);
• copyFromLocalFile copies the file at srcPath on the local file system to dstPath in HDFS. No file should already be present at dstPath in HDFS.
Challenges We Faced!
• An interesting problem we encountered was that we could not get the whole converted file when running FFmpeg commands in the Map task.
• The reason is that when we run a command from a Java program, it executes in a separate child process, and our program was exiting before the child process could finish. Therefore, only part of the file was being converted.
How We Solved the Problem
• We attach a data stream to the standard output of the running ffmpeg command.
• A while loop keeps reading from this stream and breaks only when it returns null, that is, when the conversion is complete.
• In this way, the map task waits for the child process to finish the conversion.
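The fix described above can be sketched with the standard ProcessBuilder API. This version is an assumption, not the authors' code: it merges the child's stderr into stdout (FFmpeg writes its log to stderr, and an undrained pipe can stall the child), reads until readLine() returns null, and then calls waitFor() so the map task continues only after the process has actually exited and its exit code is known.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

final class ExternalCommand {
    // Exit status plus every line the child printed (stdout and stderr merged).
    static final class Result {
        final int exitCode;
        final List<String> output;

        Result(int exitCode, List<String> output) {
            this.exitCode = exitCode;
            this.output = output;
        }
    }

    static Result runAndWait(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectErrorStream(true); // merge stderr so a single loop drains everything
        Process process = builder.start();

        List<String> output = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            // readLine() returns null only at end of stream, i.e. once the child has closed its output
            while ((line = reader.readLine()) != null) {
                output.add(line);
            }
        }
        // End of stream usually means the child is done; waitFor() makes that certain and yields the exit code.
        int exitCode = process.waitFor();
        return new Result(exitCode, output);
    }
}
```

In a map task this might be called as `ExternalCommand.runAndWait(List.of("ffmpeg", "-y", "-i", "input.avi", "output.mp4"))`, where the file names are placeholders for the local copies and `-y` tells FFmpeg to overwrite an existing output file. Checking `exitCode != 0` before uploading avoids pushing a failed conversion back into HDFS.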
Challenges We Faced!
• The main challenge was to distribute the input splits properly.
• Each input split should contain file paths such that the total amount of video data to be converted is approximately the same for every split.
• For example, no input split should contain the paths of all the large video files; if that happens, the load becomes unbalanced.
Load Balancing (Optimization)
• Load balancing between map tasks is crucial.
• One approach:
• We sorted the records in the input file by the size of each file in HDFS, using a MapReduce job.
• We then rewrote the input file by taking the first file from the sorted list, then the last, then the second, then the second-last, and so on.
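The reordering step can be sketched in memory as follows. The authors did the sorting with a MapReduce job; this plain-Java version (VideoFile and interleave are hypothetical names) only illustrates the resulting order: smallest, largest, second smallest, second largest, and so on.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class InterleaveBySize {
    static final class VideoFile {
        final String path;
        final long sizeBytes;

        VideoFile(String path, long sizeBytes) {
            this.path = path;
            this.sizeBytes = sizeBytes;
        }
    }

    static List<VideoFile> interleave(List<VideoFile> files) {
        List<VideoFile> sorted = new ArrayList<>(files);
        sorted.sort(Comparator.comparingLong(f -> f.sizeBytes)); // ascending by size
        List<VideoFile> result = new ArrayList<>(sorted.size());
        int lo = 0;
        int hi = sorted.size() - 1;
        while (lo <= hi) {
            result.add(sorted.get(lo++));      // next smallest
            if (lo <= hi) {
                result.add(sorted.get(hi--));  // next largest
            }
        }
        return result;
    }
}
```

For sizes 1, 3, 5 and 9 the output order is 1, 9, 3, 5, so consecutive chunks of two files hold 10 and 8 units of video data instead of 4 and 14.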
Load Balancing (continued)
• So, when each map task is given an equal number of video files, the total video data converted per map task becomes more even.
• But it is still not the best method, because equal file counts do not guarantee equal total sizes.
Load Balancing (continued)
• MapReduce provides a method to set the number of map tasks for a given job:
• job.setNumMapTasks(x); (job is a JobConf)
• The parameter is only a hint for the number of map tasks. The actual number depends on the implementation of the getSplits() method of the job's InputFormat class.
• A lower bound on the split size can be set via mapred.min.split.size.
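To see why the count is only a hint, here is the split-size rule as we understand it from the old-API FileInputFormat (org.apache.hadoop.mapred): the hint only sets a goal size, which is then capped by the block size and floored by mapred.min.split.size. The sketch reproduces just that arithmetic in plain Java; the numbers are made up.

```java
final class SplitSizeMath {
    // goal size derived from the map-task hint (a hint of 0 is treated as 1)
    static long goalSize(long totalSize, int numSplitsHint) {
        return totalSize / (numSplitsHint == 0 ? 1 : numSplitsHint);
    }

    // splitSize = max(minSize, min(goalSize, blockSize))
    static long computeSplitSize(long goalSize, long minSize, long blockSize) {
        return Math.max(minSize, Math.min(goalSize, blockSize));
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long goal = goalSize(1000 * mb, 4);                                  // hint of 4 -> 250 MB goal
        System.out.println(computeSplitSize(goal, 1, 64 * mb) / mb);        // 64: capped by block size
        System.out.println(computeSplitSize(goal, 128 * mb, 64 * mb) / mb); // 128: raised by min size
    }
}
```

It also suggests why the default behaviour does not balance video work here: the input file only lists paths, so it is tiny, and a byte-based split of it says nothing about how large the listed videos are.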
Input Format
The InputFormat of a job is used to:
• Validate the input specification of the job.
• Split up the input file(s) into logical InputSplit instances, each of which is then assigned to an individual Mapper.
• Provide the RecordReader implementation used to glean input records from the logical InputSplit for processing by the Mapper.
• The default implementation splits the input into logical InputSplit instances based on the total size, in bytes, of the input files.
New Approach!
• Provide our own InputFormat implementation for the map tasks.
• In the getSplits() method of this InputFormat class, we divide the input into splits based on the total size of the video files assigned to each map task.
• In this method, we check the size in HDFS of each file listed in the input. When the total size of files for an InputSplit exceeds a certain limit, we create a new InputSplit.
• An InputSplit is logical: it consists of the path of the input file, a start offset and an end offset.
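A minimal sketch of the size-based getSplits() idea, in plain Java and under assumptions: the input file lists one video path per line, video sizes come from a map that stands in for looking up each file's length in HDFS, and a split is closed as soon as its video total reaches the limit. Each split is returned as a byte range of the input file, matching the path/start/end description above; turning these ranges into Hadoop InputSplit objects and records is not shown. SizeBasedSplitter and Range are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class SizeBasedSplitter {
    // One logical split: the byte range [start, end) of the input list file.
    static final class Range {
        final long start;
        final long end;

        Range(long start, long end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + end + ")";
        }
    }

    // lines: one HDFS video path per line; videoSizes: bytes per video; limit: video bytes per map task
    static List<Range> split(List<String> lines, Map<String, Long> videoSizes, long limit) {
        List<Range> ranges = new ArrayList<>();
        long offset = 0;      // current byte offset in the list file
        long splitStart = 0;  // where the open split begins
        long videoBytes = 0;  // video data accumulated in the open split
        for (String line : lines) {
            offset += line.getBytes(StandardCharsets.UTF_8).length + 1; // +1 for '\n'
            videoBytes += videoSizes.get(line);
            if (videoBytes >= limit) {   // limit reached: close this split
                ranges.add(new Range(splitStart, offset));
                splitStart = offset;
                videoBytes = 0;
            }
        }
        if (offset > splitStart) {       // leftover files form the last split
            ranges.add(new Range(splitStart, offset));
        }
        return ranges;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("a.avi", "b.avi", "c.avi", "d.avi");
        Map<String, Long> sizes = Map.of("a.avi", 700L, "b.avi", 400L, "c.avi", 300L, "d.avi", 200L);
        System.out.println(split(lines, sizes, 1000)); // prints [[0, 12), [12, 24)]
    }
}
```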
New Approach! (cont.)
• Here, we can define exactly the number of map tasks and the input for each map task.
• Set the InputFormat for the job: job.setInputFormat(CustomInputFormat.class);
Practical Use
• There are many websites that convert video files from one format to another online. They could use this project to do so.
• Most of these websites do not use MapReduce at present.
• Examples of such sites are:
• http://www.zamzar.com/
• http://www.any-video-converter.com/products/for_video_free/
• http://www.getafreelancer.com/projects/PHP-Python/Youtube-API-video-conversion-website.html
• http://vixy.net/
Questions?