Optimization of Arithmetic Coding By Kolluru Krishna Bharath

Outline

Objective
Motivation
Optimizations w.r.t. platforms
◦ Predication
Optimizations w.r.t. algorithm (Arithmetic Coding)
◦ Sequential AC
◦ Parallel AC
Conclusion

Objective

To study the performance of the algorithm on different platforms.

To optimize the algorithm to achieve better performance.

Motivation

MQ Coding

MQ coding:
1. Resizing of the interval, which eliminates the need for high precision in the range calculation.
2. Adaptive probability (MPS) calculation, which requires only one pass.
3. Integer arithmetic.
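The three properties above can be illustrated with a minimal sketch of a generic adaptive binary arithmetic coder. This is not the MQ coder itself, which uses a probability state table and avoids multiplication; the 32-bit register width, the count-based probability estimate, and all function names here are assumptions made for illustration.

```python
PRECISION = 32                      # register width in bits (an assumption)
WHOLE = 1 << PRECISION
HALF, QUARTER = WHOLE >> 1, WHOLE >> 2


def split(low, high, c0, c1):
    """Last integer of the sub-interval given to symbol 0 (integer arithmetic only)."""
    return low + (high - low + 1) * c0 // (c0 + c1) - 1


def rescale(low, high):
    """Offset to subtract before doubling the interval, or None if no resizing is needed."""
    if high < HALF:
        return 0
    if low >= HALF:
        return HALF
    if low >= QUARTER and high < 3 * QUARTER:
        return QUARTER
    return None


def encode(bits):
    low, high, c0, c1, pending, out = 0, WHOLE - 1, 1, 1, 0, []
    for bit in bits:
        mid = split(low, high, c0, c1)
        if bit == 0:
            high, c0 = mid, c0 + 1      # counts adapt as symbols arrive (one pass)
        else:
            low, c1 = mid + 1, c1 + 1
        while (offset := rescale(low, high)) is not None:
            if offset == QUARTER:
                pending += 1            # undecided bit, resolved by the next emitted bit
            else:
                first = 1 if offset == HALF else 0
                out += [first] + [1 - first] * pending
                pending = 0
            low, high = 2 * (low - offset), 2 * (high - offset) + 1
    first = 0 if low < QUARTER else 1   # flush: pin down a value inside [low, high]
    out += [first] + [1 - first] * (pending + 1)
    return out


def decode(code, n):
    stream = iter(code)
    read = lambda: next(stream, 0)      # pad with zeros past the end of the code
    low, high, c0, c1, value = 0, WHOLE - 1, 1, 1, 0
    for _ in range(PRECISION):
        value = 2 * value + read()
    bits = []
    for _ in range(n):
        mid = split(low, high, c0, c1)
        if value <= mid:
            bits.append(0)
            high, c0 = mid, c0 + 1
        else:
            bits.append(1)
            low, c1 = mid + 1, c1 + 1
        while (offset := rescale(low, high)) is not None:
            low, high = 2 * (low - offset), 2 * (high - offset) + 1
            value = 2 * (value - offset) + read()
    return bits
```

Because the interval is doubled whenever it becomes too narrow, the working registers never need more than PRECISION bits however long the message is. The sketch assumes messages far shorter than 2^30 symbols, so that the counts stay small relative to the interval width.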

Machine-specific optimizations

Compilers take advantage of the underlying architecture.

Examples of machine-specific optimizations are
◦ Predication
◦ Software pipelining

The ARM core supports predication.

Predication

Objective
◦ Eliminate hard-to-predict branches.
◦ Increase ILP.

Advantages of using ARM:
◦ Supports predication (condition codes), and
◦ Gives the option of setting the flags on every arithmetic and logical instruction.

Predication - Examples

UPDATING THE COUNT
    LDRB  R4,[R1]      ; R4 HAS THE SYMBOL.
    LDRB  R5,[R2]      ; R5 HAS COUNT_0, THE NUMBER OF ZEROS.
    LDRB  R6,[R3]      ; R6 HAS COUNT_1, THE NUMBER OF ONES.
    CMP   R4,#0        ; CHECK WHETHER THE VALUE READ FROM THE SOURCE IS 0 OR 1.
    ADDEQ R5,R5,#1     ; IF ZERO, ADD 1 TO COUNT_0.
    ADDNE R6,R6,#1     ; IF ONE, ADD 1 TO COUNT_1.
    STRB  R5,[R2]      ; COUNT_0 HAS BEEN UPDATED WITH THE NEW VALUE.
    STRB  R6,[R3]      ; COUNT_1 HAS BEEN UPDATED WITH THE NEW VALUE.

EVALUATING THE MPS AND THE LPS
    CMP   R5,R6        ; CHECK IF COUNT_0 >= COUNT_1.
    MOVGE R4,#0        ; IF YES, THE MPS IS 0.
    MOVLT R4,#1        ; IF NO, THE MPS IS 1.
    LDR   R0,=(MPS)
    STRB  R4,[R0]

IF NO PREDICATION, THE CODE WOULD LOOK LIKE
    LDR   R0,=(MPS)
    CMP   R5,R6
    BGE   LOOP1        ; HIGHLY UNPREDICTABLE BRANCH.
    MOV   R4,#1
    STRB  R4,[R0]
    B     EXIT
LOOP1:
    MOV   R4,#0
    STRB  R4,[R0]
EXIT
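The data flow of the predicated listing can be mirrored in a higher-level language as a rough sketch. Python has no predicated instructions, so this only shows the idea of computing a condition once and letting it select the effect of straight-line code. The function names are hypothetical, and the tie-breaking rule (MPS is 0 when COUNT_0 >= COUNT_1) follows the branch version of the listing.

```python
def update_counts(symbol, count_0, count_1):
    """Both additions always run; the comparison result decides which one changes a count."""
    is_zero = int(symbol == 0)          # plays the role of the EQ condition after CMP
    return count_0 + is_zero, count_1 + (1 - is_zero)


def select_mps(count_0, count_1):
    """MPS is 0 when zeros are at least as frequent as ones, otherwise 1."""
    return int(count_0 < count_1)


def adapt(symbols):
    """Run the count update over a symbol stream and report the final MPS."""
    count_0 = count_1 = 0
    for s in symbols:
        count_0, count_1 = update_counts(s, count_0, count_1)
    return count_0, count_1, select_mps(count_0, count_1)
```

For instance, adapt([1, 0, 1, 1, 0]) counts two zeros and three ones, so the MPS is 1.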

Predication - Continued

What is the advantage of using predication for MQ coding?
◦ The algorithm has small loops, and
◦ The branches are highly unpredictable.

This favors predication.

Performance analysis shows that replacing a conditional branch instruction with predicated instructions gives a speedup of 2.75.

Predication - Continued

Can predication be used for all algorithms?

No. Predication pays off only when the code has certain characteristics, such as
◦ Highly unpredictable branches
◦ Small loops (preferable); otherwise the cost of executing both directions could exceed the cost of a misprediction.
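The trade-off described above can be put into a toy cost model. This is a simplified sketch, not a measurement: the cycle counts, the misprediction rate, and the penalty used in the usage note are hypothetical values, and the model ignores the cost of the branch instruction itself.

```python
def branch_cost(path_cycles, mispredict_rate, penalty_cycles):
    """Expected cycles when one path runs and a misprediction stalls the pipeline."""
    return path_cycles + mispredict_rate * penalty_cycles


def predicated_cost(then_cycles, else_cycles):
    """Cycles when both paths are issued and one is cancelled by its condition."""
    return then_cycles + else_cycles


def predication_wins(then_cycles, else_cycles, mispredict_rate, penalty_cycles):
    """True when issuing both paths is cheaper than branching, under this model."""
    average_path = (then_cycles + else_cycles) / 2
    branching = branch_cost(average_path, mispredict_rate, penalty_cycles)
    return predicated_cost(then_cycles, else_cycles) < branching
```

With a hypothetical 50% misprediction rate and an 8-cycle penalty, predication_wins(1, 1, 0.5, 8) is True (2 cycles against an expected 5), while predication_wins(20, 20, 0.5, 8) is False (40 cycles against an expected 24). That matches the preference for small bodies and unpredictable branches.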

Optimization of the Algorithm

Sequential AC - Data Flow Diagram

Begin
  L(1)=0; H(1)=1; F(0)=0;
  for(j=1 to N) {
    i=index_of_symbol(j);
    L(j+1)=L(j)+(H(j)-L(j))*F(i-1);
    H(j+1)=L(j)+(H(j)-L(j))*F(i);
  }
  output((L(N+1)+H(N+1))/2);
End

Dependence matrix is given by [1 1]
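The recurrence above can be run directly as a small sketch. It uses exact fractions so that precision is not a concern. The function names and the three-symbol alphabet in the usage note are assumptions for illustration; F is the cumulative probability table with F[0] = 0 and symbols indexed from 1, as in the pseudocode.

```python
from fractions import Fraction


def cumulative(probabilities):
    """Build F with F[0] = 0 and F[i] = p_1 + ... + p_i."""
    table = [Fraction(0)]
    for p in probabilities:
        table.append(table[-1] + Fraction(p))
    return table


def encode_interval(indices, F):
    """Narrow [low, high) once per symbol; each step needs the previous step's result."""
    low, high = Fraction(0), Fraction(1)
    for i in indices:
        width = high - low
        low, high = low + width * F[i - 1], low + width * F[i]
    return low, high


def encode(indices, F):
    """Output the midpoint of the final interval, as in the pseudocode."""
    low, high = encode_interval(indices, F)
    return (low + high) / 2
```

With hypothetical probabilities 1/2, 1/4, 1/4, the message with symbol indices 1, 2, 3 narrows [0, 1) to [0, 1/2), then [1/4, 3/8), then [11/32, 3/8), and the output is 23/64. Every iteration reads the L and H written by the one before it, which is the loop-carried dependence that prevents parallelizing this loop as written.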

Sequential AC - Dependence Graph

[Figure: dependence graphs for the 1-dimensional loop and the 2-dimensional loop; axes I and J]

1. Inner-loop and outer-loop parallelism are absent.
2. Loop interchange doesn't help.

Parallel AC - Data Flow Graph

Parallel AC - Dependence Graph

Doall i=1 to 2
  L(1,i)=0; H(1,i)=1;
  Do j=1 to 2
  {
    l=index_of_symbol(j,i);
    L(j+1,i)=L(j,i)+(H(j,i)-L(j,i))*F(l-1);
    H(j+1,i)=L(j,i)+(H(j,i)-L(j,i))*F(l);
  }
  Enddo
Enddoall

L_final = L(3,1) + (H(3,1)-L(3,1))*L(3,2);
H_final = L(3,1) + (H(3,1)-L(3,1))*H(3,2);

[Figure (1): dependence graph for Parallel AC (axes I and J), shown with its dependence matrix]
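The block-wise scheme can be checked with a short sketch: each block is narrowed independently from [0, 1), and the block intervals are then nested with the combination formula. The function names, the block splitting rule, and the use of a thread pool are assumptions made for illustration. The threads only demonstrate that the blocks are independent and are not expected to reproduce the reported timings.

```python
from concurrent.futures import ThreadPoolExecutor
from fractions import Fraction


def encode_block(indices, F):
    """Sequential interval narrowing for one block, starting from [0, 1)."""
    low, high = Fraction(0), Fraction(1)
    for i in indices:
        width = high - low
        low, high = low + width * F[i - 1], low + width * F[i]
    return low, high


def combine(outer, inner):
    """Nest the inner interval inside the outer one (the L_final / H_final step)."""
    low, high = outer
    width = high - low
    return low + width * inner[0], low + width * inner[1]


def parallel_encode(indices, F, blocks=2):
    """Encode equal-sized blocks independently, then combine them in order."""
    size = max(1, -(-len(indices) // blocks))      # ceiling division
    chunks = [indices[k:k + size] for k in range(0, len(indices), size)]
    with ThreadPoolExecutor() as pool:
        intervals = list(pool.map(lambda chunk: encode_block(chunk, F), chunks))
    result = (Fraction(0), Fraction(1))
    for interval in intervals:
        result = combine(result, interval)
    return result
```

With the hypothetical table F = [0, 1/2, 3/4, 1] and the message 1, 2, 3, 1, the two blocks give [1/4, 3/8) and [3/4, 7/8), and combining them gives [11/32, 23/64), the same interval the sequential loop reaches. Only the short combine step remains sequential.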

Performance - Arithmetic Coding

Parallel arithmetic coding of a text message of length 1800 showed the following speedups:
◦ 4.875 (without the overhead of loading the values into separate processors)
◦ 1.66 (with the overhead)

For a text message of length ~10000, the parallel version showed a speedup of 2 (with overhead).

Conclusion

Running the MQ coder on an ARM core with predicated instructions improves performance: replacing a conditional branch gives a speedup of 2.75.

Tuning AC for parallel execution gives a speedup of 1.66 to 2 with the processor-loading overhead included, and 4.875 without it.

Thank you

Questions?

Backup Slides

Predication

Performance analysis shows that replacing a conditional branch instruction with predicated instructions gives a speedup of 2.75, i.e. a reduction from 0.264 µs to 0.096 µs at a clock frequency of 41 MHz (24 ns per cycle). Each time a symbol (1/0) is encoded, we save 7 cycles (i.e. 7 cycles/run) for every predicated instruction used in place of a branch instruction. On a black-and-white image of size 256x256, for example, this saves ~0.5M cycles.
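The figures above are mutually consistent, as a quick check shows. The sketch assumes one encoded symbol per pixel and one replaced branch per symbol, which the slide does not state explicitly; the variable names are illustrative.

```python
CYCLE_NS = 24          # one clock cycle at roughly 41 MHz
BRANCH_NS = 264        # 0.264 us with the conditional branch
PREDICATED_NS = 96     # 0.096 us with predicated instructions

speedup = BRANCH_NS / PREDICATED_NS
cycles_saved = (BRANCH_NS - PREDICATED_NS) // CYCLE_NS

# Assumption: a 256x256 black-and-white image yields one symbol per pixel.
symbols = 256 * 256
total_cycles_saved = symbols * cycles_saved

print(speedup, cycles_saved, total_cycles_saved)   # 2.75 7 458752
```

The branch version takes 11 cycles and the predicated version 4, so the 7-cycle saving per symbol adds up to 458,752 cycles over the image, which is the ~0.5M quoted.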

Performance - Arithmetic Coding

Sequential and parallel arithmetic coding on the same testbench show a large difference in execution time:
◦ Sequential: 0.078 seconds
◦ Parallel: 0.016 seconds
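The speedup implied by these two timings can be computed directly. The helper name is hypothetical, and the times are expressed in milliseconds so the division is exact.

```python
def speedup(sequential_time, parallel_time):
    """Ratio of sequential to parallel execution time (same units for both)."""
    return sequential_time / parallel_time


# 0.078 s and 0.016 s expressed in milliseconds
print(speedup(78, 16))   # 4.875
```

The result, 4.875, is the same figure as the speedup reported without loading overhead for the 1800-character message, which suggests these timings exclude that overhead.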
