Causing Incoherencies: Parallel Sorting Algorithms and a Study of L1 and L2 Cache Behavior on a Multi-Core Architecture
Presented by Mohammed Alharbi, Mohammed Aleisa, and Bakri Awaji
Instructor: Prof. Gita Alaghband
Outline
- Motivation
- Our implementation in details
- Contributions
- Experiments
- Evaluation
- Related work
- Conclusion
- Challenges
- What did we learn?
Motivation
Our motivation in this project is to cause incoherence by simulating three sorting algorithms (bubble sort, quick sort, and insertion sort).
But the big question is: why would we want to cause incoherencies? Coherence is needed to meet an architectural assumption held by the software designer. The bad program design identified by this project demonstrates what happens when the coherence assumption is ignored.
As we know, using multiple cores of a processor effectively can expose coherence problems, so we need to learn how the architecture reacts to shared data on a multi-core processor.
Motivation (Important Questions)
How do we use the sorting algorithms to demonstrate our project?
The sorting algorithms simply provide the necessary traces and overlapping reads and writes to cause coherence issues.
Why are we choosing sorting algorithms instead of applications? Many applications use sorting algorithms of various kinds, and a sorting algorithm can be an entire application (utility). Sorting algorithms provide a clear source of coherence issues when executed in parallel on the same data.
Motivation (Important Questions)
The sorting algorithms are presented as sequential algorithms, so why are they related to caches and multicore architecture?
Sorting algorithms are frequently executed on multicore architectures and make heavy use of caches. The algorithms, again, simply provide functional traces that result in coherence issues.
Why are we choosing these particular sorting algorithms? They are commonly known and applied, and they provide opportunities to examine incoherence.
How do we count misses and hits? Misses are counted as compulsory (the data was never in the cache to begin with), conflict (blocks fighting over the same slot in a direct-mapped cache), and coherence (updates that invalidate blocks in another cache). A hit can be either a read hit or a write hit.
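The miss classification above can be sketched for a single direct-mapped cache. This is a hypothetical illustration with names of our own choosing, not the project's actual simulator; coherence misses need a second core's invalidations and are omitted here.

```python
# Minimal direct-mapped cache that classifies each access as a hit,
# a compulsory miss (block never referenced before), or a conflict miss
# (block was referenced before but evicted by a slot conflict).
class DirectMappedCache:
    def __init__(self, cache_size, block_size):
        self.num_blocks = cache_size // block_size
        self.block_size = block_size
        self.slots = [None] * self.num_blocks  # tag held by each slot
        self.seen_blocks = set()               # blocks ever referenced

    def access(self, addr):
        block = addr // self.block_size
        index = block % self.num_blocks        # direct-mapped slot
        tag = block // self.num_blocks
        if self.slots[index] == tag:
            return "hit"
        # classify the miss before filling the slot
        kind = "compulsory" if block not in self.seen_blocks else "conflict"
        self.seen_blocks.add(block)
        self.slots[index] = tag
        return kind

# With the slides' parameters (64-byte cache, 32-byte blocks = 2 slots):
c = DirectMappedCache(64, 32)
print(c.access(0))    # compulsory
print(c.access(0))    # hit
print(c.access(64))   # compulsory (evicts block 0 from slot 0)
print(c.access(0))    # conflict
```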
Our Implementation in details
We simulate three sorting algorithms (bubble sort, quick sort, insertion sort) to study:
- Causing incoherence in the L1 caches by applying a coherence protocol (invalidate policy) together with either a write-through or a write-back policy.
- Measuring read hits, write hits, coherence misses, conflict misses, and compulsory misses.
The input is a long data array, for example:
7 3 2 1 5 4 6
Our Implementation in details
We simulate two different sorting algorithms (for example, bubble sort vs. quick sort) in parallel on two cores over the same array, using the write-through policy with the invalidation policy. L2 is shared. The figures show the two algorithms fighting over the same data.
Our Implementation in details
[Figure: animation of the two-core system. Each core has a private L1 cache; both share an L2 cache connected through a bus snooper to RAM. Bubble sort runs on core 1 and quick sort on core 2 over the same array. While core 2 is searching, core 1 swaps two elements (7 3 becomes 3 7). With the write-through policy, the write updates L2 and main memory, and the bus snooper broadcasts either an update of the data in the other cores or a request to invalidate it. The frames illustrate causing incoherence with write-through in the invalidation case: two different algorithms running at the same time on the same array.]
Analysis (Scenario of Invalidation with Write-Through)
For example, we apply the bubble sort algorithm on core 1 and the quick sort algorithm on core 2. The array is placed first in main memory; the whole array, contained in two blocks, is then sent from main memory to the L2 cache, and each L1 cache receives the same blocks. In the first data access, quick sort on core 2 is searching while bubble sort on core 1 is swapping, which means it wants to write; so core 1 updates the array's value in the L2 cache and then in main memory using the write-through policy. After that, core 1 sends a broadcast request on the snoopy bus to invalidate the same data on the other core. Hence core 2 takes a read miss on the data and must refresh it from L2. This means a cache coherence problem arises on every data access: the two algorithms fight over the same array, which causes duplicated data, lost data, wrong sort order, and flushed copies.
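The write-through-plus-invalidate scenario above can be reduced to a tiny sketch: two one-block L1 caches over a shared store. This is our own simplification for illustration, not the project's actual simulator; all class and method names are assumptions.

```python
# Write-through with invalidation: a write updates the shared level
# immediately, and the bus "snooper" invalidates the other core's copy,
# so that core's next read is a (coherence) miss.
class L1Cache:
    def __init__(self, shared):
        self.shared = shared            # models the shared L2 / main memory
        self.data = None                # cached copy; None means invalid

    def read(self):
        if self.data is None:           # read miss: refresh from shared level
            self.data = list(self.shared)
        return self.data

    def write(self, bus, index, value):
        self.read()                     # bring the block in if needed
        self.data[index] = value
        self.shared[index] = value      # write-through: update L2/memory now
        bus.invalidate_others(self)     # snooper broadcasts the invalidation

class Bus:
    def __init__(self, caches):
        self.caches = caches

    def invalidate_others(self, writer):
        for c in self.caches:
            if c is not writer:
                c.data = None           # other copies become invalid

shared = [7, 3, 2, 1, 5, 4, 6]
core1, core2 = L1Cache(shared), L1Cache(shared)
bus = Bus([core1, core2])

core2.read()                            # core 2 (quick sort) is searching
core1.write(bus, 0, 3)                  # core 1 (bubble sort) swaps 7 and 3:
core1.write(bus, 1, 7)                  # ... the 7 3 -> 3 7 swap
assert core2.data is None               # core 2's copy was invalidated
assert core2.read()[:2] == [3, 7]       # its next read refreshes from L2
```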
Contribution
Bubble sort: compares the numbers in pairs from left to right, exchanging them when necessary. The first number is compared to the second and, if it is larger, they are exchanged.
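The description above corresponds to the standard textbook bubble sort; this is a generic sketch, not necessarily the project's exact implementation.

```python
# Bubble sort: repeatedly compare adjacent pairs left to right,
# swapping when the left element is larger.
def bubble_sort(a):
    n = len(a)
    for i in range(n - 1):
        for j in range(n - 1 - i):       # compare pairs left to right
            if a[j] > a[j + 1]:          # larger element moves right
                a[j], a[j + 1] = a[j + 1], a[j]
    return a

print(bubble_sort([7, 3, 2, 1, 5, 4, 6]))  # [1, 2, 3, 4, 5, 6, 7]
```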
Contribution
Quick sort: given an array of n elements (e.g., integers):
- If the array contains only one element, return.
- Else, pick one element to use as the pivot and partition the elements into two sub-arrays:
  - elements less than or equal to the pivot;
  - elements greater than the pivot.
- Quick sort the two sub-arrays and return the results.
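The steps above can be sketched directly; this is a generic partition-into-two-lists version for illustration, not the project's exact code.

```python
# Quick sort: pick a pivot, partition into (<= pivot) and (> pivot)
# sub-arrays, sort each recursively, and concatenate the results.
def quick_sort(a):
    if len(a) <= 1:                            # one element: already sorted
        return a
    pivot, rest = a[0], a[1:]
    lesser = [x for x in rest if x <= pivot]   # elements <= pivot
    greater = [x for x in rest if x > pivot]   # elements > pivot
    return quick_sort(lesser) + [pivot] + quick_sort(greater)

print(quick_sort([7, 3, 2, 1, 5, 4, 6]))  # [1, 2, 3, 4, 5, 6, 7]
```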
Our Experiments
We simulate different sorting algorithms in parallel on two cores with the same data array to study the behavior of the cache.
- Case 0: bubble sort vs. insertion sort
- Case 1: bubble sort vs. quick sort
- Case 2: quick sort vs. insertion sort
Our Milestones

#  Case                             Cache policy   Coherence policy  Status
1  Bubble sort vs. Quick sort       Write-through  Invalidation      Done
2  Insertion sort vs. Bubble sort   Write-through  Invalidation      Done
3  Quick sort vs. Insertion sort    Write-through  Invalidation      Done
4  Insertion sort vs. Bubble sort   Write-back     Invalidation      In progress
5  Bubble sort vs. Quick sort       Write-back     Invalidation      In progress
Our Experiments
In our experiments we studied the cache behavior at the higher and lower levels.
We measured the hit and miss rates in the cache.
These measurements reflect the incoherence that occurred as a result of applying the invalidation policy.
Our Experiments (Parameters)
We used the same parameters in all cases:
- Input data: the same array
- Coherence policy: invalidate
- Cache size: 64 bytes
- Block size: 32 bytes
- Number of cores: 2
Our Experiments
The data type: a one-dimensional array of 64 bytes.
The trace files are generated by the code. The bubble sort trace file is 126 KB, the insertion sort trace file is 62 KB, and the quick sort trace file is 23 KB.
A sample from a trace file:
0x00000003 1 5
0x00000003 0 5
Our Experiments (Results)
3. Measuring the coherence miss rate for all cases on 2 cores.
Coherence misses = coherence write misses + coherence read misses

Case                              Coherence misses
Bubble sort vs. Insertion sort    1574
Bubble sort vs. Quick sort         352
Quick sort vs. Insertion sort      675
Our Experiments (Chart)
[Chart: coherence misses for all cases on 2 cores. Bubble sort vs. insertion sort: 1574; bubble sort vs. quick sort: 352; quick sort vs. insertion sort: 675. X-axis: cases; Y-axis: coherence misses.]
Our Experiments (Analysis)
This figure shows the incoherence in our simulator. How? The incoherence happened because of the invalidation policy: both algorithms fight over the same data.
As we see in the chart, the coherence miss rate is highest in the first case. Why? The array state produced by bubble sort can be helpful or wasteful for insertion sort in the same case, which increases the data accesses. Insertion sort's behavior can in turn increase or decrease the number of sorting iterations for bubble sort, because of the wrong ordering caused by the fight over the same data.
Our Experiments (Results)
3. Measuring the read coherence miss rate and write coherence miss rate for all cases on 2 cores.

Case      Read coherence misses   Write coherence misses
Case 0     944                    1352
Case 1    1736                    5204
Case 2     700                    1169
Our Experiments (Chart)
[Chart: read and write coherence misses for all cases on 2 cores. Bubble vs. insertion: 944 read, 1352 write; bubble vs. quick: 1736 read, 5204 write; quick vs. insertion: 700 read, 1169 write. X-axis: cases; Y-axis: coherence misses.]
Our Experiments (Analysis)
1. This figure breaks the previous coherence-miss figure down into write coherence misses and read coherence misses for each case.
2. As we can see, write coherence misses are higher than read coherence misses. Why?
3. Because of the incoherence caused by invalidation, each algorithm did a lot of swapping.
4. That happens because both algorithms fight over the same data.
Our Experiments (Results)
1. Measuring the hit and miss rates with write-through for all cases using invalidation.

All cases (hit/miss), write-through
Case                   Hits    Misses
Bubble vs. Insertion   14909   3151
Bubble vs. Quick       11042    707
Quick vs. Insertion     7136   1353
Our Experiments (Chart)
[Chart: hit and miss rates with write-through for all cases using invalidation. Bubble vs. insertion: 14909 hits, 3151 misses; bubble vs. quick: 11042 hits, 707 misses; quick vs. insertion: 7136 hits, 1353 misses.]
Our Experiments (Analysis)
Measuring cache performance: the hit rates in all cases are greater than the miss rates. The reason is that write hits occur more often, because of the swapping operations.
The highest hit rate appears with bubble sort. This algorithm performs more compare-and-swap operations than the other sorting algorithms, and it is not an efficient algorithm.
Our Experiments (Results)
2. Measuring hit and miss rates with write-through on each core for each case using invalidation (to show the impact of fighting over the same data at the higher level).

Case 0: Bubble sort vs. Insertion sort
Core 0 (Bubble sort):     6702 hits, 1826 misses
Core 1 (Insertion sort):  3048 hits, 1324 misses

Case 1: Bubble sort vs. Quick sort
Core 0 (Bubble sort):     7489 hits, 500 misses
Core 1 (Quick sort):      1140 hits, 454 misses

Case 2: Quick sort vs. Insertion sort
Core 0 (Quick sort):      7489 hits, 952 misses
Core 1 (Insertion sort):  1140 hits, 400 misses
Our Experiments (Charts)
[Charts: hit and miss rates with write-through on each core for each case using invalidation.
Case 0 (Bubble vs. Insertion): core 0 (bubble sort) 6702 hits, 1826 misses; core 1 (insertion sort) 3084 hits, 1324 misses.
Case 1 (Bubble vs. Quick): core 0 (bubble sort) 7489 hits, 500 misses; core 1 (quick sort) 1140 hits, 454 misses.
Case 2 (Quick vs. Insertion): core 0 (quick sort) 7489 hits, 952 misses; core 1 (insertion sort) 1140 hits, 400 misses.]
Our Experiments (Analysis)
These figures show the hit and miss rates in the cache for each case. They are high because both algorithms fight over the same data at the higher level.
The hit and miss rates reflect the incoherence caused by invalidation.
The array state produced by the first algorithm can be helpful or wasteful for the second algorithm in the same case; it can reduce or increase the data accesses. The first algorithm's behavior can increase or decrease the number of sorting iterations for the second algorithm, because of the wrong ordering caused by the fight over the same data.
Evaluation
In our evaluation we studied the impact of varying parameters on cache optimization:
- Different block sizes with a constant cache size, to measure the coherence misses and the hit and miss rates in both levels of cache.
- Different cache sizes with a constant block size, to measure the conflict misses in both levels of cache.
Evaluation (Block Size)
[Charts: coherence misses for all cases at two block sizes.
Block size = 32, cache size = 64: bubble vs. insertion 1574; bubble vs. quick 352; quick vs. insertion 675.
Block size = 64, cache size = 64: bubble vs. insertion 2935; bubble vs. quick 695; quick vs. insertion 911.]
Evaluation (Analysis)
We increased the block size while holding the cache size constant, with write-through, for all cases. Why? To study the impact on causing incoherence.
These two figures show that coherence misses increase as we increase the block size. Why? Because the same data now shares larger blocks, the invalidation is applied many more times, which shows the fighting over the data.
The block size parameter plays a significant role in the cache's behavior.
Evaluation (Block Size)
[Charts: hit and miss rates with write-through for all cases at two block sizes (cases in order: bubble vs. insertion, bubble vs. quick, quick vs. insertion).
Block size = 32, cache size = 64: hits 14909 / 11042 / 7136, misses 3151 / 707 / 1353.
Block size = 64, cache size = 64: hits 14909 / 11042 / 7136, misses 5872 / 1392 / 1824.]
Evaluation (Analysis)
We increased the block size while holding the cache size constant, with write-through, for all cases. Why? To study the impact of this change on cache optimization.
These two figures show the same hit rates but increased misses as we increased the block size, because of the increase in invalidation messages.
Evaluation (Block Size)
[Charts: hit and miss rates for case 1 (bubble sort vs. quick sort) on each core at two block sizes.
Block size = 32 bytes, L1 cache size = 64 bytes: core 0 (bubble sort) 7489 hits, 500 misses; core 1 (quick sort) 1140 hits, 454 misses.
Block size = 64 bytes, L1 cache size = 64 bytes: bubble sort 7347 hits, 537 misses; quick sort 940 hits, 854 misses.]
Evaluation (Analysis)
We increased the block size while holding the cache size constant, with write-through, for specific cases. Why? To study the impact of this change on optimization at the higher level.
These two figures show different hit counts and increased misses as we increased the block size.
Here we studied each algorithm's behavior: bubble sort's hit count is high because it has many more cycles than the other algorithm, which is clearly seen in its trace file.
The block size parameter plays a significant role in L1 cache behavior.
Evaluation (Conflict Misses vs. Cache Size)
[Charts: conflict miss rates for all cases at two cache sizes.
Block size = 32, cache size = 64: bubble vs. insertion 1572; bubble vs. quick 350; quick vs. insertion 673.
Block size = 32, cache size = 32: bubble vs. insertion 1982; bubble vs. quick 710; quick vs. insertion 1260.]
Evaluation (Analysis)
These two figures show that when the cache size is increased, the conflict miss rate decreases. Why?
In the first figure, the block size is 32 bytes; when the L1 cache size equals the block size, the conflict miss rate increases, since the cache fits only one block.
In the second figure, the block size is again 32 bytes; when the L1 cache size is twice the block size, the conflict miss rate decreases, since the cache fits more blocks.
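The reasoning above can be checked with a tiny self-contained counter (our own illustration, not the project's simulator): with a direct-mapped cache of one 32-byte block, alternating between two blocks conflicts on every revisit, while a 64-byte cache holds both blocks.

```python
# Count conflict misses (block seen before but evicted from its slot)
# for a direct-mapped cache over a simple address trace.
def conflict_misses(addresses, cache_size, block_size=32):
    num_slots = cache_size // block_size
    slots = [None] * num_slots           # block number held by each slot
    seen, conflicts = set(), 0
    for addr in addresses:
        block = addr // block_size
        slot = block % num_slots
        if slots[slot] == block:
            continue                      # hit
        if block in seen:
            conflicts += 1                # seen before, was evicted: conflict
        seen.add(block)
        slots[slot] = block
    return conflicts

trace = [0, 32, 0, 32, 0, 32]             # alternating between two blocks
print(conflict_misses(trace, cache_size=32))  # 4: every revisit conflicts
print(conflict_misses(trace, cache_size=64))  # 0: both blocks fit
```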
Related Work
The effect of false sharing on parallel algorithm performance depends on many factors, such as block size, access pattern, and coherence policies [9].
False sharing is a main performance factor when comparing traditional coherence policies against an optimal policy that adds a new merge facility [9].
Conclusion
Coherence is needed to meet an architectural assumption held by the software designer.
Flushing data an extra time prevents data loss and duplicated data and improves cache performance.
Invalidation messages increase when we change the block size while keeping the cache size fixed.
Future Work
- Using the update coherence policy with write-through and write-back.
- Executing the algorithms in parallel on more cores and counting the false sharing.
- Using larger data sizes.
Challenges
- Clarifying the project idea to the class.
- Simulating caches in software for the first time.
- A large implementation effort in a short time.
- The write-back policy and the quick sort algorithm were quite complicated.
- We read a lot of papers to find work related to our project, because of its broad area.
What did we learn?
- How the architecture reacts with software.
- How to pick a small feature and turn it into big research.
- How to build a big project from a specific feature.
- From the comprehensive questions in the lab assignments, we learned how to analyze our simulation's performance.
References
[1] G. M. Prabhu. "Computer Architecture Tutorial." 2 Feb. 2003. http://www.cs.iastate.edu/~prabhu/Tutorial/title.html
[2] G. Alaghband. CSC 5593 Graduate Computer Architecture, Lecture 2, 2014.
[3] M. A. Shaaban. "EECC 550 Winter 2010 Home Page." RIT, 27 Nov. 2010. <http://people.rit.edu/meseec/eecc550-winter2010/>
[4] G. Jiang, D. Chen, B. Wu, Y. Zhao, T. Chen, and J. Liu. "CMP Thread Assignment Based on Group Sharing L2 Cache." In Proc. SCALCOM-EMBEDDEDCOM'09, pp. 298-303, Sept. 2009.
[5] Kruse and Ryba. Mergesort and Quicksort, 2001. Retrieved from www.cs.bu.edu/fac/gkollios/cs113/Slides/quicksort.ppt
[6] W. Zhang. Multicore Architecture, 2010. Retrieved from https://www.pdffiller.com/en/project/16525498.htm?form_id=11909329
[7] J. Hennessy and D. Patterson. Computer Architecture: A Quantitative Approach (4th ed.). Morgan Kaufmann, 2011.
[8] D. Patterson and J. Hennessy. Computer Organization and Design (5th ed.). Morgan Kaufmann, 2011.
[9] W. Bolosky and M. Scott. "False Sharing and Its Effect on Shared Memory Performance." In Proceedings of the USENIX Symposium on Experiences with Distributed and Multiprocessor Systems (SEDMS IV), San Diego, CA, September 1993.