Aca 2


System Attributes to Performance

22/9/2012

CPU/Processor driven by:

• A clock with a constant cycle time (τ), in nanoseconds.

• Clock rate: f = 1/τ, in megahertz.

• Ic (Instruction Count): the size of the program, i.e. the number of machine instructions to be executed in the program.

• Different machine instructions need different numbers of clock cycles to execute.

• CPI (Cycles per Instruction): the number of clock cycles needed to execute an instruction.

• Average CPI: the average value of CPI over a given instruction set.

Performance Factors:

CPU Time (T): the time needed to execute a program, in seconds/program.

T = CPU Time = Ic * CPI * τ
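The CPU-time formula can be sketched numerically; the instruction count, CPI, and cycle time below are made-up example values, not figures from the notes:

```python
def cpu_time(ic, cpi, tau_ns):
    """CPU time T = Ic * CPI * tau, with tau given in nanoseconds."""
    return ic * cpi * tau_ns * 1e-9  # result in seconds

# Example: 2,000,000 instructions, average CPI of 2.5, 10 ns cycle (100 MHz clock)
t = cpu_time(2_000_000, 2.5, 10)
print(t)  # 0.05 seconds
```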

Execution of an instruction goes through a cycle of events:

1. Instruction fetch
2. Decode
3. Operand(s) fetch
4. Execution
5. Store results

The instruction decode and execution phases are carried out in the CPU; the remaining three events require access to the memory.
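The five-event cycle above can be sketched as a toy interpreter loop; the instruction format and memory layout are invented for illustration:

```python
# Toy instruction cycle: fetch -> decode -> operand fetch -> execute -> store.
# Instructions and data share one address space (a dict standing in for memory).
memory = {0: ("ADD", 10, 11, 12),   # mem[12] = mem[10] + mem[11]
          10: 3, 11: 4, 12: 0}

pc = 0
instr = memory[pc]                  # 1. instruction fetch (memory access)
op, src1, src2, dst = instr         # 2. decode (in the CPU)
a, b = memory[src1], memory[src2]   # 3. operand fetch (memory access)
if op == "ADD":                     # 4. execution (in the CPU)
    result = a + b
memory[dst] = result                # 5. store results (memory access)
print(memory[12])  # 7
```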

Memory Cycle:

The time needed to complete one memory reference.

Note: the memory cycle is k times the processor cycle τ, where k depends upon the speed of the memory technology.

System Attributes' Influence on the Performance Factors (Ic, p, m, k, τ):

1. Instruction-set architecture:

Affects the program length (Ic) and the processor cycles needed per instruction (p).

2. Compiler technology:

Affects the values of Ic, p, and m.

3. CPU implementation & control:

Determines the total processor time needed (p * τ).

4. Cache & memory hierarchy:

Affects the memory access latency (k * τ).

System Attributes vs. Performance Factors:

Performance factors: instruction count (Ic); average cycles per instruction (CPI), made up of processor cycles per instruction (p) and memory references per instruction (m); processor cycle time (τ); memory access latency (k, in processor cycles).

System Attributes                  | Ic | p | m | τ | k
-----------------------------------+----+---+---+---+---
Instruction-Set Architecture       | X  | X |   |   |
Compiler Technology                | X  | X | X |   |
Processor Implementation & Control |    | X |   | X |
Cache & Memory Hierarchy           |    |   |   | X | X
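Combining the performance factors p, m, and k, the average CPI can be written as p + m·k, giving the standard expansion T = Ic * (p + m * k) * τ. A minimal sketch, with assumed example values:

```python
def cpu_time(ic, p, m, k, tau_ns):
    """T = Ic * (p + m*k) * tau: p processor cycles per instruction,
    m memory references per instruction, each reference costing k cycles."""
    cpi = p + m * k
    return ic * cpi * tau_ns * 1e-9, cpi  # (seconds, average CPI)

# Assumed values for illustration only
t, cpi = cpu_time(ic=1_000_000, p=2, m=1.2, k=4, tau_ns=5)
print(cpi)  # 6.8
print(t)    # ~0.034 s
```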

MIPS Rate: Million Instructions per Second

C = the total number of clock cycles needed to execute a program

T = C * τ = C/f

CPI = C/Ic

T = Ic * CPI * τ = (Ic * CPI)/f

MIPS rate = Ic/(T * 10^6) = f/(CPI * 10^6)

Throughput Rate (Ws):

The number of programs a system can execute per unit time.

Ws is measured in programs/second.

Note: In a multiprogrammed system, the system throughput (Ws) is often lower than the CPU throughput (Wp):

Wp = f/(Ic * CPI) = 1/(Ic * CPI * τ) = 1 program/T

Ws = Wp only if the CPU is kept busy in a perfect program-interleaving fashion.
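A minimal numeric sketch of the MIPS and Wp formulas above; the clock rate, instruction count, and CPI are assumed example values:

```python
def mips_rate(f_hz, cpi):
    """MIPS rate = f / (CPI * 10^6)."""
    return f_hz / (cpi * 1e6)

def cpu_throughput(f_hz, ic, cpi):
    """Wp = f / (Ic * CPI), in programs per second."""
    return f_hz / (ic * cpi)

# Assumed numbers: 100 MHz clock, 2M-instruction programs, average CPI 2.5
print(mips_rate(100e6, 2.5))                   # 40.0 MIPS
print(cpu_throughput(100e6, 2_000_000, 2.5))   # 20.0 programs/second
# In a multiprogrammed system Ws <= Wp; equality holds only with
# perfect program interleaving that keeps the CPU busy.
```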

Two approaches to parallel programming:

1. Implicit parallelism (compiler approach): the programmer writes a sequentially coded source program; the compiler detects the parallelism and assigns target machine resources.

Note: the compiler approach is applied in programming shared-memory multiprocessors.

2. Explicit parallelism (user approach): parallelism is specified in the user program, e.g. in parallel dialects of C, etc.

Note: this approach is applied in multicomputers.

Parallel Computers: Architectural/Physical Models

Distinguished by having:

1. Shared Common Memory:

Three Shared-Memory Multiprocessor Models are:

i. UMA (Uniform-Memory Access)

ii. NUMA (Non-Uniform-Memory Access)

iii. COMA (Cache-Only Memory Architecture)

2. Unshared Distributed Memory

i. CC-NUMA (Cache-Coherent NUMA)

UMA Multiprocessor Model

Physical memory is uniformly shared by all the processors.

All processors have equal access time to all memory words; hence the name Uniform Memory Access.

Peripherals are also shared in some fashion.

UMA multiprocessors are also called tightly coupled systems, due to the high degree of resource sharing.

Symmetric vs. Asymmetric Multiprocessors

Symmetric Multiprocessor: All processors have equal access to all peripheral devices.

Asymmetric Multiprocessor: only one or a subset of the processors are executive-capable.

i. MP (Executive or Master Processor):

Can execute the OS and handle I/O.

ii. AP (Attached Processor):

Has no I/O capability; APs execute user code under the supervision of the MP.

NUMA Multiprocessor Model

A shared-memory system in which the access time varies with the location of the memory word.

Local Memories (LM): the shared memory is physically distributed to all processors.

Global Address Space: formed by the collection of all local memories (LM), and accessible by all processors.

Access to a local memory by its local processor is fast; access to remote memory attached to other processors is slower, due to the added delay through the interconnection network.

P – Processor; LM – Local Memory; CSM – Cluster Shared Memory; CIN – Cluster Interconnection Network; GSM – Global Shared Memory

(Fig: hierarchical cluster NUMA model; the access of remote memory may be UMA or NUMA.)

Three memory-access patterns arise when Globally Shared Memory (GSM) is added to a multiprocessor system:

i. The fastest is local memory (LM) access.
ii. The next is global memory (GSM) access.
iii. The slowest is remote memory access (an LM attached to another processor).

Note: all clusters have equal access to the GSM. Access rights among intercluster memories can be specified.

COMA Multiprocessor Model

• The distributed main memories are converted to caches.

• The caches together form a global address space.

• Remote cache access is assisted by the distributed cache directories.

P – Processor; C – Cache; D – Directory

Multiprocessor systems are suitable for general-purpose multiuser applications where programmability is the major concern.

Shortcomings of multiprocessor systems: lack of scalability, and limited latency tolerance for remote memory access.

(Figure labels: Mini-Supercomputer, Near-Supercomputer, MPP class.)

Distributed-Memory Multicomputers

The system consists of multiple computers (nodes) interconnected by a message-passing network. Each node is an autonomous computer consisting of:

• a processor
• local memory
• sometimes attached disks
• sometimes attached I/O peripherals

The message-passing network provides point-to-point static connections among the nodes.

Local memories (LM) are private, accessible only by the local processor.

NORMA (No-Remote-Memory-Access) machines: the traditional multicomputers.

Fig: Generic model of a message-passing multicomputer (M – Local Memory; P – Processor; each processor-memory pair forms a node).

Parallel Computers: SIMD or MIMD Configurations

SIMD: for special-purpose applications; the CM-2 (Connection Machine 2) was built on a SIMD architecture.

MIMD: the CM-5 was built on a MIMD architecture, with a globally shared virtual address space.

Scalable multiprocessors or multicomputer:

use distributed shared memory

Unscalable multiprocessors:

use centrally shared memory

Fig:- Gordon Bell's taxonomy of MIMD computers.

Supercomputer Classification:

Pipelined Vector Machines / Vector Supercomputers:

* Use a few powerful processors equipped with vector hardware.
* Rely on vector processing.

SIMD Computers / Parallel Processors:

* Emphasize massive data parallelism.

Vector Supercomputers

(Fig: the architecture of a vector supercomputer; the numbered steps 1-6 are described below.)

Steps 1-2: The program and data are first loaded into the main memory through a host computer.

Step 3: All instructions are first decoded by the scalar control unit.

Step 4: If the decoded instruction is a scalar operation or a program-control operation, it is executed directly by the scalar processor using the scalar functional pipelines.

Step 5: If the decoded instruction is a vector operation, it is sent to the vector control unit.

Step 6: The vector control unit supervises the flow of vector data between the main memory and the vector functional pipelines.

Note: A number of vector functional pipelines may be built into a vector processor.
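Steps 3-6 amount to a decode-and-dispatch loop: scalar instructions go to the scalar pipelines, vector instructions to the vector unit. A toy sketch in Python; the instruction encoding here is invented for illustration:

```python
def execute(program):
    """Toy decode/dispatch: scalar ops run on single values, vector ops
    stream whole operand lists through one 'functional pipeline'."""
    regs = {}
    for kind, dst, a, b in program:      # decode each instruction (step 3)
        if kind == "scalar.add":         # scalar functional pipeline (step 4)
            regs[dst] = a + b
        elif kind == "vector.add":       # vector control unit + pipeline (steps 5-6)
            regs[dst] = [x + y for x, y in zip(a, b)]
    return regs

regs = execute([
    ("scalar.add", "s1", 2, 3),
    ("vector.add", "v1", [1, 2, 3], [10, 20, 30]),
])
print(regs["s1"], regs["v1"])  # 5 [11, 22, 33]
```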

SIMD Supercomputers

CU – Control Unit; PE – Processing Element; LM – Local Memory; IS – Instruction Stream; DS – Data Stream

(Abstract Model of a SIMD computer)

(Operational model of SIMD computer)

SIMD Machine Model:

An operational model of an SIMD computer is specified by a 5-tuple:

M = <N, C, I, M, R>

(1) N = the number of processing elements (PEs) in the machine.

(2) C = the set of instructions directly executed by the control unit (CU), including the scalar and program-flow-control instructions.

(3) I = the set of instructions broadcast by the CU to all PEs for parallel execution. These include arithmetic, logic, data-routing, masking, and other local operations executed by each active PE over data within that PE.

(4) M = the set of masking schemes; each mask partitions the set of PEs into enabled and disabled subsets.

(5) R = the set of data-routing functions, specifying various patterns to be set up in the interconnection network for inter-PE communications.
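The 5-tuple can be illustrated with one toy SIMD step in Python: an instruction broadcast to N PEs, a masking scheme that enables a subset of them, and a circular-shift routing function. All concrete values are invented for illustration:

```python
N = 4                                  # N: number of PEs in the machine
data = [1, 2, 3, 4]                    # one word of local memory per PE
mask = [True, True, False, True]       # M: a mask enabling/disabling PEs

def broadcast(op, data, mask):
    """I: the CU broadcasts one instruction; only the enabled PEs
    execute it, each on the data within its own local memory."""
    return [op(x) if enabled else x for x, enabled in zip(data, mask)]

def route_shift(data, d=1):
    """R: a circular-shift routing pattern among the PEs."""
    return data[-d:] + data[:-d]

data = broadcast(lambda x: x * 10, data, mask)
print(data)               # [10, 20, 3, 40]  (PE 2 was masked off)
print(route_shift(data))  # [40, 10, 20, 3]
```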
