
ECSE 426 – Fall 2015 Microprocessor Systems

Dept. Electrical and Computer Engineering

McGill University

ECSE 426, Lecture 1

Instructor & TAs

•  Prof. Mark Coates

   •  Office: McConnell Eng Bldg, Room 759
   •  Phone: 514-398-7137
   •  Email: [email protected]
   •  Office hours: Mon. 11:30-12:30 or by appointment

•  Teaching Assistants
   •  Ashraf Suyyagh, Zaid Al-bayati, Harsh Aurora
   •  Andrey Tolstikhin, Loren Lugosch

Mark Coates, Fall 2015


Course Format

•  Lectures:
   •  Monday, 10:05 AM - 11:25 AM, Trottier 0060

•  Tutorials:
   •  Friday, 11:35 AM - 12:55 PM, Trottier 0060
   •  Start: Sep. 11

•  Labs:
   •  Monday-Friday, Trottier 4160
   •  Demos usually on Friday (earlier slots are also available)
   •  TAs available for 2-3 hours Mon.-Thurs. afternoons (calendar in myCourses)
   •  Start: Sep. 21 (tutorial on Sep. 18)



Other Logistics

•  All course communication goes through myCourses
   •  Lectures, labs, manuals, discussion, …
   •  Respect others, use common sense

•  If you do feel the need to email me directly about course-related matters, please include “ECSE-426” in the subject line

•  Lab groups and room access:
   •  The 4 experiments will be conducted in pairs.
   •  Reserve time slots for demonstrations.
   •  The project will be conducted in groups of 4.



(No) Textbook; References

•  No textbook: class notes and manuals.
•  Useful references:

•  J. Yiu, The Definitive Guide to ARM® Cortex®-M3 and Cortex®-M4 Processors, Newnes, 2013.

•  A. Tanenbaum and T. Austin, Structured Computer Organization, sixth edition, Prentice Hall, 2012.



Grading

•  Labs – 48%
   •  4 labs in pairs: demonstrations and reports
   •  Late reports: 5 percent penalty per day (Friday to Monday counts as one day)
   •  Missed demo: can be rescheduled for 65 percent of the grade

•  Project – 40%
   •  Groups of 4: demonstration and report

•  Quizzes – 12%
   •  Four 15-minute quizzes in class
   •  Short answer and multiple choice
   •  Cover the current lab and tutorial and recent lectures



Grading

•  Demos
   •  Performance, robustness, code quality, performance testing
   •  Individual grade for the demonstration

•  Reports or lab notes
   •  Concise but comprehensive, 1 per group
   •  Detailed report guidelines will be posted; follow them closely
   •  Common grade for reports

•  Generally, the grades of group members will be very similar

•  Differentiation comes from responses to questions or the quality of the components for which each member is responsible.



Academic Integrity

•  McGill University values academic integrity. Therefore all students must understand the meaning and consequences of cheating, plagiarism and other academic offences under the Code of Student Conduct and Disciplinary Procedures.

•  See http://www.mcgill.ca/integrity/ for more information.

•  What does this mean for this course?
   •  Feel free to discuss solutions with classmates
   •  You may consult and use online resources, but…
   •  Do your own programming and write your own reports


Avoid Plagiarism

•  Please make sure to reference any text, code, or online resources you use to develop your solution

•  If you reproduce any figures, clearly state this in the figure caption:

“Reproduced from [1]”

•  If you re-use code from another source, clearly indicate this in your code with appropriate comments



What Is This Course About?


10-Sep-15 ECSE 426 Microprocessor Systems

Microprocessors

•  Enabling technology for general-purpose computers and embedded systems
   •  Many, many applications

•  General-purpose computers
   •  PCs, workstations, servers, supercomputers

•  Embedded systems
   •  Phones, medical, aviation, automobile, industrial
   •  Real-time systems


Course Goals

•  Provide the necessary understanding and skills to design and build microprocessor systems.
•  By the end of the course, you should:
   •  understand the organization and design principles of modern microprocessor-based systems;
   •  be proficient in assembly and high-level (C language) programming for embedded systems;
   •  understand the performance impact of embedded software, including energy- and memory-constrained design techniques;
   •  know how to connect peripheral devices and networking interfaces, and how to write programs that use these interfaces efficiently;
   •  have experience in developing a realistic embedded system solution through teamwork.



Course Goals (Restated)

•  Understand microprocessor-based systems
•  Become familiar with basic development tools
•  Develop skills in machine interfacing, assembler and embedded C programming
•  Design a sizeable embedded system
   •  Previous projects: music player, file swapping system, PDAs (with handwriting recognition), wireless data collection systems
   •  Our project: indoor tracking
•  Build teamwork skills



Lecture Structure

•  Background
   •  Computer architecture basics
•  Microprocessor Instruction Set Architecture
•  Embedded Processors
•  Embedded System Design
   •  Hardware and software techniques
•  Building Real Systems
   •  Techniques and tools


Lab Structure

•  Four experiments + final project
   •  Experiment 1: Assembly and C
   •  Experiment 2: Intro. to hardware interfacing; drivers, timing, …
   •  Experiment 3: I/O, interrupts, servo-motors, advanced sensor use
   •  Experiment 4: Real-time OS, networking

•  Project:
   •  Tracking through dead-reckoning
   •  Drawing on a map
   •  Wireless communication
   •  LCD display, keypad interface

Class Schedule

ECSE 426 – Microprocessor Systems, Fall 2015 (schedule dated 2 September 2015)

Schedule of Lectures and Labs

On average, each week involves 1 lecture hour, 1 tutorial hour, 4 lab hours, and 3 preparation hours. Over the course of the semester there will be 10 lectures, 5 tutorials, 4 labs and a project.

Week | Lecture Material | Tutorials | Labs and Project
1 (Sep 7) | Introduction | Tutorial A - Assembly and C | Form lab groups
2 (Sep 14) | Assemblers, Lab Intro | Tutorial 1 - Introduction to IDE and assembly |
3 (Sep 21) | Linker, loader, processor architecture | | Lab 1
4 (Sep 28) | Processor Microarchitecture; Q1 | Tutorial 2 - Introduction to embedded C, IDE and drivers | Lab 1 and Demo
5 (Oct 5) | Embedded Processors | | Lab 2
6 (Oct 12) | IO/Processor Interfacing; Q2 | Tutorial 3 - Introduction to timers, interrupts and MEMS | Lab 2 and Demo
7 (Oct 19) | Buses, Networking, Operating System; Q3 | | Lab 3
8 (Oct 26) | Embedded OS Services | Tutorial 4 - Real-time Operating Systems | Lab 3 and Demo
9 (Nov 2) | Real-time processing; Q4 | Tutorial 5 - Wireless and writing drivers | Lab 4 and Demo
10 (Nov 9) | Project Intro | | Project
11 (Nov 16) | | | Project
12 (Nov 23) | | | Project
13 (Nov 30) | | | Project
14 (Dec 7) | | | Project Demo

As with any plan, this schedule is subject to some change.



Intro Material



Computer Architecture

•  Design principles applied to a state-of-the-art architecture: the ARM Cortex-M processor family

•  The course focuses primarily on experimental work.

•  Microprocessor principles are presented mainly by example.



Applications

•  Deciding factors: cost, size, power, quantity



Example: Camera


•  Computer system with:
   •  Image control
   •  Hardware (lenses, motors)
   •  Interfaces

•  Added sophistication to consumer electronics
   •  Expandability (of functions)
   •  Connectivity

[Figure 9.2. A simplified block diagram of a digital camera: lens and motor, optical sensors, A/D conversion, system controller, image storage, LCD screen, flash unit, user switches, and a computer interface with a cable to a PC.]


Views of Computers: Levels of Abstraction

•  Logic Level - Circuits
   •  Logic functions implemented by gates (interfaces, buses, etc.)

•  Architectural Level - Microarchitecture
   •  Operations performed by resources (ALUs, registers, etc.)

•  Instruction Set Level - Instructions
   •  Program execution

•  Operating System Level - Complete system
   •  System operation



Layered Computer Architecture


•  Problem-oriented language level
      ↓ translation (compiler)
•  Assembly language level
      ↓ translation (assembler)
•  Operating system machine level
      ↓ partial interpretation (OS)
•  Instruction set architecture level
      ↓ interpreter
•  Microarchitecture level
•  Digital logic level (hardware)

The same operation, swapping v[k] and v[k+1], at three levels:

   High-level language:
      temp := v[k]; v[k] := v[k+1]; v[k+1] := temp;
   Assembly language (MIPS):
      lw $15, 0($2)
      lw $16, 4($2)
      sw $16, 0($2)
      sw $15, 4($2)
   Machine language (binary):
      0000 1001 1100 0110 1010 1111 0101 1000
      1010 1111 0101 1000 0000 1001 1100 0110
      1100 0110 1010 1111 0101 1000 0000 1001
      0101 1000 0000 1001 1100 0110 1010 1111
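The lw/sw sequence above assumes that register $2 holds the byte address of v[k] and that each element is a 4-byte word, so v[k+1] sits at offset 4. A minimal Java sketch, modelling byte-addressed memory with `java.nio.ByteBuffer`, performs the same two loads and two stores. The class name `SwapDemo` and the little-endian memory layout are illustrative assumptions; the byte order does not affect the swap.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Illustrative sketch: byte-addressed memory holding 32-bit words,
// mirroring the MIPS lw/lw/sw/sw sequence that swaps v[k] and v[k+1].
final class SwapDemo {
    // 'base' plays the role of register $2 (the byte address of v[k]).
    static void swap(ByteBuffer memory, int base) {
        int r15 = memory.getInt(base);      // lw $15, 0($2)  : temp := v[k]
        int r16 = memory.getInt(base + 4);  // lw $16, 4($2)  : load v[k+1]
        memory.putInt(base, r16);           // sw $16, 0($2)  : v[k] := v[k+1]
        memory.putInt(base + 4, r15);       // sw $15, 4($2)  : v[k+1] := temp
    }

    // Place v at address 0 (element i at byte address 4*i), swap, read back.
    static int[] swapInArray(int[] v, int k) {
        ByteBuffer memory = ByteBuffer.allocate(4 * v.length).order(ByteOrder.LITTLE_ENDIAN);
        for (int i = 0; i < v.length; i++) {
            memory.putInt(4 * i, v[i]);
        }
        swap(memory, 4 * k);                // $2 = address of v[k]
        int[] out = new int[v.length];
        for (int i = 0; i < v.length; i++) {
            out[i] = memory.getInt(4 * i);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(swapInArray(new int[] {10, 20, 30, 40}, 1)));
        // prints [10, 30, 20, 40]
    }
}
```

The offsets 0 and 4 in the assembly correspond to `base` and `base + 4` here; an array of 2-byte or 8-byte elements would need different offsets.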


Computer Organization

•  Processor
   •  Microprocessor
•  Memory
•  Peripherals
•  Common bus

Reproduced from Tanenbaum & Austin

Microprocessor Operation

•  Reset: start here at power-on or when a reset signal is received
•  Fetch:
   1. Output the instruction address on the address bus
   2. Read the instruction pattern from memory onto the data bus
   3. Increment the instruction pointer (program counter)
•  Decode: determine what type of instruction was fetched
•  Execute:
   1. If necessary, read data from memory
   2. Execute the instruction
   3. If necessary, write results to memory
•  Repeat this process until power is turned off or the processor is halted.


Common Processors

•  Mainly von Neumann architecture
   •  Arithmetic-logic unit
   •  Registers
   •  Auxiliary registers



Processor Execution - Java code

public class Interp {
    static int PC;                  // program counter holds address of next instr
    static int AC;                  // the accumulator, a register for doing arithmetic
    static int instr;               // a holding register for the current instruction
    static int instr_type;          // the instruction type (opcode)
    static int data_loc;            // the address of the data, or -1 if none
    static int data;                // holds the current operand
    static boolean run_bit = true;  // a bit that can be turned off to halt the machine

    public static void interpret(int memory[], int starting_address) {
        PC = starting_address;
        while (run_bit) {
            instr = memory[PC];                       // fetch next instruction into instr
            PC = PC + 1;                              // increment program counter
            instr_type = get_instr_type(instr);       // determine instruction type
            data_loc = find_data(instr, instr_type);  // locate data (-1 if none)
            if (data_loc >= 0)                        // if data_loc is -1, there is no operand
                data = memory[data_loc];              // fetch the data
            execute(instr_type, data);                // execute instruction
        }
    }

    private static int get_instr_type(int instr) { ... }
    private static int find_data(int instr, int type) { ... }
    private static void execute(int type, int data) { ... }
}
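The three helper methods of `Interp` are elided, so the loop cannot run as shown. The sketch below fills them in for a made-up accumulator machine so the fetch-decode-execute cycle can actually be traced. Everything specific here is hypothetical: the class name `ToyInterp`, the four opcodes (HALT, LOAD, ADD, STORE), and the encoding (opcode in bits 15..12, address in bits 11..0). Unlike the slide, `execute` also receives `memory` so that STORE can write back.

```java
// Hypothetical toy accumulator machine that makes the Interp loop runnable.
// Assumed encoding: bits 15..12 = opcode, bits 11..0 = memory address.
final class ToyInterp {
    static final int HALT = 0, LOAD = 1, ADD = 2, STORE = 3;

    static int PC;                  // address of next instruction
    static int AC;                  // accumulator
    static int instr;               // current instruction
    static int instr_type;          // opcode
    static int data_loc;            // operand address, or -1 if none
    static int data;                // current operand
    static boolean run_bit = true;  // cleared by HALT

    static void interpret(int[] memory, int starting_address) {
        PC = starting_address;
        run_bit = true;
        while (run_bit) {
            instr = memory[PC];                       // fetch
            PC = PC + 1;                              // increment program counter
            instr_type = get_instr_type(instr);       // decode
            data_loc = find_data(instr, instr_type);  // operand address (-1 if none)
            if (data_loc >= 0) {
                data = memory[data_loc];              // fetch operand
            }
            execute(memory, instr_type, data);        // execute
        }
    }

    private static int get_instr_type(int instr) {
        return (instr >>> 12) & 0xF;
    }

    // Only LOAD and ADD read a memory operand in this toy machine.
    private static int find_data(int instr, int type) {
        return (type == LOAD || type == ADD) ? (instr & 0xFFF) : -1;
    }

    private static void execute(int[] memory, int type, int data) {
        switch (type) {
            case LOAD:  AC = data; break;
            case ADD:   AC = AC + data; break;
            case STORE: memory[instr & 0xFFF] = AC; break;
            case HALT:  run_bit = false; break;
            default:    throw new IllegalStateException("unknown opcode " + type);
        }
    }

    // Assemble one instruction word in the assumed format.
    static int word(int opcode, int address) {
        return (opcode << 12) | address;
    }

    public static void main(String[] args) {
        int[] memory = new int[16];
        memory[0] = word(LOAD, 10);   // AC = mem[10]
        memory[1] = word(ADD, 11);    // AC = AC + mem[11]
        memory[2] = word(STORE, 12);  // mem[12] = AC
        memory[3] = word(HALT, 0);
        memory[10] = 7;
        memory[11] = 35;
        interpret(memory, 0);
        System.out.println(memory[12]); // prints 42
    }
}
```

Program and data share one memory array here, which is the von Neumann arrangement mentioned on the previous slide.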





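[Figure: block diagram of a microprocessor: bus interface unit, instruction cache, data cache, instruction decoder, control unit, arithmetic & logic unit, and floating-point unit, with registers beside the ALU and FPU; the chip connects to RAM and I/O through the memory bus and the system bus.]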


Bus Interface Unit

•  Receives instructions & data from main memory

•  Instructions are then sent to the instruction cache, data to the data cache

•  Also receives the processed data and sends it to the main memory



Instruction Decoder

•  Receives the program's instructions

•  Decodes them into a form that is understandable by the processing units, i.e., the ALU or FPU

•  Passes on the decoded instruction to the ALU or FPU
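As a concrete decoding step, the sketch below extracts the register fields from one 16-bit Thumb instruction. It assumes the ARMv7-M T1 encoding of ADDS Rd, Rn, Rm as we read the architecture reference manual: bits 15..9 = 0001100, followed by Rm, Rn, and Rd in three 3-bit fields. Check the layout against the manual before relying on it. A real decoder recognises many encodings; this one only recognises a single pattern, and the class name `ThumbDecode` is illustrative.

```java
// Sketch of one decoder step: recognise the 16-bit Thumb ADDS (register, T1)
// pattern 0001100 Rm Rn Rd and extract its operand fields.
final class ThumbDecode {
    // Returns "ADDS rd, rn, rm", or null if the halfword is not this encoding.
    static String decodeAddsRegister(int halfword) {
        int opcode = (halfword >>> 9) & 0x7F;  // bits 15..9
        if (opcode != 0b0001100) {
            return null;                       // some other instruction
        }
        int rm = (halfword >>> 6) & 0x7;       // bits 8..6
        int rn = (halfword >>> 3) & 0x7;       // bits 5..3
        int rd = halfword & 0x7;               // bits 2..0
        return "ADDS r" + rd + ", r" + rn + ", r" + rm;
    }

    public static void main(String[] args) {
        // 0x191A = 0001100 100 011 010 (Rm = 4, Rn = 3, Rd = 2)
        System.out.println(decodeAddsRegister(0x191A)); // prints ADDS r2, r3, r4
    }
}
```

The decoded fields are what the control unit would use to select the ALU operation and the source and destination registers.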



Arithmetic & Logic Unit (ALU)

•  Also known as the “Integer Unit”

•  Performs:
   •  whole-number calculations (add, subtract, multiply, divide, etc.)
   •  comparisons and logical operations (NOT, OR, AND, etc.)

•  More recent microprocessors:
   •  multiple ALUs that can do calculations simultaneously
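A comparison in an ALU is usually a subtraction whose result is discarded while the status flags are kept. The sketch below models a 32-bit add and compare that also produce N (negative), Z (zero), C (carry), and V (signed overflow) flags. The flag rules follow the usual ARM conventions (C is the carry out for addition and "no borrow" for subtraction). The class name `Alu` and the string encoding of the flags are illustrative choices.

```java
// Sketch of two 32-bit ALU operations that also produce N, Z, C, V flags.
final class Alu {
    // Encode flags as e.g. "NzcV": upper case = set, lower case = clear.
    static String flags(boolean n, boolean z, boolean c, boolean v) {
        return (n ? "N" : "n") + (z ? "Z" : "z") + (c ? "C" : "c") + (v ? "V" : "v");
    }

    // Addition: returns "<result> <flags>".
    static String add(int a, int b) {
        long unsignedSum = (a & 0xFFFFFFFFL) + (b & 0xFFFFFFFFL);
        int result = (int) unsignedSum;                   // keep the low 32 bits
        boolean c = (unsignedSum >>> 32) != 0;            // unsigned carry out
        boolean v = ((a ^ result) & (b ^ result)) < 0;    // signed overflow
        return result + " " + flags(result < 0, result == 0, c, v);
    }

    // Compare: computes a - b only to set the flags.
    static String compare(int a, int b) {
        int result = a - b;
        boolean c = Integer.compareUnsigned(a, b) >= 0;   // no borrow
        boolean v = ((a ^ b) & (a ^ result)) < 0;         // signed overflow
        return flags(result < 0, result == 0, c, v);
    }

    public static void main(String[] args) {
        System.out.println(add(Integer.MAX_VALUE, 1)); // prints -2147483648 NzcV
        System.out.println(compare(5, 5));             // prints nZCv
    }
}
```

Conditional branches then test these flags. For example, "branch if equal" checks Z after a compare.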



Floating-Point Unit (FPU)

•  Also known as the Numeric Unit

•  Performs calculations on numbers in scientific notation (floating-point numbers)

•  Floating-point calculations are required in graphics, engineering and science

•  The ALU can do these calculations as well (in software), but very slowly
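"Scientific notation" in binary means that a single-precision number (IEEE 754 binary32) is stored as a sign bit, an 8-bit exponent biased by 127, and a 23-bit fraction. For normal numbers the value is (-1)^sign × 1.fraction × 2^(exponent - 127). The sketch below uses Java's `Float.floatToIntBits` to expose the three fields. For example, 6.5 = 1.625 × 2^2 gives a biased exponent of 129. The class name `FloatFields` is illustrative.

```java
// Sketch: split an IEEE 754 single-precision float into its three fields.
final class FloatFields {
    static String fields(float x) {
        int bits = Float.floatToIntBits(x);
        int sign = bits >>> 31;                // 1 bit
        int exponent = (bits >>> 23) & 0xFF;   // 8 bits, biased by 127
        int fraction = bits & 0x7FFFFF;        // 23 bits; implicit leading 1 for normal numbers
        return String.format(java.util.Locale.ROOT,
                "sign=%d exponent=%d (unbiased %d) fraction=0x%06X",
                sign, exponent, exponent - 127, fraction);
    }

    public static void main(String[] args) {
        System.out.println(fields(6.5f));
        // prints sign=0 exponent=129 (unbiased 2) fraction=0x500000
    }
}
```

An FPU operates on these fields directly in hardware. Without one, the same exponent alignment and fraction arithmetic must be done with integer ALU instructions, which is why software floating point is slow.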



Registers

•  Small amount of super-fast private memory placed right next to the ALU & FPU for their exclusive use

•  The ALU & FPU store intermediate and final results from their calculations in these registers

•  Processed data goes back to the data cache and then to main memory from these registers



Control Unit

•  The brain of the microprocessor

•  Manages the whole uP

•  Tasks:
   •  fetching instructions & data
   •  storing data
   •  managing input/output devices



Enhancing the capability of a uP?

The computing capability of a uP can be enhanced in many different ways:

•  Increasing the clock frequency

•  Increasing the word-width

•  More effective caching algorithm and the right cache size

•  Adding more functional units (e.g., ALUs, FPUs, etc.)

•  Improving the architecture


Basic Arch. Concepts - Pipelining

•  Lets the processor run at a high clock rate

•  But each instruction might take more clock cycles

•  Trick: overlap the execution of successive instructions
   •  Some overhead with the first few instructions, while the pipeline fills

Reproduced from Tanenbaum & Austin

Photo © ALCE \ Fotolia.com


Pipelining - Reference

•  Pipeline
   •  Connect data processing elements in series
   •  The output of one element is the input of the next
   •  Elements execute in parallel (using time-slices)

•  Pipelining effects
   •  Does not decrease the processing time for a single datum
   •  Increases the throughput of the system when processing a stream of data
   •  Using many pipeline stages increases latency

•  Requires more resources (processing units, memory) than executing one batch at a time
   •  Stages cannot reuse the resources of a previous stage
   •  Pipelining may increase the time required for an instruction to finish
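The throughput and latency effects can be made concrete with the textbook ideal case, which assumes k stages of one clock cycle each and no stalls or hazards. Without pipelining, n instructions take n·k cycles. With pipelining, the first instruction needs k cycles to fill the pipeline, and each later instruction completes one cycle after the previous one, for k + n - 1 cycles in total. The class name `PipelineTiming` is illustrative.

```java
// Ideal pipeline timing: k one-cycle stages, no stalls or hazards (an assumption).
final class PipelineTiming {
    static long unpipelinedCycles(long n, long k) {
        return n * k;                 // each instruction runs all k stages alone
    }

    static long pipelinedCycles(long n, long k) {
        if (n == 0) return 0;
        return k + (n - 1);           // k cycles to fill, then one completion per cycle
    }

    public static void main(String[] args) {
        long n = 100, k = 5;
        long before = unpipelinedCycles(n, k);
        long after = pipelinedCycles(n, k);
        System.out.printf(java.util.Locale.ROOT, "unpipelined=%d pipelined=%d speedup=%.2f%n",
                before, after, (double) before / after);
        // prints unpipelined=500 pipelined=104 speedup=4.81
    }
}
```

A single instruction still takes k cycles (pipelinedCycles(1, k) == k), so latency is unchanged while throughput approaches one instruction per cycle. In a real pipeline the stage registers add overhead, so latency usually grows slightly.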



Other Speedups – Multiple Units

•  Bottlenecks: execution in single pipeline units
   •  ALU, especially floating point

•  Resolution: provide multiple units



Superscalar Architectures

•  Common solution for modern processors
   •  Multiple execution units



Multiple-core architectures


Intel slide


Multiple-core architectures


Intel slide


Memory

•  Hierarchy of memory units
   •  Speed vs. size

•  Solutions
   •  Caching
   •  Virtual memory



The Main Memory Bottleneck

•  Modern uPs can process a huge amount of data in a short duration

•  They require quick access to data to maximize their performance

•  When data is unavailable, they literally stop and wait; this reduces performance and wastes power

•  Current uPs can process an instruction in about 1 ns.

•  Fetching data from main memory (RAM) takes on the order of 10-100 ns.


Solution to the Bottleneck Problem

•  Make the main memory faster
   •  Problem: 1-ns memory is extremely expensive compared with currently popular 10-100 ns memory

•  Alternative:
   •  Add a small amount of ultra-fast RAM right next to the uP on the same chip
   •  Make sure that frequently used data and instructions reside in that ultra-fast memory

•  Advantage: much better overall performance due to fast access to frequently used data and instructions


On-Chip Cache Memory (1)

•  On-chip cache memory
   •  Small amount of memory located on the same chip as the uP
   •  There may be multiple levels of caches

•  The uP stores a copy of frequently used data and instructions in its cache memory

•  When the uP wants some data:
   •  it checks the cache first;
   •  only if the data is not there (a cache miss) does it request the data from main memory
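The check-the-cache-first policy can be sketched with the simplest organisation, a direct-mapped cache, in which each memory block maps to exactly one cache line. The slides do not specify an organisation. The line count, block size, and class name `DirectMappedCache` are illustrative assumptions, and only hit/miss bookkeeping is modelled, not the data itself.

```java
// Sketch of a direct-mapped cache lookup that only tracks hits and misses.
final class DirectMappedCache {
    private final int numLines;
    private final int blockBytes;
    private final long[] tags;
    private final boolean[] valid;
    int hits, misses;

    DirectMappedCache(int numLines, int blockBytes) {
        this.numLines = numLines;
        this.blockBytes = blockBytes;
        this.tags = new long[numLines];
        this.valid = new boolean[numLines];
    }

    // Returns true on a hit. On a miss the block would be fetched from main
    // memory (not modelled) and installed, evicting whatever was in that line.
    boolean access(long address) {
        long block = address / blockBytes;
        int index = (int) (block % numLines);  // the one line this block may use
        long tag = block / numLines;           // which block currently occupies it
        if (valid[index] && tags[index] == tag) {
            hits++;
            return true;
        }
        misses++;
        valid[index] = true;
        tags[index] = tag;
        return false;
    }

    public static void main(String[] args) {
        DirectMappedCache cache = new DirectMappedCache(4, 16);  // 4 lines of 16 bytes
        for (long a : new long[] {0, 4, 8, 64, 0, 4}) {
            cache.access(a);
        }
        System.out.println("hits=" + cache.hits + " misses=" + cache.misses);
        // prints hits=3 misses=3
    }
}
```

Addresses 4 and 8 hit because they share a block with address 0. Address 64 maps to the same line as 0 and evicts it, so the next access to 0 misses again (a conflict miss).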



On-Chip Cache Memory (2)

•  Small size and proximity to the uP
   •  Short access times boost performance

•  Predict what data will be required for future calculations
   •  Pre-fetch that data and place it in the cache
   •  It is then available immediately when the need arises

•  Speed advantage of cache memory
   •  Depends heavily on the caching algorithm
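One common way to quantify the cache's speed advantage is the average memory access time: AMAT = hit time + miss rate × miss penalty. The sketch below uses figures from the ranges quoted on the earlier slides (a 1 ns cache hit and a 100 ns trip to main memory) together with an assumed 95 percent hit rate. The class name `Amat` is illustrative.

```java
// Average memory access time for a single cache level.
final class Amat {
    static double amatNs(double hitTimeNs, double hitRate, double missPenaltyNs) {
        return hitTimeNs + (1.0 - hitRate) * missPenaltyNs;
    }

    public static void main(String[] args) {
        System.out.println(amatNs(1.0, 0.95, 100.0)); // about 6 ns
        System.out.println(amatNs(1.0, 0.99, 100.0)); // about 2 ns
    }
}
```

Going from a 95 to a 99 percent hit rate roughly triples the effective memory speed, which is why the caching algorithm matters so much.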



Expanded View of the Memory Systems

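[Figure: the memory hierarchy. Processor (control, datapath, registers) → cache → 2nd-level cache → main memory → hard disk (virtual memory). Levels closer to the processor are faster, smaller, and higher in cost per bit; the farthest level is the slowest, biggest, and lowest in cost.]

•  Cache is handled by hardware
•  Virtual memory is handled by the OS
•  The programmer sees only one memory and the registers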


Memory Organization - Standards

•  Computer word
   •  Basic unit of access

•  The same memory can be accessed in different ways



Little Endian vs. Big Endian

•  A matter of preference

•  Significant implications for compatibility

•  Some processors can have both
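The difference is easiest to see by storing one 32-bit word and listing its bytes in address order. The sketch below uses `java.nio.ByteBuffer`, which supports both byte orders. The value 0x12345678 and the class name `Endian` are arbitrary choices.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Sketch: lay out one 32-bit word in memory in both byte orders.
final class Endian {
    // Returns the four bytes as hex, lowest address first.
    static String bytesInMemory(int word, ByteOrder order) {
        byte[] memory = ByteBuffer.allocate(4).order(order).putInt(word).array();
        StringBuilder sb = new StringBuilder();
        for (int address = 0; address < 4; address++) {
            if (address > 0) sb.append(' ');
            sb.append(String.format("%02X", memory[address] & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(bytesInMemory(0x12345678, ByteOrder.BIG_ENDIAN));    // 12 34 56 78
        System.out.println(bytesInMemory(0x12345678, ByteOrder.LITTLE_ENDIAN)); // 78 56 34 12
    }
}
```

The compatibility problem appears whenever raw bytes cross between systems. A word written by a little-endian processor and read back as big-endian becomes 0x78563412, so file formats and network protocols must fix one order and convert where necessary.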




Standardization – ASCII set

•  Standardized way to use bits for encoding characters, for:
   •  Display
   •  Communication
   •  Files
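ASCII gives each character a 7-bit code, so text is stored and transmitted as ordinary numbers. The sketch below prints the codes for "hello" and uses the fixed 0x20 (32) gap between lower-case and upper-case letters ('a' = 97, 'A' = 65). The class name `AsciiDemo` is illustrative. Java chars are UTF-16 code units, but for ASCII text their values equal the ASCII codes.

```java
// Sketch: ASCII codes of a string, and the 0x20 gap between 'a' (97) and 'A' (65).
final class AsciiDemo {
    static int[] codes(String s) {
        int[] out = new int[s.length()];
        for (int i = 0; i < s.length(); i++) {
            out[i] = s.charAt(i);   // for ASCII text, the char value is the ASCII code
        }
        return out;
    }

    // Upper-case conversion by arithmetic, valid only for 'a'..'z'.
    static char toUpperAscii(char c) {
        return (c >= 'a' && c <= 'z') ? (char) (c - 0x20) : c;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(codes("hello"))); // [104, 101, 108, 108, 111]
        System.out.println(toUpperAscii('h'));                         // H
    }
}
```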



Programmer’s Model of a Microprocessor

•  Instruction set: ldr r0, [r2, #0]; add r2, r3, r4

•  Memory:
      80000004   ldr r0, [r2, #0]
      80000008   add r2, r3, r4
      8000000C   23456
      80000010   AEF0

•  Memory-mapped I/O:
      80000100   input
      80000108   output

•  Registers: r0 - r3, pc

•  Addressing modes (how to access data in registers and memory): ldr r12, [r1, #0]; mov r1, r3
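With memory-mapped I/O, the same load and store instructions reach either RAM or a device, depending only on the address. The sketch below shows that address decoding and reuses the slide's example addresses (0x80000100 for input, 0x80000108 for output). The RAM window, the device behaviour, and the class name `MappedBus` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a bus that routes loads/stores by address: RAM or memory-mapped I/O.
final class MappedBus {
    static final long RAM_BASE = 0x80000000L;   // hypothetical RAM window
    static final int RAM_WORDS = 64;            // 256 bytes: 0x80000000..0x800000FF
    static final long INPUT_ADDR = 0x80000100L;  // from the slide
    static final long OUTPUT_ADDR = 0x80000108L; // from the slide

    private final int[] ram = new int[RAM_WORDS];
    int inputValue;                                   // value the input device presents
    final List<Integer> output = new ArrayList<>();   // values written to the output device

    int load(long address) {                          // plays the role of ldr
        if (address == INPUT_ADDR) return inputValue;
        return ram[ramIndex(address)];
    }

    void store(long address, int value) {             // plays the role of str
        if (address == OUTPUT_ADDR) {
            output.add(value);
            return;
        }
        ram[ramIndex(address)] = value;
    }

    // Word-aligned addresses inside the RAM window only.
    private static int ramIndex(long address) {
        long offset = address - RAM_BASE;
        if (offset < 0 || offset >= 4L * RAM_WORDS || offset % 4 != 0) {
            throw new IllegalArgumentException("bad address 0x" + Long.toHexString(address));
        }
        return (int) (offset / 4);
    }

    public static void main(String[] args) {
        MappedBus bus = new MappedBus();
        bus.inputValue = 21;
        int r0 = bus.load(INPUT_ADDR);                      // read the input device
        bus.store(0x80000010L, r0);                         // ordinary memory write
        bus.store(OUTPUT_ADDR, 2 * bus.load(0x80000010L));  // write the output device
        System.out.println(bus.output);                     // prints [42]
    }
}
```

The program never uses special I/O instructions; the address alone selects the device. The same is true of the peripheral registers used in the labs.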

Software Build and Load

•  Typical flow for a desktop computer:
   •  Compiler → assembler → object files
   •  Linker (object files + run-time library) → executable image file
   •  Loader → read-write memory (RAM)
   •  The operating system image is placed in RAM by the boot process

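Example Program Creation & Run

[Figure: hardware organization of a typical system. The CPU (register file, PC, ALU, bus interface) connects over the system bus to an I/O bridge, which links the memory bus (main memory) and the I/O bus (USB controller for mouse and keyboard, graphics adapter for the display, disk controller for the disk, and expansion slots for other devices such as network adapters). The hello executable is stored on disk.]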


Reading “Hello” Command

[Figure: the user types "hello" at the keyboard; the characters pass through the USB controller into the CPU's register file and are stored in main memory.]

Loading “Hello” Program

[Figure: the hello code and the string "hello,world\n" are loaded from disk into main memory.]

Finally - Program Running

[Figure: the CPU executes the hello code; the string "hello,world\n" is copied from main memory through the register file to the graphics adapter and appears on the display.]

Instruction Set Architecture

•  Interface between HW and SW
   •  Virtual machine
   •  Many possible implementations

•  Given by:
   •  Resources
      •  Processor registers
      •  Execution units
   •  Operations
      •  Instruction types
      •  Data types
      •  Addressing modes (i.e., where and how to address operands)

[Figure: CPU and memory connected by the address bus, control bus, and data bus, annotated with where operations are performed, where operands and results are stored, and the boundary between HW and SW.]


Problem-oriented Language Layer

•  Compiled to the assembly or instruction set level
•  You will be using embedded C
•  How does this differ from the usual use of C?
   •  You write directly to registers to control processor operation
   •  All of the registers have been mapped to macros
   •  Important bit combinations have macros – use these, please!
   •  Registers are 32 bits, and the int type is 4 bytes
   •  Register values may change without any instruction from your program (e.g., when hardware updates a status register)
   •  Limited output system
   •  Floating-point operations are inefficient; avoid divide and square root.
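In embedded C, register access goes through the vendor's register macros and `volatile` pointers, but the bit manipulation underneath is the same in any language. The Java sketch below shows the set, clear, test, and field-write idioms on a 32-bit register value. The mask names (`PIN_12`, `PIN_13`, `MODE_MASK`, `MODE_OUTPUT`) and their bit positions are hypothetical stand-ins for real vendor macros.

```java
// Sketch of read-modify-write bit manipulation on a 32-bit register value.
// Mask names and bit positions are hypothetical stand-ins for vendor macros.
final class RegisterBits {
    static final int PIN_12 = 1 << 12;
    static final int PIN_13 = 1 << 13;
    static final int MODE_MASK = 0b11 << 4;    // a two-bit field at bits 5..4
    static final int MODE_OUTPUT = 0b01 << 4;  // one value of that field

    static int setBits(int reg, int mask)   { return reg | mask; }
    static int clearBits(int reg, int mask) { return reg & ~mask; }
    static boolean isSet(int reg, int mask) { return (reg & mask) == mask; }

    // Replace a multi-bit field without disturbing the other bits.
    static int writeField(int reg, int fieldMask, int value) {
        return (reg & ~fieldMask) | (value & fieldMask);
    }

    public static void main(String[] args) {
        int reg = 0;
        reg = setBits(reg, PIN_12 | PIN_13);           // turn on two outputs
        reg = clearBits(reg, PIN_13);                  // turn one off again
        reg = writeField(reg, MODE_MASK, MODE_OUTPUT); // set the mode field
        System.out.printf("0x%08X %b%n", reg, isSet(reg, PIN_12));
        // prints 0x00001010 true
    }
}
```

Named masks make the intent readable and avoid magic numbers. On real hardware, the read and write of the register must not be optimised away, which is what `volatile` guarantees in C, since register values may change without any instruction from your program.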



Assembly versus C

•  Efficiency of compiled code

•  Source code portability

•  Program maintainability

•  Typical bug rates (say, per thousand lines of code)

•  The amount of time it will take to develop the solution

•  Availability and cost of compilers and other development tools

•  Your personal experience (or that of the developers on your team) with specific languages or tools

•  Don’t rule out Java or C++ if you have the memory to play with.



Problem

•  Company “Ostrich” has recently re-developed the embedded software for its flagship products
   •  Developed in assembly: 2000 lines of code, 80 percent working

•  The company suddenly realizes that the product isn’t shippable

•  Bugs: system lock-ups indicative of major design flaws or implementation errors, and major product performance issues

•  The designer has left the company, leaving few notes or comments

•  You are hired as a consultant. Do you:
   •  Fix the existing code?
   •  Perform a complete software redesign and implementation? In this case, which language?


Jobs for This Week

•  Form pairs before next Friday (random assignment thereafter)

•  Tutorial next week

•  Sign up in groups on myCourses