ECSE 426 – Fall 2015 Microprocessor Systems
Dept. Electrical and Computer Engineering
McGill University
ECSE 426, Lecture 1
Instructor & TAs
• Prof. Mark Coates
  • Office: McConnell Eng Bldg, Room 759
  • Phone: 514-398-7137
  • Email: [email protected]
  • Office hours: Mon. 11:30-12:30 or by appointment
• Teaching Assistants
  • Ashraf Suyyagh, Zaid Al-bayati, Harsh Aurora
  • Andrey Tolstikhin, Loren Lugosch
Mark Coates, Fall 2015
Course Format
• Lectures:
  • Monday, 10:05 AM – 11:25 AM, Trottier 0060
• Tutorials:
  • Friday, 11:35 AM – 12:55 PM, Trottier 0060
  • Start: Sep. 11
• Labs
  • Monday-Friday, Trottier 4160
  • Demos usually on Friday (earlier options are possible)
  • TAs available for 2-3 hours Mon.-Thurs. afternoon (calendar in myCourses)
  • Start: Sep. 21 (Tutorial on Sep. 18)
Other Logistics
• Course communications all through myCourses
  • Lectures, labs, manuals, discussion, …
  • Respect others, use common sense
• If you do feel the need to email me directly about course-related matters, please include “ECSE-426” in the subject line
• Lab Groups and Room Access:
  • The 4 experiments will be conducted in pairs.
  • Reserve time slots for demonstrations.
  • The project will be conducted in groups of 4.
(No) Textbook; References
• No textbook. Class notes, manuals.
• Useful:
  • J. Yiu, The Definitive Guide to ARM® Cortex®-M3 and Cortex®-M4 Processors, Newnes, 2013.
  • A. Tanenbaum and T. Austin, Structured Computer Organization, sixth edition, Prentice Hall, 2012.
Grading
• Labs – 48%
  • 4 labs, done in pairs: demonstrations and reports
  • Late reports: 5 percent per day penalty (Fri-Mon counts as one day)
  • Missed demo: reschedule for 65 percent of grade
• Project – 40%
  • Group of 4: demonstration and report
• Quizzes – 12%
  • Four 15-minute quizzes in class
  • Short answer & multiple choice
  • Cover the current lab and tutorial and recent lectures
Grading
• Demos
  • performance, robustness, code quality, performance testing
  • individual grade for the demonstration
• Reports or Lab notes
  • concise but comprehensive, 1 per group
  • detailed report guidelines will be posted. Follow these closely.
  • common grade for reports
• Generally grades of group members will be very similar
  • Differentiation: response to questions or quality of components for which each member is responsible.
Academic Integrity
• McGill University values academic integrity. Therefore all students must understand the meaning and consequences of cheating, plagiarism and other academic offences under the Code of Student Conduct and Disciplinary Procedures.
• See http://www.mcgill.ca/integrity/ for more information.
• What does this mean for this course?
  • Feel free to discuss solutions with classmates
  • And consult and use online resources but…
  • Do your own programming & write your own reports
Avoid Plagiarism
• Please make sure to reference any text, code, or online resources you use to develop your solution
• If you reproduce any figures, clearly state this in the figure caption:
“Reproduced from [1]”
• If you re-use code from another source, clearly indicate this in your code with appropriate comments
What Is This Course About?
10-Sep-15 ECSE 426 Microprocessor Systems
Microprocessors
• Enabling technology for general purpose computers and embedded systems
  • Many, many applications
• General purpose computers
  • PCs, workstations, servers, supercomputers
• Embedded systems
  • Phones, medical, aviation, automobile, industrial
  • Real-time systems
Course Goals
• Provide the necessary understanding and skills to design and build microprocessor systems.
• By the end of the course, you should:
  • understand the organization and design principles of modern microprocessor-based systems;
  • be proficient in assembly and high-level (C language) programming for embedded systems;
  • understand the performance impact of embedded software, including design techniques for energy- and memory-limited systems;
  • know how to connect peripheral devices and networking interfaces, and how to write programs that use these interfaces efficiently;
  • have experience in developing a realistic embedded system solution through teamwork.
Course Goals (Restated)
• Understand microprocessor-based systems
• Become familiar with basic development tools
• Develop skills in machine interfacing, assembler and embedded C programming
• Design a sizeable embedded system
  • Previous projects: Music player, file swapping system, PDAs (with handwriting recognition), wireless data collection systems
  • Our project: indoor tracking
• Build teamwork skills
Lecture Structure
• Background
  • Computer architecture basics
• Microprocessor Instruction Set Architecture
• Embedded Processors
• Embedded System Design
  • Hardware and software techniques
• Building Real Systems
  • Techniques and tools
Lab Structure
• Four experiments + final project.
  • Experiment 1: Assembly and C
  • Experiment 2: Intro. to hardware interfacing; drivers, timing …
  • Experiment 3: I/O, Interrupts, Servo-motors, Advanced Sensor Use
  • Experiment 4: Real-time OS, Networking
• Project:
  • Tracking through dead-reckoning
  • Drawing on a map
  • Wireless communication
  • LCD display, keypad interface
Class Schedule
ECSE 426 – Microprocessor Systems, Fall 2015 (2 September 2015)

Schedule of Lectures and Labs

On average there are, per week, 1 lecture hour, 1 tutorial hour, 4 lab hours, and 3 preparation hours associated with this course. Over the course of the semester there will be 10 lectures, 5 tutorials, 4 labs and a project.

Week | Lecture Material | Tutorials | Labs and Project
1 (Sep 7) | Introduction | Tutorial A: Assembly and C | Form lab groups
2 (Sep 14) | Assemblers, Lab Intro | Tutorial 1: Introduction to IDE and assembly | -
3 (Sep 21) | Linker, loader, processor architecture | - | Lab 1
4 (Sep 28) | Processor Microarchitecture; Q1 | Tutorial 2: Introduction to embedded C, IDE and drivers | Lab 1 and Demo
5 (Oct 5) | Embedded Processors | - | Lab 2
6 (Oct 12) | IO/Processor Interfacing; Q2 | Tutorial 3: Introduction to timers, interrupts and MEMS | Lab 2 and Demo
7 (Oct 19) | Buses, Networking, Operating System; Q3 | - | Lab 3
8 (Oct 26) | Embedded OS Services | Tutorial 4: Real time Operating Systems | Lab 3 and Demo
9 (Nov 2) | Real-time processing; Q4 | Tutorial 5: Wireless and writing drivers | Lab 4 and Demo
10 (Nov 9) | Project Intro | - | Project
11 (Nov 16) | - | - | Project
12 (Nov 23) | - | - | Project
13 (Nov 30) | - | - | Project
14 (Dec 7) | - | - | Project Demo

As with any plan, this schedule is subject to some change.
Intro Material
Computer Architecture
• Application of design principles to a state-of-the-art architecture
  • ARM Cortex-M processor family
• The course focuses primarily on experimental work.
• Microprocessor principles are presented mainly by example
Applications
• Deciding Factors: cost, size, power, quantity
Example: Camera
• Computer system with:
  • Image control
  • Hardware (lenses, motors)
  • Interfaces
• Added sophistication to consumer electronics
  • Expandability (of functions)
  • Connectivity

Figure 9.2. A simplified block diagram of a digital camera. Blocks shown: user switches, computer interface (cable to PC), image storage, LCD screen, flash unit, A/D conversion, optical sensors, system controller, motor, lens.
Views of Computers: Levels of Abstraction
• Logic Level - Circuits
  • Logic functions implemented by gates (interfaces, buses, etc.)
• Architectural Level - Microarchitecture
  • Operations performed by resources (ALUs, registers, etc.)
• Instruction Set Level - Instructions
  • Program execution
• Operating System Level - Complete system
  • System operation
Layered Computer Architecture
Levels, from highest to lowest, with the mechanism that maps each level onto the one below:
• Problem-oriented Language Level: Translation (Compiler)
• Assembly Language Level: Translation (Assembler)
• Operating System Machine Level: OS - Partial Interpretation
• Instruction Set Architecture Level: Interpreter
• Microarchitecture Level: Hardware
• Digital Logic Level

The same operation at three levels:

High-level language:
    temp := v[k];
    v[k] := v[k+1];
    v[k+1] := temp;

Assembly language:
    lw $15, 0($2)
    lw $16, 4($2)
    sw $16, 0($2)
    sw $15, 4($2)

Machine code:
    0000 1001 1100 0110 1010 1111 0101 1000
    1010 1111 0101 1000 0000 1001 1100 0110
    1100 0110 1010 1111 0101 1000 0000 1001
    0101 1000 0000 1001 1100 0110 1010 1111
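
The high-level fragment above can be written in C as a rough sketch. The function name swap_adjacent is made up for illustration, and the comments assume 4-byte array elements with register $2 holding the address of v[k], which is what the offsets 0 and 4 in the assembly suggest.

```c
#include <assert.h>
#include <stddef.h>

/* Swap v[k] and v[k+1]. The caller must make sure both indices are valid. */
void swap_adjacent(int v[], size_t k)
{
    assert(v != NULL);
    int temp = v[k];     /* lw $15, 0($2): load v[k]                        */
    v[k] = v[k + 1];     /* lw $16, 4($2) then sw $16, 0($2): copy v[k+1]   */
    v[k + 1] = temp;     /* sw $15, 4($2): store the old v[k] into v[k+1]   */
}
```

A compiler is free to choose different registers or instruction orderings; the mapping in the comments only mirrors the sequence shown on the slide.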
Computer Organization
• Processor
  • Microprocessor
• Memory
• Peripherals
• Common Bus
Reproduced from Tanenbaum & Austin
Microprocessor Operation
• Reset
  • Start here at power-on or when a reset signal is received
• Fetch
  1. Output instruction address on address bus
  2. Read instruction pattern from memory onto data bus
  3. Increment instruction pointer (program counter)
• Decode
  • Determine what type of instruction was fetched
• Execute
  1. If necessary, read data from memory
  2. Execute instruction
  3. If necessary, write results to memory
Repeat this process until power is turned off or the processor is halted.
Common Processors
• Mainly von Neumann architecture
  • Arithmetic-logic unit
  • Registers
  • Auxiliary registers
Reproduced from Tanenbaum & Austin
Processor Execution - Java code

    public class Interp {
        static int PC;                  // program counter holds address of next instr
        static int AC;                  // the accumulator, a register for doing arithmetic
        static int instr;               // a holding register for the current instruction
        static int instr_type;          // the instruction type (opcode)
        static int data_loc;            // the address of the data, or -1 if none
        static int data;                // holds the current operand
        static boolean run_bit = true;  // a bit that can be turned off to halt the machine

        public static void interpret(int memory[], int starting_address) {
            PC = starting_address;
            while (run_bit) {
                instr = memory[PC];                       // fetch next instruction into instr
                PC = PC + 1;                              // increment program counter
                instr_type = get_instr_type(instr);       // determine instruction type
                data_loc = find_data(instr, instr_type);  // locate data (-1 if none)
                if (data_loc >= 0)                        // if data_loc is -1, there is no operand
                    data = memory[data_loc];              // fetch the data
                execute(instr_type, data);                // execute instruction
            }
        }

        private static int get_instr_type(int addr) { ... }
        private static int find_data(int instr, int type) { ... }
        private static void execute(int type, int data) { ... }
    }
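
The Java listing leaves the decode and execute helpers elided. As a rough C sketch of the same fetch-decode-execute loop, the example below invents a toy accumulator machine so that the loop can actually run. The opcodes, the ENCODE macro, and the 8-bit opcode / 8-bit address encoding are assumptions made for illustration only; they are not taken from the slide or from any real instruction set.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a toy accumulator machine (not a real ISA). */
enum { OP_HALT = 0, OP_LOAD = 1, OP_ADD = 2, OP_STORE = 3 };

/* Assumed encoding: bits 15..8 hold the opcode, bits 7..0 the operand address. */
#define ENCODE(op, addr) (((op) << 8) | ((addr) & 0xFF))

/* Runs the program in memory[] from start and returns the accumulator at halt. */
int interpret(int memory[], size_t size, size_t start)
{
    assert(memory != NULL);
    size_t pc = start;   /* program counter */
    int ac = 0;          /* accumulator */
    int run_bit = 1;     /* cleared to halt the machine */

    while (run_bit && pc < size) {
        int instr = memory[pc];                    /* fetch */
        pc = pc + 1;                               /* increment program counter */
        int type = (instr >> 8) & 0xFF;            /* decode: opcode */
        size_t data_loc = (size_t)(instr & 0xFF);  /* decode: operand address */

        if (type < OP_LOAD || type > OP_STORE || data_loc >= size) {
            run_bit = 0;   /* OP_HALT, unknown opcode, or address out of range */
            continue;
        }
        switch (type) {                            /* execute */
        case OP_LOAD:  ac = memory[data_loc];  break;
        case OP_ADD:   ac += memory[data_loc]; break;
        case OP_STORE: memory[data_loc] = ac;  break;
        }
    }
    return ac;
}
```

For example, a memory image holding LOAD 5, ADD 6, STORE 7, HALT, with 20 at address 5 and 22 at address 6, leaves 42 in the accumulator and at address 7.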
[Figure: block diagram of a microprocessor. Blocks shown: bus interface unit, instruction cache, data cache, instruction decoder, control unit, arithmetic & logic unit, floating point unit, registers. External connections shown: RAM, I/O, memory bus, system bus.]
Bus Interface Unit
• Receives instructions & data from main memory
• Instructions are then sent to the instruction cache, data to the data cache
• Also receives the processed data and sends it to the main memory
Instruction Decoder
• Receives the programming instructions
• Decodes them into a form that is understandable by the processing units, i.e., the ALU or FPU
• Passes on the decoded instruction to the ALU or FPU
Arithmetic & Logic Unit (ALU)
• Also known as the “Integer Unit”
• Performs:
  • whole-number calculations (subtract, multiply, divide, etc.)
  • comparisons and logical operations (NOT, OR, AND, etc.)
• More recent microprocessors:
  • multiple ALUs that can do calculations simultaneously
Floating-Point Unit (FPU)
• Also known as the Numeric Unit
• Performs calculations on numbers in scientific notation (floating-point numbers)
• Floating-point calculations are required in graphics, engineering and science
• The ALU can do these calculations as well, but very slowly
Registers
• Small amount of super-fast private memory placed right next to ALU & FPU for their exclusive use
• The ALU & FPU store intermediate and final results from their calculations in these registers
• Processed data goes back to the data cache and then to main memory from these registers
Control Unit
• The brain of the microprocessor
• Manages the whole microprocessor (uP)
• Tasks
  • fetching instructions & data
  • storing data
  • managing input/output devices
Enhancing the capability of a uP?
The computing capability of a uP can be enhanced in many different ways:
• Increasing the clock frequency
• Increasing the word-width
• Using a more effective caching algorithm and the right cache size
• Adding more functional units (e.g. ALUs, FPUs, etc.)
• Improving the architecture
Basic Arch. Concepts - Pipelining
• Makes processor run at high clock rate
  • But might take more clock cycles
• Trick: overlap execution
  • Some overhead with the first few instructions
Reproduced from Tanenbaum & Austin
Photo © ALCE \ Fotolia.com
Pipelining - Reference
• Pipeline
  • Connect data processing elements in series
  • Output of one element is input of the next
  • Execute in parallel (using time-slices)
• Pipelining effects
  • Does not decrease processing time for a single datum
  • Increases throughput of the system when processing a stream of data
  • Using many pipelining stages causes increase in latency
• More resources (processing units, memory) than when executing one batch at a time
  • Stages cannot reuse resources of previous stage
  • Pipelining may increase the time required for an instruction to finish
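
The throughput and latency effects listed above can be checked with simple cycle counts. This is a minimal sketch assuming an ideal pipeline: every stage takes one cycle, one instruction enters per cycle, and there are no stalls or hazards. The function names are invented for this example.

```c
#include <assert.h>

/* Cycles for n instructions when each one passes through all k stages
   before the next one starts (no overlap). */
unsigned long cycles_sequential(unsigned long n, unsigned long k)
{
    return n * k;
}

/* Cycles for n instructions on an ideal k-stage pipeline: the first
   instruction needs k cycles to fill the pipeline, then one instruction
   completes per cycle. */
unsigned long cycles_pipelined(unsigned long n, unsigned long k)
{
    assert(k > 0);
    if (n == 0) {
        return 0;
    }
    return k + (n - 1);
}

/* Ratio of the two cycle counts for a stream of n > 0 instructions. */
double pipeline_speedup(unsigned long n, unsigned long k)
{
    assert(n > 0 && k > 0);
    return (double)cycles_sequential(n, k) / (double)cycles_pipelined(n, k);
}
```

With 5 stages, a single instruction still takes 5 cycles, so the latency does not improve, while a stream of 100 instructions takes 104 cycles instead of 500. The speedup approaches the number of stages only for long streams, which is the overhead with the first few instructions mentioned earlier.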
Other Speedups – Multiple Units
• Bottlenecks – execution in single pipeline units
  • ALU, especially floating point
• Resolution – provide multiple units
Reproduced from Tanenbaum & Austin
Superscalar Architectures
• Common solution for modern processors
  • Multiple execution units
Reproduced from Tanenbaum & Austin
Multiple-core architectures
Intel slide
Memory
• Hierarchy of memory units
  • Speed vs. size
• Solutions
  • Caching
  • Virtual memory
Reproduced from Tanenbaum & Austin
The Main Memory Bottleneck
• Modern uPs can process a huge amount of data in a short duration
• They require quick access to data to maximize their performance
• When data is unavailable, they literally stop and wait; this results in reduced performance and wasted power
• Current uPs can process an instruction in about 1 ns.
• Fetching data from main memory (RAM) takes on the order of 10-100 ns
Solution to the Bottleneck Problem
• Make the main memory faster
  • Problem: 1-ns memory is extremely expensive compared with currently popular 10-100 ns memory
• Alternative:
  • Add a small amount of ultra-fast RAM right next to the uP on the same chip
  • Make sure that frequently used data and instructions reside in that ultra-fast memory
• Advantage: Much better overall performance due to fast access to frequently used data and instructions
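
The benefit can be estimated with a simple average-access-time calculation. This is a minimal sketch: the 1 ns and 100 ns figures come from the slides, while the model (every access pays the cache time, and a miss additionally pays the main-memory time) and the function name are assumptions for illustration.

```c
#include <assert.h>

/* Average time per memory access in ns. hit_rate is the fraction of
   accesses found in the fast on-chip memory (0.0 to 1.0). */
double average_access_ns(double hit_rate, double cache_ns, double memory_ns)
{
    assert(hit_rate >= 0.0 && hit_rate <= 1.0);
    return cache_ns + (1.0 - hit_rate) * memory_ns;
}
```

For instance, with an assumed hit rate of 95 percent, a 1 ns cache and 100 ns main memory, the average comes to about 6 ns per access, compared with 100 ns when every access goes to main memory.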
On-Chip Cache Memory (1)
• On-Chip Cache Memory
  • Small amount of memory located on the same chip as the uP
  • May be multiple levels of caches
• The uP stores a copy of frequently used data and instructions in its cache memory
• When the uP wants some data:
  • it checks in the cache first.
  • only if the data is not there does the uP ask for it from the main memory
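
The lookup order described above can be sketched in C with a small direct-mapped cache. Direct mapping, the 8-line size, one word per line, and all type and function names are assumptions chosen to keep the example short; real caches differ in organization and are implemented in hardware.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINES 8u   /* hypothetical cache size, in one-word lines */

typedef struct {
    bool     valid;   /* line holds a copy of some memory word */
    uint32_t tag;     /* which memory word the line currently holds */
    uint32_t data;    /* the cached copy */
} cache_line_t;

typedef struct {
    cache_line_t  lines[CACHE_LINES];
    unsigned long hits;
    unsigned long misses;
} cache_t;

/* Read the word at word address addr: check the cache first and go to
   main memory only on a miss, keeping a copy for next time. */
uint32_t cache_read(cache_t *c, const uint32_t *main_memory,
                    size_t mem_words, uint32_t addr)
{
    assert(c != NULL && main_memory != NULL && addr < mem_words);
    uint32_t index = addr % CACHE_LINES;   /* line this address maps to */
    uint32_t tag   = addr / CACHE_LINES;   /* distinguishes addresses sharing a line */
    cache_line_t *line = &c->lines[index];

    if (line->valid && line->tag == tag) {
        c->hits++;
        return line->data;                 /* hit: no main-memory access */
    }
    c->misses++;
    line->valid = true;                    /* miss: fetch and replace the line */
    line->tag   = tag;
    line->data  = main_memory[addr];
    return line->data;
}
```

Reading the same address twice gives one miss followed by one hit. Two addresses that map to the same line (here, addresses 8 apart) evict each other, which is one reason the choice of caching scheme matters.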
On-Chip Cache Memory (2)
• Small size and proximity to the uP
  • short access times, hence a boost in performance
• Predict what data will be required for future calculations
  • pre-fetch that data and place it in the cache
  • available immediately when the need arises
• Speed advantage of cache memory
  • Depends heavily on caching algorithm
Expanded View of the Memory Systems
[Figure: memory hierarchy. Processor (control, datapath, registers), then cache, 2nd cache, main memory, and hard disk (virtual memory).]
Moving from the registers toward the hard disk:
• Speed: from faster to slowest
• Size: from smaller to biggest
• Cost: from higher to lowest
Notes:
• Cache is handled by hardware
• Virtual memory is handled by OS
• Programmer sees only one memory and registers
Memory Organization - Standards
• Computer Word
  • Basic unit of access
• The same memory can be accessed in different ways
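
As a rough illustration of accessing the same memory in different ways, the C sketch below reads 8-, 16- and 32-bit units out of one byte buffer. The helper names are invented for this example, and a 32-bit word size is assumed. memcpy is used for the wider reads so the code does not depend on alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read one byte at a byte offset. */
uint8_t read_u8(const uint8_t *mem, size_t offset)
{
    assert(mem != NULL);
    return mem[offset];
}

/* Read a 16-bit halfword starting at a byte offset. */
uint16_t read_u16(const uint8_t *mem, size_t offset)
{
    uint16_t value;
    assert(mem != NULL);
    memcpy(&value, mem + offset, sizeof value);
    return value;
}

/* Read a 32-bit word starting at a byte offset. */
uint32_t read_u32(const uint8_t *mem, size_t offset)
{
    uint32_t value;
    assert(mem != NULL);
    memcpy(&value, mem + offset, sizeof value);
    return value;
}

/* Byte address of the word with the given index, for 32-bit words. */
size_t word_to_byte_address(size_t word_index)
{
    return word_index * sizeof(uint32_t);
}
```

When the bytes of a word differ, the value returned by read_u16 and read_u32 depends on the byte order used by the machine.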
Reproduced from Tanenbaum & Austin
Little Endian vs. Big Endian
• Matter of preference
• Significant implications for compatibility
• Some processors support both
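
The compatibility issue shows up as soon as multi-byte values are exchanged between machines. The following C sketch, with function names made up for this example, detects the byte order of the machine it runs on, reverses the bytes of a 32-bit word, and stores a word in big-endian order regardless of the host.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* True if the least significant byte of a word sits at the lowest address. */
bool is_little_endian(void)
{
    const uint32_t probe = 1u;
    uint8_t first_byte;
    memcpy(&first_byte, &probe, 1);
    return first_byte == 1u;
}

/* Reverse the byte order of a 32-bit word. */
uint32_t swap_endian32(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Store x most significant byte first, whatever the host byte order is. */
void store_be32(uint8_t out[4], uint32_t x)
{
    assert(out != NULL);
    out[0] = (uint8_t)(x >> 24);
    out[1] = (uint8_t)(x >> 16);
    out[2] = (uint8_t)(x >> 8);
    out[3] = (uint8_t)x;
}
```

Writing bytes out explicitly with shifts, as store_be32 does, gives the same result on any host, which is often simpler than detecting the byte order and swapping conditionally.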
Reproduced from Tanenbaum & Austin
Standardization – ASCII set
• Standardized way to use bits for encoding
  • Characters
  • Display
  • Communication
  • File
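
Because the codes are standardized, programs can do arithmetic on characters. A small C sketch, with invented helper names and assuming the compiler's execution character set is ASCII (true of the usual desktop and embedded toolchains):

```c
#include <assert.h>

/* Convert an ASCII lowercase letter to uppercase; other codes pass through.
   In ASCII, 'a'..'z' are contiguous and sit 32 codes above 'A'..'Z'. */
char ascii_to_upper(char c)
{
    assert((unsigned char)c < 128u);   /* ASCII is a 7-bit code */
    return (c >= 'a' && c <= 'z') ? (char)(c - ('a' - 'A')) : c;
}

/* Numeric value of an ASCII digit character, or -1 if c is not a digit. */
int ascii_digit_value(char c)
{
    return (c >= '0' && c <= '9') ? (c - '0') : -1;
}
```

The same byte value 65 is displayed, transmitted and stored as the letter A by any device that follows the standard.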
Reproduced from Tanenbaum & Austin
Programmer’s Model of Microprocessor

Instruction Set:
    ldr r0, [r2, #0]
    add r2, r3, r4

Memory:
    80000004  ldr r0, [r2, #0]
    80000008  add r2, r3, r4
    8000000B  23456
    80000010  AEF0

Memory mapped I/O:
    80000100  input
    80000108  output

Registers: r0 - r3, pc

Addressing Modes (how to access data in registers and memory):
    ldr r12, [r1, #0]
    mov r1, r3
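
To make the addressing modes concrete, here is a small C model of the three instructions that appear on this slide. It is a sketch under stated assumptions: the machine_t structure, the emu_ function names, the 16-word memory, and the base address 0x80000000 are all invented for illustration and only mimic the 8000xxxx addresses shown above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MEM_BASE  0x80000000u   /* hypothetical start of the modelled memory */
#define MEM_WORDS 16u           /* hypothetical memory size in 32-bit words */

typedef struct {
    uint32_t r[16];             /* general registers r0..r15 */
    uint32_t mem[MEM_WORDS];    /* word-addressed backing store */
} machine_t;

/* ldr rd, [rn, #offset]: register-indirect with offset.
   The effective address is the base register plus the immediate. */
void emu_ldr(machine_t *m, unsigned rd, unsigned rn, uint32_t offset)
{
    assert(m != NULL && rd < 16u && rn < 16u);
    uint32_t address = m->r[rn] + offset;
    assert(address >= MEM_BASE && address % 4u == 0u);
    uint32_t index = (address - MEM_BASE) / 4u;
    assert(index < MEM_WORDS);
    m->r[rd] = m->mem[index];
}

/* mov rd, rm: register-to-register, no memory access. */
void emu_mov(machine_t *m, unsigned rd, unsigned rm)
{
    assert(m != NULL && rd < 16u && rm < 16u);
    m->r[rd] = m->r[rm];
}

/* add rd, rn, rm: both operands and the result are in registers. */
void emu_add(machine_t *m, unsigned rd, unsigned rn, unsigned rm)
{
    assert(m != NULL && rd < 16u && rn < 16u && rm < 16u);
    m->r[rd] = m->r[rn] + m->r[rm];
}
```

In this model, with r2 holding MEM_BASE + 8, the call emu_ldr(&m, 0, 2, 4) loads the word at MEM_BASE + 12 into r0.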
Software Build and Load
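• Typical flow for desktop computer

[Figure: Compiler, Assembler, Linker, Loader, then Read-Write Memory (RAM). The assembler produces object files; the linker combines them with the run-time library into an executable image file; the loader places the image in RAM. The operating system image is brought into RAM by the boot process.]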
Example Program Creation & Run
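[Figure: hardware organization of a typical system. The CPU (PC, register file, ALU, bus interface) connects through the system bus to the I/O bridge, which connects through the memory bus to main memory. The I/O bus links the USB controller (mouse, keyboard), graphics adapter (display), disk controller (disk, holding the hello executable), and expansion slots for other devices such as network adapters.]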
Reproduced from Tanenbaum & Austin
Reading “Hello” Command
[Figure: same system diagram. The user types "hello" at the keyboard.]
Loading “Hello” Program
[Figure: same system diagram. The hello code and the string "hello,world\n" are loaded from the disk into main memory.]
Finally - Program Running

[Figure: same system diagram. With the hello code and "hello,world\n" in main memory, the string "hello,world\n" is sent to the display.]
Instruction Set Architecture
• Interface between HW and SW
  • Virtual Machine
  • Many possible implementations
• Given by
  • Resources
    • Processor Registers
    • Execution Units
  • Operations
    • Instruction Types
    • Data Types
    • Addressing Modes

[Figure: CPU and memory connected by the address bus, control bus and data bus, with the HW/SW boundary marked. Annotations: operations performed; operands and results stored; i.e., where and how to address operands.]
Problem-oriented Language layer
• Compiled to assembly or instruction set level
• You will be using embedded C
• How does this differ from usual use of C?
  • Directly write to registers to control processor operation
  • All of the registers have been mapped to macros
  • Important bit combinations have macros; please use these!
  • Registers are 32 bits, so int type is 4 bytes
  • Register values may change without your specific instructions
  • Limited output system
  • Floating-point operations are inefficient; avoid divide and square root.
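
The register-related points can be sketched in C as follows. Everything named here (CTRL_REG, the bit macros, the fake_ctrl_reg variable) is hypothetical. On a real target the register macro would expand to a volatile access at a fixed address taken from the device header; a plain variable stands in for it here so that the sketch also runs on a PC.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for a hardware register. volatile tells the compiler that the
   value may change without the program writing it, so every access must
   really be performed. */
static volatile uint32_t fake_ctrl_reg;

#define CTRL_REG        (fake_ctrl_reg)   /* hypothetical register macro */
#define CTRL_ENABLE     (1u << 0)         /* hypothetical enable bit */
#define CTRL_MODE_MASK  (3u << 4)         /* hypothetical 2-bit mode field */
#define CTRL_MODE_FAST  (2u << 4)         /* hypothetical mode value */

/* Read-modify-write: change only the bits of interest and leave the
   other bits of the register untouched. */
void ctrl_enable_fast_mode(void)
{
    assert((CTRL_MODE_FAST & ~CTRL_MODE_MASK) == 0u);  /* value fits in its field */
    uint32_t value = CTRL_REG;              /* read the current contents */
    value &= ~CTRL_MODE_MASK;               /* clear the mode field */
    value |= CTRL_MODE_FAST | CTRL_ENABLE;  /* set the new mode and enable */
    CTRL_REG = value;                       /* write back in one store */
}

/* Nonzero if the enable bit is currently set. */
int ctrl_is_enabled(void)
{
    return (CTRL_REG & CTRL_ENABLE) != 0u;
}
```

Using uint32_t rather than int makes the 32-bit register width explicit, and the named bit macros keep magic numbers out of the code.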
Assembly versus C
• Efficiency of compiled code
• Source code portability
• Program maintainability
• Typical bug rates (say, per thousand lines of code)
• The amount of time it will take to develop the solution
• Availability and cost of compilers and other development tools
• Your personal experience (or that of the developers on your team) with specific languages or tools
• Don’t rule out Java or C++ if you have the memory to play with.
Problem
• Company “Ostrich” has recently re-developed their embedded software for flagship products
  • Developed in assembly, 80 percent working, 2000 lines of code
• Suddenly realized that the product isn’t shippable
• Bugs: system lock-ups indicative of major design flaws or implementation errors & major product performance issues
• Designer has left the company and provided few notes or comments
• You are hired as a consultant. Do you:
  • Fix existing code?
  • Perform complete software redesign and implementation? In this case, which language?
Jobs for This Week
• Form pairs before next Friday (random assignment thereafter)
• Tutorial next week
• Sign up in groups on myCourses