Interrupts and System Calls

Don Porter

CSE 506

Housekeeping

• Welcome TA Amit Arya; office hours posted
• Next Thursday’s class has a reading assignment
• Lab 1 due Friday
• All students should have VMs at this point
  • Email Don if you don’t have one
• Private git repositories should be set up

Logical Diagram

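[Diagram: layered system overview with three layers: User, Kernel, Hardware. Components shown: Binary Formats, Memory Allocators, Threads, System Calls, RCU, File System, Networking, Sync, Memory Management, Device Drivers, CPU Scheduler, Consistency, Interrupts, Disk, Net. A "Today’s Lecture" callout marks the part of the diagram covered here.]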

Background: Control Flow

// x = 2, y = true
if (y) {
    x = 2 / x;
    printf("%d\n", x);
} //...

int printf(const char *fmt, ...) {
    //...
}

Regular control flow: branches and calls (logically follows source code)

[Figure: an arrow labeled pc (program counter) points into the code above.]

Background: Control Flow

// x = 0, y = true
if (y) {
    x = 2 / x;
    printf("%d\n", x);
} //...

void handle_divzero(void) {
    x = 2;
}

Irregular control flow: exceptions, system calls, etc.

[Figure: the pc arrow is at the division. Divide by zero! Program can’t make progress!]

Lecture goal

• Understand the hardware tools available for irregular control flow
  • I.e., things other than a branch in a running program
• Building blocks for context switching, device management, etc.

Two types of interrupts

• Synchronous: will happen every time an instruction executes (with a given program state)
  • Divide by zero
  • System call
  • Bad pointer dereference
• Asynchronous: caused by an external event
  • Usually device I/O
  • Timer ticks (well, clocks can be considered a device)

Intel nomenclature

• Interrupt: refers only to asynchronous interrupts
• Exception: synchronous control transfer
• Note: from the programmer’s perspective, these are handled with the same abstractions

Lecture outline

• Overview
• How interrupts work in hardware
• How interrupt handlers work in software
• How system calls work
• New system call hardware on x86

Interrupt overview

• Each interrupt or exception includes a number indicating its type
  • E.g., 14 is a page fault, 3 is a debug breakpoint
• This number is the index into an interrupt table
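The "number indexes a table" idea can be sketched in user-space C. This is only a model, not kernel code: `handler_table`, `set_handler`, `dispatch`, and `record_handler` are hypothetical names chosen for illustration, and the real table holds descriptors rather than plain function pointers.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_VECTORS 256   /* x86 has 256 interrupt vectors (0-255) */

typedef void (*handler_fn)(int vector);

/* Hypothetical stand-in for the interrupt table: one slot per vector. */
handler_fn handler_table[NUM_VECTORS];

/* Records the vector most recently handled, so the effect is observable. */
int last_handled = -1;

void record_handler(int vector) { last_handled = vector; }

/* Install a handler for one vector. */
void set_handler(int vector, handler_fn fn) {
    assert(vector >= 0 && vector < NUM_VECTORS);
    handler_table[vector] = fn;
}

/* Use the vector number as an index and run the handler found there.
 * Returns 0 on success, -1 if the vector is out of range or has no handler. */
int dispatch(int vector) {
    if (vector < 0 || vector >= NUM_VECTORS || handler_table[vector] == NULL)
        return -1;
    handler_table[vector](vector);
    return 0;
}
```

For example, after `set_handler(14, record_handler)`, a call to `dispatch(14)` runs the page-fault slot, while `dispatch(3)` fails because nothing was installed there.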

x86 interrupt table

Vectors 0-255:
  0-31     Reserved for the CPU
  32-255   Software configurable
  32-47    Device IRQs
  48       JOS system call
  128      Linux system call

x86 interrupt overview

• Each type of interrupt is assigned an index from 0 to 255
• 0-31 are for processor-defined interrupts and exceptions; generally fixed by Intel
  • E.g., 14 is always for page faults
• 32-255 are software configured
  • 32-47 are for device interrupts (IRQs) in JOS
    • Most devices’ IRQ lines can be configured
    • Look up APICs for more info (Ch 4 of Bovet and Cesati)
  • 0x80 issues a system call in Linux (more on this later)
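As a quick check on the ranges above, here is a small C sketch that classifies a vector number. The ranges follow the JOS and Linux conventions given on this slide; the enum and function names are made up for this example.

```c
#include <assert.h>

/* Hypothetical labels for the vector ranges described on the slide. */
enum vector_class {
    VEC_CPU_RESERVED,    /* 0-31: fixed by the processor           */
    VEC_DEVICE_IRQ,      /* 32-47: device IRQs (JOS layout)        */
    VEC_JOS_SYSCALL,     /* 48 (0x30): JOS system call             */
    VEC_LINUX_SYSCALL,   /* 128 (0x80): Linux system call          */
    VEC_OTHER_SOFTWARE,  /* any other software-configurable vector */
    VEC_INVALID          /* outside 0-255                          */
};

enum vector_class classify_vector(int v) {
    if (v < 0 || v > 255) return VEC_INVALID;
    if (v <= 31)          return VEC_CPU_RESERVED;
    if (v <= 47)          return VEC_DEVICE_IRQ;
    if (v == 0x30)        return VEC_JOS_SYSCALL;
    if (v == 0x80)        return VEC_LINUX_SYSCALL;
    return VEC_OTHER_SOFTWARE;
}
```

Note that a single kernel would use either 0x30 or 0x80 for system calls, not both; the sketch lists both only because the slide mentions both conventions.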

Software interrupts

• The int <num> instruction allows software to raise an interrupt
  • 0x80 is just a Linux convention. JOS uses 0x30.
• There are a lot of spare indices
  • You could have multiple system call tables for different purposes or types of processes!
    • Windows does: one for the kernel and one for win32k

Software interrupts, cont

• OS sets the ring level required to raise an interrupt
  • Generally, user programs can’t issue an int 14 (i.e., raise a page fault manually)
• An unauthorized int instruction causes a general protection fault
  • Interrupt 13
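The privilege check for software interrupts can be modeled in a few lines of C. This sketch assumes the usual x86 rule that an `int` instruction is allowed only when the current privilege level is numerically no greater than the gate's privilege level; `software_int` and its parameters are hypothetical names, and the check applies to the `int` instruction only, not to hardware-raised interrupts.

```c
#include <assert.h>

#define VEC_BREAKPOINT  3
#define VEC_GP_FAULT   13
#define VEC_PAGE_FAULT 14

/* Model of `int <vector>` executed at privilege level `cpl` (0 = kernel,
 * 3 = user) against a gate whose required ring is `gate_dpl`.
 * Returns the vector that is actually delivered. */
int software_int(int vector, int cpl, int gate_dpl) {
    if (cpl > gate_dpl)
        return VEC_GP_FAULT;   /* unauthorized: general protection fault */
    return vector;             /* authorized: the requested vector runs  */
}
```

With a page-fault gate at ring 0, `software_int(VEC_PAGE_FAULT, 3, 0)` yields 13, whereas a breakpoint gate at ring 3 lets user code raise vector 3 directly.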

What happens (generally):

• Control jumps to the kernel
  • At a prescribed address (the interrupt handler)
• The register state of the program is dumped on the kernel’s stack
  • Sometimes, extra info is loaded into CPU registers
  • E.g., page faults store the address that caused the fault in the cr2 register
• Kernel code runs and handles the interrupt
• When handler completes, resume program (see iret instr.)

How it works (HW)

• How does HW know what to execute?
• Where does the HW dump the registers; what does it use as the interrupt handler’s stack?

How is this configured?

• Kernel creates an array of interrupt descriptors in memory, called the Interrupt Descriptor Table, or IDT
  • Can be anywhere in memory
  • Pointed to by a special register (idtr), which holds the table’s linear address
    • cf. segment registers and gdtr and ldtr!
• Entry 0 configures interrupt 0, and so on

x86 interrupt table

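[Figure: the same table of vectors 0-255, with idtr pointing at its base.]

idtr: linear address of the interrupt table (once paging is enabled, the lookup goes through page translation like any other linear address)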

x86 interrupt table

[Figure: the same table of vectors 0-255, with idtr pointing at its base and entry 14 expanded.]

Entry 14:
  Code Segment: Kernel Code Segment
  Offset: &page_fault_handler  // linear addr
  Ring: 0  // kernel
  Present: 1
  Gate Type: Exception

Interrupt Descriptor

• Code segment selector
  • Almost always the same (kernel code segment)
  • Recall, this was designed before paging on x86!
• Segment offset of the code to run
  • Kernel segment is “flat”, so this is just the linear address
• Privilege Level (ring)
  • Interrupts can be sent directly to user code. Why?
• Present bit: disable unused interrupts
• Gate type (interrupt or trap/exception): more in a bit
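To make the fields concrete, the following C sketch packs them into the 8-byte layout of a 32-bit x86 gate descriptor. The function name `make_gate` is hypothetical, and the selector value 0x08 used in the example is only an assumption about where a kernel might place its code segment; JOS has its own macro for this in inc/mmu.h, which is worth comparing against.

```c
#include <assert.h>
#include <stdint.h>

#define GATE_INTERRUPT32 0xEu   /* 32-bit interrupt gate type */
#define GATE_TRAP32      0xFu   /* 32-bit trap gate type      */

/* Pack a 32-bit gate descriptor into one 64-bit value.
 * Illustrative helper, not a JOS or Linux API. */
uint64_t make_gate(uint16_t selector, uint32_t offset,
                   unsigned dpl, unsigned type) {
    uint64_t d = 0;
    d |= (uint64_t)(offset & 0xFFFFu);        /* bits 0-15:  offset 15..0      */
    d |= (uint64_t)selector << 16;            /* bits 16-31: code segment sel. */
    d |= (uint64_t)(type & 0xFu) << 40;       /* bits 40-43: gate type         */
    d |= (uint64_t)(dpl & 0x3u) << 45;        /* bits 45-46: privilege level   */
    d |= (uint64_t)1 << 47;                   /* bit 47:     present           */
    d |= (uint64_t)(offset >> 16) << 48;      /* bits 48-63: offset 31..16     */
    return d;
}
```

The split offset (low half at the bottom, high half at the top) is one reason the in-memory layout looks confusing at first.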

x86 interrupt table

[Figure: the same table of vectors 0-255, with idtr pointing at its base and entry 3 expanded.]

Entry 3:
  Code Segment: Kernel Code Segment
  Offset: &breakpoint_handler  // linear addr
  Ring: 3  // user
  Present: 1
  Gate Type: Exception

Interrupt Descriptors, ctd.

• In-memory layout is a bit confusing
  • Like a lot of the x86 architecture, many interfaces were later deprecated
• Worth comparing Ch 9.5 of the i386 manual with inc/mmu.h in the JOS source code

How it works (HW)

• How does HW know what to execute?
  • Interrupt descriptor table specifies what code to run and at what privilege
  • This can be set up once during boot for the whole system
• Where does the HW dump the registers; what does it use as the interrupt handler’s stack?
  • Specified in the Task State Segment

Task State Segment (TSS)

• Another segment, just like the code and data segment
  • A descriptor created in the GDT (cannot be in LDT)
  • Selected by special task register (tr)
  • Unlike others, has a hardware-specified layout
• Lots of fields for rarely-used features
• Two features we care about in a modern OS:
  • 1) Location of kernel stack (fields ss0/esp0)
  • 2) I/O Port privileges (more in a later lecture)
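A rough C model of how the kernel stack is chosen on interrupt entry: if the interrupted code was running in user mode, the hardware loads the stack pointer from the TSS; if it was already in the kernel, it keeps using the current stack. The struct and function names are hypothetical, the real TSS has many more fields, and the sketch assumes every handler runs in ring 0.

```c
#include <assert.h>
#include <stdint.h>

/* Only the two TSS fields this lecture cares about. */
struct tss_sketch {
    uint32_t esp0;   /* top of the kernel stack      */
    uint16_t ss0;    /* kernel stack segment selector */
};

/* Minimal view of the interrupted CPU state. */
struct cpu_state {
    int      cpl;    /* current privilege level: 0 = kernel, 3 = user */
    uint32_t esp;    /* stack pointer at the moment of the interrupt  */
};

/* Returns the stack pointer the hardware starts from before it pushes
 * the saved registers for the handler. */
uint32_t handler_stack(const struct cpu_state *cpu,
                       const struct tss_sketch *tss) {
    if (cpu->cpl != 0)
        return tss->esp0;   /* privilege change: switch to the kernel stack */
    return cpu->esp;        /* already in the kernel: keep the same stack   */
}
```

The second branch is why a nested interrupt simply pushes another frame on the stack that is already in use.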

TSS, cont.

• Simple model: specify a TSS for each process
• Optimization (JOS):
  • Our kernel is pretty simple (uniprocessor only)
  • Why not just share one TSS and kernel stack across all processes?
• Linux generalization:
  • One TSS per CPU
  • Modify TSS fields as part of context switching
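The per-CPU TSS approach can be sketched as follows: each task has its own kernel stack, and a context switch rewrites the single TSS so the next user-to-kernel transition lands on the right stack. All names here (`task_sketch`, `percpu`, `context_switch`) are invented for illustration; a real context switch also saves and restores registers and address spaces.

```c
#include <assert.h>
#include <stdint.h>

struct task_sketch {
    int      pid;
    uint32_t kstack_top;   /* top of this task's kernel stack */
};

struct percpu {
    uint32_t tss_esp0;             /* the esp0 field of this CPU's one TSS */
    struct task_sketch *current;   /* task currently running on this CPU   */
};

/* Switch this CPU to `next`: point the shared TSS at next's kernel stack. */
void context_switch(struct percpu *cpu, struct task_sketch *next) {
    cpu->tss_esp0 = next->kstack_top;
    cpu->current  = next;
}
```

Only one field changes per switch, which is much cheaper than maintaining and reloading a separate TSS for every process.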

Summary

• Most interrupt handling hardware state is set during boot
• Each interrupt has an IDT entry specifying:
  • What code to execute, and the privilege level required to raise the interrupt
• Stack to use is specified in the TSS

Comment

• Again, segmentation rears its head
  • You can’t program OS-level code on x86 without getting your hands dirty with it
• Helps to know which features are important when reading the manuals

High-level goal

• Respond to some event, return control to the appropriate process
• What to do on:
  • Network packet arrives
  • Disk read completion
  • Divide by zero
  • System call

Interrupt Handlers

• Just plain old kernel code

Example

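[Figure: a user stack and a kernel stack. The user code `if (x) { printf("Boo"); ...` is inside printf(va_args…) when a disk interrupt arrives. RSP and RIP move from the user stack and printf to the kernel stack and Disk_handler().]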

Complication:

• What happens if I’m in an interrupt handler, and another interrupt comes in?
  • Note: kernel stack only changes on privilege level change
  • Nested interrupts just push the next frame on the stack
• What could go wrong?
  • Violate code invariants
  • Deadlock
  • Exhaust the stack (if too many fire at once)

Example

[Figure: the same user and kernel stacks. disk_handler() has called lock_kernel() and has not yet reached unlock_kernel() when a network interrupt arrives. net_handler() is pushed on the same kernel stack and calls lock_kernel(). Will hang forever! Already locked!!!]

Bottom Line:

• Interrupt service routines must be reentrant or synchronize
• Period.

Hardware interrupt sync.

• While a CPU is servicing an interrupt on a given IRQ line, the same IRQ won’t raise another interrupt until the routine completes
  • Bottom line: a device interrupt handler doesn’t have to worry about being interrupted by itself
• A different device can interrupt the handler
  • Problematic if they share data structures
    • Like a list of free physical pages…
  • What if both try to grab a lock for the free list?

Disabling interrupts

• An x86 CPU can disable I/O interrupts
  • Clear bit 9 of the EFLAGS register (IF flag)
  • cli and sti instructions clear and set this flag
• Before touching a shared data structure (or grabbing a lock), an interrupt handler should disable I/O interrupts
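The effect of cli and sti can be imitated with a toy model in C: while the IF bit is clear, raised IRQs are only recorded as pending, and they are delivered once IF is set again. This is a simulation under simplifying assumptions (no interrupt controller, no priorities, and none of the one-instruction delay real sti has); every name in it is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define EFLAGS_IF (1u << 9)   /* interrupt-enable flag: bit 9 of EFLAGS */

struct sim_cpu {
    uint32_t eflags;
    uint32_t pending;   /* bitmask of IRQ lines waiting to be delivered */
    int      handled;   /* number of interrupts delivered so far        */
};

/* Deliver every pending IRQ, but only while interrupts are enabled. */
static void deliver_pending(struct sim_cpu *c) {
    while (c->pending != 0 && (c->eflags & EFLAGS_IF)) {
        c->pending &= c->pending - 1;   /* clear the lowest pending line */
        c->handled++;
    }
}

void sim_cli(struct sim_cpu *c) { c->eflags &= ~EFLAGS_IF; }

void sim_sti(struct sim_cpu *c) {
    c->eflags |= EFLAGS_IF;
    deliver_pending(c);
}

/* A device raises IRQ `line` (0-31 in this model). */
void sim_raise_irq(struct sim_cpu *c, int line) {
    assert(line >= 0 && line < 32);
    c->pending |= 1u << line;
    deliver_pending(c);
}
```

A handler would wrap its critical section in `sim_cli` and `sim_sti`, so a second device cannot run its handler in the middle of an update to shared data.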

Gate types

• Recall: an IDT entry can be an interrupt gate or an exception (trap) gate
• Difference?
  • An interrupt gate automatically disables all other maskable interrupts (i.e., clears IF on entry; iret restores the saved flags on exit)
  • An exception gate doesn’t
• This is just a programmer convenience: you could do the same thing in software
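The difference between the two gate types comes down to what happens to the IF bit on entry. The C sketch below models it: both gate types save the old EFLAGS, only the interrupt gate clears IF for the handler, and returning restores the saved value. The types and function names are illustrative, not part of any kernel.

```c
#include <assert.h>
#include <stdint.h>

#define EFLAGS_IF (1u << 9)   /* interrupt-enable flag */

enum gate_kind { GATE_INTERRUPT, GATE_TRAP };

struct entry_result {
    uint32_t saved_eflags;     /* value pushed on the stack at entry   */
    uint32_t handler_eflags;   /* value in effect while handler runs   */
};

/* Model of entering a handler through a gate of the given kind. */
struct entry_result enter_gate(uint32_t eflags, enum gate_kind kind) {
    struct entry_result r;
    r.saved_eflags   = eflags;
    r.handler_eflags = eflags;
    if (kind == GATE_INTERRUPT)
        r.handler_eflags &= ~EFLAGS_IF;   /* mask further interrupts */
    return r;
}

/* Model of iret: the saved flags, including IF, come back. */
uint32_t sim_iret(struct entry_result r) { return r.saved_eflags; }
```

A trap-gate handler that wants the same protection would have to execute cli itself as its first instruction, which is the "do the same thing in software" option.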

Exceptions

• You can’t mask exceptions
  • Why not?
    • Can’t make progress after a divide-by-zero
    • Double and triple faults detect faults in the kernel
• Do exception handlers need to be reentrant?
  • Not if your kernel has no bugs (and never issues system calls itself)
  • In certain cases, Linux allows nested page faults
    • E.g., to detect errors copying user-provided buffers

Summary

• Interrupt handlers need to synchronize, both with locks (multi-processor) and by disabling interrupts (same CPU)
• Exceptions can’t be masked
  • Nested exceptions generally avoided

System call “interrupt”

• Originally, system calls were issued using the int instruction
  • Dispatch routine was just an interrupt handler
• Like interrupts, system calls are arranged in a table
  • See arch/x86/kernel/syscall_table*.S in Linux source
• Program selects the one it wants by placing the index in the eax register
  • Arguments go in the other registers by calling convention
  • Return value goes in eax!
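The dispatch step can be illustrated with a small user-space model: the handler reads the call number from eax, looks it up in a table, passes the other registers as arguments, and writes the result back to eax. The register struct, the two toy system calls, and the error value are all hypothetical; real kernels define their own numbers and error codes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simulated user registers at the time of the trap. */
struct sim_regs { int32_t eax, ebx, ecx, edx; };

typedef int32_t (*syscall_fn)(int32_t a, int32_t b, int32_t c);

#define ERR_BAD_SYSCALL (-1)   /* hypothetical error code for a bad index */

/* Two toy system calls, invented for this example. */
static int32_t sys_answer(int32_t a, int32_t b, int32_t c) {
    (void)a; (void)b; (void)c;
    return 42;
}
static int32_t sys_add(int32_t a, int32_t b, int32_t c) {
    (void)c;
    return a + b;
}

static const syscall_fn syscall_table[] = { sys_answer, sys_add };
#define NUM_SYSCALLS (sizeof syscall_table / sizeof syscall_table[0])

/* The dispatch routine: eax selects the call, the result returns in eax. */
void syscall_dispatch(struct sim_regs *r) {
    if (r->eax < 0 || (size_t)r->eax >= NUM_SYSCALLS) {
        r->eax = ERR_BAD_SYSCALL;
        return;
    }
    r->eax = syscall_table[r->eax](r->ebx, r->ecx, r->edx);
}
```

The bounds check matters: the index comes from untrusted user code, so the kernel must validate it before using it to index the table.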

Around P4 era…

• Processors got very deeply pipelined
  • Pipeline stalls/flushes became very expensive
  • Cache misses can cause pipeline stalls
• System calls took twice as long from P3 to P4
  • Why?
    • IDT entry may not be in the cache
    • Different permissions constrain instruction reordering

Idea

• What if we cache the IDT entry for a system call in a special CPU register?
  • No more cache misses for the IDT!
  • Maybe we can also do more optimizations
• Assumption: system calls are frequent enough to be worth the transistor budget to implement this
  • What else could you do with extra transistors that helps performance?

Intel: sysenter/sysexit

• These instructions use MSRs (model-specific registers) to store:
  • Syscall entry point and code segment
  • Kernel stack
• sysenter does not save the caller’s return address; the kernel returns to a fixed, known address
• Implication: system calls must be issued from a few kernel-approved addresses
  • i.e., in libc

Pros and cons of fixed return point

• Pros:
  • Indeed faster than int instruction
  • Security arguments:
    • Easier to sandbox a program (prevent illegal system calls)
    • Limits ability of a program to issue errant system calls
• Cons: programmer inconvenience
  • Can’t just drop an ‘int 0x80’ in my program anymore
  • Tighter contract between program and kernel
  • Also, not all x86 CPUs have this instruction

More on compatibility

• Not all CPUs have sysenter!
  • We don’t want every program to have to encode knowledge about every x86 CPU model
  • And we don’t want to break backwards compatibility

Linus’s “disgusting” solution

• Kernel can support both sysenter and int (for legacy programs)
• Kernel figures out what the CPU supports (since it has to anyway)
• Creates a page with the optimal system call instruction (and a standard function call preamble and epilogue)
• Always mapped at a fixed address in programs
• Replace int 0x80 with a call <addr>!

vdso

• This page is called the Virtual Dynamic Shared Object (vdso)
• Libc and other programs reserve this address in their link tables
• Kernel is responsible for mapping it in during exec!
• Solves part of the compatibility problem
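On a Linux system with glibc, a program can ask where the kernel mapped the vdso by reading the auxiliary vector. The sketch below uses `getauxval(AT_SYSINFO_EHDR)` from `<sys/auxv.h>`, which is a real glibc call; the helper names `vdso_base` and `looks_like_elf` are made up here, and the lookup may return NULL on systems where no vdso is mapped.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>   /* getauxval, AT_SYSINFO_EHDR (Linux with glibc) */

/* Address at which the kernel mapped the vdso into this process,
 * or NULL if the auxiliary vector has no such entry. */
const void *vdso_base(void) {
    return (const void *)(uintptr_t)getauxval(AT_SYSINFO_EHDR);
}

/* The vdso is a small ELF shared object, so it should begin with the
 * four-byte ELF magic number. */
int looks_like_elf(const void *p) {
    return p != NULL && memcmp(p, "\x7f" "ELF", 4) == 0;
}
```

Because the mapping is an ordinary ELF object, the dynamic linker can resolve symbols in it just as it does for any other shared library.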

AMD: syscall/sysret

• Same basic idea as sysenter/sysexit, but without a fixed return point
  • Programmers suffered with the fixed return point for the performance win, but didn’t like it
• More of a drop-in replacement for int 0x80!
  • Trade a bit of the performance win for a big convenience win
• Everyone loved it and adopted it wholesale
  • Even Intel!

Aftermath (pt. 1)

• If every recent x86 CPU has syscall, why bother with sysenter?
  • Good question. Most don’t!
• All 64-bit CPUs have syscall!
  • Only really need vdso for 32-bit programs

Aftermath (pt. 2)

• getpid() on my desktop machine (recent AMD 6-core):
  • int 0x80: 371 cycles
  • syscall: 231 cycles
• So system calls are definitely faster as a result!
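A measurement along these lines can be approximated with the sketch below, assuming Linux and glibc. It times raw getpid system calls through the `syscall()` wrapper and reports nanoseconds per call rather than cycles, and it does not compare against `int 0x80`, so it is not a reproduction of the numbers above. The function names `raw_getpid` and `ns_per_getpid` are hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>   /* SYS_getpid */
#include <time.h>          /* clock_gettime, CLOCK_MONOTONIC */
#include <unistd.h>        /* syscall, getpid */

/* Issue getpid directly as a system call, bypassing any libc caching. */
long raw_getpid(void) {
    return syscall(SYS_getpid);
}

/* Average wall-clock nanoseconds per raw getpid call over `iters` calls. */
double ns_per_getpid(long iters) {
    struct timespec t0, t1;
    assert(iters > 0);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (long i = 0; i < iters; i++)
        (void)raw_getpid();
    clock_gettime(CLOCK_MONOTONIC, &t1);
    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    return ns / (double)iters;
}
```

getpid is a common choice for this kind of test because the kernel does almost no work for it, so the time is dominated by the entry and exit path.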

In JOS

• You will use the int instruction to implement system calls
• There is a challenge problem in lab 3 (i.e., extra credit) to use sysenter/sysexit
  • Note that there are some more details about register saving to deal with
  • Syscall/sysret is a bit too trivial for extra credit
    • But still cool if you get it working!

Summary

• Interrupt handlers are specified in the IDT
• Understand when nested interrupts can happen
  • And how to prevent them when unsafe
• Understand optimized system call instructions
  • Be able to explain vdso, syscall vs. sysenter vs. int 0x80
