
PCI Project Wajid


PCI project
FPGAs make powerful PCI development platforms, thanks to their re-programmability and operating speed.

The interface

Part 0: How to create a very Simple PCI interface
Part 1: How PCI works
Part 2: PCI Reads and Writes
Part 3: PCI logic analyzer
Part 4: PCI plug-and-play

The software

Part 5: PCI driver for Windows
Part 6: PCI driver for Linux

The hardware

We used a Dragon board for this project.

Link
An overview on How the PCI Bus Works from Tech-Pro.

Simple PCI interface
This is an example of PCI code. We control an LED using PCI write commands. Writing a "0" turns the LED off, writing a "1" turns the LED on!

// Very simple PCI target
// Just 3 flipflops for the PCI logic, plus one to hold the state of an LED

module PCI(CLK, RSTn, FRAMEn, AD, CBE, IRDYn, TRDYn, DEVSELn, LED);

input CLK, RSTn, FRAMEn, IRDYn;
input [31:0] AD;
input [3:0] CBE;
inout TRDYn, DEVSELn;
output LED;

parameter IO_address = 32'h00000200; // we respond to an "IO write" at this address
parameter CBECD_IOWrite = 4'b0011;

////////////////////////////////////////////////////
reg Transaction;
wire TransactionStart = ~Transaction & ~FRAMEn;
wire TransactionEnd = Transaction & FRAMEn & IRDYn;
wire Targeted = TransactionStart & (AD==IO_address) & (CBE==CBECD_IOWrite);
wire LastDataTransfer = FRAMEn & ~IRDYn & ~TRDYn;

always @(posedge CLK or negedge RSTn)
if(~RSTn) Transaction <= 0;
else
case(Transaction)
  1'b0: Transaction <= TransactionStart;
  1'b1: Transaction <= ~TransactionEnd;
endcase

reg DevSelOE;
always @(posedge CLK or negedge RSTn)
if(~RSTn) DevSelOE <= 0;
else
case(Transaction)
  1'b0: DevSelOE <= Targeted;
  1'b1: if(TransactionEnd) DevSelOE <= 1'b0;
endcase

reg DevSel;
always @(posedge CLK or negedge RSTn)
if(~RSTn) DevSel <= 0;
else
case(Transaction)
  1'b0: DevSel <= Targeted;
  1'b1: DevSel <= DevSel & ~LastDataTransfer;
endcase

assign DEVSELn = DevSelOE ? ~DevSel : 1'bZ;
assign TRDYn = DevSelOE ? ~DevSel : 1'bZ;

wire DataTransfer = DevSel & ~IRDYn & ~TRDYn;
reg LED;
always @(posedge CLK) if(DataTransfer) LED <= AD[0];

endmodule

How PCI works
We concentrate on 32-bit PCI 2.2 here, which is what is used in today's PCs.
Newer PCI versions include PCI 2.3 and PCI 3.0.

The PCI specification

The PCI specification is developed and maintained by a group called the PCI Special Interest Group (PCI-SIG for short).
Unlike the Ethernet specification, the PCI specification cannot be downloaded for free. You need to be a member of the PCI-SIG to access it. As becoming a member is expensive, you might want to check with your company's hardware group (assuming you work in the semiconductor industry) to see if you can get access to the specification.

Otherwise here's a short introduction, followed by some links for more info.

PCI characteristics
The PCI bus has 4 main characteristics:

Synchronous
Transaction/Burst oriented
Bus mastering
Plug-and-play

PCI is synchronous
The PCI bus uses one clock. The clock runs at 33MHz by default but can run lower (all the way down to idle = 0MHz) to save power, or higher (66MHz) if your hardware supports it.

PCI is Transaction/Burst oriented
PCI is transaction oriented.

1. You start a transaction
2. You specify the starting address (one clock cycle)
3. You send as much data as you want (many following clock cycles)
4. You end the transaction

PCI is a 32-bit bus, and so has 32 lines to transmit data. At the beginning of a transaction, the bus is used to specify a 32-bit address. Once the address is specified, many data cycles can go through. The address is not re-transmitted but is auto-incremented at each data cycle. To specify a different address, the transaction is stopped, and a new one is started. So PCI bandwidth is best utilized in burst mode.
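To put numbers on why bursts help, here is a small C sketch of the arithmetic. It assumes a 33MHz clock and 4 bytes per data cycle, as described above. The function names and the overhead figure (clocks per transaction that carry no data, such as the address phase) are illustrative assumptions, not values taken from the PCI specification.

```c
#include <stdio.h>

/* Effective throughput in MB/s for one transaction made of
   `overhead_cycles` clocks without data (address phase, idle...)
   followed by `data_cycles` clocks that each move 4 bytes.
   overhead_cycles is an assumption supplied by the caller. */
double pci_throughput_mbs(double clock_mhz, unsigned data_cycles, unsigned overhead_cycles)
{
    unsigned total_cycles = data_cycles + overhead_cycles;
    if (total_cycles == 0) return 0.0;
    return clock_mhz * 4.0 * data_cycles / total_cycles;
}

/* Print the throughput for a few burst lengths, assuming 2 overhead clocks */
void print_burst_table(void)
{
    unsigned lengths[] = {1, 4, 16, 64};
    for (unsigned i = 0; i < sizeof lengths / sizeof lengths[0]; i++)
        printf("%2u data cycles: %.1f MB/s\n",
               lengths[i], pci_throughput_mbs(33.0, lengths[i], 2));
}
```

With the assumed 2 overhead clocks, a single data cycle yields 44 MB/s while a 64-cycle burst yields 128 MB/s, close to the 132 MB/s ceiling of 33MHz x 4 bytes.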

PCI allows bus mastering
PCI transactions work in a master-slave relationship. A master is an agent that initiates a transaction (which can be a read or a write).
While the host CPU is often the bus master, all PCI boards can potentially claim the bus and become a bus master.

PCI is plug-and-play
PCI boards are plug-and-play. That means that the host-CPU/host-OS can:

Determine the identity of each PCI board in a PCI bus (manufacturer & function (video, network...))

Determine the abilities/requirements of each board (how much memory space it requires, how many interrupts...)

Relocate each board memory space

The last feature is an important part of plug-and-play. Each board responds to some addresses, but the addresses to which it responds can be programmed (i.e. each board generates its own board/chip-select signals). That allows the OS to "map" the address space of each board wherever it wants.

PCI "spaces"
PCI defines 3 "spaces" where you can read and write.
When a transaction starts, the master specifies the starting address of the transaction, whether it's a read or a write, AND which space it wants to talk to.

1. Memory space
2. IO space
3. Configuration space

They work as follows:

The memory and IO spaces are the workhorse spaces. They are "relocatable" (i.e. the addresses at which each board responds can be moved).

The configuration space is used for plug-and-play. It's a space where each board has to implement very specific registers at very specific addresses, so that the host-CPU/OS can figure out each board's identity/abilities/requirements. From there, the host CPU/OS enables and configures the other two spaces.
This space is fixed and always starts at address 0 for all PCI boards, so one line of the PCI connector is used as board-select (for this space only).

To be compliant, a PCI board needs to implement configuration space. Memory and IO spaces are optional, but one or both is always used in practice.

PCI bridge
PCI devices don't connect directly to a host CPU, but go through a "bridge" chip.
That's because CPUs typically don't "speak" PCI natively, so a bridge has to translate the transactions from the CPU's bus to the PCI bus. Also, CPUs never have 3 spaces like PCI devices do. Most CPUs have 1 space (memory space), while other CPUs have 2 (memory & IO). The bridge has to play some tricks so that the CPU can still access all 3 PCI spaces.

PCI voltage
PCI boards can use 3.3V or 5V signaling. Interestingly, current PCs all use 5V signaling.
PCI board connectors have one or two slots that identify if the board is 3.3V or 5V compliant. This is to ensure that, for example, a 3.3V-only board cannot be plugged into a PC's 5V-only PCI bus.

Here is an example of a 5V-only board:

while this board is both 5V and 3.3V compliant:

PCI timing
PCI specifies timing related to its clock.
With a 33MHz clock, we have:

7ns/0ns Tsu/Th (setup/hold) constraint on inputs
11ns Tco (clock-to-output) on outputs

Links
A more detailed technical description in this PCI Local Bus Technical Summary from TechFest
A short PCI Bus Operation page.
Many interesting links on Craig's PCI Pages
Also An Experiment to Build a PCI Board

PCI Reads and Writes
Let's do some real PCI transactions now...

IO transactions
The easiest PCI space to work with is the IO space.

No virtualization from the CPU/OS (i.e. CPU address = hardware address)
No driver necessary (true on Win98/Me, while on Win XP/2K, a driver is required but generic ones are provided below)

The disadvantage of the IO space is that it's small (limited to 64KB on PCs, even if PCI supports 4GB) and pretty crowded.

Finding a free space
On Windows 98/Me, open the "Device Manager" (from "Control Panel"/System), then show Computer/Properties and check the "Input/Output (I/O)" panel.

On Windows XP/2000, open the "System Information" program (Programs/Accessories/System Tools/System Information) and click on "I/O".

Lots of peripherals are using the IO space, so free space candidates take a little research.

Device driver
The IO space is left unprotected on Win98/Me, so no driver is necessary there.
For WinXP/2K, GiveIO and UserPort are free generic drivers that open up the IO space.

A RAM PCI card
Let's implement a small RAM in our PCI card.

The RAM is 32 bits x 16 locations. That's small enough to fit in IO space using "direct addressing" (the IO space is so crowded that indirect addressing is otherwise necessary).
We need to pick a free IO space in the host PC. Each 32-bit location takes 4 byte addresses, so we require 4x16=64 contiguous free addresses. We chose 0x200-0x23F here but you may have to choose something else.

First the module declaration.

module PCI_RAM( PCI_CLK, PCI_RSTn, PCI_FRAMEn, PCI_AD, PCI_CBE, PCI_IRDYn, PCI_TRDYn, PCI_DEVSELn );
input PCI_CLK, PCI_RSTn, PCI_FRAMEn, PCI_IRDYn;
inout [31:0] PCI_AD;
input [3:0] PCI_CBE;
output PCI_TRDYn, PCI_DEVSELn;

parameter IO_address = 32'h00000200; // 0x0200 to 0x23F
parameter PCI_CBECD_IORead = 4'b0010;
parameter PCI_CBECD_IOWrite = 4'b0011;

Then we keep track of what is happening on the bus through a "PCI_Transaction" register.
"PCI_Transaction" is asserted when any transaction is going on, either for us or for any other card on the bus.

reg PCI_Transaction;

wire PCI_TransactionStart = ~PCI_Transaction & ~PCI_FRAMEn;
wire PCI_TransactionEnd = PCI_Transaction & PCI_FRAMEn & PCI_IRDYn;

always @(posedge PCI_CLK or negedge PCI_RSTn)
if(~PCI_RSTn) PCI_Transaction <= 0;
else
case(PCI_Transaction)
  1'b0: PCI_Transaction <= PCI_TransactionStart;
  1'b1: PCI_Transaction <= ~PCI_TransactionEnd;
endcase

// We respond only to IO reads/writes, 32-bits aligned
wire PCI_Targeted = PCI_TransactionStart & (PCI_AD[31:6]==(IO_address>>6)) & (PCI_AD[1:0]==0) & ((PCI_CBE==PCI_CBECD_IORead) | (PCI_CBE==PCI_CBECD_IOWrite));

// When a transaction starts, the address is available for us to register
// We just need a 4 bits address here
reg [3:0] PCI_TransactionAddr;
always @(posedge PCI_CLK) if(PCI_TransactionStart) PCI_TransactionAddr <= PCI_AD[5:2];
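The address decoding above can be checked on paper with a small software model. The following C sketch mirrors only the address part of PCI_Targeted and the PCI_AD[5:2] selection; the function names are made up for illustration, and the command (CBE) check is left out.

```c
#include <stdint.h>
#include <stdbool.h>

#define IO_ADDRESS 0x00000200u  /* base of the 64-byte window, same value as the Verilog parameter */

/* Mirrors the address part of PCI_Targeted: the upper 26 bits must match
   the base address and the two low bits must be 0 (32-bit aligned access). */
bool ram_address_hit(uint32_t addr)
{
    return ((addr >> 6) == (IO_ADDRESS >> 6)) && ((addr & 0x3u) == 0);
}

/* Mirrors PCI_TransactionAddr <= PCI_AD[5:2]:
   index of the 32-bit RAM location, from 0 to 15. */
unsigned ram_index(uint32_t addr)
{
    return (addr >> 2) & 0xFu;
}
```

For example, address 0x210 hits the window and selects RAM location 4, while 0x240 (one past the window) and 0x202 (not aligned) are ignored.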

Now a few more registers to be able to claim the transaction and remember if it's a read or a write

wire PCI_LastDataTransfer = PCI_FRAMEn & ~PCI_IRDYn & ~PCI_TRDYn;

// Is it a read or a write?
reg PCI_Transaction_Read_nWrite;
always @(posedge PCI_CLK or negedge PCI_RSTn)
if(~PCI_RSTn) PCI_Transaction_Read_nWrite <= 0;
else
if(~PCI_Transaction & PCI_Targeted) PCI_Transaction_Read_nWrite <= ~PCI_CBE[0];

// Should we claim the transaction?
reg PCI_DevSelOE;
always @(posedge PCI_CLK or negedge PCI_RSTn)
if(~PCI_RSTn) PCI_DevSelOE <= 0;
else
case(PCI_Transaction)
  1'b0: PCI_DevSelOE <= PCI_Targeted;
  1'b1: if(PCI_TransactionEnd) PCI_DevSelOE <= 1'b0;
endcase

// PCI_DEVSELn should be asserted up to the last data transfer
reg PCI_DevSel;
always @(posedge PCI_CLK or negedge PCI_RSTn)
if(~PCI_RSTn) PCI_DevSel <= 0;
else
case(PCI_Transaction)
  1'b0: PCI_DevSel <= PCI_Targeted;
  1'b1: PCI_DevSel <= PCI_DevSel & ~PCI_LastDataTransfer;
endcase

Let's claim the transaction.

// PCI_TRDYn is asserted during the whole PCI_Transaction because we don't need wait-states
// For read transaction, delay by one clock to allow for the turnaround-cycle
reg PCI_TargetReady;
always @(posedge PCI_CLK or negedge PCI_RSTn)
if(~PCI_RSTn) PCI_TargetReady <= 0;
else
case(PCI_Transaction)
  1'b0: PCI_TargetReady <= PCI_Targeted & PCI_CBE[0]; // active now on write, next cycle on reads
  1'b1: PCI_TargetReady <= PCI_DevSel & ~PCI_LastDataTransfer;
endcase

// Claim the PCI_Transaction
assign PCI_DEVSELn = PCI_DevSelOE ? ~PCI_DevSel : 1'bZ;
assign PCI_TRDYn = PCI_DevSelOE ? ~PCI_TargetReady : 1'bZ;

Finally, the RAM itself is written or read, with the PCI_AD bus driven accordingly.

wire PCI_DataTransferWrite = PCI_DevSel & ~PCI_Transaction_Read_nWrite & ~PCI_IRDYn & ~PCI_TRDYn;

// Instantiate the RAM
// We use Xilinx's synthesis here (XST), which supports automatic RAM recognition
// The following code creates a distributed RAM, but a blockram could also be used (we have an extra clock cycle to get the data out)
reg [31:0] RAM [15:0];
always @(posedge PCI_CLK) if(PCI_DataTransferWrite) RAM[PCI_TransactionAddr] <= PCI_AD;

// Drive the AD bus on reads only, and allow for the turnaround cycle
reg PCI_AD_OE;
always @(posedge PCI_CLK or negedge PCI_RSTn)
if(~PCI_RSTn) PCI_AD_OE <= 0;
else
  PCI_AD_OE <= PCI_DevSel & PCI_Transaction_Read_nWrite & ~PCI_LastDataTransfer;

// Now we can drive the PCI_AD bus
assign PCI_AD = PCI_AD_OE ? RAM[PCI_TransactionAddr] : 32'hZZZZZZZZ;

endmodule

Now we can read and write the PCI card!

Design considerations

1. The PCI_CBE byte enables are not used, so the software is supposed to issue only aligned 32-bit transactions.

2. You might be surprised to find that the PCI "PAR" signal (bus parity) is not used either.
While PAR generation is required for PCI compliance, its checking might not be, because the PCs I have access to work fine without it... And since I cannot test it in real hardware, I omitted it.

3. The above code supports burst transfers, but current PC bridges don't seem to issue bursts (at least for the IO space). x86 processors have support for burst IO instructions (REP INS/OUTS) but they end up being broken into individual transactions on the PCI bus.
Also I'm not sure if burst IO would require auto-incrementing the IO address, especially since the REP INS/OUTS instructions don't. But as not incrementing has happy consequences on timing (more details below), I kept the code this way.

Issue IO read/write transactions

On a PC, you use the x86 "IN" and "OUT" processor instructions to issue IO transactions.
Some compilers don't have native support for these, so you may have to use inline assembler functions. Here are examples for Visual C++:

void WriteIO_DWORD(WORD addr, DWORD data)
{
    __asm
    {
        mov dx, addr
        mov eax, data
        out dx, eax
    }
}

DWORD ReadIO_DWORD(WORD addr)
{
    // no explicit return statement: the value read is left in EAX, which holds the return value
    __asm
    {
        mov dx, addr
        in eax, dx
    }
}

GUI PCI IO exerciser software
You can use this simple IOtest application to issue 32-bit IO reads and writes on a PC.
It works directly on Win98/Me. Be sure to have GiveIO or UserPort running on WinXP/2K.

One important thing: free spaces return 0xFFFFFFFF on reads.

Timing considerations
Remember that PCI requires:

7ns/0ns Tsu/Th (setup/hold) constraint on inputs
11ns Tco (clock-to-output) on outputs

Most PCI cores are complex enough that the Tsu is impossible to meet without registering the inputs right in the IO blocks. Tco is also hard to meet without doing the same for the outputs.

But these registers add latencies to the design. The above code is simple enough that IO block registers are not required.

The code was tested using the Dragon board and Xilinx's ISE software.
It gives something like:

Timing summary:
---------------

Timing errors: 0  Score: 0

Design statistics:
   Minimum period: 9.667ns (Maximum frequency: 103.445MHz)
   Minimum input required time before clock: 5.556ns
   Minimum output required time after clock: 10.932ns

The clock frequency was met with a wide margin (103MHz against 33MHz).
Tsu was met by a good margin (5.556ns against 7ns) while Tco was barely met (10.932ns against 11ns) on the PCI_DEVSELn and PCI_TRDYn signals.
Tco would not have been met on the AD bus if the IO address had to be auto-incremented on burst reads. Since the address is static, and since (for read cycles only) the PCI bus requires a turnaround cycle after the address phase, the data has an extra clock cycle to get ready. Without it, the Tco was around 13ns, so above the maximum 11ns. But with the extra clock cycle, we actually meet the timing with a 28ns slack (=margin), which is very comfortable.
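The slack figures quoted above can be reproduced with a few lines of C. This is only a sketch of the arithmetic: the helper names are made up, and the 30ns clock period is an assumption (33MHz rounded), which is what makes the 28ns figure come out.

```c
/* PCI 33MHz limits quoted in the text */
#define PCI_TSU_MAX_NS  7.0   /* setup time allowed on inputs */
#define PCI_TCO_MAX_NS 11.0   /* clock-to-output allowed on outputs */
#define PCI_PERIOD_NS  30.0   /* assumption: 33MHz rounded to a 30ns period */

/* Slack = allowed time - reported time. A negative value is a timing violation. */
double slack_ns(double allowed_ns, double reported_ns)
{
    return allowed_ns - reported_ns;
}

/* Tco budget when the output has `extra_cycles` more clock cycles to settle,
   as happens on reads thanks to the turnaround cycle. */
double tco_budget_ns(unsigned extra_cycles)
{
    return PCI_TCO_MAX_NS + extra_cycles * PCI_PERIOD_NS;
}
```

With these helpers, slack_ns(tco_budget_ns(0), 13.0) is -2ns (a violation), while slack_ns(tco_budget_ns(1), 13.0) is the 28ns margin mentioned above.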

The only timing that was not met is the input hold-time (0ns), although the violation was hopefully small enough (0.3ns for the worst violator). Xilinx doesn't provide a way to constrain the hold-time, maybe because using IO block registers guarantees "by design" (of the FPGA) a 0ns hold-time.

PCI logic analyzer
Now that we can issue read and write transactions on the bus, wouldn't it be fun to "see" what the transactions actually look like?

Here's a very simple transaction that was captured with Dragon.

During the address phase, CBE is 0x3, which means "IO Write".
It's an IO Write, data 0x00000000, at address 0x0200.

The FPGA as a PCI logic analyzer
Being able to see the bus operation can be interesting to:

Get a better understanding of its operation.
Check the bus latencies within and in-between transactions.
Do post-mortem analysis (if you have functional problems in your PCI core).

Looking at the signals usually requires expensive equipment, like bus extenders and logic analyzers. That can be tricky because the PCI specification doesn't allow more than one IO load on each PCI signal (per PCI card of course). That's because the bus is sensitive to capacitive loads or wire stubs that would distort the high-speed signals.

But couldn't the FPGA act like a logic analyzer?

The FPGA is already connected to the bus, and has internal memories that can be used to capture the bus operation in real time. Dragon also has a USB interface that can be used to dump out the PCI captures without disturbing the PCI interface implementation, even if the PCI bus "dies".

The FPGA can also easily create complex trigger conditions that would outsmart most logic analyzers... what if you want to capture the 17th write after the second read at address 0x1234?
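As an illustration of such a trigger, here is a software model of the counting logic, written in C rather than Verilog so it can be tried on a PC. All names are hypothetical, and the sketch assumes that both the reads and the writes being counted are at the watched address; in the FPGA the same two counters would be clocked once per completed transaction.

```c
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical trigger: fire on the Nth write seen after the Mth read
   at a given address. */
typedef struct {
    uint32_t address;        /* address to watch */
    unsigned reads_needed;   /* e.g. 2 */
    unsigned writes_needed;  /* e.g. 17 */
    unsigned reads_seen;
    unsigned writes_seen;
} trigger_t;

void trigger_init(trigger_t *t, uint32_t address, unsigned reads_needed, unsigned writes_needed)
{
    t->address = address;
    t->reads_needed = reads_needed;
    t->writes_needed = writes_needed;
    t->reads_seen = 0;
    t->writes_seen = 0;
}

/* Call once per completed transaction.
   Returns true only on the transaction that satisfies the trigger condition. */
bool trigger_step(trigger_t *t, uint32_t addr, bool is_write)
{
    if (addr != t->address) return false;      /* other addresses are ignored */
    if (t->reads_seen < t->reads_needed) {     /* still waiting for the reads */
        if (!is_write) t->reads_seen++;
        return false;
    }
    if (!is_write) return false;               /* extra reads don't count */
    t->writes_seen++;
    return t->writes_seen == t->writes_needed;
}
```

Calling trigger_init(&t, 0x1234, 2, 17) and then feeding every transaction to trigger_step() returns true exactly once, on the 17th write that follows the second read.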

Capturing the PCI signals
We build a "state" (=synchronous) logic analyzer here.

The signals captured are:

wire [47:0] dsbr = {PCI_AD, PCI_CBE, PCI_IRDYn, PCI_TRDYn, PCI_FRAMEn, PCI_DEVSELn,
                    PCI_IDSEL, PCI_PAR, PCI_GNTn, PCI_LOCKn, PCI_PERRn, PCI_REQn, PCI_SERRn, PCI_STOPn};

Just 48 signals!
Nice, they fit perfectly in 3 blockrams if we choose a depth of 256 clocks.

Implementation is easy: an 8 bits counter starts feeding the blockrams once a trigger condition is set, and another counter allows the USB to read the blockrams data. Logic was also added to allow some level of pre-trigger acquisition - details in the Dragon board files.

The blockram outputs are muxed out to the USB controller in this order

case(USB_readaddr[2:0])
  3'h0: USB_Data <= bro[ 7: 0];
  3'h1: USB_Data <= bro[15: 8];
  3'h2: USB_Data <= bro[23:16];
  3'h3: USB_Data <= bro[31:24];
  3'h4: USB_Data <= bro[39:32];
  3'h5: USB_Data <= bro[47:40];
  3'h6: USB_Data <= 8'h01; // padding, added for ease of implementation
  3'h7: USB_Data <= 8'h02; // padding, added for ease of implementation
endcase

and finally, with a USB bulk read command, the data is acquired and saved into a ".pciacq" file for further analysis.
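For readers who want to post-process a capture themselves, here is a hedged C sketch that unpacks one 8-byte record. It rests on two assumptions that the text does not state: that "bro" carries the same bit layout as the "dsbr" wire, and that the file is simply the raw bytes in the order the USB mux delivers them. The type and function names are made up for the example.

```c
#include <stdint.h>

/* One captured clock cycle, unpacked. The field order follows the "dsbr" wire:
   PCI_AD in bits 47..16, PCI_CBE in bits 15..12, then one bit per control signal. */
typedef struct {
    uint32_t ad;
    uint8_t  cbe;
    uint8_t  irdy_n, trdy_n, frame_n, devsel_n;
    uint8_t  idsel, par, gnt_n, lock_n, perr_n, req_n, serr_n, stop_n;
} pci_sample_t;

/* Decode one 8-byte record: bytes 0..5 carry bits 7..0 up to 47..40,
   bytes 6 and 7 are padding and are ignored. */
pci_sample_t decode_sample(const uint8_t rec[8])
{
    uint64_t w = 0;
    pci_sample_t s;
    int i;

    for (i = 5; i >= 0; i--) w = (w << 8) | rec[i];  /* rebuild the 48-bit word */

    s.ad       = (uint32_t)(w >> 16);
    s.cbe      = (uint8_t)((w >> 12) & 0xF);
    s.irdy_n   = (uint8_t)((w >> 11) & 1);
    s.trdy_n   = (uint8_t)((w >> 10) & 1);
    s.frame_n  = (uint8_t)((w >> 9) & 1);
    s.devsel_n = (uint8_t)((w >> 8) & 1);
    s.idsel    = (uint8_t)((w >> 7) & 1);
    s.par      = (uint8_t)((w >> 6) & 1);
    s.gnt_n    = (uint8_t)((w >> 5) & 1);
    s.lock_n   = (uint8_t)((w >> 4) & 1);
    s.perr_n   = (uint8_t)((w >> 3) & 1);
    s.req_n    = (uint8_t)((w >> 2) & 1);
    s.serr_n   = (uint8_t)((w >> 1) & 1);
    s.stop_n   = (uint8_t)(w & 1);
    return s;
}
```

Under these assumptions, the address phase of an IO write at 0x0200 would decode to ad = 0x00000200, cbe = 0x3 and frame_n = 0.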

PCI bus viewer
The software used to view the ".pciacq" file can be downloaded here.

A sample ".pciacq" file is included, which is the resulting capture of this list of transactions:

ReadIO_DWORD( 0x200 );
ReadIO_DWORD( 0x204 );
ReadIO_DWORD( 0x208 );
ReadIO_DWORD( 0x210 );
WriteIO_DWORD( 0x204, 0x12345678 );
WriteIO_DWORD( 0x208, 0x87654321 );
WriteIO_DWORD( 0x210, 0xDEADBEEF );
ReadIO_DWORD( 0x200 );
ReadIO_DWORD( 0x204 );
ReadIO_DWORD( 0x208 );
ReadIO_DWORD( 0x210 );

The software looks like:


One interesting thing: during a read turnaround-cycle, the AD bus shows the data of the previous read... see cycle 151 for example... no idea why.

More PCI bus captures

If we issue an IO write transaction that is not claimed by anybody, the bridge used here retries 12 times!
See this WriteNotClaimed.pciacq file (the first IO Write is claimed, the subsequent one is not and gets retried many times).
To view it, just un-zip and replace the original ".pciacq" file.

See also this ReadNotClaimed.pciacq file.

PCI plug-and-play
Now that read and write accesses are going through, what does it take for the PCI plug-and-play to work?

Our PCI card is not yet in the list...

Configuration space
Remember that PCI cards have three "spaces" where transactions (reads and writes) take place?

1. Memory space
2. IO space
3. Configuration space

The configuration space is the heart of PCI plug-and-play. The OS (Windows, Linux...) reads there first to find if PCI cards are plugged-in, and their characteristics.

For simple boards, the configuration space consists of just 64 bytes. The important fields are:

Offset  Name                            Function                                       Note                                                Length
0       Vendor ID                       Manufacturer number                            ... allocated by the PCI-SIG                        2 bytes
2       Device ID                       Device number                                  ... allocated by the manufacturers themselves       2 bytes
4       Command                         Turn on and off accesses to the PCI board      ... but configuration space accesses are always on  2 bytes
16      BAR0 (Base address register 0)  Address at which the PCI board should respond  ... followed by BAR1 through BAR5                   4 bytes each

By implementing the right values and registers at these locations, the OS can "find" the PCI card.

Configuration space transactions
Each PCI slot has a signal called IDSEL. The IDSEL signal is not shared along the bus; each PCI slot has its own.
When a PCI card sees a configuration space transaction on the bus, and its own IDSEL is asserted, it knows it should respond.

parameter PCI_CBECD_CSRead = 4'b1010; // configuration space read
parameter PCI_CBECD_CSWrite = 4'b1011; // configuration space write

wire PCI_Targeted = PCI_TransactionStart & PCI_IDSEL & ((PCI_CBE==PCI_CBECD_CSRead) | (PCI_CBE==PCI_CBECD_CSWrite)) & (PCI_AD[1:0]==0);

After that, it can be a read or a write but it works the same way as the memory or IO spaces do.
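The command codes seen so far can be summarized in a small C model of the address-phase decoding. The IO and configuration codes come from the Verilog parameters in this text; the memory read/write codes (0110 and 0111) are not listed here and are added from the standard PCI command encoding. Names are illustrative, and the other PCI commands are lumped into a single "other" value as a simplification.

```c
#include <stdint.h>

typedef enum { PCI_SPACE_OTHER, PCI_SPACE_IO, PCI_SPACE_MEMORY, PCI_SPACE_CONFIG } pci_space_t;

/* Map the 4-bit command seen on C/BE during the address phase to a space.
   For these commands, bit 0 selects read (0) or write (1), so it is masked out. */
pci_space_t pci_space_of(unsigned cbe)
{
    switch (cbe & 0xEu) {
    case 0x2u: return PCI_SPACE_IO;      /* 0010 IO read, 0011 IO write */
    case 0x6u: return PCI_SPACE_MEMORY;  /* 0110 memory read, 0111 memory write (not in this text) */
    case 0xAu: return PCI_SPACE_CONFIG;  /* 1010 config read, 1011 config write */
    default:   return PCI_SPACE_OTHER;   /* simplification: everything else */
    }
}

int pci_is_write(unsigned cbe)
{
    return (int)(cbe & 1u);
}

/* Mirrors the configuration-space PCI_Targeted condition above
   (without the PCI_TransactionStart term). */
int config_targeted(int idsel, unsigned cbe, uint32_t ad)
{
    return idsel && pci_space_of(cbe) == PCI_SPACE_CONFIG && (ad & 0x3u) == 0;
}
```

A configuration read of the BAR0 register (offset 16) with IDSEL asserted gives config_targeted(1, 0xA, 0x10) != 0, while the same cycle without IDSEL is ignored.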

A few details:

For the Vendor ID, let's just pick a number; we are just experimenting, right? OK, 0x0100 works fine.
Device ID can be left at 0.
Command bit 0 is the "on/off" bit for the IO space, while bit 1 is the "on/off" bit for the Memory space.
BAR0 is a register that is written by the OS, once it decides at which address the PCI card should be located.

There are a few other details left out, like some bits of BAR0 being read-only...
Please refer to a PCI specification/book for the down-to-earth details.
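To illustrate why some BAR0 bits are read-only, here is a C model of a BAR that requests 64 bytes of IO space. It is a sketch under assumptions: the 64-byte size is borrowed from the RAM example, the names are invented, and the probing sequence (write all ones, read back, mask, invert, add one) is the usual sizing method described in PCI references rather than something shown in this text.

```c
#include <stdint.h>

#define BAR0_SIZE       64u                           /* assumption: 64-byte IO window, as in the RAM example */
#define BAR0_WRITE_MASK (~(uint32_t)(BAR0_SIZE - 1u)) /* only bits 31..6 can be written */
#define BAR_IO_FLAG     0x1u                          /* bit 0 always reads as 1 for an IO BAR */

static uint32_t bar0_reg;  /* the writable part of BAR0 */

/* Configuration write to BAR0: the low bits are read-only and stay at 0 */
void bar0_write(uint32_t value)
{
    bar0_reg = value & BAR0_WRITE_MASK;
}

/* Configuration read of BAR0 */
uint32_t bar0_read(void)
{
    return bar0_reg | BAR_IO_FLAG;
}

/* What the host does to find the size: write all ones, read back,
   clear the two IO flag bits, invert and add one. */
uint32_t bar0_probe_size(void)
{
    uint32_t readback;
    bar0_write(0xFFFFFFFFu);
    readback = bar0_read() & ~(uint32_t)0x3u;
    return ~readback + 1u;
}
```

Here bar0_probe_size() returns 64, and after the OS writes its chosen base (for example 0xE000) a read returns 0xE001. The read-only low bits are how the board tells the OS how much space it needs.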

Windows plug-and-play
Once these registers are implemented, the OS can discover the new hardware.

But the OS requires a driver before it agrees to allocate the memory resource.

Links
Many interesting things on Craig's PCI & PnP ID's Pages

PCI software driver for Windows
Now that we need a driver for our PCI card, there are two ways to get it.

The easy way
The easy way consists of having someone else do the hard work for you!

Check out WinDriver.
That's a commercial toolkit that can build a PCI plug-and-play driver solution for you in minutes.

It works like this:

You run a wizard that detects your plug-and-play devices, including the PCI cards.
You select your card of interest, give a name to your device and create an ".inf" file.
That's enough for Windows to be able to recognize the hardware and to convince it that it should use WinDriver's driver.
You quit the wizard, and go through Windows' plug-and-play hardware detection to install the driver.
Once the driver is installed, you run the wizard again, this time to build some example source code to access the PCI card.

WinDriver gives you 30 days to try it out.
WinDriver may be nice, but at $2000, that's expensive if all you want to do is experiment with PCI plug-and-play mechanisms.

The hard way
Use Microsoft Windows DDK and the Online DDK documentation.

Installing Windows DDK
The latest Windows DDK releases are not free, while earlier incarnations (98/2000) were free to download.
The DDKs are easy to install. For Win98 and Win2000 DDKs, first install Visual C++ 5.0 or 6.0, then the DDK itself. Then follow the "install.htm" instructions to build a few sample drivers using the "build" command.

A minimum WDM Plug-and-Play driver
Here's the very minimum code required for Windows device manager to allocate the memory resource used by our PCI card.
Since it's a WDM driver, it works in WinXP/2000/98.

The entry point of a WDM driver is a "DriverEntry" function (like the "main" of a C program).
Its main purpose is to publish addresses of callback functions. Our minimum driver just needs 2.

NTSTATUS DriverEntry(PDRIVER_OBJECT DriverObject, PUNICODE_STRING RegistryPath)
{
    DriverObject->DriverExtension->AddDevice = DevicePCI_AddDevice;
    DriverObject->MajorFunction[IRP_MJ_PNP] = DevicePCI_PnP;

    return STATUS_SUCCESS;
}

A WDM driver creates at least one "device" (if your PC has multiple similar items, the same WDM driver may create multiple devices). Before the driver can create a device, we need a "Device Extension" structure. The structure is used by each device to store information. We can make it as big as we want, and a typical device will store many fields in there. Our minimum device just needs one field.

typedef struct
{
    PDEVICE_OBJECT NextStackDevice;
}
DevicePCI_DEVICE_EXTENSION, *PDevicePCI_DEVICE_EXTENSION;

What is this "NextStackDevice" for? A WDM implementation detail...
WDM devices process IRPs ("I/O Request Packets", create/read/write/close...). WDM devices don't work alone but are assembled in logical "stacks" of devices. IRP requests are sent along the stack and are processed on the way. Stacks are created from bottom to top (bottom=hardware layers, top=logical layers). When a stack is created, each device attaches itself to the device just below. A device typically stores the info about the device just below itself in the Device Extension, so that later, it can forward IRP requests along. A device doesn't really know where it is in the stack; it just processes or forwards requests as they come.

Anyway, now we can implement DevicePCI_AddDevice.
It creates a device object and attaches the device to the device stack.

NTSTATUS DevicePCI_AddDevice(PDRIVER_OBJECT DriverObject, PDEVICE_OBJECT pdo)
{
    // Create the device and allocate the "Device Extension"
    PDEVICE_OBJECT fdo;
    NTSTATUS status = IoCreateDevice(DriverObject, sizeof(DevicePCI_DEVICE_EXTENSION), NULL, FILE_DEVICE_UNKNOWN, 0, FALSE, &fdo);
    if(!NT_SUCCESS(status)) return status;

    // Attach to the driver below us
    PDevicePCI_DEVICE_EXTENSION dx = (PDevicePCI_DEVICE_EXTENSION)fdo->DeviceExtension;
    dx->NextStackDevice = IoAttachDeviceToDeviceStack(fdo, pdo);

    fdo->Flags &= ~DO_DEVICE_INITIALIZING;
    return STATUS_SUCCESS;
}

Finally we can process the Plug-and-Play IRP requests.
Our minimum device processes only START_DEVICE and REMOVE_DEVICE requests.

NTSTATUS DevicePCI_PnP(PDEVICE_OBJECT fdo, PIRP IRP)
{
    PDevicePCI_DEVICE_EXTENSION dx = (PDevicePCI_DEVICE_EXTENSION)fdo->DeviceExtension;
    PIO_STACK_LOCATION IrpStack = IoGetCurrentIrpStackLocation(IRP);
    ULONG MinorFunction = IrpStack->MinorFunction;
    PDEVICE_OBJECT NextStackDevice = dx->NextStackDevice;
    NTSTATUS status;

    switch(MinorFunction)
    {
    case IRP_MN_START_DEVICE:
        // we should check the allocated resource...
        break;
    case IRP_MN_REMOVE_DEVICE:
        // forward the request first, then detach and delete our device
        // (dx is no longer valid once the device is deleted, so we return from here)
        IoSkipCurrentIrpStackLocation(IRP);
        status = IoCallDriver(NextStackDevice, IRP);
        IoDetachDevice(NextStackDevice);
        IoDeleteDevice(fdo);
        return status;
    }

    // call the device below us
    IoSkipCurrentIrpStackLocation(IRP);
    return IoCallDriver(NextStackDevice, IRP);
}


The START_DEVICE request is the one where we accept or refuse the memory resources. Here we don't do anything but forward the request down the stack, where it is always accepted.

Now, our device gets some memory resources, but doesn't do anything with them.
To be more useful, the driver would need to:

Check the memory resources before accepting them
Export a device name
Implement some "DeviceIOcontrol" to communicate with a Win32 application
Handle more IO requests ("IRP")
...

Get the code here. Your turn to experiment!
You can get more sample code by studying the "portio" project in the Windows 2000 DDK for example.

Links
Jungo's WinDriver and CompuWare's DriverStudio toolkits
Microsoft DDK and the Online DDK documentation
The articles Surveying the New Win32 Driver Model and Implementing the New Win32 Driver Model from the MSJ.
Examples of NT4 style drivers: Kamel from ADP GmbH, DumpPCI from Microsoft
Programming the Microsoft Windows driver model book from Walter Oney

PCI software driver for Linux
Fedora is an impressive Linux release.
Microsoft should be worried...

Writing a Plug-and-Play PCI driver for Linux
It's actually easier than on Windows.

1. Create the init_module and cleanup_module
These functions are called when the driver is loaded or unloaded.

int init_module(void)
{
    return pci_module_init(&pci_driver_DevicePCI);
}

void cleanup_module(void)
{
    pci_unregister_driver(&pci_driver_DevicePCI);
}

The "pci_driver_DevicePCI" structure is shown next...

2. Create tables describing the PCI board
#define VENDOR_ID 0x1000
#define DEVICE_ID 0x0000

struct pci_device_id pci_device_id_DevicePCI[] =
{
    {VENDOR_ID, DEVICE_ID, PCI_ANY_ID, PCI_ANY_ID, 0, 0, 0},
    {}  // end of list
};

struct pci_driver pci_driver_DevicePCI =
{
    .name     = "MyPCIDevice",
    .id_table = pci_device_id_DevicePCI,
    .probe    = device_probe,
    .remove   = device_remove
};

device_probe and device_remove are 2 callback functions, created next...

3. Create the "probe" and "remove" callbacks
int device_probe(struct pci_dev *dev, const struct pci_device_id *id)
{
    int ret;

    ret = pci_enable_device(dev);
    if (ret < 0) return ret;

    ret = pci_request_regions(dev, "MyPCIDevice");
    if (ret < 0)
    {
        pci_disable_device(dev);
        return ret;
    }

    return 0;
}

void device_remove(struct pci_dev *dev)
{
    pci_release_regions(dev);
    pci_disable_device(dev);
}

That should be enough to allocate the memory resource...
Thanks to Ian Johnston's help, I got the current files (for Fedora Core 2 - kernel 2.6) to compile.
Build them using "make", then use "insmod DevicePCI.ko" to load the driver and "rmmod DevicePCI.ko" to unload it.

Your turn to experiment!

Link
The Linux Device Drivers, 2nd Edition Online Book, and in particular the "Handling Hot-Pluggable Devices" section of chapter 15.
A nice Writing a PCI driver in 5+3 steps presentation