LUM final presentation
Chanit Giat
Rachel Stahl
Instructor: Artyom Borzin
PROXY CACHE ENGINE
The proxy cache engine gives hardware support to a server’s OS in order to improve its service rate, and adds security features.
The main memory of a network server is its fast storage device, where recently accessed data is kept. When a new request for data is received, the application must search the memory. If the data is found, the response is sent; otherwise the data must be read from a slower storage device (disk, tape) and then sent to the user.
The system stores the mapping information for all files in main memory and calculates the exact path to the required file if it is present in main memory. If it is not present, the system orders the operating system to bring it from the storage device and supplies the path to the free memory space.
The system holds 2 main databases:
A main memory, which holds up to 2Meg paths to the server’s memory, and their aging parameters.
A bit map table, which allows faster memory management by holding the free space image of the main memory.
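The free space image can be pictured with a small software model. This is only a sketch under assumptions: the class name `BitMap`, the bit polarity (1 means the slot is in use) and the lowest-index-first search order are illustrative and are not taken from the design.

```python
class BitMap:
    """Free-space image: bit i is 1 when slot i is in use (assumed polarity)."""

    def __init__(self, num_slots):
        self.num_slots = num_slots
        self.bits = 0  # Python int used as an arbitrary-width bit vector

    def find_free(self):
        """Return the lowest free slot index, or None when every slot is used."""
        for i in range(self.num_slots):
            if not (self.bits >> i) & 1:
                return i
        return None

    def mark_used(self, index):
        self.bits |= 1 << index

    def mark_free(self, index):
        self.bits &= ~(1 << index)

    def count_free(self):
        return self.num_slots - bin(self.bits).count("1")
```

Scanning one bit per slot avoids touching the much larger path records when only a free slot is needed, which is presumably why the table speeds up memory management.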
Main functions:
Search – returns the path to the main memory, or a path to a free space in the memory.
Set attributes – sets the file’s aging attributes, as supplied by the OS.
Delete – deletes a certain path from the memory.
Count free – returns number of free path slots in the memory.
Init – initializes the machine.
(Age – when the number of records exceeds a specified number, the system cleans up some of them.)
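The command set above can be summarized as a toy software model. Everything here is an assumption made for illustration: the class name `ProxyCacheModel`, the dictionary storage, a single integer as the aging attribute, and the eviction policy (lowest attribute first) are hypothetical and do not describe the hardware.

```python
class ProxyCacheModel:
    """Toy model of the command set; not the hardware implementation."""

    def __init__(self, num_slots=8, age_threshold=6):
        self.num_slots = num_slots
        self.age_threshold = age_threshold  # assumed trigger level for 'age'
        self.init()

    def init(self):
        """Init: clear all records and free every slot."""
        self.slot_of = {}  # path -> slot index
        self.attrs = {}    # path -> aging attribute supplied by the OS
        self.free = list(range(self.num_slots))

    def search(self, path):
        """Return (hit, slot): the stored slot on a hit, a free slot on a miss."""
        if path in self.slot_of:
            return True, self.slot_of[path]
        if not self.free:
            return False, None
        slot = self.free.pop(0)
        self.slot_of[path] = slot
        self.attrs[path] = 0
        return False, slot

    def set_attributes(self, path, aging):
        if path not in self.slot_of:
            raise KeyError(path)
        self.attrs[path] = aging

    def delete(self, path):
        slot = self.slot_of.pop(path)
        del self.attrs[path]
        self.free.append(slot)

    def count_free(self):
        return len(self.free)

    def age(self):
        """Evict lowest-attribute records while above the threshold."""
        evicted = []
        while len(self.slot_of) > self.age_threshold:
            victim = min(self.slot_of, key=lambda p: self.attrs[p])
            self.delete(victim)
            evicted.append(victim)
        return evicted
```

A miss in `search` both reports the miss and hands back the slot where the OS should place the file, matching the description of Search returning either the existing path or a path to free space.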
SEARCH command format (figure): Length, CID=1, ASIS, Site#, Data
Previous uArchitecture
(Block diagram: Local Bus Interface, Reg. file, Data Stream controller, Output FIFO, Input FIFO, Decoder, CRC unit, Database Manager (DBM), UTCAM, SRAM (Bit Map).)
uArchitecture changes:
Doubling the front-end of the machine, including: Input FIFO, Decoder, CRC unit.
Buffering between the decoders and the DBM with a FIFO.
The search for a free index in the Bit Map is now done in parallel to the rest of the command execution.
Previous uArchitecture
(Same block diagram, with the Input FIFO, Decoder and CRC unit marked as the front end.)
New uarchitecture
(Block diagram, built up over several slides: Local Bus Interface; Reg. file; Data Stream Controller; Output FIFO; Double front end made of FrontEnd0 and FrontEnd1, each with its own Input FIFO, Decoder and CRC; DBM FIFO; DBM.)
Data Flow
(Animation of the data flow over the new uarchitecture block diagram: Local Bus Interface, Reg. file, Data Stream Controller, Output FIFO, FrontEnd0, FrontEnd1, DBM FIFO, DBM.)
Data stream ctrl
(Figure: Local Bus Interface, Input FIFO 0, Input FIFO 1, Reg. file, Data Stream Controller, Output FIFO.)
State diagram: states FIFO 0 and FIFO 1; reset on Sys_clr; both transitions labeled !sot & lwr.
SOT – start of transaction. lwr – specifies write/read from the system.
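One way to read the two-state diagram is that the controller alternates between the two input FIFOs, switching on every `!sot & lwr` event. The sketch below models that reading only. The function name `steer`, the per-transaction tuples, treating `sot` as active-low and `lwr` as high for a write, and starting at FIFO 0 after reset are all assumptions.

```python
def steer(transactions):
    """Assign each write transaction to Input FIFO 0 or 1, alternating.

    Each transaction is a (sot, lwr, payload) tuple.
    """
    target = 0  # assumed state after Sys_clr
    fifos = {0: [], 1: []}
    for sot, lwr, payload in transactions:
        if (not sot) and lwr:  # the "!sot & lwr" transition condition
            fifos[target].append(payload)
            target ^= 1        # switch to the other FIFO for the next one
    return fifos


if __name__ == "__main__":
    result = steer([(0, 1, "cmd A"), (0, 1, "cmd B"), (0, 1, "cmd C")])
    print(result)
```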
Sim: Data Stream ctrl
(Waveform annotations: reading from register file (crc); data enters FIFO 0; data enters FIFO 1.)
DBM FIFO
(Figure: the DBM FIFO sits between the Double front end and the DBM.)
State diagram: states WAIT ON GO0, DEC0, WAIT ON GO1, DEC1; reset on Sys_clr; transition labels go0 & !dbm_full, go1 & !dbm_full, fifo_wrdone (twice).
go0/1: FrontEnd0/1 (decoder0/1) are ready.
dbm_full: DBM FIFO is full.
fifo_wrdone: write to FIFO is done.
Sim: DBM FIFO
State encoding:
1 – wait on go0
2 – DEC0
4 – wait on go1
8 – DEC1
Waveform annotations: DBM FIFO samples data from decoder 0; DBM FIFO samples data from decoder 1.
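The DBM FIFO controller can be sketched as a one-hot state machine using the state encoding listed above. The ring order of the transitions (wait on go0, DEC0, wait on go1, DEC1, back to wait on go0) and the reset state are assumptions read off the transition labels; the function name `next_state` is hypothetical.

```python
# One-hot state encoding from the simulation slide.
WAIT_GO0, DEC0, WAIT_GO1, DEC1 = 1, 2, 4, 8


def next_state(state, go0, go1, dbm_full, fifo_wrdone, sys_clr=False):
    """Return the next state for one clock cycle."""
    if sys_clr:
        return WAIT_GO0  # assumed reset state
    if state == WAIT_GO0 and go0 and not dbm_full:
        return DEC0      # start sampling data from decoder 0
    if state == DEC0 and fifo_wrdone:
        return WAIT_GO1  # write finished, serve the other front end next
    if state == WAIT_GO1 and go1 and not dbm_full:
        return DEC1      # start sampling data from decoder 1
    if state == DEC1 and fifo_wrdone:
        return WAIT_GO0
    return state         # otherwise hold the current state
```

Under this reading the two front ends are served strictly in turn, so a long decode in one pipe does not reorder commands relative to the other.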
DBM
(Block diagram: Double front end, DBM FIFO, DBM interface, Issue Logic, Execution Unit, REQ packet, Packer, Bit Map Unit.)
Annotation: saves the last bad decoder status, which goes to the Output FIFO with the next successful command.
Sim: bad decoder status
Register file
Previously, the user could read the system’s current parameters from the register file: command id, CRC value, file’s site etc.
Since we have 2 pipes, the register file had to be changed: some registers contain data from both pipes. For others, the pipe from which to read the parameters must be specified.
ADD - old
State diagram: states IDLE, FND_NINDX, ADD_NINDX, ACK_NINDX; transition labels (ad_en)&&(!ad_done), (bm_s4f_done), (ad_error), (ad_new_done), (bm_s4f_done).
Finding a new free index: ~40 clk cycles.
Updating Bit map: ~10 clk cycles.
s4f - old
State diagram: states IDLE, FNEW, ACKN; labels !Sys_clr, Fnew_done, Ackn_done, Bm_s4f_new_ack, !Bm_s4f_new_ack.
ADD - new
State diagram: states IDLE, ADD_NINDX, ACK_NINDX; labels !Sys_clr, (ad_en)&&(!ad_done)&&(bm_index_valid), (ad_err), (ad_new_done), (bm_ack_rcvd).
The new index is found while the ‘ADD’ module is idle (which lasts for more than 50 cycles).
s4f - new
State diagram: states WT_FOR_ACK, FNEW, ACKN; labels !Sys_clr, Bm_index_valid, Ackn_done, Add_ack.
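The handshake between the new ADD and s4f machines can be pictured as a prefetcher: s4f keeps one free index ready, ADD consumes it only when it is valid, and the acknowledge triggers the bit map update and the search for the next index. The sketch is a behavioral model under assumptions. The names `FreeIndexPrefetcher` and `add_record` are hypothetical, and in hardware the search runs concurrently with other work rather than inside a method call.

```python
class FreeIndexPrefetcher:
    """Behavioral model of the new s4f machine (WT_FOR_ACK, ACKN, FNEW)."""

    def __init__(self, num_slots):
        self.used = [False] * num_slots  # stands in for the bit map
        self.index = None
        self.index_valid = False         # models Bm_index_valid
        self.find_new()

    def find_new(self):
        """FNEW: look for a free index ahead of the next ADD."""
        for i, in_use in enumerate(self.used):
            if not in_use:
                self.index, self.index_valid = i, True
                return
        self.index, self.index_valid = None, False

    def add_ack(self):
        """ACKN then FNEW: mark the consumed index used, then prefetch again."""
        self.used[self.index] = True
        self.find_new()


def add_record(prefetcher):
    """ADD: proceeds only when a prefetched index is valid; returns that index."""
    if not prefetcher.index_valid:
        return None
    index = prefetcher.index
    prefetcher.add_ack()
    return index
```

Because the index is already waiting when ADD starts, the search for it no longer sits on the critical path of the command.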
Sim: add, s4f
s4f state encoding:
0 – wait for ack
2 – ack old index
1 – find new index
add state encoding:
1 – idle
2 – add index
4 – ack index
performance
The main function is the ‘search’ command:
Long path (up to 512 bytes) => long CRC calculation => long decoding stage.
Access to main memory => if the requested path is not found, a new record is added to the memory, which includes finding a new index and acknowledging the added record (at least 4 memory accesses).
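The link between path length and decoding time can be illustrated by computing a CRC word by word. This is a sketch only: the CRC-32 from Python's `zlib` stands in for the engine's CRC, whose polynomial and width are not given here, and the 4-byte word size is an assumption based on the dwords entering the Input FIFO.

```python
import zlib


def crc_over_path(path, word_bytes=4):
    """Fold a path into a CRC one word at a time; return (crc, steps)."""
    crc = 0
    steps = 0
    for offset in range(0, len(path), word_bytes):
        # zlib.crc32 accepts a running value, so the CRC is built incrementally.
        crc = zlib.crc32(path[offset:offset + word_bytes], crc)
        steps += 1
    return crc, steps


if __name__ == "__main__":
    for length in (102, 512):
        _, steps = crc_over_path(b"x" * length)
        print(length, "bytes ->", steps, "word steps")
```

The number of steps grows linearly with the path length, which is why a maximum-length path dominates the decoding stage.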
2 input FIFOs – double the rate of receiving data from the OS.
2 decoders – allow decoding of 2 commands in parallel. Significant for several long ‘search’ commands in a row.
DBM FIFO – decouples the decoding of commands from their execution, so the two can proceed in parallel.
2 search commands, each with 102 bytes of path (on which the CRC works). Times in ns:

Event                                      Old Architecture   New Architecture
Ads_n falls (first search)                 628                628
First dword in Input FIFO (is_usedw)       719                718
End of decoding (crc_done)                 6128
(dbm_fifo->fifo_input is ready)                               6202
Pck__en rises                              9344               8560
Sot falls                                  9468               2318
First dword in Input FIFO (is_usedw)       2380               2380
End of decoding (crc_done)                 14574              7869
Pck__en rises                              15408              9486

Totals shown under the table (unlabeled on the slide): 7942, 8625; 6064, 926.
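The effect of the parallel front ends shows up in the gap between the two Pck__en rises. The short calculation below uses only the Pck__en times from the measurements above; the variable names are illustrative. The resulting gaps, 6064 ns and 926 ns, match the last pair of figures on the slide.

```python
# Pck__en rise times in ns, copied from the measurements above.
pck_en_first = {"old": 9344, "new": 8560}
pck_en_second = {"old": 15408, "new": 9486}

# Gap between the completion of the first and the second search command.
gap = {arch: pck_en_second[arch] - pck_en_first[arch] for arch in ("old", "new")}

# How much earlier the second command completes in the new architecture.
second_cmd_gain = pck_en_second["old"] - pck_en_second["new"]

if __name__ == "__main__":
    for arch in ("old", "new"):
        print(arch, "architecture gap:", gap[arch], "ns")
    print("second Pck__en rises", second_cmd_gain, "ns earlier")
```

In the old architecture the second command waits for the first to leave the single front end, so its full decode time appears in the gap; in the new one most of that decode overlaps with the first command.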
The search for a free index now executes in parallel with the other execution stages of a command. This saves ~50 clock cycles per ‘search’ command, which usually takes ~400-1000 cycles.
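As a rough sense of scale, the saving can be expressed as a fraction of the command length. The sketch assumes the ~400-1000 cycle range describes the command before the saving is applied, which the slide does not state explicitly; `saving_fraction` is a hypothetical helper.

```python
SAVED_CYCLES = 50  # approximate saving per 'search' command


def saving_fraction(total_cycles, saved=SAVED_CYCLES):
    """Fraction of a command's cycles removed by the parallel index search."""
    return saved / total_cycles


if __name__ == "__main__":
    for total in (400, 1000):
        print(f"{total} cycles: {saving_fraction(total):.1%} saved")
```

Under that assumption the gain is between 5% and 12.5% per command, largest for short commands.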
The end…
Sim: s4f
State encoding:
0 – wait for ack
2 – ack old index
1 – find new index