View
215
Download
0
Category
Preview:
Citation preview
1
UC Berkeley
Time dilation in RAMP
Zhangxi Tan and David PattersonComputer Science Division
UC Berkeley
2
A time machine
• Using RAMP as datacenter simulator– Vary DC configurations: processors, disks,
network and etc.– Evaluate different system implementations: Mapreduce
with 10 Gbps, 2ms delay or 100 Gbps, 80ms delay interconnect
– Explore and predict what happened if update hardware in your cluster: powerful CPU, fast/large disks
– Try things in the future!
RAMP insideRAMP inside
3
The problems
• Emulate fast and many computers in FPGA
• What are the problems?– First comment half year ago in RadLab retreat:
100 MHz is too slow can’t reflect GHz machine– Targets are becoming more and more complex
• Implement them in FPGA and cycle accurate is desired
• How many cores can we put in FPGA? (Original vision 16-24 cores per chip. Now, 1 Leon on V2P30, 2-3 on V2P70)
4
Methodologies
• RDL– Target cycle, host cycle, start, stop, channel model…
• Transfer data between units with extra start/stop control• Replace original transferring logic with RDL• control target clock: If no data, still send something to keep the target
time “running”• Bad control logic implementation may cause deadlock• RDLizing unit (build channels, units) if you want to talk with each other
– Compared to porting APPs for MicroBlaze?– RDLizing is obvious and simple??
• Model: event driven? or clock driven?
• Time dilation – Remove target cycle control
• Stepping every clock cycle is the way to debug 1000 nodes system?– Use standard data transfer interface– Rescale everything to a “virtual wall clock” and “slow down” events
accordingly• Events: Timer interrupt, data sent/received and etc
5
Basic Idea
• “Slow down” time passage to make target faster– 10 ms wall clock time = 2 ms
target time• Network: shorter time to send
packet -> BW increase, latency decrease
• Disk: shorter time to read/write• CPU: shorter time to do
computation
– Virtual wall clock is the coordinate in target, only control event interval in implementation
Wall clock
10 ms perceived event interval
10 ms
Virtual wall clock
2 ms
2 ms perceived event interval
10 ms perceived event interval
No time dilation
Time dilation
6
Real world examples
Real
Time dilation
1 sec
Timer interrupt
before time dilation
10 ms
Network
CPU and OS
100 ms
Sending 100 Mb data between two events
Perceived BW : 100 Mbps
Perceived BW : 1 Gbps
• Sending data at the same rate with the same logic
Timer interrupt after time dilation
50 ms in wall clock time
10 ms perceived in target
• OS updates its timer every 10 ms (jiffies) in each timer interrupt
• Reprogram the timer to slow the interrupt down
– No OS modifications– No HW changes
• Speed up the processor by x5
7
Experiments
• HW Emulator (FPGA): 32-bit Leon3 with, 50MHz, 90 MHz DDR memory, 8K L1 Cache (4K Inst and 4K Data)– Target system: Linux 2.6 kernel, Leon @
50 MHz / 250 MHz / 500 MHz / 1 GHz / 2 GHz– Run Dhrystone benchmark– Tomorrow: HW/SW co-simulation example
• ConceptTime Dilation Factor = wall clock time / emulated
clock time
8
Dhrystone result (w/o memory TD)
How close to a 3 GHz x86 ~8000 Dhrystone MIPS? Memory, Cache, CPI
9
Problems
• Similar to time dilation in VM – To Infinity and Beyond: Time-Warped Network
Emulation, NSDI 06
• Everything scaled linearly, including memory! – VM is lucky: networking code can fit in cache
easily.– RAMP has more knobs to tweak.
• Solution: slow down the memory and redo the experiment
10
Dhrystone w. Memory TD
Keep the memory access latency constant - 90 MHz DDR DRAM w. 200 ns latency in all target (50MHz to 2GHz)- Latency is pessimistic, but reflect the trendRAMP blue result + Time dilation vs. real system?
11
Limitation of Naïve time dilation
• Fixed CPI (memory/CPU) model• Next step
– Variable time dilation factor: distribution and state (statistic model)
– Emulate OOO with time dilationPeek each instruction and dilate it
– Going to deterministic? No, I’ll do statistic
UnitTime dilation counter
Proposed model
• No extra control between units• Reprogram Time Dilation Counter (TDC) in each unit to get different target configuration
Recommended