3/9/2004
Energy Efficient Scheduling Techniques for Real-Time Embedded Systems
Rabi Mahapatra & Wei Zhao
This work was done by Rajesh Prathipati as part of his MS Thesis here. The work has been updated by Subrata Acharya & Nitesh Goyal.
Copyright: Mahapatra@Texas A&M
Outline
• Introduction
• Motivation
• Related Work
• Single Processor Systems
• Distributed Multiprocessor Systems
• Experiments & Results
• Summary
Introduction
Sample embedded systems:
• PDA
• Audio/video entertainment devices
• Robots
• Handheld computer
• Mobile phone
• Network camera
• Wireless presentation gateway
• Cerfcube
Application Specification for Embedded Systems
• Periodic task graphs
  • Each task characterized by:
    • Period
    • Execution time
    • Deadline
• Sporadic tasks
  • Invoked at any time
  • Hard deadline
• Soft aperiodic
  • Invoked at any time
  • No deadline
[Figure: Typical input specification of embedded systems. A periodic task graph with tasks t1 to t5 (Period = 90, Deadline = 90) and a sporadic task (Deadline = 30).]
Why Low Power?
• High power dissipation causes chip failures
• Expensive cooling & packaging overheads
• High manufacturing costs
• Portable systems: user convenience limited by:
  • Battery size
  • Recharging interval
Power Management
Processor power dissipation is a function of
$\alpha \cdot C_l \cdot V_{dd}^2 \cdot f$
Various low-power techniques:
• System-level
• Architecture-level
• Circuit-level
System-level power reduction techniques:
• Dynamic Voltage Scaling
• Dynamic Power Management
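The power expression above explains why dynamic voltage scaling pays off. The sketch below is only an illustration under stated assumptions: it reads α as the switching activity, C_l as the load capacitance, V_dd as the supply voltage and f as the clock frequency (the usual CMOS interpretation), and it assumes that halving the frequency allows the voltage to be halved as well. The numeric operating points and the function names are hypothetical.

```python
def dynamic_power(alpha, c_load, v_dd, freq):
    """Dynamic power in watts: alpha * C_l * V_dd^2 * f."""
    return alpha * c_load * v_dd ** 2 * freq


def task_energy(alpha, c_load, v_dd, freq, cycles):
    """Energy in joules needed to run a task of `cycles` clock cycles."""
    exec_time = cycles / freq  # a slower clock means a longer run
    return dynamic_power(alpha, c_load, v_dd, freq) * exec_time


# Hypothetical operating points, chosen only for illustration.
ALPHA, C_LOAD, CYCLES = 0.5, 1e-9, 1_000_000
full = (1.5, 200e6)   # (V_dd in volts, f in hertz)
half = (0.75, 100e6)  # assumed: voltage scales down with frequency

power_ratio = dynamic_power(ALPHA, C_LOAD, *full) / dynamic_power(ALPHA, C_LOAD, *half)
energy_ratio = (task_energy(ALPHA, C_LOAD, *full, CYCLES)
                / task_energy(ALPHA, C_LOAD, *half, CYCLES))
print(power_ratio, energy_ratio)
```

Under these assumptions the half-speed point draws about one eighth of the power and, because the task runs twice as long, about one quarter of the energy. This is the margin that the scheduling techniques in this talk try to exploit without missing deadlines.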
System Level Power Management Taxonomy
[Figure: taxonomy tree. SLPM splits into DPM and LPS (DVS). DPM splits into Fixed Tasks and Variable Tasks; LPS (DVS) splits into Fixed Task set and Variable Task set. Each of these splits into Single Processor and Multiprocessors. Further branches shown: D≤P (contd ..), No Restrictions, Tolerance DL, Hard Realtime.]
SLPM – System Level Power Management
DPM – Dynamic Power Management
LPS – Low Power Scheduling
System Level Power Management Taxonomy (contd …)
[Figure: taxonomy tree, continued. D≤P splits into Tolerance DL and Hard Real time. Further branches shown: No Precedence and With Precedence, each splitting into Periodic and Periodic + Sporadic.]
Our Objective
Given an embedded system and its application task graphs with library functions (i.e., period, execution time, deadline, etc.), our goal is to reduce the system-wide power consumption while guaranteeing the deadlines.
Related Work
Multi-Processor:
J. Luo and N. K. Jha, 2001, “Battery-Aware static scheduling”
• Global shifting scheme & local schedule transformations
• More suitable to small scale systems
R. Mishra, N. Rastogi, and D. Zhu, 2003, “Energy aware scheduling for distributed”
• Greedy and gap-filling dynamic power management techniques
• Limited to task graphs with equal deadline
D. Zhu, R. Melhem, and B. Childers, 2003, “Scheduling with Dynamic Voltage/Speed”
• Slack sharing among processors, global queue
• Limited to homogeneous systems with shared memory
Single Processor:
G. Quan and X. Hu, 2001
• Minimum constant voltage for each interval
• Assumes deadline less than or equal to period
V. Swaminathan and K. Chakrabarty, 2000
• Low-energy earliest deadline first heuristic
• No guarantee on required maximum processor speed
Contributions
Provides a framework for single-processor systems that considers tasks:
• whose response time is greater than the period
• with precedence constraints
Introduces a chain-of-task-set based execution approach to model low power in distributed embedded systems.
Energy Efficient Scheduling Techniques for Single Processor
Proposed Approach
Proposal: A 3-step approach to reduce power in single processor embedded systems with arbitrary response times and precedence constraints.
Step 1: Task priority assignment that guarantees precedence constraints.
Step 2: Determination of task speed that
• guarantees deadlines
• reduces power consumption
Step 3: Dynamic power management
• Idle intervals
• Run-time variations in task execution time
Task Modeling
Periodic task graphs
• Scheduled according to their priorities
Sporadic task
• Invoked at any time
• Hard deadline
• Execution slot is needed
• Let ‘µ’ be the worst-case execution time and ‘d’ be the deadline
Execution slots are defined with
• Period: d − µ
• Deadline: d − µ
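The mapping from a sporadic task to its execution slot can be written down in a few lines. This is a minimal sketch, not the authors' code: the `ExecutionSlot` type and the `execution_slot` function are hypothetical names, and the check that the slot is long enough to hold the worst-case execution time is an added assumption rather than something stated on the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutionSlot:
    period: float    # d - mu
    deadline: float  # d - mu
    wcet: float      # mu, the time reserved inside each slot


def execution_slot(wcet, deadline):
    """Build the periodic execution slot that reserves time for a sporadic task."""
    gap = deadline - wcet
    if wcet <= 0 or gap < wcet:
        # Assumed sanity check: the slot must be able to hold the WCET.
        raise ValueError("deadline too tight for an execution slot")
    return ExecutionSlot(period=gap, deadline=gap, wcet=wcet)


# Sporadic task with d = 30 and a hypothetical mu = 10.
slot = execution_slot(wcet=10, deadline=30)
print(slot)
```

Once built, the slot can be scheduled together with the periodic task graphs in Step 1, because it has a period and a deadline like any periodic task.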
STEP 1: Priority Assignment
[Flowchart]
1. Arrange the task graphs & execution slots in increasing order of their period.
2. Remove the task graph with the smallest period.
3. Remove the node with no predecessor and least slack time.
4. Assign the node the next highest priority.
5. If all nodes in the graph are assigned priorities, go to 6; if not, go back to 3.
6. If the list is empty, END; if not, go back to 2.
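The priority-assignment flowchart can be sketched as follows. This is an illustrative reading, not the authors' implementation: the dictionary layout (`name`, `period`, `slack`, `edges`) is hypothetical, per-node slack values are taken as given inputs, and "no predecessor" is interpreted as "no predecessor that is still unassigned".

```python
def assign_priorities(graphs):
    """Return {(graph name, node): priority}, where 0 is the highest priority."""
    priorities = {}
    next_priority = 0
    # Task graphs and execution slots in increasing order of period.
    for graph in sorted(graphs, key=lambda g: g["period"]):
        remaining = dict(graph["slack"])  # node -> slack time
        preds = {node: set() for node in remaining}
        for src, dst in graph["edges"]:
            preds[dst].add(src)
        while remaining:
            # Nodes whose predecessors have all been assigned already.
            ready = [n for n in remaining if preds[n].isdisjoint(remaining)]
            if not ready:
                raise ValueError("precedence cycle in " + graph["name"])
            node = min(ready, key=lambda n: (remaining[n], n))  # least slack
            priorities[(graph["name"], node)] = next_priority
            next_priority += 1
            del remaining[node]
    return priorities


# Hypothetical input: one task graph and one execution slot.
graphs = [
    {"name": "G1", "period": 90,
     "slack": {"t1": 0, "t2": 10, "t3": 5, "t4": 0, "t5": 0},
     "edges": [("t1", "t2"), ("t1", "t3"), ("t2", "t4"),
               ("t3", "t4"), ("t4", "t5")]},
    {"name": "slot", "period": 20, "slack": {"s": 0}, "edges": []},
]
priorities = assign_priorities(graphs)
```

In this example the execution slot has the smaller period, so it receives the highest priority; within G1, t3 is ranked ahead of t2 because it has less slack, and every node is ranked after its predecessors.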
STEP 2: Task Speed Determination
[Flowchart]
1. Arrange tasks in decreasing order of priority.
2. For each task in the list, determine the speed at which the task and all high priority tasks in the list can be run.
3. Find the task with the largest speed, ‘s’.
4. Mark the speed for this task and all other high priority tasks as ‘s’.
5. Remove all these tasks from the list.
6. If the list is empty, END; if not, go back to 2.
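One way to read the speed-determination flowchart is sketched below. Several points are assumptions rather than the authors' method: every task has deadline ≤ period, the per-task speed is found with time-demand analysis at the usual scheduling points, speeds are normalized so that 1.0 is full speed, and tasks whose speed was fixed in an earlier round keep interfering at that fixed speed. The function names are hypothetical.

```python
import math


def min_speed(i, start, tasks, speeds):
    """Lowest common speed for tasks[start..i] at which tasks[i] meets its deadline.

    tasks: list of (period, wcet at full speed, deadline), highest priority first.
    tasks[:start] already run at the fixed speeds stored in `speeds`.
    """
    _, wcet, deadline = tasks[i]
    points = {deadline}
    for period, _, _ in tasks[:i]:
        points.update(m * period for m in range(1, int(deadline // period) + 1))
    best = math.inf
    for t in points:
        fixed = sum(math.ceil(t / tasks[k][0]) * tasks[k][1] / speeds[k]
                    for k in range(start))
        demand = wcet + sum(math.ceil(t / tasks[k][0]) * tasks[k][1]
                            for k in range(start, i))
        if t > fixed:
            best = min(best, demand / (t - fixed))
    return best


def assign_speeds(tasks):
    speeds = [None] * len(tasks)
    start = 0  # tasks[:start] have been removed from the list
    while start < len(tasks):
        best_s, best_i = -1.0, start
        for i in range(start, len(tasks)):
            s = min_speed(i, start, tasks, speeds)
            if s >= best_s:
                best_s, best_i = s, i
        if math.isinf(best_s):
            raise ValueError("task set cannot meet its deadlines")
        for j in range(start, best_i + 1):  # this task and all higher ones
            speeds[j] = best_s
        start = best_i + 1
    return speeds


# Hypothetical task set: (period, wcet, deadline), highest priority first.
speeds = assign_speeds([(10, 4, 5), (20, 2, 20)])
print(speeds)
```

For this input the first task needs a speed of about 0.8 because of its tight deadline, while the second can drop to about 0.2 once the first is fixed. A resulting speed above 1.0 would mean the task set is not schedulable even at full speed.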
Task Schedulability
Let ℑ = {T1, T2, …, TN} be the task set arranged in decreasing order of priorities.
Characteristics of Ti: {Pi, ei, Di}. A task set is feasible if the deadlines of all tasks are always met.
Critical Instant Theorem (Liu and Layland, 1973, “Scheduling algorithms for multiprogramming”): if a task meets its deadline whenever the task is requested simultaneously with all the high priority tasks, then the deadline will always be met for all task phasings.
Task Schedulability (Contd …)
In other words, the task set ℑ = {T1, T2, …, TN} is schedulable if and only if $t_i \le D_i \;\forall\, i = 1, \ldots, N$, where
$$\sum_{k=1}^{i-1} \left\lceil \frac{t_i}{P_k} \right\rceil e_k + e_i \le t_i \quad \text{if } D_i \le P_i \qquad (1)$$
otherwise $t_{i,j} \le D_{i,j} \;\forall\, i = 1, \ldots, N$ and all instances $j$ of $T_i$, where $t_{i,j} = R(t_{i,j} + (j-1)P_i) - (j-1)P_i$, and
$$R(t_{i,j}) = \sum_{k=1}^{i-1} \left\lceil \frac{t_{i,j}}{P_k} \right\rceil e_k + j \cdot e_i \qquad (2)$$
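For the case where each deadline does not exceed the period, the smallest t_i satisfying condition (1) can be found by the standard fixed-point iteration of response-time analysis. The sketch below is a minimal illustration with hypothetical names; it does not cover the multi-instance case of equation (2).

```python
import math


def response_time(i, tasks):
    """Worst-case response time of tasks[i], or None if it exceeds the deadline.

    tasks: list of (P, e, D) tuples, highest priority first, with D <= P.
    """
    _, wcet, deadline = tasks[i]
    higher = tasks[:i]
    t = wcet + sum(e for _, e, _ in higher)  # start from one job of each task
    while t <= deadline:
        demand = wcet + sum(math.ceil(t / p) * e for p, e, _ in higher)
        if demand == t:  # fixed point: demand fits exactly in [0, t]
            return t
        t = demand
    return None


def is_schedulable(tasks):
    return all(response_time(i, tasks) is not None for i in range(len(tasks)))


# Hypothetical task sets: (P, e, D), highest priority first.
good = [(10, 2, 10), (20, 4, 20), (40, 8, 40)]
bad = [(4, 2, 4), (6, 3, 6)]
print([response_time(i, good) for i in range(len(good))])
print(is_schedulable(bad))
```

The iteration starts at the critical instant, where all higher-priority tasks are released together with the task under analysis, which is exactly the situation the Critical Instant Theorem says must be checked.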
Step 3: Dynamic Power Management
During system operation, idle intervals arise when:
• Actual task execution time is less than the worst-case execution time (which is assumed at the time of fixed priority scheduling).
Since these idle intervals cannot be exploited by off-line methods, an on-line method that adapts the clock speed to take advantage of idle intervals is needed.
DPM (Contd ..)
Schedule the tasks according to their pre-determined speeds in a preemptive manner.
If the current task has finished and the queue of ready tasks is empty, then:
• Determine the length of the idle interval
• If feasible, put the processor in the power down mode.
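The run-time decision described above can be sketched as follows. The slide does not define "feasible", so this sketch assumes one plausible reading: powering down is worthwhile only if the idle interval is longer than both the wake-up latency and a break-even time. All names and numbers are hypothetical.

```python
def idle_interval(now, next_releases):
    """Length of the idle interval: time until the earliest upcoming release."""
    return min(next_releases) - now


def should_power_down(idle_length, wakeup_latency, break_even_time):
    """Assumed feasibility test for entering the power-down mode."""
    return idle_length > max(wakeup_latency, break_even_time)


# Hypothetical situation: the ready queue is empty at time 42.
now = 42.0
next_releases = [50.0, 90.0, 60.0]  # next release of each task or slot
idle = idle_interval(now, next_releases)
decision = should_power_down(idle, wakeup_latency=1.5, break_even_time=5.0)
print(idle, decision)
```

Here the processor would sleep for the 8 time units until the next release. Because sporadic tasks are served through periodic execution slots, their next slot release can be included in `next_releases` like any periodic task.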
Experimental Setup
• Event driven simulator
• Intel StrongARM SA-1100 embedded processor specifications
• Real-world test cases (CNC controller, INS, avionics, …)
Benchmarks
Characteristics of various test cases:
Test cases      # Periodic task graphs   # Sporadic tasks   # Tasks with D > P   Utilization
Synthetic I     3                        1                  2                    0.52
Synthetic II    5                        3                  4                    0.61
Synthetic III   10                       5                  8                    0.737
CNC [1]         8                        ---                ---                  0.488
INS [2]         6                        ---                ---                  0.72
Avionics [3]    14                       1                  ---                  0.692
[Figure: Comparison of % energy savings with various low power techniques. Bars for CNC and INS under VLPS [5] and the proposed technique; x-axis: various low power techniques; y-axis: % energy savings (0 to 100).]
[Figure: % energy savings with the proposed technique on various test cases. x-axis: various test cases (Synthetic I, Synthetic II, Synthetic III, CNC, INS, Avionics); y-axis: % energy savings (0% to 100%).]
Energy Efficient Scheduling Techniques for Multi-Processor Embedded Systems
Overview
• Preliminaries
• System model
• Slack distribution heuristic
• Periodical determination of service rate
• Experiments & Results
Preliminaries
Command and control systems that consist of hard real-time applications in a distributed environment.
An application consists of:
• Chain(s) of tasks
• Hard deadlines
• Exchange of messages during execution
Admitting task set (connection establishment): key issues
• Traffic descriptor [6]
• Worst-case delay analysis
Power reduction approaches
• Slack distribution
• Clock speed adaptation during system run-time
System Model
A task set is described by a vector triplet $(P_i, C_i, D_i)$, where
$P_i \equiv (P_i^1, \ldots, P_i^n)$
$C_i \equiv (C_i^1, \ldots, C_i^n)$
$D_i \equiv (D_i^1, \ldots, D_i^n)$
[Figure: A distributed system with 3 nodes & 2 task sets. Processing elements PE1, PE2 and PE3; task sets M1 and M2.]
Connection Establishment
Task set admission: key phases
• Setting up task set
• Reply task set
Setting up task set: key issues
• Local worst-case delay < local deadline
• End-to-end worst-case delay < end-to-end deadline
Reply task set: key issues
• Slack distribution
• Service rate < 1 (periodic service rate determination)
Observations
Processing of messages at a node can be extended up to their delay bounds.
This slack can be utilized to increase the worst-case delay tolerable at the computational nodes involved in processing the task set.
The actual processing time demanded by the messages of a task set during the run-time varies and is less than the worst-case specification.
A technique to adapt the clock speed periodically is introduced to take advantage of run-time variations
Slack Distribution
The slack in a task set is the difference between the end-to-end deadline and the sum of the worst-case delays suffered at each node.
This slack can be distributed among the nodes serving the task set to reduce the system energy consumption.
The slack is distributed among the nodes according to the service rate of the nodes.
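A minimal sketch of the slack computation and its distribution follows. The slide only says the slack is distributed "according to the service rate of the nodes"; the sketch assumes this means in proportion to each node's service rate. The function name and the numbers are hypothetical.

```python
def distribute_slack(end_to_end_deadline, worst_case_delays, service_rates):
    """Return the enlarged per-node delay bounds after sharing out the slack."""
    slack = end_to_end_deadline - sum(worst_case_delays)
    if slack < 0:
        raise ValueError("worst-case delays already exceed the deadline")
    total_rate = sum(service_rates)
    # Assumed policy: each node's share is proportional to its service rate.
    return [delay + slack * rate / total_rate
            for delay, rate in zip(worst_case_delays, service_rates)]


# Hypothetical task set crossing three nodes.
bounds = distribute_slack(
    end_to_end_deadline=100,
    worst_case_delays=[20, 30, 10],
    service_rates=[0.5, 0.3, 0.2],
)
print(bounds)
```

With a slack of 40, the three nodes receive roughly 20, 12 and 8 extra time units, so the new local bounds still sum to the end-to-end deadline. A looser local bound lets a node run at a lower service rate, which is where the energy saving comes from.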
Service Rate Determination
Key issues:
• Monitoring the traffic pattern
• Feedback incorporation while determining service rate
• Periodical service rate determination
  • guarantees processing of messages of outstanding intervals by their delay bounds
  • guarantees processing of messages of upcoming interval by their delay bounds
• Scheduling policies considered: FCFS & WRR
FCFS Scheduling Policy
The new service rate at the beginning of every interval is determined according to
$$S_t = \sum_{j=1}^{k} S_t^j + \frac{\Gamma_t(\Delta)}{d_{FCFS}}$$
where $k = \left(t - \max\{t_s,\; t - (n-1)\Delta\}\right) / \Delta$,
and the corresponding queue is determined according to
$$Q_t = \sum_{j=1}^{k} Q_t^j$$
The service rate $S_t^j$ must be such that it processes the outstanding messages that arrived during the interval $(t - j\Delta,\; t - (j-1)\Delta)$ by their remaining delay bound, i.e., $(d_{FCFS} - j\Delta)$:
$$S_t^j \cdot (d_{FCFS} - j\Delta) \ge Q_t^j$$
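The per-interval FCFS computation can be sketched as below. It follows the stated requirement that messages from the interval (t − jΔ, t − (j−1)Δ) be processed within their remaining delay bound d_FCFS − jΔ, taking the smallest rate that satisfies it. The extra term for the upcoming interval (its expected demand spread over the full delay bound) is an assumption about the garbled slide formula, and all names are hypothetical.

```python
def fcfs_service_rate(outstanding, delay_bound, interval, upcoming_demand):
    """New service rate at the start of an interval.

    outstanding[j-1] is the processing time still owed to messages that
    arrived during (t - j*interval, t - (j-1)*interval).
    """
    # Assumed term: demand of the upcoming interval over the full delay bound.
    rate = upcoming_demand / delay_bound
    for j, queued in enumerate(outstanding, start=1):
        remaining = delay_bound - j * interval
        if remaining <= 0:
            if queued > 0:
                raise ValueError("outstanding messages are past their delay bound")
            continue
        rate += queued / remaining  # smallest rate that clears this backlog in time
    return rate


# Hypothetical node state: delay bound 10, monitoring interval 2.
rate = fcfs_service_rate(
    outstanding=[1.0, 0.8],
    delay_bound=10.0,
    interval=2.0,
    upcoming_demand=2.0,
)
print(round(rate, 4))
```

A result above 1 would mean the node cannot honor the delay bound even at full speed, which matches the admission requirement that the service rate stay below 1.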
WRR Scheduling Policy
The new service rate at the beginning of every interval is determined according to
$$S_t^i = \sum_{j=1}^{k} S_t^{i,j} + \frac{\Gamma_t^i(\Delta)}{\Xi^i(d^i)}$$
and the corresponding queue is determined according to
$$Q_t^i = \sum_{j=1}^{k} Q_t^{i,j}$$
The service rate $S_t^{i,j}$ and the corresponding processing time demanded by the outstanding messages that arrived during the interval $(t - j\Delta,\; t - (j-1)\Delta)$ are related by
$$S_t^{i,j} \cdot (d^i - j\Delta) \ge Q_t^{i,j}$$
Experimental Setup
• Event driven simulator
• Socket interface for communication
• Intel PXA250 XScale embedded processor
• Real-life test cases (DSP, Multimedia, ..)
Benchmarks
Characteristics of various test cases:
Test cases      Number of nodes   Number of connections   Number of modes
Synthetic I     3                 10                      2
Synthetic II    5                 20                      2
Synthetic III   10                30                      3
Multimedia      4                 4                       3
DSP [4]         16                31                      1
Benchmarks (Contd …)
Mode configurations for Multimedia and Synthetic test cases:
Test cases          Mode 1 (nodes, connections)   Mode 2 (nodes, connections)   Mode 3 (nodes, connections)
Synthetic (10,30)   (9,20)                        (9,25)                        (10,30)
Multimedia (4,4)    (3,2)                         (3,3)                         (4,4)
Energy Saving versus Slack Distribution
[Figure: two panels, FCFS and WRR. x-axis: slack distribution schemes (Srate, Equal, Wcet, Greedy); y-axis: system energy savings % (0 to 50 for FCFS, 0 to 80 for WRR); series: (3,10), (5,20), (10,30), MM(4,4), DSP(16,31).]
Energy Saving at Different Modes
[Figure: x-axis: normalised peak power (Synthetic 1, Synthetic 0.8, Multimedia 1, Multimedia 0.8); y-axis: system energy savings % (0 to 50); series: Mode 1, Mode 2, Mode 3.]
Service Rate at Intervals
[Figure: (10,30) at one node. x-axis: intervals (1 to 12); y-axis: normalised service rate (0 to 1.2); series: 0.4, 0.6, 0.8, 0.9, 1.]
Service Rate vs MI
[Figure: (3,10) at one node. x-axis: MI (Monitoring Interval), 1 to 5; y-axis: normalised service rate (0.35 to 0.6); series: 1, 0.9, 0.8.]
Overhead Due to Number of Task Sets on Service
[Figure: (10,30) at one node. x-axis: number of connections (1 to 30); y-axis: overhead in µsecs (100 to 300).]
Summary
Energy efficient scheduling technique for single processor that:
• handles sporadic and periodic task graphs with precedence constraints
• takes into account tasks with arbitrary response times
• determines minimum speed for each task
• adapts clock speed to take advantage of idle intervals
A connection based task execution approach for distributed embedded systems that:
• effectively distributes the slack available in the connection to reduce system wide power consumption
• periodically adjusts the clock speed to take advantage of run-time variations
Experimental results indicate that the proposed techniques yield significant energy savings.
References
1. N. Kim, M. Ryu, S. Hong, M. Saksena, C. Choi, and H. Shin, “Visual assessment of a real time system design: A case study on a CNC controller,” in Proc. IEEE Real-Time Systems Symposium, December 1996.
2. A. Burns, K. Tindell, and A. Wellings, “Effective analysis for engineering real-time fixed priority schedulers,” IEEE Trans. on Software Eng., vol. 21, no. 5, pp. 475–480, May 1995.
3. C. Locke, D. Vogel, and T. Mesler, “Building a predictable avionics platform in Ada: A case study,” in Proc. IEEE Real-Time Systems Symposium, December 1991.
4. C. M. Woodside and G. G. Monforton, “Fast allocation of processes in distributed and parallel systems,” IEEE Trans. Parallel & Distr. Systems, vol. 4, no. 2, pp. 164–174, Feb. 1993.
References (Contd ..)
5. G. Quan and X. Hu, “Energy efficient fixed priority scheduling for real-time systems on variable voltage processors,” in Proc. Design Automation Conference, June 2001.
6. A. Raha, N. Malcom, and W. Zhao, “Guaranteeing end-to-end deadlines in ATM networks,” in Proc. International Conference on Distributed Computing Systems, May 1995.
THANK YOU