A Testbed for Study of Thermal and Energy
Dynamics in Server Clusters
Shen Li, Fan Yang, Tarek Abdelzaher
University of Illinois at Urbana-Champaign
The Goal
• Create an open shared facility for experimentation as a vehicle to promote energy management research and collaboration
• Focus:
– Investigate energy consumption on back-end servers
– Investigate server resource allocation policies that minimize total energy consumption while meeting client performance demands
Testbed Configuration
1. Cluster scale: 67 machines in total; the 40 most powerful are currently public.
2. OS: CentOS 5, 64-bit (will move to Scientific Linux 6 soon).
3. CPU: Intel Xeon X3430, 4 cores, 64-bit (frequency range: 1.2 GHz to 2.4 GHz, in 10 levels).
4. MEM: 4 GB
5. Disk: 50 GB home directory shared by all users over NFS across all servers; 200 GB local disk on each machine, formatted as ext3.
6. PDU: Avocent PM 3000.
7. CRAC: Liebert Challenger 3000.
8. Network: all 40 machines are connected to one switch, forming a logical complete graph.
9. Thermal sensor: New PC Notebook USB Thermometer
Current Capabilities
Component  Knob        Sensor
CPU        Frequency   Utilization, frequency, temperature
MEM        -           Utilization
NIC        -           Received / sent packets and bytes
PDU        -           Power consumption of each individual machine
CRAC       Set point*  Intake and outlet temperature, set point
* The Computer Room Air Conditioner (CRAC) is critical to our facility, so the API for changing the CRAC set point is not public.
Available Software
Software      Binary File Location        Software Home
MySQL         /home/tarek/dcapi/software  /scratch/shenli3/software on tarekc01~tarekc07
Tomcat        /home/tarek/dcapi/software  /scratch/shenli3/software on tarekc19~tarekc28
MemCache      /home/tarek/dcapi/software  /scratch/shenli3/software on tarekc08~tarekc18
Apache-httpd  /home/tarek/dcapi/software
• If you would like to minimize interference from other users, you can copy the binary file and install the software yourself elsewhere under /scratch. Files under /scratch persist.
• To install new software, use “wget” on the tareka machines to download the binary or source files. Because you do not have sudo access, you cannot use “sudo yum install”.
If you have any suggestions about adding new knobs and sensors, please
let us know!
User Guide
Step 1: Install and configure ccnx (http://www.ccnx.org/) according to its tutorial, and make sure that your ccnd can talk to our cluster hub, ndn.cs.illinois.edu.
Step 2: Download dcapi from https://bitbucket.org/shenli/dcapi.
Step 3: Set up dcapi according to its README file.
Step 4: Use the client GUI to check existing reservations and find an available slot for your reservation.
Note: We have just released the first prototype of DCAPI and do not have any remote users yet. The reservation data shown here is randomly generated for testing.
Step 5: After launching the GUI, click the Add button and fill in the short form to add your reservation. You can pick any user ID that matches the regular expression ([a-z]|[0-9]){6,16}. It will be your CentOS user ID during your reservation.
Error code  Reason
3           Invalid reservation time slot
4           Invalid reservation start time
5           Invalid username; it must match the regular expression ([a-z]|[0-9]){6,16}
7           Invalid email address
8           Conflicts with an existing reservation
9           Conflicts with an existing username
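A reservation request can be sanity-checked on the client side before it is submitted. The sketch below is only an illustration: the regular expression and the error codes 5 and 8 come from this guide, while the function names, the integer (start, end) representation of a time slot, the half-open interval convention, and the order of the checks are all assumptions, not the actual DCAPI implementation.

```python
import re
from typing import List, Optional, Tuple

# Pattern quoted from the guide: 6 to 16 lowercase letters or digits.
USER_ID_PATTERN = re.compile(r"([a-z]|[0-9]){6,16}")

# Error codes taken from the table above.
ERR_INVALID_USERNAME = 5
ERR_RESERVATION_CONFLICT = 8

# Hypothetical representation: a slot is (start, end) with end exclusive.
Slot = Tuple[int, int]


def is_valid_user_id(user_id: str) -> bool:
    """Return True if the whole string matches the documented pattern."""
    return USER_ID_PATTERN.fullmatch(user_id) is not None


def overlaps(a: Slot, b: Slot) -> bool:
    """Two half-open intervals overlap if each starts before the other ends."""
    return a[0] < b[1] and b[0] < a[1]


def check_request(user_id: str, slot: Slot, existing: List[Slot]) -> Optional[int]:
    """Return an error code from the table, or None if no problem was found."""
    if not is_valid_user_id(user_id):
        return ERR_INVALID_USERNAME
    if any(overlaps(slot, other) for other in existing):
        return ERR_RESERVATION_CONFLICT
    return None
```

For example, check_request("abc", (0, 10), []) reports code 5 because the user ID is shorter than six characters. The server remains the authority on whether a reservation is accepted.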
Step 6: Check your email for your login password.
Step 7: Use ssh to log in to the gateway node tareka01.cs.uiuc.edu with your username and password. Note that the username and password are only active during your reservation period. For security reasons, remote users do not have sudo access. If you need an operation that cannot be done without sudo access, please let us know. We will add a wrapper to dcapi for it if possible.
Step 8: Use dcapi like any ordinary Linux command (e.g., type “dcapi get_cpu_temp”).
Command                Functionality
get_cpu_temp           get the CPU temperature in Celsius
get_cpu_util           get the CPU utilization during the last second
get_cpu_freq           get the CPU frequency
set_cpu_freq newFreq   set the CPU frequency to newFreq
get_mem_util           get the memory utilization
get_mem_CS             get the number of context switches during the last second
get_mem_IN             get the number of interrupts during the last second
get_self_power         get the power consumption of the current node
get_all_power          get the power consumption of each node in the cluster
get_crac_in_temp       get the CRAC intake temperature
get_crac_out_temp      get the CRAC outlet temperature
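Because dcapi behaves like an ordinary command-line tool, experiment scripts can call it as a subprocess. The following is a minimal sketch, assuming dcapi is on the PATH of the node where the script runs. The helper names build_dcapi_argv and run_dcapi are hypothetical, and since the output format of each command is not described here, the sketch returns the raw text rather than parsing it.

```python
import subprocess
from typing import List

# Command names copied from the table above.
DCAPI_COMMANDS = {
    "get_cpu_temp", "get_cpu_util", "get_cpu_freq", "set_cpu_freq",
    "get_mem_util", "get_mem_CS", "get_mem_IN",
    "get_self_power", "get_all_power",
    "get_crac_in_temp", "get_crac_out_temp",
}


def build_dcapi_argv(command: str, *args: str) -> List[str]:
    """Build the argument list for one dcapi call, rejecting unknown commands."""
    if command not in DCAPI_COMMANDS:
        raise ValueError(f"unknown dcapi command: {command}")
    return ["dcapi", command, *args]


def run_dcapi(command: str, *args: str) -> str:
    """Run dcapi and return its standard output as text.

    Assumption: dcapi writes its result to stdout and exits with a
    non-zero status on failure (check=True then raises CalledProcessError).
    """
    result = subprocess.run(
        build_dcapi_argv(command, *args),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

On a cluster node, run_dcapi("get_cpu_temp") would correspond to typing “dcapi get_cpu_temp” in the shell.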
API Design
To prevent excessive access to our sensors, dcapi uses master/worker daemons to wrap the real sensor APIs. Data is pulled at a fixed time interval and cached at the daemon.
Global resources (PDU and CRAC) are cached at the master daemon. The local worker daemon on each node uses RPC to read this data from the master daemon.
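The pull-and-cache idea can be illustrated with a small sketch. This is not the dcapi daemon code: the class name CachedSensor, its methods, and the injected clock are hypothetical, and the RPC layer between worker and master daemons is left out. It only shows how client reads can be served from a cached value while the real sensor is touched at most once per interval.

```python
import time
from typing import Callable, Optional


class CachedSensor:
    """Caches one sensor reading so client requests never hit the sensor."""

    def __init__(
        self,
        read_sensor: Callable[[], float],
        interval_s: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._read_sensor = read_sensor   # the real (expensive) sensor call
        self._interval_s = interval_s     # fixed pull interval in seconds
        self._clock = clock               # injectable for testing
        self._value: Optional[float] = None
        self._last_pull: Optional[float] = None

    def poll(self) -> None:
        """Called from the daemon loop; pulls only when the interval has elapsed."""
        now = self._clock()
        if self._last_pull is None or now - self._last_pull >= self._interval_s:
            self._value = self._read_sensor()
            self._last_pull = now

    def get(self) -> Optional[float]:
        """Serve a client request from the cache (None before the first pull)."""
        return self._value
```

A daemon would call poll() in its main loop and answer every client query with get(), so the number of sensor accesses depends on the interval rather than on the number of clients.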
Reservation System Design
For More Information
• Please visit:
– http://green-datacenters.web.cs.illinois.edu
• More questions?
– Please contact: Tarek Abdelzaher