A Testbed for Study of Thermal and Energy Dynamics in Server Clusters Shen Li, Fan Yang, Tarek Abdelzaher University of Illinois at Urbana Champaign


Page 1: A Testbed for Study of Thermal and Energy Dynamics in Server Clusters

A Testbed for Study of Thermal and Energy Dynamics in Server Clusters

Shen Li, Fan Yang, Tarek Abdelzaher

University of Illinois at Urbana Champaign

Page 2: A Testbed for Study of Thermal and Energy Dynamics in Server Clusters

The Goal

• Create an open, shared facility for experimentation as a vehicle to promote energy management research and collaboration

• Focus:

  – Investigate energy consumption on back-end servers

  – Investigate server resource allocation policies that minimize total energy consumption while meeting client performance demands

Page 3: A Testbed for Study of Thermal and Energy Dynamics in Server Clusters

Testbed Configuration

1. Cluster scale: 67 machines in total; the 40 most powerful machines are currently public.

2. OS: CentOS 5, 64-bit (will move to Scientific Linux 6 soon).

3. CPU: Intel Xeon X3430, 4 cores, 64-bit (frequency range: 1.2 GHz to 2.4 GHz in 10 levels).

4. MEM: 4 GB

5. Disk: 50 GB home directory shared by all users over NFS across all servers; 200 GB local disk on each machine using ext3.

6. PDU: Avocent PM 3000.

7. CRAC: Liebert Challenger 3000.

8. Network: the 40 machines are fully connected through a single switch.

9. Thermal sensor: New PC Notebook USB Thermometer

Page 4: A Testbed for Study of Thermal and Energy Dynamics in Server Clusters

Current Capabilities

Component   Knob         Sensor
CPU         Frequency    Utilization, Frequency, Temperature
MEM         -            Utilization
NIC         -            Received / Sent packets/bytes
PDU         -            Power consumption of each individual machine
CRAC        Set point*   Intake and outlet temperature, Set point

* The Computer Room Air Conditioner (CRAC) is critical to the facility, so the API for changing the CRAC set point is not public.

Page 5: A Testbed for Study of Thermal and Energy Dynamics in Server Clusters

Available Software

Software       Binary File Location          Software Home
MySQL          /home/tarek/dcapi/software    /scratch/shenli3/software on tarekc01~tarekc07
Tomcat         /home/tarek/dcapi/software    /scratch/shenli3/software on tarekc19~tarekc28
MemCache       /home/tarek/dcapi/software    /scratch/shenli3/software on tarekc08~tarekc18
Apache-httpd   /home/tarek/dcapi/software

• If you would like to minimize interference from other users, you can copy the binary file and install the software yourself elsewhere under /scratch. Files under /scratch persist.

• To install new software, use “wget” on the tareka machines to download the binary or source files. Because you do not have sudo authority, you cannot use “sudo yum install”.

Page 6: A Testbed for Study of Thermal and Energy Dynamics in Server Clusters

If you have any suggestions about adding new knobs and sensors, please let us know!

Page 7: A Testbed for Study of Thermal and Energy Dynamics in Server Clusters

User Guide

Step 1: Install and configure ccnx (http://www.ccnx.org/) according to its tutorial, and make sure that your ccnd can talk to our cluster hub, ndn.cs.illinois.edu.

Step 2: Download dcapi from https://bitbucket.org/shenli/dcapi.

Step 3: Set up dcapi according to the readme file.

Page 8: A Testbed for Study of Thermal and Energy Dynamics in Server Clusters

User Guide

Step 4: Use the client GUI to check existing reservations and find an available slot for your reservation.

Note: We have just released the first prototype of DCAPI and do not have any remote users yet. The reservation data displayed in the GUI is randomly generated for testing.

Page 9: A Testbed for Study of Thermal and Energy Dynamics in Server Clusters

User Guide

Step 5: After launching the GUI, click the add button and fill in the short form to add your reservation. You can pick any user ID that matches the regular expression ([a-z]|[0-9]){6,16}, that is, 6 to 16 lowercase letters or digits. It will be your CentOS user ID during your reservation.

Err code   Reason
3          Invalid reservation time slot
4          Invalid reservation start time
5          Invalid username; the regular expression for user names is ([a-z]|[0-9]){6,16}
7          Invalid email address
8          Conflicts against existing reservation
9          Conflicts against existing username
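
A user ID can be checked locally before submitting the form. The sketch below is illustrative only and is not part of dcapi: it assumes the pattern must match the whole ID, and the names `is_valid_username` and `describe_error` are hypothetical helpers. The error codes are copied from the table above.

```python
import re

# Pattern quoted in the reservation instructions.
USERNAME_RE = re.compile(r"([a-z]|[0-9]){6,16}")

# Error codes and reasons copied from the table above.
RESERVATION_ERRORS = {
    3: "Invalid reservation time slot",
    4: "Invalid reservation start time",
    5: "Invalid username",
    7: "Invalid email address",
    8: "Conflicts against existing reservation",
    9: "Conflicts against existing username",
}


def is_valid_username(user_id: str) -> bool:
    """Return True if the entire ID matches the pattern (assumed anchoring)."""
    return USERNAME_RE.fullmatch(user_id) is not None


def describe_error(code: int) -> str:
    """Map a reservation error code to its reason (hypothetical helper)."""
    return RESERVATION_ERRORS.get(code, f"Unknown error code {code}")
```

For example, `is_valid_username("shenli3")` is true, while an ID that is shorter than 6 characters or contains an uppercase letter is rejected.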

Page 10: A Testbed for Study of Thermal and Energy Dynamics in Server Clusters

User Guide

Step 6: Check your email to get your login password.

Page 11: A Testbed for Study of Thermal and Energy Dynamics in Server Clusters

User Guide

Step 7: Use ssh to log in to the gateway node tareka01.cs.uiuc.edu with your username and password. Note that the username and password are active only during your reservation period. For security reasons, remote users do not have sudo access. If you need an operation that cannot be accomplished without sudo access, please let us know. We will add a wrapper to dcapi for it if possible.

Page 12: A Testbed for Study of Thermal and Energy Dynamics in Server Clusters

User Guide

Step 8: Use dcapi like any ordinary Linux command (e.g., type “dcapi get_cpu_temp”).

Command                 Functionality
get_cpu_temp            get CPU temperature in Celsius
get_cpu_util            get CPU utilization during the last second
get_cpu_freq            get CPU frequency
set_cpu_freq newFreq    set CPU frequency to newFreq
get_mem_util            get memory utilization
get_mem_CS              get the number of context switches during the last second
get_mem_IN              get the number of interrupts during the last second
get_self_power          get the power consumption of the current node
get_all_power           get the power consumption of each node in the cluster
get_crac_in_temp        get CRAC intake temperature
get_crac_out_temp       get CRAC outlet temperature
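
The commands above can also be driven from a script. The following is a minimal sketch, assuming `dcapi` is on the PATH of a cluster node. The slides do not specify the output format or the unit of newFreq, so the wrapper returns the raw text and passes arguments through unchanged. `build_command` and `run_dcapi` are hypothetical names, not part of dcapi.

```python
import subprocess

# Command names from the table above, mapped to the number of arguments each takes.
DCAPI_COMMANDS = {
    "get_cpu_temp": 0,
    "get_cpu_util": 0,
    "get_cpu_freq": 0,
    "set_cpu_freq": 1,
    "get_mem_util": 0,
    "get_mem_CS": 0,
    "get_mem_IN": 0,
    "get_self_power": 0,
    "get_all_power": 0,
    "get_crac_in_temp": 0,
    "get_crac_out_temp": 0,
}


def build_command(name, *args):
    """Build the argv list for a dcapi call, rejecting unknown commands."""
    if name not in DCAPI_COMMANDS:
        raise ValueError(f"unknown dcapi command: {name}")
    if len(args) != DCAPI_COMMANDS[name]:
        raise ValueError(f"{name} takes {DCAPI_COMMANDS[name]} argument(s)")
    return ["dcapi", name, *(str(a) for a in args)]


def run_dcapi(name, *args, timeout=10.0):
    """Run dcapi and return its stdout as text (output format is not assumed)."""
    result = subprocess.run(
        build_command(name, *args),
        capture_output=True,
        text=True,
        check=True,
        timeout=timeout,
    )
    return result.stdout.strip()
```

On a cluster node, `run_dcapi("get_cpu_temp")` would return whatever `dcapi get_cpu_temp` prints. Parsing that text into a number is left to the caller, since the format is not documented here.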

Page 13: A Testbed for Study of Thermal and Energy Dynamics in Server Clusters

API Design

To prevent excessive access to our sensors, dcapi uses master/worker daemons to wrap the real sensor APIs. Data is pulled at a fixed time interval and cached at the daemon.

Global resources (PDU and CRAC) are cached at the master daemon. Each local worker daemon uses RPC to read this data from the master daemon.
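
The polling-and-caching idea can be sketched as follows. This is not the dcapi implementation: the class name `CachingDaemon`, the `read_sensor` callback, and the interval value are all assumptions for illustration, and the RPC path between worker and master is omitted. The point is that client reads only touch the cache, so the real sensor is accessed once per interval no matter how many clients ask.

```python
import threading
from typing import Callable, Optional


class CachingDaemon:
    """Polls a sensor at a fixed interval and serves reads from the cache."""

    def __init__(self, read_sensor: Callable[[], float], interval_s: float) -> None:
        self._read_sensor = read_sensor      # the real (expensive) sensor API
        self._interval_s = interval_s        # fixed polling interval in seconds
        self._lock = threading.Lock()
        self._cached: Optional[float] = None
        self._stop = threading.Event()
        self._thread: Optional[threading.Thread] = None

    def poll_once(self) -> None:
        """Read the real sensor once and update the cache."""
        value = self._read_sensor()
        with self._lock:
            self._cached = value

    def get(self) -> Optional[float]:
        """Return the cached value without touching the sensor."""
        with self._lock:
            return self._cached

    def _run(self) -> None:
        # Poll, then sleep for the interval or until stop() is called.
        while not self._stop.is_set():
            self.poll_once()
            self._stop.wait(self._interval_s)

    def start(self) -> None:
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
```

A worker daemon would call `start()` once and answer every client request with `get()`. A shorter interval gives fresher data at the cost of more sensor accesses.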

Page 14: A Testbed for Study of Thermal and Energy Dynamics in Server Clusters

Reservation System Design

Page 15: A Testbed for Study of Thermal and Energy Dynamics in Server Clusters

For More Information

• Please visit: http://green-datacenters.web.cs.illinois.edu

• More questions? Please contact: Tarek Abdelzaher

[email protected]