Troubleshooting CloudStack Rajesh Battala, Likitha Shetty & Sailaja Mada Wednesday, December 18, 2013



Agenda

ACS Developer

– ACS Error codes

– Debugging tips in ACS development

– SSVM troubleshooting

– ACS ports

ACS Cloud Admin

– Install, Configuration & Deployment

– Log analysis

– Important Global Config Parameters

– Best Practices

– Cloud Database

– Reusing Hypervisors

References

Q & A


ACS Developer


ACS error codes

- Client error codes

- public static final int MALFORMED_PARAMETER_ERROR = 430;

- public static final int PARAM_ERROR = 431;

- public static final int UNSUPPORTED_ACTION_ERROR = 432;

- public static final int PAGE_LIMIT_EXCEED = 433;

- Server error codes

- public static final int INTERNAL_ERROR = 530;

- public static final int ACCOUNT_ERROR = 531;

- public static final int ACCOUNT_RESOURCE_LIMIT_ERROR = 532;

- public static final int INSUFFICIENT_CAPACITY_ERROR = 533;

- public static final int RESOURCE_UNAVAILABLE_ERROR = 534;

- public static final int RESOURCE_ALLOCATION_ERROR = 535;

- public static final int RESOURCE_IN_USE_ERROR = 536;

- public static final int NETWORK_RULE_CONFLICT_ERROR = 537;
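
When an API call fails, it can help to translate the numeric error code back into its constant name and to see at a glance whether it is a client-side (4xx) or server-side (5xx) error. The following is a minimal sketch of such a lookup; the `describe_error` helper is hypothetical and not part of CloudStack, and the code values are the ones listed above.

```python
# Error codes as listed on this slide. RESOURCE_ALLOCATION_ERROR is 535 in
# CloudStack's ApiErrorCode; each code must map to exactly one name.
ERROR_CODES = {
    430: "MALFORMED_PARAMETER_ERROR",
    431: "PARAM_ERROR",
    432: "UNSUPPORTED_ACTION_ERROR",
    433: "PAGE_LIMIT_EXCEED",
    530: "INTERNAL_ERROR",
    531: "ACCOUNT_ERROR",
    532: "ACCOUNT_RESOURCE_LIMIT_ERROR",
    533: "INSUFFICIENT_CAPACITY_ERROR",
    534: "RESOURCE_UNAVAILABLE_ERROR",
    535: "RESOURCE_ALLOCATION_ERROR",
    536: "RESOURCE_IN_USE_ERROR",
    537: "NETWORK_RULE_CONFLICT_ERROR",
}


def describe_error(code):
    """Hypothetical helper: return 'code NAME (client|server|other error)'."""
    name = ERROR_CODES.get(code, "UNKNOWN")
    if 400 <= code < 500:
        side = "client"
    elif 500 <= code < 600:
        side = "server"
    else:
        side = "other"
    return f"{code} {name} ({side} error)"


if __name__ == "__main__":
    print(describe_error(533))
```

A 4xx code usually points at the request (parameters, paging), while a 5xx code points at the management server or the resources it manages.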


Debugging tips in ACS development

- Generally, use Eclipse to attach a debugger to the management server

- SystemVM agents

- kill the running process

- add -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8787 to /usr/local/cloud/systemvm/_run.sh

- open port 8787

- start the java process - ./run.sh

- Usage

- To check whether events are being logged, check usage_events in the cloud DB

- To start the usage server in a dev setup:

mvn -pl usage -Drun -Dpid=$$


SSVM troubleshooting

- Login

- ssh -i /root/.ssh/id_rsa.cloud -p 3922 root@<ip>, where <ip> is the link-local IP on XenServer and the private IP on VMware

- Script to check the health of SSVM

- /usr/local/cloud/systemvm/ssvm-check.sh

- Check if port 8250 is open

- Check that the global configuration value 'host' is correctly set to the management server IP

- Check agent status – service cloud status

- Logs can be found at

- /var/log/cloud/cloud.log

- Template status can be found in template_store_ref DB table
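
The port check above can also be scripted. The sketch below is a generic TCP reachability test, not a CloudStack tool; the `port_is_open` name and the 3-second default timeout are assumptions made for illustration. Run it from the SSVM against the management server IP and port 8250.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


if __name__ == "__main__":
    # Placeholder address; replace with the value of the 'host' global setting.
    print(port_is_open("127.0.0.1", 8250))
```

A False result only says the TCP connection failed; it does not distinguish a firewall rule from a management server that is not running.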


And a couple more …

- DB encryption: to decrypt the database secret key, use the following command

java -classpath /usr/share/java/cloud-jasypt-1.8.jar org.jasypt.intf.cli.JasyptPBEStringDecryptionCLI decrypt.sh input=<encryptedValue> password=<secretKey> verbose=false

(where secretKey is the value in /etc/cloudstack/management/key file)

- GUI timeout - Default timeout is 15 minutes

- To increase the timeout edit /usr/share/cloud/management/webapps/client/WEB-INF/web.xml to add

<session-config> <session-timeout>60</session-timeout> </session-config>

- Restart the server
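
The web.xml change can be made by hand, but a small script avoids typos when several management servers need the same setting. This is a minimal sketch that works on the XML text only; `set_session_timeout` is a hypothetical helper, and the sample document in the usage lines is invented for illustration.

```python
import xml.etree.ElementTree as ET


def set_session_timeout(web_xml, minutes):
    """Return web_xml with <session-config><session-timeout> set to minutes."""
    root = ET.fromstring(web_xml)
    # web.xml may declare a default namespace; reuse it for lookups and new nodes.
    ns = root.tag[: root.tag.index("}") + 1] if root.tag.startswith("{") else ""
    config = root.find(f"{ns}session-config")
    if config is None:
        config = ET.SubElement(root, f"{ns}session-config")
    timeout = config.find(f"{ns}session-timeout")
    if timeout is None:
        timeout = ET.SubElement(config, f"{ns}session-timeout")
    timeout.text = str(minutes)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = "<web-app><display-name>client</display-name></web-app>"
    print(set_session_timeout(sample, 60))
```

ElementTree drops XML comments and may rewrite namespace prefixes, so keep a backup of the original file and review the result before restarting the server.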


ACS Ports

- Management Server

- 8080: Primary GUI / Authenticated API port

- 8096: User/Client Management Server (unauthenticated)

- 8787: CloudStack (Tomcat) debug socket

- 9090: CloudStack Management Cluster Interface

- SystemVM Agent

- 3922: SystemVM to Management (secure)

- 8250: SystemVM to Management (unsecure)

- MySQL Server

- 3306: MySQL Server

- Hypervisor

- 22/443: XenServer

- 22: KVM

- 443: vCenter

- 7080: AWS API server
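
When writing firewall rules or reading a packet capture, the port list is easier to use as data. The sketch below encodes the ports from this slide and offers two lookups; the structure and the function names `who_uses` and `ports_for` are assumptions for illustration, and the AWS API server is treated as its own component here.

```python
# (component, port, purpose) tuples taken from the slide above.
PORTS = [
    ("Management Server", 8080, "primary GUI / API"),
    ("Management Server", 8096, "unauthenticated API"),
    ("Management Server", 8787, "Tomcat debug socket"),
    ("Management Server", 9090, "management cluster interface"),
    ("SystemVM Agent", 3922, "SystemVM to management (secure)"),
    ("SystemVM Agent", 8250, "SystemVM to management (unsecure)"),
    ("MySQL Server", 3306, "database"),
    ("XenServer", 22, "hypervisor"),
    ("XenServer", 443, "hypervisor"),
    ("KVM", 22, "hypervisor"),
    ("vCenter", 443, "hypervisor"),
    ("AWS API server", 7080, "AWS-compatible API"),
]


def who_uses(port):
    """Return the components that use the given port, in slide order."""
    return [component for component, p, _ in PORTS if p == port]


def ports_for(component):
    """Return the sorted ports listed for a component."""
    return sorted(p for c, p, _ in PORTS if c == component)


if __name__ == "__main__":
    print(who_uses(443))
    print(ports_for("Management Server"))
```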


ACS Administrator


Install, Configuration & Deployment

Log analysis

Important Global Config Parameters

Best Practices

Reuse of Hypervisors

Cloud database


Install, Configuration & Deployment Issues

- Failed to log in to the ACS Management Server

  - 4.2 requires a minimum of 2 GB RAM

  - Redeploy the DB and run cloudstack-setup-management

- Issues with instances in an isolated network

  - Check VLAN trunking in the switch port configuration

- Failed to deploy instances

  - Insufficient resources: analyze the management server log


- Failed to add host

  - XCP host: copy the echo plugin

  - Host license

  - Use compatible hosts when creating a cluster of hosts

- Host/storage pool in avoid set

  - Reachability issues

  - Timeout

  - Capacity of the storage pool / host

  - Alert state

- Move XS hosts out of the Alert state

  - Unmanage the cluster with the affected host.

  - Clear the host tags of the affected host.

    xe host-param-clear param-name=tags uuid=<UUID of affected host>

  - Manage the cluster with the affected host.
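
If several hosts are stuck in the Alert state, the tag-clearing command can be generated per host instead of typed by hand. This is a minimal sketch that only builds the argument list for the xe command shown above; `clear_tags_command` is a hypothetical helper, and the UUID in the usage lines is a made-up example.

```python
import uuid


def clear_tags_command(host_uuid):
    """Build the xe argument list that clears the tags of one host."""
    # uuid.UUID raises ValueError on malformed input, which catches typos early.
    checked = str(uuid.UUID(host_uuid))
    return ["xe", "host-param-clear", "param-name=tags", f"uuid={checked}"]


if __name__ == "__main__":
    # Made-up UUID for illustration only.
    cmd = clear_tags_command("0f8fad5b-d9cb-469f-a165-70867728950e")
    print(" ".join(cmd))
```

The list form can be passed to `subprocess.run` on the XenServer host; the cluster still has to be unmanaged before and managed again after the tags are cleared.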


- Host in Alert state

  - Monitor host root disk usage


Logs

Management Server logs

- /var/log/cloudstack/management/management-server.log

- /var/log/cloudstack/management/apilog.log

SSVM - /var/log/cloud/cloud.out

KVM CloudStack agent - /var/log/cloudstack/agent/agent.log

vSphere logs

- /var/log/hostd.log (host log)

- /var/log/vmkernel.log (kernel log)

- /var/log/vpxa.log (agent log)

XenServer logs

- /var/log/SMlog

- /var/log/xensource.log

To raise log verbosity, set the priority to TRACE in /etc/cloudstack/management/log4j-cloud.xml

Levels - FATAL, ERROR, WARN, INFO, DEBUG, TRACE
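
A quick first pass over a large log is to count lines per level and then look at the ERROR and WARN entries. The sketch below assumes the level appears as a separate word near the start of each line, which is the usual log4j layout but is an assumption here; the sample lines are invented for illustration and are not real CloudStack output.

```python
import re
from collections import Counter

# Matches the first log4j level name that appears as a whole word on a line.
LEVEL = re.compile(r"\b(FATAL|ERROR|WARN|INFO|DEBUG|TRACE)\b")


def count_levels(lines):
    """Count log lines per level; lines without a level (stack traces) are skipped."""
    counts = Counter()
    for line in lines:
        match = LEVEL.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Invented sample lines, for illustration only.
    sample = [
        "2013-12-18 10:15:01,000 DEBUG [cloud.api] (http-1:null) request received",
        "2013-12-18 10:15:02,123 ERROR [cloud.vm] (Job-Executor-1:job-42) deploy failed",
        "    at com.cloud.vm.Foo.bar(Foo.java:10)",
    ]
    print(count_levels(sample))
```

To scan a real file, pass an open file object, for example `count_levels(open(path))`, since the function only iterates over lines.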


Global Config Parameters

- expunge.delay: How long (in seconds) to wait before actually expunging a destroyed VM. Its default value equals the default value of expunge.interval. Value: 60

- expunge.interval: The interval (in seconds) to wait before running the expunge thread. Value: 60

- expunge.workers: Number of workers performing expunge. Value: 1

- network.gc.interval: Seconds to wait before checking for networks to shut down. Value: 600

- network.gc.wait: Time (in seconds) to wait before shutting down a network that is not in use. Value: 600

- pool.storage.allocated.capacity.disablethreshold: Fraction (a value between 0 and 1) of allocated storage utilization above which allocators stop using the pool because too little allocated storage is available. Value: 1

- secstorage.allowed.internal.sites: Comma-separated list of CIDRs internal to the datacenter that can host template download servers. Note that 0.0.0.0 is not a valid site.

- wait: Time (in seconds) to wait for control commands to return. Value: 1800

- vmware.vcenter.session.timeout: VMware client timeout in seconds. Value: 12000

- integration.api.port: Default API port. Value: 8096

- storage.cleanup.interval: The interval (in seconds) to wait before running the storage cleanup thread. Value: 86400
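
Two of these settings are easier to reason about with a little arithmetic. The sketch below is a simplified model, not CloudStack's actual scheduler: it assumes a destroyed VM becomes eligible after expunge.delay and is picked up by the next run of the expunge thread, and that a pool is skipped once allocated/total exceeds the disable threshold. Both function names are hypothetical.

```python
def expunge_window(delay_s, interval_s):
    """Return (earliest, latest) seconds after destroy at which expunge may run.

    Simplified model: eligible after delay_s, picked up by the next expunge
    thread run, which happens at most interval_s later.
    """
    return delay_s, delay_s + interval_s


def pool_disabled(allocated, total, threshold):
    """Return True if allocated utilization exceeds the disable threshold."""
    return (allocated / total) > threshold


if __name__ == "__main__":
    print(expunge_window(60, 60))
    print(pool_disabled(90, 100, 0.85))
```

With the values listed above (60 and 60), a destroyed VM would be expunged between 60 and 120 seconds after destruction under this model, and a threshold of 1 means a pool is only skipped once its allocation exceeds its total capacity.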


Best Practices

- Check switch port configurations (VLANs must be trunked).

- Restrict the IP addresses that can access storage to avoid data loss.

- Monitor host disk space.

- All hosts must be 64-bit and must support HVM (Intel VT or AMD-V enabled). All hosts within a cluster must be homogeneous.

- The volumes used for primary and secondary storage should be accessible from the Management Server and the hypervisors. These volumes should allow root users to read/write data. They must be for the exclusive use of CloudStack and should not contain any other data.

- With Advanced Networking, separate subnets must be used for private and public networks.

- The Management Servers communicate with the XenServers on ports 22 (SSH) and 80 (HTTP).

- The Management Servers communicate with VMware vCenter servers on port 443 (HTTPS).

- The Management Servers communicate with the KVM servers on port 22 (SSH).
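
The rule about separate private and public subnets in Advanced Networking is easy to verify before a zone is created. This is a minimal sketch using Python's standard ipaddress module; `subnets_overlap` is a hypothetical helper and the CIDRs in the usage lines are examples only.

```python
import ipaddress


def subnets_overlap(cidr_a, cidr_b):
    """Return True if the two CIDR ranges share any address."""
    # strict=False accepts host addresses such as 10.1.1.5/24 as well.
    a = ipaddress.ip_network(cidr_a, strict=False)
    b = ipaddress.ip_network(cidr_b, strict=False)
    return a.overlaps(b)


if __name__ == "__main__":
    # Example ranges only: a private (management) subnet and a public subnet.
    print(subnets_overlap("10.1.1.0/24", "192.168.10.0/24"))
```

A True result means the planned private and public ranges collide and should be changed before deployment.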


Reusing Hypervisors

XenServer

• xe vm-uninstall --multiple --force

• Unmount Storage

• xe vif-unplug uuid=<uuid>

• xe vif-destroy uuid=<uuid>

• xe network-destroy uuid=<cloud link Local uuid>

• sh /opt/xensource/bin/cloud-clean-vlan.sh

• Disable cloud tags created on host

VMware

• Delete all instances

• Delete Templates

• Unmount Datastores

• Remove all cloud networks


Cloud Database

op_dc_vnet_alloc

op_dc_ip_address_alloc

user_ip_address

image_store

vm_template

template_store_ref

volume

storage_pool

host

vm_instance

nics

network_offering

physical_network_traffic_types


References

- https://cwiki.apache.org/confluence/display/CLOUDSTACK/SSVM%2C+templates%2C+Secondary+storage+troubleshooting

- https://cwiki.apache.org/confluence/display/CLOUDSTACK/Ports+used+by+CloudStack

- http://dlafferty.blogspot.in/2013/08/using-cloudstacks-log-files-xenserver.html


Get Involved

Web: http://cloudstack.apache.org/

Mailing Lists: cloudstack.apache.org/mailing-lists.html

IRC: irc.freenode.net: 6667 #cloudstack

Twitter: @cloudstack

LinkedIn: www.linkedin.com/groups/CloudStack-Users-Group-3144859

If it didn’t happen on the mailing list, it didn’t happen.
