18
Troubleshooting (EC2, ELB) Version 1.0 Shiva N ([email protected])

AWS EC2 and ELB troubleshooting

Embed Size (px)

Citation preview

Page 1: AWS EC2 and ELB troubleshooting

Troubleshooting (EC2, ELB)

Version 1.0

Shiva N ([email protected])

Page 2: AWS EC2 and ELB troubleshooting

v

Your first stop

Subscribe to the RSS Feed, and possibly integrate into your operations dashboard. If there are no issues, then it is not likely to be an issue with the AWS Service.

http://status.aws.amazon.com/

Page 3: AWS EC2 and ELB troubleshooting

v

Your second stopAWS Cloudwatch

Provides metrics across all AWS services, that are not available to external monitoring systems.

Detailed monitoring should be enabled. Proactively configure notifications for

threshold alarming. Use customs metrics.

Page 4: AWS EC2 and ELB troubleshooting

v

Troubleshooting – EC2• Instance Launch• EC2 Instance Health • EC2 Instance Network connectivity• EBS issues

Page 5: AWS EC2 and ELB troubleshooting

v

Troubleshooting Instance launch• Potential causes

• Account limits issue• IAM user issue• AutoScaling event terminating instance• Bad/Corrupted AMI configuration• Storage attachment issues• AWS Infrastructure issues

Page 6: AWS EC2 and ELB troubleshooting

v

Troubleshooting EC2 Instance Health • Potential causes

• EBS volume snapshot in progress• Cloud-init user-data script failures• Meta data access issues• OS filesystem issues• Kernel issues• Underlying AWS infrastructure issues

Page 7: AWS EC2 and ELB troubleshooting

v

Troubleshooting Network Connectivity• Potential causes

• CPU, Memory or I/O utilization of instance• Number of active connections exceeding capacity / limits / memory.• AWS EC2 Network Security Groups, ACL's, Routing misconfiguration.• Instance OS Firewall blocking connectivity.• SSH keys lost or misconfiguration.• Network carrier issues.

Page 8: AWS EC2 and ELB troubleshooting

v

EBS issues• Potential causes

• Account limits• EBS snapshotting in progress• File system/OS configuration issues• Underlying infrastructure issues

Page 9: AWS EC2 and ELB troubleshooting

v

ELB Troubleshooting – API Response• OutofService: A Transient Error OccurredInternal ELB error. Retry API call, and raise support case on failure

• CertificateNotFound: undefined• ELBs based on load, span across AZs, and on create might take some time to

sync certificates. Retry API call after some time• The certificate you are trying to use is not found, or not well formed.

Page 10: AWS EC2 and ELB troubleshooting

v

ELB Troubleshooting – Error Messages• HTTP 400: BAD_REQUEST - Client sent a bad request.• HTTP 405: METHOD_NOT_ALLOWED - Length of the method in the request header exceeds

127 characters.• HTTP 408: Request Timeout - Indicates that the client cancelled the request or failed to

send a full request.• HTTP 502: Bad Gateway - Indicates that the load balancer was unable to parse the response

sent from a registered instance.• HTTP 503: Service Unavailable or HTTP 504 Gateway Timeout

• Insufficient capacity in the load balancer to handle the request.• Registered instances closing the connection (KeepAlive issues)• No registered instances, or no healthy instances• Connection to the client is closed• The instance's security group does not allow communication with load balancer.

Page 11: AWS EC2 and ELB troubleshooting

v

ELB Troubleshooting – Response Metrics • HTTPCode_ELB_4XX - Indicates a malformed or a cancelled request from the client.• HTTPCode_ELB_5XX - Either the load balancer or the registered instance is causing

the error or the load balancer is unable to parse the response.• HTTPCode_Backend_2XX - Indicates a normal, successful response from the

registered instance(s).• HTTPCode_Backend_3XX - Indicates some type of redirect response sent from the

registered instance(s).• HTTPCode_Backend_4XX - Indicates some type of client error response sent from

the registered instance(s).• HTTPCode_Backend_5XX - Indicates some type of server error response sent from

the registered instance(s).

Page 12: AWS EC2 and ELB troubleshooting

v

ELB Troubleshooting – Other issues• Load balancer health check failure

• Instance(s) closing the connection to the load balancer. (ELB has 60s timeout)• Responses timing out.• Non-200 response received• Failing public key authentication.• Registering instances taking longer than expected to be In Service.

Page 13: AWS EC2 and ELB troubleshooting

v

ELB Troubleshooting – Potential problems• Possible causes

• The client(s) are caching IP of DNS lookups• The back end instances within an AZ can have an imbalance• Sticky sessions• Request processing time - requests that can take a long time to process• Unhealthy hosts• Timeout settings (keep alives)• NACLs and SGs• ELB fails to scale on spiky traffic• SSL/Certificate issues

Page 14: AWS EC2 and ELB troubleshooting

v

ELB troubleshooting• Resources

• ELB Logs – Proactive Monitor• Alarm ELB metrics• Network diagnostics

Page 15: AWS EC2 and ELB troubleshooting

v

Information required in support case• All resource ID's of all resources involved in problem description or

diagnosis steps.• Instance types, AZ locations, AMI, storage configuration, etc. for any

patterns or trends.• Exact times problems began to, or stopped occurring, frequency of

occurrence if repeating.• Instance console or ELB logs/error logs• Troubleshooting steps performed to date, protocols used, etc.

Page 16: AWS EC2 and ELB troubleshooting

v

What next?• Run books / Play books• Automation• Monthly internal reviews and documented mitigation strategies• Continual improvement plan• Collaboration and Communication with AWS TAM and SA• Well Architected review

Page 17: AWS EC2 and ELB troubleshooting

v

Resources and References• Documentation

• Amazon Elastic Cloud – User Guide• Elastic Load Balancing – Developer Guide

• Training• AWS Certified Sysops Associate• AWS Certified Architect Associate

Page 18: AWS EC2 and ELB troubleshooting

v

Discussion…