Troubleshooting AWS SSM Agent Connection Failures Triggering EC2 Monitoring Alerts
Quick Fix Summary
TL;DR: Restart the SSM Agent service on the affected EC2 instance.
The SSM Agent is not communicating with the AWS SSM service, causing health checks to fail and triggering CloudWatch alarms.
Diagnosis & Causes
Recovery Steps
Step 1: Verify SSM Agent Status and Connectivity
Check if the SSM Agent process is running and can reach the SSM service endpoints.
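The private-subnet FAQ in this guide names three services the agent depends on (SSM, EC2 Messages, SSM Messages), so a connectivity check can probe all three regional hostnames rather than `ssm` alone. The following is a minimal sketch under that assumption; the function names `ssm_endpoint_hosts` and `check_ssm_endpoints` are hypothetical helpers, and the probe relies on bash's `/dev/tcp` and the `timeout` utility being available on the instance.

```bash
# Print the three regional hostnames used by the agent, one per line.
ssm_endpoint_hosts() {
  local region="$1" svc
  for svc in ssm ec2messages ssmmessages; do
    printf '%s.%s.amazonaws.com\n' "$svc" "$region"
  done
}

# Probe each hostname on TCP 443 and print OK/FAIL per host.
# Returns non-zero if any host could not be reached.
check_ssm_endpoints() {
  local region="$1" host failed=0
  for host in $(ssm_endpoint_hosts "$region"); do
    # bash's /dev/tcp opens a TCP connection; timeout caps the wait at 5s
    if timeout 5 bash -c "exec 3<>/dev/tcp/$host/443" 2>/dev/null; then
      echo "OK    $host"
    else
      echo "FAIL  $host"
      failed=1
    fi
  done
  return "$failed"
}

# Example (replace region):
#   check_ssm_endpoints us-east-1
```

A `FAIL` on only one hostname usually points at a missing VPC endpoint or DNS entry for that service rather than a general outbound block.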
# Check SSM Agent service status
sudo systemctl status amazon-ssm-agent
# Check for agent process
ps aux | grep -i amazon-ssm-agent
# Test connectivity to SSM endpoints (replace region)
nc -zv ssm.us-east-1.amazonaws.com 443
Step 2: Restart and Re-register the SSM Agent
Restart the agent service. If the issue persists, force a re-registration with the SSM service.
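A restart command returning does not by itself confirm that the agent stayed up. One way to verify is to poll the service state for a short period afterwards. This is a sketch only: `wait_until` is a hypothetical generic retry helper, and the attempt count and delay in the example are arbitrary values to adjust.

```bash
# Run a command repeatedly until it succeeds or attempts run out.
# Usage: wait_until <attempts> <delay_seconds> <command> [args...]
wait_until() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    # Sleep only if another attempt remains
    [ "$i" -le "$attempts" ] && sleep "$delay"
  done
  return 1
}

# Example (on the instance):
#   sudo systemctl restart amazon-ssm-agent
#   wait_until 6 5 systemctl is-active --quiet amazon-ssm-agent \
#     && echo "agent active" || echo "agent did not become active"
```

If the service never becomes active, move on to the re-registration and permission checks in the following steps rather than restarting in a loop.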
# Restart the SSM Agent service
sudo systemctl restart amazon-ssm-agent
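# Force agent re-registration (if restart fails)
# Note: AWS documents -register with -code and -id for hybrid activations;
# confirm the flags below against your agent version before running.
sudo /opt/aws/amazon-ssm-agent/bin/amazon-ssm-agent -register -y -region "us-east-1" -i "i-1234567890abcdef0"
Step 3: Validate IAM Instance Profile Permissions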
Ensure the EC2 instance's IAM role has the necessary SSM managed policy attached.
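The policy listing can be turned into a pass/fail check by searching the output for the `AmazonSSMManagedInstanceCore` policy named in the FAQ. The sketch below assumes the AWS CLI is configured and that the policy names are piped in as text; `has_ssm_core_policy` is a hypothetical helper and `MyEC2SSMRole` is a placeholder role name.

```bash
# Succeeds if AmazonSSMManagedInstanceCore appears on stdin as a whole
# word. Input may be tab-, space-, or newline-separated policy names.
has_ssm_core_policy() {
  tr -s '[:space:]' '\n' | grep -qx 'AmazonSSMManagedInstanceCore'
}

# Example (replace the role name):
#   aws iam list-attached-role-policies --role-name MyEC2SSMRole \
#     --query 'AttachedPolicies[].PolicyName' --output text \
#     | has_ssm_core_policy \
#     && echo "policy attached" || echo "policy missing"
```

This only detects the AWS managed policy by name. A role that grants equivalent permissions through an inline or customer-managed policy would be reported as missing, so treat a negative result as a prompt to inspect the role, not as proof of a permissions problem.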
# Describe the IAM instance profile attached to the instance (from AWS CLI)
aws ec2 describe-instances --instance-ids i-1234567890abcdef0 --query 'Reservations[0].Instances[0].IamInstanceProfile'
# Check attached policies for the IAM role (replace RoleName)
aws iam list-attached-role-policies --role-name MyEC2SSMRole
Step 4: Check Network and Security Configuration
Verify that the instance's security group allows outbound HTTPS (443) traffic and that VPC endpoints (if used) are correctly configured.
# Describe security groups for the instance
aws ec2 describe-instances --instance-ids i-1234567890abcdef0 --query 'Reservations[0].Instances[0].SecurityGroups'
# Check VPC Endpoint status (if using Interface endpoints)
aws ec2 describe-vpc-endpoints --filters "Name=vpc-endpoint-type,Values=Interface" "Name=service-name,Values=com.amazonaws.us-east-1.ssm"
Step 5: Reinstall the SSM Agent
As a last resort, reinstall the latest version of the SSM Agent.
# For Amazon Linux 2 / RHEL / CentOS
sudo yum remove -y amazon-ssm-agent
sudo yum install -y https://s3.amazonaws.com/ec2-downloads-windows/SSMAgent/latest/linux_amd64/amazon-ssm-agent.rpm
sudo systemctl start amazon-ssm-agent
Architect's Pro Tip
"This often happens after an instance is stopped/started or its IAM role is modified. The agent's internal registration can become stale. A restart (Step 2) usually clears the state."
Frequently Asked Questions
How can I prevent this alert in the future?
Implement a CloudWatch alarm based on the SSM Agent heartbeat metric (`AWS/SSM/AgentHeartbeat`) instead of generic instance status checks. Also, ensure your EC2 launch templates/AMIs have the latest SSM Agent pre-installed and use an IAM role with the `AmazonSSMManagedInstanceCore` policy.
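Independently of any alarm, the agent's connection state as seen by Systems Manager can be read from the `PingStatus` field of `aws ssm describe-instance-information`, whose values are `Online`, `ConnectionLost`, and `Inactive`. The sketch below shows one way to turn that value into a simple classification for a scheduled check; `classify_ping_status` is a hypothetical helper, and the `unregistered` label for an empty or `None` result is an assumption about how to treat instances that Systems Manager does not list.

```bash
# Classify a PingStatus value into a short label.
# Exit code is 0 only when the agent is reported as Online.
classify_ping_status() {
  case "$1" in
    Online)
      echo "healthy"; return 0 ;;
    ConnectionLost|Inactive)
      echo "unhealthy"; return 1 ;;
    ""|None)
      # --output text prints "None" when the instance is not listed
      echo "unregistered"; return 2 ;;
    *)
      echo "unknown"; return 3 ;;
  esac
}

# Example (replace the instance ID):
#   status=$(aws ssm describe-instance-information \
#     --filters "Key=InstanceIds,Values=i-1234567890abcdef0" \
#     --query 'InstanceInformationList[0].PingStatus' --output text)
#   classify_ping_status "$status"
```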
The instance is in a private subnet. What should I check?
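Verify that interface VPC endpoints for SSM (`com.amazonaws.region.ssm`), EC2 Messages (`com.amazonaws.region.ec2messages`), and SSM Messages (`com.amazonaws.region.ssmmessages`) exist in the VPC, where `region` is your region code. Interface endpoints are reached through private DNS rather than route table entries, so confirm that private DNS is enabled on each endpoint and that DNS resolution and DNS hostnames are enabled on the VPC. The security group attached to the endpoints must allow inbound TCP 443 from the instance's security group.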
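The endpoint check can be repeated for all three services in one pass. This is a sketch assuming a configured AWS CLI; `ssm_vpce_service_names` and `report_ssm_vpc_endpoints` are hypothetical helper names, and the VPC ID in the example is a placeholder.

```bash
# Print the three endpoint service names for a region, one per line.
ssm_vpce_service_names() {
  local region="$1" svc
  for svc in ssm ec2messages ssmmessages; do
    printf 'com.amazonaws.%s.%s\n' "$region" "$svc"
  done
}

# For each service, print ServiceName, State, and PrivateDnsEnabled
# for any matching endpoint in the given VPC.
report_ssm_vpc_endpoints() {
  local region="$1" vpc_id="$2" svc
  for svc in $(ssm_vpce_service_names "$region"); do
    aws ec2 describe-vpc-endpoints --region "$region" \
      --filters "Name=vpc-id,Values=$vpc_id" "Name=service-name,Values=$svc" \
      --query 'VpcEndpoints[].[ServiceName,State,PrivateDnsEnabled]' \
      --output text
  done
}

# Example (replace region and VPC ID):
#   report_ssm_vpc_endpoints us-east-1 vpc-0123456789abcdef0
```

A service with no endpoint in the VPC produces no output line, so fewer than three lines indicates a missing endpoint.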