Solved: Azure VM Extension Provisioning Failed (Custom Script & Dependency Agent)
Quick Fix Summary
TL;DR: Check VM network connectivity, verify extension settings, and redeploy with a forced update.
Azure VM Extension Provisioning Failed occurs when the Azure platform cannot successfully install or configure an extension on a virtual machine. This prevents critical monitoring, security, or automation tasks from functioning.
Diagnosis & Causes
Recovery Steps
Step 1: Diagnose VM Agent and Network Connectivity
First, verify the VM Agent is running and can reach required Azure endpoints. Use the Serial Console or connect via SSH/RDP.
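As a rough way to cover the connectivity half of this check on Linux, the sketch below tests whether the VM can open TCP connections to 168.63.129.16, the platform address the VM Agent talks to, on ports 80 and 32526. The function names `check_port` and `report_wireserver` are illustrative and not part of any Azure tooling, and the sketch assumes `bash` and `timeout` are installed on the VM.

```shell
# check_port HOST PORT [TIMEOUT_SECONDS]
# Returns 0 if a TCP connection to HOST:PORT can be opened, non-zero otherwise.
# Uses bash's /dev/tcp pseudo-device, so bash must be available.
check_port() {
  local host="$1" port="$2" secs="${3:-3}"
  timeout "$secs" bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# report_wireserver
# Prints one line per port that the VM Agent uses on 168.63.129.16.
report_wireserver() {
  local port
  for port in 80 32526; do
    if check_port 168.63.129.16 "$port"; then
      echo "168.63.129.16:$port reachable"
    else
      echo "168.63.129.16:$port NOT reachable"
    fi
  done
}
```

Running `report_wireserver` on the VM prints one line per port. A "NOT reachable" result often points at a guest firewall or proxy rule that blocks the agent.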
# Check VM Agent status on Linux
sudo systemctl status waagent
# Check VM Agent status on Windows
Get-Service WindowsAzureGuestAgent
Step 2: Inspect Extension Logs for Root Cause
Examine the detailed extension execution logs. The path varies by OS and extension type.
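On Linux, the agent log at /var/log/waagent.log and the Custom Script handler log at /var/log/azure/custom-script/handler.log are also worth a look. A small helper along these lines (the name `tail_logs` is made up for this example) prints the tail of each log that exists, so a missing file is reported instead of aborting the command.

```shell
# tail_logs LINES FILE...
# Prints the last LINES lines of every FILE that is readable, with a header
# per file, and notes any file that is missing or unreadable.
tail_logs() {
  local lines="$1" f
  shift
  for f in "$@"; do
    if [ -r "$f" ]; then
      echo "==> $f <=="
      tail -n "$lines" "$f"
    else
      echo "==> $f (not found or not readable) <=="
    fi
  done
}
```

Run as root, a call such as `tail_logs 50 /var/log/waagent.log /var/log/azure/custom-script/handler.log` shows the most recent entries from both logs.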
# Linux Custom Script logs
sudo cat /var/lib/waagent/custom-script/download/*/stdout
sudo cat /var/lib/waagent/custom-script/download/*/stderr
# Windows Dependency Agent logs
Get-Content "C:\WindowsAzure\Logs\Plugins\Microsoft.Azure.Monitoring.DependencyAgent\*.log" -Tail 50
Step 3: Force Reinstall the Failed Extension
Remove the extension that is in a failed state and redeploy it. The `--force-update` flag makes the extension run again even if its configuration has not changed.
# Remove the failed extension (the name must match the one used at install time)
az vm extension delete --resource-group MyRG --vm-name MyVM --name CustomScript
# Reinstall with force update
az vm extension set --resource-group MyRG --vm-name MyVM --name CustomScript --publisher Microsoft.Azure.Extensions --version 2.1 --settings '{"commandToExecute":"echo test"}' --force-update
Step 4: Resolve Dependency Agent Conflicts
The Dependency Agent often fails if the Log Analytics (OMS) Agent is missing or misconfigured. Install it first.
# Install Log Analytics Agent prerequisite for Dependency Agent
az vm extension set --resource-group MyRG --vm-name MyVM --name OmsAgentForLinux --publisher Microsoft.EnterpriseCloud.Monitoring --version 1.13 --settings '{"workspaceId":"your-id"}' --protected-settings '{"workspaceKey":"your-key"}'
Step 5: Deploy with ARM Template (Idempotent Fix)
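For production, use an ARM template to ensure a consistent, declarative state. This template snippet shows a dependency chain: the Dependency Agent extension is deployed only after the Log Analytics agent extension. The snippet is abbreviated; a deployable template also needs details such as `location`, a handler version, and the workspace settings from Step 4.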
{
  "type": "Microsoft.Compute/virtualMachines/extensions",
  "name": "[concat(parameters('vmName'), '/OmsAgentForLinux')]",
  "apiVersion": "2023-03-01",
  "properties": {
    "publisher": "Microsoft.EnterpriseCloud.Monitoring",
    "type": "OmsAgentForLinux",
    "autoUpgradeMinorVersion": true
  },
  "dependsOn": [
    "[resourceId('Microsoft.Compute/virtualMachines', parameters('vmName'))]"
  ]
},
{
  "type": "Microsoft.Compute/virtualMachines/extensions",
  "name": "[concat(parameters('vmName'), '/DependencyAgentLinux')]",
  "apiVersion": "2023-03-01",
  "properties": {
    "publisher": "Microsoft.Azure.Monitoring.DependencyAgent",
    "type": "DependencyAgentLinux"
  },
  "dependsOn": [
    "[resourceId('Microsoft.Compute/virtualMachines/extensions', parameters('vmName'), 'OmsAgentForLinux')]"
  ]
}
Architect's Pro Tip
"For Custom Script failures on Linux caused by quoting or escaping, base64-encode the script and pass it in the `script` setting instead of `commandToExecute`. This avoids parsing errors in the ARM/CLI JSON handler."
Frequently Asked Questions
How long should I wait before considering an extension failed?
The default timeout is 90 minutes. If status remains 'Creating' or 'Transitioning' beyond 20 minutes, begin troubleshooting immediately.
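One way to apply the 20-minute rule without watching the portal is a polling loop with a deadline. The sketch below is only an illustration: `wait_for_state` is a made-up helper that repeatedly runs whatever command you give it and inspects the printed state, treating `Succeeded` and `Failed` as final.

```shell
# wait_for_state MAX_SECONDS INTERVAL_SECONDS CMD [ARGS...]
# Runs CMD until it prints "Succeeded" (returns 0) or "Failed" (returns 1).
# Returns 2 if MAX_SECONDS pass without reaching either state.
wait_for_state() {
  local max="$1" interval="$2" deadline state
  shift 2
  deadline=$(( $(date +%s) + max ))
  while :; do
    # A failing command is treated as an unknown state and retried.
    state="$("$@" 2>/dev/null)" || state="Unknown"
    case "$state" in
      Succeeded) return 0 ;;
      Failed) echo "extension reported Failed" >&2; return 1 ;;
    esac
    [ "$(date +%s)" -ge "$deadline" ] && return 2
    sleep "$interval"
  done
}
```

For example, `wait_for_state 1200 30 az vm extension show --resource-group MyRG --vm-name MyVM --name CustomScript --query provisioningState --output tsv` polls every 30 seconds for up to 20 minutes. A return code of 2 means the extension is still in progress and it is time to start troubleshooting.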
Can I run multiple Custom Script Extensions on one VM?
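Not as separate instances. A VM can have only one instance of a given extension handler, so a second Custom Script Extension with a different 'name' is rejected. To run another script, update the existing extension with new settings, or call several scripts in order from a single wrapper script. Use 'dependsOn' in ARM templates to sequence different extensions relative to each other.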