Bug #14844
[dispatch-cloud] Azure driver bugs discovered in trial run
Status:
Resolved
Priority:
Normal
Assigned To:
Category:
Crunch
Target version:
Start date:
02/28/2019
Due date:
% Done:
100%
Estimated time:
(Total: 0.00 h)
Story points:
1.0
Release:
Release relationship:
Auto
Description
- If creating a VM fails, Create() should attempt to delete the VM's dependent resources (NIC and blob) before returning the error to its caller. As it stands, unused NICs and blobs accumulate without bound whenever VMs can't be created and the dispatcher keeps retrying.
- Nil pointer panic in (*AzureInstance).Address(), perhaps on a newly created instance that has no IP address assigned yet (see note #2).
Subtasks
Related issues
Associated revisions
History
#1
Updated by Tom Clegg about 2 years ago
- Related to Story #13908: [Epic] Replace SLURM for cloud job scheduling/dispatching added
#2
Updated by Tom Clegg about 2 years ago
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x83aab5]
goroutine 102 [running]:
git.curoverse.com/arvados.git/lib/cloud.(*AzureInstance).Address(0xc420478500, 0x7f16da9a9628, 0xc420478500)
	/GOPATH/src/git.curoverse.com/arvados.git/lib/cloud/azure.go:633 +0x15
git.curoverse.com/arvados.git/lib/dispatchcloud/ssh_executor.(*Executor).setupSSHClient(0xc420368ea0, 0xc42061a6e7, 0xc420368e01, 0xc4204b88a0)
	/GOPATH/src/git.curoverse.com/arvados.git/lib/dispatchcloud/ssh_executor/executor.go:178 +0x61
git.curoverse.com/arvados.git/lib/dispatchcloud/ssh_executor.(*Executor).sshClient(0xc420368ea0, 0x1, 0x0, 0x0, 0x0)
	/GOPATH/src/git.curoverse.com/arvados.git/lib/dispatchcloud/ssh_executor/executor.go:153 +0x10f
git.curoverse.com/arvados.git/lib/dispatchcloud/ssh_executor.(*Executor).newSession.func1(0x8f7c01, 0x0, 0x9ebaa0, 0xc4204b88b0)
	/GOPATH/src/git.curoverse.com/arvados.git/lib/dispatchcloud/ssh_executor/executor.go:128 +0x37
git.curoverse.com/arvados.git/lib/dispatchcloud/ssh_executor.(*Executor).newSession(0xc420368ea0, 0x0, 0x8e5c40, 0xc420253710)
	/GOPATH/src/git.curoverse.com/arvados.git/lib/dispatchcloud/ssh_executor/executor.go:136 +0xa0
git.curoverse.com/arvados.git/lib/dispatchcloud/ssh_executor.(*Executor).Execute(0xc420368ea0, 0x0, 0xc4201df740, 0x19, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
	/GOPATH/src/git.curoverse.com/arvados.git/lib/dispatchcloud/ssh_executor/executor.go:92 +0x73
git.curoverse.com/arvados.git/lib/dispatchcloud/worker.(*worker).probeBooted(0xc4203b0b00, 0x989064, 0xa, 0x97c340, 0xc4204ed6e0)
	/GOPATH/src/git.curoverse.com/arvados.git/lib/dispatchcloud/worker/worker.go:349 +0x91
git.curoverse.com/arvados.git/lib/dispatchcloud/worker.(*worker).probeAndUpdate(0xc4203b0b00)
	/GOPATH/src/git.curoverse.com/arvados.git/lib/dispatchcloud/worker/worker.go:192 +0x1394
git.curoverse.com/arvados.git/lib/dispatchcloud/worker.(*worker).ProbeAndUpdate(0xc4203b0b00)
	/GOPATH/src/git.curoverse.com/arvados.git/lib/dispatchcloud/worker/worker.go:141 +0x57
created by git.curoverse.com/arvados.git/lib/dispatchcloud/worker.(*Pool).runProbes
	/GOPATH/src/git.curoverse.com/arvados.git/lib/dispatchcloud/worker/pool.go:636 +0x378
Evidently either IPConfigurations or PrivateIPAddress can be nil here:
func (ai *AzureInstance) Address() string {
	return *(*ai.nic.IPConfigurations)[0].PrivateIPAddress
}
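For illustration, a nil-safe variant of this accessor might look roughly like the sketch below. The types `ipConfiguration`, `nic`, and `instance` are hypothetical stand-ins that model only the two pointer fields named above; the real Azure SDK structs have more fields and additional levels of embedding that may need their own nil checks. This is not the code that was eventually merged.

```go
package main

// Hypothetical, simplified stand-ins for the SDK types. Only the fields
// dereferenced by Address() are modeled here.
type ipConfiguration struct {
	PrivateIPAddress *string
}

type nic struct {
	IPConfigurations *[]ipConfiguration
}

type instance struct {
	nic nic
}

// Address returns the first private IP address of the instance, or ""
// when no IP configuration or address has been assigned yet.
func (ai *instance) Address() string {
	cfgs := ai.nic.IPConfigurations
	if cfgs == nil || len(*cfgs) == 0 {
		// No IP configurations at all (nil pointer or empty slice).
		return ""
	}
	if addr := (*cfgs)[0].PrivateIPAddress; addr != nil {
		return *addr
	}
	// Configuration exists but has no address yet.
	return ""
}
```

Returning an empty string lets a caller treat "no address yet" as a condition to retry later rather than as a crash; whether the caller handles an empty address gracefully is a separate question this sketch does not answer.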
#3
Updated by Tom Morris about 2 years ago
- Target version changed from To Be Groomed to Arvados Future Sprints
- Story points set to 1.0
#4
Updated by Tom Clegg about 2 years ago
- Related to Story #14807: [arvados-dispatch-cloud] Features/fixes needed before first production deploy added
#5
Updated by Tom Morris almost 2 years ago
- Target version changed from Arvados Future Sprints to 2019-03-13 Sprint
#6
Updated by Peter Amstutz almost 2 years ago
- Assigned To set to Peter Amstutz
#7
Updated by Peter Amstutz almost 2 years ago
14844-cdc-azure-fixes @ 8c4fb97b1d34b5f8fc50d239698a08c35a63dac3
- If PrivateIPAddress somehow isn't defined, return empty string (don't panic)
- If VM creation fails, immediately attempt to delete the VHD and NIC belonging to that VM. (If that attempt fails, the cleanup processes should still remove them later.)
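The second point could be sketched as follows. Everything here is an assumption made for illustration: the `cloudAPI` interface, the `createInstance` function, the `fakeCloud` test double, and the `-nic` / `-os.vhd` naming are hypothetical and are not taken from the branch or from the Azure SDK.

```go
package main

import "fmt"

// cloudAPI is a hypothetical abstraction over the cloud calls involved
// in creating a VM and its dependent resources.
type cloudAPI interface {
	CreateNIC(name string) error
	CreateVM(name string) error
	DeleteNIC(name string) error
	DeleteBlob(name string) error
}

// createInstance creates a NIC and then a VM. If VM creation fails, it
// makes a best-effort attempt to delete the NIC and the VM's disk blob
// before returning the original error.
func createInstance(api cloudAPI, name string) error {
	nicName := name + "-nic"     // hypothetical naming scheme
	blobName := name + "-os.vhd" // hypothetical naming scheme
	if err := api.CreateNIC(nicName); err != nil {
		return fmt.Errorf("create nic: %w", err)
	}
	if err := api.CreateVM(name); err != nil {
		// Cleanup errors are deliberately ignored: a separate cleanup
		// pass is assumed to catch anything left behind.
		_ = api.DeleteNIC(nicName)
		_ = api.DeleteBlob(blobName)
		return fmt.Errorf("create vm: %w", err)
	}
	return nil
}

// fakeCloud is a test double that records which resources were deleted.
type fakeCloud struct {
	failVM  bool
	deleted []string
}

func (f *fakeCloud) CreateNIC(name string) error { return nil }

func (f *fakeCloud) CreateVM(name string) error {
	if f.failVM {
		return fmt.Errorf("simulated VM creation failure")
	}
	return nil
}

func (f *fakeCloud) DeleteNIC(name string) error {
	f.deleted = append(f.deleted, name)
	return nil
}

func (f *fakeCloud) DeleteBlob(name string) error {
	f.deleted = append(f.deleted, name)
	return nil
}
```

With `fakeCloud{failVM: true}`, a call to `createInstance` returns the creation error and leaves both dependent resource names in `deleted`, so repeated retries by a caller would not accumulate orphaned resources in this model.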
#8
Updated by Lucas Di Pentima almost 2 years ago
This LGTM, thanks.
#9
Updated by Peter Amstutz almost 2 years ago
- Status changed from New to Resolved
- % Done changed from 0 to 100
Applied in changeset arvados|a310d114bdc06b20cd007e6aff14b409e1c11e32.
#10
Updated by Tom Morris almost 2 years ago
- Release set to 15
Merge branch '14844-cdc-azure-fixes' closes #14844
Arvados-DCO-1.1-Signed-off-by: Peter Amstutz <pamstutz@veritasgenetics.com>