Bug #12933

[crunch2] add equivalent of cloud_node line

Added by Ward Vandewege over 2 years ago. Updated over 2 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: Crunch
Target version:
Start date: 01/11/2018
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Story points: -

Description

In crunchv1, a line is logged in every job that looks like this:

{"cloud_node":{"size":"Standard_D3_v2","price":0.229},"total_cpu_cores":4,"total_scratch_mb":204695,"total_ram_mb":14023}

We need the equivalent in crunchv2. The node-info output is useful, but the string above has several other things going for it:

a) it has cloud node information (node type, price, actual ram)
b) it is machine parsable

The format doesn't need to match the line above exactly (for example, "size" seems a misnomer for "instance_type"), but it must include all relevant cloud node information and be machine parsable.
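To illustrate what "machine parsable" buys us, here is a minimal Python sketch that reads the crunchv1 line quoted above. The function name summarize_node is made up for this example, and the key names are simply those in the sample line; a crunchv2 equivalent may well use different ones.

```python
import json

# Sample line copied verbatim from the crunchv1 job log quoted in the description.
line = ('{"cloud_node":{"size":"Standard_D3_v2","price":0.229},'
        '"total_cpu_cores":4,"total_scratch_mb":204695,"total_ram_mb":14023}')


def summarize_node(log_line):
    """Return (node type, price, RAM in MB) from one cloud_node log line.

    Hypothetical helper: missing keys yield None instead of raising.
    """
    record = json.loads(log_line)
    cloud = record.get("cloud_node", {})
    return cloud.get("size"), cloud.get("price"), record.get("total_ram_mb")


print(summarize_node(line))
```

Because the line is plain JSON, any reporting tool can pull out the node type and price without scraping free-form text.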


Subtasks

Task #12946: Review 12933-log-node-properties (Resolved, Tom Clegg)


Related issues

Related to Arvados - Feature #12746: [crunch2] Add I/O (and other?) stats to crunch-run (Resolved, 01/26/2018)

Related to Arvados - Bug #12465: [crunchv2] Improve crunch-run environment reporting (New)

Associated revisions

Revision 453f922b
Added by Tom Clegg over 2 years ago

Merge branch '12933-log-node-properties'

refs #12933

Arvados-DCO-1.1-Signed-off-by: Tom Clegg <>

History

#1 Updated by Ward Vandewege over 2 years ago

  • Related to Feature #12746: [crunch2] Add I/O (and other?) stats to crunch-run added

#2 Updated by Tom Clegg over 2 years ago

This is how we do it in crunch1, in source:sdk/cli/bin/crunch-job (where @node is a list of node names obtained from sinfo [...] --nodes=$SLURM_NODELIST)

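# Look up the Arvados node record for each hostname in @node.
my $resp = api_call(
  'nodes/list',
  'filters' => [['hostname', 'in', \@node]],
  'order' => 'hostname',
  'limit' => scalar(@node),
);
# Log each node's hostname, uuid, and JSON-encoded properties.
for my $n (@{$resp->{items}}) {
  Log(undef, "$n->{hostname} $n->{uuid} ".JSON::encode_json($n->{properties}));
}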

#3 Updated by Tom Clegg over 2 years ago

In crunch2 we can add this to the node-info logger: If $SLURMD_NODENAME is not empty, call /arvados/v1/nodes?filters=[[hostname,=,$nodename]] and print the uuid and properties hash of the returned item (if any).
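A rough Python sketch of how that lookup request could be formed. This is not the crunch-run implementation: the helper name node_query_url and the host arvados.example.com are placeholders, and it assumes the filters parameter is sent as a URL-encoded JSON array. It only builds the URL and makes no network call.

```python
import json
import os
from urllib.parse import urlencode


def node_query_url(api_host, nodename):
    """Build a nodes list URL filtered by hostname.

    Returns None when nodename is empty, mirroring the
    "if $SLURMD_NODENAME is not empty" condition above.
    """
    if not nodename:
        return None
    # Assumption: filters are passed as a JSON-encoded list of triples.
    filters = json.dumps([["hostname", "=", nodename]])
    return "https://%s/arvados/v1/nodes?%s" % (api_host, urlencode({"filters": filters}))


# "arvados.example.com" is a placeholder host, not a real cluster.
print(node_query_url("arvados.example.com", os.environ.get("SLURMD_NODENAME", "")))
```

Outside a SLURM job the variable is unset, so the sketch prints None and no lookup would be attempted.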

#4 Updated by Ward Vandewege over 2 years ago

Since a container is like a unix process, i.e. it runs exactly once, it would sure be nice if this information was captured in the container object. Then we don't even need to add it to the logs.

#5 Updated by Tom Clegg over 2 years ago

  • Status changed from New to In Progress
  • Assigned To set to Tom Clegg

12933-log-node-properties @ e128fc5885c553c9e9b55f2529d0ea6937e5a6b7

#6 Updated by Peter Amstutz over 2 years ago

  • Related to Bug #12465: [crunchv2] Improve crunch-run environment reporting added

#7 Updated by Tom Clegg over 2 years ago

12933-log-node-properties @ 5469772c43759b8bde77c3d78450658e266b9cf0

This version saves a node.json file in the log (analogous to container.json), with the admin-only "info" field removed. This should be easy to json.Unmarshal into the new Node type in the Go SDK.
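For illustration only, a Python sketch of dropping the "info" field before serializing. The actual change is in crunch-run (Go); the helper name node_json and the sample record below are invented for this example, with the properties borrowed from the crunchv1 line in the description.

```python
import json


def node_json(node):
    """Serialize a node record without its admin-only "info" field.

    Builds a filtered copy, so the caller's dict is left unmodified.
    """
    public = {key: value for key, value in node.items() if key != "info"}
    return json.dumps(public, indent=2, sort_keys=True)


# Hypothetical record: hostname and "info" contents are made up.
sample = {
    "hostname": "compute1",
    "properties": {
        "cloud_node": {"size": "Standard_D3_v2", "price": 0.229},
        "total_ram_mb": 14023,
    },
    "info": {"secret": "redacted"},
}

print(node_json(sample))
```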

#8 Updated by Tom Clegg over 2 years ago

  • Target version changed from To Be Groomed to 2018-01-17 Sprint

#9 Updated by Lucas Di Pentima over 2 years ago

  • File services/crunch-run/crunchrun.go
    • Line 749: The comment seems to need an update
    • Lines 741 & 750: I think the params argument should be passed to the CallRaw call, right?

#10 Updated by Tom Clegg over 2 years ago

Fixed both issues.

12933-log-node-properties @ 813f5f4aad5da71c4fcfe6639c9010e1056acf1f

#11 Updated by Lucas Di Pentima over 2 years ago

LGTM, thanks!

#12 Updated by Ward Vandewege over 2 years ago

This works but only when the workflow is run without --local.

I ran a test workflow with

cwl-runner --local download.cwl --bashScript download.sh --urlFile 2.txt

which resulted in

$ cwl-runner --local download.cwl --bashScript download.sh --urlFile 2.txt
2018-01-13 16:18:58 cwltool INFO: /usr/bin/cwl-runner 1.0.20171211211613 1.0.20171211211613, arvados-python-client 0.1.20171211211613, cwltool 1.0.20170928192020
2018-01-13 16:18:58 cwltool INFO: Resolved 'download.cwl' to 'file:///data-nvme1n1/home/wvandewege/downloader/download.cwl'
2018-01-13 16:18:59 arvados.arv-run INFO: Upload local files: "download.sh" "2.txt" 
2018-01-13 16:18:59 arvados.arv-run INFO: Uploaded to 5e1ebe288e1daccf5744d1849610d292+71 (dhhck-4zz18-0aogl30ddkrk3yk)
2018-01-13 16:18:59 cwltool INFO: [workflow download.cwl] start
2018-01-13 16:18:59 cwltool INFO: [step readUrlList] start
2018-01-13 16:18:59 cwltool INFO: [step readUrlList] completed success
2018-01-13 16:18:59 cwltool INFO: [step downloadUrl] start
2018-01-13 16:18:59 arvados.cwl-runner INFO: [container downloadUrl] dhhck-xvhdp-9299d8y8q16fbfu state is Committed
2018-01-13 16:18:59 cwltool INFO: [step downloadUrl] start
2018-01-13 16:19:00 arvados.cwl-runner INFO: [container downloadUrl_2] dhhck-xvhdp-a2zmgofa2d818dh state is Committed
2018-01-13 16:19:29 arvados.cwl-runner INFO: [container downloadUrl] dhhck-xvhdp-9299d8y8q16fbfu is Final
2018-01-13 16:19:44 arvados.cwl-runner INFO: [container downloadUrl_2] dhhck-xvhdp-a2zmgofa2d818dh is Final
2018-01-13 16:19:44 cwltool INFO: [step downloadUrl] completed success
2018-01-13 16:19:44 cwltool INFO: [workflow download.cwl] completed success
2018-01-13 16:19:44 arvados.cwl-runner INFO: Overall process status is success
2018-01-13 16:19:44 arvados.cwl-runner INFO: Final output collection 660188f814755c40f7a719f2b94d6f19+59 "Output of download.cwl (2018-01-13T16:19:44.827Z)" (dhhck-4zz18-j72zsymb8qxte9u)
{
    "out1": null
}
2018-01-13 16:19:44 cwltool INFO: Final process status is success

The log collections for the containers do not have the node.json file.

When I ran it without --local, like so:

cwl-runner download.cwl --bashScript download.sh --urlFile 2.txt
2018-01-13 16:26:33 cwltool INFO: /usr/bin/cwl-runner 1.0.20171211211613 1.0.20171211211613, arvados-python-client 0.1.20171211211613, cwltool 1.0.20170928192020
2018-01-13 16:26:33 cwltool INFO: Resolved 'download.cwl' to 'file:///data-nvme1n1/home/wvandewege/downloader/download.cwl'
2018-01-13 16:26:34 arvados.arv-run INFO: Upload local files: "download.sh" "2.txt" 
2018-01-13 16:26:34 arvados.arv-run INFO: Uploaded to 5e1ebe288e1daccf5744d1849610d292+71 (dhhck-4zz18-hhc7pht771wfxc0)
2018-01-13 16:26:34 arvados.cwl-runner INFO: [container download.cwl] submitted container dhhck-xvhdp-9gzxm1rcb9s4kne
2018-01-13 16:27:49 arvados.cwl-runner INFO: [container download.cwl] dhhck-xvhdp-9gzxm1rcb9s4kne is Final
2018-01-13 16:27:49 arvados.cwl-runner INFO: Overall process status is success
2018-01-13 16:27:49 arvados.cwl-runner INFO: Final output collection 660188f814755c40f7a719f2b94d6f19+59
{
    "out1": null
}
2018-01-13 16:27:49 cwltool INFO: Final process status is success

The resulting log collections do have the node.json file.

Why is this?

#13 Updated by Tom Clegg over 2 years ago

Probably because dhhck-xvhdp-9299d8y8q16fbfu ran on compute3, which had

2018-01-13T16:19:10.117547746Z crunch-run 0.1.20171212165144.296aa66 started

while dhhck-xvhdp-9gzxm1rcb9s4kne ran on compute1, which had

2018-01-13T16:26:36.644719254Z crunch-run 0.1.20180111190404.453f922 started
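
The comparison can be sketched in Python as follows. It assumes the third component of the crunch-run version string is a build timestamp and the fourth an abbreviated commit hash; that reading of the version scheme is an assumption, and build_info is a made-up helper name.

```python
import re

# Assumed layout: crunch-run <major>.<minor>.<14-digit timestamp>.<short commit> started
STARTED = re.compile(r"crunch-run \d+\.\d+\.(\d{14})\.([0-9a-f]+) started")


def build_info(log_line):
    """Return (timestamp, short commit) from a crunch-run start line, or None."""
    match = STARTED.search(log_line)
    if match is None:
        return None
    return match.group(1), match.group(2)


# Log lines copied from the two compute nodes quoted above.
compute3 = "2018-01-13T16:19:10.117547746Z crunch-run 0.1.20171212165144.296aa66 started"
compute1 = "2018-01-13T16:26:36.644719254Z crunch-run 0.1.20180111190404.453f922 started"
merge_rev = "453f922b"  # merge revision listed under "Associated revisions"

for name, line in (("compute3", compute3), ("compute1", compute1)):
    stamp, commit = build_info(line)
    # True only when the node runs the build made from the merge revision.
    print(name, stamp, merge_rev.startswith(commit))
```

Under that assumption, compute3 was running a build from before the merge, while compute1's commit prefix matches the merge revision.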

#14 Updated by Tom Clegg over 2 years ago

  • Status changed from In Progress to Resolved
