Feature #16950

[cli] add costanalyzer to arvados-client

Added by Ward Vandewege about 1 year ago. Updated 5 months ago.

Status:
Resolved
Priority:
Normal
Assigned To:
Category:
-
Target version:
Start date:
11/03/2020
Due date:
% Done:

100%

Estimated time:
(Total: 0.00 h)
Story points:
-
Release relationship:
Auto

Description

The workflow costanalyzer script has lived out of tree for a long time. Add it as an arvados-client command.


Subtasks

Task #16954: Review 16950-add-costanalyzerResolvedWard Vandewege


Related issues

Related to Arvados - Feature #17148: add logging middleware to lib/cmdNew

Associated revisions

Revision 6bf9e1a4
Added by Ward Vandewege 11 months ago

Merge branch 'master' into 16950-add-costanalyzer

refs #16950

Arvados-DCO-1.1-Signed-off-by: Ward Vandewege <>

Revision 708f0095
Added by Ward Vandewege 11 months ago

Merge branch '16950-add-costanalyzer' into master

closes #16950

Arvados-DCO-1.1-Signed-off-by: Ward Vandewege <>

Revision 791c2132 (diff)
Added by Ward Vandewege 7 months ago

Costanalyzer has been moved to the arvados codebase, accessible via
arvados-client.

refs #16950

Arvados-DCO-1.1-Signed-off-by: Ward Vandewege <>

History

#1 Updated by Ward Vandewege about 1 year ago

  • Status changed from New to In Progress

#2 Updated by Ward Vandewege about 1 year ago

  • Description updated (diff)

#3 Updated by Peter Amstutz about 1 year ago

  • Target version changed from 2020-10-07 Sprint to 2020-10-21 Sprint

#4 Updated by Ward Vandewege 12 months ago

  • Target version changed from 2020-10-21 Sprint to 2020-11-04 Sprint

#5 Updated by Ward Vandewege 12 months ago

#6 Updated by Peter Amstutz 12 months ago

  • Target version changed from 2020-11-04 Sprint to 2020-11-18

#7 Updated by Tom Clegg 11 months ago

NoPrefixFormatter looks pretty similar to &logrus.TextFormatter{DisableTimestamp: true}, except that it loses the structured-logging fields and just prints the message. That could be a feature, but I'm inclined to minimize the logging customization unless/until we're ready to put it somewhere like cmd/arvados-client to standardize across all the subcommands.

In tests: prefer to avoid the old Go SDK interface with arvadosclient.Dict... e.g., to get a CR, you can use

ac.RequestAndDecodeContext(ctx, &cr, "GET", "arvados/v1/container_requests/"+crUUID, nil, nil)

"Dict is a helper type so we don't have to write out 'map[string]interface{}' every time" ... is an antipattern that did not work out that well for us in the arvadosclient package.

You could do stuff with interfaces and embedded/wrapped types to handle the different kinds of nodeinfo without resorting to interface{} func params, but since you define both types right here, I think the simplest approach would be to make a single type that has both fields in it.

type nodeInfo struct {
  // Legacy
  Properties struct {
    CloudNode struct {
      Price float64
      Size  string
    } `json:"cloud_node"`
  }
  // Modern
  ProviderType string
  Price        float64
}

(You don't really need the json struct tags here for fields like Properties where the only difference is case: "properties" will decode into "Properties"... and you don't really need to define the fields like VCPUs that aren't being used.)

You can skip the bytes.Buffer and decode straight from the file: err = json.NewDecoder(f).Decode(&nodeInfo)

Should un-pyramid this:

if _, ok := itemMap["log_uuid"]; ok {
        if itemMap["log_uuid"] == nil {
                ...
        }
        // ...rest of func
}

...along these lines...

logUUID, ok := itemMap["log_uuid"]
if !ok {
        return
}
// ...rest of func

It seems like you could be using a cr arvados.ContainerRequest instead of an itemMap map[string]interface{} for the container request record, which would clean up some type casts/checks.

time.Parse("2006-01-02T15:04:05.000000000Z", ...) -- isn't this time.RFC3339Nano?

costanalyzer() and generateCrCsv()should return fmt.Errorf(...) instead of os.Exit(1) (except that generateCrCsv()

Could use backticks in flag descriptions -- e.g., description "output `directory` for the CSV reports" will render as "-output directory" instead of "-output string"

#8 Updated by Tom Clegg 11 months ago

One more thing: I found the behavior of creating a bunch of files and dirs in ./results/ a little surprising. The help text mentions that output csv files go there, but doesn't say anything about the cache behavior. I wonder if it would be a bit more friendly if the default output dir was somewhere under $HOME?

#9 Updated by Ward Vandewege 11 months ago

Tom Clegg wrote:

NoPrefixFormatter looks pretty similar to &logrus.TextFormatter{DisableTimestamp: true}, except that it loses the structured-logging fields and just prints the message. That could be a feature, but I'm inclined to minimize the logging customization unless/until we're ready to put it somewhere like cmd/arvados-client to standardize across all the subcommands.

Yeah, the idea is that we're logging to the console, and there's a human there, and structured logging in that context doesn't make a whole lot of sense. It's interesting, &logrus.TextFormatter{DisableTimestamp: true} is close but still starts each line with the (truncated) level, and I don't see a 'DisableLevel' option in logrus:

INFO Collecting child containers for container request ce8i5-xvhdp-9k61uz8l8eap5tr 
INFO  done                                        
INFO                                              
INFO Uuid report in results/ce8i5-dz642-f6zze094riq01ku.csv 
INFO 
Aggregate cost accounting for all supplied uuids in results/2020-11-11-11-37-13-aggregate-costaccounting.csv 

This is the second time we do this, NoPrefixFormatter is also in deduplicationreport. So, how about we standardize it? I made #17148 for that purpose.

In tests: prefer to avoid the old Go SDK interface with arvadosclient.Dict... e.g., to get a CR, you can use

Thanks, fixed. I also replaced a few of those uses in the code.

"Dict is a helper type so we don't have to write out 'map[string]interface{}' every time" ... is an antipattern that did not work out that well for us in the arvadosclient package.

OK, I had been wondering about that. I've just replaced it with map[string]interface{}.

You could do stuff with interfaces and embedded/wrapped types to handle the different kinds of nodeinfo without resorting to interface{} func params, but since you define both types right here, I think the simplest approach would be to make a single type that has both fields in it.

Thanks, done.

(You don't really need the json struct tags here for fields like Properties where the only difference is case: "properties" will decode into "Properties"... and you don't really need to define the fields like VCPUs that aren't being used.)

Simpler, thanks.

You can skip the bytes.Buffer and decode straight from the file: err = json.NewDecoder(f).Decode(&nodeInfo)

That one takes the cake for shortening the code. Thanks!

Should un-pyramid this:

Done.

It seems like you could be using a cr arvados.ContainerRequest instead of an itemMap map[string]interface{} for the container request record, which would clean up some type casts/checks.

Done, that was also a very nice cleanup, thanks!

time.Parse("2006-01-02T15:04:05.000000000Z", ...) -- isn't this time.RFC3339Nano?

Not really, time.RFC3339Nano has a trailing time: "2006-01-02T15:04:05.999999999Z07:00". In any case, it's moot now; the time.Parse calls are gone after the switch to using the struct types from the SDK.

costanalyzer() and generateCrCsv()should return fmt.Errorf(...) instead of os.Exit(1) (except that generateCrCsv()

Done.

Could use backticks in flag descriptions -- e.g., description "output `directory` for the CSV reports" will render as "-output directory" instead of "-output string"

Oh! I didn't know that. Neat, thanks.

One more thing: I found the behavior of creating a bunch of files and dirs in ./results/ a little surprising. The help text mentions that output csv files go there, but doesn't say anything about the cache behavior. I wonder if it would be a bit more friendly if the default output dir was somewhere under $HOME?

I changed it so that the cache goes into ~/.cache/arvados/costanalyzer. I also added a flag to disable the cache (and use it in the tests). For the results directory, I'm not sure what the best default is. It's ./results now, sounds like you would prefer it to be $HOME/something ?

Ready for another look in e1937e57fe2c0e99b6b636049142cc7598f80231

#10 Updated by Ward Vandewege 11 months ago

I changed the output directory behavior, removing the default. It has to be set explicitly now. See 6790e4990fa54d81decdd555beee337144d1fab1 on branch 16950-add-costanalyzer.

#11 Updated by Peter Amstutz 11 months ago

  • Target version changed from 2020-11-18 to 2020-12-02 Sprint

#12 Updated by Ward Vandewege 11 months ago

#13 Updated by Tom Clegg 11 months ago

Probably better to use os.UserHomeDir() instead of user.Current().HomeDir (this reminds me of how annoying it was in arvados-server boot when I wanted to install passenger in a specific dir but it insisted on putting stuff in the current user's home dir, even if I set $HOME to something else).

logger.SetOutput(stdout) seems surprising -- why not stderr?

Rest LGTM, thanks

#14 Updated by Ward Vandewege 11 months ago

Tom Clegg wrote:

Probably better to use os.UserHomeDir() instead of user.Current().HomeDir (this reminds me of how annoying it was in arvados-server boot when I wanted to install passenger in a specific dir but it insisted on putting stuff in the current user's home dir, even if I set $HOME to something else).

logger.SetOutput(stdout) seems surprising -- why not stderr?

Rest LGTM, thanks

OK, I've fixed those 2 things in b39f7a6141ecd5c53531b7705c0496623b4df9e9

#15 Updated by Ward Vandewege 11 months ago

  • % Done changed from 0 to 100
  • Status changed from In Progress to Resolved

#16 Updated by Peter Amstutz 5 months ago

  • Release set to 38

Also available in: Atom PDF