Bug #8497

[Data Manager] Small batch size makes it slow to process collections

Added by Joshua Randall over 5 years ago. Updated over 5 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: Keep
Target version:
Start date: 02/23/2016
Due date:
% Done: 100%
Estimated time: (Total: 0.00 h)
Story points: 0.5

Description

datamanager queries the API server for collections in batches of 50, which makes processing a large number of collections very slow.

Performance data on our system (running with the fix from #8485, since otherwise we can't fetch all the collections):

$ time ./datamanager -dry-run &> /tmp/datamanager-dry-run-50.log

real    72m30.514s
user    6m30.243s
sys     0m43.171s

Changing one line in datamanager.go from 'BatchSize: 50' to 'BatchSize: 1000' results in:

$ time ./datamanager -dry-run &> /tmp/datamanager-dry-run-1000.log

real    12m57.729s
user    5m16.569s
sys     0m28.488s

I'd suggest raising the BatchSize as much as possible (or making it a configuration parameter).


Subtasks

Task #8520: Review PR #41 (Resolved, Radhika Chippada)

Associated revisions

Revision 6303d577
Added by Radhika Chippada over 5 years ago

closes #8497
Merge branch 'wtsi-hgi-8497-datamanager-batchsize-1000'

History

#1 Updated by Joshua Randall over 5 years ago

  • Status changed from New to Feedback
  • Assigned To set to Joshua Randall
  • % Done changed from 0 to 100

#2 Updated by Joshua Randall over 5 years ago

  • Assigned To deleted (Joshua Randall)

#3 Updated by Brett Smith over 5 years ago

  • Subject changed from datamanager is slow to process collections to [Data Manager] Small batch size makes it slow to process collections

#4 Updated by Brett Smith over 5 years ago

  • Target version set to 2016-03-16 sprint

#5 Updated by Joshua Randall over 5 years ago

  • Assigned To set to Joshua Randall

#6 Updated by Brett Smith over 5 years ago

  • Story points set to 0.5

#7 Updated by Radhika Chippada over 5 years ago

PR #41

Rather than hardcoding the batch size of 1000, please add an argument "collection-batch-size" with default value of 1000.

#8 Updated by Joshua Randall over 5 years ago

Radhika, I've now made the batch size a command line option (-collection-batch-size) as requested.

#9 Updated by Radhika Chippada over 5 years ago

  • Status changed from Feedback to Resolved
  • % Done changed from 0 to 100

Applied in changeset arvados|commit:6303d577e0513eae1254a9c73648c24b9451ed10.
