
Bug #8497 (closed)

[Data Manager] Small batch size makes it slow to process collections

Added by Joshua Randall about 8 years ago. Updated about 8 years ago.

Status: Resolved
Priority: Normal
Assigned To:
Category: Keep
Target version:
Story points: 0.5

Description

datamanager queries the API server for collections in batches of only 50, which makes a full pass over all collections very slow.
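If each batch costs one API round trip, the number of round trips grows roughly as (number of collections) / BatchSize. The Go sketch that follows illustrates this under loud assumptions: offset/limit paging, a made-up total of 100,000 collections, and hypothetical names (`collection`, `fetchPage`, `fetchAll`) that are not datamanager's actual code or the Arvados client API.

```go
package main

import "fmt"

// collection is a hypothetical stand-in for a collection record.
type collection struct {
	UUID string
}

// fetchPage is a hypothetical stand-in for one API request: it returns at
// most limit items starting at offset. In a real client each call would be
// an HTTP round trip to the API server.
func fetchPage(all []collection, offset, limit int) []collection {
	if offset >= len(all) {
		return nil
	}
	end := offset + limit
	if end > len(all) {
		end = len(all)
	}
	return all[offset:end]
}

// fetchAll pages through every collection, batchSize (must be > 0) at a
// time, and counts the requests made. A short or empty page ends the loop,
// so an exact multiple of batchSize costs one extra (empty) request.
func fetchAll(all []collection, batchSize int) (got []collection, requests int) {
	for offset := 0; ; offset += batchSize {
		page := fetchPage(all, offset, batchSize)
		requests++
		got = append(got, page...)
		if len(page) < batchSize {
			return got, requests
		}
	}
}

func main() {
	all := make([]collection, 100000) // hypothetical collection count
	for i := range all {
		all[i].UUID = fmt.Sprintf("coll-%06d", i)
	}
	for _, bs := range []int{50, 1000} {
		got, n := fetchAll(all, bs)
		fmt.Printf("batch size %4d: %d collections in %d requests\n", bs, len(got), n)
	}
}
```

With these assumptions, a batch size of 50 needs 2001 requests while 1000 needs 101, so per-request overhead is paid about twenty times less often.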

Performance data on our system (running with the fix from #8485, since otherwise we can't fetch all the collections):

$ time ./datamanager -dry-run &> /tmp/datamanager-dry-run-50.log

real    72m30.514s
user    6m30.243s
sys     0m43.171s

Changing one line in datamanager.go from 'BatchSize: 50' to 'BatchSize: 1000' results in:

$ time ./datamanager -dry-run &> /tmp/datamanager-dry-run-1000.log
real    12m57.729s
user    5m16.569s
sys     0m28.488s
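The speedup implied by the two `real` times can be checked with Go's `time.ParseDuration`, which accepts strings such as `72m30.514s` directly. `speedup` is a hypothetical helper name used only for this illustration.

```go
package main

import (
	"fmt"
	"time"
)

// speedup returns how many times faster the "after" run was than the
// "before" run, given wall-clock times in time.ParseDuration syntax.
func speedup(before, after string) (float64, error) {
	b, err := time.ParseDuration(before)
	if err != nil {
		return 0, err
	}
	a, err := time.ParseDuration(after)
	if err != nil {
		return 0, err
	}
	return b.Seconds() / a.Seconds(), nil
}

func main() {
	// "real" times reported above for BatchSize 50 and BatchSize 1000.
	s, err := speedup("72m30.514s", "12m57.729s")
	if err != nil {
		panic(err)
	}
	fmt.Printf("wall-clock speedup: %.1fx\n", s)
}
```

That works out to roughly 5.6x (4350.5 s down to 777.7 s), while user CPU time changed much less (6m30s to 5m17s), which suggests most of the original wall-clock time was spent waiting on API requests rather than computing.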

I'd suggest raising the BatchSize as much as possible (or making it a configuration parameter).


Subtasks: 1 (0 open, 1 closed)

Task #8520: Review PR #41 (Resolved, Radhika Chippada, 02/23/2016)

#1

Updated by Joshua Randall about 8 years ago

  • Status changed from New to Feedback
  • Assigned To set to Joshua Randall
  • % Done changed from 0 to 100
#2

Updated by Joshua Randall about 8 years ago

  • Assigned To deleted (Joshua Randall)
#3

Updated by Brett Smith about 8 years ago

  • Subject changed from datamanager is slow to process collections to [Data Manager] Small batch size makes it slow to process collections
#4

Updated by Brett Smith about 8 years ago

  • Target version set to 2016-03-16 sprint
#5

Updated by Joshua Randall about 8 years ago

  • Assigned To set to Joshua Randall
#6

Updated by Brett Smith about 8 years ago

  • Story points set to 0.5
#7

Updated by Radhika Chippada about 8 years ago

PR #41

Rather than hardcoding the batch size of 1000, please add a command-line argument, "collection-batch-size", with a default value of 1000.

#8

Updated by Joshua Randall about 8 years ago

Radhika, I've now made the batch size a command-line option (-collection-batch-size), as requested.
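A minimal sketch of such an option using Go's standard `flag` package, with the requested default of 1000. `parseBatchSize` and the check that rejects non-positive values are assumptions for illustration; the actual change in PR #41 may be structured differently.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// parseBatchSize registers a hypothetical -collection-batch-size flag
// (default 1000) on its own FlagSet and returns the parsed value.
func parseBatchSize(args []string) (int, error) {
	fs := flag.NewFlagSet("datamanager", flag.ContinueOnError)
	batchSize := fs.Int("collection-batch-size", 1000,
		"number of collections to request from the API server per call")
	if err := fs.Parse(args); err != nil {
		return 0, err
	}
	// Assumed validation: a batch size below 1 would never make progress.
	if *batchSize < 1 {
		return 0, fmt.Errorf("collection-batch-size must be positive, got %d", *batchSize)
	}
	return *batchSize, nil
}

func main() {
	n, err := parseBatchSize(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Println("collection batch size:", n)
}
```

Run with no arguments, the sketch falls back to 1000; with `-collection-batch-size=5000` it uses 5000. Keeping the default equal to the previously hardcoded value means existing invocations behave as before.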

#9

Updated by Radhika Chippada about 8 years ago

  • Status changed from Feedback to Resolved
  • % Done changed from 0 to 100

Applied in changeset arvados|commit:6303d577e0513eae1254a9c73648c24b9451ed10.
