Bug #11238

job_task creation fails with ApiError - HttpError 422 - ActiveRecord::StatementInvalid: PG::InternalError: ERROR: invalid memory alloc request size 1718630765

Added by Joshua Randall 13 days ago. Updated 12 days ago.

Status: New
Start date: 03/10/2017
Priority: Normal
Due date:
Assignee: -
% Done: 0%
Category: -
Target version: -
Story points: -
Velocity based estimate: -
Description

For some reason, some of our jobs are failing with an API server error indicating that a Postgres statement was invalid because it tried to allocate about 1.7 GB of memory (1718630765 bytes)!?
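To put the number in the error message in context, here is a small arithmetic sketch. It assumes that PostgreSQL rejects any single allocation request larger than its MaxAllocSize constant (0x3fffffff bytes, just under 1 GiB); the function name and the returned keys are made up for illustration and are not part of Arvados or PostgreSQL.

```python
# Assumption: PostgreSQL's per-request allocation limit (MaxAllocSize)
# is 0x3fffffff bytes, i.e. 1 GiB minus 1 byte.
MAX_ALLOC_SIZE = 0x3FFFFFFF


def describe_alloc_request(size_bytes: int) -> dict:
    """Convert a requested allocation size to GB/GiB and compare it to the limit."""
    return {
        "gb": size_bytes / 10**9,    # decimal gigabytes
        "gib": size_bytes / 2**30,   # binary gibibytes
        "exceeds_limit": size_bytes > MAX_ALLOC_SIZE,
    }


# Size taken from the error message in this report.
info = describe_alloc_request(1718630765)
print(f"{info['gb']:.2f} GB, {info['gib']:.2f} GiB, "
      f"exceeds limit: {info['exceeds_limit']}")
```

Under that assumption the request is over the limit regardless of how much RAM the server has, which would explain why the error is reported as an invalid request rather than as an out-of-memory condition.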

From a job log:

2017-03-10_12:43:38 z8ta6-8i9sb-1u981ftt6rzlgbb 52742 151 stderr arvados.errors.ApiError: <HttpError 422 when requesting https://api.arvados.sanger.ac.uk/arvados/v1/job_tasks?alt=json returned "#<ActiveRecord::StatementInvalid: PG::InternalError: ERROR:  invalid memory alloc request size 1718630765

History

#1 Updated by Joshua Randall 13 days ago

These appear to be the corresponding log entries from the API server's production.log (our Postgres logs do not appear to mention the issue):

[api.arvados.sanger] [42516ad406ffb11b68c6b7c720679436] WARNING: Can't verify CSRF token authenticity
[api.arvados.sanger] [42516ad406ffb11b68c6b7c720679436] #<ActiveRecord::StatementInvalid: PG::InternalError: ERROR:  invalid memory alloc request size 1718630765
[api.arvados.sanger] [42516ad406ffb11b68c6b7c720679436] /var/www/arvados-api/shared/vendor_bundle/ruby/2.1.0/gems/activerecord-3.2.22.5/lib/active_record/connection_adapters/postgresql_adapter.rb:1176:in `get_last_result'
[api.arvados.sanger] [42516ad406ffb11b68c6b7c720679436] Error 1489149789+85807e44: 422
[api.arvados.sanger] [42516ad406ffb11b68c6b7c720679436] {"method":"POST","path":"/arvados/v1/job_tasks","format":"json","controller":"arvados/v1/job_tasks","action":"create","status":422,"duration":32.56,"view":0.45,"db":18.46,"params":{"parameters":{"inputs":"5c41dcf66d94aba8b7c1553c39a4520c+23030","name":"55_of_200","interval":"chr5:91121972-109108305","interval_list":"a65feae6f5f8dc407f422586aed7dc26+96","ref":"9a15a1d495a6efa0f3e05f1e851b694e+2227","reuse_job_task":"z8ta6-ot0gb-vi32h3s1a61u01e"},"success":true,"sequence":2,"finished_at":"2017-03-09T21:32:59.000000000Z","created_by_job_task_uuid":"z8ta6-ot0gb-bovzdgts2cmm8i2","progress":1.0,"output":"d2f2c39e81e617b139da9f6ea5cf581a+3511+A8e3cc4d710260d397eea250fdb048f58f0a53c8b@58d43f08","started_at":"2017-03-09T19:07:16.000000000Z","job_uuid":"z8ta6-8i9sb-1u981ftt6rzlgbb","alt":"json","job_task":{"job_uuid":"z8ta6-8i9sb-1u981ftt6rzlgbb","sequence":2,"parameters":{"inputs":"5c41dcf66d94aba8b7c1553c39a4520c+23030","name":"55_of_200","interval":"chr5:91121972-109108305","interval_list":"a65feae6f5f8dc407f422586aed7dc26+96","ref":"9a15a1d495a6efa0f3e05f1e851b694e+2227","reuse_job_task":"z8ta6-ot0gb-vi32h3s1a61u01e"},"output":"d2f2c39e81e617b139da9f6ea5cf581a+3511+A8e3cc4d710260d397eea250fdb048f58f0a53c8b@58d43f08","progress":1.0,"success":true,"created_by_job_task_uuid":"z8ta6-ot0gb-bovzdgts2cmm8i2","started_at":"2017-03-09T19:07:16.000000000Z","finished_at":"2017-03-09T21:32:59.000000000Z"}},"@timestamp":"2017-03-10T12:43:09Z","@version":"1","message":"[422] POST /arvados/v1/job_tasks (arvados/v1/job_tasks#create)"}
 

#2 Updated by Joshua Randall 13 days ago

  • Subject changed from ApiError - HttpError 422 - ActiveRecord::StatementInvalid: PG::InternalError: ERROR: invalid memory alloc request size 1718630765 to job_task creation fails with ApiError - HttpError 422 - ActiveRecord::StatementInvalid: PG::InternalError: ERROR: invalid memory alloc request size 1718630765

#3 Updated by Joshua Randall 13 days ago

arvados_production=> select count(*) from jobs;
 count
-------
 49870
(1 row)

arvados_production=> select count(*) from job_tasks;
  count
---------
 4931305
(1 row)

#4 Updated by Joshua Randall 12 days ago

I've followed the directions at https://blog.dob.sk/2012/05/19/fixing-pg_dump-invalid-memory-alloc-request-size/ to check the job_tasks table for bad rows, but the check didn't find any. Perhaps other tables are joined in by whatever queries run during job task creation?
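For anyone repeating this kind of check on another table, the general idea of a row-by-row scan can be sketched as below. This is a generic illustration, not the linked post's exact procedure: `find_bad_rows`, `fetch_row` and `fake_fetch` are hypothetical names, and in a real run `fetch_row` would issue a query for a single row (for example by primary key, assuming the table has one) through whatever database driver is in use.

```python
from typing import Callable, Iterable, List, Tuple


def find_bad_rows(
    row_ids: Iterable[int],
    fetch_row: Callable[[int], object],
) -> List[Tuple[int, str]]:
    """Fetch each row individually and record the ids whose fetch raises."""
    bad = []
    for row_id in row_ids:
        try:
            fetch_row(row_id)
        except Exception as exc:  # a real scan would catch the driver's error type
            bad.append((row_id, str(exc)))
    return bad


# Stand-in for a database call, so the sketch runs without a server:
# row 3 pretends to be unreadable.
def fake_fetch(row_id: int) -> dict:
    if row_id == 3:
        raise RuntimeError("invalid memory alloc request size 1718630765")
    return {"id": row_id}


bad = find_bad_rows(range(1, 6), fake_fetch)
print(bad)
```

Fetching one row at a time is slow on a table with millions of rows, but it isolates the failing row instead of aborting the whole scan at the first error.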
