Bug #5713

[FUSE] File access sometimes takes too long on su92l

Added by Abram Connelly about 2 years ago. Updated 3 months ago.

Status:NewStart date:04/13/2015
Priority:NormalDue date:
Assignee:Tom Morris% Done:

0%

Category:FUSE
Target version:Arvados Future Sprints
Story points-
Velocity based estimate-

Description

Every once in a while, accessing files or data from the keep mount becomes unusably slow from the shell nodes.

Sometimes file access can take anywhere from 5-15 minutes. I've mostly seen it on abe.shell.su92l and shell.su92l. The problem is intermittent and once the file successfully opens subsequence access is fast.


Related issues

Related to Arvados - Bug #5662: [FUSE] cd/ls sometimes takes too long on su92l New 04/03/2015
Blocked by Arvados - Bug #5748: [Keep] sometimes, Keep sucks up 100% cpu and becomes real... Resolved 05/06/2015

History

#1 Updated by Brett Smith about 2 years ago

  • Subject changed from Accessing file from my shell node keep mount becomes unusuably slow to [FUSE] File access sometimes takes too long on su92l
  • Category set to FUSE

#2 Updated by Ward Vandewege about 2 years ago

Caused by #5748

#3 Updated by Abram Connelly about 2 years ago

abram@abe.su92l:~/keep/home/dbGaP Release v2/dbGap Release v2 23andMe data/b37.list$ time file  b37.list
b37.list: ASCII text

real    22m17.655s
user    0m0.008s
sys     0m0.004s
abram@abe.su92l:~/keep/home/dbGaP Release v2/dbGap Release v2 23andMe data/b37.list$ ls -lah b37.list 
-r-xr-xr-x 1 abram abram 5.9K Apr 22 16:33 b37.list

The keep mount was setup as follows:

arv-mount --debug --logfile .bcsdebug/log.20150423 keep

Here's what I think are the relevant log lines from the above log file:

arv-mount handle.entry [('.', <arvados_fuse.CollectionDirectory object at 0x26ad8d0>), ('..', <arvados_fuse.ProjectDirectory object at 0x22e0310>
), ('b37.list', <arvados_fuse.StreamReaderFile object at 0x22de550>)]
Request fail: GET http://[keep7.su92l.arvadosapi.com]:25107/89434b3a149d83793ea56dd53951e97e+6006+Ab79cb458b34f47292416a3d83e73cd1e875b1505@554bb
3e2 => <class 'requests.exceptions.ReadTimeout'>: HTTPConnectionPool(host='keep7.su92l.arvadosapi.com', port=25107): Read timed out. (read timeou
t=300)
Request: GET http://[keep5.su92l.arvadosapi.com]:25107/89434b3a149d83793ea56dd53951e97e+6006+Ab79cb458b34f47292416a3d83e73cd1e875b1505@554bb3e2
404 response: 10 bytes in 35.9370708466 msec (0.000 MiB/sec)
Request: GET http://[keep6.su92l.arvadosapi.com]:25107/89434b3a149d83793ea56dd53951e97e+6006+Ab79cb458b34f47292416a3d83e73cd1e875b1505@554bb3e2
404 response: 10 bytes in 23.3190059662 msec (0.000 MiB/sec)
Request: GET http://[keep8.su92l.arvadosapi.com]:25107/89434b3a149d83793ea56dd53951e97e+6006+Ab79cb458b34f47292416a3d83e73cd1e875b1505@554bb3e2
Request fail: GET http://[keep8.su92l.arvadosapi.com]:25107/89434b3a149d83793ea56dd53951e97e+6006+Ab79cb458b34f47292416a3d83e73cd1e875b1505@554bb
3e2 => <class 'requests.exceptions.ReadTimeout'>: HTTPConnectionPool(host='keep8.su92l.arvadosapi.com', port=25107): Read timed out. (read timeou
t=300)
Request: GET http://[keep3.su92l.arvadosapi.com]:25107/89434b3a149d83793ea56dd53951e97e+6006+Ab79cb458b34f47292416a3d83e73cd1e875b1505@554bb3e2
Request fail: GET http://[keep3.su92l.arvadosapi.com]:25107/89434b3a149d83793ea56dd53951e97e+6006+Ab79cb458b34f47292416a3d83e73cd1e875b1505@554bb
3e2 => <class 'requests.exceptions.ReadTimeout'>: HTTPConnectionPool(host='keep3.su92l.arvadosapi.com', port=25107): Read timed out. (read timeou
t=300)
Request: GET http://[keep2.su92l.arvadosapi.com]:25107/89434b3a149d83793ea56dd53951e97e+6006+Ab79cb458b34f47292416a3d83e73cd1e875b1505@554bb3e2
404 response: 10 bytes in 276.965856552 msec (0.000 MiB/sec)
Request: GET http://[keep1.su92l.arvadosapi.com]:25107/89434b3a149d83793ea56dd53951e97e+6006+Ab79cb458b34f47292416a3d83e73cd1e875b1505@554bb3e2
Request fail: GET http://[keep1.su92l.arvadosapi.com]:25107/89434b3a149d83793ea56dd53951e97e+6006+Ab79cb458b34f47292416a3d83e73cd1e875b1505@554bb
3e2 => <class 'requests.exceptions.ReadTimeout'>: HTTPConnectionPool(host='keep1.su92l.arvadosapi.com', port=25107): Read timed out. (read timeou
t=300)
Request: GET http://[keep4.su92l.arvadosapi.com]:25107/89434b3a149d83793ea56dd53951e97e+6006+Ab79cb458b34f47292416a3d83e73cd1e875b1505@554bb3e2
404 response: 10 bytes in 175.436019897 msec (0.000 MiB/sec)
Request: GET http://[keep6.su92l.arvadosapi.com]:25108/89434b3a149d83793ea56dd53951e97e+6006+Ab79cb458b34f47292416a3d83e73cd1e875b1505@554bb3e2
Request fail: GET http://[keep6.su92l.arvadosapi.com]:25108/89434b3a149d83793ea56dd53951e97e+6006+Ab79cb458b34f47292416a3d83e73cd1e875b1505@554bb
3e2 => <class 'requests.exceptions.ReadTimeout'>: HTTPConnectionPool(host='keep6.su92l.arvadosapi.com', port=25108): Read timed out. (read timeou
t=300)

#4 Updated by Abram Connelly about 2 years ago

abram@abe.su92l:~/keepid/0581ad21ee2382f71f06f7249f4d676c+80/out-data$ time file get-evidence.json
get-evidence.json: ASCII text, with very long lines

real    49m4.585s
user    0m0.024s
sys     0m0.004s
abram@abe.su92l:~/keepid/0581ad21ee2382f71f06f7249f4d676c+80/out-data$ ls -lah
total 3.9M
dr-xr-xr-x 1 abram abram    0 May  5 18:01 .
dr-xr-xr-x 1 abram abram    0 Jan  1  1970 ..
-r-xr-xr-x 1 abram abram 3.9M Jan  1  1970 get-evidence.json
abram@abe.su92l:~/keepid/0581ad21ee2382f71f06f7249f4d676c+80/out-data$ date
Tue May  5 19:01:20 UTC 2015

#5 Updated by Brett Smith about 2 years ago

  • Target version changed from Bug Triage to Arvados Future Sprints

Our current thinking is that #5748 is the main culprit here. We're very interested to hear how things look after the next deploy.

#6 Updated by Tom Morris 3 months ago

  • Assignee set to Tom Morris

Has this problem been seen recently? It was reported over 2 years ago and a lot has changed since then, include a bunch of improvements to the performance of various components.

Also available in: Atom PDF