Bug #8787

[API] repositories/get_all_permissions method is too slow

Added by Nico César over 5 years ago. Updated over 4 years ago.

Status:
Resolved
Priority:
Normal
Assigned To:
-
Category:
API
Target version:
-
Start date:
03/24/2016
Due date:
% Done:

0%

Estimated time:
Story points:
-

Description

In su92l we're getting a Error: Net::ReadTimeout :

Error: Net::ReadTimeout
/usr/local/rvm/rubies/ruby-2.1.4/lib/ruby/2.1.0/net/protocol.rb:158:in `rescue in rbuf_fill'
/usr/local/rvm/rubies/ruby-2.1.4/lib/ruby/2.1.0/net/protocol.rb:152:in `rbuf_fill'
/usr/local/rvm/rubies/ruby-2.1.4/lib/ruby/2.1.0/net/protocol.rb:134:in `readuntil'
/usr/local/rvm/rubies/ruby-2.1.4/lib/ruby/2.1.0/net/protocol.rb:144:in `readline'
/usr/local/rvm/rubies/ruby-2.1.4/lib/ruby/2.1.0/net/http/response.rb:39:in `read_status_line'
/usr/local/rvm/rubies/ruby-2.1.4/lib/ruby/2.1.0/net/http/response.rb:28:in `read_new'
/usr/local/rvm/rubies/ruby-2.1.4/lib/ruby/2.1.0/net/http.rb:1408:in `block in transport_request'
/usr/local/rvm/rubies/ruby-2.1.4/lib/ruby/2.1.0/net/http.rb:1405:in `catch'
/usr/local/rvm/rubies/ruby-2.1.4/lib/ruby/2.1.0/net/http.rb:1405:in `transport_request'
/usr/local/rvm/rubies/ruby-2.1.4/lib/ruby/2.1.0/net/http.rb:1378:in `request'
/usr/local/rvm/rubies/ruby-2.1.4/lib/ruby/2.1.0/net/http.rb:1371:in `block in request'
/usr/local/rvm/rubies/ruby-2.1.4/lib/ruby/2.1.0/net/http.rb:853:in `start'
/usr/local/rvm/rubies/ruby-2.1.4/lib/ruby/2.1.0/net/http.rb:1369:in `request'
/usr/local/rvm/gems/ruby-2.1.4/gems/faraday-0.8.11/lib/faraday/adapter/net_http.rb:76:in `perform_request'
/usr/local/rvm/gems/ruby-2.1.4/gems/faraday-0.8.11/lib/faraday/adapter/net_http.rb:39:in `call'
/usr/local/rvm/gems/ruby-2.1.4/gems/faraday-0.8.11/lib/faraday/request/url_encoded.rb:14:in `call'
/usr/local/rvm/gems/ruby-2.1.4/gems/google-api-client-0.6.4/lib/google/api_client/request.rb:159:in `send'
/usr/local/rvm/gems/ruby-2.1.4/gems/google-api-client-0.6.4/lib/google/api_client.rb:563:in `execute'
/usr/local/rvm/gems/ruby-2.1.4/gems/arvados-0.1.20160315200920/lib/arvados.rb:207:in `api_exec'
/usr/local/rvm/gems/ruby-2.1.4/gems/arvados-0.1.20160315200920/lib/arvados.rb:83:in `block (4 levels) in initialize'
/usr/local/arvados/update-gitolite.rb:233:in `<main>'

this is

 arv = Arvados.new
 permissions = arv.repository.get_all_permissions

which makes a

Started GET "/arvados/v1/repositories/get_all_permissions" for 127.0.0.1 at 2016-03-24 13:14:56 +0000
Processing by Arvados::V1::RepositoriesController#get_all_permissions as */*

and I think is taking ~150 seconds sometimes (sometimes is almost instant, caching maybe? )

we need to optimize this so

/bin/bash -c 'source /etc/profile.d/rvm.sh && /usr/local/arvados/update-gitolite.rb production'

can run predictably


Related issues

Related to Arvados - Story #8886: [API] Do not block individual API queries on rebuilding the permissions graphResolved04/13/2016

Related to Arvados - Feature #10267: Permission-monitoring service keeps a flattened permissions table up to dateRejected10/12/2016

Related to Arvados - Story #10816: [AP] [spike] Implement permission lookups as a recursive graph query in Postgres instead of computing in RubyResolved01/04/2017

History

#1 Updated by Nico César over 5 years ago

this started happening Tue, Mar 22, 2016 at 9:12 AM (Boston time)

#2 Updated by Nico César over 5 years ago

Comparing indexes looking for a missing one:

# select
    t.relname as table_name,
    i.relname as index_name,
    array_to_string(array_agg(a.attname), ', ') as column_names
from
    pg_class t,
    pg_class i,
    pg_index ix,
    pg_attribute a
where
    t.oid = ix.indrelid
    and i.oid = ix.indexrelid
    and a.attrelid = t.oid
    and a.attnum = ANY(ix.indkey)
    and t.relkind = 'r' 
group by
    t.relname,
    i.relname
order by
    t.relname,
    i.relname;

on qr1hi there is a difference with table logs, column uuid,

but c97qk is identical

#3 Updated by Nico César over 5 years ago

this is how to reproduce it:

#!/usr/bin/env ruby

require 'arvados'

beginning_time = Time.now
arv = Arvados.new
permissions = arv.repository.get_all_permissions
end_time = Time.now
puts "Time elapsed #{(end_time - beginning_time)*1000} milliseconds" 
nico@shell:~$ ./test.rb 
/usr/local/rvm/rubies/ruby-2.1.4/lib/ruby/2.1.0/net/protocol.rb:158:in `rescue in rbuf_fill': Net::ReadTimeout (Faraday::Error::TimeoutError)
        from /usr/local/rvm/rubies/ruby-2.1.4/lib/ruby/2.1.0/net/protocol.rb:152:in `rbuf_fill'
        from /usr/local/rvm/rubies/ruby-2.1.4/lib/ruby/2.1.0/net/protocol.rb:134:in `readuntil'
        from /usr/local/rvm/rubies/ruby-2.1.4/lib/ruby/2.1.0/net/protocol.rb:144:in `readline'
        from /usr/local/rvm/rubies/ruby-2.1.4/lib/ruby/2.1.0/net/http/response.rb:39:in `read_status_line'
        from /usr/local/rvm/rubies/ruby-2.1.4/lib/ruby/2.1.0/net/http/response.rb:28:in `read_new'
        from /usr/local/rvm/rubies/ruby-2.1.4/lib/ruby/2.1.0/net/http.rb:1408:in `block in transport_request'
        from /usr/local/rvm/rubies/ruby-2.1.4/lib/ruby/2.1.0/net/http.rb:1405:in `catch'
        from /usr/local/rvm/rubies/ruby-2.1.4/lib/ruby/2.1.0/net/http.rb:1405:in `transport_request'
        from /usr/local/rvm/rubies/ruby-2.1.4/lib/ruby/2.1.0/net/http.rb:1378:in `request'
        from /usr/local/rvm/rubies/ruby-2.1.4/lib/ruby/2.1.0/net/http.rb:1371:in `block in request'
        from /usr/local/rvm/rubies/ruby-2.1.4/lib/ruby/2.1.0/net/http.rb:853:in `start'
        from /usr/local/rvm/rubies/ruby-2.1.4/lib/ruby/2.1.0/net/http.rb:1369:in `request'
        from /usr/local/rvm/gems/ruby-2.1.4/gems/faraday-0.8.11/lib/faraday/adapter/net_http.rb:76:in `perform_request'
        from /usr/local/rvm/gems/ruby-2.1.4/gems/faraday-0.8.11/lib/faraday/adapter/net_http.rb:39:in `call'
        from /usr/local/rvm/gems/ruby-2.1.4/gems/faraday-0.8.11/lib/faraday/request/url_encoded.rb:14:in `call'
        from /usr/local/rvm/gems/ruby-2.1.4/gems/google-api-client-0.6.4/lib/google/api_client/request.rb:159:in `send'
        from /usr/local/rvm/gems/ruby-2.1.4/gems/google-api-client-0.6.4/lib/google/api_client.rb:563:in `execute'
        from /usr/local/rvm/gems/ruby-2.1.4/gems/arvados-0.1.20160315200920/lib/arvados.rb:207:in `api_exec'
        from /usr/local/rvm/gems/ruby-2.1.4/gems/arvados-0.1.20160315200920/lib/arvados.rb:83:in `block (4 levels) in initialize'
        from ./test.rb:7:in `<main>'
nico@shell:~$ ./test.rb 
Time elapsed 15450.9472 milliseconds
nico@shell:~$ time ./test.rb                                                                                                                                                
Time elapsed 111464.593272 milliseconds

real    1m51.765s
user    0m0.524s
sys     0m0.176s
nico@shell:~$ time ./test.rb 
Time elapsed 15467.253309 milliseconds

real    0m15.989s
user    0m0.568s
sys     0m0.348s
nico@shell:~$ time ./test.rb 
Time elapsed 13511.453208 milliseconds

real    0m13.801s
user    0m0.516s
sys     0m0.036s

#4 Updated by Nico César over 5 years ago

timeout is 2 minutes, here is the proof

nico@shell:~$ cat ./test.rb 
#!/usr/bin/env ruby

require 'arvados'

beginning_time = Time.now
arv = Arvados.new
permissions = arv.repository.get_all_permissions
end_time = Time.now
puts "Time elapsed #{(end_time - beginning_time)*1000} milliseconds" 

nico@shell:~$ time ./test.rb 
/usr/local/rvm/rubies/ruby-2.1.4/lib/ruby/2.1.0/net/protocol.rb:158:in `rescue in rbuf_fill': Net::ReadTimeout (Faraday::Error::TimeoutError)
        from /usr/local/rvm/rubies/ruby-2.1.4/lib/ruby/2.1.0/net/protocol.rb:152:in `rbuf_fill'
        from /usr/local/rvm/rubies/ruby-2.1.4/lib/ruby/2.1.0/net/protocol.rb:134:in `readuntil'
        from /usr/local/rvm/rubies/ruby-2.1.4/lib/ruby/2.1.0/net/protocol.rb:144:in `readline'
        from /usr/local/rvm/rubies/ruby-2.1.4/lib/ruby/2.1.0/net/http/response.rb:39:in `read_status_line'
        from /usr/local/rvm/rubies/ruby-2.1.4/lib/ruby/2.1.0/net/http/response.rb:28:in `read_new'
        from /usr/local/rvm/rubies/ruby-2.1.4/lib/ruby/2.1.0/net/http.rb:1408:in `block in transport_request'
        from /usr/local/rvm/rubies/ruby-2.1.4/lib/ruby/2.1.0/net/http.rb:1405:in `catch'
        from /usr/local/rvm/rubies/ruby-2.1.4/lib/ruby/2.1.0/net/http.rb:1405:in `transport_request'
        from /usr/local/rvm/rubies/ruby-2.1.4/lib/ruby/2.1.0/net/http.rb:1378:in `request'
        from /usr/local/rvm/rubies/ruby-2.1.4/lib/ruby/2.1.0/net/http.rb:1371:in `block in request'
        from /usr/local/rvm/rubies/ruby-2.1.4/lib/ruby/2.1.0/net/http.rb:853:in `start'
        from /usr/local/rvm/rubies/ruby-2.1.4/lib/ruby/2.1.0/net/http.rb:1369:in `request'
        from /usr/local/rvm/gems/ruby-2.1.4/gems/faraday-0.8.11/lib/faraday/adapter/net_http.rb:76:in `perform_request'
        from /usr/local/rvm/gems/ruby-2.1.4/gems/faraday-0.8.11/lib/faraday/adapter/net_http.rb:39:in `call'
        from /usr/local/rvm/gems/ruby-2.1.4/gems/faraday-0.8.11/lib/faraday/request/url_encoded.rb:14:in `call'
        from /usr/local/rvm/gems/ruby-2.1.4/gems/google-api-client-0.6.4/lib/google/api_client/request.rb:159:in `send'
        from /usr/local/rvm/gems/ruby-2.1.4/gems/google-api-client-0.6.4/lib/google/api_client.rb:563:in `execute'
        from /usr/local/rvm/gems/ruby-2.1.4/gems/arvados-0.1.20160315200920/lib/arvados.rb:207:in `api_exec'
        from /usr/local/rvm/gems/ruby-2.1.4/gems/arvados-0.1.20160315200920/lib/arvados.rb:83:in `block (4 levels) in initialize'
        from ./test.rb:7:in `<main>'

real    2m0.688s
user    0m0.456s
sys     0m0.084s

#5 Updated by Brett Smith over 5 years ago

  • Project changed from OPS to Arvados
  • Subject changed from [API] Error: Net::ReadTimeout on arv.repository.get_all_permissions to [API] repositories/get_all_permissions method is too slow
  • Category set to API

#6 Updated by Brett Smith over 5 years ago

  • Target version set to Arvados Future Sprints

#7 Updated by Brett Smith over 5 years ago

There are a lot of contributing factors here. Given the wide timing variation shown in the notes, one idea is that the response time is greatly affected by whether or not particular database results are cached.

The #1 culprit is the permissions graph. We completely flush that whenever it's affected, and then the next API query has to rebuild the graph completely.

Addressing this performance issue would help a lot of things, not just this individual call. We'd like to start there and see whether it's sufficient to solve this issue.

There was a philosophical discussion about whether our permissions model (and by extension, API generally) should move to a sort of eventual consistency model, where the permissions graph gets rebuilt in the background without that work blocking the handling of any individual API query.

#8 Updated by Nico César almost 5 years ago

see #10917 looks like we've go an order of magnitude of improvement

#9 Updated by Nico César almost 5 years ago

  • Status changed from New to Resolved

This got resolved with the last deploy no more Timeout alerts in my inbox

#10 Updated by Tom Clegg over 4 years ago

  • Target version deleted (Arvados Future Sprints)

Also available in: Atom PDF