Support #6357


Added by Brett Smith almost 6 years ago. Updated almost 6 years ago.

Assigned To:
Target version:
Start date:
Due date:
% Done:


Estimated time:
(Total: 0.00 h)
Story points:


[Tom, Radhika and I thought it might make sense for you to take this up as part of the user support story. It's at least tangentially related. If that's an issue, let us know and we'll figure out a plan B.]

Abram has written up, which is basically a smarter version of In particular, it can query different clusters to see find one that has a collection with a specific content address, and redirect the user there. It caches these results.

Right now is just an Apache RewriteRule. Assume that we want to replace that with, to get support for content addresses (and in the future, other kinds of redirects). Please review it for suitability to that end, especially around scalability and performance. Of course, anything in our usual review criteria is appropriate—it's possible or even likely that engineering will also work on this software in the future.


Task #6404: ReviewResolvedTom Clegg


#1 Updated by Brett Smith almost 6 years ago

  • Description updated (diff)
  • Assigned To set to Tom Clegg

#2 Updated by Brett Smith almost 6 years ago

One comment I already made just based on a functional review is that the size of the cache needs to be limited by configuration, so it doesn't grow to consume all RAM on the host.

#3 Updated by Tom Clegg almost 6 years ago

adopt? replace

This service implicitly offers to forever do Y when presented with link X, but the X→Y mapping isn't defined except by source code and (more likely) experimentation, and hasn't been discussed/agreed. We have some choices:
  • Accept that some links will break
  • Offer long term support for multiple short-url services with different (overlapping) behavior

IMO we should not add behavior to that we aren't prepared to commit to, and given that we haven't discussed behavior, we're not prepared to commit to it.

I'm not taking a position that anything here is incorrect -- just saying I don't think we should adopt the long term maintenance responsibility without giving ourselves a chance to think it through and make a good decision.


I'd like to know what the intended behavior is. I can read the source code and assume it's doing what was intended, but it would be better to have it written explicitly. That way we can see which behaviors are promised, which are undefined, and which are bugs.

PDH redirects should be 302, not 301.


When a browser shows up at this service, it would be better to implement everything (except the list of public servers1) entirely on the client side. There might be some todo's but I think this architecture has a better future than a server-side cache: for example, it could also find objects on local installs which are inaccessible to

(That doesn't mean this service is useless -- e.g., "wget" won't be able to use our fancy javascript -- but it'll never be as good as a client-side implementation, in the browser case.)

1 ...and that too, when we decide on a supported way to get that list

code review / suggestions

Should have test cases.

Should have access logs.

Bunch of recommendations at

Prefer command line flags over JSON config file.

Accept "address" (e.g., ":8000") instead of accepting just a port and prepending ":". (It's free to let the caller specify a listening IP, so why not...)

Use var x = regexp.MustCompile(...), instead of doing all that stuff in init().

Use RWMutex instead of channel, and guard readers too (using RLock()), not just writers.

Seems like write_cache_pdh() and write_cache_uuid() should just be one function that takes a map as an argument.

-if x { return true }
-if y { return true }
-return false
+return x || y

Use router to parse paths. Seems like "*filepath" actually represents "{id}/filepath" -- tell the router that, and use ByName("id") and ByName("filepath") instead of re-parsing the url yourself in the handler.

Make a debugLog function that does if g_verbose { log.Printf ... }. Add a -verbose or -debug flag to enable it.

Query API directly, instead of Workbench (and tell it you don't want the manifest_text). "GET {workbench}/collection" from workbench involves a lot of excess work. (We should streamline the way you get an anonymous token -- currently this would have to be in's config.)

#4 Updated by Tom Clegg almost 6 years ago

  • Status changed from New to Feedback

#5 Updated by Tom Clegg almost 6 years ago

  • Status changed from Feedback to Resolved

Also available in: Atom PDF