Feature #13642

keepstore backend for ceph librados

Added by Joshua Randall over 3 years ago. Updated about 1 year ago.

Status:
Feedback
Priority:
Normal
Assigned To:
Category:
Keep
Target version:
-
Start date:
Due date:
% Done:

100%

Estimated time:
Story points:
-

Description

librados provides direct access to a Ceph object store, with the potential for better performance than going through the Ceph object gateway (radosgw), which is what we currently do via the keepstore S3 backend.

We would like a keepstore backend to access a ceph object storage pool directly via librados.

However, it is not enough to simply have a librados backend: librados does not provide any indexing, so we would need another service to provide the index capabilities keepstore requires (block listing, block listing by hash prefix, most recent write timestamp, and size). This could be done by deploying a lightweight NoSQL database (such as Riak KV) alongside keepstore.

History

#1 Updated by Joshua Randall over 3 years ago


Update: it might be enough to use the librados backend after all, as librados does in fact provide object listings as well as sizes and modification times (although these appear to have only second precision, not nanosecond). I am not sure how a "touch" can be realised, but I'll look into that now.

#2 Updated by Joshua Randall over 3 years ago

  • Assigned To set to Joshua Randall

#3 Updated by Joshua Randall over 3 years ago

  • Status changed from New to Feedback
  • % Done changed from 0 to 100

#4 Updated by Joshua Randall over 3 years ago

Packaging-wise this could be problematic, as the go-ceph package links in everything required for librados and librbd, which is substantial:

$ ldd keepstore
        linux-vdso.so.1 =>  (0x00007fff445bc000)
        librados.so.2 => /usr/lib/x86_64-linux-gnu/librados.so.2 (0x00007f176734b000)
        libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f176712e000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f1766d64000)
        libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x00007f1766b5f000)
        libboost_thread.so.1.58.0 => /usr/lib/x86_64-linux-gnu/libboost_thread.so.1.58.0 (0x00007f1766939000)
        libboost_random.so.1.58.0 => /usr/lib/x86_64-linux-gnu/libboost_random.so.1.58.0 (0x00007f1766732000)
        libblkid.so.1 => /lib/x86_64-linux-gnu/libblkid.so.1 (0x00007f17664f0000)
        libnss3.so => /usr/lib/x86_64-linux-gnu/libnss3.so (0x00007f17661a9000)
        libsmime3.so => /usr/lib/x86_64-linux-gnu/libsmime3.so (0x00007f1765f7d000)
        libnspr4.so => /usr/lib/x86_64-linux-gnu/libnspr4.so (0x00007f1765d3d000)
        librt.so.1 => /lib/x86_64-linux-gnu/librt.so.1 (0x00007f1765b35000)
        libboost_iostreams.so.1.58.0 => /usr/lib/x86_64-linux-gnu/libboost_iostreams.so.1.58.0 (0x00007f176591c000)
        libboost_system.so.1.58.0 => /usr/lib/x86_64-linux-gnu/libboost_system.so.1.58.0 (0x00007f1765717000)
        libstdc++.so.6 => /usr/lib/x86_64-linux-gnu/libstdc++.so.6 (0x00007f1765395000)
        libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f176508c000)
        /lib64/ld-linux-x86-64.so.2 (0x000055ad468a3000)
        libgcc_s.so.1 => /lib/x86_64-linux-gnu/libgcc_s.so.1 (0x00007f1764e75000)
        libuuid.so.1 => /lib/x86_64-linux-gnu/libuuid.so.1 (0x00007f1764c70000)
        libnssutil3.so => /usr/lib/x86_64-linux-gnu/libnssutil3.so (0x00007f1764a42000)
        libplc4.so => /usr/lib/x86_64-linux-gnu/libplc4.so (0x00007f176483d000)
        libplds4.so => /usr/lib/x86_64-linux-gnu/libplds4.so (0x00007f1764639000)
        libbz2.so.1.0 => /lib/x86_64-linux-gnu/libbz2.so.1.0 (0x00007f1764428000)
        libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f176420e000)

One possibility would be to use the Go 1.8+ plugin functionality (https://golang.org/pkg/plugin/) and migrate the keepstore volume driver backends to plugins that are loaded when a volume is configured.

That way, you could have a core keepstore package and then additional packages adding support for Azure, S3, rados, etc., without having to worry about whether specific libraries are available.

#5 Updated by Joshua Randall over 3 years ago

I could also use the plugin system only within the rados volume driver if you'd rather it not be a generic system.

#6 Updated by Tom Clegg about 3 years ago

Yes, I think it would be worthwhile to load the rados driver as a plugin. Moving the existing drivers out to plugins too would make them more useful as starting points for new drivers.

We'll have to reorganize the code a bit so the plugins and keepstore itself can share interfaces, tests, and the global config object -- something like
  • lib/keep/ - Volume interface, config struct
  • lib/keepplugin/rados/ - rados driver
  • lib/keeptest/ - Volume interface (current TestableVolume) and generic test suite
  • services/keepstore - main program and (at least for now) built-in volume drivers
Some initial comments on the branch:
  • Can we drop the command line flags? The other drivers only have them for backward compatibility.
  • Compare() allocates its own buffer, bypassing keepstore's buffer-limiting mechanism. Looks like Compare and Get could be refactored to use a common get() method that calls a provided func when it has some bytes available.
  • Looks like the goroutine in Get might write to buf after Get returns in the ctx.Done() case. This can corrupt the buffer after it has been given to a different request/goroutine.
  • The logic in Get might be simplified (including avoiding the labelled loop) by using a defer func(){ ... }() to check the err being returned and call the appropriate radosTracef().
  • You can say time.Second instead of 1 * time.Second... do you know whether "read 0 bytes with no error" really happens? If so, should it have a wait less than 1s? And if not, I wonder about handling this as an internal error instead.
  • I'd like to understand why the Get-vs.-Put locks are needed even though Put seems to rely on write_full's atomicity to avoid exposing partially written blocks.

#7 Updated by Joshua Randall about 1 year ago

Tom - any progress over the past couple of years on refactoring the keepstore volume drivers into a run-time loadable plugin system? I'd still be keen to get this merged at some point.

#8 Updated by Tom Clegg about 1 year ago

Still like this idea, but we haven't done the loadable module work yet, unfortunately.
