Arvados - Idea #3491: [Keep] Support transparent compression of blocks in Keep
https://dev.arvados.org/issues/3491?journal_id=13159 | 2014-08-06T09:03:35Z | Peter Amstutz (peter.amstutz@curii.com)
<ul><li><strong>Subject</strong> changed from <i>Support transparent compression of blocks in Keep</i> to <i>[Keep] Support transparent compression of blocks in Keep</i></li><li><strong>Description</strong> updated</li><li><strong>Category</strong> set to <i>Keep</i></li></ul>
https://dev.arvados.org/issues/3491?journal_id=13402 | 2014-08-08T14:40:06Z | Tom Clegg (tom@curii.com)
<ul><li><strong>Target version</strong> set to <i>Deferred</i></li></ul>
https://dev.arvados.org/issues/3491?journal_id=83109 | 2020-03-16T21:57:27Z | Stanislaw Adaszewski
<p>I would let the user decide whether blocks should be compressed or raw, but this is definitely a great feature with potential for a lot of space savings. As a private individual, I would like this feature. I would implement it slightly differently, though: use the checksum of the compressed data as the block address, so there would be no need to decompress to verify the checksum and then re-compress to send. The only remaining piece would be for the FUSE driver to decompress blocks marked as gzip-compressed on the fly. If algorithms other than gzip were an option, there are compression schemes designed to decompress much faster, e.g. WKdm, which is used for memory compression on Mac OS. This would perhaps be less convenient insofar as HTTP doesn't support it as a content encoding, but it is much faster.</p>
https://dev.arvados.org/issues/3491?journal_id=83111 | 2020-03-17T14:05:23Z | Peter Amstutz (peter.amstutz@curii.com)
<p>The main reason this hasn't been a priority is that many file formats already have domain-specific compression, such as BAM and various compressed image formats. Trying to compress already-compressed files is counterproductive: at best it is a waste of time, and at worst the result is larger than if you had left it alone. It also turns out that at gigabit+ transfer speeds, involving the CPU in compression/decompression can be a huge bottleneck compared to just sending the data uncompressed (for typical compression ratios).</p>
https://dev.arvados.org/issues/3491?journal_id=83113 | 2020-03-17T15:45:48Z | Stanislaw Adaszewski
<p>Thank you for your reply. This makes sense. However, I recently unpacked UniRef30, for example, and it jumped from 42 GB compressed to 162 GB uncompressed. It would be neat to have compression as a user-controlled option. Some brainstorming on this could be worthwhile, as I encounter this kind of ratio pretty often.</p>
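<p>To make the ideas in this thread concrete, here is a minimal Python sketch, not Keep's actual implementation. It assumes a user-controlled <code>compress</code> flag, addresses each block by the MD5 of the bytes actually stored (compressed or raw), and falls back to raw storage when gzip does not save enough space. The names <code>store_block</code>, <code>read_block</code> and <code>min_saving</code>, the 10% threshold, and the separate <code>encoding</code> marker are all hypothetical; the <code>hash+size</code> locator only imitates the general shape of a Keep locator.</p>

```python
import gzip
import hashlib
import os


def store_block(data: bytes, compress: bool = True, min_saving: float = 0.1):
    """Return (locator, payload, encoding) for one block.

    Hypothetical helper: the locator is derived from the stored payload,
    so a server could verify it without decompressing anything.
    """
    payload, encoding = data, "raw"
    if compress:
        # mtime=0 keeps the gzip header free of the current timestamp,
        # so the payload hash does not change from one call to the next.
        candidate = gzip.compress(data, mtime=0)
        # Keep the compressed form only if it saves at least min_saving;
        # already-compressed input (BAM, JPEG, ...) usually fails this test.
        if len(candidate) <= len(data) * (1 - min_saving):
            payload, encoding = candidate, "gzip"
    locator = f"{hashlib.md5(payload).hexdigest()}+{len(payload)}"
    return locator, payload, encoding


def read_block(locator: str, payload: bytes, encoding: str) -> bytes:
    """Verify the stored payload against its locator, then decode it."""
    digest, _, size = locator.partition("+")
    if hashlib.md5(payload).hexdigest() != digest or len(payload) != int(size):
        raise ValueError("block does not match its locator")
    return gzip.decompress(payload) if encoding == "gzip" else payload


# Repetitive text compresses well; random bytes stand in for data that is
# already compressed and should be stored as-is.
text = b"ACGT" * 100_000
noise = os.urandom(100_000)
for name, block in (("text", text), ("noise", noise)):
    locator, payload, encoding = store_block(block)
    assert read_block(locator, payload, encoding) == block
    print(name, encoding, len(block), "->", len(payload))
```

<p>One consequence of this design choice: because the address is the hash of the compressed bytes, the same content stored raw and stored gzip-compressed gets two different locators, so deduplication across the two forms is lost.</p>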