Tape compression

Hello friends,

We are trying to work out the tape compression ratios in our deployment. I did some calculations using cta-admin that make sense to me, but the compression ratio I get is very low. Could someone please have a look and verify that this is indeed the correct way to work out compression? We have 6 TB tapes.

cta-admin --json ta ls -a | jq  -rc '.[] | select(.occupancy|tonumber > 6000000000000) | .occupancy' | awk '{sum+=$1}END{print sum/NR}'

6,009,652,993,055

6,009,652,993,055 / 6,000,000,000,000 = 1.001608832175956

This gives us an average compression ratio of 1:1.001608832175956, which is very low. The specs on the drives suggest 1:2.5 at best, but 1:1.0016 still seems quite low.
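
For anyone who wants to repeat the arithmetic above per tape rather than as a single average, here is a minimal Python sketch. It assumes records shaped like the JSON that the jq filter reads (an `occupancy` field that may be a string) and the 6 TB nominal capacity mentioned in the post. The sample records and the function name `compression_ratios` are hypothetical, and whether `occupancy` is the right quantity to divide is exactly the question being asked here.

```python
import json

# Nominal capacity of a 6 TB tape, as stated in the post (assumption: decimal TB).
NOMINAL_CAPACITY_BYTES = 6_000_000_000_000


def compression_ratios(tapes, capacity=NOMINAL_CAPACITY_BYTES):
    """Return occupancy / capacity for each tape record."""
    # int() accepts both "6009652993055" and 6009652993055.
    return [int(t["occupancy"]) / capacity for t in tapes]


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


# Hypothetical records; real input would come from `cta-admin --json ta ls -a`.
sample = json.loads(
    '[{"occupancy": "6009652993055"}, {"occupancy": "6010000000000"}]'
)
ratios = compression_ratios(sample)
print(f"per-tape ratios: {[round(r, 6) for r in ratios]}")
print(f"mean ratio: {mean(ratios):.6f}")
```

Note that the jq filter only keeps tapes whose occupancy is above 6,000,000,000,000, so an average computed this way can never come out below 1.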

Our data is restic pack files, which are deduplicated but not compressed.

Thank you friends :slight_smile:

Hi Denis,

my comment would be that deduplication (which removes duplicate blocks) is in fact a particular form of compression / data reduction. So compressing the data further might be difficult.

What ratio do you get if you take the deduplicated data and run gzip on it?
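
A rough way to try this, sketched in Python with the standard `gzip` module. This is only an illustration: gzip is not the algorithm the tape drive uses, the function name `gzip_ratio` is made up for the example, and the random bytes are a stand-in for data that has already been reduced (restic pack files are also encrypted), not real pack files.

```python
import gzip
import os


def gzip_ratio(data: bytes) -> float:
    """Return original size divided by gzip-compressed size."""
    if not data:
        raise ValueError("need at least one byte of input")
    return len(data) / len(gzip.compress(data))


# Highly repetitive input compresses very well.
repetitive = b"restic" * 100_000
# Random bytes stand in for already-reduced data and stay close to 1:1.
random_like = os.urandom(600_000)

print(f"repetitive: {gzip_ratio(repetitive):.1f}:1")
print(f"random:     {gzip_ratio(random_like):.3f}:1")
```

To test real data, read a pack file with `open(path, "rb").read()` and pass the bytes to `gzip_ratio`. A result near 1:1 would be consistent with the ratio seen on tape.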

Vladimir

Denis,

see this page:

where they say:

Size Reduction Rate

Compression claims to reduce data size to the ratio of 2:1 up to 2.5:1, as claimed by some programs based on the available data file types.

With deduplication, though, the data is altered substantially. Reduction rates can range from 4:1 up to 20:1 and with specific data types can even be reduced to 200:1.

Vladimir

Hi Vladimir,

Thanks, that makes sense!

Warm Regards,

Denis