Compression and tape capacity management

jcasals · 13 February 2023 16:14

Dear team,

Some days ago we started a testing period, since we could use an extra drive in our IBM library.

Today we wanted to fill a tape completely, but since we didn’t want to generate 12TB of data just to do that, we defined a mediatype with a capacity of only 20GB, assigned it to a tape and tried to fill it up. We couldn’t: it just kept writing past the defined capacity.
We then looked in the forum and found this thread, where Mwai seemed to have the same problem: Tape occupancy exceeds capacity defined

We have since seen that some tapes, depending on how well the data compresses, may hold up to three times their nominal capacity.

So, we have two (three) questions:

1a. Is it possible to define a size that fakes the size of the tapes so we don’t have to really fill them up?
1b. If it isn’t, then what is the size defined in the mediatype for? Even if we set it to 12TB (for our LTO8, for example), it will write a little bit more if there is room on the tape.
2. Is there a way to disable data compression, so that we know exactly how much space we have written, how much we have in total, and how much we have left?

We know that compression lets us write almost double per tape, but since the gain depends on the type of data we are writing and is not the same for every tape, it makes tape and space management a bit difficult.

Sorry if these questions have already been asked; I looked through the existing threads, and nothing really cleared up our doubts.

Cheers,

Jordi

jleduc · 13 February 2023 17:21

Hi Jordi,
cta-taped writes to tape until it reaches the end of tape (as reported by the tape drive), so that we maximize the amount of data written on each tape. I know that you can configure the drive to stop at the logical end of tape, so that all tapes of a specific mediatype stop at the same place. But compression is orthogonal to this end-of-tape concept.
Let me try to answer your questions in order:
1a. No, you will need to fill the tape. Alternatively, you can put this tape in a dedicated tapepool (one per tape you want to partially fill), set the archive route for 1 specific folder to this storage class → tapepool with your 1 tape, and write just the amount of data you want on this tape into that directory. When you are done, you can set the tape to full (cta-admin tape ch) so that no more data goes to it.
1b. For service planning we just sum the tapes per mediatype and know that we have X tapes with at least capacity bytes on each of them. This gives us a small free margin for the data we write. In addition, if data is highly compressible then the full archive process is highly inefficient, and the ratio between tape capacity and the amount of data written on tape helps us track these inefficiencies. Compression partially compensates for these inefficiencies, but usually the FC link becomes the bottleneck at this point…
2. Compression is performed on the tape drive during archival: you should have a look at your tape drive configuration (on the library side or at the SCSI level) to see if you can disable compression. But again this is a freebie that gives you some free capacity…
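To make the planning arithmetic from 1b concrete, here is a minimal Python sketch. The tape names, mediatypes and byte counts are made-up illustrations (this is not cta-admin output, and the tuple layout is just an assumption for the example): it sums the defined capacity per mediatype as the guaranteed minimum, and computes the written/capacity ratio per tape to spot highly compressible data.

```python
from collections import defaultdict

TB = 10**12  # decimal terabyte, as used for tape capacities

# Hypothetical inventory: (vid, mediatype, defined capacity, bytes written).
# All values are invented for illustration only.
tapes = [
    ("V00001", "LTO8", 12 * TB, 13 * TB),
    ("V00002", "LTO8", 12 * TB, 30 * TB),
    ("V00003", "LTO7", 6 * TB, 6 * TB + TB // 10),
]


def guaranteed_capacity(inventory):
    """Sum the defined capacity per mediatype: the minimum we can count on."""
    totals = defaultdict(int)
    for _vid, media_type, capacity, _written in inventory:
        totals[media_type] += capacity
    return dict(totals)


def fill_ratio(capacity, written):
    """Bytes written divided by defined capacity; well above 1 means
    the data compressed strongly on the drive."""
    return written / capacity


if __name__ == "__main__":
    for media_type, total in guaranteed_capacity(tapes).items():
        print(f"{media_type}: at least {total // TB} TB")
    for vid, _media_type, capacity, written in tapes:
        print(f"{vid}: ratio {fill_ratio(capacity, written):.2f}")
```

A tape with a ratio far above 1 (like the second hypothetical one here) is a hint that the data going to it is very compressible.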

Alternatively, when I want to benchmark the infrastructure with incompressible data, I just create a dummy incompressible file bigger than the tape drive's internal buffer: something like 4GB should be enough. You can create it by dd’ing from /dev/urandom. Then you just xrdcp the same file over and over to CTA for this tape: the drive will have no way to compress it.
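If you prefer a script over dd, the same idea can be sketched in Python. This is only an illustration: the function names are made up, os.urandom stands in for reading /dev/urandom, and zlib is used purely as a quick sanity check, not as a model of what the drive does internally. A ratio close to 1 means the data is effectively incompressible.

```python
import os
import tempfile
import zlib


def write_random_file(path, size_bytes, chunk_size=1024 * 1024):
    """Write size_bytes of random data to path, one chunk at a time."""
    remaining = size_bytes
    with open(path, "wb") as f:
        while remaining > 0:
            n = min(chunk_size, remaining)
            f.write(os.urandom(n))
            remaining -= n


def compression_ratio(path, sample_bytes=1024 * 1024):
    """Original size / zlib-compressed size for the first sample_bytes.
    zlib is only a stand-in check here, not the drive's compression."""
    with open(path, "rb") as f:
        sample = f.read(sample_bytes)
    if not sample:
        raise ValueError("file is empty")
    return len(sample) / len(zlib.compress(sample))


if __name__ == "__main__":
    # Small demo size; for a real test use something like 4GB, as above.
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "dummy.bin")
        write_random_file(path, 8 * 1024 * 1024)
        print(f"zlib ratio: {compression_ratio(path):.3f}")
```

The resulting file is then the one you copy repeatedly with xrdcp into the directory routed to the tape under test.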