It is mentioned that “When an operator repacks a tape, the tape files located on the source tape will be deleted from the TAPE_FILE table and will be put into the recycle-bin.”
I am not sure I fully understand this point, which sounds somewhat scary. I would have thought that the TAPE_FILE entry would be preserved and updated with the VID of the destination tape. Can you please clarify? Presumably the file will remain in the ARCHIVE_FILE table, right?
What is the exact meaning of the --bufferurl option? Is it the top-level EOS directory of the files that will be repacked, or does it specify the repack buffer? Following CERN, we have set up separate default and retrieve EOS spaces. Can we specify one of them as the repack buffer, and how do we do this with --bufferurl?
Entries in TAPE_FILE are not updated; the repacked copy is considered a new tape copy, which has its own VID and fseq.
When a tape is repacked, a new row is added to TAPE_FILE for the new copy. Then the old copy is deleted and its row is removed from TAPE_FILE. The metadata from the old copy is preserved in the recycle bin. The old copy can be “undeleted” until the tape is reclaimed. When the repacked tape is reclaimed, all data it contained is permanently deleted.
No change is made to the ARCHIVE_FILE table during repack operations.
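To make the sequence above concrete, here is a minimal toy model in Python. It is only a sketch of the bookkeeping as described in this thread, not CTA's actual schema or code: the class ToyCatalogue, its method names, and the archive IDs and fseq values are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCatalogue:
    """Hypothetical in-memory stand-in for the three catalogue areas discussed."""
    archive_file: set = field(default_factory=set)    # archive IDs
    tape_file: dict = field(default_factory=dict)     # (vid, fseq) -> archive ID
    recycle_bin: dict = field(default_factory=dict)   # (vid, fseq) -> archive ID

    def archive(self, archive_id, vid, fseq):
        self.archive_file.add(archive_id)
        self.tape_file[(vid, fseq)] = archive_id

    def repack(self, src_vid, dst_vid, first_dst_fseq):
        fseq = first_dst_fseq
        # sorted() builds a list first, so the dict can be modified in the loop
        for key in sorted(k for k in self.tape_file if k[0] == src_vid):
            archive_id = self.tape_file[key]
            # 1. a new row is added for the new copy (new VID, new fseq)
            self.tape_file[(dst_vid, fseq)] = archive_id
            # 2. the old row is removed and its metadata kept in the recycle bin
            self.recycle_bin[key] = self.tape_file.pop(key)
            fseq += 1
        # archive_file is deliberately left untouched

    def reclaim(self, vid):
        # after reclaim the old copies can no longer be "undeleted"
        for key in [k for k in self.recycle_bin if k[0] == vid]:
            del self.recycle_bin[key]


cat = ToyCatalogue()
cat.archive(1001, "CL0091", 1)
cat.archive(1002, "CL0091", 2)
cat.repack("CL0091", "CL0090", first_dst_fseq=571)
print(cat.tape_file)     # only rows for CL0090 remain
print(cat.recycle_bin)   # old CL0091 rows, recoverable until reclaim
cat.reclaim("CL0091")
print(cat.recycle_bin)   # empty
```

The point of the model is that ARCHIVE_FILE never changes, while the TAPE_FILE row is replaced rather than updated.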
The --bufferURL option is for when you want repack operations to use a different disk buffer from the default one. We added this to optimise the use of our hardware resources, by using the spare disk space on tape servers for repack operations. If you omit the option, repack will use the same disk buffer that is used for normal archive/retrieve operations.
It looked like it points to a part of the EOS namespace, but according to your description above, the option resembles the (cta-admin) disksystem’s fileregexp flag (which in our case is ^root://antares.* eos:antares-eos01:retrieve)
bufferURL is a normal root URL just like the example you cited from the configuration file.
It is not a regular expression. It has to be a URL specifying a specific directory on a specific disk instance where you want repack operations to take place.
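As a rough illustration of the difference between a URL and a regular expression here, the sketch below checks that a value has the shape of a root URL with a host and a directory path. The helper name looks_like_buffer_url and the host eos.example.org are hypothetical, and this is not a check that CTA itself performs.

```python
from urllib.parse import urlparse


def looks_like_buffer_url(value: str) -> bool:
    """Return True if value has the shape root://<host>/<directory>."""
    parts = urlparse(value)
    has_directory = parts.path.startswith("/") and len(parts.path) > 1
    return parts.scheme == "root" and bool(parts.hostname) and has_directory


# A plain root URL naming one directory on one disk instance (hypothetical host)
print(looks_like_buffer_url("root://eos.example.org//eos/repack"))
# A regular expression in the style of the fileregexp example is rejected
print(looks_like_buffer_url("^root://antares.*"))
```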
I want to move files within the dteam tape pool which contains three tapes, CL0090, CL0091 and CT4899, from CL0091 to CL0090. For this reason I have set CL0091 and CT4899 as FULL.
If it is the daemon user who needs to perform the archive and retrieve operations, do I need to create a group mount rule and a requester mount rule for daemon?
I was getting xrootd auth errors and “No mount rules: storageClass=dteam requester=eosantaresdev:daemon:daemon” in cta-taped.log when I had set ACLs for the repack dir. When I followed your setup it worked! Thanks so much!!
The 17 files from one tape were moved to the only tape in the pool with a full status set to false. So, is this how we specify the destination tapes?
The repacked tape appears empty
[root@cta-front02 ~]# cta-admin tf ls -v CL0091
archive id copy no vid fseq block id instance disk fxid size checksum type checksum value storage class owner group creation time path
[root@cta-front02 ~]#
but when listing the tapes of the pool, it appears to still have files
vid media type vendor library tapepool vo encryption key name capacity occupancy last fseq full from castor state state reason label drive label time last w drive last w time w mounts last r drive last r time r mounts c.user c.host c.time m.user m.host m.time comment
CL0090 LTO9 IBM asterix_lto dteam dteam - 18.0T 2.2T 570 false false ACTIVE - - - asterix_lto9_02 2022-06-09 16:18 4 asterix_lto9_02 2022-05-23 13:54 8 cta-admin cta-front02.scd.rl.ac.uk 2022-03-11 17:19 cta-admin cta-front02 2022-06-09 14:03 LTO9 media for Asterix
CL0091 LTO9 IBM asterix_lto dteam dteam - 18.0T 3.4G 17 true false ACTIVE - - - asterix_lto9_03 2022-04-26 10:31 26 asterix_lto9_03 2022-06-09 16:14 11 cta-admin cta-front02.scd.rl.ac.uk 2022-03-11 17:20 cta-admin cta-front02 2022-06-09 14:03 LTO9 media for Asterix
CT4899 TS1160 IBM asterix_ts dteam dteam - 20.0T 92.6G 73 true false ACTIVE - - - asterix_ts1160_19 2022-05-23 13:29 2 asterix_ts1160_19 2022-05-23 13:54 1 cta-admin cta-front02 2022-05-18 13:09 cta-admin cta-front02 2022-06-09 14:15 GTF
Do I need to do this? cta-admin tape reclaim -v CL0091
Hi George, yes, you will (like castor) need to do the reclaim before the tape side of things clears its counters and makes the tape available for use again.
Yes, you need to reclaim the tape to logically delete the files. The files are still recoverable from the recycle bin until the reclaim is done. Reclaim will delete the files from the recycle bin.
Physically, the files are still on the tape until you start writing again from the beginning (e.g. label the tape).
Sorry, I mean that we set the FULL flag to true for all tapes that we want to repack (i.e. the source tapes) and set the FULL flag to false for the tapes that will receive the repacked files (i.e. the destination tapes).
Is this assumption correct?
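The convention described above (source tapes FULL=true, destination tapes FULL=false) can be sketched as a simple filter over the tape listing. This only models the behaviour observed in this thread, not CTA's scheduler; the function name candidate_destinations and the dictionary keys are hypothetical, and the values are taken from the dteam listing shown earlier.

```python
def candidate_destinations(tapes, pool):
    """Return VIDs in the pool that are ACTIVE and not flagged FULL."""
    return [
        t["vid"]
        for t in tapes
        if t["tapepool"] == pool and t["state"] == "ACTIVE" and not t["full"]
    ]


# Values copied from the tape listing for the dteam pool
dteam_tapes = [
    {"vid": "CL0090", "tapepool": "dteam", "full": False, "state": "ACTIVE"},
    {"vid": "CL0091", "tapepool": "dteam", "full": True, "state": "ACTIVE"},
    {"vid": "CT4899", "tapepool": "dteam", "full": True, "state": "ACTIVE"},
]

print(candidate_destinations(dteam_tapes, "dteam"))  # ['CL0090']
```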
Another thing I forgot to ask: how do you repack from one tape pool to another? Do you change the storage class of the files to be repacked and then CTA will do the rest?
We are currently in the process of adding a new REPACKING tape state which will be used to explicitly manage repacking tapes. It is not finished yet, but this part of CTA will change in an upcoming release.
Repacking from one tape pool to another could be accomplished by changing the storage class of the files. However, we do not yet have a tool to do this. Currently the storage class in EOS is only used when the file is first created. The SC is saved in the catalogue database and we did not create a tool to change it.
I changed the storage class in EOS (setting the sys.archive.storage_class attribute) of a subset of files on a tape, but that was not enough to repack these files to a different pool (i.e. the files were repacked in the same tape pool). From what you said, I only now realise that this happened because the storage class of these files had not been changed in the CTA DB. Is my understanding correct?
Thanks for this. I did manage to repack a subset of files on tape to a different tape pool by changing the storage class of this subset of files in the DB (I did this on our dev instance).
For some reason, I see "nbMasterFiles":"0" (in the output of cta-admin --json tape ls --vid <VID>), but I did manage to get file counts per pool via a DB query.
If you see "nbMasterFiles":"0" for a tape that is not empty, this suggests that you should run cta-statistics-update.
In previous versions of CTA, cta-statistics-update was the only way to update this value. Now it gets updated on-the-fly as files are written, but any tape that was written with the older version of CTA will not have valid statistics until you run cta-statistics-update.
In any case, you need to run it periodically to take account of deleted files.
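A small sketch of how one might spot tapes whose statistics look stale from the JSON output of cta-admin --json tape ls. The field nbMasterFiles appears in this thread; the fields vid and lastFseq, the list-of-objects layout, and the sample string are assumptions for illustration. Note that a tape that has been repacked but not yet reclaimed may legitimately show zero files with a non-zero last fseq, so treat the result as a hint only.

```python
import json


def tapes_needing_statistics_update(tape_ls_json: str) -> list[str]:
    """Return VIDs that report zero master files but have a non-zero last fseq."""
    stale = []
    for tape in json.loads(tape_ls_json):
        # Numbers are quoted in the output shown in this thread, so convert them
        nb_files = int(tape.get("nbMasterFiles", 0))
        last_fseq = int(tape.get("lastFseq", 0))  # assumed field name
        if nb_files == 0 and last_fseq > 0:
            stale.append(tape["vid"])
    return stale


# Hypothetical sample in the assumed layout
sample = '[{"vid": "CL0090", "lastFseq": "570", "nbMasterFiles": "0"}]'
print(tapes_needing_statistics_update(sample))  # ['CL0090']
```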