CTA feature for moving tape files

Hello,

Our local Facilities (non-Tier-1) environmental data are mainly held in two extremely large legacy tape pools (10.6P and 16.1P). Data co-location is extremely poor, with files from the same dataset spread over several tapes.

We would like to reorganise these data into smaller tape pools and thereby improve their co-location. I don’t think this can be done with the current Repack workflow, as that involves moving all the files from a given tape.

Is there any way to move some files from one tape to another in CTA?

As far as I know, this is not possible (but please do confirm!), so it would be extremely useful if this kind of repack workflow could be added in a future version of CTA.

Thanks,

George

Hello George,

We briefly discussed yesterday how best to reply to your request.

What you are asking for is currently not easy to achieve. Today, the only option is to try this hack:

  1. Select the few files you want to move from tape A to tape B and recall them into the repack instance. Here is an example of where they need to end up:
/usr/bin/xrdcp $recalled_file $EOS_MGM_URL/$EOS_REPACK_DIR/$VID/$fSeq_with_zeros

where at CERN:

EOS_MGM_URL="root://eosctarepack.cern.ch"
EOS_REPACK_DIR="/eos/ctarepack/production"

VID is the tape volume identifier; for fSeq_with_zeros, try something like this:

/usr/bin/cta-admin --json tapefile ls -v "$VID" | /usr/bin/jq --raw-output '.[] | .af.archiveId+" "+.tf.fSeq+" "+.af.size+" "+.af.checksum[0].type+" "+.af.checksum[0].value' | /usr/bin/awk '{if ($1 ~ /^[0-9]+$/) print $2, $3, $4, $5}' | while read -r fSeq size checksum_type checksum_value
do
 # Read one record per line so that the four fields of a file stay together
 # Extract the necessary information from the CTA tape file record
 fSeq_with_zeros=$(echo "$fSeq" | awk '{printf("%09d", $1)}')
 # ... copy the recalled file to $EOS_MGM_URL/$EOS_REPACK_DIR/$VID/$fSeq_with_zeros as shown above ...
done
  2. Once the files are in place on the repack instance, for example:
pcvlado ~ > EOS_MGM_URL=root://eosctarepack eos ls -l /eos/ctarepack/production/I52620 | head -5
-rw-------   1 daemon   daemon     6392670912 Sep 13 09:56 000000147
-rw-------   1 daemon   daemon     4378111115 Sep 13 09:56 000000148
-rw-------   1 daemon   daemon        8140800 Sep 13 09:54 000000187
-rw-------   1 daemon   daemon       39931967 Sep 13 09:57 000000239
-rw-------   1 daemon   daemon     5727365621 Sep 13 09:58 000000316

then you can try to run cta-admin repack add --vid VID --mountpolicy your_repack_mount_policy --no-recall.

The --no-recall option should ensure that there is no recall of any files from the source tape A and only the files which are in the buffer are migrated to the new destination tape B. You should see the number of the files you provided in the providedFiles column of the cta-admin repack ls command output.
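To make the two steps easier to review before touching a real tape, here is a minimal dry-run sketch that only prints the commands of the recipe. The helper names pad_fseq, repack_dest and print_plan are hypothetical (they are not part of CTA), the EOS values are the CERN ones quoted above, and the mount policy name is a placeholder.

```shell
#!/bin/sh
# Dry run: print the commands from the recipe instead of executing them.
# The defaults below are the CERN values quoted above; other sites will differ.
EOS_MGM_URL="${EOS_MGM_URL:-root://eosctarepack.cern.ch}"
EOS_REPACK_DIR="${EOS_REPACK_DIR:-/eos/ctarepack/production}"

# pad_fseq (hypothetical helper): zero-pad an fSeq to nine digits,
# using the same awk printf as in the loop above.
pad_fseq() {
  echo "$1" | awk '{printf("%09d\n", $1)}'
}

# repack_dest (hypothetical helper): destination URL for the file
# with fSeq $2 on tape $1.
repack_dest() {
  printf '%s/%s/%s/%s\n' "$EOS_MGM_URL" "$EOS_REPACK_DIR" "$1" "$(pad_fseq "$2")"
}

# print_plan (hypothetical helper): arguments are the VID, the repack
# mount policy, then the fSeqs of the files to move.
print_plan() {
  plan_vid=$1
  plan_policy=$2
  shift 2
  for plan_fseq in "$@"; do
    echo "/usr/bin/xrdcp <recalled copy of fSeq $plan_fseq> $(repack_dest "$plan_vid" "$plan_fseq")"
  done
  echo "cta-admin repack add --vid $plan_vid --mountpolicy $plan_policy --no-recall"
}

print_plan I52620 your_repack_mount_policy 147 148 187
```

Printing the plan first keeps the sketch side-effect free; the xrdcp source is left as a placeholder because where the recalled copies live depends on how they were recalled.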

Please note that what I just wrote above is a completely untested recipe. Please give it a try and report here how it went.
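Since the recipe is untested, it may be worth checking that every selected file is really present in the repack directory before submitting the repack request. The sketch below compares two plain-text lists of names; missing_from_buffer is a hypothetical helper, and the staged list is assumed to come from something like `eos ls $EOS_REPACK_DIR/$VID` run against the repack instance.

```shell
#!/bin/sh
# missing_from_buffer (hypothetical helper):
#   $1 = file listing the expected zero-padded names, one per line
#   $2 = file listing the names actually present in the repack directory
# Prints every expected name that is absent from the staged list.
missing_from_buffer() {
  # -F fixed strings, -x whole-line match, -v invert, -f patterns from file.
  # grep exits non-zero when nothing is missing, so mask the exit status.
  grep -vxF -f "$2" "$1" || true
}

# Small self-contained demonstration with made-up lists.
tmp=$(mktemp -d)
printf '%s\n' 000000147 000000148 000000187 > "$tmp/expected"
printf '%s\n' 000000147 000000187 > "$tmp/staged"
missing_from_buffer "$tmp/expected" "$tmp/staged"   # prints 000000148
rm -r "$tmp"
```

An empty output means all expected files are staged; anything printed should be copied into place before running the repack with --no-recall.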

In the long run, we plan to provide a solution based on modifying the storage classes. The idea is that you would change the storage class of some files (no production-quality tool to do this exists today). Then, according to the defined archive routes, those files would end up in a different tape pool.

Hope this helps. Best regards,

Vladimir

Hi Vlado,

Thanks so much for your reply, it definitely provides some food for thought! I will think about your recipe and discuss it with the rest of the team.

Best,

George