CTA feature for moving tape files

Hello,

Our local Facilities (non-Tier-1) environmental data are mainly held in two extremely large legacy tape pools (10.6P and 16.1P). The data co-location is extremely poor, with files from the same datasets spread over several tapes.

We would like to re-organise these data into smaller tape pools and thereby improve their co-location. I don’t think this can be done with the current Repack workflow, as that involves moving all the files from a given tape.

Is there any way to move some files from one tape to another in CTA?

As far as I know, this is not possible (but please do confirm!), so it would be extremely useful if this kind of repack workflow could be added in a future version of CTA.

Thanks,

George

Hello George,

We briefly discussed yesterday how best to reply to your request.

What you are asking for is currently not easy to achieve. Today, there is only this hack to try:

  1. Select a few files you want to move from tape A to tape B and recall them into the repack instance. Here is an example of where they need to end up:
/usr/bin/xrdcp $recalled_file $EOS_MGM_URL/$EOS_REPACK_DIR/$VID/$fSeq_with_zeros

where at CERN:

EOS_MGM_URL="root://eosctarepack.cern.ch"
EOS_REPACK_DIR="/eos/ctarepack/production"

VID is the tape volume identifier, and for the fSeq_with_zeros try something like this:

IFS=$'\n'   # iterate over whole records, not over space-separated words
for i in $(/usr/bin/cta-admin --json tapefile ls -v $VID | /usr/bin/jq --raw-output ' .[] | .af.archiveId+" "+.tf.fSeq+" "+.af.size+" "+.af.checksum[0].type+" "+.af.checksum[0].value' | /usr/bin/awk '{if ($1 ~ /^[0-9]+$/) print $2, $3, $4, $5}')
do
 # Extract the necessary information from the CTA tape file record
 fSeq=$(echo "$i" | awk '{print $1}')
 fSeq_with_zeros=$(echo "$i" | awk '{printf("%09d", $1)}')
 # ... copy the recalled file to its destination here ...
done

  2. Once the files are in place on the repack instance, for example:
pcvlado ~ > EOS_MGM_URL=root://eosctarepack eos ls -l /eos/ctarepack/production/I52620 | head -5
-rw-------   1 daemon   daemon     6392670912 Sep 13 09:56 000000147
-rw-------   1 daemon   daemon     4378111115 Sep 13 09:56 000000148
-rw-------   1 daemon   daemon        8140800 Sep 13 09:54 000000187
-rw-------   1 daemon   daemon       39931967 Sep 13 09:57 000000239
-rw-------   1 daemon   daemon     5727365621 Sep 13 09:58 000000316

then you can try to run cta-admin repack add --vid VID --mountpolicy your_repack_mount_policy --no-recall.

The --no-recall option should ensure that no files are recalled from the source tape A and that only the files already in the buffer are migrated to the new destination tape B. The number of files you provided should appear in the providedFiles column of the cta-admin repack ls command output.

Please note that what I just wrote above is a completely untested recipe. Please give it a try and report here how it went.
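
To make the naming convention in step 1 concrete, here is a minimal sketch of how the destination for one recalled file could be built. It assumes the CERN values quoted above for EOS_MGM_URL and EOS_REPACK_DIR; the helper name repack_target_url and the recalled file path are hypothetical, and the copy command is only printed, not run.

```shell
#!/usr/bin/env bash
# Sketch only: build the repack-buffer destination for one recalled file.
# These are the CERN values quoted in the thread; adjust them for your site.
EOS_MGM_URL="root://eosctarepack.cern.ch"
EOS_REPACK_DIR="/eos/ctarepack/production"

# repack_target_url VID FSEQ  (hypothetical helper, not part of CTA)
repack_target_url() {
  local vid="$1" fseq="$2"
  # 10# forces base 10, so an already padded value such as 000000089
  # is not misread as an octal number by the shell.
  printf '%s/%s/%s/%09d\n' "$EOS_MGM_URL" "$EOS_REPACK_DIR" "$vid" "$((10#$fseq))"
}

# Dry run: print the copy command instead of executing it.
recalled_file="/path/to/recalled/file"   # placeholder, not a real path
echo /usr/bin/xrdcp "$recalled_file" "$(repack_target_url I52620 147)"
```

The double slash after the host name is what the `$EOS_MGM_URL/$EOS_REPACK_DIR` concatenation in the recipe produces as well, since EOS_REPACK_DIR is an absolute path.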

In the long run, we plan to provide a solution based on modifying the storage classes. The idea is that you would change the storage class of some files (no production-quality tool to do this exists today). Then, according to the defined archive routes, those files would end up in a different tape pool.

Hope this helps. Best regards,

Vladimir

Hi Vlado,

Thanks so much for your reply - it definitely provides some food for thought! I will think about your recipe and discuss it with the rest of the team.

Best,

George

Hi Vlado,

I tried your recipe and it worked as expected.

  1. Recalled 5 files from tape (landed on the retrieve buffer)

  2. Copied the 5 recalled files over to the repack buffer - in my case, in the same EOS instance - with the appropriate target names (padded fSeq numbers from the source tape). The repack dir was owned by a non-daemon user

  3. Changed the owner of the repack dir to daemon:daemon

  4. Ran cta-admin repack add … --no-recall
    which showed, as you said, providedFiles=5

  5. Only an ArchiveForRepack mount was seen

  6. After the completion of the repack, the files were deleted from the source tape and the repack buffer by CTA

  7. Recalled the 5 files successfully from the new target tape
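
For anyone repeating this test, steps 3 and 4 could be sketched as a dry run that only prints the commands. The cta-admin options are the ones quoted earlier in the thread; the helper name print_repack_plan is hypothetical, and the exact eos chown invocation is an assumption, since it is not given above.

```shell
#!/usr/bin/env bash
# Sketch only: print (do not run) the commands that follow the copy step.

# print_repack_plan VID MOUNT_POLICY REPACK_DIR  (hypothetical helper)
print_repack_plan() {
  local vid="$1" policy="$2" repack_dir="$3"
  # Step 3: hand the per-tape repack directory to daemon:daemon.
  # Assumption: the thread does not show the exact chown command used.
  echo "eos chown -r daemon:daemon ${repack_dir}/${vid}"
  # Step 4: queue the repack without recalling from the source tape.
  echo "cta-admin repack add --vid ${vid} --mountpolicy ${policy} --no-recall"
  # Then check that providedFiles matches the number of files copied.
  echo "cta-admin repack ls"
}

print_repack_plan I52620 your_repack_mount_policy /eos/ctarepack/production
```

Printing the commands first makes it easy to review the VID and mount policy before anything touches the source tape.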

I don't think I would try this recipe with production files.

Your suggestion of changing the storage class sounds more straightforward and much less error-prone to me (this is how we used to do repacks from one pool to another in CASTOR). I know there is no production-quality tool available, but changing the class directly in the DB with SQL Developer or similar shouldn't be too difficult.

Best,

George