Repacking a tape with corrupt file(s)

We have encountered a tape which has at least one unreadable file. CTA, cta-readtp, and dd all fail to read this file. We can, it seems, skip over it with the positioning commands.

How would we go about repacking this tape onto a new tape without attempting to read the corrupt file? Options I can think of:

  • cta-admin tf rm the corrupt file(s) and do a full repack
  • use cta-readtp to read off all the files and put them into the buffer with the naming scheme repack would use. Then do repack with --norecall

Does CERN have a documented procedure you could share?

Hi Eric,

Normally, we would just submit the tape for repack.

The repack process should try to read every file, skip over the unreadable one(s), and continue reading the others.

Example (see the failed column):

[tape-local@ctaproductionfrontend11 ~]$ cta-admin repack ls
          c.time  repackTime     c.user    vid           tapepool providedFiles totalFiles totalBytes selectedFiles filesToRetrieve filesToArchive failed   status      instance 
2025-04-28 04:53  2d9h34m20s tape-local I70276    backup_ZVAULT_1             0      10063      11.0T         10063             561            465    465  Running ctaproduction 
2025-04-29 07:52  1d6h35m27s tape-local I76243    backup_ZVAULT_1             0      10502      11.5T         10502            4382           4382    498  Running ctaproduction 
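The failed column in output like the above can be pulled out with a few lines of Python. This is a minimal sketch, assuming the plain-text layout shown here (whitespace-separated columns, no spaces inside any value, and c.time printed as two tokens); the function name parse_repack_ls is hypothetical and not part of CTA.

```python
# Column names copied from the header of the `cta-admin repack ls` output above.
COLUMNS = [
    "c.time", "repackTime", "c.user", "vid", "tapepool", "providedFiles",
    "totalFiles", "totalBytes", "selectedFiles", "filesToRetrieve",
    "filesToArchive", "failed", "status", "instance",
]


def parse_repack_ls(text):
    """Parse the plain-text table into a list of dicts, one per repack request."""
    rows = []
    for line in text.splitlines():
        tokens = line.split()
        # Data rows carry one token more than the header, because c.time is
        # printed as "YYYY-MM-DD HH:MM" (two whitespace-separated tokens).
        # The header and the shell prompt line are skipped by this check.
        if len(tokens) != len(COLUMNS) + 1:
            continue
        values = [" ".join(tokens[:2])] + tokens[2:]
        rows.append(dict(zip(COLUMNS, values)))
    return rows


SAMPLE = """\
          c.time  repackTime     c.user    vid           tapepool providedFiles totalFiles totalBytes selectedFiles filesToRetrieve filesToArchive failed   status      instance
2025-04-28 04:53  2d9h34m20s tape-local I70276    backup_ZVAULT_1             0      10063      11.0T         10063             561            465    465  Running ctaproduction
2025-04-29 07:52  1d6h35m27s tape-local I76243    backup_ZVAULT_1             0      10502      11.5T         10502            4382           4382    498  Running ctaproduction
"""

for row in parse_repack_ls(SAMPLE):
    print(row["vid"], row["status"], "failed:", int(row["failed"]))
```

All values are kept as strings so that nothing is lost in parsing; convert the numeric columns with int() where needed.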

This way you move the good files onto another tape as soon as possible.

Have you tried that? Please re-try multiple times (repack rm / repack add) to make sure that the tape is mounted in different drives.

Once you are left with only unreadable files on that tape, come back to this ticket and we will take it from there.

Vladimir

One of the issues we had with this tape was that it also triggered the issue we talked about where the wrong files were being read. In the process of trying to read the broken file, CTA or the tape drive seems to lose track of where it is on the tape, so in a repack nothing past that broken file is going to be readable.

So let’s say we just try to repack the tape. The first 1700 files will be read OK and the rest will fail. What happens to those 1700? Are they written to a new tape or do they stay in the buffer? If they get written to tape, are subsequent repack attempts smart enough not to read them a second time from the bad tape?

If we use tapefile rm to get rid of the one bad file from CTA, I presume it will not be read during the repack attempt and perhaps it will succeed. Or perhaps we will have to get rid of multiple bad files.

Cheers,

Eric

Eric,

Repack should skip bad files and continue until it reaches end of data on tape.

So whatever the drive manages to read successfully will be ready to be migrated to another tape.

You just need to wait until that process finishes, i.e. until the repack status changes from Running to Failed and you are left with fewer files to read.

Then you retry, hopefully on a different tape drive.

So just read those 1700 files with normal repack and wait until they are migrated. Subsequent repacks will not read them again because logically, they will simply already be on another tape.
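The multi-pass approach described above can be sketched as a loop that keeps repacking until a few consecutive passes recover nothing. This is only an illustration: run_pass is a hypothetical callable standing in for one full pass (repack rm / repack add, then waiting for the status to leave Running), and the patience and max_passes parameters are assumptions, not CTA settings.

```python
def repack_until_stalled(run_pass, files_on_tape, patience=2, max_passes=10):
    """Repeat repack passes until the tape is empty or progress stalls.

    run_pass(files_on_tape) is a hypothetical callable that performs one
    repack pass and returns how many files were moved off the tape.
    Returns the number of files left on the tape after each pass.
    """
    history = [files_on_tape]
    stalled = 0  # consecutive passes that moved nothing
    for _ in range(max_passes):
        if files_on_tape == 0 or stalled >= patience:
            break
        moved = run_pass(files_on_tape)
        files_on_tape -= moved
        history.append(files_on_tape)
        stalled = 0 if moved > 0 else stalled + 1
    return history


# Made-up numbers: a 2000-file tape where the first pass moves 1700 files
# and later passes (ideally on different drives) recover nothing more.
moves = iter([1700, 0, 0, 0])
print(repack_until_stalled(lambda remaining: next(moves), 2000))
# -> [2000, 300, 300, 300]
```

Requiring more than one fruitless pass before giving up reflects the advice to retry on different drives; the files still on the tape at that point are the candidates for further investigation.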

Example of a tape where repack is in the Failed state with 1 file remaining (all other files were moved onto other tape(s)):

[tape-local@ctaproductionfrontend11 ~]$ cta-admin repack ls
          c.time repackTime     c.user    vid           tapepool providedFiles totalFiles totalBytes selectedFiles filesToRetrieve filesToArchive failed   status      instance 
2025-05-04 09:19  15h28m27s tape-local I55295         vo_CMS_fam             0      40718       7.2T         40718               1              1      1   Failed ctaproduction 
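A row like the one above can be checked mechanically for the "only unreadable files left" condition. The sketch below assumes that a Failed status together with filesToRetrieve equal to failed means every remaining file has failed to read; that interpretation of the columns is inferred from this thread rather than taken from CTA documentation, and only_unreadable_left is a hypothetical helper.

```python
def only_unreadable_left(row):
    """True when the request has stopped and every file still to retrieve has failed."""
    remaining = int(row["filesToRetrieve"])
    return (
        row["status"] == "Failed"
        and remaining > 0
        and remaining == int(row["failed"])
    )


# Values copied from the two repack ls examples in this thread.
finished = {"vid": "I55295", "status": "Failed", "filesToRetrieve": "1", "failed": "1"}
in_progress = {"vid": "I70276", "status": "Running", "filesToRetrieve": "561", "failed": "465"}

print(only_unreadable_left(finished))     # True
print(only_unreadable_left(in_progress))  # False
```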

Do not get rid of the broken file. Do multiple repack passes first.

Vladimir

In our case that doesn’t work. Maybe you weren’t at the last dev meeting where I described a related issue? We have an Enstore tape (which is why it behaves differently) where once it tries to read a particular file (#1721) and fails, it then tries to skip ahead, say to file 1740. Instead it lands on file 1741, reads it, and discards it because the checksums don’t match. And so on through the rest of the tape.

This is repeatable, so we now have a tape where once we try to read file #1721 it can never read anything else on the tape. That’s why I’m suggesting we need to forget about that one file.
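The failure mode described above can be illustrated with a toy model. It assumes that the failed read shifts every later seek by exactly one file and that positioning works normally if the bad file is never read; both are simplifications of what is reported here, and none of the names below exist in CTA.

```python
def read_tape(tape, catalogue, bad_fseq, wanted):
    """Toy model: return the file sequence numbers that are read back correctly.

    tape:      fseq -> checksum of the data actually stored at that position
    catalogue: fseq -> checksum the catalogue expects for that file
    bad_fseq:  the file whose read attempt derails positioning
    wanted:    file sequence numbers to read, in ascending order
    """
    drift = 0  # how many files past the requested position the drive lands
    recovered = []
    for fseq in wanted:
        if fseq == bad_fseq:
            drift = 1  # assumption: every seek after the failed read is off by one
            continue
        landed = fseq + drift
        if tape.get(landed) == catalogue[fseq]:
            recovered.append(fseq)
        # otherwise the checksum does not match and the data is discarded
    return recovered


tape = {n: f"cksum-{n}" for n in range(1, 11)}  # a made-up 10-file tape
catalogue = dict(tape)
BAD = 5

print(read_tape(tape, catalogue, BAD, range(1, 11)))
# -> [1, 2, 3, 4]
print(read_tape(tape, catalogue, BAD, [n for n in range(1, 11) if n != BAD]))
# -> [1, 2, 3, 4, 6, 7, 8, 9, 10]
```

In this model, leaving the bad file out of the read list is what lets the rest of the tape be recovered, which is the motivation for removing it from the catalogue before repacking.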

Eric,

In this case, do you have access to the procedure for restoring a file from the recycle bin:

Try it on the pre-production setup.

If that works, try to temporarily delete that problematic file (or more files) and re-launch repack to recover remaining files from the tape.

Then, BEFORE you run the cta-admin tape reclaim command, restore the deleted file(s) so that they are not (yet) lost.
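The ordering constraint above (restore before reclaim) can be made explicit with a small bookkeeping helper. This is a sketch only: it does not call CTA, the class name TemporaryDeletionGuard is hypothetical, and the actual delete, restore, and reclaim steps would still be carried out by the operator.

```python
class TemporaryDeletionGuard:
    """Track temporarily deleted files so a reclaim is refused until they are restored."""

    def __init__(self):
        self.deleted = set()  # file sequence numbers currently deleted

    def delete(self, fseq):
        """Record that a file was temporarily deleted before the repack."""
        self.deleted.add(fseq)

    def restore(self, fseq):
        """Record that a previously deleted file was restored."""
        self.deleted.discard(fseq)

    def may_reclaim(self):
        """Reclaim is only safe once nothing is left in the deleted state."""
        return not self.deleted


guard = TemporaryDeletionGuard()
guard.delete(1721)          # remove the problematic file, then repack
print(guard.may_reclaim())  # False: the file has not been restored yet
guard.restore(1721)
print(guard.may_reclaim())  # True
```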

What do you think about this tip?

Vladimir

I think something like this could work, though I wonder whether it will since we don’t have EOS. I don’t think we’re interested in trying to recover this one file, at least. It’s CMS Monte Carlo and unfortunately we had the only copy. But it’s MC.

Thanks for the tips.