Corrupted tape blocks?

Hello,

We are seeing some issues with certain tapes that we are trying to repack. After the tape has been mounted for a repack recall and some files have been “successfully transfered to disk” (according to the cta-taped log), cta-taped crashes. The tape gets unmounted from the drive, and “cta-admin re ls” shows something like 1853 failed (to recall?) files.

Looking at the cta-taped log in some detail, I saw the following:

  • The last file that was successfully read from tape and successfully transferred to disk was the file with fSeq = 29. I can indeed see the file in the repack buffer (/eos/antaresfac/repack/JT0575/000000029). However, this file looks like it is missing(!) from the source tape, although the immediately preceding ones (fSeq 21 to 28) and the following ones (fSeq > 30) are present on tape.

  • Immediately after the “File successfully transfered to disk” log line for this file (/eos/antaresfac/repack//JT0575/000000029), I see the following errors, one after the other, indicating that fSeq=30 cannot be read:

{"epoch_time":1741341577.508140925,"local_time":"2025-03-07T09:59:37+0000","hostname":"getafix-ts08","program":"cta-taped","log_level":"ERROR","pid":"85267","tid":"88019","message":"Error reading a file in TapeReadFileTask","drive_name":"obelix_ts1160_12","instance":"antares","sched_backend":"cephUser","thread":"TapeRead","tapeDrive":"obelix_ts1160_12","tapeVid":"JT0575","mountId":"2785875","vo":"repack","tapePool":"ceda2","mediaType":"3592JE20T","logicalLibrary":"obelix_ts","mountType":"Retrieve","labelFormat":"0000","vendor":"IBM","capacityInBytes":20000000000000,"fileId":8331198,"BlockId":1172873,"fSeq":30,"dstURL":"root://antares-eos15.scd.rl.ac.uk//eos/antaresfac/repack//JT0575/000000030?oss.asize=10630323586","isRepack":true,"isVerifyOnly":false,"fileBlock":1777,"ErrorMessage":"In DriveGeneric::readBlock: Failed ST read Errno=5: Input/output error"}
{"epoch_time":1741341577.511443774,"local_time":"2025-03-07T09:59:37+0000","hostname":"getafix-ts08","program":"cta-taped","log_level":"ERROR","pid":"85267","tid":"88029","message":"In DriveGeneric::readBlock: Failed ST read Errno=5: Input/output error","drive_name":"obelix_ts1160_12","instance":"antares","sched_backend":"cephUser","thread":"DiskWrite","tapeDrive":"obelix_ts1160_12","tapeVid":"JT0575","mountId":"2785875","vo":"repack","tapePool":"ceda2","threadCount":10,"threadID":9,"fileId":8331198,"dstURL":"root://antares-eos15.scd.rl.ac.uk//eos/antaresfac/repack//JT0575/000000030?oss.asize=10630323586","fSeq":30,"actualURL":"root://antares-eos21.scd.rl.ac.uk:1095//eos/antaresfac/repack//JT0575/000000030?cap.msg=t7jvDdNbHeup4zeyTJZxUWqADL0BZq7FZznkmIEBPWAODI9cxhC9h/ldAPhgzvSzVKH/1FMPIb1Byn3SYloNqQrg3Ud5XoKCw13MUvzHMG4YV9Qk0zATNYdIb0nHEUzROsDYNKR6zZfzHD/LGlBe6BCELC1Wuj01bT0ZXOHgwM00taR9xTfERCtE32KXphh9WnuSMXasONe02053AvRvDH/WjjYqYiz1VmMLXGe2RI2TwSVbuaDk6MpfALCzkx6yQigNjwI1VQp0fGg1AXPvzdhz3zEY8XA6ehhIUdr0vY55FEyKdksSgY5zzLk51fTl293r47VJXwJQz5wXgGdaKlaLvvKShDsi87eXUDPevxXyqlrrKyn07dCXu/tVH1GPkEqLyga8QSNPR/Ij6SaNznHj/dObqiRGT0IOK9bZrxj7idjc79OJZn7m3B80AzU9TDi5UXBZjMFM58QMuiPrJ40UYNGxQCtHV8EQ0REtSKi+DhP90gJ/nBWxPDh4AYYw2YqouFIM5/wZFOcJVBPhzct9KVa00+5ukIcvVusC/yAreDZ2742CKrWWkeRXSlaYvRSTxk5IpkM=&cap.sym=lKNh9ndkBi9ZOIMZecMyKeIQYww=&eos.clientinfo=zbase64:MDAwMDAwNmV4nBXIUQrDIBBF0a10Axm00JIIsxijTyoRlXGkdPeNf+fe1lGdKBva34/c3PnlXNPSCB9ENov66+BzpgRBXKNLbhyR/Cy6+t5ZEJR9VS8YG9p4WhohkhTygeblrDlef2i7Jow=&mgm.id=0c59159c&mgm.logid=78d0fad0-fb3a-11ef-a6ad-0c42a1f42af0&mgm.replicahead=0&mgm.replicaindex=0&oss.asize=10630323586&xrdcl.requuid=f6a35a5d-506d-4e83-a68f-7acd26991064","received_archiveFileID":8331198,"expected_NSBLOCKId":1776,"received_NSBLOCKId":18446744073709551615,"failed_Status":true}
{"epoch_time":1741341577.513305754,"local_time":"2025-03-07T09:59:37+0000","hostname":"getafix-ts08","program":"cta-taped","log_level":"ERROR","pid":"85267","tid":"88019","message":"Failed to open tape file for reading","drive_name":"obelix_ts1160_12","instance":"antares","sched_backend":"cephUser","thread":"TapeRead","tapeDrive":"obelix_ts1160_12","tapeVid":"JT0575","mountId":"2785875","vo":"repack","tapePool":"ceda2","mediaType":"3592JE20T","logicalLibrary":"obelix_ts","mountType":"Retrieve","labelFormat":"0000","vendor":"IBM","capacityInBytes":20000000000000,"fileId":8331214,"BlockId":1213434,"fSeq":31,"dstURL":"root://antares-eos15.scd.rl.ac.uk//eos/antaresfac/repack//JT0575/000000031?oss.asize=11169130808","isRepack":true,"isVerifyOnly":false,"ErrorMessage":"SCSI error in positionToLogicalObject: status=0x2 host_status=0 driver_status=0x8: SCSI command failed with status CHECK CONDITION: Sense Information: Not Ready: Medium not present"}
{"epoch_time":1741341577.513466651,"local_time":"2025-03-07T09:59:37+0000","hostname":"getafix-ts08","program":"cta-taped","log_level":"ERROR","pid":"85267","tid":"88019","message":"Error reading a file in TapeReadFileTask","drive_name":"obelix_ts1160_12","instance":"antares","sched_backend":"cephUser","thread":"TapeRead","tapeDrive":"obelix_ts1160_12","tapeVid":"JT0575","mountId":"2785875","vo":"repack","tapePool":"ceda2","mediaType":"3592JE20T","logicalLibrary":"obelix_ts","mountType":"Retrieve","labelFormat":"0000","vendor":"IBM","capacityInBytes":20000000000000,"fileId":8331214,"BlockId":1213434,"fSeq":31,"dstURL":"root://antares-eos15.scd.rl.ac.uk//eos/antaresfac/repack//JT0575/000000031?oss.asize=11169130808","isRepack":true,"isVerifyOnly":false,"fileBlock":0,"ErrorMessage":"SCSI error in positionToLogicalObject: status=0x2 host_status=0 driver_status=0x8: SCSI command failed with status CHECK CONDITION: Sense Information: Not Ready: Medium not present"}
{"epoch_time":1741341577.514474098,"local_time":"2025-03-07T09:59:37+0000","hostname":"getafix-ts08","program":"cta-taped","log_level":"ERROR","pid":"85267","tid":"88020","message":"SCSI error in positionToLogicalObject: status=0x2 host_status=0 driver_status=0x8: SCSI command failed with status CHECK CONDITION: Sense Information: Not Ready: Medium not present","drive_name":"obelix_ts1160_12","instance":"antares","sched_backend":"cephUser","thread":"DiskWrite","tapeDrive":"obelix_ts1160_12","tapeVid":"JT0575","mountId":"2785875","vo":"repack","tapePool":"ceda2","threadCount":10,"threadID":0,"fileId":8331214,"dstURL":"root://antares-eos15.scd.rl.ac.uk//eos/antaresfac/repack//JT0575/000000031?oss.asize=11169130808","fSeq":31,"received_archiveFileID":8331214,"expected_NSBLOCKId":0,"received_NSBLOCKId":18446744073709551615,"failed_Status":true}

Indeed, there is no /eos/antaresfac/repack/JT0575/000000030 in the buffer, although it is present on tape.

I am quite troubled by the block mismatches (expected_NSBLOCKId vs. received_NSBLOCKId) that I highlighted above…
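For reference, the value 18446744073709551615 reported as received_NSBLOCKId in both DiskWrite errors is exactly 2^64 - 1 (UINT64_MAX), which is what -1 looks like when stored in an unsigned 64-bit field. Below is a minimal Python sketch that pulls the block-ID fields out of one of the JSON log lines above, assuming plain ASCII double quotes (a forum copy with curly quotes would need straightening first). Reading UINT64_MAX as a "no block was received" sentinel is an assumption on my part, not something confirmed from the CTA source, and the function name is hypothetical.

```python
import json

UINT64_MAX = 2**64 - 1  # 18446744073709551615, i.e. -1 reinterpreted as uint64


def summarize_block_mismatch(line: str) -> dict:
    """Extract fSeq and the expected/received block IDs from one cta-taped JSON log line."""
    rec = json.loads(line)  # Python ints are unbounded, so UINT64_MAX parses exactly
    received = rec.get("received_NSBLOCKId")
    return {
        "fSeq": rec.get("fSeq"),
        "expected": rec.get("expected_NSBLOCKId"),
        "received": received,
        # Assumption: UINT64_MAX means no valid block ID arrived from the tape read thread
        "received_is_sentinel": received == UINT64_MAX,
    }
```

Fed the fSeq=30 DiskWrite line, this should report expected 1776 with a sentinel received value; the fSeq=31 line gives expected 0 and the same sentinel. In other words, the "mismatch" may simply reflect that the read failed before any block was handed over, rather than a wrong block being read.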

Do you know what the problem is and what I should do?

Wishfully thinking, I could re-submit the repack with the --no-recall flag to trigger the archival of the files with 21 <= fSeq <= 29, and then submit the repack again for the whole tape, but I am worried that the tape may be damaged or corrupted around the block with fSeq=29.
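Before re-submitting with --no-recall, it may help to confirm exactly which fSeqs are already in the repack buffer. A minimal Python sketch, assuming the buffer file names are the 9-digit zero-padded fSeq (as in 000000029 above) and that the directory listing is already available as a list of names (how it is obtained, e.g. from EOS, is left out). The function name is hypothetical.

```python
import re

# Assumption: repack buffer entries are named by the 9-digit zero-padded fSeq, e.g. "000000029"
FSEQ_NAME = re.compile(r"^(\d{9})$")


def partition_fseqs(buffer_names, first, last):
    """Split fSeqs in [first, last] into those present in the buffer and those missing."""
    in_buffer = set()
    for name in buffer_names:
        m = FSEQ_NAME.match(name)
        if m:  # ignore anything that is not a plain fSeq file name
            in_buffer.add(int(m.group(1)))
    wanted = range(first, last + 1)
    present = [f for f in wanted if f in in_buffer]
    missing = [f for f in wanted if f not in in_buffer]
    return present, missing
```

For the case described here, a buffer holding 000000021 to 000000029 and the range 21 to 30 would give present = 21..29 and missing = [30].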

Thanks,

George

Just to add that I have seen this on a couple of other tapes as well.

Hi George,

did you have a look into /var/log/messages around this time 2025-03-07T09:59:37+0000?

Is there any other message related to Input/output error?

Vladimir

Hi Vlado,

The only messages with the same timestamp (Mar 7 09:59:37) are the following:

Mar 7 09:59:37 getafix-ts08 kernel: st 17:0:1:0: [st2] Sense Key : Not Ready [current]
Mar 7 09:59:37 getafix-ts08 kernel: st 17:0:1:0: [st2] Add. Sense: Medium not present

I have seen these in another occurrence of the same issue with another tape.
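To correlate kernel sense messages with the cta-taped errors across several occurrences, a small filter over /var/log/messages can help. A minimal Python sketch, assuming the classic syslog prefix shown above (month, day, time, host, with no year) and the kernel st driver layout "st H:C:T:L: [stN] message"; the regex and function names are hypothetical.

```python
import re

# Matches kernel st lines such as:
#   Mar 7 09:59:37 getafix-ts08 kernel: st 17:0:1:0: [st2] Sense Key : Not Ready [current]
# \s+ after the month also accepts the padded form "Mar  7".
ST_LINE = re.compile(
    r"^(?P<mon>\w{3})\s+(?P<day>\d{1,2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) kernel: st (?P<hctl>[\d:]+): \[(?P<dev>st\d+)\] (?P<msg>.*)$"
)


def st_messages_at(lines, mon, day, time):
    """Return (device, message) pairs for st kernel lines logged at the given timestamp."""
    out = []
    for line in lines:
        m = ST_LINE.match(line)
        if m and m["mon"] == mon and int(m["day"]) == day and m["time"] == time:
            out.append((m["dev"], m["msg"]))
    return out
```

Applied to the two lines above with ("Mar", 7, "09:59:37"), this returns both st2 messages (Not Ready and Medium not present).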

Best,

George

Hi George,

we discussed your issue at today’s CTA operations meeting.

While we cannot immediately pinpoint what the issue is, we have the following suggestions:

  1. You could use the cta-readtp utility to read the fSeqs which you think might be problematic. Something like cta-readtp JT0575 20-30 -u <DRIVE> should mount the tape and perform a quick check of the files.
  2. The Medium not present clearly indicates that the tape cartridge was not mounted in the drive. Could this be a mount/dismount problem?

Best regards,

Vladimir

Hi Vlado,

Many thanks for the reply. Yes, Medium not present was probably related to a mount/dismount problem, which is definitely not unlikely with this library.

Thanks also for the info on cta-readtp. Probably there were no problematic blocks after all, because the repacks did complete after a couple of re-submissions.

Best,

George