Failed archive requests because of duplicate archive file ids

Hello,

We have several thousand failed archive requests, extracted via cta-admin --json fr ls.

I tried repeatedly to re-queue them by sending the CLOSEW signal, but they fail again (sys.archive.error="Dec 15 14:33:27.539393 getafix-ts20 In ArchiveMount::reportJobsBatchTransferred(): got an exception").

I had a look at the cta-taped logs and noticed the following error:


{"epoch_time":1765809092.405254844,"local_time":"2025-12-15T14:31:32+0000","hostname":"getafix-ts15","program":"cta-taped","log_level":"ERROR","pid":658611,"tid":661131,"message":"In ArchiveMount::reportJobsBatchTransferred(): got an exception","drive_name":"obelix_ts1170_28","instance":"antares","sched_backend":"cephUser","thread":"MainThread","tapeDrive":"obelix_ts1170_28","mountId":"3327099","vo":"storaged-ceda","tapePool":"ceda_sentinel1a","successfulBatchSize":1000,"exceptionMessageValue":"selectArchiveFileSizesAndChecksums failed: Found duplicate archive file identifier in batch of files written to tape: archiveFileId=4383038324"}

It looks like there are duplicate archive file ids in all the batches of 1000 files that CTA is trying to archive, but I don't understand how this happened or how it can be resolved.
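
As a minimal sketch (not part of CTA), the archive file ID named in log lines like the one above can be pulled out with a few lines of Python. It assumes one JSON object per log line and that the ID appears as `archiveFileId=<number>` inside `exceptionMessageValue`, as in this example; the function name is hypothetical.

```python
import json
import re

# Matches the ID in messages such as "... archiveFileId=4383038324"
ARCHIVE_ID_RE = re.compile(r"archiveFileId=(\d+)")


def duplicate_id_from_log_line(line):
    """Return the archive file ID named in a cta-taped JSON log line, or None."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None  # not a JSON log line
    if not isinstance(record, dict):
        return None
    match = ARCHIVE_ID_RE.search(str(record.get("exceptionMessageValue", "")))
    return int(match.group(1)) if match else None


# Shortened copy of the log line above, keeping only the fields used here
sample = json.dumps({
    "program": "cta-taped",
    "log_level": "ERROR",
    "exceptionMessageValue": (
        "selectArchiveFileSizesAndChecksums failed: Found duplicate archive "
        "file identifier in batch of files written to tape: "
        "archiveFileId=4383038324"
    ),
})
print(duplicate_id_from_log_line(sample))
```

Running this over all ERROR lines in the cta-taped logs would give the set of IDs that CTA reports as duplicated, which can then be compared against the failed requests.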

Thanks,

George

This is what the failed archive plot looks like…

I don't quite understand why it keeps going up.

Hi George,

My guess would be that the files you are trying to re-queue did in fact get archived.

What happens if you type:

cta-admin tf ls --id 4383038324

For the failed requests, you can see the reason for the failures with:

cta-admin failedrequest ls --log

Possibly the files were archived and it was the reporting step that failed?

Michael

Hi Michael,

When I try to list the tape file with archive id 4383038324, I get

Archive file with ID 4383038324 does not exist

I can't see the reason for the failures using the --log option.

I also think that a substantial part of these files, associated with the failed requests, were archived and it was the reporting step that failed. In such cases, as far as I know, re-sending the CLOSEW signal would work.

The thing is that in this batch of files, there are indeed duplicate archive ids (I extracted them via eos --json fileinfo … ). I am not sure what to do in this case: would resetting sys.archive.file_id="" for all of them be of any help?
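
A minimal sketch of how such duplicates could be listed, assuming the paths and their archive IDs have already been extracted into (path, archive ID) pairs. The function name and the example paths are hypothetical, and the sketch does not parse the eos output itself.

```python
from collections import defaultdict


def find_duplicate_archive_ids(files):
    """Group paths by archive ID and keep only IDs shared by several paths.

    files: iterable of (path, archive_id) pairs.
    Returns a dict {archive_id: [paths]} for IDs seen more than once.
    """
    paths_by_id = defaultdict(list)
    for path, archive_id in files:
        paths_by_id[archive_id].append(path)
    return {aid: paths for aid, paths in paths_by_id.items() if len(paths) > 1}


# Hypothetical example data: two paths share the same archive ID
files = [
    ("/eos/example/a.dat", 4383038324),
    ("/eos/example/b.dat", 4383038324),
    ("/eos/example/c.dat", 4383038325),
]
for archive_id, paths in find_duplicate_archive_ids(files).items():
    print(archive_id, paths)
```

The resulting list of paths per ID makes it easier to check whether the duplicates are distinct files or the same file seen twice.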

Best,

George

Hi George,

The archive ID is a unique monotonic number assigned at the point of file creation (before writing the file contents). It should not be possible for two files to be assigned the same archive ID. If that has happened, something is seriously wrong. Can you provide more details?

The file must have a valid archive ID to be archived. If you reset `sys.archive.file.id`, or if a file with that archive ID already exists in CTA, archiving will fail.

Can you provide some examples of the log messages from `cta-admin failedrequest ls --log`?