I meant to say that I haven't seen errors associated with disk_file_id = 00000ABDA46B324547FAADE3CD608295DCCC:
# grep '00000ABDA46B324547FAADE3CD608295DCCC' *.log | grep -c "got an exception"
0
[root@gmv18014 cta]#
But if I grep for ERROR in general (or “got an exception”), I do see a different disk_file_id showing up:
{"epoch_time":1734362759.148497944,"local_time":"2024-12-16T09:25:59-0600","hostname":"gmv18014","program":"cta-taped","log_level":"ERROR","pid":2193083,"tid":2251909,"message":"In ArchiveMount::reportJobsBatchTransferred(): got an exception","drive_name":"G1_F3C3R3","instance":"preProdCMS","sched_backend":"cephUser","thread":"MainThread","tapeDrive":"G1_F3C3R3","mountId":"4131","vo":"cms","tapePool":"cms.cmsData2023","exceptionMessageValue":"commit problem committing the DB transaction: Database library reported: ERROR: duplicate key value violates unique constraint \"archive_file_din_dfi_un\"DETAIL: Key (disk_instance_name, disk_file_id)=(eoscta, 0000AEC65EE5F77D43B89B1E759F9D0B4400) already exists. (DB Result Status:7 SQLState:23505)"}
And I see a bunch of entries like:
{"epoch_time":1734362759.190981927,"local_time":"2024-12-16T09:25:59-0600","hostname":"gmv18014","program":"cta-taped","log_level":"INFO","pid":2193083,"tid":2251909,"message":"In ArchiveJob::failTransfer(): requeued job for (potentially in-mount) retry.","drive_name":"G1_F3C3R3","instance":"preProdCMS","sched_backend":"cephUser","thread":"MainThread","tapeDrive":"G1_F3C3R3","mountId":"4131","vo":"cms","tapePool":"cms.cmsData2023","fileId":4295386504,"copyNb":1,"failureReason":"In ArchiveMount::reportJobsBatchTransferred(): got an exception","requestObject":"ArchiveRequest-Frontend-cmscta01.fnal.gov-3003176-20241127-11:32:04-0-121348","retriesWithinMount":1,"maxRetriesWithinMount":2,"totalRetries":1,"maxTotalRetries":2}
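For anyone repeating this search, here is a minimal Python sketch of the same filtering done on the JSON fields rather than with grep. It assumes each log line is one JSON object with the field names seen in the entries above (log_level, exceptionMessageValue, mountId, local_time). The function name and the regular expression are my own, not part of CTA.

```python
import json
import re

# Matches the key quoted in the duplicate-key message, e.g.
# "Key (disk_instance_name, disk_file_id)=(eoscta, 0000AEC6...)".
DUP_KEY = re.compile(r"\(disk_instance_name, disk_file_id\)=\((\w+), (\w+)\)")


def duplicate_key_errors(lines):
    """Yield (local_time, mountId, disk_instance_name, disk_file_id) for
    every ERROR entry whose exceptionMessageValue reports a duplicate key."""
    for line in lines:
        # Tolerate a "filename:" prefix such as the one grep adds.
        start = line.find("{")
        if start < 0:
            continue
        try:
            entry = json.loads(line[start:])
        except json.JSONDecodeError:
            continue
        if entry.get("log_level") != "ERROR":
            continue
        match = DUP_KEY.search(entry.get("exceptionMessageValue", ""))
        if match:
            yield (entry.get("local_time"), entry.get("mountId"),
                   match.group(1), match.group(2))


if __name__ == "__main__":
    import sys
    # Usage (hypothetical script name): python dup_keys.py *.log
    for path in sys.argv[1:]:
        with open(path) as handle:
            for hit in duplicate_key_errors(handle):
                print(path, *hit)
```

This lists the disk_file_id that actually triggered the constraint violation, along with the mountId, which helps when matching it against the requeued jobs of the same mount.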
It has been understood that errors associated with 00000ABDA46B324547FAADE3CD608295DCCC seen in cta-objectstore-dump-object are caused by:
"commit problem committing the DB transaction: Database library reported: ERROR: duplicate key value violates unique constraint \"archive_file_din_dfi_un\"DETAIL: Key (disk_instance_name, disk_file_id)=(eoscta, 0000AEC65EE5F77D43B89B1E759F9D0B4400) already exists. (DB Result Status:7 SQLState:23505)"
That is, an attempt to archive a file that was already archived. It was just not clear to me why that other file (00000ABDA46B324547FAADE3CD608295DCCC) was affected. But now I understand that the whole batch fails if one file in it fails. Right?
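To illustrate what I mean by the whole batch failing, here is a toy sketch of a single transaction that hits a unique constraint. It uses SQLite from the Python standard library, not the PostgreSQL catalogue behind CTA, and the table layout and the commit_batch helper are made up for the example. It only shows the general transactional behaviour, and I am assuming (not asserting) that the batch report in cta-taped behaves the same way.

```python
import sqlite3


def commit_batch(conn, batch):
    """Insert every row of the batch in one transaction.
    Return True if it committed, False if it was rolled back."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.executemany(
                "INSERT INTO archive_file (disk_instance_name, disk_file_id)"
                " VALUES (?, ?)",
                batch,
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
# Hypothetical table with the same kind of unique constraint as in the error.
conn.execute(
    "CREATE TABLE archive_file ("
    " disk_instance_name TEXT, disk_file_id TEXT,"
    " UNIQUE (disk_instance_name, disk_file_id))"
)

# One file is already registered.
commit_batch(conn, [("eoscta", "0000AEC65EE5F77D43B89B1E759F9D0B4400")])

# A later batch carries a new file together with the duplicate.
ok = commit_batch(conn, [
    ("eoscta", "00000ABDA46B324547FAADE3CD608295DCCC"),
    ("eoscta", "0000AEC65EE5F77D43B89B1E759F9D0B4400"),
])
rows = conn.execute("SELECT disk_file_id FROM archive_file").fetchall()
print(ok, rows)
```

The second batch is rolled back as a unit, so the new file ends up unregistered even though nothing was wrong with it. That would match a file being written to tape but missing from the catalogue.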
As for file 00000ABDA46B324547FAADE3CD608295DCCC, it was successfully written:
{"epoch_time":1734175603.047356067,"local_time":"2024-12-14T05:26:43-0600","hostname":"gmv18014","program":"cta-taped","log_level":"INFO","pid":1266386,"tid":1321287,"message":"File successfully transmitted to drive","drive_name":"G1_F3C3R3","instance":"preProdCMS","sched_backend":"cephUser","thread":"TapeWrite","tapeDrive":"G1_F3C3R3","tapeVid":"FL2780","mountId":"3979","vo":"cms","tapePool":"cms.cmsData2023","mediaType":"LTO8","logicalLibrary":"TS4500G1_CTACMS","mountType":"ArchiveForUser","vendor":"Unknown","capacityInBytes":12000000000000,"fileId":4295372622,"fileSize":9113991,"fSeq":58645,"diskURL":"root://cmsdata322.fnal.gov:1097/00000ABDA46B324547FAADE3CD608295DCCC","readWriteTime":0.027945,"checksumingTime":0.004059,"waitDataTime":8e06,"waitReportingTime":0.000128,"transferTime":0.03214,"totalTime":0.032127,"dataVolume":9113991,"headerVolume":480,"driveTransferSpeedMBps":283.701279297787,"payloadTransferSpeedMBps":283.686338593706,"reconciliationTime":0,"LBPMode":"LBP_On"}
But the archive ID was not registered in the DB:
# cta-admin tapefile ls --id 4295372622
Archive file with ID 4295372622 does not exist
And dCache received a failure to archive.
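In case it is useful for checking other files from the same mount, here is a small Python sketch that pulls the fileId out of the "File successfully transmitted to drive" entries for a given disk file ID, so it can be passed to cta-admin tapefile ls --id. It assumes the disk file ID is the last path component of diskURL, as in the entry above. The function name is my own.

```python
import json


def transmitted_file_ids(lines, disk_file_id):
    """Return the fileId of every 'File successfully transmitted to drive'
    entry whose diskURL ends with the given disk file ID."""
    file_ids = []
    for line in lines:
        # Tolerate a "filename:" prefix such as the one grep adds.
        start = line.find("{")
        if start < 0:
            continue
        try:
            entry = json.loads(line[start:])
        except json.JSONDecodeError:
            continue
        if entry.get("message") != "File successfully transmitted to drive":
            continue
        if entry.get("diskURL", "").endswith("/" + disk_file_id):
            file_ids.append(entry.get("fileId"))
    return file_ids


if __name__ == "__main__":
    import sys
    # Usage (hypothetical script name): python file_ids.py <disk_file_id> *.log
    if len(sys.argv) > 2:
        for path in sys.argv[2:]:
            with open(path) as handle:
                for file_id in transmitted_file_ids(handle, sys.argv[1]):
                    print(path, file_id)
```

Each fileId printed this way can then be checked against the catalogue to see whether the write was ever registered.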