Summary
Files not getting an archive ID long after the transfer completed
Details
[
{
"clientVersion": {
"ctaVersion": "5.11.9.0-1",
"xrootdSsiProtobufInterfaceVersion": "v1.26"
},
"serverVersion": {
"ctaVersion": "5.11.9.0-1",
"xrootdSsiProtobufInterfaceVersion": "v1.26"
},
"catalogueConnectionString": "postgresql:postgresql://cta:******@ifdb11.fnal.gov:5468/cta_prd?gssencmode=disable",
"catalogueVersion": "15.0",
"isUpgrading": false,
"schedulerBackendName": "cephUser",
"instanceName": "prd"
}
]
Details
This is the Fermilab production CTA installation, working with dCache as the cache layer. We migrated from Enstore to CTA at the beginning of April, and it has been running fine. Last week we started to notice frequent dismounts during archival and began looking at the system more closely.
I noticed the following in the log:
{
"epoch_time": 1752509471.182319686,
"local_time": "2025-07-14T11:11:11-0500",
"hostname": "tpsrvf2201",
"program": "cta-taped",
"log_level": "INFO",
"pid": 338027,
"tid": 338150,
"message": "File successfully read from disk",
"drive_name": "F1_F4B2D2",
"instance": "prd",
"sched_backend": "cephUser",
"thread": "DiskRead",
"tapeDrive": "F1_F4B2D2",
"tapeVid": "FA9585",
"mountId": "139837",
"vo": "cms",
"tapePool": "cms.OORun2025RAW",
"threadID": 0,
"path": "root://cmsstor807:1095/0000367B58EE527445749619F70D655A7BA8",
"actualURL": "root://cmsstor807:1095/0000367B58EE527445749619F70D655A7BA8?xrdcl.requuid=9146f910-8883-4d20-a4e9-dedad5ea56df",
"fileId": 4526908306,
"readWriteTime": 26.573199,
"checksumingTime": 0.0,
"waitFreeMemoryTime": 0.003905,
"waitDataTime": 0.0,
"waitReportingTime": 0.0,
"checkingErrorTime": 0.00115900000000001,
"openingTime": 0.022237,
"transferTime": 26.602023,
"totalTime": 26.602023,
"dataVolume": 5361627550,
"globalPayloadTransferSpeedMBps": 201.549617109947,
"diskPerformanceMBps": 201.549617109947,
"openRWCloseToTransferTimeRatio": 0.999752387252654
}
which I believe corresponds to a successful archive. However, a few hours later this file still has no archive ID:
# cta-admin tf ls -i cms_prd -f 0000367B58EE527445749619F70D655A7BA8
archive id copy no vid fseq block id instance disk fxid size checksum type checksum value storage class owner group creation time
#
I assume this is because files are inserted into the CTA DB in batches, so some file (or files) must be holding up the whole batch. Is that correct?
My question: how can I find out which files are in the batch? For instance, grepping the log file for 0000367B58EE527445749619F70D655A7BA8, I see:
{
"epoch_time": 1752509444.682011138,
"local_time": "2025-07-14T11:10:44-0500",
"hostname": "tpsrvf2201",
"program": "cta-taped",
"log_level": "INFO",
"pid": 450022,
"tid": 450022,
"message": "Handler received SIGCHILD. Propagating to all handlers.",
"drive_name": "F1_F4B2D2",
"instance": "prd",
"sched_backend": "cephUser",
"capacityInBytes": "18000000000000",
"logicalLibrary": "F1_LTO9",
"mediaType": "LTO9",
"mountAttempted": "1",
"mountId": "139837",
"mountType": "ArchiveForUser",
"stillOpenFileForThread0": "root://cmsstor807:1095/0000367B58EE527445749619F70D655A7BA8?xrdcl.requuid=9146f910-8883-4d20-a4e9-dedad5ea56df",
"stillOpenFileForThread1": "root://cmsstor817:1095/0000ADB98390C459498284CEB9F8FD8F5534?xrdcl.requuid=e1de0dd1-184e-47b5-8786-167f6380f63b",
"stillOpenFileForThread2": "root://cmsstor808:1096/00003A8898E9FD084DCA9199274F170D9AF2?xrdcl.requuid=d926b82b-5bb2-40eb-90d6-664def504f10",
"stillOpenFileForThread4": "root://cmsdata321:1096/0000E008335098BA4E46B83E8ACACE18445B?xrdcl.requuid=24ab240f-3f33-4e33-b0a5-ff7f71166ba3",
"stillOpenFileForThread5": "root://cmsstor801:1095/0000EDEDE752D5E243F8816549D2DA436C72?xrdcl.requuid=4725afa3-7892-439a-b23f-2b33b6ccd38f",
"stillOpenFileForThread6": "root://cmsstor807:1094/0000B2104B4B604E40A9912A76AE7F0B8CCB?xrdcl.requuid=7e39bd20-6899-49b6-868c-b93e42f6b468",
"stillOpenFileForThread7": "root://cmsstor807:1094/000032C2664722C640059DC3B51FE8FEB4B0?xrdcl.requuid=2ab359a5-7f7f-4ef0-995c-78a02a3457b5",
"stillOpenFileForThread8": "root://cmsstor824:1094/0000773A856E883844E19513F42BA2BCEF85?xrdcl.requuid=ee8bdbb5-5018-49ad-bcdc-96a5f6d487c9",
"stillOpenFileForThread9": "root://cmsstor820:1096/00001AC7CAD99FAE49CFA824F5FF8D9BC7AC?xrdcl.requuid=1c434280-fd1a-4813-945c-8a2d89a51d66",
"tapePool": "cms.OORun2025RAW",
"tapeVid": "FA9585",
"vendor": "Fujifilm",
"vo": "cms",
"volReqId": "139837",
"SubprocessName": "signalHandler"
}
What does the above mean? Does it mean that these nine files are part of one batch? None of the other files has an associated "File successfully read from disk" message yet. Am I correct to expect the archive file IDs to be assigned once all nine files have been transferred to tape? In other words, could you help me understand what is “holding” 0000367B58EE527445749619F70D655A7BA8 back from being declared on tape?
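To list exactly which files a SIGCHILD record like the one above names, a small Python sketch can pull out every `stillOpenFileForThreadN` field and reduce each URL to its disk file ID. It assumes the record is a single JSON object and that the disk file ID is the last component of the URL path (as in `root://host:port/<fileid>?xrdcl.requuid=...`); the function name `still_open_files` is hypothetical, not part of CTA.

```python
import json
import re
from urllib.parse import urlsplit

# Matches keys such as "stillOpenFileForThread0", capturing the thread number.
STILL_OPEN_KEY = re.compile(r"^stillOpenFileForThread(\d+)$")


def still_open_files(log_line: str) -> dict:
    """Map disk-thread number -> disk file ID for one cta-taped JSON log record.

    Assumption: the disk file ID is the last path component of each URL.
    """
    record = json.loads(log_line)
    files = {}
    for key, value in record.items():
        match = STILL_OPEN_KEY.match(key)
        if match:
            path = urlsplit(value).path  # drops the ?xrdcl.requuid=... query string
            files[int(match.group(1))] = path.rsplit("/", 1)[-1]
    # Sort by thread number so gaps (e.g. a missing thread 3) are easy to spot.
    return dict(sorted(files.items()))
```

Feeding it the SIGCHILD line for mount 139837 should return one entry per listed thread, which can then be grepped for individually.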
Interestingly, I do not see a 'File successfully transmitted to drive' message associated with 0000367B58EE527445749619F70D655A7BA8.
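To check the observation above across a whole mount rather than one file at a time, a minimal Python sketch can collect, per `fileId`, which of the two messages were logged and report files that were read from disk but never logged as transmitted to the drive. It assumes one JSON record per log line and that both messages carry `mountId` and `fileId` fields; that is visible above only for the "read from disk" message, so the field names on the "transmitted to drive" message are an assumption. The function name is hypothetical.

```python
import json
from collections import defaultdict

READ_MSG = "File successfully read from disk"
SENT_MSG = "File successfully transmitted to drive"


def read_but_not_transmitted(lines, mount_id):
    """Return sorted fileIds of a mount that were read from disk but never
    reported as transmitted to the drive.

    Assumption: both messages include "mountId" and "fileId" fields.
    """
    seen = defaultdict(set)  # fileId -> set of relevant messages observed
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            rec = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON or truncated lines
        # mountId is logged as a string ("139837"), so compare as strings.
        if str(rec.get("mountId")) != str(mount_id):
            continue
        msg = rec.get("message")
        if msg in (READ_MSG, SENT_MSG) and "fileId" in rec:
            seen[rec["fileId"]].add(msg)
    return sorted(fid for fid, msgs in seen.items()
                  if READ_MSG in msgs and SENT_MSG not in msgs)
```

Passing an open cta-taped log file (its location depends on the local setup) and mount ID "139837" would, under these assumptions, list candidates such as fileId 4526908306.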
Thank you,
Dmitry