Hello,
I am observing "In ArchiveMount::reportJobsBatchTransferred(): got an exception"
when archiving files to CTA. This is not a 100% failure; most of the archive calls are successful.
This is a dCache/CTA setup.
CTA version 5.10.11.0-1.el9.el9.x86_64
dCache version: 9.2
dCache CTA driver : 0.15.0
# cta-admin --json v | jq
[
{
"clientVersion": {
"ctaVersion": "5.10.11.0-1.el9",
"xrootdSsiProtobufInterfaceVersion": "v1.15"
},
"serverVersion": {
"ctaVersion": "5.10.11.0-1.el9",
"xrootdSsiProtobufInterfaceVersion": "v1.15"
},
"catalogueConnectionString": "postgresql:postgresql://cta:******@example.org:5432/cta",
"catalogueVersion": "14.0",
"isUpgrading": false
}
]
(I replaced the real names in the DB connection string with bogus ones.)
I am just asking for some guidance to get to the bottom of the issue. What I observe is the following:
- Based on the cta-taped log, the file seems to get written to CTA.
- Then something fails, apparently in the reporting step, and the archive request fails. After a while dCache retries, ad nauseam, with the same result.
Example request dump:
# cta-objectstore-dump-object ArchiveRequest-Frontend-cmscta01.fnal.gov-3003176-20241127-11:32:04-0-96848
Object store path: rados://CTAQUEUE@rados.cta.CTAQUEUE:CTAQUEUE
Object name: ArchiveRequest-Frontend-cmscta01.fnal.gov-3003176-20241127-11:32:04-0-96848
Header dump:
{
"type": "ArchiveRequest_t",
"version": "0",
"owner": "",
"backupowner": "",
"payload": "8LEE/aCYgBCasgR9orEEC2N0YV9pbml0aWFsqLEEAbCxBLAJuLEEAcCxBLAJ0rEEHrKvBAZlb3NkZXaCsAQIY21zY3RhMDHQsASBqNesBtqxBB6yrwQGZW9zZGV2grAECGNtc2N0YTAx0LAEyPDGrwbisQQUSW5pdGlhbCBtb3VudCBwb2xpY3maswQKCggIARIETdVqIbizBICAgICAgICAgAHIswQAgrUEMcCtBAHorQQB4q4EJS8wMDAwMEFCREE0NkIzMjQ1NDdGQUFERTNDRDYwODI5NURDQ0PStQQkMDAwMDBBQkRBNDZCMzI0NTQ3RkFBREUzQ0Q2MDgyOTVEQ0ND+rUEBmVvc2N0YYq2BGVlb3NRdWVyeTovL2Ntc2RhdGEzMjIuZm5hbC5nb3Y6MTA5Ny9zdWNjZXNzLzAwMDAwQUJEQTQ2QjMyNDU0N0ZBQURFM0NENjA4Mjk1RENDQz9hcmNoaXZlaWQ9NDI5NTM2NDczM5K2BFVlb3NRdWVyeTovL2Ntc2RhdGEzMjIuZm5hbC5nb3Y6MTA5Ny9lcnJvci8wMDAwMEFCREE0NkIzMjQ1NDdGQUFERTNDRDYwODI5NURDQ0M/ZXJyb3I9oLYEh6OsBPK2BBBSBHJvb3RaCGVvc3VzZXJzwrcERHJvb3Q6Ly9jbXNkYXRhMzIyLmZuYWwuZ292OjEwOTcvMDAwMDBBQkRBNDZCMzI0NTQ3RkFBREUzQ0Q2MDgyOTVEQ0NDkrgEE2Ntcy5jbXNEYXRhMjAyM0BjdGGauAQwsq8EBmVvc2N0YYKwBBppcHY0OjEzMS4yMjUuMTkwLjE3Mjo0MjM1MtCwBO3b8boGorgE5QKAkwIBipMCD2Ntcy5jbXNEYXRhMjAyM5KTAgCakwJeQXJjaGl2ZVF1ZXVlRmFpbGVkLWNtcy5jbXNEYXRhMjAyMy1NYWludGVuYW5jZS1mbXYyMjAyOS5mbmFsLmdvdi0yODAzLTIwMjQxMTAxLTEwOjI4OjMzLTAtMjcxM6CTAuYHqJMCArCTAgG4kwLbHsCTAgLIkwIC0pMCX0RlYyAxMyAxNDozMzowNi44NTg1NDAgZ212MTgwMTQgSW4gQXJjaGl2ZU1vdW50OjpyZXBvcnRKb2JzQmF0Y2hUcmFuc2ZlcnJlZCgpOiBnb3QgYW4gZXhjZXB0aW9u0pMCX0RlYyAxMyAxNTowMzoxNC4zNDg1OTQgZ212MTgwMTQgSW4gQXJjaGl2ZU1vdW50OjpyZXBvcnRKb2JzQmF0Y2hUcmFuc2ZlcnJlZCgpOiBnb3QgYW4gZXhjZXB0aW9u2JMCAuCTAgCouAQBuLgEAOK4BAtjdGFfaW5pdGlhbA=="
}
Body dump:
{
"archivefileid": "4295364733",
"mountpolicy": {
"name": "cta_initial",
"archivepriority": "1",
"archiveminrequestage": "1200",
"retrievepriority": "1",
"retieveminrequestage": "1200",
"creationlog": {
"username": "eosdev",
"host": "cmscta01",
"time": "1704317953"
},
"lastmodificationlog": {
"username": "eosdev",
"host": "cmscta01",
"time": "1710340168"
},
"comment": "Initial mount policy"
},
"mountpolicyname": "cta_initial",
"checksumblob": "CggIARIETdVqIQ==",
"creationtime": "9223372036854775808",
"reconcilationtime": "0",
"diskfileinfo": {
"ownerUid": 1,
"gid": 1,
"path": "/00000ABDA46B324547FAADE3CD608295DCCC"
},
"diskfileid": "00000ABDA46B324547FAADE3CD608295DCCC",
"diskinstance": "eoscta",
"archivereporturl": "eosQuery://cmsdata322.fnal.gov:1097/success/00000ABDA46B324547FAADE3CD608295DCCC?archiveid=4295364733",
"archiveerrorreporturl": "eosQuery://cmsdata322.fnal.gov:1097/error/00000ABDA46B324547FAADE3CD608295DCCC?error=",
"filesize": "9113991",
"requester": {
"name": "root",
"group": "eosusers"
},
"srcurl": "root://cmsdata322.fnal.gov:1097/00000ABDA46B324547FAADE3CD608295DCCC",
"storageclass": "cms.cmsData2023@cta",
"creationlog": {
"username": "eoscta",
"host": "ipv4:131.225.190.172:42352",
"time": "1734110701"
},
"jobs": [
{
"copynb": 1,
"tapepool": "cms.cmsData2023",
"archivequeueaddress": "",
"owner": "ArchiveQueueFailed-cms.cmsData2023-Maintenance-fmv22029.fnal.gov-2803-20241101-10:28:33-0-2713",
"status": "AJS_Failed",
"totalretries": 2,
"retrieswithinmount": 1,
"lastmountwithfailure": "3931",
"maxtotalretries": 2,
"maxretrieswithinmount": 2,
"failurelogs": [
"Dec 13 14:33:06.858540 gmv18014 In ArchiveMount::reportJobsBatchTransferred(): got an exception",
"Dec 13 15:03:14.348594 gmv18014 In ArchiveMount::reportJobsBatchTransferred(): got an exception"
],
"maxreportretries": 2,
"totalreportretries": 0,
"reportfailurelogs": []
}
],
"reportdecided": true,
"isrepack": false,
"isfailed": false
}
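For anyone comparing several such dumps, the per-job retry counters and failure logs can be pulled out with a few lines of Python. This is only a minimal sketch: the `summarize_jobs` helper and the `retries_exhausted` field are hypothetical names, and treating `totalretries >= maxtotalretries` as "retries exhausted" is my assumption about how CTA uses these counters, not something confirmed by the CTA documentation. The input is a trimmed copy of the body dump above.

```python
import json

# Trimmed copy of the body dump above; only the fields used below are kept.
BODY_DUMP = """
{
  "archivefileid": "4295364733",
  "jobs": [
    {
      "copynb": 1,
      "tapepool": "cms.cmsData2023",
      "status": "AJS_Failed",
      "totalretries": 2,
      "maxtotalretries": 2,
      "retrieswithinmount": 1,
      "maxretrieswithinmount": 2,
      "failurelogs": [
        "Dec 13 14:33:06.858540 gmv18014 In ArchiveMount::reportJobsBatchTransferred(): got an exception",
        "Dec 13 15:03:14.348594 gmv18014 In ArchiveMount::reportJobsBatchTransferred(): got an exception"
      ],
      "totalreportretries": 0,
      "maxreportretries": 2,
      "reportfailurelogs": []
    }
  ]
}
"""


def summarize_jobs(body):
    """Return one summary dict per job in a parsed body dump (hypothetical helper)."""
    summaries = []
    for job in body.get("jobs", []):
        summaries.append({
            "copynb": job.get("copynb"),
            "status": job.get("status"),
            # Assumption: the job is out of retries once the total reaches the maximum.
            "retries_exhausted": job.get("totalretries", 0) >= job.get("maxtotalretries", 0),
            # Transfer-side failures versus failures while reporting back to the disk system.
            "failure_count": len(job.get("failurelogs", [])),
            "report_failure_count": len(job.get("reportfailurelogs", [])),
        })
    return summaries


if __name__ == "__main__":
    for summary in summarize_jobs(json.loads(BODY_DUMP)):
        print(summary)
```

For this request it shows both failures recorded in `failurelogs` and none in `reportfailurelogs`, with `totalreportretries` still at 0.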
I am mostly curious whether this looks familiar and where I need to dig to figure out what is wrong. The cta-taped log does not show any failures.