Hi Richard,
Thanks for this. The above merge request link seems to be gone. Please let me know when the change will be merged to the main/master branch of the cta-ops tools.
Best,
George
Hi,
Really sorry for nagging! Is there an update on the change to the ops tools’ source code?
Thanks,
George
Dear George,
Apologies for the delay.
I merged the changes from Richard. The following 3 files have changed:
cta-ops-config.yaml
tools/pip/tapeadmin/src/tapeadmin/__init__.py
tools/pip/tapeverify/src/tapeverify/cta_verify_tape.py
Can you please try to pull the latest version and let me know how it goes?
If this does not resolve the problem, the next step I can propose is to have a remote debug session with you. Either I connect to your servers or you share your screen in Zoom and I will tell you what commands to type.
Regards,
Vladimir
Hi George,
Please excuse me for not getting back to you; I thought the issue had been solved by Richard’s MR and did not look any further into it.
But I had a look today and in fact I think the SSS keytab is exactly why you are getting the error, as you had suggested earlier in the thread.
I managed to reproduce the issue:
[tape-local@ctaproductionfrontend11 ~]$ XrdSecPROTOCOL=sss XrdSecSSSKT=/etc/cta/tape-local.sss.keytab /usr/bin/cta-verify-file --instance eosctaboguskonstattina --vid I76744 --id 4816781169
Error in Google Protocol Buffers: Instance name "eosctaboguskonstattina" does not match key identifier "tape-local"
/lib64/libctacommon.so.0(cta::exception::Backtrace::Backtrace(bool)+0x6b) [0x7f7a920a823d]
/lib64/libctacommon.so.0(cta::exception::Exception::Exception(std::basic_string_view<char, std::char_traits<char> >, bool)+0x91) [0x7f7a920a94fd]
/lib64/libXrdSsiCta.so(cta::exception::PbException::Exception(std::basic_string_view<char, std::char_traits<char> >, bool)+0x4c) [0x7f7a93ec0d5a]
/lib64/libXrdSsiCta.so(cta::frontend::WorkflowEvent::WorkflowEvent(cta::frontend::FrontendService const&, cta::common::dataStructures::SecurityIdentity const&, cta::eos::Notification const&)+0x6ac) [0x7f7a93f93320]
/lib64/libXrdSsiCta.so(cta::xrd::RequestMessage::process(cta::xrd::Request const&, cta::xrd::Response&, XrdSsiStream*&)+0x30a) [0x7f7a93ea2b22]
/lib64/libXrdSsiCta.so(XrdSsiPb::RequestProc<cta::xrd::Request, cta::xrd::Response, cta::xrd::Alert>::ExecuteAction()+0x169) [0x7f7a93e9e8e7]
/lib64/libXrdSsiCta.so(XrdSsiPb::RequestProc<cta::xrd::Request, cta::xrd::Response, cta::xrd::Alert>::Execute()+0xd0) [0x7f7a93e9be6c]
/lib64/libXrdSsiCta.so(XrdSsiPb::Service<cta::xrd::Request, cta::xrd::Response, cta::xrd::Alert>::ProcessRequest(XrdSsiRequest&, XrdSsiResource&)+0x8f) [0x7f7a93e9ad55]
/lib64/libXrdUtils.so.3(XrdScheduler::Run()+0x14a) [0x7f7ab73427ca]
/lib64/libXrdUtils.so.3(XrdStartWorking(void*)+0xd) [0x7f7ab73428cd]
/lib64/libXrdUtils.so.3(XrdSysThread_Xeq+0x3c) [0x7f7ab7380b2c]
This happens because the disk instance name I specified is not present in the keytab; instead, the value of the user field is u:tape-local.
So I think the problem is indeed the keytab, and the fix would be what you suggested.
Edit: the above is valid in case you are using SSS to authenticate, which I understand you are.
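The rule above (the --instance value must equal the user name in the SSS key) can be checked before anything is submitted. Below is a minimal Python sketch. It assumes the whitespace-separated name:value keytab layout shown in this thread. The helper names are hypothetical, and this is not how the frontend itself implements the check.

```python
# Hypothetical pre-flight check, not CTA code: compare the u: (user) field of an
# SSS keytab entry with the --instance value passed to cta-verify-file.


def parse_sss_keytab_line(line: str) -> dict:
    """Split one SSS keytab line ("0 u:... g:... n:... k:...") into a field dict."""
    fields = {}
    for token in line.split()[1:]:  # skip the leading numeric field
        name, sep, value = token.partition(":")
        if sep:  # keep only "name:value" tokens
            fields[name] = value
    return fields


def instance_matches_key(keytab_line: str, instance: str) -> bool:
    """True if the keytab user name equals the requested instance name."""
    return parse_sss_keytab_line(keytab_line).get("u") == instance


# Placeholder values; never paste a real key here.
line = "0 u:tape-local g:nogroup n:nowhere N:1 c:2 e:0 f:0 k:placeholder"
print(instance_matches_key(line, "eosctabogus"))  # False: "does not match key identifier"
print(instance_matches_key(line, "tape-local"))   # True
```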
Thanks for the confirmation Vlado.
@kskovola many thanks for getting back to me. Yes, I am using SSS to authenticate to the Frontend. I will create an SSS key for the eosctabogus disk instance and add it to /etc/cta/eos.sss.keytab. I am assuming that I need to revert the definition in /etc/cta/cta-cli.conf from
eos.instance cta-admin
to
eos.instance eosctabogus
Is this correct?
Hi George,
Yes, the instance name needs to match the user name in the SSS key, so it would have to be set to eosctabogus again.
Really sorry, I must be doing something wrong...
I added a line to /etc/cta/eos.sss.keytab that looks like this:
0 u:verification g:it n:eosctabogus N:…….
and I also added a verification user to the DB,
but I still get the same error. What am I missing?
George,
See this config on our side: the keytab file, the error, and a successful request with 1 file queued:
[tape-local@ctaproductionfrontend11 ~]$ cat /etc/cta/tape-local.sss.keytab
0 u:tape-local g:nogroup n:nowhere N:<number> c:<number> e:0 f:0 k:<key>
[tape-local@ctaproductionfrontend11 ~]$ XrdSecPROTOCOL=sss XrdSecSSSKT=/etc/cta/tape-local.sss.keytab /usr/bin/cta-verify-file --instance eosctabogus --vid I76744 --id 4816781169
Error in Google Protocol Buffers: Instance name "eosctabogus" does not match key identifier "tape-local"
/lib64/libctacommon.so.0(cta::exception::Backtrace::Backtrace(bool)+0x6b) [0x7f7a920a823d]
/lib64/libctacommon.so.0(cta::exception::Exception::Exception(std::basic_string_view<char, std::char_traits<char> >, bool)+0x91) [0x7f7a920a94fd]
/lib64/libXrdSsiCta.so(cta::exception::PbException::Exception(std::basic_string_view<char, std::char_traits<char> >, bool)+0x4c) [0x7f7a93ec0d5a]
/lib64/libXrdSsiCta.so(cta::frontend::WorkflowEvent::WorkflowEvent(cta::frontend::FrontendService const&, cta::common::dataStructures::SecurityIdentity const&, cta::eos::Notification const&)+0x6ac) [0x7f7a93f93320]
/lib64/libXrdSsiCta.so(cta::xrd::RequestMessage::process(cta::xrd::Request const&, cta::xrd::Response&, XrdSsiStream*&)+0x30a) [0x7f7a93ea2b22]
/lib64/libXrdSsiCta.so(XrdSsiPb::RequestProc<cta::xrd::Request, cta::xrd::Response, cta::xrd::Alert>::ExecuteAction()+0x169) [0x7f7a93e9e8e7]
/lib64/libXrdSsiCta.so(XrdSsiPb::RequestProc<cta::xrd::Request, cta::xrd::Response, cta::xrd::Alert>::Execute()+0xd0) [0x7f7a93e9be6c]
/lib64/libXrdSsiCta.so(XrdSsiPb::Service<cta::xrd::Request, cta::xrd::Response, cta::xrd::Alert>::ProcessRequest(XrdSsiRequest&, XrdSsiResource&)+0x8f) [0x7f7a93e9ad55]
/lib64/libXrdUtils.so.3(XrdScheduler::Run()+0x14a) [0x7f7ab73427ca]
/lib64/libXrdUtils.so.3(XrdStartWorking(void*)+0xd) [0x7f7ab73428cd]
/lib64/libXrdUtils.so.3(XrdSysThread_Xeq+0x3c) [0x7f7ab7380b2c]
/lib64/libc.so.6(+0x8a19a) [0x7f7ab6c8a19a]
/lib64/libc.so.6(+0x10f240) [0x7f7ab6d0f240]
[tape-local@ctaproductionfrontend11 ~]$ XrdSecPROTOCOL=sss XrdSecSSSKT=/etc/cta/tape-local.sss.keytab /usr/bin/cta-verify-file --instance tape-local --vid I76744 --id 4816781169
RetrieveRequest-Frontend-ctaproductionfrontend11.cern.ch-1126-20251013-10:24:01-0-3336
[tape-local@ctaproductionfrontend11 ~]$ cta-admin sq | grep I76744
Retrieve ctaproduction cephUser vo_ATLAS_raw ATLAS IBMLIB4-TS1160 I76744 1 5.2G 166 166 50 600 15 50 0 0 0 20.0T 5864 20.7T 1 0
Does this help you to resolve the issue?
If not, please provide similar information including the config file content and full commands you are trying to execute.
Vladimir
Hi Vlado,
If I create a separate /etc/cta/tape-local.keytab as you suggest:
[root@cta-front03 ~]# cat /etc/cta/tape-local.keytab
0 u:tape-local g:tape n:nowhere N.....
and put it in /etc/cta/ on both the frontend and the admin node where I run the commands, I get an AUTH error:
[root@cta-adm-preprodfac georgep]# XrdSecPROTOCOL=sss XrdSecSSSKT=/etc/cta/tape-local.keytab /usr/bin/cta-admin v
Error from XRootD SSI Framework: [FATAL] Auth failed: No protocols left to try
(venv) [root@cta-adm-preprodfac georgep]#
(venv) [root@cta-adm-preprodfac georgep]# XrdSecPROTOCOL=sss XrdSecSSSKT=/etc/cta/tape-local.keytab /usr/bin/cta-verify-file --vid JL0504 --id 4294991264
Error from XRootD SSI Framework: [FATAL] Auth failed: No protocols left to try
(venv) [root@cta-adm-preprodfac georgep]#
which is expected, because there is no tape-local user defined as a CTA admin user in the DB.
If I use /etc/cta/cta-cli.keytab, which has been used for all cta-admin commands, the cta-verify-file command works only if I specify --instance cta-admin, because
[root@cta-front03 ~]# cat /etc/cta/cta-cli.keytab
0 u:cta-admin g:tape n:cta-admin N:…
The error reappears when I now try the command that verifies a whole tape:
cta-ops-verify-tape -v JL0504 -C cta-operations-utilities/cta-ops-config.yaml
I think this is because, in cta-operations-utilities/cta-ops-config.yaml, the cta-verify-file command appears to be hard-coded as
cta_verify_file: "/usr/bin/cta-verify-file --instance eosctabogus --id"
Maybe I need to replace eosctabogus with cta-admin in the above config line?
I don’t understand the exact purpose of /etc/cta/tape-local.keytab, which is defined in cta-operations-utilities/cta-ops-config.yaml:
default_user:
  name: "tape-local"
  group: "tape"
  sss_keytab_file: "/etc/cta/tape-local.keytab"
I can see that this is described as the system user for executing automated tasks, but unless this user is defined as an admin user (i.e., cta-admin admin add --username tape-local), the tools are not going to work.
Hi George,
Thanks for trying further. I would suggest you forget about the tape-local concept; it is our internal CERN solution for running commands with local authentication.
I think your issue indeed comes from the hard-coded --instance eosctabogus value in the cta_verify_file entry. That is where Richard’s modifications should help, as they should allow you to specify a different, non-hard-coded config file.
If you do not want to try Richard’s change, please replace the hard-coded value eosctabogus with cta-admin, as you suggested, and try again to verify a whole tape.
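One way to avoid a hard-coded value is to override --instance at the point where the command is built. Below is a minimal Python sketch. It assumes the tool splits the cta_verify_file string into arguments and appends the archive ID after --id. build_verify_command is a hypothetical helper, not part of cta-ops, and Richard’s change may solve this differently.

```python
import shlex

# Hypothetical helper, not part of cta-ops: take the cta_verify_file template
# and force the --instance value instead of relying on the hard-coded one.


def build_verify_command(template: str, instance: str, archive_id: int) -> list:
    argv = shlex.split(template)
    if "--instance" in argv and argv.index("--instance") + 1 < len(argv):
        argv[argv.index("--instance") + 1] = instance  # replace hard-coded value
    else:
        argv[1:1] = ["--instance", instance]  # insert right after the program
    argv.append(str(archive_id))  # assumed: the ID follows the trailing --id
    return argv


template = "/usr/bin/cta-verify-file --instance eosctabogus --id"
print(build_verify_command(template, "cta-admin", 4294991264))
```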
If that works, you are good to go, and we will discuss internally how to make this clearer, as I also find it confusing.
Please report back.
Vladimir
Hi Vlado,
Thanks for this.
I think I am using the latest version of the ops tools code, which includes Richard’s changes.
Anyway, having defined eos.instance cta-admin in /etc/cta/cta-cli.conf and
cta_verify_file: "/usr/bin/cta-verify-file --instance cta-admin --id"
in cta-operations-utilities/cta-ops-config.yaml,
both of the following commands worked:
XrdSecPROTOCOL=sss XrdSecSSSKT=/etc/cta/cta-cli.keytab /usr/bin/cta-verify-file --vid JL0504 --id 4294991264
XrdSecPROTOCOL=sss XrdSecSSSKT=/etc/cta/cta-cli.keytab cta-ops-verify-tape -v JL0504 -C cta-operations-utilities/cta-ops-config.yaml
The second verified 30 files in total from the tape (10 from the beginning, 10 from the middle and 10 from the end), as per the default values.
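The default sampling described above takes 10 files from the beginning of the tape, 10 from the middle and 10 from the end. Below is a minimal Python sketch of that sampling. It assumes the tape’s files are available as a list ordered by fSeq. sample_tape_files is a hypothetical name, and this illustrates the idea rather than the tapeverify implementation.

```python
# Hypothetical illustration of "beginning, middle, end" sampling; not the
# tapeverify code. `files` is assumed to be ordered by fSeq.


def sample_tape_files(files: list, per_section: int = 10) -> list:
    n = len(files)
    if n <= 3 * per_section:
        return list(files)  # small tape: verify every file
    mid_start = (n - per_section) // 2  # centre the middle window
    return (files[:per_section]
            + files[mid_start:mid_start + per_section]
            + files[-per_section:])  # windows cannot overlap when n > 3 * per_section


fseqs = list(range(1, 101))  # a tape with 100 files, fSeq 1..100
print(sample_tape_files(fseqs))
```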
We will now try the tool on production tapes.
Best,
George
Sorry to come back again...
When trying to run the tool on the production system, I get the following error, which I haven’t seen before:
(venv) [root@cta-adm-fac1 georgep]# XrdSecPROTOCOL=sss XrdSecSSSKT=/etc/cta/cta-cli.keytab cta-ops-verify-tape -v TD4240 -C cta-operations-utilities/cta-ops-config.yaml
2025-10-23 16:47:35 [INFO] [verify_tape] Running verify-tape for tape with vid TD4240, read speed of 300 MB/s, and data size: 0B
2025-10-23 16:47:36 [INFO] [partial_tape_scan] Performing partial tape scan.
2025-10-23 16:47:36 [INFO] [verify_tape] Verifying 30 files and 163.4G from tape TD4240
2025-10-23 16:47:36 [INFO] [verify_files] ArchiveId of files to verify: 5317279,5243485,5243506,4377425995,4377425999,4377426011,4377426016,4377426020,4377426031,4377426043,4377886140,5429490,5902228,5902719,5808576,5618276,5865608,5865745,5595133,5986503,5820176,5866548,5866662,5970908,5953109,5820377,5970929,5866627,5866603,5866656
2025-10-23 16:47:36 [CRITICAL] [log_and_exit] Could not submit verification request for archiveId 5317279 of tape TD4240 (STDERR: Cannot retrieve file because the disk instance of the request does not match that of the archived file: archiveFileId=5317279 requestDiskInstance=cta-admin archiveFileDiskInstance=eosantaresfac)
I don’t quite understand this, as on preprod the diskInstance of the tape files was also different from cta-admin, but there was no error.
Dear George,
We are very sorry that you are still having difficulties enabling tape verification in CTA.
It shouldn’t be this complicated.
As this ticket is already very long, I would suggest getting in touch privately and having a debug session over Zoom.
I will contact you by e-mail.
Vladimir
Hi George,
Is there a verification mount policy (cta.verification.mount_policy) set on the frontend against which you are running the verification?
If not, that would explain the error you are getting.
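Below is a simplified model of the behaviour seen in this thread. It is inferred only from the observed errors and frontend logs, not taken from CTA’s source. A verify-only retrieve whose disk instance differs from the archived file’s instance was queued once a verification mount policy was configured, and was rejected with the "does not match" error otherwise. check_retrieve is a hypothetical function.

```python
# Simplified model inferred from this thread's logs; NOT the CTA frontend code.
from typing import Optional, Tuple


def check_retrieve(request_instance: str, archive_instance: str,
                   verify_only: bool,
                   verification_policy: Optional[str]) -> Tuple[bool, str]:
    """Return (accepted, message) for a retrieve request under this model."""
    if verify_only and verification_policy:
        # Observed: queued with policyName="verification" despite differing instances.
        return True, f"queued with mount policy {verification_policy}"
    if request_instance != archive_instance:
        # Observed when cta.verification.mount_policy was missing.
        return False, ("Cannot retrieve file because the disk instance of the request "
                       "does not match that of the archived file: "
                       f"requestDiskInstance={request_instance} "
                       f"archiveFileDiskInstance={archive_instance}")
    return True, "queued"


print(check_retrieve("cta-admin", "eosantaresfac", True, None))
print(check_retrieve("tape-verification", "eosantaresfac", True, "verification"))
```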
Hi George,
As Konstantina wrote above, I was able to reproduce your second error message by deleting the verification mount policy:
[root@ctapreproductionfrontend11 ~]# cta-admin mp ls|grep verif
verification 50 14400 50 300 vlado cta-frontend01 2023-05-26 14:50 vlado cta-frontend01 2023-05-26 14:50 preproduction Tape Media Verification framework mount policy
[root@ctapreproductionfrontend11 ~]# cta-admin mp rm -n verification
[root@ctapreproductionfrontend11 ~]# cta-admin mp ls|grep verif
[root@ctapreproductionfrontend11 ~]# XrdSecPROTOCOL=sss XrdSecSSSKT=/etc/cta/cta-cli.sss.keytab /usr/bin/cta-verify-file --instance tape-local --vid I72930 --id 4298404554
Cannot retrieve file because the disk instance of the request does not match that of the archived file: archiveFileId=4298404554 requestDiskInstance=tape-local archiveFileDiskInstance=eosctapublicpps
Please try to create a verification mount policy on your production setup and confirm that everything works.
We will work on our side to make the error messages clearer.
Vladimir
Hi Konstantina and Vlado,
Many thanks again for the zoom session we had yesterday!
I had created a verification mount policy on the production system before trying to run the tool (and also a VERIFICATION virtual organisation).
[root@cta-adm-fac1 ~]# cta-admin mp ls | grep verification
verification 5 300 5 300 cta-admin cta-front04 2025-10-22 16:48 cta-admin cta-front04 2025-10-22 16:48 antares verification mount policy
[root@cta-adm-fac1 ~]#
[root@cta-adm-fac1 ~]# cta-admin vo ls | grep -i verification
VERIFICATION 3 3 1.0T eosantaresfac false cta-admin cta-front04 2025-10-22 16:43 cta-admin cta-front04 2025-10-22 16:43 antares verification vo
[root@cta-adm-fac1 ~]#
There must be something else that is different between our preprod and prod instances and is causing the error. I will try to find out what this difference is.
George
And, of course, that was the case: I had forgotten to define the verification policy in
/etc/cta/cta-frontend-xrootd.conf!
cta.verification.mount_policy verification
cta-verify-file now submitted a verification retrieve!
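Two pieces were needed here: the mount policy in the catalogue, and the cta.verification.mount_policy line in /etc/cta/cta-frontend-xrootd.conf. A quick pre-flight check could catch the missing config line. Below is a minimal Python sketch. It assumes the config file holds one "key value" pair per line, with # comments. The helper names are hypothetical. The list of known policies would come from something like the cta-admin mp ls output shown above.

```python
# Hypothetical pre-flight check, not part of cta-ops: does the frontend config
# set cta.verification.mount_policy, and does that policy exist?
from typing import Dict, Iterable, Optional


def read_conf(text: str) -> Dict[str, str]:
    """Parse simple 'key value' lines, ignoring blank lines and # comments."""
    conf = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        parts = line.split(None, 1)
        conf[parts[0]] = parts[1].strip() if len(parts) > 1 else ""
    return conf


def verification_policy_problem(conf_text: str,
                                known_policies: Optional[Iterable[str]] = None) -> Optional[str]:
    """Return a description of the problem, or None if the setting looks fine."""
    policy = read_conf(conf_text).get("cta.verification.mount_policy")
    if not policy:
        return "cta.verification.mount_policy is not set"
    if known_policies is not None and policy not in set(known_policies):
        return f"mount policy {policy!r} is not defined"
    return None


conf_text = "# excerpt of the frontend config\ncta.verification.mount_policy verification\n"
print(verification_policy_problem(conf_text, ["verification"]))  # None
print(verification_policy_problem(""))  # cta.verification.mount_policy is not set
```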
Your comment on the verification policy prompted me to check; thank you so much!
Really sorry for missing this. I cannot apologise enough!
George
Finally, I also managed to use a “disk instance” that is not called cta-admin but tape-verification, with the following changes:
/etc/cta/tape-verification.keytab, which looks like this:
0 u:tape-verification g:nogroup n:nowhere N:...
/etc/cta/cta-cli.conf on both the admin node and the frontend:
eos.instance tape-verification
Error from XRootD SSI Framework: [FATAL] Auth failed: No protocols left to try
This is because tape-verification is not only an admin user (see next change) but also a kind of EOS instance that has to be able to send workflow events to the (admin) frontend, like the ones below:
Oct 29 12:10:02.575427153 cta-front01 cta-frontend: LVL="INFO" PID="1554916" TID="1555454" MSG="In WorkflowEvent::WorkflowEvent(): received event." instance="antares" sched_backend="cephUser" user="tape-verification@cta-front01" eventType="PREPARE" eosInstance="tape-verification" diskFilePath="dummy" diskFileId=""
Oct 29 12:10:02.612295613 cta-front01 cta-frontend: LVL="INFO" PID="1554916" TID="1555454" MSG="In OStoreDB::queueRetrieve(): recorded request for queueing (enqueueing posted to thread pool)." instance="antares" sched_backend="cephUser" user="tape-verification@cta-front01" tapeVid="TD4240" jobObject="RetrieveRequest-Frontend-cta-front01.scd.rl.ac.uk-1554916-20251029-12:00:22-0-1" fileId="4377425999" diskInstance="eosantaresfac" diskFilePath="dummy" diskFileId="209587550" vidSelectionTime="0.019826" agentReferencingTime="0.007868" insertionTime="0.002213" taskPostingTime="0.000492" taskQueueSize="1" totalTime="0.030399"
Oct 29 12:10:02.613241081 cta-front01 cta-frontend: LVL="INFO" PID="1554916" TID="1555454" MSG="In Scheduler::queueRetrieve(): Queued retrieve request" instance="antares" sched_backend="cephUser" user="tape-verification@cta-front01" fileId="4377425999" instanceName="tape-verification" diskSystemName="" diskFilePath="dummy" diskFileOwnerUid="0" diskFileGid="0" dstURL="file://dummy" errorReportURL="" creationHost="cta-front01" creationTime="1758774104" creationUser="tape-verification" requesterName="verification" requesterGroup="it" criteriaArchiveFileId="4377425999" criteriaCreationTime="1758774104" criteriaDiskFileId="209587550" criteriaDiskFileOwnerUid="0" criteriaDiskInstance="eosantaresfac" criteriaFileSize="9324529165" reconciliationTime="1758774104" storageClass="diamond_i13-1" checksumType="ADLER32" checksumValue="a7ed647d" fSeq="5" vid="TD4240" blockId="33661" fileSize="9324529165" copyNb="1" selectedVid="TD4240" verifyOnly="1" catalogueTime="0.006192" schedulerDbTime="0.030778" policyName="verification" policyMinAge="300" policyPriority="5" retrieveRequestId="RetrieveRequest-Frontend-cta-front01.scd.rl.ac.uk-1554916-20251029-12:00:22-0-1"
Oct 29 12:10:02.613703518 cta-front01 cta-frontend: LVL="INFO" PID="1554916" TID="1555454" MSG="In WorkflowEvent::processPREPARE(): queued file for retrieve." instance="antares" sched_backend="cephUser" user="tape-verification@cta-front01" fileId="4377425999" schedulerTime="0.038128" isVerifyOnly="1" retrieveReqId="RetrieveRequest-Frontend-cta-front01.scd.rl.ac.uk-1554916-20251029-12:00:22-0-1"
tape-verification needs to be defined as an admin user in the CTA DB; otherwise the following error is returned:
User: tape-verification on host: cta-front01 is not authorized to execute CTA admin commands
With all the above changes, the following command works as expected:
XrdSecPROTOCOL=sss XrdSecSSSKT=/etc/cta/tape-verification.keytab cta-verify-file --instance tape-verification --id 4377425999 --vid TD4240
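The same invocation can be scripted, for example from a cron job. Below is a minimal Python sketch of a hypothetical wrapper that is not part of cta-ops. It sets the two XRootD environment variables used above and runs cta-verify-file. The keytab path, instance name, archive ID and VID are the ones from this thread. The --instance value must still match the u: field of the keytab.

```python
# Hypothetical wrapper, not part of cta-ops: run cta-verify-file with SSS
# authentication, equivalent to the shell command above.
import os
import subprocess
from typing import Dict, List, Tuple


def verify_file_invocation(keytab: str, instance: str, archive_id: int,
                           vid: str) -> Tuple[List[str], Dict[str, str]]:
    """Build argv and environment; --instance must match the keytab's u: field."""
    env = dict(os.environ, XrdSecPROTOCOL="sss", XrdSecSSSKT=keytab)
    argv = ["cta-verify-file", "--instance", instance,
            "--id", str(archive_id), "--vid", vid]
    return argv, env


def run_verify(keytab: str, instance: str, archive_id: int,
               vid: str) -> Tuple[int, str, str]:
    """Run the command and return (exit code, stdout, stderr)."""
    argv, env = verify_file_invocation(keytab, instance, archive_id, vid)
    result = subprocess.run(argv, env=env, capture_output=True, text=True, check=False)
    return result.returncode, result.stdout.strip(), result.stderr.strip()
```

On a correctly configured node, run_verify("/etc/cta/tape-verification.keytab", "tape-verification", 4377425999, "TD4240") would be expected to print a RetrieveRequest identifier on stdout, as in the earlier successful example in this thread. Errors such as the "does not match" message would appear on stderr.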
George,
now that you can verify one file, can you also verify several files on a tape with cta-ops-verify-tape?
If that works, you can then configure the rest of the framework to automatically select tapes for verification, as I explained in my earlier post above.
Vladimir
Hi Vlado,
Yes, I am happy to report that the cta-ops-verify-tape works as well (given the instance modification of the cta_verify_file line in cta-operations-utilities/cta-ops-config.yaml).
Just one last question: if the verification of one or more files fails, where exactly would we see the error(s)?
So far, I have been following the verification progress in the cta-taped log, where we can see the outcome in entries like this one:
{"epoch_time":1761742964.882947779,"local_time":"2025-10-29T13:02:44+0000","hostname":"getafix-ts14","program":"cta-taped","log_level":"INFO","pid":2771,"tid":2771,"message":"Tape session finished","drive_name":"obelix_ts1170_27","instance":"antares","sched_backend":"cephUser","capacityInBytes":"50000000000000","logicalLibrary":"obelix_ts1170","mediaType":"TS1170","mountAttempted":"1","mountId":"3228093","mountType":"Retrieve","tapePool":"diamond_i13-1","tapeVid":"TD4240","vendor":"IBM","vo":"storaged_dls","volReqId":"3228093","wasTapeMounted":"1","mountTime":"20.136459","positionTime":"358.090478","waitInstructionsTime":"0.211757","waitFreeMemoryTime":"0.01022","waitDataTime":"0.0","waitReportingTime":"0.10875","checksumingTime":"0.0","readWriteTime":"592.523687","flushTime":"0.0","unloadTime":"143.196796","unmountTime":"191.776099","encryptionControlTime":"0.009","transferTime":"592.854414","totalTime":"1305.89575","deliveryTime":"971.117454","drainingTime":"0.0","dataVolume":"97614829801","filesCount":"30","headerVolume":"14400","payloadTransferSpeedMBps":"74.7493280386279","driveTransferSpeedMBps":"74.7493390655418","repackFilesCount":"0","userFilesCount":"0","verifiedFilesCount":"30","repackBytesCount":"0","userBytesCount":"0","verifiedBytesCount":"97614829801","status":"success","tapeDrive":"obelix_ts1170_27","subprocessPid":1487300,"exitCode":0,"killSignal":0}
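To follow the outcome across many sessions, the summary entry above can be extracted from the log programmatically. Below is a minimal Python sketch. It assumes cta-taped writes one JSON object per line in the format shown. session_summaries is a hypothetical helper. It only reads the per-session counters and does not answer where individual file verification failures are reported.

```python
# Minimal sketch, assuming one JSON object per line as in the cta-taped entry
# above. It summarises "Tape session finished" entries for one VID only.
import json
from typing import Dict, Iterable, Iterator


def session_summaries(lines: Iterable[str], vid: str) -> Iterator[Dict[str, object]]:
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # not a JSON log line
        if not isinstance(entry, dict):
            continue
        if entry.get("message") != "Tape session finished" or entry.get("tapeVid") != vid:
            continue
        yield {
            "status": entry.get("status"),
            "filesCount": int(entry.get("filesCount", 0)),
            "verifiedFilesCount": int(entry.get("verifiedFilesCount", 0)),
        }


sample = ['{"message":"Tape session finished","tapeVid":"TD4240","status":"success",'
          '"filesCount":"30","verifiedFilesCount":"30"}', "not json"]
print(list(session_summaries(sample, "TD4240")))
```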