Use of cta-ops-verify-tape

Hi Richard,

Thanks for this. The above merge request link seems to be gone. Please let me know when the change will be merged to the main/master branch of the cta-ops tools.

Best,

George

Hi,

Really sorry for nagging! Is there any update on the change to the ops tools' source code?

Thanks,

George

Dear George,

Apologies for the delay.

I merged the changes from Richard. The following 3 files have changed:

cta-ops-config.yaml
tools/pip/tapeadmin/src/tapeadmin/__init__.py
tools/pip/tapeverify/src/tapeverify/cta_verify_tape.py

Can you please try to pull the latest version and let me know how it goes?

If this does not resolve the problem, the next step I can propose is to have a remote debug session with you. Either I connect to your servers or you share your screen in Zoom and I will tell you what commands to type.

Regards,

Vladimir

Hi George,

Please excuse me for not getting back to you - I thought the issue was solved by Richard’s MR and did not look any further into it.

But I had a look today and in fact I think the SSS keytab is exactly why you are getting the error, as you had suggested earlier in the thread.
I managed to reproduce the issue:

[tape-local@ctaproductionfrontend11 ~]$ XrdSecPROTOCOL=sss XrdSecSSSKT=/etc/cta/tape-local.sss.keytab /usr/bin/cta-verify-file --instance eosctaboguskonstattina --vid I76744 --id 4816781169
Error in Google Protocol Buffers: Instance name "eosctaboguskonstattina" does not match key identifier "tape-local"
/lib64/libctacommon.so.0(cta::exception::Backtrace::Backtrace(bool)+0x6b) [0x7f7a920a823d]
/lib64/libctacommon.so.0(cta::exception::Exception::Exception(std::basic_string_view<char, std::char_traits<char> >, bool)+0x91) [0x7f7a920a94fd]
/lib64/libXrdSsiCta.so(cta::exception::PbException::Exception(std::basic_string_view<char, std::char_traits<char> >, bool)+0x4c) [0x7f7a93ec0d5a]
/lib64/libXrdSsiCta.so(cta::frontend::WorkflowEvent::WorkflowEvent(cta::frontend::FrontendService const&, cta::common::dataStructures::SecurityIdentity const&, cta::eos::Notification const&)+0x6ac) [0x7f7a93f93320]
/lib64/libXrdSsiCta.so(cta::xrd::RequestMessage::process(cta::xrd::Request const&, cta::xrd::Response&, XrdSsiStream*&)+0x30a) [0x7f7a93ea2b22]
/lib64/libXrdSsiCta.so(XrdSsiPb::RequestProc<cta::xrd::Request, cta::xrd::Response, cta::xrd::Alert>::ExecuteAction()+0x169) [0x7f7a93e9e8e7]
/lib64/libXrdSsiCta.so(XrdSsiPb::RequestProc<cta::xrd::Request, cta::xrd::Response, cta::xrd::Alert>::Execute()+0xd0) [0x7f7a93e9be6c]
/lib64/libXrdSsiCta.so(XrdSsiPb::Service<cta::xrd::Request, cta::xrd::Response, cta::xrd::Alert>::ProcessRequest(XrdSsiRequest&, XrdSsiResource&)+0x8f) [0x7f7a93e9ad55]
/lib64/libXrdUtils.so.3(XrdScheduler::Run()+0x14a) [0x7f7ab73427ca]
/lib64/libXrdUtils.so.3(XrdStartWorking(void*)+0xd) [0x7f7ab73428cd]
/lib64/libXrdUtils.so.3(XrdSysThread_Xeq+0x3c) [0x7f7ab7380b2c]

This happens because the disk instance name I specified is not present in the keytab; instead, the value of the user field is u:tape-local.
So I think the problem is indeed the keytab, and the fix would be what you suggested.

Edit: the above is valid in case you are using SSS to authenticate, which I understand you are.

Thanks for the confirmation Vlado.

@kskovola many thanks for getting back to me. Yes, I am using SSS to authenticate to the Frontend. I will create an SSS key for the eosctabogus disk instance and add it to /etc/cta/eos.sss.keytab. I assume that I need to revert the definition in /etc/cta/cta-cli.conf from

eos.instance cta-admin

to

eos.instance eosctabogus

is this correct?

Hi George,

Yes, the instance name needs to match the user name in the SSS key, so it would have to be set to eosctabogus again.
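
As a rough illustration of the rule above, here is a minimal Python sketch that reads one SSS keytab line and checks whether a given --instance value equals the key's user (u:) field. The function names are hypothetical and the N:, c: and k: values in the sample line are placeholders, not real key material; this only mirrors the check described in this thread, not the frontend's actual code.

```python
def parse_sss_keytab_line(line: str) -> dict:
    """Split one SSS keytab line into its 'x:value' fields."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition(":")
        if sep:  # the leading version number has no colon and is skipped
            fields[key] = value
    return fields


def instance_matches_key(instance: str, keytab_line: str) -> bool:
    """True when the --instance value equals the key's user (u:) field."""
    return parse_sss_keytab_line(keytab_line).get("u") == instance


# Placeholder values for N:, c: and k: (hypothetical, for illustration only)
line = "0 u:tape-local g:nogroup n:nowhere N:1 c:2 e:0 f:0 k:abc"
print(instance_matches_key("eosctabogus", line))  # False
print(instance_matches_key("tape-local", line))   # True
```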

Really, sorry - I must be doing something wrong….

I added to /etc/cta/eos.sss.keytab

a line that looks like

0 u:verification g:it n:eosctabogus N:…….

and also added a verification user to the DB

but still get the same error. What am I missing….?

George,

see this config on our side: keytab file, error and success with 1 file queued:

[tape-local@ctaproductionfrontend11 ~]$ cat /etc/cta/tape-local.sss.keytab
0 u:tape-local g:nogroup n:nowhere N:<number> c:<number> e:0 f:0 k:<key>

[tape-local@ctaproductionfrontend11 ~]$ XrdSecPROTOCOL=sss XrdSecSSSKT=/etc/cta/tape-local.sss.keytab /usr/bin/cta-verify-file --instance eosctabogus --vid I76744 --id 4816781169
Error in Google Protocol Buffers: Instance name "eosctabogus" does not match key identifier "tape-local"
/lib64/libctacommon.so.0(cta::exception::Backtrace::Backtrace(bool)+0x6b) [0x7f7a920a823d]
/lib64/libctacommon.so.0(cta::exception::Exception::Exception(std::basic_string_view<char, std::char_traits<char> >, bool)+0x91) [0x7f7a920a94fd]
/lib64/libXrdSsiCta.so(cta::exception::PbException::Exception(std::basic_string_view<char, std::char_traits<char> >, bool)+0x4c) [0x7f7a93ec0d5a]
/lib64/libXrdSsiCta.so(cta::frontend::WorkflowEvent::WorkflowEvent(cta::frontend::FrontendService const&, cta::common::dataStructures::SecurityIdentity const&, cta::eos::Notification const&)+0x6ac) [0x7f7a93f93320]
/lib64/libXrdSsiCta.so(cta::xrd::RequestMessage::process(cta::xrd::Request const&, cta::xrd::Response&, XrdSsiStream*&)+0x30a) [0x7f7a93ea2b22]
/lib64/libXrdSsiCta.so(XrdSsiPb::RequestProc<cta::xrd::Request, cta::xrd::Response, cta::xrd::Alert>::ExecuteAction()+0x169) [0x7f7a93e9e8e7]
/lib64/libXrdSsiCta.so(XrdSsiPb::RequestProc<cta::xrd::Request, cta::xrd::Response, cta::xrd::Alert>::Execute()+0xd0) [0x7f7a93e9be6c]
/lib64/libXrdSsiCta.so(XrdSsiPb::Service<cta::xrd::Request, cta::xrd::Response, cta::xrd::Alert>::ProcessRequest(XrdSsiRequest&, XrdSsiResource&)+0x8f) [0x7f7a93e9ad55]
/lib64/libXrdUtils.so.3(XrdScheduler::Run()+0x14a) [0x7f7ab73427ca]
/lib64/libXrdUtils.so.3(XrdStartWorking(void*)+0xd) [0x7f7ab73428cd]
/lib64/libXrdUtils.so.3(XrdSysThread_Xeq+0x3c) [0x7f7ab7380b2c]
/lib64/libc.so.6(+0x8a19a) [0x7f7ab6c8a19a]
/lib64/libc.so.6(+0x10f240) [0x7f7ab6d0f240]

[tape-local@ctaproductionfrontend11 ~]$ XrdSecPROTOCOL=sss XrdSecSSSKT=/etc/cta/tape-local.sss.keytab /usr/bin/cta-verify-file --instance tape-local --vid I76744 --id 4816781169
RetrieveRequest-Frontend-ctaproductionfrontend11.cern.ch-1126-20251013-10:24:01-0-3336

[tape-local@ctaproductionfrontend11 ~]$ cta-admin sq | grep I76744
      Retrieve ctaproduction  cephUser     vo_ATLAS_raw       ATLAS   IBMLIB4-TS1160 I76744            1        5.2G    166      166       50     600              15               50           0          0         0          20.0T           5864         20.7T          1              0

Does this help you to resolve the issue?

If not, please provide similar information including the config file content and full commands you are trying to execute.

Vladimir

Hi Vlado,

If I create a separate /etc/cta/tape-local.keytab as you suggest

[root@cta-front03 ~]# cat /etc/cta/tape-local.keytab
0 u:tape-local g:tape n:nowhere N.....

and put it in /etc/cta/ on both the frontend and the admin node where I run the commands, I get an AUTH error

[root@cta-adm-preprodfac georgep]#  XrdSecPROTOCOL=sss XrdSecSSSKT=/etc/cta/tape-local.keytab /usr/bin/cta-admin v
Error from XRootD SSI Framework: [FATAL] Auth failed: No protocols left to try
(venv) [root@cta-adm-preprodfac georgep]#
(venv) [root@cta-adm-preprodfac georgep]#  XrdSecPROTOCOL=sss XrdSecSSSKT=/etc/cta/tape-local.keytab /usr/bin/cta-verify-file --vid JL0504 --id 4294991264
Error from XRootD SSI Framework: [FATAL] Auth failed: No protocols left to try
(venv) [root@cta-adm-preprodfac georgep]#

which is expected because there is no tape-local defined as a CTA admin user in the DB

If I try the /etc/cta/cta-cli.keytab which has been used for all cta-admin commands, the cta-verify-file command works only if I specify --instance cta-admin because

[root@cta-front03 ~]# cat /etc/cta/cta-cli.keytab
0 u:cta-admin g:tape n:cta-admin N:…

The error reappears when I now try the command that verifies a whole tape

cta-ops-verify-tape -v JL0504 -C cta-operations-utilities/cta-ops-config.yaml

I think this is because in cta-operations-utilities/cta-ops-config.yaml, the cta-verify-file command looks like it is hard coded as

cta_verify_file: "/usr/bin/cta-verify-file --instance eosctabogus --id"

Maybe I need to replace eosctabogus with cta-admin in the above config line?

I don't understand what the exact use is of the "/etc/cta/tape-local.keytab" which is defined in cta-operations-utilities/cta-ops-config.yaml

default_user:
    name: "tape-local"
    group: "tape"
    sss_keytab_file: "/etc/cta/tape-local.keytab"

I can see that this is the system user for executing automated tasks, but unless this user is defined as an admin user (i.e., cta-admin admin add --username tape-local) the tools are not going to work.

Hi George,

Thanks for trying further. I would suggest you forget about the tape-local concept, it is our internal CERN solution to run commands with local authentication.

I think your issue does indeed come from the hardcoded --instance eosctabogus value in the cta_verify_file setting. That is where Richard’s modifications should help, as they should allow you to specify a different, non-hardcoded config file.

If you do not want to try Richard’s change, please replace the hardcoded value eosctabogus with cta-admin as you suggested and try again to verify a whole tape.
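
For anyone who prefers to script that replacement rather than edit the YAML by hand, a minimal Python sketch follows. It only rewrites the command string quoted in this thread; the helper name set_instance is hypothetical and nothing here is part of the cta-ops tools.

```python
import shlex


def set_instance(command: str, instance: str) -> str:
    """Return the command with the value following --instance replaced."""
    args = shlex.split(command)
    idx = args.index("--instance")  # raises ValueError if the flag is absent
    args[idx + 1] = instance
    return shlex.join(args)


cmd = "/usr/bin/cta-verify-file --instance eosctabogus --id"
print(set_instance(cmd, "cta-admin"))
# /usr/bin/cta-verify-file --instance cta-admin --id
```

The new value has to be the user name of the SSS key that is actually used for the call, as discussed earlier in the thread.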

If that works you are good to go, and we will discuss internally how to make this clearer, as I also find it confusing.

Please report.

Vladimir

Hi Vlado,

Thanks for this.

I think I am using the latest version of the ops tools code that includes Richard’s changes.

Anyway, having defined eos.instance cta-admin in /etc/cta/cta-cli.conf and

cta_verify_file: "/usr/bin/cta-verify-file --instance cta-admin --id"

in cta-operations-utilities/cta-ops-config.yaml

Both of the following commands worked

XrdSecPROTOCOL=sss XrdSecSSSKT=/etc/cta/cta-cli.keytab /usr/bin/cta-verify-file --vid JL0504 --id 4294991264
XrdSecPROTOCOL=sss XrdSecSSSKT=/etc/cta/cta-cli.keytab cta-ops-verify-tape -v JL0504 -C cta-operations-utilities/cta-ops-config.yaml

The second verified 30 files in total (10 at the beginning, 10 in the middle and 10 at the end) from the tape, as per the default values.
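
To make the "10 + 10 + 10" selection concrete, here is a minimal Python sketch of picking items from the start, the middle and the end of an ordered list. It is an illustration only and is not taken from the cta-ops-verify-tape source; the function name and the way the middle window is centred are assumptions.

```python
def sample_positions(items: list, n: int = 10) -> list:
    """Pick n items from the start, n around the middle and n from the end."""
    if len(items) <= 3 * n:
        return list(items)  # too few items to sample: take them all
    mid_start = (len(items) - n) // 2  # centre the middle window
    return items[:n] + items[mid_start:mid_start + n] + items[-n:]


files = list(range(100))  # stand-in for the files on a tape, in tape order
picked = sample_positions(files)
print(len(picked))  # 30
```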

We will try now the tool on production tapes.

Best,

George

Sorry to come back….

When trying to run the tool on the production system, I get the following error that I haven't seen before

(venv) [root@cta-adm-fac1 georgep]# XrdSecPROTOCOL=sss XrdSecSSSKT=/etc/cta/cta-cli.keytab cta-ops-verify-tape -v TD4240 -C cta-operations-utilities/cta-ops-config.yaml
2025-10-23 16:47:35 [INFO] [verify_tape] Running verify-tape for tape with vid TD4240, read speed of 300 MB/s, and data size: 0B
2025-10-23 16:47:36 [INFO] [partial_tape_scan] Performing partial tape scan.
2025-10-23 16:47:36 [INFO] [verify_tape] Verifying 30 files and 163.4G from tape TD4240
2025-10-23 16:47:36 [INFO] [verify_files] ArchiveId of files to verify: 5317279,5243485,5243506,4377425995,4377425999,4377426011,4377426016,4377426020,4377426031,4377426043,4377886140,5429490,5902228,5902719,5808576,5618276,5865608,5865745,5595133,5986503,5820176,5866548,5866662,5970908,5953109,5820377,5970929,5866627,5866603,5866656
2025-10-23 16:47:36 [CRITICAL] [log_and_exit] Could not submit verification request for archiveId 5317279 of tape TD4240 (STDERR: Cannot retrieve file because the disk instance of the request does not match that of the archived file: archiveFileId=5317279 requestDiskInstance=cta-admin archiveFileDiskInstance=eosantaresfac)

which I don't quite understand, as on preprod the diskInstance of the tape files was also different (from cta-admin) but there was no error….

Dear George,

We are very sorry that you still have difficulties with enabling tape verification in CTA.

It shouldn’t be this complicated.

As this ticket is already too long, I would suggest that we get in touch privately and have a debug session over Zoom.

I will contact you by e-mail.

Vladimir

Hi George,

Is there a verification mount policy (cta.verification.mount_policy) set on the frontend against which you are running the verification?

If not, that would explain the error you are getting.
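
Since the message text itself points at a disk instance mismatch rather than at the mount policy, it may help to pull the fields out of STDERR and log them alongside a hint. The Python sketch below parses only the message format quoted in this thread; the regular expression and function name are hypothetical, and the format may differ between CTA versions.

```python
import re

# Matches the tail of the "disk instance ... does not match" message quoted above.
MISMATCH = re.compile(
    r"archiveFileId=(?P<archive_id>\d+)\s+"
    r"requestDiskInstance=(?P<request>\S+)\s+"
    r"archiveFileDiskInstance=(?P<archived>[^\s)]+)"
)


def parse_mismatch(stderr: str):
    """Return the three fields as a dict, or None if the message is different."""
    match = MISMATCH.search(stderr)
    return match.groupdict() if match else None


stderr = (
    "Cannot retrieve file because the disk instance of the request does not "
    "match that of the archived file: archiveFileId=5317279 "
    "requestDiskInstance=cta-admin archiveFileDiskInstance=eosantaresfac)"
)
print(parse_mismatch(stderr))
```

When this pattern matches during a verification run, checking cta.verification.mount_policy on the frontend is a reasonable first step, as suggested above.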

Hi George,

as Konstantina wrote above, I was able to reproduce your 2nd error message by not having / deleting the mount policy:

[root@ctapreproductionfrontend11 ~]# cta-admin mp ls|grep verif
       verification         50    14400         50      300    vlado             cta-frontend01 2023-05-26 14:50    vlado             cta-frontend01 2023-05-26 14:50 preproduction Tape Media Verification framework mount policy

[root@ctapreproductionfrontend11 ~]# cta-admin mp rm -n verification
[root@ctapreproductionfrontend11 ~]# cta-admin mp ls|grep verif

[root@ctapreproductionfrontend11 ~]# XrdSecPROTOCOL=sss XrdSecSSSKT=/etc/cta/cta-cli.sss.keytab /usr/bin/cta-verify-file --instance tape-local --vid I72930 --id 4298404554
Cannot retrieve file because the disk instance of the request does not match that of the archived file: archiveFileId=4298404554 requestDiskInstance=tape-local archiveFileDiskInstance=eosctapublicpps

Please try to create a verification mount policy on your production setup and confirm that it all works.

We will work on our side to make the error messages clearer.

Vladimir

Hi Konstantina and Vlado,

Many thanks again for the Zoom session we had yesterday!

I had created a verification mount policy on the production system before trying to run the tool (and also a VERIFICATION virtual organisation).

[root@cta-adm-fac1 ~]# cta-admin mp ls | grep verification
             verification          5      300          5      300 cta-admin              cta-front04 2025-10-22 16:48 cta-admin              cta-front04 2025-10-22 16:48  antares verification mount policy
[root@cta-adm-fac1 ~]#
[root@cta-adm-fac1 ~]# cta-admin vo ls | grep -i verification
 VERIFICATION               3                3          1.0T eosantaresfac        false cta-admin              cta-front04 2025-10-22 16:43 cta-admin cta-front04 2025-10-22 16:43  antares verification vo
[root@cta-adm-fac1 ~]#

There must be something else that is different between our preprod and prod instances and is causing the error. I will try to find out what this difference is.

George

And, of course, that was the case: I had forgotten to define the verification policy in

/etc/cta/cta-frontend-xrootd.conf!

cta.verification.mount_policy  verification

cta-verify-file now submitted a verification retrieve!
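
A quick way to catch this kind of omission is to check the frontend config for the option before running the tools. The Python sketch below assumes the file uses whitespace-separated "key value" lines with "#" starting a comment; that assumption, the function name and the sample text are illustrative and not taken from the CTA documentation.

```python
def get_option(config_text: str, key: str):
    """Return the value set for key, or None if the key is not present."""
    for raw in config_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # assumption: '#' starts a comment
        if not line:
            continue
        parts = line.split(None, 1)
        if parts[0] == key:
            return parts[1].strip() if len(parts) > 1 else ""
    return None


# Illustrative content only, not a real frontend config
sample = """
# verification settings
cta.verification.mount_policy  verification
"""
print(get_option(sample, "cta.verification.mount_policy"))  # verification
```

In practice the text would be read from /etc/cta/cta-frontend-xrootd.conf on the frontend host.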

Your comment on the verification policy triggered me to check - thank you so much

Really sorry for missing this…I cannot apologise too much!

George

Finally, I also managed to use a “disk instance” that is not called cta-admin but tape-verification

  • Defined a /etc/cta/tape-verification.keytab that looks like this
0 u:tape-verification g:nogroup n:nowhere N:...
  • Defined this instance in /etc/cta/cta-cli.conf on both admin node and frontend
eos.instance tape-verification
  • Ensured that the new tape-verification SSS key is present in the /etc/cta/eos.sss.keytab on the frontend, otherwise we get the usual auth error from the frontend
Error from XRootD SSI Framework: [FATAL] Auth failed: No protocols left to try

This is because tape-verification is not only an admin user (see next change) but also a kind of EOS instance that has to be able to send workflow events to the (admin) frontend like the one below

Oct 29 12:10:02.575427153 cta-front01 cta-frontend: LVL="INFO" PID="1554916" TID="1555454" MSG="In WorkflowEvent::WorkflowEvent(): received event." instance="antares" sched_backend="cephUser" user="tape-verification@cta-front01" eventType="PREPARE" eosInstance="tape-verification" diskFilePath="dummy" diskFileId=""
Oct 29 12:10:02.612295613 cta-front01 cta-frontend: LVL="INFO" PID="1554916" TID="1555454" MSG="In OStoreDB::queueRetrieve(): recorded request for queueing (enqueueing posted to thread pool)." instance="antares" sched_backend="cephUser" user="tape-verification@cta-front01" tapeVid="TD4240" jobObject="RetrieveRequest-Frontend-cta-front01.scd.rl.ac.uk-1554916-20251029-12:00:22-0-1" fileId="4377425999" diskInstance="eosantaresfac" diskFilePath="dummy" diskFileId="209587550" vidSelectionTime="0.019826" agentReferencingTime="0.007868" insertionTime="0.002213" taskPostingTime="0.000492" taskQueueSize="1" totalTime="0.030399"
Oct 29 12:10:02.613241081 cta-front01 cta-frontend: LVL="INFO" PID="1554916" TID="1555454" MSG="In Scheduler::queueRetrieve(): Queued retrieve request" instance="antares" sched_backend="cephUser" user="tape-verification@cta-front01" fileId="4377425999" instanceName="tape-verification" diskSystemName="" diskFilePath="dummy" diskFileOwnerUid="0" diskFileGid="0" dstURL="file://dummy" errorReportURL="" creationHost="cta-front01" creationTime="1758774104" creationUser="tape-verification" requesterName="verification" requesterGroup="it" criteriaArchiveFileId="4377425999" criteriaCreationTime="1758774104" criteriaDiskFileId="209587550" criteriaDiskFileOwnerUid="0" criteriaDiskInstance="eosantaresfac" criteriaFileSize="9324529165" reconciliationTime="1758774104" storageClass="diamond_i13-1" checksumType="ADLER32" checksumValue="a7ed647d" fSeq="5" vid="TD4240" blockId="33661" fileSize="9324529165" copyNb="1" selectedVid="TD4240" verifyOnly="1" catalogueTime="0.006192" schedulerDbTime="0.030778" policyName="verification" policyMinAge="300" policyPriority="5" retrieveRequestId="RetrieveRequest-Frontend-cta-front01.scd.rl.ac.uk-1554916-20251029-12:00:22-0-1"
Oct 29 12:10:02.613703518 cta-front01 cta-frontend: LVL="INFO" PID="1554916" TID="1555454" MSG="In WorkflowEvent::processPREPARE(): queued file for retrieve." instance="antares" sched_backend="cephUser" user="tape-verification@cta-front01" fileId="4377425999" schedulerTime="0.038128" isVerifyOnly="1" retrieveReqId="RetrieveRequest-Frontend-cta-front01.scd.rl.ac.uk-1554916-20251029-12:00:22-0-1"
  • Finally, tape-verification needs to be defined as an admin user in the CTA DB, otherwise
User: tape-verification on host: cta-front01 is not authorized to execute CTA admin commands

With all the above changes, the following command works as expected

XrdSecPROTOCOL=sss XrdSecSSSKT=/etc/cta/tape-verification.keytab cta-verify-file --instance tape-verification --id 4377425999 --vid TD4240
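
For scripted use, the same invocation can be assembled in Python. This is a minimal sketch that only builds the argument list and environment shown in the command above; the helper name is hypothetical, and nothing is executed until the result is passed to something like subprocess.run.

```python
import os


def verify_file_invocation(keytab: str, instance: str, archive_id: int, vid: str):
    """Build argv and environment for a single cta-verify-file call."""
    # Copy the current environment and add the two SSS variables used above.
    env = dict(os.environ, XrdSecPROTOCOL="sss", XrdSecSSSKT=keytab)
    argv = [
        "cta-verify-file",
        "--instance", instance,
        "--id", str(archive_id),
        "--vid", vid,
    ]
    return argv, env


argv, env = verify_file_invocation(
    "/etc/cta/tape-verification.keytab", "tape-verification", 4377425999, "TD4240"
)
print(" ".join(argv))
```

On a node where cta-verify-file is installed, subprocess.run(argv, env=env, capture_output=True, text=True) would then submit the request and capture STDERR for error handling.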

George,

now that you can verify one file, can you also verify several files on a tape with cta-ops-verify-tape?

If that works, then you can configure the rest of the framework to automatically select tapes for verification, as I explained in my earlier post above.

Vladimir

Hi Vlado,

Yes, I am happy to report that the cta-ops-verify-tape works as well (given the instance modification of the cta_verify_file line in cta-operations-utilities/cta-ops-config.yaml).

Just one last question: if the verification of one or more files fails, where exactly would we see the error(s)?

So far, I have been following the verification progress in the cta-taped log, where we can see the outcome in

{"epoch_time":1761742964.882947779,"local_time":"2025-10-29T13:02:44+0000","hostname":"getafix-ts14","program":"cta-taped","log_level":"INFO","pid":2771,"tid":2771,"message":"Tape session finished","drive_name":"obelix_ts1170_27","instance":"antares","sched_backend":"cephUser","capacityInBytes":"50000000000000","logicalLibrary":"obelix_ts1170","mediaType":"TS1170","mountAttempted":"1","mountId":"3228093","mountType":"Retrieve","tapePool":"diamond_i13-1","tapeVid":"TD4240","vendor":"IBM","vo":"storaged_dls","volReqId":"3228093","wasTapeMounted":"1","mountTime":"20.136459","positionTime":"358.090478","waitInstructionsTime":"0.211757","waitFreeMemoryTime":"0.01022","waitDataTime":"0.0","waitReportingTime":"0.10875","checksumingTime":"0.0","readWriteTime":"592.523687","flushTime":"0.0","unloadTime":"143.196796","unmountTime":"191.776099","encryptionControlTime":"0.009","transferTime":"592.854414","totalTime":"1305.89575","deliveryTime":"971.117454","drainingTime":"0.0","dataVolume":"97614829801","filesCount":"30","headerVolume":"14400","payloadTransferSpeedMBps":"74.7493280386279","driveTransferSpeedMBps":"74.7493390655418","repackFilesCount":"0","userFilesCount":"0","verifiedFilesCount":"30","repackBytesCount":"0","userBytesCount":"0","verifiedBytesCount":"97614829801","status":"success","tapeDrive":"obelix_ts1170_27","subprocessPid":1487300,"exitCode":0,"killSignal":0}
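
Until it is clear where per-file failures are reported, one stopgap is to summarise each "Tape session finished" record and flag sessions where the verified count differs from the file count or the status is not "success". The Python sketch below uses only field names visible in the log line above and assumes the record is valid JSON with plain double quotes; the function name is hypothetical, and whether a failed verification actually shows up in these counters is not established in this thread.

```python
import json


def summarise_session(log_line: str) -> dict:
    """Extract the verification-related fields from one cta-taped JSON record."""
    rec = json.loads(log_line)
    files = int(rec["filesCount"])  # counts are logged as strings
    verified = int(rec["verifiedFilesCount"])
    return {
        "vid": rec["tapeVid"],
        "status": rec["status"],
        "files": files,
        "verified": verified,
        "all_verified": rec["status"] == "success" and verified == files,
    }


# Trimmed copy of the record above, keeping only the fields used here
sample = (
    '{"message":"Tape session finished","tapeVid":"TD4240",'
    '"filesCount":"30","verifiedFilesCount":"30","status":"success"}'
)
print(summarise_session(sample))
```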