cta-taped: "File writing to disk failed"

Hello,

I am trying to recall a file from tape, but cta-taped fails with the error below. My EOS directory is
drwxr-xr-+ 1 dteam001 dteam 23693 Aug 17 13:33 tape

with the following ACL: sys.acl="u:1000:rwx+dp"

Can you give me an idea of what I have got wrong? Many thanks.

2021-08-17T15:38:07.107054+01:00 cta-ts03 cta-taped: LVL="ERROR" PID="60920" TID="60997" MSG="File writing to disk failed" thread="DiskWrite" tapeDrive="asterix_ts1160_13" tapeVid="CT4852" mountId="14" threadCount="10" threadID="0" fileId="4294967297" dstURL="root://cta-eos14.scd.rl.ac.uk//eos/antaresdev/dteam/tape/ECHO_stresstest.tar?eos.lfn=fxid:39&eos.ruid=0&eos.rgid=0&eos.injection=1&eos.workflow=retrieve_written&eos.space=retrieve&oss.asize=20480" fSeq="2" errorMessage="In XrootWriteFile::XrootWriteFile failed XrdCl::File::Open() on root://cta-eos14.scd.rl.ac.uk//eos/antaresdev/dteam/tape/ECHO_stresstest.tar?eos.lfn=fxid:39&eos.ruid=0&eos.rgid=0&eos.injection=1&eos.workflow=retrieve_written&eos.space=retrieve&oss.asize=20480 [ERROR] Server responded with an error: [3010] Unable to open file /eos/antaresdev/dteam/tape/ECHO_stresstest.tar; Operation not permitted code:400 errNo:3010 status:1" readWriteTime="0.000000" checksumingTime="0.000000" waitDataTime="1.500443" waitReportingTime="0.010457" checkingErrorTime="0.000012" openingTime="0.000000" closingTime="0.000000" transferTime="0.000000" totalTime="0.000000" dataVolume="0" globalPayloadTransferSpeedMBps="0.000000" diskPerformanceMBps="0.000000" openRWCloseToTransferTimeRatio="0.000000"

Hi George,

I am not 100% sure about the cause of your problem, but my guess is that either the Simple Shared Secret (SSS) key used by your cta-taped daemon to authenticate with the MGM is failing, or the account represented by the SSS key does not have EOS “sudo” privileges.
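The dstURL in the log above carries eos.ruid=0&eos.rgid=0, i.e. cta-taped asks the MGM to open the destination file as uid 0 / gid 0 (root). As far as we understand EOS, a role switch requested through eos.ruid/eos.rgid is only honoured for identities in the sudoer list, which would fit the "Operation not permitted" error. The small bash helper below (url_param is a hypothetical name, not part of CTA or EOS) just pulls those parameters out of the URL:

```bash
#!/usr/bin/env bash
# Print the value of one query parameter from an XRootD URL.
# Usage: url_param <url> <name>; returns 1 if the parameter is absent.
# Assumption: eos.ruid/eos.rgid request a role switch that EOS grants only to sudoers.
url_param() {
  local url=$1 name=$2 query pair
  local -a pairs
  query=${url#*\?}                     # drop everything up to the first '?'
  [ "$query" = "$url" ] && return 1    # URL has no query string at all
  IFS='&' read -r -a pairs <<< "$query"
  for pair in "${pairs[@]}"; do
    if [ "${pair%%=*}" = "$name" ]; then
      printf '%s\n' "${pair#*=}"       # value after the first '='
      return 0
    fi
  done
  return 1
}

# dstURL copied from the cta-taped error message.
dst='root://cta-eos14.scd.rl.ac.uk//eos/antaresdev/dteam/tape/ECHO_stresstest.tar?eos.lfn=fxid:39&eos.ruid=0&eos.rgid=0&eos.injection=1&eos.workflow=retrieve_written&eos.space=retrieve&oss.asize=20480'
echo "ruid=$(url_param "$dst" eos.ruid) rgid=$(url_param "$dst" eos.rgid)"
```

Run against the logged URL, this prints ruid=0 rgid=0: the retrieve is performed on behalf of root, not of the file owner.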

My tape server uses the following SSS keytab file to authenticate with the MGM. The file is owned by user cta, which is the same user that the cta-taped daemon runs as:

[itctabuild02] ~ > ls -l /etc/cta/cta-taped.sss.keytab
-rw-------. 1 cta cta 135 Aug 17 19:07 /etc/cta/cta-taped.sss.keytab
[itctabuild02] ~ > 
[itctabuild02] ~ > ps -ef | grep cta-taped | grep -v grep
cta      17566     1  0 19:08 ?        00:00:00 cta-taped --log-to-file /var/log/cta/cta-taped.log
cta      17568 17566  0 19:08 ?        00:00:05 cta-taped --log-to-file /var/log/cta/cta-taped.log
cta      19251 17566  0 19:09 ?        00:00:01 cta-taped --log-to-file /var/log/cta/cta-taped.log
[itctabuild02] ~ > 

Please run the following command on your tape server machine to verify that your SSS key, like mine, is successfully authenticating your cta-taped daemon as user cta:

[itctabuild02] ~ > sudo XrdSecPROTOCOL=sss XrdSecSSSKT=/etc/cta/cta-taped.sss.keytab runuser -u cta eos whoami
Virtual Identity: uid=19215 (99,19215) gid=30 (30,99) [authz:sss] sudo* host=localhost domain=localdomain
[itctabuild02] ~ > 
[itctabuild02] ~ > id 19215
uid=19215(cta) gid=30(tape) groups=1475(cta),30(tape)
[itctabuild02] ~ > 

In my case, EOS authenticates my SSS key as user cta in group tape. The main point here is that it authenticates as user cta.
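The whoami check can also be scripted. whoami_summary below is a hypothetical helper: it extracts the uid from one line of eos whoami output and reports whether that line contains the sudo* token. In this thread the token appears in Steve's output, where cta is a sudoer, and is absent from George's later output (uid=2, empty sudoer list); treating sudo* as the sudoer marker is an inference from those two samples.

```bash
#!/usr/bin/env bash
# Summarise one line of `eos whoami` output as "uid=<n> sudo=<yes|no>".
# Assumption: the token "sudo*" marks an identity the MGM treats as a sudoer.
whoami_summary() {
  local tok uid="" sudo=no
  local -a toks
  read -r -a toks <<< "$1"          # split on whitespace without globbing
  for tok in "${toks[@]}"; do
    case $tok in
      uid=*)   uid=${tok#uid=} ;;
      'sudo*') sudo=yes ;;          # quoted: match the literal string sudo*
    esac
  done
  printf 'uid=%s sudo=%s\n' "$uid" "$sudo"
}

# On the tape server this would be fed from Steve's command, e.g.:
#   whoami_summary "$(sudo XrdSecPROTOCOL=sss XrdSecSSSKT=/etc/cta/cta-taped.sss.keytab runuser -u cta eos whoami)"
whoami_summary 'Virtual Identity: uid=19215 (99,19215) gid=30 (30,99) [authz:sss] sudo* host=localhost domain=localdomain'
```

The sample line prints uid=19215 sudo=yes; the uid can then be mapped back to a name with id, as in the transcript above.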

If this is the same for you, then you should check that EOS lists the user cta as an EOS “sudoer” by executing the following command on your MGM machine:

[itctabuild02] ~ > sudo eos vid ls | egrep ^sudoer
sudoer                 => uids(daemon,cta,eosadmin1,eosadmin2)
[itctabuild02] ~ > 
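The second check boils down to "is cta inside uids(...)". is_sudoer below is a hypothetical helper that answers that for one sudoer line taken from eos vid ls; it only parses text and does not talk to EOS.

```bash
#!/usr/bin/env bash
# Succeed if <name> appears in the uids(...) list of an `eos vid ls` sudoer line.
# Usage: is_sudoer <sudoer-line> <name>; returns 2 if the line has no uids(...).
is_sudoer() {
  local line=$1 name=$2 list n
  local -a names
  case $line in
    *'uids('*')'*) ;;
    *) return 2 ;;
  esac
  list=${line#*uids\(}     # text after "uids("
  list=${list%%\)*}        # text before the closing ")"
  IFS=',' read -r -a names <<< "$list"
  for n in "${names[@]}"; do
    [ "$n" = "$name" ] && return 0
  done
  return 1
}

# On the MGM, something like:
#   is_sudoer "$(eos vid ls | grep '^sudoer')" cta && echo "cta is a sudoer"
is_sudoer 'sudoer                 => uids(daemon,cta,eosadmin1,eosadmin2)' cta && echo "cta is a sudoer"
```

With Steve's list this prints "cta is a sudoer"; with an empty list such as uids() it returns 1. If the tape server user is missing, EOS provides eos vid set membership <uid> +sudo to add it (check eos vid --help on your EOS version for the exact form).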

Please could you tell me if the above two checks pass or fail for you?

Regards,

Steve

Hi George,

Once we have solved your current problem of retrieving a file from tape and writing it to disk, we need to look at your ACLs. You are missing the “no update” ACL required by tape-backed files. Again, let’s get your current situation unblocked first and fix your ACLs afterwards.

Regards,

Steve

Hi Steve,

Thanks so much for the reply. I did suspect that this had something to do with the SSS keys, and I spent some time changing the user/group in /etc/cta/cta-taped.sss.keytab, but without much success, i.e. I kept getting the same error. This is what I have:

The SSS keys on the EOS MGM and on the tape server are the same, i.e. the part after the key name is identical. They look like this:

cat /etc/eos.keytab
0 u:daemon g:daemon n:eosantaresdev ****************************

cat cta-taped.sss.keytab
0 u:cta g:tape n:cta-taped *************************************

Basically, I copied /etc/eos.keytab to /etc/cta/cta-taped.sss.keytab, replaced the string
"0 u:daemon g:daemon n:eosantaresdev" with "0 u:cta g:tape n:cta-taped",
and changed the ownership to cta/tape and the permissions to 400.

I did manage to write a file to tape using the above SSS keys, but the recall failed. I ran the following on the tape server:

export XrdSecPROTOCOL=sss
export XrdSecSSSKT=/etc/cta/cta-taped.sss.keytab

[root@cta-ts03 cta]# runuser -u cta eos whoami
Virtual Identity: uid=2 (2,99) gid=2 (2,99) [authz:sss] host=cta-ts03.scd.rl.ac.uk domain=scd.rl.ac.uk

So cta-taped presents itself to EOS as the daemon user (uid=2(daemon) gid=2(daemon) groups=2(daemon)), which I can’t explain.

The EOS sudoer list is empty at the moment:

[root@cta-eos14 ~]# eos vid ls | egrep ^sudoer
sudoer => uids()

Best,

George

Hi George,

You only have one key. You have copied that key to the key table file of the MGM and to the key table file of the tape server. In this scenario, the tape server is the client trying to authenticate with the MGM, which is the server. You have mapped the key in the MGM to the user daemon. It does not matter which mapping you put in the tape server; the MGM will stick with its own interpretation of the facts.

If the tape server user is not in the EOS list of “sudoers”, then the tape server can never become root, which is required for reading private user files and for writing files belonging to different users.
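One way out of this, and roughly what George ends up doing later in the thread, is to give the tape server a key of its own instead of a copy of the MGM's. The sketch below only prints the commands by default (DRY_RUN=1). The xrdsssadmin options are the standard ones for adding a named key with a user and group; the keytab path and the cta/tape mapping come from this thread, the run wrapper is a hypothetical helper, and it assumes the keytab no longer contains the copied daemon key.

```bash
#!/usr/bin/env bash
# Sketch: give cta-taped its own SSS key instead of reusing the MGM's.
# DRY_RUN=1 (the default here) only prints the commands.
DRY_RUN=${DRY_RUN:-1}
KEYTAB=${KEYTAB:-/etc/cta/cta-taped.sss.keytab}   # path used in this thread

run() {
  if [ "$DRY_RUN" = 1 ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Generate a new key named cta-taped that maps to user cta, group tape.
# Assumption: $KEYTAB does not still hold the key copied from /etc/eos.keytab.
run xrdsssadmin -k cta-taped -u cta -g tape add "$KEYTAB"
# Same ownership and mode George used for the copied key.
run chown cta:tape "$KEYTAB"
run chmod 400 "$KEYTAB"
# The same key must also be added to /etc/eos.keytab on the MGM (and other
# EOS nodes): the MGM's entry is the one that decides the identity.
```

Setting DRY_RUN=0 executes the commands instead of printing them.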

Cheers,

Steve

Hi Steve,

Thanks for the reply. We created a separate SSS key for the tape server, but then we ran into authentication problems with the Frontend, so we created a dedicated key for it too. Both of these keys were added to /etc/eos.keytab on all EOS nodes (as shown at the bottom).

We also updated the ACL of the EOS dir to sys.acl="u:1000:rwx+dp,z:!u,u:0:+u".
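The ACL George settled on can be unpacked entry by entry. acl_entries below is a hypothetical helper that only splits the sys.acl string; the meaning attached to each flag in the comments is our reading of the EOS ACL documentation (in particular, p as the prepare permission), not something stated in this thread, apart from Steve's "no update" ACL.

```bash
#!/usr/bin/env bash
# Print each entry of an EOS sys.acl value as "<who> -> <permissions>".
acl_entries() {
  local e
  local -a entries
  IFS=',' read -r -a entries <<< "$1"
  for e in "${entries[@]}"; do
    # who = everything before the last ':', permissions = everything after it
    printf '%s -> %s\n' "${e%:*}" "${e##*:}"
  done
}

acl_entries 'u:1000:rwx+dp,z:!u,u:0:+u'
# Our reading of each entry (assumption, based on the EOS ACL docs):
#   u:1000 -> rwx+dp  uid 1000 may read, write and browse; +d allows deletion,
#                     p is taken here to be the prepare (stage/evict) permission
#   z      -> !u      z: applies to everyone; !u forbids updating existing files
#                     (the "no update" ACL Steve mentions)
#   u:0    -> +u      re-allows updates for uid 0, which the tape server becomes
#                     via sudo when it writes a recalled file back to disk
```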

The file could be written to the EOS dir and the workflow could be created with the Frontend, but the tape server still wasn’t permitted to open the file. After changing the EOS dir permissions (eos chmod) from 700 to 755, both migration and recall worked.

Hopefully this setup is the correct one.

I noticed that the recalled file remains in the retrieve space long after I xrdcp it out of EOS. Why is it not evicted (or garbage-collected) from the buffer? Can only FTS do this?

Best,

George


xrdsssadmin list eos.keytab
Number Len Date/Time Created Expires  Keyname           User & Group
------ --- ----------------- -------- ----------------- ------------
2      32  08/24/21 12:47:27 -------- cta-taped         cta tape
1      32  08/24/21 15:30:21 -------- cta_eosantaresdev eosantaresdev tape
1      32  06/14/21 15:50:16 -------- eosantaresdev     daemon daemon

xrdsssadmin list ctafrontend_server_sss.keytab
Number Len Date/Time Created Expires  Keyname           User & Group
------ --- ----------------- -------- ----------------- ------------
1      32  08/24/21 15:30:21 -------- cta_eosantaresdev eosantaresdev tape

xrdsssadmin list cta-taped.keytab
Number Len Date/Time Created Expires  Keyname           User & Group
------ --- ----------------- -------- ----------------- ------------
2      32  08/24/21 12:47:27 -------- cta-taped         cta tape

Hi George,

Eviction from the buffer has to be done explicitly by the client. FTS will do this for you automatically (if it detects a CTA endpoint); xrdcp will not. You would need to run xrdfs prepare -e to evict the file.
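For a file recalled by hand, the eviction Oliver describes can be wrapped as below. evict and run are hypothetical helpers; the underlying call is xrdfs <host> prepare -e <path>, with the host and path taken from George's log. By default (DRY_RUN=1) it only prints the command.

```bash
#!/usr/bin/env bash
# Sketch: evict a recalled file from the EOS retrieve buffer.
# DRY_RUN=1 (the default here) only prints the command.
DRY_RUN=${DRY_RUN:-1}

run() {
  if [ "$DRY_RUN" = 1 ]; then
    printf '+ %s\n' "$*"
  else
    "$@"
  fi
}

# Split root://<host>//<path> and issue an evict-type prepare request for <path>.
evict() {
  local rest=${1#root://}
  local host=${rest%%/*}     # up to the first '/'
  local path=${rest#*/}      # after the first '/', keeps the leading '/'
  run xrdfs "$host" prepare -e "$path"
}

evict root://cta-eos14.scd.rl.ac.uk//eos/antaresdev/dteam/tape/ECHO_stresstest.tar
```

This prints + xrdfs cta-eos14.scd.rl.ac.uk prepare -e /eos/antaresdev/dteam/tape/ECHO_stresstest.tar; with DRY_RUN=0 it sends the request.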

Cheers,

Oliver.

Hi Oliver,

Great, many thanks for this.

Best,

George