Eos cp could not send SSI buffer

Hi,

I get this error when copying a file with eos:

error: target file open failed - errno=5 : Input/output error

Checking the MGM logs:

210920 13:38:58 time=1632137938.280114 func=open                     level=INFO  logid=57dc5ee0-1a07-11ec-943a-021169072178 unit=mgm@zitosm-devel02.d.de:1094 tid=00007fa71b6dd700 source=XrdMgmOfsFile:1056             tident=root.15713:117@localhost sec=sss   uid=0 gid=0 name=daemon geo="" acl=1 r=1 w=1 wo=0 egroup=0 shared=0 mutable=1 facl=0
210920 13:38:58 time=1632137938.280358 func=BroadcastRefreshFromExternal level=INFO  logid=static.............................. unit=mgm@zitosm-devel02.d.de:1094 tid=00007fa71b6dd700 source=Caps:261                       tident= sec=(null) uid=99 gid=99 name=- geo="" id=10 pid=f
210920 13:38:58 time=1632137938.280518 func=open                     level=INFO  logid=57dc5ee0-1a07-11ec-943a-021169072178 unit=mgm@zitosm-devel02.d.de:1094 tid=00007fa71b6dd700 source=XrdMgmOfsFile:1582             tident=root.15713:117@localhost sec=sss   uid=0 gid=0 name=daemon geo="" blocksize=4096 lid=100002
210920 13:38:58 time=1632137938.280591 func=BroadcastRefreshFromExternal level=INFO  logid=static.............................. unit=mgm@zitosm-devel02.d.de:1094 tid=00007fa71b6dd700 source=Caps:261                       tident= sec=(null) uid=99 gid=99 name=- geo="" id=10 pid=f
210920 13:38:58 time=1632137938.280783 func=HandleProtoMethodEvents  level=INFO  logid=static.............................. unit=mgm@zitosm-devel02.d.de:1094 tid=00007fa71b6dd700 source=WFE:1615                       tident= sec=(null) uid=99 gid=99 name=- geo="" default SYNC::CREATE /eos/users/t/postinstall.log zitosm-devel02.d.de:10955 fxid=0000005b mgm.reqid=""
210920 13:38:58 time=1632137938.281708 func=SendProtoWFRequest       level=ERROR logid=static.............................. unit=mgm@zitosm-devel02.d.de:1094 tid=00007fa71b6dd700 source=WFE:2466                       tident= sec=(null) uid=99 gid=99 name=- geo="" protoWFEndPoint="zitosm-devel02.d.de:10955" protoWFResource="/ctafrontend" fullPath="/eos/users/t/postinstall.log" event="sync::create" msg="Could not send SSI protocol buffer request to outside service." reason="[FATAL] Auth failed"
210920 13:38:58 time=1632137938.281752 func=open                     level=INFO  logid=57dc5ee0-1a07-11ec-943a-021169072178 unit=mgm@zitosm-devel02.d.de:1094 tid=00007fa71b6dd700 source=XrdMgmOfsFile:2916             tident=root.15713:117@localhost sec=sss   uid=0 gid=0 name=daemon geo="" msg="workflow trigger returned" retc=107 errno=115
210920 13:38:58 time=1632137938.281791 func=Emsg                     level=ERROR logid=57dc5ee0-1a07-11ec-943a-021169072178 unit=mgm@zitosm-devel02.d.de:1094 tid=00007fa71b6dd700 source=XrdMgmOfsFile:3294             tident=root.15713:117@localhost sec=sss   uid=0 gid=0 name=daemon geo="" Unable to [FATAL] Auth failed /eos/users/t/postinstall.log; Transport endpoint is not connected

Any idea what could be causing this, or how to work around it?

Many thanks,
mwai

Hi,

Your EOS MGM doesn’t have the right credentials for contacting the CTA frontend. Your cta-frontend-xrootd.conf (or equivalent) should have a line like this:

sec.protocol sss -s /etc/cta/ctafrontend_server_sss.keytab

and your /etc/xrd.cf.mgm should have a line like this:

sec.protocol sss -c /etc/eos.keytab -s /etc/eos.keytab

The SSS key in ctafrontend_server_sss.keytab must also be present in the keytab given by the -c argument in the EOS config (/etc/eos.keytab in this example). It may actually have to be the last entry in that file.
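One way to check this is to compare the key material in the two files. This is only a rough sketch: it assumes the usual SSS keytab text layout, where each entry carries its key as a " k:<hex>" field, and the function names sss_keys and frontend_key_is_last are made up for illustration (they are not part of EOS or CTA).

```shell
# Hypothetical helpers, not part of EOS or CTA.
# Assumption: every keytab entry stores its key as a " k:<hex>" field.

# Print the hex key of every entry in an SSS keytab, one per line.
sss_keys() {
  sed -n 's/.* k:\([0-9a-fA-F]*\).*/\1/p' "$1"
}

# Succeed only if the last key in the frontend keytab ($1) is also
# the last key in the keytab the MGM uses as a client ($2).
frontend_key_is_last() {
  frontend_key=$(sss_keys "$1" | tail -n 1)
  client_key=$(sss_keys "$2" | tail -n 1)
  [ -n "$frontend_key" ] && [ "$frontend_key" = "$client_key" ]
}
```

With the paths from this example you would run it as root as: frontend_key_is_last /etc/cta/ctafrontend_server_sss.keytab /etc/eos.keytab && echo "key matches"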

Oliver.

Hi Oliver,

Many thanks for this! I was able to solve it. However, running eos cp again gave me an "Operation not permitted" error:

dtape" event="sync::create" msg="Received an error response" response="RSP_ERR_USER" reason="Failed to check and get next archive file ID: No mount rules: storageClass=single requester=eosdev:root:root"
210922 16:14:41 time=1632320081.210776 func=open                     level=INFO  logid=6d806ecc-1baf-11ec-9863-000af7e08db8 unit=mgm@tpm03.d.de:1094 tid=00007fc6bd1f2700 source=XrdMgmOfsFile:2916             tident=root.203900:97@localhost sec=sss   uid=0 gid=0 name=eosdev geo="::test" msg="workflow trigger returned" retc=1 errno=115
210922 16:14:41 time=1632320081.210803 func=Emsg                     level=ERROR logid=6d806ecc-1baf-11ec-9863-000af7e08db8 unit=mgm@tpm03.d.de:1094 tid=00007fc6bd1f2700 source=XrdMgmOfsFile:3294             tident=root.203900:97@localhost sec=sss   uid=0 gid=0 name=eosdev geo="::test" Unable to Failed to check and get next archive file ID: No mount rules: storageClass=single requester=eosdev:root:root /eos/users/test/readtape; Operation not permitted

Is this an auth issue?
My sss keys:

[root@tpm03 ~]# XrdSecPROTOCOL=sss XrdSecSSSKT=/etc/cta-taped.keytab runuser -u cta eos whoami
Virtual Identity: uid=1001 (99,1001) gid=30 (30,99) [authz:sss] sudo* host=localhost domain=localdomain geo-location=::test
[root@tpm03 ~]# id 1001
uid=1001(cta) gid=30(tape) groups=30(tape)
[root@tpm03 ~]# eos vid ls | egrep ^sudoer
sudoer                 => uids(daemon,cta)

Also, should this line in /etc/xrd.cf.mgm be uncommented?

# Set the root destination for all archives belonging to this instance
# EOS_ARCHIVE_URL=root://castorpps.cern.ch//user/cern.ch/c3/archive/

thanks,
mwai

Hi,

I see No mount rules in your error message. For a user to be authorised to use the system they must be mapped by a requestermountrule or a groupmountrule to a mountpolicy.

The scripts we use to set up our CI are a good source for understanding how this fits together.

Please check you have this configured correctly.
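To see which identity CTA is looking for, you can pull the requester out of the error text. A minimal sketch, assuming the requester is printed as instance:user:group (which is how the message above reads); requester_fields is a hypothetical helper name.

```shell
# Hypothetical helper: split the "requester=<instance>:<user>:<group>"
# part of a "No mount rules" message into its three fields.
# Assumption: the requester is printed as instance:user:group.
requester_fields() {
  printf '%s\n' "$1" |
    sed -n 's/.*requester=\([^: ]*\):\([^: ]*\):\([^: "]*\).*/instance=\1 user=\2 group=\3/p'
}
```

The instance and user it prints should then show up together in cta-admin rmr ls, or the instance and group in cta-admin gmr ls, each pointing at a policy listed by cta-admin mp ls.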

Oliver.

Thanks Oliver for the pointer.

I however came across this:

[root@tpm03 ~]# xrdcp /root/stagetape root://localhost//eos/users/test/
[0B/0B][100%][==================================================][0B/s]
Run: [ERROR] Server responded with an error: [3010] Unable to update file - fobidden by ACL /eos/users/test/stagetape; Operation not permitted (destination)

I’m not sure, but it looks to me as though the tape server/frontend doesn’t pick up the updated mount rules?

thanks

Hi,

So you believe the mount rules are fine? Then you may have encountered a known issue we’re working on. Does the problem persist after a restart of the CTA frontend?

Oliver.

Yeah, no change after a restart.

Thanks

Hi,

Then we need to check your config. Can you send the output of

cta-admin mp ls
cta-admin rmr ls
cta-admin gmr ls
eos attr ls /eos/users/test/stagetape
eos whoami # as the user who cannot write

Thanks,

Oliver.

Hi Oliver,

cta-admin mp ls

mount policy a.priority a.minAge r.priority r.minAge   c.user        c.host           c.time   m.user        m.host           m.time comment
     ctatest          1        1          1        1 karimimw tpm03.desy.de 2021-09-27 11:55 karimimw tpm03.desy.de 2021-09-27 11:55 ctatest

cta-admin rmr ls

instance username  policy   c.user        c.host           c.time   m.user        m.host           m.time comment
  eosdev     root ctatest karimimw tpm03.desy.de 2021-09-27 11:56 karimimw tpm03.desy.de 2021-09-27 11:56 ctatest

cta-admin gmr ls

instance group  policy   c.user        c.host           c.time   m.user        m.host           m.time comment
  eosdev  root ctatest karimimw tpm03.desy.de 2021-09-27 11:59 karimimw tpm03.desy.de 2021-09-27 11:59 ctatest

eos attr ls

sys.acl="u:0:rwx+dp,u:99:rwx+dp,z:!u,u:0:+u"
sys.archive.storage_class="single"
sys.attr.link="/eos/dev/proc/cta/workflow"
sys.eos.btime="1632324282.203476387"
sys.forced.checksum="adler"
sys.forced.layout="replica"
sys.forced.nstripes="1"
sys.link.workflow.sync::abort_prepare.default="proto"
sys.link.workflow.sync::archive_failed.default="proto"
sys.link.workflow.sync::archived.default="proto"
sys.link.workflow.sync::closew.default="proto"
sys.link.workflow.sync::closew.retrieve_written="proto"
sys.link.workflow.sync::create.default="proto"
sys.link.workflow.sync::delete.default="proto"
sys.link.workflow.sync::evict_prepare.default="proto"
sys.link.workflow.sync::prepare.default="proto"
sys.link.workflow.sync::retrieve_failed.default="proto"

eos whoami

Virtual Identity: uid=0 (0,3,99) gid=0 (0,4,99) [authz:sss] sudo* host=localhost domain=localdomain geo-location=1234test

thanks,
Mwai

Hi,

Please try configuring and using a non-root user for writing to the EOS instance.

Thanks,

Oliver.

Hi Oliver,

Just a quick one: what should the correct ACLs be for the EOS folder with the CTA attributes? I am still getting the ACL error even after configuring a non-root user.

Thanks,
mwai

Hi Mwai,

The ACLs should be as you have them, assuming there’s an entry now for your non-root user (u:<your_user>:rwx+dp) and they have a mount rule and mount policy associated.
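If it helps, here is a small sketch for checking whether a sys.acl value has an entry for a given uid. It only does text matching on the comma-separated value as printed by eos attr ls; acl_entries_for_uid is a hypothetical helper name, and it does not evaluate the ACL the way EOS does.

```shell
# Hypothetical helper: print every "u:<uid>:..." entry found in a
# comma-separated sys.acl value ($1) for the numeric uid ($2).
# Prints nothing if the uid has no entry.
acl_entries_for_uid() {
  printf '%s\n' "$1" | tr ',' '\n' | grep "^u:$2:"
}
```

For example, acl_entries_for_uid 'u:0:rwx+dp,u:99:rwx+dp,z:!u,u:0:+u' 34570 prints nothing, which would mean the non-root user has no entry yet.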

Are you still getting Unable to update file?

Oliver.

Hi Oliver,

Unfortunately it still fails. Perhaps it has something to do with EOS.

[root@tpm03 ~]# xrdcp -f -v -d1 /root/stagetape root://localhost//eos/users/test/
[0B/0B][100%][==================================================][0B/s]
Run: [ERROR] Server responded with an error: [3010] Unable to open file /eos/users/test/stagetape; Operation not permitted (destination)

however:

[root@tpm03 ~]# eos cp /root/stagetape /eos/users/test/stagetape
[eoscp] stagetape                Total 0.00 MB	|====================| 100.00 % [0.0 MB/s]
error: [SUCCESS]
error: failed copying path=root://localhost//eos/users/test/stagetape
#WARNING [eos-cp] copied 0/1 files and 0 B in 0.05 seconds with 0 B/s

and:

[root@tpm03 ~]# eos ls -y /eos/users/test
d0::t0   -rw-r-----   0 root     root                0 Oct  8 17:11 stage
d0::t0   -rw-r-----   0 root     root                0 Oct  8 17:49 stagetape

cheers,
mwai

Hi,

First a general comment - please be careful with anything that might be modifying a file already on CTA. In general we arrange the ACLs so this is not possible. Deletions, updates and writes have different permissions, and the use case we want to support is that a user can write but not subsequently modify a file. Sometimes for testing we relax this though.

So, please send the output of the following, performed as your end user.

eos whoami
eos attr ls /eos/users/test
eos ls -dl /eos/users/test
eos rm /eos/users/test/stage
eos rm /eos/users/test/stagetape
eos cp /root/stagetape /eos/users/test

These permissions errors are coming from EOS, so check /var/log/eos/mgm/xrdlog.mgm for clues. You could also check /var/log/cta/cta-frontend.log, I suspect the archive request is not getting that far but this should be confirmed.
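To cut the MGM log down to the interesting lines, something like the following may help. It is a sketch that assumes the log layout seen earlier in this thread (key=value fields, with the free-text message following the geo="..." field); mgm_errors is a hypothetical helper name.

```shell
# Hypothetical helper: read an MGM log on stdin and print, for every
# ERROR line, the function name followed by the text after geo="...".
# Assumption: lines follow the layout shown in this thread.
mgm_errors() {
  grep 'level=ERROR' |
    sed -n 's/^.*func=\([^ ]*\).*geo="[^"]*" \(.*\)$/\1: \2/p'
}
```

Usage would be along the lines of: mgm_errors < /var/log/eos/mgm/xrdlog.mgm | tail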

Oliver.

Hi Oliver,

eos whoami

[smeyer@tpm03]~% eos whoami
Secsss (getKeyTab): Unable to open /etc/eos.keytab; Permission denied
Unable to open keytab file.
Secsss (getKeyTab): Unable to open /etc/eos.keytab; Permission denied
Unable to open keytab file.
[smeyer@tpm03]~% sudo eos whoami
Virtual Identity: uid=0 (0,3,99) gid=0 (0,4,99) [authz:sss] sudo* host=localhost domain=localdomain geo-location=1234test

eos attr ls /eos/users/test

sys.acl="u:34570:rwx+dp"
sys.archive.storage_class="ctaStorageClass"
sys.attr.link="/eos/dev/proc/cta/workflow"
sys.eos.btime="1633703066.540722204"
sys.forced.checksum="adler"
sys.forced.layout="replica"
sys.forced.nstripes="1"
sys.link.workflow.sync::abort_prepare.default="proto"
sys.link.workflow.sync::archive_failed.default="proto"
sys.link.workflow.sync::archived.default="proto"
sys.link.workflow.sync::closew.default="proto"
sys.link.workflow.sync::closew.retrieve_written="proto"
sys.link.workflow.sync::create.default="proto"
sys.link.workflow.sync::delete.default="proto"
sys.link.workflow.sync::evict_prepare.default="proto"
sys.link.workflow.sync::prepare.default="proto"
sys.link.workflow.sync::retrieve_failed.default="proto"

Both eos rm commands delete the files without problems or output, but only when I am logged in as root. Otherwise I get:

Secsss (getKeyTab): Unable to open /etc/eos.keytab; Permission denied

eos cp /root/stagetape /eos/users/test (I have to log in as root, otherwise I hit the same issue as with eos whoami)

[root@tpm03 ~]# eos cp /root/stagetape /eos/users/test
[eoscp] stagetape                Total 0.00 MB	|====================| 100.00 % [0.0 MB/s]
error: [SUCCESS]
error: failed copying path=root://localhost//eos/users/test/stagetape
#WARNING [eos-cp] copied 0/1 files and 0 B in 0.05 seconds with 0 B/s

The frontend doesn’t complain. I’m guessing the ACL mapping on the EOS side is wrong?

cheers

Hi,

Your client can’t authenticate to EOS, so we fall at the first hurdle. Do you want to use Kerberos or SSS? unix is not enough.

Please send the output of eos vid ls

Oliver.

Hi,

[root@tpm03 ~]# eos vid ls
geotag:"default" => "1234test"
krb5:"<pwd>":gid => root
krb5:"<pwd>":uid => root
publicaccesslevel: => 1024
sss:"<pwd>":gid => root
sss:"<pwd>":uid => root
sudoer                 => uids(daemon,cta)

I enabled Kerberos with eos vid enable krb5. Perhaps that doesn’t suffice?

mwai

Hi Mwai,

What you’ve done should be enough, as long as Kerberos itself is configured correctly. We need to get to a point where you can kinit and then run eos whoami and see that you’re mapped to 34570 (assuming that’s your uid).
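To make that check less error-prone, you could reduce the eos whoami line to the fields that matter. A rough sketch, assuming the "Virtual Identity" line format shown earlier in this thread; whoami_summary is a hypothetical helper name.

```shell
# Hypothetical helper: reduce an "eos whoami" Virtual Identity line ($1)
# to its primary uid, the uid list in parentheses and the auth protocol.
# Assumption: the line looks like the ones pasted in this thread.
whoami_summary() {
  printf '%s\n' "$1" |
    sed -n 's/.*uid=\([0-9]*\) (\([0-9,]*\)) gid=.*\[authz:\([^]]*\)].*/uid=\1 uids=\2 authz=\3/p'
}
```

After kinit you would call it as whoami_summary "$(eos whoami)" and look for authz=krb5 and your own uid.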

In the MGM config (typically /etc/xrd.cf.mgm) you need

sec.protocol krb5 <keytab> <service principal>
sec.protbind * only krb5 sss unix

and the keytab referenced must be up to date.
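A quick sanity check on the config file could look like this. It is a sketch that only greps for the two directives above and does not validate the keytab or the principal; mgm_has_krb5 is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the XRootD config file ($1) has a
# "sec.protocol krb5" line and a "sec.protbind" line that lists krb5.
# It does not check the keytab or the service principal.
mgm_has_krb5() {
  grep -Eq '^[[:space:]]*sec\.protocol[[:space:]]+krb5([[:space:]]|$)' "$1" &&
    grep -Eq '^[[:space:]]*sec\.protbind[[:space:]].*[[:space:]]krb5([[:space:]]|$)' "$1"
}
```

For example: mgm_has_krb5 /etc/xrd.cf.mgm || echo "krb5 is not fully configured"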

What does klist show?

Oliver.

Hi Oliver,

[root@tpm03 ~]# klist
Ticket cache: FILE:/tmp/krb5cc_0_TugcLfUn0q
Default principal: karimimw@DESY.DE

Valid starting       Expires              Service principal
10/13/2021 15:04:30  10/14/2021 15:04:26  krbtgt/DESY.DE@DESY.DE
	renew until 10/15/2021 15:04:26
10/13/2021 15:04:39  10/14/2021 15:04:26  host/tpm03.desy.de@DESY.DE
	renew until 10/14/2021 15:04:39

After correctly configuring krb5 on the MGM, eos whoami maps my user ID:

[root@tpm03 ~]# eos whoami
Virtual Identity: uid=0 (0,3,99,34570) gid=0 (0,4,99) [authz:krb5] sudo* host=localhost domain=localdomain geo-location=1234test

Almost there!

Good! Now you can retry the steps enumerated in my earlier post.