Containerised EOS/CTA instance

Hello,

I am trying to set up a containerised EOS/CTA instance following the instructions
found here https://eoscta.docs.cern.ch/gitlab/runner/

sudo ./create_instance.sh -n cta -b ~ -B CTA-build -D -O -d database-postgres-test.yaml

I get an error while loading the libmariadb.so.3 shared library:

Configuring database:
Configuring postgres database
Wiping database
cta-catalogue-schema-drop: error while loading shared libraries:
libmariadb.so.3: cannot open shared object file: No such file or directory
ERROR: Could not wipe database. cta-catalogue-schema-drop
/etc/cta/cta-catalogue.conf FAILED
ERROR: init pod in Error state. Initialization failed.

When I ran prepareImage.sh, I saw that mariadb-libs and mariadb-devel were installed

Installing : 1:mariadb-libs-5.5.65-1.el7.x86_64 66/137
Installing : 1:mariadb-devel-5.5.65-1.el7.x86_64 110/137

and inside the CTA build tree, doing ldd on the cta-catalogue-schema-drop binary shows that it is linked against libmariadb.so.3:

libmariadb.so.3 => /lib64/libmariadb.so.3 (0x00007f1bf1a65000)

Do you know what I need to do?

Many thanks,

George

Hi George,

Can you confirm what version of libmariadb you compiled against? Is it also 5.5.65-1?
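
A quick way to check on the build host (assuming an RPM-based system) is to ask rpm which packages are installed and which package owns the library the binary actually resolves to, e.g.:

rpm -q mariadb-libs mariadb-devel        # versions of the installed packages
rpm -qf /lib64/libmariadb.so.3           # which package provides the resolved library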

Michael

Hi Michael,

No, I don’t think it is 5.5.65-1. The /lib64/libmariadb.so.3 that is linked to the
cta-catalogue-schema-drop binary is actually provided by

mariadb-libs-10.3.20-3.el7.0.0.rdo1.x86_64

installed on my VM from one of our OpenStack repos.

George

Hello,

I will repeat the installation procedure, making sure to install mariadb-devel-5.5.65-1.el7.x86_64 before compiling CTA.
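
The rough plan (the repo name to disable is a guess on my side, based on the .rdo1 suffix of the conflicting package) is something like:

yum remove -y mariadb-libs mariadb-devel                  # drop the newer RDO-provided packages
yum install -y --disablerepo='*rdo*' mariadb-libs-5.5.65-1.el7 mariadb-devel-5.5.65-1.el7   # pin the base CentOS 7 versions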

Hopefully this will resolve the issue.

George

Hello,

Just to let you know that compiling/linking CTA against mariadb-devel-5.5.65-1.el7.x86_64
generated a cta-catalogue-schema-drop binary that ran successfully without the above
error.

What was left to do then was to ensure connectivity to the external PostgreSQL DB, which we did by suitably modifying the security groups on the OpenStack VM hosting
the DB.

After this, the create_instance.sh script ran successfully and we now have a Kubernetes EOS-CTA instance up and running!

Thanks again for your help.

George

Hello again,

I am trying to run the simple archive/retrieval test described here (https://eoscta.docs.cern.ch/gitlab/verify/), but the command

kubectl --namespace ${NAMESPACE} exec ctacli -- cta-admin --json version | jq

fails with: Error from XRootD SSI Framework: [FATAL] Auth failed

The ctacli pod has the keytab of ctaadmin1, who is allowed to run cta-admin commands,
and the logs from the kdc pod actually confirm this:

Generating /root/user1.keytab for user1OK
Generating /root/user2.keytab for user2OK
Generating /root/poweruser1.keytab for poweruser1OK
Generating /root/poweruser2.keytab for poweruser2OK
Generating /root/ctaadmin1.keytab for ctaadmin1OK
Generating /root/ctaadmin2.keytab for ctaadmin2OK
Generating /root/eosadmin1.keytab for eosadmin1OK
Generating /root/eosadmin2.keytab for eosadmin2OK
Generating /root/cta-frontend.keytab for cta/cta-frontendOK
Generating /root/eos-server.keytab for eos/eos-serverOK

Do you know what I need to do?

Thanks,

George

Hi George,

Try running the following first:

kubectl --namespace ${NAMESPACE} exec ctacli -- kinit -kt /root/ctaadmin1.keytab ctaadmin1@TEST.CTA
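
and then confirm that the ticket is in place before retrying, e.g.:

kubectl --namespace ${NAMESPACE} exec ctacli -- klist
kubectl --namespace ${NAMESPACE} exec ctacli -- cta-admin version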

Oliver.

Hi Oliver,

Thanks for this; indeed the Kerberos tickets expire within a day or so.

Apparently the mhvtl tape library is not configured properly. When prepare_tests.sh attempts to run the cta-tape-label command, there is an error:

Dec 11 18:39:02.915407 tpsrv01 cta-tape-label: LVL="WARN" PID="14977" TID="14977" MSG="Drive does not support LBP" userName="UNKNOWN" tapeVid="V01001" tapeOldLabel="" force="false"

Aborting: Failed to mount tape for read/write access: vid=V01001 slot=smc0: Failed to mount tape in SCSI tape-library for read/write access: vid=V01001 librarySlot=smc0: Received error from rmcd: rmcRc=2203 rmcErrorStream=smc_mount: SR018 - mount of V01001 on drive 0 failed : /dev/smc : scsi error : Hardware error ASC=4 ASCQ=3

RMC03 - illegal function 4

and also

cta-smc -m -D 1 -V V01007
smc_mount: SR018 - mount of V01007 on drive 1 failed : /dev/smc : scsi error : Hardware error ASC=4 ASCQ=3
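
I will double-check on the tpsrv pod that the mhvtl changer device is actually visible and query the library directly; if I read the tooling correctly, something like:

lsscsi -g        # the mediumchanger and the virtual tape drives should show up here
cta-smc -q D     # query the drives known to the library
cta-smc -q V     # query the volumes/slots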

George

Hi,

For some reason all the pods were gone so I had to re-create them.

But now I cannot seem to run cta-admin commands on ctacli

[cta@host-172-16-114-217 orchestration (master)]$ kubectl --namespace cta exec ctacli -- cta-admin version
User: ctaadmin1 on host: 10.254.96.4 is not authorized to execute CTA admin commands

despite the fact that the Kerberos tickets are valid:

kubectl --namespace cta exec ctacli -- klist
Ticket cache: FILE:/tmp/krb5cc_0
Default principal: ctaadmin1@TEST.CTA

Valid starting Expires Service principal
12/15/20 15:46:44 12/16/20 15:46:44 krbtgt/TEST.CTA@TEST.CTA
12/15/20 15:48:32 12/16/20 15:46:44 cta/cta-frontend@TEST.CTA

What do I need to do?

George

cta-catalogue-admin-user-create /etc/cta/cta-catalogue.conf -u ctaadmin1 -m "CTA Admin User"
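
This needs to run somewhere /etc/cta/cta-catalogue.conf is available; inside the ctafrontend pod, for example, something like:

kubectl --namespace cta exec ctafrontend -- cta-catalogue-admin-user-create /etc/cta/cta-catalogue.conf -u ctaadmin1 -m "CTA Admin User"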

Thanks Michael.

I am trying to do a simple xrdcp on the client pod (I have sourced /root/client_helper.sh and run eospower_kdestroy and eospower_kinit) but I get an auth error

xrdcp /etc/yum.conf root://ctaeos//eos/ctaeos/cta/c95f42f3-9933-49a5-818c-9fcd933875e3
[0B/0B][100%][==================================================][0B/s]
Run: [ERROR] Server responded with an error: [3010] Unable to open file /eos/ctaeos/cta/c95f42f3-9933-49a5-818c-9fcd933875e3; Operation not permitted (destination)

Not sure why this happens; which log do I need to check?

George

My guess is you need to create an ACL on the EOS destination directory to give write permission to the user you are authenticating with. You can check the MGM log /var/log/eos/mgm/xrdlog.mgm and also check the permissions on the target directory with eos ls -l and eos attr ls.
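
For example (the pod name and directory are taken from your earlier commands; I am assuming you authenticate as poweruser1, so adjust as needed):

kubectl --namespace cta exec ctaeos -- eos ls -l /eos/ctaeos/cta
kubectl --namespace cta exec ctaeos -- eos attr ls /eos/ctaeos/cta
kubectl --namespace cta exec ctaeos -- eos attr set sys.acl="u:poweruser1:rwx" /eos/ctaeos/cta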

Thanks Michael; I recreated the instance and the error was gone.

I was trying to bring another node into the k8s cluster and schedule the tpsrv pods there, without much success. I removed the node, restarted all Kubernetes services on the VM's control plane, and am now trying once again to create the instance on the same VM.

I get an Error status for the tpsrv pods, which I cannot explain. If I describe the tpsrv01 pod, I get:

FirstSeen LastSeen Count From SubObjectPath Type Reason Message


34m 34m 1 {default-scheduler } Normal Scheduled Successfully assigned tpsrv01 to 127.0.0.1
34m 20m 15 {kubelet 127.0.0.1} Warning FailedMount MountVolume.SetUp failed for volume “kubernetes.io/configmap/d544821f-568d-11eb-af6f-facaad881567-myobjectstore” (spec.Name: “myobjectstore”) pod “d544821f-568d-11eb-af6f-facaad881567” (UID: “d544821f-568d-11eb-af6f-facaad881567”) with: configmaps “objectstore-config” not found
34m 20m 15 {kubelet 127.0.0.1} Warning FailedMount MountVolume.SetUp failed for volume “kubernetes.io/configmap/d544821f-568d-11eb-af6f-facaad881567-mydatabase” (spec.Name: “mydatabase”) pod “d544821f-568d-11eb-af6f-facaad881567” (UID: “d544821f-568d-11eb-af6f-facaad881567”) with: configmaps “database-config” not found
34m 20m 15 {kubelet 127.0.0.1} Warning FailedMount MountVolume.SetUp failed for volume “kubernetes.io/configmap/d544821f-568d-11eb-af6f-facaad881567-mylibrary” (spec.Name: “mylibrary”) pod “d544821f-568d-11eb-af6f-facaad881567” (UID: “d544821f-568d-11eb-af6f-facaad881567”) with: configmaps “library-config” not found
32m 18m 7 {kubelet 127.0.0.1} Warning FailedMount Unable to mount volumes for pod “tpsrv01_default(d544821f-568d-11eb-af6f-facaad881567)”: timeout expired waiting for volumes to attach/mount for pod “default”/“tpsrv01”. list of unattached/unmounted volumes=[myobjectstore mydatabase mylibrary logstorage]
32m 18m 7 {kubelet 127.0.0.1} Warning FailedSync Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod “default”/“tpsrv01”. list of unattached/unmounted volumes=[myobjectstore mydatabase mylibrary logstorage]
18m 14m 10 {kubelet 127.0.0.1} Warning FailedMount MountVolume.SetUp failed for volume “kubernetes.io/configmap/d544821f-568d-11eb-af6f-facaad881567-myobjectstore” (spec.Name: “myobjectstore”) pod “d544821f-568d-11eb-af6f-facaad881567” (UID: “d544821f-568d-11eb-af6f-facaad881567”) with: configmaps “objectstore-config” not found
18m 14m 10 {kubelet 127.0.0.1} Warning FailedMount MountVolume.SetUp failed for volume “kubernetes.io/configmap/d544821f-568d-11eb-af6f-facaad881567-mydatabase” (spec.Name: “mydatabase”) pod “d544821f-568d-11eb-af6f-facaad881567” (UID: “d544821f-568d-11eb-af6f-facaad881567”) with: configmaps “database-config” not found
18m 14m 10 {kubelet 127.0.0.1} Warning FailedMount MountVolume.SetUp failed for volume “kubernetes.io/configmap/d544821f-568d-11eb-af6f-facaad881567-mylibrary” (spec.Name: “mylibrary”) pod “d544821f-568d-11eb-af6f-facaad881567” (UID: “d544821f-568d-11eb-af6f-facaad881567”) with: configmaps “library-config” not found
16m 14m 2 {kubelet 127.0.0.1} Warning FailedMount Unable to mount volumes for pod “tpsrv01_default(d544821f-568d-11eb-af6f-facaad881567)”: timeout expired waiting for volumes to attach/mount for pod “default”/“tpsrv01”. list of unattached/unmounted volumes=[myobjectstore mydatabase mylibrary logstorage]
16m 14m 2 {kubelet 127.0.0.1} Warning FailedSync Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod “default”/“tpsrv01”. list of unattached/unmounted volumes=[myobjectstore mydatabase mylibrary logstorage]
13m 1m 14 {kubelet 127.0.0.1} Warning FailedMount MountVolume.SetUp failed for volume “kubernetes.io/configmap/d544821f-568d-11eb-af6f-facaad881567-mydatabase” (spec.Name: “mydatabase”) pod “d544821f-568d-11eb-af6f-facaad881567” (UID: “d544821f-568d-11eb-af6f-facaad881567”) with: configmaps “database-config” not found
13m 1m 14 {kubelet 127.0.0.1} Warning FailedMount MountVolume.SetUp failed for volume “kubernetes.io/configmap/d544821f-568d-11eb-af6f-facaad881567-mylibrary” (spec.Name: “mylibrary”) pod “d544821f-568d-11eb-af6f-facaad881567” (UID: “d544821f-568d-11eb-af6f-facaad881567”) with: configmaps “library-config” not found
13m 1m 14 {kubelet 127.0.0.1} Warning FailedMount MountVolume.SetUp failed for volume “kubernetes.io/configmap/d544821f-568d-11eb-af6f-facaad881567-myobjectstore” (spec.Name: “myobjectstore”) pod “d544821f-568d-11eb-af6f-facaad881567” (UID: “d544821f-568d-11eb-af6f-facaad881567”) with: configmaps “objectstore-config” not found
11m 3s 6 {kubelet 127.0.0.1} Warning FailedMount Unable to mount volumes for pod “tpsrv01_default(d544821f-568d-11eb-af6f-facaad881567)”: timeout expired waiting for volumes to attach/mount for pod “default”/“tpsrv01”. list of unattached/unmounted volumes=[myobjectstore mydatabase mylibrary logstorage]
11m 3s 6 {kubelet 127.0.0.1} Warning FailedSync Error syncing pod, skipping: timeout expired waiting for volumes to attach/mount for pod “default”/“tpsrv01”. list of unattached/unmounted volumes=[myobjectstore mydatabase mylibrary logstorage]

Just to add that I restarted all the k8s services again, but the kubelet reports the following:

Jan 14 18:12:12 host-172-16-114-217.nubes.stfc.ac.uk kubelet[15867]: I0114 18:12:12.405658 15867 operation_executor.go:1073] MountVolume.SetUp succeeded for volume “kubernetes.io/configmap/07b28482-5691-11eb-82dc-facaad881567-mydatabase” (spec.Name: “mydatabase”) pod “07b28482-5691-11eb-82dc-facaad881567” (UID: “07b28482-5691-11eb-82dc-facaad881567”).
Jan 14 18:12:12 host-172-16-114-217.nubes.stfc.ac.uk kubelet[15867]: I0114 18:12:12.408084 15867 operation_executor.go:1073] MountVolume.SetUp succeeded for volume “kubernetes.io/configmap/07b28482-5691-11eb-82dc-facaad881567-myobjectstore” (spec.Name: “myobjectstore”) pod “07b28482-5691-11eb-82dc-facaad881567” (UID: “07b28482-5691-11eb-82dc-facaad881567”).
Jan 14 18:12:12 host-172-16-114-217.nubes.stfc.ac.uk kubelet[15867]: I0114 18:12:12.408292 15867 operation_executor.go:1073] MountVolume.SetUp succeeded for volume “kubernetes.io/configmap/07b28482-5691-11eb-82dc-facaad881567-mylibrary” (spec.Name: “mylibrary”) pod “07b28482-5691-11eb-82dc-facaad881567” (UID: “07b28482-5691-11eb-82dc-facaad881567”).
Jan 14 18:12:12 host-172-16-114-217.nubes.stfc.ac.uk kubelet[15867]: W0114 18:12:12.630184 15867 kubelet_pods.go:636] Unable to retrieve pull secret cta/ctaregsecret for cta/tpsrv02 due to secrets “ctaregsecret” not found. The image pull may not succeed.
Jan 14 18:12:17 host-172-16-114-217.nubes.stfc.ac.uk kubelet[15867]: I0114 18:12:17.336407 15867 operation_executor.go:1073] MountVolume.SetUp succeeded for volume “kubernetes.io/configmap/08027a99-5691-11eb-82dc-facaad881567-mydatabase” (spec.Name: “mydatabase”) pod “08027a99-5691-11eb-82dc-facaad881567” (UID: “08027a99-5691-11eb-82dc-facaad881567”).
Jan 14 18:12:17 host-172-16-114-217.nubes.stfc.ac.uk kubelet[15867]: I0114 18:12:17.336749 15867 operation_executor.go:1073] MountVolume.SetUp succeeded for volume “kubernetes.io/configmap/08027a99-5691-11eb-82dc-facaad881567-mylibrary” (spec.Name: “mylibrary”) pod “08027a99-5691-11eb-82dc-facaad881567” (UID: “08027a99-5691-11eb-82dc-facaad881567”).
Jan 14 18:12:17 host-172-16-114-217.nubes.stfc.ac.uk kubelet[15867]: I0114 18:12:17.338338 15867 operation_executor.go:1073] MountVolume.SetUp succeeded for volume “kubernetes.io/configmap/08027a99-5691-11eb-82dc-facaad881567-myobjectstore” (spec.Name: “myobjectstore”) pod “08027a99-5691-11eb-82dc-facaad881567” (UID: “08027a99-5691-11eb-82dc-facaad881567”).
Jan 14 18:12:17 host-172-16-114-217.nubes.stfc.ac.uk kubelet[15867]: I0114 18:12:17.338455 15867 operation_executor.go:1073] MountVolume.SetUp succeeded for volume “kubernetes.io/configmap/08027a99-5691-11eb-82dc-facaad881567-eosctaconfig” (spec.Name: “eosctaconfig”) pod “08027a99-5691-11eb-82dc-facaad881567” (UID: “08027a99-5691-11eb-82dc-facaad881567”).
Jan 14 18:12:17 host-172-16-114-217.nubes.stfc.ac.uk kubelet[15867]: W0114 18:12:17.630223 15867 kubelet_pods.go:636] Unable to retrieve pull secret cta/ctaregsecret for cta/ctafrontend due to secrets “ctaregsecret” not found. The image pull may not succeed.
Jan 14 18:12:18 host-172-16-114-217.nubes.stfc.ac.uk kubelet[15867]: W0114 18:12:18.629636 15867 kubelet_pods.go:636] Unable to retrieve pull secret cta/ctaregsecret for cta/ctacli due to secrets “ctaregsecret” not found. The image pull may not succeed.

Hi (yet) again

Please ignore the penultimate post. The tpsrv pods started OK after recreating the PVs. The kubelet messages persist, though.

I would be very grateful if you could give me a clue how to pass the "ctaregsecret" to the worker node added to the cluster.
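
My guess is that it just needs to exist as an image-pull secret in the cta namespace, i.e. something along these lines with the registry details filled in, but please correct me if that is wrong:

kubectl --namespace cta create secret docker-registry ctaregsecret --docker-server=<registry> --docker-username=<user> --docker-password=<password>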

Thanks,

George