CTA-TAPED terminates with SIGSEGV

hello,

We have slowly started migrating to Alma9, but for whatever reason we cannot get cta-taped up and running. Technically it runs, but its subprocesses keep dying with signal 11, and it loops forever, dumping new drive and maintenance objects into the object store.

* cta-taped@VDSTK11.service - CERN Tape Archive (CTA) tape server daemon for drive VLSTK11
     Loaded: loaded (/etc/systemd/system/cta-taped@VDSTK11.service; enabled; preset: disabled)
     Active: active (running) since Fri 2024-07-05 12:59:16 CEST; 1min 30s ago
    Process: 1756173 ExecStart=/usr/bin/cta-taped --log-format=default --log-to-file=/var/log/cta/cta-taped.log --config=/etc/cta/cta-taped-VLSTK11.conf ${CTA_TAPED_OPTIONS} (co>
   Main PID: 1756174 (cta-tpd-master)
      Tasks: 15 (limit: 1643936)
     Memory: 42.6M
        CPU: 42.633s
     CGroup: /system.slice/system-cta\x2dtaped.slice/cta-taped@VDSTK11.service
             |-1756174 /usr/bin/cta-taped --log-format=default --log-to-file=/var/log/cta/cta-taped.log --config=/etc/cta/cta-taped-VLSTK11.conf
             |-1762504 /usr/bin/cta-taped --log-format=default --log-to-file=/var/log/cta/cta-taped.log --config=/etc/cta/cta-taped-VLSTK11.conf
             `-1762515 /usr/bin/cta-taped --log-format=default --log-to-file=/var/log/cta/cta-taped.log --config=/etc/cta/cta-taped-VLSTK11.conf

Jul 05 12:59:16 zitcta-ts01.desy.de systemd[1]: Starting CERN Tape Archive (CTA) tape server daemon for drive VLSTK11...
Jul 05 12:59:16 zitcta-ts01.desy.de systemd[1]: Started CERN Tape Archive (CTA) tape server daemon for drive VLSTK11.

but in the log:

Jul  5 12:59:52.472220592 zitcta-ts01 cta-taped: LVL="INFO" PID="1756174" TID="1756174" MSG="In signal handler, received SIGCHLD and propagations to other handlers" drive_name="VLSTK11" instance="ctb" sched_backend="vfsUser" signal="Child exited" senderPID="1758645" senderUID="0"
Jul  5 12:59:52.472414458 zitcta-ts01 cta-taped: LVL="INFO" PID="1756174" TID="1756174" MSG="Handler received SIGCHILD. Propagating to all handlers." drive_name="VLSTK11" instance="ctb" sched_backend="vfsUser" SubprocessName="signalHandler"
Jul  5 12:59:52.472468797 zitcta-ts01 cta-taped: LVL="INFO" PID="1756174" TID="1756174" MSG="Propagated SIGCHILD." drive_name="VLSTK11" instance="ctb" sched_backend="vfsUser" SubprocessName="signalHandler"
Jul  5 12:59:52.472572027 zitcta-ts01 cta-taped: LVL="INFO" PID="1756174" TID="1756174" MSG="Propagated SIGCHILD." drive_name="VLSTK11" instance="ctb" sched_backend="vfsUser" SubprocessName="drive:VLSTK11"
Jul  5 12:59:52.472773561 zitcta-ts01 cta-taped: LVL="INFO" PID="1756174" TID="1756174" MSG="Maintenance subprocess crashed. Will spawn a new one." drive_name="VLSTK11" instance="ctb" sched_backend="vfsUser" pid="1758645" IfSignaled="1" TermSignal="11" CoreDump="128"
Jul  5 12:59:52.472882192 zitcta-ts01 cta-taped: LVL="INFO" PID="1756174" TID="1756174" MSG="Propagated SIGCHILD." drive_name="VLSTK11" instance="ctb" sched_backend="vfsUser" SubprocessName="maintenanceHandler"
Jul  5 12:59:52.472957665 zitcta-ts01 cta-taped: LVL="INFO" PID="1756174" TID="1756174" MSG="Subprocess handler requested forking" drive_name="VLSTK11" instance="ctb" sched_backend="vfsUser" SubprocessName="maintenanceHandler"
Jul  5 12:59:52.473062969 zitcta-ts01 cta-taped: LVL="INFO" PID="1756174" TID="1756174" MSG="Subprocess handler will fork" drive_name="VLSTK11" instance="ctb" sched_backend="vfsUser" SubprocessName="maintenanceHandler"
Jul  5 12:59:52.474717427 zitcta-ts01 cta-taped: LVL="INFO" PID="1758669" TID="1758669" MSG="In child process. Running child." drive_name="VLSTK11" instance="ctb" sched_backend="vfsUser" SubprocessName="maintenanceHandler"
Jul  5 12:59:52.869922569 zitcta-ts01 cta-taped: LVL="INFO" PID="1756174" TID="1756174" MSG="In signal handler, received SIGCHLD and propagations to other handlers" drive_name="VLSTK11" instance="ctb" sched_backend="vfsUser" signal="Child exited" senderPID="1758665" senderUID="0"
Jul  5 12:59:52.870027412 zitcta-ts01 cta-taped: LVL="INFO" PID="1756174" TID="1756174" MSG="Handler received SIGCHILD. Propagating to all handlers." drive_name="VLSTK11" instance="ctb" sched_backend="vfsUser" SubprocessName="signalHandler"
Jul  5 12:59:52.870069046 zitcta-ts01 cta-taped: LVL="INFO" PID="1756174" TID="1756174" MSG="Propagated SIGCHILD." drive_name="VLSTK11" instance="ctb" sched_backend="vfsUser" SubprocessName="signalHandler"
Jul  5 12:59:52.870180625 zitcta-ts01 cta-taped: LVL="INFO" PID="1756174" TID="1756174" MSG="Drive subprocess crashed. Will spawn a new one." drive_name="VLSTK11" instance="ctb" sched_backend="vfsUser" tapeDrive="VLSTK11" pid="1758665" IfSignaled="1" TermSignal="11" CoreDump="128"
Jul  5 12:59:52.870224716 zitcta-ts01 cta-taped: LVL="INFO" PID="1756174" TID="1756174" MSG="Tape session finished" drive_name="VLSTK11" instance="ctb" sched_backend="vfsUser" tapeDrive="VLSTK11" pid="1758665" IfSignaled="1" TermSignal="11" CoreDump="128" Error_sessionKilled="1" killSignal="11" status="failure"
Jul  5 12:59:52.870321802 zitcta-ts01 cta-taped: LVL="INFO" PID="1756174" TID="1756174" MSG="Tape session finished" drive_name="VLSTK11" instance="ctb" sched_backend="vfsUser" tapeDrive="VLSTK11" pid="1758665" IfSignaled="1" TermSignal="11" CoreDump="128" killSignal="11"
Jul  5 12:59:52.870385507 zitcta-ts01 cta-taped: LVL="INFO" PID="1756174" TID="1756174" MSG="Propagated SIGCHILD." drive_name="VLSTK11" instance="ctb" sched_backend="vfsUser" SubprocessName="drive:VLSTK11"
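
The respawn loop is easy to quantify from the log; a quick sketch (log path taken from the ExecStart line in the systemd unit above):

```shell
LOG=/var/log/cta/cta-taped.log   # log file from the --log-to-file option above

# Count how often cta-taped reported a crashed subprocess and respawned it;
# this number grows continuously while the loop is running
grep -c 'Will spawn a new one' "$LOG" 2>/dev/null || true
```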

object store:

[root@zitcta-ts01 cta-store]# ll *Drive*
-rw-------. 1 cta tape 208 Jul  5 12:59 DriveProcess-VLSTK11-zitcta-ts01.desy.de-1756175-20240705-12:59:16-0
-rw-------. 1 cta tape 208 Jul  5 12:59 DriveProcess-VLSTK11-zitcta-ts01.desy.de-1756205-20240705-12:59:17-0
-rw-------. 1 cta tape 208 Jul  5 12:59 DriveProcess-VLSTK11-zitcta-ts01.desy.de-1756236-20240705-12:59:17-0
-rw-------. 1 cta tape 208 Jul  5 12:59 DriveProcess-VLSTK11-zitcta-ts01.desy.de-1756269-20240705-12:59:18-0
-rw-------. 1 cta tape 208 Jul  5 12:59 DriveProcess-VLSTK11-zitcta-ts01.desy.de-1756302-20240705-12:59:18-0
-rw-------. 1 cta tape 208 Jul  5 12:59 DriveProcess-VLSTK11-zitcta-ts01.desy.de-1756328-20240705-12:59:19-0
-rw-------. 1 cta tape 208 Jul  5 12:59 DriveProcess-VLSTK11-zitcta-ts01.desy.de-1756359-20240705-12:59:19-0
-rw-------. 1 cta tape 208 Jul  5 12:59 DriveProcess-VLSTK11-zitcta-ts01.desy.de-1756389-20240705-12:59:20-0
-rw-------. 1 cta tape 208 Jul  5 12:59 DriveProcess-VLSTK11-zitcta-ts01.desy.de-1756419-20240705-12:59:20-0
-rw-------. 1 cta tape 208 Jul  5 12:59 DriveProcess-VLSTK11-zitcta-ts01.desy.de-1756451-20240705-12:59:20-0
-rw-------. 1 cta tape 208 Jul  5 12:59 DriveProcess-VLSTK11-zitcta-ts01.desy.de-1756481-20240705-12:59:21-0

Not sure if we are missing something or have a misconfiguration somewhere. Needless to say, the cta-frontend runs without any issues. Here you can find the stack trace, just in case.

thank you,
mwai

OK, this seems to be a permission issue:

The stat call succeeds:

stat("/tmp/krb5cc_0", {st_mode=S_IFREG|0600, st_size=1658, ...}) = 0

but the openat call fails:

openat(AT_FDCWD, "/tmp/krb5cc_0", O_RDONLY|O_CLOEXEC) = -1 EACCES (Permission denied)
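
For anyone hitting the same thing, here is a quick way to check whether the daemon user can actually read the cache (assuming the daemon runs as cta, as in our config):

```shell
CC=/tmp/krb5cc_0   # credential cache path from the strace output
DAEMON_USER=cta    # user cta-taped runs as (assumption; check your taped config)

# Owner, group and mode of the cache
stat -c '%U:%G %a' "$CC" 2>/dev/null || echo "no $CC on this host"

# Try to read it as the daemon user; failure here matches the EACCES
# from the openat() call above
sudo -n -u "$DAEMON_USER" cat "$CC" >/dev/null 2>&1 \
    && echo "readable as $DAEMON_USER" \
    || echo "not readable as $DAEMON_USER"
```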

Changing the permissions of /tmp/krb5cc_0 to 644 got cta-taped running.

Is this expected?

cheers

Hello Mwai,

What are the owner and group of the Kerberos credential cache? For us they are cta and tape, and the permissions are 400, i.e. read permission for the owner only. Is it possible that the owner of the file is not cta (or the user you specified in the taped config file)? That would explain why setting 644, more specifically read for others, makes the cta-taped drive process stop failing.

In the taped config file you can specify the user and group for the daemon; if not specified, the defaults are cta and tape. However, we have not changed this behaviour, so perhaps the file was recreated by another user, causing this problem?
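
If the cache was indeed recreated by another user, restoring the ownership is probably preferable to opening the file up to 644; a sketch, assuming the daemon runs as cta:tape:

```shell
CC=/tmp/krb5cc_0   # kerberos credential cache

# Give the cache back to the daemon user and tighten the mode again;
# 400 (owner read only) is all cta-taped needs
if [ "$(id -u)" -eq 0 ] && id cta >/dev/null 2>&1 && [ -e "$CC" ]; then
    chown cta:tape "$CC"
    chmod 400 "$CC"
fi
```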

Nevertheless, there is a bug in this behaviour: taped should exit with a proper reason, as there is no point in recreating the drive process if it cannot read the keytab. Thanks for spotting this! :slight_smile:

Cheers.

Hi @poliverc! Indeed the config file used in this case was created by another user. Runs as before! Thanks a lot

cheers