Cannot start CTA services after an upgrade to 5.11.10

Hi,

We cannot start cta-frontend and cta-taped after upgrading directly from 5.10.10 to 5.11.10. The service logs suggest an error in loading libraries.

Could this be because we should first have upgraded to the pivot version 5.11.0.1-1, as per the release announcement? I thought the whole point of the pivot version was to minimize service downtime.

We have updated the CTA Catalogue to version 15.

Thanks,

George

Hi George,

I don’t think this problem is directly related to the pivot version; more likely some libraries were not updated correctly.

Could you please post the service log errors that you are seeing?
And the CTA RPMs that you currently have installed? (rpm -qa | grep cta- should show all of them at version 5.11.10.0-1.)
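
To make that check less error-prone, the package list can be compared against the expected version string. The following is only a sketch: check_versions is a hypothetical helper (not part of CTA), and it assumes one "name version-release" pair per input line.

```shell
# Hypothetical helper (not part of CTA): print every package whose
# version-release differs from the expected one; exit non-zero if any do.
# Input: one "name version-release" pair per line on stdin.
check_versions() {
  awk -v want="$1" '$2 != want { print; bad = 1 } END { exit bad + 0 }'
}

# On a real host the input could come from rpm, for example:
#   rpm -qa 'cta-*' --queryformat '%{NAME} %{VERSION}-%{RELEASE}\n' \
#     | check_versions 5.11.10.0-1.el9
```

No output and a zero exit status would mean every listed package is at the expected version.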


Regarding the pivot release, its objective is to allow the CTA Catalogue to be updated to a newer version without the need to stop all the tape servers and frontends. For example, in this case while they are in version 5.11.0.1-1 they can work with both CTA Catalogue versions 14 and 15, so the upgrade can go with less downtime.

Version 5.11.10.0-1 is only compatible with Catalogue version 15, so if you have CTA version 5.11.10.0-1 and Catalogue version 14, it will never work until the Catalogue is finally updated to version 15.
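
As a rough illustration of the compatibility rules above, here is a small sketch. The function names are hypothetical, and the table covers only the two releases mentioned in this thread; anything else is treated as unknown.

```shell
# Compatibility as stated in this thread only; other releases are not covered.
# Prints the Catalogue versions a given CTA release can work with.
supported_catalogue_versions() {
  case "$1" in
    5.11.0.1-1)  echo "14 15" ;;  # pivot release: works with both
    5.11.10.0-1) echo "15" ;;     # requires the updated Catalogue
    *)           return 1 ;;      # not stated in this thread
  esac
}

# Succeeds if CTA release $1 can run against Catalogue version $2.
is_compatible() {
  for v in $(supported_catalogue_versions "$1"); do
    [ "$v" = "$2" ] && return 0
  done
  return 1
}
```

For example, is_compatible 5.11.10.0-1 14 fails, which matches the case described above.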

Cheers,
Joao

Hi Joao,

Thanks for the reply.

We have checked the version of the installed CTA RPMs on both upgraded hosts and they are all 5.11.10.0-1:

cta-lib-catalogue-occi-5.11.10.0-1.el9.x86_64
cta-lib-common-5.11.10.0-1.el9.x86_64
cta-lib-catalogue-postgres-5.11.10.0-1.el9.x86_64
cta-lib-catalogue-inmemory-5.11.10.0-1.el9.x86_64
cta-lib-catalogue-5.11.10.0-1.el9.x86_64
cta-lib-5.11.10.0-1.el9.x86_64
cta-common-5.11.10.0-1.el9.x86_64
cta-frontend-5.11.10.0-1.el9.x86_64
cta-taped-5.11.10.0-1.el9.x86_64
cta-objectstore-tools-5.11.10.0-1.el9.x86_64
cta-catalogueutils-5.11.10.0-1.el9.x86_64
cta-cli-5.11.10.0-1.el9.x86_64
cta-immutable-file-test-5.11.10.0-1.el9.x86_64
[root@cta-front02 ~]#

The errors from /var/log/cta/cta-frontend-xrootd.log

=====> all.export /ctafrontend nolock r/w
Config exporting /ctafrontend
Plugin loaded secprot v5.8.2 from seclib libXrdSec-5.so
++++++ Authentication system initialization started.
Plugin loaded secsss v5.8.2 from sec.protocol libXrdSecsss-5.so
=====> sec.protocol sss -s /etc/cta/eos.sss.keytab
=====> sec.protbind * only sss
Config 2 authentication directives processed in /etc/cta/cta-frontend-xrootd.conf
------ Authentication system initialization completed.
++++++ Protection system initialization started.
Config warning: Security level is set to none; request protection disabled!
Config Local protection level: none
Config Remote protection level: none
------ Protection system initialization completed.
Config Routing for cta-front02.scd.rl.ac.uk: local pub4 prv4
Config Route all4: cta-front02.scd.rl.ac.uk Dest=[::130.246.182.39]:10955
++++++ ssi initialization started.
=====> ssi.svclib libXrdSsiCta.so
Config Configuring standalone server.
250702 10:03:48 1947844 ssi_XrdSsiCtaServiceProvider: pid:1947844 tid:139850548526144 XrdSsiCtaServiceProvider() constructor
250702 10:03:48 1947844 ssi_XrdSsiCtaServiceProvider: pid:1947844 tid:139850548526144 Called Init(/etc/cta/cta-frontend-xrootd.conf,)
250702 10:03:48 1947844 ssi_XrdSsiCtaServiceProvider: pid:1947844 tid:139850548526144 XrdSsiCtaServiceProvider::Init(): cta::exception::Exception cta.instance_name is not set in configuration file /etc/cta/cta-frontend-xrootd.conf
/usr/lib64/libctacommon.so.0(cta::exception::Backtrace::Backtrace(bool)+0x6b) [0x7f3178da9fcd]
/usr/lib64/libctacommon.so.0(cta::exception::Exception::Exception(std::basic_string_view<char, std::char_traits >, bool)+0x91) [0x7f3178dab095]
/usr/lib64/libctacommon.so.0(cta::exception::UserError::UserError(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&, bool)+0x53) [0x7f3178dac589]
/lib64/libXrdSsiCta.so(cta::frontend::FrontendService::FrontendService(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&)+0x8eb) [0x7f317b8fb593]
/lib64/libXrdSsiCta.so(std::_MakeUniqcta::frontend::FrontendService::__single_object std::make_unique<cta::frontend::FrontendService, std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&>(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&)+0x48) [0x7f317b840bbb]
/lib64/libXrdSsiCta.so(XrdSsiCtaServiceProvider::Init(XrdSsiLogger*, XrdSsiCluster*, std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::__cxx11::basic_string<char, std::char_traits, std::allocator >, int, char**)+0x256) [0x7f317b839380]
/lib64/libXrdSsi-5.so(XrdSsiSfsConfig::ConfigSvc(char**, int)+0x205) [0x7f317c14cca5]
/lib64/libXrdSsi-5.so(XrdSsiSfsConfig::Configure(XrdOucEnv*)+0x136) [0x7f317c14cfb6]
/lib64/libXrdSsi-5.so(XrdSsiSfsConfig::Configure(char const*, XrdOucEnv*)+0x410) [0x7f317c14e220]
/lib64/libXrdSsi-5.so(XrdSfsGetFileSystem2+0xb6) [0x7f317c14e3b6]
/lib64/libXrdServer.so.3(XrdXrootdloadFileSystem(XrdSysError*, XrdSfsFileSystem*, char const*, char const*, XrdOucEnv*)+0x7d) [0x7f317e6a8b1d]
/lib64/libXrdServer.so.3(XrdXrootdProtocol::ConfigFS(XrdOucEnv&, char const*)+0x4a) [0x7f317e6a3b5a]
/lib64/libXrdServer.so.3(XrdXrootdProtocol::Configure(char*, XrdProtocol_Config*)+0x6fc) [0x7f317e6a59fc]
/lib64/libXrdServer.so.3(XrdgetProtocol+0x65) [0x7f317e6b7c35]
/usr/bin/xrootd(+0x10375) [0x5575098e1375]
/usr/bin/xrootd(+0x7cd8) [0x5575098d8cd8]
/lib64/libc.so.6(+0x295d0) [0x7f317de295d0]
/lib64/libc.so.6(__libc_start_main+0x80) [0x7f317de29680]
/usr/bin/xrootd(+0x8315) [0x5575098d9315]

250702 10:03:48 1947844 ssi_Config: Provider initialization failed.
------ ssi initialization failed.
250702 10:03:48 1947844 XrootdConfig: Unable to load file system via libXrdSsi.so
250702 10:03:48 1947844 XrootdConfig: Unable to load base file system using libXrdSsi.so
------ xroot protocol initialization failed.
250702 10:03:48 1947844 XrdProtocol: Protocol xroot could not be loaded
------ xrootd cta@cta-front02.scd.rl.ac.uk:10955 initialization failed.

and the cta-taped log

{“epoch_time”:1751383447.035173161,“local_time”:“2025-07-01T16:24:07+0100”,“hostname”:“getafix-ts09”,“program”:“cta-taped”,“log_level”:“ERROR”,“pid”:“2564933”,“tid”:“2564933”,“message”:“Aborting cta-taped on uncaught exception. Stack trace follows.”,“drive_name”:“obelix_lto8_16”,“instance”:“antares-dev”,“sched_backend”:“cephUser”,“Message”:“getTapeDrive: create failed: instance failed: Error while trying to retrieve text for error ORA-01804
“}
{“epoch_time”:1751383447.035393226,“local_time”:“2025-07-01T16:24:07+0100”,“hostname”:“getafix-ts09”,“program”:“cta-taped”,“log_level”:“INFO”,“pid”:“2564933”,“tid”:“2564933”,“message”:“Stack trace”,“drive_name”:“obelix_lto8_16”,“instance”:“antares-dev”,“sched_backend”:“cephUser”,“traceFrameNumber”:0,“traceFrame”:”/usr/lib64/libctacommon.so.0(cta::exception::Backtrace::Backtrace(bool)+0x6b) [0x7f899f3ce097]”}
{“epoch_time”:1751383447.035459590,“local_time”:“2025-07-01T16:24:07+0100”,“hostname”:“getafix-ts09”,“program”:“cta-taped”,“log_level”:“INFO”,“pid”:“2564933”,“tid”:“2564933”,“message”:“Stack trace”,“drive_name”:“obelix_lto8_16”,“instance”:“antares-dev”,“sched_backend”:“cephUser”,“traceFrameNumber”:1,“traceFrame”:“/usr/lib64/libctacommon.so.0(cta::exception::Exception::Exception(std::basic_string_view<char, std::char_traits >, bool)+0x91) [0x7f899f3cf0af]”}
{“epoch_time”:1751383447.035510516,“local_time”:“2025-07-01T16:24:07+0100”,“hostname”:“getafix-ts09”,“program”:“cta-taped”,“log_level”:“INFO”,“pid”:“2564933”,“tid”:“2564933”,“message”:“Stack trace”,“drive_name”:“obelix_lto8_16”,“instance”:“antares-dev”,“sched_backend”:“cephUser”,“traceFrameNumber”:2,“traceFrame”:“/usr/lib64/libctardbmswrapper.so.0(cta::rdbms::wrapper::OcciConnFactory::create()+0x141) [0x7f899f5b46af]”}
{“epoch_time”:1751383447.035558374,“local_time”:“2025-07-01T16:24:07+0100”,“hostname”:“getafix-ts09”,“program”:“cta-taped”,“log_level”:“INFO”,“pid”:“2564933”,“tid”:“2564933”,“message”:“Stack trace”,“drive_name”:“obelix_lto8_16”,“instance”:“antares-dev”,“sched_backend”:“cephUser”,“traceFrameNumber”:3,“traceFrame”:“/usr/lib64/libctardbms.so.0(cta::rdbms::ConnPool::getConn()+0x1bd) [0x7f89a0ddf595]”}
{“epoch_time”:1751383447.035619484,“local_time”:“2025-07-01T16:24:07+0100”,“hostname”:“getafix-ts09”,“program”:“cta-taped”,“log_level”:“INFO”,“pid”:“2564933”,“tid”:“2564933”,“message”:“Stack trace”,“drive_name”:“obelix_lto8_16”,“instance”:“antares-dev”,“sched_backend”:“cephUser”,“traceFrameNumber”:4,“traceFrame”:“/usr/lib64/libctacatalogue.so.0(cta::catalogue::RdbmsDriveStateCatalogue::getTapeDrive(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&) const+0x68) [0x7f899f9494ae]”}
{“epoch_time”:1751383447.035667874,“local_time”:“2025-07-01T16:24:07+0100”,“hostname”:“getafix-ts09”,“program”:“cta-taped”,“log_level”:“INFO”,“pid”:“2564933”,“tid”:“2564933”,“message”:“Stack trace”,“drive_name”:“obelix_lto8_16”,“instance”:“antares-dev”,“sched_backend”:“cephUser”,“traceFrameNumber”:5,“traceFrame”:“/usr/lib64/libctacatalogue.so.0(+0x44626b) [0x7f899fa4626b]”}
{“epoch_time”:1751383447.035714899,“local_time”:“2025-07-01T16:24:07+0100”,“hostname”:“getafix-ts09”,“program”:“cta-taped”,“log_level”:“INFO”,“pid”:“2564933”,“tid”:“2564933”,“message”:“Stack trace”,“drive_name”:“obelix_lto8_16”,“instance”:“antares-dev”,“sched_backend”:“cephUser”,“traceFrameNumber”:6,“traceFrame”:“/usr/lib64/libctacatalogue.so.0(+0x4477bc) [0x7f899fa477bc]”}
{“epoch_time”:1751383447.035767377,“local_time”:“2025-07-01T16:24:07+0100”,“hostname”:“getafix-ts09”,“program”:“cta-taped”,“log_level”:“INFO”,“pid”:“2564933”,“tid”:“2564933”,“message”:“Stack trace”,“drive_name”:“obelix_lto8_16”,“instance”:“antares-dev”,“sched_backend”:“cephUser”,“traceFrameNumber”:7,“traceFrame”:“/usr/lib64/libctacatalogue.so.0(cta::catalogue::DriveStateCatalogueRetryWrapper::getTapeDrive(std::__cxx11::basic_string<char, std::char_traits, std::allocator > const&) const+0x52) [0x7f899fa462d8]”}
{“epoch_time”:1751383447.035815595,“local_time”:“2025-07-01T16:24:07+0100”,“hostname”:“getafix-ts09”,“program”:“cta-taped”,“log_level”:“INFO”,“pid”:“2564933”,“tid”:“2564933”,“message”:“Stack trace”,“drive_name”:“obelix_lto8_16”,“instance”:“antares-dev”,“sched_backend”:“cephUser”,“traceFrameNumber”:8,“traceFrame”:“/usr/bin/cta-taped() [0x49ee06]”}
{“epoch_time”:1751383447.035860974,“local_time”:“2025-07-01T16:24:07+0100”,“hostname”:“getafix-ts09”,“program”:“cta-taped”,“log_level”:“INFO”,“pid”:“2564933”,“tid”:“2564933”,“message”:“Stack trace”,“drive_name”:“obelix_lto8_16”,“instance”:“antares-dev”,“sched_backend”:“cephUser”,“traceFrameNumber”:9,“traceFrame”:“/usr/bin/cta-taped() [0x49e547]”}
{“epoch_time”:1751383447.035924588,“local_time”:“2025-07-01T16:24:07+0100”,“hostname”:“getafix-ts09”,“program”:“cta-taped”,“log_level”:“INFO”,“pid”:“2564933”,“tid”:“2564933”,“message”:“Stack trace”,“drive_name”:“obelix_lto8_16”,“instance”:“antares-dev”,“sched_backend”:“cephUser”,“traceFrameNumber”:10,“traceFrame”:“/usr/bin/cta-taped() [0x4c91df]”}
{“epoch_time”:1751383447.035970757,“local_time”:“2025-07-01T16:24:07+0100”,“hostname”:“getafix-ts09”,“program”:“cta-taped”,“log_level”:“INFO”,“pid”:“2564933”,“tid”:“2564933”,“message”:“Stack trace”,“drive_name”:“obelix_lto8_16”,“instance”:“antares-dev”,“sched_backend”:“cephUser”,“traceFrameNumber”:11,“traceFrame”:“/usr/bin/cta-taped() [0x4c8d76]”}
{“epoch_time”:1751383447.036016130,“local_time”:“2025-07-01T16:24:07+0100”,“hostname”:“getafix-ts09”,“program”:“cta-taped”,“log_level”:“INFO”,“pid”:“2564933”,“tid”:“2564933”,“message”:“Stack trace”,“drive_name”:“obelix_lto8_16”,“instance”:“antares-dev”,“sched_backend”:“cephUser”,“traceFrameNumber”:12,“traceFrame”:“/usr/bin/cta-taped() [0x48f7c4]”}

It looks like we needed to add the following directives to cta-frontend-xrootd.conf:

cta.instance_name antares-dev

cta.schedulerdb.scheduler_backend_name cephUser

After adding these two, the service could start!
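
For anyone who wants to check a config file for these directives before restarting, here is a minimal sketch. missing_directives is a hypothetical helper (not a CTA tool); it only checks that each name appears as the first word of some line, so commented-out entries do not count.

```shell
# Hypothetical helper: print each directive name that is not the first
# word of any line in the config file; exit non-zero if any are missing.
# Usage: missing_directives CONFIG_FILE NAME [NAME...]
missing_directives() {
  conf="$1"; shift
  rc=0
  for key in "$@"; do
    # Exact match on the first field, so "# name value" is not a hit.
    if ! awk -v k="$key" '$1 == k { found = 1 } END { exit !found }' "$conf"; then
      echo "$key"
      rc=1
    fi
  done
  return "$rc"
}

# Example:
#   missing_directives /etc/cta/cta-frontend-xrootd.conf \
#     cta.instance_name cta.schedulerdb.scheduler_backend_name
```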

Let’s see now why cta-taped cannot start… maybe a missing directive there as well?

Changing the ownership of the cta-taped log to cta:tape (we have had it as root:root until now) did the trick, and cta-taped is now running.

I can’t see the drive in the object store now:
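
A quick way to check the ownership of a file before starting the service could look like the sketch below. has_owner is a hypothetical helper, it relies on GNU stat (coreutils), and the log path in the example is an assumption that should be replaced with the real one.

```shell
# Hypothetical helper: succeed only if FILE is owned by the expected
# "user:group". Relies on GNU stat (coreutils).
# Usage: has_owner FILE USER:GROUP
has_owner() {
  [ "$(stat -c '%U:%G' "$1")" = "$2" ]
}

# Example (the log path is an assumption; substitute the real one):
#   has_owner /var/log/cta/cta-taped.log cta:tape || echo "wrong ownership"
```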

[root@cta-front02 ~]# cta-admin dr ls
library drive host desired request status since vid tapepool vo files data MB/s session priority activity scheduler instance age reason
[root@cta-front02 ~]#

I deleted the object
DriveProcess-obelix_lto8_16-getafix-ts09.scd.rl.ac.uk-13848-20250702-12:04:05-0
and restarted cta-taped, but I still cannot see the drive…

Do you have any ideas?

The drive finally appeared in the object store after a while…
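
Since the drive took a while to show up, polling the drive listing rather than checking once may be useful. This is only a sketch: wait_until and drive_listed are hypothetical helpers, and the attempt count and delay in the example are arbitrary.

```shell
# Hypothetical helper: re-run a command until it succeeds or the
# attempts are exhausted, sleeping between tries.
# Usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until() {
  attempts="$1"; delay="$2"; shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    [ "$i" -lt "$attempts" ] && sleep "$delay"
    i=$((i + 1))
  done
  return 1
}

# Succeeds once the given drive name appears in the drive listing.
drive_listed() {
  cta-admin dr ls | grep -q "$1"
}

# Example: check every 10 seconds, up to 30 times.
#   wait_until 30 10 drive_listed obelix_lto8_16
```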

Hi George,

Sorry for the late reply, but I’m glad that you managed to figure it out.

The addition of the cta.instance_name and cta.schedulerdb.scheduler_backend_name configuration options was indeed a non-backward-compatible change, introduced by the previous stable release 5.11.2.0-1.
It’s referenced in the announcement through a link, but I realise that it might have gone unnoticed…

For reference, in case anyone else looks into this thread, the details about these new configurations required for 5.11.2.0-1 (and later) can be found here.

Also, thank you for sharing the solution that worked for you regarding the other problems.
I’m not sure why the cta-taped drive took such a long time to become visible, but the logs should provide some useful details if the initialisation failed.

Best,
Joao