To be clear, I don’t necessarily consider this a CTA bug, but I’m providing information and looking for suggestions.
Summary
cta-rmcd returns 0 even when a mount or dismount times out
Details
CTA version: 5.11.11.0-1
Operating System and version: AlmaLinux 9.7
Xrootd version: 5.9.1-1
Objectstore backend: PostgreSQL
Steps to reproduce
Mount or dismount a tape while the library is busy. This has been difficult to reproduce outside a busy production environment.
What is the current bug behaviour?
rmcd returns 0 even when the tape has not been dismounted. The tape remains mounted, the CTA cleaner then fails to dismount it, and taped sets the drive down (offline).
What is the expected correct behaviour?
I'm not entirely sure. Perhaps the return code should be non-zero when a mount or dismount times out.
Relevant logs and/or screenshots
02/05 12:39:15 2440104 rmc_srv_unmount: RMC92 - unmount request by 1000,33 from localhost.localdomain
02/05 12:39:15 2440104 rmc_srv_unmount: RMC98 - unmount FAA269 8 0
02/05 12:44:52 2440104 rmc_srv_unmount: returns 0
02/05 12:51:28 2440104 rmc_srv_mount: RMC92 - mount request by 1000,33 from localhost.localdomain
02/05 12:51:28 2440104 rmc_srv_mount: RMC98 - mount FAA199/0 on drive 8
02/05 12:56:36 2440104 rmc_srv_mount: returns 0
02/05 13:03:19 2440104 rmc_srv_unmount: RMC92 - unmount request by 1000,33 from localhost.localdomain
02/05 13:03:19 2440104 rmc_srv_unmount: RMC98 - unmount FAA199 8 0
02/05 13:07:35 2440104 rmc_srv_unmount: returns 0
{"epoch_time":1770318441.586744138,"local_time":"2026-02-05T13:07:21-0600","hostname":"tpsrvf2101","program":"cta-taped","log_level":"INFO","pid":2950234,"tid":2950234,"message":"In Scheduler::setDesiredDriveState(): success.","drive_name":"F1_F9B4D1","instance":"prd","sched_backend":"cephUser","drive":"F1_F9B4D1","up":"down","force":"no","reason":"[cta-taped] ERROR Cleaner failed. Cleaner failed to dismount tape: Failed to dismount tape: vid=FAA199 slot=smc8: Failed to dismount tape in SCSI tape-library: vid=FAA199 librarySlot=smc8: Failed to read message header: In io::readBytes: timeout","comment":"","schedulerDbTime":0.001207}
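To put numbers on how long each rmcd request took, the request and "returns" lines can be paired up. This is a hypothetical helper (`request_durations` is not part of CTA), offered only as a sketch. It assumes the rmcd log format shown above (`MM/DD HH:MM:SS pid function: message`), assumes rmcd and taped log in the same local time, and takes the year 2026 from the taped log because the rmcd lines carry no year.

```python
from datetime import datetime

# rmcd log lines copied from this report.
LOG = """\
02/05 12:39:15 2440104 rmc_srv_unmount: RMC92 - unmount request by 1000,33 from localhost.localdomain
02/05 12:39:15 2440104 rmc_srv_unmount: RMC98 - unmount FAA269 8 0
02/05 12:44:52 2440104 rmc_srv_unmount: returns 0
02/05 12:51:28 2440104 rmc_srv_mount: RMC92 - mount request by 1000,33 from localhost.localdomain
02/05 12:51:28 2440104 rmc_srv_mount: RMC98 - mount FAA199/0 on drive 8
02/05 12:56:36 2440104 rmc_srv_mount: returns 0
02/05 13:03:19 2440104 rmc_srv_unmount: RMC92 - unmount request by 1000,33 from localhost.localdomain
02/05 13:03:19 2440104 rmc_srv_unmount: RMC98 - unmount FAA199 8 0
02/05 13:07:35 2440104 rmc_srv_unmount: returns 0
"""

# Assumption: rmcd lines have no year, so borrow 2026 from the taped log.
YEAR = 2026


def request_durations(log: str) -> list[tuple[str, int]]:
    """Pair each 'RMC92 ... request' line with the next 'returns' line of the
    same rmcd function and return (function, elapsed seconds) tuples."""
    pending: dict[str, datetime] = {}
    results: list[tuple[str, int]] = []
    for line in log.splitlines():
        date, time, _pid, func, rest = line.split(maxsplit=4)
        func = func.rstrip(":")
        stamp = datetime.strptime(f"{YEAR}/{date} {time}", "%Y/%m/%d %H:%M:%S")
        if rest.startswith("RMC92"):
            # A new request starts the clock for this function.
            pending[func] = stamp
        elif rest.startswith("returns") and func in pending:
            # The matching 'returns' line stops it.
            elapsed = stamp - pending.pop(func)
            results.append((func, int(elapsed.total_seconds())))
    return results


for func, secs in request_durations(LOG):
    print(f"{func}: {secs} s")
```

On the lines above this gives 337 s, 308 s and 256 s. The taped cleaner error is logged at 13:07:21, 14 s before rmcd reports the FAA199 unmount as returning 0 at 13:07:35.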
Possible causes
This occurs when the Spectra library (running BlueScale) is busy and may be slow to respond. The timeout values we use in taped may not be ideal: Spectra has suggested 12-20 minutes per their spec, while our current taped settings are:
taped TapeLoadTimeout 300
taped WatchdogMountMaxSecs 600
taped WatchdogUnmountMaxSecs 600
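As a quick arithmetic check, the current settings can be compared against the lower end of Spectra's suggested window. This sketch assumes the 12-20 minute figure is meant to apply to each of these three parameters, which the Spectra spec may not state; `shortfall` is a hypothetical helper, not a CTA function.

```python
# Current taped settings from this report, in seconds.
TAPED_TIMEOUTS = {
    "TapeLoadTimeout": 300,
    "WatchdogMountMaxSecs": 600,
    "WatchdogUnmountMaxSecs": 600,
}

# Spectra's suggested window of 12-20 minutes, converted to seconds.
# Assumption: the window applies to each parameter above.
SPECTRA_MIN_SECS = 12 * 60  # 720 s
SPECTRA_MAX_SECS = 20 * 60  # 1200 s


def shortfall(timeouts: dict[str, int], minimum: int) -> dict[str, int]:
    """Return how many seconds each timeout falls below `minimum` (0 if it does not)."""
    return {name: max(0, minimum - secs) for name, secs in timeouts.items()}


print(shortfall(TAPED_TIMEOUTS, SPECTRA_MIN_SECS))
```

Under that assumption, TapeLoadTimeout is 420 s short of the 720 s lower bound and both watchdog limits are 120 s short.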