Tape drive stuck in "CleanUp Retrieve" state

gtf87 · 30 May 2022 15:06

Over the weekend we have had 4 drives go into this cleanup state and never come out of it. It takes a restart of the tape daemon to clear the issue. This has been happening since we moved to CTA version 4.6.1-1 last week.
It appears that the mount fails, and the tape daemon for whatever reason refuses to clear the state. I’ve found two ways the mounts fail. In one, the retry limit (nbAttempts) is reached -

exceptionMessage="Failed to mount tape for read-only access: vid=CT1988 slot=smc3: Failed to mount tape in SCSI tape-library for read/write access: vid=CT1988 librarySlot=smc3: Received error from rmcd after several fast retries: nbAttempts=10 rmcErrorStream=smc_mount: SR017 - find_cartridge CT1988 failed : Not Ready to Ready Transition"

and in the other, the tape is apparently mounted, but rmcd reports that the mount fails because the tape is already mounted -

cta-rmcd.log.4.gz:05/26 11:18:00 1805 rmc_srv_mount: RMC98 - mount CT2405/0 on drive 0
cta-rmcd.log.4.gz:05/26 11:18:02 1805 rmc_sendrep: smc_mount: Asked for CT2405, got reply for CT2405
cta-rmcd.log.4.gz:05/26 11:18:02 1805 rmc_sendrep: smc_mount: SR018 - mount of CT2405 on drive 0 failed : volume in use

Our servers have two drives attached, and the other drive carries on as normal. In all cases the drive is empty, and the tape device is “busy”, so it does not respond to mt. lsof shows that the tape daemon has the device open:

lsof /dev/ts1160_0
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
cta-tpd-a 21138 cta 36u CHR 9,128 0t0 47127 /dev/nst0

Does anyone have any ideas where to look for the cause?

Logs available on request

Thanks

Tim

mdavis · 31 May 2022 16:09

Sorry, Tim, this is a regression. It’s fixed in 4.7.3-1.