Tape Drive stuck in Cleanup

snorberg1 · 7 March 2024 14:26

We were running some small-file tests, transferring close to half a TB in total.
The current status of the drive is:

TS4500G1 LTO8D0      gmv18018      Up ArchiveForUser CleanUp     1120 VR5871 twaltontest vo     -    -    -   21559        0        -             1120 -

It has been in this state for a while. Restarting the tape drive does not fix the issue.
After some googling and talking to people at Fermilab, we think the Ceph objectstore is being cleaned up, or wants to be cleaned up, but is stuck, possibly because of memory.
Have you run into this issue? I know of another tape drive that was stuck in the “cleanup Retrieve” state; in that case restarting the tape drive worked, but here it does not, and we have tried it multiple times.
After the latest restart, the cta-taped log shows:

Mar  7 08:15:03 gmv18018 cta-taped: LVL="DEBUG" PID="3474146" TID="3479471" MSG="RdbmsCatalogue::updateTapeDriveStatistics(): It didn't update statistics"
Mar  7 08:15:07 gmv18018 cta-taped: LVL="DEBUG" PID="3474148" TID="3474148" MSG="In MaintenanceHandler::exceptionThrowingRunChild(): About to do a maintenance pass." SubprocessName="maintenanceHandler"
Mar  7 08:15:07 gmv18018 cta-taped: LVL="DEBUG" PID="3474148" TID="3474148" MSG="DEBUG: In QueueCleanupRunner::runOnePass(): no queues requested a cleanup." SubprocessName="maintenanceHandler"
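
Since the log is mostly DEBUG noise, here is a rough sketch of how the lines could be filtered. It assumes only the `KEY="value"` layout visible in the three lines above; the helper names `parse_taped_line` and `grep_msg` are made up for illustration and are not part of CTA.

```python
import re

# KEY="value" pairs, as seen in the cta-taped lines quoted in this thread.
FIELD_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_taped_line(line):
    """Return the KEY="value" fields of one cta-taped syslog line, or None."""
    prefix, sep, rest = line.partition("cta-taped: ")
    if not sep:
        return None  # not a cta-taped line
    fields = dict(FIELD_RE.findall(rest))
    fields["prefix"] = prefix.strip()  # syslog timestamp and host name
    return fields


def grep_msg(lines, needle):
    """Keep parsed lines whose MSG contains `needle` (case-insensitive)."""
    needle = needle.lower()
    hits = []
    for line in lines:
        fields = parse_taped_line(line)
        if fields and needle in fields.get("MSG", "").lower():
            hits.append(fields)
    return hits


SAMPLE = """
Mar  7 08:15:03 gmv18018 cta-taped: LVL="DEBUG" PID="3474146" TID="3479471" MSG="RdbmsCatalogue::updateTapeDriveStatistics(): It didn't update statistics"
Mar  7 08:15:07 gmv18018 cta-taped: LVL="DEBUG" PID="3474148" TID="3474148" MSG="In MaintenanceHandler::exceptionThrowingRunChild(): About to do a maintenance pass." SubprocessName="maintenanceHandler"
Mar  7 08:15:07 gmv18018 cta-taped: LVL="DEBUG" PID="3474148" TID="3474148" MSG="DEBUG: In QueueCleanupRunner::runOnePass(): no queues requested a cleanup." SubprocessName="maintenanceHandler"
""".strip().splitlines()

if __name__ == "__main__":
    for f in grep_msg(SAMPLE, "cleanup"):
        print(f["prefix"], f["LVL"], f.get("SubprocessName", "-"), f["MSG"])
```

On the sample, searching for "cleanup" keeps only the QueueCleanupRunner line, which reports that no queue requested a cleanup.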

The tape is in the drive:

[root@gmv18018 ~]# mtx -f /dev/sg2 status | head -n 40
  Storage Changer /dev/sg2:2 Drives, 10293 Slots ( 255 Import/Export )
Data Transfer Element 0:Empty
Data Transfer Element 1:Full (Storage Element 17 Loaded):VolumeTag = VR5871M8
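
For anyone who wants to check this from a script, below is a minimal sketch that reads the `mtx status` text and reports which drive holds a given tape. It assumes the line layout shown above; `parse_drives` and `drive_holding` are hypothetical helper names. In this output the volume tag (VR5871M8) is the VID followed by a two-character suffix, so the match is done on the prefix.

```python
import re

# Layout taken from the "Data Transfer Element" lines quoted above.
DRIVE_RE = re.compile(
    r"^Data Transfer Element (\d+):(Empty|Full)"
    r"(?: \(Storage Element (\d+) Loaded\))?"
    r"(?::VolumeTag\s*=\s*(\S+))?"
)


def parse_drives(mtx_output):
    """Map drive index -> state parsed from `mtx status` output."""
    drives = {}
    for line in mtx_output.splitlines():
        m = DRIVE_RE.match(line.strip())
        if not m:
            continue  # changer header and storage slots are ignored
        idx, state, slot, tag = m.groups()
        drives[int(idx)] = {
            "full": state == "Full",
            "source_slot": int(slot) if slot else None,
            "volume_tag": tag,
        }
    return drives


def drive_holding(drives, vid):
    """Return the index of the drive whose volume tag starts with `vid`."""
    for idx, info in drives.items():
        if info["volume_tag"] and info["volume_tag"].startswith(vid):
            return idx
    return None


SAMPLE = """\
  Storage Changer /dev/sg2:2 Drives, 10293 Slots ( 255 Import/Export )
Data Transfer Element 0:Empty
Data Transfer Element 1:Full (Storage Element 17 Loaded):VolumeTag = VR5871M8
"""

if __name__ == "__main__":
    drives = parse_drives(SAMPLE)
    print(drive_holding(drives, "VR5871"), drives)
```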

snorberg1 · 7 March 2024 19:00

```
[root@storagedev201 ~]# cta-admin --json v | jq
[
  {
    "clientVersion": {
      "ctaVersion": "4-4958535git71bda8fb",
      "xrootdSsiProtobufInterfaceVersion": "v1.4"
    },
    "serverVersion": {
      "ctaVersion": "4-4958535git71bda8fb",
      "xrootdSsiProtobufInterfaceVersion": "v1.4"
    },
    "catalogueConnectionString": "postgresql:postgresql://cta:******@ifdb07.fnal.gov:5438/cta_dev",
    "catalogueVersion": "12.0",
    "isUpgrading": false
  }
]
```

rbachman · 12 March 2024 14:12

Hi @snorberg1,
The git hash corresponds to CTA version ~4.8.2-1 from December 2022. Since then we’ve made changes related to the cleanUp process, such as "Do not set the 'queueTrimRequired' flag as true when 'doCleanup' is required" (CTA GitLab issue #572) and, notably, "Avoid looping in Cleaning Up state" (CTA GitLab issue #509).

So we recommend upgrading to a more recent version of CTA.

rbachman · 12 March 2024 14:29

As an aside, operationally: have you tried stopping cta-taped, unmounting the tape, and then restarting cta-taped?
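
A possible reading of those three steps, as a dry-run sketch only. It assumes the daemon runs as a systemd unit named `cta-taped` (the unit name may differ on your installation) and that "unmounting" means returning the cartridge to its slot with `mtx unload`; the changer device, slot 17 and drive 1 are taken from the `mtx status` output earlier in the thread. Depending on the library, the tape may first have to be rewound and ejected from the drive before the changer will move it. The function names are illustrative.

```python
import shlex
import subprocess


def recovery_plan(changer, slot, drive, unit="cta-taped"):
    """Commands for: stop the daemon, return the tape to its slot, start again.

    `unit` is an assumed systemd unit name; adjust it to the local setup.
    """
    return [
        ["systemctl", "stop", unit],
        # mtx syntax: unload <slot> <drive>
        ["mtx", "-f", changer, "unload", str(slot), str(drive)],
        ["systemctl", "start", unit],
    ]


def run_plan(plan, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in plan:
        print(shlex.join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # stop at the first failure


if __name__ == "__main__":
    # Values from this thread: changer /dev/sg2, tape from slot 17 in drive 1.
    run_plan(recovery_plan("/dev/sg2", 17, 1))
```

Run as-is, the script only prints the three commands, so they can be reviewed before anything is executed with `dry_run=False`.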

snorberg1 · 13 March 2024 13:24

When this happened before, and was thought to be a different issue, I had stumbled on this solution, and I thought I had tried it first thing this time. I figured I would give it another try. The drive does seem to be back up; I am not sure whether I did the steps in a different order or whether something else was off, but thanks for both suggestions. We are also looking to upgrade to a newer setup.