We are using CTA 4.7.3, and recently we found a strange problem with the scheduling system.
We now frequently have ~100k files (about 100 TB) waiting for migration. Often, a tape fills up while there are still files in the waiting queue. When the newly full tape is unloaded and removed from the drive, the file count of the corresponding tape pool drops to zero and then climbs back to its original value. Meanwhile, the oldest age reported by cta-admin sq changes to 0 or a much smaller value, as shown below:
Hello biyujiang,
We have indeed seen this issue in production as well, at the beginning of June 2022.
A single full archive tape was consuming the entire archive queue and requeuing everything.
The observable result was that all the tapes were dismounted and the entire archive queue was reinjected later, which reset the oldest ages to 0, and then everything was remounted again.
This bug has been fixed in CTA v4.7.5-1.
Here is its entry in the CTA Release notes:
- cta/CTA#1225 - Fix bug causing tapeserver to sometimes pop the entire archive queue at the end of the mount
You should upgrade to at least CTA 4.7.5-1 to get this fixed.
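If you manage several tape servers, a small script can help confirm which ones already carry the fix. The following is only a sketch: the helper names `parse_cta_version` and `has_queue_pop_fix` are made up for illustration, and it assumes version strings of the form `major.minor.patch[-release]` (for example `4.7.3` or `v4.7.5-1`). How you obtain the installed version string on each host is left to you.

```python
import re

# Version named in the release notes as containing the fix for cta/CTA#1225.
FIXED_IN = (4, 7, 5, 1)

def parse_cta_version(version: str) -> tuple:
    """Parse 'major.minor.patch[-release]' into a 4-tuple of ints.

    Assumption: a missing '-release' part is treated as 0.
    """
    match = re.match(r"\s*v?(\d+)\.(\d+)\.(\d+)(?:-(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    major, minor, patch, release = match.groups()
    return (int(major), int(minor), int(patch), int(release or 0))

def has_queue_pop_fix(installed: str) -> bool:
    """Return True if the installed version is at least 4.7.5-1."""
    return parse_cta_version(installed) >= FIXED_IN

if __name__ == "__main__":
    for v in ("4.7.3", "4.7.5-1", "v4.8.0-1"):
        status = "has the fix" if has_queue_pop_fix(v) else "needs upgrade"
        print(f"{v}: {status}")
```

Tuples compare element by element, so `(4, 7, 3, 0)` sorts before `(4, 7, 5, 1)` without any string tricks.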