File count in scheduling becomes zero when a tape is full and then unloaded from the drive

Hi there,

We are using CTA 4.7.3, and recently we found a strange problem with the scheduling system.

We frequently have ~100k files (about 100 TB) waiting for migration. Often a tape becomes full while files are still in the waiting queue. When the newly full tape is unloaded and removed from the drive, the file count of the corresponding tape pool drops to zero and then climbs back to its original value. Meanwhile, the oldest age reported by cta-admin sq changes to 0 or to a much smaller value, as shown below.

Has anyone had such problems?

Here is another case.

Hello biyujiang,
We have indeed seen this issue in production as well, at the beginning of June 2022.
A single full archive tape was consuming the entire archive queue and requeuing everything.
The observable result was that all the tapes were dismounted and the entire archive queue was reinjected later → oldest ages went back to 0, and then everything was remounted again.
This bug has been fixed in cta v4.7.5-1.

Here is its entry in the CTA Release notes:

- cta/CTA#1225 - Fix bug causing tapeserver to sometimes pop the entire archive queue at the end of the mount

You should upgrade to at least CTA 4.7.5-1 to get this fixed.
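For anyone who wants to check whether they are hitting the same thing, here is a rough sketch that scans sampled queue statistics for the symptom described in this thread: the file count drops to zero and, once files reappear, the oldest age is lower than before. The sample format `(queued_files, oldest_age_seconds)` and the function name are assumptions made for illustration; they are not part of cta-admin, and collecting the samples is left to your own monitoring. Note that a queue that was genuinely drained and then refilled with new files looks the same, so treat a hit only as a hint.

```python
def find_queue_resets(samples):
    """Return indices where a queue reappears after an empty sample with a
    lower oldest age than it had before it went empty.

    samples: time-ordered list of (queued_files, oldest_age_seconds) tuples.
    This tuple layout is hypothetical, not a cta-admin output format.
    """
    resets = []
    previous_age = None   # oldest age of the most recent non-empty sample
    saw_empty = False     # True if an empty sample followed a non-empty one
    for index, (files, age) in enumerate(samples):
        if files == 0:
            # Only count the gap if the queue was non-empty before it.
            saw_empty = previous_age is not None
            continue
        if saw_empty and age < previous_age:
            resets.append(index)
        saw_empty = False
        previous_age = age
    return resets
```

Called on `[(100000, 7200), (0, 0), (100000, 30)]`, the function flags index 2, because the queue came back with a much younger oldest age than it had before it emptied.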

Good, and thanks for the explanation. We hadn’t noticed that message before, so we will upgrade to a newer version when we are ready.

Our standard stress test had been showing this bug since 4.7.1-1, but pinning down the exact cause was much more challenging…

We need to find a way to clearly communicate this kind of bug to the community.

This archive queue bug is present in CTA versions 4.7.0-1 through 4.7.4-x and is fixed in 4.7.5-1.
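As a quick way to apply the version range given in this thread, the sketch below checks whether a CTA version string falls in the affected range. The helper names are made up for illustration and the parsing assumes versions shaped like `4.7.3` or `4.7.5-1`; the release suffix after the dash is ignored, since the range is defined by the first three numbers.

```python
import re

# Range quoted in this thread: affected from 4.7.0-1 up to 4.7.4-x,
# fixed in 4.7.5-1.
AFFECTED_FIRST = (4, 7, 0)
FIXED_IN = (4, 7, 5)


def parse_cta_version(version):
    """Return (major, minor, patch) from strings like '4.7.3' or 'v4.7.5-1'."""
    match = re.match(r"^v?(\d+)\.(\d+)\.(\d+)(?:-\w+)?$", version.strip())
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.groups())


def is_affected(version):
    """True if the version lies in the affected range for the queue bug."""
    return AFFECTED_FIRST <= parse_cta_version(version) < FIXED_IN
```

With this, `is_affected("4.7.3")` is true, while `is_affected("4.7.5-1")` and `is_affected("4.7.7")` are false.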

Great. After upgrading to 4.7.7, this bug disappeared. It would help if the release notes carried more prominent upgrade recommendations for bugs like this.