Repack issue with many small files

Hi,
We are repacking some tapes that contain 60–70k small files, but they are getting stuck in the repack queue in a pending state. Has anyone experienced this issue before, or can anyone offer advice on how to resolve it?

thanks

Hello Atefeh Sharif,

can you please change your CTA community portal username from gvyg5ynncmx1z804as_m to something more human readable? Thank you.

It can happen that some files somehow “disappear” from the repack queue. See this sample output:

[tape-local@ctaproductionfrontend11 ~]$ cta-admin repack ls
          c.time  repackTime     c.user    vid     tapepool providedFiles totalFiles totalBytes selectedFiles filesToRetrieve filesToArchive failed  status      instance 
2025-11-22 10:35  2d2h41m23s tape-local L88936 vo_ALICE_raw             0       1879      18.6T          1879              28           1243     28 Running ctaproduction 
2025-11-22 10:35  2d2h41m18s tape-local L88937 vo_ALICE_raw             0       1855      18.6T          1855              26           1247     26 Running ctaproduction 
2025-11-22 10:35  2d2h41m12s tape-local L88945 vo_ALICE_raw             0       1851      18.6T          1851              24            902     24 Running ctaproduction 
2025-11-22 10:36  2d2h40m29s tape-local L88988 vo_ALICE_raw             0       1856      18.6T          1856              23           1086     23 Running ctaproduction 

You need to wait until the filesToArchive column reaches 0 and the request status changes to Failed.

Then you need to delete the request and resubmit it. The remaining files will most likely be read without any issues.
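
A minimal sketch of that check, in case you want to script it. It assumes the plain-text layout of `cta-admin repack ls` shown above (whitespace-separated columns, with c.time printed as two tokens) and expects only the table itself, without the shell prompt line. The function names are made up for this example, and the second data row is a hypothetical "finished" state, not real output.

```python
def parse_repack_ls(text):
    """Parse the plain-text table printed by `cta-admin repack ls` into dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        tokens = line.split()
        # c.time is printed as "YYYY-MM-DD HH:MM", i.e. two tokens: rejoin them.
        tokens = [" ".join(tokens[:2])] + tokens[2:]
        if len(tokens) != len(header):
            continue  # not a data row in the expected layout
        rows.append(dict(zip(header, tokens)))
    return rows


def ready_to_resubmit(row):
    """True once nothing is left to archive and the request has gone to Failed."""
    return row["status"] == "Failed" and int(row["filesToArchive"]) == 0


# First row is taken from the output above; the second row is HYPOTHETICAL.
SAMPLE = """\
c.time repackTime c.user vid tapepool providedFiles totalFiles totalBytes selectedFiles filesToRetrieve filesToArchive failed status instance
2025-11-22 10:35 2d2h41m23s tape-local L88936 vo_ALICE_raw 0 1879 18.6T 1879 28 1243 28 Running ctaproduction
2025-11-22 10:35 2d2h41m18s tape-local L88937 vo_ALICE_raw 0 1855 18.6T 1855 26 0 26 Failed ctaproduction
"""

for row in parse_repack_ls(SAMPLE):
    if ready_to_resubmit(row):
        print(row["vid"], "can be deleted and resubmitted")
```

Deleting and resubmitting the request is left as a manual step here on purpose.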

For the time being, we have not identified what causes this.

Best regards,

Vladimir Bahyl
CERN

Thanks for the update. I just want to clarify that my concern is not about the repacks that are currently stuck in running or failed state.
The main issue we’re facing is that when we submit a repack (with 60–70k small files), it takes a very long time just to move from Pending → Running. We even tried removing the tape and submitting the repack again, but there was no improvement.
When we submit a tape with a normal number of files and standard file sizes, the transition from Pending to Running happens quickly.

Hi Atefeh Sharif,

Thanks for changing your username to something a human can pronounce.

See this output from CERN:

[tape-local@ctaproductionfrontend11 ~]$ cta-admin repack ls
          c.time repackTime     c.user    vid     tapepool providedFiles totalFiles totalBytes selectedFiles filesToRetrieve filesToArchive failed   status      instance 
2025-11-30 10:52 1d2h23m55s tape-local L89162  vo_CMS_2025             0       4426      18.8T          4426               0           2180      0  Running ctaproduction 
2025-11-30 10:52 1d2h23m48s tape-local L89171  vo_CMS_2025             0       4519      19.0T          4519               0           3223      0  Running ctaproduction 
2025-11-30 16:52  20h23m55s tape-local L89188      vo_LHCb             0       2724      18.6T          2724               0           1341      0  Running ctaproduction 
2025-12-01 00:52  12h23m54s tape-local L89190  vo_CMS_2025             0       4390      18.7T          4390            4390           4390      0 Starting ctaproduction 
2025-12-01 00:52  12h23m48s tape-local L89199 vo_ALICE_raw             0       1890      18.6T          1890            1890           1890      0 Starting ctaproduction 
2025-12-01 06:52   6h23m54s tape-local L89200      vo_LHCb             0          0          0             0               0              0      0  Pending ctaproduction 
2025-12-01 10:22   2h53m29s   mducruet I03511  vo_CMS_2024             0          0          0             0               0              0      0  Pending ctaproduction 

Between Pending and Running, there is a Starting state, during which the repack requests are expanded. This means that, for each tape, individual file requests are created in the object store.

Obviously, with many files on a tape, many objects need to be created, which might cause the delay if your underlying object storage filesystem is slow.

Can you inspect how fast the objects are created?

For example, right now we have over 160 000 objects in our object store (POOL, ID and NAMESPACE depend on your configuration):

[tape-local@ctaproductionfrontend11 ~]$ rados -p POOL --id ID --namespace NAMESPACE ls | wc -l
162052
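
One rough way to answer that question is to count the objects twice and divide the difference by the elapsed time. The sketch below assumes the same `rados ... ls` command as above; `count_objects` and `creation_rate` are names made up for this example. Note that it measures the net change, so objects deleted during the interval will lower the figure.

```python
import subprocess
import time


def count_objects(pool, client_id, namespace):
    """Count objects with the same `rados ... ls` invocation shown above."""
    out = subprocess.run(
        ["rados", "-p", pool, "--id", client_id, "--namespace", namespace, "ls"],
        check=True, capture_output=True, text=True,
    ).stdout
    return len(out.splitlines())


def creation_rate(counter, interval_s=60.0, sleep=time.sleep, clock=time.monotonic):
    """Net objects created per second between two calls to counter()."""
    first = counter()
    start = clock()
    sleep(interval_s)
    second = counter()
    elapsed = clock() - start
    return (second - first) / elapsed


# Example use (POOL, ID and NAMESPACE are placeholders for your own values):
# rate = creation_rate(lambda: count_objects("POOL", "ID", "NAMESPACE"))
# print(f"{rate:.1f} objects/s (net)")
```

Running this while a repack sits in Starting should give an idea of whether the object store is the bottleneck.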

Another point to explain is that, with many files, there is a limit on how big the objects can be.

CTA can repack tapes with over 1 000 000 files, but the underlying storage cannot keep up. That is why we implemented a limit on how many files are expanded at once.

Please see the [--maxfilestoselect/--mfts <max_files_to_select>] option of the cta-admin repack command.

Try submitting the repack with only 10 000 files. You will then obviously need to submit the repack multiple times to get through all the files.
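
As a quick sanity check on how many submissions that means, here is a small sketch. It assumes each pass handles at most the --maxfilestoselect number of files and that those files are moved off the tape before the next pass; `passes_needed` is a name made up for this example.

```python
import math


def passes_needed(total_files, max_files_to_select):
    """Number of repack submissions needed to cover every file on a tape."""
    if max_files_to_select <= 0:
        raise ValueError("max_files_to_select must be positive")
    return math.ceil(total_files / max_files_to_select)


# Tapes with 60-70k files, repacked 10 000 files at a time:
for total in (60_000, 65_000, 70_000):
    print(total, "files ->", passes_needed(total, 10_000), "submissions")
```

So for your tapes, expect roughly 6 to 7 submissions per tape.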

Let me know how it goes. Best regards,

Vladimir Bahyl

Thank you for the detailed explanation. I will check all the points you mentioned, run some tests and let you know the results.

May I also ask for some clarification on the following scheduler parameters?

cta.schedulerdb.numberofthreads 500
cta.schedulerdb.threadstacksize_mb 1
#cta.schedulerdb.tape_cache_max_age_secs 600
#cta.schedulerdb.retrieve_queue_cache_max_age_secs 10

In our configuration, the first two parameters are set, while the last two are currently commented out.
Do you recommend enabling or adjusting the last two parameters for better performance?
And regarding the first two parameters, is the current configuration appropriate, or should we tune them depending on our workload?

thank you

Hi Atefeh Sharif,

apologies for the delay, but I would like to conclude this thread and reply to you.

We discussed this internally, and the only value that differs on our side is this one:

cta.schedulerdb.numberofthreads 4000

However, this one (or the other values) shouldn’t really have an effect on your repack performance issues. Our guess is that your object store is not on an SSD-based Ceph backend as ours is, and that you are using another solution.

Best regards,

Vladimir Bahyl
CERN

Hi Vlado,

Happy New Year, and thank you for your response. Repacking with the --maxfilestoselect option solved the problem, and we were able to repack all tapes containing a large number of small files.
Thank you so much
Atefeh