Identifying repack candidate tapes

How do you identify which tapes are ripe for a repack? I recently deleted a large number of files. Going tape by tape and listing the files, I can see that I have tapes where the majority of files are no longer listed by tapefile ls. I naively expected that tape ls would tell me how many active files were still on a tape.

Thanks!

Eric

Hi Eric,
More often than not, when we repack, we pick tapes based on media generation: we repack from an older generation that is being phased out onto a newer, higher-capacity generation.

However, the Automated Tape REpacking SYStem (ATRESYS) comes with a tool called cta-ops-repack-0-scan, which is intended for precisely this use case. It lets you automatically select tape VIDs based on their occupancy percentage, filter by tape pool, etc., and optionally feed the selected VIDs into the automated repacking system.

Additional info and install instructions can be found on the dedicated ops tool wiki: the Wiki of the "CTA Operations utilities" project under cta on GitLab.
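As a rough illustration of occupancy-based selection (a minimal sketch, not the actual cta-ops-repack-0-scan code), the Python below reads the JSON array printed by cta-admin --json tape ls and reports tapes whose live data, masterDataInBytes divided by capacity, falls below a threshold, optionally restricted to one tape pool. The field names match the tape ls output shown later in this thread; the function name and the 30% default threshold are hypothetical choices.

```python
import json

# Hypothetical default: tapes whose live data fills less than this
# fraction of capacity are reported as repack candidates.
OCCUPANCY_THRESHOLD = 0.30


def repack_candidates(tape_ls_json, threshold=OCCUPANCY_THRESHOLD, tapepool=None):
    """Return (vid, fraction) pairs for tapes whose live-data fraction is below threshold.

    tape_ls_json is the JSON text printed by `cta-admin --json tape ls`.
    Numeric fields arrive as strings there, so they are converted with int().
    """
    candidates = []
    for tape in json.loads(tape_ls_json):
        # Optional tape-pool filter.
        if tapepool is not None and tape["tapepool"] != tapepool:
            continue
        capacity = int(tape["capacity"])
        if capacity == 0:
            continue  # avoid dividing by zero on malformed records
        fraction = int(tape["masterDataInBytes"]) / capacity
        if fraction < threshold:
            candidates.append((tape["vid"], fraction))
    # Emptiest tapes first: they free the most space per byte copied.
    return sorted(candidates, key=lambda pair: pair[1])


if __name__ == "__main__":
    sample = json.dumps([{
        "vid": "FL0590",
        "tapepool": "cms.Run3Winter20DRPremixMiniAODMCGenSimRaw",
        "capacity": "12000000000000",
        "masterDataInBytes": "11636563312201",
    }])
    print(repack_candidates(sample))
```

Because the ratio is computed from masterDataInBytes, it is only as current as the tape statistics CTA has last recorded.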

Cheers,
Richard

Perfect! Thank you! This looks great.

On further examination, that script appears to compute masterDataInBytes divided by capacity, which makes sense and is what I naively expected. However, we have a tape where every file has been moved to the recycle log (visible in recycletf ls), and yet its tape ls entry looks like this:

[
  {
    "vid": "FL0590",
    "mediaType": "LTO8",
    "vendor": "Unknown",
    "logicalLibrary": "TS4500G1_CTACMS",
    "tapepool": "cms.Run3Winter20DRPremixMiniAODMCGenSimRaw",
    "vo": "cms",
    "encryptionKeyName": "-",
    "capacity": "12000000000000",
    "occupancy": "11636563312201",
    "lastFseq": "1665",
    "full": true,
    "fromCastor": false,
    "readMountCount": "45",
    "writeMountCount": "28",
    "nbMasterFiles": "1665",
    "masterDataInBytes": "11636563312201",
    "state": "ACTIVE",
    "stateReason": "",
    "stateUpdateTime": "1707864003",
    "stateModifiedBy": "eosdev@cmscta01",
    "dirty": true,
    "verificationStatus": ""
  }
]

So the deletions are not reflected in the usage figures (nbMasterFiles, masterDataInBytes). Is there something we are missing that should be updating these fields as deletions happen?

[root@cmscta01 atresys]# XrdSecPROTOCOL=sss XrdSecSSSKT=/etc/cta/ctafrontend_forwardable_sss.keytab cta-admin --json rtf ls --vid FL0590|jq | grep fseq | wc
   1665    3330   32193

So all 1665 files have been deleted in EOS.
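The grep-and-wc check above counts matching lines, so a duplicated entry could mask a missing one. A slightly stricter sketch in Python collects the distinct fseq values from the recycle-log JSON and checks that every fseq from 1 to the tape's lastFseq is present. It assumes each recycle-log entry is a JSON object with a top-level fseq key (the grep only shows that such a key appears somewhere in the output) and that files on a tape are numbered 1 through lastFseq; both function names are hypothetical.

```python
import json


def recycle_log_fseqs(rtf_ls_json):
    """Collect the distinct fseq values from `cta-admin --json rtf ls --vid ...` output.

    ASSUMPTION: each entry is a JSON object with a top-level "fseq" key,
    whose value may be a string or an integer.
    """
    if not rtf_ls_json.strip():
        return set()  # no recycle-log entries printed at all
    return {int(entry["fseq"]) for entry in json.loads(rtf_ls_json)}


def fully_recycled(tape_record, fseqs):
    """True if every fseq 1..lastFseq of the tape appears in the recycle log.

    ASSUMPTION: tape files are numbered contiguously from 1 to lastFseq.
    """
    last_fseq = int(tape_record["lastFseq"])
    return fseqs >= set(range(1, last_fseq + 1))
```

Comparing sets rather than counting lines means a tape is only reported as fully recycled when no individual fseq is missing.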

Hi Eric,
the masterDataInBytes of a tape is supposed to be computed by CTA along the lines of:

masterDataInBytes = occupancy - size_of(deleted_files)
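Applied to FL0590, and assuming the recycle log accounts for every file written to the tape (so the deleted file sizes add up to the whole occupancy), the formula predicts that an up-to-date masterDataInBytes should be zero, whereas the value shown above still equals the occupancy, about 97% of capacity:

```latex
\begin{aligned}
\text{masterDataInBytes} &= \text{occupancy} - \text{size\_of(deleted\_files)} \\
                         &= 11636563312201 - 11636563312201 = 0 \\
\text{reported ratio}    &= \frac{11636563312201}{12000000000000} \approx 96.97\,\% \\
\text{expected ratio}    &= \frac{0}{12000000000000} = 0\,\%
\end{aligned}
```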

However, certain operations in CTA, such as a file being marked as deleted, don’t trigger immediate statistics updates.

You can force a statistics update when needed with the cta-statistics-update script (shipped in the CTA RPMs), which should give you an up-to-date masterDataInBytes value.

We run this script about once a day as part of our monitoring.

Thanks. This works great, of course. Are there any other such tasks that need to be run for a functional system (as opposed to monitoring-only tasks)?

Hi Eric,
Yes, we know that having to call this somewhat obscure script is suboptimal. It is scheduled for review at some point.

The other scripts needed for production that I can think of are more conditional and are not part of the core CTA software: