WLCG tape usage accounting by VO

Hello,

We (the disk infrastructure group at Fermilab) have been asked to provide WLCG tape usage accounting by VO, which seems to amount to publishing the information as JSON somewhere.

This link has been given to me as an example:

https://cta.web.cern.ch/wlcg_tape_statistics/CERN-WLCG-tape-statistics.json

My questions:

  • Is this part of the CTA offering, or is there some standalone script that generates it?
  • If this JSON is generated by a standalone script, where can I get it?
  • Is there a document describing the JSON content? (I’ve created a script to extract the numbers by VO; I would appreciate some guidance on what to write in the JSON file and how.)
  • Is there any significance to where this JSON is served from (e.g. CTA)? Does it have to be provided by a known RSE in Rucio, or will any URL do? And where does this URL need to be registered?

Also, for some of the fields (those on tape mounts) I am not sure the data is available. In that JSON the following is reported:

   "storageservice": {
        "implementation": "CTA",
        "implementationversion": "",
        "latestupdate": 1758129902,
        "name": "CERN-PROD-Tape",
        "storageshares": [
            {
                "avgtaperemounts": 0,
                "name": "ALICE",
                "occupiedsize": 164142000000000000,
                "readbytes24h": 0,
                "timestamp": 1758067200,
                "totalmounts24h": 0,
                "uniquemounts24h": 0,
                "usedsize": 178833244000000000,
                "vos": [
                    "alice"
                ],
                "writebytes24h": 0
            },
            {
                "avgtaperemounts": 0,
                "name": "ATLAS",
                "occupiedsize": 228111000000000000,
                "readbytes24h": 0,
                "timestamp": 1758067200,
                "totalmounts24h": 0,
                "uniquemounts24h": 0,
                "usedsize": 234447721000000000,
                "vos": [
                    "atlas"
                ],
                "writebytes24h": 0
            },

avgtaperemounts, readbytes24h, totalmounts24h, uniquemounts24h, writebytes24h

I am not sure where to get these data. On the link I have provided these numbers are all 0.
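
For reference, the structure in the excerpt above can be reproduced with a few lines of Python. This is only a minimal sketch, assuming the field names seen in the CERN file are the ones expected; the helper names make_share and make_document, the service name "EXAMPLE-Tape" and the VO name "EXAMPLE_VO" are hypothetical placeholders, and the mount-related fields are simply left at 0, as they are in the CERN file.

```python
import json
import time


def make_share(name, used_bytes, occupied_bytes, timestamp):
    """Build one per-VO entry with the fields seen in the CERN excerpt.

    The five mount-related fields are set to 0 because this sketch
    assumes no mount data is available at the site.
    """
    return {
        "name": name,
        "usedsize": used_bytes,
        "occupiedsize": occupied_bytes,
        "readbytes24h": 0,
        "writebytes24h": 0,
        "totalmounts24h": 0,
        "uniquemounts24h": 0,
        "avgtaperemounts": 0,
        "timestamp": timestamp,
        "vos": [name.lower()],
    }


def make_document(service_name, shares, implementation="CTA", version=""):
    """Wrap the per-VO entries in the top-level "storageservice" object."""
    return {
        "storageservice": {
            "name": service_name,
            "implementation": implementation,
            "implementationversion": version,
            "latestupdate": int(time.time()),
            "storageshares": shares,
        }
    }


if __name__ == "__main__":
    # Placeholder numbers, not real accounting data.
    share = make_share("EXAMPLE_VO", 2_000_000_000_000, 3_000_000_000_000, 1758067200)
    print(json.dumps(make_document("EXAMPLE-Tape", [share]), indent=4, sort_keys=True))
```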

Thank you,

Dmitry

Hi Dmitry,

the official documentation about this framework is here:

Internally, we also have this documentation page:

Otherwise, our script is located here:

https://gitlab.cern.ch/cta/operations/-/tree/master/tape/ctaops-generate-CERN-tape-usage-json

Please let me know if this is enough for you or if you need additional information.

Best regards,

Vladimir
CERN

Hi Vladimir,

Thank you for the links. The link to the script gives me 404.

I have developed a script to generate the entries from the CTA DB. The only numbers I can’t obtain are avgtaperemounts, readbytes24h, totalmounts24h, uniquemounts24h and writebytes24h.

But I think it is already good enough for DUNE (who requested it).

Thanks,

Dmitry

Hi Dmitry,

Thanks for confirming that you have found a way forward. I have realized that the script is only accessible to CERN contributors to the CTA project.

The reason is that our script is very CERN-specific, which is why we did not publish it in the cta-operations-utilities repository that is available to external institutes. The idea is that every site will have its own setup, which will determine how the statistics are generated.

For CERN, the main code of the script is:

    for vo in experiments:
        vo_summary = summary_points[vo]
        tape_sessions_vo = get_tape_sessions_24h(vo, config)
        total_mounts_24h = len(tape_sessions_vo)
        unique_mounts_24h = get_unique_mounts(tape_sessions_vo)
        avg_tape_remounts = round(total_mounts_24h / unique_mounts_24h, 2) if unique_mounts_24h > 0 else 0
        vo_statistics = {
            "name": vo,
            "usedsize": round(int(vo_summary['usedSize']), -9),         # Round to the GB
            "occupiedsize": round(int(vo_summary['capacitySum']), -9),  # Round to the GB
            "readbytes24h": get_bytes(tape_sessions_vo, 'Retrieve'),
            "writebytes24h": get_bytes(tape_sessions_vo, 'Archive'),
            "totalmounts24h": total_mounts_24h,
            "uniquemounts24h": unique_mounts_24h,
            "avgtaperemounts": avg_tape_remounts,
            "timestamp": vo_summary['timestamp'],
            "vos": [f"{vo.lower()}"]
        }
        cta_vo_statistics_list.append(vo_statistics)
    cta_version = cmd_utils.run_cmd('cta-admin --json version | jq .[].serverVersion.ctaVersion').stdout.replace('"', '').rstrip()
    cta_tape_mount_statistics = {
        "storageservice": {
            "name": "CERN-PROD-Tape",
            "implementation": "CTA",
            "implementationversion": cta_version,
            "latestupdate": int(time.time()),
            "storageshares": cta_vo_statistics_list
        }
    }

The data occupancy summary statistics are calculated from a summary of all tapes.
The 24h mount statistics are taken from the “Tape session finished” messages in our monitoring, using this query:

    query = f"select payload.mountType.str, payload.dataVolume.num, payload.tapeVid.str " \
            f"FROM ctataped_tapesessionfinished WHERE instance = 'ctaproduction' " \
            f"AND payload.vo.str =~ /(?i)^{vo}$/ AND time >= now() - 24h"
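
To illustrate how rows from such a query could be turned into the five mount-related fields, here is a minimal sketch. It is not the CERN implementation: the function summarize_sessions is hypothetical, each row is assumed to be a (mountType, dataVolume, tapeVid) tuple matching the three selected columns, and mount types are assumed to be matched by prefix ('Retrieve' for reads, 'Archive' for writes). Only the avgtaperemounts formula is taken directly from the script excerpt above.

```python
def summarize_sessions(rows):
    """Reduce 24h tape-session rows to the mount-related JSON fields.

    rows: iterable of (mount_type, data_volume_bytes, tape_vid) tuples.
    Prefix matching on mount_type is an assumption of this sketch.
    """
    rows = list(rows)
    total_mounts = len(rows)
    # A tape mounted several times in 24h counts once here.
    unique_mounts = len({vid for _, _, vid in rows})
    # Same formula as in the script excerpt: total / unique, guarded against 0.
    avg_remounts = round(total_mounts / unique_mounts, 2) if unique_mounts > 0 else 0
    read_bytes = sum(vol for mtype, vol, _ in rows if mtype.startswith("Retrieve"))
    write_bytes = sum(vol for mtype, vol, _ in rows if mtype.startswith("Archive"))
    return {
        "readbytes24h": read_bytes,
        "writebytes24h": write_bytes,
        "totalmounts24h": total_mounts,
        "uniquemounts24h": unique_mounts,
        "avgtaperemounts": avg_remounts,
    }


if __name__ == "__main__":
    # Invented sample rows; the tape VIDs and volumes are placeholders.
    sample = [
        ("Retrieve", 100, "V00001"),
        ("ArchiveForUser", 50, "V00001"),
        ("Retrieve", 25, "V00002"),
    ]
    print(summarize_sessions(sample))
```

A site without an equivalent monitoring source would need some other record of finished tape sessions (mount type, bytes transferred, tape VID) to feed a function like this.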

Hope this helps. Best regards,

Vladimir Bahyl
CERN