CTA Garbage collectors, and the FST collector process


I’m reading through https://eoscta.docs.cern.ch/install/eos/#configuring-the-mgm-tape-aware-garbage-collector and looking at the continuousintegration sub-directory. I see there are no documented steps to configure the cta-fst-gcd process.

The phrasing in the description isn’t clear to me. I assume, however, that it is required, as it’s defined in the systemd files and the RPM installs do enable the service.

As it runs as the daemon user, am I correct in assuming it requires the daemon eos keytab file, with the associated permissions?

Hello David,
indeed we do not configure the cta-fst-gcd process in the CI environment.

The RPM ships with an example configuration file, /etc/cta/cta-fst-gcd.conf.example, whose comments explain how to configure it.

The eos keytab is usually owned by the daemon user, and you can set which file to use in the cta-fst-gcd configuration file: xrdsecssskt is the key, as you can see in the example configuration.
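For reference, a minimal sketch of what that might look like, assuming an ini-style layout matching the shipped example file. Only the xrdsecssskt key comes from this thread; the section name and keytab path below are placeholders, so copy the real .example file and adapt its commented options rather than this:

```ini
# /etc/cta/cta-fst-gcd.conf -- sketch only; see the shipped
# /etc/cta/cta-fst-gcd.conf.example for the full set of options.
[main]
# sss keytab readable by the daemon user (path is a placeholder)
xrdsecssskt = /etc/eos.keytab
```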

Thank you @jleduc. I’d seen the example config and had adapted it accordingly. I’m presently running it as a container within the frontend pod in k8s, as that seemed the most logical place.

I see it doesn’t log to stdout/stderr in line with normal container practice, and there’s no easy way to make it do so. It uses the Python logging module, so I’ll put something together and send a patch back when I get to it.
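The patch idea above boils down to attaching a stdout handler. A minimal sketch, assuming nothing about the daemon’s internals beyond its use of the standard logging module (the logger name and format here are placeholders, not the daemon’s real ones):

```python
import logging
import sys

def add_stdout_handler(logger_name="cta-fst-gcd", level=logging.INFO):
    """Attach a StreamHandler so log records also reach stdout,
    which is where container runtimes expect to collect logs."""
    handler = logging.StreamHandler(sys.stdout)
    handler.setFormatter(logging.Formatter(
        "%(asctime)s %(name)s %(levelname)s %(message)s"))
    logger = logging.getLogger(logger_name)
    logger.addHandler(handler)
    logger.setLevel(level)
    return logger

logger = add_stdout_handler()
logger.info("garbage collector started")
```

In a container this is usually all that’s needed: the runtime captures stdout, so no log file path or rotation logic is required.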

@davidjericho the cta-fst-gcd process must run on the eos FSTs of the eoscta instance.
If you can let us know what makes you think it should run on the frontend, we will fix the documentation accordingly as this is not really the place to run it.

It is a companion process that runs on an eoscta disk server (FST). It looks for old files on the locally mounted filesystems that are candidates for garbage collection and issues xrootd evict on the eoscta headnode.

If these files have a tape copy, they are deleted from the local storage, which frees up space for newer files -> garbage collection.
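The “look for old files on the locally mounted filesystems” step can be sketched as a simple walk-and-filter over a mountpoint. This is a hypothetical illustration, not the daemon’s actual selection logic (the function name, the age-only criterion, and the parameters are all assumptions; the real daemon also coordinates with the MGM and issues evicts rather than just listing paths):

```python
import os
import time

def find_gc_candidates(mountpoint, min_age_secs):
    """Walk a locally mounted FST filesystem and return paths of files
    older than min_age_secs -- candidates for garbage collection."""
    cutoff = time.time() - min_age_secs
    candidates = []
    for root, _dirs, files in os.walk(mountpoint):
        for name in files:
            path = os.path.join(root, name)
            try:
                if os.stat(path).st_mtime < cutoff:
                    candidates.append(path)
            except OSError:
                continue  # file may have been deleted concurrently
    return candidates
```

Each candidate would then be checked for a tape copy and, if safe, evicted via the headnode rather than removed directly.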

In the Kubernetes world it really depends how you are mounting your filesystems and how you are running the FST processes (I tend to run 1 FST process per device, which is not the simplest approach…).

In theory you should be able to run a companion pod on your storage nodes that mounts the same PVs as the FST read-only, at the same mountpoints as the FST: the cta-fst-gcd does not rm files itself but relies on the MGM to queue deletions on the FST processes, so read-only mounts should be enough.
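Sketched as a pod spec, that companion setup might look like the following. Everything here (names, images, paths, PVC) is a placeholder to show the shape of the idea, not a tested manifest:

```yaml
# Hypothetical sketch: the gcd container shares the FST's PV,
# mounted read-only at the same path.
apiVersion: v1
kind: Pod
metadata:
  name: eos-fst
spec:
  containers:
    - name: fst
      image: example/eos-fst:latest        # placeholder image
      volumeMounts:
        - name: fst-data
          mountPath: /fst
    - name: cta-fst-gcd
      image: example/cta-fst-gcd:latest    # placeholder image
      volumeMounts:
        - name: fst-data
          mountPath: /fst                  # same mountpoint as the FST
          readOnly: true                   # gcd never deletes files itself
  volumes:
    - name: fst-data
      persistentVolumeClaim:
        claimName: fst-data-pvc            # placeholder PVC
```

Running it as a sidecar in the same pod (rather than a separate pod) would also work if the FST pod already mounts the devices.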

Good luck with this: you can leave the cta-fst-gcd for the very end, as you can have a functional CTA instance without it, and re-read this post when you see your eoscta FST filesystems filling up in the EOS monitoring.


@jleduc, oh wow! I massively misunderstood the documentation then. My reading was that it acted through the MGM to purge, issuing delete commands via the MGM, and that the filesystem underneath the FST was never accessed directly.

Your explanation has it running with direct access to the underlying filesystem of the FST. I was reading the source to determine just how it worked, but I admit my eyes glazed over after a long day :slight_smile:

This now makes a lot more sense. It’s quite easy to put this in a companion container within the pod, as I’ve already done the legwork to get the rest of the CTA-specific settings done on our deployments using Helm. My only issue is that we encrypt every disk individually, and disk unlocks and mounting of filesystems are done within the pod namespace. I’m sure I can make this work neatly.