EOS | cta | is tgc still supported?

kotlyar · 15 September 2022 05:58

Hi all

I am not sure whether this topic belongs under CTA or EOS, so I am putting it here.

In our EOSCTA instance, the MGM tape garbage collector (TGC) does not seem to work. I do not see any activity in the logs, and the space usage does not change. I think I enabled it correctly with the commands from the EOSCTA docs [1]. Maybe some defaults are missing on a fresh setup, or some features are missing.

Is this configuration still supported, or do we have to switch to the cta-fst-gcd Python script? Are there any hints on what could be checked, and where?

Below are my EOSCTA setup for the TGC [1] and the full space config [2], along with the EOS versions [3]. We use the NS in QuarkDB.

Many thx in advance.
Cheers
Victor

[1] tgc setup

[root@tape-1-3-3 /]#  eos space ls default -m | tr ' ' '\n' | grep statfs
sum.stat.statfs.usedbytes=82912054607872
sum.stat.statfs.freebytes=9975213588480
sum.stat.statfs.freebytes?configstatus@rw=9975213588480
sum.stat.statfs.capacity=92887268196352
sum.stat.statfs.ffiles=0
sum.stat.statfs.files=19482918665
sum.stat.statfs.capacity?configstatus@rw=92887268196352

[root@tape-1-3-3 /]# eos space status default | grep tgc
tgc.availbytes                   := 20000000000000
tgc.qryperiodsecs                := 300
tgc.totalbytes                   := 40000000000000

[root@tape-1-3-3 /]# eos ns stat -m |grep tgc
uid=all gid=all tgc.is_active=false

[root@tape-1-3-3 /]# grep tgc /etc/xrd.cf.mgm 
mgmofs.tgc.enablespace default
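
As a cross-check of the numbers above, here is a minimal Python sketch. It assumes that the MGM TGC only collects once the space capacity is at least tgc.totalbytes and the free bytes have dropped below tgc.availbytes; this rule is an assumption and is not confirmed anywhere in this thread. The helper names parse_monitoring and tgc_would_collect are hypothetical and not part of EOS.

```python
def parse_monitoring(text):
    """Parse whitespace-separated key=value tokens, as in the `-m` outputs above."""
    values = {}
    for token in text.split():
        key, sep, value = token.partition("=")
        if sep:  # skip tokens without '='
            values[key] = value
    return values


def tgc_would_collect(free_bytes, capacity_bytes, tgc_avail_bytes, tgc_total_bytes):
    """ASSUMED rule: space is large enough AND free space is below the target."""
    return capacity_bytes >= tgc_total_bytes and free_bytes < tgc_avail_bytes


# Values copied from the `eos space ls default -m` output above.
statfs = parse_monitoring("""
sum.stat.statfs.freebytes=9975213588480
sum.stat.statfs.capacity=92887268196352
""")
free_bytes = int(statfs["sum.stat.statfs.freebytes"])
capacity_bytes = int(statfs["sum.stat.statfs.capacity"])

# Values copied from `eos space status default | grep tgc`.
tgc_avail_bytes = 20000000000000
tgc_total_bytes = 40000000000000

# Value copied from `eos ns stat -m | grep tgc`.
ns_stat = parse_monitoring("uid=all gid=all tgc.is_active=false")
tgc_active = ns_stat["tgc.is_active"] == "true"

result = tgc_would_collect(free_bytes, capacity_bytes, tgc_avail_bytes, tgc_total_bytes)
print("thresholds met (assumed rule):", result)
print("tgc.is_active reported by MGM:", tgc_active)
```

If the assumed rule holds, the posted thresholds would already be met (about 9.98 TB free against a 20 TB target, on a 92.9 TB space), while eos ns stat still reports tgc.is_active=false. That would point at the collector not being started rather than at the threshold values.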

[2] space default

[root@tape-1-3-3 /]# eos space status default           
# ------------------------------------------------------------------------------------
# Space Variables
# ....................................................................................
balancer                         := off
balancer.node.ntx                := 2
balancer.node.rate               := 25
balancer.threshold               := 20
converter                        := off
converter.ntx                    := 2
drainer.node.nfs                 := 5
drainer.node.ntx                 := 2
drainer.node.rate                := 25
drainperiod                      := 86400
filearchivedgc                   := off
fsck_refresh_interval            := 7200
geobalancer                      := off
geobalancer.ntx                  := 10
geobalancer.threshold            := 5
graceperiod                      := 86400
groupbalancer                    := off
groupbalancer.engine             := std
groupbalancer.file_attempts      := 50
groupbalancer.max_file_size      := 16G
groupbalancer.max_threshold      := 0
groupbalancer.min_file_size      := 1G
groupbalancer.min_threshold      := 0
groupbalancer.ntx                := 10
groupbalancer.threshold          := 5
groupmod                         := 24
groupsize                        := 2
lru                              := off
quota                            := off
scan_disk_interval               := 14400
scan_ns_interval                 := 259200
scan_ns_rate                     := 50
scaninterval                     := 604800
scanrate                         := 100
taperestapi                      := off
tgc.availbytes                   := 20000000000000
tgc.qryperiodsecs                := 300
tgc.totalbytes                   := 40000000000000
tracker                          := off
wfe                              := on
wfe.interval                     := 10
wfe.ntx                          := 500

[3] eos version

[root@tape-1-3-3 /]# rpm -qa|grep eos
eos-folly-2019.11.11.00-1.el7.cern.x86_64
libmicrohttpd-0.9.38-eos.yves.el7.cern.x86_64
eos-client-4.8.79-1.el7.cern.x86_64
eos-fuse-sysv-4.8.79-1.el7.cern.x86_64
eos-server-4.8.79-1.el7.cern.x86_64
eos-protobuf3-3.5.1-5.el7.cern.eos.x86_64
eos-folly-deps-2019.11.11.00-1.el7.cern.x86_64
eos-fuse-core-4.8.79-1.el7.cern.x86_64
eos-fuse-4.8.79-1.el7.cern.x86_64
eos-testkeytab-4.8.79-1.el7.cern.x86_64
eos-nginx-1.9.9-5.x86_64
eos-xrootd-4.12.8-1.el7.cern.x86_64

rbachman · 15 September 2022 09:14

Hi @kotlyar,
the MGM TGC is indeed still supported. It is intended to run in combination with the FST TGC (and the Archive File / AF TGC).

Your config in /etc/xrd.cf.mgm seems good, as far as I can tell. Note that mgmofs.tapeenabled true needs to be set as well, but I’m willing to bet that this is the case.

How is your FST TGC configured? Could it be that it is set up in a manner such that it always picks off files before the MGM TGC has a chance to collect them? The config file for this should be found at /etc/cta/cta-fst-gcd.conf.

A question about your setup: Is it really your intention to run the MGM TGC on the default space? The default space is usually for archivals, but the MGM TGC is intended to clean up after file retrievals. We would run it on what we call the ‘spinners’ space. More details on this can be found in @jleduc’s presentation at the EOS workshop this spring (7-10 March 2022): "How to enable EOS for tape" (Indico).

I see that our docs are a bit lacking here. I’ll try to expand the wiki content on this in the future.

kotlyar · 15 September 2022 13:20

Dear Richard,

Many thanks for your reply!

We have a very simple setup with one (default) space for both archival and retrieval, and I would like to avoid cta-fst-gcd if possible. So a single internal garbage collector (a thread? a combination of threads?) on the MGM node would be quite enough for our setup.

Our intention is to keep as many files as possible, for as long as possible, on the EOSCTA disks, even after they have been migrated to tape.

Cheers
Victor

rbachman · 27 September 2022 08:59

Hi @kotlyar, we’ve put up a page where we document our recommended garbage collection setup: Garbage Collection - EOSCTA Docs

Please have a look to see if this solves your issue. In short, the three garbage collectors are really meant to run in unison.
Cheers!

kotlyar · 27 September 2022 11:16

Hi Richard (@rbachman),
Many thanks for your help!
I have installed the FST TGC in an isolated container in our all-in-Docker setup, and it works fine.
One note: it may be worth adding the authentication setup that fst-gcd needs, since it is not described in the docs. I mean which permissions it must have on the EOS side in order to work. As an example, I used eos vid set map -sss gcd vuid:0 vgid:0 for the gcd SSS key.

Cheers
Victor

jleduc · 27 September 2022 19:31

Hi Victor,
Thank you for this contribution. Indeed, fully isolating the FST GCD in a dedicated container with its own authentication could be a nice refinement.
Documenting which of the various CTA-EOS SSS keys and authentication methods is used for what would be interesting as well.
I tried to give a tip about how to configure the two types of GCD to play well together: do not hesitate to update this thread with what you use.
Indeed, these values depend greatly on the amount of FS imbalance you can afford in your system.

Cheers,
Julien