Set up mlsensor on EOSCTA for ALICE storage accounting

george_patargias · 29 June 2022 10:33

Hello,

We are trying to configure mlsensor on the MGM of our EOSCTA instance so that the instance reports its storage information to http://alimonitor.cern.ch.

The mlsensor service is running, but we see some errors/warnings in /var/log/mlsensor/MLSensor0.log (pasted at the bottom). Could you please share your /etc/mlsensor/mlsensor.properties with us?
Ours is pasted below for reference; maybe we got a setting wrong.

Thanks,

George

# Address of the local MonALISA service; this is usually the VOBOX (unless there is a site-local ML proxy service)
# make sure UDP/8884 is open on the VOBOX machine
mlsensor.apmon.destinations = lcgvo-alice-1.gridpp.rl.ac.uk:8884
# mlsensor.apmon.destinations = t1rtr1-ip180.gridpp.rl.ac.uk:8884

############### Format of the data to send #############################
# cluster name to send the data to; this should be the ALICE storage element name
cluster.name=ALICE::RAL::CTA

# enable an extra suffix for cluster names
# AliEnFilter creates aggregated values for all clusters ending in _Nodes
cluster.name.suffix.enabled = true
cluster.name.suffix = _xrootd_Nodes

# append a module-dependent cluster name suffix
cluster.name.dynamic = false

# if the FQDN cannot be determined or is the wrong one override it with this option
#node.name=<defaults to fqdn of localhost>

################ Logging configuration ##################################
##
## How much logging info
## MIN is .level = OFF
## MAX is .level = ALL
##
## Other values for this parameter can be: SEVERE, WARNING, CONFIG, INFO, FINE, FINER, FINEST
## Please note that the last two options are used only for debugging and generate large output!
##
## It is best to leave this option as it is. Note the dot before "level".
.level = OFF
lia.level = INFO
mlsensor.level = INFO

###
#monDiskIOStat config section
###
monDiskIOStat.configFile=/etc/mlsensor/mlsensor.properties

#allowedDevices=
#deniedDevices=

################ Advanced logging ( 'logrotate' style )###########################
##
## If you would like to enable MonALISA to "logrotate" its logs
## please comment the upper 3 lines and uncomment the following ones
##
## This will create 4 log files that are rotated after reaching the size limit
##
handlers= java.util.logging.FileHandler
java.util.logging.FileHandler.formatter = java.util.logging.SimpleFormatter

# File size in bytes!
java.util.logging.FileHandler.limit = 1000000

# Number of files to cycle through
java.util.logging.FileHandler.count = 4

# Whether to append to the end of an existing log file or start a new one
java.util.logging.FileHandler.append = true
java.util.logging.FileHandler.pattern = /var/log/mlsensor/MLSensor%g.log

## logging to stdout and stderr options
## MonALISA uses the standard logging API included since Java 1.4
# handlers= java.util.logging.ConsoleHandler
# java.util.logging.ConsoleHandler.level = FINEST
# java.util.logging.ConsoleHandler.formatter = java.util.logging.SimpleFormatter

# Monitor xrootd disk space usage
mlsensor.modules=monXrdSpace

# Be compatible with legacy Perl xrootd monitoring companion
rewrite.parameter.names=true
cluster.name.suffix.monXrdSpace=_manager_xrootd_Services

# Run the disk space checking every 5 minutes
monXrdSpace.execDelay=300

# list of localhost ports of xrootd processes (can be more than one)
lia.Monitor.modules.monXrdSpace.args=1094
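
For reference, the java.util.logging.* keys in the logging sections above are standard JDK logging properties. Assuming mlsensor hands this file to the JDK LogManager (the key names suggest so, but this is not confirmed here), they correspond roughly to the sketch below. It also shows why the active log is MLSensor0.log: %g in the pattern is the rotation generation, with 0 the newest file and 3 the oldest when count = 4. The class name RotatingLogSketch is hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.logging.FileHandler;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

public class RotatingLogSketch {
    static final int LIMIT_BYTES = 1_000_000; // java.util.logging.FileHandler.limit
    static final int COUNT = 4;               // java.util.logging.FileHandler.count
    static final boolean APPEND = true;       // java.util.logging.FileHandler.append

    // Builds a FileHandler equivalent to the FileHandler.* keys in mlsensor.properties.
    static FileHandler rotatingHandler(String pattern) {
        try {
            FileHandler handler = new FileHandler(pattern, LIMIT_BYTES, COUNT, APPEND);
            handler.setFormatter(new SimpleFormatter()); // FileHandler.formatter
            return handler;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Default pattern taken from java.util.logging.FileHandler.pattern; override via args[0].
        String pattern = args.length > 0 ? args[0] : "/var/log/mlsensor/MLSensor%g.log";

        Logger root = Logger.getLogger("");   // "handlers=" attaches handlers to the root logger
        root.setLevel(Level.OFF);             // ".level = OFF"
        FileHandler handler = rotatingHandler(pattern);
        root.addHandler(handler);

        Logger mlsensor = Logger.getLogger("mlsensor");
        mlsensor.setLevel(Level.INFO);        // "mlsensor.level = INFO"
        mlsensor.info("written: INFO passes the mlsensor logger level and reaches the root handlers");
        mlsensor.fine("dropped: FINE is below INFO");
        handler.close();
    }
}
```

Note that a root level of OFF does not silence the "mlsensor" logger: records that pass a named logger's own level are still handed to the root logger's handlers.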

*******************************************************************************************************

Jun 29, 2022 11:21:02 AM mlsensor.monitor.MonitorTask run
WARNING: [ MonitorTask ] Exception executing/sending result from lia.Monitor.modules.monXrdSpace . Cause:
java.io.IOException: No servers from this configuration answered, bailing out for now: {antares-eos01.scd.rl.ac.uk=[1094]}
        at lia.Monitor.modules.monXrdSpace.doProcess(monXrdSpace.java:286)
        at mlsensor.monitor.MonitorTask.run(MonitorTask.java:56)
        at lia.util.threads.MLExecutorsFactory$SafeRunnable.run(MLExecutorsFactory.java:234)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
        at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
        at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:750)
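
The IOException says monXrdSpace got no answer from antares-eos01.scd.rl.ac.uk on port 1094, the port listed in lia.Monitor.modules.monXrdSpace.args. A first sanity check is whether anything accepts TCP connections on that host and port at all. The sketch below does only that. monXrdSpace presumably goes on to query disk space over the xrootd protocol, so a successful connect does not prove the module will work, but a failure points at the hostname, the listening port or a firewall rather than at the rest of mlsensor.properties. The class name XrdPortCheck is hypothetical.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

public class XrdPortCheck {
    // Returns true if a TCP connection to host:port can be opened within timeoutMs.
    // Unresolvable hosts, refused connections and timeouts all return false.
    static boolean tcpReachable(String host, int port, int timeoutMs) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMs);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults are the host and port from the monXrdSpace error above.
        String host = args.length > 0 ? args[0] : "antares-eos01.scd.rl.ac.uk";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 1094;
        boolean ok = tcpReachable(host, port, 3000);
        System.out.println(host + ":" + port + (ok ? " accepts TCP connections" : " did not answer"));
    }
}
```

Compile it with javac and run `java XrdPortCheck antares-eos01.scd.rl.ac.uk 1094` on the MGM itself, since that is where mlsensor runs.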

mdavis · 30 June 2022 13:19

Hi George,

Configuring ALICE monitoring is not something we on the CTA team have experience with. At Tier-0, we provide an EOSCTA endpoint and the ALICE data management team configure their probe against it.

Please contact the ALICE experiment for any questions about their tools and monitoring infrastructure.

Best regards,

Michael

george_patargias · 5 July 2022 15:17

Hi Michael,

Thanks for the reply. I am already in contact with the ALICE people; I just thought you guys might know.

I think the ALICE requirement in the EOS MGM config is

all.sitename ALICE::RAL::CTA

which, of course, would break FTS-mediated SSD eviction for everybody else if implemented. I heard that the next FTS version will remove the need to set this variable in the MGM config.

George