Hi everyone,
at CERN we’re thinking about making a push to release some of our CTA operator tools, which have so far lived in a separate private repo.
The operator tools are what we use to allow operators and automation systems to perform CTA-related tasks, as well as what powers the data-gathering parts of our monitoring.
Here is the work-in-progress plan for releasing these tools:
The code will be GPLv3 licensed.
We will create a new public repo at gitlab.cern.ch, then we’ll gradually migrate tools there.
Packaging-wise, we’re thinking of providing pip packages, since most of the code is Python, plus RPMs that wrap the installation of these pip packages and other required dependencies.
We’ll start with three components to test the waters and work out the details. These would be:
‘ctautils’ - a set of utility modules used by most of the tools
‘tapeadmin’ - a module for tape-specific things
‘ATRESYS’ - a freshly made set of tools for automating and tracking repacks and the surrounding tape lifecycle
We have some internal documentation for our tools and metrics, which can be published on the Gitlab wiki together with the code as it moves.
The released software will be as generic as possible: CERN-specific details should either live in our Puppet profiles (which will not be shared) or be replaced by config files, so that every site can adapt the tools to its own setup.
I would be interested in hearing your thoughts on the plan above and how well this works for various sites once we get going.
There are also some things I’m not sure how to do properly.
For instance, it would be nice to share the Grafana dashboards that go with the metrics, but as far as I know there is no systematic way of exposing them from Grafana’s built-in version control.
Perhaps committing the exported json to the repo from time to time would be sufficient?
The situation is similar for the Rundeck job definitions.
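For the Grafana side, here is a rough sketch of what that periodic export could look like, using Grafana’s HTTP API (the URL, dashboard UID and token below are placeholders, not our actual setup):

```python
# Rough sketch: pull a dashboard as JSON via Grafana's HTTP API so it can be
# committed to the repo. URL, UID and token are placeholders.
import json
import requests

GRAFANA_URL = "https://grafana.example.cern.ch"   # placeholder instance
DASHBOARD_UID = "cta-tape-overview"               # hypothetical dashboard UID
API_TOKEN = "..."                                 # read-only API token

resp = requests.get(
    f"{GRAFANA_URL}/api/dashboards/uid/{DASHBOARD_UID}",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    timeout=30,
)
resp.raise_for_status()
dashboard = resp.json()["dashboard"]

# Drop the instance-specific numeric id so the export imports cleanly elsewhere
dashboard["id"] = None

with open(f"dashboards/{DASHBOARD_UID}.json", "w") as f:
    json.dump(dashboard, f, indent=2, sort_keys=True)
```

Running something like this on a schedule and committing the output would at least keep the dashboards diffable alongside the code.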
@snorenberg @mwai @kotlyar (it looked like you might be interested, based on previous discussions)
Super! I’ve actually been thinking recently about how the supply pool is implemented at CERN and about possible monitoring solutions, so as far as I can tell this is a good direction to go. Good luck!
Some notes from my point of view:
we do not use Grafana or Rundeck, so I cannot help you there
using a wiki does not look like a good fit to me; .md-based documentation in a self-documenting git repo might be better (with or without a *.md site generator)
using pip for Python: maybe it would be good to do everything Python-related in a dedicated Python environment (perhaps based on Miniconda) and avoid the system default packages completely. Projects usually provide a requirements file for the dependencies
actually we are interested in using containers as soon as we have Ubuntu everywhere
and we use Ansible playbooks for setup where possible
in theory, if you build something for CI/CD testing based on containers and Ansible (as a replacement for bash scripts), that would also be worth sharing
Thanks Viktor, this is all good to know and I’ll take it into consideration
The Gitlab Wiki feature is entirely markdown based, so each article would be a .md file in the project. One can even clone the wiki part as a repo on its own and work with it locally, or render it in some other way.
Yes, this is something we could work on. We faced a similar issue with colliding package versions from the CERN monitoring setup. For now we rely on creating a special path /opt/ctaops-lib/... where we put our dependencies, and then giving our scripts preference for these when importing.
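To illustrate the idea (a simplified sketch, not the exact code we ship), the preference essentially boils down to putting that path in front of the system site-packages before anything else is imported:

```python
# Simplified sketch: prefer dependencies unpacked under /opt/ctaops-lib over
# whatever is installed in the system site-packages.
import sys

CTAOPS_LIB = "/opt/ctaops-lib"

# Prepend our vendored path so our pinned versions win over the globally
# installed ones (e.g. those pulled in by the monitoring setup).
if CTAOPS_LIB not in sys.path:
    sys.path.insert(0, CTAOPS_LIB)

# Third-party imports placed after this point resolve against /opt/ctaops-lib first.
```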
External dependencies are already tracked and version-locked using a requirements.txt, so we’ll include this as well.
Please note the versioning and release scheme:
Releases and pip packages are tagged with versions x.y, where x is incremented when CTA (in particular tape-admin) changes in a backwards-incompatible way, and y is incremented when there is an update to the operator tools for the same CTA release. The package index may also contain packages versioned with x.y-devz. These are release candidates which we are testing internally and should not be used unless you are feeling particularly adventurous.
The present 1.y tags of the operator tools are tested with CTA v4.8.*.
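To make the ordering concrete, here is a small illustration using the packaging library, which implements the PEP 440 rules that pip follows when resolving these versions:

```python
# Illustration of how the x.y and x.y-devz tags sort under PEP 440,
# which is what pip applies when resolving versions.
from packaging.version import Version

assert Version("1.2-dev3") == Version("1.2.dev3")  # pip normalises the -dev suffix
assert Version("1.2.dev3") < Version("1.2")        # dev pre-releases sort before the release
assert Version("1.2") < Version("2.0")             # major bump for incompatible CTA changes
```

Note also that pip skips pre-releases by default unless you pass --pre or pin them explicitly, which fits the “only if you are feeling adventurous” intention.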
The 1.2 tag is the candidate for the switchover on our production setup, where we will start using the contents of the public repo instead of the corresponding contents in our private repo.
What is available now
CTA Operations script libraries:
ctautils - collection of helpers and wrappers which are re-used across the various operator tools
tapeadmin - library for interacting with tape media and libraries
ATRESYS - Automated Tape REpacking SYStem, a tool for managing the tape life-cycle surrounding repack operations. Vlado will present this tool at the EOS workshop.
Also includes some config and examples for monitoring.
A requirements.txt file with recommended dependency versions. The idea is to install the packages listed there into a dedicated venv (see the sketch after this list).
A config file template. You will have to install this manually and adjust it to work with your setup. Configuration for future tools will be added to this template as well.
Makefiles and .gitlab-ci.yml for easy building.
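As a minimal sketch of the venv installation mentioned above (the venv location is just an example):

```python
# Minimal sketch: create a dedicated venv and install the version-locked
# dependencies from requirements.txt into it. The venv path is an example.
import subprocess
import venv

ENV_DIR = "/opt/ctaops-venv"          # example location for the dedicated venv
venv.create(ENV_DIR, with_pip=True)   # stdlib venv module, bootstraps pip

subprocess.run(
    [f"{ENV_DIR}/bin/pip", "install", "-r", "requirements.txt"],
    check=True,
)
```

The equivalent two shell commands work just as well; the point is simply to keep the pinned versions out of the system Python.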
Things coming next
An RPM for managing non-Python dependencies and version-locking against specific CTA versions.
More tools; we’re considering the EOSCTA namespace reconciliation scripts and our ACL management tools.
More monitoring examples
Feedback
When you find the time to play around with this, we’d appreciate any feedback you have so far, in particular:
Whether there are any dependencies we missed.
For ATRESYS in particular: whether or not the installation of psycopg2 (the pip package for PostgreSQL interaction) works for you. On CC7 we had issues with both the binary and non-binary pip packages on PyPI, so we built it from source and installed that instead. Instructions for this maneuver are in the wiki.
Hi everyone, just wanted to let you know that thanks to the efforts of @lwardena and our summer student Thomas, we now have a new release of the operations utilities (1.4) which includes many new tools:
The tape verification tool, which is used to periodically read back tape data in order to verify its integrity
The tape supply tool, which we use to automatically re-fill the tapepools belonging to VOs from pools of fresh tapes
The cta-ops-admin command, which is a customisable wrapper around cta-admin and a set of tape scripts used for testing drives, labelling tapes, etc.
Two EOS-specific tools, which allow for changing storage classes and for fetching the path in EOS corresponding to a file in CTA.
There are also a few tweaks to the repack automation system.
We’ll move our setup to use the public tools instead of the internal edition in the next few days, so there will likely be a new tag with bugfixes soon. In the meantime, feel free to check these out, play with them, and report bugs!
Dear CTA Operators,
it took a little longer than I would have liked, but at long last the 2.0 (CC7) and 2.1 (Alma9) tags of the operator utilities are now ready. We have tested this version over the past few weeks with CTA 5.10.10.1-1. These new tags include:
CTA Operations Utilities 2.0 (CC7)
The drive name generation script cta-ops-drive-config-generate, which collects the drive information provided by the library and produces names following our convention (as described at the CTA workshop)
The ability for cta-ops-admin to execute drive commands on the set of locally installed drives
Adjustments to ATRESYS, the repack automation script, to account for the repack sub-request workflow introduced in CTA.
Code quality and logging consistency improvements
CTA Operations Utilities 2.1 (AlmaLinux 9)
Same as the above, but with additional adjustments we found were needed for our migration to Alma9
The tapealerting package, which contains scripts for monitoring tape drives and media and for automatically performing simple actions when something goes wrong (see also the CTA workshop talk).
The ctaopsdrvenv package, which contains a tool for reading the internal humidity and temperature sensor data of modern drive models. We use this to monitor the conditions in our data centre and to alert us if they fall outside the drives’ recommended ranges.
rsyslog config examples for json-style logging in CTA