Dealing with failed requests

denis.lujanski · 8 October 2021 00:10

Hi there,

I am wondering what the preferred way of dealing with failed archive requests is. A few of our requests have failed for a legitimate reason, and the files are now stuck on disk. I would like to get them onto tape as easily and automatically as possible.

I thought about triggering the sync::closew workflow using this command:

eos file workflow <filepath> default sync::closew

This seems to do nothing, other than generate an appropriate log entry in WFE.log:

211008 00:01:37 INFO  WFE:1615                       default SYNC::CLOSEW <filepath> cta-frontend-ctafrontend.cta.svc.cluster.archive:10955 fxid=00010705 mgm.reqid=""

Normally, there would also be a corresponding entry in cta-frontend.log, but not when I trigger the workflow manually. I tried clearing the failed request, to see if maybe it’s ignoring the trigger because it is a failed request already, but had no luck.

The only way I can get it to re-issue the archive request is to copy the file to a local directory (via eosd), remove it from little eos, and then re-upload it to little eos:

cp <filepath> .
rm -f <filepath>
cp $(basename <filepath>) $(dirname <filepath>)
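
For anyone repeating these three steps on several files, they could be wrapped in a small helper along the following lines. This is only a sketch: reupload is a hypothetical name, the staging directory comes from mktemp rather than the current directory, and the comparison step is an extra safety check that is not part of the original commands. It assumes the file is reachable through the mounted filesystem.

```shell
# Hypothetical helper: copy a file aside, remove it, and write it back,
# so that the re-upload generates a fresh close-on-write event.
reupload() {
  target=$1
  staging=$(mktemp -d) || return 1
  copy="$staging/$(basename "$target")"

  # Take a local copy and verify it before touching the original.
  cp "$target" "$copy" || return 1
  cmp -s "$target" "$copy" || { echo "copy mismatch: $target" >&2; return 1; }

  # Remove the original, then write it back under the same name.
  rm -f "$target" || return 1
  cp "$copy" "$target" || return 1

  # Only clean up the staging copy once the re-upload has succeeded.
  rm -rf "$staging"
}
```

Usage would be reupload <filepath>. If any step fails, the staging copy is left in place so the data is not lost.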

Is there a cleaner way to do this?

Thanks as always for your help!

Denis

mdavis · 12 November 2021 10:53

There is a command-line tool for this, cta-send-closew.sh. This tool triggers the CLOSEW event on the specified file(s).

denis.lujanski · 15 November 2021 01:41

Thanks very much, Michael. I have tried this but ran into:

usr/bin/cta-send-closew.sh /eos/cta/dev/manual_closew_event/ontape/thisisthefile
Caught exception: instance must be specified in /etc/cta/cta-cli.conf

Contents of the conf file:

cta.endpoint ctafrontend:10955

Looking at what I think might be the right bit of code:

  XrdSsiPb::Config config(config_file, "eos");

  for(auto &conf_option : std::vector<std::string>({ "instance", "requester.user", "requester.group" })) {
    if(!config.getOptionValueStr(conf_option).first) {
      throw std::runtime_error(conf_option + " must be specified in " + config_file);
    }
  }
  const std::string &eos_instance = config.getOptionValueStr("instance").second;
  const std::string &eos_endpoint = config.getOptionValueStr("endpoint").first ? config.getOptionValueStr("endpoint").second : "localhost:1095";

It is expecting something like:

cta.instance cta
cta.endpoint ctafrontend:10955

But unfortunately the outcome is still the same. Any ideas?
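
One possible reading of the code snippet above, offered as an assumption and not confirmed anywhere in this thread: if the second constructor argument ("eos") acts as a key prefix, the tool would look for eos.instance, eos.requester.user and eos.requester.group rather than cta.instance. The sketch below checks which of those keys a config file defines. check_conf_keys is a hypothetical helper, and it assumes one "key value" pair per line.

```shell
# Hypothetical helper: report which required keys are missing from a config file.
# Assumes "prefix.key value" lines; that "eos" is the prefix is an unconfirmed guess.
check_conf_keys() {
  conf_file=$1
  prefix=$2
  shift 2
  missing=0
  for key in "$@"; do
    # The key must be the first field on a line and must be followed by a value.
    if ! awk -v k="$prefix.$key" '$1 == k && NF >= 2 { found = 1 } END { exit !found }' "$conf_file"; then
      echo "missing: $prefix.$key"
      missing=1
    fi
  done
  return $missing
}
```

Usage would be check_conf_keys /etc/cta/cta-cli.conf eos instance requester.user requester.group. The function returns non-zero if any key is absent.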

Thanks again,

Denis

biyujiang · 15 November 2021 09:56

We have the same problem. When we run cta-send-closew.sh, it complains that -i/--instance and other options are required. The following workaround does work for us:

eos --json fileinfo /path/to/file | cta-send-event CLOSEW -i eosname -u user -g group
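
To apply this to many files, a small generator like the one below may help. It is a sketch only: closew_commands is a hypothetical helper, it relies on the -i/-u/-g options shown above being accepted by the installed cta-send-event, and it does not handle paths containing single quotes. It prints the commands without running them, so they can be reviewed first.

```shell
# Hypothetical helper: print one CLOSEW pipeline per path read from stdin.
# Nothing is executed here; pipe the output to sh once it looks right.
closew_commands() {
  instance=$1; user=$2; group=$3
  while IFS= read -r path; do
    # Skip blank lines in the input list.
    [ -n "$path" ] || continue
    printf "eos --json fileinfo '%s' | cta-send-event CLOSEW -i %s -u %s -g %s\n" \
      "$path" "$instance" "$user" "$group"
  done
}
```

For example, closew_commands eosname user group < failed_paths.txt prints the commands, where failed_paths.txt is an assumed file with one path per line.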

The confusing part is that for some files only one copy failed and the other is fine (not listed in cta fr ls), yet both copies are queued for writing after running the command above.

Should CTA rewrite only the failed copy, or write both copies again for these files?

denis.lujanski · 15 November 2021 22:42

That’s a neat workaround! Thank you for sharing it.
Unfortunately I can’t seem to use it (perhaps I am on a different version of CTA):

eos --json fileinfo /eos/cta/dev/manual_closew_event/ontape/thisisthefile  | cta-send-event CLOSEW -u apache -g apache -i cta
Caught exception: Usage: eos --json fileinfo /eos/path | cta-send-event CLOSEW|PREPARE

eos --json fileinfo /eos/cta/dev/manual_closew_event/ontape/thisisthefile  | cta-send-event CLOSEW
Caught exception: instance must be specified in /etc/cta/cta-cli.conf

We only write one copy to tape, so we haven’t encountered this particular issue. It sounds like it should only retry the failed write. Perhaps it will queue both, but only actually re-write one (and ignore the other)?

Cheers,

Denis