How to remove files from the CTA catalogue when they are missing from the EOS namespace

Hello,

I have 4 files in the CTA catalogue that are on “full” tapes but aren’t present in the EOS namespace at all. I’m not sure how this situation arose; I might have to look a bit closer to see whether I can replicate it. I have a hypothesis, but I’ll test it first before I create rabbit holes for anyone to dive into 🙂

I have deleted all data from the EOS namespace and was going through the process of reclaiming or repacking the tapes. I’m not sure how to proceed from here.

Here is the tape ls output, edited for width:

[root@ctafrontend-0 /]# cta-admin ta ls --all | grep true
   vid media type vendor   library tapepool        vo encryption key name capacity occupancy last fseq  full disabled rdonly 
A01007    VTLtape  MHVTL spooltest      vtl CloudStor                   -    10.0G      8.6G      1582  true    false  false 
A01008    VTLtape  MHVTL spooltest      vtl CloudStor                   -    10.0G      6.4G      1182  true    false  false 
A01016    VTLtape  MHVTL spooltest     vtl2 CloudStor                   -    10.0G     10.3G      1952  true    false  false 
A01019    VTLtape  MHVTL spooltest     vtl2 CloudStor                   -    10.0G      9.8G      1805  true    false  false 

Selecting the first tape in the list:

[root@ctafrontend-0 /]# cta-admin tf ls -v A01007
archive id copy no    vid fseq block id instance disk fxid  size checksum type checksum value   storage class owner group    creation time ss vid ss fseq path
      1240       1 A01007    1        0      cta       726 20.9M       ADLER32       407a8d98 ctaStorageClass    48    48 2020-07-31 04:40      -       0    -
      1255       1 A01007    2       90      cta       728 20.9M       ADLER32       407a8d98 ctaStorageClass    48    48 2020-08-02 23:03      -       0    -

The disk fxids across the 4 tapes are 723, 726, 728 and 72b. When I check the EOS namespace, fileinfo reports that none of these files exist:

EOS Console [root://localhost] |/eos/cta/> fileinfo fxid:723
error: cannot retrieve file meta data - Error while fetching FileMD #1827 protobuf from QDB: Empty response (errc=2) (No such file or directory)
EOS Console [root://localhost] |/eos/cta/> fileinfo fxid:726
error: cannot retrieve file meta data - Error while fetching FileMD #1830 protobuf from QDB: Empty response (errc=2) (No such file or directory)
EOS Console [root://localhost] |/eos/cta/> fileinfo fxid:728
error: cannot retrieve file meta data - Error while fetching FileMD #1832 protobuf from QDB: Empty response (errc=2) (No such file or directory)
EOS Console [root://localhost] |/eos/cta/> fileinfo fxid:72b
error: cannot retrieve file meta data - Error while fetching FileMD #1835 protobuf from QDB: Empty response (errc=2) (No such file or directory)
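To run this check for every file on a tape rather than one fxid at a time, the disk fxids can be pulled out of the cta-admin tf ls listing and looked up in turn. This is a minimal bash sketch, not an existing CTA or EOS tool. It assumes the column layout shown above (disk fxid as the 7th whitespace-separated field of each data row) and assumes the eos CLI exits non-zero when fileinfo cannot find the file. The function and script names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: list the disk fxids of the files on a tape and
# report which of them no longer exist in the EOS namespace.

# Read `cta-admin tf ls -v <vid>` output on stdin and print the disk fxid
# of every data row. Data rows are recognised by a numeric archive id in
# the first column, so header lines are skipped. Assumes the disk fxid is
# the 7th whitespace-separated field, as in the listing above.
extract_fxids() {
  awk '$1 ~ /^[0-9]+$/ { print $7 }'
}

# Look up each fxid given as an argument and print the ones that
# `eos fileinfo` fails on. Assumes a non-zero exit status means the file
# is missing; check the actual error text before acting on the result.
report_missing_fxids() {
  local fxid
  for fxid in "$@"; do
    if ! eos fileinfo "fxid:${fxid}" > /dev/null 2>&1; then
      echo "missing: fxid:${fxid}"
    fi
  done
}

# Check every tape named on the command line, e.g.
#   ./missing_fxids.sh A01007 A01008 A01016 A01019
# With no arguments the script only defines the functions above.
for vid in "$@"; do
  mapfile -t fxids < <(cta-admin tf ls -v "$vid" | extract_fxids)
  report_missing_fxids "${fxids[@]}"
done
```

A file with copies on several of the listed tapes will simply be checked once per copy.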

Attempts to reclaim the tapes, repack them, or even force the issue with tape rm all fail, as CTA (rightly) rejects the commands because the tapes still hold files.

I’ve seen https://eoscta.docs.cern.ch/lifecycle/Delete/ and its discussion of asynchronous reconciliation, but I haven’t been able to find any documented steps for this, short of diving into the CTA catalogue with an SQL client.

I can confirm the files are gone from the EOS namespace; I don’t wish to keep them, and I want to reuse the tapes. What is the recommended way to clean this up?

Hi David,

This situation can arise if the CTA Frontend is down when a file is deleted from EOS. We decided that CTA being unavailable should not block the deletion of the file from the EOS namespace, so the EOS deletion goes ahead and the entry in the CTA catalogue is left behind.

Ultimately we will deal with these cases through a reconciliation procedure between the EOS namespace and the CTA file catalogue. This procedure and its supporting tools are not available yet, so for now the situation has to be resolved manually.

There are two ways in which this could be accomplished:

1. Send a DELETE event to the CTA Frontend.

There is a tool, cta-send-event (see CTA/cmdline/CtaSendEvent.cpp and CTA/cmdline/cta-send-event.sh), which sends CLOSEW and PREPARE events to the CTA Frontend to intervene in the case of failed archivals or failed recalls.

I deliberately did not add DELETE events to this tool, as we don’t have a reconciliation procedure yet. However, modifying the tool to send DELETE events is quite straightforward: in fillNotification(), set the event with notification.mutable_wf()->set_event(cta::eos::Workflow::DELETE);

2. Manual database intervention

Use your favourite SQL client to connect to the catalogue database, then delete all rows in the ARCHIVE_FILE and TAPE_FILE tables that belong to the files on the tapes you wish to reclaim.
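As a sketch of option 2, the archive ids in the first column of cta-admin tf ls can be turned into DELETE statements that you review before running them by hand. This assumes that both TAPE_FILE and ARCHIVE_FILE have an ARCHIVE_FILE_ID column and that TAPE_FILE rows reference ARCHIVE_FILE rows, which is why they are deleted first; check this against the catalogue schema of your CTA version. Deleting by archive id removes every tape copy of those files, not just the copies on the listed tapes. The function name is hypothetical, and the script only prints SQL, it does not touch the database.

```shell
#!/usr/bin/env bash
# Hypothetical helper: turn `cta-admin tf ls -v <vid>` listings (on stdin)
# into SQL that deletes those files from the CTA catalogue.
# Assumes TAPE_FILE and ARCHIVE_FILE both have an ARCHIVE_FILE_ID column.
# The SQL is only printed; review it and run it inside a transaction.
archive_ids_to_sql() {
  local ids
  # The archive id is the first column of each data row. Header lines are
  # skipped, and ids are de-duplicated because a file with copies on
  # several of the listed tapes appears once per copy.
  ids=$(awk '$1 ~ /^[0-9]+$/ { print $1 }' | sort -un | paste -sd, -)
  if [[ -z "$ids" ]]; then
    echo "-- no archive files found" >&2
    return 1
  fi
  # Tape copies first, since (by assumption) TAPE_FILE references ARCHIVE_FILE.
  echo "DELETE FROM TAPE_FILE WHERE ARCHIVE_FILE_ID IN (${ids});"
  echo "DELETE FROM ARCHIVE_FILE WHERE ARCHIVE_FILE_ID IN (${ids});"
}

# Example, after sourcing this file:
#   for v in A01007 A01008 A01016 A01019; do cta-admin tf ls -v "$v"; done \
#     | archive_ids_to_sql > cleanup.sql
```

Comparing the number of rows each statement deletes with the number of files listed by cta-admin tf ls is a cheap sanity check before committing.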

Sorry it’s a bit low-level for now. In future we will have a better process for dealing with this case.


Thank you @mdavis,

Both of those options are sufficient for the time being. As it’s only 4 files, I’ll go into the DB directly; it would be easy enough to script should we need to.

As we use k8s, it’s easy to tie a CTA health check to EOS so that certain user operations are rejected while CTA is unavailable. We’ll have to discuss internally whether that’s what we want to do.