Files archived but eoscta disagrees

Hi everyone,
I have encountered a rather strange issue. All files that we write to eoscta end up on tape, and both the cta catalog and the eos namespace agree on this. However I have discovered a small percentage of files are archived without any issues by cta, but eoscta does not believe the files are on tape.

Any idea how / why this might happen and how to fix it (get the metadata stores back in sync)?

I’ve so far tried this to get the state of the file back in syncL
evict file (cannot evict because file not on tape - at least according to eoscta)
stage file (cannot stage file because file not on tape - at least according to eoscta)
eos file verify (no change)
eos fs boot $fsid --syncmgm (no change)
eos file workflow $path default sync::closew (no change)

Example:

# From EOS Console
EOS Console [root://localhost] |/> file info /eos/cta/general-s3-1/aarnet.edu.au/test_fileset_1@aarnet.edu.au/data/f7/f7d2e2f009daeb2db67f05971de748a2137abdbc9a34378b3c8e8f1326248d81
  File: '/eos/cta/general-s3-1/aarnet.edu.au/test_fileset_1@aarnet.edu.au/data/f7/f7d2e2f009daeb2db67f05971de748a2137abdbc9a34378b3c8e8f1326248d81'  Flags: 0644
  Size: 1049357764
Modify: Mon Aug 23 03:13:11 2021 Timestamp: 1629688391.661305000
Change: Mon Aug 23 03:13:07 2021 Timestamp: 1629688387.576249383
 Birth: Mon Aug 23 03:13:07 2021 Timestamp: 1629688387.572872566
  CUid: 48 CGid: 48 Fxid: 0008239d Fid: 533405 Pid: 75170 Pxid: 000125a2
XStype: adler    XS: 15 d8 50 60    ETAGs: "143184814407680:15d85060"
Layout: replica Stripes: 1 Blocksize: 4k LayoutId: 00100012 Redundancy: d1::t0 
  #Rep: 1
┌───┬──────┬──────────────────────────────────────┬────────────────┬────────────────┬──────────┬──────────────┬────────────┬────────┬────────────────────────┐
│no.│ fs-id│                                  host│      schedgroup│            path│      boot│  configstatus│       drain│  active│                  geotag│
└───┴──────┴──────────────────────────────────────┴────────────────┴────────────────┴──────────┴──────────────┴────────────┴────────┴────────────────────────┘
 0       37 za16hr4s-0.fst.cta.svc.cluster.archive       default.36            /data     booted             rw      nodrain   online            crlt::crlt-v2 

*******

EOS Console [root://localhost] |/> attr ls /eos/cta/general-s3-1/aarnet.edu.au/test_fileset_1@aarnet.edu.au/data/f7/f7d2e2f009daeb2db67f05971de748a2137abdbc9a34378b3c8e8f1326248d81
sys.archive.error=""
sys.archive.file_id="13665"
sys.archive.storage_class="single-copy-backup"
sys.cta.archive.objectstore.id=""
sys.eos.btime="1629688387.572872566"
sys.utrace="09dcf626-03c0-11ec-9841-32d4fb053b09"
sys.vtrace="[Mon Aug 23 03:13:07 2021] uid:48[apache] gid:48[apache] tident:apache.1:1@10-54-4-66.nginx-proxy.cta.svc.cluster.archive name:apache dn: prot:https host:consumer.consumer.cta.svc.cluster.archive=>https domain:consumer.cta.svc.cluster.archive=>https geo: sudo:0"

From CTA Console:

[root@cta-tapeserver-ctatapeserver-0 /]# cta-admin --json tf ls -i cta -f 0008239d -l | jq
[
  {
    "af": {
      "archiveId": "13665",
      "storageClass": "single-copy-backup",
      "creationTime": "1629693596",
      "checksum": [
        {
          "type": "ADLER32",
          "value": "15d85060"
        }
      ],
      "size": "1049357764"
    },
    "df": {
      "diskId": "533405",
      "diskInstance": "cta",
      "ownerId": {
        "uid": 48,
        "gid": 48
      },
      "path": "/eos/cta/general-s3-1/aarnet.edu.au/test_fileset_1@aarnet.edu.au/data/f7/f7d2e2f009daeb2db67f05971de748a2137abdbc9a34378b3c8e8f1326248d81"
    },
    "tf": {
      "vid": "A00764",
      "copyNb": 1,
      "blockId": "10667110",
      "fSeq": "2669"
    }
  }
]

As you can see, all appears healthy, except eoscta does not recognise the tape copy and is holding on to the disk copy. The log entries appear normal:

From MGM log:

# File being written to eoscta:
210823 03:13:07 time=1629688387.571867 func=Put                      level=INFO  logid=static.............................. unit=mgm@eos-mgm-0.mgm.cta.svc.cluster.archive:1094 tid=00007fcbfdcce700 source=HttpHandler:709                tident= sec=(null) uid=99 gid=99 name=- geo="" method=PUT path=/eos/cta/general-s3-1/aarnet.edu.au/test_fileset_1@aarnet.edu.au/data/f7/f7d2e2f009daeb2db67f05971de748a2137abdbc9a34378b3c8e8f1326248d81
210823 03:13:07 time=1629688387.572350 func=open                     level=INFO  logid=09dcf626-03c0-11ec-9841-32d4fb053b09 unit=mgm@eos-mgm-0.mgm.cta.svc.cluster.archive:1094 tid=00007fcbfdcce700 source=XrdMgmOfsFile:497              tident=apache.1:1@10-54-4-66.nginx-proxy.cta.svc.cluster.archive sec=https uid=48 gid=48 name=apache geo="" op=write trunc=512 path=/eos/cta/general-s3-1/aarnet.edu.au/test_fileset_1@aarnet.edu.au/data/f7/f7d2e2f009daeb2db67f05971de748a2137abdbc9a34378b3c8e8f1326248d81 info=&eos.app=http&eos.bookingsize=1049357764&eos.targetsize=1049357764
210823 03:13:07 time=1629688387.572437 func=open                     level=INFO  logid=09dcf626-03c0-11ec-9841-32d4fb053b09 unit=mgm@eos-mgm-0.mgm.cta.svc.cluster.archive:1094 tid=00007fcbfdcce700 source=XrdMgmOfsFile:619              tident=apache.1:1@10-54-4-66.nginx-proxy.cta.svc.cluster.archive sec=https uid=48 gid=48 name=apache geo="" msg="rewrote symlinks" sym-path=/eos/cta/general-s3-1/aarnet.edu.au/test_fileset_1@aarnet.edu.au/data/f7/f7d2e2f009daeb2db67f05971de748a2137abdbc9a34378b3c8e8f1326248d81 realpath=/eos/cta/general-s3-1/aarnet.edu.au/test_fileset_1@aarnet.edu.au/data/f7/f7d2e2f009daeb2db67f05971de748a2137abdbc9a34378b3c8e8f1326248d81
210823 03:13:07 time=1629688387.573597 func=HandleProtoMethodEvents  level=INFO  logid=static.............................. unit=mgm@eos-mgm-0.mgm.cta.svc.cluster.archive:1094 tid=00007fcbfdcce700 source=WFE:1615                       tident= sec=(null) uid=99 gid=99 name=- geo="" default SYNC::CREATE /eos/cta/general-s3-1/aarnet.edu.au/test_fileset_1@aarnet.edu.au/data/f7/f7d2e2f009daeb2db67f05971de748a2137abdbc9a34378b3c8e8f1326248d81 cta-frontend-ctafrontend.cta.svc.cluster.archive:10955 fxid=0008239d mgm.reqid=""
210823 03:13:07 time=1629688387.575881 func=SendProtoWFRequest       level=INFO  logid=static.............................. unit=mgm@eos-mgm-0.mgm.cta.svc.cluster.archive:1094 tid=00007fcbfdcce700 source=WFE:2460                       tident= sec=(null) uid=99 gid=99 name=- geo="" protoWFEndPoint="cta-frontend-ctafrontend.cta.svc.cluster.archive:10955" protoWFResource="/ctafrontend" fullPath="/eos/cta/general-s3-1/aarnet.edu.au/test_fileset_1@aarnet.edu.au/data/f7/f7d2e2f009daeb2db67f05971de748a2137abdbc9a34378b3c8e8f1326248d81" event="sync::create" timeSpentMs=2 msg="Sent SSI protocol buffer request"
210823 03:13:07 time=1629688387.576599 func=open                     level=INFO  logid=09dcf626-03c0-11ec-9841-32d4fb053b09 unit=mgm@eos-mgm-0.mgm.cta.svc.cluster.archive:1094 tid=00007fcbfdcce700 source=XrdMgmOfsFile:2934             tident=apache.1:1@10-54-4-66.nginx-proxy.cta.svc.cluster.archive sec=https uid=48 gid=48 name=apache geo="" op=write path=/eos/cta/general-s3-1/aarnet.edu.au/test_fileset_1@aarnet.edu.au/data/f7/f7d2e2f009daeb2db67f05971de748a2137abdbc9a34378b3c8e8f1326248d81 info=&eos.app=http&eos.bookingsize=1049357764&eos.targetsize=1049357764 target[0]=(za16hr4s-0.fst.cta.svc.cluster.archive,37)  redirection=za16hr4s-0.fst.cta.svc.cluster.archive?&cap.sym=<...>&cap.msg=<...>&mgm.logid=09dcf626-03c0-11ec-9841-32d4fb053b09&mgm.replicaindex=0&mgm.replicahead=0&mgm.etag="143184814407680:00000000"&mgm.id=0008239d&mgm.event=sync::closew&mgm.workflow=default&mgm.instance=cta&mgm.owner_uid=48&mgm.owner_gid=48&mgm.requestor=apache&mgm.requestorgroup=apache&mgm.attributes=c3lzLmFyY2hpdmUuZmlsZV9pZD0xMzY2NTs7O3N5cy5hcmNoaXZlLnN0b3JhZ2VfY2xhc3M9c2luZ2xlLWNvcHktYmFja3Vw xrd_port=1095 http_port=8001
210823 03:13:11 time=1629688391.661932 func=log_info                 level=INFO  logid=09dcf626-03c0-11ec-9841-32d4fb053b09 unit=mgm@eos-mgm-0.mgm.cta.svc.cluster.archive:1094 tid=00007fce939f9700 source=CommitHelper:209               tident=daemon.1:1374@za16hr4s-0.fst.cta.svc.cluster.archive sec=sss   uid=2 gid=2 name=daemon geo="" subcmd=commit path=/eos/cta/general-s3-1/aarnet.edu.au/test_fileset_1@aarnet.edu.au/data/f7/f7d2e2f009daeb2db67f05971de748a2137abdbc9a34378b3c8e8f1326248d81 size=1049357764 fxid=0008239d fsid=37 dropfsid= checksum=15d85060 mtime=1629688391 mtime.nsec=661305000 oc-chunk=0 oc-n=0 oc-max=0 oc-uuid=
210823 03:13:11 time=1629688391.662316 func=Commit                   level=INFO  logid=09dcf626-03c0-11ec-9841-32d4fb053b09 unit=mgm@eos-mgm-0.mgm.cta.svc.cluster.archive:1094 tid=00007fce939f9700 source=Commit:290                     tident=daemon.1:1374@za16hr4s-0.fst.cta.svc.cluster.archive sec=sss   uid=2 gid=2 name=daemon geo="" commitsize=1 n1=f7d2e2f009daeb2db67f05971de748a2137abdbc9a34378b3c8e8f1326248d81 n2=f7d2e2f009daeb2db67f05971de748a2137abdbc9a34378b3c8e8f1326248d81 occhunk=0 ocdone=0
210823 03:13:11 time=1629688391.662765 func=Event                    level=INFO  logid=09dcf626-03c0-11ec-9841-32d4fb053b09 unit=mgm@eos-mgm-0.mgm.cta.svc.cluster.archive:1094 tid=00007fce939f9700 source=Event:112                      tident=daemon.1:1374@za16hr4s-0.fst.cta.svc.cluster.archive sec=sss   uid=2 gid=2 name=daemon geo="" subcmd=event event=sync::closew path=/eos/cta/general-s3-1/aarnet.edu.au/test_fileset_1@aarnet.edu.au/data/f7/f7d2e2f009daeb2db67f05971de748a2137abdbc9a34378b3c8e8f1326248d81 fid=0008239d
210823 03:13:11 time=1629688391.663031 func=HandleProtoMethodEvents  level=INFO  logid=static.............................. unit=mgm@eos-mgm-0.mgm.cta.svc.cluster.archive:1094 tid=00007fce939f9700 source=WFE:1615                       tident= sec=(null) uid=99 gid=99 name=- geo="" default SYNC::CLOSEW /eos/cta/general-s3-1/aarnet.edu.au/test_fileset_1@aarnet.edu.au/data/f7/f7d2e2f009daeb2db67f05971de748a2137abdbc9a34378b3c8e8f1326248d81 cta-frontend-ctafrontend.cta.svc.cluster.archive:10955 fxid=0008239d mgm.reqid=""
...
# File being read by tapeserver:
210823 04:38:08 time=1629693488.063590 func=open                     level=INFO  logid=e9fe18ec-03cb-11ec-a8e1-32d4fb053b09 unit=mgm@eos-mgm-0.mgm.cta.svc.cluster.archive:1094 tid=00007fcedacfc700 source=XrdMgmOfsFile:532              tident=cta.68652:1526@10-54-0-105.cta-tapeserver-ctatapeserver.cta.svc.cluster.archive sec=sss   uid=2 gid=2 name=daemon geo="" msg="access by inode" ino=fxid:8239d path=/eos/cta/general-s3-1/aarnet.edu.au/test_fileset_1@aarnet.edu.au/data/f7/f7d2e2f009daeb2db67f05971de748a2137abdbc9a34378b3c8e8f1326248d81
210823 04:38:08 time=1629693488.064318 func=open                     level=INFO  logid=e9fe18ec-03cb-11ec-a8e1-32d4fb053b09 unit=mgm@eos-mgm-0.mgm.cta.svc.cluster.archive:1094 tid=00007fcedacfc700 source=XrdMgmOfsFile:2938             tident=cta.68652:1526@10-54-0-105.cta-tapeserver-ctatapeserver.cta.svc.cluster.archive sec=sss   uid=2 gid=2 name=daemon geo="" op=read path=/eos/cta/general-s3-1/aarnet.edu.au/test_fileset_1@aarnet.edu.au/data/f7/f7d2e2f009daeb2db67f05971de748a2137abdbc9a34378b3c8e8f1326248d81 info=eos.lfn=fxid:8239d target[0]=(za16hr4s-0.fst.cta.svc.cluster.archive,37)  redirection=za16hr4s-0.fst.cta.svc.cluster.archive?&cap.sym=<...>&cap.msg=<...>&mgm.logid=e9fe18ec-03cb-11ec-a8e1-32d4fb053b09&mgm.blockchecksum=ignore&mgm.replicaindex=0&mgm.replicahead=0&mgm.id=0008239d&mgm.mtime=1629688391 xrd_port=1095 http_port=8001
...

#  Receiving acknowledgement from tapeserver that file is now archived:
210823 04:38:08 time=1629693488.063590 func=open                     level=INFO  logid=e9fe18ec-03cb-11ec-a8e1-32d4fb053b09 unit=mgm@eos-mgm-0.mgm.cta.svc.cluster.archive:1094 tid=00007fcedacfc700 source=XrdMgmOfsFile:532              tident=cta.68652:1526@10-54-0-105.cta-tapeserver-ctatapeserver.cta.svc.cluster.archive sec=sss   uid=2 gid=2 name=daemon geo="" msg="access by inode" ino=fxid:8239d path=/eos/cta/general-s3-1/aarnet.edu.au/test_fileset_1@aarnet.edu.au/data/f7/f7d2e2f009daeb2db67f05971de748a2137abdbc9a34378b3c8e8f1326248d81
210823 04:38:08 time=1629693488.064318 func=open                     level=INFO  logid=e9fe18ec-03cb-11ec-a8e1-32d4fb053b09 unit=mgm@eos-mgm-0.mgm.cta.svc.cluster.archive:1094 tid=00007fcedacfc700 source=XrdMgmOfsFile:2938             tident=cta.68652:1526@10-54-0-105.cta-tapeserver-ctatapeserver.cta.svc.cluster.archive sec=sss   uid=2 gid=2 name=daemon geo="" op=read path=/eos/cta/general-s3-1/aarnet.edu.au/test_fileset_1@aarnet.edu.au/data/f7/f7d2e2f009daeb2db67f05971de748a2137abdbc9a34378b3c8e8f1326248d81 info=eos.lfn=fxid:8239d target[0]=(za16hr4s-0.fst.cta.svc.cluster.archive,37)  redirection=za16hr4s-0.fst.cta.svc.cluster.archive?&cap.sym=<...>&cap.msg=<...>&mgm.logid=e9fe18ec-03cb-11ec-a8e1-32d4fb053b09&mgm.blockchecksum=ignore&mgm.replicaindex=0&mgm.replicahead=0&mgm.id=0008239d&mgm.mtime=1629688391 xrd_port=1095 http_port=8001

FST Logs:

#File being written to eoscta:
210823 03:13:07 time=1629688387.580782 func=open                     level=INFO  logid=09dcf626-03c0-11ec-9841-32d4fb053b09 unit=fst@za16hr4s-0.fst.cta.svc.cluster.archive:1095 tid=00007f6bb27a5700 source=XrdFstOfsFile:121              tident=nobody sec=unix  uid=0 gid=0 name=nobod
y geo="" path=/eos/cta/general-s3-1/aarnet.edu.au/test_fileset_1@aarnet.edu.au/data/f7/f7d2e2f009daeb2db67f05971de748a2137abdbc9a34378b3c8e8f1326248d81 info=&cap.sym=<...>&cap.msg=<...>&mgm.logid=09dcf626-03c0-11ec-9841-32d4fb053b09&mgm.replicaindex=0&mgm.replicahead=0&mgm.etag="14
3184814407680:00000000"&mgm.id=0008239d&mgm.event=sync::closew&mgm.workflow=default&mgm.instance=cta&mgm.owner_uid=48&mgm.owner_gid=48&mgm.requestor=apache&mgm.requestorgroup=apache&mgm.attributes=c3lzLmFyY2hpdmUuZmlsZV9pZD0xMzY2NTs7O3N5cy5hcmNoaXZlLnN0b3JhZ2VfY2xhc3M9c2luZ2xlLWNvc
HktYmFja3Vw open_mode=4302
210823 03:13:07 time=1629688387.580939 func=ProcessCapOpaque         level=INFO  logid=09dcf626-03c0-11ec-9841-32d4fb053b09 unit=fst@za16hr4s-0.fst.cta.svc.cluster.archive:1095 tid=00007f6bb27a5700 source=XrdFstOfsFile:2268             tident=nobody sec=(null) uid=99 gid=99 name=(n
ull) geo="" capability=&tapeenabled=1&mgm.access=create&mgm.ruid=48&mgm.rgid=48&mgm.uid=99&mgm.gid=99&mgm.path=/eos/cta/general-s3-1/aarnet.edu.au/test_fileset_1@aarnet.edu.au/data/f7/f7d2e2f009daeb2db67f05971de748a2137abdbc9a34378b3c8e8f1326248d81&mgm.manager=eos-mgm-0.mgm.cta.svc
.cluster.archive:1094&mgm.fid=0008239d&mgm.cid=75170&mgm.sec=https|apache|consumer.consumer.cta.svc.cluster.archive=>https|||||http&mgm.lid=1048594&mgm.bookingsize=1049357764&mgm.targetsize=1049357764&mgm.fsid=37&mgm.url0=root://za16hr4s-0.fst.cta.svc.cluster.archive:1095//&mgm.fsi
d0=37&cap.valid=1629691987
210823 03:13:07 time=1629688387.580974 func=open                     level=INFO  logid=09dcf626-03c0-11ec-9841-32d4fb053b09 unit=fst@za16hr4s-0.fst.cta.svc.cluster.archive:1095 tid=00007f6bb27a5700 source=XrdFstOfsFile:199              tident=nobody sec=(null) uid=48 gid=48 name=no
body geo="" ns_path=/eos/cta/general-s3-1/aarnet.edu.au/test_fileset_1@aarnet.edu.au/data/f7/f7d2e2f009daeb2db67f05971de748a2137abdbc9a34378b3c8e8f1326248d81 fst_path=/data/00000035/0008239d
210823 03:13:12 time=1629688392.397762 func=Report                   level=INFO  logid=static.............................. unit=fst@za16hr4s-0.fst.cta.svc.cluster.archive:1095 tid=00007f6beadfc700 source=Report:51                      tident= sec=(null) uid=99 gid=99 name=- geo="" log=09dcf626-03c0-11ec-9841-32d4fb053b09&path=/eos/cta/general-s3-1/aarnet.edu.au/test_fileset_1@aarnet.edu.au/data/f7/f7d2e2f009daeb2db67f05971de748a2137abdbc9a34378b3c8e8f1326248d81&fstpath=/data/00000035/0008239d&ruid=48&rgid=48&td=nobody&host=za16hr4s-0.fst.cta.svc.cluster.archive&lid=1048594&fid=533405&fsid=37&ots=1629688387&otms=580&cts=1629688391&ctms=662&nrc=0&nwc=502&rb=0&rb_min=0&rb_max=0&rb_sigma=0.00&rv_op=0&rvb_min=0&rvb_max=0&rvb_sum=0&rvb_sigma=0.00&rs_op=0&rsb_min=0&rsb_max=0&rsb_sum=0&rsb_sigma=0.00&rc_min=0&rc_max=0&rc_sum=0&rc_sigma=0.00&wb=1049357764&wb_min=175095&wb_max=2095429&wb_sigma=90022.23&sfwdb=0&sbwdb=0&sxlfwdb=0&sxlbwdb=0&nfwds=0&nbwds=0&nxlfwds=0&nxlbwds=0&rt=0.00&rvt=0.00&wt=2162.88&osize=0&csize=1049357764&delete_on_close=0&sec.prot=https&sec.name=apache&sec.host=consumer.consumer.cta.svc.cluster.archive=>https&sec.vorg=&sec.grps=&sec.role=&sec.info=&sec.app=http
...

# File being read by the tapeserver:
210823 04:38:08 time=1629693488.065845 func=open                     level=INFO  logid=e9fe18ec-03cb-11ec-a8e1-32d4fb053b09 unit=fst@za16hr4s-0.fst.cta.svc.cluster.archive:1095 tid=00007f6be4bfe700 source=XrdFstOfsFile:121              tident=cta.68652:142@10-54-0-105.cta-tapeserver-ctatapeserver.cta.svc.cluster.archive sec=unix  uid=0 gid=0 name=cta geo="" path=/eos/cta/general-s3-1/aarnet.edu.au/test_fileset_1@aarnet.edu.au/data/f7/f7d2e2f009daeb2db67f05971de748a2137abdbc9a34378b3c8e8f1326248d81 info=cap.msg=<...>&cap.sym=<...>&eos.lfn=fxid:8239d&mgm.blockchecksum=ignore&mgm.id=0008239d&mgm.logid=e9fe18ec-03cb-11ec-a8e1-32d4fb053b09&mgm.mtime=1629688391&mgm.replicahead=0&mgm.replicaindex=0 open_mode=0
210823 04:38:08 time=1629693488.066015 func=ProcessCapOpaque         level=INFO  logid=e9fe18ec-03cb-11ec-a8e1-32d4fb053b09 unit=fst@za16hr4s-0.fst.cta.svc.cluster.archive:1095 tid=00007f6be4bfe700 source=XrdFstOfsFile:2268             tident=cta.68652:142@10-54-0-105.cta-tapeserver-ctatapeserver.cta.svc.cluster.archive sec=(null) uid=99 gid=99 name=(null) geo="" capability=&tapeenabled=1&mgm.access=read&mgm.ruid=2&mgm.rgid=2&mgm.uid=99&mgm.gid=99&mgm.path=/eos/cta/general-s3-1/aarnet.edu.au/test_fileset_1@aarnet.edu.au/data/f7/f7d2e2f009daeb2db67f05971de748a2137abdbc9a34378b3c8e8f1326248d81&mgm.manager=eos-mgm-0.mgm.cta.svc.cluster.archive:1094&mgm.fid=0008239d&mgm.cid=75170&mgm.sec=sss|daemon|10-54-0-105.cta-tapeserver-ctatapeserver.cta.svc.cluster.archive||daemon|||&mgm.lid=1048594&mgm.bookingsize=1024&mgm.fsid=37&mgm.url0=root://za16hr4s-0.fst.cta.svc.cluster.archive:1095//&mgm.fsid0=37&cap.valid=1629697088
210823 04:38:08 time=1629693488.066048 func=open                     level=INFO  logid=e9fe18ec-03cb-11ec-a8e1-32d4fb053b09 unit=fst@za16hr4s-0.fst.cta.svc.cluster.archive:1095 tid=00007f6be4bfe700 source=XrdFstOfsFile:199              tident=cta.68652:142@10-54-0-105.cta-tapeserver-ctatapeserver.cta.svc.cluster.archive sec=(null) uid=2 gid=2 name=nobody geo="" ns_path=/eos/cta/general-s3-1/aarnet.edu.au/test_fileset_1@aarnet.edu.au/data/f7/f7d2e2f009daeb2db67f05971de748a2137abdbc9a34378b3c8e8f1326248d81 fst_path=/data/00000035/0008239d
210823 04:38:08 time=1629693488.066154 func=open                     level=INFO  logid=e9fe18ec-03cb-11ec-a8e1-32d4fb053b09 unit=fst@za16hr4s-0.fst.cta.svc.cluster.archive:1095 tid=00007f6be4bfe700 source=XrdFstOfsFile:470              tident=cta.68652:142@10-54-0-105.cta-tapeserver-ctatapeserver.cta.svc.cluster.archive sec=(null) uid=2 gid=2 name=nobody geo="" fst_path=/data/00000035/0008239d open-mode=0 create-mode=180 layout-name=replica oss-opaque=&mgm.lid=1048594&mgm.bookingsize=0
210823 04:38:08 time=1629693488.066166 func=Open                     level=INFO  logid=e9fe18ec-03cb-11ec-a8e1-32d4fb053b09 unit=fst@za16hr4s-0.fst.cta.svc.cluster.archive:1095 tid=00007f6be4bfe700 source=ReplicaParLayout:104           tident=cta.68652:142@10-54-0-105.cta-tapeserver-ctatapeserver.cta.svc.cluster.archive sec=unix  uid=0 gid=0 name=cta geo="" replica_head=0, replica_index=0
210823 04:38:08 time=1629693488.066193 func=fileOpen                 level=INFO  logid=e9fe918c-03cb-11ec-92f2-1645b7977b45 unit=fst@za16hr4s-0.fst.cta.svc.cluster.archive:1095 tid=00007f6be4bfe700 source=LocalIo:70                     tident=<service> sec=      uid=0 gid=0 name= geo="" flags=0, path=/data/00000035/0008239d
210823 04:38:08 time=1629693488.066232 func=open                     level=INFO  logid=e9fe18ec-03cb-11ec-a8e1-32d4fb053b09 unit=fst@za16hr4s-0.fst.cta.svc.cluster.archive:1095 tid=00007f6be4bfe700 source=XrdFstOfsFile:558              tident=cta.68652:142@10-54-0-105.cta-tapeserver-ctatapeserver.cta.svc.cluster.archive sec=(null) uid=2 gid=2 name=nobody geo="" msg="layout size" disk_size=1049357764 db_size= 1049357764
210823 04:38:08 time=1629693488.066321 func=stat                     level=INFO  logid=e9fe18ec-03cb-11ec-a8e1-32d4fb053b09 unit=fst@za16hr4s-0.fst.cta.svc.cluster.archive:1095 tid=00007f6be4bfe700 source=XrdFstOfsFile:937              tident=cta.68652:142@10-54-0-105.cta-tapeserver-ctatapeserver.cta.svc.cluster.archive sec=      uid=2 gid=2 name=nobody geo="" path=/eos/cta/general-s3-1/aarnet.edu.au/test_fileset_1@aarnet.edu.au/data/f7/f7d2e2f009daeb2db67f05971de748a2137abdbc9a34378b3c8e8f1326248d81 fxid=0008239d size=1049357764 mtime=1629688391.113121498
210823 04:38:08 time=1629693488.066536 func=stat                     level=INFO  logid=e9fe18ec-03cb-11ec-a8e1-32d4fb053b09 unit=fst@za16hr4s-0.fst.cta.svc.cluster.archive:1095 tid=00007f6be4bfe700 source=XrdFstOfsFile:937              tident=cta.68652:142@10-54-0-105.cta-tapeserver-ctatapeserver.cta.svc.cluster.archive sec=      uid=2 gid=2 name=nobody geo="" path=/eos/cta/general-s3-1/aarnet.edu.au/test_fileset_1@aarnet.edu.au/data/f7/f7d2e2f009daeb2db67f05971de748a2137abdbc9a34378b3c8e8f1326248d81 fxid=0008239d size=1049357764 mtime=1629688391.113121498
210823 04:39:37 time=1629693577.925475 func=Report                   level=INFO  logid=static.............................. unit=fst@za16hr4s-0.fst.cta.svc.cluster.archive:1095 tid=00007f6beadfc700 source=Report:51                      tident= sec=(null) uid=99 gid=99 name=- geo="" log=e9fe18ec-03cb-11ec-a8e1-32d4fb053b09&path=/eos/cta/general-s3-1/aarnet.edu.au/test_fileset_1@aarnet.edu.au/data/f7/f7d2e2f009daeb2db67f05971de748a2137abdbc9a34378b3c8e8f1326248d81&fstpath=/data/00000035/0008239d&ruid=2&rgid=2&td=cta.68652:142@10-54-0-105.cta-tapeserver-ctatapeserver.cta.svc.cluster.archive&host=za16hr4s-0.fst.cta.svc.cluster.archive&lid=1048594&fid=533405&fsid=37&ots=1629693488&otms=65&cts=1629693577&ctms=790&nrc=4004&nwc=0&rb=1049357764&rb_min=257476&rb_max=262144&rb_sigma=73.77&rv_op=0&rvb_min=0&rvb_max=0&rvb_sum=0&rvb_sigma=0.00&rs_op=0&rsb_min=0&rsb_max=0&rsb_sum=0&rsb_sigma=0.00&rc_min=0&rc_max=0&rc_sum=0&rc_sigma=0.00&wb=0&wb_min=0&wb_max=0&wb_sigma=0.00&sfwdb=0&sbwdb=0&sxlfwdb=0&sxlbwdb=0&nfwds=0&nbwds=0&nxlfwds=0&nxlbwds=0&rt=2443.48&rvt=0.00&wt=0.00&osize=1049357764&csize=1049357764&delete_on_close=0&sec.prot=sss&sec.name=daemon&sec.host=10-54-0-105.cta-tapeserver-ctatapeserver.cta.svc.cluster.archive&sec.vorg=&sec.grps=daemon&sec.role=&sec.info=&sec.app=

Tapeserver logs:

Aug 23 04:31:11.245568 cta-tapeserver-ctatapeserver-0 cta-taped: LVL="INFO" PID="68652" TID="68773" MSG="Created tasks for migrating a file" thread="MigrationTaskInjector" tapeDrive="DR3" tapeVid="A00764" mountId="80" fileId="13665" fSeq="2669" path="root://eos-mgm-0.mgm.cta.svc.cluster.archive:1094//eos/cta/general-s3-1/aarnet.edu.au/test_fileset_1@aarnet.edu.au/data/f7/f7d2e2f009daeb2db67f05971de748a2137abdbc9a34378b3c8e8f1326248d81?eos.lfn=fxid:8239d" 
Aug 23 04:38:08.066638 cta-tapeserver-ctatapeserver-0 cta-taped: LVL="INFO" PID="68652" TID="68761" MSG="Opened disk file for read" thread="DiskRead" tapeDrive="DR3" tapeVid="A00764" mountId="80" threadID="6" path="root://eos-mgm-0.mgm.cta.svc.cluster.archive:1094//eos/cta/general-s3-1/aarnet.edu.au/test_fileset_1@aarnet.edu.au/data/f7/f7d2e2f009daeb2db67f05971de748a2137abdbc9a34378b3c8e8f1326248d81?eos.lfn=fxid:8239d" actualURL="root://eos-mgm-0.mgm.cta.svc.cluster.archive:1094//eos/cta/general-s3-1/aarnet.edu.au/test_fileset_1@aarnet.edu.au/data/f7/f7d2e2f009daeb2db67f05971de748a2137abdbc9a34378b3c8e8f1326248d81?eos.lfn=fxid:8239d" fileId="13665" 
Aug 23 04:39:32.160130 cta-tapeserver-ctatapeserver-0 cta-taped: LVL="DEBUG" PID="68652" TID="68771" MSG="Successfully opened the tape file for writing" thread="TapeWrite" tapeDrive="DR3" tapeVid="A00764" mountId="80" vo="Shared" mediaType="LTO7" tapePool="crlt-piranha-tapes" logicalLibrary="crlt-piranha" mountType="ArchiveForUser" vendor="IBM" capacityInBytes="6000000000000" fileId="13665" fileSize="1049357764" fSeq="2669" diskURL="root://eos-mgm-0.mgm.cta.svc.cluster.archive:1094//eos/cta/general-s3-1/aarnet.edu.au/test_fileset_1@aarnet.edu.au/data/f7/f7d2e2f009daeb2db67f05971de748a2137abdbc9a34378b3c8e8f1326248d81?eos.lfn=fxid:8239d" 
Aug 23 04:39:37.789868 cta-tapeserver-ctatapeserver-0 cta-taped: LVL="INFO" PID="68652" TID="68761" MSG="File successfully read from disk" thread="DiskRead" tapeDrive="DR3" tapeVid="A00764" mountId="80" threadID="6" path="root://eos-mgm-0.mgm.cta.svc.cluster.archive:1094//eos/cta/general-s3-1/aarnet.edu.au/test_fileset_1@aarnet.edu.au/data/f7/f7d2e2f009daeb2db67f05971de748a2137abdbc9a34378b3c8e8f1326248d81?eos.lfn=fxid:8239d" actualURL="root://eos-mgm-0.mgm.cta.svc.cluster.archive:1094//eos/cta/general-s3-1/aarnet.edu.au/test_fileset_1@aarnet.edu.au/data/f7/f7d2e2f009daeb2db67f05971de748a2137abdbc9a34378b3c8e8f1326248d81?eos.lfn=fxid:8239d" fileId="13665" readWriteTime="2.474822" checksumingTime="0.000000" waitFreeMemoryTime="87.245044" waitDataTime="0.000000" waitReportingTime="0.000000" checkingErrorTime="0.003302" openingTime="0.003818" transferTime="89.726989" totalTime="89.726989" dataVolume="1049357764" globalPayloadTransferSpeedMBps="11.695007" diskPerformanceMBps="11.695007" openRWCloseToTransferTimeRatio="0.027624" 
Aug 23 04:39:38.130543 cta-tapeserver-ctatapeserver-0 cta-taped: LVL="DEBUG" PID="68652" TID="68771" MSG="In MigrationReportPacker::reportCompletedJob(), pushing a report." thread="TapeWrite" tapeDrive="DR3" tapeVid="A00764" mountId="80" vo="Shared" mediaType="LTO7" tapePool="crlt-piranha-tapes" logicalLibrary="crlt-piranha" mountType="ArchiveForUser" vendor="IBM" capacityInBytes="6000000000000" fileId="13665" fileSize="1049357764" fSeq="2669" diskURL="root://eos-mgm-0.mgm.cta.svc.cluster.archive:1094//eos/cta/general-s3-1/aarnet.edu.au/test_fileset_1@aarnet.edu.au/data/f7/f7d2e2f009daeb2db67f05971de748a2137abdbc9a34378b3c8e8f1326248d81?eos.lfn=fxid:8239d" type="ReportSuccessful" 
Aug 23 04:39:38.131280 cta-tapeserver-ctatapeserver-0 cta-taped: LVL="INFO" PID="68652" TID="68771" MSG="File successfully transmitted to drive" thread="TapeWrite" tapeDrive="DR3" tapeVid="A00764" mountId="80" vo="Shared" mediaType="LTO7" tapePool="crlt-piranha-tapes" logicalLibrary="crlt-piranha" mountType="ArchiveForUser" vendor="IBM" capacityInBytes="6000000000000" fileId="13665" fileSize="1049357764" fSeq="2669" diskURL="root://eos-mgm-0.mgm.cta.svc.cluster.archive:1094//eos/cta/general-s3-1/aarnet.edu.au/test_fileset_1@aarnet.edu.au/data/f7/f7d2e2f009daeb2db67f05971de748a2137abdbc9a34378b3c8e8f1326248d81?eos.lfn=fxid:8239d" readWriteTime="5.455891" checksumingTime="0.475557" waitDataTime="0.044759" waitReportingTime="0.000729" transferTime="5.976936" totalTime="5.976912" dataVolume="1049357764" headerVolume="480" driveTransferSpeedMBps="175.568629" payloadTransferSpeedMBps="175.568548" reconciliationTime="0" LBPMode="LBP_On" 

Thanks in advance for your help,

Cheers,

Denis

Hi Denis,

It’s likely that the cause of this is that some jobs failed during the reporting stage. So the file was archived to tape, but CTA failed to report this fact to EOS.

You should be able to examine the failed jobs queue to verify this. Use cta-admin failedrequest ls to view the failed jobs. cta-admin fr ls -S gives a summary, to examine the detailed logs use cta-admin --json fr ls -l.

Hi Michael,

Thanks for that. The failed jobs queue is empty unfortunately. I suspect that this might be more of an EOS problem, than CTA. CTA looks to be reporting to EOS successfully. You can see the attributes on the file are even set correctly:

sys.archive.error=""
sys.archive.file_id="13665"
sys.archive.storage_class="single-copy-backup"

The sys.archive.file_id is set when the archive request is processed successfully, I think?

Any thought on how to reconcile the state? I see in the documentation that EOS metadata is the master and CTA plays catchup, but I wonder how to fix the actual EOS metadata in this instance?

Cheers,

Denis

Hi Denis,

Sorry, I missed your reply so sorry for not getting back to you earlier.

The attribute sys.archive.file_id is set on the CREATE event, i.e. when you open the file for writing in EOS and before you have written the file contents. The file is archived to tape after the CLOSEW event, i.e. after you finish writing the file to disk.

In order to understand why some files are not having their metadata updated in EOS I would need to see the tape server logs showing the archive event and reporting events for one of the files with the missing metadata.

Cheers,

Michael

Hi Michael,

No trouble at all, and thank you for coming back to me! We haven’t seen this issue in a little while. It looks like a rebuild of the entire stack fixed the issue somehow, so this is no longer a concern to us.

You did mention tapeserver logs. While I can no longer reproduce the issue, the tapeserver logs are attached in my original post.

Warm Regards,

Denis Lujanski

Hi Denis,

The excerpt from the logs you provided shows that the file was archived and the migration report job was queued:

However, the log excerpt does not include the part which shows what happened to this reporting job. The report job was presumably popped from the reporting queue at some point, and failed. I would need to see that part of the logs to diagnose what happened.

Note that the report job may be popped by a different tape server to the one that wrote the file. The work of processing reports is distributed across all tape servers.

Best regards,

Michael

Hi Michael,

Thanks for that. It’s good to know the reporting is a separate job. I’ll be sure to look out for that, should this issue re-occur. So far it has not.

Thanks as always for your help!

Denis