Hi,
We identified two files on EOS that appear to be neither on tape nor on disk (d0::t0). Here is the eos fileinfo output for one of them:
/storaged_dls/dls/container/m02/nt33797-15/247088895' Flags: 0644
Size: 5164600362
Status: locations::uncommitted
Modify: Wed Jun 26 10:19:06 2024 Timestamp: 1719393546.624344000
Change: Wed Jun 26 10:18:33 2024 Timestamp: 1719393513.383299404
Access: Wed Jun 26 10:18:33 2024 Timestamp: 1719393513.377728758
Birth: Wed Jun 26 10:18:33 2024 Timestamp: 1719393513.377728249
CUid: 701 CGid: 701 Fxid: 0c27396b Fid: 203897195 Pid: 200031949 Pxid: 0bec3ecd
XStype: adler XS: 59 77 63 8c ETAGs: "54733236516945920:5977638c"
Layout: replica Stripes: 1 Blocksize: 4k LayoutId: 00100012 Redundancy: d0::t0
#Rep: 0
*******
By running the --json version of eos fileinfo we see that
"status" : "locations::uncommitted",
These files are on tape. For example, querying the CTA DB for the disk file ID above returned an archive file ID that points to a file with a matching size, fid and checksum.
Can you please let us know how we can change the EOS location (or whatever is needed) so that it points to the CTA tape file? I think it is the eos file command, but I am not sure.
Thanks,
George
This is really strange, as the disk replica of the file should not have been evicted on the EOS side unless the tape replica had been registered first.
It would be interesting to understand under which circumstances the file replica was removed in this case. But that was almost 2 years ago…
If all the metadata match, you just need to add a replica on the tape fsid for this file:
eos file tag fid:203897195 +65535
Best regards,
Julien Leduc
Many thanks for this Julien!
I am really not sure what happened here… As you said, the files were created almost 2 years ago and we don't have logs going back that far.
Best,
George
Sorry, the command does work as expected, but after a few minutes the location appears to get lost again! I mean, eos fileinfo shows the d0::t0 layout again,
as if EOS is reverting my change. I will check the MGM logs to see why this is happening.
It was the EOS replica repair thread (I activated it for another reason) that did this… and guess what else it did: it removed the 65535 tag from many other files, which I now have to re-add.
Hi @george_patargias ,
The EOS team will have a look. Can you please remind me the EOS version you are running?
Thanks,
Cheers,
Cedric
Hi Cedric,
We are running EOS 5.3.23.
As I said, it turned out to be my fault… I should not have started the replica repair thread (as mentioned in "Change/repair checksum of disk replica - EOS Community") in the first place. This is an EOS feature developed only for the disk storage use case, and apparently it should not be used in the tape storage case because the thread treats the virtual tape fsid tag (65535) as inconsistent and removes it.
For example:
260414 10:47:31 time=1776160051.539705 func=RepairReplicaInconsistencies level=INFO logid=f4dbdc6e-37e6-11f1-a4b8-0c42a1f42af0 unit=mgm@antares-eos15.scd.rl.ac.uk:1094 tid=00007fbd927bc640 source=FsckEntry:912 tident=<service> sec= uid=0 gid=0 name= geo="" xt="" ob="" fxid=009ff5a1 fsid=65535
...
260414 10:47:31 time=1776160051.541555 func=DropReplica level=INFO logid=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx unit=mgm@antares-eos15.scd.rl.ac.uk:1094 tid=00007fbd927bc640 source=DropReplica:43 tident=<single-exec> sec= uid=0 gid=0 name= geo="" xt="" ob="" msg="drop replica/stripe" fxid=009ff5a1 fsid=65535
260414 10:47:31 time=1776160051.541572 func=DropReplica level=ERROR logid=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx unit=mgm@antares-eos15.scd.rl.ac.uk:1094 tid=00007fbd927bc640 source=DropReplica:48 tident=<single-exec> sec= uid=0 gid=0 name= geo="" xt="" ob="" msg="failed to send unlink to FST" fxid=009ff5a1 fsid=65535
From then on, the thread kept failing the replica repair for this particular file:
260414 11:17:27 time=1776161847.239515 func=RepairReplicaInconsistencies level=ERROR logid=232f93c2-37eb-11f1-a4b8-0c42a1f42af0 unit=mgm@antares-eos15.scd.rl.ac.uk:1094 tid=00007fb748ff9640 source=FsckEntry:1082 tident=<service> sec= uid=0 gid=0 name= geo="" xt="" ob="" msg="replica inconsistency repair failed" fxid=009ff5a1
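A rough way to list the affected files from the MGM log could look like the sketch below. It assumes the key=value log format shown above and treats any DropReplica line with fsid=65535 as evidence that the tape tag was dropped for that fxid. The function names and the abbreviated sample lines are illustrative only, not part of EOS.

```python
import re

# Matches key=value pairs such as func=DropReplica, sec=, or msg="drop replica/stripe"
PAIR_RE = re.compile(r'(\w+)=("[^"]*"|\S*)')

TAPE_FSID = "65535"  # virtual tape fsid mentioned in this thread


def parse_fields(line):
    """Return the key=value fields of one MGM log line as a dict."""
    return {key: value.strip('"') for key, value in PAIR_RE.findall(line)}


def dropped_tape_fxids(lines):
    """Collect, in order and without duplicates, the fxids for which a
    DropReplica on the tape fsid was logged (assumption: such a line means
    the 65535 tag was removed from that file)."""
    seen = {}
    for line in lines:
        fields = parse_fields(line)
        fxid = fields.get("fxid")
        if fields.get("func") == "DropReplica" and fields.get("fsid") == TAPE_FSID and fxid:
            seen[fxid] = True
    return list(seen)


# Abbreviated copies of the log lines quoted above (most fields omitted)
sample = [
    'func=RepairReplicaInconsistencies level=INFO fxid=009ff5a1 fsid=65535',
    'func=DropReplica level=INFO msg="drop replica/stripe" fxid=009ff5a1 fsid=65535',
    'func=DropReplica level=ERROR msg="failed to send unlink to FST" fxid=009ff5a1 fsid=65535',
]
print(dropped_tape_fxids(sample))  # ['009ff5a1']
```

In practice the lines would be read from the MGM log file rather than from a hard-coded list.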
So, the thread removed the virtual fsid 65535 from >19,000 files…I added it back by running
eos file tag fxid:…+65535
in a loop.
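Such a loop could be sketched as follows. This is only an illustration: the helper names and the dry-run default are assumptions, and the command form is the eos file tag fxid:<hex> +65535 call used in this thread. It only makes sense once the replica repair thread is stopped, since otherwise the tag is dropped again as described above.

```python
import re
import subprocess

TAPE_FSID = 65535  # virtual tape fsid used in this thread


def build_tag_command(fxid, fsid=TAPE_FSID):
    """Build the 'eos file tag' argument list that adds fsid to one file."""
    if not re.fullmatch(r"[0-9a-fA-F]+", fxid):
        raise ValueError(f"not a hexadecimal fxid: {fxid!r}")
    return ["eos", "file", "tag", f"fxid:{fxid}", f"+{fsid}"]


def retag(fxids, dry_run=True):
    """Print (dry run) or execute one tag command per fxid; return the commands."""
    commands = [build_tag_command(fxid) for fxid in fxids]
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # Requires the eos client on PATH; stops at the first failing command
            subprocess.run(cmd, check=True)
    return commands


# Dry run for the file seen in the log excerpt above
retag(["009ff5a1"])  # prints: eos file tag fxid:009ff5a1 +65535
```

Running with dry_run=True first gives a chance to review the commands before touching >19,000 files.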
Best,
George
Hi George,
Sure, but I think it would be good to add something that prevents this from happening for the tape fsid.
What do you think?
Hi Cedric,
Yes, I absolutely agree, thanks!
George
OK, thanks a lot for the feedback. I’ve created an internal EOS issue so that this gets fixed.
Cheers,
Cedric