Errors when restoring files

Hi there,

when restoring many files we notice errors such as:

Jun 13 10:18:26 tpm102 cta-taped: LVL="ERROR" PID="21708" TID="21708" MSG="In Agent::deleteAndUnregisterSelf: agent still owns objects. Here is a part of the list." SubprocessName="maintenanceHandler" agentAddress="DriveProcess-rz1-2,3,2-tpm102.desy.de-21706-20230613-08:44:24-0" gcAgentAddress="Maintenance-tpm102.desy.de-21708-20230613-08:44:24-0" agentObject="DriveProcess-rz1-2,3,2-tpm102.desy.de-21706-20230613-08:44:24-0" objects="RetrieveRequest-Frontend-cta-frontend.desy.de-2517-20230607-17:06:04-0-35957 RetrieveRequest-Frontend-cta-frontend.desy.de-2517-20230607-17:06:04-0-35958 RetrieveRequest-Frontend-cta-frontend.desy.de-2517-20230607-17:06:04-0-35959 RetrieveRequest-Frontend-cta-frontend.desy.de-2517-20230607-17:06:04-0-35965 RetrieveRequest-Frontend-cta-frontend.desy.de-2517-20230607-17:06:04-0-35966 RetrieveRequest-Frontend-cta-frontend.desy.de-2517-20230607-17:06:04-0-35978 RetrieveRequest-Frontend-cta-frontend.desy.de-2517-20230607-17:06:04-0-35981 RetrieveRequest-Frontend-cta-frontend.desy.de-2517-20230607-17:06:04-0-35983 RetrieveRequest-Frontend-cta-frontend.desy.de-2517-20230607-17:06:04-0-35986 RetrieveRequest-Frontend-cta-frontend.desy.de-2517-20230607-17:06:04-0-35987 RetrieveRequest-Frontend-cta-frontend.desy.de-2517-20230607-17:06:04-0-35988 RetrieveRequest-Frontend-cta-frontend.desy.de-2517-20230607-17:06:04-0-35994 RetrieveRequest-Frontend-cta-frontend.desy.de-2517-20230607-17:06:04-0-35997 RetrieveRequest-Frontend-cta-frontend.desy.de-2517-20230607-17:06:04-0-36000 RetrieveRequest-Frontend-cta-frontend.desy.de-2517-20230607-17:06:04-0-36001 RetrieveRequest-Frontend-cta-frontend.desy.de-2517-20230607-17:06:04-0-36004 RetrieveRequest-Frontend-cta-frontend.desy.de-2517-20230607-17:06:04-0-36008 RetrieveRequest-Frontend-cta-frontend.desy.de-2517-20230607-17:06:04-0-36010 RetrieveRequest-Frontend-cta-frontend.desy.de-2517-20230607-17:06:04-0-36014 RetrieveRequest-Frontend-cta-frontend.desy.de-2517-20230607-17:06:04-0-36015 RetrieveRequest-Frontend-cta-frontend.desy.de-2517-20230607-17:06:04-0-36020..." startIndex="1625" endIndex="1649" totalObjects="1666"

and then what follows is a long slew of stack trace messages like:

Jun 13 10:18:26 tpm102 cta-taped: LVL="INFO" PID="21708" TID="21708" MSG="Stack trace" SubprocessName="maintenanceHandler" traceFrameNumber="0" traceFrame="/lib64/libctacommon.so.0(cta::exception::Backtrace::Backtrace(bool)+0x69) [0x7f5de88e5ed9]"
Jun 13 10:18:26 tpm102 cta-taped: LVL="INFO" PID="21708" TID="21708" MSG="Stack trace" SubprocessName="maintenanceHandler" traceFrameNumber="1" traceFrame="/lib64/libctacommon.so.0(cta::exception::Exception::Exception(std::string const&, bool)+0x89) [0x7f5de88e773d]"
Jun 13 10:18:26 tpm102 cta-taped: LVL="INFO" PID="21708" TID="21708" MSG="Stack trace" SubprocessName="maintenanceHandler" traceFrameNumber="2" traceFrame="/lib64/libctaobjectstore.so.0(_ZN3cta11objectstore5Agent21AgentStillOwnsObjectsCI2NS_9exception9ExceptionEERKSsb+0x3e) [0x7f5deac23064]"
Jun 13 10:18:26 tpm102 cta-taped: LVL="INFO" PID="21708" TID="21708" MSG="Stack trace" SubprocessName="maintenanceHandler" traceFrameNumber="3" traceFrame="/lib64/libctaobjectstore.so.0(cta::objectstore::Agent::removeAndUnregisterSelf(cta::log::LogContext&)+0x67b) [0x7f5deac21b11]"
Jun 13 10:18:26 tpm102 cta-taped: LVL="INFO" PID="21708" TID="21708" MSG="Stack trace" SubprocessName="maintenanceHandler" traceFrameNumber="4" traceFrame="/lib64/libctaobjectstore.so.0(cta::objectstore::GarbageCollector::OwnedObjectSorter::lockFetchAndUpdateOtherObjects(cta::objectstore::Agent&, cta::objectstore::AgentReference&, cta::objectstore::Backend&, cta::catalogue::Catalogue&, cta::log::LogContext&)+0x354) [0x7f5dead7543e]"
Jun 13 10:18:26 tpm102 cta-taped: LVL="INFO" PID="21708" TID="21708" MSG="Stack trace" SubprocessName="maintenanceHandler" traceFrameNumber="5" traceFrame="/lib64/libctaobjectstore.so.0(cta::objectstore::GarbageCollector::cleanupDeadAgent(std::string const&, std::list<cta::log::Param, std::allocator<cta::log::Param> >, cta::log::LogContext&)+0x72c) [0x7f5dead6e942]"
...

tape is then unloaded and remounted and retries the process again.

Any clues as to what might be going on here?

thanks,
mwai

Hi @Mwai, can you give a ballpark estimate of how many files many are?
Also, could you please share which version of CTA this is occurring on, and how you are calling the cta-restore-deleted-files command?

Cheers,
Richard

hi @rbachman,

Sorry, by restoring I meant retrieving files from tape. So this is around 5K files. We are using CTA v5.8.7-1. We’ve sort of found the culprit. This happened after we hit the limit of the no. of open file descriptors. We’ve since increased this and hopefully our assumption is right.

Jun 13 10:16:21 tpm102 cta-taped: LVL="INFO" PID="21706" TID="21723" MSG="Stack trace" Message="In BackendVFS::lockHelper(): Failed to open file /cta-cache/ctaObjectStore/.DriveProcess-rz1-2,3,2-tpm102.desy.de-21706-20230613-08:44:24-0.lock: Too many open files" traceFrameNumber="1" traceFrame="/lib64/libctacommon.so.0(cta::exception::Exception::Exception(std::string const&, bool)+0x89) [0x7f5de88e773d]"

best,
mwai

1 Like