CTA aborts archive operations because of drive humidity

Hi again forum,

I noticed that when a drive reports high humidity levels, CTA will refuse to do any archiving operations.

When the drive tries to start an archive session, this first warning appears:

Jun 29 13:55:37.952137 ctatps002 cta-taped: LVL="WARN" PID="9045" TID="5277" MSG="Tape alert detected" thread="TapeWrite" tapeDrive="IBML9541" tapeVid="V03648" mountId="976" vo="backup" mediaType="LTO8" tapePool="dcache" logicalLibrary="cta" mountType="ArchiveForUser" vendor="IBM" capacityInBytes="12000000000000" tapeAlert="Drive humidity" tapeAlertNumber="0" tapeAlertCount="1"

Then it throws this error and aborts the session right away. The requests go back to the queue, and it keeps retrying indefinitely…

Jun 29 13:56:07.616020 ctatps002 cta-taped: LVL="ERROR" PID="9045" TID="5277" MSG="Tape thread complete for writing" thread="TapeWrite" tapeDrive="IBML9541" tapeVid="V03648" mountId="976" ErrorMessage="Aborting write session in presence of critical tape alerts" type="write" mountTime="0.000000" positionTime="0.000000" waitInstructionsTime="0.199517" checksumingTime="0.000000" readWriteTime="0.000000" waitDataTime="0.000000" waitReportingTime="15.037105" flushTime="0.000000" unloadTime="24.191397" unmountTime="5.237440" encryptionControlTime="0.000577" transferTime="15.236622" totalTime="44.698364" dataVolume="0" headerVolume="0" files="0" payloadTransferSpeedMBps="0.000000" driveTransferSpeedMBps="0.000000" status="error"

Interestingly, the behaviour is not the same with retrieval operations. The warning appears more than once in this case, but CTA goes on with the reading operation and the session finishes ok.

Jun 29 14:12:31.659096 ctatps002 cta-taped: LVL="INFO" PID="7376" TID="7812" MSG="Tape mounted for read-only access" thread="TapeRead" tapeDrive="IBML9541" tapeVid="V03646" mountId="979" vo="dteam" mediaType="LTO8" tapePool="dteam1" logicalLibrary="cta" mountType="Retrieve" labelFormat="0000" vendor="IBM" capacityInBytes="12000000000000" drive_Slot="smc0" MCMountTime="7.603698" mode="R"
Jun 29 14:12:38.374887 ctatps002 cta-taped: LVL="WARN" PID="7376" TID="7812" MSG="Tape alert detected" thread="TapeRead" tapeDrive="IBML9541" tapeVid="V03646" mountId="979" vo="dteam" mediaType="LTO8" tapePool="dteam1" logicalLibrary="cta" mountType="Retrieve" labelFormat="0000" vendor="IBM" capacityInBytes="12000000000000" tapeAlert="Drive humidity" tapeAlertNumber="0" tapeAlertCount="1"
Jun 29 14:12:38.375698 ctatps002 cta-taped: LVL="INFO" PID="7376" TID="7812" MSG="Tape mounted and drive ready" thread="TapeRead" tapeDrive="IBML9541" tapeVid="V03646" mountId="979" vo="dteam" mediaType="LTO8" tapePool="dteam1" logicalLibrary="cta" mountType="Retrieve" labelFormat="0000" vendor="IBM" capacityInBytes="12000000000000" mountTime="14.321185" tapeLoadTime="6.714782"
Jun 29 14:14:28.214627 ctatps002 cta-taped: LVL="INFO" PID="7376" TID="7812" MSG="Successfully positioned for reading" thread="TapeRead" tapeDrive="IBML9541" tapeVid="V03646" mountId="979" vo="dteam" mediaType="LTO8" tapePool="dteam1" logicalLibrary="cta" mountType="Retrieve" labelFormat="0000" vendor="IBM" capacityInBytes="12000000000000" fileId="202" BlockId="2990738" fSeq="74" dstURL="root://dc111.pic.es:45156/0000447E631F20154137BBF1BE79AD262F7F?oss.asize=10240000000" isRepack="0" isVerifyOnly="0"
Jun 29 14:15:11.643339 ctatps002 cta-taped: LVL="INFO" PID="7376" TID="7812" MSG="File successfully read from tape" thread="TapeRead" tapeDrive="IBML9541" tapeVid="V03646" mountId="979" vo="dteam" mediaType="LTO8" tapePool="dteam1" logicalLibrary="cta" mountType="Retrieve" labelFormat="0000" vendor="IBM" capacityInBytes="12000000000000" fileId="202" BlockId="2990738" fSeq="74" dstURL="root://dc111.pic.es:45156/0000447E631F20154137BBF1BE79AD262F7F?oss.asize=10240000000" isRepack="0" isVerifyOnly="0" positionTime="104.113019" readWriteTime="43.382487" waitFreeMemoryTime="0.008721" waitReportingTime="0.037012" transferTime="43.428220" totalTime="147.541167" dataVolume="10240000000" headerVolume="480" driveTransferSpeedMBps="69.404361" payloadTransferSpeedMBps="69.404358" LBPMode="LBP_On" repackFilesCount="0" repackBytesCount="0" userFilesCount="1" userBytesCount="10240000000" verifiedFilesCount="0" verifiedBytesCount="0" checksumType="ADLER32" checksumValue="c5530001"
Jun 29 14:15:11.645601 ctatps002 cta-taped: LVL="WARN" PID="7376" TID="7812" MSG="Tape alert detected" thread="TapeRead" tapeDrive="IBML9541" tapeVid="V03646" mountId="979" tapeAlert="Drive humidity" tapeAlertNumber="0" tapeAlertCount="1"
...
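
For anyone who wants to check how often the alert fires per drive, the key="value" fields in these log lines can be pulled out with a short script. This is only a minimal sketch, assuming the log format shown above; the function names are my own and not part of CTA, and the second sample line is a shortened copy of one of the lines above.

```python
import re
from collections import Counter

# Matches key="value" pairs such as tapeDrive="IBML9541".
FIELD_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_fields(line: str) -> dict:
    """Return the key="value" pairs of one cta-taped log line as a dict."""
    return dict(FIELD_RE.findall(line))


def count_tape_alerts(lines) -> Counter:
    """Count "Tape alert detected" messages per (drive, thread, alert)."""
    counts = Counter()
    for line in lines:
        fields = parse_fields(line)
        if fields.get("MSG") == "Tape alert detected":
            key = (
                fields.get("tapeDrive"),
                fields.get("thread"),
                fields.get("tapeAlert"),
            )
            counts[key] += 1
    return counts


# Sample input: one full line and one shortened line from the excerpts above.
sample_lines = [
    'Jun 29 14:15:11.645601 ctatps002 cta-taped: LVL="WARN" PID="7376" '
    'TID="7812" MSG="Tape alert detected" thread="TapeRead" '
    'tapeDrive="IBML9541" tapeVid="V03646" mountId="979" '
    'tapeAlert="Drive humidity" tapeAlertNumber="0" tapeAlertCount="1"',
    'Jun 29 14:12:38.375698 ctatps002 cta-taped: LVL="INFO" PID="7376" '
    'TID="7812" MSG="Tape mounted and drive ready" thread="TapeRead" '
    'tapeDrive="IBML9541" tapeVid="V03646" mountId="979"',
]

for key, count in count_tape_alerts(sample_lines).items():
    print(key, count)
```

Keying on the thread name (TapeWrite or TapeRead) makes it easy to see whether the alert shows up in both archive and retrieve sessions.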

It is true that we’ve been having quite an issue with high humidity these weeks (trust me, the drives aren’t the only ones suffering from it :sweat_smile:), but CTA was archiving successfully until a few days ago. As far as I know we’re still within the operational limits. Does CTA trigger on a certain humidity percentage? I’ve also tried a few different drives just in case, with the same result.

However, I am surprised to learn that CTA refuses to write to tapes while still allowing read operations. Is there an explanation for this?

Also, is there a way to force CTA to go on with the archiving process given this situation? I am using version 5.8.7-1 by the way.

Thank you very much! :slight_smile:
Eli

Dear Eli,

we feel your pain as we also suffer from these issues when humidity in one room is higher than it should be.

CTA refuses to proceed with writing to tape because not all conditions are met (= there is a tape alert). This protects the data: it makes sure the data is written under the recommended conditions so that it is readable afterwards.

This is not the same for reading: CTA reports the tape alert but continues anyway, in the hope that the data can be read.
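
To make the asymmetry concrete, the policy can be sketched roughly as below. This is an illustration only, not CTA's actual code: `decide_session`, `Decision` and the `critical_alerts` argument are hypothetical names, and which alerts count as critical is an assumption passed in by the caller. The mount type strings are the ones that appear in the logs above.

```python
from dataclasses import dataclass

# Mount types that write to tape. "ArchiveForUser" is taken from the logs
# above; treating it as the only write type is an assumption of this sketch.
WRITE_MOUNT_TYPES = {"ArchiveForUser"}


@dataclass(frozen=True)
class Decision:
    proceed: bool
    reason: str


def decide_session(mount_type, alerts, critical_alerts):
    """Hypothetical policy: critical alerts block writes but not reads."""
    blocking = sorted(set(alerts) & set(critical_alerts))
    if not blocking:
        return Decision(True, "no critical tape alerts")
    if mount_type in WRITE_MOUNT_TYPES:
        # Writing under bad conditions risks data that cannot be read back.
        return Decision(False, "abort write: " + ", ".join(blocking))
    # Reading does not modify the tape, so the alert is only reported.
    return Decision(True, "alert logged, read continues: " + ", ".join(blocking))


critical = {"Drive humidity"}  # assumption for this example
print(decide_session("ArchiveForUser", ["Drive humidity"], critical))
print(decide_session("Retrieve", ["Drive humidity"], critical))
```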

As I said above, we discussed in the past whether to change this behavior. In the end we decided to leave it as it is implemented now, because our main priority is to make sure the data on tape is safe.

This primarily affects LTO-9 and should go away with LTO-10.

As for the actual values, what you believe the humidity to be is one thing; what the sensors inside the drive report is another. We have a small script (it will be made officially public at some point) that I can share with you. It should give you the values as seen by the drive.

Lastly, there is no way to force CTA to write. You only have two options:

  • find a drive that doesn’t see high humidity (drives at the bottom of a library are usually a bit colder than those at the top, so the bottom ones see more humidity than the top ones)
  • fix the humidity problems in your room (which is what we are working on - for example increasing room temperature from 22 to 23 Celsius)
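
Both options rest on the same effect: for the same amount of water vapour in the air, relative humidity drops as the temperature rises. A rough sketch of the arithmetic, using the Magnus approximation for saturation vapour pressure and assuming the moisture content of the air stays constant. The 60% starting value is purely illustrative, not a measurement, and the function names are my own.

```python
import math


def saturation_vapor_pressure_hpa(temp_c: float) -> float:
    """Magnus approximation of saturation vapour pressure over water, in hPa."""
    return 6.112 * math.exp(17.62 * temp_c / (243.12 + temp_c))


def relative_humidity_after_warming(rh_percent, temp_from_c, temp_to_c):
    """Relative humidity after a temperature change.

    Assumes the actual water vapour pressure is unchanged, so the relative
    humidity scales with the ratio of the saturation pressures.
    """
    ratio = (saturation_vapor_pressure_hpa(temp_from_c)
             / saturation_vapor_pressure_hpa(temp_to_c))
    return rh_percent * ratio


# Illustrative example: air at 60% relative humidity warmed from 22 to 23 C.
print(round(relative_humidity_after_warming(60.0, 22.0, 23.0), 1))
```

Under these assumptions, one degree of warming lowers the relative humidity by roughly 6% of its value, which can be enough when a drive is sitting just above its alert threshold.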

Let me know if you need more details. Best regards,

Vladimir


Thank you very much again. We are currently gathering the humidity level for each drive from the IBM API. Nonetheless, I’d be glad if you could share your script too. I’ll send you my address.

Cheers,
Eli