Cta-admin dr up drive fails

Hi

Having problems setting a drive up although tape slot is empty:

Feb  1 12:57:35 tpm103 cta-taped: LVL="INFO" PID="28682" TID="28682" MSG="In Scheduler::setDesiredDriveState(): success." thread="MainThread" tapeDrive="6,2,4" drive="6,2,4" up="down" force="no" reason="[cta-taped] ERROR A tape was detected in the drive. Putting the drive down." comment="" schedulerDbTime="0.001155"
Feb  1 12:57:35 tpm103 cta-taped: LVL="ERROR" PID="28682" TID="28682" MSG="A tape was detected in the drive. Putting the drive down." thread="MainThread" tapeDrive="6,2,4"

concerned drive:

[karimimw@tpm103 ~]# cta-smc -q D -D 1
Drive Ordinal	Element Addr.	  Status     Vid
            1	          258	    free

Have you come across this and how can we go about it?

Thanks,
Mwai

Hello Mwai,

in situations like this mt -f /dev/nst0 stat is your friend.

Please have a look whether there is a stuck tape in the drive.

Example of empty drive:

[root@tpsrv045 ~]# mt -f /dev/nst0 stat
SCSI 2 tape drive:
File number=-1, block number=-1, partition=0.
Tape block size 0 bytes. Density code 0x0 (default).
Soft error count since last status=0
General status bits on (58000):
 DR_OPEN IM_REP_EN CLN

If you do not see DR_OPEN then there is high probability that the drive is not free.

In vary rare occasion, the library will tell you that the slot is free, yet the drive sees something. In that case, please check the library GUI. There might be a cleaning cartridge inside (if the previous request had a problem or the tape was mounted for a very long time).

If in doubt, reset the drive, it should unload any cartridge it has inside.

Hope this helps.

Vladimir Bahyl

1 Like

Hi Vladimir,

As a follow up, there are situations where a tape is mounted but nothing takes place because rmcd complains that the same vol is in use

thread="TapeWrite" tapeDrive="6,2,4" tapeVid="J01397" mountId="48756" ErrorMessage="Failed to mount tape for read/write access: vid=J01397 slot=smc1: Failed to mount tape in SCSI tape-library for read/write access: vid=J01397 librarySlot=smc1: Received error from rmcd: rmcRc=2204 rmcErrorStream=smc_mount: Asked for J01397, got reply for J01397  smc_mount: Location: data transfer element (0x4)  smc_mount: SR018 - mount of J01397 on drive 1 failed : volume in use"

this then becomes a mount/dismount/cleanup loop

 MSG="WARNING: In DriveHandler::processUnmounting(): unexpected previous state/type." tapeDrive="6,2,4" PreviousState="Mounting" PreviousType="Archive" NewState="Unmounting" NewType="Archive"

and eventually leads to

thread="TapeWrite" tapeDrive="6,2,4" tapeVid="J01397" mountId="48755" ErrorMessage="Failed to test unit ready after waiting 60 seconds: SCSI error in testUnitReady: status=0x2 host_status=0 driver_status=0x8: SCSI command failed with status CHECK CONDITION: Sense Information: Not Ready: Medium not present"

then the loop continues till the drive is down. Strangely this seems to happen on a single drive. Any recommendations?

Thank you.

Mwai

Mwai,

could you please expand on “this then becomes a mount/dismount/cleanup loop” Can you please provide more logs?

Also, why is the tape stuck somewhere else in the first place?

We have various monitoring that detects when a drive is put DOWN by cta-taped and if there is a tape stuck there, that tape is DISABLED so it is not requested elsewhere. We could share that piece of Perl code if necessary (but it is highly CERN specific as it also opens a ticket to handle this situation).

Vladimir

By the loop, I meant that the tape will be on mount status then get dismounted then mounted again and will be on the ‘Cleanup’ state then repeat.

LVL="DEBUG" PID="3708" TID="3708" MSG="WARNING: In DriveHandler::processMounting(): unexpected previous state/type." tapeDrive="6,2,4" PreviousState="PendingFork" PreviousType="Undetermined" NewState="Mounting" NewType="Archive"
LVL="DEBUG" PID="3708" TID="3708" MSG="Computed new timeout" tapeDrive="6,2,4" PreviousState="PendingFork" PreviousType="Undetermined" NewState="Mounting" NewType="Archive" TimeoutType="StateChange" LastStateChangeTime="2680121" LastHeartBeatTime="-9223372036" LastDataMovementTime="-9223372036" Now="2680121" Timeout="2680721"
LVL="DEBUG" PID="3708" TID="3708" MSG="WARNING: In DriveHandler::processUnmounting(): unexpected previous state/type." tapeDrive="6,2,4" PreviousState="Mounting" PreviousType="Archive" NewState="Unmounting" NewType="Archive"
LVL="INFO" PID="3710" TID="11413" MSG="Opened disk file for read" thread="DiskRead" tapeDrive="6,2,4" tapeVid="J01397" mountId="48755" threadID="9" path="root://131.169.80.62:1094/0000C76BA411EE234CA5BECFB75C01297C8E" actualURL="root://131.169.80.62:1094/0000C76BA411EE234CA5BECFB75C01297C8E" fileId="6029"
LVL="INFO" PID="3710" TID="11402" MSG="Opened disk file for read" thread="DiskRead" tapeDrive="6,2,4" tapeVid="J01397" mountId="48755" threadID="0" path="root://131.169.80.62:1094/000061D09267BEB047269A4DE12C0AE0E977" actualURL="root://131.169.80.62:1094/000061D09267BEB047269A4DE12C0AE0E977" fileId="6030"
LVL="INFO" PID="3710" TID="11415" MSG="Tape mounted for read/write access" thread="TapeWrite" tapeDrive="6,2,4" tapeVid="J01397" mountId="48755" vo="desy" mediaType="3592JE20T" tapePool="test-01" logicalLibrary="ctajaglib" mountType="ArchiveForUser" vendor="IBM" capacityInBytes="20000000000000" drive_Slot="smc1" MCMountTime="3.852839" mode="RW"
LVL="DEBUG" PID="3710" TID="11414" MSG="going to report" thread="Watchdog" tapeDrive="6,2,4" tapeVid="J01397" mountId="48755"
LVL="DEBUG" PID="3708" TID="3708" MSG="Computed new timeout" tapeDrive="6,2,4" TimeoutType="StateChange" LastStateChangeTime="2680121" LastHeartBeatTime="2680136" LastDataMovementTime="-9223372036" Now="2680136" Timeout="2680721"
LVL="DEBUG" PID="3710" TID="11414" MSG="going to report" thread="Watchdog" tapeDrive="6,2,4" tapeVid="J01397" mountId="48755"
LVL="DEBUG" PID="3708" TID="3708" MSG="Computed new timeout" tapeDrive="6,2,4" TimeoutType="StateChange" LastStateChangeTime="2680121" LastHeartBeatTime="2680151" LastDataMovementTime="-9223372036" Now="2680151" Timeout="2680721"
LVL="INFO" PID="3710" TID="11403" MSG="File successfully read from disk" thread="DiskRead" tapeDrive="6,2,4" tapeVid="J01397" mountId="48755" threadID="1" path="root://131.169.80.62:1094/00006B6CE69B2B8040C080FC934F29973BCC" actualURL="root://131.169.80.62:1094/00006B6CE69B2B8040C080FC934F29973BCC" fileId="6021" readWriteTime="33.041737" checksumingTime="0.000000" waitFreeMemoryTime="0.013510" waitDataTime="0.000000" waitReportingTime="0.000000" checkingErrorTime="0.001958" openingTime="0.003474" transferTime="33.060682" totalTime="33.060682" dataVolume="22548578304" globalPayloadTransferSpeedMBps="682.036091" diskPerformanceMBps="682.036091" openRWCloseToTransferTimeRatio="0.999532"
LVL="INFO" PID="3710" TID="11403" MSG="Finishing of DiskReadWorkerThread" thread="DiskRead" tapeDrive="6,2,4" tapeVid="J01397" mountId="48755" threadID="1" threadReadWriteTime="33.041737" threadWaitFreeMemoryTime="0.013510" threadCheckingErrorTime="0.001958" threadOpeningTime="0.003474" threadTransferTime="33.060682" threadTotalTime="33.061327" threadDataVolume="22548578304" threadFileCount="1" threadGlobalPayloadTransferSpeedMBps="682.022785" threadAverageDiskPerformanceMBps="682.036091" threadOpenRWCloseToTransferTimeRatio="0.999532"
LVL="DEBUG" PID="3710" TID="11403" MSG="Will not signal the end to task injector yet" thread="DiskRead" tapeDrive="6,2,4" tapeVid="J01397" mountId="48755" threadID="1" remainingThreads="9"
LVL="DEBUG" PID="3710" TID="11414" MSG="going to report" thread="Watchdog" tapeDrive="6,2,4" tapeVid="J01397" mountId="48755"
LVL="DEBUG" PID="3708" TID="3708" MSG="Computed new timeout" tapeDrive="6,2,4" TimeoutType="StateChange" LastStateChangeTime="2680121" LastHeartBeatTime="2680166" LastDataMovementTime="-9223372036" Now="2680166" Timeout="2680721"
LVL="DEBUG" PID="3710" TID="11414" MSG="going to report" thread="Watchdog" tapeDrive="6,2,4" tapeVid="J01397" mountId="48755"
LVL="DEBUG" PID="3708" TID="3708" MSG="Computed new timeout" tapeDrive="6,2,4" TimeoutType="StateChange" LastStateChangeTime="2680121" LastHeartBeatTime="2680181" LastDataMovementTime="-9223372036" Now="2680181" Timeout="2680721"
LVL="ERROR" PID="3710" TID="11415" MSG="Got timeout or error while waiting for drive to be ready." thread="TapeWrite" tapeDrive="6,2,4" tapeVid="J01397" mountId="48755" vo="desy" mediaType="3592JE20T" tapePool="test-01" logicalLibrary="ctajaglib" mountType="ArchiveForUser" vendor="IBM" capacityInBytes="20000000000000" exceptionMessage="Failed to test unit ready after waiting 60 seconds: SCSI error in testUnitReady: status=0x2 host_status=0 driver_status=0x8: SCSI command failed with status CHECK CONDITION: Sense Information: Not Ready: Medium not present" configuredTapeLoadTimeout="60" tapeLoadTime="60.052331"
LVL="DEBUG" PID="3710" TID="11415" MSG="In MigrationReportPacker::reportDriveStatus(), pushing a report." thread="TapeWrite" tapeDrive="6,2,4" tapeVid="J01397" mountId="48755" type="ReportDriveStatus" Status="CleanUp"
LVL="DEBUG" PID="3710" TID="11417" MSG="In MigrationReportPacker::WorkerThread::run(): Got a new report." thread="ReportPacker" tapeDrive="6,2,4" tapeVid="J01397" mountId="48755" typeId="castor::tape::tapeserver::daemon::MigrationReportPacker::ReportDriveStatus"
LVL="DEBUG" PID="3710" TID="11417" MSG="In MigrationReportPacker::ReportDriveStatus::execute(): reporting drive status." thread="MainThread" tapeDrive="6,2,4" tapeVid="J01397" mountId="48755" status="CleanUp"

longer version of the same here

in this particular case, only this drive was up so the tape wasn’t requested anywhere else.

02/04 11:04:06 35855 rmc_srv_mount: RMC92 - mount request by 1000,33 from localhost
02/04 11:04:06 35855 rmc_srv_mount: RMC98 - mount J01397/0 on drive 1
02/04 11:04:10 35855 rmc_srv_mount: returns 0
02/04 11:06:23 35855 rmc_srv_mount: RMC92 - mount request by 1000,33 from localhost
02/04 11:06:23 35855 rmc_srv_mount: RMC98 - mount J01397/0 on drive 1
02/04 11:06:24 35855 rmc_sendrep: smc_mount: Asked for J01397, got reply for J01397
02/04 11:06:24 35855 rmc_sendrep: smc_mount: Location: data transfer element (0x4)
02/04 11:06:24 35855 rmc_sendrep: smc_mount: SR018 - mount of J01397 on drive 1 failed : volume in use
02/04 11:06:24 35855 rmc_srv_mount: returns 2204

mwai

Mwai,

thank you for the detailed logs. Lets concentrate on these 2 lines:

This line says that the tape has been mounted:

Feb  4 11:04:10 tpm103 cta-taped: LVL="INFO" PID="3710" TID="11415" MSG="Tape mounted for read/write access" thread="TapeWrite" tapeDrive="6,2,4" tapeVid="J01397" mountId="48755" vo="desy" mediaType="3592JE20T" tapePool="test-01" logicalLibrary="ctajaglib" mountType="ArchiveForUser" vendor="IBM" capacityInBytes="20000000000000" drive_Slot="smc1" MCMountTime="3.852839" mode="RW"

but a minute later, you have a timeout - tape is not in the drive:

Feb  4 11:05:10 tpm103 cta-taped: LVL="ERROR" PID="3710" TID="11415" MSG="Got timeout or error while waiting for drive to be ready." thread="TapeWrite" tapeDrive="6,2,4" tapeVid="J01397" mountId="48755" vo="desy" mediaType="3592JE20T" tapePool="test-01" logicalLibrary="ctajaglib" mountType="ArchiveForUser" vendor="IBM" capacityInBytes="20000000000000" exceptionMessage="Failed to test unit ready after waiting 60 seconds: SCSI error in testUnitReady: status=0x2 host_status=0 driver_status=0x8: SCSI command failed with status CHECK CONDITION: Sense Information: Not Ready: Medium not present" configuredTapeLoadTimeout="60" tapeLoadTime="60.052331"

Did you check using the library GUI in which drive the tape J01397 is actually mounted?

What was the output of cta-smc -q D in that moment?

Cheers,

Vladimir

Wasn’t quick to the GUI but output of cta-smc -q D for both tapes (1387 & 1397) was the same

[root@tpm103 ~]# cta-smc -q D
Drive Ordinal	Element Addr.	  Status     Vid
            0	          257	    free
            1	          258	  loaded  J01397
            2	          259	    free
            3	          260	    free

thanks

And mt -f /dev/nst0 status command executed on tpm103 sees that tape as well?

Basically do this:

1. make sure the drive is empty
2. cta-smc -m -D 1 -V J01397
3. mt -f /dev/nst0 status
4. dd if=/dev/nst0 bs=80 count=3

/dev/nst0 status

[fh-it@tpm103 ~]# mt -f /dev/nst0 status
SCSI 2 tape drive:
File number=0, block number=0, partition=0.
Tape block size 0 bytes. Density code 0x57 (no translation).
Soft error count since last status=0
General status bits on (41010000):
 BOT ONLINE IM_REP_EN

Couldn’t work with bs=80

[fh-it@tpm103 ~]# dd if=/dev/nst0 bs=256 count=3
HDR1983              J0139700010001000100022019022019 000000CASTOR 2.1.15       BLdd: warning: partial read (84 bytes); suggest iflag=fullblock
HDR2F0000000000                   P               00                            ��kmUHL1000000000100002621440000262144DESY    TPM103    IBM     0359260F00000783831A�T0+3 records in
0+3 records out
252 bytes (252 B) copied, 0.00116865 s, 216 kB/s

thanks!

Strange, I have no immediate suggestions. The tape can be mounted in that drive and the drive can read it.

Did this ever work? Is this your first attempt to make your cta-taped work?

Try to dismount it, (mt -f /dev/nst0 eject ; cta-smc -d -D 1), make sure that all drives are free (cta-smc -q D) and then re-try again the CTA command.

If you start again with whole library being empty and you get again volume in use in the RMCD logs, then next option for me would be to connect to your system to debug?

Vladimir

Hi Vlado,

Actually we’ve been running cta-taped for a while now and so far this behaviour has only been observed on 2 drives.

Feb  7 15:33:48 tpm103 cta-taped: LVL="DEBUG" PID="31382" TID="21075" MSG="In MigrationReportPacker::reportEndOfSessionWithErrors(), pushing a report." thread="TapeWrite" tapeDrive="6,2,3" tapeVid="J01387" mountId="7" ErrorMessage="Failed to test unit ready after waiting 60 seconds: SCSI error in testUnitReady: status=0x2 host_status=0 driver_status=0x8: SCSI command failed with status CHECK CONDITION: Sense Information: Not Ready: Medium not present"
Feb  7 15:35:02 tpm103 cta-taped: LVL="INFO" PID="21512" TID="21571" MSG="Tape thread complete" thread="TapeWrite" tapeDrive="6,2,3" tapeVid="J01387" mountId="8" ErrorMessage="Failed to mount tape for read/write access: vid=J01387 slot=smc0: Failed to mount tape in SCSI tape-library for read/write access: vid=J01387 librarySlot=smc0: Received error from rmcd: rmcRc=2204 rmcErrorStream=smc_mount: Asked for J01387, got reply for J01387  smc_mount: Location: data transfer element (0x4)  smc_mount: SR018 - mount of J01387 on drive 0 failed : volume in use"

perhaps it’s an issue w/ the drives themselves…
I will get back on the access to the host.

cheers,
mwai

Hi Vlado,

It seems to be a herculean task to get you access to the node. Would you be okay with a quick zoom session?

best,
mwai

Mwai,

can you please contact me on my office e-mail and we then agree on the meeting date / time?

Thanks.

Vladimir

To conclude this thread, following a video call, this issue has now been resolved.

Vladimir