Segfaults and slow performance

Hello Friends,

Sorry in advance for the lack of a coherent problem description; we are in the early stages of investigating some issues in our CTA deployment.

The issues started rather suddenly, and we cannot identify any change on our side that might have caused them.

Anyway, the symptoms are twofold: poor eoscta fst performance, and segfaults in the cta frontend and tapeserver.

The segfaults happen at random, with nothing particularly interesting going on in the system at the time. They look like this:

#server 1:
Jan  9 02:16:23 crlt-v3 kernel: xrootd[248834]: segfault at 7fc9fb400000 ip 00007fca1ff9b93f sp 00007fc9d8bbfa38 error 4 in libc-2.17.so[7fca1fe4e000+1c4000]
Jan 26 01:58:12 crlt-v3 kernel: cta-tpd-maint[231025]: segfault at 0 ip 00007f7b986a553b sp 00007f7af87f1630 error 6 in libctacommon.so.0.1.0[7f7b9856b000+196000]
Jan 26 01:58:13 crlt-v3 kernel: cta-tpd-maint[232740]: segfault at 0 ip 00007f119cd3a53b sp 00007f1100bf1630 error 6 in libctacommon.so.0.1.0[7f119cc00000+196000]
Jan 26 01:58:15 crlt-v3 kernel: cta-tpd-maint[236329]: segfault at 0 ip 00007fe673f9053b sp 00007fe5d5bf1630 error 6 in libctacommon.so.0.1.0[7fe673e56000+196000]
Jan 26 02:35:43 crlt-v3 kernel: cta-tpd-maint[237041]: segfault at 0 ip 00007fe673f9053b sp 00007fe5d5ff1630 error 6 in libctacommon.so.0.1.0[7fe673e56000+196000]
Jan 26 02:48:42 crlt-v3 kernel: cta-tpd-maint[254054]: segfault at 0 ip 00007f119cd3a53b sp 00007f10fd3f1630 error 6 in libctacommon.so.0.1.0[7f119cc00000+196000]
Jan 26 02:49:03 crlt-v3 kernel: cta-tpd-DR1[225561]: segfault at 0 ip 00007f119cd3a53b sp 00007f10fcbf1630 error 6 in libctacommon.so.0.1.0[7f119cc00000+196000]
Jan 26 03:14:53 crlt-v3 kernel: cta-tpd-DR0[117978]: segfault at 0 ip 00007f7b986a553b sp 00007f7afe7f1630 error 6 in libctacommon.so.0.1.0[7f7b9856b000+196000]
Jan 26 03:17:24 crlt-v3 kernel: cta-tpd-maint[164551]: segfault at 0 ip 00007fe673f9053b sp 00007fe5d5bf1630 error 6 in libctacommon.so.0.1.0[7fe673e56000+196000]
Jan 26 03:20:37 crlt-v3 kernel: cta-tpd-DR0[188431]: segfault at 0 ip 00007f7b986a553b sp 00007f7af8ff1630 error 6 in libctacommon.so.0.1.0[7f7b9856b000+196000]
Jan 26 03:26:52 crlt-v3 kernel: cta-tpd-maint[86470]: segfault at 0 ip 00007f119cd3a53b sp 00007f1100ff1630 error 6 in libctacommon.so.0.1.0[7f119cc00000+196000]
Jan 26 03:33:23 crlt-v3 kernel: cta-tpd-maint[104569]: segfault at 0 ip 00007f119cd3a53b sp 00007f1101ff1630 error 6 in libctacommon.so.0.1.0[7f119cc00000+196000]
Jan 26 03:37:26 crlt-v3 kernel: cta-tpd-DR2[123636]: segfault at 0 ip 00007fe673f9053b sp 00007fe5d93f1630 error 6 in libctacommon.so.0.1.0[7fe673e56000+196000]
Jan 26 03:42:27 crlt-v3 kernel: cta-tpd-maint[210534]: segfault at 0 ip 00007fe673f9053b sp 00007fe5d47f1630 error 6 in libctacommon.so.0.1.0[7fe673e56000+196000]
Jan 26 03:44:46 crlt-v3 kernel: cta-tpd-maint[217584]: segfault at 0 ip 00007f119cd3a53b sp 00007f10ff7f1630 error 6 in libctacommon.so.0.1.0[7f119cc00000+196000]
Jan 26 03:47:32 crlt-v3 kernel: cta-tpd-DR2[5473]: segfault at 0 ip 00007fe673f9053b sp 00007fe5d8bf1630 error 6 in libctacommon.so.0.1.0[7fe673e56000+196000]
Jan 26 04:00:38 crlt-v3 kernel: cta-tpd-maint[94533]: segfault at 0 ip 00007f119cd3a53b sp 00007f11013f1630 error 6 in libctacommon.so.0.1.0[7f119cc00000+196000]
Jan 26 04:00:45 crlt-v3 kernel: cta-tpd-DR1[233546]: segfault at 0 ip 00007f119cd3a53b sp 00007f10ffff12d0 error 6 in libctacommon.so.0.1.0[7f119cc00000+196000]
Jan 26 04:03:17 crlt-v3 kernel: cta-tpd-DR2[225291]: segfault at 0 ip 00007fe673f9053b sp 00007fe5d77f1630 error 6 in libctacommon.so.0.1.0[7fe673e56000+196000]
Jan 26 04:06:51 crlt-v3 kernel: cta-tpd-DR1[120425]: segfault at 0 ip 00007f119cd3a53b sp 00007f10fc7f1630 error 6 in libctacommon.so.0.1.0[7f119cc00000+196000]
Jan 26 04:08:33 crlt-v3 kernel: cta-tpd-maint[235093]: segfault at 0 ip 00007f7b986a553b sp 00007f7afbff1630 error 6 in libctacommon.so.0.1.0[7f7b9856b000+196000]
Jan 26 04:11:21 crlt-v3 kernel: cta-tpd-DR2[136681]: segfault at 0 ip 00007fe673f9053b sp 00007fe5d93f1630 error 6 in libctacommon.so.0.1.0[7fe673e56000+196000]
Jan 26 04:11:46 crlt-v3 kernel: cta-tpd-maint[98180]: segfault at 0 ip 00007f119cd3a53b sp 00007f11033f1630 error 6 in libctacommon.so.0.1.0[7f119cc00000+196000]
Jan 26 04:15:05 crlt-v3 kernel: cta-tpd-maint[7409]: segfault at 0 ip 00007f119cd3a53b sp 00007f10ffbf1630 error 6 in libctacommon.so.0.1.0[7f119cc00000+196000]
Jan 26 04:15:57 crlt-v3 kernel: cta-tpd-DR1[207780]: segfault at 0 ip 00007f119cd3a53b sp 00007f10ffff1630 error 6 in libctacommon.so.0.1.0[7f119cc00000+196000]
Jan 26 04:17:35 crlt-v3 kernel: cta-tpd-maint[218566]: segfault at 0 ip 00007f7b986a553b sp 00007f7af9bf1630 error 6 in libctacommon.so.0.1.0[7f7b9856b000+196000]
Jan 26 04:17:48 crlt-v3 kernel: cta-tpd-maint[81494]: segfault at 0 ip 00007fe673f9053b sp 00007fe5d5bf1630 error 6 in libctacommon.so.0.1.0[7fe673e56000+196000]
Jan 26 04:21:24 crlt-v3 kernel: cta-tpd-DR1[65581]: segfault at 0 ip 00007f119cd3a53b sp 00007f10fe3f1630 error 6 in libctacommon.so.0.1.0[7f119cc00000+196000]
Jan 26 04:26:12 crlt-v3 kernel: cta-tpd-DR2[111597]: segfault at 0 ip 00007fe673f9053b sp 00007fe5d6bf1630 error 6 in libctacommon.so.0.1.0[7fe673e56000+196000]
Jan 26 04:35:26 crlt-v3 kernel: cta-tpd-maint[102349]: segfault at 0 ip 00007fe673f9053b sp 00007fe5d83f1630 error 6 in libctacommon.so.0.1.0[7fe673e56000+196000]
Jan 26 04:38:47 crlt-v3 kernel: cta-tpd-maint[86368]: segfault at 0 ip 00007f7b986a553b sp 00007f7afb3f1630 error 6 in libctacommon.so.0.1.0[7f7b9856b000+196000]
Jan 26 04:43:25 crlt-v3 kernel: cta-tpd-DR2[5071]: segfault at 0 ip 00007fe673f9053b sp 00007fe5d73f1630 error 6 in libctacommon.so.0.1.0[7fe673e56000+196000]
Jan 26 04:47:45 crlt-v3 kernel: cta-tpd-maint[171835]: segfault at 0 ip 00007f7b986a553b sp 00007f7afbff1630 error 6 in libctacommon.so.0.1.0[7f7b9856b000+196000]
Jan 26 04:54:50 crlt-v3 kernel: cta-tpd-DR0[212422]: segfault at 0 ip 00007f7b986a553b sp 00007f7afb3f1630 error 6 in libctacommon.so.0.1.0[7f7b9856b000+196000]
Jan 26 04:58:37 crlt-v3 kernel: cta-tpd-maint[55892]: segfault at 0 ip 00007f119cd3a53b sp 00007f10feff1630 error 6 in libctacommon.so.0.1.0[7f119cc00000+196000]
Jan 26 04:59:30 crlt-v3 kernel: cta-tpd-DR2[236495]: segfault at 0 ip 00007fe673f9053b sp 00007fe5d77f1630 error 6 in libctacommon.so.0.1.0[7fe673e56000+196000]
Jan 26 05:12:19 crlt-v3 kernel: cta-tpd-maint[40695]: segfault at 0 ip 00007f7b986a553b sp 00007f7afbff1630 error 6 in libctacommon.so.0.1.0[7f7b9856b000+196000]
Jan 26 05:29:16 crlt-v3 kernel: cta-tpd-maint[117361]: segfault at 0 ip 00007fe673f9053b sp 00007fe5d77f1630 error 6 in libctacommon.so.0.1.0[7fe673e56000+196000]
Jan 26 05:31:05 crlt-v3 kernel: cta-tpd-maint[200317]: segfault at 0 ip 00007f7b986a553b sp 00007f7afbbf1630 error 6 in libctacommon.so.0.1.0[7f7b9856b000+196000]
Jan 26 05:31:10 crlt-v3 kernel: cta-tpd-maint[219413]: segfault at 0 ip 00007f119cd3a53b sp 00007f10ffbf1630 error 6 in libctacommon.so.0.1.0[7f119cc00000+196000]
Jan 26 05:36:20 crlt-v3 kernel: cta-tpd-maint[257062]: segfault at 0 ip 00007fe673f9053b sp 00007fe5d7ff1630 error 6 in libctacommon.so.0.1.0[7fe673e56000+196000]
Jan 26 05:40:56 crlt-v3 kernel: cta-tpd-DR1[55600]: segfault at 0 ip 00007f119cd3a53b sp 00007f11037f1630 error 6 in libctacommon.so.0.1.0[7f119cc00000+196000]
Jan 27 01:20:37 crlt-v3 kernel: cta-tpd-maint[107586]: segfault at 0 ip 00007fe673f9053b sp 00007fe5d83f1630 error 6 in libctacommon.so.0.1.0[7fe673e56000+196000]
Jan 27 08:14:23 crlt-v3 kernel: xrootd[251027]: segfault at 10 ip 00007f147f403491 sp 00007f1476bfd630 error 4 in libXrdCl.so.2.0.0[7f147f386000+120000]
Jan 28 02:00:00 crlt-v3 kernel: cta-tpd-maint[259308]: segfault at 0 ip 00007f6b216f653b sp 00007f6a83bf1630 error 6 in libctacommon.so.0.1.0[7f6b215bc000+196000]
Jan 28 02:38:21 crlt-v3 kernel: cta-tpd-maint[256587]: segfault at 0 ip 00007fbe24f8753b sp 00007fbd89bf1630 error 6 in libctacommon.so.0.1.0[7fbe24e4d000+196000]
Jan 28 03:35:41 crlt-v3 kernel: cta-tpd-maint[47941]: segfault at 0 ip 00007f6b216f653b sp 00007f6a85bf1630 error 6 in libctacommon.so.0.1.0[7f6b215bc000+196000]
Feb  1 01:59:34 crlt-v3 kernel: cta-tpd-maint[261062]: segfault at 0 ip 00007fbca7cb853b sp 00007fbc09ff1630 error 6 in libctacommon.so.0.1.0[7fbca7b7e000+196000]
Feb  1 01:59:34 crlt-v3 kernel: cta-tpd-maint[182998]: segfault at 0 ip 00007fbe24f8753b sp 00007fbd87bf1630 error 6 in libctacommon.so.0.1.0[7fbe24e4d000+196000]
Feb  1 02:10:20 crlt-v3 kernel: eos[201201]: segfault at 10 ip 00007fade47d4d00 sp 00007fff698e14c8 error 4
Feb  1 03:07:59 crlt-v3 kernel: xrootd[42728]: segfault at fffffffffffffff0 ip 00007efef0ff6356 sp 00007efec97fb520 error 5
Feb  1 03:07:59 crlt-v3 kernel: xrootd[42729]: segfault at fffffffffffffff7 ip 00007efef0ff6356 sp 00007efeca7fd520 error 5 in libstdc++.so.6.0.19[7efef0f99000+e9000]
Feb  1 03:11:44 crlt-v3 kernel: eos[42309]: segfault at 10 ip 00007f487d1f0d00 sp 00007fff4caba988 error 4 in libpthread-2.17.so[7f487d1e7000+17000]
Feb  1 03:51:49 crlt-v3 kernel: xrootd[92361]: segfault at fffffffffffffff0 ip 00007f3c167bd356 sp 00007f3bef3fb520 error 5 in libstdc++.so.6.0.19[7f3c16760000+e9000]
Feb  1 03:51:49 crlt-v3 kernel: xrootd[92370]: segfault at fffffffffffffff0 ip 00007f3c167bd356 sp 00007f3befbfc520 error 5
Feb  1 04:08:49 crlt-v3 kernel: cta-tpd-maint[196329]: segfault at 0 ip 00007fbc9f70b53b sp 00007fbc03bf1630 error 6 in libctacommon.so.0.1.0[7fbc9f5d1000+196000]
Feb  1 04:37:32 crlt-v3 kernel: cta-tpd-maint[138882]: segfault at 0 ip 00007fbc9f70b53b sp 00007fbc027f1630 error 6 in libctacommon.so.0.1.0[7fbc9f5d1000+196000]
Feb  1 04:43:12 crlt-v3 kernel: cta-tpd-maint[172672]: segfault at 0 ip 00007fbc9f70b53b sp 00007fbc05bf1630 error 6 in libctacommon.so.0.1.0[7fbc9f5d1000+196000]
Feb  1 04:53:46 crlt-v3 kernel: cta-tpd-maint[260073]: segfault at 0 ip 00007fbc9f70b53b sp 00007fbc01bf1630 error 6 in libctacommon.so.0.1.0[7fbc9f5d1000+196000]
Feb  2 02:53:51 crlt-v3 kernel: xrootd[159499]: segfault at 7f8169400000 ip 00007f81925e9943 sp 00007f7f64bff8f8 error 4 in libc-2.17.so[7f819249c000+1c4000]


#server 2:
Dec 12 23:25:33 crlt-v4 kernel: cta-admin[227474]: segfault at 8 ip 00007f4f60c1c14b sp 00007f4f59f1c938 error 6 in libpthread-2.17.so[7f4f60c11000+17000]
Dec 21 08:58:02 crlt-v4 kernel: cta-admin[230964]: segfault at 0 ip 00007fbe346c03b1 sp 00007fbe2dad78d0 error 4 in libXrdCl.so.2.0.0[7fbe34686000+120000]
Jan 12 17:44:02 crlt-v4 kernel: cta-admin[92742]: segfault at 8 ip 00007fcdecbac14b sp 00007fcde56ab938 error 6 in libpthread-2.17.so[7fcdecba1000+17000]
Jan 26 01:58:05 crlt-v4 kernel: cta-tpd-maint[72203]: segfault at 0 ip 00007fdcdc1c453b sp 00007fdc3f7f1630 error 6 in libctacommon.so.0.1.0[7fdcdc08a000+196000]
Jan 26 01:58:12 crlt-v4 kernel: cta-tpd-maint[205395]: segfault at 0 ip 00007f77b8d6953b sp 00007f7719ff1630 error 6 in libctacommon.so.0.1.0[7f77b8c2f000+196000]
Jan 26 02:15:51 crlt-v4 kernel: xrootd[217461]: segfault at 0 ip 00007f12ab3f653b sp 00007f1213ffec30 error 6 in libctacommon.so.0.1.0[7f12ab2bc000+196000]
Jan 26 02:30:38 crlt-v4 kernel: cta-tpd-maint[43302]: segfault at 0 ip 00007f4b64bec53b sp 00007f4ac53f1630 error 6 in libctacommon.so.0.1.0[7f4b64ab2000+196000]
Jan 26 03:25:41 crlt-v4 kernel: cta-tpd-DR3[88104]: segfault at 0 ip 00007f15f05f253b sp 00007f1551bf1630 error 6 in libctacommon.so.0.1.0[7f15f04b8000+196000]
Jan 26 03:42:17 crlt-v4 kernel: cta-tpd-maint[72374]: segfault at 0 ip 00007f15f05f253b sp 00007f15527f1630 error 6 in libctacommon.so.0.1.0[7f15f04b8000+196000]
Jan 26 03:43:33 crlt-v4 kernel: cta-tpd-DR4[219272]: segfault at 0 ip 00007f77b8d6953b sp 00007f77183f1630 error 6 in libctacommon.so.0.1.0[7f77b8c2f000+196000]
Jan 26 03:44:47 crlt-v4 kernel: cta-tpd-maint[34152]: segfault at 0 ip 00007f15f05f253b sp 00007f15547f1630 error 6 in libctacommon.so.0.1.0[7f15f04b8000+196000]
Jan 26 03:50:55 crlt-v4 kernel: cta-tpd-maint[80054]: segfault at 0 ip 00007f15f05f253b sp 00007f15537f1630 error 6 in libctacommon.so.0.1.0[7f15f04b8000+196000]
Jan 26 04:14:53 crlt-v4 kernel: cta-tpd-maint[140534]: segfault at 0 ip 00007f77b8d6953b sp 00007f77187f1630 error 6 in libctacommon.so.0.1.0[7f77b8c2f000+196000]
Jan 26 04:20:22 crlt-v4 kernel: cta-tpd-maint[156490]: segfault at 0 ip 00007f77b8d6953b sp 00007f77197f1630 error 6 in libctacommon.so.0.1.0[7f77b8c2f000+196000]
Jan 26 04:30:41 crlt-v4 kernel: cta-tpd-DR3[139619]: segfault at 0 ip 00007f15f05f253b sp 00007f15533f1630 error 6 in libctacommon.so.0.1.0[7f15f04b8000+196000]
Jan 26 04:46:35 crlt-v4 kernel: cta-tpd-DR5[138318]: segfault at 0 ip 00007f4b64bec53b sp 00007f4ac57f1630 error 6 in libctacommon.so.0.1.0[7f4b64ab2000+196000]
Jan 26 05:10:37 crlt-v4 kernel: cta-tpd-DR10[204220]: segfault at 0 ip 00007fdcdc1c453b sp 00007fdc3d3f1630 error 6 in libctacommon.so.0.1.0[7fdcdc08a000+196000]
Jan 26 05:11:08 crlt-v4 kernel: cta-tpd-maint[36542]: segfault at 0 ip 00007f4b64bec53b sp 00007f4ac3ff1630 error 6 in libctacommon.so.0.1.0[7f4b64ab2000+196000]
Jan 26 05:34:30 crlt-v4 kernel: cta-tpd-maint[218571]: segfault at 0 ip 00007f15f05f253b sp 00007f1550bf1630 error 6 in libctacommon.so.0.1.0[7f15f04b8000+196000]
Jan 26 05:42:22 crlt-v4 kernel: cta-tpd-maint[199908]: segfault at 0 ip 00007f4b64bec53b sp 00007f4ac8ff1630 error 6 in libctacommon.so.0.1.0[7f4b64ab2000+196000]
Jan 26 05:44:51 crlt-v4 kernel: cta-tpd-DR5[41977]: segfault at 0 ip 00007f4b64bec53b sp 00007f4ac6bf1630 error 6 in libctacommon.so.0.1.0[7f4b64ab2000+196000]
Jan 27 01:20:37 crlt-v4 kernel: xrootd[19238]: segfault at 0 ip 00007fadf5bf653b sp 00007fad613fec30 error 6 in libctacommon.so.0.1.0[7fadf5abc000+196000]
Jan 27 01:21:07 crlt-v4 kernel: cta-tpd-maint[207786]: segfault at 0 ip 00007f4b64bec53b sp 00007f4ac8bf1630 error 6 in libctacommon.so.0.1.0[7f4b64ab2000+196000]
Jan 28 02:21:48 crlt-v4 kernel: cta-tpd-maint[137607]: segfault at 0 ip 00007fdcdc1c453b sp 00007fdc3fbf1630 error 6 in libctacommon.so.0.1.0[7fdcdc08a000+196000]
Jan 29 01:43:34 crlt-v4 kernel: xrootd[178888]: segfault at 0 ip 00007f6f84ef553b sp 00007f6eee7fec30 error 6 in libctacommon.so.0.1.0[7f6f84dbb000+196000]
Feb  1 01:59:06 crlt-v4 kernel: cta-tpd-maint[123928]: segfault at 0 ip 00007f4b64bec53b sp 00007f4ac5ff1630 error 6 in libctacommon.so.0.1.0[7f4b64ab2000+196000]
Feb  1 01:59:06 crlt-v4 kernel: cta-tpd-maint[29122]: segfault at 0 ip 00007fdcdc1c453b sp 00007fdc3eff1630 error 6 in libctacommon.so.0.1.0[7fdcdc08a000+196000]
Feb  1 02:00:48 crlt-v4 kernel: xrootd[192858]: segfault at 0 ip 00007fe9093f653b sp 00007fe8717fec30 error 6 in libctacommon.so.0.1.0[7fe9092bc000+196000]
Feb  1 03:03:14 crlt-v4 kernel: xrootd[161418]: segfault at 0 ip 00007fa0807f653b sp 00007f9fe93fec30 error 6 in libctacommon.so.0.1.0[7fa0806bc000+196000]
Feb  1 03:12:45 crlt-v4 kernel: xrootd[94080]: segfault at 0 ip 00007fb028ff653b sp 00007faf92ffec30 error 6 in libctacommon.so.0.1.0[7fb028ebc000+196000]
Feb  1 03:21:15 crlt-v4 kernel: xrootd[70091]: segfault at 0 ip 00007f2bc7ff653b sp 00007f2b30bfec30 error 6 in libctacommon.so.0.1.0[7f2bc7ebc000+196000]
Feb  1 03:29:35 crlt-v4 kernel: xrootd[40465]: segfault at 0 ip 00007f9035ff653b sp 00007f8f9fbfec30 error 6 in libctacommon.so.0.1.0[7f9035ebc000+196000]
Feb  1 08:55:11 crlt-v4 kernel: cta-tpd-maint[101635]: segfault at 0 ip 00007fdcdc1c453b sp 00007fdc3e7f1630 error 6 in libctacommon.so.0.1.0[7fdcdc08a000+196000]
Feb  2 06:52:25 crlt-v4 kernel: xrootd[245484]: segfault at 10 ip 00007fe735f02491 sp 00007fe72c7fb630 error 4 in libXrdCl.so.2.0.0[7fe735e85000+120000]
Feb  3 00:40:49 crlt-v4 kernel: xrootd[136258]: segfault at fffffffffffffff0 ip 00007efc31910356 sp 00007efc0a7fb520 error 5 in libstdc++.so.6.0.19[7efc318b3000+e9000]

#server 3:
Jan 26 01:58:18 crlt-v5 kernel: cta-tpd-maint[202198]: segfault at 0 ip 00007f46c5c2753b sp 00007f4627bf1630 error 6 in libctacommon.so.0.1.0[7f46c5aed000+196000]
/var/log/messages:Feb  1 03:53:40 crlt-v5 kernel: xrootd[227014]: segfault at 0 ip 00007fbc7abf653b sp 00007fbbe23fec30 error 6 in libctacommon.so.0.1.0[7fbc7aabc000+196000]
/var/log/messages:Feb  1 07:55:04 crlt-v5 kernel: cta-tpd-maint[51506]: segfault at 0 ip 00007fb3ce69553b sp 00007fb333ff1630 error 6 in libctacommon.so.0.1.0[7fb3ce55b000+196000]
/var/log/messages:Feb  1 08:55:11 crlt-v5 kernel: cta-tpd-maint[195750]: segfault at 0 ip 00007fe09c8a853b sp 00007fdffc3f1630 error 6 in libctacommon.so.0.1.0[7fe09c76e000+196000]
/var/log/messages:Feb  1 08:55:19 crlt-v5 kernel: xrootd[75649]: segfault at 0 ip 00007f41a6bf653b sp 00007f4110bfec30 error 6 in libctacommon.so.0.1.0[7f41a6abc000+196000]
/var/log/messages:Feb  3 00:40:21 crlt-v5 kernel: xrootd[211002]: segfault at fffffffffffffff0 ip 00007f5bfae89356 sp 00007f5bd3ffb520 error 5 in libstdc++.so.6.0.19[7f5bfae2c000+e9000]
/var/log/messages:Feb  3 00:40:33 crlt-v5 kernel: xrootd[213701]: segfault at fffffffffffffff0 ip 00007f9241bda356 sp 00007f921affc520 error 5 in libstdc++.so.6.0.19[7f9241b7d000+e9000]

As you can see, apart from a handful of earlier segfaults, they start appearing consistently on 26th Jan. We run CTA on 3 servers, all of which run multiple instances of cta-taped, and only one node at a time runs the cta frontend. We have moved the cta frontend between hosts to rule out one particular host as the cause, and it has segfaulted on all of them.
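
To compare the crashes across hosts, a small script can reduce each kernel line to the offset of the faulting instruction inside the library. This is a minimal sketch, not a CTA tool: the names `SEGFAULT_RE`, `decode_error` and `parse_segfault` are made up for illustration, and the offset calculation assumes the mapping printed by the kernel starts at the beginning of the library file, which is typical but not guaranteed. The error code is decoded using the standard x86 page-fault bits (bit 0: protection violation, bit 1: write, bit 2: user mode).

```python
import re

# Matches the kernel "segfault at ..." lines quoted above. The trailing
# "in <library>[base+size]" part is optional because some lines lack it.
SEGFAULT_RE = re.compile(
    r"(?P<proc>[\w.+-]+)\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) error (?P<err>\d+)"
    r"(?: in (?P<lib>\S+?)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]?)?"
)


def decode_error(code: int) -> str:
    """Decode the low three x86 page-fault error bits printed by the kernel."""
    cause = "protection violation" if code & 1 else "page not mapped"
    access = "write" if code & 2 else "read"
    mode = "user" if code & 4 else "kernel"
    return f"{mode}-mode {access}, {cause}"


def parse_segfault(line: str):
    """Return a dict describing one segfault line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    rec = {
        "process": m["proc"],
        "fault_addr": int(m["addr"], 16),
        "error": decode_error(int(m["err"])),
        "library": m["lib"],
        "offset": None,
    }
    if m["base"] is not None:
        # Offset of the faulting instruction relative to the mapping base.
        rec["offset"] = int(m["ip"], 16) - int(m["base"], 16)
    return rec


line = ("Jan 26 01:58:12 crlt-v3 kernel: cta-tpd-maint[231025]: segfault at 0 "
        "ip 00007f7b986a553b sp 00007f7af87f1630 error 6 in "
        "libctacommon.so.0.1.0[7f7b9856b000+196000]")
rec = parse_segfault(line)
print(rec["process"], rec["library"], hex(rec["offset"]), rec["error"])
```

For the libctacommon.so lines checked this way, the offset comes out as 0x13a53b for both cta-tpd-* and xrootd processes, with a write to address 0. That would be consistent with a null-pointer write in one code path shared by the tapeserver and the frontend. With matching debuginfo installed, the offset could be passed to `addr2line -f -C -e <path to libctacommon.so.0.1.0> 0x13a53b` to get a function name.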

Segfaults like this:

xrootd[213701]: segfault at fffffffffffffff0 ip 00007f9241bda356 sp 00007f921affc520 error 5 in libstdc++.so.6.0.19[7f9241b7d000+e9000]

appear to line up with us taking down the frontend/tapeserver services and bringing them back up, so they may not be relevant.

The other issue we are seeing is extremely slow performance between the tapeserver and the fst (tapeserver <-> fst). We are not sure whether it started on Jan 26th or later, but we know it did not start earlier.

We first noticed because the archive queue would very often not drain at all, then drain very slowly. We are used to getting 2+GB/s throughput, from disk to tape, but we are seeing around 20MB/s at best currently. Here is a graph of the archive queue size to illustrate:

Sorry the graph has some blips in the data. You can see the 2T queue is not being drained at all for several hours, then starts to drain very slowly.

Running cta-admin dr ls shows data rates in MB/s:

[root@ctafrontend-0 /]# cta-admin dr ls
     library drive                     host desired        request   status since    vid           tapepool     vo files  data MB/s session priority activity age reason
crlt-piranha   DR0 cta-tapeserver-crlt-v3-0      Up ArchiveForUser Transfer  2970 A01951 crlt-piranha-tapes Shared     8  8.4G  2.7   69569        0        -   9 -
crlt-piranha   DR1 cta-tapeserver-crlt-v3-0      Up ArchiveForUser Transfer  2876 A01961 crlt-piranha-tapes Shared    19 12.8G  4.2   69571        0        -   7 -
crlt-piranha  DR10 cta-tapeserver-crlt-v4-0      Up              -     Free  3062      -                  -      -     -     -    -       -        0        -   1 -
crlt-piranha   DR2 cta-tapeserver-crlt-v3-0      Up ArchiveForUser Transfer  2789 A01966 crlt-piranha-tapes Shared     8  5.3G  1.7   69572        0        -  10 -
crlt-piranha   DR3 cta-tapeserver-crlt-v4-0      Up ArchiveForUser Transfer  2913 A01956 crlt-piranha-tapes Shared    17  8.6G  2.8   69574        0        -   6 -
crlt-piranha   DR4 cta-tapeserver-crlt-v4-0      Up ArchiveForUser Transfer  2878 A01809 crlt-piranha-tapes Shared     8  5.5G  1.8   69576        0        -   7 -
crlt-piranha   DR5 cta-tapeserver-crlt-v4-0      Up ArchiveForUser Transfer  2917 A01970 crlt-piranha-tapes Shared    13  7.4G  2.4   69577        0        -   9 -
crlt-piranha   DR6 cta-tapeserver-crlt-v5-0      Up              -     Free  3062      -                  -      -     -     -    -       -        0        -   1 -
crlt-piranha   DR7 cta-tapeserver-crlt-v5-0      Up ArchiveForUser Transfer  2815 A01950 crlt-piranha-tapes Shared    20  9.7G  3.2   69570        0        -   4 -
crlt-piranha   DR8 cta-tapeserver-crlt-v5-0      Up ArchiveForUser Transfer  2883 A01957 crlt-piranha-tapes Shared     8  7.4G  2.4   69573        0        -   4 -
crlt-piranha   DR9 cta-tapeserver-crlt-v5-0      Up ArchiveForUser Transfer  2983 A01978 crlt-piranha-tapes Shared   110 26.3G  8.6   69575        0        -   6 -
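
To put a single number on that listing, the MB/s column can be summed over all drives. This is a rough sketch that parses the plain-text table by splitting on whitespace. The helper name `total_rate_mbps` is hypothetical, and the approach assumes no field contains spaces (a free-text `reason` would break it), so rows with an unexpected number of fields are skipped.

```python
def total_rate_mbps(table: str) -> float:
    """Sum the MB/s column of a whitespace-separated `cta-admin dr ls` table."""
    lines = [ln for ln in table.splitlines() if ln.strip()]
    header = lines[0].split()
    col = header.index("MB/s")
    total = 0.0
    for ln in lines[1:]:
        fields = ln.split()
        if len(fields) != len(header):
            continue  # a field contained spaces; skip rather than guess
        if fields[col] != "-":  # idle drives show "-" instead of a rate
            total += float(fields[col])
    return total


# Three rows copied from the listing above (two transferring, one free).
sample = """
     library drive                     host desired        request   status since    vid           tapepool     vo files  data MB/s session priority activity age reason
crlt-piranha   DR0 cta-tapeserver-crlt-v3-0      Up ArchiveForUser Transfer  2970 A01951 crlt-piranha-tapes Shared     8  8.4G  2.7   69569        0        -   9 -
crlt-piranha   DR1 cta-tapeserver-crlt-v3-0      Up ArchiveForUser Transfer  2876 A01961 crlt-piranha-tapes Shared    19 12.8G  4.2   69571        0        -   7 -
crlt-piranha  DR10 cta-tapeserver-crlt-v4-0      Up              -     Free  3062      -                  -      -     -     -    -       -        0        -   1 -
"""
total = total_rate_mbps(sample)
print(f"{total:.1f} MB/s")
```

Applied to the full listing above, the nine transferring drives add up to about 29.8 MB/s in total.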

A snippet from the tapeserver logs, archiving a 460M file:

Feb  3 00:50:59.354933 cta-tapeserver-crlt-v4-0 cta-taped: LVL="INFO" PID="352" TID="352" MSG="Created tasks for migrating a file" thread="MainThread" tapeDrive="DR3" tapeVid="A01956" mountId="69574" byteSizeThreshold="80000000000" maxFiles="4000" fileId="7690256" fSeq="12228" path="root://eos-mgm-0.mgm.cta.svc.cluster.archive//eos/cta/aarnet-cloudstor/REDACTED/REDACTED@REDACTED/data/06/0602a80f5d1cb83d95d189be014eab62bc43a8f9871e92bba3d52cfa4f5f5d59?eos.lfn=fxid:df76b7"
Feb  3 00:56:31.302066 cta-tapeserver-crlt-v4-0 cta-taped: LVL="INFO" PID="352" TID="1146" MSG="Opened disk file for read" thread="DiskRead" tapeDrive="DR3" tapeVid="A01956" mountId="69574" threadID="4" path="root://eos-mgm-0.mgm.cta.svc.cluster.archive//eos/cta/aarnet-cloudstor/REDACTED/REDACTED@REDACTED/data/06/0602a80f5d1cb83d95d189be014eab62bc43a8f9871e92bba3d52cfa4f5f5d59?eos.lfn=fxid:df76b7" actualURL="root://eos-mgm-0.mgm.cta.svc.cluster.archive//eos/cta/aarnet-cloudstor/REDACTED/REDACTED@REDACTED/data/06/0602a80f5d1cb83d95d189be014eab62bc43a8f9871e92bba3d52cfa4f5f5d59?eos.lfn=fxid:df76b7" fileId="7690256"
Feb  3 00:56:49.276554 cta-tapeserver-crlt-v4-0 cta-taped: LVL="DEBUG" PID="352" TID="1153" MSG="Successfully opened the tape file for writing" thread="TapeWrite" tapeDrive="DR3" tapeVid="A01956" mountId="69574" vo="Shared" mediaType="LTO7" tapePool="crlt-piranha-tapes" logicalLibrary="crlt-piranha" mountType="ArchiveForUser" vendor="IBM" capacityInBytes="6000000000000" fileId="7690256" fileSize="460152100" fSeq="12228" diskURL="root://eos-mgm-0.mgm.cta.svc.cluster.archive//eos/cta/aarnet-cloudstor/REDACTED/REDACTED@REDACTED/data/06/0602a80f5d1cb83d95d189be014eab62bc43a8f9871e92bba3d52cfa4f5f5d59?eos.lfn=fxid:df76b7"
Feb  3 00:56:50.355037 cta-tapeserver-crlt-v4-0 cta-taped: LVL="INFO" PID="352" TID="1146" MSG="File successfully read from disk" thread="DiskRead" tapeDrive="DR3" tapeVid="A01956" mountId="69574" threadID="4" path="root://eos-mgm-0.mgm.cta.svc.cluster.archive//eos/cta/aarnet-cloudstor/REDACTED/REDACTED@REDACTED/data/06/0602a80f5d1cb83d95d189be014eab62bc43a8f9871e92bba3d52cfa4f5f5d59?eos.lfn=fxid:df76b7" actualURL="root://eos-mgm-0.mgm.cta.svc.cluster.archive//eos/cta/aarnet-cloudstor/REDACTED/REDACTED@REDACTED/data/06/0602a80f5d1cb83d95d189be014eab62bc43a8f9871e92bba3d52cfa4f5f5d59?eos.lfn=fxid:df76b7" fileId="7690256" readWriteTime="1.235364" checksumingTime="0.000000" waitFreeMemoryTime="17.816795" waitDataTime="0.000000" waitReportingTime="0.000000" checkingErrorTime="0.000715" openingTime="0.314078" transferTime="19.366955" totalTime="19.366955" dataVolume="460152100" globalPayloadTransferSpeedMBps="23.759651" diskPerformanceMBps="23.759651" openRWCloseToTransferTimeRatio="0.080004"
Feb  3 00:56:50.366461 cta-tapeserver-crlt-v4-0 cta-taped: LVL="DEBUG" PID="352" TID="1153" MSG="In MigrationReportPacker::reportCompletedJob(), pushing a report." thread="TapeWrite" tapeDrive="DR3" tapeVid="A01956" mountId="69574" vo="Shared" mediaType="LTO7" tapePool="crlt-piranha-tapes" logicalLibrary="crlt-piranha" mountType="ArchiveForUser" vendor="IBM" capacityInBytes="6000000000000" fileId="7690256" fileSize="460152100" fSeq="12228" diskURL="root://eos-mgm-0.mgm.cta.svc.cluster.archive//eos/cta/aarnet-cloudstor/REDACTED/REDACTED@REDACTED/data/06/0602a80f5d1cb83d95d189be014eab62bc43a8f9871e92bba3d52cfa4f5f5d59?eos.lfn=fxid:df76b7" type="ReportSuccessful"
Feb  3 00:56:50.366613 cta-tapeserver-crlt-v4-0 cta-taped: LVL="INFO" PID="352" TID="1153" MSG="File successfully transmitted to drive" thread="TapeWrite" tapeDrive="DR3" tapeVid="A01956" mountId="69574" vo="Shared" mediaType="LTO7" tapePool="crlt-piranha-tapes" logicalLibrary="crlt-piranha" mountType="ArchiveForUser" vendor="IBM" capacityInBytes="6000000000000" fileId="7690256" fileSize="460152100" fSeq="12228" diskURL="root://eos-mgm-0.mgm.cta.svc.cluster.archive//eos/cta/aarnet-cloudstor/REDACTED/REDACTED@REDACTED/data/06/0602a80f5d1cb83d95d189be014eab62bc43a8f9871e92bba3d52cfa4f5f5d59?eos.lfn=fxid:df76b7" readWriteTime="0.930642" checksumingTime="0.147576" waitDataTime="0.015717" waitReportingTime="0.000148" transferTime="1.094083" totalTime="1.094069" dataVolume="460152100" headerVolume="480" driveTransferSpeedMBps="420.588263" payloadTransferSpeedMBps="420.587824" reconciliationTime="0" LBPMode="LBP_On

From that snippet, it looks like the tape drive is writing at a good rate (about 420 MB/s), but the disk side is underperforming (about 23.8 MB/s). I see our FST disks are not overloaded; in fact, their load has decreased significantly since 26th of Jan.
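
The timing fields in the "File successfully read from disk" line can be broken down to see where the 19 seconds went. This is a minimal sketch: `parse_fields` and `read_breakdown` are illustrative names, the sample string is an abbreviated copy of the log line above, and the reading of `waitFreeMemoryTime` as time spent waiting for a free memory buffer is an assumption based only on the field name.

```python
import re


def parse_fields(line: str) -> dict:
    """Extract key="value" pairs from a cta-taped log line."""
    return dict(re.findall(r'(\w+)="([^"]*)"', line))


def read_breakdown(fields: dict) -> dict:
    """Compute effective and raw read rates plus the share of each time component."""
    total = float(fields["totalTime"])
    volume = int(fields["dataVolume"])
    parts = {k: float(fields[k])
             for k in ("readWriteTime", "waitFreeMemoryTime", "openingTime")}
    return {
        # Bytes divided by wall-clock time for the whole task.
        "effective_MBps": volume / total / 1e6,
        # Bytes divided by the time actually spent reading.
        "raw_read_MBps": volume / parts["readWriteTime"] / 1e6,
        "share": {k: v / total for k, v in parts.items()},
    }


# Abbreviated copy of the DiskRead line above (only the fields used here).
sample = ('MSG="File successfully read from disk" thread="DiskRead" '
          'readWriteTime="1.235364" waitFreeMemoryTime="17.816795" '
          'openingTime="0.314078" totalTime="19.366955" dataVolume="460152100"')
b = read_breakdown(parse_fields(sample))
print(f'effective {b["effective_MBps"]:.1f} MB/s, raw read {b["raw_read_MBps"]:.1f} MB/s')
for name, share in b["share"].items():
    print(f"{name}: {share:.0%} of totalTime")
```

On these numbers, roughly 92% of `totalTime` is `waitFreeMemoryTime`, while `readWriteTime` is only about 1.2 s. If the field means what its name suggests, the read itself may not be the slow part. The log also shows about five and a half minutes between "Created tasks for migrating a file" (00:50:59) and "Opened disk file for read" (00:56:31).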

Can you shed any light on any of these issues?

We are running cta v4.7.8-1
EOS 5.1.8 with xrootd 5.5.1-1 (also tried rolling eos back to 4.8.76 to no avail)

Thank you very much in advance,

Warm Regards,

Denis

Hi Denis,
I’m sorry to say that I don’t have a good answer for you here; we have not observed any similar segfaults on our side that we could investigate.

There is one thing that stands out though:

Is there a particular reason you are running CTA v4.y.z together with EOS/xrootd 5? For CTA the versioning convention is to have the major version match the XRootD version it is compiled against, see "Tagging a new CTA release" in the EOSCTA Docs. So based on the information above you should be using CTA v5.y.z.
Have you tried v5 and checked if this improves the situation?

To complement what Richard mentioned, I have checked which versions of xrootd have been tested together with these CTA and EOS versions:

  • CTA v4.7.8-1 → xrootd-4.12.6-1
  • EOS 5.1.8 → eos-xrootd-5.5.5-1 (EOS depends on the repackaged eos-xrootd rpm)

I hope this helps

Hello Joao and Richard,

Sorry for the late reply on my part. Thank you very much for looking at this and providing your insight.

The issue went away after we rebooted our infrastructure; we could not work out the cause through debugging.

You do raise a very good point: our cta is behind eos. We will upgrade it ASAP so that we run a supported stack.

Thank you again!

Warm Regards,

Denis
