Performance for small files?

The discussion on the bytes/second today in the dev meeting reminds me of a question we have.

We are doing some tests with small files. Sequential writing, then recalling (also sequential, no gaps).

100 kB files get written at about 11 MB/s and read back at 4 MB/s

1 MB files are written at about 80 MB/s and read back at 35 or so MB/s (these are a rough average of per file speeds, not session speeds)

Is this consistent with what you see? Is the read hampered by having to constantly re-create the threads that are reading files?

Hi Eric,
this performance could be improved by configuring cta-taped to deal with small files:
ArchiveFlushFiles defaults to 200, so if your files are really small, for example 100 kB, you flush every 20 MB, which is a performance killer…
You may also need to tweak other parts of cta-taped (like NbDiskThreads, fetch files…).

Is this on a real tape drive or on virtual tape drives?

cta-taped is by default configured for reasonably sized files for a tape system.

This is a real M8 tape. I’ve been told 100k is unreasonably small even for the small stuff some FNAL experiments write. So I don’t know if this is relevant to us, but I wanted to ask. Would ArchiveFlushFiles impact reading or only writing?

It impacts only archiving: it is the number of files you write to tape before flushing. There is another parameter to configure the archive flush size.

Whichever limit is reached first triggers the flush. I think the size limit is around 32GB.

Therefore, if you have small files, you can set a high ArchiveFlushFiles so that you flush not after every 200 small files but after many more (a flush causes a sync that slows down tape writes on the drive).

For large files, the size limit applies anyway: once the drive has written more than 32GB the data is flushed, so if you have 1GB files, the size is reached after 32 of them and ArchiveFlushFiles is never reached. For smaller files, you will reach ArchiveFlushFiles before 32GB, and the flush happens at that point.
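
To make the "first limit reached" rule concrete, here is a rough Python sketch of the arithmetic. It is not cta-taped code: the function name and its arguments are made up for illustration, it assumes all files have the same size, it takes 32GB as exactly 32 * 10**9 bytes (the size above is only quoted as "around 32GB"), and it assumes a flush fires as soon as either limit is met.

```python
def flush_trigger(file_size_bytes, flush_files=200, flush_bytes=32 * 10**9):
    """Estimate which limit triggers a flush for uniformly sized files.

    Hypothetical helper, not part of cta-taped. Returns a tuple
    (limit_hit, files_per_flush, bytes_per_flush).
    """
    # Number of files needed to reach the byte limit (ceiling division).
    files_to_reach_bytes = -(-flush_bytes // file_size_bytes)

    if flush_files <= files_to_reach_bytes:
        # The file-count limit is reached first (typical for small files).
        return "files", flush_files, flush_files * file_size_bytes

    # The byte limit is reached first (typical for large files).
    return "bytes", files_to_reach_bytes, files_to_reach_bytes * file_size_bytes


if __name__ == "__main__":
    for label, size in [("100 kB", 100_000), ("1 MB", 1_000_000), ("1 GB", 10**9)]:
        limit, n_files, n_bytes = flush_trigger(size)
        print(f"{label}: flush on '{limit}' limit after {n_files} files "
              f"({n_bytes / 1e6:.0f} MB)")
```

With the default of 200 files, 100 kB files flush every 20 MB and 1 MB files every 200 MB, while 1 GB files hit the size limit after 32 files. Passing a larger `flush_files` value shows how much data would be written between flushes for a given file size.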

Flushing is important because it validates what the tape drive has written, so that the files can be reported as successfully written to the CTA catalogue.

You can do some tests and check how this impacts you. I can run some performance tests in preproduction if needed: I instrumented the kernel to measure this kind of slowdown.

Julien