DragonFly kernel List (threaded) for 2010-02
Re: kernel work week of 3-Feb-2010 HEADS UP
: SATA SSDs usually have fairly large erase blocks which once all the
:blocks have been touched reduces write performance a lot (often it becomes
:worse than hard disc write performance), PCI SSDs are apparently better in
:this respect but I've yet to see any in the flesh.
I think major improvements have been made in SSD firmware in the last
year. I got a couple of Intel 40G SSDs (the latest ones but on the
low-end of the capacity scale) and I also got one of the 120G OCZ
Colossus's which has 128M of ram cache and is advertised as having
200+ MB/sec of read bandwidth. Clearly there are going to be a lot
of choices and SATA SSDs are commodity hardware now. The
higher-performing devices, such as a direct PCIe attachment, exist
more for commercial database uses and not so much for generic
server operations due to their premium pricing.
Of course, we have to remember here that when serving large data sets
from a normal HD, read performance is dismal... usually less than
10MB/sec per physical drive and often worse due to seeking. Even
older SSDs would beat the hell out of a modern HD in that regard.
I'll have a better idea once I am able to start testing with these drives.
Swap space will be reading and writing 4K or larger chunks as well,
not 512-byte itsy-bitsy chunks, and I expect that will make a
difference. For clean data caching we can easily cluster the backing
store to the buffer cache buffer size which is 8K or 16K for UFS
and 16K for HAMMER.
: SSDs still have fairly low lifespans in terms of writes per cell,
:it seems to me that using them as caches would tend to hit that limit
:faster than most other uses.
:Steve O'Hara-Smith | Directable Mirror Arrays
Yes, there are definitely management issues but wear life scales
linearly with the amount of SSD storage (in terms of how much data
you can write) and it is fairly easy to calculate.
A cheap Intel 40G with 10,000 cycles would have a wear life of 400TB.
That's 100GB/day for 10 years, or 1.2MB/sec continuously for 10 years.
I'm just going by the Wiki. The Wiki says MLC's can do 1,000 to
10,000 per cell. So if it's 1,000 that would be 10GB/day for a 40G SSD.
For a larger SSD the total write capability scales linearly to size
since there is more flash to work with.
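The arithmetic above can be checked with a rough sketch like the one below. It assumes ideal wear leveling and no write amplification (every byte written by the host costs exactly one byte of flash wear), which real drives will not quite achieve. The function names are made up for illustration.

```python
GB = 10**9
TB = 10**12
SECONDS_PER_DAY = 86400

def wear_life_bytes(capacity_bytes, cycles_per_cell):
    """Total bytes that can be written before the cells wear out.

    Assumption: perfect wear leveling and a write amplification of 1.
    """
    return capacity_bytes * cycles_per_cell

def sustained_write_budget(capacity_bytes, cycles_per_cell, years):
    """Return (bytes per day, bytes per second) that spread the wear
    life evenly over the given number of years (365-day years)."""
    total = wear_life_bytes(capacity_bytes, cycles_per_cell)
    per_day = total / (years * 365)
    per_sec = per_day / SECONDS_PER_DAY
    return per_day, per_sec

# 40G drive at 10,000 cycles per cell, spread over 10 years
total = wear_life_bytes(40 * GB, 10_000)
per_day, per_sec = sustained_write_budget(40 * GB, 10_000, 10)
print(total / TB, "TB total")
print(round(per_day / GB), "GB/day")
print(round(per_sec / 1e6, 2), "MB/sec")
```

With 1,000 cycles instead of 10,000, the same call gives roughly a tenth of the daily budget, and doubling the capacity doubles it.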
The main issue from an implementation standpoint is to try to burst
in a bunch of data, potentially exceeding 10GB/day, but then back-off
and reduce the allowed write bandwidth as a means of trying to 'track'
the working data set without wearing the drive out too quickly. You
would want to limit yourself to e.g. 10GB/day (or whatever) as an
average over a period of a week.
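One possible way to express that burst-then-back-off policy is a sliding-window byte budget. This is only a sketch in Python of the idea, not anything from the kernel. The class name, the 7-day window, and the all-or-nothing admission rule are assumptions chosen for illustration.

```python
from collections import deque

GB = 10**9
SECONDS_PER_DAY = 86400

class WriteBudget:
    """Allow bursts, but cap the total written over a sliding window.

    Hypothetical policy: the cap is bytes_per_day * window_days, so a
    single day may exceed bytes_per_day as long as the window average
    does not.
    """

    def __init__(self, bytes_per_day, window_days=7):
        self.window = window_days * SECONDS_PER_DAY
        self.budget = bytes_per_day * window_days
        self.log = deque()   # (timestamp, nbytes) of admitted writes
        self.used = 0        # bytes admitted within the current window

    def _expire(self, now):
        # Forget writes that have aged out of the window.
        while self.log and self.log[0][0] <= now - self.window:
            _, nbytes = self.log.popleft()
            self.used -= nbytes

    def try_write(self, nbytes, now):
        """Admit the write if it fits in the window budget, else refuse."""
        self._expire(now)
        if self.used + nbytes > self.budget:
            return False
        self.log.append((now, nbytes))
        self.used += nbytes
        return True

wb = WriteBudget(10 * GB)             # 10GB/day averaged over a week
print(wb.try_write(50 * GB, now=0))   # large initial burst
print(wb.try_write(30 * GB, now=3600))
```

A cache using something like this would simply skip caching new data while `try_write` refuses, then resume as older writes age out of the window.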
I'd say it is already cost effective if we can eke out 10 years of life
out of a 40G SSD in a server environment. I don't think there would be an
issue at all for meta-data caching, and with reasonable bandwidth and
bursting limits (for writing) I don't think there would be an issue
for bulk-data caching either.
And since the drive would just be used for swap space it is effectively
a junk drive which can be replaced at any time with no effort. We could
even do it hot-swap if we implement swapoff (though for now that isn't
in the cards).
One of the things I will do when I get these SSDs in and get the basics
working is intentionally wear out one of the Intel 40G drives to see
how long it can actually go. That will be fun :-)