GroupWise 7 and OES2/SLES10 File Systems

Author Info

25 March 2008 - 5:29am
Submitted by: timscotland

Problem

During BrainShare 2008, the most common question that the GroupWise team and the GroupWise SysOps were asked was, "What file system should I use for Linux?" This article explains the choices and why I reach the conclusion that I do ...

Solution

There are 4 (or 5) choices of file system for GroupWise 7 on SLES10 or OES2:

- XFS
- EXT3 (and EXT3 plus H-Tree)
- ReiserFS
- NSS

Let's look at these in turn and then explain their advantages and disadvantages.

XFS

XFS is extremely fast, but it relies on very aggressive caching to achieve that throughput, and the way it manages that cache is questionable. For example, try formatting a USB stick with XFS and copying a large file to it. When the copy has finished, unmount (umount) the drive. When you next plug the USB stick in, is the file readable? (In 10 test attempts, I never once had the copy complete.) This makes XFS totally unsuitable for a cluster, and probably too vulnerable for a standalone GroupWise 7 server. Discounted.

EXT3

EXT3 is slow without the H-Tree and so is discounted.

With H-Tree, EXT3 becomes a very strong performer. However, there is a price to pay for the increased performance. GroupWise uses telldir(), seekdir(), and readdir() to walk directories, and these calls rely on a cookie that is nominally the position in the directory. That interface is unfortunately based on an assumption that was true when it was designed but is no longer true: that directories will always be laid out linearly, so a file offset is enough to identify the directory entry. Unfortunately, this offset is a signed int, in which only non-negative values are valid. This translates to a cookie size of 31 bits in which to uniquely identify the position.

The problem arises with ext3 hashed directories, because they use a 64-bit hash that consists of a 32-bit major and a 32-bit minor hash. Ext3 returns only the major hash, which is additionally truncated by one bit to fit the cookie. This not only defeats the collision handling that the kernel performs internally, but also creates even more collisions for telldir(). Discounted.
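To make the truncation concrete, here is a minimal Python sketch of the cookie described above. It is an illustrative model, not kernel code: the function names are hypothetical, and it assumes the bit that gets dropped is the lowest bit of the major hash.

```python
# Toy model of the ext3 H-Tree directory cookie described in the text.
# Assumption: the "truncated by one bit" step drops the lowest bit of the
# 32-bit major hash; the 32-bit minor hash is not part of the cookie at all.

MASK32 = 0xFFFFFFFF


def htree_cookie(major, minor):
    """Return the 31-bit cookie for a (major, minor) hash pair."""
    # minor is deliberately ignored: that is the point of the example.
    return (major & MASK32) >> 1


def fits_cookie(value):
    """True if value is a non-negative number that fits in 31 bits."""
    return 0 <= value < 2 ** 31


# Three distinct 64-bit hashes ...
a = htree_cookie(0x9ABCDEF0, 0x00000001)
b = htree_cookie(0x9ABCDEF0, 0x00000002)  # same major, different minor
c = htree_cookie(0x9ABCDEF1, 0x00000001)  # major differs only in the lowest bit

# ... that all collapse onto one cookie value.
print(hex(a), hex(b), hex(c))
```

All three entries end up with the same cookie, so a later seekdir() to that cookie cannot tell them apart. That is the collision problem in miniature.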

So, now we are down to a two-horse race: ReiserFS vs. NSS.

NSS and ReiserFS

In the original OES, based on SLES9, the performance of NSS was severely lacking in comparison to ReiserFS, having only some 80% or less of the performance of Reiser. With the new SLES10-based OES2, that degradation in performance of NSS compared to ReiserFS appears to have been reduced to a more acceptable 5% or so – so long as the volume is created with Salvage disabled. For a long time, ReiserFS has had a reputation of being quick but fragile, and when the file system tree has to be rebuilt, you are more likely to get a few sticks and a pile of leaves than a whole tree. NSS rebuilds are remarkably complete (as one would expect from a file-server file system); therefore, NSS provides an excellent alternative to ReiserFS.

Looking at telldir() and ReiserFS, ReiserFS doesn't display the same problem because it uses a much smaller hash space. It has a 32-bit total, where bits 7-30 hold the hash and bits 0-6 hold the generation number that handles collisions. Because the top bit (bit 31) is unused, the value always fits in the 31-bit cookie, and ReiserFS doesn't run into problems with telldir(). The trade-off is that ReiserFS supports a small hash space with a maximum of 127 collisions, so it's much more prone to spuriously returning -EBUSY when the maximum number of collisions has been reached. I expect that XFS could have the same problem as EXT3 plus H-Tree, since it uses 64-bit offsets that end up getting truncated.
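The bit layout just described can be sketched in a few lines of Python. This is a toy model built only from the numbers in the paragraph above (24 hash bits, 7 generation bits, top bit unused). The names pack_offset, unpack_offset and ToyDirectory are hypothetical, and real ReiserFS may number generations differently.

```python
import errno

# Toy model of the ReiserFS directory offset layout described in the text.
GEN_BITS = 7                             # bits 0-6: generation number
HASH_BITS = 24                           # bits 7-30: name hash
MAX_GENERATION = (1 << GEN_BITS) - 1     # 127
HASH_MASK = (1 << HASH_BITS) - 1


def pack_offset(name_hash, generation):
    """Combine a 24-bit hash and a 7-bit generation into one offset."""
    if not 0 <= generation <= MAX_GENERATION:
        raise ValueError("generation out of range")
    return ((name_hash & HASH_MASK) << GEN_BITS) | generation


def unpack_offset(offset):
    """Split an offset back into (hash, generation)."""
    return (offset >> GEN_BITS) & HASH_MASK, offset & MAX_GENERATION


class ToyDirectory:
    """Hands out the next free generation number for each hash value."""

    def __init__(self):
        self._next_generation = {}

    def add(self, name_hash):
        generation = self._next_generation.get(name_hash, 0)
        if generation > MAX_GENERATION:
            # Models the -EBUSY case: no generation numbers left.
            raise OSError(errno.EBUSY, "too many colliding names")
        self._next_generation[name_hash] = generation + 1
        return pack_offset(name_hash, generation)


# The largest possible offset still leaves bit 31 clear.
largest = pack_offset(HASH_MASK, MAX_GENERATION)
print(hex(largest))
```

In this model the first entry with a given hash gets generation 0 and up to 127 colliding entries follow it. The next one fails, which mirrors the -EBUSY behaviour mentioned above.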

In conclusion, there are really just two choices. The best-performing stable system is ReiserFS, which should be used when every cycle is critical, but beware of overloading the system. The best compromise between speed and resilience is NSS with Salvage disabled.

Based on the above, I would recommend that the default file system to be used for a GroupWise system running on OES2 is NSS.

I hope this helps some folk with the sleepless nights.


Disclaimer: As with everything else at Cool Solutions, this content is definitely not supported by Novell (so don't even think of calling Support if you try something and it blows up).

It was contributed by a community member and is published "as is." It seems to have worked for at least one person, and might work for you. But please be sure to test, test, test before you do anything drastic with it.




User Comments

Ext3

Submitted by penguin_roar on 25 March 2008 - 10:30am.

I would really like to see a performance comparison between ext3 and NSS. Personally, I have been burned by NSS a couple of times, while at the same time I have never had any problems at all with ext3, even though I have used it on countless more machines than NSS or ReiserFS.

If the performance benefits of NSS are small, I'd be glad to throw money at faster disks just to keep the stability of ext3.

Targeting Groupwise against the default filesystem on Linux could also be a thought to ponder hard for you guys. Stability should be as important to you as it is for me.

Ext3

Submitted by timscotland on 28 March 2008 - 2:44am.

The problem with Ext3 comes when you have more than 512 files in a folder. I ran my first test system on Ext3 and thought it was brilliant (because there was very little real data) and then I ran it on our production system - and performance dived.

I would disagree vehemently!

Submitted by raronson on 25 March 2008 - 12:05pm.

The XFS filesystem does quite a lot in memory, thus the failure on the memory stick test. Still, it does a much better job of handling a large number of small files than ext3 and is developed by someone much more stable than reiser (in jail for murder). We have seen many situations where filesystems running GroupWise become corrupt and require lengthy recovery operations. The repairs can take hours. XFS is much more resilient and better suited to the task.

You may, but...

Submitted by timscotland on 28 March 2008 - 2:48am.

Using the USB stick as an example of how aggressively XFS uses caching and memory explains to most people why this would be a bad file system in a cluster (as I said) and, again, could (not will) cause issues on a standalone server. When this is taken in conjunction with the probable telldir() issue, I believe the logical conclusion is that, as this is "data" that we are manipulating, we cannot take that risk. The nice thing about Linux is that it provides the choice, so if you are happy with the risk, so be it.

NSS

Submitted by ecl2 on 25 March 2008 - 6:31pm.

Would NSS also be better when migrating a NetWare cluster to SLES 10.1 in place? Same servers, same SAN. Would I even have to touch the SAN data?

NSS from NetWare

Submitted by timscotland on 28 March 2008 - 2:54am.

It seems that NSS performs as long as Salvage was disabled when the volume was created; turning it off afterwards doesn't seem to have the same benefit.

Thus I would still be looking at a migration.

T

I'm smart...or so it appears :)

Submitted by rwhuggins on 26 March 2008 - 6:55am.

I am in the process of migrating GroupWise from NetWare to OES Linux and ran into the same file system dilemma. I ended up using NSS since I have been a NetWare admin for years and felt more comfortable with the structure. I'm very happy to be reading your recommendation for NSS. It was purely a comfort level issue for me, but I guess I look smart now!! :)

NSS ... sorry but no

Submitted by jstacey on 26 March 2008 - 7:14am.

I am sorry NSS? What are you smoking and not sharing?
Way too much overhead IMHO.

With Hans Reiser in jail and his company Namesys gone reiserfs is not an option. There is just no upgrade path to reiser4 and the ownership is up in the air.

Ext3 is pretty much the standard filesystem for all Linux distributions these days. It combines good performance with good reliability and wide support. It also provides an easy upgrade path to Ext4.

In all my years as a Linux admin Ext3 has never once burned me, no matter what I threw at it. I would recommend using it for just about any workload, even for GroupWise.

Ext3

Submitted by timscotland on 28 March 2008 - 2:57am.

I would have preferred to use Ext3 myself, but without H-Tree the performance drops off drastically once a folder gets to more than 512 files. Ask Stephen Tweedie (he lives here in Edinburgh).

As H-Tree and telldir() are broken (in their interaction) we can't speed it up. I am looking forward to Ext4 but then again, we have been doing that for the past decade :-)

As the saying goes YMMV

Have fun

T

Hmm, can we dumb this down

Submitted by mww on 26 March 2008 - 7:16am.

Maybe you can simplify: are you saying EXT3 with H-Tree will cause corruption with GW? Or simply that it's slower? Just trying to decipher your explanation of

"This not only eliminates the collision handling that the kernel handles internally, but also creates even more collisions for telldir(). Discounted."

Are collisions an integrity issue or speed issue?

We have always used EXT3 because Reiser always seemed to corrupt so easily, and we didn't use NSS because, by the time you add the dependencies like eDirectory, we saw the memory overhead as counterintuitive.

I would love to see some benchmarks!

Confirmation

Submitted by timscotland on 28 March 2008 - 3:00am.

Ext3 is slow with greater than 512 files in a folder.

Ext3 plus H-Tree is fast - but can return the wrong file (not corruption) which is bad for an email system.

I like Ext3 and use it as my preferred file system, but in this case, because of the way that the GW FLAIM database works, the performance drops off drastically once the 512-file limit is reached.

I believe that Novell will be doing some benchmark tests on the different file systems soon - I was pushing like heck for them at BrainShare :-)

T

comparison of linux/windows performance required

Submitted by dustyp on 26 March 2008 - 8:04am.

How do the above compare to running on an NTFS system, Tim?

NTFS

Submitted by timscotland on 28 March 2008 - 3:04am.

I'm not the man to go into file systems, but if you look at all of the systems in use today, the only one that really requires defragmenting on a regular basis is NTFS. It really is a poor piece of design in my opinion.

A Windows server is the slowest of a NetWare, Linux, Windows trio with the fastest now being the Linux box.

If you look at Samba, a Linux server can handle more clients than a Windows server!

I don't think this answers your question, but anecdotal evidence would imply that it is slower...

HTH

Reiser

Submitted by bsnipes on 26 March 2008 - 8:50am.

I tried NSS on OES1 for GroupWise and it was very, very slow. I know it has improved since then but I've been using ReiserFS for over 2 years and have had zero trouble and responsiveness is excellent for the amount of email we have ( over 100GB of email on that box ).

Brian

Reiser

Submitted by timscotland on 28 March 2008 - 3:06am.

On OES1 there was only one choice - Reiser.

The whole point of the article is that the release of SLES10/OES2 has changed the dynamics :-)

NSS when OES was first released was like a 2 legged dog walking backwards - terrible (shudder) but the changes since then have revolutionised the performance.

Unless you are about to redeploy on OES2 - stay where you are.

T

What was said at Brainshare 2008

Submitted by dfield on 26 March 2008 - 3:38pm.

From my understanding, the OES engineers said that with OES2 SP1 you will see major performance increases with NSS. They did say there were issues with OES1 regarding performance with NSS.

Just my 2 cents..

Typical Linux Community Response

Submitted by mfisk on 26 March 2008 - 5:19pm.

Why does any technical discussion always wind up in a nerd war with the Linux community?

How about some actual test results instead of the typical Ford vs Chevy arguing?

Volvo

Submitted by timscotland on 28 March 2008 - 3:08am.

I'm a Volvo man myself :-)

I believe that they will be forthcoming from Novell later this year - I was pushing hard at BrainShare.

T

Flexibility vs. Performance?

Submitted by MARVHUFFAKER on 27 March 2008 - 12:52pm.

I have built quite a few GroupWise servers on Linux, some on OES 1, some on OES2, and some on SLES10. In most cases I have used Reiser only because it was never clear which one was best, but Reiser was the 'default' answer.

The biggest downfall I see with reiser (or other linux file systems), is that if I need more space, I can't do it. If I could use NSS for GroupWise instead, it's easy to expand the pool on the fly, immediately giving more space to GroupWise. I realize that most hardcore linux people are not familiar with NSS or Novell's traditional services, so they don't understand many of the benefits that Novell people have taken for granted for numerous years.

Even if GroupWise on NSS has a 5% degradation compared to Reiser (Assuming OES2), the extra flexibility for future expansion easily outweighs the performance. 5% is nothing in my book.

Marvin Huffaker
www.redjuju.com

read about LVM2

Submitted by paul_woodward on 7 April 2008 - 8:25am.

The biggest downfall I see with Novell people is an assumption that Novell software is automatically superior to anything else.

LVM is at least as powerful as NSS pools, you can extend volumes to free space on existing disk, or add new disk and extend the volume group to include the new disk. You can also do volume level snapshots.

Reiser and XFS partitions can be extended on the fly to grow to the size of the bigger volume. Ext3 will require unmounting, extending and remounting.

EVMS is even more powerful, but I'm not as familiar with that as I am with LVM.

Believe it or not...

Submitted by Jason Williams (not verified) on 28 March 2008 - 4:22pm.

Believe it or not, this is a very interesting topic in the hallways of the Workgroup business unit these days. Both Alex Evans (GroupWise PM) and I have fielded dozens of questions and comments about the choice of file systems when running GroupWise on Open Enterprise Server 2. Clearly, NSS is a great file system, and its performance on OES2 is far superior to OES1, but this is not a one-size-fits-all issue.

We know that our customers would like Novell's guidance on this. We are starting to articulate use cases that would provide a reference for customers of both GroupWise and OES2 on Linux when deciding which file system to use: NSS, ext3 or Reiser. We think that with the upcoming release of OES2 SP1 and Bonsai, it's a good time to get this down on paper (or up on a wiki) and out to our customers.

Please keep your eyes open for that in the near future.

Regards, Jason Williams & Alex Evans

Where is the Htree setting?

Submitted by mww on 31 March 2008 - 7:24am.

So that raises the question: how do I check whether H-Tree is enabled for my Ext3 partitions?

When an EXT3 filesystem is created in SLES10, is H-Tree automatically enabled?

H-Tree

Submitted by timscotland on 31 March 2008 - 9:55am.

Rest easy - you have to activate it (on SLES9 it wasn't even supported)

To enable H-Tree look up dir_index
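For anyone who wants to check rather than guess, here is a small Python sketch that looks for dir_index in the "Filesystem features" line printed by `tune2fs -l`. The helper names and the device path are hypothetical placeholders, and the script assumes tune2fs is installed and that it is run with enough privilege to read the device.

```python
import subprocess


def features_from_tune2fs(listing):
    """Extract the feature names from the text printed by `tune2fs -l`."""
    for line in listing.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Filesystem features":
            return set(value.split())
    return set()


def has_dir_index(device):
    """Run `tune2fs -l` on device and report whether dir_index is set."""
    result = subprocess.run(
        ["tune2fs", "-l", device],
        capture_output=True,
        text=True,
        check=True,
    )
    return "dir_index" in features_from_tune2fs(result.stdout)


# Example call (hypothetical device name, needs root):
#     has_dir_index("/dev/sdb1")
```

If the feature is missing, it can be switched on with `tune2fs -O dir_index` on the device, and existing directories can then be indexed by running `e2fsck -fD` against the unmounted filesystem. Test this on a copy first.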

T

Additional NSS recommendations?

Submitted by mrj412 on 31 March 2008 - 1:22pm.

I was one of the many who asked you this very question at BrainShare -

I am setting up my OES2 servers now and was curious if you have any additional recommendations along with no salvage. Should I stick with the Unix namespace? Disable oplocks?

Are there any issues with running the agents as a non-root user on NSS?

© 2013 Novell