Beginning with the OES 2015 SP1 release, the NSS file system has improved disk storage allocation policies. This improvement in disk storage allocation is referred to as delayed block allocation.
Delayed block allocation improves file system performance and reduces file fragmentation by writing larger amounts of data at a time. It aggregates sequential file blocks before writing them to disk, so that multiple blocks can be allocated as a single extent (a set of contiguous disk blocks) instead of as separate disk blocks. This feature is enabled by default.
For more information about the file fragmentation, see
The benefits of delayed block allocation are:
- It combines write requests and allocates extents in large chunks, so fewer extents are allocated per file and more of each file's blocks are contiguous.
- Read performance improves because data is written in more contiguous blocks, which minimizes the rotational and seek latencies of disk-head movement on a rotational disk.
- Write performance improves because fewer writes of NSS metadata (such as the journal, file map, and free tree) are needed: the metadata is now updated for the aggregated sequential file blocks as a whole rather than for each individual user file block write.
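The effect of aggregating writes before allocating can be illustrated with a toy simulation. This is only a sketch under simplifying assumptions, not the NSS allocator: the disk is modeled as a flat list of blocks, the delay is expressed as a number of buffered writes (`window`) rather than a time period, and all function names are hypothetical.

```python
from collections import defaultdict


def count_extents(block_owners):
    """Count extents per file: a run of adjacent blocks owned by one file is one extent."""
    extents = defaultdict(int)
    previous = None
    for owner in block_owners:
        if owner != previous:
            extents[owner] += 1
        previous = owner
    return dict(extents)


def allocate_immediately(writes):
    """Hand the next free disk block to each write as soon as it arrives."""
    return list(writes)


def allocate_delayed(writes, window):
    """Buffer `window` writes, then allocate each file's pending blocks together."""
    layout = []
    for start in range(0, len(writes), window):
        pending = defaultdict(int)
        for file_id in writes[start:start + window]:
            pending[file_id] += 1
        for file_id, blocks in pending.items():
            layout.extend([file_id] * blocks)
    return layout


# Three files growing at the same time, one block per write, interleaved.
writes = ["A", "B", "C"] * 8

immediate = count_extents(allocate_immediately(writes))
delayed = count_extents(allocate_delayed(writes, window=12))
print("extents per file, immediate allocation:", immediate)
print("extents per file, delayed allocation:  ", delayed)
```

In this model, immediate allocation leaves every block of a file in its own extent, while buffering 12 writes at a time leaves each file with only two extents for the same data.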
The following use cases might benefit from these capabilities of delayed block allocation:
- File server use case, where multiple users access the file system and multiple write requests arrive at the same time. With traditional block allocation, blocks were allocated as each write request arrived. Delayed block allocation instead waits for a certain period of time and tries to allocate as many contiguous blocks, or as large an extent, as possible. It also helps keep allocations contiguous when several files are growing at the same time, which improves access time. File server users can notice the difference when they open and read those files.
- Full and incremental backup, whose performance improves because fewer extents are allocated. This has a direct impact on the backup window. For example, assume that data grows by around 200 GB per week and that the incremental backup takes around 12 hours; with delayed block allocation, the same job might take around 2 hours. This estimate is based on the throughput observed in the ‘Backup Performance test’ results shown below.
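The backup-window estimate can be reproduced with a few lines of arithmetic. The sketch below uses the backup throughput figures reported in the Backup Performance test (272.45 MB/min without DBA, 1572.13 MB/min with DBA) and assumes 1 GB = 1024 MB; the function name is hypothetical, and real backup times will depend on the data and environment.

```python
def backup_hours(data_gb, throughput_mb_per_min):
    """Estimate backup duration in hours for a given data size and throughput."""
    data_mb = data_gb * 1024  # assumption: 1 GB = 1024 MB
    return data_mb / throughput_mb_per_min / 60


WEEKLY_GROWTH_GB = 200  # weekly data growth assumed in the example above

without_dba = backup_hours(WEEKLY_GROWTH_GB, 272.45)
with_dba = backup_hours(WEEKLY_GROWTH_GB, 1572.13)

print(f"Without DBA: {without_dba:.1f} hours")
print(f"With DBA:    {with_dba:.1f} hours")
```

Under these assumptions the estimate comes out at a little over 12 hours without DBA and a little over 2 hours with it, in line with the figures quoted above.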
I conducted the following tests to measure the performance benefits that delayed block allocation offers for these use cases. Each test was performed with and without Delayed Block Allocation (DBA).
- File server use case – read and write test from multiple CIFS connections
Around 590 Novell CIFS connections were used for this test. Each connection mapped the NSS volume and performed continuous write operations on files of different sizes. This test created around 230 GB of data. After the write test completed, the created data was read by the same number of connections. Java programs were used for this test.
File sizes – 4KB, 8KB, 16KB, 32KB, 64KB, 128KB, 256KB, 512KB, 1MB, and 2MB
Record size – 4KB
Number of files per connection – 1000
| Write Test – Response Time | Without DBA | With DBA | % Change w.r.t. Without DBA |
|---|---|---|---|
| Avg Open response time (ms) | 36.69 | 8.86 | -75.851 |
| Avg Close response time (ms) | 29.82 | 11.58 | -61.167 |
| Avg Create response time (ms) | 66.31 | 58.17 | -12.275 |
| Avg Write response time (ms) | 76.09 | 28.38 | -62.702 |
| Time taken for test completion (mins) | 140 | 55 | -60.714 |

| Write Test – Throughput | Without DBA | With DBA | % Change w.r.t. Without DBA |
|---|---|---|---|
| Avg Write throughput (KB/sec) | 53.07 | 142.42 | 168.362 |
With DBA, write throughput improves by about 168% and the write test takes about 61% less time to complete than without DBA.
| Read Test – Response Time | Without DBA | With DBA | % Change w.r.t. Without DBA |
|---|---|---|---|
| Avg Open response time (ms) | 2720.65 | 498.59 | -81.673 |
| Avg Close response time (ms) | 2.71 | 2.51 | -7.38 |
| Avg Read response time (ms) | 196.91 | 39.16 | -80.112 |
| Time taken for test completion (mins) | 690 | 135 | -80.434 |

| Read Test – Throughput | Without DBA | With DBA | % Change w.r.t. Without DBA |
|---|---|---|---|
| Avg Read throughput (KB/sec) | 20.24 | 101.79 | 402.915 |
With DBA, read throughput improves by about 403% and the read test takes about 80% less time to complete than without DBA.
The above data represents the average response time and throughput seen by an individual connection when around 590 connections are performing operations simultaneously.
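For readers who want to check the "% Change w.r.t Without DBA" column, it can be recomputed as the relative difference between the two measurements. The helper below is a hypothetical sketch, not part of the original test tooling; the published values appear to be truncated to three decimals rather than rounded, so the last digit may differ slightly.

```python
def percent_change(without_dba, with_dba):
    """Relative change of the 'With DBA' value, in percent of the 'Without DBA' value."""
    return (with_dba - without_dba) / without_dba * 100


# Values taken from the write and read test tables above.
write_throughput_change = percent_change(53.07, 142.42)   # KB/sec
read_completion_change = percent_change(690, 135)         # minutes

print(f"Write throughput change:     {write_throughput_change:.2f}%")
print(f"Read completion time change: {read_completion_change:.2f}%")
```

A negative value means the metric went down with DBA, which is an improvement for response and completion times; a positive value is an improvement for throughput.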
- Backup Performance test
In this test, local backup performance was measured using TSAtest, a tool available as part of Open Enterprise Server. First, around 30 GB of data was created over 590 Novell CIFS connections, with each connection creating around 50 files of 1 MB each. The created data was then backed up using a single instance of the TSAtest tool to measure backup performance.
| Backup Test | Without DBA | With DBA | % Change w.r.t. Without DBA |
|---|---|---|---|
| Backup Throughput (MB/min) | 272.45 | 1572.13 | 477.034 |
| Time taken for test completion (mins) | 103 | 18 | -82.524 |
Here, an extent is a set of contiguous blocks allocated for the created files.
These test results show significant performance improvements for both use cases:
- File server
- Backup performance
Note that the above test results were observed under lab conditions, and results may vary with the test, hardware, or network configuration. The number of connections, file access protocol, type of test, and number of operations also play a crucial role in determining the test results.
The following are the hardware details of the server and clients used in the above tests:
Server:
RAM size – 16151420 kB
CPU information –
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
CPU MHz: 3101.000
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 8192K
NUMA node0 CPU(s): 0-7
Clients:
Version – Windows 7 Enterprise SP1 32-bit
RAM size – 2 GB