BP106 IBM Lotus Domino RunFaster 1 - Nash!
Daniel Nashed
nsh@nashcom.de
http://www.nashcom.de
Introduction/Overview
Q&A
No!
This is a task for
─ People who take care of the hardware
─ Operating System support team
─ Network team
─ SAN Admins, ...
─ Domino Administrators
─ Domino Developers
─ People who install the Notes Clients and Workstation software (OS …)
─ In most cases we are called by the Notes team and end up speaking with a number of different
IT teams within a company
─ Virtualization Layer
─ Notes Infrastructure
– Server, Server settings, …
– Client, Client settings
Network
─ A single 1 Gbit card should be fine
– You might want a second card for hardware failover and load-balancing
You should always use a 64bit OS for all types of Domino servers
─ 32bit Windows is limited to 2GB Application Space and 300 MB of File-System Cache
64bit Windows and Linux allow a 32bit application to use the full 32bit 4 GB
address space
─ You don't necessarily need to update your Domino Server to native 64bit itself.
Most benefits come from the 64bit OS in Domino 8.5.x
We have seen application servers with 32GB of RAM holding most of the used
NSF in memory with almost zero read I/O during normal operations
─ Can be more cost effective than using expensive enterprise level SSD Drives
7 © 2013 IBM Corporation
Domino 32bit on a 64Bit Operating System
[Diagram: 4GB total address space; server tasks (32bit, 1GB each) on top of Domino 32bit shared memory (3GB)]
– notes.ini FTBasePath=c:\FTBasePath
Domino Data
─ RAID10 instead of RAID5.
─ High-end SAN hardware should always be RAID10
─ Performance mostly depends on
– Cache, Number of disks, Performance of each disk
DAOS
─ Needs lower performance and has larger sequential I/O requests
─ Could be RAID5 or NAS (even with UNC Path)
– But with a UNC path you need a sub-directory on the NAS side!!!
OS Level Antivirus
Not excluding those files could lead to performance issues and even crashes or
hangs
If you need Antivirus scans on Notes databases you should use a Domino aware
Antivirus solution
─ Mail Scans on Gateway & Scheduled Scans on Databases
TIP: Review exclusion lists after every antivirus software (not pattern) update!
– In case of NAS/iSCSI, check configuration best practices for network card with your NAS
vendor
Fixed page file (min and max values should be the same)
─ Have separate file-system for paging file
– Tests have shown that this works better for almost all SAN or local disk configurations
– Dramatic improvement!
– See next slides for details
Platform.LogicalDisk.1.AssignedName = HarddiskVolume1
Platform.LogicalDisk.1.AvgQueueLen = 0,2
Platform.LogicalDisk.1.AvgQueueLen.Avg = 0,8
Platform.LogicalDisk.1.BytesReadPerSec = 0
Platform.LogicalDisk.1.BytesWrittenPerSec = 0
Platform.LogicalDisk.1.PctUtil = 11,00
Platform.LogicalDisk.1.PctUtil.Avg = 50,00
Platform.LogicalDisk.1.ReadsPerSec = 0
Platform.LogicalDisk.1.WritesPerSec = 0
Platform.LogicalDisk.1.AssignedName = sda
Platform.LogicalDisk.1.AvgQueLen = 11.89
Platform.LogicalDisk.1.AvgQueLen.Avg = 11.89
Platform.LogicalDisk.1.PctUtil = 95.63
Platform.LogicalDisk.1.PctUtil.Avg = 95.63
Platform.LogicalDisk.1.PctUtil.Peak = 95.63
Platform.LogicalDisk.1.ServiceTimeinmsecs = 8.35
Platform.LogicalDisk.1.ServiceTimeinmsecs.Avg = 8.35
%util = Disk utilisation in % → Values above 90% are an indicator for a busy disk
r/s = Disk reads per second
w/s = Disk writes per second
svctm = Disk service time in ms (how fast the device responds)
await = Time the whole request needs (application to disk queue, disk and back)
─ This is the most important statistic and key indicator
# iostat -x 2
Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
sda 0.00 2024.50 0.00 762.00 0.00 22268.00 29.22 0.86 1.13 0.38 28.80
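A minimal Python sketch of how the iostat line above could be checked automatically. It assumes the column layout shown on this slide (other sysstat versions print different columns), and the helper names `parse_iostat` and `is_busy` are made up for illustration. The 90% threshold is the one given for %util above.

```python
def parse_iostat(header_line, data_line):
    """Map the column names of `iostat -x` to the values of one device line."""
    names = header_line.split()
    values = data_line.split()
    # The first column is the device name; all other columns are numeric.
    stats = {"device": values[0]}
    for name, value in zip(names[1:], values[1:]):
        stats[name] = float(value)
    return stats


def is_busy(stats, util_threshold=90.0):
    """Flag a disk as busy when %util exceeds the threshold from the slide."""
    return stats["%util"] > util_threshold


# Header and sample line copied from the slide.
HEADER = "Device: rrqm/s wrqm/s r/s w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util"
SAMPLE = "sda 0.00 2024.50 0.00 762.00 0.00 22268.00 29.22 0.86 1.13 0.38 28.80"

stats = parse_iostat(HEADER, SAMPLE)
print(stats["device"], "await:", stats["await"], "busy:", is_busy(stats))
```

Looking up values by column name rather than by position keeps the check readable, but it still depends on the header line matching the data line.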
For larger servers with more than 500-800 users all components have to play
well together and you should plan for sufficient resources
─ Domino can be very resource demanding, especially in the I/O area
─ Supported even for larger servers since ESX became a “tier-1 virtualization platform”
Have separate VMDKs for all parts of the server as mentioned before
─ OS, NSF, Translog, DAOS, ..
─ Take care where VMDKs are located on the SAN side and who you share your storage and
servers with!
─ RHEL 6.3
– Kernel 2.6.32-x
─ SLES 11 SP2
– Kernel 3.x
– Already a 3.x kernel. Introduced in SP2
– There is a random I/O optimization fix in 2.6.33 – not yet in RHEL 6.3
─ In earlier releases (SLES 11 SP1 + RHEL 6.x) with CFQ you might want to use
– echo "0" > /proc/sys/kernel/sched_features
So the speed alone does not matter if you have many small transactions
─ Network accelerators can help to some extent, especially for re-transmissions etc.
─ But they can only reduce the latency to a small extent
─ For complex applications even 50 ms of latency makes it difficult to work
─ Example: 100 NoteOpen Transactions
– in LAN = 100 ms
─ Multiple work-arounds:
– Optimize Application
– Optimize Network
– Use a different access method like Web or Citrix
– Offload operations to the server (e.g. agent run on server)
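The effect of latency on many small transactions can be sketched with simple arithmetic. The Python snippet below assumes that the transactions run strictly one after another and that each costs one network round trip; bandwidth is ignored. The 1 ms round trip is derived from the LAN figure above (100 transactions = 100 ms); the 50 ms value is a hypothetical WAN latency, and `total_wait_ms` is an illustrative name.

```python
def total_wait_ms(transactions, round_trip_ms):
    """Total wait time when each transaction costs one sequential round trip."""
    return transactions * round_trip_ms


lan_ms = total_wait_ms(100, 1)    # LAN figure from the slide
wan_ms = total_wait_ms(100, 50)   # hypothetical 50 ms WAN round trip

print("LAN:", lan_ms, "ms")
print("WAN:", wan_ms, "ms")
```

Under these assumptions the same 100 transactions take 5 seconds over the 50 ms link, which is why reducing the number of transactions usually helps more than adding bandwidth.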
Network Optimization
If you have no active network component that optimizes your traffic or if you use
Notes port encryption, enable Notes Port Compression!!
─ This can reduce the traffic by 50% (improves response time but does not reduce network latency)
You might want to change the autotuning level from “normal” to “restricted”
─ Set → netsh interface tcp set global autotuninglevel=restricted
─ Check → netsh interface tcp show global
Windows 2008 has a feature called "Stealth mode" wherein by default it does not
respond to network requests on ports for which no process is listening. This
feature creates a problem with the Notes client failover when a server is down,
since the OS does not respond to the client and thus the client will not fail over
until its timeout has been exceeded. Although Domino does support Windows
2008, at the appropriate version level, this is a feature which "breaks" the Notes
client failover.
To run in "Stealth Mode" and get good failover, IBM recommends the installation
of Firewall software which disables this "feature" on the Notes NRPC port (1352
by default).
SPR# SWAS8GGHMC
─ The fix makes cluster failover more efficient and is enabled by default in 8.5.3. It was also
included in 8.5.2 FP3, but there you had to set ENABLE_CLUSTER_MATCHES_FAST=1 in the
Domino server notes.ini to enable it.
─ There is also a notes.ini setting to disable it, valid only in 8.5.3 and above:
DISABLE_CLUSTER_MATCHES_FAST=1 (default value being 0).
─ The existing API - ServerGetClusterReplicaMatches resolved the failover replica IDs but caused
server outages.
─ Regression in 8.5.2 FP2.
You can specify a directory (e.g. on system disk) for optimized view rebuild
On Linux you can put temp files and view rebuild files into tmpfs
─ Similar to a RAM drive, but it does not reserve memory and is self-organizing
─ Changes are only written to disk if memory is needed by the server
– It would swap to disk if space is needed
─ tmpfs is enabled by default with half the size of the physical memory
– Located in /dev/shm
By default the NSF Buffer Pool is 512 MB for 32bit and 1 GB for 64bit
─ The default works for almost all configurations
─ A cache/buffer is only effective up to a certain size
Interpretation: Bad < 90% < PercentReadsInBuffer < 98% < Perfect
─ If you really need to change it use notes.ini NSF_BUFFER_POOL_SIZE_MB=512
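The interpretation rule above can be written as a small check. This Python sketch uses the 90% and 98% thresholds from the slide; the function name `rate_buffer_pool`, the label "acceptable" for the middle range, and the treatment of values that sit exactly on a threshold are assumptions.

```python
def rate_buffer_pool(percent_reads_in_buffer):
    """Rate a PercentReadsInBuffer value using the thresholds from the slide."""
    if percent_reads_in_buffer < 90.0:
        return "bad"
    if percent_reads_in_buffer > 98.0:
        return "perfect"
    # Anything from 90% to 98% inclusive falls in the middle range.
    return "acceptable"


for value in (85.0, 95.0, 99.2):
    print(value, rate_buffer_pool(value))
```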
Interpretation
─ Good = HighWaterMark < MaxEntries
─ Good = 0 OvercrowdingRejections
Interpretation
─ Waiting should be ZERO
Tune
─ Server_Pool_Tasks = n ( e.g. 80)
─ Server_Max_Concurrent_Trans = m (e. g. Server_Pool_Tasks * Number of Ports)
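As a rough illustration of the rule of thumb above, the following Python sketch derives Server_Max_Concurrent_Trans from Server_Pool_Tasks and the number of ports and prints the resulting notes.ini lines. The value 80 is the example from the slide; the port count of 2 and the helper name `pool_settings` are assumptions, and the right values depend on your server.

```python
def pool_settings(pool_tasks, port_count):
    """Build notes.ini lines following Server_Pool_Tasks * number of ports."""
    max_trans = pool_tasks * port_count
    return [
        "Server_Pool_Tasks=%d" % pool_tasks,
        "Server_Max_Concurrent_Trans=%d" % max_trans,
    ]


# 80 pool tasks (example from the slide) on a server with 2 ports (assumed).
for line in pool_settings(80, 2):
    print(line)
```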
Router Optimization
─ RouterMaxConcurrentDeliverySize=1048576
─ Disable_BCC_group_expansion=1
NSF Optimization
Since Domino 8 you can "Defer index creation until first use" under the "Click on
column header to sort" in the properties of the column of a view/folder
─ This will ensure that only the primary index of the view/folder is created
─ If the user clicks on another sort index, the index will be built automatically
─ And updated via update task from now on!
Avoid on-the-fly fulltext indexing needed by agents with search queries and no
FT index on the database
─ Set notes.ini FT_FLY_INDEX_OFF=1 to disable on-the-fly FT indexing completely
─ Agents that need on-the-fly FT indexing will then fail with an error
The warning message you see in log.nsf for on-the-fly FT index is:
─ "Warning: Agent is performing full text operations on database '<name>' which is not full
text indexed. This is extremely inefficient."
Split your application into static config database and databases with normal “data”
Use archiving to have smaller active and larger static archive databases
─ Sounds simple but is very effective
─ Improves performance and reduces I/O load dramatically
─ Especially true for mail!
You could consider using it even if you just archive to the same server
Most expensive element is the $Inbox folder for a large, busy mail-database!
View update time grows linearly with the number of docs and also the number of columns
─ But for complex views the cost grows much faster than linearly
– 2 times more documents -> takes ~4 times longer
TIP: You should disable unread marks for all system databases if you don't need them
─ names.nsf, log.nsf, mail.box, admin4.nsf, statrep.nsf
Workaround: Use Run on Server Agents off-loading the work to the server
─ This can be a good approach for other operations that need many transactions
When the design cache is broken, fall-back code is used to find design elements
─ Instead of FINDDESIGN_NOTES transactions, the design collection is opened and searched
sequentially until a match is found!
– For databases with a lot of data this can be quite an overhead
─ Alternate way: Check for design collection usage via client_clock tracing
– Client_Clock=1, debug_outfile=c:\notes.log
Client_Clock logs
─ Transaction sequence
─ Transaction name
─ Transaction data (ReplicaID, NoteID)
─ Response time (ms)
─ Bytes sent, received
Example:
─ (15-78 [15]) OPEN_NOTE(REPC1256B16:0072BCBE-NT00000E3E,00400020): 0 ms. [52+1454=1506]
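To analyze a larger Client_Clock log, the fields listed above can be pulled out of each line with a regular expression. The Python sketch below is written against the single example line on this slide, so the pattern is an assumption and real logs may contain variants it does not match. Treating the first byte count as "sent" and the second as "received" follows the order of the field list above and is also an assumption; `parse_clock_line` is an illustrative name.

```python
import re

# Pattern derived from the one sample line on the slide (an assumption).
CLOCK_LINE = re.compile(
    r"^\((\d+)-(\d+) \[(\d+)\]\) "   # sequence counters, e.g. (15-78 [15])
    r"(\w+)"                          # transaction name, e.g. OPEN_NOTE
    r"(?:\((.*?)\))?"                 # optional transaction data
    r": (\d+) ms\. "                  # response time in ms
    r"\[(\d+)\+(\d+)=(\d+)\]$"        # byte counts: first + second = total
)


def parse_clock_line(line):
    """Return the fields of one Client_Clock line, or None if it does not match."""
    match = CLOCK_LINE.match(line.strip())
    if match is None:
        return None
    first, second, total = (int(match.group(i)) for i in (7, 8, 9))
    return {
        "name": match.group(4),
        "data": match.group(5),
        "ms": int(match.group(6)),
        "bytes_sent": first,        # assumed order: sent, then received
        "bytes_received": second,
        "bytes_total": total,
    }


SAMPLE = "(15-78 [15]) OPEN_NOTE(REPC1256B16:0072BCBE-NT00000E3E,00400020): 0 ms. [52+1454=1506]"
print(parse_clock_line(SAMPLE))
```

Summing the "ms" field per transaction name over a whole log is one simple way to see which transaction types dominate the response time.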
Client Clock Annotation
─ You don't necessarily need an SSD, but you should have decent, current hard disks
– 7200 RPM vs. 5400 RPM disks, larger cache
Startup Performance
─ Every Dot-Release has better performance for the standard client
─ Keep up to date!
Don't use extmgr_addins from Antivirus Vendors like McAfee and Norton
─ Experience from the field: they can cause stability issues and also cost performance
─ Also see reference https://kc.mcafee.com/corporate/index?page=content&id=KB57589
Getthreadinfo(LSI_THREAD_TICKS)
─ LSI_THREAD_TICKS = 6
─ LSI_THREAD_TICKS_PER_SEC = 7 returns ticks per second (1000)
─ tick = millisecond (different than agent ticks!)
─ Can be used to calculate run-time in milliseconds ;-)
Code Example
─ Dim lStart As Long, lStop As Long, ms As Long
─ lStart = Getthreadinfo(6)
─ ... do some work ...
─ lStop = Getthreadinfo(6)
─ ms = lStop - lStart
Questions?
─ Now, find me later at the conference or contact me offline
Contact
─ nsh@nashcom.de
─ http://www.nashcom.de
─ http://blog.nashcom.de
─ +49 172 2141912