KB Emc308914
Article Content
Impact: ETA emc308914: VNX: Excessive trespassing causes I/O performance issues, which may adversely affect Virtual
Provisioning.
Issue: This issue can be indicated by a LUN or LUNs being trespassed excessively by the Middle Redirector Driver.
The Virtual Provisioning driver (MLU) compares the number of I/Os serviced by the local Storage Processor
(SP) with the number of I/Os serviced by the peer SP, and the result of that comparison triggers a trespass. The
712d8107 event is a symptom of this issue, not the cause. See 86538 for an explanation of the 712d8107 event. If
you are running VNX OE 05.32.000.5.006, 05.32.000.5.008, or 05.32.000.5.011 and you see excessive 712d8107
and 711b0001 messages in the SP Event Logs, EMC recommends upgrading to VNX Operating Environment (OE)
05.32.000.5.015 or later, which contains a fix for this issue.
Examples of the Virtual Provisioning problems caused by this issue include:
Performance issues
Pool LUNs being marked for recovery
SP bugchecks
Virtual Provisioning driver becoming degraded
This statement does not apply: EMC SW: VNX Operating Environment (OE)
05.32.000.5.015
Resolution: Workaround:
To temporarily alleviate this issue, disable Autotiering on a per-pool basis, reboot the SPs (one at a time),
and schedule an upgrade to VNX OE 05.32.000.5.015. If you are affected by one of the symptoms listed
above, please engage EMC Customer Support to resolve the issue.
Autotiering can be switched off by using:
Naviseccli autotiering -relocation -stop -all
Naviseccli autotiering -schedule -disable
After upgrading to VNX OE 05.32.000.5.015, Autotiering can be re-enabled by using:
Naviseccli autotiering -relocation -start -all
Naviseccli autotiering -schedule -enable
Permanent Fix:
This issue is addressed in VNX OE 05.32.000.5.015 (released December 2012).
Notes
Symptom (Employees and Partners): A sample of what may be seen in the Triiage_SPlogs.txt:
A 12/07/12 08:36:05 MLU 712d014e Trespass Execute received on LUN 5 (object ID A00000009, WWN 6006016010503000:f4e80d127384e111) in pool 0 (object ID 300000003).
A 12/07/12 08:36:05 MidRedirect 711b0001 DynStrings:\Device\CLARiiON\mlu\000000095
A 12/07/12 08:36:05 MLU 712d0003 Operation Promote Replica started by 900000009 on 200000009.
A 12/07/12 08:36:05 MLU 712d0004 Operation Promote Replica completed on 900000009.
A 12/07/12 08:36:05 MLU 712d0004 Operation Mount FS completed on 200000009.
A 12/07/12 08:36:05 MLU 712d0003 Operation Mount FS started by 200000009 on 300000003.
B 12/07/12 08:36:05 MLU 712d014d Trespass Ownership Loss received on LUN 5 (object ID A00000009, WWN 6006016010503000:f4e80d127384e111) in pool 0 (object ID 300000003).
B 12/07/12 08:36:05 MLU 712d0d01 LUN 6006016010503000:f4e80d127384e111 is ready to service IO. LU OID A00000009 Pool OID 300000003. [ALU 5]
B 12/07/12 08:36:05 MLU 712d0004 Operation Unmount FS completed on 200000009.
B 12/07/12 08:36:05 MLU 712d0003 Operation Unmount FS started by 200000009 on 300000003.
A 12/07/12 08:37:52 MLU 712d014d Trespass Ownership Loss received on LUN 5 (object ID A00000009, WWN 6006016010503000:f4e80d127384e111) in pool 0 (object ID 300000003).
Symptom: Continuous trespasses for no apparent reason. Trespassing itself is not the root cause, but
excessive assignments in the Triiage_analysis file indicate which LUN is affected.
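As a rough aid for spotting the affected LUN, the following C sketch counts trespass events per LUN in SP log text. It assumes each event sits on a single line and contains "on LUN <number>", as in the 712d014e and 712d014d entries of the SP log sample; the function name count_trespass_events and the MAX_LUN bound are hypothetical and are not part of any EMC tool. The layout of the Triiage_analysis file is not shown in this article, so this sketch does not attempt to parse it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_LUN 8192 /* hypothetical upper bound on LUN numbers for this sketch */

/*
 * For every line that carries a 712d014e (Trespass Execute) or
 * 712d014d (Trespass Ownership Loss) event, increment counts[lun].
 * Returns the number of events that were matched and counted.
 */
int count_trespass_events(const char *const *lines, size_t n,
                          unsigned counts[MAX_LUN])
{
    static const char marker[] = "on LUN ";
    int matched = 0;

    assert(lines != NULL && counts != NULL);

    for (size_t i = 0; i < n; i++) {
        /* Skip lines that are not one of the two trespass events. */
        if (!strstr(lines[i], "712d014e") && !strstr(lines[i], "712d014d"))
            continue;

        const char *p = strstr(lines[i], marker);
        if (p == NULL)
            continue;
        p += sizeof marker - 1; /* step past "on LUN " */

        char *end;
        unsigned long lun = strtoul(p, &end, 10);
        if (end == p || lun >= MAX_LUN)
            continue; /* no digits found, or LUN outside the sketch's bound */

        counts[lun]++;
        matched++;
    }
    return matched;
}
```

A LUN whose count keeps climbing at short, regular intervals on both SPs is a candidate for the behavior described in this article.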
Symptom: Once 2,147,483,649 I/Os of any kind have been issued to a LUN, every subsequent
64,000 I/O requests result in a Middle Redirector trespass of that LUN.
Cause: The trespass storm is caused by a defect in MLU, which uses a "LONG" data type to store the result of
subtracting two values of data type "ULONGLONG." The result can be interpreted as positive even
when the actual result is negative. After every 64,000 I/Os to a file system (LUN or AdvanceSnap), MLU
checks whether it would be better for the LUN to be owned by the peer. It does this by querying the middle
redirector for volume statistics. The volume statistics provide a count of how many total I/Os were served
locally and how many were served on the peer. MLU uses this information to calculate:
VolumeStats.totalPeerIOServiced - VolumeStats.totalLocalIOServicedLocally;
If the result is negative, it implies that more I/Os are being served locally than are being
served on the peer (and therefore being redirected). This would be the normal case for a single-LUN file
system (a LUN without AdvanceSnaps). The bug is that the result of this subtraction is stored in a LONG.
Therefore, if totalPeerIOServiced were 0 and totalLocalIOServicedLocally exceeded the range of a signed
32-bit number, reaching 0x80000001 (decimal 2,147,483,649), the 32-bit result of the subtraction would be
positive. MLU would then incorrectly assume that more I/Os were being redirected than served locally and that
the LUN would be better serviced by the peer SP. This results in MLU requesting a trespass of the LUN.
Since the statistics are not reset on a trespass, the same problem occurs on the peer when MLU checks
after 64,000 I/Os. The trespass storm continues until the statistics are reset by a reboot.
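The truncation described above can be reproduced with a minimal C sketch. This is not the MLU source: the struct layout and the function names buggy_wants_trespass, fixed_wants_trespass, and check_examples are hypothetical, with int32_t standing in for the 32-bit Windows LONG and uint64_t for ULONGLONG. The "fixed" variant is only one possible way to avoid the problem and is not necessarily how VNX OE 05.32.000.5.015 addresses it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the volume statistics named in the article. */
typedef struct {
    uint64_t totalPeerIOServiced;         /* I/Os served on the peer SP */
    uint64_t totalLocalIOServicedLocally; /* I/Os served on the local SP */
} VolumeStats;

/* Mimics the defect: the 64-bit difference is narrowed to 32 bits
 * before its sign is tested. */
bool buggy_wants_trespass(const VolumeStats *s)
{
    uint64_t diff64 = s->totalPeerIOServiced - s->totalLocalIOServicedLocally;
    /* Keep only the low 32 bits. The unsigned-to-signed conversion is
     * implementation-defined in C; mainstream compilers wrap
     * (two's complement). */
    int32_t diff32 = (int32_t)(uint32_t)diff64;
    return diff32 > 0;
}

/* One possible correction (an assumption): compare the 64-bit
 * counters directly instead of testing a narrowed difference. */
bool fixed_wants_trespass(const VolumeStats *s)
{
    return s->totalPeerIOServiced > s->totalLocalIOServicedLocally;
}

/* Exercises the threshold quoted in the article. */
void check_examples(void)
{
    VolumeStats below = { 0, 0x80000000ULL }; /* low 32 bits of difference: 0x80000000 */
    VolumeStats at    = { 0, 0x80000001ULL }; /* low 32 bits of difference: 0x7FFFFFFF */

    assert(!buggy_wants_trespass(&below)); /* still seen as negative */
    assert(buggy_wants_trespass(&at));     /* wrongly seen as positive */
    assert(!fixed_wants_trespass(&at));    /* local SP clearly serves more */
}
```

With totalPeerIOServiced at 0, the narrowed difference first turns positive when the local count reaches 0x80000001, which matches the 2,147,483,649 threshold given in the symptom.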
Note: EMC Support can download the program MRv2.exe, which can assist in diagnosing the
issue. Extract the program into the relevant tools folder (e.g. c:\tools\), then run the command MRv2
from the folder where the SPcollects are stored.
Example of the Tool being used:
Codelevel = 05320005.011
The following stats are only valid if this is R32 below p15
Note: For more information consult ARS defect numbers 507904, 507478, 513279, 521421, 524467,
526706, and 526071. ARS access is only available to authorized Customer Service Representatives.
Note: Please read! ETAs constitute formal notification from EMC to customers, partners, and EMC field
personnel. Changes to this solution require approval of the Customer Service ETA Approver and this approval
must be recorded in the Comments of the solution. To identify the Customer Service ETA Approver, go to the
Comments tab or the list of ETA Approvers located on the ETA web page of the Global Service web site.
Article Metadata
Product: VNX1 Series
Problem Code: EMC Software
Shared: Yes
RCA Status: Complete
External Source: Primus
Primus/Webtop solution ID: emc308914
Originally Created By: Gearoid Griffin
Legal Information
Please read! ETAs constitute formal notification from EMC to customers, partners, and EMC Customer Service
personnel. Changes to this article require approval of Customer Service ETA Approvers for the EMC products listed. This
approval must be recorded in the Internal Authoring Notes of the article. To identify the Customer Service ETA Approver,
refer to ETA Approvers List on KCS at EMC on EMC ONE.
EMC CONFIDENTIAL INFORMATION