Mods PDF
MODULAR DIAGNOSTIC
SOFTWARE
FOR 343.X DIAGNOSTICS
MODS.DOCX_R343_v02 | Aug 2013
NVIDIA CONFIDENTIAL | Prepared and Provided Under NDA
Software Documentation
DOCUMENT CHANGE HISTORY
MODS.DOCX_R343_v02
NVIDIA CONFIDENTIAL
MODS
Modular diagnostic software
for 343.X diagnostics MODS.DOCX_R343_v02 | ii
MODULAR DIAGNOSTIC SOFTWARE (MODS)
1.0 INTRODUCTION
This document describes the NVIDIA Modular Diagnostic Software (MODS). MODS is a
powerful software program that allows users to test NVIDIA hardware. MODS is used
for three primary purposes:
Chip and board functional validation
Architectural verification
This document covers the usage of MODS for graphics and compute products.
Microsoft Windows 7
MacOSX
One script will run on all supported operating systems without modification.
Complete embedded OpenGL and CUDA drivers, and resource manager—this is the
same code base that is used in the Linux and Windows drivers.
The MODS GPU manufacturing test suite exercises most but not all of the capabilities of
the NVIDIA hardware. It is assumed that the silicon has undergone a normal screening
process prior to shipping to the customer and that the primary purpose of the test is to
determine if the board manufacturing process has completed successfully and all solder
connections and components are working properly.
2.0 USAGE
Normally, MODS is invoked by using the command-line:
mods gputest.js -mfg (for CEM testing)
The difference between these two test options is that the -mfg option runs the full suite
of tests. The -oqa test is a slightly less stressful and quicker suite of tests optimized
for speed and coverage.
The MODS test suite is usually distributed to customers in a package with a part number like
“618-60506-3501-CX0.” These packages have been qualified to test a particular product
and contain release notes and batch files tailored to that card. The directions in those
release notes should be followed instead of running the command-lines above.
Note that there are two types of options in MODS: those that are arguments to MODS
itself, and those that are arguments to the script. By convention, MODS’ arguments are
usually a single character, but the script arguments are usually many characters.
Example:
mods -d -C gputest.js -mfg -run_on_error
In the above example, -d and -C are optional arguments to MODS, and -mfg and
-run_on_error are arguments to the script. For more optional arguments to MODS, please
see section 3.3. For JavaScript based arguments (script dependent), please see section 3.4.
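The split between MODS arguments and script arguments can be sketched in a few lines of Python. This is an illustration only: it assumes that the script is the first token ending in .js or .jse, that everything before it belongs to MODS, and that everything after it belongs to the script. That matches the example above but is not a rule stated by this document. The function name split_mods_command is hypothetical and is not part of MODS.

```python
def split_mods_command(argv):
    """Split a MODS argument list into (mods_args, script, script_args).

    Assumption: the script is the first token ending in ".js" or ".jse".
    The leading "mods" executable name is not part of argv.
    """
    for index, token in enumerate(argv):
        if token.endswith((".js", ".jse")):
            return argv[:index], token, argv[index + 1:]
    raise ValueError("no script name found on the command line")


mods_args, script, script_args = split_mods_command(
    ["-d", "-C", "gputest.js", "-mfg", "-run_on_error"]
)
print("MODS arguments:  ", mods_args)
print("Script:          ", script)
print("Script arguments:", script_args)
```

A wrapper like this may be useful when logging which options were passed to MODS itself and which to the script.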
File Description
testlist.jse thermcal.jse
tofile.jse tunetrim.jse
tunevolt.jse boards.dbe
The version is in the following format XX.YY where XX is the major version number,
and YY is the minor version number. MODS uses NVIDIA’s “unified software
architecture” and much of the code base is shared with the drivers. A version of MODS
with the version XX.YY (e.g., 195.5) has a lot of shared code with a driver that also starts
with XX (e.g., 195.10).
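As a rough illustration of the XX.YY scheme, the following Python sketch parses a version string and compares major numbers. The function names are hypothetical, and treating "same major number" as "shares a code base" is only a paraphrase of the paragraph above, not a check that MODS itself performs.

```python
def parse_mods_version(version):
    """Parse a version string of the form "XX.YY" into (major, minor)."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


def shares_code_base(mods_version, driver_version):
    """Return True when both versions have the same major number (XX)."""
    return parse_mods_version(mods_version)[0] == parse_mods_version(driver_version)[0]


print(parse_mods_version("195.5"))
print(shares_code_base("195.5", "195.10"))
```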
Linux
Intel or AMD CPU with AMD64 support
Apple Macintosh
x86-based Macintosh. PowerPC-based systems are no longer supported.
Option Description
-h or -? print help
Option Description
-? Display Help
-attrcb_timeslice_flag <string> enable/disable attribute CB timeslice mode
-aza_maxsinglewaittime <string> Set the Azalia maximum time to wait at a single time for simulation
-bgtest_flags <string> <string> Run a bgtest with various flags, separated by a ','
-bgvolt <string> <string> <string> <string> Log voltage droop [fbp/gpc/sys] [clks per meas] [print interval] [read interval].
-blacklist_pages_on_error Blacklists bad physical address pages if memory tests hit an error.
-enable_ecc_inforom_reporting RM to blacklist pages in InfoROM on ECC error
-etmp_range <string> <string> Set min, max degrees Celsius for External temp sensor sanity-check.
-exit_on_breakpoint_count <string> Exit MODS when the breakpoint count is reached (0 = don't abort)
-fan_speed <string> Force the current GPU device's fan to this percentage of maximum.
-global_surface_overrides <string> Set the GlobalSurfaceOverrides registry key.
-glr_frame_retries <string> Set the FrameRetries on all glrandom tests (used for reporting soft/hard)
-gpu_cache_alloc_policy <string> Set the GPU cache allocation policy
-gpu_cache_promotion_policy <string> Set the GPU cache promotion policy
-gpu_cache_write_mode <string> 0=default 1=writeback 2=writethrough
-h Display Help
-hasbug_override <string> <string> Override Has Bug.
-hd_codec_sdi <string> SDI Line the GPU Codec is connected to for the HD Codec test
-hw_speedo_override <string> Override hw speedo value
-ignore_unexpected_interrupts Ignore unexpected gpu interrupts in tests. Please use with caution.
-int_therm_calibrate <string> <string> Thermal Calibration of Internal Sensor
-intr_thresh <string> Set the stuck interrupt threshold for ResourceManager.
-ipmi_temp_range <string> <string> Set min and max ipmi temperature range - trigger errors for tests.
-itmp_range <string> <string> Set min, max degrees Celsius for Internal temp sensor sanity-check.
-json Enable single-run JSON logfile to modsNNNN.log.
-link_speed_override <string> override link speed of a perf point
-link_width_override <string> override link width of a perf point
-list_tests List all the MODS tests and their test numbers.
-maxframes <string> Limit max frames per test (shorten test times).
-pclk_overclock_pct <string> Set display PClk overclock percent
-perlink_aspm <string> <string> <string> sets ASPM for each PEX device. Parameter is Depth, Loc ASPM, Host ASPM
-perlink_corr_error <string> <string> <string> sets the allowed number of CORR error per PCIE node. Parameter is Depth, LocTolerance, HostTolerance
-pmu_bootstrap_mode <string> PMU Bootstrap Mode
-pstate_callbacks <string> <string> <string> <string> Set the PState callback script and function names.
-pstate_disable Do not allow RM to change clocks or initialize perf tables
-pwr_range_mw <string> <string> <string> Set min and max power range for a sensor - trigger errors for tests.
-set_powerrail_leakage_thresh <string> <string> set the power rail thresholds
-set_powerrail_voltage <string> set the power rail voltage at init
-skip_pertest_pexcheck Skips the init and per GPU test PEX check
-spec <string> Use the user specified table for this mode.
-subsystem <string> <string> Check the subsystem vendor and device IDs.
-tpc_mask_on_gpc <string> <string> Set TPC enable mask for the given GPC. (usage: <gpc_num> <mask>)
-trepfile <string> Set the name of the trep (test report) file
-verify_fuse <string> <string> Append fuse & spec to the list to be checked for Test 1 - CheckConfig
-verify_lanes <string> Fail if PCIE lanes are different than specified number.
-verify_sku <string> Checks if the fuses are burnt for the given sku
Option Description
Using -test in the same invocation as -add, -force or -skip will cause an error, even if
they refer to different tests.
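A wrapper script that assembles MODS command lines could catch this conflict before MODS is launched. The following Python sketch is only an illustration of the rule stated above. The function name check_test_selection is hypothetical, and the error text is not the message MODS prints.

```python
# Options that, per the rule above, cannot share an invocation with -test.
EXCLUSIVE_WITH_TEST = {"-add", "-force", "-skip"}


def check_test_selection(script_args):
    """Raise ValueError if -test is combined with -add, -force or -skip."""
    if "-test" in script_args:
        clash = sorted(EXCLUSIVE_WITH_TEST.intersection(script_args))
        if clash:
            raise ValueError("-test cannot be combined with " + ", ".join(clash))


check_test_selection(["-test", "3"])  # accepted: no conflicting option present
try:
    check_test_selection(["-test", "3", "-skip", "94"])
except ValueError as error:
    print("rejected:", error)
```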
3.6 Installation
Place all distribution package files into a single directory.
On MacOSX, click on the ".tgz" package to unpack it. To run MODS, type "./mods
gputest.js" or another command line in the "Mods.app/Contents/Resources" directory.
Alternately, you can edit "Mods.app/Contents/Resources/mods.arg" to contain the
command line you want to run, then click on the MODS icon.
Linux manufacturing MODS requires a minimum kernel version of 2.6.16. Version 2.6.29
or newer is recommended for performance reasons. Older versions have not been tested
and may not work. Kernel 2.4 is not supported. The version of the running kernel
can be determined by running:
$ uname -r
Linux manufacturing MODS is a 64-bit application and requires a kernel compiled for
the x86_64 architecture. To determine the kernel architecture, type:
$ uname -m
The system on which MODS is run must be built on glibc-2.3.2 or newer. To determine
the glibc version, type:
$ /lib/libc.so.6
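The kernel version and architecture requirements above can also be checked from a script. The following Python sketch classifies a "uname -r" string against the 2.6.16 minimum and the 2.6.29 recommendation and checks a "uname -m" string for x86_64. The function names and the three result labels are assumptions made for this illustration and are not part of MODS.

```python
import re

MINIMUM_KERNEL = (2, 6, 16)      # minimum version stated above
RECOMMENDED_KERNEL = (2, 6, 29)  # recommended for performance reasons


def kernel_tuple(release):
    """Extract the leading numeric version from a `uname -r` string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    return tuple(int(part or 0) for part in match.groups())


def classify_kernel(release):
    """Return "unsupported", "supported" or "recommended"."""
    version = kernel_tuple(release)
    if version < MINIMUM_KERNEL:
        return "unsupported"
    if version < RECOMMENDED_KERNEL:
        return "supported"
    return "recommended"


def is_supported_arch(machine):
    """Check a `uname -m` string for the required x86_64 architecture."""
    return machine == "x86_64"


print(classify_kernel("2.6.32-5-amd64"), is_supported_arch("x86_64"))
```

On a live Linux system the inputs could come from Python's platform.release() and platform.machine(), which report the same values as uname -r and uname -m.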
Linux manufacturing MODS includes a kernel module. The purpose of this module is to
expose certain kernel-mode APIs to MODS, which runs as a user-mode application. In
order to be able to install the kernel module, the system must contain configured kernel
sources and development tools, including make and gcc. Without them it is not possible
to compile the kernel module. Use the package manager provided by your distribution to
install the kernel sources. Typically the package's name is kernel-sources or linux-sources.
For example on Debian, type:
If you run MODS as root, MODS will automatically run the included install_module.sh
script to compile and insert the MODS kernel module. However, if MODS is not run by
the root user, the kernel module must be installed beforehand; this is the recommended setup.
For successful MODS runs the NVIDIA GPUs in the system must be in their original,
unaltered state, as initialized by VBIOS. This means X must not have been run on the
NVIDIA GPUs prior to running MODS. Please make absolutely sure that the nvidia
kernel module is not loaded, otherwise the system may become unstable. In order to
unload the nvidia kernel module it is necessary to first kill X. Killing X is also
recommended even if it is using the vesa or fb driver.
If not using GNOME, type kdm or xdm instead of gdm (as applicable).
Some newer Linux distributions include the nouveau driver in the kernel. This driver
performs a kernel mode set and it also supports a framebuffer console. For MODS to
function correctly, this driver has to be unloaded, preferably blacklisted so that it is not
automatically loaded at boot.
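Whether the nvidia or nouveau kernel module is currently loaded can be checked by reading /proc/modules. The following Python sketch parses text in that format and reports any conflicting modules. The function name and the sample text are illustrative assumptions; the check is not performed this way by MODS itself.

```python
# Kernel modules that must not be loaded while MODS runs, per the text above.
CONFLICTING_MODULES = ("nvidia", "nouveau")


def loaded_conflicts(proc_modules_text):
    """Return the conflicting module names found in /proc/modules content."""
    loaded = {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }
    return [name for name in CONFLICTING_MODULES if name in loaded]


# Sample text in the /proc/modules format (name, size, use count, ...).
sample = (
    "nouveau 1662531 1 - Live 0xffffffffa0400000\n"
    "ttm 96673 1 nouveau, Live 0xffffffffa03e0000\n"
    "mods 28672 0 - Live 0xffffffffa0000000\n"
)
print(loaded_conflicts(sample))
```

On a running system the input would be the content of /proc/modules, for example open("/proc/modules").read().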
Framebuffer consoles are also not recommended, because they may modify memory of
the tested device during the tests. To disable the framebuffer console, edit
/boot/grub/menu.lst and make sure the kernel arguments contain vga=normal instead of
any other value. Make sure they do not contain anything like video=.
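The kernel argument rule above can be checked mechanically. The following Python sketch scans a kernel command line and reports arguments that would enable a framebuffer console: any vga= value other than normal, and any video= argument. The function name is hypothetical and the sample command lines are invented for illustration.

```python
def framebuffer_console_issues(cmdline):
    """Return kernel arguments that conflict with the guidance above."""
    issues = []
    for arg in cmdline.split():
        if arg.startswith("vga=") and arg != "vga=normal":
            issues.append(arg)
        elif arg.startswith("video="):
            issues.append(arg)
    return issues


print(framebuffer_console_issues("root=/dev/sda1 ro vga=normal quiet"))
print(framebuffer_console_issues("root=/dev/sda1 ro vga=791 video=vesafb:off"))
```

The command line of the running kernel is available in /proc/cmdline on Linux, so the same function can be applied to that file's content after a reboot.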
Linux MODS relies on a kernel driver to handle cases where it is necessary to use
kernel-mode APIs. The easiest way to install the MODS kernel module is to use the provided
installation script, which you will find in the MODS runspace:
$ ./install_module.sh --install
[Note: You can’t install the kernel module from a network directory where the root user
does not have write access. In this case copy install_module.sh and driver.tgz to /tmp
and run it there.]
To find out what group the driver has been assigned to, type:
$ ls -l /dev/mods
$ id
If you decide to modify the group in 99-mods.rules, you have to reload the kernel
module:
$ modprobe -r mods
$ modprobe mods
To make sure the kernel module is always loaded when the system starts up, follow
your distribution-specific guidelines.
On Debian-based distros (such as Ubuntu) add the mods module name to /etc/modules
if the installation script didn't add it.
On SuSE-based distros add the mods module name to /etc/sysconfig/kernel file in the
MODULES_LOADED_ON_BOOT variable.
The package default.zip can be installed on a target drive (e.g. USB stick) from Windows
XP. The installation procedure is as follows:
Insert the USB stick or make sure the drive where you want to install it is connected.
Format it using FAT32 filesystem. For USB sticks right click on the drive and choose
"Format...". For non-removable drives (such as SATA) you need to go to Control
Panel->Administrative Tools->Computer Management->Storage->Disk Management
and create a partition smaller than 32GB (max size for FAT32) and then format it.
Unzip the modsdisk.zip package to the target drive you've just formatted e.g. by right
clicking on the zip file in Explorer, choosing "Extract all..." and entering drive letter
(e.g. e:) as the destination.
Open a command prompt (e.g. Start->Run... and type "cmd<ENTER>").
Go to the target drive by typing the drive letter of the target drive and pressing Enter (e.g.
"e:<ENTER>").
Install the boot loader (assuming e: is your drive; for non-removable drives you
might also need to add the -f switch):
$ syslinux -m -a e:
File Description
3.9.3 Usage
After you have made the disk bootable, insert or connect it and set the BIOS to boot from it.
You can edit and customize the /syslinux/commands file to load additional kernel
modules, initialize networking, send mods.log created by MODS over the network, etc.
When you boot the Linux distribution it will load Linux and execute commands in
/syslinux/commands which by default will launch MODS with arguments specified in
/mods/args.
The default image will run MODS immediately after boot. To run MODS again:
$ cd /mnt/dos/mods
$ /tmp/mods/mods @args
The following explains what happens from the moment the BIOS boots from the Linux disk:
First the bootloader locates the /syslinux/syslinux.cfg file and determines where to find the
kernel and the initial ramdisk.
The bootloader loads the kernel (/syslinux/kernel) and the initial ramdisk
(/syslinux/initrd) to memory and uncompresses them.
The kernel boots and initializes all devices it has drivers for.
After the kernel finishes booting it runs the /linuxrc script located in the initial ramdisk.
This is our mods-linuxrc0 script.
The script mounts basic filesystems (/dev, /proc, /sys), finds the drive where the DOS
filesystem is located, mounts it and then mounts the squashed Linux root filesystem. It
also prepares a new ramdisk which will become the new, final root filesystem.
The script executes another, final /linuxrc script from the squashed root filesystem. This
is our mods-linuxrc1 script. At the same time it rotates the root directory so that the final
squashed root filesystem becomes the main root filesystem.
The second linuxrc script again mounts basic filesystems under the new root filesystem,
loads the MODS driver and then executes commands from /syslinux/commands.
Loop N times:
If the golden values do not match, report an error and abort the loop.
Optionally, capture image file(s) in .TGA format for failure analysis.
Each test carefully chooses the random test parameters, i.e. invalid values are avoided,
edge cases are properly covered, and proper weighting is given to more common cases.
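The golden-value loop described above can be illustrated with a small software-only model. In this Python sketch the hardware is replaced by a deterministic stand-in function, the per-loop seed scheme is invented, and CRC-32 from zlib is used as the checksum. All of these are assumptions made for illustration; the document does not state which checksum or seeding MODS uses.

```python
import random
import zlib


def render(seed, size=64):
    """Stand-in for the hardware under test: deterministic bytes per seed."""
    rng = random.Random(seed)
    return bytes(rng.randrange(256) for _ in range(size))


def generate_goldens(loops):
    """Record one golden CRC per loop from a known-good run."""
    return [zlib.crc32(render(seed)) for seed in range(loops)]


def run_test(loops, goldens, render_fn=render):
    """Return the first failing loop index, or None if every loop matches."""
    for loop in range(loops):
        if zlib.crc32(render_fn(loop)) != goldens[loop]:
            return loop  # report the mismatch and abort the loop
    return None


def faulty_render(seed):
    """Same as render(), but flips one bit of the output in loop 3."""
    data = bytearray(render(seed))
    if seed == 3:
        data[0] ^= 0x01
    return bytes(data)


goldens = generate_goldens(5)
print(run_test(5, goldens))                 # good "hardware"
print(run_test(5, goldens, faulty_render))  # fails in loop 3
```

Because each loop is driven only by its seed, a failing loop can be re-run in isolation, which is what makes a golden-value comparison practical for random tests.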
-verify_sku Checks if the fuses are burnt for the given sku
4 EvoCurs Test the cursor rendering circuitry. This test randomly positions
the cursor and performs a DAC CRC to verify if the rendered
cursor is correct. This test cycles through all combinations of
display devices so that all heads get tested.
7 EvoOvrl Test the GPU's overlay video circuitry. This test reads a given
YUV image from specific location with certain size, and renders
it as an RGB image at a specific screen location, pixel size, and
magnification. A DAC CRC is used to verify if the rendered
image is correct.
17 ValidSkuCheck The purpose of this test is to confirm that it matches a valid sku
configuration. It is used to catch these failures:
rectangle.
25 MSDECTest This is a test for the video decompression engine. This engine is
also called “VP3.”
validate.
33 GetDisplayConfig Get the display configuration, i.e. print the attached display
devices on each display head.
38 CheckFpGray Displays a special gray image on all flat panels for a visual
inspection. This is an interactive test.
43 SpdifCheck A SPDIF cable check. The support chipset will output SPDIF
signal out of the motherboard. An SPDIF cable coming out of
motherboard should be plugged into the graphics card. Using
the Azalia chipset, this test will output 3 different sampling
frequencies and expect the GPU to see that the sampling
frequency changed.
44 SecTest GPU mfg test for the SEC (SECurity) engine. The SEC engine is
a DMA engine that also handles encryption and decryption, as
required by HD-DVD video data in memory-spaces that might
be accessible to hackers trying to make unlicensed copies of
movies. This tests a randomized sequence of transfers using all
45 CheckFpStripes Display a special stripe image on all flat panels for a visual
inspection. This is an interactive test.
50 I2CTest Check if the GPU's external I2C bus is properly equipped with
pull-up resistors.
52 MarchTest This is an alternate way to call the Mats test (see below). This
version does a "marching ones and zeros" memory pattern.
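A "marching ones and zeros" pattern can be sketched in software as follows. This Python model is a generic, simplified march sequence written for illustration. It is not the MODS implementation, and the simulated memory class with a stuck bit is hypothetical.

```python
ZEROS, ONES = 0x00000000, 0xFFFFFFFF


def march(memory):
    """Run a simplified march over a sequence of 32-bit words.

    Returns the sorted addresses at which a read did not match.
    """
    errors = set()
    size = len(memory)
    for addr in range(size):            # ascending: write zeros
        memory[addr] = ZEROS
    for addr in range(size):            # ascending: expect zeros, write ones
        if memory[addr] != ZEROS:
            errors.add(addr)
        memory[addr] = ONES
    for addr in reversed(range(size)):  # descending: expect ones, write zeros
        if memory[addr] != ONES:
            errors.add(addr)
        memory[addr] = ZEROS
    for addr in range(size):            # final pass: expect zeros
        if memory[addr] != ZEROS:
            errors.add(addr)
    return sorted(errors)


class StuckBitMemory(list):
    """Simulated memory whose word at bad_addr has bit 0 stuck at 0."""

    def __init__(self, size, bad_addr):
        super().__init__([0] * size)
        self.bad_addr = bad_addr

    def __setitem__(self, addr, value):
        if addr == self.bad_addr:
            value &= ~1  # the stuck bit never stores a 1
        super().__setitem__(addr, value)


print(march([0] * 16))
print(march(StuckBitMemory(16, 5)))
```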
63 Optimus Tests the GPU power down, power up, and re-initialization in
Optimus notebooks
69 CheckHiResCrcs Check that the DAC can handle high-resolution video modes
and still scan them out correctly.
70 PatternTest This is an alternate way to call the Mats test. This version uses
71 AppleGL A port of Apple’s OpenGL test. This test only runs on the
Macintosh version of MODS.
2. If the fan RPM at 100% PWM is at least 30% more than the
RPM at 30% PWM.
3. If the fan RPM at 65% PWM is the average of the fan speeds
at 30% PWM and 100% PWM with a 30% tolerance.
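Criteria 2 and 3 above can be expressed as two comparisons. In the following Python sketch the 30% tolerance of criterion 3 is interpreted as a fraction of the average of the two end-point speeds. That interpretation is an assumption, as are the function name and the sample RPM values. Criterion 1 does not appear in this section and is not modelled.

```python
def fan_rpm_checks(rpm_30, rpm_65, rpm_100, tolerance=0.30):
    """Evaluate criteria 2 and 3 from RPM readings at 30%, 65% and 100% PWM.

    Returns (spread_ok, midpoint_ok).
    """
    # Criterion 2: RPM at 100% PWM is at least 30% more than at 30% PWM.
    spread_ok = rpm_100 >= 1.30 * rpm_30
    # Criterion 3: RPM at 65% PWM is near the average of the end points.
    midpoint = (rpm_30 + rpm_100) / 2.0
    midpoint_ok = abs(rpm_65 - midpoint) <= tolerance * midpoint
    return spread_ok, midpoint_ok


print(fan_rpm_checks(1000, 2000, 3000))
print(fan_rpm_checks(1000, 1050, 1100))
```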
81 GLRandomHw Obsolete test. Replaced by the new GLRandom tests, which are tests
130 through 141.
TestCrc
83 CheckVbridge Tests for the presence of an SLI video bridge. This test is
enabled with the -check_vbridge command-line option.
90 FbioLinkTest This is a simple FB memory interface test that uses the GPU's
built-in FBIO training engine to generate the read/write traffic
and count errors. Our other FB tests use either CPU traffic over
the PCI-E bus (Mats) or the 2d engine (FastMats, WfMats) or 3d
engine (RmStress, GLStress). In theory the built-in FBIO
training engine should be usable by runtime resman operations
to adapt a board on each boot. We're prototyping such adaptive
training here. Before the FBIO link training engine starts the test
operation, it blocks all other memory clients and swaps in a
whole alternate set of "tunable" FBIO registers: read and write
strobe timing and voltage-ref. This allows tuning to proceed
to fairly extreme values without corrupting instance memory.
93 NewWfMatsShort An alternate run of NewWfMats (see test 94) which runs only
the blit loop with no CPU loop at all.
The duration of the test is controlled by the size of the CPU box
list. By default, the CPU box list contains about 1/8th of all FB
memory. The Coverage property reduces this ratio, reducing
test time. The End property limits the FB memory in total, also
reducing test time.
When each CPU box has been read/written with each pattern in the
CpuPattern list (default is 4 patterns: 0x00000000, 0xffffffff,
0xaaaaaaaa, 0x55555555), the blit loop is stopped and the blit
boxes are checked for errors.
The blit loop boxes are each initially filled with a different
pattern, by default using all 29 patterns supported by the
PatternClass object. Each time the blit loop runs, each box is
copied to the next box 29 times so that at the end of the loop
each box returns to its original pattern.
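The reason the blit boxes return to their original patterns can be shown with a small model. This Python sketch assumes exactly 29 boxes, one per pattern, and models "each box is copied to the next box" as a rotation by one position with the last box wrapping to the first. The real test's box layout is not described here, so these are illustrative assumptions, and the integers stand in for pattern contents.

```python
NUM_PATTERNS = 29


def blit_step(boxes):
    """Copy every box into the next box; the last box wraps to the first."""
    return [boxes[-1]] + boxes[:-1]


def blit_loop(boxes, steps=NUM_PATTERNS):
    """Apply blit_step the given number of times."""
    for _ in range(steps):
        boxes = blit_step(boxes)
    return boxes


def find_corrupt_boxes(expected, actual):
    """Return indices of boxes whose content differs from the expectation."""
    return [i for i, (a, b) in enumerate(zip(expected, actual)) if a != b]


initial = list(range(NUM_PATTERNS))    # one distinct pattern id per box
final = blit_loop(initial)
print(final == initial)                # full loop restores every box

final[7] ^= 1                          # simulate a bit error in box 7
print(find_corrupt_boxes(initial, final))
```

Because a full loop restores the starting state, the check at the end only needs to compare each box against the pattern it was originally filled with.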
101 Elpg This is the most basic MODS power gating test based on PMU
messages.
For chips that support power rail gating, this will also attempt
to make sure that the chip can enter and exit power rail gating.
102 DispClkStatic Iterate through all available dispclk perf points and run the
display tests (4, 7, 11) at each disp clock point
103 IntAzaliaLoopback Some GPUs starting with GT21x have an onboard Azalia audio
controller. This test makes sure that the Azalia controller is
working by creating a loopback between different codecs. It
sends an output stream through one codec and tries to
receive it through another codec. It then compares the input
stream to the output stream. If the stream cannot be read or was
corrupted, the test fails.
104 PcieLinkTest This test stresses ASLM (link width change) by changing the
PEX link width and throwing bursty data on the PEX bus. The
test checks whether the link width change is successful,
determines the correctness of the data transfer to sysmem and
FB, and also checks whether PCIE errors occurred during the
test.
105 I2CSTest The GPU's internal sensor can act as a slave I2C device. On a
system that has another master (as sometimes happens in
notebooks), the GPU temperature can be read through an I2C read on
the I2CS port. This test checks whether this interface is functional.
We try to verify that the value read by I2C -> I2CS is the same
as the value read back through RM->register reads
(Subdev.Thermal.ChipTempViaInt).
106 KFuseSanity This checks whether valid HDCP keys were blown into the fuse
block and verifies that the keys are not all zero. It also confirms
that the CRC of the KFuses is correct
The test checks the integrity of the data transfers and whether
PCIE errors accumulated during the test. In addition, since link
width change and link speed change are tied to pstate switches,
this test will also attempt to verify that the correct link speed
and link width are set for each pstate change.
For PState 2.0 and GPU Boost systems, this test has been replaced
by test 145.
110 CudaBoxTest This is a variant of test 3 (MatsTest) that uses CUDA instead of
CPU “dumb framebuffer” accesses to exercise memory.
111 CudaByteTest This is a variant of test 18 (ByteTest) that uses CUDA instead of
CPU “dumb framebuffer” accesses to exercise memory.
112 CudaMarchTest This is a variant of test 52 (MarchTest) that uses CUDA instead
of CPU “dumb framebuffer” accesses to exercise memory.
113 CudaPatternTest This is a variant of test 70 (PatternTest) that uses CUDA instead
of CPU “dumb framebuffer” accesses to exercise memory.
119 CudaRandom This test uses CUDA to test all of the single and double
precision mathematical operations for a given compute
capability. It is designed to verify consistency, not accuracy, of
these operations.
120 CheckPowerPhases This test creates a high graphics processing load on the GPU
and reads values from the power controller on the board to check whether
all power phases are giving the expected output.
122 ElpgGraphicsStress This test toggles graphics ELPG while running GLStress. This is
to make sure that engaging power gating has no effect on the
correctness of graphics operations.
123 NewWfMatsBus This is a memory test that tries to determine if memory failures
occur on read or on writes. The amount of IO done by the GPU
124 ElpgVideoStress This test engages video ELPG in the background while running
MSDEC (test 25). The purpose is similar to test 122. It can
additionally toggle Graphics ELPG in the background as well.
This creates noise and larger change of di/dt on the power rail.
125 DeepIdleStress This test runs a version of GLStress that periodically forces a
transition into the Deep Idle low power state. The graphics
operations generated by the OpenGL driver force the transition
out of the low power state. This test requires the VBIOS to
support P-State 12, and the upstream bridge device connected
to the GPU to support L1 ASPM.
126 GLRandomOcg Test the internal OpenGL shader compiler by creating a large
amount of randomly generated vertex/geometry/fragment
programs and then issue random graphics operations through
the OpenGL driver. This test is a derivation of test 16. The
major difference is that a new set of random shaders are created
at the start of each loop instead of each frame. This is a
consistency test not an accuracy test.
127 CudaColumnTest FB memory test for long time retention of data in DRAM cells
with sparse write changes. Designed to expose spurious bit flips
correlated to refresh cycles and content of DRAM.
128 DeepIdleVETest This test validates nvDPS and Deep Idle Video Enabled
functionality. Random rectangles are rendered to the screen via
the 2D engine and periodically the rendering is paused. When
this occurs the nvDPS hardware detects a lack of screen activity
and signals entry into the Deep Idle Video Enabled state (a low
power state with display enabled, and a forced lower refresh
rate). This test has the same requirements as DeepIdleStress,
but in addition it also requires a LVDS or eDP display.
130 GlrA8R8G8B8 Test the 3-D graphics engine by issuing random graphics
operations through OpenGL driver. It uses normal 32-bpp
color/Z, i.e. a8r8g8b8 & s8d24
131 GlrR5G6B5 Directed OpenGL test for normal 16-bpp color/Z, i.e. r5g6b5 &
d16
137 GlrY8 Directed OpenGL test for 8-bpp color (GL_INTENSITY8) & 32-
bpp Z
139 GlrFsaa4v4 Directed OpenGL test for 32-bpp color, 64-bpp Z, 4v4 VCAA
full-scene-anti-aliasing
140 GlrFsaa8v8 Directed OpenGL test for 32-bpp color, 64-bpp Z, 8v8 VCAA
full-scene-anti-aliasing
141 GlrFsaa8v24 Directed OpenGL test for 32-bpp color, 64-bpp Z, 8v24 VCAA
full-scene-anti-aliasing
142 MultiCellFlipTest This is a directed test which targets specific single-bit failures
found on Hynix memory chips. It attempts to minimize
modifying adjacent cells in the same row when testing target
cells by limiting modifications to dwords from four adjacent
columns (burst length) across all internal banks, external banks,
partitions and lanes. It also modifies a few rows
simultaneously, because it was discovered that row switching
146 PexBandwidth This test uses CopyEngine to saturate the PCIE bus
147 GpuGc6Test Test for GC6 feature. Enter GC6 and exit GC6 by various wakeup
events. In each loop, verify FB is not corrupted
148 GlrA8R8G8B8Sys Directed OpenGL test for 32-bpp color/Z, with render to System
Memory instead of Frame Buffer
150 MMERandomTest This is a test of the Method Macro Expander, which is a unit at
the front end that can generate pushbuffer methods
programmatically via a small simple language. Random MME
programs are generated and their output is routed to a surface
(rather than to host as pushbuffer methods). This output is then
checked against the output of a software MME simulator for
consistency.
151 Bar1RemapperTest This test validates that the BAR 1 remapper hardware functions
correctly by creating randomized block linear surfaces with
known data and then reading them back in a pitch linear
fashion via the CPU with the BAR 1 remapper correctly
configured.
154 CudaL2Mats This test validates the L2 cache on Fermi and newer GPUs. The
test monitors the number of hits and misses to the L2 cache. In
order for the test to pass, the L2 misses must be under
AllowedMissPercent, which defaults to 10%.
155 EccFbTest This is a test of Frame Buffer ECC logic on ECC-enabled boards.
157 NewWfMatsMemToMem This is a variant of test 94 (NewWfMats) that uses the “memory
to memory format” engine rather than the “2D rendering
engine” to do framebuffer->framebuffer memory copies. This
engine is less efficient and makes the test run slower, but it is
useful for isolating framebuffer problems on GPUs where the
graphics pipeline is not working correctly.
161 NewWfMatsCEOnly This is a variant of test 94 (NewWfMats) that uses the Copy
Engine rather than the “2D rendering engine” or “memory-to-memory
format” engine to do framebuffer->framebuffer memory copies.
174 CheckPwrSensor This test validates that the power sensors on the board match
the description in boards.js
175 GpuResetTest This test validates the suspend resume functionality of the
GPU. Option for reset via XVE or cold reset (like Optimus)
178 WfMatsBgStress This test runs WfMats on Copy Engine while running GLStress
180 NewWfMatsNarrow Runs WfMats in narrow mode. This causes WfMats blits to be
broken into one-pixel wide blits. This causes lower bandwidth,
but better exercises the FBIO byte-enable lines.
185 CudaRadixTest Stress the GPU by using radix sort algorithm by Duane Merrill
187 CudaMatsShmooTest This test is a variant of test 87 (CudaMatsTest) that iterates with
different input parameters to find the most stressful
configuration for a given board.
190 DPStressTest This is a CUDA-based double precision test. On some Tesla
systems, this test was found to be more stressful than GLStress
191 CudaJuliaTest This is a CUDA based double precision test that generates Julia
set fractal images
225 MSENCTest Test for Video Encoder Engine. This test runs four streams with
different H.264 entropy coding modes (CAVLC and CABAC).
227 CudaColumnShmooTest This test loops test 127 (CudaColumnTest) with various
parameters in an attempt to find specific types of DRAM faults.
247 GpuGc5 Test for GC5 feature. Enter/exit GC5 while verifying the
wakeup reason is correct and verify that FB is not corrupted.
275 BoostBaseClockTest GLStress based test to hit Boost target temperature, clocks,
voltage, and fan speeds.
293 I2cDcbSanityTest Test to ensure all the I2C devices in the DCB table are stuffed
347 GcxTest New generation of GC6/5 test – intermix the two power saving
states.
When a test begins, the following message is printed to the log file. The portion in
brackets is only printed if the –time command-line option is used.
When a test ends, the following message is printed to the log file.
In this case, 19083 is the error code of the test. “FastMatsTest.Run” is the name of the
test. “golden value miscompare” is a description of the error code. The time in brackets
is how long the test took to execute. The execution time is only displayed if the –time
command-line option is used.
P-states are generally 0, 8 or 12, but others are possible. The test numbers start at 1 and
end at 227. Errors are between 1 and 999.
For example, an error code of 201208119083 would mean that CudaRandom (test 119)
failed in the PerfSweep function of test 145, at index 12 of the VF point table, in p-state 8,
with error 83, which is a golden value miscompare.
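The example code above can be split into its fields with a short script. This is only a sketch in plain JavaScript, not a MODS facility. The field widths (three digits for the error, three for the test number, two for the p-state, three for the VF point index, and whatever digits remain in front) are an assumption inferred from the single example 201208119083. The function name decodeErrorCode and the field names are hypothetical.

```javascript
// Hypothetical helper: field widths are inferred from the example
// 201208119083 in the text, not taken from a MODS specification.
function decodeErrorCode(code) {
    // Zero-pad so that short codes such as 19083 parse the same way.
    const digits = String(code).padStart(12, "0");
    const n = digits.length;
    return {
        error: Number(digits.slice(n - 3)),            // last three digits
        test: Number(digits.slice(n - 6, n - 3)),      // test number
        pstate: Number(digits.slice(n - 8, n - 6)),    // p-state
        vfIndex: Number(digits.slice(n - 11, n - 8)),  // index on the VF point table
        leading: Number(digits.slice(0, n - 11)),      // remaining leading digits
    };
}

const decoded = decodeErrorCode("201208119083");
console.log(decoded);
```

For the example code this yields error 83, test 119, p-state 8 and VF index 12, with a leading value of 2. The example associates that code with the PerfSweep function of test 145, but the exact meaning of the leading digits is not established here.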
The test number 999 is reserved for scripts and tools external to MODS. Error codes like
“999123” are not returned by MODS itself.
If a card fails, take a look at the log, then attempt to deduce what could be wrong. If the log
is very long, look for keywords like “failure” or “error”.
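The keyword search can be scripted when a log is too long to read by eye. The following is a minimal sketch in plain JavaScript, not something MODS provides. The function name findSuspectLines and the sample log text are made up for illustration, and the default keyword list is just the two words mentioned above.

```javascript
// Return the lines of a log that mention any of the given keywords
// (case-insensitive), together with their 1-based line numbers.
function findSuspectLines(logText, keywords = ["failure", "error"]) {
    const lowered = keywords.map((k) => k.toLowerCase());
    return logText
        .split(/\r?\n/)
        .map((text, i) => ({ line: i + 1, text }))
        .filter(({ text }) => lowered.some((k) => text.toLowerCase().includes(k)));
}

// Made-up sample text, not real MODS output.
const sampleLog = ["Starting test", "Error: something went wrong", "Done"].join("\n");
for (const hit of findSuspectLines(sampleLog)) {
    console.log(hit.line + ": " + hit.text);
}
```

Keeping the line numbers makes it easy to jump back to the surrounding context in the full log.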
You may attempt to isolate whether the problem is display related. Adding
–null_display disables the display.
You may attempt to isolate using –test or testspec to find out which test is catching the
problem on the graphics card.
You may attempt to isolate whether the problem is perf related. You may wish to try
these experiments:
Note: Some DDR DRAMs require that the dramclk be above a certain frequency for the
DLL to work. Furthermore, some products require that you keep dramclk and gpc2clk
in less than a 2:1 ratio.
If memory tests are failing, you can get extra information on the failure in the log file by
using the –matsinfo command-line option.
It is not usually necessary to test the entire framebuffer to collect enough error statistics
to be useful. The user can run “mats –c 1” which will test 1% of memory distributed
throughout the framebuffer. This is useful because it will complete in a very short time
and still produce meaningful debug information in the report.txt file.
8.1 HDMI
MODS includes an HDMI test that uses audio loopback. It requires extra hardware and
setup. Since HDMI-audio requires us to send a SPDIF signal into the board, the test
requires a motherboard that meets the following requirements:
The GPU being tested must have an embedded Azalia audio controller.
Connect the headphone jack on the HDMI display to the line-in or mic-in jack on the
motherboard.
Run either "mods gputest.js -engr -test 84" or "mods gputest.js -engr -test 85". (The
latter runs all the usual gputest tests in addition to the hdmi tests.)
There are some known issues due to differences between various motherboards and
displays:
1. Try the loopback test without the headphone-jack cable. You should hear a hum as
long as the display volume is high enough. This step ensures MODS is correctly
driving an audio signal over HDMI.
2. There's no telling which port number is used for the line-in or mic-in jack. Try
connecting the audio cable and run MODS with different "-line_in" values (-line_in 0,
-line_in 1, etc) until you get a successful loopback -- or at least an error message that
says something like "frequency mismatch" rather than "unexpected silence".
3. Finally, you may need to adjust the volume of the display. It needs to be loud enough
so that the motherboard can "hear" it, but not so loud that the signal gets clipped. An
"unexpected silence" error indicates the volume's too low, while a "frequency
mismatch" error indicates the volume's too high. (A good starting point is to run the
test while listening to a pair of headphones attached to the display. Find a volume
that can be heard but isn’t painfully loud.)
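The two error messages quoted in the steps above point to different adjustments, which can be summarized in a small lookup. This is an illustrative sketch in plain JavaScript. The function name suggestLoopbackFix is hypothetical, the returned hints are paraphrases of the steps above, and the exact wording MODS prints may differ from the two quoted fragments.

```javascript
// Hypothetical helper: maps the two loopback error messages quoted in the
// steps above to the adjustment those steps suggest.
function suggestLoopbackFix(message) {
    const text = message.toLowerCase();
    if (text.includes("unexpected silence")) {
        // Nothing was heard on the selected input.
        return "Try another -line_in value or raise the display volume.";
    }
    if (text.includes("frequency mismatch")) {
        // A signal was heard, so the input port is probably the right one.
        return "Lower the display volume; the signal may be clipped.";
    }
    return "Not a loopback message covered by these steps.";
}

console.log(suggestLoopbackFix("unexpected silence"));
console.log(suggestLoopbackFix("frequency mismatch"));
```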
8.2 HDCP
Some graphics cards support an encryption protocol called HDCP (High-bandwidth Digital
Content Protection). This protocol encrypts data between an HDCP-enabled digital flat
panel and an HDCP-enabled graphics card.
The only way to test HDCP is to enable it with an HDCP-enabled display attached. One
of the goals of MODS is to enforce a textbook-correct test by default. Therefore, an
HDCP test is run automatically on HDCP-enabled cards. The upshot of this is that you
must have an HDCP-enabled display attached when testing an HDCP-enabled card or
MODS will fail. If the user does not want this behavior, then he or she should explicitly
skip HDCP testing using the “-skip 24” command-line argument.
1. A key exchange test. This is done with “mods gputest.js –hdcp_keys”. This test does
not actually enable HDCP; it only does a key exchange and passes if the exchange
was successful. If it passes, the key selection vectors (Aksv and Bksv) are printed in
the log file. There are many types of manufacturing faults that cannot be caught by
this test.
2. The default HDCP test. This test does the key exchange above, then enables HDCP.
If the hardware detects that HDCP was successfully enabled, then the test passes.
This test will catch most (but not all) types of manufacturing problems. In particular,
there is a rare type of defect that can occur when the key exchange and enabling of
HDCP are both successful, but there will be snow on the screen.
3. The interactive test. This does the key exchange and enables HDCP, then prompts
the user to ensure that the display looks OK. The key selection vectors and the HDCP
status (pass or fail) are displayed on the screen. This test is enabled with “mods
gputest.js –check_displays”. See section 9.3 below for more information on
interactive display tests.
There are some displays by specific manufacturers that are slow to enable HDCP
encryption. If you are having problems with a specific display, try using the following
command line arguments individually or in combination:
-hdcp_delay 2000
-hdcp_timeout 5000
mods gputest.js -check_display
Display a slanted red, white and blue pattern on the primary display and prompt the
user if it is OK.
mods gputest.js -check_display_bar
Display vertical bars on the primary display and prompt the user if it is OK.
mods gputest.js -check_displays
Display a slanted red, white and blue pattern on all possible display combinations and
prompt the user if each one is OK.
mods gputest.js -check_fp_gray
Display various black, white and gray geometry on all detected DFPs one at a time and
prompt
By default, -mfg will run tests at each memory-clock setting (pstate) twice: once at max
shader clocks and voltage, and again at min voltage for that pstate. The PerfPoint
testing infrastructure is built on top of test specifications. Test specifications are lists of
tests that control when a given test is run. A simple example of a test specification is
shown below:
We can also control the parameters for each test. We have two additional tests that allow
the user to set the PerfPoint and to run user defined functions.
spec.AddTest("SetPState", {"InfPts":perfPoints[1]});
spec.AddTest("RunUserFunc", {"UserFunc": SetFanSpeed, "PctOfMax": 100});
addSltPerPStateTests(spec);
}
The standard test specifications are visible in gpulist.js. The following contains examples of
how test specifications can be used.
function addSltPerPStateTests(spec)
{
    spec.AddTests(["FuseRdCheck"
                  ,"MultiBoardDma"
                  ,"SMRom"
                  ,"ElpgGraphicsStress"
                  ,"DeepIdleStress"
                  ,"RmStress"
                  ,"WfMatsMedium"
                  ,"GLStress"
                  ,"GLStressPulse"
                  ,"NewWfMatsNarrow"
                  ,"GLRandomCtxSw"
                  ]);
    addEngrComputeTests(spec);
    spec.AddTests(["CheckFbCalib"
                  ]);
}
function SetFanSpeed()
{
    var rc;
    var g = this.BoundGpuSubdevice;
    CHECK_RC(g.Thermal.SetCoolerPolicy(Thermal.CoolerPolicyManual));
    CHECK_RC(g.Thermal.SetFanSpeed(this.PctOfMax));
    Out.Printf(Out.PriHigh, "SetFanSpeed: %d pct, trying for %d pct \n",
               g.Thermal.FanSpeed, this.PctOfMax);
    return OK;
}
There are two functions to allow users to write out and read back in their own test
specifications.
-savespec will save the identified specification to a filename specified by the user.
-readspec uses the test specification as defined in filename. If -readspec is used, then
do not specify a test specification (-mfg, -slt, etc); this will override the specification in
the user-defined file.
Using these two functions, an end user can write out a test specification, modify it as
they see fit, and then use it for their testing.
Within concurrent MODS, there are two different types of threads: foreground and
background. A background thread is controlled by a foreground thread and will be
stopped once the foreground thread finishes its execution. The specific details of these
background threads can be controlled through the various command-line arguments
listed below.
Note: By default, concurrent MODS is not turned on. You must specify specific
command-line arguments listed below to enable concurrent MODS.
Any MODS arguments that are NOT device sensitive must come before any "-dev Y"
arguments on the command-line.
The last "-dev Y" used on the command-line will set the primary GPU tested by MODS
unless you're running with the "-concurrent_devices" argument.
-bgfunc X
A device sensitive argument that will run the given function X as a background
thread. Note, this function needs to call "this.SignalSetupCompleteAndWait()" at
some point. It should also check "this.KeepRunning" to know when to stop.
function foo()
{
    // Initial setup
    this.SignalSetupCompleteAndWait();
    do
    {
        // Do stuff
    } while (this.KeepRunning);
}
-bgtest X
A device sensitive argument that will run test number X as a background thread.
-bgtest_flags X Y
Similar to -bgtest, but takes a comma-separated list of flags for the second
argument
roe - Run On Error, continue running the background test even if it fails
-concurrent_devices
This argument will run the tests (the set of tests can be different per GPU) on
each GPU in the system, concurrently.
-threadid
This argument simply prepends the ID of the calling thread to the beginning of
each line of text in the log. This is useful for seeing which test or GPU printed
which line.
You will see the name and ID of each thread listed with +++ thread_name
thread_ID +++ in the log
-skip 24 -skip 17
Run the full MODS suite on all GPUs in the system concurrently
mods gputest.js -mfg -concurrent_devices
Run Random2d and GpuDma on device 0 sequentially. Run Random2d, GLStress and
TurboCipher on device 1 concurrently. Have both GPUs running their set of tests at the
same time.
mods gputest.js -mfg -concurrent_devices -test 58 -dev 0 -test 61 -dev 1
123 Decompressed data differs from expected results. 187 cannot set graphics clock
124 Invalid InfoROM 188 bad dac
125 No display device is connected 189 invalid channel
126 CRC capture failed 190 invalid subchannel
127 Vbios Certificate Error 191 bad format
128 Invalid input 192 put caught up to get
129 Invalid input (driver level) 193 invalid ram amount
130 Invalid input (test level) 194 bad memory
131 cannot allocate event 195 EDVR: system error
132 Robust channel Unexpected Error 196 ECIC: not CIC or lost CIC during command
133 HDCP did not operate properly 197 ENOL: write detected no listeners
134 EDC detected a memory-bus error 198 EADR: board not addressed correctly
135 Encryption and/or decryption of data failed 199 EARG: bad argument to function call
136 Request for Power state change failed. 200 ESAC: function requires board to be SAC
137 invalid window 201 EABO: asynchronous operation was aborted
138 A read/write to a register failed. 202 ENEB: non-existent board
139 Acceptable temperature limits exceeded or the thermal sensor is broken or miscalibrated 203 EDMA: DMA hardware error detected
204 EBTO: DMA hardware uP bus timeout
140 Unused error code 140 205 EOIP: new I/O with old I/O in progress
141 The only devices found in the system are obsolete 206 ECAP: no capability for intended operation
142 Display mode is not possible 207 EFSO: file system operation error
143 PCI Express bus error 208 EOWN: Shareable board exclusively owned
144 CUDA error 209 EBUS: bus error
145 cuInit failed 210 ESTB: serial poll queue overflow
146 cuDeviceGet failed 211 ESRQ: SRQ line 'stuck' on
147 cuCtxCreate failed 212 ETAB: the return buffer is full
148 cuFuncGetByName failed 213 ELCK: board or address is locked
149 A specific test was requested to run, but was skipped. 214 unknown GPIB Error
150 No tests were run. 215 could not allocate a buffer
151 Primary surface already in use 216 Could not find the specified device
152 USB invalid RhPort 217 pci bios is not present
153 Display HW in use by another test 218 pci function is not supported
154 compute test failed 219 pci invalid vendor identification
155 Test exceeded the expected threshold 220 pci device not found
156 Test exceeded the maximum number of allowed memory leaks 221 pci invalid register number
222 cpuid instruction is not supported
157 This board needs to be reflashed with different vbios 223 cpu does not support MTRR
158 CRC values are not unique 224 cpu is not supported
159 NVRM display is not ready. 225 invalid register number
160 Resource is reserved by another thread or test 226 invalid address
161 NVRM invalid address. 227 could not map physical address
162 USB Reg_Bits not set as expected 228 could not free physical memory map
163 USB reg not set as expected 229 hardware was not initialized
164 USB setup packet fail 230 invalid graphics aperture base
165 ECC detected a single-bit error 231 invalid graphics aperture size
166 ECC detected a double-bit error 232 wrong bios
167 USB DataIn packet fail 233 bad NVIDIA chip
168 USB DataOut packet fail 234 error occurred while reading or writing serial data
169 registry key not found 235 could not set environment variable
170 registry error 236 the expected value and the destination memory value do not match
171 incorrect rom version
172 golden check found bad pixel, continuing 237 unable to set mode
173 stored golden values have wrong NumCodeBins 238 specified video mode not found in mode timings table
174 golden value miscompare 239 invalid display type
175 invalid z pitch 240 invalid tv standard
176 IRQ not assigned 241 invalid head
177 invalid IRQ 242 failed to set image offset
178 invalid NV base address 243 failed to disable the cursor
179 invalid NV size 244 feature is not supported in the hardware
180 invalid FB base address 245 TIMEOUT: Timeout occurred on WaitSRQ
181 invalid max AGP requests 246 SRQ from Unknown source.
182 cannot set state 247 Javascript method is not defined
183 invalid AGP request dept 248 Bad SOR - CRC miscompare
184 invalid AGP data rate 249 AUDIO all descriptor entries have buffer
185 cannot set pixel clock 250 AUDIO no valid buffer in descriptor.
186 cannot set memory clock 251 AUDIO invalid 16bit sample number.
252 CANNOT enable Io or Mem Space. 317 Unused error code 317
253 CANNOT enable Bus Master. 318 Unused error code 318
254 MemSize detected an invalid framebuffer size. 319 Unused error code 319
255 AUDIO not any buffer get freed. 320 Unused error code 320
256 MODEM all descriptor entries have buffer 321 Unused error code 321
257 Unused error code 257 322 Unused error code 322
258 MODEM not any buffer get freed. 323 Unused error code 323
259 Golden testname or recname too long. 324 Unused error code 324
260 CODEC NOT ready. 325 Unused error code 325
261 golden value miscompare in instance memory 326 Generic I2C error
262 oven communication error 327 TIMER TEST Invalid Counter number
263 couldn't reach target temperature 328 TIMER TEST No counter value Returned
264 temperature value not valid 329 TIMER TEST timer ticket number doesn't match the expected
265 CRC error while communicating with oven
266 must first initialize oven 330 Audio Invalid Aci Type
267 PMU device failure, operation attempted failed 331 Hardware does not support this FSAA mode
268 Invalid Bar(s) assigned to device 332 Unused error code 332
269 No Sub Devices found 333 Unused error code 333
270 Acoustic test failed, noise too high 334 Unused error code 334
271 Sub Device Index Invalid 335 Unused error code 335
272 Read parameter differs from expected 336 Unused error code 336
273 Clock speed below specified limit 337 Pool CANNOT allocate anymore memory
274 Current MODS version doesn't support this Tegra version 338 Pool exceed maxim size
339 Pool invalid request size
275 HW entries have run out 340 Pool Invalid address to free
276 HW reports wrong status 341 Buffer mismatch
277 Error bit set in status register after command was issued 342 PMU Test Failed
278 Interrupt status differs from expected 343 Audio Requested channels cannot be enabled
279 No free head available 344 Unused error code 344
280 Power above specified limit 345 Unused error code 345
281 Temperature above specified limit 346 Unused error code 346
282 Performance varies from expected value 347 The Current Codec doesn't have loopback mode.
283 Incorrect OpenGL driver version. 348 Unused error code 348
284 unsupported system configuration 349 Out of date golden file.
285 NVRM buffer too small 350 incorrect chip revision
286 NVRM reset required 351 memory not strapped correctly
287 NVRM invalid request 352 AUDIO Loopback test amplitude mismatch
288 Power is below specified limit 353 Unused error code 353
289 Display underflow detected 354 Unused error code 354
290 Unused error code 290 355 Audio Processing Unit timeout
291 Unused error code 291 356 Audio Processing Unit CRC miscompare
292 Unused error code 292 357 Audio Processing Unit failed to get resources
293 Data too large. 358 Audio Processing Unit error
294 Cannot use loops with PIO channel. 359 Each board description must be unique
295 Must set a jump point before writing a jump. 360 Audio timeout Error
296 Subsequent channel writes wrote over jump location. 361 Unused error code 361
297 No loop to stop. 362 Unused error code 362
298 Usb port not connected to any device 363 Unused error code 363
299 Usb Test Fail at configuration 364 Audio CODEC power down register has wrong value
300 AUDIO Test Fail 365 CRTC FIFO underflow occurred
301 AUDIO Loopback test frequency mismatch 366 The order of commands in the MPEG stream was not correct
302 Drive test failed
303 MODEM Test Fail 367 Found a bad command in the MPEG stream
304 MODEM Loopback test frequency mismatch 368 MPEG hardware sent the wrong number of notifiers
305 incorrect subsystem id 369 Audio Resource Manager initialization failed
306 Ism experiment is not complete 370 bad stereo glasses connector
307 Timed out waiting for MINI Isms to complete 371 Device Register PIO Access not enabled
308 InfoROM not found 372 Device Register Memory Access not enabled
309 Unused error code 309 373 Device DMA not enabled
310 Unused error code 310 374 Not High Speed Device connected to Usb2 port
311 Unused error code 311 375 The user determined that the TV quality was unacceptable
312 bad index into FancyPicker array
313 Unused error code 313 376 Unused error code 376
314 Unused error code 314 377 Unused error code 377
315 Unused error code 315 378 Unused error code 378
316 Unused error code 316 379 Unused error code 379
380 Unused error code 380 444 Unused error code 444
381 Unused error code 381 445 Unused error code 445
382 Unused error code 382 446 incorrect mode
383 Unused error code 383 447 incorrect vga windows
384 Unused error code 384 448 File size would become larger tha the implementation can support.
385 Unused error code 385
386 Unused error code 386 449 File exists but cannot be accessed with given flags.
387 Unused error code 387 450 File write followed a nonblocked write before the latter was complete.
388 Unused error code 388
389 Unused error code 389 451 File argument isn't valid file descriptor or isn't open for writing.
390 Unused error code 390
391 Unused error code 391 452 File device or resource is busy.
392 Unused error code 392 453 No child process.
393 Unused error code 393 454 File deadlock.
394 Unused error code 394 455 File open with O_CREAT and O_EXCL set but the file already exists.
395 Unused error code 395
396 Unused error code 396 456 File bad address.
397 Unused error code 397 457 File is too large.
398 Unused error code 398 458 File operation was interrupted by a signal.
399 Unused error code 399 459 File argument not valid.
400 Unused error code 400 460 File I/O error
401 Unused error code 401 461 The open operation was interrupted by a signal.
402 Unused error code 402 462 The process has too many files open.
403 Unused error code 403 463 Too many file links.
404 Unused error code 404 464 Filename is too long.
405 Unused error code 405 465 The system has too many files open.
406 Unused error code 406 466 No such device in file operation.
407 Unused error code 407 467 No such file or directory.
408 incorrect TV encoder type 468 Exec() format error in file operation.
409 Unused error code 409 469 The system has run out of file lock resources.
410 Unused error code 410 470 Not enough memory for file operation.
411 Unused error code 411 471 Not enough disk space left.
412 Unused error code 412 472 File function not implemented.
413 Unused error code 413 473 File argument is not a directory.
414 Remote Controller Test Not ALL Key were tested. 474 Directory isn't empty.
415 Remote Controller Test Key Pressed Mismatch expected. 475 Inappropriate I/O control operation.
416 Remote Controller Test Register value Mismatch expected. 476 No such device or address in file operation.
477 File operation not permitted.
417 Network is not initialized. 478 Write to pipe or FIFO that isn't open for reading by any process
418 Network cannot create socket.
419 Network socket cannot bind to the specified port. 479 File on read-only file system and invalid flags are set.
420 Network socket cannot connect to peer. 480 Illegal file seek.
421 Network socket is not connected. 481 Invalid process during file operation.
422 Network socket is already connected. 482 Invalid cross-device link during file operation.
423 Network read error. 483 Unknown file error.
424 Network write error. 484 golden value miscompare on 2nd GPU
425 Network cannot determine host address. 485 golden value miscompare in Z buffer on 2nd GPU
426 A network error has occurred. 486 timeout waiting for notifier from GPU
427 Unused error code 427 487 timeout waiting for notifier from 2nd GPU
428 Data vector size mismatch expected. 488 Cannot access device registers.
429 Data vector value miscompare with expected. 489 the memory or frame buffer interface is marginal
430 error occurred trying to write a call pushbuffer instruction 490 Cannot set AGP data rate.
491 Cannot set AGP sideband addressing mode.
431 not enough pushbuffer memory 492 Cannot set AGP fastwrite mode.
432 cdrom audio quality was unacceptable 493 Couldn't lock on to the input signal.
433 avpod audio quality was unacceptable 494 Couldn't lock on to the chroma data.
434 tuner audio quality was unacceptable 495 Actual crystal value does not match the strapped crystal value.
435 Unused error code 435
436 Unused error code 436 496 invalid display mask
437 Unused error code 437 497 failed to get image offset
438 Unused error code 438 498 Invalid device Id
439 vbe call failed 499 SBIOS test failed
440 wrong vbe signature 500 A problem has been detected in the array of tests
441 wrong vbe version 501 Test failed due to an already-known problem.
442 Unused error code 442 502 Invalid Mfgtest test number
443 Unused error code 443 503 Invalid Mfgtest test mode
504 Unused error code 504 570 GPU channel software method parameter error.
505 AUDIO Loopback Left and Right Channel Crossed 571 Unused error code 571
506 Unused error code 506 572 The required function is not supported by present CODEC.
507 Unused error code 507
508 Invalid Chip Version 573 Audio CODEC failure.
509 Not an NV Device 574 Unused error code 574
510 Test Cannot run on this Tegra Chip Version 575 Unused error code 575
511 Required chip library interface not found 576 Audio Test Invalid loopback Mode.
512 Unused error code 512 577 Could not acquire I2C port.
513 Unused error code 513 578 I2C SCL pull-up resistor missing.
514 Unused error code 514 579 I2C SDA pull-up resistor missing.
515 Unused error code 515 580 The auxiliary power connector is not plugged in.
516 Unused error code 516 581 can not generate golden values using an official release
517 Unused error code 517 582 gpu stress test found pixel miscompares
518 Usb Port mapping value is wrong. 583 thermal sensor reports overheating
519 Unused error code 519 584 Unused error code 584
520 Unused error code 520 585 failed to capture internal TV encoder crc
521 Number of Channel and number of input mismatch. 586 the internal TV encoder is bad
522 Unused error code 522 587 Smbus Cannot set DDC base.
523 Unused error code 523 588 invalid EDID
524 Unused error code 524 589 FramLock Test Check Reg Fail
525 Unused error code 525 590 FramLock Test Invalid DispalySync Unit of Invalid Displays
526 Unused error code 526
527 System Control Invalid IO Base. 591 Unused error code 591
528 Unused error code 528 592 FramLock Test Set display(s) to Master fail
529 Unused error code 529 593 FramLock Test Set display(s) to Slave fail
530 Unused error code 530 594 FramLock Test Loopback Test fail
531 Usb invalid device. 595 FramLock Test Sync Test fail
532 Unused error code 532 596 FramLock Test Sync Test, User and Auto result mismatch
533 Unused error code 533 597 NVRM not supported
534 Unused error code 534 598 Unused error code 598
535 Unused error code 535 599 fan does not seem to cool the chip
536 Unused error code 536 600 Usb failure related to port mapping, port number.
537 Unused error code 537 601 Acpi timer failure.
538 Unused error code 538 602 NVRM bad channel
539 Unused error code 539 603 NVRM timeout
540 Unused error code 540 604 the counter overflowed
541 Unused error code 541 605 the frequency is incorrect
542 Unused error code 542 606 API call never returned
543 Unused error code 543 607 Bad compression-tag-ram in GPU
544 Unused error code 544 608 Interrupt request line stuck asserted
545 Unused error code 545 609 Interrupt request mechanism does not work
546 Unused error code 546 610 Unused error code 610
547 Invalid CPU Frequency measured. 611 Unused error code 611
548 Unused error code 548 612 Invalid value for Tegra configuration variable(s).
549 Unused error code 549 613 Invalid Tegra configuration filename.
550 Unused error code 550 614 Extra golden code miscompare
551 Unused error code 551 615 Extra golden code miscompare on 2nd GPU
552 Real time clock test failed to restore. 616 Unused error code 616
553 Unused error code 553 617 Unused error code 617
554 Graphics fifo method error. 618 Unused error code 618
555 GPU channel fifo software method error. 619 Unused error code 619
556 GPU channel fifo unknown method error. 620 DLL could not be loaded.
557 GPU channel fifo channel busy error. 621 Unused error code 621
558 GPU channel fifo runout overflow error. 622 Unused error code 622
559 GPU channel fifo parse error. 623 Unused error code 623
560 GPU channel fifo PTE error. 624 Error in VBIOS DCB tables.
561 GPU channel fifo idle timeout error. 625 Unused error code 625
562 GPU channel instance lookup failure. 626 Unused error code 626
563 GPU channel debug single-step. 627 Supplied mode not supported by the display.
564 GPU channel missing hardware error. 628 The framebuffer base address register is too small
565 GPU channel software method. 629 Memory leak detected.
566 GPU channel software notify. 630 Perfmon was already running an experiment
567 GPU channel fake error. 631 Memory access spans page boundary.
568 GPU channel scan line timeout error. 632 Memory access to unmapped page.
569 GPU channel vblank callback error. 633 Write access to read-only page.
634 Read access to write-only page. 700 Unused error code 700
635 Unused error code 635 701 Unused error code 701
636 could not create a JavaScript property 702 Unused error code 702
637 Invalid clock domain specified 703 Unused error code 703
638 Perfmon could not be reserved 704 Unhook ISR failed
639 Perfmon was not reserved 705 Unused error code 705
640 Unused error code 640 706 Unused error code 706
641 MsiTest of BR02 Failed. 707 Unused error code 707
642 Atapi Test Error 708 Unused error code 708
643 Unused error code 643 709 selected device is not supported
644 Unused error code 644 710 Unused error code 710
645 Unused error code 645 711 Msi is not supported for this device
646 Unused error code 646 712 Cannot enable Intx in Pci Cfg Space
647 Unused error code 647 713 Cannot enable Msi in Pci Cfg Space
648 Unused error code 648 714 Cannot disable Intx in Pci Cfg Space
649 Unused error code 649 715 Cannot disable Msi in Pci Cfg Space
650 Bad RAM in the GPU. 716 Given Cap. is not supported for this device
651 GPU did not get the expected number of lanes 717 Sata Loopback Test fail
652 Unused error code 652 718 invalid starting number of VPEs and/or SHDs
653 Unused error code 653 719 Read parameter differs from expected
654 Unused error code 654 720 Measured Jitter exceeded maximum amount
655 nvrm invalid parameter 721 Failed genlock
656 nvrm too many primaries 722 Non-GL device on GL board
657 Unused error code 657 723 Codec error detected
658 memory size mismatch expected 724 Stream Error Detected
659 wrong number of TPCs detected 725 Ring Buffer Error Detected
660 wrong number of framebuffer units detected 726 Azalia Test failed
661 memory fragment size mismatch expected 727 Unused error code 727
662 wrong number of ROPs detected 728 Ahci Port Error
663 wrong number of shader pipes detected 729 External drive (hardrive, cdRom, est) error
664 wrong number of vertex engines detected 730 ATA Descriptor table is not initialized
665 wrong number of PCI express lanes detected 731 Unused error code 731
666 incorrect feature set for this SKU 732 Unused error code 732
667 could not set NV_PBUS_FS to the desired values 733 External device does not support the function
668 could not meet floorsweeping requirements 734 External device is not found
669 Requested function not supported by Codec 735 Unused error code 735
670 Requested function not supported by Aci 736 Unused error code 736
671 Error testing L2 cache 737 Unused error code 737
672 Unused error code 672 738 Unused error code 738
673 Unused error code 673 739 Unused error code 739
674 NVRM object not found 740 Unused error code 740
675 NVRM gpu is still busy or possibly hung 741 Unused error code 741
676 NVRM card not present 742 Unused error code 742
677 NVRM in use 743 Unused error code 743
678 NVRM invalid access type 744 Unused error code 744
679 NVRM invalid argument 745 Unused error code 745
680 Unused error code 680 746 Unused error code 746
681 NVRM invalid command 747 Unused error code 747
682 NVRM invalid data 748 Unused error code 748
683 Unused error code 683 749 Unused error code 749
684 NVRM invalid method 750 Unused error code 750
685 NVRM invalid pointer 751 Unused error code 751
686 Unused error code 686 752 Unused error code 752
687 NVRM invalid registry key 753 Unused error code 753
688 NVRM invalid state 754 Unused error code 754
689 NVRM invalid string length 755 Unused error code 755
690 NVRM FB Training Failed 756 Unused error code 756
691 method count too large 757 Unused error code 757
692 pushbuffer too small 758 Unused error code 758
693 Unused error code 693 759 Unused error code 759
694 Unused error code 694 760 GPU channel bus master timeout error.
695 Unused error code 695 761 GPU channel display missed notifier.
696 Unused error code 696 762 GPU channel MPEG software method error.
697 Unused error code 697 763 GPU channel ME software method error.
698 Unused error code 698 764 GPU channel VP software method error.
699 Unused error code 699 765 Unused error code 765
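
A script that post-processes test results may need to translate a numeric error code into the description listed in the table above. The following is a minimal sketch, assuming the code has already been extracted as an integer. The dictionary holds only a few rows copied from the table, and the names ERROR_DESCRIPTIONS and describe_error are hypothetical; they are not part of MODS.

```python
# Illustrative subset of the error code table; descriptions are copied verbatim.
# ERROR_DESCRIPTIONS and describe_error are hypothetical names, not MODS APIs.
ERROR_DESCRIPTIONS = {
    508: "Invalid Chip Version",
    583: "thermal sensor reports overheating",
    599: "fan does not seem to cool the chip",
    603: "NVRM timeout",
    629: "Memory leak detected.",
    650: "Bad RAM in the GPU.",
}


def describe_error(code: int) -> str:
    """Return the table description for a code, or a note that it is not loaded."""
    return ERROR_DESCRIPTIONS.get(code, f"(code {code} is not in this subset)")


if __name__ == "__main__":
    for code in (603, 650, 999):
        print(code, describe_error(code))
```

A complete lookup would load every row of the table; codes absent from the subset are reported as such rather than guessed.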
Notice
ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS, LISTS, AND OTHER
DOCUMENTS (TOGETHER AND SEPARATELY, “MATERIALS”) ARE BEING PROVIDED “AS IS.” NVIDIA MAKES NO
WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE MATERIALS, AND
EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR
A PARTICULAR PURPOSE.
Information furnished is believed to be accurate and reliable. However, NVIDIA Corporation assumes no
responsibility for the consequences of use of such information or for any infringement of patents or other
rights of third parties that may result from its use. No license is granted by implication or otherwise under
any patent rights of NVIDIA Corporation. Specifications mentioned in this publication are subject to change
without notice. This publication supersedes and replaces all other information previously supplied. NVIDIA
Corporation products are not authorized as critical components in life support devices or systems without
express written approval of NVIDIA Corporation.
HDMI
HDMI, the HDMI logo, and High-Definition Multimedia Interface are trademarks or registered trademarks of
HDMI Licensing LLC.
Trademarks
NVIDIA and the NVIDIA logo are trademarks or registered trademarks of NVIDIA Corporation in the U.S. and
other countries. Other company and product names may be trademarks of the respective companies with
which they are associated.
Copyright
© 2011 NVIDIA Corporation. All rights reserved.
www.nvidia.com