
MODS

MODULAR DIAGNOSTIC
SOFTWARE
FOR 343.X DIAGNOSTICS
MODS.DOCX_R343_v02 | Aug 2013
NVIDIA CONFIDENTIAL | Prepared and Provided Under NDA

Software Documentation
DOCUMENT CHANGE HISTORY

MODS.DOCX_R343_v02

Version  Date          Authors   Description of Change

01       May 30, 2014  Henry Wu  Initial Release of R343 mods.docx
02       June 2, 2014  Henry Wu  Updated the supported XML files
03       Aug 7, 2014   Henry Wu  Add a few more new tests

NVIDIA CONFIDENTIAL
MODS
Modular diagnostic software
for 343.X diagnostics MODS.DOCX_R343_v02 | ii
TABLE OF CONTENTS

MODULAR DIAGNOSTIC SOFTWARE (MODS)

1.0 INTRODUCTION
1.1 Notice to Users
2.0 USAGE
2.1 Normal Usage
2.2 Interactive Mode
2.3 Return Codes
2.4 Error Logs
3.0 DISTRIBUTION PACKAGE
3.1 Version Information
3.2 System Requirements
3.3 Command-line arguments to MODS
3.4 Command-line arguments for gputest.js
3.5 Test Selection
3.6 Installation
3.7 Prerequisites for Running Linux
3.8 Installing the Kernel Module
3.9 Creating a Linux Disk Image
4.0 GPU TESTS
4.1 Test Descriptions
5.0 TEST RESULT
5.1 Error Codes
6.0 DEBUGGING TECHNIQUES
7.0 STAND-ALONE MATS
8.0 GPU TESTS
8.1 HDMI
8.2 HDCP
8.3 Interactive Display Testing
9.0 PerfPoint TESTING and test specifications
10.0 CONCURRENT TESTING
10.1 Command-line Arguments
10.2 Command-line Examples
11.0 ERROR CODES

LIST OF TABLES

Table 1. Files distributed with MODS
Table 2. MODS command line arguments
Table 3. Options to gputest.js
Table 4. Test selection arguments
Table 5. List of GPU tests

MODULAR DIAGNOSTIC SOFTWARE (MODS)

The information in this document is confidential and is the property of NVIDIA
Corporation. This document may not be distributed without prior NVIDIA
authorization.

1.0 INTRODUCTION
This document describes the NVIDIA Modular Diagnostic Software (MODS), a software
program for testing NVIDIA hardware. MODS is used for three primary purposes:

• Chip and board functional validation
• Chip and board failure analysis and debug
• Architectural verification

This document covers the usage of MODS for graphics and compute products.

GPU MODS is currently supported under the following operating systems:

• Linux (2.6 kernel)
• Microsoft Windows 7
• Mac OS X

MODS has the following features:

• Embedded JavaScript (version 1.7) and ANSI C preprocessor.
• All of the low-level functionality exposed to the scripting language.
• Failure analysis and debug functionality is included: reading and writing of
registers, memory, PIO, and PCI address spaces, clock programming, etc.
• Easy to use and learn.
• One script will run on all supported operating systems without modification.
• Online regular expression help.
• Complete embedded OpenGL and CUDA drivers and resource manager. This is the
same code base that is used in the Linux and Windows drivers.

The MODS GPU manufacturing test suite exercises most but not all of the capabilities of
the NVIDIA hardware. It is assumed that the silicon has undergone a normal screening
process prior to shipping to the customer and that the primary purpose of the test is to
determine if the board manufacturing process has completed successfully and all solder
connections and components are working properly.

1.1 Notice to Users


NVIDIA discontinued support for DOS as of R290. Please contact your NVIDIA
representative about moving to Linux MODS.

2.0 USAGE
Normally, MODS is invoked by using one of the following command lines:
• mods gputest.js -mfg (for CEM testing)
• mods gputest.js -oqa (for OEM outgoing QA testing)

The difference between these two options is that -mfg runs the full suite of tests,
while -oqa runs a slightly less stressful, quicker suite of tests optimized for speed
and coverage.

The MODS test suite is usually distributed to customers in a package with a part number like
“618-60506-3501-CX0.” These packages have been qualified to test a particular product
and contain release notes and batch files tailored to that card. The directions in those
release notes should be followed instead of running the command-lines above.


2.1 Normal Usage


Usage:
• mods [options] [file] [JavaScript arguments]

Note that there are two types of options in MODS: those that are arguments to MODS
itself, and those that are arguments to the script. By convention, MODS’ arguments are
usually a single character, but the script arguments are usually many characters.

Example:
• mods -d -C gputest.js -mfg -run_on_error

In the above example, -d and -C are optional arguments to MODS, and -mfg and
-run_on_error are arguments to the script. For more optional arguments to MODS, please
see section 3.3. For JavaScript based arguments (script dependent), please see section 3.4.
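
The split between MODS options and script arguments can be illustrated with a short Python sketch. This is only an illustration and not how MODS itself parses its command line. It assumes that the script file is the first token that is neither an option nor an option's value, and it takes the number of values each option consumes from Table 2. The names VALUE_COUNTS and split_command_line are hypothetical.

```python
# Number of values consumed by MODS options that take arguments (from Table 2).
# Options not listed here are treated as plain flags (an assumption of this sketch).
VALUE_COUNTS = {
    "-c": 1, "-e": 1, "-F": 1, "-i": 1, "-l": 1,
    "-m": 1, "-n": 1, "-redir": 1, "-S": 1, "-U": 2,
}


def split_command_line(argv):
    """Split argv (without the leading 'mods') into
    (mods_options, script_file, script_arguments)."""
    mods_options = []
    i = 0
    while i < len(argv):
        token = argv[i]
        if token.startswith("@"):
            # Argument file (see Table 2); kept with the MODS options.
            mods_options.append(token)
            i += 1
            continue
        if not token.startswith("-"):
            # First token that is neither an option nor an option's value:
            # treat it as the script file; everything after goes to the script.
            return mods_options, token, argv[i + 1:]
        count = VALUE_COUNTS.get(token, 0)
        mods_options.extend(argv[i:i + 1 + count])
        i += 1 + count
    return mods_options, None, []


if __name__ == "__main__":
    print(split_command_line(["-d", "-C", "gputest.js", "-mfg", "-run_on_error"]))
    # prints (['-d', '-C'], 'gputest.js', ['-mfg', '-run_on_error'])
```

A wrapper tool could use such a split to log which arguments were intended for MODS and which for the script.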

2.2 Interactive Mode


MODS has an interactive mode, which can be invoked with mods -s. This is a useful
tool for debugging problems, but its use is beyond the scope of this document. To exit
interactive mode, type “Exit()”.

2.3 Return Codes


MODS will return 0 to the shell under normal operation. If an error occurs, MODS will
return a non-zero error code to the shell. On Windows 7, MODS will also set the
“MODS_EC” environment variable to the error code.
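
A test-station wrapper might check the return code along the following lines. This is a minimal Python sketch under stated assumptions: the function name run_and_report is hypothetical, and a stand-in command is used so that the sketch runs without MODS installed.

```python
import subprocess
import sys


def run_and_report(command):
    """Run a command and return (passed, return_code).

    A return code of 0 is treated as a pass and any other value as a failure,
    following the convention described above.
    """
    completed = subprocess.run(command)
    return completed.returncode == 0, completed.returncode


if __name__ == "__main__":
    # Stand-in command that exits with code 3. On a test station this would be
    # something like ["mods", "gputest.js", "-mfg"].
    stand_in = [sys.executable, "-c", "import sys; sys.exit(3)"]
    passed, code = run_and_report(stand_in)
    print("PASS" if passed else "FAIL (error code %d)" % code)
```

The meaning of individual non-zero codes is covered by the error code sections of this document; the sketch only distinguishes zero from non-zero.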

2.4 Error Logs


By default, MODS produces a human-readable log file named “mods.log.” When the
“-json” script argument is present, MODS also produces a “mods.jso” log file. The
mods.jso file is another version of the same data expressed in JSON (JavaScript Object
Notation), a data-interchange format. This second file is written to make it easier
for automated upstream tools to analyze the result of the run.


3.0 DISTRIBUTION PACKAGE


Normally, MODS is invoked by using the command line:

mods gputest.js -mfg

MODS is distributed with the following files:

Table 1. Files distributed with MODS

File                              Description

cuda.bin default.bin              Binary files used by various tests.
msdec2_e.bin msdec2_m.bin
msdec4_l.bin msdec4_s.bin
msdec_pi.bin msdec_v.bin
vic_data.bin vp2_stre.bin

mats                              Stand-alone memory test on Linux only. See
                                  section 7.0 of this document for more information.

mods                              The main MODS binary.

cur_comm.he dev_p358.he           Precompiled JavaScript header files.
drf.he fpk_comm.he
glr_comm.he mods.he

arghndlr.jse boards.jse           Precompiled JavaScript files.
boostbase.jse comnargs.jse
comngpu.jse comnmcp.jse
comnmods.jse comnprnt.jse
comntest.jse cudatest.jse
dprun.jse edid.jse fileid.jse
glrandom.jse gpuargs.jse
gpudma.jse gpulist.jse
gputest.jse gshmoo.jse
intrutil.jse jsthread.jse
mods.jse mods_eng.jse
p358.jse prntutil.jse pstate.jse
pvstest.jse random2d.jse
shmclass.jse shmoo.jse
testlist.jse thermcal.jse
tofile.jse tunetrim.jse
tunevolt.jse boards.dbe

mods.pdf                          This document

quickref.pdf                      MODS quick reference document

relnotes.txt                      Release notes. These are updated with every
                                  MODS release.

gm204_f.xme                       Precompiled XML files containing GPU-specific
                                  information.

3.1 Version Information


The MODS version may be obtained by running the following command:
• mods -v

The version has the format XX.YY, where XX is the major version number and YY is
the minor version number. MODS uses NVIDIA’s “unified software architecture” and much
of the code base is shared with the drivers. A version of MODS with the version XX.YY
(e.g., 195.5) shares much of its code with a driver whose version also starts with XX
(e.g., 195.10).
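
The XX.YY convention can be checked mechanically. The following Python sketch assumes the version string contains exactly two numeric fields separated by a dot, as described above; the function names are hypothetical and the sketch does not call MODS.

```python
def parse_version(text):
    """Parse a version string of the form XX.YY into (major, minor) integers."""
    major, minor = text.strip().split(".")
    return int(major), int(minor)


def same_major(mods_version, driver_version):
    """Return True when both versions start with the same major number XX."""
    return parse_version(mods_version)[0] == parse_version(driver_version)[0]


if __name__ == "__main__":
    print(parse_version("195.5"))          # prints (195, 5)
    print(same_major("195.5", "195.10"))   # prints True
```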

3.2 System Requirements

Linux
• Intel or AMD CPU with AMD64 support
• 4GB or more of system memory
• Linux kernel 2.6.16 or newer

Apple Macintosh
• x86-based Macintosh. PowerPC-based systems are no longer supported.
• 4GB or more of system memory
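
The Linux kernel requirement can be checked before a run. The sketch below is an assumption-laden illustration in Python: it compares only the leading numeric part of a kernel release string against 2.6.16, and the names MINIMUM_KERNEL, kernel_version_tuple, and kernel_is_supported are hypothetical.

```python
import re

MINIMUM_KERNEL = (2, 6, 16)  # from the Linux requirements above


def kernel_version_tuple(release):
    """Extract the leading numeric part of a kernel release string.

    For example "3.10.0-123.el7.x86_64" becomes (3, 10, 0). A missing third
    field is treated as 0.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    return tuple(int(part) if part else 0 for part in match.groups())


def kernel_is_supported(release):
    """Return True if the release is 2.6.16 or newer."""
    return kernel_version_tuple(release) >= MINIMUM_KERNEL


if __name__ == "__main__":
    print(kernel_is_supported("2.6.9-89.EL"))            # prints False
    print(kernel_is_supported("3.10.0-123.el7.x86_64"))  # prints True
```

On a Linux system the release string could be taken from platform.release() in the Python standard library.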


3.3 Command-line arguments to MODS


Table 2. MODS command line arguments

Option Description

-a append to log file

-c reference display reference

-D writes ‘debug’ level output to debug.log

-d set 'debug' level output

-e script execute script

-F string set a filter for serial and circular sinks

-g do not log return codes

-h or -? print help

-i file import JavaScript file

-l file log file name; do not log if file is 'null'

-L only write the log file if there is an error

-m script execute script before main()

-n script execute script after main()

-o do not run main()

-P enable circular buffer, set to 'debug' level & dump on exit


-r record user input

-redir file redirect standard output to file

-R remote user interface (run over network)

-s script user interface

-S level enable serial sink and set its level (from 1 to 4)

-t macro user interface

-T remote terminal user interface (telnet)

-U ip port remote terminal user interface (client mode)

-w raw user interface

-v print MODS version

@<filename> fetch command line arguments from <filename>
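
The @<filename> entry in Table 2 can be pictured as an expansion step that runs before the other arguments are interpreted. The Python sketch below assumes the file holds whitespace-separated arguments with shell-style quoting; the actual file format expected by MODS is not described here, and expand_argument_files is a hypothetical name.

```python
import shlex


def expand_argument_files(argv, read_text):
    """Replace each '@name' token with the arguments read from file 'name'.

    read_text is a callable that returns the contents of a file as a string;
    passing it in keeps the sketch easy to try without touching the disk.
    """
    expanded = []
    for token in argv:
        if token.startswith("@") and len(token) > 1:
            expanded.extend(shlex.split(read_text(token[1:])))
        else:
            expanded.append(token)
    return expanded


if __name__ == "__main__":
    # In-memory stand-in for an argument file (assumed content).
    files = {"args.txt": "-d -l run.log\ngputest.js -mfg"}
    print(expand_argument_files(["@args.txt", "-loops", "2"], files.__getitem__))
    # prints ['-d', '-l', 'run.log', 'gputest.js', '-mfg', '-loops', '2']
```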


3.4 Command-line arguments for gputest.js


If no script file is specified, mods.js is used. If no log file is specified, mods.log is used.
MODS parses the specified script file and any imported script files, and then executes
the script method main(). You may optionally specify begin() and end() methods that are
guaranteed to be called before and after main(), respectively.
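
The call order described above can be modeled in a few lines. This Python sketch is only a model of the ordering, with the hypothetical name run_script; MODS scripts themselves are written in JavaScript, and the sketch does not show what happens to end() if main() fails, since that is not stated here.

```python
def run_script(main, begin=None, end=None):
    """Call the optional begin(), then main(), then the optional end().

    Returns the names of the functions in the order they were called.
    """
    calls = []
    if begin is not None:
        begin()
        calls.append("begin")
    main()
    calls.append("main")
    if end is not None:
        end()
        calls.append("end")
    return calls


if __name__ == "__main__":
    print(run_script(lambda: None, begin=lambda: None, end=lambda: None))
    # prints ['begin', 'main', 'end']
```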

Table 3. Options to gputest.js

Option Description

--args <string> Display Help for a particular subsection

-? Display Help

-add <string> Add the specified test(s).

-allow_ot_events <string> Ignore this many overtemp events.

-alt_mini_settings <string> <string> <string> <string> Specify alternate settings for MINI speedos [startBit=0] [oscIdx=0] [outDiv=0] [mode=0].

-arg <string> Pass to wrapper scripts

-asr_enable <string> Enable or disable ASR. 0=disable, 1=enable

-assume_battery_power tell RM that we are on battery power.

-attrcb_timeslice_flag <string> enable/disable attribute CB timeslice mode

-aza_maxsinglewaittime <string> Set the Azalia maximum time to wait at a single time for simulation

-aza_timescaler <string> Set the Azalia Timescaler for Simulation

-begin_dump_addr <string> Start BAR0 address to log.

-bg_ext_temp <string> <string> Start bg external therm sensor monitor and sets print and read intervals


-bg_ext_temp_flush <string> <string> Start bg external therm sensor monitor and sets print and read intervals, flushing to disk after every print

-bg_fan <string> <string> Start bg fan RPM monitor and sets print and read intervals

-bg_int_temp <string> <string> Start bg internal therm sensor monitor and sets print and read intervals

-bg_int_temp_flush <string> <string> Start bg internal therm sensor and sets print and read intervals, flushing to disk after every print

-bg_ipmi_temp <string> <string> Start bg ipmi therm sensor monitor and sets print and read intervals

-bg_ipmi_temp_flush <string> <string> Start bg ipmi therm sensor monitor and sets print and read intervals, flushing to disk after every print

-bg_power <string> <string> Start bg power monitor and sets print and read intervals

-bg_power_flush <string> <string> Start bg power monitor and sets print and read intervals, flushing to disk after every print

-bg_smbus_temp <string> <string> Start bg smbus therm sensor monitor and sets print and read intervals

-bg_smbus_temp_flush <string> <string> Start bg smbus therm sensor monitor and sets print and read intervals, flushing to disk after every print

-bg_tsosc <string> <string> <string> <string> <string> <string> Poll tsosc speedo [countSel] [clks per meas] [outDiv] [adj] [print interval] [read interval].

-bg_volterra <string> <string> Start Background Thermal monitor (including Volterra slave devices) and sets print and read intervals

-bg_volterra_flush <string> <string> Start Background Thermal monitor (including Volterra slave devices) and sets print and read intervals, flushing to disk after every print

-bgdev Start background GLStress test on all other devices.

-bgfunc <string> Start the given function as a background thread.

-bgstress Start the background 3d stress task.

-bgtemp <string> <string> Start Background Thermal monitor and sets print and read intervals

-bgtemp_flush <string> <string> Start Background Thermal monitor and sets print and read intervals, flushing to disk after every print

-bgtest <string> Loop this test number in a background thread.

-bgtest_flags <string> <string> Run a bgtest with various flags, separated by a ','

-bgvolt <string> <string> <string> <string> Log voltage droop [fbp/gpc/sys] [clks per meas] [print interval] [read interval].

-blacklist_pages_on_error Blacklists bad physical address pages if memory tests hit an error.

-blcg Block Level Clock Gating

-blcg2 <string> Block Level Clock Gating.

-blcgIdleCGEnable Block Level Clock Gating to enable IdleCG

-blcg_idle Force BLCG Idle settings

-blcg_off Block Level Clock Gating Off.

-blcg_quiescent Force BLCG Quiescent settings

-blcg_stall Force BLCG Stall settings

-blink_lights Blink lights (keyboard LEDs for example) to show mods is not hung.

-boot_0_strap Set Boot 0 strap

-boot_3_strap Set Boot 3 strap

-bus_width <string> Test for specific framebuffer bus width

-check_display Run the CheckDisplay test.

-check_display_bar Run the CheckDisplayBar test.

-check_display_bars Run the CheckDisplayBars test.

-check_displays Run the CheckDisplay test on all displays.

-check_driver_ver <string> Check whether particular version of MODS kernel driver is installed


-check_ecc_on_init Check ECC errors on init and fail if any exist

-check_features <string> Test for a specific set of feature bits (3 args for 96 bits).

-check_features3 <string> <string> <string> Test for a specific set of feature bits (3 args for 96 bits).

-check_fp_gray Run the gray CheckDisplays test on flat panels

-check_fp_stripes Run the stripe CheckDisplays test on flat panels

-check_hdmi <string> Test an explicit HDMI mode. Pass -check_hdmi ? for details.

-check_hotplug Run the HotPlug/Unplug test.

-check_kernel_ver <string> Check whether MODS is running on the specified kernel

-check_linkspeed <string> <string> Test if RM initialize the PexDev device and downstream device at right speed

-check_linkwidth <string> <string> <string> Test if RM initialize the PexDev device and downstream device with the right width

-check_pxl <string> Test for explicit number of PCI-X lanes

-check_sm_mask <string> Check SM enable mask for specified tpc.

-chipset_aspm <string> Configure chipset ASPM setting.

-clock_slowdown <string> Clock slowdown-global, nv, host, and thermal

-cml_training <string> Enable/Disable types of CML training.

-cmos_training <string> Enable/Disable types of CMOS training.

-compute Run the compute manufacture tests.

-concurrent_devices If true, will run MODS on multiple GPUs concurrently and synchronize test starts

-concurrent_devices_abort_on_error Abort testing on all devices when one device fails during concurrent testing

-concurrent_devices_sync If true, will run MODS on multiple GPUs concurrently and synchronize test starts


-corr_error_tol <string> tolerance for number of correctable error for each pcie link

-count_edc_error Count EDC errors during mods tests

-csum_report Report checksum differences for quals.

-cuda_in_sys Put CUDA in system memory.

-cvb_check_ignore Ignore the cvb check

-dd_scaler_mode <string> Digital display scaler mode; native, scaled or centered.

-decode_error <string> Print the test name and error description given a code.

-deep_idle_pstate <string> Set the pstate to go into deep idle.

-def_powerrail_funcs <string> <string> <string> Import and specify the Set/Get power rail functions

-detect_dfp Detect a DFPs resolution to find goldens

-dev <string> Select a resman device

-device_id <string> Check the device ID

-disable_bc Disable broadcast

-disable_def_img Disable the default image on display.

-disable_dpu_sc_dma Disables DPU SC DMA feature

-disable_edc Disable EDC

-disable_elpg_on_init Disable Elpg on Init for all engines

-disable_fail_message_print Disable printing of FAIL message in big bold red block if mods fails

-disable_fandiag <string> Do not initialize fan-diag

-disable_mods_console Disable the MODS console

-disable_passfail_msg Disable the big PASS/FAIL message at the end of mods.


-disable_pgob Disable power gate on boot

-disable_pstate20 Tell RM to disable PState 2.0 on all boards.

-disp_ignore_imp Set regkey for RM to disable IsModePossible functionality.

-dispclk <string> Display clock in MHz.

-display <string> Run the tests on the specified display.

-display_clones <string> Combined mask of displays to enable clone mode on.

-display_config Get the display configuration.

-display_manual_updates Only send display updates explicitly requested by tests

-dramclk <string> DRAM clock in MHz.

-dramclk_percent <string> Set dramclk to X % of default. 50 <= X <= 150

-dump_png Dump a .PNG file on Golden Store and Error events.

-dyn_eng_ctrl <string> Enable (1 or 2) or disable (0) dynamic engine control.

-dynamic_mempool <string> Enable or disable dynamic mempool. 0=disable, 1=enable

-early_exit_on_err_count Exit tests early when error count is violated

-ecc_async_scrub <string> ECC Asynchronous scrubbing: Enable = 1, Disable=0

-ecc_dbe_tol <string> <string> tolerance for number of ECC double bit errors, expects <fb, l2, l1, sm> <tol>

-ecc_fuse_ignore Ignore the ecc fuse

-ecc_sbe_tol <string> <string> tolerance for number of ECC single bit errors, expects <fb, l2, l1, sm> <tol>

-ecc_verbose <string> Bitmask for ECC verbose reporting (bit 0 = print on checkpoint, bit 1 = print on count change)

-edc_rate_limit <string> tolerance for maximum of EDC error rate, Maximum value 0xFFFFFF

-edc_tol <string> tolerance for overall number of EDC CRC errors


-edc_verbose <string> Bitmask for EDC verbose reporting (bit 0 = print on checkpoint, bit 1 = print on count change)

-elcg <string> Engine Level Clock Gating.

-elcg_off Engine Level Clock Gating Off.

-elpg_idle_thresh <string> <string> Set the PowerGate Idle threshold (in clocks) (usage : <gr|vid|vic|ms> clocks)

-elpg_mask <string> Mask for enabling ELPG

-elpg_off Engine Level Power Gating Off.

-elpg_ppu_thresh <string> <string> Set the PowerGate Post Powerup threshold (in clocks) (usage : <gr|vid|vic> clocks)

-enable_aelpg Enable AELPG by setting RM registry.

-enable_clk2 <string> Enable Clocks 2.0 in Resman

-enable_ecc_inforom_reporting RM to blacklist pages in InfoROM on ECC error

-enable_gen2 Allow RM to transition PEX speed to Gen2.

-enable_gen3 Allow RM to transition PEX speed to Gen3.

-enable_hda <string> Enable or disable HDA engine. 0=disable, 1=enable

-enable_no_snoop Enable No Snoop of the host/FIFO through registry-control

-enable_nvdps Enable nvDPS (hardware detection of screen activity)

-enable_pgob_compute Enable power gate on boot for compute

-enable_pgob_dfma Enable power gate on boot for DFMA

-enable_pgob_tex Enable power gate on boot for tex

-enable_replayable Enable RM functionality for generating replayable log files

-enable_ticks Enable ticks in the OpenGL driver


-end_dump_addr <string> End BAR0 address to log.

-errcode_test_offset <string> Add this to testnumber in error codes.

-etmp_range <string> <string> Set min, max degrees Celsius for External temp sensor sanity-check.

-exit_on_breakpoint_count <string> Exit MODS when the breakpoint count is reached (0 = don't abort)

-ext_banks <string> Test for explicit number of external banks

-external_heap Forces RM to use external heap mgmt (similar to Vista).

-extra_pexcheck Check for CORR Pex Errors inside supported GpuTests.

-fail_critical_fb_range <string> <string> make FB errors in this range critical

-fan_skip_unload <string> Bypass fan state restoration for ogtests

-fan_speed <string> Force current gpu devices fan to this pct of max.

-fb_gddr5_x16 Enable GDDR5 x16 FB mode

-fb_gddr5_x8 Enable GDDR5 x8 FB mode

-fbi_check <percent> Set dram overclock during retest to determine if a test failure is due to FB problems. Default 15, set to 0 to disable this feature.

-force <string> Add and Force the specified test(s).

-force_SMbus_temp <string> <string> <string> Force displaying / checking temperature with Smbus external sensor.

-force_canoas_temp <string> Force displaying / checking temperature with external sensor read through Canoas microcontroller.

-force_coh Force all surfaces and pushbuffers to coherent (cached sysmem).

-force_ecc_L1 Force enable ECC L1

-force_ecc_SM Force enable ECC SM

-force_ext_temp <string> Force displaying / checking temperature with external sensor.


-force_fb Force all surfaces and pushbuffers to FB memory.

-force_gl_coh Force only GL tests to use coherent (cached sysmem).

-force_hdmi_info Force sending HDMI info frames.

-force_head_routing <string> Specify head indexes for active displays - 4 bits per head.

-force_int_temp <string> Force displaying / checking temperature with internal sensor.

-force_ipmi_temp <string> <string> <string> Force displaying / checking temperature with Ipmi external sensor.

-force_ncoh Force all surfaces and pushbuffers to noncoherent (uncached sysmem).

-force_ncoh_sysmem Force sysmem surfaces and pushbuffers to noncoherent (uncached sysmem).

-force_repost Force repost of the GPU.

-force_small_pages Forces the use of small pages while allocating framebuffer memory.

-force_vga_print Print to VGA screen regardless of whether user interface is enabled or disabled

-foreign_display Don't send any display commands to HW. Instead copy rendered output to a foreign display, not recognized by RM.

-foreign_display_dev <string> Device index of device on which to set up foreign display (default is 0).

-foreign_display_fps <string> Number of frames per second copied to foreign display.

-freq_offset_khz <string> Shift the VF curve by y kHz.

-fspg <string> Floorsweep Power Gating.

-fspg_off Floorsweep Power Gating Off.

-full_gc5 GC5 instead of GC5 minus

-full_power Full power mode

-full_power_display Full power for display


-fullpower Full power mode

-gl_force_mem_space Force surfaces other than Z/color to SysMem only.

-gl_force_sysmem_buffers Force render to SysMem only.

-gl_no_zbc Disable GL zero-bandwidth-clear updates.

-gl_one_channel Ask GL driver not to flush the GLS channel (reduces context switching).

-glkey <string> <string> Set a registry key for OpenGL.

-global_surface_overrides <string> Set the GlobalSurfaceOverrides registry key.

-glr_frame_retries <string> Set the FrameRetries on all glrandom tests (used for reporting soft/hard)

-glrandom Run the GLRandom tests

-glsbg <string> Run GLStress in background on dev N

-goldenfile <string> Specify the golden value file

-gpc2clk <string> GPC2 clock in MHz.

-gpc2perf <string> Sets Gpc2 clock (and related perf parameters in PState 2.x)

-gpc_mask <string> Set GPC enable mask.

-gpcclk <string> GPC clock in MHz.

-gpio_activity <string> <string> <string> <string> enables GPIO activity monitor. User provides the edge to look for + threshold number

-gpu_aspm <string> Configure GPU ASPM setting.

-gpu_cache_alloc_policy <string> Set the GPU cache allocation policy

-gpu_cache_promotion_policy <string> Set the GPU cache promotion policy

-gpu_cache_write_mode <string> 0=default 1=writeback 2=writethrough


-gpu_codec <string> Azalia port number for gpu output

-gpu_dma_tm <string> Run GpuDmaTest in a particular mode

-gpu_num <string> Set which graphics processor to test (default 0).

-gpu_out <string> Azalia codec index for gpu output

-grctx <string> Set context switching mode. 1=hybrid, 2=hw, 3=sw

-grreginitoverride <string> Set graphics register init override behavior. 1=prod diff

-h Display Help

-hasbug_override <string> <string> Override Has Bug.

-hd_codec_sdi <string> SDI Line the GPU Codec is connected to for the HD Codec test

-hdcp_adksv_only Only check HDCP A and D keys

-hdcp_delay <string> Settle time for CheckHDCP test

-hdcp_keys Get the HDCP keys. Not a complete test.

-hdcp_loops <string> Settle time for CheckHDCP test

-hdcp_skip Don't do HDCP for interactive display tests

-hdcp_timeout <string> Timeout for negotiating an HDCP connection

-hdmi_fft_ratio <string> Set signal energy ratio to determine pass/fail

-help Display Help

-hostclk <string> Host clock in MHz.

-hw_speedo_override <string> Override hw speedo value

-hybrid_fm_rev <string> firmware revision override

-id Prompt user for test ID, which is written to the logfile.


-iddq_check_ignore Ignore the iddq check

-idle_channels_retries <string> Retry count when idling channels using polling idle (default: 0)

-idle_slowdown <string> Sets override for IDLE slowdown settings.

-ignore_family <string> Ignore a gpu family (curie, tesla, fermi, kepler) during MODS init

-ignore_fatal_errors Ignore Fatal PEX errors

-ignore_gr_checksum Do not use graphics checksum for identification

-ignore_ot_event Ignore thermal overtemp events.

-ignore_unexpected_interrupts Ignore unexpected gpu interrupts in tests. Please use with caution.

-inst_in_sys Put instance memory in system memory.

-int_therm_calibrate <string> <string> Thermal Calibration of Internal Sensor

-intr_thresh <string> Set the stuck interrupt threshold for ResourceManager.

-ipmi_temp_range <string> <string> Set min and max ipmi temperature range - trigger errors for tests.

-itmp_range <string> <string> Set min, max degrees Celsius for Internal temp sensor sanity-check.

-json Enable single-run JSON logfile to modsNNNN.log.

-json_append <string> Enable multi-run JSON logfile, appending to file

-json_clobber <string> Enable multi-run JSON logfile, overwriting file

-json_name <string> Enable single-run JSON logfile, and specify filename template

-kickoff_thresh <string> Set channel auto-flush threshold.

-l2_mode <string> Sets the L2 Mode

-legacy_interrupts Use legacy GPU interrupts.


-legacyclk <string> Legacy clock in MHz.

-line_in <string> Azalia port number of the LINE-IN audio jack

-line_in_codec <string> Azalia codec index for LINE-IN

-link_speed_override <string> override link speed of a perf point

-link_width_override <string> override link width of a perf point

-list_errors List all the MODS errors.

-list_tests List all the MODS tests and their test numbers.

-log_file_limit_mb <string> Limit of the log file size in Mb.

-log_imp Capture IsModePossible log

-log_imp_io Dump IsModePossible I/O

-logcmp Dump logcmps.

-loops <string> Loop the tests count times.

-low_power Lower Power Settings

-lowest_power Lowest Power Settings

-lowestpower Lowest Power Settings

-lowpower Lower Power Settings

-ltc2clk <string> LTC2 clock in mhz.

-ltcclk <string> LTC clock in mhz.

-lvds_loop_clk <string> Pixel Clock frequency in LVDS loopback test.

-mats_cov <string> Override Mats memory coverage percentage (0 to 100), default is 10.

-mats_rd_delay <string> delay in NS before a fb Read


-mats_wr_delay <string> delay in NS before a fb Write

-matsinfo If a mats-derived test fails, print out more info

-max_ext_minus_int <string> Set max diff of (external - internal) temperature value, in degrees C (default 7)

-max_int_minus_ext <string> Set max diff of (internal - external) temperature value, in degrees C (default 7)

-max_pwr_range <string> <string> <string> Set the maximum power range on <sensor> <min> <max>

-max_temp_diff <string> Set max internal vs. external temperature mismatch, in degrees C (default 7)

-maxframes <string> Limit max frames per test (shorten test times).

-maxmemerr <string> If a mats-derived test fails, print at most that many errors.

-maxwh <string> <string> Set max screen resolution (w, h only).

-mboard <string> Enable multiboard (SLI) mode with device N as master

-memqual Enable memqual specific (RM) behavior.

-message_disable <string> Disable debug messages from mods modules. Separate with colons, e.g. -message_disable ModsCore:ModsNvGpu

-message_enable <string> Enable debug messages from mods modules. Separate with colons, e.g. -message_enable ModsCore:ModsNvGpu

-mevp_mask <string> Set MEVP enable mask.

-mfg Run the board manufacturing tests.

-mfg2 Run the board manufacturing tests.

-min_fb_mem_percent <string> Minimum fraction of framebuffer memory that needs to be allocated

-min_mempool <string> Enable or disable minimum mempool. 0=disable, 1=enable

-mobile Tell RM that this is a Mobile GPU

-mode <string> <string> <string> <string> Set mode X x Y @ Z bpp at W Hz.


-monitor <string> Monitor the GPU status every X milliseconds.

-mpeg_in_fb Run MPEG tests out of framebuffer rather than AGP.

-msdclk <string> MSD clock in mhz.

-msi_interrupts Use MSI protocol for GPU interrupts.

-multiheap_en enable MODS multi-heap code

-must_be_supported Treat unsupported tests as errors

-no_autoflush Disable auto-flush of channels.

-no_backdoor Disable FB backdoor.

-no_compress Disable FB compression.

-no_dynamic_mempool Indicate to the RM not to use the dynamic mempool.

-no_ecc_fb_scrub Don’t scrub FB

-no_ext_power Do not check if external power is connected.

-no_fs_restore Do not restore floorsweeping state on GPU Subdevice shutdown

-no_gart Disable NVGART.

-no_gen2 Disallow RM to transition PEX speed to Gen2.

-no_gen3 Disallow RM to transition PEX speed to Gen3.

-no_glext <string> disable one feature at a time

-no_gold Do not load golden values.

-no_golden_dma Use CPU reads for surface readbacks/CRC.

-no_inst_in_sys Keep instance memory in FB.

-no_mseq Do not use the sequencer when changing the memory clock


-no_pex_aspm Disable PCI-E ASPM.

-no_pstate_lock_at_init Do not lock to a pstate at initialization

-no_pte_in_sys Keep page table entries in FB.

-no_rc Disable robust channels.

-no_rcwd Disable robust channels watchdog.

-no_require_fos Allow glr_display to pass even if we can't FOS because we are on head 1.

-no_restore_aspm Don't restore ASPM on exit.

-no_restore_clocks Don't restore clocks on exit.

-no_shader_cache Disable GL binary shader cache.

-no_sim_symbols Do not load sim symbols when dumping sim stack

-no_sse_memcpy Disallow SSE 16-byte reads in memcpy.

-no_stack_dump do not dump call stack

-no_temp_range Disables temperature range check after each gputest

-no_thermal_slowdown Disable thermal slowdown.

-no_twod_in_wfmats Don't use the TwoD class in WfMats (default for >= G80)

-no_vga Disallow VGA text mode.

-no_wc_linux Disable write-combining.

-no_zcull Disable Zcull.

-non_coherent Force all tests except Class1774 and Class3174 to use NonCoherent memory.

-nonconcurrent_test <string> Run this test one device at a time with -concurrent_devices.

-notest Get ready to run tests, but don't actually run them.


-notiled Do not use tiled surfaces.

-null_display Don't send any display commands to HW.

-nvvdd <string> Set the graphics core voltage in millivolts on PState 1.0 boards.

-oceb_size <string> Override OCEB Size

-old_gold Using old golden values.

-only_family <string> Ignore GPUs not in this family (curie, tesla, fermi, kepler) during MODS init

-only_pci_dev <string> Make RM see only the specified PCI device (format is bus:dev.func - all in hex)

-opsb_override <string> Set RM regkey to override the Optional Power Saving Bundle fuse/vbios values

-oqa Run the outgoing quality assurance tests.

-outputfilename <string> Temporary hack to WAR the fact that testgen always passes outputfilename

-override <string> Execute the script file to override GPU settings.

-override_fb_size <string> New FB Size in MB.

-pclk_overclock_pct <string> Set display PClk overclock percent

-perlink_aspm <string> <string> <string> Sets ASPM for each PEX device. Parameter is Depth, Loc ASPM, Host ASPM

-perlink_corr_error <string> <string> <string> Sets the allowed number of CORR error per PCIE node. Parameter is Depth, LocTolerance, HostTolerance

-pex_crc_tol <string> tolerance for number of PEX Crc errors

-pex_l0s_tol <string> tolerance for number of PEX L0s Failed exits

-pex_line_error_tol <string> tolerance for number of PEX line errors

-pex_nak_rcvd_tol <string> tolerance for number of PEX NAK Received errors

-pex_nak_sent_tol <string> tolerance for number of PEX NAK Sent errors


-pex_non_fatal_rate_tol Tolerance for number of non fatal PEX rate

-pex_verbose <string> Enable verbose PEX reports

-pg_offload <string> Override the PG log operational parameters

-pg_pr_gate_enable <string> <string> Set the PowerGate Power Rail Gate enable (usage: index <0|1>)

-pg_pr_idle_thresh <string> <string> Set the PowerGate Power Rail Idle threshold (in clocks) (usage: index clocks)

-pg_pr_predictive_thresh <string> <string> Set the PowerGate Power Rail Predictive threshold (in clocks) (usage: index clocks)

-pgctrl_abort_timeout <string> Override the abort timeout value used to abort PG ON process.

-pgctrl_parameters <string> Override various PG Controllers settings.

-pglog_parameters <string> Override the PG log operational parameters

-pglog_surface <string> Override the PG log surface attributes

-pgob_mask <string> Set bitmask for power gate on boot

-pll_settle_time <string> PLL settle time in nanoseconds

-pmu_bootstrap_mode <string> PMU Bootstrap Mode

-pmu_force_phys <string> Force all PMU memory mappings to be physical

-pmu_instloc_inst <string> Location of the PMU instance block (def/coh/ncoh/vid)

-pmu_instloc_ucode <string> Location of the PMU ucode surface (def/coh/ncoh/vid)

-pmu_ucode_addrmode <string> PMU Ucode Addressing Mode

-poll_hw_hz <string> Max frequency at which to poll hw registers.

-poll_interrupts Poll for GPU interrupts.

-power_cap_max Set all power limits to the max allowed in vbios

-power_cap_policy <string> <string> (policyIdx, mw) Set a power-capping policy's mW or mA limit.


-power_cap_rtp <string> (mw) Set Room Temperature Power (soft limit)

-power_cap_tgp <string> (mw) Set Total GPU Power (hard limit)

-power_feature <string> Set power feature

-power_feature2 <string> Set power feature

-power_mizer <string> <string> Set PowerMizer levels for AC and battery (2 args).

-preheat <string> <string> Preheat the chip to a certain temperature.

-print_is_supported List IsSupported responses from all tests in the current spec.

-print_sys_time Print [hh:mm:ss] prefix on all messages.

-print_tests_to_run Simply prints the tests that would be run in the current configuration.

-print_wallclock_time Print [hh:mm:ss] prefix on all messages.

-printcsv Enable Golden.PrintCsv mode.

-privsec_disable Disable Priv Security for debugging purpose

-pstate <string> Test only this pstate.

-pstate_callbacks <string> <string> <string> <string> Set the PState callback script and function names.

-pstate_disable Do not allow RM to change clocks or initialize perf tables

-pstate_hard Use "hard" pstate locks on all devices

-pstate_soft Use "soft" pstate locks on all devices

-pte_random Allocate all system memory in randomly scattered pages.

-pte_reverse Allocate all system memory from top down.

-pte_seed <string> Random seed for scrambling system memory allocations.

-pwr_cap <string> 1:enable, 0:disable SmartPower power capping


-pwr_rail_gate_off disable power rail gating

-pwr_rail_gate_on Enable power rail gating - and ELPG as well

-pwr_range_mw <string> <string> <string> Set min and max power range for a sensor - trigger errors for tests.

-pwrclk <string> Power clock in mhz.

-queued_print_enable <string> Enable or disable queued printing (multithread support), 0 = disable, 1 = enable

-ram_config_strap <string> Kind of FB memory attached to Gpu.

-random_prompt Randomize input for check_displays correct image prompt

-rc_timeout_sec <string> Set timeout value for Robust Channels in seconds.

-readspec <string> Use the user defined spec.

-reboot_threshold <string> RebootCounter: failure count threshold.

-reboot_total_count <string> RebootCounter: total number of reboot attempts.

-reg_write_mask <string> Store register ranges for GpuInstances along with bitmask to be used to write them

-regress_using_gltrace Run regressed loop using gltrace

-regwr <string> <string> Override gpu register (addr, value)

-regwr_early <string> pre VBIOS register setting

-regwr_mask <string> <string> <string> Override gpu register (addr, andMask, orMask) - do AND first then OR

-relocate_inst Forces instance space at the beginning of FB

-require_displays <string> <string> Specify required present and not-present display masks.

-reset_exceptions_on_exit Reset exceptions when shutting down gpu subdev.

-restore_clocks Do restore clocks on exit.

-retry_copy_check_on_fail <string> Retry comparison in system memory if any miscompare found


-reverse_lvds_tmds Reverse LVDS and TMDS entries in VBIOS

-revision <string> Check the NVIDIA chip revision.

-rm_clients <string> Number of RM clients to create

-rmmsg <string> Override RM Message handling.

-rmsbg <string> Select which device to run the RMStress Background test on.

-rom <string> Check the rom version.

-run_after_init <string> Execute the specified JavaScript after GPU initialization.

-run_on_error Continue running if error occurs.

-run_only_gold Only run tests that use golden values.

-run_only_hw_crc Only run tests that use DacCrc, TmdsCrc, and/or TvCrc.

-runlist Enable buffer-based runlists

-runlistsize <string> Set the size of buffer-based runlists. Must be a power of 2.

-safe_dmas Use safe DMA protocol rather than fast.

-savespec <string> Save specified spec to file

-savespec_args <string> Save string of arguments in user defined spec.

-screen_off Disable output to the screen during tests.

-seed <string> Random number seed.

-serial_ports <string> Total number of serial ports to be tested.

-set_canoas_mapping <string> <string> Set the mapping of a GpuInstance to SysconId + GpuIndex (inside a syscon).

-set_nvvdd_of_pstate <string> Set NvVdd based on PState number.

-set_power_cap <string> Set Canoas Power Cap in Watt. 0=disable.


-set_powerrail_leakage_thresh <string> <string> Set the power rail thresholds

-set_powerrail_voltage <string> Set the power rail voltage at init

-setgpio <string> <string> toggle GPIO after RM initializes.

-shd_mask <string> Set nv4x shader enable mask.

-show_gold Display contents of goldenXX.bin file.

-sim_int_temp <string> Simulate internal thermal sensor temperature

-simulate_all_dfps <string> Simulate a flat panel with the specified EDID on all possible DFPs.

-simulate_dfp <string> Simulate a flat panel with the specified EDID.

-skip <string> Skip the specified test(s).

-skip_board_detect Skip board detection in the ValidSkuCheck test.

-skip_config_sim Skip simulation configuration during initialization

-skip_fan_rpm_sense Skip fan RPM sense portion of CheckFanSanity.

-skip_fan_sense Skip fan sense portion of CheckFanSanity.

-skip_inta_intr_check Skip the INTA interrupt check done at Gpu initialization.

-skip_msi_intr_check Skip the MSI interrupt check done at Gpu initialization.

-skip_pertest_pex_speed_check Skips the init and per GPU test PEX speed check

-skip_pertest_pex_width_check Skips the init and per GPU test PEX width check

-skip_pertest_pexcheck Skips the init and per GPU test PEX check

-skip_rm_state_init Skip RM init; just do the VBIOS init


-skip_upstream_check Make PEX tests check only the link directly above GPU

-slcg <string> Second Level Clock Gating.

-slcg_off Second Level Clock Gating Off.

-sli Enable multiboard (SLI) mode, with the primary device as master.

-sli_always_approved Force SLI approval in RM

-sli_config <string> Bitmask of GPUs to be linked into an SLI configuration

-sli_master <string> Enable multiboard (SLI) mode, with device N as master.

-sli_only_dir <string> Force testing of SLI connector only in one direction

-sli_pixel_clock <string> Select Pixel Clock in MHz for SLI testing

-sli_use_display <string> Select display output connector for SLI testing

-slt Run the chip manufacturing tests.

-smbus_temp_range <string> <string> Set min and max smbus temperature range - trigger errors for tests.

-soak <string> Soak the chip for given seconds.

-sockCmdLine <string> Pass Command to start sockserver.

-sor_loadadj <string> Set PLL1 LoadAdj value in SOR loopback test.

-sor_pattern <string> Run only the selected pattern in SOR loopback test.

-spdif_codec <string> Azalia codec index for SPDIF output

-spdif_out <string> Azalia port number of the SPDIF-OUT audio jack

-spec <string> Use the user specified table for this mode.

-strap_fb <string> Set the framebuffer strap in megabytes.

-subdev <string> Select a resman subdevice.


-subsystem <string> <string> Check the subsystem vendor and device IDs.

-suggest_pstate_at_init Suggest a PState at init

-swSlowdown Enable clock slowdown

-swap_endian Run GPU in big endian mode on a little endian computer.

-sys2clk <string> Sys2 clock in mhz.

-sysclk <string> Sys clock in mhz.

-syspll <string> Sys PLL clock in mhz.

-tc_weak_key Use weak turbo cipher key

-tco Allow Nforce systems to reboot automatically in a crash

-temp <string> Control gpu fan to reach given temperature (if tgt < 0, just report temps).

-test <string> Run only the specified test(s).

-test_for_gen1 Tell ValidSkuCheck that we are intentionally testing with Gen1 chipset.

-test_gpu <string> Name of the gpu to be tested

-testarg <string> <string> <string> (test, property, expression) sets test's property to result of expression.

-testargstr <string> <string> <string> (test, property, str) sets test's property to str

-testforce <string> Run only and force the specified test(s).

-testhelp <string> Display available -testarg options for a given test number.

-threadid Turn on the prepending of the thread ID to a line of log spew

-time Record duration of tests.

-timeout_ms <string> Set default Timeout in MS (for both Tests and RM).

-tmds_crc Use TMDS/LVDS CRCs.


-tmds_loop_clk <string> Pixel Clock frequency in TMDS loopback test.

-tmds_pllreg <string> Set PLL0 PllRegLevel value in TMDS loopback test.

-tmds_txreg <string> Set PLL0 TxRegLevel value in TMDS loopback test.

-tpc_mask <string> Set TPC enable mask.

-tpc_mask_on_gpc <string> <string> Set TPC enable mask for the given GPC. (usage: <gpc_num> <mask>)

-trepfile <string> Set the name of the trep (test report) file

-un_supp_req_tol <string> tolerance for number of unsupported requests for each pcie link

-unlock_aslm unlock ASLM on the current chipset

-unlock_chipset_gen2 Unlock Gen2 capability on chipset that RM prevents from going into Gen2.

-use_dynamic_mempool indicate to the RM to use the dynamic mempool.

-use_mods_console Use the MODS console

-use_orig_fb_req_size Indicate to the RM to use the original (unpadded) size of the FB alloc request for comptag calculations.

-use_perfpoints <string> Add a PerfPoint for each pstate to test: min/nom/mid/tdp/etc

-use_raw_console Use the raw console

-use_rc_callback Use an RM callback for robust-channel errors (aka two-stage recovery).

-use_vfpoints Build PerfPoint for each vfpoint

-use_virtual_dma Use GPU virtual addressing for accessing memory.

-user_strap <string> Check user strap value in CheckConfig.

-va_reverse Allocate virtual addresses from the top down

-vas_ram_size <string> Set Ram size to scale VAS size (option: 0, 1 or 2)


-verbose Run the tests in verbose mode.

-verify_fuse <string> <string> Append fuse & spec to the list to be checked for Test 1 - CheckConfig

-verify_lanes <string> Fail if PCIE lanes are different than specified number.

-verify_sku <string> Checks if the fuses are burnt for the given sku

-verify_suspect_shader <string> Pass in a special js filename to verify a suspect shader

-volt_offset_mv <string> Shift the VF curve by x mV.

-volterra_max_temp_delta <string> Set the max difference between two consecutive readings on the same volterra slave (default = 15)

-volterra_temp_range <string> <string> Set min and max volterra temperature range - trigger errors for tests.

-weak_skey Disable DH Key Exchange

-wlm_enable <string> <string> Enable WLM during test with optional freezeUsec runUsec Ratio (-wlm_enable x y).

-xbar2clk <string> XBar2 clock in mhz.

-xbarclk <string> XBar clock in mhz.

-zcull_mask_on_gpc <string> <string> Set ZCULL enable mask for the given GPC. (usage: <gpc_num> <mask>)
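
A few of the arguments above document a value format or range: -only_pci_dev takes bus:dev.func in hex, -runlistsize must be a power of 2, and -mats_cov is a percentage from 0 to 100. The following Python sketch shows how a wrapper script might check such values before launching MODS. The function names are hypothetical and are not part of MODS; how MODS itself validates these arguments is not described here.

```python
import re


def parse_pci_dev(text):
    """Parse 'bus:dev.func' (all fields hexadecimal) into a tuple of ints."""
    match = re.fullmatch(r"([0-9a-fA-F]+):([0-9a-fA-F]+)\.([0-9a-fA-F]+)", text)
    if match is None:
        raise ValueError(f"expected bus:dev.func in hex, got {text!r}")
    return tuple(int(field, 16) for field in match.groups())


def is_power_of_two(value):
    """True for 1, 2, 4, 8, ... (the constraint documented for -runlistsize)."""
    return value > 0 and value & (value - 1) == 0


def valid_mats_cov(percent):
    """-mats_cov is documented as a percentage from 0 to 100."""
    return 0 <= percent <= 100


print(parse_pci_dev("0a:00.1"))  # (10, 0, 1)
```

Checking values up front keeps a typo in a long manufacturing command line from being discovered only after the GPU has been initialized.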

3.5 Test Selection


MODS has an object-oriented test selection mechanism. Embedded in each test is a
function that reports back whether that particular test is supported on the unit being
tested. Therefore, normal MODS operation only requires that the user run mods
gputest.js -mfg (or -oqa, see section 2.0 above). When this default command line is
used, each test is queried to determine if it can run (i.e., the hardware and operating
system both support it) and should be run (i.e., it is part of the selected test suite) on the
unit being tested, and then this subset is run sequentially.


The following command-line options can be used to change this behavior:

Table 4. Test selection arguments

Option Description

-add X In addition to running the normal suite of tests, run test X if it can be run, regardless of whether it should be run. If test X can't be run, the request is ignored and test X is silently skipped. To force execution of test X, use the -force option.

-skip X Run the normal suite of tests, but skip test X. Using -skip on the same test number as -force or -add will cause an error. However, combinations of these command-line options are legal if they do not refer to the same test number.

-test X Do not query each individual test to determine if it should be run. Run only test X. -test and -skip cannot be used simultaneously.

-force X Run test X even if it can't be run and/or shouldn't be run. This may result in errors if the hardware being tested does not support test X. This option attempts to run test X in addition to the normal test suite.

-testforce X Run test X even if it can't be run and/or shouldn't be run. This may result in errors if the hardware being tested does not support test X. This option runs only test X.

Using -test in the same invocation as -add, -force or -skip will cause an error, even if
they refer to different tests.
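
The selection rules in Table 4 can be summarized in a short Python sketch. This is an illustration of the documented behavior, not the MODS implementation: the function name and the (can_run, should_run) pair are hypothetical, and the sketch assumes that -test still honors the "can run" answer while -testforce ignores it. How -testforce interacts with -add, -force or -skip is not documented, so the sketch simply ignores those options when -testforce is given.

```python
def select_tests(tests, add=(), skip=(), test=(), force=(), testforce=()):
    """Return the sorted test numbers that would run.

    tests maps a test number to (can_run, should_run). Both flags are
    hypothetical stand-ins for what each test reports about itself.
    """
    add, skip, test = set(add), set(skip), set(test)
    force, testforce = set(force), set(testforce)

    # -test may not share an invocation with -add, -force or -skip.
    if test and (add or force or skip):
        raise ValueError("-test cannot be combined with -add, -force or -skip")
    # -skip may not name a test that -add or -force also names.
    clash = skip & (add | force)
    if clash:
        raise ValueError(f"-skip conflicts with -add/-force on {sorted(clash)}")

    if test or testforce:
        # Run only the named tests. Assumption: -test honors can_run,
        # -testforce does not.
        chosen = {t for t in test if tests[t][0]} | testforce
    else:
        # Default suite: tests that both can and should run.
        chosen = {t for t, (can, should) in tests.items() if can and should}
        chosen |= {t for t in add if tests[t][0]}  # unsupported: silently dropped
        chosen |= force                            # runs regardless
        chosen -= skip
    return sorted(chosen)


table = {1: (True, True), 2: (True, False), 3: (False, True), 4: (False, False)}
print(select_tests(table))              # [1]
print(select_tests(table, add=[2, 3]))  # [1, 2]
```

In this model, -add widens the suite only with tests the hardware supports, while -force and -testforce bypass that check, which is why the table warns that they may produce errors.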

3.6 Installation
Place all distribution package files into a single directory.


On Mac OS X, click on the ".tgz" package to unpack it. To run MODS, type "./mods
gputest.js" or another command line in the "Mods.app/Contents/Resources" directory.
Alternatively, you can edit "Mods.app/Contents/Resources/mods.arg" to contain the
command line you want to run, then click on the MODS icon.

3.7 Prerequisites for Running Linux


This section applies to users who wish to run MODS on a Linux distribution other than
the one provided by NVIDIA described in section 3.9. If you are using the NVIDIA-
supplied distribution you don’t need to read this section.

Linux manufacturing MODS requires a minimum kernel version of 2.6.16. Version 2.6.29
or newer is recommended for performance reasons. Older versions have not been tested
and may not work. Kernel 2.4 is not supported. The version of the running kernel
can be determined by running:

$ uname -r

Linux manufacturing MODS is a 64-bit application and requires a kernel compiled for
the x86_64 architecture. To determine the kernel architecture, type:

$ uname -m

The system on which MODS is run must be built on glibc-2.3.2 or newer. To determine
the glibc version, type:

$ /lib/libc.so.6
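
The three checks above (kernel version, architecture, glibc version) can also be scripted. The following Python sketch is one possible way to do so; the function names are hypothetical, and the thresholds are the ones stated in this section. It uses the standard platform module to read the values that uname -r, uname -m and the C library report.

```python
import platform
import re


def version_tuple(text):
    """Leading dotted numbers of a version string, e.g. '2.6.29-rc1' -> (2, 6, 29)."""
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return tuple(int(part) for part in match.group().split("."))


def check_prerequisites(kernel_release, machine, glibc_version):
    """Return a list of problems; an empty list means all checks passed."""
    problems = []
    if version_tuple(kernel_release) < (2, 6, 16):
        problems.append(f"kernel {kernel_release} is older than 2.6.16")
    if machine != "x86_64":
        problems.append(f"architecture {machine} is not x86_64")
    if version_tuple(glibc_version) < (2, 3, 2):
        problems.append(f"glibc {glibc_version} is older than 2.3.2")
    return problems


if __name__ == "__main__":
    # platform.libc_ver() returns empty strings when it cannot identify the
    # C library; "0" then makes the glibc check report a problem.
    _, libc_version = platform.libc_ver()
    for problem in check_prerequisites(
        platform.release(), platform.machine(), libc_version or "0"
    ):
        print(problem)
```

The sketch only reports the hard minimums; the recommendation of kernel 2.6.29 or newer is not enforced.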

Linux manufacturing MODS includes a kernel module. The purpose of this module is to
expose certain kernel-mode APIs to MODS, which runs as a user-mode application. To
install the kernel module, the system must contain configured kernel sources and
development tools, including make and gcc; without them the kernel module cannot be
compiled. Use the package manager provided by your distribution to install the kernel
sources. Typically the package is named kernel-sources or linux-sources.
For example, on Debian, type:

$ sudo apt-get install linux-source-`uname -r`

If you run MODS as root, MODS will automatically run the included install_module.sh
script to compile and insert the MODS kernel module. However, if MODS is not run by
the root user, the kernel module must be installed manually beforehand; this is the
recommended approach.

For successful MODS runs the NVIDIA GPUs in the system must be in their original,
unaltered state, as initialized by VBIOS. This means X must not have been run on the
NVIDIA GPUs prior to running MODS. Please make absolutely sure that the nvidia
kernel module is not loaded, otherwise the system may become unstable. In order to
unload the nvidia kernel module it is necessary to first kill X. Killing X is also
recommended even if it is using the vesa or fb driver.


To disable X in SuSE, disable the xdm service in YaST.

On Debian-based systems (including Ubuntu), type:

$ sudo update-rc.d -f gdm remove

If not using GNOME, type kdm or xdm instead of gdm (as applicable).

Some newer Linux distributions include the nouveau driver in the kernel. This driver
performs a kernel mode set and it also supports a framebuffer console. For MODS to
function correctly, this driver has to be unloaded, preferably blacklisted so that it is not
automatically loaded at boot.

Framebuffer consoles are also not recommended, because they may modify memory of
the tested device during the tests. To disable the framebuffer console, edit
/boot/grub/menu.lst and make sure the kernel arguments contain vga=normal instead of
any other value. Make sure they do not contain anything like video=.
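
Before starting MODS it can be useful to confirm that neither the nvidia nor the nouveau kernel module is loaded. The following Python sketch reads /proc/modules, where each line begins with the name of a loaded module. The function names are hypothetical, and the sketch matches only the two module names mentioned in this section; any related helper modules a given driver version may load are not covered.

```python
CONFLICTING = ("nvidia", "nouveau")


def loaded_conflicts(proc_modules_text, names=CONFLICTING):
    """Return the conflicting module names found in /proc/modules content."""
    loaded = {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }
    return [name for name in names if name in loaded]


def read_proc_modules(path="/proc/modules"):
    """Read the kernel's list of loaded modules (Linux only)."""
    with open(path) as handle:
        return handle.read()


sample = "nouveau 1622016 1 - Live 0x0\nsnd_hda_intel 40960 0 - Live 0x0\n"
print(loaded_conflicts(sample))  # ['nouveau']
```

On a target system, call loaded_conflicts(read_proc_modules()) and refuse to start MODS if the returned list is not empty.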

3.8 Installing the Kernel Module


This section applies to users who wish to run MODS on a Linux distribution other than
the one provided by NVIDIA described in section 3.9. If you are using the NVIDIA-
supplied distribution you don’t need to read this section.

Linux MODS relies on a kernel driver to handle cases where it is necessary to use kernel-
mode APIs. The easiest way to install the MODS kernel module is to use the provided
installation script, which you will find in MODS runspace:

$ ./install_module.sh --install

[Note: You can’t install the kernel module from a network directory where the root user
does not have write access. In this case copy install_module.sh and driver.tgz to /tmp
and run it there.]

This script also creates a udev configuration file in /etc/udev/rules.d/99-mods.rules.
This file specifies the group that will be able to access the kernel module. By default it is
the video group, as for the nvidia driver. Please make sure your user is in this group or
change the group in the 99-mods.rules file to match one of yours. On some systems, the
install script instead creates the file /etc/udev/permissions.d/99-mods.permissions, which
simply lists the user, group and mode for the driver file.

To find out what group the driver has been assigned to, type:

$ ls -l /dev/mods

crw-rw---- 1 root video 10, 60 Jan 7 08:21 /dev/mods

To find out which groups you are in:


$ id

uid=4384(modsuser) gid=30(hardware) groups=30(hardware),1606(gpu-tesla)
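
In the sample output above, /dev/mods belongs to the video group, while the user is only in the hardware and gpu-tesla groups, so that user could not open the device. The same comparison can be scripted. The following Python sketch is a minimal illustration; the function name is hypothetical, and root and owner access are deliberately not considered.

```python
import os
import stat


def group_can_access(device_mode, device_gid, user_gids):
    """True if the user is in the device's group and that group has read/write bits."""
    group_rw = stat.S_IRGRP | stat.S_IWGRP
    return device_gid in user_gids and (device_mode & group_rw) == group_rw


if __name__ == "__main__" and os.path.exists("/dev/mods"):
    info = os.stat("/dev/mods")
    gids = set(os.getgroups()) | {os.getegid()}
    print(group_can_access(info.st_mode, info.st_gid, gids))
```

If the function returns False, either add the user to the device's group or change the group in 99-mods.rules, as described above.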

If you decide to modify the group in 99-mods.rules, you have to reload the kernel
module:

$ modprobe -r mods

$ modprobe mods

To make sure the kernel module is always loaded when the system starts up, follow
your distribution-specific guidelines.

On Debian-based distros (such as Ubuntu), add the mods module name to /etc/modules
if the installation script didn't add it.

On SuSE-based distros, add the mods module name to the MODULES_LOADED_ON_BOOT
variable in the /etc/sysconfig/kernel file.

On RedHat-based distros (such as CentOS), add the line modprobe mods to /etc/rc.d/rc.local.

3.9 Creating a Linux Disk Image


NVIDIA distributes a turnkey Linux package that can be used to create a disk image.
This package can be obtained on the internal NVIDIA network from
\\nvcorp\applied\diags\mods\tinylinux. Customers should contact their NVIDIA
representative to obtain this file. This distribution will fit on a 64 MB flash drive.

The package default.zip can be installed on a target drive (e.g. USB stick) from Windows
XP. The installation procedure is as follows:
• Insert the USB stick or make sure the drive where you want to install it is connected.
• Format it using FAT32 filesystem. For USB sticks right click on the drive and choose
"Format...". For non-removable drives (such as SATA) you need to go to Control
Panel->Administrative Tools->Computer Management->Storage->Disk Management
and create a partition smaller than 32GB (max size for FAT32) and then format it.
• Unzip the modsdisk.zip package to the target drive you've just formatted e.g. by right
clicking on the zip file in Explorer, choosing "Extract all..." and entering drive letter
(e.g. e:) as the destination.
• Open command prompt (e.g. Start->Run... and type "cmd<ENTER>").
• Go to the target drive by typing drive letter of the target drive and pressing enter (e.g.
"e:<ENTER>").
• Install the boot loader (assuming e: is your drive) (for non-removable drives you
might need to also add the -f switch):


$ syslinux -m -a e:

3.9.2 Disk contents of the Linux Disk Image

File Description

/ldlinux.sys File used by the bootloader to load Linux.

/syslinux.exe Bootloader installer for Windows.

/syslinux/kernel Compressed Linux kernel image

/syslinux/initrd Compressed initial ramdisk containing files needed to access the compressed Linux filesystem.

/syslinux/squash.bin Compressed Linux filesystem with all files needed to run MODS.

/syslinux/syslinux.cfg Bootloader configuration file containing command line for the Linux kernel.

/syslinux/commands Linux shell commands which are executed after the system finishes loading itself. This file can be edited on Windows.

/mods/mods.tgz MODS package.

/mods/args Arguments for MODS. This file is explicitly specified in /syslinux/commands. It can be edited on Windows.
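
After unpacking the package, the presence of the files listed above can be verified with a short script. The following Python sketch is an illustration only; the function name is hypothetical, and the file list is copied from the table above. Name comparison follows the host filesystem, so on a case-sensitive mount the names must match exactly.

```python
import os

EXPECTED_FILES = (
    "ldlinux.sys",
    "syslinux.exe",
    "syslinux/kernel",
    "syslinux/initrd",
    "syslinux/squash.bin",
    "syslinux/syslinux.cfg",
    "syslinux/commands",
    "mods/mods.tgz",
    "mods/args",
)


def missing_files(root, expected=EXPECTED_FILES):
    """Return the expected files that are not present under root."""
    return [
        name
        for name in expected
        if not os.path.isfile(os.path.join(root, *name.split("/")))
    ]
```

For example, missing_files("e:\\") on Windows, or missing_files("/mnt/dos") once the drive is mounted on Linux, returns an empty list when the disk is complete.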

3.9.3 Usage

After making the disk bootable, insert or connect it and set the BIOS to boot from it.

You can edit and customize the /syslinux/commands file to load additional kernel
modules, initialize networking, send mods.log created by MODS over the network, etc.

You can also customize the MODS arguments in file /mods/args.


When you boot the Linux distribution, it will load Linux and execute the commands in
/syslinux/commands, which by default will launch MODS with the arguments specified in
/mods/args.

3.9.4 Running MODS again

The default image will run MODS immediately after boot. To run MODS again:

$ cd /mnt/dos/mods

$ /tmp/mods/mods @args

You can also replace "@args" with actual arguments.

3.9.5 How It Works

The following explains what happens from the moment the BIOS boots from the Linux disk:

First the bootloader locates the /syslinux/syslinux.cfg file and determines where to find the
kernel and the initial ramdisk.

The bootloader loads the kernel (/syslinux/kernel) and the initial ramdisk
(/syslinux/initrd) to memory and uncompresses them.

Then the bootloader invokes the kernel code.

The kernel boots and initializes all devices it has drivers for.

After the kernel finishes booting it runs the /linuxrc script located in the initial ramdisk.
This is our mods-linuxrc0 script.

The script mounts basic filesystems (/dev, /proc, /sys), finds the drive where the DOS
filesystem is located, mounts it and then mounts the squashed Linux root filesystem. It
also prepares a new ramdisk which will become the new, final root filesystem.

The script executes another, final /linuxrc script from the squashed root filesystem. This
is our mods-linuxrc1 script. At the same time it rotates the root directory so that the final
squashed root filesystem becomes the main root filesystem.

The second linuxrc script again mounts basic filesystems under the new root filesystem,
loads the MODS driver and then executes commands from /syslinux/commands.

4.0 GPU TESTS


A typical GPU test performs the following operations:

Disable the windowing system to take over the entire screen.


Set the display mode and refresh rate.

Loop N times:

Exercise some aspect of the graphics hardware.

Read back the resulting image. Calculate a 32-bit CRC, or possibly a checksum, and
compare it against the known correct value (golden value) for this GPU version and
platform. For video and cursor tests use the hardware DAC CRC.

If the calculated value does not match the golden value, report an error and abort the
loop. Optionally, capture image file(s) in .TGA format for failure analysis.

Restore previous display mode and refresh rate.

Release screen to windowing system.

Report test status.

Each test carefully chooses the random test parameters, i.e. invalid values are avoided,
edge cases are properly covered, and proper weighting is given to more common cases.
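
The loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not MODS code: run_test_loop and fake_read_back are hypothetical names, the "image" is a stand-in byte string rather than a real readback, and zlib.crc32 stands in for whatever 32-bit CRC the tests actually compute.

```python
import zlib

def run_test_loop(read_back_image, golden_crcs):
    """Return (passed, failing_loop) after comparing each loop's CRC to its golden value."""
    for loop, golden in enumerate(golden_crcs):
        image = read_back_image(loop)          # exercise the hardware, read back pixels
        crc = zlib.crc32(image) & 0xFFFFFFFF   # 32-bit CRC of the image bytes
        if crc != golden:
            return False, loop                 # report the error and abort the loop
    return True, None

def fake_read_back(loop):
    # Hypothetical stand-in for a real readback: a deterministic 16-byte "image".
    return bytes((loop * 7 + i) & 0xFF for i in range(16))

# Golden values would normally be stored per GPU version and platform.
goldens = [zlib.crc32(fake_read_back(i)) & 0xFFFFFFFF for i in range(4)]
print(run_test_loop(fake_read_back, goldens))
```

With matching golden values the sketch reports a pass; changing any one golden value makes it stop at that loop and return its index.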

4.1 Test Descriptions


Normally, MODS is invoked by using the command-line:

Table 5. List of GPU tests

T# Test Name Description

1 CheckConfig This configuration test is run if one of the command-line options
listed below is used with gputest.js.

-subsystem Check the subsystem vendor and device IDs.

-device_id Check the device ID.

-revision Check the NVIDIA chip revision.

-foundry Check the NVIDIA chip foundry.

-tv_encoder Type of TV encoder.

-require_displays Specify required present and not-present display masks.

-tv_encoder_strap Desired TV standard of encoder at boot time.

-ram_config_strap Kind of FB memory attached to GPU.

-io_3gio_config_strap Reference into LUT for IO3GIO pad configuration.

-user_strap Check user strap value in CheckConfig.

-verify_sku Checks if the fuses are burnt for the given SKU.

2 GLStress GLStress is an OpenGL-based graphics stress test. It first clears its
surface to black, then repeatedly draws a variety of textured, lit triangle meshes
that fill the surface. By using color logic-op XOR on an originally black surface,
and always drawing each mesh twice, we know that the result should be black. Any
rendering inconsistency will result in errors that will be preserved to the end of
the test by the subsequent XOR drawing. At the end of the test, we report a failure
if any pixels are not black. In addition to lighting and texturing, the test uses
depth-test and stencil-test. This test is not included in some MODS builds.
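
The XOR property that GLStress relies on can be illustrated with a small Python model. This is a sketch under stated assumptions, not the test itself: the surface is a list of integers, each "mesh" is just a list of pixel values, and run_xor_stress and its glitch parameter are hypothetical names used to simulate one corrupted draw.

```python
def run_xor_stress(meshes, size, glitch=None):
    """Draw every mesh twice with XOR; return indices of non-black pixels.

    glitch is an optional (draw_index, pixel, bad_bits) tuple that corrupts
    one pixel of one draw, simulating a rendering inconsistency.
    """
    surface = [0] * size                      # surface starts out black
    draw = 0
    for mesh in meshes:
        for _ in range(2):                    # each mesh is always drawn twice
            for i in range(size):
                value = mesh[i]
                if glitch is not None and glitch[0] == draw and glitch[1] == i:
                    value ^= glitch[2]        # inject the simulated error
                surface[i] ^= value           # color logic-op XOR
            draw += 1
    return [i for i, pixel in enumerate(surface) if pixel != 0]

meshes = [[(k * 0x01010101 + i) & 0xFFFFFFFF for i in range(8)] for k in range(1, 4)]
print(run_xor_stress(meshes, 8))                       # no errors: all black
print(run_xor_stress(meshes, 8, glitch=(1, 5, 0x10)))  # error survives to the end
```

Because every later mesh is also drawn twice, later draws cancel themselves out and leave the injected error untouched, which is why a single bad draw is still visible at the final check.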

3 MatsTest A generic frame buffer memory test designed to catch coupling faults
within memory arrays. Stepping both up and down through the array as well as
alternating reads and writes is important for catching certain cases of
array-coupling faults. The test indicates which data bits fail, front or back banks,
which memory lane, and whether the failure was on a read or a write. The -matsinfo
flag can be used in conjunction with this test for extra error reporting.
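
To show why stepping both up and down matters, here is a generic march-style sketch in Python. It is not the MODS implementation: the Memory class, its aggressor/victim coupling model, and the march function are all hypothetical, chosen only to make a coupling fault visible.

```python
ZERO, ONE = 0x00000000, 0xFFFFFFFF

class Memory:
    """Toy memory; writing ONE to the aggressor cell flips bit 0 of the victim."""
    def __init__(self, size, aggressor=None, victim=None):
        self.cells = [0] * size
        self.aggressor, self.victim = aggressor, victim

    def write(self, addr, value):
        self.cells[addr] = value
        if addr == self.aggressor and value == ONE:
            self.cells[self.victim] ^= 1      # simulated coupling fault

    def read(self, addr):
        return self.cells[addr]

def march(mem, size):
    """Ascending then descending pass, alternating reads and writes."""
    errors = []
    for addr in range(size):
        mem.write(addr, ZERO)
    for addr in range(size):                  # up: read ZERO, write ONE
        if mem.read(addr) != ZERO:
            errors.append(("up", addr))
        mem.write(addr, ONE)
    for addr in reversed(range(size)):        # down: read ONE, write ZERO
        if mem.read(addr) != ONE:
            errors.append(("down", addr))
        mem.write(addr, ZERO)
    return errors

print(march(Memory(8), 8))
print(march(Memory(8, aggressor=5, victim=2), 8))
print(march(Memory(8, aggressor=2, victim=5), 8))
```

In this model a victim below its aggressor is only caught on the descending pass, and a victim above its aggressor only on the ascending pass, so a one-directional sweep would miss one of the two cases.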

4 EvoCurs Test the cursor rendering circuitry. This test randomly positions
the cursor and performs a DAC CRC to verify if the rendered
cursor is correct. This test cycles through all combinations of
display devices so that all heads get tested.

7 EvoOvrl Test the GPU's overlay video circuitry. This test reads a given
YUV image from specific location with certain size, and renders
it as an RGB image at a specific screen location, pixel size, and
magnification. A DAC CRC is used to verify if the rendered
image is correct.

8 CheckHotPlug Interactive display hotplug test.

9 Random2d This is the same as Rnd2dTest.FbRun, but it renders to NonCoherent
memory and is shorter. Both tests draw the exact same thing (in fact, the same CRCs
are used).

11 EvoSor Display pipeline loopback test for G80 and higher.

12 EvoSli Tests whether the SLI video-bridge is working correctly or not by
computing CRCs of scanned-out image data.

17 ValidSkuCheck The purpose of this test is to confirm that the board matches a
valid SKU configuration. It is used to catch these failures:

1. Cards that have been mis-sized by the video BIOS’ sizing algorithm

2. Incorrect "GL-ness" (for workstation boards)

3. Incorrect number of TPCs

4. Incorrect number of SMs

5. Incorrect number of ROPs

6. Incorrect number of framebuffer partitions

7. Incorrect fuse configuration

18 ByteTest Just like Mats, except that it performs 8-bit reads/writes instead of
32-bit reads/writes.

19 FastMatsTest Similar to Mats, except that it uses GPU hardware writes instead of
CPU writes.

The algorithm works like this:

1. Divide the framebuffer into "boxes" of 64x64 pixels.

2. Render a pattern to a random rectangle.

3. DMA the contents of that rendered box to system memory.

4. Pick a different random rectangle and render a pattern to it.

(Note: #3 and #4 occur in parallel to make the test more stressful.)

5. Start the rendering of the pattern to the next random rectangle.

6. Check the contents of the box DMA'd to system memory in #3.

(Note: #5 and #6 occur in parallel to make the test more stressful.)

7. Repeat steps 3-6 for all boxes.

8. Repeat steps 3-7 for all patterns.

22 CheckDisplay Display a red-green-blue-white diagonal image on the specified
display head and display type for a visual inspection. This is an interactive test.

23 GLStressDots This OpenGL-based test renders offscreen and uses a GPU shader to
check for errors without reading back the color buffer over the PCIE bus once per
second. Dots are drawn to the screen to indicate loop count. It produces a slightly
"bursty" power load (less total power than test 2) but at fixed frequency (varies
per board).

24 CheckHDCP Checks to make sure HDCP is working. Requires an HDCP-capable GPU and
an HDCP-capable display. This test is enabled by default on GPUs that support HDCP.
This means that if your card supports HDCP but you do not test with an HDCP display,
the test will fail. This is the intended behavior. If the user wishes to waive HDCP
testing, he or she can use the “-skip 24” command-line option.

25 MSDECTest This is a test for the video decompression engine. This engine is
also called “VP3.”

30 PcieSpeedChange The primary purpose of PcieSpeedChange is to test the ability to
change between PCIE Gen1, Gen2, and Gen3 speed. There are two phases to this test.
The first phase switches the bus speed between Gen1, Gen2, and Gen3, and checks if
we can read back the current speed correctly. The second phase switches bus speed,
and does two DMA operations between system memory and frame buffer after the speed
change. At the end of each DMA operation, MODS will also check whether a PCIE error
has been flagged. The DMA operations will construct a 'result surface' that MODS
will validate.

31 CheckThermalSanity If there is a thermal measuring device on-board, this test
makes sure that the values returned are reasonable and not out of bounds.

33 GetDisplayConfig Get the display configuration, i.e. print the attached display
devices on each display head.

34 CheckDisplayBars Display red-green-blue-white bars on all display heads for a
visual inspection. This is an interactive test.

35 CheckDisplays Display a red-green-blue-white diagonal image on all display heads
for a visual inspection. This is an interactive test.

36 StereoTest Checks if the stereo hardware is working correctly. This is an
interactive test and requires special hardware such as stereo glasses or an LED
wired to the stereo pin output.

38 CheckFpGray Displays a special gray image on all flat panels for a visual
inspection. This is an interactive test.

41 MultiBoardDma This test is based on DmaTest.RunTest. It has been modified to keep
2 GPUs and the CPU busy at once to stress multi-board, including peer-to-peer DMA
and broadcast.

42 CheckDisplayBar Display red-green-blue-white bars on the specified display head
and display type for a visual inspection. This is an interactive test.

43 SpdifCheck An SPDIF cable check. The supporting chipset outputs an SPDIF signal
from the motherboard. An SPDIF cable coming out of the motherboard should be plugged
into the graphics card. Using the Azalia chipset, this test outputs 3 different
sampling frequencies and expects the GPU to see that the sampling frequency changed.

44 SecTest GPU mfg test for the SEC (SECurity) engine. The SEC engine is a DMA
engine that also handles encryption and decryption, as required by HD-DVD video data
in memory-spaces that might be accessible to hackers trying to make unlicensed
copies of movies. This tests a randomized sequence of transfers using all possible
features. Encrypts and decrypts are checked by first confirming that the data
mismatches the source data, then performing the inverse operation to restore the
original data and checking again for a match.

45 CheckFpStripes Display a special stripe image on all flat panels for a visual
inspection. This is an interactive test.

48 CheckFbCalib The test checks whether auto-calibration of the FB interface ended
with reasonable values. When auto-calibration ends with maximum strength values
(which the test should flag), it may mean that something is wrong with the interface
and that memory transactions to FB may become corrupted. Usually this can be
corrected by changing voltages, FB calibration resistors on the PCB, etc.

50 I2CTest Check if the GPU's external I2C bus is properly equipped with
pull-up resistors.

51 CheckTvo Check TV-out functionality.

52 MarchTest This is an alternate way to call the Mats test (see test 3). This
version does a "marching ones and zeros" memory pattern.

54 GLRandomCtxSw glr_ctxsw is glr_hwtest with GLStress running in the background.
Both tests must pass.

58 Random2d This is a combined 2d rendering test. It tests blit (rectangular pixel
region copy), 2d line, triangle, rectangle, and text drawing, texture downloading
and format conversion including palette lookup and dithering, image scaling and
stretching, and video colorspace conversion with compositing.

63 Optimus Tests the GPU power down, power up, and re-initialization in Optimus
notebooks.

65 CheckOvertemp Checks if the GPU has overheated during the test.

69 CheckHiResCrcs Check that the DAC can handle high-resolution video modes
and still scan them out correctly.

70 PatternTest This is an alternate way to call the Mats test. This version uses a
memory pattern supplied by Apple.

71 AppleGL A port of Apple’s OpenGL test. This test only runs on the
Macintosh version of MODS.

73 HeatStressTest Thermal.RunStress is similar to RmStress.Run in that it runs the
RM's internal stress test, but instead of using MODS code for surface alloc/init,
looping, and surface testing, it just does the same ConfigSet call that the win32
"auto-overclock" control-panel utility uses to see if a given nv/dramclk setting is
stable. It doesn't return much data.

74 AppleAddrTest This is a variant of Mats using an address-on-data pattern.

75 AppleKHTest This is a variant of Mats using the Knaizuk-Hartmann test pattern.

76 AppleMOD3Test This is a variant of Mats using the MOD3 test pattern.

78 CheckFanSanity Checks that:

1. The fan is spinning.

2. The fan RPM at 100% PWM is at least 30% more than the RPM at 30% PWM.

3. The fan RPM at 65% PWM is the average of the fan speeds at 30% PWM and 100% PWM,
within a 30% tolerance.

4. The fan RPM at 30% PWM is more than 500 RPM.

5. The fan RPM at 100% PWM is more than 2000 RPM.
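
The five fan checks can be written down as a small Python function. This is a hedged sketch, not the MODS implementation: check_fan_sanity is a hypothetical name, "spinning" is assumed to mean a non-zero RPM at 100% PWM, and the 30% tolerance in check 3 is assumed to be relative to the average of the two end-point speeds.

```python
def check_fan_sanity(rpm30, rpm65, rpm100):
    """Return the numbers of the failed checks (an empty list means pass)."""
    failed = []
    if rpm100 <= 0:                              # 1. fan must be spinning (assumed meaning)
        failed.append(1)
    if rpm100 < 1.3 * rpm30:                     # 2. at least 30% more than at 30% PWM
        failed.append(2)
    average = (rpm30 + rpm100) / 2
    if abs(rpm65 - average) > 0.3 * average:     # 3. midpoint within assumed 30% tolerance
        failed.append(3)
    if rpm30 <= 500:                             # 4. more than 500 RPM at 30% PWM
        failed.append(4)
    if rpm100 <= 2000:                           # 5. more than 2000 RPM at 100% PWM
        failed.append(5)
    return failed

print(check_fan_sanity(1000, 1700, 2500))
print(check_fan_sanity(400, 500, 450))
```

The first call models a healthy fan and fails no checks; the second models a fan that barely responds to PWM and fails checks 2, 4 and 5.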

79 TurboCipher Test the TurboCipher data encryption engine.

81 GLRandomHwTestCrc Obsolete test. Replaced by the new GLRandom tests, which are
tests 130 through 141.

83 CheckVbridge Tests for the presence of an SLI video bridge. This test is enabled
with the -check_vbridge command-line option.

84 HdmiLoopback Internal HDMI loopback test. The HdmiLoopback test works in
different ways depending on whether the GPU being tested has an onboard Azalia
controller or not.

For GPUs that do not have an on-board audio controller, the loopback for this test
is as follows:

a. From SPDIF-out of chipset to SPDIF-in of GPU

b. HDMI port of GPU to HDMI port of display

c. Audio-out of display to line-in of chipset

For GPUs that have an on-board audio controller, the loopback for this test is as
follows:

a. HDMI port of GPU to HDMI port of display

b. Audio-out of display to line-in of chipset

85 HdmiCrcTest HDMI test. See section 9.1.

86 CheckHdmi HDMI test. See section 9.1.

87 CudaMatsTest Deprecated and replaced by Test 143.

88 CudaLrfTest This tests the allocation of thread local variables in CUDA. These
are allocated from the local register file (LRF).

89 CudaGrfTest This tests the allocation of shared variables in CUDA. Variables are
shared in the sense that all the threads within a given cooperative thread array can
access the value of the variable. Such variables are allocated from the global
register file (GRF).

90 FbioLinkTest This is a simple FB memory interface test that uses the GPU's
built-in FBIO training engine to generate the read/write traffic
and count errors. Our other FB tests use either CPU traffic over
the PCI-E bus (Mats) or the 2d engine (FastMats, WfMats) or 3d
engine (RmStress, GLStress). In theory the built-in FBIO
training engine should be usable by runtime resman operations
to adapt a board on each boot. We're prototyping such adaptive
training here. Before the FBIO link training engine starts the test
operation, it blocks all other memory clients and swaps in a
whole alternate set of "tunable" FBIO registers: read and write
strobe timing and voltage-ref. This allows tuning to proceed
to fairly extreme values without corrupting instance memory.

91 CudaDgemmTest Deprecated and replaced by Test 190.

92 GLStressPulse This test runs a reduced-resolution version of GLStress with pauses
between bursts of frames, to produce an 11 kHz 50% duty cycle load on the power
supply. It sweeps its pulse frequency rather than using a fixed frequency. The
actual frequencies vary with board performance, but GT200 sweeps over about 1 kHz to
70 kHz.

93 NewWfMatsShort An alternate run of NewWfMats (see test 94) which runs only
the blit loop with no CPU loop at all.

94 NewWfMats NewWfMats divides the FB memory up into many 2d "boxes", each 1024
pixels wide by BoxHeight lines high. These are put into two lists, one for CPU
read/write (as in the MATS test) and the other for the GPU blit loop. This is an
updated version of the WfMats test for new GPUs. New features include bandwidth
reporting (GB/sec in the blit loop) and new controls of CPU box-list such that test
time will be about the same for any board regardless of FB size.

Each box-list is shuffled randomly so that our read/write operations jump around in
the FB address space, allowing us to catch addressing errors that sequential
operations would miss.

The duration of the test is controlled by the size of the CPU box
list. By default, the CPU box list contains about 1/8th of all FB
memory. The Coverage property reduces this ratio, reducing
test time. The End property limits the FB memory in total, also
reducing test time.

When each CPU box has been read/written with each pattern in the
CpuPattern list (default is 4 patterns: 0x00000000, 0xffffffff,
0xaaaaaaaa, 0x55555555), the blit loop is stopped and the blit
boxes are checked for errors.

The blit loop boxes are each initially filled with a different
pattern, by default using all 29 patterns supported by the
PatternClass object. Each time the blit loop runs, each box is
copied to the next box 29 times so that at the end of the loop
each box returns to its original pattern.

Errors in reading or writing accumulate for the duration of the test, and are
recorded at the end when each box is read back by the CPU and checked against its
original pattern.

The errors detected by the blit loop cannot be definitively mapped to any particular
memory cell, since they accumulate across many boxes. But WfMats is careful to start
each box at an address that is a multiple of the partition-swizzling interleave, so
that we know the most important info about the error: partition, burst-order,
byte-lane, data-bit.
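
The blit-loop behavior described for NewWfMats (each box copied to the next box as many times as there are patterns, so every box ends up with its original pattern) can be modeled as a rotation. The Python sketch below is illustrative only: blit_loop is a hypothetical name, each box is reduced to a single 32-bit value, and the 29 pattern values are arbitrary placeholders, not the patterns of the PatternClass object.

```python
NUM_PATTERNS = 29

# Placeholder patterns (distinct 32-bit values), not the real PatternClass set.
patterns = [(0x9E3779B1 * (i + 1)) & 0xFFFFFFFF for i in range(NUM_PATTERNS)]

def blit_loop(patterns, corrupt_pass=None):
    """Rotate the boxes once per pass; return indices that differ from the originals."""
    boxes = list(patterns)
    for n in range(len(boxes)):
        boxes = [boxes[-1]] + boxes[:-1]      # copy every box into the next box
        if n == corrupt_pass:
            boxes[0] ^= 0x4                   # simulate a single-bit error in one copy
    return [i for i, (box, pat) in enumerate(zip(boxes, patterns)) if box != pat]

print(blit_loop(patterns))                    # clean run: every box is back to its pattern
print(blit_loop(patterns, corrupt_pass=10))   # the error shows up in a different box
```

In the corrupted run the bad value is found in box 18 even though the error was injected while writing box 0, which mirrors why blit-loop errors cannot be pinned to one memory cell.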

95 GLStressFbPulse This is a variant of GLStressPulse, in MODS 177.5 and later. It is
intended to catch power-supply problems in the FB section of the board. This is
achieved by using smaller minimum chunks of drawing to hit a higher pulse frequency
and tuning the GLStress options to increase the framebuffer interface workload. Be
aware that GLStressFbPulse is still experimental and is not run by default.

97 GLRandomTpc Runs the GLRandom_hwtest, which issues random graphics operations
through the OpenGL driver. This test checks CRCs per texture processing cluster
(based on screen coordinates), which allows it to isolate any failures to a
particular TPC.

98 CudaStress CudaStress (test 98) is a CUDA-based compute stress test. It is
intended to keep the GPU busy with minimal CPU loading (to avoid being CPU bound
even when run multi-threaded on host side).

It has LoopMs and KeepRunning test-time controls like GLStress, and is good for
running in the background behind other tests for context-switching stress.

It uses a fairly small FB memory footprint, so should be mostly GPU bound rather
than FB bound.

99 DPLoopback This test is similar to the HdmiLoopback test except for 2 things:

1. Instead of the HDMI port of the GPU, we use the DP port to carry audio.

2. You need a display which enables audio over DisplayPort (which is currently hard
to find).

100 CpyEngTest This test focuses on testing the CopyEngine by performing a series of
copies between surfaces and checking the results in a manner similar to GpuDma. The
surfaces can vary in size and must be in either frame buffer or system memory. They
can have either block linear or a pitch layout. Once a copy is completed, if either
the source or destination surface is in the frame buffer, that surface is copied
again to system memory. This is done to speed up the integrity check. The source and
destination surfaces (or their copies in system memory) are then compared to make
sure the copy was performed correctly. If an error is detected and we have performed
the secondary copies (moving a surface from frame buffer to system memory to speed
up the check), then we will also check the surfaces in the frame buffer to see if
the secondary copies are at fault.

101 Elpg This is the most basic MODS power gating test based on PMU messages.

First, MODS allocates graphics, video and VIC engines (if supported) and waits for
the PMU to signal power gate. The test then attempts to check whether power gating
can be disengaged when MODS attempts to do a privileged register read or push buffer
flush on the corresponding engines.

For chips that support power rail gating, this test will also attempt to make sure
that the chip can enter and exit power rail gating.

102 DispClkStatic Iterate through all available dispclk perf points and run the
display tests (4, 7, 11) at each dispclk point.

103 IntAzaliaLoopback Some GPUs starting with GT21x have an onboard Azalia audio
controller. This test makes sure that the Azalia controller is working by creating a
loopback between different codecs. It sends an output stream through one codec and
tries to receive it through another codec. It then compares the input stream to the
output stream. If the stream cannot be read or was corrupted, the test fails.

104 PcieLinkTest This test stresses ASLM (link width change) by changing the
PEX link width and throws bursty data on the PEX bus. The
test checks for whether the link width change is successful,
determines the correctness of the data transfer to sysmem and
FB, and also checks whether PCIE errors occurred during the
test.

105 I2CSTest The GPU's internal sensor can act as a slave I2C device. On a system
that has another master (as sometimes happens in notebooks), the GPU temperature can
be read through an I2C read on the I2CS port. This test checks whether this
interface is functional.

Requirements: a special board is required. The test attempts to issue an I2CS read
by issuing a read through the I2C port. As a result, the board must have a loopback
between the I2CS port and the GPU's I2C port.

We try to verify that the value read by I2C -> I2CS is the same
as the value read back through RM->register reads
(Subdev.Thermal.ChipTempViaInt).

106 KFuseSanity This checks whether valid HDCP keys were blown into the fuse block
and verifies that the keys are not all zero. It also confirms that the CRC of the
KFuses is correct.

107 PStateTest This is a legacy PState switching test. (A PState is a combination of
voltages and clocks and may include some other power-saving features.) The test does
random PState switches (picked from the available PStates in the perf table) while
doing bursty DMA transfers between system memory and frame buffer memory.

The test checks the integrity of the data transfers and whether PCIE errors
accumulated during the test. In addition, since link width change and link speed
change are tied to PState switches, this test will also attempt to verify that the
correct link speed and link width are set for each PState change.

For PState 2.0 and GPU Boost systems, this test has been replaced by test 145.

108 LineTest This is a verification test to detect reads-passing-writes bugs in
corelogic hardware.

110 CudaBoxTest This is a variant of test 3 (MatsTest) that uses CUDA instead of
CPU “dumb framebuffer” accesses to exercise memory.

111 CudaByteTest This is a variant of test 18 (ByteTest) that uses CUDA instead of
CPU “dumb framebuffer” accesses to exercise memory.

112 CudaMarchTest This is a variant of test 52 (MarchTest) that uses CUDA instead
of CPU “dumb framebuffer” accesses to exercise memory.

113 CudaPatternTest This is a variant of test 70 (PatternTest) that uses CUDA instead
of CPU “dumb framebuffer” accesses to exercise memory.

114 CudaAppleMOD3Test This is a variant of test 76 (AppleMOD3Test) that uses CUDA
instead of CPU “dumb framebuffer” accesses to exercise memory.

115 CudaAppleAddrTest This is a variant of test 74 (AppleAddrTest) that uses CUDA
instead of CPU “dumb framebuffer” accesses to exercise memory.

116 CudaAppleKHTest This is a variant of test 75 (AppleKHTest) that uses CUDA
instead of CPU “dumb framebuffer” accesses to exercise memory.

117 ClockPulseTest This test programs the GPU's thermal-slowdown feature to “pulse”
the effective “gpcclk” speed and thus provide a stress to the power supply. This
test only works on the hardware slowdown part of the subsystem we know as the
thermal slowdown. It is intended to be run as a background test, while graphics
tests run in other threads, to detect any data corruption caused by power-supply
glitches.

For example you might run it this way:

mods gputest.js -engr -test 2 -bgtest 117

119 CudaRandom This test uses CUDA to test all of the single- and double-precision
mathematical operations for a given compute capability. It is designed to verify
consistency, not accuracy, of these operations.

120 CheckPowerPhases This test creates a high graphics processing load on the GPU
and reads values from the power controller on the board to check whether all power
phases are giving the expected output.

122 ElpgGraphicsStress This test toggles graphics ELPG while running GLStress. This
is to make sure that engaging power gating has no effect on the correctness of
graphics operations.

Through command line options, this test can additionally toggle Video ELPG in the
background. This creates extra noise and a larger change of di/dt on the power rail.

123 NewWfMatsBusTest This is a memory test that tries to determine if memory
failures occur on reads or on writes. The amount of IO done by the GPU is
deterministic (and very small!). The CPU does no IO at all.

It follows this procedure:

set dramclk to WriteSpeed

fill gpu-loop boxes with initial patterns (via sysmem->FB blits)

set dramclk to ReadSpeed

read back (via FB->sysmem blits) and verify each gpu-loop

124 ElpgVideoStress This test engages video ELPG in the background while running
MSDEC (test 25). The purpose is similar to test 122. It can
additionally toggle Graphics ELPG in the background as well.
This creates noise and larger change of di/dt on the power rail.

125 DeepIdleStress This test runs a version of GLStress that periodically forces a
transition into the Deep Idle low power state. The graphics
operations generated by the OpenGL driver force the transition
out of the low power state. This test requires the VBIOS to
support P-State 12, and the upstream bridge device connected
to the GPU to support L1 ASPM.

126 GLRandomOcg Test the internal OpenGL shader compiler by creating a large
amount of randomly generated vertex/geometry/fragment
programs and then issue random graphics operations through
the OpenGL driver. This test is a derivation of test 16. The
major difference is that a new set of random shaders are created
at the start of each loop instead of each frame. This is a
consistency test not an accuracy test.

127 CudaColumnTest FB memory test for long time retention of data in DRAM cells
with sparse write changes. Designed to expose spurious bit flips
correlated to refresh cycles and content of DRAM.

128 DeepIdleVETest This test validates nvDPS and Deep Idle Video Enabled
functionality. Random rectangles are rendered to the screen via
the 2D engine and periodically the rendering is paused. When
this occurs the nvDPS hardware detects a lack of screen activity
and signals entry into the Deep Idle Video Enabled state (a low
power state with display enabled, and a forced lower refresh
rate). This test has the same requirements as DeepIdleStress,
but in addition it also requires a LVDS or eDP display.

130 GlrA8R8G8B8 Test the 3-D graphics engine by issuing random graphics
operations through OpenGL driver. It uses normal 32-bpp
color/Z, i.e. a8r8g8b8 & s8d24

131 GlrR5G6B5 Directed OpenGL test for normal 16-bpp color/Z, i.e. r5g6b5 &
d16

132 GlrFsaa2xQx Directed OpenGL test for 32-bpp color/Z, 2x
full-scene-anti-aliasing

133 GlrFsaa4xGs Directed OpenGL test for 32-bpp color/Z, 4x
full-scene-anti-aliasing

135 GlrMrtRgbU Directed OpenGL test for 3-way Multi-Render-Target 32,24,16
[r8g8b8a8,r8g8b8,r5g6b5] & 32-bpp Z

136 GlrMrtRgbF Directed OpenGL test for 2-way Multi-Render-Target floating-point
64,128 [rgba_F16, rgba_F32] & 32-bpp Z

137 GlrY8 Directed OpenGL test for 8-bpp color (GL_INTENSITY8) & 32-bpp Z

138 GlrFsaa8x Directed OpenGL test for 32-bpp color/Z, 8x
full-scene-anti-aliasing

139 GlrFsaa4v4 Directed OpenGL test for 32-bpp color, 64-bpp Z, 4v4 VCAA
full-scene-anti-aliasing

140 GlrFsaa8v8 Directed OpenGL test for 32-bpp color, 64-bpp Z, 8v8 VCAA
full-scene-anti-aliasing

141 GlrFsaa8v24 Directed OpenGL test for 32-bpp color, 64-bpp Z, 8v24 VCAA
full-scene-anti-aliasing

142 MultiCellFlipTest This is a directed test which targets specific single-bit
failures found on Hynix memory chips. It attempts to minimize modifying adjacent
cells in the same row when testing target cells by limiting modifications to dwords
from four adjacent columns (burst length) across all internal banks, external banks,
partitions and lanes. It also modifies a few rows simultaneously, because it was
discovered that row switching increases failure rate.

143 NewCudaMats This is an improved CUDA-based MATS test for testing framebuffer
memory. Device memory is broken into chunks, each of which is stressed by an
individual CUDA thread. The test fills the memory with every 32-bit pattern from the
chosen pattern set in both ascending and descending directions.

144 CudaMatsPatCombi CudaMatsPatCombi is a test based on NewCudaMats which goes
through various combinations of pattern pairs. Some pattern pair combinations are
more stressful at exciting single bit memory errors than others.

145 PerfSwitch This test attempts to switch between available voltage-frequency
points while running another MODS test in the foreground. There are two modes for
this test:

a. PerfJumps (default mode): The test jumps through the inflection points on the V-F
curve

b. PerfSweep: Sweep through the V-F curves

The default foreground test is GLStress (test 2).

In case of a failure, the error code specifies the foreground test number.

This test is a replacement for PStateTest (test 107) on GPUs supporting GPU Boost.

146 PexBandwidth This test uses CopyEngine to saturate the PCIE bus.

147 GpuGc6Test Test for the GC6 feature. Enter GC6 and exit GC6 by various wakeup
events. In each loop, verify that FB is not corrupted.

148 GlrA8R8G8B8Sys Directed OpenGL test for 32-bpp color Z, with render to System
Memory instead of Frame Buffer

150 MMERandomTest This is a test of the Method Macro Expander, which is a unit at
the front end that can generate pushbuffer methods programmatically via a small
simple language. Random MME programs are generated and their output is routed to a
surface (rather than to host as pushbuffer methods). This output is then checked
against the output of a software MME simulator for consistency.

151 Bar1RemapperTest This test validates that the BAR 1 remapper hardware functions
correctly by creating randomized block linear surfaces with known data and then
reading them back in a pitch linear fashion via the CPU with the BAR 1 remapper
correctly configured.

152 GLStressNoFB GLStress tuned for much reduced FB bandwidth

153 GLPowerStress A more stressful version of GLStress (test 2)

154 CudaL2Mats This test validates the L2 cache on Fermi and newer GPUs. The test
monitors the number of hits and misses to the L2 cache. In order for the test to
pass, the L2 misses must be under AllowedMissPercent, which defaults to 10%.
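
The pass criterion for CudaL2Mats can be expressed as a one-line ratio check. The Python sketch below is an assumption about how the percentage is formed (misses divided by total L2 accesses); l2_mats_passes is a hypothetical name, and only AllowedMissPercent and its 10% default come from the description above.

```python
def l2_mats_passes(hits, misses, allowed_miss_percent=10.0):
    """Pass when the share of L2 misses is under the allowed percentage."""
    total = hits + misses
    if total == 0:
        return False                      # no L2 traffic observed: nothing was validated
    miss_percent = 100.0 * misses / total
    return miss_percent < allowed_miss_percent

print(l2_mats_passes(950, 50))    # 5% misses
print(l2_mats_passes(800, 200))   # 20% misses
```

With the default limit, the first call passes and the second fails.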

155 EccFbTest This is a test of Frame Buffer ECC logic on ECC-enabled boards.

156 EccL2Test This is a test of L2 cache ECC logic on ECC-enabled boards.

157 NewWfMatsMemToMem This is a variant of test 94 (NewWfMats) that uses the “memory
to memory format” engine rather than the “2D rendering
engine” to do framebuffer->framebuffer memory copies. This
engine is less efficient and makes the test run slower, but it is
useful for isolating framebuffer problems on GPUs where the
graphics pipeline is not working correctly.

161 NewWfMatsCEOnly This is a variant of test 94 (NewWfMats) that uses the Copy
Engine rather than the “2D rendering engine” or “memory-to-
memory format” to do framebuffer->framebuffer memory
copies.

170 GpuPllLockTest Ensure that NVPLL/HPLL/SPPLL can be locked properly

174 CheckPwrSensor This test validates that the power sensors on the board match
the description in boards.js

175 GpuResetTest This test validates the suspend resume functionality of the
GPU. Option for reset via XVE or cold reset (like Optimus)

178 WfMatsBgStress This test runs WfMats on Copy Engine while running GLStress
on the 3D engine. This hybrid test is yet another way to stress
the framebuffer interface.

180 NewWfMatsNarrow Runs WfMats in narrow mode. This causes WfMats blits to be
broken into one-pixel wide blits. This causes lower bandwidth,
but better exercises the FBIO byte-enable lines.

185 CudaRadixTest Stresses the GPU using the radix sort algorithm by Duane Merrill

187 CudaMatsShmooTest This test is a variant of test 87 (CudaMatsTest) that iterates with
different input parameters to find the most stressful
configuration for a given board.

190 DPStressTest This is a CUDA based double precision test. On some Tesla
systems, this test was found to be more stressful than GLStress

191 CudaJuliaTest This is a CUDA based double precision test that generates Julia
set fractal images

198 CudaStress2 CudaStress2 is a variant of test CudaStress (test 98). It is tuned
for more stress on Fermi and newer GPUs.

202 GLRandomGCx GLRandom test with GC6 bubbles

205 I2MTest This tests the InlineToMemory class on Kepler GPUs.

225 MSENCTest Test for Video Encoder Engine. This test runs four streams with
different H.264 entropy coding modes (CAVLC and CABAC).

227 CudaColumnShmooTest This test loops test 127 (CudaColumnTest) with various
parameters in an attempt to find specific types of DRAM faults.

231 GlrLayered GLRandom for Layered rendering

247 GpuGc5 Test for GC5 feature. Enter/exit GC6 while verifying the
wakeup reason is correct and verify that FB is not corrupted.

275 BoostBaseClockTest GLStress based test to hit Boost target temperature, clocks,
voltage, and fan speeds.


277 NVDECTest Test for the Maxwell+ NvDec engine

278 NVENCTest Test for the Maxwell+ NvEnc engine

286 FillRectangleTest Test for GL_FILL_RECTANGLE_NV with multi-sampled cases

287 MsAAPR Test GL Path rendering on multisampled render buffers

289 GLPRTir Test for target independent rasterization

293 I2cDcbSanityTest Test to ensure all the I2C devices in the DCB table are stuffed

295 MscgMatsTest Test to exercise MSCG

296 CudaLinpackPulseSP Single precision CudaLinpack Pulse

347 GcxTest New generation of GC6/5 test – intermix the two power saving
states.

999 This test number is reserved for external scripts.

5.0 TEST RESULT


As each test is executed, it is logged to the log file when it begins, and when it ends. By
default the log file name is mods.log. The log file name can be changed via the ‘-l’
command line argument to MODS.

When a test begins, the following message is printed to the log file. The portion in
brackets is only printed if the -time command-line option is used.

Enter FastMatsTest.Run [Thu Jan 11 14:42:47 2001]

When a test ends, the following message is printed to the log file.

Exit 19083: FastMatsTest.Run golden value miscompare [5.293 seconds]

In this case, 19083 is the error code of the test. “FastMatsTest.Run” is the name of the
test. “golden value miscompare” is a description of the error code. The time in brackets
is how long the test took to execute. The execution time is only displayed if the -time
command-line option is used.
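
The Enter and Exit lines shown above are regular enough to be picked apart by a script. The following is a minimal sketch in JavaScript, assuming every such line has the same shape as the two samples. The regular expressions and the parseLogLine name are illustrative and are not part of MODS.

```javascript
// Sketch only: the patterns below are derived from the two sample lines in
// this section, not from a formal definition of the mods.log format.
const ENTER_RE = /^Enter\s+(\S+)(?:\s+\[(.+)\])?\s*$/;
const EXIT_RE = /^Exit\s+(\d+):\s+(\S+)\s+(.*?)(?:\s+\[([\d.]+) seconds\])?\s*$/;

// Returns a small record for an Enter or Exit line, or null for anything else.
function parseLogLine(line) {
  let m = ENTER_RE.exec(line);
  if (m) {
    // The timestamp is only present when -time was used.
    return { kind: "enter", test: m[1], timestamp: m[2] || null };
  }
  m = EXIT_RE.exec(line);
  if (m) {
    return {
      kind: "exit",
      code: Number(m[1]),
      test: m[2],
      description: m[3],
      // The duration is only present when -time was used.
      seconds: m[4] ? Number(m[4]) : null,
    };
  }
  return null;
}

console.log(parseLogLine("Exit 19083: FastMatsTest.Run golden value miscompare [5.293 seconds]"));
```

Lines that are neither Enter nor Exit records return null, so the function can be mapped over a whole log and the nulls filtered out.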

5.1 Error Codes


MODS error codes are now 12 digits long. These include
1. One digit to signify whether this test was
a. run at static voltages and clocks
b. switching between inflection points on a V-F curve
c. sweeping the V-F curve
2. Three digits to specify the V-F index at which the test failed. This value is 0 if we
are using legacy P-states.
3. Two digits to represent the P-state used during this run
4. Three-digit test number
5. Three-digit error number

P-states are generally 0, 8 or 12, but others are possible. The test numbers start at 1 and
end at 227. Errors are between 1 and 999.

For example, an error code of 201208119083 would mean that CudaRandom, test 119,
failed in the PerfSweep function of test 145, while at index 12 on the V-F point table, in P-state 8,
with an 83 error, which is a golden value miscompare.


The test number 999 is reserved for scripts and tools external to MODS. Error codes like
“999123” are not returned by MODS itself.
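
The worked example in this section can be reproduced with a few string slices. This is a minimal sketch, not MODS code. The field widths (1, 3, 2, 3, 3) and the mode digits (0 static, 1 PerfJumps, 2 PerfSweep) are assumptions read off the example code 201208119083, and the decodeErrorCode name is hypothetical. Left-padding shorter codes with zeros is also an assumption.

```javascript
// ASSUMPTION: mode digit mapping inferred from the example, where 2 = PerfSweep.
const PERF_MODES = ["static", "PerfJumps", "PerfSweep"];

// Splits an error code into its fields.
// ASSUMPTION: widths 1+3+2+3+3 = 12 digits, inferred from 201208119083.
function decodeErrorCode(code) {
  const s = String(code).padStart(12, "0");
  if (!/^\d{12}$/.test(s)) {
    throw new Error("expected at most 12 decimal digits: " + code);
  }
  const modeDigit = Number(s[0]);
  return {
    perfMode: PERF_MODES[modeDigit] || "unknown(" + modeDigit + ")",
    vfIndex: Number(s.slice(1, 4)),
    pstate: Number(s.slice(4, 6)),
    testNumber: Number(s.slice(6, 9)),
    errorNumber: Number(s.slice(9, 12)),
  };
}

console.log(decodeErrorCode("201208119083"));
```

For the example code this yields PerfSweep, V-F index 12, P-state 8, test 119 and error 83, matching the prose above.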

6.0 DEBUGGING TECHNIQUES

If a card fails, look at the log and attempt to deduce what could be wrong. If the log
is very long, search for keywords like “failure” or “error”.

You may attempt to isolate whether the problem is display related. Adding
-null_display disables the display.

You may attempt to isolate using -test or testspec to find out which test is catching the
problem on the graphics card.

You may attempt to isolate whether the problem is perf related. You may wish to try
these experiments:

Try lowering dramclk and gpc2clk:

    mods gputest.js -mfg -dramclk 100 -gpc2clk 200

    mods gputest.js -mfg -dramclk_percent 85 -gpc2clk_percent 85

Note: Some DDR DRAMs require that the dramclk be above a certain frequency for the
DLL to work. Furthermore, some products require that you keep dramclk and gpc2clk
in less than a 2:1 ratio.

Try looping the test that is failing.

    mods gputest.js -mfg -test 5 -loops 100

Look at the debug-level mods.log output file.

    mods -C gputest.js -mfg

    mods gputest.js -mfg -verbose

    mods -d gputest.js -mfg

If memory tests are failing, you can get extra information on the failure in the log file by
using the -matsinfo command-line option.

    mods gputest.js -mfg -matsinfo


7.0 STAND-ALONE MATS


In certain situations MODS cannot initialize the GPU due to marginal frame buffer
interface timings or defective memory. In such situations you can try running the
stand-alone MATS. You must boot the GPU as primary in order to run MATS. This
utility does a rudimentary test of the framebuffer. It prints its results to the screen
and also to a file named “report.txt”, which contains data about which framebuffer
bits failed. This utility is only available for Linux.

It is not usually necessary to test the entire framebuffer to collect enough error statistics
to be useful. The user can run “mats -c 1”, which will test 1% of memory distributed
throughout the framebuffer. This is useful because it will complete in a very short time
and still produce meaningful debug information in the report.txt file.

8.0 GPU TESTS


This section describes special tests that require unusual configurations or user activity.

8.1 HDMI
MODS includes an HDMI test that uses audio loopback. It requires extra hardware and
setup. Since HDMI-audio requires us to send a SPDIF signal into the board, the test
requires a motherboard that meets the following requirements:

The motherboard must have an electrical SPDIF-out port.

The GPU being tested must have an embedded Azalia audio controller.

To run the loopback test, you must:

Connect the headphone jack on the HDMI display to the line-in or mic-in jack on the
motherboard.

Run either "mods gputest.js -engr -test 84" or "mods gputest.js -engr -test 85". (The
latter runs all the usual gputest tests in addition to the hdmi tests.)

There are some known issues due to differences between various motherboards and
displays:


1. Try the loopback test without the headphone-jack cable. You should hear a hum as
long as the display volume is high enough. This step ensures MODS is correctly
driving an audio signal over HDMI.

2. There's no telling which port number is used for the line-in or mic-in jack. Try
connecting the audio cable and run MODS with different "-line_in" values (-line_in 0,
-line_in 1, etc) until you get a successful loopback -- or at least an error message that
says something like "frequency mismatch" rather than "unexpected silence".

3. Finally, you may need to adjust the volume of the display. It needs to be loud enough
so that the motherboard can "hear" it, but not so loud that the signal gets clipped. An
"unexpected silence" error indicates the volume's too low, while a "frequency
mismatch" error indicates the volume's too high. (A good starting point is to run the
test while listening to a pair of headphones attached to the display. Find a volume
that can be heard but isn’t painfully loud.)
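
The triage in steps 2 and 3 can be written down as a small lookup. This is a sketch under the assumption that the log contains the phrases quoted above; the exact message text may differ, and the suggestNextStep name is hypothetical.

```javascript
// Sketch of the troubleshooting order described in steps 2 and 3.
// ASSUMPTION: matching is done on the quoted phrases only.
function suggestNextStep(errorMessage) {
  const msg = errorMessage.toLowerCase();
  if (msg.includes("unexpected silence")) {
    // No signal reached the motherboard: wrong port or volume too low.
    return "try another -line_in value, then raise the display volume";
  }
  if (msg.includes("frequency mismatch")) {
    // A signal arrived but was distorted: the volume is likely too high.
    return "keep this -line_in value and lower the display volume";
  }
  return "not covered by these steps; inspect the log";
}

console.log(suggestNextStep("AUDIO Loopback test frequency mismatch"));
```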

8.2 HDCP
Some graphics cards support an encryption protocol called HDCP (High-bandwidth Digital
Content Protection). This protocol encrypts data between an HDCP-enabled digital flat
panel and an HDCP-enabled graphics card.

The only way to test HDCP is to enable it with an HDCP-enabled display attached. One
of the goals of MODS is to enforce a textbook-correct test by default. Therefore, an
HDCP test is run automatically on HDCP-enabled cards. The upshot of this is that you
must have an HDCP-enabled display attached when testing an HDCP-enabled card or
MODS will fail. If the user does not want this behavior, then he or she should explicitly
skip HDCP testing using the “-skip 24” command-line argument.

There are three HDCP tests in MODS:

1. A key exchange test. This is done with “mods gputest.js -hdcp_keys”. This test does
not actually enable HDCP; it only does a key exchange and passes if the exchange
was successful. If it passes, the key selection vectors (Aksv and Bksv) are printed in
the log file. There are many types of manufacturing faults that cannot be caught by
this test.


2. The default HDCP test. This test does the key exchange above, then enables HDCP.
If the hardware detects that HDCP was successfully enabled, then the test passes.
This test will catch most (but not all) types of manufacturing problems. In particular,
there is a rare type of defect that can occur when the key exchange and enabling of
HDCP are both successful, but there will be snow on the screen.

3. The interactive test. This does the key exchange and enables HDCP, then prompts
the user to ensure that the display looks OK. The key selection vectors and the HDCP
status (pass or fail) are displayed on the screen. This test is enabled with “mods
gputest.js -check_displays”. See section 8.3 below for more information on
interactive display tests.

There are some displays by specific manufacturers that are slow to enable HDCP
encryption. If you are having problems with a specific display, try using the following
command line arguments individually or in combination:

-hdcp_delay 2000

-hdcp_timeout 5000

8.3 Interactive Display Testing


MODS supports several interactive display tests. There are some types of hardware
faults that can only be detected with an interactive test. This is a list of interactive
display tests and the command lines to enable them:

mods gputest.js -check_display
    Display a slanted red, white and blue pattern on the primary display and
    prompt the user if it is OK.

mods gputest.js -check_display_bar
    Display vertical bars on the primary display and prompt the user if it is OK.

mods gputest.js -check_display_bars
    Display vertical bars iteratively on all possible display combinations and
    prompt the user if each one is OK.

mods gputest.js -check_displays
    Display a slanted red, white and blue pattern on all possible display
    combinations and prompt the user if each one is OK.

mods gputest.js -check_fp_gray
    Display various black, white and gray geometry on all detected DFPs one at a
    time and prompt the user if each one is OK. The pattern displayed has been
    demonstrated to detect some types of TMDS timing and noise issues.

mods gputest.js -check_fp_stripes
    Display a special TMDS stress pattern on all detected DFPs one at a time and
    prompt the user if each one is OK. The pattern displayed has been
    demonstrated to detect some types of TMDS timing and noise issues.

9.0 PERFPOINT TESTING AND TEST SPECIFICATIONS


MODS defaults to iterating over many “PerfPoint”s. A PerfPoint is a set of performance
(clock) settings.

By default, -mfg will run tests at each memory-clock setting (pstate) twice: once at max
shader clocks and voltage, and again at min voltage for that pstate. The PerfPoint
testing infrastructure is built on top of test specifications. Test specifications are lists of
tests that control when a given test is run. A simple example of a test specification is
shown below:

function addEngrStressTests(spec, perfPoints)
{
    spec.AddTests(["RmStress"
                  ,"WfMatsMedium"
                  ,"GLStress"
                  ,"GLStressPulse"
                  ,"NewWfMatsNarrow"
                  ,"GLRandomCtxSw"
                  ]);
}

We can also control the parameters for each test. We have two additional tests that allow
the user to set the PerfPoint and to run user defined functions.

function addSltTests(spec, perfPoints)
{
    spec.AddTest("RunUserFunc", {"UserFunc": SetFanSpeed, "PctOfMax": 42});
    spec.AddTest("SetPState", {"InfPts": perfPoints[0]});
    addSltPerPStateTests(spec);
    spec.AddTest("SetPState", {"InfPts": perfPoints[1]});
    spec.AddTest("RunUserFunc", {"UserFunc": SetFanSpeed, "PctOfMax": 100});
    addSltPerPStateTests(spec);
}

The standard test specifications are visible in gpulist.js. The following examples show how
test specifications can be used.

function addDvsTests(spec, perfPoints)
{
    spec.AddTests(["EvoSli"
                  ]);
    addEngrTests(spec);
    spec.RemoveTests(["FuseRdCheck"
                     ,"GLStressPulse"
                     ,"GLRandomCtxSw"
                     ]);
}

function addSltTests(spec, perfPoints)
{
    spec.AddTest("RunUserFunc", {"UserFunc": SetFanSpeed, "PctOfMax": 42});
    spec.AddTest("SetPState", {"InfPts": perfPoints[0]});
    addSltPerPStateTests(spec);
    spec.AddTest("SetPState", {"InfPts": perfPoints[1]});
    spec.AddTest("RunUserFunc", {"UserFunc": SetFanSpeed, "PctOfMax": 100});
    addSltPerPStateTests(spec);
}

function addSltPerPStateTests(spec)
{
    spec.AddTests(["FuseRdCheck"
                  ,"MultiBoardDma"
                  ,"SMRom"
                  ,"ElpgGraphicsStress"
                  ,"DeepIdleStress"
                  ,"RmStress"
                  ,"WfMatsMedium"
                  ,"GLStress"
                  ,"GLStressPulse"
                  ,"NewWfMatsNarrow"
                  ,"GLRandomCtxSw"
                  ]);
    addEngrComputeTests(spec);
    spec.AddTests(["CheckFbCalib"
                  ]);
}

function SetFanSpeed()
{
    var rc;
    var g = this.BoundGpuSubdevice;
    CHECK_RC(g.Thermal.SetCoolerPolicy(Thermal.CoolerPolicyManual));
    CHECK_RC(g.Thermal.SetFanSpeed(this.PctOfMax));
    Out.Printf(Out.PriHigh, "SetFanSpeed: %d pct, trying for %d pct \n",
               g.Thermal.FanSpeed, this.PctOfMax);
    return OK;
}
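
To see how AddTests, AddTest and RemoveTests compose outside of MODS, the sketch below uses a hypothetical MockSpec class. It only models an ordered list of test entries; the real specification object in gputest.js may behave differently (for example in how it handles duplicates), so treat this as an illustration of the calling pattern rather than of MODS internals.

```javascript
// Hypothetical stand-in for the MODS test-specification object.
// ASSUMPTION: a spec is an ordered list of { name, args } entries.
class MockSpec {
  constructor() {
    this.entries = [];
  }
  AddTest(name, args) {
    this.entries.push({ name: name, args: args || {} });
  }
  AddTests(names) {
    names.forEach((n) => this.AddTest(n));
  }
  RemoveTests(names) {
    // Drop every entry whose name appears in the removal list.
    this.entries = this.entries.filter((e) => !names.includes(e.name));
  }
  names() {
    return this.entries.map((e) => e.name);
  }
}

// Same shape as addEngrStressTests in this section.
function addEngrStressTests(spec) {
  spec.AddTests(["RmStress", "WfMatsMedium", "GLStress",
                 "GLStressPulse", "NewWfMatsNarrow", "GLRandomCtxSw"]);
}

const demoSpec = new MockSpec();
addEngrStressTests(demoSpec);
demoSpec.RemoveTests(["GLStressPulse", "GLRandomCtxSw"]);
console.log(demoSpec.names().join(", "));
// prints "RmStress, WfMatsMedium, GLStress, NewWfMatsNarrow"
```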

There are two functions to allow users to write out and read back in their own test
specifications.

mods gputest.js -mfg -savespec {filename} ...

-savespec will save the identified specification to a filename specified by the user.

mods gputest.js -readspec {filename} ...

-readspec uses the test specification as defined in filename. If -readspec is used, then
do not specify a test specification (-mfg, -slt, etc); this will override the specification in
the user defined file.

Using these two functions, an end user can write out a test specification, modify it as
they see fit, and then use it for their testing.

10.0 CONCURRENT TESTING


Concurrent MODS gives a user the ability to run multiple MODS tests simultaneously
on one or multiple GPUs. These tests can either be set up manually on the command line
or MODS can try to run as many tests concurrently as possible.

Within concurrent MODS, there are two different types of threads: foreground and
background. A background thread is controlled by a foreground thread and will be
stopped once the foreground thread finishes its execution. The specific details of these
background threads can be controlled through the various command-line arguments
listed below.


Note: By default, concurrent MODS is not turned on. You must specify specific
command-line arguments listed below to enable concurrent MODS.

10.1 Command-line Arguments


If an argument is listed as "device sensitive", that means its placement relative to a "-dev
Y" is important. If the argument comes before any "-dev Y" options, it will be applied to
every GPU. If it comes after a "-dev Y" argument, it will only be applied to that specific
device.

Any MODS arguments that are NOT device sensitive must come before any "-dev Y"
on the command line.

The last "-dev Y" used on the command-line will set the primary GPU tested by MODS
unless you're running with the "-concurrent_devices" argument.
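
The placement rule can be illustrated by grouping command-line tokens according to the "-dev Y" that precedes them. This is only a sketch of the rule as stated above, not of how MODS parses its command line; the groupByDevice name and the shape of its result are hypothetical.

```javascript
// Sketch: tokens before the first "-dev Y" apply to every GPU; tokens after
// a "-dev Y" apply to device Y only. The last "-dev Y" names the primary GPU.
function groupByDevice(argv) {
  const result = { allDevices: [], perDevice: {}, lastDev: null };
  let current = result.allDevices;
  for (let i = 0; i < argv.length; i++) {
    if (argv[i] === "-dev") {
      const dev = argv[++i];
      if (dev === undefined) {
        throw new Error("-dev requires a device number");
      }
      result.lastDev = dev;
      if (!result.perDevice[dev]) {
        result.perDevice[dev] = [];
      }
      current = result.perDevice[dev];
    } else {
      current.push(argv[i]);
    }
  }
  return result;
}

const grouped = groupByDevice(
  "-mfg -threadid -dev 1 -bgtest_flags 16 disp,roe -dev 0 -skip 24 -skip 17".split(" "));
console.log(grouped);
```

Here "-mfg" and "-threadid" land in allDevices, "-bgtest_flags 16 disp,roe" is attached to device 1, the two "-skip" arguments are attached to device 0, and lastDev is "0".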

-bgfunc X

A device sensitive argument that will run the given function X as a background
thread. Note, this function needs to call "this.SignalSetupCompleteAndWait()" at
some point. It should also check "this.KeepRunning" to know when to stop.

function foo()
{
    // Initial setup

    this.SignalSetupCompleteAndWait();

    do
    {
        // Do stuff
    } while (this.KeepRunning);
}
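
The background-function pattern can be exercised outside of MODS with a mock object bound as "this". The makeMockBgContext helper below is hypothetical: it only imitates the two members named in this section, and it reports KeepRunning as true for a fixed number of checks instead of following a real foreground thread.

```javascript
// Hypothetical stand-in for the object MODS binds as `this`.
function makeMockBgContext(trueChecks) {
  let remaining = trueChecks;
  return {
    setupSignalled: false,
    SignalSetupCompleteAndWait() {
      this.setupSignalled = true;
    },
    // Reads as true `trueChecks` times, then false.
    get KeepRunning() {
      return remaining-- > 0;
    },
  };
}

let workDone = 0;

// Same structure as foo() above, with a counter as the "stuff".
function countingBgFunc() {
  // Initial setup would go here.
  this.SignalSetupCompleteAndWait();
  do {
    workDone++;
  } while (this.KeepRunning);
}

const bgContext = makeMockBgContext(3);
countingBgFunc.call(bgContext);
console.log(bgContext.setupSignalled, workDone); // prints "true 4"
```

Because the loop is a do/while, the body runs once before KeepRunning is first checked, which is why three true checks give four iterations.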

-bgtest X

A device sensitive argument that will run test number X as a background thread.

Example, to run Random2D as a background thread on device 1 and run
GpuDmaTest as a background thread on all devices:

    mods gputest.js -mfg -bgtest 61 ... -dev 1 -bgtest 58 ...

-bgtest_flags X Y

Similar to -bgtest, but takes a comma-separated list of flags for the second
argument

Current flags are

disp - Display the background test

roe - Run On Error, continue running the background test even if it fails

mods gputest.js -mfg -bgtest_flags 16 disp,roe ...

-concurrent_devices

This argument will run the tests (the set of tests can be different per GPU) on
each GPU in the system, concurrently.

-threadid

This argument simply prepends the ID of the calling thread to the beginning of
each line of text in the log. This is useful for seeing which test or GPU printed
which line.

You will see the name and ID of each thread listed with +++ thread_name
thread_ID +++ in the log

+++ main_tests 10 +++

+++ bg_tests_0 11 +++
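
When reading a log produced with -threadid, it helps to build a table from thread ID to thread name first. The sketch below assumes the announcement lines look exactly like the two samples above; the collectThreadNames name is hypothetical.

```javascript
// ASSUMPTION: thread announcements have the form "+++ name id +++".
const THREAD_RE = /^\+\+\+\s+(\S+)\s+(\d+)\s+\+\+\+\s*$/;

// Returns a Map from numeric thread ID to thread name.
function collectThreadNames(lines) {
  const byId = new Map();
  for (const line of lines) {
    const m = THREAD_RE.exec(line.trim());
    if (m) {
      byId.set(Number(m[2]), m[1]);
    }
  }
  return byId;
}

const threadNames = collectThreadNames([
  "+++ main_tests 10 +++",
  "+++ bg_tests_0 11 +++",
]);
console.log(threadNames.get(11)); // prints "bg_tests_0"
```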


10.2 Command-line Examples


Run test 16 as a "background" thread on device 1 with display and ignore errors, and run the
full MODS suite minus a few tests on device 0. Also print thread information. (Note:
since “-dev 0” comes last on the command line, it will be the primary/foreground GPU
tested by MODS.)

    mods gputest.js -mfg -threadid -dev 1 -bgtest_flags 16 disp,roe -dev 0 -skip 24 -skip 17

Run the full MODS suite on all GPUs in the system concurrently.

    mods gputest.js -mfg -concurrent_devices

Run Random2d and GpuDma on device 0 sequentially, and run Random2d, GLStress and
TurboCipher on device 1 concurrently. Both GPUs run their set of tests at the
same time.

    mods gputest.js -mfg -concurrent_devices -test 58 -dev 0 -test 61 -dev 1 -test 2 -test 79 -concurrent


11.0 ERROR CODES


1 exit
2 software error
3 function is not supported
4 did not install singleton
5 bad command line argument
6 on entry failed
7 bad help string
8 bad parameter passed to function
9 cannot allocate memory
10 cannot open file
11 file does not exist
12 failed while reading a file
13 cannot log method
14 cannot log functions
15 method is still being logged
16 user aborted the script
17 could not create JavaScript engine
18 could not create a JavaScript method
19 could not create a JavaScript object
20 could not initialize the JavaScript standard classes
21 script failed to execute
22 script failed to compile and execute
23 could not compile file
24 cannot convert integer to a jsval
25 cannot convert jsval to an integer
26 cannot convert boolean to a jsval
27 cannot convert jsval to a boolean
28 cannot convert jsval to a float
29 cannot convert float to a jsval
30 cannot convert jsval to a string
31 cannot convert string to a jsval
32 cannot convert jsval to an array
33 cannot convert array to a jsval
34 cannot convert jsval to an object
35 cannot convert jsval to a function
36 invalid object property
37 cannot enumerate object
38 cannot get element
39 cannot set element
40 bad format specification
41 cannot hook interrupt
42 did not initialize resource manager
43 did not initialize resource manager hardware abstraction layer
44 did not map device in to resource manager
45 did not initialize client
46 NVRM invalid base
47 NVRM invalid class
48 NVRM invalid client
49 NVRM invalid device
50 NVRM invalid event
51 NVRM invalid flags
52 NVRM invalid index
53 NVRM invalid limit
54 NVRM invalid object buffer
55 NVRM invalid object error
56 NVRM invalid object new
57 NVRM invalid object old
58 NVRM invalid object parent
59 NVRM invalid offset
60 NVRM invalid param struct
61 NVRM insufficient resources
62 NVRM invalid function
63 NVRM invalid owner
64 NVRM invalid heap
65 NVRM multiple memory types
66 NVRM object has children
67 NVRM object in use
68 NVRM operating system error
69 NVRM protection fault
70 NVRM dual link in use
71 NVRM gpu not full power
72 NVRM invalid dma specifier
73 NVRM no free memory
74 NVRM generic error
75 NVRM irq not firing
76 error occurred while preprocessing file
77 timeout error
78 unsupported depth
79 unsupported surface offset
80 unsupported color format
81 unsupported DOS configuration (EMM is loaded or XMM is missing)
82 stored CRC/Checksum not found
83 CRC/Checksum miscompare
84 file parse error
85 syntax error in FancyPicker configuration
86 incorrect file format
87 failed while writing a file
88 failed to copy memory
89 bad data in trace file or unsupported trace file feature
90 unsupported 3D primitive class
91 failed to render a solid rectangle
92 cannot disable user interface
93 cannot enable user interface
94 memory location must be one of Memory::Fb, Memory::Coherent or Memory::NonCoherent
95 golden value miscompare in Z buffer
96 test configuration has invalid channel type, try TestConfiguration.DmaChannel
97 unexpected device interrupts
98 cannot initialize OpenGL
99 unknown GL error
100 OpenGL error INVALID_ENUM
101 OpenGL error INVALID_VALUE
102 OpenGL error INVALID_OPERATION
103 OpenGL error STACK_OVERFLOW
104 OpenGL error STACK_UNDERFLOW
105 OpenGL error OUT_OF_MEMORY
106 OpenGL error TABLE_TOO_LARGE
107 OpenGL util error INVALID_ENUM
108 OpenGL util error INVALID_VALUE
109 OpenGL util error OUT_OF_MEMORY
110 OpenGL util error INVALID_OPERATION
111 OpenGL util error NURBS_ERROR(n)
112 OpenGL util error TESS_ERROR(n)
113 OpenGL util error TESS_MISSING_BEGIN_POLYGON
114 OpenGL util error TESS_MISSING_BEGIN_CONTOUR
115 OpenGL util error TESS_MISSING_END_POLYGON
116 OpenGL util error TESS_MISSING_END_CONTOUR
117 OpenGL util error TESS_COORD_TOO_LARGE
118 OpenGL util error TESS_NEED_COMBINE_CALLBACK
119 RestartPointLoops must be > 0
120 A description for this hardware could not be found.
121 This GPU has an invalid configuration
122 Invalid encryption key


123 Decompressed data differs from expected results.
124 Invalid InfoROM
125 No display device is connected
126 CRC capture failed
127 Vbios Certificate Error
128 Invalid input
129 Invalid input (driver level)
130 Invalid input (test level)
131 cannot allocate event
132 Robust channel Unexpected Error
133 HDCP did not operate properly
134 EDC detected a memory-bus error
135 Encryption and/or decryption of data failed
136 Request for Power state change failed.
137 invalid window
138 A read/write to a register failed.
139 Acceptable temperature limits exceeded or the thermal sensor is broken or miscalibrated
140 Unused error code 140
141 The only devices found in the system are obsolete
142 Display mode is not possible
143 PCI Express bus error
144 CUDA error
145 cuInit failed
146 cuDeviceGet failed
147 cuCtxCreate failed
148 cuFuncGetByName failed
149 A specific test was requested to run, but was skipped.
150 No tests were run.
151 Primary surface already in use
152 USB invalid RhPort
153 Display HW in use by another test
154 compute test failed
155 Test exceeded the expected threshold
156 Test exceeded the maximum number of allowed memory leaks
157 This board needs to be reflashed with different vbios
158 CRC values are not unique
159 NVRM display is not ready.
160 Resource is reserved by another thread or test
161 NVRM invalid address.
162 USB Reg_Bits not set as expected
163 USB reg not set as expected
164 USB setup packet fail
165 ECC detected a single-bit error
166 ECC detected a double-bit error
167 USB DataIn packet fail
168 USB DataOut packet fail
169 registry key not found
170 registry error
171 incorrect rom version
172 golden check found bad pixel, continuing
173 stored golden values have wrong NumCodeBins
174 golden value miscompare
175 invalid z pitch
176 IRQ not assigned
177 invalid IRQ
178 invalid NV base address
179 invalid NV size
180 invalid FB base address
181 invalid max AGP requests
182 cannot set state
183 invalid AGP request dept
184 invalid AGP data rate
185 cannot set pixel clock
186 cannot set memory clock
187 cannot set graphics clock
188 bad dac
189 invalid channel
190 invalid subchannel
191 bad format
192 put caught up to get
193 invalid ram amount
194 bad memory
195 EDVR: system error
196 ECIC: not CIC or lost CIC during command
197 ENOL: write detected no listeners
198 EADR: board not addressed correctly
199 EARG: bad argument to function call
200 ESAC: function requires board to be SAC
201 EABO: asynchronous operation was aborted
202 ENEB: non-existent board
203 EDMA: DMA hardware error detected
204 EBTO: DMA hardware uP bus timeout
205 EOIP: new I/O with old I/O in progress
206 ECAP: no capability for intended operation
207 EFSO: file system operation error
208 EOWN: Shareable board exclusively owned
209 EBUS: bus error
210 ESTB: serial poll queue overflow
211 ESRQ: SRQ line 'stuck' on
212 ETAB: the return buffer is full
213 ELCK: board or address is locked
214 unknown GPIB Error
215 could not allocate a buffer
216 Could not find the specified device
217 pci bios is not present
218 pci function is not supported
219 pci invalid vendor identification
220 pci device not found
221 pci invalid register number
222 cpuid instruction is not supported
223 cpu does not support MTRR
224 cpu is not supported
225 invalid register number
226 invalid address
227 could not map physical address
228 could not free physical memory map
229 hardware was not initialized
230 invalid graphics aperture base
231 invalid graphics aperture size
232 wrong bios
233 bad NVIDIA chip
234 error occurred while reading or writing serial data
235 could not set environment variable
236 the expected value and the destination memory value do not match
237 unable to set mode
238 specified video mode not found in mode timings table
239 invalid display type
240 invalid tv standard
241 invalid head
242 failed to set image offset
243 failed to disable the cursor
244 feature is not supported in the hardware
245 TIMEOUT: Timeout occurred on WaitSRQ
246 SRQ from Unknown source.
247 Javascript method is not defined
248 Bad SOR - CRC miscompare
249 AUDIO all descriptor entries have buffer
250 AUDIO no valid buffer in descriptor.
251 AUDIO invalid 16bit sample number.


252 CANNOT enable Io or Mem Space. 317 Unused error code 317
253 CANNOT enable Bus Master. 318 Unused error code 318
254 MemSize detected an invalid framebuffer size. 319 Unused error code 319
255 AUDIO not any buffer get freed. 320 Unused error code 320
256 MODEM all descriptor entries have buffer 321 Unused error code 321
257 Unused error code 257 322 Unused error code 322
258 MODEM not any buffer get freed. 323 Unused error code 323
259 Golden testname or recname too long. 324 Unused error code 324
260 CODEC NOT ready. 325 Unused error code 325
261 golden value miscompare in instance memory 326 Generic I2C error
262 oven communication error 327 TIMER TEST Invalid Counter number
263 couldn't reach target temperature 328 TIMER TEST No counter value Returned
264 temperature value not valid 329 TIMER TEST timer ticket number doesn't match the
265 CRC error while communicating with oven expected
266 must first initialize oven 330 Audio Invalid Aci Type
267 PMU device failure, operation attempted failed 331 Hardware does not support this FSAA mode
268 Invalid Bar(s) assigned to device 332 Unused error code 332
269 No Sub Devices found 333 Unused error code 333
270 Acoustic test failed, noise too high 334 Unused error code 334
271 Sub Device Index Invalid 335 Unused error code 335
272 Read parameter differs from expected 336 Unused error code 336
273 Clock speed below specified limit 337 Pool CANNOT allocate anymore memory
274 Current MODS version doesn't support this Tegra 338 Pool exceed maxim size
version 339 Pool invalid request size
275 HW entries have run out 340 Pool Invalid address to free
276 HW reports wrong status 341 Buffer mismatch
277 Error bit set in status register after command was issued 342 PMU Test Failed
278 Interrupt status differs from expected 343 Audio Requested channels cannot be enabled
279 No free head available 344 Unused error code 344
280 Power above specified limit 345 Unused error code 345
281 Temperature above specified limit 346 Unused error code 346
282 Performance varies from expected value 347 The Current Codec doesn't have loopback mode.
283 Incorrect OpenGL driver version. 348 Unused error code 348
284 unsupported system configuration 349 Out of date golden file.
285 NVRM buffer too small 350 incorrect chip revision
286 NVRM reset required 351 memory not strapped correctly
287 NVRM invalid request 352 AUDIO Loopback test amplitude mismatch
288 Power is below specified limit 353 Unused error code 353
289 Display underflow detected 354 Unused error code 354
290 Unused error code 290 355 Audio Processing Unit timeout
291 Unused error code 291 356 Audio Processing Unit CRC miscompare
292 Unused error code 292 357 Audio Processing Unit failed to get resources
293 Data too large. 358 Audio Processing Unit error
294 Cannot use loops with PIO channel. 359 Each board description must be unique
295 Must set a jump point before writing a jump. 360 Audio timeout Error
296 Subsequent channel writes wrote over jump location. 361 Unused error code 361
297 No loop to stop. 362 Unused error code 362
298 Usb port not connected to any device 363 Unused error code 363
299 Usb Test Fail at configuration 364 Audio CODEC power down register has wrong value
300 AUDIO Test Fail 365 CRTC FIFO underflow occurred
301 AUDIO Loopback test frequency mismatch 366 The order of commands in the MPEG stream was not correct
302 Drive test failed
303 MODEM Test Fail 367 Found a bad command in the MPEG stream
304 MODEM Loopback test frequency mismatch 368 MPEG hardware sent the wrong number of notifiers
305 incorrect subsystem id 369 Audio Resource Manager initialization failed
306 Ism experiment is not complete 370 bad stereo glasses connector
307 Timed out waiting for MINI Isms to complete 371 Device Register PIO Access not enabled
308 InfoROM not found 372 Device Register Memory Access not enabled
309 Unused error code 309 373 Device DMA not enabled
310 Unused error code 310 374 Not High Speed Device connected to Usb2 port
311 Unused error code 311 375 The user determined that the TV quality was unacceptable
312 bad index into FancyPicker array
313 Unused error code 313 376 Unused error code 376
314 Unused error code 314 377 Unused error code 377
315 Unused error code 315 378 Unused error code 378
316 Unused error code 316 379 Unused error code 379


380 Unused error code 380 444 Unused error code 444
381 Unused error code 381 445 Unused error code 445
382 Unused error code 382 446 incorrect mode
383 Unused error code 383 447 incorrect vga windows
384 Unused error code 384 448 File size would become larger than the implementation can support.
385 Unused error code 385
386 Unused error code 386 449 File exists but cannot be accessed with given flags.
387 Unused error code 387 450 File write followed a nonblocked write before the latter was complete.
388 Unused error code 388
389 Unused error code 389 451 File argument isn't valid file descriptor or isn't open for writing.
390 Unused error code 390
391 Unused error code 391 452 File device or resource is busy.
392 Unused error code 392 453 No child process.
393 Unused error code 393 454 File deadlock.
394 Unused error code 394 455 File open with O_CREAT and O_EXCL set but the file already exists.
395 Unused error code 395
396 Unused error code 396 456 File bad address.
397 Unused error code 397 457 File is too large.
398 Unused error code 398 458 File operation was interrupted by a signal.
399 Unused error code 399 459 File argument not valid.
400 Unused error code 400 460 File I/O error
401 Unused error code 401 461 The open operation was interrupted by a signal.
402 Unused error code 402 462 The process has too many files open.
403 Unused error code 403 463 Too many file links.
404 Unused error code 404 464 Filename is too long.
405 Unused error code 405 465 The system has too many files open.
406 Unused error code 406 466 No such device in file operation.
407 Unused error code 407 467 No such file or directory.
408 incorrect TV encoder type 468 Exec() format error in file operation.
409 Unused error code 409 469 The system has run out of file lock resources.
410 Unused error code 410 470 Not enough memory for file operation.
411 Unused error code 411 471 Not enough disk space left.
412 Unused error code 412 472 File function not implemented.
413 Unused error code 413 473 File argument is not a directory.
414 Remote Controller Test Not ALL Key were tested. 474 Directory isn't empty.
415 Remote Controller Test Key Pressed Mismatch expected. 475 Inappropriate I/O control operation.
416 Remote Controller Test Register value Mismatch expected. 476 No such device or address in file operation.
477 File operation not permitted.
417 Network is not initialized. 478 Write to pipe or FIFO that isn't open for reading by any process
418 Network cannot create socket.
419 Network socket cannot bind to the specified port. 479 File on read-only file system and invalid flags are set.
420 Network socket cannot connect to peer. 480 Illegal file seek.
421 Network socket is not connected. 481 Invalid process during file operation.
422 Network socket is already connected. 482 Invalid cross-device link during file operation.
423 Network read error. 483 Unknown file error.
424 Network write error. 484 golden value miscompare on 2nd GPU
425 Network cannot determine host address. 485 golden value miscompare in Z buffer on 2nd GPU
426 A network error has occurred. 486 timeout waiting for notifier from GPU
427 Unused error code 427 487 timeout waiting for notifier from 2nd GPU
428 Data vector size mismatch expected. 488 Cannot access device registers.
429 Data vector value miscompare with expected. 489 the memory or frame buffer interface is marginal
430 error occurred trying to write a call pushbuffer instruction 490 Cannot set AGP data rate.
491 Cannot set AGP sideband addressing mode.
431 not enough pushbuffer memory 492 Cannot set AGP fastwrite mode.
432 cdrom audio quality was unacceptable 493 Couldn't lock on to the input signal.
433 avpod audio quality was unacceptable 494 Couldn't lock on to the chroma data.
434 tuner audio quality was unacceptable 495 Actual crystal value does not match the strapped crystal value.
435 Unused error code 435
436 Unused error code 436 496 invalid display mask
437 Unused error code 437 497 failed to get image offset
438 Unused error code 438 498 Invalid device Id
439 vbe call failed 499 SBIOS test failed
440 wrong vbe signature 500 A problem has been detected in the array of tests
441 wrong vbe version 501 Test failed due to an already-known problem.
442 Unused error code 442 502 Invalid Mfgtest test number
443 Unused error code 443 503 Invalid Mfgtest test mode


504 Unused error code 504 570 GPU channel software method parameter error.
505 AUDIO Loopback Left and Right Channel Crossed 571 Unused error code 571
506 Unused error code 506 572 The required function is not supported by present CODEC.
507 Unused error code 507
508 Invalid Chip Version 573 Audio CODEC failure.
509 Not an NV Device 574 Unused error code 574
510 Test Cannot run on this Tegra Chip Version 575 Unused error code 575
511 Required chip library interface not found 576 Audio Test Invalid loopback Mode.
512 Unused error code 512 577 Could not acquire I2C port.
513 Unused error code 513 578 I2C SCL pull-up resistor missing.
514 Unused error code 514 579 I2C SDA pull-up resistor missing.
515 Unused error code 515 580 The auxiliary power connector is not plugged in.
516 Unused error code 516 581 can not generate golden values using an official release
517 Unused error code 517 582 gpu stress test found pixel miscompares
518 Usb Port mapping value is wrong. 583 thermal sensor reports overheating
519 Unused error code 519 584 Unused error code 584
520 Unused error code 520 585 failed to capture internal TV encoder crc
521 Number of Channel and number of input mismatch. 586 the internal TV encoder is bad
522 Unused error code 522 587 Smbus Cannot set DDC base.
523 Unused error code 523 588 invalid EDID
524 Unused error code 524 589 FramLock Test Check Reg Fail
525 Unused error code 525 590 FramLock Test Invalid DispalySync Unit of Invalid Displays
526 Unused error code 526
527 System Control Invalid IO Base. 591 Unused error code 591
528 Unused error code 528 592 FramLock Test Set display(s) to Master fail
529 Unused error code 529 593 FramLock Test Set display(s) to Slave fail
530 Unused error code 530 594 FramLock Test Loopback Test fail
531 Usb invalid device. 595 FramLock Test Sync Test fail
532 Unused error code 532 596 FramLock Test Sync Test, User and Auto result mismatch
533 Unused error code 533 597 NVRM not supported
534 Unused error code 534 598 Unused error code 598
535 Unused error code 535 599 fan does not seem to cool the chip
536 Unused error code 536 600 Usb failure related to port mapping, port number.
537 Unused error code 537 601 Acpi timer failure.
538 Unused error code 538 602 NVRM bad channel
539 Unused error code 539 603 NVRM timeout
540 Unused error code 540 604 the counter overflowed
541 Unused error code 541 605 the frequency is incorrect
542 Unused error code 542 606 API call never returned
543 Unused error code 543 607 Bad compression-tag-ram in GPU
544 Unused error code 544 608 Interrupt request line stuck asserted
545 Unused error code 545 609 Interrupt request mechanism does not work
546 Unused error code 546 610 Unused error code 610
547 Invalid CPU Frequency measured. 611 Unused error code 611
548 Unused error code 548 612 Invalid value for Tegra configuration variable(s).
549 Unused error code 549 613 Invalid Tegra configuration filename.
550 Unused error code 550 614 Extra golden code miscompare
551 Unused error code 551 615 Extra golden code miscompare on 2nd GPU
552 Real time clock test failed to restore. 616 Unused error code 616
553 Unused error code 553 617 Unused error code 617
554 Graphics fifo method error. 618 Unused error code 618
555 GPU channel fifo software method error. 619 Unused error code 619
556 GPU channel fifo unknown method error. 620 DLL could not be loaded.
557 GPU channel fifo channel busy error. 621 Unused error code 621
558 GPU channel fifo runout overflow error. 622 Unused error code 622
559 GPU channel fifo parse error. 623 Unused error code 623
560 GPU channel fifo PTE error. 624 Error in VBIOS DCB tables.
561 GPU channel fifo idle timeout error. 625 Unused error code 625
562 GPU channel instance lookup failure. 626 Unused error code 626
563 GPU channel debug single-step. 627 Supplied mode not supported by the display.
564 GPU channel missing hardware error. 628 The framebuffer base address register is too small
565 GPU channel software method. 629 Memory leak detected.
566 GPU channel software notify. 630 Perfmon was already running an experiment
567 GPU channel fake error. 631 Memory access spans page boundary.
568 GPU channel scan line timeout error. 632 Memory access to unmapped page.
569 GPU channel vblank callback error. 633 Write access to read-only page.


634 Read access to write-only page. 700 Unused error code 700
635 Unused error code 635 701 Unused error code 701
636 could not create a JavaScript property 702 Unused error code 702
637 Invalid clock domain specified 703 Unused error code 703
638 Perfmon could not be reserved 704 Unhook ISR failed
639 Perfmon was not reserved 705 Unused error code 705
640 Unused error code 640 706 Unused error code 706
641 MsiTest of BR02 Failed. 707 Unused error code 707
642 Atapi Test Error 708 Unused error code 708
643 Unused error code 643 709 selected device is not supported
644 Unused error code 644 710 Unused error code 710
645 Unused error code 645 711 Msi is not supported for this device
646 Unused error code 646 712 Cannot enable Intx in Pci Cfg Space
647 Unused error code 647 713 Cannot enable Msi in Pci Cfg Space
648 Unused error code 648 714 Cannot disable Intx in Pci Cfg Space
649 Unused error code 649 715 Cannot disable Msi in Pci Cfg Space
650 Bad RAM in the GPU. 716 Given Cap. is not supported for this device
651 GPU did not get the expected number of lanes 717 Sata Loopback Test fail
652 Unused error code 652 718 invalid starting number of VPEs and/or SHDs
653 Unused error code 653 719 Read parameter differs from expected
654 Unused error code 654 720 Measured Jitter exceeded maximum amount
655 nvrm invalid parameter 721 Failed genlock
656 nvrm too many primaries 722 Non-GL device on GL board
657 Unused error code 657 723 Codec error detected
658 memory size mismatch expected 724 Stream Error Detected
659 wrong number of TPCs detected 725 Ring Buffer Error Detected
660 wrong number of framebuffer units detected 726 Azalia Test failed
661 memory fragment size mismatch expected 727 Unused error code 727
662 wrong number of ROPs detected 728 Ahci Port Error
663 wrong number of shader pipes detected 729 External drive (hard drive, CD-ROM, etc.) error
664 wrong number of vertex engines detected 730 ATA Descriptor table is not initialized
665 wrong number of PCI express lanes detected 731 Unused error code 731
666 incorrect feature set for this SKU 732 Unused error code 732
667 could not set NV_PBUS_FS to the desired values 733 External device does not support the function
668 could not meet floorsweeping requirements 734 External device is not found
669 Requested function not supported by Codec 735 Unused error code 735
670 Requested function not supported by Aci 736 Unused error code 736
671 Error testing L2 cache 737 Unused error code 737
672 Unused error code 672 738 Unused error code 738
673 Unused error code 673 739 Unused error code 739
674 NVRM object not found 740 Unused error code 740
675 NVRM gpu is still busy or possibly hung 741 Unused error code 741
676 NVRM card not present 742 Unused error code 742
677 NVRM in use 743 Unused error code 743
678 NVRM invalid access type 744 Unused error code 744
679 NVRM invalid argument 745 Unused error code 745
680 Unused error code 680 746 Unused error code 746
681 NVRM invalid command 747 Unused error code 747
682 NVRM invalid data 748 Unused error code 748
683 Unused error code 683 749 Unused error code 749
684 NVRM invalid method 750 Unused error code 750
685 NVRM invalid pointer 751 Unused error code 751
686 Unused error code 686 752 Unused error code 752
687 NVRM invalid registry key 753 Unused error code 753
688 NVRM invalid state 754 Unused error code 754
689 NVRM invalid string length 755 Unused error code 755
690 NVRM FB Training Failed 756 Unused error code 756
691 method count too large 757 Unused error code 757
692 pushbuffer too small 758 Unused error code 758
693 Unused error code 693 759 Unused error code 759
694 Unused error code 694 760 GPU channel bus master timeout error.
695 Unused error code 695 761 GPU channel display missed notifier.
696 Unused error code 696 762 GPU channel MPEG software method error.
697 Unused error code 697 763 GPU channel ME software method error.
698 Unused error code 698 764 GPU channel VP software method error.
699 Unused error code 699 765 Unused error code 765


766 Unused error code 766 832 FB link training failure
767 Unused error code 767 833 FB memory error
768 Unused error code 768 834 PMU error
769 Invalid command for Tegra device/controller 835 SEC2 error
770 Invalid memory(buffer or dtables) for hw 836 PMU Breakpoint
771 Invalid JsObject for test. 837 PMU Halt error
772 Invalid setup for Tegra test. 838 Inforom Dynamic Page Retirement Event
773 Data/buffer mismatch with expected 839 Inforom Dynamic Page Retirement Failure
774 Tegra test fail 840 NVDEC error
775 SLI generic error 841 FECS Err: Register Access Violation
776 Invalid argument. 842 FECS Err: Verif Method Violation
777 Error configuring overclocking
778 Error while testing board
779 Voltage value out of range
780 Peripheral device not found
781 Gpu is already linked in a SLI device
782 Video bridge not present
783 Mismatched GPUs not valid for SLI device
784 GPU channel RC Logging Enabled.
785 GPU channel Semaphore Timeout.
786 GPU channel Illegal Notify.
787 GPU channel fifo FBISTATE Timeout Error.
788 GPU channel VP: Unknown Error.
789 GPU channel Bad Address Accessed Error.
790 GPU channel VP2: Unknown Error.
791 GPU channel BSP: Unknown Error.
792 SEC Error
793 MSVLD Error
794 MSPDEC Error
795 MSPPP Error
796 Fifo: MMU Error
797 PBDMA Error
798 FECS Err: Unimpl Firmware Method
799 FECS Err: Watchdog Timeout
800 GPU channel CE0: Unknown Error.
801 GPU channel CE1: Unknown Error.
802 GPU channel VIC: Unknown Error.
803 GPU channel: Reset Channel Verif Error.
804 GPU channel GR: Fault During Context Switch.
805 OS: Preemptive Channel Removal.
806 OS Indicates GPU Has Timed Out.
807 GPU channel CE2: Unknown Error.
808 GPU channel MSENC: Unknown Error.
809 GPU channel NVENC0: Unknown Error.
810 GPU channel NVENC1: Unknown Error.
811 Unused error code 811
812 Unused error code 812
813 Unused error code 813
814 Unused error code 814
815 Unused error code 815
816 Bad MAC Address Programmed
817 PLL could not be locked
818 Critical Memory failure(used -fail_critical_fb_range)
819 Mods detected an assertion failure
820 ELG call failure or unexpected behavior
821 KD call failure or unexpected behavior
822 GPU Double-bit Error.
823 Reset in progress.
824 Silent running constant level set by registry
825 Silent running level transition due to RC error
826 Silent running stress test failure
827 Silent running level transition due to temperature rise
828 Silent running clocks reduced due to temperature rise
829 Silent running clocks reduced due to power limits
830 Silent running temperature read error
831 Display channel exception

Error codes 900-999 are reserved for errors from auxiliary scripts that are not part of the normal MODS build. MODS may reclaim these error codes in the future.
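
The table marks many codes as unused and sets aside codes 900 to 999 for auxiliary scripts. A wrapper script that post-processes results might sort a code into one of these groups before reporting it. The following is a minimal sketch of that idea, not part of MODS: the names ERROR_TABLE, AUX_SCRIPT_RANGE, and classify_code are hypothetical, and the dictionary holds only a few entries copied from the table for illustration.

```python
# Hypothetical helper, not shipped with MODS.
# ERROR_TABLE is a small excerpt of the table; a real lookup would need every entry.
ERROR_TABLE = {
    308: "InfoROM not found",
    309: "Unused error code 309",
    583: "thermal sensor reports overheating",
    822: "GPU Double-bit Error.",
}

# Codes 900 through 999 inclusive are reserved for auxiliary scripts.
AUX_SCRIPT_RANGE = range(900, 1000)


def classify_code(code, table=ERROR_TABLE):
    """Return a coarse category for an error code.

    "auxiliary-script": code falls in the reserved 900-999 range
    "unused":           the table lists the code as "Unused error code N"
    "defined":          the table gives the code a real description
    "unknown":          the code is not in the supplied table
    """
    if code in AUX_SCRIPT_RANGE:
        return "auxiliary-script"
    text = table.get(code)
    if text is None:
        return "unknown"
    if text == f"Unused error code {code}":
        return "unused"
    return "defined"
```

For example, classify_code(583) falls in the "defined" group, while classify_code(950) is attributed to an auxiliary script regardless of what the table contains.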

Notice
ALL NVIDIA DESIGN SPECIFICATIONS, REFERENCE BOARDS, FILES, DRAWINGS, DIAGNOSTICS, LISTS, AND OTHER
DOCUMENTS (TOGETHER AND SEPARATELY, “MATERIALS”) ARE BEING PROVIDED “AS IS.” NVIDIA MAKES NO
WARRANTIES, EXPRESSED, IMPLIED, STATUTORY, OR OTHERWISE WITH RESPECT TO THE MATERIALS, AND
EXPRESSLY DISCLAIMS ALL IMPLIED WARRANTIES OF NONINFRINGEMENT, MERCHANTABILITY, AND FITNESS FOR
A PARTICULAR PURPOSE.
Information furnished is believed to be accurate and reliable. However, NVIDIA Corporation assumes no
responsibility for the consequences of use of such information or for any infringement of patents or other
rights of third parties that may result from its use. No license is granted by implication or otherwise under
any patent rights of NVIDIA Corporation. Specifications mentioned in this publication are subject to change
without notice. This publication supersedes and replaces all other information previously supplied. NVIDIA
Corporation products are not authorized as critical components in life support devices or systems without
express written approval of NVIDIA Corporation.

HDMI
HDMI, the HDMI logo, and High-Definition Multimedia Interface are trademarks or registered trademarks of
HDMI Licensing LLC.

Macrovision Compliance Statement


NVIDIA Products that are Macrovision enabled can only be sold or distributed to buyers with a valid and
existing authorization from Macrovision to purchase and incorporate the device into buyer’s products.
Macrovision copy protection technology is protected by U.S. patent numbers 5,583,936; 6,516,132;
6,836,549; and 7,050,698 and other intellectual property rights. The use of Macrovision’s copy protection
technology in the device must be authorized by Macrovision and is intended for home and other limited pay-
per-view uses only, unless otherwise authorized in writing by Macrovision. Reverse engineering or disassembly
is prohibited.

Trademarks
NVIDIA and the NVIDIA logo are trademarks or registered trademarks of NVIDIA Corporation in the U.S. and
other countries. Other company and product names may be trademarks of the respective companies with
which they are associated.

Copyright
© 2011 NVIDIA Corporation. All rights reserved.

www.nvidia.com
