RAM parity

From Infogalactic: the planetary knowledge core

RAM parity checking is the practice of storing a redundant parity bit that records the parity (odd or even) of a small amount of data (typically one byte) held in random-access memory, and of later comparing the stored parity with the parity computed from the data to detect whether an error has occurred.
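
A minimal sketch of the store-then-compare idea, assuming even parity (the parity bit is 1 when the byte contains an odd number of 1 bits, so the nine stored bits always hold an even number of 1s). The function names `parity_bit`, `store` and `check` are hypothetical, and real memory systems do this in hardware logic rather than software.

```python
def parity_bit(byte: int) -> int:
    """Even-parity bit for an 8-bit value: 1 if the byte has an odd number of 1 bits."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected an 8-bit value")
    return bin(byte).count("1") % 2


def store(byte: int) -> tuple[int, int]:
    """Model a write: keep 9 bits, the data byte plus its parity bit."""
    return byte, parity_bit(byte)


def check(stored: tuple[int, int]) -> bool:
    """Model a read: recompute parity from the data and compare it with the stored bit."""
    data, stored_parity = stored
    return parity_bit(data) == stored_parity


word = store(0b00111000)             # ASCII "8": three 1 bits, so the parity bit is 1
print(word, check(word))             # (56, 1) True
print(check((0b00111001, word[1])))  # one data bit flipped: False
```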

The parity bit was originally stored in additional individual memory chips. With the introduction of plug-in memory modules such as SIMMs and DIMMs, modules became available in non-parity versions and in parity versions, which have an extra bit per byte and so store 9 bits for every 8 bits of actual data.

History

Early computers sometimes required parity RAM, and parity checking could not be disabled. A parity error typically caused the machine to halt, losing any unsaved data; this is usually preferable to saving corrupt data. Logic parity RAM, also known as fake parity RAM, is non-parity RAM that can be used in computers that require parity RAM. Instead of storing a parity bit when a byte is written, logic parity RAM generates one from the data each time the byte is read and presents it to the parity-checking logic. Because this generated bit always agrees with whatever data is currently stored, it can never reveal that the data has been corrupted (hence the name "fake parity"). Logic parity RAM is thus a way of using cheaper 8-bit RAM in a system designed to accept only 9-bit parity RAM.
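
The difference between genuine and logic parity can be shown with a toy model. In this hedged Python sketch the class names `ParityRAM` and `LogicParityRAM` and the helper `parity_check_passes` are hypothetical, even parity is assumed, and corruption is simulated by flipping a stored bit directly after the write.

```python
def parity_bit(byte: int) -> int:
    """Even-parity bit: 1 if the byte has an odd number of 1 bits."""
    return bin(byte & 0xFF).count("1") % 2


class ParityRAM:
    """Genuine parity RAM: the parity bit is computed on write and stored with the data."""

    def __init__(self, size: int) -> None:
        self.data = [0] * size
        self.parity = [0] * size

    def write(self, addr: int, byte: int) -> None:
        self.data[addr] = byte
        self.parity[addr] = parity_bit(byte)

    def read(self, addr: int) -> tuple[int, int]:
        # The parity bit returned is the one stored at write time
        return self.data[addr], self.parity[addr]


class LogicParityRAM:
    """Logic ("fake") parity RAM: only 8 bits are stored; parity is generated on every read."""

    def __init__(self, size: int) -> None:
        self.data = [0] * size

    def write(self, addr: int, byte: int) -> None:
        self.data[addr] = byte  # no parity bit is stored

    def read(self, addr: int) -> tuple[int, int]:
        byte = self.data[addr]
        # Parity is derived from whatever is stored now, so it always matches
        return byte, parity_bit(byte)


def parity_check_passes(byte: int, parity: int) -> bool:
    """The system's parity-checking logic, which sees only the 9 bits presented to it."""
    return parity_bit(byte) == parity


# Apply the same corruption (bit 0 flipped after the write) to both kinds of module
results = {}
for ram in (ParityRAM(1), LogicParityRAM(1)):
    ram.write(0, 0x38)
    ram.data[0] ^= 0x01
    results[type(ram).__name__] = parity_check_passes(*ram.read(0))
print(results)  # {'ParityRAM': False, 'LogicParityRAM': True}
```

The genuine parity module fails the check and exposes the corruption; the logic parity module passes it, which is exactly why it cannot detect errors.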

Memory errors

Faulty memory was once relatively common, and parity errors, which were very noticeable to the user, were not infrequent. Since then, errors have become less visible as simple parity RAM has fallen out of use: either they go undetected, or they are corrected invisibly by ECC RAM. Modern RAM is believed, with much justification, to be reliable, and error-detecting RAM has largely fallen out of use for non-critical applications. Most twenty-first-century machines do not support parity or ECC, with a consequent risk of data corruption; this has become acceptable because memory has become more reliable. Some machines that support parity or ECC allow checking to be enabled or disabled in the BIOS, permitting cheaper non-parity RAM to be used. If parity RAM is installed, the chipset will usually use the extra bits to implement error correction over the whole memory word, rather than halting the machine on a single-bit parity error.
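
The extra bits on a parity module can support error correction because correction is done over a wider word than a single byte. The Python sketch below counts the extra bits each scheme needs, assuming a single-error-correcting, double-error-detecting (SECDED) Hamming code; this is a common choice but not necessarily what a given chipset uses, and the function names are hypothetical.

```python
def secded_check_bits(data_bits: int) -> int:
    """Check bits for a single-error-correcting, double-error-detecting Hamming code."""
    r = 0
    # 2**r syndromes must distinguish "no error" from an error in any of data_bits + r positions
    while 2 ** r < data_bits + r + 1:
        r += 1
    return r + 1  # one extra overall parity bit adds double-error detection


def parity_bits(data_bits: int) -> int:
    """Extra bits for simple parity: one per 8-bit byte."""
    return data_bits // 8


for width in (8, 32, 64):
    print(width, parity_bits(width), secded_check_bits(width))
# 8 1 5
# 32 4 7
# 64 8 8
```

For a single byte, SECDED would need 5 extra bits instead of 1, but for a 64-bit word both schemes need exactly 8, which is why a module with one parity bit per byte (72 bits in total on a 64-bit bus) can carry ECC instead.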

However, as discussed in the article on ECC memory, memory errors, while not everyday events, are not negligibly rare. Even in the absence of manufacturing defects, naturally occurring radiation causes random errors; measurements across Google's large fleet of servers found that memory errors were not rare events, and that both the incidence of memory errors and the range of error rates across different DIMMs were much higher than previously reported.[1]

Error correction

Simple go/no-go parity checking requires the memory to have extra, redundant bits beyond those needed to store the data; if such bits are available, they can be used to correct errors as well as detect them. Earlier memory types such as FPM and EDO, as used in, for example, the IBM PC/AT, were available in versions that supported either no checking or parity checking[2] (earlier computers that used individual RAM chips rather than SIMM or DIMM modules used extra chips to store the parity bits). If the computer detected a parity error, it displayed a message to that effect and stopped. The SDRAM and DDR modules that replaced these earlier types are usually available either without error checking or with ECC (full correction, not just parity).[2]

Consider a single-bit error that a system with no error checking would ignore, a machine with parity checking would halt on, and ECC would correct invisibly. A single bit is stuck at 1 because of a faulty chip, or is changed to 1 by background or cosmic radiation. A spreadsheet that stores numbers as ASCII text is loaded, and the character "8" lands in the byte that contains the affected bit as its eighth (least significant) bit. On a system without error checking, the spreadsheet is then edited and saved, but the "8" (00111000 binary) has become a "9" (00111001).
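
The example can be reproduced in a few lines of Python. This is a sketch assuming even parity; `parity_bit` is a hypothetical helper, and the stuck bit is modelled by OR-ing a 1 into the least significant bit.

```python
def parity_bit(byte: int) -> int:
    """Even-parity bit: 1 if the byte has an odd number of 1 bits."""
    return bin(byte).count("1") % 2


original = ord("8")                 # 0x38 = 0b00111000
stored_parity = parity_bit(original)

corrupted = original | 0b00000001   # least significant bit stuck at 1
print(f"{original:08b} -> {corrupted:08b}: {chr(original)!r} became {chr(corrupted)!r}")
# 00111000 -> 00111001: '8' became '9'

# A parity-checking system recomputes parity on read and compares it with the stored bit
error_detected = parity_bit(corrupted) != stored_parity
print("parity error detected:", error_detected)  # parity error detected: True
```

Without the stored parity bit there is nothing to compare against, so a system with no error checking would accept the "9" as valid data.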

If the stored parity differs from the parity computed from the stored data, at least one bit (possibly the parity bit itself) must have been changed by data corruption; an even number of changed bits, however, leaves the parity unchanged and goes undetected. Undetected memory errors can have consequences ranging from none at all to permanent corruption of stored data or a machine crash. For a home PC, where data integrity is often perceived to be of little importance (certainly true for, say, games and web browsing, less so for Internet banking and home finances), non-parity memory is an affordable option. Where data integrity is required, however, parity memory will halt the computer and prevent corrupt data from affecting results or stored data, at the cost of losing intermediate unsaved data and of leaving the machine unusable until any faulty RAM is replaced. At the expense of some computational overhead, negligible on modern fast computers, detected errors can instead be corrected; this is increasingly important on networked machines serving many users.
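
A single parity bit only catches an odd number of flipped bits. The hedged Python sketch below, assuming even parity and using the hypothetical helpers `parity_bit` and `parity_error`, corrupts the ASCII "8" once and then twice.

```python
def parity_bit(byte: int) -> int:
    """Even-parity bit: 1 if the byte has an odd number of 1 bits."""
    return bin(byte).count("1") % 2


def parity_error(data: int, stored_parity: int) -> bool:
    """True when the recomputed parity disagrees with the stored parity bit."""
    return parity_bit(data) != stored_parity


byte = ord("8")                     # 0b00111000
stored = parity_bit(byte)

one_flip = byte ^ 0b00000001        # one bit changed: "8" becomes "9"
two_flips = byte ^ 0b00000011       # two bits changed: "8" becomes ";"

print(parity_error(one_flip, stored))   # True: the error is detected
print(parity_error(two_flips, stored))  # False: the corruption goes unnoticed
```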

ECC type RAM

RAM with ECC (error-correcting code) can detect and correct errors. As with parity RAM, additional information must be stored and more processing done, making ECC RAM more expensive and slightly slower than non-parity and logic parity RAM. ECC memory is especially useful in any application where uptime is a concern: errors in a memory word (typically single-bit errors) are detected and corrected on the fly with no impact on the application. The error is typically logged by the operating system for later analysis by a technician or administrator. If the error is persistent, server downtime can be scheduled to replace the failing memory module. This mechanism of detection and correction is known as EEC, or Extended Error Correction.
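
The article does not say which code ECC memory uses. As a small-scale, hedged illustration, the Python sketch below implements an extended Hamming(8,4) code, which corrects any single flipped bit and detects, but cannot correct, any two flipped bits. Common ECC modules apply the same single-error-correct, double-error-detect (SECDED) idea to 64-bit words with 8 check bits; the 4-bit data width, the function names and the `Status` values are illustrative assumptions, with `Status` standing in for what an operating system might log.

```python
from enum import Enum
from typing import Optional

# Positions 1, 2 and 4 hold Hamming check bits; position 0 holds the overall parity bit
DATA_POSITIONS = (3, 5, 6, 7)


class Status(Enum):
    OK = "no error"
    CORRECTED = "single-bit error corrected"
    UNCORRECTABLE = "double-bit error detected"


def syndrome(word: list[int]) -> int:
    """XOR of the positions (1-7) holding a 1; zero for a valid codeword."""
    s = 0
    for pos in range(1, 8):
        if word[pos]:
            s ^= pos
    return s


def encode(nibble: int) -> list[int]:
    """Encode 4 data bits as an 8-bit extended Hamming (SECDED) codeword."""
    word = [0] * 8
    for i, pos in enumerate(DATA_POSITIONS):
        word[pos] = (nibble >> i) & 1
    s = syndrome(word)  # check bits are still zero here
    word[1], word[2], word[4] = s & 1, (s >> 1) & 1, (s >> 2) & 1  # cancel the syndrome
    word[0] = sum(word[1:]) % 2  # make the total number of 1 bits even
    return word


def decode(word: list[int]) -> tuple[Optional[int], Status]:
    """Return the data bits (None if uncorrectable) and what the decoder did."""
    word = list(word)  # do not modify the caller's list
    s = syndrome(word)
    overall = sum(word) % 2
    if s == 0 and overall == 0:
        status = Status.OK
    elif overall == 1:
        # Odd number of flips: assume exactly one, at position s (0 = the overall parity bit)
        word[s] ^= 1
        status = Status.CORRECTED
    else:
        # Non-zero syndrome with even overall parity: two bits flipped, position unknown
        return None, Status.UNCORRECTABLE
    return sum(word[pos] << i for i, pos in enumerate(DATA_POSITIONS)), status


codeword = encode(0b1011)
one_flip = list(codeword)
one_flip[5] ^= 1
two_flips = list(codeword)
two_flips[3] ^= 1
two_flips[6] ^= 1

value, status = decode(one_flip)
print(value, status.value)  # 11 single-bit error corrected
value, status = decode(two_flips)
print(value, status.value)  # None double-bit error detected
```

A real memory controller would record the `CORRECTED` case for later analysis and continue running, and would typically treat the `UNCORRECTABLE` case as a fatal error.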

References

  1. CNET News: "Google: Computer memory flakier than expected"
  2. Crucial.com FAQ: "Are ECC and parity the same thing? If not what's the difference?"