Tutorial Acordma
"Text file" refers to a type of container, while plain text refers to a type of
content. Text files can contain plain text, but they are not limited to such.
At a generic level of description, there are two kinds of computer files: text
files and binary files.[1]
Contents
1 Data storage
2 Encoding
3 Formats
3.1 Windows text files
3.2 Unix text files
3.3 Apple Macintosh text files
4 Rendering
5 See also
6 Notes and references
7 External links
Data storage
Encoding
The ASCII character set is the most common format for English-language text files,
and is generally assumed to be the default file format in many situations. For
accented and other non-ASCII characters, it is necessary to choose a character
encoding. In many systems, the encoding is chosen on the basis of the default locale
setting of the computer on which the file is read. Common character encodings include
ISO 8859-1 for many European languages.
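Because the encoding is not stored in a plain text file, the same bytes can decode to different text depending on which encoding the reader assumes. The following minimal Python sketch, using only the standard library and a made-up sample word, encodes text as ISO 8859-1 and then decodes it with the right and the wrong encoding. It also prints the locale-derived default that Python itself would use.

```python
import locale

# "café" encoded as ISO 8859-1 (Latin-1): é becomes the single byte 0xE9.
latin1_bytes = "café".encode("iso-8859-1")
print(latin1_bytes)  # b'caf\xe9'

# Decoding with the encoding that was actually used recovers the text.
print(latin1_bytes.decode("iso-8859-1"))  # café

# Decoding the same bytes as UTF-8 fails: 0xE9 starts a multi-byte
# UTF-8 sequence that is never completed.
try:
    latin1_bytes.decode("utf-8")
except UnicodeDecodeError as err:
    print("not valid UTF-8:", err.reason)

# The encoding Python picks from the current locale settings.
print(locale.getpreferredencoding(False))
```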
Because many encodings have only a limited repertoire of characters, they are often
only usable to represent text in a limited subset of human languages. Unicode is an
attempt to create a common standard for representing all known languages, and most
known character sets are subsets of the very large Unicode character set. Although
there are multiple character encodings available for Unicode, the most common is
UTF-8, which has the advantage of being backwards-compatible with ASCII; that is,
every ASCII text file is also a UTF-8 text file with identical meaning.
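UTF-8's backward compatibility with ASCII can be checked directly. In the short Python sketch below, pure ASCII text produces identical bytes under both encodings. Non-ASCII characters use only byte values of 0x80 and above, spread over two or more bytes. The sample strings are arbitrary.

```python
ascii_text = "Hello, world!"

# Encoding pure ASCII text as ASCII and as UTF-8 yields identical bytes.
ascii_bytes = ascii_text.encode("ascii")
utf8_bytes = ascii_text.encode("utf-8")
assert ascii_bytes == utf8_bytes

# Every byte of ASCII text is below 0x80; UTF-8 reserves bytes >= 0x80
# for the multi-byte sequences that encode non-ASCII characters.
print(max(ascii_bytes) < 0x80)  # True
print("é".encode("utf-8"))      # b'\xc3\xa9' (two bytes)
print("€".encode("utf-8"))      # b'\xe2\x82\xac' (three bytes)
```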
Formats
On most operating systems, the name text file refers to a file format that allows only
plain text content with very little formatting (e.g., no bold or italic type). Such
files can be viewed and edited on text terminals or in simple text editors. Text files
usually have the MIME type "text/plain", often with a charset parameter indicating the
encoding (for example, "text/plain; charset=UTF-8").
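A program that receives a text file over HTTP or e-mail typically finds the encoding in the charset parameter of the Content-Type value. The sketch below is illustrative. It uses the standard library's `email.message.Message` to split such a value into the media type and the charset. The helper name `parse_text_mime` is hypothetical.

```python
from email.message import Message
from typing import Optional, Tuple


def parse_text_mime(content_type: str) -> Tuple[str, Optional[str]]:
    """Split a Content-Type value into its media type and charset parameter."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_type() returns the lowercased type/subtype;
    # get_content_charset() returns the lowercased charset, or None if absent.
    return msg.get_content_type(), msg.get_content_charset()


print(parse_text_mime("text/plain; charset=UTF-8"))  # ('text/plain', 'utf-8')
print(parse_text_mime("text/plain"))                 # ('text/plain', None)
```

When the charset is missing, the reader must fall back on a default, such as the locale setting described under Encoding.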
Windows text files
Most Windows text files use "ANSI", "OEM", "Unicode" or "UTF-8" encoding. What
Windows terminology calls "ANSI encodings" are usually single-byte code pages derived
from or closely related to the ISO/IEC 8859 encodings ("ANSI" in the Microsoft Notepad
menus really means the "system code page", a non-Unicode legacy encoding). The
exceptions are locales such as Chinese, Japanese and Korean, which require double-byte
character sets. ANSI encodings were traditionally the default encodings for system
locales within Windows, before the transition to Unicode. By contrast, OEM encodings,
also known as DOS code pages, were defined by IBM for use in the original IBM PC text
mode display system. They typically include graphical and line-drawing characters
common in DOS applications. "Unicode"-encoded Windows text files contain text in the
UTF-16 Unicode Transformation Format. Such files normally begin with a byte order mark
(BOM), which communicates the endianness of the file content. Although UTF-8 does not
suffer from endianness problems, many Windows programs (e.g., Notepad) prepend a BOM to
UTF-8-encoded files,[2] to differentiate UTF-8 encoding from other 8-bit encodings.[3]
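A reader can use the BOM to choose a decoder and then drop the mark from the decoded text. The following minimal Python sketch works this way. The helper names `sniff_bom` and `decode_text` are hypothetical, and UTF-32 BOMs are deliberately not handled. The byte order marks themselves come from the standard `codecs` module.

```python
import codecs
from typing import Optional, Tuple

# Checked in order; UTF-32 BOMs are deliberately not handled here.
_BOMS = (
    ("utf-8", codecs.BOM_UTF8),          # EF BB BF
    ("utf-16-le", codecs.BOM_UTF16_LE),  # FF FE
    ("utf-16-be", codecs.BOM_UTF16_BE),  # FE FF
)


def sniff_bom(data: bytes) -> Tuple[Optional[str], int]:
    """Return (encoding, BOM length) for a leading BOM, or (None, 0)."""
    for name, bom in _BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


def decode_text(data: bytes, fallback: str = "utf-8") -> str:
    """Decode bytes, honoring a leading BOM and dropping it from the result."""
    name, bom_len = sniff_bom(data)
    return data[bom_len:].decode(name or fallback)


# A "Unicode" (UTF-16 little-endian) file and a UTF-8 file, each with a BOM.
notepad_unicode = codecs.BOM_UTF16_LE + "Hi".encode("utf-16-le")
notepad_utf8 = codecs.BOM_UTF8 + "Hi".encode("utf-8")

print(sniff_bom(notepad_unicode))  # ('utf-16-le', 2)
print(sniff_bom(notepad_utf8))     # ('utf-8', 3)
print(decode_text(notepad_unicode), decode_text(notepad_utf8))  # Hi Hi
```

For UTF-8 specifically, Python's built-in "utf-8-sig" codec strips a leading BOM automatically.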
Unix text files
POSIX defines a printable file as a text file whose characters are all printable,
space or backspace characters according to the rules of the current locale. This
excludes control characters, which are not printable.[6]
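The printable-file rule can be approximated in a few lines of Python. This is a sketch under stated assumptions, not the POSIX locale machinery. Python's Unicode classification stands in for the locale's character classes: `str.isprintable()` plays the role of "printable" and `str.isspace()` the role of "space". As an assumption of this sketch, that means newline and tab are accepted as space characters. The function name `is_printable_file` is hypothetical.

```python
def is_printable_file(text: str) -> bool:
    """Approximate the POSIX notion of a printable file.

    Assumption: str.isprintable() approximates the locale's "printable"
    class and str.isspace() its "space" class (space, tab, newline, ...);
    backspace is allowed explicitly.
    """
    return all(ch.isprintable() or ch.isspace() or ch == "\b" for ch in text)


print(is_printable_file("line one\nline two\n"))  # True
print(is_printable_file("overstrike: a\b_\n"))    # True (backspace allowed)
print(is_printable_file("bell\x07\n"))            # False (BEL is a control character)
```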
Apple Macintosh text files
Because macOS is a certified Unix, it uses the POSIX format for text files.[8] The
Uniform Type Identifier (UTI) used for text files in macOS is "public.plain-text";
more specific UTIs are "public.utf8-plain-text" for UTF-8-encoded text,
"public.utf16-external-plain-text" and "public.utf16-plain-text" for UTF-16-encoded
text, and "com.apple.traditional-mac-plain-text" for classic Mac OS text files.[7]
Rendering
When a text file is opened in a text editor, its human-readable content, usually the
file's plain text, is presented to the user. Depending on the application, control
codes may either be acted upon by the editor as literal instructions or be rendered
as visible escape characters that can be edited as plain text. Even when a text file
contains plain text, control characters within it (especially the end-of-file
character) can hide that text from a particular viewing method.
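The two rendering behaviors can be contrasted with a small Python sketch. The first function makes control characters visible in caret notation (for example, 0x1A is shown as ^Z). The second mimics an older reader that treats the Ctrl-Z end-of-file character (0x1A) as the end of the text and so hides everything after it. Both function names are hypothetical, and the sample string is made up.

```python
def caret_notation(text: str) -> str:
    """Show C0 control characters and DEL as caret escapes (^A, ^Z, ^?).

    Newlines and tabs are kept as-is so the text keeps its layout.
    """
    out = []
    for ch in text:
        code = ord(ch)
        if ch in "\n\t":
            out.append(ch)
        elif code < 0x20:
            out.append("^" + chr(code + 0x40))  # e.g. 0x1A -> "^Z"
        elif code == 0x7F:
            out.append("^?")
        else:
            out.append(ch)
    return "".join(out)


def dos_style_view(text: str) -> str:
    """Mimic a reader that treats Ctrl-Z (0x1A) as end of file."""
    return text.split("\x1a", 1)[0]


sample = "visible\x1ahidden\n"
print(caret_notation(sample))  # visible^Zhidden
print(dos_style_view(sample))  # visible
```

The file's bytes are the same in both cases. Only the reader's treatment of the control character decides whether "hidden" is shown.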
See also
ASCII
EBCDIC
Filename extension
List of file formats
Newline
Text editor
Unicode
Notes and references
1. Lewis, John (2006). Computer Science Illuminated. Jones and Bartlett.
ISBN 0-7637-4149-3.
2. "Using Byte Order Marks". Internationalization for Windows Applications.
Microsoft. Retrieved 2015-12-15.
3. Freytag, Asmus (2015-12-18). "FAQ UTF-8, UTF-16, UTF-32 & BOM". The Unicode
Consortium. Retrieved 2016-05-30. "Yes, UTF-8 can contain a BOM. However, it makes no
difference as to the endianness of the byte stream. UTF-8 always has the same byte
order. An initial BOM is only used as a signature: an indication that an otherwise
unmarked text file is in UTF-8. Note that some recipients of UTF-8 encoded data do
not expect a BOM. Where UTF-8 is used transparently in 8-bit environments, the use of
a BOM will interfere with any protocol or file format that expects specific ASCII
characters at the beginning, such as the use of "#!" of at the beginning of Unix
shell scripts."
4. "3.397 Text File". IEEE Std 1003.1, 2013 Edition. IEEE Computer Society.
Retrieved 2015-12-15.
5. "3.206 Line". IEEE Std 1003.1, 2013 Edition. IEEE Computer Society.
Retrieved 2015-12-15.
6. "3.284 Printable File". IEEE Std 1003.1, 2013 Edition. IEEE Computer Society.
Retrieved 2015-12-15.
7. "System-Declared Uniform Type Identifiers". Guides and Sample Code. Apple Inc.
2009-11-17. Retrieved 2016-09-12.
8. "Designing Scripts for Cross-Platform Deployment". Mac Developer Library.
Apple Inc. 2014-03-10. Retrieved 2016-09-12.
External links