PDF/A ISO proposal


COMMITTEE DRAFT ISO/CD 19005-1

Date: 2004-05-10
Reference number: ISO/TC 171 / SC 2 N 264
Supersedes document:

WARNING: This document is not an International Standard. It is distributed for review and comment. It is subject to change
without notice and may not be referred to as an International Standard.

ISO/TC 171 / SC 2
Title: Document Management Applications, Application Issues
Secretariat: ANSI

Circulated to P- and O-members, and to technical committees and organizations in liaison for:

  discussion at [venue/date of meeting]
  comments by [date]
  approval for registration as a DIS in accordance with 2.5.6 of part 1 of the ISO/IEC Directives, by 2004-08-11 [date]
  (P-members vote only: ballot form attached)

P-members of the technical committee or subcommittee concerned have an obligation to vote.

English title
ISO/CD 19005-1, Document management - Electronic document file format for
long-term preservation - Part 1: Use of PDF 1.4 (PDF/A)

French title

Reference language version: English French Russian

Introductory note

This is a second 3-month CD ballot of the document that has been revised based upon
discussions at the New Orleans meeting.

FORM 7 (ISO) Page 1 of 1


Version 2001-07
© ISO 2004 — All rights reserved

ISO TC 171/SC 2 N
Date: 2004-05-6

ISO/CD 19005-1

ISO TC 171/SC 2/WG 5

Secretariat: ANSI

Document management — Electronic document file format for long-term
preservation — Part 1: Use of PDF 1.4 (PDF/A)

Warning

This document is not an ISO International Standard. It is distributed for review and comment. It is subject to
change without notice and may not be referred to as an International Standard.

Recipients of this draft are invited to submit, with their comments, notification of any relevant patent rights of
which they are aware and to provide supporting documentation.

Document type: International Standard


Document subtype:
Document stage: (30) Committee
Document language: E


Copyright notice
This ISO document is a working draft or committee draft and is copyright-protected by ISO. While the
reproduction of working drafts or committee drafts in any form for use by participants in the ISO standards
development process is permitted without prior permission from ISO, neither this document nor any extract
from it may be reproduced, stored or transmitted in any form for any other purpose without prior written
permission from ISO.

Requests for permission to reproduce this document for the purpose of selling it should be addressed as
shown below or to ISO's member body in the country of the requester:
[Indicate the full address, telephone number, fax number, telex number, and electronic mail address, as
appropriate, of the Copyright Manager of the ISO member body responsible for the secretariat of the TC or
SC within the framework of which the working document has been prepared.]

Reproduction for sales purposes may be subject to royalty payments or a licensing agreement.

Violators may be prosecuted.


Contents

Foreword
Introduction
1 Scope
2 Normative references
3 Terms and definitions
4 Notation
5 Conformance levels
5.1 Full conformance
5.2 Minimal conformance level
5.3 Conforming PDF/A readers
6 Technical requirements
6.1 File structure
6.1.1 General
6.1.2 File header
6.1.3 File trailer
6.1.4 Cross reference table
6.1.5 Document information dictionary
6.1.6 String objects
6.1.7 Stream objects
6.1.8 Indirect objects
6.1.9 Linearized PDF
6.1.10 Filters
6.1.11 Streams
6.1.12 Embedded files
6.1.13 Implementation limits
6.1.14 Optional content
6.2 Graphics
6.2.1 General
6.2.2 Output intent
6.2.3 Colour spaces
6.2.4 Images
6.2.5 Form XObjects
6.2.6 Reference XObjects
6.2.7 PostScript XObjects
6.2.8 Extended graphics state
6.2.9 Rendering intents
6.2.10 Content streams
6.3 Fonts
6.3.1 General
6.3.2 Font types
6.3.3 Composite fonts
6.3.4 Embedded font programs
6.3.5 Font subsets
6.3.6 Font metrics
6.3.7 Character encodings
6.3.8 Unicode character maps
6.4 Transparency
6.5 Annotations
6.5.1 General
6.5.2 Annotation types
6.5.3 Annotation dictionaries
6.6 Actions
6.6.1 General
6.6.2 Trigger events
6.6.3 Hypertext links
6.7 Metadata
6.7.1 General
6.7.2 Properties
6.7.3 Document information dictionary
6.7.4 Normalization
6.7.5 XMP header
6.7.6 File identifiers
6.7.7 File provenance information
6.7.8 Extension schemas
6.7.9 Validation
6.7.10 Font metadata
6.7.11 Version and conformance level identification
6.8 Logical structure
6.8.1 General
6.8.2 Tagged PDF
6.8.3 Artifacts
6.8.4 Natural language specification
6.8.5 Alternate descriptions
6.8.6 Non-textual annotations
6.8.7 Replacement text
6.8.8 Expansions of abbreviations and acronyms
6.9 Forms
Annex A (informative) PDF/A conformance summary
A.1 General
A.2 Operators
A.3 Objects and keys
Annex B (informative) Best practices for PDF/A
B.1 Use of non-XMP metadata
B.2 Natural language identifiers
B.3 Recommendations for Capturing or Converting Documents to PDF/A
Bibliography


Foreword
ISO (the International Organization for Standardization) is a worldwide federation of national standards bodies
(ISO member bodies). The work of preparing International Standards is normally carried out through ISO
technical committees. Each member body interested in a subject for which a technical committee has been
established has the right to be represented on that committee. International organizations, governmental and
non-governmental, in liaison with ISO, also take part in the work. ISO collaborates closely with the
International Electrotechnical Commission (IEC) on all matters of electrotechnical standardization.

International Standards are drafted in accordance with the rules given in the ISO/IEC Directives, Part 2.

The main task of technical committees is to prepare International Standards. Draft International Standards
adopted by the technical committees are circulated to the member bodies for voting. Publication as an
International Standard requires approval by at least 75 % of the member bodies casting a vote.

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent
rights. ISO shall not be held responsible for identifying any or all such patent rights.

ISO 19005-1 was prepared by Technical Committee ISO/TC 171, Document management applications,
Subcommittee SC 2, Application issues, consisting of representatives of ISO/TC 171/SC 2, Document
Management Applications, Application Issues; ISO/TC 130, Graphic Technology; ISO/TC 42, Photography;
and ISO/TC 46/SC 11, Information and documentation – Archives/records management.

ISO 19005 consists of the following parts, under the general title Document management — Electronic
document file format for long-term preservation:

• Part 1: Use of PDF 1.4 (PDF/A)


Introduction
PDF is a digital format for representing documents, whether they are created natively in PDF, converted from
other electronic formats, or digitised from paper or microform. Businesses, governments, libraries, archives,
and other institutions and individuals around the world use PDF to represent considerable bodies of important
information. Much of this information must be kept for substantial lengths of time; some must be kept
permanently. These PDF documents must remain useable and accessible across multiple generations of
technology. The future use of, and access to, these objects depend upon maintaining their visual
appearance as well as their higher-order properties, such as the logical organization of pages, sections, and
paragraphs, a machine-recoverable text stream in natural reading order, and a variety of administrative,
preservation, and descriptive metadata.

Adobe Systems Incorporated makes the PDF specification publicly available. However, the inclusive, feature-
rich nature of the format requires that additional constraints be placed on its use to make it suitable for the
long-term preservation of electronic documents. This International Standard specifies how to represent
unambiguously:

• The visual appearance of PDF documents

• The associated structural and semantic information that maps PDF components into more meaningful
concepts

These goals are accomplished by identifying the set of PDF components that may be used and restrictions on
the form of their use.

This International Standard should be used as one component of an organisation's electronic archival
environment for long-term retention of documents. Successful implementation of this Standard for archival
purposes depends upon:

• The retention requirements of an organisation's archival environment, records management policies and
procedures as specified in ISO 15489-1, Information and documentation – Records management – Part
1: General [8]

• Any additional requirements and conditions necessary to ensure the persistence of electronic documents
and their characteristics over time, including, but not limited to, those defined by:

  - ISO 14721, Space data and information transfer systems — Archival information system —
    Reference model [7]

  - ISO/TR 15801, Electronic imaging — Information stored electronically — Recommendations for
    trustworthiness and reliability [9]

  - ISO/CD TR 18492, Electronic imaging — Ensuring long-term access to digital information and
    images [11]

  - ISO/WD 18509-1, Electronic archival storage — Specifications relative to the design and operation of
    information processing systems in view of ensuring the storage and integrity on recordings stored in
    these systems — Part 1: Long term access strategy [12]

  - ISO/WD 18509-2, Electronic archival storage — Specifications relative to the design and operation of
    information processing systems in view of ensuring the storage and integrity on recordings stored in
    these systems — Part 2: Technical specifications [13]


• Quality assurance processes necessary to verify conformance with applicable requirements and
conditions; for example, an inspection regime to verify the quality and integrity of converted source data

It is important to note that the goal of PDF/A is to ensure that future renderings of a given PDF/A file will match
the rendering that was available at the time the PDF/A file was created. The standard does not specify how to
create a PDF/A file, i.e. the standard does not aim to ensure that a PDF/A file is an accurate representation of
any source materials that have been used in the creation of the PDF/A file, although many workflows may be
expected to do so. Therefore it is important that PDF/A files be checked for correct visual appearance when
created.

This International Standard should lead to the development of various applications that read, render, write,
and validate conforming PDF objects. Different applications will incorporate various capabilities to prepare,
interpret, and process conforming objects based on needs as perceived by the suppliers of those applications.
However, it is important to note that a conforming application must be able to read and process appropriately
all files complying with a specified conformance level.

This document has been created as Part 1 of ISO 19005 to allow future parts to be created that provide
compatibility with future versions of the underlying PDF specification without obsoleting this document or
applications based on PDF Version 1.4.

The International Organization for Standardization (ISO) draws attention to the fact that it is claimed that
compliance with this document may involve the use of patents concerning technology disclosed in the Adobe
PDF specification. ISO takes no position concerning the evidence, validity and scope of these patent rights.
Adobe Systems Incorporated has made a statement concerning these patent claims in the PDF Reference
and its associated documentation.

Attention is drawn to the possibility that some of the elements of this document may be the subject of patent
rights other than those identified above. ISO shall not be held responsible for identifying any or all such patent
rights.

NPES and AIIM (accredited standards developing organizations) maintain an ongoing series of application
notes for guiding developers and users of this ISO standard. These application notes are available at
<http://www.npes.org/pdfa/app-notes> and <http://www.aiim.org/pdfa/app-notes>. Both NPES and AIIM will
also retain copies of the specific non-ISO normative references of this International Standard that are publicly
available electronic documents.

COMMITTEE DRAFT ISO/CD 19005-1

Document management — Electronic document file format for
long-term preservation — Part 1: Use of PDF 1.4 (PDF/A)

1 Scope
This International Standard specifies how to use the Portable Document Format (PDF) 1.4 for long-term
preservation of electronic documents. It is applicable to documents containing combinations of character,
raster, and vector data.

This International Standard does not apply to:

• Converting paper or electronic documents to the PDF/A format

• Specific technical design, user interface, implementation, or operational details of rendering

• Specific physical methods of storing these documents such as media and storage conditions

• Required computer hardware and/or operating systems

2 Normative references
The following referenced documents are indispensable for the application of this document. For dated
references, only the edition cited applies. For undated references, the latest edition of the referenced
document (including any amendments) applies.

NOTE 1 AIIM and NPES (accredited standards developing organizations) maintain copies of the non-ISO references
that are publicly available electronic documents.

Date and Time Formats, W3C Note. Available from Internet <http://www.w3.org/TR/NOTE-datetime>

Errata for PDF Reference, third edition, 18 June 2003. Available from Internet
<http://partners.adobe.com/asn/acrobat/docs/PDF14errata.txt>

Extensible Markup Language (XML) 1.0 (Third Edition), W3C Recommendation, 4 February 2004. Available
from Internet <http://www.w3.org/TR/2004/REC-xml-20040204>

ICC.1:1998-09, File Format for Color Profiles, International Color Consortium. Available from Internet
<http://www.color.org/ICC-1_1998-09.PDF>

ICC.1A:1999-04, Addendum 2 to Spec. ICC.1:1998-09, International Color Consortium. Available from
Internet <http://www.color.org/ICC-1A_1999-04.PDF>

ISO/IEC 10646-1, Information technology – Universal Multiple-Octet Coded Character Set (UCS) – Part 1:
Architecture and Basic Multilingual Plane.

NOTE 2 The character code values defined in ISO/IEC 10646-1 are equivalent to those of Unicode [15].


PDF Reference: Adobe Portable Document Format, Version 1.4, Adobe Systems Incorporated – 3rd ed.
(ISBN 0-201-75839-3). Available from Internet
<http://partners.adobe.com/asn/acrobat/docs/File_Format_Specifications/PDFReference.pdf>

RDF/XML Syntax Specification (Revised), W3C Recommendation, 10 February 2004. Available from Internet
<http://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210/>

Tags for the Identification of Languages, RFC 1766, March 1995. Available from Internet
<http://www.ietf.org/rfc/rfc1766.txt>

XMP Specification, January 2004, Adobe Systems Incorporated. Available from Internet
<http://partners.adobe.com/asn/tech/xmp/pdf/xmpspecification.pdf>

3 Terms and definitions


For the purposes of this document, the following terms and definitions apply.

3.1
conformance level
identified set of restrictions and requirements to which files and readers must comply [ISO 15930-4]

3.2
cross reference table
PDF data structure that contains the byte offset of the start of each indirect object within the file

3.3
dictionary
associative table containing key-value pairs, specifying the name and value of an attribute for objects, which is
generally used to collect and tie together the attributes of a complex object [ISO 15930-4]

3.4
electronic document
electronic representation of a page-oriented aggregation of text and graphic data, and metadata useful to
identify, understand, and render that data, that can be reproduced on paper or optical microform without
significant loss of its information content

3.5
end-of-file marker
five character sequence %%EOF marking the end of a PDF file

3.6
end-of-line marker
EOL marker
one or two character sequence marking the end of a line of text, consisting of a CARRIAGE RETURN
character (U+000D) or a LINE FEED character (U+000A) or a CARRIAGE RETURN followed immediately by
a LINE FEED

3.7
font
identified collection of graphics that may be glyphs or other graphic elements [ISO 15930-4]

3.8
glyph
recognizable abstract graphic symbol that is independent of any specific design [ISO/IEC 9541-1]

3.9
interactive reader
reader that requires or allows human interaction during the software's processing phase


NOTE A pre-flight tool is an example of an interactive reader; a raster image processor is an example of a reader that
is not interactive.

3.10
long-term
period of time long enough for there to be concern about the impacts of changing technologies, including
support for new media and data formats, and of a changing user community, on the information being held in
a repository, which may extend into the indefinite future [ISO 14721]

3.11
PDF
Portable Document Format
file format defined in the PDF Reference and its Errata [ISO 15930-4]

3.12
reader
software application that is able to read and process files appropriately [ISO 15930-4]

3.13
string object
sequence of characters enclosed within parentheses or a sequence of hexadecimal data enclosed within
angle brackets

3.14
stream data
sequence of bytes delimited by the stream and endstream keywords

3.15
writer
software application that is able to write files [ISO 15930-4]

3.16
XMP packet
structured wrapper for serialized XMP metadata that can be embedded in a wide variety of file formats
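
EXAMPLE The following Python sketch is an informal illustration of definitions 3.5 and 3.6 and is not part of the draft's requirements. The names `split_lines` and `ends_with_eof_marker` are hypothetical helpers introduced here, and the tolerance for EOL markers after the end-of-file marker is an assumption of the sketch, not a rule taken from this document.

```python
import re
from typing import List

# EOL marker per 3.6: CARRIAGE RETURN followed by LINE FEED, a lone
# CARRIAGE RETURN, or a lone LINE FEED. The two-character form is listed
# first so that a CR LF pair is consumed as a single marker.
EOL_MARKER = re.compile(rb"\r\n|\r|\n")

# End-of-file marker per 3.5: the five character sequence %%EOF.
EOF_MARKER = b"%%EOF"


def split_lines(data: bytes) -> List[bytes]:
    """Split a byte sequence at every EOL marker (hypothetical helper)."""
    return EOL_MARKER.split(data)


def ends_with_eof_marker(data: bytes) -> bool:
    """Return True if the last non-empty line is the end-of-file marker.

    Assumption of this sketch: EOL markers after %%EOF are tolerated.
    """
    lines = [line for line in split_lines(data) if line]
    return bool(lines) and lines[-1] == EOF_MARKER
```

Working on bytes rather than decoded text keeps the sketch independent of any character encoding, since both definitions are stated in terms of specific character codes.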

4 Notation
PDF operators, PDF keywords, the names of keys in PDF dictionaries, and other predefined names are
written in bold sans serif font; operands of PDF operators or values of dictionary keys are written in italic sans
serif font. For example: the Default value for the TR2 key.

Individual characters may be identified by their ISO/IEC 10646-1 character name written in uppercase in bold
sans serif font followed by a parenthetic four digit hexadecimal character code value with the prefix "U+". For
example: CARRIAGE RETURN (U+000D).

For the purposes of this part of ISO 19005, references to the "PDF Reference" are to PDF Reference: Adobe
Portable Document Format as amended by Errata for PDF Reference, as identified in Clause 2.

5 Conformance levels

5.1 Full conformance

A PDF/A file shall adhere to all requirements of PDF Reference, as modified by this part of ISO 19005. A
conforming PDF/A file may include any valid PDF 1.4 feature that is not explicitly forbidden by this part of ISO
19005. Features described in PDF specifications prior to Version 1.4 that are not explicitly documented in the
PDF Reference should not be used. A PDF file meeting all of these requirements is said to be a "fully
conforming PDF/A file" or a PDF file that meets the "PDF/A full conformance level."

NOTE A conforming PDF/A file is not obligated to use any PDF feature other than those explicitly required by PDF
Reference or this part of ISO 19005.

5.2 Minimal conformance level

In recognition of the varying preservation needs of the diverse user communities making use of PDF files, this
part of ISO 19005 also defines a second conformance level identified as "minimally conforming." A file
meeting this conformance level is referred to as a "minimally conforming PDF/A file" or a PDF file that meets
the "PDF/A minimal conformance level." For such files, complete conformance with 6.3.8 and 6.8 is not
required. However, minimally conforming documents may meet some of the requirements defined in those
two clauses.

NOTE 1 The minimal conformance requirements are intended to ensure that the rendered visual appearance of a
PDF/A document is preservable over the long-term, but minimally conforming files might not have sufficiently rich internal
information to allow for the preservation of the document's logical structure and content text stream in natural reading
order, which is provided by full conformance. The requirements for full conformance place greater burdens on PDF/A
writers but these requirements allow for a higher level of document preservation service and confidence over time.
Additionally, full conformance facilitates the accessibility of PDF/A documents for physically impaired users.

NOTE 2 The proper mechanism by which a file can presumptively identify itself as being a PDF/A file of given
conformance level is described in 6.7.11.

5.3 Conforming PDF/A readers

A conforming PDF/A reader shall comply with all requirements regarding reader functional behaviour specified
in this part of ISO 19005. The requirements of this part of ISO 19005 with respect to reader behaviour are
stated in terms of general functional requirements applicable to all conforming readers. This part of ISO
19005 does not prescribe any specific technical design, user interface, or implementation details of
conforming readers.

The rendering of conforming PDF/A files shall be performed as defined in the PDF Reference subject to the
further requirements specified by this part of ISO 19005. Features described in PDF specifications prior to
Version 1.4 that are not explicitly documented in the PDF Reference may be ignored by conforming readers.

6 Technical requirements

6.1 File structure

6.1.1 General

6.1.2 through 6.1.14 address overall file format issues and the base elements that form the general structure
of a PDF/A file.

6.1.2 File header

The ‘%’ character of the file header shall occur at byte offset 0 of the file.

The file header line shall be immediately followed by a comment containing at least four characters, each of
whose encoded byte values shall be greater than 127.

NOTE The presence of character byte values greater than 127 near the beginning of a file is used by various
software tools and protocols to classify the file as containing arbitrary binary data that should be preserved during
processing.
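
The header requirements of 6.1.2 can be illustrated with a small check. The following is a minimal sketch only, not a normative test: the function name check_pdfa_header is hypothetical, only the first 1024 bytes are examined, and the comment is accepted if it holds at least four bytes above 127 (a stricter reading might require every byte of the comment to exceed 127).

```python
def check_pdfa_header(data: bytes) -> list:
    """Return a list of problems found in the first two lines of a PDF file."""
    problems = []
    # The '%' of the header must be the very first byte of the file.
    if not data.startswith(b"%PDF-"):
        problems.append("'%' of the file header is not at byte offset 0")
        return problems
    # bytes.splitlines() splits on CR, LF, and CR LF.
    lines = data[:1024].splitlines()
    if len(lines) < 2 or not lines[1].startswith(b"%"):
        problems.append("header line is not immediately followed by a comment")
        return problems
    # Count the bytes of the comment (after its '%') whose value exceeds 127.
    high = sum(1 for b in lines[1][1:] if b > 127)
    if high < 4:
        problems.append("comment has fewer than four bytes above 127")
    return problems
```

An empty list means that no problem was detected by this sketch; it does not establish conformance.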


6.1.3 File trailer

The file trailer dictionary shall contain the items listed in Table 1. The keyword Encrypt shall not be used in
the trailer dictionary. No data shall follow the last end-of-file marker except a single optional end-of-line
marker.

NOTE The explicit prohibition of the Encrypt keyword has the implicit effect of disallowing encryption and password-
protected access permissions.

Table 1 — Trailer dictionary entries

Key     Type       Value

Size    integer    Total number of entries in the cross reference table
ID      array      Array of two hexadecimal strings specifying file identifiers
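
As an illustration of the rule that no data follows the last end-of-file marker, the sketch below inspects the bytes after the final %%EOF. The function name check_trailer_tail is hypothetical, and the sketch assumes the whole file is available in memory as bytes; it does not parse the trailer dictionary itself.

```python
def check_trailer_tail(data: bytes) -> list:
    """Report data found after the last end-of-file marker."""
    marker = b"%%EOF"
    pos = data.rfind(marker)
    if pos < 0:
        return ["no end-of-file marker found"]
    tail = data[pos + len(marker):]
    # Only nothing at all, or one EOL marker (CR, LF, or CR LF), may follow.
    if tail not in (b"", b"\r", b"\n", b"\r\n"):
        return ["data follows the last end-of-file marker"]
    return []
```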

6.1.4 Cross reference table

In a cross reference subsection header the starting object number and the range shall be separated by a
single SPACE character (U+0020).

The xref keyword and the cross reference subsection header shall be separated by a single EOL marker.
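
The two separator rules above lend themselves to a pattern match. The following sketch is illustrative only: the name xref_header_ok is hypothetical, and it examines only the first subsection header that follows the xref keyword.

```python
import re

# "xref", exactly one EOL marker (CR LF, CR, or LF), then the starting object
# number and the range separated by exactly one SPACE, then an EOL marker.
XREF_START = re.compile(rb"xref(?:\r\n|\r|\n)\d+ \d+(?=[\r\n])")

def xref_header_ok(section: bytes) -> bool:
    """True if the bytes starting at the xref keyword follow both rules."""
    return XREF_START.match(section) is not None
```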

6.1.5 Document information dictionary

Requirements regarding the synchronization of document information dictionary elements with analogous
XMP metadata properties are presented in 6.7.3.

6.1.6 String objects

Literal strings that are broken across lines shall contain a BACKSLASH character (U+005C) immediately
before any EOL markers.

Hexadecimal strings shall contain an even number of characters, each in the range 0 to 9, A to F, or a to f.
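
The hexadecimal string rule might be checked as follows. This is a sketch under the assumption that the token, including its angle-bracket delimiters, has already been isolated by a tokenizer; the name hex_string_ok is hypothetical.

```python
import re

HEX_BODY = re.compile(rb"[0-9A-Fa-f]*")

def hex_string_ok(token: bytes) -> bool:
    """True if the body between '<' and '>' is an even count of hex digits."""
    if len(token) < 2 or not (token.startswith(b"<") and token.endswith(b">")):
        return False
    body = token[1:-1]
    # White space inside the body is rejected, since only hex digits are allowed.
    return len(body) % 2 == 0 and HEX_BODY.fullmatch(body) is not None
```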

6.1.7 Stream objects

The endstream keyword shall be preceded by an EOL marker.

The value of the Length key specified in the stream dictionary shall match the number of bytes in the file
following the EOL marker after the stream keyword and preceding the EOL marker before the endstream
keyword.

NOTE These requirements remove potential ambiguity regarding the ending of stream content.
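
A possible way to test the Length requirement is sketched below. The function name stream_length_ok is hypothetical, the bytes of a single stream object are assumed to be already extracted, and the declared Length is passed in rather than parsed. Where the data ends in CR LF the sketch accepts either reading of the final EOL marker, which is an assumption rather than something this clause states.

```python
import re

def stream_length_ok(obj: bytes, declared_length: int) -> bool:
    """Check a stream object's Length value against its actual byte count."""
    start = re.search(rb"stream(?:\r\n|\n)", obj)
    end = obj.rfind(b"endstream")
    if start is None or end < start.end():
        return False
    # The endstream keyword must be preceded by an EOL marker.
    if obj[end - 2:end] == b"\r\n":
        # Either CR LF is the EOL, or CR is the last data byte and LF the EOL.
        eol_sizes = (2, 1)
    elif obj[end - 1:end] in (b"\r", b"\n"):
        eol_sizes = (1,)
    else:
        return False
    data_start = start.end()
    return any(end - size - data_start == declared_length for size in eol_sizes)
```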

6.1.8 Indirect objects

The object number and generation number shall be separated by a single white-space character. The
generation number and obj keyword shall be separated by a single white-space character.

The object number and endobj keyword shall each be preceded by an EOL marker. The obj and endobj
keywords shall each be followed by an EOL marker.


6.1.9 Linearized PDF

Linearization shall be permitted but any linearization information supplied within a file should be ignored by a
conforming PDF/A reader.

6.1.10 Filters

The LZWDecode filter shall not be permitted.

NOTE The use of the LZW decompression algorithm is subject to intellectual property constraints.

6.1.11 Streams

A stream object dictionary shall not contain the F, FFilter, or FDecodeParams keys.

NOTE These keys are used to point to document content external to the file.

6.1.12 Embedded files

A file specification dictionary, as defined in PDF Reference, 3.10.2, shall not contain the EF key. A document’s
name dictionary, as defined in PDF Reference, 3.6.3, shall not contain the EmbeddedFiles key.

NOTE These keys are used to encapsulate files containing arbitrary content within a PDF file.

6.1.13 Implementation limits

A conforming PDF/A file shall not violate any of the architectural limits specified in PDF Reference, Table C.1.

NOTE By complying with these limits, a PDF/A file is compatible with the widest possible range of readers.

6.1.14 Optional content

The document catalog dictionary shall not contain a key with the name OCProperties.

NOTE This key is used in PDF 1.5 to specify optional content that generates alternative renderings of a document.

6.2 Graphics

6.2.1 General

6.2.2 through 6.2.10 describe restrictions placed on both conforming PDF files and readers. They are intended
to address graphical rendering issues that do not involve fonts and interactive elements.

6.2.2 Output intent

A PDF/A file may specify the colour characteristics of the device on which it is intended to be rendered by
specifying a PDF/A output intent dictionary, as defined by PDF Reference, 9.10.4, in the file’s OutputIntents
array. If the PDF/A file is not also a PDF/X file then the value of the output intent dictionary’s S key shall be
GTS_PDFA; if the file is also a PDF/X file then the value of the S key shall be GTS_PDFX. Except for the
value of the S key, a PDF/A output intent dictionary shall conform to all requirements of a PDF/X output
dictionary, as defined by PDF Reference, Table 9.46, and shall include the DestOutputProfile key with a
valid ICC profile stream as its value.

If the OutputIntents array contains more than one entry then all entries that contain a DestOutputProfile key
shall have as the value of that key a single indirect object that shall be a valid ICC profile stream.


6.2.3 Colour spaces

6.2.3.1 General

All colours shall be specified in a device-independent manner, either directly by the use of a device
independent colour space, or indirectly by the use of an OutputIntent. A conforming file may use any colour
space specified in the PDF Reference, except as restricted in 6.2.3.2 through 6.2.3.4.

NOTE 1 Predictable rendering behaviour is not possible with the use of device-dependent colour specifications.

NOTE 2 For uses of PDF/A files for colour-critical purposes the additional requirements regarding colour defined by
ISO 15930-4 (PDF/X-1a) [10] are appropriate.

6.2.3.2 ICC Based colour spaces

Any ICCBased colour space shall be embedded and shall conform to ICC specification [ICC.1:1998-09] and
its addendum [ICC.1A:1999-04].

A conforming reader shall render ICCBased colour spaces as specified by the ICC specification, and shall not
use the Alternate colour space specified in an ICC profile stream dictionary.

6.2.3.3 Uncalibrated colour spaces

A conforming file may use either the DeviceRGB or DeviceCMYK colour space but shall not use both. If an
uncalibrated colour space is used in a file then that file shall contain a PDF/A output intent as defined in 6.2.2.

When rendering a DeviceGray colour specification in a document whose OutputIntent is an RGB profile, a
conforming reader shall convert the DeviceGray colour specification to RGB by the method described in PDF
Reference, 6.2.1.

When rendering a DeviceGray colour specification in a document whose OutputIntent is a CMYK profile, a
conforming reader shall convert the DeviceGray colour specification to DeviceCMYK by the method
described in PDF Reference, 6.2.2.

When rendering colours specified in a device-dependent colour space a conforming reader shall use the file’s
PDF/A output intent dictionary, as defined in 6.2.2, as the source colour space.

6.2.3.4 Named colorants in Separation and DeviceN colour spaces

A conforming reader shall follow the following rules when rendering colour spaces based on DeviceN or
Separation colour spaces:

- If the named colorants in the colour space are all from the list Cyan, Magenta, Yellow, Black, and the
document's OutputIntent is a CMYK profile, then the colorants shall be treated as components of the
colour space specified by the PDF/A output intent dictionary, as defined in 6.2.2, and the alternate colour
space shall not be used

- In all other cases the Alternate colour space shall be used
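
The two rules above amount to a simple decision. The sketch below shows one way a reader might express it; the function name colour_source and its return labels are hypothetical, and the colorant names are assumed to be available as plain strings.

```python
PROCESS_COLORANTS = {"Cyan", "Magenta", "Yellow", "Black"}

def colour_source(colorants, output_intent_is_cmyk: bool) -> str:
    """Decide which colour space a reader renders a Separation/DeviceN space with."""
    names = set(colorants)
    # First rule: every colorant is a process colorant and the intent is CMYK.
    if names and names <= PROCESS_COLORANTS and output_intent_is_cmyk:
        return "output-intent"
    # Second rule: in all other cases the Alternate colour space is used.
    return "alternate"
```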

6.2.4 Images

An Image dictionary shall not contain the Alternates key or the OPI key.

If an Image dictionary contains the Interpolate key, its value shall be false.

Use of the Intent key shall conform to the rules in 6.2.9.


6.2.5 Form XObjects

A form XObject dictionary shall not contain any of the following:

- the OPI key

- the Subtype2 key with a value of PS

- the PS key

NOTE In earlier versions of PDF the Subtype2 key with a value of PS and the PS key were used to define arbitrary
executable PostScript code streams, which have the potential to interfere with reliable and predictable rendering.

6.2.6 Reference XObjects

A conforming file shall not contain any reference XObjects.

NOTE Reference XObjects import arbitrary document content from external PDF files, creating external
dependencies that complicate preservation efforts.

6.2.7 PostScript XObjects

A conforming file shall not contain any PostScript XObjects.

NOTE PostScript XObjects contain arbitrary executable PostScript code streams, which have the potential to
interfere with reliable and predictable rendering.

6.2.8 Extended graphics state

An ExtGState dictionary shall not contain the TR key. An ExtGState dictionary shall not contain the TR2 key
with a value other than Default. A conforming reader may ignore any instance of the HT key in an ExtGState
dictionary.

Use of the RI key shall be governed by 6.2.9.

6.2.9 Rendering intents

Where a rendering intent is specified its value shall be one of the four values defined in the PDF Reference:
RelativeColorimetric, AbsoluteColorimetric, Perceptual, or Saturation.

NOTE The default rendering intent is RelativeColorimetric.

6.2.10 Content streams

A content stream shall not contain any operators not documented in the PDF Reference, even if such
operators are bracketed by the BX/EX compatibility operators.

NOTE Content streams are used for page descriptions, for example, the Contents stream of a page object or the
stream of a form XObject, as well as for the appearance stream of annotations, including form fields or Widget annotations.

6.3 Fonts

6.3.1 General

The intent of the requirements in 6.3.2 through 6.3.8 is to ensure that future rendering of the textual
content of a PDF/A file matches, on a glyph by glyph basis, the static appearance of the file as originally
created and to allow the recovery of semantic properties for each character of the textual content.


6.3.2 Font types

All fonts used in a PDF/A file shall conform to the font specifications as defined in PDF Reference 5.5.

NOTE It is the responsibility of the file writer to ensure the conformance of all fonts. This part of ISO 19005 does not
prescribe the manner in which conformance is determined.

6.3.3 Composite fonts

6.3.3.1 General

For any given composite (Type 0) font referenced within a PDF/A file, the CIDSystemInfo entries of its
CIDFont and CMap dictionaries shall be compatible, as described in PDF Reference, 5.6.2; in other words,
the Registry and Ordering strings of the CIDSystemInfo dictionaries for that font shall be identical, unless the
value of the CMap dictionary UseCMap key is Identity-H or Identity-V.

6.3.3.2 CIDFonts

For all Type 2 CIDFonts, the CIDFont dictionary shall contain a CIDToGIDMap entry that shall be a stream
mapping from CIDs to glyph indices or the name Identity, as described in PDF Reference, Table 5.13.

6.3.3.3 CMaps

The integer value of the WMode entry in a CMap dictionary shall be identical to the WMode value in the
embedded CMap stream.

6.3.4 Embedded font programs

All Type 0, Type 1, and TrueType fonts, including any of the 14 standard Type 1 fonts, used within the
Contents stream of a page object, the stream of a form XObject, or the appearance stream of an annotation,
including form fields, of a PDF/A file shall be embedded within that file except when the fonts are used
exclusively with text rendering mode 3.

NOTE 1 As discussed in PDF Reference, 5.2.5, text rendering mode 3 specifies that glyphs are not stroked, filled, or
used as a clipping boundary. A font referenced for use solely by such text in this mode is therefore not rendered and is
thus exempt from the embedding requirement.

Only fonts that are legally embeddable in a file for unlimited, universal rendering shall be used.

All PDF/A conforming readers shall use the embedded fonts, rather than other locally resident, substituted, or
simulated fonts, for rendering.

NOTE 2 The requirements for font program metadata are described in 6.7.10.

NOTE 3 Only fonts whose characters are referenced within a file are embedded in that file. Furthermore, as stated in
6.3.5, font subsets are acceptable as long as the embedded font programs provide glyph definitions for all characters
referenced within the file. Embedding the font programs allows any PDF/A conforming reader to reproduce correctly all
glyphs in the manner in which they were originally published without reference to possibly ephemeral external resources.
By definition, Type 3 fonts always include an embedded font program in the form of per-glyph streams of PDF graphics
operators that paint the glyphs.

NOTE 4 This part of ISO 19005 precludes the embedding of fonts whose legality depends upon special agreement with
the font copyright holder. Such an allowance places unacceptable burdens on an archive to verify the existence, validity,
and longevity of such claims.


6.3.5 Font subsets

As stated in 6.3.4, embedded font programs shall define all font glyphs referenced for rendering within a
PDF/A file. Type 0 CIDFont and Type 1 and TrueType font subsets, as described in PDF Reference, 5.5.3,
may be used as long as the embedded font programs define all appropriate glyphs.

For all Type 1 font subsets referenced within a PDF/A file, the font descriptor dictionary shall include a
CharSet string listing the character names defined in the font subset, as described in PDF Reference, Table
5.18.

For all CIDFont subsets referenced within a PDF/A file, the font descriptor dictionary shall include a CIDSet
stream identifying which CIDs are present in the embedded CIDFont file, as described in PDF Reference,
Table 5.20.

NOTE The use of font subsets allows a potentially substantial reduction in the size of PDF/A files.

6.3.6 Font metrics

For all embedded fonts, a conforming PDF/A reader shall use the font metrics specified inside the embedded
font program and shall ignore the metrics given in the required Widths entry of the font dictionary.

6.3.7 Character encodings

All non-symbolic TrueType fonts shall specify MacRomanEncoding or WinAnsiEncoding as the value of the
Encoding entry in the font dictionary. All symbolic TrueType fonts shall not specify an Encoding entry in the
font dictionary, and their font programs' “cmap” tables shall contain exactly one encoding.

NOTE This requirement makes normative the suggested guidelines described in PDF Reference, Section 5.5.5.

6.3.8 Unicode character maps

NOTE 1 This sub-clause is applicable only for files meeting the full conformance level of this part of ISO 19005. For
minimal conformance the requirements of this sub-clause can be ignored.

The font dictionary shall include a ToUnicode entry whose value is a CMap stream object that maps
character codes to Unicode values [15], as described in PDF Reference, Section 5.9.

Fonts meeting any of the following three conditions shall be exempted from this requirement:

1) Fonts that use the predefined encodings MacRomanEncoding, MacExpertEncoding, or
WinAnsiEncoding, or that use the predefined Identity-H or Identity-V CMaps

2) Type 1 fonts whose character names are taken from the Adobe standard Latin character set or the
set of named characters in the Symbol font, as defined in PDF Reference, Appendix D

3) Type 0 fonts whose descendent CIDFont uses the Adobe-GB1, Adobe-CNS1, Adobe-Japan1, or
Adobe-Korea1 character collections

NOTE 2 The Unicode mapping allows the retrieval of semantic properties about every character referenced in the file.

6.4 Transparency

The SMask key shall not be used in an ExtGState object or in an Image XObject with any value other than
None.

A Group object with an S key with a value of Transparency shall not be included in a form XObject.

The following keys, if present in an ExtGState object, shall have the values shown:

- BM    Normal or Compatible

- CA    1.0

- ca    1.0

NOTE These provisions prohibit the use of transparency within a conforming PDF/A file. The visual effect of partially
transparent graphics can be achieved using techniques other than the use of the PDF 1.4 transparency keys, including
pre-rendered data or flattened vector objects. The use of such techniques does not prevent a file from being PDF/A
conformant.
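
The ExtGState restrictions of this clause can be sketched as a dictionary check. The representation used here is an assumption: the graphics state dictionary is modelled as a Python dict whose PDF names are plain strings, and the function name check_extgstate is hypothetical. Absent keys are treated as acceptable, since the clause constrains the keys only if present.

```python
def check_extgstate(gs: dict) -> list:
    """Return the transparency-related problems found in an ExtGState dict."""
    problems = []
    # SMask may appear only with the value None.
    if gs.get("SMask", "None") != "None":
        problems.append("SMask must be None")
    # The blend mode, if present, must be Normal or Compatible.
    if gs.get("BM", "Normal") not in ("Normal", "Compatible"):
        problems.append("BM must be Normal or Compatible")
    # Stroking (CA) and non-stroking (ca) alpha, if present, must be 1.0.
    for key in ("CA", "ca"):
        if float(gs.get(key, 1.0)) != 1.0:
            problems.append(key + " must be 1.0")
    return problems
```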

6.5 Annotations

6.5.1 General

In addition to the rendering behaviour defined by the PDF Reference, as modified by this part of ISO 19005,
conforming PDF/A interactive readers shall provide a mechanism to display the values of the Contents key of
annotation dictionaries.

NOTE This part of ISO 19005 does not prescribe the specific behaviour or technical implementation details that
interactive readers may use to implement this functional requirement.

6.5.2 Annotation types

Annotation types not defined in the PDF Reference shall not be permitted. Additionally, the FileAttachment,
Sound, and Movie types shall not be permitted.

NOTE Support for multimedia content is outside the scope of this part of ISO 19005.

6.5.3 Annotation dictionaries

An annotation dictionary shall not contain the CA key with a value other than 1.0.

An annotation dictionary shall contain the F key. The F key's Print flag shall be set, and none of the following
flags shall be set:

- Hidden

- Invisible

- NoView

Text annotations should set the NoZoom and NoRotate flags of the F key.

NOTE 1 The restrictions on annotation flags prevent the use of annotations that are hidden or that are viewable but not
printable. The NoZoom and NoRotate flags are permitted, which allows the use of annotation types that have the same
behaviour as the commonly-used text annotation type. By definition, text annotations exhibit the NoZoom and NoRotate
behaviour even if the flags are not set, as described in PDF Reference, 8.4.5; explicitly setting these flags removes any
potential ambiguity between the annotation dictionary settings and reader behaviour.
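
The flag restrictions can be expressed as a bit test on the integer value of the F key. The sketch below uses the bit positions given for annotation flags in the PDF Reference (bit 1 Invisible, bit 2 Hidden, bit 3 Print, bit 4 NoZoom, bit 5 NoRotate, bit 6 NoView); the constant and function names are hypothetical.

```python
# Annotation flag bits; bit position 1 is the low-order bit.
INVISIBLE = 1 << 0
HIDDEN = 1 << 1
PRINT = 1 << 2
NO_ZOOM = 1 << 3
NO_ROTATE = 1 << 4
NO_VIEW = 1 << 5

def annotation_flags_ok(f: int) -> bool:
    """True if Print is set and Hidden, Invisible, and NoView are all clear."""
    return bool(f & PRINT) and not (f & (INVISIBLE | HIDDEN | NO_VIEW))
```

NoZoom and NoRotate do not affect the result, in keeping with NOTE 1.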

An annotation dictionary shall not contain the C array or the IC array unless the colour space of the
DestOutputProfile in the PDF/A output intent dictionary defined in 6.2.2 is RGB.

NOTE 2 These provisions ensure that the device colour spaces used in annotations by mechanisms other than
an appearance stream are indirectly defined by means of the PDF/A output intent.

If an annotation dictionary contains the AP key, the appearance dictionary that it defines as its value shall
contain only the N key, whose value shall be a stream defining the appearance of the annotation.


NOTE 3 All of the provisions of this sub-clause apply to all annotation types, including the Widget type used for form
fields.

6.6 Actions

6.6.1 General

The Launch, Sound, Movie, ResetForm, ImportData, and JavaScript actions shall not be permitted.
Additionally, the deprecated set-state and no-op actions shall not be permitted. Named actions other than
NextPage, PrevPage, FirstPage, and LastPage shall not be permitted. In response to each of the four
allowed named actions, conforming PDF/A interactive readers shall perform the appropriate action described
in PDF Reference, Table 8.45.

NOTE Support for multimedia content is outside the scope of this part of ISO 19005. The ResetForm action
changes the rendered appearance of a form. The ImportData action imports form data from an external file. JavaScript
actions permit arbitrary executable code that has the potential to interfere with reliable and predictable rendering.
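
A check of action dictionaries against 6.6.1 might look like the sketch below. It assumes an action dictionary modelled as a Python dict with the action type under S and, for named actions, the name under N; the function name action_permitted is hypothetical. The deprecated set-state and no-op actions are deliberately not covered, because this clause does not give the type names under which they appear in a file.

```python
FORBIDDEN_ACTIONS = {"Launch", "Sound", "Movie", "ResetForm", "ImportData", "JavaScript"}
ALLOWED_NAMED = {"NextPage", "PrevPage", "FirstPage", "LastPage"}

def action_permitted(action: dict) -> bool:
    """True if the action type is not among those prohibited by name."""
    kind = action.get("S")
    if kind in FORBIDDEN_ACTIONS:
        return False
    if kind == "Named":
        # Only the four navigation actions are allowed as named actions.
        return action.get("N") in ALLOWED_NAMED
    return True
```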

6.6.2 Trigger events

An interactive form field shall not include an AA entry for an additional-actions dictionary. The document
catalog shall not include an AA entry for an additional-actions dictionary.

NOTE These additional-actions dictionaries define arbitrary JavaScript actions.

6.6.3 Hypertext links

Conforming PDF/A interactive readers may choose to make hyperlinks non-actionable, but in addition to the
rendering behaviour defined by the PDF Reference, as modified by this part of ISO 19005, they shall provide
a mechanism to display the F and D keys of a GoToR action dictionary, the URI key of a URI action
dictionary, and the F key of a SubmitForm action dictionary.

NOTE Since hyperlinks transfer the thread of execution outside the control of an interactive reader, this clause allows
an interactive reader to choose to make them not actionable. For purposes of archival disclosure of the complete
information content of PDF/A documents it is important for interactive readers to provide some mechanism to expose the
destination of all hyperlinks. However, this part of ISO 19005 does not prescribe any specific behaviour or the technical
implementation details that interactive readers might use to meet the functional requirement of this clause.

6.7 Metadata

6.7.1 General

6.7.2 through 6.7.11 specify requirements for metadata within PDF/A files. Metadata is essential for
effective management of a file throughout its life cycle. A file depends on metadata for identification and
description, as well as for documenting appropriate technical and administrative matters. As a result, PDF/A
file writers may have to comply with various domain-specific metadata requirements defined external to this
part of ISO 19005. This part of ISO 19005 outlines a structured, consistent process that supports a broad
variety of metadata requirements.

6.7.2 Properties

The document catalog dictionary of a conforming PDF/A file shall contain the Metadata key. The metadata
stream that forms the value of that key shall conform to XMP Specification. All metadata properties pertaining
to a file, except for document information dictionary entries that have no analogue in predefined XMP
schemas as defined in 6.7.3, shall be embedded in the file in one or more XMP packets as defined by XMP
Specification, 3. Metadata properties shall be defined in predefined XMP schemas or in one or more extension
schemas that comply with XMP requirements. Metadata object stream dictionaries shall not contain the Filter
key.


NOTE Since XMP metadata streams are unfiltered their contents are visible as plain text to non-PDF/A aware tools.

6.7.3 Document information dictionary

A document information dictionary may appear within a PDF/A file. If it does appear, then all of its entries that
have analogous properties in predefined XMP schemas, as defined by Table 2, shall also be embedded in the
file in XMP form with equivalent values. Any document information dictionary entry not listed in Table 2 shall
not also be embedded using a predefined XMP schema property.

NOTE 1 Since a document information dictionary is allowed within a PDF/A file, it is possible for a single file to be both
PDF/A and PDF/X compatible.

NOTE 2 If a metadata property was represented in both the document information dictionary and XMP metadata, and
the values of those two representations were inconsistent with one another, then the proper interpretation of that
property's value would be ambiguous.

Table 2 — Crosswalk between document information dictionary and XMP properties

Document information dictionary    XMP

Entry           PDF type       Property           XMP type

Title           text string    dc:title           Text
Author          text string    dc:creator         seq Text
Subject         text string    dc:subject         Text
Keywords        text string    pdf:keywords       Text
Creator         text string    xmp:CreatorTool    Text
Producer        text string    pdf:Producer       Text
CreationDate    date           xmp:CreateDate     Date
ModDate         date           xmp:ModifyDate     Date

The value of the document information dictionary entries and their analogous XMP properties shall be
equivalent. For properties that map from the PDF text string type to the XMP Text type, value equivalence
shall be on a character-by-character basis, independent of encoding, comparing the numeric ISO/IEC 10646-1
code points for the characters.

If the dc:creator property is present in XMP metadata then it shall be represented by an ordered Text array of
length one whose single entry shall consist of one or more names. The value of dc:creator and the
document information dictionary Author entry shall be equivalent. For Author and dc:creator, equivalence
shall be on a character-by-character basis, independent of encoding, comparing the numeric ISO/IEC 10646-1
code points for the characters.

EXAMPLE The document information dictionary entry:

/Author (Peter, Paul, and Mary)

is equivalent to the XMP property:

<dc:creator>
  <rdf:Seq>
    <rdf:li>Peter, Paul, and Mary</rdf:li>
  </rdf:Seq>
</dc:creator>
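
Character-by-character equivalence independent of encoding can be sketched by decoding both representations to code points before comparing. The sketch below is a simplification under stated assumptions: a PDF text string beginning with the bytes FE FF is decoded as UTF-16BE, any other text string is accepted only if it is pure ASCII (full PDFDocEncoding is not implemented), and the XMP value is supplied as UTF-8 bytes. The function names are hypothetical.

```python
def pdf_text_to_str(raw: bytes) -> str:
    """Decode a PDF text string to a sequence of code points."""
    if raw.startswith(b"\xfe\xff"):
        # Byte order mark present: the remainder is UTF-16BE.
        return raw[2:].decode("utf-16-be")
    # Assumption: without the mark, only ASCII content is handled here.
    return raw.decode("ascii")

def text_equivalent(pdf_raw: bytes, xmp_utf8: bytes) -> bool:
    """Compare the two values code point by code point."""
    return pdf_text_to_str(pdf_raw) == xmp_utf8.decode("utf-8")
```

Python compares str values by code point, so no further normalization is applied; the comparison is case-sensitive.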

Date properties are formatted as a variable-length sequence of temporal components ranging in granularity:
year, month, day, hour, minute, second. For properties that map between the PDF date type, defined by PDF
Reference, 3.8.2, and the XMP Date type, defined by Date and Time Formats, value equivalence shall be on a
component-by-component basis, relative to Coordinated Universal Time (UTC), i.e., correcting for local time
zone offset.

EXAMPLE The document information dictionary entries:

/CreationDate (D:20040402)
/ModDate (D:20040408091132-05'00')

are equivalent to the XMP properties:

<xmp:CreateDate>2004-04-02</xmp:CreateDate>
<xmp:ModifyDate>2004-04-08T14:11:32Z</xmp:ModifyDate>
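
Component-by-component date equivalence relative to UTC can be sketched as follows. This is an illustration, not a complete parser: the names are hypothetical, fractional seconds are not handled, a PDF date carrying a time but no time zone is assumed to be in UTC, and date-only values are compared without any time zone correction.

```python
import re
from datetime import datetime, timedelta

# Groups 1-6: year .. second; 7: "Z"; 8-10: sign, offset hours, offset minutes.
PDF_DATE = re.compile(
    r"D:(\d{4})(\d{2})?(\d{2})?(\d{2})?(\d{2})?(\d{2})?"
    r"(?:(Z)|([+-])(\d{2})'(\d{2})')?"
)
XMP_DATE = re.compile(
    r"(\d{4})(?:-(\d{2})(?:-(\d{2})(?:T(\d{2}):(\d{2})(?::(\d{2}))?"
    r"(?:(Z)|([+-])(\d{2}):(\d{2})))?)?)?"
)

def _utc_components(match) -> tuple:
    parts = [int(g) for g in match.groups()[:6] if g is not None]
    if len(parts) < 4:
        return tuple(parts)  # date only: nothing to correct
    padded = parts + [0] * (6 - len(parts))
    offset = timedelta(0)
    if match.group(8):
        offset = timedelta(hours=int(match.group(9)), minutes=int(match.group(10)))
        if match.group(8) == "-":
            offset = -offset
    utc = datetime(*padded) - offset  # local time minus offset gives UTC
    full = (utc.year, utc.month, utc.day, utc.hour, utc.minute, utc.second)
    return full[:len(parts)]

def dates_equivalent(pdf_date: str, xmp_date: str) -> bool:
    p = PDF_DATE.fullmatch(pdf_date)
    x = XMP_DATE.fullmatch(xmp_date)
    if p is None or x is None:
        raise ValueError("unrecognised date syntax")
    return _utc_components(p) == _utc_components(x)
```

For instance, dates_equivalent("D:20040402", "2004-04-02") compares three components on each side, while a value with a time of day is first shifted to UTC.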

6.7.4 Normalization

All XMP schemas should define the normalization rules that are applicable for their properties. For all
metadata properties defined in schemas that do provide normalization rules, the property values shall be
entered, saved, and retained in the normalized fashion defined by those schemas to facilitate interchange and
support consistent interpretation of metadata by conforming PDF/A readers.

6.7.5 XMP header

The deprecated bytes attribute shall not be used in XMP headers.

If the XML encoding for a packet is other than UTF-8, the encoding attribute shall be used. The packet body
shall conform to the encoding indicated in the header.

NOTE PDF readers rely upon the correctness of the encoding attribute to parse and interpret properly the packet
body.

6.7.6 File identifiers

A PDF/A file should have one or more metadata properties to characterize, categorize, and otherwise identify
the file. This part of ISO 19005 does not mandate any specific identification scheme. Identifiers may be
externally based, such as an International Standard Book Number (ISBN) or a Digital Object Identifier (DOI),
or internally based, such as a Globally Unique Identifier/Universally Unique Identifier (GUID/UUID) or another
designation assigned during workflow operations. Identifiers may be included through use of the
xmp:Identifier property; use of the xmpMM:DocumentID, xmpMM:VersionID, and
xmpMM:RenditionClass properties; or use of properties from an extension schema. Any identification
system may be used so long as the properties comply with XMP requirements and this part of ISO 19005.

6.7.7 File provenance information

In order to document all high-level user actions taken to create, transform, or otherwise instantiate a PDF/A
file, each of those actions should be recorded in the xmpMM:History property. For each action that is
recorded:

- the action, parameters, and when fields shall be specified

- the softwareAgent field should be specified

- the instanceID field shall not be specified


In cases where original paper, microform, or electronic files are transformed into PDF/A format,
xmpMM:History should document all high-level processing (e.g., transformed from PDF 1.4 to PDF/A);
alterations to file content or functionality (e.g., embedded JavaScript and audio objects were not retained);
handling of pre-existing metadata (e.g., all document information dictionary values converted to XMP); and
any other processes that have an impact on content or other significant properties of the file.

Once a document is in PDF/A format, whether by conversion from paper, microform, or another electronic
format, or by being created natively, xmpMM:History should document all subsequent high-level workflow
processes (e.g., descriptions of activities and handoffs); citations to policies governing file handling (e.g., titles
of official directives under which files are collected, processed, and used); names and versions of software
tools; and any other matters that are needed to indicate the context of the document's creation and use.

In cases where XMP metadata properties have been changed or deleted as a file moves through its life cycle,
xmpMM:History should document those changes by including entries whose parameters fields specify the
names of the properties and their previous values. This recommendation applies to all metadata properties
except the xmpMM:History itself. If a metadata property has been deleted, the action field of its entry in
xmpMM:History shall be pdfa:deleted.

6.7.8 Extension schemas

All extension schemas used in a PDF/A file shall have their descriptions embedded within that file in the
metadata stream defined by 6.7.2. These descriptions shall be specified using the PDF/A extension schema
description schema defined in this clause.

The extension schema description schema namespace URI is <http://www.aiim.org/pdfa/ns/schema.html>.


The required schema namespace prefix is pdfaSchema.

Table 3 — PDF/A extension schema description schema

Property                   Value type       Category    Description

pdfaSchema:schema          Text             External    Optional description of schema
pdfaSchema:namespaceURI    URI              External    Schema namespace URI
pdfaSchema:prefix          Text             External    Preferred schema namespace prefix
pdfaSchema:property        seq Property     Internal    Description of schema properties
pdfaSchema:valueType       seq ValueType    Internal    Description of schema-specific value types

The Property type is an XMP structure containing the description of a schema property. The field namespace
URI is <http://www.aiim.org/pdfa/ns/property.html>. The required field namespace prefix is pdfaProperty.


Table 4 — PDF/A property type schema

Field name Value type Description

pdfaProperty:name Text Property name

pdfaProperty:valueType Open Choice of Text Value type of the property, drawn from XMP Specification, 4, or an embedded PDF/A value type extension schema

pdfaProperty:category Closed Choice of Text Property category: internal or external

pdfaProperty:description Text Description of the property

The preferred values for pdfaProperty:valueType should be the non-deprecated property value types
defined in XMP Specification, 4. Array types shall be preceded by their container type: alt, bag, or seq,
separated from the base type by a single white-space character.
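EXAMPLE The following informative sketch, in Python, shows one way to split a pdfaProperty:valueType string into its optional container type and its base type. The function name parse_value_type is hypothetical, and treating any single white-space character as the separator is an assumption about how the rule above would be applied.

```python
# Container keywords that may precede an array base type (from this clause).
CONTAINERS = ("alt", "bag", "seq")

def parse_value_type(value):
    """Return (container, base_type); container is None for non-array types.

    Raises ValueError when a container keyword is not followed by exactly
    one white-space character and a non-empty base type.
    """
    for container in CONTAINERS:
        size = len(container)
        if value.startswith(container) and len(value) > size and value[size].isspace():
            base = value[size + 1:]
            if not base or base[0].isspace():
                raise ValueError(f"malformed array value type: {value!r}")
            return container, base
    # No container prefix: the whole string names the value type.
    return None, value
```

Base types that themselves contain spaces, such as Open Choice of Text, are returned unchanged because only a leading container keyword is split off.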

The ValueType type is an XMP structure containing the description of all property value types used by
embedded extension schemas that are not defined in XMP Specification, 4. The field namespace URI is
<http://www.aiim.org/pdfa/ns/type.html>. The required field namespace prefix is pdfaType.

Table 5 — PDF/A value type schema

Field name Value type Description

pdfaType:type Text Optional description of the property value type

pdfaType:namespaceURI URI Property value type field namespace URI

pdfaType:prefix Text Preferred value type field namespace prefix

pdfaType:field seq Field Description of the property value type

The Field type is an XMP structure containing the description of a property value type. The field namespace
URI is <http://www.aiim.org/pdfa/ns/field.html>. The required field namespace prefix is pdfaField.

Table 6 — PDF/A field schema

Field name Value type Description

pdfaField:name Text Field name

pdfaField:valueType Open Choice of Text Field value type, drawn from XMP Specification, 4, or an embedded PDF/A value type extension schema

pdfaField:description Text Field description
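EXAMPLE The following informative sketch, in Python, builds an RDF/XML description of an extension schema from the properties listed in Tables 3 and 4. The namespace URIs and prefixes are those given in this clause. The function name describe_schema, the input dictionary layout, and the choice to serialize each Property structure as an rdf:li element with rdf:parseType="Resource" are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
SCHEMA = "http://www.aiim.org/pdfa/ns/schema.html"
PROPERTY = "http://www.aiim.org/pdfa/ns/property.html"

# Use the prefixes required by this clause when serializing.
ET.register_namespace("rdf", RDF)
ET.register_namespace("pdfaSchema", SCHEMA)
ET.register_namespace("pdfaProperty", PROPERTY)

def describe_schema(namespace_uri, prefix, properties):
    """Build an rdf:Description element describing one extension schema.

    properties is a list of dicts with the keys name, valueType,
    category, and description (hypothetical input layout).
    """
    desc = ET.Element(f"{{{RDF}}}Description", {f"{{{RDF}}}about": ""})
    ET.SubElement(desc, f"{{{SCHEMA}}}namespaceURI").text = namespace_uri
    ET.SubElement(desc, f"{{{SCHEMA}}}prefix").text = prefix
    holder = ET.SubElement(desc, f"{{{SCHEMA}}}property")
    seq = ET.SubElement(holder, f"{{{RDF}}}Seq")
    for prop in properties:
        item = ET.SubElement(seq, f"{{{RDF}}}li", {f"{{{RDF}}}parseType": "Resource"})
        for field in ("name", "valueType", "category", "description"):
            ET.SubElement(item, f"{{{PROPERTY}}}{field}").text = prop[field]
    return desc
```

The resulting element can be serialized with ET.tostring(element, encoding="unicode") and placed in the metadata stream defined by 6.7.2.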


6.7.9 Validation

All content of all XMP packets shall be well-formed as defined by Extensible Markup Language (XML) 1.0
(Third Edition), 2.1, and RDF/XML Syntax Specification (Revised), 7. At the time a writer creates or resaves a
PDF/A file, all of the content of the file's XMP packets should be validated if possible.
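EXAMPLE The following informative sketch, in Python, shows one way a writer might test an XMP packet for XML well-formedness before saving a file. It relies on the standard library parser xml.etree.ElementTree; the function name is_well_formed is hypothetical. The sketch does not check the RDF/XML requirements, which would need a separate RDF-aware tool.

```python
import xml.etree.ElementTree as ET

def is_well_formed(packet):
    """Return True if the packet parses as well-formed XML, else False."""
    try:
        ET.fromstring(packet)
    except ET.ParseError:
        return False
    return True
```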

6.7.10 Font metadata

For all embedded Type 0, Type 1, or TrueType font programs, the embedded font file stream dictionary should
include a Metadata entry whose value is an XMP metadata stream. The following XMP metadata elements
should be supplied: xmp:Title, giving the value of the FontName key from the font's font descriptor dictionary;
xmpRights:Copyright, giving the copyright statement; xmpRights:Marked, with the Boolean value true;
xmpRights:Owner, giving the legal owner of the font; and xmpRights:UsageTerms, giving a statement of
the licensing terms under which the font is being used. Additional XMP metadata may be included at the
discretion of the file writer.

NOTE Font rights information is helpful in order to preserve the identity and scope of the intellectual property rights of
the font copyright holder. While many fonts embed statements of copyright and licensing terms within the font itself, this is
not a uniform practice. Therefore it is advantageous to require the explicit representation of rights statements in the PDF/A
file. Even though this may be redundant, it obviates the necessity for some future system to have the ability to parse
through the particular internal structure of font programs.
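EXAMPLE The following informative sketch, in Python, lists which of the recommended font metadata properties are absent from a font file stream's XMP metadata. It assumes the metadata has already been parsed into a dictionary keyed by qualified property name; that representation and the function name font_metadata_gaps are hypothetical.

```python
# Properties recommended by this clause for embedded font programs.
RECOMMENDED = (
    "xmp:Title",
    "xmpRights:Copyright",
    "xmpRights:Marked",
    "xmpRights:Owner",
    "xmpRights:UsageTerms",
)

def font_metadata_gaps(props):
    """Return a list describing recommended properties that are missing
    or, for xmpRights:Marked, present with a value other than true."""
    gaps = [name for name in RECOMMENDED if name not in props]
    if "xmpRights:Marked" in props and props["xmpRights:Marked"] is not True:
        gaps.append("xmpRights:Marked should be true")
    return gaps
```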

6.7.11 Version and conformance level identification

The PDF/A version and conformance level of a file shall be specified using the PDF/A Identification extension
schema defined in this clause.

The Identification schema namespace URI is <http://www.aiim.org/pdfa/ns/id.html>. The required schema
namespace prefix is pdfaid.

Table 7 — PDF/A identification schema

Property Value type Category Description

pdfaid:version Open Choice of Integer Internal PDF/A version identifier

pdfaid:amd Open Choice of Text Internal Optional PDF/A amendment identifier

pdfaid:conformance Closed Choice of Text Internal PDF/A conformance level: full or minimal

The value of pdfaid:version shall be the part number of ISO 19005 to which the file conforms. If the file
conforms to a version of ISO 19005 that is defined by an amendment to a part, then the value of pdfaid:amd
shall be the amendment number and year, separated by a colon.

A fully conforming PDF/A document shall specify the value of pdfaid:conformance as full. A minimally
conforming PDF/A document shall specify the value of pdfaid:conformance as minimal.

The values of the pdfaid:version, pdfaid:amd, and pdfaid:conformance properties do not by themselves
determine conformance with a part of ISO 19005. The actual determination of conformance shall be
performed as specified in 5.
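EXAMPLE The following informative sketch, in Python, checks that the three identification properties hold syntactically plausible values. It assumes the properties have been parsed into a dictionary, that the amendment number is written in decimal digits, and that the year has four digits; these details and the function name check_identification are hypothetical. As stated above, passing such a check does not establish conformance.

```python
import re

CONFORMANCE_LEVELS = {"full", "minimal"}
# Amendment number and year separated by a colon (digit counts are assumed).
AMD_PATTERN = re.compile(r"[0-9]+:[0-9]{4}")

def check_identification(props):
    """Return the names of identification properties with unusable values."""
    problems = []
    version = props.get("pdfaid:version")
    if isinstance(version, bool) or not isinstance(version, int) or version < 1:
        problems.append("pdfaid:version")
    amd = props.get("pdfaid:amd")  # optional property
    if amd is not None and AMD_PATTERN.fullmatch(amd) is None:
        problems.append("pdfaid:amd")
    if props.get("pdfaid:conformance") not in CONFORMANCE_LEVELS:
        problems.append("pdfaid:conformance")
    return problems
```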


6.8 Logical structure

6.8.1 General

NOTE This sub-clause is applicable only for files meeting the full conformance level of this part of ISO 19005. For
minimal conformance the requirements of this clause can be ignored.

The intent of the requirements in 6.8.2 through 6.8.7 is to ensure the recovery of a PDF file’s textual content
as a sequence of words defined in the natural reading order of the language in which they are written.
Similarly, the individual characters of each word must be recoverable in their natural reading order.
Furthermore, these requirements allow the recovery of higher-level semantic information concerning the
logical structure of the document.

6.8.2 Tagged PDF

6.8.2.1 General

A fully conforming PDF/A file shall meet all of the requirements set forth for Tagged PDF in PDF Reference,
9.7.

NOTE Tagged PDF defines conventions for explicitly declaring and describing the logical structural aspects of
document content.

6.8.2.2 Mark information dictionary

The document catalog shall include a MarkInfo dictionary whose sole entry, Marked, shall have a value of
true.

NOTE This setting indicates that the file conforms to the Tagged PDF conventions.

6.8.3 Artifacts

Pagination features such as running heads or page numbers, cosmetic layout features such as footnote rules
or background screens, and production aids such as cut marks and color bars should be specified as
pagination, layout, and page artifacts, respectively, as described in PDF Reference, 9.7.2.

6.8.3.1 Word breaks

For languages and script systems that normally use white space to indicate word breaks the following
additional restriction shall apply:

Within show strings, word breaks shall be explicitly indicated by the presence of one or more spacing
characters between all of the individual words in the show string. If a word ends at a show string boundary,
one or more spacing characters shall be inserted at the end of the show string. Note that a single word may
span two or more show strings; word breaks are indicated only by the explicit presence of one or more
spacing characters, not by the boundaries of a show string. For the purposes of indicating word breaks, a
sequence of two or more consecutive spacing characters is semantically equivalent to a single spacing
character.

The commonly used spacing characters are: HORIZONTAL TABULATION (Unicode value U+0009 as
defined by [Unicode]), LINE FEED (U+000A), VERTICAL TABULATION (U+000B), FORM FEED (U+000C),
CARRIAGE RETURN (U+000D), SPACE (U+0020), NO-BREAK SPACE (U+00A0), EN SPACE (U+2002),
EM SPACE (U+2003), ZERO WIDTH SPACE (U+200B), and IDEOGRAPHIC SPACE (U+3000).
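EXAMPLE The following informative sketch, in Python, recovers words from a sequence of show strings according to the rule above: show string boundaries are ignored, and only spacing characters separate words. It assumes each show string has already been mapped to Unicode text; the function name recover_words is hypothetical, and only the commonly used spacing characters listed above are recognized.

```python
# The commonly used spacing characters listed in this sub-clause.
SPACING = "\u0009\u000A\u000B\u000C\u000D\u0020\u00A0\u2002\u2003\u200B\u3000"

def recover_words(show_strings):
    """Return the words found in the concatenation of the show strings."""
    words, current = [], []
    for ch in "".join(show_strings):
        if ch in SPACING:
            # A run of spacing characters counts as a single word break.
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return words
```

For instance, the show strings "long-term pre", "servation " and "format" yield three words, the second of which spans a show string boundary.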

6.8.3.2 Structure hierarchy

The logical structure of the PDF/A document shall be described by a structure hierarchy rooted in the
StructTreeRoot entry of the document catalog, as described in PDF Reference, 9.6.


Each structure element dictionary in the structure hierarchy shall have a Type entry with the name value of
StructElem.

The explicit documentation of a document’s logical structure may prove valuable to future efforts to recover
the document’s full semantic value for the purposes of rendering or migration to other data formats. PDF/A
writers should attempt to capture a document’s logical structure hierarchy to the finest granularity possible,
making use of the standard structure types for grouping elements, block-level structure elements, paragraph-
like elements, list elements, table elements, inline-level structure elements, link elements, and illustration
elements, as defined in PDF Reference, 9.7.4, to the fullest extent possible.

6.8.3.3 Structure types

The definition of block-level structuring elements should follow the strongly structured paradigm as described
in PDF Reference, 9.7.4.

All non-standard structure types shall be mapped to the nearest functionally equivalent standard type, as
defined in PDF Reference, 9.7.4, in the role map dictionary of the structure tree root. This mapping may be
indirect; within the role map a non-standard type can map directly to another non-standard type, but
eventually, the mapping must arrive at a standard type.
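EXAMPLE The following informative sketch, in Python, follows a role map from a non-standard structure type until a standard type is reached. The role map is modelled as a plain dictionary, the function name resolve_structure_type is hypothetical, and STANDARD_TYPES is an abbreviated, illustrative subset of the standard types defined in PDF Reference, 9.7.4.

```python
# Abbreviated subset of the standard structure types, for illustration only.
STANDARD_TYPES = frozenset({
    "Document", "Part", "Sect", "Div", "P", "H", "H1",
    "L", "LI", "Table", "TR", "TD", "Span", "Link", "Figure",
})

def resolve_structure_type(name, role_map, standard=STANDARD_TYPES):
    """Follow role_map from name until a standard type is found.

    Raises ValueError if the chain ends at an unmapped non-standard type
    or loops back on itself.
    """
    seen = set()
    while name not in standard:
        if name in seen or name not in role_map:
            raise ValueError(f"{name!r} does not map to a standard type")
        seen.add(name)
        name = role_map[name]
    return name
```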

6.8.4 Natural language specification

The default natural language for all text in a document shall be specified by the Lang entry in the document
catalog.

All textual content within a document that differs from the default language should be indicated by use of a
Lang property attached to a marked-content sequence, or by a Lang entry in a structure element dictionary,
as described in PDF Reference, 9.8.1.

The value of the Lang entry in the document catalog, structure element dictionary, or property list shall be a
language identifier as defined by RFC 1766, Tags for the Identification of Languages, as described in PDF
Reference, 9.8.1.
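EXAMPLE The following informative sketch, in Python, tests whether a Lang value has the syntactic form of an RFC 1766 language tag: a primary tag of one to eight letters followed by zero or more hyphen-separated subtags of one to eight letters each. The function name is_rfc1766_tag is hypothetical. The check is syntactic only; it does not confirm that the tag is an ISO 639 code or is registered with IANA.

```python
import re

# Primary tag and subtags: 1 to 8 ASCII letters each, separated by hyphens.
LANGUAGE_TAG = re.compile(r"[A-Za-z]{1,8}(?:-[A-Za-z]{1,8})*")

def is_rfc1766_tag(value):
    """Return True if value matches the RFC 1766 language tag syntax."""
    return LANGUAGE_TAG.fullmatch(value) is not None
```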

NOTE The distinction between words foreign to a language and foreign words incorporated by common usage into a
language is problematic. The intent of these requirements is to allow for future unambiguous semantic interpretation of
textual content.

All text strings encoded in Unicode whose language differs from the default natural language for the document,
or from the natural language defined by the innermost enclosing structure element or marked-content sequence,
shall indicate their language using the internal escape sequence described in PDF Reference, 3.8.1.

6.8.5 Alternate descriptions

All structure elements whose content does not have a natural predetermined textual analog, e.g., images or
formulas, should supply an alternate text description using the Alt entry in the structure element
dictionary, as described in PDF Reference, 9.8.2.

NOTE Alternate descriptions provide textual descriptions that aid in the proper interpretation of otherwise opaque non-
textual content.

6.8.6 Non-textual annotations

For annotation types that do not display text, the Contents key of an annotation dictionary should be specified
with an alternative description of the annotation's contents in human-readable form.


6.8.7 Replacement text

All textual structure elements that are represented in a non-standard manner, e.g., custom characters or inline
graphics, should supply replacement text using the ActualText entry in the structure element dictionary, as
described in PDF Reference, 9.8.3.

NOTE Replacement text provides textual equivalents that aid in the proper interpretation of otherwise opaque, unusual
representations of textual components.

6.8.8 Expansions of abbreviations and acronyms

All instances of abbreviations and acronyms in textual content should be placed in a marked-content
sequence with a Span tag whose E property provides a textual expansion of the abbreviation or acronym, as
described in PDF Reference, 9.8.4.

NOTE Abbreviation and acronym expansion provides textual equivalents that aid in the proper interpretation of
otherwise opaque nomenclature. It is inadvisable for writers to generate these expansions using automated heuristic
methods without appropriate verification.

6.9 Forms

The intent of the requirements of this clause is to ensure that there is no ambiguity about the rendering of form
fields.

A conforming PDF/A reader shall not use form fields to change the rendered representation of the page or the
content of the document at any time. Form fields shall not perform actions of any type.

The NeedAppearances flag of the interactive form dictionary either shall not be present or shall be false.

Every form field shall have an appearance dictionary associated with the field's data. A conforming PDF/A
reader shall render the field according to the appearance dictionary without regard to the form data.
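EXAMPLE The following informative sketch, in Python, checks an interactive form dictionary against the requirements of this clause. The dictionary is modelled as plain Python data with a flat Fields list, which is a simplification; the function name check_forms is hypothetical. The keys NeedAppearances, Fields, AP, A, and AA are those defined in PDF Reference.

```python
def check_forms(acro_form):
    """Return a list of problems found in a simplified form dictionary."""
    problems = []
    # NeedAppearances shall be absent or false.
    if acro_form.get("NeedAppearances", False) is not False:
        problems.append("NeedAppearances must be absent or false")
    for index, field in enumerate(acro_form.get("Fields", [])):
        # Every field needs an appearance dictionary (AP).
        if "AP" not in field:
            problems.append(f"field {index} has no appearance dictionary")
        # Fields shall not perform actions (A or AA entries).
        if "A" in field or "AA" in field:
            problems.append(f"field {index} defines an action")
    return problems
```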


Annex A
(informative)

PDF/A conformance summary

A.1 General
The tables below list the PDF 1.4 operators, objects, and keys within objects for which the requirements of this
part of ISO 19005 vary from the PDF Reference for the purposes of PDF/A conformance. The tables indicate
the status of the operator, object, or key, and the normative clause where that status is defined. The following
status values are used:

 Required The operator, object, or key is required in PDF/A files

 Prohibited The operator, object, or key is prohibited from PDF/A files

 Restricted The operator, object, or key may appear in PDF/A files, but only subject to specific
constraints on its use, contents, or value

 Recommended The operator, object, or key should appear in PDF/A files

 Ignored The operator, object, or key may appear in PDF/A files but is ignored by conforming
readers

If a reference to a PDF dictionary object is included in the tables, but keys within that object are not explicitly listed,
then all keys within that object and its descendants, if any, inherit their status from the object that is shown in
the table. An object is a descendant of another object, called its ancestor, if any of the following conditions
is true:

 The object is the value of a key in the ancestor object

 The ancestor is an array and the object is an element of that array

 The object is a descendant of a descendant of the ancestor object
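EXAMPLE The following informative sketch, in Python, applies the three conditions above to PDF objects modelled as nested Python dictionaries and lists. That model and the function name is_descendant are hypothetical. Objects are compared by identity, and already visited containers are skipped so that circular references, such as Parent entries, do not cause endless recursion.

```python
def is_descendant(obj, ancestor, _seen=None):
    """Return True if obj is a descendant of ancestor."""
    seen = set() if _seen is None else _seen
    if id(ancestor) in seen:
        return False
    seen.add(id(ancestor))
    if isinstance(ancestor, dict):
        children = list(ancestor.values())   # value of a key in the ancestor
    elif isinstance(ancestor, list):
        children = ancestor                  # element of an ancestor array
    else:
        return False
    # Direct child, or descendant of a descendant.
    return any(child is obj or is_descendant(obj, child, seen) for child in children)
```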

A.2 Operators
All operators defined in PDF Reference for use in Contents streams may be included in a conforming PDF/A
file, subject to the conditions shown in Table A.1.

A.3 Objects and keys


All objects and keys defined in PDF Reference may be included in a conforming PDF/A file, subject to the
conditions shown in Table A.2. Some of the requirements for keys are relative to a specific key/value pair. In
such cases the relevant value is shown following the key.


Table A.1 — Operator status

Operator Status Clause

CS Restricted 6.2.3

cs Restricted 6.2.3

ri Restricted 6.2.9

Operators not defined in PDF Reference Prohibited 6.2.10

Table A.2 — Object and key status

Object Key (and value) Status Clause

AcroForm NeedAppearances Restricted 6.9

Action N NOP Prohibited 6.6.1

S Named Restricted 6.6.1

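S ImportData Prohibited 6.6.1

S JavaScript Prohibited 6.6.1

S Launch Prohibited 6.6.1

S Movie Prohibited 6.6.1

S ResetForm Prohibited 6.6.1

S SetState Prohibited 6.6.1

S Sound Prohibited 6.6.1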

Annot AA Prohibited 6.6.2

CA Restricted 6.5.3

Contents Recommended (for full conformance of non-textual annotations) 6.8.6

Subtype Prohibited 6.5.2

Artifact propertyList dictionary Recommended (for full conformance) 6.8.3

Catalog AA Prohibited 6.6.2

Lang Required (for full conformance) 6.8.4

Metadata Required 6.7.2

Names Restricted 6.1.12

OCProperties Prohibited 6.1.14

OutputIntents Restricted 6.2.2

StructTreeRoot Recommended (for full conformance) 6.8.3.2



CMap CIDSystemInfo Restricted 6.3.3.1

WMode Restricted 6.3.3.3

ExtGState BM Restricted 6.4

CA Restricted 6.4

ca Restricted 6.4

HT Ignored 6.2.8

SMask Restricted 6.4

TR Prohibited 6.2.8

TR2 Restricted 6.2.8

Field dictionary AA Prohibited 6.6.2

Filespec EF Prohibited 6.1.12

Filters LZWDecode Prohibited 6.1.10

Font FontDescriptor Required (unless Type3) 6.3.4

ToUnicode Required (for full conformance) a 6.3.8

Type Restricted 6.3.2

Widths Ignored 6.3.6

Font (Subtype CIDFontType0 or CIDFontType2) CIDSystemInfo Restricted 6.3.3.1

Font (Subtype CIDFontType2) CIDtoGIDMap Restricted 6.3.3.2

Font (Subtype TrueType) Encoding Prohibited (if symbolic font); Restricted (if non-symbolic) 6.3.7

Font file stream Metadata Recommended 6.7.10

FontDescriptor CharSet Required 6.3.5

CIDSet Required (if CIDFont) 6.3.5

FontFile or FontFile2 or FontFile3 Required 6.3.5

a There are three specific exemptions from this status defined in 6.3.8.



Group S Restricted 6.4

MarkInfo Marked true Required (for full conformance) 6.8.2.2

Page AA Prohibited 6.6.2

PDF/A output intent dictionary DestOutputProfile Restricted 6.2.2

S Restricted 6.2.2

Span dictionary E Recommended (for full conformance) 6.8.8

Lang Recommended (for full conformance of non-default language content) 6.8.4

Stream dictionary Alternate Ignored 6.2.3.2

F Prohibited 6.1.11

FDecodeParams Prohibited 6.1.11

FFilter Prohibited 6.1.11

ICCBased Restricted 6.2.3.2

Structure element dictionary ActualText Recommended (for full conformance of non-standard elements) 6.8.7

Alt Recommended (for full conformance of non-textual elements) 6.8.5

Type StructElem Required (for full conformance) 6.8.3.2

Trailer ID Required 6.1.3

Size Required 6.1.3

XObject Subtype PS Prohibited 6.2.5

XObject (Subtype Form) Group Restricted 6.4

OPI Prohibited 6.2.5

Ref Prohibited 6.2.6

XObject (Subtype Image) Alternates Prohibited 6.2.4

Intent Restricted 6.2.4

OPI Prohibited 6.2.4

SMask Restricted 6.4


Annex B
(informative)

Best practices for PDF/A

B.1 Use of non-XMP metadata


Use of non-XMP metadata at the file level is strongly discouraged as there is no assurance that such
metadata can be preserved in accordance with this specification. In cases where non-XMP metadata is
present, the preference is to convert it to XMP, embed it in the file, and document the conversion in the
xmpMM:History property. The xmpMM:History property should also be used to indicate any non-XMP
elements that have not been converted.

Failure to preserve metadata may cause problems in locating, interpreting, managing, and authenticating a file
in the future, which may in turn diminish or cancel its archival value.

B.2 Natural language identifiers


PDF/A writers should identify languages using ISO 639-1/ISO 3166-1 or IANA-registered identifiers [1, 4, 14].
Private use identifiers should be used only if the language does not have a defined identifier within
ISO 639-1/ISO 3166-1 or the IANA registry. In the event that a language is truly unknown, the identifier
x-unknown should be used.

NOTE The use of ISO 639-1, ISO 3166-1, and IANA-registered identifiers is defined in RFC 1766, Tags for the
Identification of Languages, which PDF uses as the basis for its language identifiers. ISO 639-2 defines three-letter
language identifiers that are not allowed under RFC 1766.

B.3 Recommendations for Capturing or Converting Documents to PDF/A


For archival preservation purposes, this Best Practices statement provides recommendations for processes
that capture or convert documents to PDF/A to ensure that resulting PDF/A documents retain their quality
and integrity as records.

ISO 15489-1, 7.1, specifies that “to support the continuing conduct of business, comply with the regulatory
environment, and provide necessary accountability, organizations should create and maintain authentic,
reliable and useable records, and protect the integrity of those records for as long as required” [8].

The regulatory environment for submitting documents to an organization’s archival institution may include
requirements, standards and policies for electronic documents that stipulate document quality rules such as
minimum image resolution, compression restrictions, or prohibited processes that either alter or dispose of
approved data. For archival preservation purposes, the quality and integrity of documents created according
to these legal and regulatory requirements, applicable standards, and organizational policy should be retained
when they are captured or converted to PDF/A.

To meet this critical archival need, PDF/A capture or conversion processes should replicate the exact content
and quality of the source document within the PDF/A file. Following are examples of specific software
requirements that accomplish this:

 PDF/A writers should not use lossy compression, subsampling, downsampling, or any other process that
either alters the content or degrades the quality of source data in the PDF/A document


 Software should not substitute searchable text, based on optical character recognition, for the original
scanned text within the bit-mapped image of documents that are scanned to PDF/A from paper or
converted to PDF/A from image formats

NOTE Optical character recognition processes may involve loss of data through imprecise interpretation of scanned
characters.


Bibliography

[1] ISO 639-1, Codes for the representation of names of languages — Part 1: Alpha-2 code

[2] ISO/IEC 9541-1, Information technology — Font information interchange — Part 1: Architecture

[3] ISO 2108, Information and documentation – International standard book numbering (ISBN)

[4] ISO 3166-1, Codes for the representation of names of countries and their subdivisions – Part 1:
Country codes

[5] ISO/IEC 10646-1:2000/Amd 1:2002, Mathematical symbols and other characters (Available in English
only)

[6] ISO/IEC 10646-2:2001, Information technology – Universal multiple-octet coded character set (UCS) –
Part 2: Supplementary planes (Available in English only)

[7] ISO 14721, Space data and information transfer systems — Archival information system — Reference
model

[8] ISO 15489-1, Information and documentation — Records management — Part 1: General

[9] ISO/TR 15801, Electronic imaging — Information stored electronically — Recommendations for
trustworthiness and reliability

[10] ISO 15930-4, Graphic technology — Prepress digital data exchange using PDF — Part 4: Complete
exchange of CMYK and spot colour printing data using PDF 1.4 (PDF/X-1a)

[11] ISO/CD TR 18492, Electronic imaging — Ensuring long-term access to digital information and images

[12] ISO/WD 18509-1, Electronic archival storage — Specifications relative to the design and operation of
information processing systems in view of ensuring the storage and integrity on recordings stored in
these systems — Part 1: Long term access strategy

[13] ISO/WD 18509-2, Electronic archival storage — Specifications relative to the design and operation of
information processing systems in view of ensuring the storage and integrity on recordings stored in
these systems — Part 2: Technical specifications

[14] Language Tags, IANA. Available from Internet <http://www.iana.org/assignments/language-tags>

[15] The Unicode Standard, Unicode Consortium. Available from Internet <http://www.unicode.org/versions/>

[16] Unicode Standard Annex #15, Unicode Normalization Forms, Unicode Consortium, 17 April 2003.
Available from Internet <http://www.unicode.org/unicode/reports/tr15/>
