Document Formats and Image Formats: James C. King
James C. King
PDF Architect/Senior Principal Scientist
Advanced Technology Laboratory
Adobe Systems Incorporated
Outline
Some Fundamentals
PDF Documents
PDF Pages
Synthesized Pages versus Scanned Pages
PDF and JPEG2000
PDF and ISO Standards
Some Fundamentals
Image Formats versus Document Formats
[Figure: a “sampled” image format (e.g., JPEG2000) compared with a document format (e.g., PDF) for a picture]
Image Resolution and Size
[Figure: subsampling and supersampling between a sampled image, a lower-resolution display, a higher-resolution display (x2), and a higher-resolution sampled image (x2)]
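To make subsampling and supersampling concrete, here is a minimal sketch that resamples a grid of samples by nearest-neighbour lookup. The function name `resample_nearest` and the tiny 2 x 2 image are illustrative assumptions, not part of the original slides, and real viewers typically use better filters than nearest neighbour.

```python
def resample_nearest(pixels, new_width, new_height):
    """Resample a 2D grid of samples to a new size by nearest-neighbour lookup."""
    old_height = len(pixels)
    old_width = len(pixels[0])
    return [
        [
            # Map each output position back to the source sample that covers it.
            pixels[y * old_height // new_height][x * old_width // new_width]
            for x in range(new_width)
        ]
        for y in range(new_height)
    ]


image = [[1, 2],
         [3, 4]]

doubled = resample_nearest(image, 4, 4)    # supersample (x2): samples are repeated
halved = resample_nearest(doubled, 2, 2)   # subsample: samples are dropped
```

Supersampling adds no new information (each sample is simply repeated), while subsampling discards samples; that asymmetry is why the resolution of the stored image matters.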
Image Sampling (JPEG2000)
[Figure: a JPEG2000 image (multiple resolutions) is subsampled or supersampled to fit a display or page (arbitrary image size)]
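One way to picture the "multiple resolutions" idea: a JPEG2000 codestream can typically be decoded at successive power-of-two reductions of the full size, so a viewer can pick the smallest resolution that still covers the target area and resample only the remainder. The sketch below assumes dyadic resolution levels and an image anchored at the origin; the function names and the example numbers are hypothetical.

```python
def reduced_size(width, height, discard_levels):
    """Size of the image when `discard_levels` resolution levels are dropped."""
    scale = 2 ** discard_levels
    # Ceiling division, done in integers to avoid floating-point rounding.
    return -(-width // scale), -(-height // scale)


def levels_to_discard(width, height, target_width, target_height, max_levels):
    """Most levels that can be dropped while still covering the target size."""
    best = 0
    for levels in range(1, max_levels + 1):
        w, h = reduced_size(width, height, levels)
        if w >= target_width and h >= target_height:
            best = levels
        else:
            break
    return best


# Hypothetical example: a 4000 x 3000 image with 5 levels, shown at 800 x 600.
levels = levels_to_discard(4000, 3000, 800, 600, max_levels=5)
decoded = reduced_size(4000, 3000, levels)
```

In this example the viewer would decode at 1000 x 750 and subsample slightly, rather than decode all 4000 x 3000 samples.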
PDF Documents
PDF: Multi-page Compound Documents
PDF Pages
Text, Graphics and Image
Typographic Text
Vector Graphics
Coordinate Transforms
[Figure: the word "Text" drawn under the transform below]
2.5 0 0 -1 235 170 cm
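The six numbers before the `cm` operator are the matrix entries a b c d e f, and a point (x, y) maps to (a·x + c·y + e, b·x + d·y + f). A minimal sketch of that arithmetic, using the slide's matrix; the helper name `apply_cm` and the sample points are illustrative assumptions.

```python
def apply_cm(a, b, c, d, e, f, x, y):
    """Map a point through a PDF transformation matrix [a b c d e f]."""
    return (a * x + c * y + e, b * x + d * y + f)


matrix = (2.5, 0, 0, -1, 235, 170)   # the operands of the slide's `cm`

origin = apply_cm(*matrix, 0, 0)     # where the new origin lands
point = apply_cm(*matrix, 10, 20)    # x stretched by 2.5, y flipped
```

Read this way, the slide's matrix stretches content horizontally by 2.5, mirrors it vertically (the -1), and moves its origin to (235, 170).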
Clipping and Masking
[Figure: picture, mask, typographic text]
Text as Text
Text as text (using outline fonts)
Text as image
Various Resolutions for Image Text
9.5 in x 5.3 in
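For a sense of scale, the sample count of an image covering this 9.5 in x 5.3 in area grows with the square of the resolution. The sketch below does the arithmetic; the resolutions 72, 300 and 600 dpi are assumed for illustration and are not taken from the slide.

```python
def pixel_dimensions(width_in, height_in, dpi):
    """Pixel width and height of an area sampled at `dpi` samples per inch."""
    return round(width_in * dpi), round(height_in * dpi)


# Assumed example resolutions for the slide's 9.5 in x 5.3 in area.
sizes = {dpi: pixel_dimensions(9.5, 5.3, dpi) for dpi in (72, 300, 600)}
samples_at_300 = sizes[300][0] * sizes[300][1]
```

Doubling the resolution from 300 to 600 dpi quadruples the number of samples, which is the cost of keeping image text legible when zoomed.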
Resolution Independence
Synthesized Pages versus Scanned Pages
Document Sources
Born digital
More compact
Editable
Device independent/resolution independent
Zoom-able
OCR’d Text as Underlayer
OCR’d Text
• underlaid
• made invisible
• may have mistakes
• used for search
Scanned Text as Image
A PDF Page
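In PDF page content, "made invisible" corresponds to text rendering mode 3 (`3 Tr`), which neither fills nor strokes the glyphs, so the text can be searched and selected without being seen. The sketch below builds such a page content stream as a string. It is an illustration under assumptions, not the method of any particular scan-to-PDF tool: the resource names `/F1` and `/Im0`, the 10-point size, and the word positions are hypothetical, and a real producer would also size each word to match the scanned glyphs and handle font encodings.

```python
def escape_pdf_string(text):
    """Escape backslashes and parentheses for a PDF literal string."""
    return text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")


def ocr_page_content(words, page_width, page_height):
    """Build page content: invisible OCR text first, scanned image on top.

    `words` is a list of (text, x, y) tuples in page coordinates.
    """
    lines = ["BT", "/F1 10 Tf", "3 Tr"]   # 3 Tr = invisible text
    for text, x, y in words:
        lines.append(f"1 0 0 1 {x} {y} Tm")            # position the word
        lines.append(f"({escape_pdf_string(text)}) Tj")
    lines.append("ET")
    # Paint the scanned image scaled to the full page (hypothetical /Im0).
    lines += ["q", f"{page_width} 0 0 {page_height} 0 0 cm", "/Im0 Do", "Q"]
    return "\n".join(lines)


stream = ocr_page_content([("Hello (world)", 72, 700)], 612, 792)
```

Because the OCR text may contain mistakes, what the reader sees is always the scanned image; the hidden text only drives search.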
Image Text and Image Picture Require Different Treatment
PDF and JPEG2000
PDF Support for JPEG2000
Software Support
The key to using any image format or document format is the tools available
Tools for creation
support advanced features
Tools for presentation
Tools for incorporating with other formats
Ubiquity of viewing tools
OCR and DR capabilities
Tools for Scan to PDF
PDF and ISO Standards
Establishing the ISO PDF Umbrella
PDF/A
Long-term Preservation Needs for Electronic Documents
PDF/A -- A PDF Subset of PDF 1.4
(Standard: ISO 19005-1)
Some useful PDF features work against, and are incompatible with,
preserving information over the long term
PDF/A
PDF Subset: restricted from using some PDF features, for example
• Anything that would alter the visual appearance over time (forms)
• External references or embedded files
• Encryption
PDF Subset: required to use some PDF features, for example
• Accessibility features for recoverable text (tagged PDF)
• Embedding of all fonts
• Specific metadata requirements
• Device independent color
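The restricted and required features listed above can be read as a checklist. The sketch below encodes the yes/no items from the slide as a toy check; it is not a PDF/A validator (conformance is defined by ISO 19005-1 and needs a real preflight tool), and the `DocumentTraits` fields are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class DocumentTraits:
    """Hypothetical summary of a PDF file, mirroring the slide's bullets."""
    encrypted: bool
    external_references: bool
    embedded_files: bool
    all_fonts_embedded: bool
    tagged: bool
    has_required_metadata: bool
    device_independent_color: bool


def pdfa_problems(doc):
    """Return the slide's PDF/A concerns that this document would raise."""
    problems = []
    # Features the subset restricts.
    if doc.encrypted:
        problems.append("encryption is not allowed")
    if doc.external_references:
        problems.append("external references are not allowed")
    if doc.embedded_files:
        problems.append("embedded files are not allowed")
    # Features the subset requires.
    if not doc.all_fonts_embedded:
        problems.append("all fonts must be embedded")
    if not doc.tagged:
        problems.append("tagged PDF is needed for recoverable text")
    if not doc.has_required_metadata:
        problems.append("required metadata is missing")
    if not doc.device_independent_color:
        problems.append("color must be device independent")
    return problems


clean = DocumentTraits(False, False, False, True, True, True, True)
locked = DocumentTraits(True, False, False, True, True, True, True)
```

Here `pdfa_problems(clean)` returns an empty list, while `pdfa_problems(locked)` reports only the encryption item.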
Uses for PDF/A