Commit 6086f5c

yunshi authored and Steve Canny committed
docs: document hyperlink analysis
1 parent 17dce95 commit 6086f5c

2 files changed: +304 -0 lines changed

Lines changed: 301 additions & 0 deletions
@@ -0,0 +1,301 @@

Hyperlink
=========

Word allows hyperlinks to be placed in a document.

The target of a hyperlink may be external, such as a web site, or internal,
to another location in the document.

A hyperlink can contain multiple runs of text, each with its own distinct
text formatting (font).


Candidate protocol
------------------

An external hyperlink has an address and an optional anchor. An internal
hyperlink has only an anchor.

.. highlight:: python

**Add the external hyperlink** `http://us.com#about`::

    >>> hyperlink = paragraph.add_hyperlink('About', address='http://us.com', anchor='about')
    >>> hyperlink
    <docx.text.hyperlink.Hyperlink at 0x7f...>
    >>> hyperlink.text
    'About'
    >>> hyperlink.address
    'http://us.com'
    >>> hyperlink.anchor
    'about'

**Add an internal hyperlink (to a bookmark)**::

    >>> hyperlink = paragraph.add_hyperlink('Section 1', anchor='Section_1')
    >>> hyperlink.text
    'Section 1'
    >>> hyperlink.anchor
    'Section_1'
    >>> hyperlink.address
    None

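The address/anchor distinction in the two examples above can be modelled in a few lines. This is only an illustrative sketch, not python-docx code: ``LinkTarget`` and its ``is_internal`` and ``url`` members are hypothetical names introduced here, and joining the anchor to the address with ``#`` is inferred from the ``http://us.com#about`` example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkTarget:
    """Hypothetical model of a hyperlink target (not part of python-docx)."""

    address: Optional[str] = None  # URL of an external target, None if internal
    anchor: Optional[str] = None   # URL fragment or bookmark name

    @property
    def is_internal(self) -> bool:
        # An internal hyperlink has only an anchor.
        return self.address is None

    @property
    def url(self) -> Optional[str]:
        # Internal links have no URL; external links append the anchor,
        # when present, as a fragment.
        if self.address is None:
            return None
        if self.anchor:
            return f"{self.address}#{self.anchor}"
        return self.address


external = LinkTarget(address="http://us.com", anchor="about")
internal = LinkTarget(anchor="Section_1")
print(external.url)          # http://us.com#about
print(internal.is_internal)  # True
```
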
**Modify hyperlink properties**::

    >>> hyperlink.text = 'Froogle'
    >>> hyperlink.text
    'Froogle'
    >>> hyperlink.address = 'mailto:info@froogle.com?subject=sup dawg?'
    >>> hyperlink.address
    'mailto:info@froogle.com?subject=sup%20dawg%3F'
    >>> hyperlink.anchor = None
    >>> hyperlink.anchor
    None

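The escaping shown for ``hyperlink.address`` can be approximated with the standard library. This is a minimal sketch under stated assumptions, not the proposed implementation: ``escape_address`` is a hypothetical helper, and the choice to escape only the part after the first ``?`` while keeping ``=`` and ``&`` intact is an assumption made to reproduce the example.

```python
from urllib.parse import quote


def escape_address(address: str) -> str:
    """Percent-encode the query part of `address` (hypothetical helper)."""
    head, sep, query = address.partition("?")
    if not sep:
        # No query string, nothing to escape in this sketch.
        return address
    # Keep '=' and '&' so key=value pairs survive; encode everything else
    # that is not URL-safe, including spaces and a literal '?'.
    return head + sep + quote(query, safe="=&")


print(escape_address("mailto:info@froogle.com?subject=sup dawg?"))
# mailto:info@froogle.com?subject=sup%20dawg%3F
```

An address that is already percent-encoded would be encoded a second time by this sketch (``%`` becomes ``%25``), so a real setter would need to decide how to treat pre-escaped input.
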
**Add additional runs to a hyperlink**::

    >>> hyperlink.text = 'A '
    >>> # .insert_run inserts a new run at idx, defaults to idx=-1
    >>> hyperlink.insert_run(' link').bold = True
    >>> hyperlink.insert_run('formatted', idx=1).bold = True
    >>> hyperlink.text
    'A formatted link'
    >>> [r for r in hyperlink.iter_runs()]
    [<docx.text.run.Run at 0x7fa...>,
     <docx.text.run.Run at 0x7fb...>,
     <docx.text.run.Run at 0x7fc...>]

**Iterate over the run-level items a paragraph contains**::

    >>> paragraph = document.add_paragraph('A paragraph having a link to: ')
    >>> paragraph.add_hyperlink(text='github', address='http://github.com')
    >>> [item for item in paragraph.iter_run_level_items()]
    [<docx.text.run.Run at 0x7fd...>, <docx.text.hyperlink.Hyperlink at 0x7fe...>]

**Paragraph.text now includes text contained in a hyperlink**::

    >>> paragraph.text
    'A paragraph having a link to: github'


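The two behaviors above (iterating run-level items and including hyperlink text in the paragraph text) can be illustrated directly against the XML with the standard library. This is a sketch only: ``run_level_items`` and ``_text`` are hypothetical stand-alone functions, not the proposed ``Paragraph.iter_run_level_items()`` method, and the paragraph XML is a hand-written stand-in for what ``add_hyperlink`` might produce.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Hand-written stand-in for a paragraph with one run and one hyperlink.
PARAGRAPH_XML = f"""
<w:p xmlns:w="{W}">
  <w:r><w:t xml:space="preserve">A paragraph having a link to: </w:t></w:r>
  <w:hyperlink>
    <w:r><w:t>github</w:t></w:r>
  </w:hyperlink>
</w:p>
"""


def _text(element):
    """Concatenate the text of every w:t descendant, in document order."""
    return "".join(t.text or "" for t in element.iter(f"{{{W}}}t"))


def run_level_items(p):
    """Yield (kind, text) for each w:r or w:hyperlink child of paragraph `p`."""
    for child in p:
        if child.tag == f"{{{W}}}r":
            yield "run", _text(child)
        elif child.tag == f"{{{W}}}hyperlink":
            yield "hyperlink", _text(child)


p = ET.fromstring(PARAGRAPH_XML)
items = list(run_level_items(p))
print(items)
print("".join(text for _, text in items))
```
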
Word Behaviors
--------------

* What are the semantics of the w:history attribute on w:hyperlink? I
  suspect it indicates whether the link should show up blue (unvisited)
  or purple (visited). I'm inclined to think we need it as a read/write
  property on hyperlink. We should see what the MS API does here.

* We probably need to enforce some character-set restrictions on w:anchor.
  Word doesn't seem to like spaces or hyphens, for example. The simple type
  ST_String doesn't look like it takes care of this.

* We'll need to test URL escaping of special characters like spaces and
  question marks in Hyperlink.address.

* What does Word do when loading a document containing an internal hyperlink
  having an anchor value that doesn't match an existing bookmark? We'll want
  to know because we're sure to get support inquiries from folks who don't
  match those up and wonder why they get a repair error or the like.


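One way the character-set restriction on w:anchor could be enforced is a simple pattern check. The rule used here is an assumption, not something confirmed against Word: it accepts only ASCII letters, digits, and underscores, with a letter or underscore first, which at least rejects the spaces and hyphens noted above. ``is_valid_anchor`` is a hypothetical helper name.

```python
import re

# Assumed rule (not verified against Word): ASCII letters, digits and
# underscores only, and the first character is not a digit.
_ANCHOR_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_anchor(name: str) -> bool:
    """Return True if `name` satisfies the assumed anchor-name rule."""
    return _ANCHOR_RE.fullmatch(name) is not None


for candidate in ("Section_1", "Section 1", "Section-1"):
    print(candidate, is_valid_anchor(candidate))
```
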
Specimen XML
------------

.. highlight:: xml


External links
~~~~~~~~~~~~~~

The address (URL) of an external hyperlink is stored in the document.xml.rels
file, keyed by the w:hyperlink@r:id attribute::

    <w:p>
      <w:r>
        <w:t xml:space="preserve">This is an external link to </w:t>
      </w:r>
      <w:hyperlink r:id="rId4">
        <w:r>
          <w:rPr>
            <w:rStyle w:val="Hyperlink"/>
          </w:rPr>
          <w:t>Google</w:t>
        </w:r>
      </w:hyperlink>
    </w:p>

... mapping to relationship in document.xml.rels::

    <Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
      <Relationship Id="rId4" TargetMode="External" Type="http://..." Target="http://google.com/"/>
    </Relationships>

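The r:id lookup described above can be sketched with the standard library's ElementTree. This is an illustration only, not how python-docx resolves relationships: ``hyperlink_targets`` is a hypothetical function, the two XML strings are trimmed copies of the specimens with the namespace declarations a standalone parse needs, and the relationship ``Type`` stays elided as in the specimen.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG = "http://schemas.openxmlformats.org/package/2006/relationships"

DOCUMENT_XML = f"""
<w:p xmlns:w="{W}" xmlns:r="{R}">
  <w:r><w:t xml:space="preserve">This is an external link to </w:t></w:r>
  <w:hyperlink r:id="rId4">
    <w:r><w:t>Google</w:t></w:r>
  </w:hyperlink>
</w:p>
"""

# Type is elided here exactly as it is in the specimen.
RELS_XML = f"""
<Relationships xmlns="{PKG}">
  <Relationship Id="rId4" TargetMode="External" Type="http://..." Target="http://google.com/"/>
</Relationships>
"""


def hyperlink_targets(document_xml, rels_xml):
    """Return the relationship Target for each w:hyperlink, in document order."""
    rels = {
        rel.get("Id"): rel.get("Target")
        for rel in ET.fromstring(rels_xml).iter(f"{{{PKG}}}Relationship")
    }
    paragraph = ET.fromstring(document_xml)
    # A hyperlink without r:id (an internal link) maps to None.
    return [
        rels.get(link.get(f"{{{R}}}id"))
        for link in paragraph.iter(f"{{{W}}}hyperlink")
    ]


print(hyperlink_targets(DOCUMENT_XML, RELS_XML))
```
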
A hyperlink can contain multiple runs of text (and a whole lot of other
stuff, including nested hyperlinks, at least as far as the schema indicates)::

    <w:p>
      <w:hyperlink r:id="rId2">
        <w:r>
          <w:rPr>
            <w:rStyle w:val="Hyperlink"/>
          </w:rPr>
          <w:t xml:space="preserve">A hyperlink containing an </w:t>
        </w:r>
        <w:r>
          <w:rPr>
            <w:rStyle w:val="Hyperlink"/>
            <w:i/>
          </w:rPr>
          <w:t>italicized</w:t>
        </w:r>
        <w:r>
          <w:rPr>
            <w:rStyle w:val="Hyperlink"/>
          </w:rPr>
          <w:t xml:space="preserve"> word</w:t>
        </w:r>
      </w:hyperlink>
    </w:p>


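To make the multi-run structure concrete, the following sketch lists each run inside the hyperlink together with whether it carries a ``w:i`` element. It is an illustration under simplifying assumptions: ``runs_with_italic`` is a hypothetical function, the XML is a trimmed copy of the specimen with the namespace declarations a standalone parse needs, and a ``w:i`` element is treated as "italic on" without inspecting any ``w:val`` attribute.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"

HYPERLINK_XML = f"""
<w:hyperlink xmlns:w="{W}" xmlns:r="{R}" r:id="rId2">
  <w:r>
    <w:rPr><w:rStyle w:val="Hyperlink"/></w:rPr>
    <w:t xml:space="preserve">A hyperlink containing an </w:t>
  </w:r>
  <w:r>
    <w:rPr><w:rStyle w:val="Hyperlink"/><w:i/></w:rPr>
    <w:t>italicized</w:t>
  </w:r>
  <w:r>
    <w:rPr><w:rStyle w:val="Hyperlink"/></w:rPr>
    <w:t xml:space="preserve"> word</w:t>
  </w:r>
</w:hyperlink>
"""


def runs_with_italic(hyperlink):
    """Return (text, is_italic) for each direct w:r child of `hyperlink`."""
    result = []
    for run in hyperlink.findall(f"{{{W}}}r"):
        text = "".join(t.text or "" for t in run.iter(f"{{{W}}}t"))
        # Simplification: any w:i element counts as italic.
        italic = run.find(f"{{{W}}}rPr/{{{W}}}i") is not None
        result.append((text, italic))
    return result


runs = runs_with_italic(ET.fromstring(HYPERLINK_XML))
print(runs)
print("".join(text for text, _ in runs))
```
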
Internal links
~~~~~~~~~~~~~~

An internal link provides "jump to another document location" behavior in the
Word UI. An internal link is distinguished by the absence of an r:id
attribute. In this case, the w:anchor attribute is required. The value of the
anchor attribute is the name of a bookmark in the document.

Example::

    <w:p>
      <w:r>
        <w:t xml:space="preserve">See </w:t>
      </w:r>
      <w:hyperlink w:anchor="Section_4">
        <w:r>
          <w:rPr>
            <w:rStyle w:val="Hyperlink"/>
          </w:rPr>
          <w:t>Section 4</w:t>
        </w:r>
      </w:hyperlink>
      <w:r>
        <w:t xml:space="preserve"> for more details.</w:t>
      </w:r>
    </w:p>

... referring to this bookmark elsewhere in the document::

    <w:p>
      <w:bookmarkStart w:id="0" w:name="Section_4"/>
      <w:r>
        <w:t>Section 4</w:t>
      </w:r>
      <w:bookmarkEnd w:id="0"/>
    </w:p>


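Because an internal link only works when its anchor names an existing bookmark, a consistency check over the document body may be useful. The sketch below is one possible approach, not part of the proposed API: ``dangling_anchors`` is a hypothetical function, and the body XML is a hand-written sample in which ``Section_9`` deliberately has no matching bookmark.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"

BODY_XML = f"""
<w:body xmlns:w="{W}" xmlns:r="{R}">
  <w:p>
    <w:hyperlink w:anchor="Section_4"><w:r><w:t>Section 4</w:t></w:r></w:hyperlink>
    <w:hyperlink w:anchor="Section_9"><w:r><w:t>Section 9</w:t></w:r></w:hyperlink>
  </w:p>
  <w:p>
    <w:bookmarkStart w:id="0" w:name="Section_4"/>
    <w:r><w:t>Section 4</w:t></w:r>
    <w:bookmarkEnd w:id="0"/>
  </w:p>
</w:body>
"""


def dangling_anchors(body):
    """Return anchors of internal hyperlinks that match no bookmark name."""
    names = {b.get(f"{{{W}}}name") for b in body.iter(f"{{{W}}}bookmarkStart")}
    missing = []
    for link in body.iter(f"{{{W}}}hyperlink"):
        if link.get(f"{{{R}}}id") is not None:
            continue  # has r:id, so it is an external link
        anchor = link.get(f"{{{W}}}anchor")
        if anchor not in names:
            missing.append(anchor)
    return missing


print(dangling_anchors(ET.fromstring(BODY_XML)))
```
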
Schema excerpt
--------------

.. highlight:: xml

::

    <xsd:complexType name="CT_P">
      <xsd:sequence>
        <xsd:element name="pPr" type="CT_PPr" minOccurs="0"/>
        <xsd:group ref="EG_PContent" minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
      <xsd:attribute name="rsidRPr" type="ST_LongHexNumber"/>
      <xsd:attribute name="rsidR" type="ST_LongHexNumber"/>
      <xsd:attribute name="rsidDel" type="ST_LongHexNumber"/>
      <xsd:attribute name="rsidP" type="ST_LongHexNumber"/>
      <xsd:attribute name="rsidRDefault" type="ST_LongHexNumber"/>
    </xsd:complexType>

    <xsd:group name="EG_PContent">  <!-- denormalized -->
      <xsd:choice>
        <xsd:element name="r" type="CT_R"/>
        <xsd:element name="hyperlink" type="CT_Hyperlink"/>
        <xsd:element name="fldSimple" type="CT_SimpleField"/>
        <xsd:element name="sdt" type="CT_SdtRun"/>
        <xsd:element name="customXml" type="CT_CustomXmlRun"/>
        <xsd:element name="smartTag" type="CT_SmartTagRun"/>
        <xsd:element name="dir" type="CT_DirContentRun"/>
        <xsd:element name="bdo" type="CT_BdoContentRun"/>
        <xsd:element name="subDoc" type="CT_Rel"/>
        <xsd:group ref="EG_RunLevelElts"/>
      </xsd:choice>
    </xsd:group>

    <xsd:complexType name="CT_Hyperlink">
      <xsd:group ref="EG_PContent" minOccurs="0" maxOccurs="unbounded"/>
      <xsd:attribute name="tgtFrame" type="s:ST_String"/>
      <xsd:attribute name="tooltip" type="s:ST_String"/>
      <xsd:attribute name="docLocation" type="s:ST_String"/>
      <xsd:attribute name="history" type="s:ST_OnOff"/>
      <xsd:attribute name="anchor" type="s:ST_String"/>
      <xsd:attribute ref="r:id"/>
    </xsd:complexType>

    <xsd:group name="EG_RunLevelElts">
      <xsd:choice>
        <xsd:element name="proofErr" type="CT_ProofErr"/>
        <xsd:element name="permStart" type="CT_PermStart"/>
        <xsd:element name="permEnd" type="CT_Perm"/>
        <xsd:element name="bookmarkStart" type="CT_Bookmark"/>
        <xsd:element name="bookmarkEnd" type="CT_MarkupRange"/>
        <xsd:element name="moveFromRangeStart" type="CT_MoveBookmark"/>
        <xsd:element name="moveFromRangeEnd" type="CT_MarkupRange"/>
        <xsd:element name="moveToRangeStart" type="CT_MoveBookmark"/>
        <xsd:element name="moveToRangeEnd" type="CT_MarkupRange"/>
        <xsd:element name="commentRangeStart" type="CT_MarkupRange"/>
        <xsd:element name="commentRangeEnd" type="CT_MarkupRange"/>
        <xsd:element name="customXmlInsRangeStart" type="CT_TrackChange"/>
        <xsd:element name="customXmlInsRangeEnd" type="CT_Markup"/>
        <xsd:element name="customXmlDelRangeStart" type="CT_TrackChange"/>
        <xsd:element name="customXmlDelRangeEnd" type="CT_Markup"/>
        <xsd:element name="customXmlMoveFromRangeStart" type="CT_TrackChange"/>
        <xsd:element name="customXmlMoveFromRangeEnd" type="CT_Markup"/>
        <xsd:element name="customXmlMoveToRangeStart" type="CT_TrackChange"/>
        <xsd:element name="customXmlMoveToRangeEnd" type="CT_Markup"/>
        <xsd:element name="ins" type="CT_RunTrackChange"/>
        <xsd:element name="del" type="CT_RunTrackChange"/>
        <xsd:element name="moveFrom" type="CT_RunTrackChange"/>
        <xsd:element name="moveTo" type="CT_RunTrackChange"/>
        <xsd:group ref="EG_MathContent" minOccurs="0" maxOccurs="unbounded"/>
      </xsd:choice>
    </xsd:group>

    <xsd:complexType name="CT_R">
      <xsd:sequence>
        <xsd:group ref="EG_RPr" minOccurs="0"/>
        <xsd:group ref="EG_RunInnerContent" minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
      <xsd:attribute name="rsidRPr" type="ST_LongHexNumber"/>
      <xsd:attribute name="rsidDel" type="ST_LongHexNumber"/>
      <xsd:attribute name="rsidR" type="ST_LongHexNumber"/>
    </xsd:complexType>

    <xsd:simpleType name="ST_OnOff">
      <xsd:union memberTypes="xsd:boolean ST_OnOff1"/>
    </xsd:simpleType>

    <xsd:simpleType name="ST_OnOff1">
      <xsd:restriction base="xsd:string">
        <xsd:enumeration value="on"/>
        <xsd:enumeration value="off"/>
      </xsd:restriction>
    </xsd:simpleType>

    <xsd:simpleType name="ST_RelationshipId">
      <xsd:restriction base="xsd:string"/>
    </xsd:simpleType>

    <xsd:simpleType name="ST_String">
      <xsd:restriction base="xsd:string"/>
    </xsd:simpleType>
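
The ``ST_OnOff`` union above (``xsd:boolean`` plus the ``on``/``off`` enumeration) governs how an attribute such as ``w:history`` would be read. A minimal parsing sketch follows; ``parse_on_off`` is a hypothetical helper name, and it accepts the four lexical forms of ``xsd:boolean`` (``true``, ``false``, ``1``, ``0``) together with ``on`` and ``off``.

```python
# Lexical values allowed by xsd:boolean, plus the ST_OnOff1 enumeration.
_TRUE_VALUES = frozenset({"true", "1", "on"})
_FALSE_VALUES = frozenset({"false", "0", "off"})


def parse_on_off(value: str) -> bool:
    """Convert an ST_OnOff attribute value to a bool (hypothetical helper)."""
    if value in _TRUE_VALUES:
        return True
    if value in _FALSE_VALUES:
        return False
    raise ValueError(f"not a valid ST_OnOff value: {value!r}")


print(parse_on_off("1"))    # True
print(parse_on_off("off"))  # False
```
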

docs/dev/analysis/features/text/index.rst

Lines changed: 3 additions & 0 deletions
@@ -5,10 +5,13 @@ Text
 .. toctree::
    :titlesonly:
 
+   hyperlink
    font-highlight-color
    paragraph-format
    font
    font-color
    underline
    run-content
    breaks
+   hyperlink
+