Commit 4935c30

yunshi authored and scanny committed
docs: document hyperlink analysis
1 parent f07823c commit 4935c30

3 files changed: +355 -0 lines changed

docs/conf.py

Lines changed: 2 additions & 0 deletions
@@ -115,6 +115,8 @@
 
 .. |HeaderPart| replace:: :class:`.HeaderPart`
 
+.. |Hyperlink| replace:: :class:`.Hyperlink`
+
 .. |ImageParts| replace:: :class:`.ImageParts`
 
 .. |Inches| replace:: :class:`.Inches`
Lines changed: 352 additions & 0 deletions
@@ -0,0 +1,352 @@

Hyperlink
=========

Word allows hyperlinks to be placed in a document wherever paragraphs can appear.

The target (URL) of a hyperlink may be external, such as a web site, or internal, to
another location in the document.

The visible text of a hyperlink is held in one or more runs. Technically a hyperlink can
have zero runs, but this occurs only in contrived cases (otherwise there would be
nothing to click on). As usual, each run can have its own distinct text formatting
(font), so, for example, one word in the hyperlink can be bold. By default, Word
applies the built-in `Hyperlink` character style to a newly inserted hyperlink.

Note that rendered page-breaks can occur in the middle of a hyperlink.

A |Hyperlink| is a child of |Paragraph|, a peer of |Run|.


Candidate protocol
------------------

An external hyperlink has an address and an optional anchor. An internal hyperlink has
only an anchor. An anchor is also known as a *URI fragment* and follows a hash mark
("#").

Note that the anchor and URL are stored in two distinct attributes, so you need to
concatenate `.address` and `.anchor`, joined by "#", to get the complete URL.
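
As a minimal sketch of that concatenation, the helper below joins the two values with a hash mark. `full_url` is a hypothetical name that is not part of the candidate protocol, and it works on plain strings rather than on a hyperlink object.

```python
from typing import Optional


def full_url(address: Optional[str], anchor: Optional[str]) -> str:
    """Join an address and an anchor (URI fragment) into one URL string.

    Hypothetical helper: either value may be None or empty, mirroring an
    internal link (anchor only) or an external link without a fragment.
    """
    address = address or ""
    return f"{address}#{anchor}" if anchor else address


print(full_url("http://us.com", "about"))  # external link with a fragment
print(full_url(None, "Section_1"))         # internal link: fragment only
```

With the values used in the examples in this section, the two calls give `http://us.com#about` and `#Section_1`.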

.. highlight:: python

**Access hyperlinks in a paragraph**::

    >>> paragraph.hyperlinks
    [<docx.text.hyperlink.Hyperlink at 0x7f...>]

**Access hyperlinks in a paragraph in document order with runs**::

    >>> list(paragraph.iter_inner_content())
    [
      <docx.text.run.Run at 0x7f...>,
      <docx.text.hyperlink.Hyperlink at 0x7f...>,
      <docx.text.run.Run at 0x7f...>
    ]

**Access hyperlink address**::

    >>> hyperlink.address
    'https://google.com/'

**Access hyperlink runs**::

    >>> hyperlink.runs
    [
      <docx.text.run.Run at 0x7f...>,
      <docx.text.run.Run at 0x7f...>,
      <docx.text.run.Run at 0x7f...>
    ]

**Determine whether a hyperlink contains a rendered page-break**::

    >>> hyperlink.contains_page_break
    False

**Access visible text of a hyperlink**::

    >>> hyperlink.text
    'an excellent Wikipedia article on ferrets'

**Add an external hyperlink**::

    >>> hyperlink = paragraph.add_hyperlink(
    ...     'About', address='http://us.com', anchor='about'
    ... )
    >>> hyperlink
    <docx.text.hyperlink.Hyperlink at 0x7f...>
    >>> hyperlink.text
    'About'
    >>> hyperlink.address
    'http://us.com'
    >>> hyperlink.anchor
    'about'

**Add an internal hyperlink (to a bookmark)**::

    >>> hyperlink = paragraph.add_hyperlink('Section 1', anchor='Section_1')
    >>> hyperlink.text
    'Section 1'
    >>> hyperlink.anchor
    'Section_1'
    >>> hyperlink.address is None
    True

**Modify hyperlink properties**::

    >>> hyperlink.text = 'Froogle'
    >>> hyperlink.text
    'Froogle'
    >>> hyperlink.address = 'mailto:info@froogle.com?subject=sup dawg?'
    >>> hyperlink.address
    'mailto:info@froogle.com?subject=sup%20dawg%3F'
    >>> hyperlink.anchor = None
    >>> hyperlink.anchor is None
    True
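
The escaping shown for `hyperlink.address` above can be approximated with the standard library. This is only a sketch of one possible escaping rule, not the behavior of any existing implementation; `escape_address` is a hypothetical helper, and the choice of which characters to leave unescaped is an assumption.

```python
from urllib.parse import quote


def escape_address(address: str) -> str:
    """Percent-encode characters that are unsafe in a URL.

    Hypothetical helper: the part before the first "?" keeps ":", "/" and
    "@" as-is; the query part keeps "=" and "&" so key/value pairs survive.
    """
    head, sep, query = address.partition("?")
    return quote(head, safe=":/@") + sep + quote(query, safe="=&")


print(escape_address("mailto:info@froogle.com?subject=sup dawg?"))
# prints: mailto:info@froogle.com?subject=sup%20dawg%3F
```

Note that this sketch would double-escape an address that is already percent-encoded, because "%" itself is encoded as "%25".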

**Add additional runs to a hyperlink**::

    >>> hyperlink.text = 'A '
    >>> # .insert_run inserts a new run at idx, defaults to idx=-1
    >>> hyperlink.insert_run(' link').bold = True
    >>> hyperlink.insert_run('formatted', idx=1).bold = True
    >>> hyperlink.text
    'A formatted link'
    >>> list(hyperlink.iter_runs())
    [<docx.text.run.Run at 0x7fa...>,
     <docx.text.run.Run at 0x7fb...>,
     <docx.text.run.Run at 0x7fc...>]

**Iterate over the run-level items a paragraph contains**::

    >>> paragraph = document.add_paragraph('A paragraph having a link to: ')
    >>> hyperlink = paragraph.add_hyperlink(text='github', address='http://github.com')
    >>> list(paragraph.iter_run_level_items())
    [<docx.text.run.Run at 0x7fd...>, <docx.text.hyperlink.Hyperlink at 0x7fe...>]

**Paragraph.text now includes text contained in a hyperlink**::

    >>> paragraph.text
    'A paragraph having a link to: github'

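
To illustrate what "run-level items in document order" means at the XML level, the sketch below walks the direct children of a `w:p` element with the standard library only. The function names and the hand-written paragraph XML are assumptions made for illustration; they are not part of the candidate protocol.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W_R, W_T, W_HYPERLINK = f"{{{W}}}r", f"{{{W}}}t", f"{{{W}}}hyperlink"

# Hand-written stand-in for the paragraph built in the example above.
paragraph_xml = f"""
<w:p xmlns:w="{W}">
  <w:r><w:t xml:space="preserve">A paragraph having a link to: </w:t></w:r>
  <w:hyperlink><w:r><w:t>github</w:t></w:r></w:hyperlink>
</w:p>
"""


def run_text(run):
    """Concatenate the w:t text of a single run."""
    return "".join(t.text or "" for t in run.findall(W_T))


def run_level_items(p):
    """Return (kind, text) pairs for runs and hyperlinks, in document order."""
    items = []
    for child in p:
        if child.tag == W_R:
            items.append(("run", run_text(child)))
        elif child.tag == W_HYPERLINK:
            text = "".join(run_text(r) for r in child.findall(W_R))
            items.append(("hyperlink", text))
    return items


items = run_level_items(ET.fromstring(paragraph_xml))
print(items)
print("".join(text for _, text in items))  # paragraph text, hyperlink included
```

Other run-level children the schema allows (fields, bookmarks, and so on) are simply skipped in this sketch.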
Word Behaviors
--------------

* What are the semantics of the w:history attribute on w:hyperlink? I suspect
  it indicates whether the link should show up blue (unvisited) or purple
  (visited). I'm inclined to think we need that as a read/write property on
  hyperlink. We should see what the MS API does on this count.

* We probably need to enforce some character-set restrictions on w:anchor.
  Word doesn't seem to like spaces or hyphens, for example. The simple type
  ST_String doesn't look like it takes care of this.

* We'll need to test URL escaping of special characters like spaces and
  question marks in Hyperlink.address.

* What does Word do when loading a document containing an internal hyperlink
  having an anchor value that doesn't match an existing bookmark? We'll want
  to know because we're sure to get support inquiries from folks who don't
  match those up and wonder why they get a repair error or something similar.
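
A character-set restriction on w:anchor like the one raised in the second item could be sketched as a simple pattern check. The rule used here (ASCII letters, digits, and underscores, not starting with a digit, at most 40 characters) is an assumption modeled on Word's bookmark-naming behavior; it is not taken from the schema and would need to be verified against Word. `is_plausible_anchor` is a hypothetical name.

```python
import re

# Assumed rule, NOT confirmed by the schema: ASCII letters, digits, and
# underscores only, not starting with a digit, at most 40 characters.
_ANCHOR_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,39}")


def is_plausible_anchor(name: str) -> bool:
    """Return True if `name` satisfies the assumed anchor-name rule."""
    return _ANCHOR_RE.fullmatch(name) is not None


for candidate in ("Section_1", "Section 1", "section-1"):
    print(candidate, is_plausible_anchor(candidate))
```

Under this assumed rule, names containing a space or a hyphen are rejected, matching the observation above.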


Specimen XML
------------

.. highlight:: xml


External links
~~~~~~~~~~~~~~

The address (URL) of an external hyperlink is stored in the document.xml.rels
file, keyed by the w:hyperlink@r:id attribute::

    <w:p>
      <w:r>
        <w:t xml:space="preserve">This is an external link to </w:t>
      </w:r>
      <w:hyperlink r:id="rId4">
        <w:r>
          <w:rPr>
            <w:rStyle w:val="Hyperlink"/>
          </w:rPr>
          <w:t>Google</w:t>
        </w:r>
      </w:hyperlink>
    </w:p>

... mapping to a relationship in document.xml.rels::

    <Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
      <Relationship Id="rId4" TargetMode="External" Type="http://..." Target="http://google.com/"/>
    </Relationships>
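
The r:id lookup described above can be sketched with the standard library alone. The XML strings are hand-written stand-ins for the two parts; the namespace declarations and the full hyperlink relationship type URI are filled in here because the specimens omit them, and the sketch uses `TargetMode`, the attribute name defined by the Open Packaging Conventions. `external_targets` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG = "http://schemas.openxmlformats.org/package/2006/relationships"

paragraph_xml = f"""
<w:p xmlns:w="{W}" xmlns:r="{R}">
  <w:r><w:t xml:space="preserve">This is an external link to </w:t></w:r>
  <w:hyperlink r:id="rId4">
    <w:r><w:t>Google</w:t></w:r>
  </w:hyperlink>
</w:p>
"""

rels_xml = f"""
<Relationships xmlns="{PKG}">
  <Relationship Id="rId4" TargetMode="External"
                Type="{R}/hyperlink" Target="http://google.com/"/>
</Relationships>
"""


def external_targets(rels_root):
    """Map relationship id -> target URL for external relationships only."""
    return {
        rel.get("Id"): rel.get("Target")
        for rel in rels_root.iter(f"{{{PKG}}}Relationship")
        if rel.get("TargetMode") == "External"
    }


p = ET.fromstring(paragraph_xml)
targets = external_targets(ET.fromstring(rels_xml))
hyperlink = p.find(f"{{{W}}}hyperlink")
address = targets.get(hyperlink.get(f"{{{R}}}id"))
print(address)  # http://google.com/
```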

A hyperlink can contain multiple runs of text (and a whole lot of other
stuff, including nested hyperlinks, at least as far as the schema indicates)::

    <w:p>
      <w:hyperlink r:id="rId2">
        <w:r>
          <w:rPr>
            <w:rStyle w:val="Hyperlink"/>
          </w:rPr>
          <w:t xml:space="preserve">A hyperlink containing an </w:t>
        </w:r>
        <w:r>
          <w:rPr>
            <w:rStyle w:val="Hyperlink"/>
            <w:i/>
          </w:rPr>
          <w:t>italicized</w:t>
        </w:r>
        <w:r>
          <w:rPr>
            <w:rStyle w:val="Hyperlink"/>
          </w:rPr>
          <w:t xml:space="preserve"> word</w:t>
        </w:r>
      </w:hyperlink>
    </w:p>
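
As a rough illustration of how the visible text of such a multi-run hyperlink could be assembled, the sketch below concatenates the `w:t` text of each direct `w:r` child. It is a simplification under stated assumptions: run properties are ignored, nested hyperlinks and other child elements are skipped, and `hyperlink_text` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Trimmed version of the specimen above; r:id and rStyle omitted for brevity.
hyperlink_xml = f"""
<w:hyperlink xmlns:w="{W}">
  <w:r><w:t xml:space="preserve">A hyperlink containing an </w:t></w:r>
  <w:r><w:rPr><w:i/></w:rPr><w:t>italicized</w:t></w:r>
  <w:r><w:t xml:space="preserve"> word</w:t></w:r>
</w:hyperlink>
"""


def hyperlink_text(hyperlink):
    """Concatenate text from the hyperlink's direct w:r children."""
    parts = []
    for run in hyperlink.findall(f"{{{W}}}r"):
        for t in run.findall(f"{{{W}}}t"):
            parts.append(t.text or "")
    return "".join(parts)


text = hyperlink_text(ET.fromstring(hyperlink_xml))
print(text)  # A hyperlink containing an italicized word
```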


Internal links
~~~~~~~~~~~~~~

An internal link provides "jump to another document location" behavior in the
Word UI. An internal link is distinguished by the absence of an r:id
attribute. In this case, the w:anchor attribute is required. The value of the
anchor attribute is the name of a bookmark in the document.

Example::

    <w:p>
      <w:r>
        <w:t xml:space="preserve">See </w:t>
      </w:r>
      <w:hyperlink w:anchor="Section_4">
        <w:r>
          <w:rPr>
            <w:rStyle w:val="Hyperlink"/>
          </w:rPr>
          <w:t>Section 4</w:t>
        </w:r>
      </w:hyperlink>
      <w:r>
        <w:t xml:space="preserve"> for more details.</w:t>
      </w:r>
    </w:p>

... referring to this bookmark elsewhere in the document::

    <w:p>
      <w:bookmarkStart w:id="0" w:name="Section_4"/>
      <w:r>
        <w:t>Section 4</w:t>
      </w:r>
      <w:bookmarkEnd w:id="0"/>
    </w:p>
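
One way to detect an internal hyperlink whose anchor has no matching bookmark is to compare the two sets of names. The sketch below does this over a hand-written `w:body` fragment; the second hyperlink (`Section_9`) and the function name `dangling_anchors` are made up for illustration.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

body_xml = f"""
<w:body xmlns:w="{W}">
  <w:p>
    <w:hyperlink w:anchor="Section_4"><w:r><w:t>Section 4</w:t></w:r></w:hyperlink>
    <w:hyperlink w:anchor="Section_9"><w:r><w:t>Section 9</w:t></w:r></w:hyperlink>
  </w:p>
  <w:p>
    <w:bookmarkStart w:id="0" w:name="Section_4"/>
    <w:r><w:t>Section 4</w:t></w:r>
    <w:bookmarkEnd w:id="0"/>
  </w:p>
</w:body>
"""


def dangling_anchors(root):
    """Return sorted anchor names that do not match any bookmark name."""
    names = {b.get(f"{{{W}}}name") for b in root.iter(f"{{{W}}}bookmarkStart")}
    anchors = (h.get(f"{{{W}}}anchor") for h in root.iter(f"{{{W}}}hyperlink"))
    return sorted(a for a in anchors if a is not None and a not in names)


missing = dangling_anchors(ET.fromstring(body_xml))
print(missing)  # ['Section_9']
```

Hyperlinks without a w:anchor attribute (external links) are ignored by this check.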


Schema excerpt
--------------

.. highlight:: xml

::

    <xsd:complexType name="CT_P">
      <xsd:sequence>
        <xsd:element name="pPr" type="CT_PPr" minOccurs="0"/>
        <xsd:group ref="EG_PContent" minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
      <xsd:attribute name="rsidRPr" type="ST_LongHexNumber"/>
      <xsd:attribute name="rsidR" type="ST_LongHexNumber"/>
      <xsd:attribute name="rsidDel" type="ST_LongHexNumber"/>
      <xsd:attribute name="rsidP" type="ST_LongHexNumber"/>
      <xsd:attribute name="rsidRDefault" type="ST_LongHexNumber"/>
    </xsd:complexType>

    <xsd:group name="EG_PContent">  <!-- denormalized -->
      <xsd:choice>
        <xsd:element name="r" type="CT_R"/>
        <xsd:element name="hyperlink" type="CT_Hyperlink"/>
        <xsd:element name="fldSimple" type="CT_SimpleField"/>
        <xsd:element name="sdt" type="CT_SdtRun"/>
        <xsd:element name="customXml" type="CT_CustomXmlRun"/>
        <xsd:element name="smartTag" type="CT_SmartTagRun"/>
        <xsd:element name="dir" type="CT_DirContentRun"/>
        <xsd:element name="bdo" type="CT_BdoContentRun"/>
        <xsd:element name="subDoc" type="CT_Rel"/>
        <xsd:group ref="EG_RunLevelElts"/>
      </xsd:choice>
    </xsd:group>

    <xsd:complexType name="CT_Hyperlink">
      <xsd:group ref="EG_PContent" minOccurs="0" maxOccurs="unbounded"/>
      <xsd:attribute name="tgtFrame" type="s:ST_String"/>
      <xsd:attribute name="tooltip" type="s:ST_String"/>
      <xsd:attribute name="docLocation" type="s:ST_String"/>
      <xsd:attribute name="history" type="s:ST_OnOff"/>
      <xsd:attribute name="anchor" type="s:ST_String"/>
      <xsd:attribute ref="r:id"/>
    </xsd:complexType>

    <xsd:group name="EG_RunLevelElts">
      <xsd:choice>
        <xsd:element name="proofErr" type="CT_ProofErr"/>
        <xsd:element name="permStart" type="CT_PermStart"/>
        <xsd:element name="permEnd" type="CT_Perm"/>
        <xsd:element name="bookmarkStart" type="CT_Bookmark"/>
        <xsd:element name="bookmarkEnd" type="CT_MarkupRange"/>
        <xsd:element name="moveFromRangeStart" type="CT_MoveBookmark"/>
        <xsd:element name="moveFromRangeEnd" type="CT_MarkupRange"/>
        <xsd:element name="moveToRangeStart" type="CT_MoveBookmark"/>
        <xsd:element name="moveToRangeEnd" type="CT_MarkupRange"/>
        <xsd:element name="commentRangeStart" type="CT_MarkupRange"/>
        <xsd:element name="commentRangeEnd" type="CT_MarkupRange"/>
        <xsd:element name="customXmlInsRangeStart" type="CT_TrackChange"/>
        <xsd:element name="customXmlInsRangeEnd" type="CT_Markup"/>
        <xsd:element name="customXmlDelRangeStart" type="CT_TrackChange"/>
        <xsd:element name="customXmlDelRangeEnd" type="CT_Markup"/>
        <xsd:element name="customXmlMoveFromRangeStart" type="CT_TrackChange"/>
        <xsd:element name="customXmlMoveFromRangeEnd" type="CT_Markup"/>
        <xsd:element name="customXmlMoveToRangeStart" type="CT_TrackChange"/>
        <xsd:element name="customXmlMoveToRangeEnd" type="CT_Markup"/>
        <xsd:element name="ins" type="CT_RunTrackChange"/>
        <xsd:element name="del" type="CT_RunTrackChange"/>
        <xsd:element name="moveFrom" type="CT_RunTrackChange"/>
        <xsd:element name="moveTo" type="CT_RunTrackChange"/>
        <xsd:group ref="EG_MathContent" minOccurs="0" maxOccurs="unbounded"/>
      </xsd:choice>
    </xsd:group>

    <xsd:complexType name="CT_R">
      <xsd:sequence>
        <xsd:group ref="EG_RPr" minOccurs="0"/>
        <xsd:group ref="EG_RunInnerContent" minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
      <xsd:attribute name="rsidRPr" type="ST_LongHexNumber"/>
      <xsd:attribute name="rsidDel" type="ST_LongHexNumber"/>
      <xsd:attribute name="rsidR" type="ST_LongHexNumber"/>
    </xsd:complexType>

    <xsd:simpleType name="ST_OnOff">
      <xsd:union memberTypes="xsd:boolean ST_OnOff1"/>
    </xsd:simpleType>

    <xsd:simpleType name="ST_OnOff1">
      <xsd:restriction base="xsd:string">
        <xsd:enumeration value="on"/>
        <xsd:enumeration value="off"/>
      </xsd:restriction>
    </xsd:simpleType>

    <xsd:simpleType name="ST_RelationshipId">
      <xsd:restriction base="xsd:string"/>
    </xsd:simpleType>

    <xsd:simpleType name="ST_String">
      <xsd:restriction base="xsd:string"/>
    </xsd:simpleType>
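
Because `ST_OnOff` is a union of `xsd:boolean` and the `on`/`off` enumeration, a reader of the `history` attribute has six lexical values to accept. The sketch below shows one way to map them to a Python bool; `parse_on_off` is a hypothetical name, and how a missing attribute should be treated is left open here.

```python
# Lexical values allowed by xsd:boolean ("true", "false", "1", "0")
# plus the two values enumerated by ST_OnOff1.
_TRUE = {"true", "1", "on"}
_FALSE = {"false", "0", "off"}


def parse_on_off(value: str) -> bool:
    """Convert an ST_OnOff attribute value to a bool, or raise ValueError."""
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"not a valid ST_OnOff value: {value!r}")


print(parse_on_off("1"), parse_on_off("off"))
```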

docs/dev/analysis/features/text/index.rst

Lines changed: 1 addition & 0 deletions
@@ -5,6 +5,7 @@ Text
 .. toctree::
    :titlesonly:
 
+   hyperlink
    tab-stops
    font-highlight-color
    paragraph-format
