
Hyperlink
=========

A hyperlink lives inside a paragraph, so Word allows one to be placed anywhere in a
document that a paragraph can appear.

The target (URL) of a hyperlink may be external, such as a website, or internal, pointing
to another location in the same document.

The visible text of a hyperlink is held in one or more runs. Technically a hyperlink can
have zero runs, but this occurs only in contrived cases (otherwise there would be
nothing to click on). As usual, each run can have its own distinct text formatting
(font), so, for example, one word in the hyperlink can be bold while the rest is not. By
default, Word applies the built-in `Hyperlink` character style to a newly inserted
hyperlink.

Note that rendered page-breaks can occur in the middle of a hyperlink.

A |Hyperlink| is a child of |Paragraph|, a peer of |Run|.


Candidate protocol
------------------

An external hyperlink has an address and an optional anchor. An internal hyperlink has
only an anchor. An anchor is also known as a *URI fragment* and follows a hash mark
("#") in a URL.

Note that the address (URL) and the anchor are stored in two distinct attributes, so to
get the whole thing you need to join `.address` and `.anchor`, with a "#" between them.
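
A minimal sketch of that join. It assumes, as the examples below show, that ``.address``
is ``None`` for an internal link and ``.anchor`` is ``None`` when there is no fragment.
``full_url()`` is a hypothetical helper and is not part of the candidate protocol::

    from typing import Optional


    def full_url(address: Optional[str], anchor: Optional[str]) -> str:
        """Return the whole URL for a hyperlink (hypothetical helper).

        Assumes `address` is None for an internal link and `anchor` is None
        (or empty) when the link has no fragment.
        """
        if not anchor:
            return address or ""
        # The fragment follows a hash mark; an internal link is fragment-only.
        return f"{address or ''}#{anchor}"

For example, ``full_url('http://us.com', 'about')`` returns ``'http://us.com#about'`` and
``full_url(None, 'Section_1')`` returns ``'#Section_1'``.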

.. highlight:: python

**Access hyperlinks in a paragraph**::

    >>> paragraph.hyperlinks
    [<docx.text.hyperlink.Hyperlink at 0x7f...>]

**Access hyperlinks in a paragraph in document order with runs**::

    >>> list(paragraph.iter_inner_content())
    [
        <docx.text.run.Run at 0x7f...>,
        <docx.text.hyperlink.Hyperlink at 0x7f...>,
        <docx.text.run.Run at 0x7f...>,
    ]
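
In the XML, that order is the document order of the ``w:r`` and ``w:hyperlink``
children of the ``w:p`` element. The sketch below is a rough illustration only. It
walks raw XML with the standard-library ``xml.etree.ElementTree`` rather than
python-docx, and ``inner_content()`` is a hypothetical name. A real implementation would
also have to decide how to handle the other ``EG_PContent`` children listed in the
schema excerpt::

    import xml.etree.ElementTree as ET

    W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

    P_XML = (
        '<w:p xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
        "<w:r><w:t>See </w:t></w:r>"
        '<w:hyperlink w:anchor="Section_4"><w:r><w:t>Section 4</w:t></w:r></w:hyperlink>'
        "<w:r><w:t> for more details.</w:t></w:r>"
        "</w:p>"
    )


    def inner_content(p):
        """Yield ("run" | "hyperlink", element) pairs for the direct children of w:p.

        Runs nested inside a hyperlink belong to that hyperlink, so they are not
        yielded separately. Any other child elements are skipped.
        """
        for child in p:
            if child.tag == W + "r":
                yield "run", child
            elif child.tag == W + "hyperlink":
                yield "hyperlink", child


    print([kind for kind, _ in inner_content(ET.fromstring(P_XML))])
    # ['run', 'hyperlink', 'run']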

**Access hyperlink address**::

    >>> hyperlink.address
    'https://google.com/'

**Access hyperlink runs**::

    >>> hyperlink.runs
    [
        <docx.text.run.Run at 0x7f...>,
        <docx.text.run.Run at 0x7f...>,
        <docx.text.run.Run at 0x7f...>,
    ]

**Determine whether a hyperlink contains a rendered page-break**::

    >>> hyperlink.contains_page_break
    False

**Access visible text of a hyperlink**::

    >>> hyperlink.text
    'an excellent Wikipedia article on ferrets'
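
At the XML level, both of these properties can be read from the runs inside
``w:hyperlink``. The visible text is the concatenation of their ``w:t`` elements, and a
rendered page-break shows up as a ``w:lastRenderedPageBreak`` element in one of those
runs. The minimal sketch below works on raw XML with ``xml.etree.ElementTree`` and uses
hypothetical function names. It ignores tabs, breaks, and other run content that a full
implementation would need to translate into text::

    import xml.etree.ElementTree as ET

    W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

    HYPERLINK_XML = (
        '<w:hyperlink xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
        '<w:r><w:t xml:space="preserve">an excellent </w:t></w:r>'
        "<w:r><w:lastRenderedPageBreak/><w:t>Wikipedia article</w:t></w:r>"
        '<w:r><w:t xml:space="preserve"> on ferrets</w:t></w:r>'
        "</w:hyperlink>"
    )


    def hyperlink_text(hyperlink):
        """Concatenate the w:t text of each run directly inside the hyperlink."""
        return "".join(
            t.text or "" for r in hyperlink.findall(W + "r") for t in r.findall(W + "t")
        )


    def contains_page_break(hyperlink):
        """True when any run directly inside the hyperlink holds a rendered page-break."""
        return hyperlink.find(W + "r/" + W + "lastRenderedPageBreak") is not None


    hyperlink = ET.fromstring(HYPERLINK_XML)
    print(hyperlink_text(hyperlink))       # an excellent Wikipedia article on ferrets
    print(contains_page_break(hyperlink))  # True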

**Add an external hyperlink**::

    >>> hyperlink = paragraph.add_hyperlink(
    ...     'About', address='http://us.com', anchor='about'
    ... )
    >>> hyperlink
    <docx.text.hyperlink.Hyperlink at 0x7f...>
    >>> hyperlink.text
    'About'
    >>> hyperlink.address
    'http://us.com'
    >>> hyperlink.anchor
    'about'
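
At the XML level, adding an external hyperlink takes two pieces: a ``w:hyperlink``
element in the paragraph and an external relationship that holds the address (see the
specimen XML below). The minimal sketch below covers only the element-building half,
using ``xml.etree.ElementTree``. ``make_hyperlink()`` is a hypothetical name, and the
caller is assumed to have already created the relationship and to pass in its ``rId``.
The run gets the ``Hyperlink`` character style that Word applies by default::

    import xml.etree.ElementTree as ET

    W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
    XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
    ET.register_namespace("w", W_NS)
    ET.register_namespace("r", R_NS)


    def make_hyperlink(text, r_id=None, anchor=None):
        """Build a w:hyperlink element holding one run styled "Hyperlink".

        Pass `r_id` (an existing external relationship id) for an external link,
        optionally with `anchor`. Pass only `anchor` (a bookmark name) for an
        internal link.
        """
        hyperlink = ET.Element(f"{{{W_NS}}}hyperlink")
        if r_id is not None:
            hyperlink.set(f"{{{R_NS}}}id", r_id)
        if anchor is not None:
            hyperlink.set(f"{{{W_NS}}}anchor", anchor)
        r = ET.SubElement(hyperlink, f"{{{W_NS}}}r")
        rPr = ET.SubElement(r, f"{{{W_NS}}}rPr")
        ET.SubElement(rPr, f"{{{W_NS}}}rStyle", {f"{{{W_NS}}}val": "Hyperlink"})
        t = ET.SubElement(r, f"{{{W_NS}}}t")
        t.text = text
        if text != text.strip():
            # Keep leading/trailing spaces, as the specimen XML's w:t elements do.
            t.set(XML_SPACE, "preserve")
        return hyperlink


    serialized = ET.tostring(
        make_hyperlink("About", r_id="rId4", anchor="about"), encoding="unicode"
    )
    print(serialized)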

**Add an internal hyperlink (to a bookmark)**::

    >>> hyperlink = paragraph.add_hyperlink('Section 1', anchor='Section_1')
    >>> hyperlink.text
    'Section 1'
    >>> hyperlink.anchor
    'Section_1'
    >>> hyperlink.address is None
    True

**Modify hyperlink properties**::

    >>> hyperlink.text = 'Froogle'
    >>> hyperlink.text
    'Froogle'
    >>> hyperlink.address = 'mailto:info@froogle.com?subject=sup dawg?'
    >>> hyperlink.address
    'mailto:info@froogle.com?subject=sup%20dawg%3F'
    >>> hyperlink.anchor = None
    >>> hyperlink.anchor is None
    True
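
The escaping shown above turns a space into ``%20`` and the trailing ``?`` into ``%3F``.
That result could be produced by percent-encoding each query value while leaving the
scheme, the address, and the query delimiters alone. The minimal sketch below uses
``urllib.parse.quote``. ``escape_address()`` is a hypothetical helper that assumes a
simple ``key=value&key=value`` query with no fragment. It would also double-encode an
address that is already escaped, which is one of the cases the escaping tests mentioned
under Word Behaviors would need to cover::

    from urllib.parse import quote


    def escape_address(address):
        """Percent-encode the query values of `address` (hypothetical helper)."""
        base, sep, query = address.partition("?")
        if not sep:
            return address
        pairs = []
        for pair in query.split("&"):
            key, eq, value = pair.partition("=")
            # safe="" makes quote() also encode "?", "/", "&" and "=" inside a value.
            pairs.append(key + eq + quote(value, safe=""))
        return base + "?" + "&".join(pairs)


    print(escape_address("mailto:info@froogle.com?subject=sup dawg?"))
    # mailto:info@froogle.com?subject=sup%20dawg%3F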

**Add additional runs to a hyperlink**::

    >>> hyperlink.text = 'A '
    >>> # .insert_run() inserts a new run at position idx; the default, idx=-1, appends
    >>> hyperlink.insert_run(' link').bold = True
    >>> hyperlink.insert_run('formatted', idx=1).bold = True
    >>> hyperlink.text
    'A formatted link'
    >>> [r for r in hyperlink.iter_runs()]
    [<docx.text.run.Run at 0x7fa...>,
     <docx.text.run.Run at 0x7fb...>,
     <docx.text.run.Run at 0x7fc...>]
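
In this protocol the default ``idx=-1`` puts the new run at the end, which is what makes
the result read ``'A formatted link'``. That differs from Python's ``list.insert()``,
where index ``-1`` inserts *before* the last item. The minimal sketch below shows the
intended index semantics, with runs modeled as a plain list of strings.
``insert_run()`` here is a hypothetical stand-in, not the python-docx method::

    def insert_run(runs, text, idx=-1):
        """Insert `text` into `runs` at `idx` (hypothetical model of .insert_run()).

        idx=-1 appends, unlike list.insert(-1, ...), which would place the new
        item before the current last item.
        """
        if idx == -1:
            runs.append(text)
        else:
            runs.insert(idx, text)
        return text


    runs = ["A "]
    insert_run(runs, " link")
    insert_run(runs, "formatted", idx=1)
    print("".join(runs))  # A formatted link

    plain = ["A "]
    plain.insert(-1, " link")
    print(plain)  # [' link', 'A ']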

**Iterate over the run-level items a paragraph contains**::

    >>> paragraph = document.add_paragraph('A paragraph having a link to: ')
    >>> hyperlink = paragraph.add_hyperlink(text='github', address='http://github.com')
    >>> [item for item in paragraph.iter_run_level_items()]
    [<docx.text.run.Run at 0x7fd...>, <docx.text.hyperlink.Hyperlink at 0x7fe...>]

**Paragraph.text now includes text contained in a hyperlink**::

    >>> paragraph.text
    'A paragraph having a link to: github'
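
Hyperlink text is included automatically if paragraph text is gathered from every
``w:t`` descendant of ``w:p`` in document order, rather than only from ``w:t`` elements
in runs that are direct children of ``w:p``. The minimal sketch below works on raw XML
with ``xml.etree.ElementTree``. ``paragraph_text()`` is a hypothetical name, the
``rId9`` value is sample data, and the sketch ignores tabs, breaks, and field codes that
a full implementation would translate::

    import xml.etree.ElementTree as ET

    W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

    P_XML = (
        '<w:p xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"'
        ' xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships">'
        '<w:r><w:t xml:space="preserve">A paragraph having a link to: </w:t></w:r>'
        '<w:hyperlink r:id="rId9"><w:r><w:t>github</w:t></w:r></w:hyperlink>'
        "</w:p>"
    )


    def paragraph_text(p):
        """Join the text of all w:t descendants, including those inside hyperlinks."""
        return "".join(t.text or "" for t in p.iter(W + "t"))


    print(paragraph_text(ET.fromstring(P_XML)))  # A paragraph having a link to: github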


Word Behaviors
--------------

* What are the semantics of the w:history attribute on w:hyperlink? I suspect it
  indicates whether the link should show up blue (unvisited) or purple (visited). If
  so, I'm inclined to think we need it as a read/write property on hyperlink. We
  should see what the MS API does on this count.

* We probably need to enforce some character-set restrictions on w:anchor. Word
  doesn't seem to like spaces or hyphens in an anchor, for example, and the schema
  won't catch them: w:anchor is typed ST_String, which is an unrestricted xsd:string.

* We'll need to test URL escaping of special characters, such as spaces and question
  marks, in Hyperlink.address.

* What does Word do when loading a document containing an internal hyperlink whose
  anchor value doesn't match an existing bookmark? We'll want to know, because we're
  sure to get support inquiries from folks who don't match those up and then wonder
  why they get a repair error or some similar failure.
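
A check for the anchor restriction in the second item above might look like the minimal
sketch below. The rule it encodes is an assumption to verify against Word and is not
stated by the schema. It follows the naming rule of Word's Bookmark dialog: a letter
first, then letters, digits, or underscores, with 40 characters at most. It also allows
an optional leading underscore to admit the hidden bookmarks Word generates itself, such
as the ``_Toc...`` targets of a table of contents. ``is_valid_anchor()`` is a
hypothetical name::

    import re

    # Assumed rule, to be verified against Word: an optional leading underscore
    # (Word's hidden bookmarks, e.g. "_Toc..."), then a letter, then letters,
    # digits, or underscores; 40 characters at most.
    _ANCHOR_RE = re.compile(r"_?[^\W\d_]\w*")
    _MAX_ANCHOR_LEN = 40


    def is_valid_anchor(anchor):
        """Return True when `anchor` satisfies the assumed bookmark-name rule."""
        return (
            len(anchor) <= _MAX_ANCHOR_LEN
            and _ANCHOR_RE.fullmatch(anchor) is not None
        )


    for name in ("Section_1", "Section 1", "Section-1", "1st_section"):
        print(name, is_valid_anchor(name))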

Specimen XML
------------

.. highlight:: xml


External links
~~~~~~~~~~~~~~

The address (URL) of an external hyperlink is stored in the document.xml.rels file
(``word/_rels/document.xml.rels`` in a typical package), keyed by the
w:hyperlink@r:id attribute::

    <w:p>
      <w:r>
        <w:t xml:space="preserve">This is an external link to </w:t>
      </w:r>
      <w:hyperlink r:id="rId4">
        <w:r>
          <w:rPr>
            <w:rStyle w:val="Hyperlink"/>
          </w:rPr>
          <w:t>Google</w:t>
        </w:r>
      </w:hyperlink>
    </w:p>

... which maps to this relationship in document.xml.rels::

    <Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
      <Relationship Id="rId4" TargetMode="External" Type="http://..." Target="http://google.com/"/>
    </Relationships>
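
Resolving ``w:hyperlink/@r:id`` to an address is then a lookup in that part. The
package-relationship attribute that marks an external target is ``TargetMode``. The
minimal sketch below works on the raw ``.rels`` XML with ``xml.etree.ElementTree``.
``resolve_address()`` is a hypothetical helper, and the ``Type`` attribute is omitted
from the sample for brevity:

.. code-block:: python

    import xml.etree.ElementTree as ET

    PR = "{http://schemas.openxmlformats.org/package/2006/relationships}"

    RELS_XML = (
        '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
        '<Relationship Id="rId4" TargetMode="External" Target="http://google.com/"/>'
        "</Relationships>"
    )


    def resolve_address(rels_xml, r_id):
        """Return the Target of external relationship `r_id`, or None if absent."""
        for rel in ET.fromstring(rels_xml).iter(PR + "Relationship"):
            if rel.get("Id") == r_id and rel.get("TargetMode") == "External":
                return rel.get("Target")
        return None


    print(resolve_address(RELS_XML, "rId4"))  # http://google.com/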

A hyperlink can contain multiple runs of text (and a whole lot of other
stuff, including nested hyperlinks, at least as far as the schema indicates)::

    <w:p>
      <w:hyperlink r:id="rId2">
        <w:r>
          <w:rPr>
            <w:rStyle w:val="Hyperlink"/>
          </w:rPr>
          <w:t xml:space="preserve">A hyperlink containing an </w:t>
        </w:r>
        <w:r>
          <w:rPr>
            <w:rStyle w:val="Hyperlink"/>
            <w:i/>
          </w:rPr>
          <w:t>italicized</w:t>
        </w:r>
        <w:r>
          <w:rPr>
            <w:rStyle w:val="Hyperlink"/>
          </w:rPr>
          <w:t xml:space="preserve"> word</w:t>
        </w:r>
      </w:hyperlink>
    </w:p>


Internal links
~~~~~~~~~~~~~~

An internal link provides "jump to another document location" behavior in the
Word UI. An internal link is distinguished by the absence of an r:id attribute. In
this case the w:anchor attribute must be present for the link to have a target, even
though the schema declares it optional. The value of the anchor attribute is the name
of a bookmark in the document.

Example::

    <w:p>
      <w:r>
        <w:t xml:space="preserve">See </w:t>
      </w:r>
      <w:hyperlink w:anchor="Section_4">
        <w:r>
          <w:rPr>
            <w:rStyle w:val="Hyperlink"/>
          </w:rPr>
          <w:t>Section 4</w:t>
        </w:r>
      </w:hyperlink>
      <w:r>
        <w:t xml:space="preserve"> for more details.</w:t>
      </w:r>
    </w:p>

... referring to this bookmark elsewhere in the document::

    <w:p>
      <w:bookmarkStart w:id="0" w:name="Section_4"/>
      <w:r>
        <w:t>Section 4</w:t>
      </w:r>
      <w:bookmarkEnd w:id="0"/>
    </w:p>
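
Because an internal link has a target only when its anchor names a bookmark, it can be
useful to detect mismatches before saving. The minimal sketch below does this with
``xml.etree.ElementTree`` over a raw ``w:body`` fragment. ``dangling_anchors()`` is a
hypothetical helper. It treats any ``w:hyperlink`` without ``r:id`` as internal and does
not look for bookmarks outside the fragment it is given. What Word itself does on a
mismatch remains the open question noted under Word Behaviors:

.. code-block:: python

    import xml.etree.ElementTree as ET

    W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
    R = "{http://schemas.openxmlformats.org/officeDocument/2006/relationships}"

    BODY_XML = (
        '<w:body xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
        '<w:p><w:hyperlink w:anchor="Section_4"><w:r><w:t>Section 4</w:t></w:r></w:hyperlink></w:p>'
        '<w:p><w:hyperlink w:anchor="Section_5"><w:r><w:t>Section 5</w:t></w:r></w:hyperlink></w:p>'
        '<w:p><w:bookmarkStart w:id="0" w:name="Section_4"/>'
        "<w:r><w:t>Section 4</w:t></w:r>"
        '<w:bookmarkEnd w:id="0"/></w:p>'
        "</w:body>"
    )


    def dangling_anchors(body):
        """Return anchors of internal hyperlinks that name no bookmark in `body`."""
        bookmarks = {b.get(W + "name") for b in body.iter(W + "bookmarkStart")}
        dangling = []
        for h in body.iter(W + "hyperlink"):
            anchor = h.get(W + "anchor")
            # Internal links have no r:id; an external link's anchor is a URL fragment.
            if h.get(R + "id") is None and anchor is not None and anchor not in bookmarks:
                dangling.append(anchor)
        return dangling


    print(dangling_anchors(ET.fromstring(BODY_XML)))  # ['Section_5']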


Schema excerpt
--------------

.. highlight:: xml

::

    <xsd:complexType name="CT_P">
      <xsd:sequence>
        <xsd:element name="pPr" type="CT_PPr" minOccurs="0"/>
        <xsd:group ref="EG_PContent" minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
      <xsd:attribute name="rsidRPr" type="ST_LongHexNumber"/>
      <xsd:attribute name="rsidR" type="ST_LongHexNumber"/>
      <xsd:attribute name="rsidDel" type="ST_LongHexNumber"/>
      <xsd:attribute name="rsidP" type="ST_LongHexNumber"/>
      <xsd:attribute name="rsidRDefault" type="ST_LongHexNumber"/>
    </xsd:complexType>

    <xsd:group name="EG_PContent"> <!-- denormalized -->
      <xsd:choice>
        <xsd:element name="r" type="CT_R"/>
        <xsd:element name="hyperlink" type="CT_Hyperlink"/>
        <xsd:element name="fldSimple" type="CT_SimpleField"/>
        <xsd:element name="sdt" type="CT_SdtRun"/>
        <xsd:element name="customXml" type="CT_CustomXmlRun"/>
        <xsd:element name="smartTag" type="CT_SmartTagRun"/>
        <xsd:element name="dir" type="CT_DirContentRun"/>
        <xsd:element name="bdo" type="CT_BdoContentRun"/>
        <xsd:element name="subDoc" type="CT_Rel"/>
        <xsd:group ref="EG_RunLevelElts"/>
      </xsd:choice>
    </xsd:group>

    <xsd:complexType name="CT_Hyperlink">
      <xsd:group ref="EG_PContent" minOccurs="0" maxOccurs="unbounded"/>
      <xsd:attribute name="tgtFrame" type="s:ST_String"/>
      <xsd:attribute name="tooltip" type="s:ST_String"/>
      <xsd:attribute name="docLocation" type="s:ST_String"/>
      <xsd:attribute name="history" type="s:ST_OnOff"/>
      <xsd:attribute name="anchor" type="s:ST_String"/>
      <xsd:attribute ref="r:id"/>
    </xsd:complexType>

    <xsd:group name="EG_RunLevelElts">
      <xsd:choice>
        <xsd:element name="proofErr" type="CT_ProofErr"/>
        <xsd:element name="permStart" type="CT_PermStart"/>
        <xsd:element name="permEnd" type="CT_Perm"/>
        <xsd:element name="bookmarkStart" type="CT_Bookmark"/>
        <xsd:element name="bookmarkEnd" type="CT_MarkupRange"/>
        <xsd:element name="moveFromRangeStart" type="CT_MoveBookmark"/>
        <xsd:element name="moveFromRangeEnd" type="CT_MarkupRange"/>
        <xsd:element name="moveToRangeStart" type="CT_MoveBookmark"/>
        <xsd:element name="moveToRangeEnd" type="CT_MarkupRange"/>
        <xsd:element name="commentRangeStart" type="CT_MarkupRange"/>
        <xsd:element name="commentRangeEnd" type="CT_MarkupRange"/>
        <xsd:element name="customXmlInsRangeStart" type="CT_TrackChange"/>
        <xsd:element name="customXmlInsRangeEnd" type="CT_Markup"/>
        <xsd:element name="customXmlDelRangeStart" type="CT_TrackChange"/>
        <xsd:element name="customXmlDelRangeEnd" type="CT_Markup"/>
        <xsd:element name="customXmlMoveFromRangeStart" type="CT_TrackChange"/>
        <xsd:element name="customXmlMoveFromRangeEnd" type="CT_Markup"/>
        <xsd:element name="customXmlMoveToRangeStart" type="CT_TrackChange"/>
        <xsd:element name="customXmlMoveToRangeEnd" type="CT_Markup"/>
        <xsd:element name="ins" type="CT_RunTrackChange"/>
        <xsd:element name="del" type="CT_RunTrackChange"/>
        <xsd:element name="moveFrom" type="CT_RunTrackChange"/>
        <xsd:element name="moveTo" type="CT_RunTrackChange"/>
        <xsd:group ref="EG_MathContent" minOccurs="0" maxOccurs="unbounded"/>
      </xsd:choice>
    </xsd:group>

    <xsd:complexType name="CT_R">
      <xsd:sequence>
        <xsd:group ref="EG_RPr" minOccurs="0"/>
        <xsd:group ref="EG_RunInnerContent" minOccurs="0" maxOccurs="unbounded"/>
      </xsd:sequence>
      <xsd:attribute name="rsidRPr" type="ST_LongHexNumber"/>
      <xsd:attribute name="rsidDel" type="ST_LongHexNumber"/>
      <xsd:attribute name="rsidR" type="ST_LongHexNumber"/>
    </xsd:complexType>

    <xsd:simpleType name="ST_OnOff">
      <xsd:union memberTypes="xsd:boolean ST_OnOff1"/>
    </xsd:simpleType>

    <xsd:simpleType name="ST_OnOff1">
      <xsd:restriction base="xsd:string">
        <xsd:enumeration value="on"/>
        <xsd:enumeration value="off"/>
      </xsd:restriction>
    </xsd:simpleType>

    <xsd:simpleType name="ST_RelationshipId">
      <xsd:restriction base="xsd:string"/>
    </xsd:simpleType>

    <xsd:simpleType name="ST_String">
      <xsd:restriction base="xsd:string"/>
    </xsd:simpleType>