As a disclaimer, I'm very new to the DOCX XML syntax and structure and am learning as I go, so I may be describing this issue with atypical terms.
Without going into great detail, I'm parsing several DOCX files that are close to 10 years old (or older). I instantiate them one at a time as a Document(), store the Document's element.body in a global variable, and iterate through it with iterchildren(), parsing each document line by line (or, in DOCX terminology, paragraph by paragraph). In one of these documents, two HTTPS URLs appear as hyperlinks on two different lines relatively close to one another. However, one of these hyperlinks isn't being recognized as text (that is, the value named "text" on its child elements is None) while the other is. The raw XML of these two lines is below.
This is the line with a hyperlink that isn't being recognized as text:
This is the hyperlink that is parsed as text without issue:
Because the underlying XML of these two lines is so obviously constructed differently, I'm not sure whether this should be logged as a bug against this project or whether it's an issue with the document. Attached is a screenshot of the relevant portion of the document.
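
For anyone trying to reproduce this, here is a minimal sketch of how the symptom could arise. The raw XML from the affected document is not reproduced here, so the `SAMPLE` markup, the URLs, and the helper names `paragraph_text` and `direct_run_text` are all hypothetical. The sketch assumes the unrecognized link is wrapped in a `w:hyperlink` element, so its run is not a direct child of the paragraph, while the other link is an ordinary run. That may not match the real file. It uses only the standard library rather than this project's API.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"

# Hypothetical sample, NOT the XML from the affected document.
# Paragraph 1 nests its link run inside w:hyperlink (assumed cause);
# paragraph 2 holds the URL in an ordinary run directly under w:p.
SAMPLE = f"""
<w:body xmlns:w="{W_NS}" xmlns:r="{R_NS}">
  <w:p>
    <w:r><w:t xml:space="preserve">See </w:t></w:r>
    <w:hyperlink r:id="rId5">
      <w:r><w:t>https://example.com/a</w:t></w:r>
    </w:hyperlink>
  </w:p>
  <w:p>
    <w:r><w:t>https://example.com/b</w:t></w:r>
  </w:p>
</w:body>
"""


def paragraph_text(p):
    """Join every w:t descendant of a paragraph, however deeply nested."""
    return "".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t"))


def direct_run_text(p):
    """Join only w:t elements in runs that are direct children of w:p."""
    return "".join(t.text or "" for t in p.findall(f"{{{W_NS}}}r/{{{W_NS}}}t"))


body = ET.fromstring(SAMPLE)
paragraphs = body.findall(f"{{{W_NS}}}p")

all_text = [paragraph_text(p) for p in paragraphs]
direct_text = [direct_run_text(p) for p in paragraphs]

print(all_text)
print(direct_text)
```

If the real document looks like this, a parser that only reads direct child runs would miss the first URL, while walking all `w:t` descendants would pick up both. Comparing the two lists for the affected paragraphs should show whether nesting is the difference.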
