Return Format: Output - Body (I) (J) (K) (L)
Some structure will be maintained. Text will be returned in a nested list, with paragraphs
always at depth 4 (i.e., output.body[i][j][k][l] will be a paragraph).
If your docx has no tables, output.body will appear as one table with all content in one cell:
[ # document
    [ # table
        [ # row
            [ # cell
                "Paragraph 1",
                "Paragraph 2",
                " a) sublist",
            ]
        ]
    ]
]
Table cells will appear as table cells. Text outside tables will appear as table cells.
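
The depth-4 layout can be walked with four nested loops. The sketch below assumes `body` is a plain nested list shaped like the example above; `iter_paragraphs` is a hypothetical helper written for illustration and is not part of Docx2Python.

```python
from typing import Iterator, List, Tuple

# Assumed shape: body[table][row][cell][paragraph] -> str
Body = List[List[List[List[str]]]]
Index = Tuple[int, int, int, int]


def iter_paragraphs(body: Body) -> Iterator[Tuple[Index, str]]:
    """Yield ((i, j, k, l), paragraph) for every paragraph in body.

    Hypothetical helper, for illustration only.
    """
    for i, table in enumerate(body):
        for j, row in enumerate(table):
            for k, cell in enumerate(row):
                for l, paragraph in enumerate(cell):
                    yield (i, j, k, l), paragraph


# A document with no tables: one table, one row, one cell.
body = [[[["Paragraph 1", "Paragraph 2", " a) sublist"]]]]

paragraphs = list(iter_paragraphs(body))
for index, text in paragraphs:
    print(index, repr(text))
```

Because every paragraph sits at the same depth, the same loop works whether the text came from a real table cell or from outside any table.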
A docx document can contain tables nested within tables within tables. Docx2Python flattens
most of this nesting so the content is easier to navigate.
def remove_empty_paragraphs(tables):
>>> remove_empty_paragraphs(tables)
* docx_to_text_output.document
* docx_to_text_output.body
"""