How to detect merged cells when reading tables #232

sylvain-bougnoux · 2015-11-30T15:40:27Z

Hello, I've created a simple .docx document (Word 2010) containing a 2x2 table whose first column is merged vertically, like this:

+---+---+
|   | b |
+ a +---+
|   | c |
+---+---+

When I read this with Python, the cells (0,0) and (1,0) are different!
How can I detect that they are merged in the original document?
The text of both cells is indeed 'a', but the 'pointers' are different.
Thanks
Python 2.7, docx 0.85


sylvain-bougnoux · 2015-12-01T18:01:50Z

I've found a workaround (though it is not covered by the documentation).
With the above table saved as simple.docx:

from docx import Document

doc = Document("simple.docx")
table = doc.tables[0]
c00 = table.cell(0, 0)
c10 = table.cell(1, 0)
c00 == c10          # False, as reported (two distinct _Cell objects)
# However, both objects share the same underlying element:
c00._tc == c10._tc  # True
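
The same comparison can be applied to a whole table to list which grid positions belong to one merged cell. This is only a sketch: merged_groups is a hypothetical helper name, it relies on the private _tc attribute (which may change between python-docx versions), and the caller has to pass the row and column counts.

```python
def merged_groups(table, n_rows, n_cols):
    """Return lists of (row, col) positions that share one underlying _tc.

    `table` is expected to behave like a python-docx Table, i.e. to offer
    table.cell(row, col) returning an object with a private `_tc` attribute.
    """
    groups = []  # list of (tc, [coords]) pairs, in first-seen order
    for r in range(n_rows):
        for c in range(n_cols):
            tc = table.cell(r, c)._tc
            for known_tc, coords in groups:
                if known_tc is tc:          # same element -> same merged cell
                    coords.append((r, c))
                    break
            else:
                groups.append((tc, [(r, c)]))
    # Only positions that share their element with another position are merged.
    return [coords for _, coords in groups if len(coords) > 1]
```

For the 2x2 table above, calling merged_groups(table, 2, 2) should report positions (0, 0) and (1, 0) as one group.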

Regards

oltish · 2017-05-06T20:01:43Z

It has been a while, but someone else may need this information in future.

As mentioned in the previous comment, a docx.table._Cell object has a _tc property that contains useful information about the cell's span:

from docx import Document

doc = Document("simple.docx")
table = doc.tables[0]

c = table.cell(0, 0)
print(c.text, c._tc.top, c._tc.bottom, c._tc.left, c._tc.right)

c = table.cell(0, 1)
print(c.text, c._tc.top, c._tc.bottom, c._tc.left, c._tc.right)

c = table.cell(1, 1)
print(c.text, c._tc.top, c._tc.bottom, c._tc.left, c._tc.right)

The output will be:

a 0 2 0 1
b 0 1 1 2
c 1 2 1 2

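The four numbers are top, bottom, left and right, so from this you can tell not only whether a cell is merged, but also its shape and size. Cell 'a', for example, runs from row 0 up to (but not including) row 2 and occupies column 0 only.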
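
As a minimal sketch of that reading, the spans can be computed by subtraction. The helper names cell_span and is_merged are made up for illustration; the only things assumed about tc are the top, bottom, left and right attributes printed above.

```python
def cell_span(tc):
    """Return (row_span, col_span) for an element exposing top/bottom/left/right."""
    row_span = tc.bottom - tc.top    # bottom is exclusive, so this counts rows
    col_span = tc.right - tc.left    # right is exclusive, so this counts columns
    return row_span, col_span


def is_merged(tc):
    """A cell is merged when it covers more than one grid position."""
    return cell_span(tc) != (1, 1)
```

With the output above, cell 'a' (0 2 0 1) gives a span of (2, 1) and counts as merged, while 'b' and 'c' give (1, 1). In practice these would be called as cell_span(table.cell(0, 0)._tc).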

0-173 · 2018-10-28T15:55:23Z

I found that some docx files created by Word have inconsistent grid_span attributes within a row.
When accessing a specific cell, the ._cells property does not care about the rows defined in the XML; it just counts through the cells and assumes that the cell count per row and all grid_span values are correct.
In my case the table had 7 columns, but one row's grid_span attributes added up to only 1 + 1 + 2 + 2 = 6. All rows after that one are broken.
The workaround for detecting this has already been mentioned by @oltish.
I will provide a merge request as a fix for the _cells function.
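
A rough way to spot such rows up front is to add up the grid_span values per row and compare the total with the column count. This is a sketch, not the fix mentioned above: rows_with_bad_span_total is a hypothetical name, and it assumes the table element exposes tr_lst, each row element exposes tc_lst, and each cell element exposes grid_span, all of which are internal python-docx details. It also ignores rows that legitimately skip grid columns at the start or end.

```python
def rows_with_bad_span_total(tbl, column_count):
    """Return (row_index, span_total) for rows whose grid spans do not add up.

    `tbl` is expected to look like the element behind a python-docx table:
    tbl.tr_lst is a list of rows, row.tc_lst a list of cells, and each cell
    has an integer grid_span.
    """
    bad = []
    for row_idx, tr in enumerate(tbl.tr_lst):
        total = sum(tc.grid_span for tc in tr.tc_lst)
        if total != column_count:
            bad.append((row_idx, total))
    return bad
```

With a real document this might be called as rows_with_bad_span_total(table._tbl, len(table.columns)). For the 7-column table described above, the faulty row would be reported with a total of 6.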

mrlnc · 2019-07-30T17:56:10Z

The workaround from @0-173 works for me. Thanks! However, the new cell doesn't have any attributes like the "text" field.

python2
docx 0.8.10

fusted2 · 2021-08-28T09:07:51Z

c00._tc==c10._tc # is true

Thank you for the idea. I create a list, add each cell._tc to it, and check against it to skip the second and later occurrences of a merged cell. This is my project:

https://github.com/fusted2/Copy-Word-tables-to-Excel
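
The list-based approach described in this comment could look roughly like the sketch below. It is not taken from the linked project; unique_cells is a hypothetical name, and it assumes the table offers rows, each row offers cells, and each cell carries the private _tc attribute.

```python
def unique_cells(table):
    """Yield each cell once, skipping repeats that belong to a merged cell."""
    seen = []  # _tc elements already yielded
    for row in table.rows:
        for cell in row.cells:
            # A merged cell shows up once per grid position it covers,
            # always with the same _tc, so an identity check is enough.
            if any(cell._tc is tc for tc in seen):
                continue
            seen.append(cell._tc)
            yield cell
```

For the 2x2 table from the first comment, [c.text for c in unique_cells(table)] should give 'a', 'b' and 'c' once each. The list scan is quadratic in the number of cells, which is usually acceptable for tables of ordinary size.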

ElBloque mentioned this issue Jul 31, 2017

Tables have incorrect rows' and columns' cells when there are merged cells #422

Open

0-173 mentioned this issue Oct 29, 2018

detect beginning of new row. if last row is incomplete, add empty cells. #564

Open
