Exception during table parsing when borders are not aligned #1146

Closed
radoslav006 opened this issue Oct 21, 2022 · 4 comments
Comments

@radoslav006
An exception is raised when parsing a table whose borders are not aligned.

Library version: 0.8.11

Table picture: [image]

XML: err.xml.txt (attached file)

Code:

from docx import Document


if __name__ == '__main__':
    doc = Document('doc.docx')
    for table in doc.tables:
        for row in table.rows:
            for cell in row.cells:
                print(cell.text)

Exception:

  File "C:\Python39\lib\site-packages\docx\table.py", line 173, in _cells 
    cells.append(cells[-col_count])
IndexError: list index out of range
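Why an IndexError rather than a wrong result? The failing line, `cells.append(cells[-col_count])`, looks back `col_count` entries in a flat, row-major list of cells. The sketch below is a simplified, hypothetical model of that lookup (not python-docx's actual code), assuming the line resolves a vertically merged ("continue") cell by reusing the cell directly above it. When an earlier row covers fewer grid columns than the table declares, as can happen with misaligned borders, there is nothing `col_count` positions back and the lookup fails.

```python
# Hypothetical, simplified model of flattening a table's layout grid into one
# row-major list. It only mimics the `cells[-col_count]` lookup seen in the
# traceback; it is NOT python-docx's implementation.

CONTINUE = "continue"  # marks a cell that continues a vertical merge


def flatten_grid(rows, col_count):
    """Return a flat list of cell labels for `rows`.

    Each row is a list of labels. The CONTINUE marker means "same cell as the
    one directly above", found by looking `col_count` entries back.
    """
    cells = []
    for row in rows:
        for label in row:
            if label == CONTINUE:
                # Raises IndexError when fewer than `col_count` cells precede
                # this one, e.g. when an earlier row is shorter than the grid.
                cells.append(cells[-col_count])
            else:
                cells.append(label)
    return cells


# Aligned: every row fills the 2-column grid, so the lookup finds "a".
print(flatten_grid([["a", "b"], [CONTINUE, "c"]], 2))

# Misaligned: the first row covers only 1 of 2 grid columns.
try:
    flatten_grid([["a"], [CONTINUE, "c"]], 2)
except IndexError as exc:
    print("IndexError:", exc)
```

Note that in this model a short row that is *not* the first one would not necessarily raise; the lookup would silently land on the wrong cell instead, which is one reason a fix inside the library is preferable to catching the exception.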
@djplaner
FWIW, I've stumbled across a similar issue caused by the same bad practice in the source HTML (rows where the number of cells exceeded the initial default).

The problem appears to be that the `cell()` method in `table.py` doesn't check whether the cell exists before trying to access it. I've modified my local version as follows.

It checks that the computed index doesn't exceed the length of the cell list and returns None if it does. Depending on how you're using this, you may need to make further changes upstream to handle the None.

    def cell(self, row_idx, col_idx):
        """
        Return |_Cell| instance corresponding to table cell at *row_idx*,
        *col_idx* intersection, where (0, 0) is the top, left-most cell.
        """
        cell_idx = col_idx + (row_idx * self._column_count)
        if cell_idx >= len(self._cells):
            return None
        return self._cells[cell_idx]
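The same bounds check can be exercised outside the library. This is a minimal standalone sketch, assuming cells are stored in a flat row-major list as the patched `cell()` above implies; `cell_at` is a hypothetical helper name. Unlike the patch, it also rejects negative indices and column indices past the row width, which would otherwise address a cell in a different row.

```python
from typing import Optional, Sequence, TypeVar

T = TypeVar("T")


def cell_at(cells: Sequence[T], column_count: int,
            row_idx: int, col_idx: int) -> Optional[T]:
    """Return the cell at (row_idx, col_idx) in a flat row-major list, or None.

    Mirrors the patched `cell()`: compute the flat index and return None
    instead of raising IndexError when it falls outside the list.
    """
    # Extra guard (not in the patch): out-of-range coordinates map to None.
    if row_idx < 0 or not 0 <= col_idx < column_count:
        return None
    cell_idx = col_idx + row_idx * column_count
    if cell_idx >= len(cells):
        return None
    return cells[cell_idx]


# A 2-column table whose second row is missing a cell.
cells = ["a", "b", "c"]
for r in range(2):
    for c in range(2):
        cell = cell_at(cells, 2, r, c)
        if cell is None:
            # Callers must handle missing cells explicitly.
            print(f"({r}, {c}) missing")
        else:
            print(f"({r}, {c}) = {cell}")
```

Returning None keeps iteration going, but every caller then has to check for it, which is the upstream change the comment above warns about.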

@tonal
tonal commented Feb 21, 2023

Try #881 (comment).

@radoslav006
Author

> try #881 (comment)

Thanks for the workaround; someone may find it helpful, but ultimately this needs to be fixed in the library itself.

@scanny
Contributor

scanny commented Apr 29, 2024

Fixed in v1.1.1 circa Apr 30, 2024.
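Since the fix is reported for v1.1.1, the installed version is the first thing to check (the report above used 0.8.11). A minimal sketch, assuming only the leading numeric components matter; `has_fix` is a hypothetical helper, and pre-release suffixes such as `1.1.1rc1` are treated as the final release, which is a simplification.

```python
import re

FIXED_IN = (1, 1, 1)  # release in which the maintainer reports the fix


def has_fix(version: str) -> bool:
    """Return True if a python-docx version string is at least 1.1.1."""
    parts = []
    for piece in version.split(".")[:3]:
        match = re.match(r"\d+", piece)  # keep leading digits only
        if match is None:
            break
        parts.append(int(match.group()))
    while len(parts) < 3:  # treat "1.1" as "1.1.0"
        parts.append(0)
    return tuple(parts) >= FIXED_IN


if __name__ == "__main__":
    from importlib.metadata import PackageNotFoundError, version

    try:
        installed = version("python-docx")
    except PackageNotFoundError:
        print("python-docx is not installed")
    else:
        status = "includes the fix" if has_fix(installed) else "predates the fix"
        print(installed, status)
```

Comparing integer tuples avoids the string-comparison trap where "0.8.11" would sort after "0.8.2".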
