Extracting hyperlinks from table cell #925


Open
divyeshlad18 opened this issue Feb 9, 2021 · 2 comments
Labels
hyperlink Read and write hyperlinks in paragraph

Comments

@divyeshlad18

divyeshlad18 commented Feb 9, 2021

Hello Everyone,

I'm trying to extract the text from the table cells of a Word document and load it into a pandas DataFrame. I was able to do that, mainly with the help of this code:


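from docx import Document

# path_to_your_docx is a placeholder for the path to the .docx file
document = Document(path_to_your_docx)
tables = document.tables
for table in tables:
    for row in table.rows:
        for cell in row.cells:
            # each cell holds one or more paragraphs
            for paragraph in cell.paragraphs:
                print(paragraph.text)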

Thanks to @scanny

However, I get empty text when hyperlinks are encountered in the cell.

Alternatively, I'm able to extract all the hyperlinks from the document using this code:

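from docx.opc.constants import RELATIONSHIP_TYPE as RT

# Every relationship of the main document part, keyed by relationship id.
rels = document.part.rels
for rel in rels.values():
    if rel.reltype == RT.HYPERLINK:
        print(rel.target_ref)  # the link's URL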

But I would much rather extract them through the cell objects, since that would let me place each hyperlink in the row it belongs to.

Any help is appreciated!

Many Thanks,
Divyesh

@TylerReedMC

I'm having this exact problem. Any word on a possible resolution?

@scanny
Contributor

scanny commented Sep 20, 2021

The solution here might be of help: #85 (comment)
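
For anyone landing here, a minimal sketch of one way to get hyperlinks per cell. This is not necessarily what the linked comment does. It assumes the link text sits in `w:hyperlink` elements nested inside the cell's XML, which would explain why `paragraph.text` comes back empty for them in the versions discussed above. `hyperlinks_in_xml` and `cell_hyperlinks` are hypothetical helper names, and `cell._tc` is a private python-docx attribute that may change between releases.

```python
import xml.etree.ElementTree as ET

# WordprocessingML and relationship namespaces used in .docx XML.
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"


def hyperlinks_in_xml(xml, rel_targets):
    """Return (text, url) pairs for each w:hyperlink in an XML fragment.

    rel_targets maps relationship ids (e.g. "rId5") to target URLs.
    """
    root = ET.fromstring(xml)
    links = []
    for hl in root.iter(f"{{{W}}}hyperlink"):
        # The visible link text lives in w:t elements inside the hyperlink's runs.
        text = "".join(t.text or "" for t in hl.iter(f"{{{W}}}t"))
        r_id = hl.get(f"{{{R}}}id")  # absent for internal (anchor-only) links
        links.append((text, rel_targets.get(r_id)))
    return links


def cell_hyperlinks(cell):
    """Hypothetical helper: apply hyperlinks_in_xml to a python-docx table cell."""
    # Assumption: cell._tc (private) is the underlying XML element and
    # cell.part.rels holds the relationships of the part containing the cell.
    targets = {
        r_id: rel.target_ref
        for r_id, rel in cell.part.rels.items()
        if rel.is_external
    }
    return hyperlinks_in_xml(cell._tc.xml, targets)


# Usage with python-docx (assumed installed), inside the loops from the question:
#   for row in table.rows:
#       for cell in row.cells:
#           print(cell.text, cell_hyperlinks(cell))
```

Because the lookup goes through the cell, each link stays attached to the row it came from. Recent python-docx releases also appear to expose `paragraph.hyperlinks`, which may make the XML walk unnecessary there.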

@scanny scanny added the hyperlink Read and write hyperlinks in paragraph label Sep 28, 2023