
Commit 7d418f5

add converting pdf to docx tutorial
1 parent 08ebfb5 commit 7d418f5

5 files changed, +35 -0 lines changed

README.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -95,6 +95,7 @@ This is a repository of all the tutorials of [The Python Code](https://www.thepy
 - [How to Watermark PDF Files in Python](https://www.thepythoncode.com/article/watermark-in-pdf-using-python). ([code](general/add-watermark-pdf))
 - [Highlighting Text in PDF with Python](https://www.thepythoncode.com/article/redact-and-highlight-text-in-pdf-with-python). ([code](handling-pdf-files/highlight-redact-text))
 - [How to Extract Text from Images in PDF Files with Python](https://www.thepythoncode.com/article/extract-text-from-images-or-scanned-pdf-python). ([code](handling-pdf-files/pdf-ocr))
+- [How to Convert PDF to Docx in Python](https://www.thepythoncode.com/article/convert-pdf-files-to-docx-in-python). ([code](handling-pdf-files/convert-pdf-to-docx))
 
 
 - ### [Web Scraping](https://www.thepythoncode.com/topic/web-scraping)
```
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
````markdown
# [How to Convert PDF to Docx in Python](https://www.thepythoncode.com/article/convert-pdf-files-to-docx-in-python)
To run this:
- `pip3 install -r requirements.txt`
- To convert `letter.pdf` to `letter.docx`, run:
```
$ python convert_pdf2docx.py letter.pdf letter.docx
```
````
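The run instructions above convert a single file and give the output the same name as the input with a `.docx` extension. A minimal sketch of applying that naming to a whole folder, assuming a hypothetical helper `docx_targets` that is not part of this commit: it only pairs each `.pdf` with its target path using `pathlib` and does not call pdf2docx itself.

```python
from pathlib import Path
from typing import List, Optional, Tuple


def docx_targets(folder: str, out_folder: Optional[str] = None) -> List[Tuple[Path, Path]]:
    """Hypothetical helper: pair each .pdf in `folder` with a .docx path of the same stem."""
    src = Path(folder)
    # Write next to the source files unless another output folder is given
    dst = Path(out_folder) if out_folder else src
    # sorted() gives a stable order; glob("*.pdf") is case-sensitive on POSIX systems,
    # so files ending in ".PDF" are skipped there
    return [(pdf, dst / pdf.with_suffix(".docx").name) for pdf in sorted(src.glob("*.pdf"))]
```

Each pair could then be handed to the script's function, for example `convert_pdf2docx(str(pdf), str(docx))`, so the folder logic stays separate from the conversion itself.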
Lines changed: 26 additions & 0 deletions
@@ -0,0 +1,26 @@
```python
# Import Libraries
import sys
from typing import Optional, Sequence

from pdf2docx import parse


def convert_pdf2docx(input_file: str, output_file: str,
                     pages: Optional[Sequence] = None):
    """Converts a PDF file to .docx, optionally limited to the given pages."""
    if pages:
        # Keep numeric entries only; str() lets callers pass ints as well as digit strings
        pages = [int(i) for i in pages if str(i).isdecimal()]
    result = parse(pdf_file=input_file,
                   docx_with_path=output_file, pages=pages)
    summary = {
        "File": input_file, "Pages": str(pages), "Output File": output_file
    }
    # Printing Summary
    print("## Summary ########################################################")
    print("\n".join("{}:{}".format(i, j) for i, j in summary.items()))
    print("###################################################################")
    return result


if __name__ == "__main__":
    # Expect exactly two arguments: the input PDF and the output .docx path
    if len(sys.argv) != 3:
        sys.exit("Usage: python convert_pdf2docx.py <input.pdf> <output.docx>")
    input_file = sys.argv[1]
    output_file = sys.argv[2]
    convert_pdf2docx(input_file, output_file)
```
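The script's command line accepts only the input and output paths, even though `convert_pdf2docx` also takes a `pages` argument whose numeric entries are cast to `int`. A minimal sketch of exposing that through `argparse`, assuming a hypothetical `--pages` option that is not part of this commit. `numeric_pages` mirrors the script's numeric filter, using `str.isdecimal()` so that `int()` cannot fail on characters such as `²`.

```python
import argparse
from typing import List, Optional


def numeric_pages(raw: Optional[List[str]]) -> Optional[List[int]]:
    """Keep decimal entries and cast them to int; None means 'no page filter'."""
    if not raw:
        return None
    return [int(p) for p in raw if p.isdecimal()]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Convert a PDF file to .docx")
    parser.add_argument("input_file", help="path of the source PDF")
    parser.add_argument("output_file", help="path of the .docx file to write")
    # Hypothetical option, not in the original script: zero or more page numbers
    parser.add_argument("--pages", nargs="*", default=None,
                        help="page numbers to convert, e.g. --pages 0 1 2")
    return parser


def parse_cli(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv (sys.argv[1:] when None) and normalise the page list."""
    args = build_parser().parse_args(argv)
    args.pages = numeric_pages(args.pages)
    return args
```

With this in place, the `__main__` block could become `args = parse_cli()` followed by `convert_pdf2docx(args.input_file, args.output_file, args.pages)`. `argparse` also prints a usage message on missing arguments, which replaces the manual `sys.argv` indexing.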
Binary file not shown.
Lines changed: 1 addition & 0 deletions
@@ -0,0 +1 @@
```text
pdf2docx==0.5.1
```
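The requirements file pins pdf2docx to exactly 0.5.1, the version the script was written against. A minimal sketch for checking that the installed version matches such an exact pin, assuming Python 3.8+ for `importlib.metadata`; `pin_satisfied` is a hypothetical helper and only understands `name==version` lines.

```python
from importlib.metadata import PackageNotFoundError, version


def pin_satisfied(requirement: str) -> bool:
    """Hypothetical helper: True if an exact 'name==version' pin matches what is installed."""
    name, sep, wanted = requirement.strip().partition("==")
    if not sep:
        raise ValueError(f"expected an exact pin like 'pkg==1.0', got {requirement!r}")
    try:
        return version(name.strip()) == wanted.strip()
    except PackageNotFoundError:
        # The distribution is not installed at all, so the pin cannot be satisfied
        return False
```

For example, `pin_satisfied("pdf2docx==0.5.1")` returns `False` both when pdf2docx is missing and when a different release is installed; the comparison is a plain string match, not a full version-specifier check.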
