MartinTonne
diff --git a/‎README.md
Lines changed: 1 addition & 0 deletions b/‎README.md
diff --git a/‎handling-pdf-files/pdf-ocr/README.md
Lines changed: 32 additions & 0 deletions b/‎handling-pdf-files/pdf-ocr/README.md
diff --git a/‎handling-pdf-files/pdf-ocr/example-image-containing-text.jpg
152 KB b/‎handling-pdf-files/pdf-ocr/example-image-containing-text.jpg
diff --git a/‎handling-pdf-files/pdf-ocr/image.pdf
162 KB b/‎handling-pdf-files/pdf-ocr/image.pdf
@@ -94,6 +94,7 @@ This is a repository of all the tutorials of [The Python Code](https://www.thepy
     - [How to Create a Watchdog in Python](https://www.thepythoncode.com/article/create-a-watchdog-in-python). ([code](general/directory-watcher))
     - [How to Watermark PDF Files in Python](https://www.thepythoncode.com/article/watermark-in-pdf-using-python). ([code](general/add-watermark-pdf))
     - [Highlighting Text in PDF with Python](https://www.thepythoncode.com/article/redact-and-highlight-text-in-pdf-with-python). ([code](handling-pdf-files/highlight-redact-text))
+    - [How to Extract Text from Images in PDF Files with Python](https://www.thepythoncode.com/article/extract-text-from-images-or-scanned-pdf-python). ([code](handling-pdf-files/pdf-ocr))
 
 
 - ### [Web Scraping](https://www.thepythoncode.com/topic/web-scraping)
 
@@ -0,0 +1,32 @@
+# [How to Extract Text from Images in PDF Files with Python](https://www.thepythoncode.com/article/extract-text-from-images-or-scanned-pdf-python)
+To run this:
+- `pip3 install -r requirements.txt`
+-
+    ```
+    $ python pdf_ocr.py --help
+    ```
+
+    **Output:**
+    ```
+    usage: pdf_ocr.py [-h] -i INPUT_PATH [-a {Highlight,Redact}] [-s SEARCH_STR] [-p PAGES] [-g]
+
+    Available Options
+
+    optional arguments:
+    -h, --help            show this help message and exit
+    -i INPUT_PATH, --input-path INPUT_PATH
+                            Enter the path of the file or the folder to process
+    -a {Highlight,Redact}, --action {Highlight,Redact}
+                            Choose to highlight or to redact
+    -s SEARCH_STR, --search-str SEARCH_STR
+                            Enter a valid search string
+    -p PAGES, --pages PAGES
+                            Enter the pages to consider in the PDF file, e.g. (0,1)
+    -g, --generate-output
+                            Generate text content in a CSV file
+    ```
+- To extract text from the scanned image in the `image.pdf` file:
+    ```
+    $ python pdf_ocr.py -s "BERT" -i image.pdf -o output.pdf --generate-output -a Highlight
+    ```
+    Here `-s` sets the keyword to search for, `-i` the input file, and `-o` the output PDF file. `--generate-output` (or `-g`) generates a CSV file containing all the text extracted from the images in the PDF file, and `-a` specifies the action to perform on the output PDF file: "Highlight" highlights the target keyword, while "Redact" redacts it instead.
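
For readers who want to see how a command line like the one in this README could be wired up, here is a minimal `argparse` sketch that mirrors the flags shown in the help output. It is not the code in `pdf_ocr.py`: the function names `build_parser` and `parse_pages` are hypothetical, and because `-o` appears in the example command but not in the help text, its long name `--output-file` and its help string are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the options listed in the README help output."""
    parser = argparse.ArgumentParser(prog="pdf_ocr.py", description="Available Options")
    parser.add_argument("-i", "--input-path", required=True,
                        help="Enter the path of the file or the folder to process")
    parser.add_argument("-a", "--action", choices=["Highlight", "Redact"],
                        help="Choose to highlight or to redact")
    parser.add_argument("-s", "--search-str",
                        help="Enter a valid search string")
    parser.add_argument("-p", "--pages",
                        help="Enter the pages to consider in the PDF file, e.g. (0,1)")
    parser.add_argument("-g", "--generate-output", action="store_true",
                        help="Generate text content in a CSV file")
    # Assumption: -o is used in the README example but is absent from the help
    # text, so the long option name and help string below are hypothetical.
    parser.add_argument("-o", "--output-file",
                        help="Path of the output PDF file (assumed)")
    return parser


def parse_pages(value):
    """Turn a string such as "(0,1)" into a tuple of page indexes (hypothetical helper)."""
    if value is None:
        return None
    stripped = value.strip().strip("()")
    return tuple(int(part) for part in stripped.split(",") if part.strip())


# Parse the same arguments as the example command in the README.
args = build_parser().parse_args(
    ["-s", "BERT", "-i", "image.pdf", "-o", "output.pdf", "--generate-output", "-a", "Highlight"]
)
print(args.input_path, args.action, args.search_str, args.generate_output)
```

Passing an explicit argument list to `parse_args` keeps the sketch easy to try in an interpreter; a real script would call `parse_args()` with no arguments so that it reads `sys.argv`.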