From f80ed8fdeb87e3429728636e36b79eafe572eba0 Mon Sep 17 00:00:00 2001
From: pchang388 <pchang388@gmail.com>
Date: Tue, 21 Nov 2023 20:47:39 -0500
Subject: [PATCH 1/5] update description of feature

---
 README.md | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/README.md b/README.md
index 76c558d..0536722 100644
--- a/README.md
+++ b/README.md
@@ -17,14 +17,14 @@ Table of Contents
 ## Background
 _Features are actively being developed. See `Future Items` section for more details. Open an issue for a feature request._
 
-This tool provides a way to export [Bookstack](https://github.com/BookStackApp/BookStack) pages and their content (_text, images, metadata, etc._) into a relational directory-tree layout locally with an option to push to remote object storage locations. See [Backup Behavior](#backup-behavior) section for more details on how pages are organized.
+This tool provides a way to export [Bookstack](https://github.com/BookStackApp/BookStack) pages and their content (_text, images, metadata, etc._) into a relational parent-child layout locally with an option to push to remote object storage locations. See [Backup Behavior](#backup-behavior) section for more details on how pages are organized.
 
 This small project was mainly created to run as a cron job in k8s but works anywhere. This tool allows me to export my docs in markdown, or other formats like pdf. I use Bookstack's markdown editor as default instead of WYSIWYG editor and this makes my notes portable anywhere even if offline.
 
 ### Features
 What it does:
 
-- Build relationships between Bookstack `Shelves/Books/Chapters/Pages` to create a relational directory-tree layout
+- Discover and build relationships between Bookstack `Shelves/Books/Chapters/Pages` to create a relational parent-child layout
 - Export Bookstack pages and their content to a `.tgz` archive
 - Additional content for pages like their images and metadata and can be exported
 - YAML configuration file for repeatable and easy runs

From b5966688a7fce05ba586950bd3107c4ab6d8a034 Mon Sep 17 00:00:00 2001
From: pchang388 <pchang388@gmail.com>
Date: Wed, 22 Nov 2023 01:06:13 -0500
Subject: [PATCH 2/5] add option to replace image urls in markdown files with
 exported image paths

---
 README.md                                     |  40 +++++-
 bookstack_file_exporter/archiver/archiver.py  |  12 +-
 .../archiver/minio_archiver.py                |   1 -
 .../archiver/page_archiver.py                 | 134 +++++++++++++-----
 4 files changed, 145 insertions(+), 42 deletions(-)
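
Note: the link rewriting added in page_archiver.py can be illustrated with a small standalone sketch. It is not the code in this patch. It assumes the image detail returned by the Bookstack API carries a markdown snippet shaped like `![name](url)`, and the helper names `extract_md_url` and `localize_image_links` are hypothetical. Unlike the patch, the sketch escapes the URL before using it as a regex pattern and passes the replacement through a function so it is inserted literally.

```python
import re
from typing import List

IMAGE_DIR_NAME = "images"  # mirrors _IMAGE_DIR_NAME in page_archiver.py


def extract_md_url(markdown_snippet: str) -> str:
    """Return the text between the first '(' and the next ')', or ''."""
    start = markdown_snippet.find("(")
    end = markdown_snippet.find(")", start + 1)
    if start == -1 or end == -1:
        return ""
    return markdown_snippet[start + 1:end]


def localize_image_links(page_data: bytes, markdown_snippets: List[str]) -> bytes:
    """Replace each image URL in page_data with ./images/<file name>."""
    for snippet in markdown_snippets:
        url = extract_md_url(snippet)
        if not url:
            continue
        local_path = f"./{IMAGE_DIR_NAME}/{url.split('/')[-1]}".encode()
        # re.escape keeps '.', '?', '+' in the URL from acting as regex operators;
        # the function replacement inserts local_path without escape processing
        pattern = re.escape(url).encode()
        page_data = re.sub(pattern, lambda _match, repl=local_path: repl, page_data)
    return page_data
```

Applied to the README example, only the scaled image URL inside the inner `![...](...)` is rewritten; the outer anchor link is a different URL and stays untouched.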

diff --git a/README.md b/README.md
index 0536722..e4d77de 100644
--- a/README.md
+++ b/README.md
@@ -11,6 +11,8 @@ Table of Contents
     - [Options and descriptions](#options-and-descriptions)
     - [Environment variables](#valid-environment-variables)
 - [Backup Behavior](#backup-behavior)
+    - [Images](#images)
+    - [Modify Markdown Files](#modify-markdown-files)
 - [Object Storage](#object-storage)
     - [Minio](#minio-backups)
 
@@ -27,6 +29,7 @@ What it does:
 - Discover and build relationships between Bookstack `Shelves/Books/Chapters/Pages` to create a relational parent-child layout
 - Export Bookstack pages and their content to a `.tgz` archive
 - Additional content for pages like their images and metadata and can be exported
+- The exporter can also [Modify Markdown Files](#modify-markdown-files) to replace image links with local exported image paths for a more portable backup
 - YAML configuration file for repeatable and easy runs
 - Can be run via [Python](#run-via-pip) or [Docker](#run-via-docker)
 - Can push archives to remote object storage like [Minio](https://min.io/)
@@ -244,6 +247,7 @@ More descriptions can be found for each section below:
 | `output_path` | `str` | `false` | Optional (default: `cwd`) which directory (relative or full path) to place exports. User who runs the command should have access to read/write to this directory. If not provided, will use current run directory by default |
 | `assets` | `object` | `false` | Optional section to export additional assets from pages. |
 | `assets.export_images` | `bool` | `false` | Optional (default: `false`), export all images for a page to an `image` directory within page directory. See [Backup Behavior](#backup-behavior) for more information on layout |
+| `assets.modify_markdown` | `bool` | `false` | Optional (default: `false`), modify markdown files to replace image links with local exported image paths. This requires `assets.export_images` to be `true` in order to work. See [Modify Markdown Files](#modify-markdown-files) for more information. |
 | `assets.export_meta` | `bool` | `false` | Optional (default: `false`), export of metadata about the page in a json file |
 | `assets.verify_ssl` | `bool` | `false` | Optional (default: `true`), whether or not to check ssl certificates when requesting content from Bookstack host |
 | `keep_last` | `int` | `false` | Optional (default: `None`), if exporter can delete older archives. valid values are:<br>- set to `-1` if you want to delete all archives after each run (useful if you only want to upload to object storage)<br>- set to `1+` if you want to retain a certain number of archives<br>- `0` will result in no action done |
@@ -261,9 +265,12 @@ General
 - `MINIO_ACCESS_KEY`
 - `MINIO_SECRET_KEY`
 
-### Backup Behavior
+## Backup Behavior
+
+### Export File
 Backups are exported in `.tgz` format and generated based off timestamp. Export names will be in the format: `%Y-%m-%d_%H-%M-%S` (Year-Month-Day_Hour-Minute-Second). *Files are first pulled locally to create the tarball and then can be sent to object storage if needed*. Example file name: `bookstack_export_2023-09-22_07-19-54.tgz`.
 
+### General
 The exporter can also do housekeeping duties and keep a configured number of archives and delete older ones. See `keep_last` property in the [Configuration](#options-and-descriptions) section. Object storage provider configurations include their own `keep_last` property for flexibility. 
 
 For file names, `slug` names (from Bookstack API) are used, as such certain characters like `!`, `/` will be ignored and spaces replaced from page names/titles.
@@ -349,6 +356,37 @@ Empty/New Pages will be ignored since they have not been modified yet from creat
 
 You may notice some directories (books) and/or files (pages) in the archive have a random string at the end, example - `nKA`: `user-and-group-management-nKA`. This is expected and is because there were resources with the same name created in another shelve and bookstack adds a string at the end to ensure uniqueness.
 
+### Images
+
+### General
+Images are exported to a separate `images` directory within the page directory they belong to, as shown earlier:
+
+```
+bookstack_export_2023-11-20_08-00-29/programming/react/basics/images/YKvimage.png
+bookstack_export_2023-11-20_08-00-29/programming/react/basics/images/dwwimage.png
+bookstack_export_2023-11-20_08-00-29/programming/react/basics/images/NzZimage.png
+bookstack_export_2023-11-20_08-00-29/programming/react/basics/images/Mymimage.png
+```
+
+> **Note you may see old images in your exports. This is because, by default, Bookstack retains images/drawings that are uploaded even if no longer referenced on an active page. Admins can run `Cleanup Images` in the Maintenance Settings or via [CLI](https://www.bookstackapp.com/docs/admin/commands/#cleanup-unused-images) to remove them.
+
+### Modify Markdown Files
+**To use this feature, `assets.export_images` should be set to `true`**
+
+Set the configuration item `assets.modify_markdown` to `true` to replace image url links in markdown files with local exported image paths. This makes your `markdown` exports much more portable.
+
+Page (parent) -> Images (Children) relationships are created and then each image url is replaced with its own respective local export path. Example:
+```
+## before
+[![pool-topology-1.png](https://demo.bookstack/uploads/images/gallery/2023-07/scaled-1680-/pool-topology-1.png)](https://demo.bookstack/uploads/images/gallery/2023-07/pool-topology-1.png)
+
+## after
+[![pool-topology-1.png](./images/pool-topology-1.png)](https://demo.bookstack/uploads/images/gallery/2023-07/pool-topology-1.png)
+```
+This allows the image to be found locally within the export files and lets your `markdown` docs display all their images as they normally would.
+
+**Note: This works properly if your pages use the notation Bookstack uses for Markdown image links, for example: `[![image alt text](Bookstack Markdown image URL link)](anchor/url link)`. The `(anchor/url link)` is optional.**
+
 ## Object Storage
 Optionally, target(s) can be specified to upload generated archives to a remote location. Supported object storage providers can be found below:
 - [Minio](#minio-backups)
diff --git a/bookstack_file_exporter/archiver/archiver.py b/bookstack_file_exporter/archiver/archiver.py
index d2d1f91..329343c 100644
--- a/bookstack_file_exporter/archiver/archiver.py
+++ b/bookstack_file_exporter/archiver/archiver.py
@@ -5,7 +5,7 @@
 
 from bookstack_file_exporter.exporter.node import Node
 from bookstack_file_exporter.archiver import util
-from bookstack_file_exporter.archiver.page_archiver import PageArchiver
+from bookstack_file_exporter.archiver.page_archiver import PageArchiver, ImageNode
 from bookstack_file_exporter.archiver.minio_archiver import MinioArchiver
 from bookstack_file_exporter.config_helper.remote import StorageProviderConfig
 from bookstack_file_exporter.config_helper.config_helper import ConfigNode
@@ -49,23 +49,23 @@ def get_bookstack_exports(self, page_nodes: Dict[int, Node]):
             self._get_page_files(page, page_image_meta)
             self._get_page_images(page.file_path, page_image_meta)
 
-    def _get_page_files(self, page_node: Node, image_meta: List[str]):
+    def _get_page_files(self, page_node: Node, image_meta: List[ImageNode]):
         """pull all bookstack pages into local files/tar"""
         log.debug("Exporting bookstack page data")
         self._page_archiver.archive_page(page_node, image_meta)
 
-    def _get_page_image_map(self) -> Dict[int, List[str]]:
+    def _get_page_image_map(self) -> Dict[int, List[ImageNode]]:
         if not self._page_archiver.export_images:
             log.debug("skipping image export based on user input")
             return {}
         return self._page_archiver.get_image_meta()
 
-    def _get_page_images(self, page_path: str, urls: List[str]):
-        if not urls:
+    def _get_page_images(self, page_path: str, img_nodes: List[ImageNode]):
+        if not img_nodes:
             log.debug("page has no images to pull")
             return
         log.debug("Exporting bookstack page images")
-        self._page_archiver.archive_page_images(page_path, urls)
+        self._page_archiver.archive_page_images(page_path, img_nodes)
 
     def create_archive(self):
         """create tgz archive"""
diff --git a/bookstack_file_exporter/archiver/minio_archiver.py b/bookstack_file_exporter/archiver/minio_archiver.py
index 27b11e4..ccd8598 100644
--- a/bookstack_file_exporter/archiver/minio_archiver.py
+++ b/bookstack_file_exporter/archiver/minio_archiver.py
@@ -96,7 +96,6 @@ def _get_stale_objects(self, file_extension: str) -> List[MinioObject]:
             # last copy that remains if local is deleted
             log.debug("Minio 'keep_last' set to negative number, ignoring")
             return []
-        # keep_last > 0 condition
         to_delete = []
         if len(minio_objects) > self.keep_last:
             log.debug("Number of minio objects is greater than 'keep_last'")
diff --git a/bookstack_file_exporter/archiver/page_archiver.py b/bookstack_file_exporter/archiver/page_archiver.py
index 3fe9305..7537362 100644
--- a/bookstack_file_exporter/archiver/page_archiver.py
+++ b/bookstack_file_exporter/archiver/page_archiver.py
@@ -31,6 +31,61 @@
 # _MARKDOWN_IMAGE_REGEX= re.compile(r"\[\!\[^$|.*\].*\]")
 _MARKDOWN_STR_CHECK = "markdown"
 
+class ImageNode:
+    """
+    ImageNode provides metadata and convenience for Bookstack images.
+
+    Args:
+        :img_meta_data: <Dict[str, Union[int, str]]> = image meta data
+
+    Returns:
+        :ImageNode: instance with attributes to help handle images.
+    """
+    def __init__(self, img_meta_data: Dict[str, Union[int, str]]):
+        self.id: int = img_meta_data['id']
+        self.page_id: int = img_meta_data['uploaded_to']
+        self.url: str = img_meta_data['url']
+        self.name: str = self._get_image_name()
+        self._markdown_str = ""
+        self.image_relative_path: str = f"./{_IMAGE_DIR_NAME}/{self.name}"
+
+    def _get_image_name(self) -> str:
+        return self.url.split('/')[-1]
+    
+    # def _get_markdown_str(self, img_details: Dict[str, Union[int, str]]) -> str:
+    #     if 'content' in img_details:
+    #         if _MARKDOWN_STR_CHECK in img_details['content']:
+    #             print(img_details['content'][_MARKDOWN_STR_CHECK])
+    #             return self._get_md_url_str(img_details['content'][_MARKDOWN_STR_CHECK])
+    #     return ""
+
+    @property
+    def markdown_str(self):
+        return self._markdown_str
+    
+    def set_markdown_content(self, img_details: Dict[str, Union[int, str]]):
+        self._markdown_str = self._get_md_url_str(img_details)
+    # @markdown_str.setter
+    # def markdown_str(self, img_details: Dict[str, Union[int, str]]) -> str:
+    #     self._markdown_str = self._get_md_url_str(img_details)
+        
+
+    def get_replace_str(self) -> str:
+        """return str for regex replace in page md content"""
+        # return f"[![{self.name}]({self.image_relative_path})]"
+        return self.image_relative_path
+
+    @staticmethod
+    def _get_md_url_str(img_data: Dict[str, Union[int, str]]) -> str:
+        url_str = ""
+        if 'content' in img_data:
+            if _MARKDOWN_STR_CHECK in img_data['content']:
+                url_str = img_data['content'][_MARKDOWN_STR_CHECK]
+        # check to see if empty before doing find
+        if not url_str:
+            return ""
+        return url_str[url_str.find("(")+1:url_str.find(")")]
+
 # pylint: disable=too-many-instance-attributes
 class PageArchiver:
     """
@@ -75,13 +130,12 @@ def archive_page(self, page: Node,
             self._archive_page_meta(page.name, page.file_path, page.meta)
 
     def _archive_page(self, page: Node, export_format: str, data: bytes,
-                      image_urls: List[str] = None):
+                      image_nodes: List[ImageNode] = None):
         page_file_name = f"{self.archive_base_path}/" \
             f"{page.file_path}/{page.name}{_FILE_EXTENSION_MAP[export_format]}"
-        
-        # note yet implemented
-        # if export_format == _MARKDOWN_STR_CHECK and image_urls and self.modify_md:
-        #     data = self._update_image_links(data, image_urls)
+        # not yet implemented
+        if self.modify_md and export_format == _MARKDOWN_STR_CHECK and image_nodes:
+            data = self._update_image_links(data, image_nodes)
         self.write_data(page_file_name, data)
 
     def _get_page_data(self, page_id: int, export_format: str):
@@ -96,7 +150,7 @@ def _archive_page_meta(self, page_name: str, page_path: str,
         bytes_meta = archiver_util.get_json_bytes(meta_data)
         self.write_data(file_path=meta_file_name, data=bytes_meta)
 
-    def get_image_meta(self) -> Dict[int, List[str]]:
+    def get_image_meta(self) -> Dict[int, List[ImageNode]]:
         """Get all image metadata into a {page_number: [image_url]} format"""
         img_meta_response: Response = common_util.http_get_request(
             self.api_urls['images'],
@@ -105,28 +159,14 @@ def get_image_meta(self) -> Dict[int, List[str]]:
         img_meta_json = img_meta_response.json()['data']
         return self._create_image_map(img_meta_json)
 
-    @staticmethod
-    def _create_image_map(json_data: List[Dict[str, Union[str,int]]]) -> Dict[int, List[str]]:
-        image_page_map = {}
-        for image_node in json_data:
-            image_page_id = image_node['uploaded_to']
-            image_url = image_node['url']
-            if image_page_id in image_page_map:
-                image_page_map[image_page_id].append(image_url)
-            else:
-                image_page_map[image_page_id] = [image_url]
-        return image_page_map
-
-    def archive_page_images(self, page_path: str, image_urls: List[str]):
+    def archive_page_images(self, page_path: str, image_nodes: List[ImageNode]):
         """pull images locally into a directory based on page"""
         # image_base_path = f"{self.archive_base_path}/{page_path}{_IMAGE_DIR_SUFFIX}"
         image_base_path = f"{self.archive_base_path}/{page_path}/{_IMAGE_DIR_NAME}"
-        for image_url in image_urls:
-            img_data: bytes = archiver_util.get_byte_response(image_url, self._headers,
+        for img_node in image_nodes:
+            img_data: bytes = archiver_util.get_byte_response(img_node.url, self._headers,
                                                               self.verify_ssl)
-            # seems safer to use this instead of image['name'] field
-            img_file_name = image_url.split('/')[-1]
-            image_path = f"{image_base_path}/{img_file_name}"
+            image_path = f"{image_base_path}/{img_node.name}"
             self.write_data(image_path, img_data)
 
     def write_data(self, file_path: str, data: bytes):
@@ -142,19 +182,32 @@ def gzip_archive(self):
         """provide the tar to gzip and the name of the gzip output file"""
         archiver_util.create_gzip(self.tar_file, self.archive_file)
 
-    def _update_image_links(self, page_data: bytes, urls: List[str]) -> bytes:
+    def _update_image_links(self, page_data: bytes, image_nodes: List[ImageNode]) -> bytes:
         """regex replace links to local created directories"""
-        # 1 - what to replace, 2 - replace with, 3 is the data to replace
-        # re.sub(b'pfsense', b'lol', x.content)
+        for img_node in image_nodes:
+            img_meta_url = f"{self.api_urls["images"]}/{img_node.id}"
+            img_details = common_util.http_get_request(img_meta_url,
+                                                         self._headers, self.verify_ssl)
+            
+            img_node.set_markdown_content(img_details.json())
+            if not img_node.markdown_str:
+                continue
+
+            # re_pattern_bytes = self._get_regex_expr(img_node.markdown_str)
+
+            # re_pattern_bytes = self._get_regex_expr(img_node.url)
+
+            # 1 - what to replace, 2 - replace with, 3 is the data to replace
+            # re.sub(b'pfsense', b'lol', x.content)
+            print(img_node.markdown_str)
+            print(img_node.get_replace_str())
+            page_data = re.sub(img_node.markdown_str.encode(), img_node.get_replace_str().encode(), page_data)
+        # print(page_data)
+        return page_data
 
         # string to bytes
         # >>> k = 'lol'
         # >>> k.encode()
-        pass
-
-    def _valid_image_link(self):
-        """should contain bookstack host"""
-        pass
 
     @property
     def file_extension_map(self) -> Dict[str, str]:
@@ -171,6 +224,19 @@ def verify_ssl(self) -> bool:
         """return whether or not to verify ssl for http requests"""
         return self.asset_config.verify_ssl
 
+    # @staticmethod
+    # def _get_regex_expr(image_str: str) -> bytes:
+    #     # regex_str = fr"\[\!\[^$|.*\]\({url}\)\]"
+    #     # print(regex_str)
+    #     return re.compile(image_str.encode())
+
     @staticmethod
-    def _get_regex_expr(url: str) -> re.Pattern:
-        return re.compile(fr"\[\!\[^$|.*\].*{url}.*\]")
+    def _create_image_map(json_data: List[Dict[str, Union[str,int]]]) -> Dict[int, List[ImageNode]]:
+        image_page_map = {}
+        for img_meta in json_data:
+            img_node = ImageNode(img_meta)
+            if img_node.page_id in image_page_map:
+                image_page_map[img_node.page_id].append(img_node)
+            else:
+                image_page_map[img_node.page_id] = [img_node]
+        return image_page_map

From f3cbf8b772fb5870d4bf1c9444832af4a4b37326 Mon Sep 17 00:00:00 2001
From: pchang388 <pchang388@gmail.com>
Date: Wed, 22 Nov 2023 01:13:02 -0500
Subject: [PATCH 3/5] code clean up

---
 .../archiver/page_archiver.py                 | 44 ++++---------------
 1 file changed, 9 insertions(+), 35 deletions(-)

diff --git a/bookstack_file_exporter/archiver/page_archiver.py b/bookstack_file_exporter/archiver/page_archiver.py
index 7537362..df1d572 100644
--- a/bookstack_file_exporter/archiver/page_archiver.py
+++ b/bookstack_file_exporter/archiver/page_archiver.py
@@ -47,17 +47,14 @@ def __init__(self, img_meta_data: Dict[str, Union[int, str]]):
         self.url: str = img_meta_data['url']
         self.name: str = self._get_image_name()
         self._markdown_str = ""
-        self.image_relative_path: str = f"./{_IMAGE_DIR_NAME}/{self.name}"
+        self._image_relative_path: str = f"./{_IMAGE_DIR_NAME}/{self.name}"
 
     def _get_image_name(self) -> str:
         return self.url.split('/')[-1]
-    
-    # def _get_markdown_str(self, img_details: Dict[str, Union[int, str]]) -> str:
-    #     if 'content' in img_details:
-    #         if _MARKDOWN_STR_CHECK in img_details['content']:
-    #             print(img_details['content'][_MARKDOWN_STR_CHECK])
-    #             return self._get_md_url_str(img_details['content'][_MARKDOWN_STR_CHECK])
-    #     return ""
+
+    @property
+    def image_relative_path(self):
+        return self._image_relative_path
 
     @property
     def markdown_str(self):
@@ -65,15 +62,6 @@ def markdown_str(self):
     
     def set_markdown_content(self, img_details: Dict[str, Union[int, str]]):
         self._markdown_str = self._get_md_url_str(img_details)
-    # @markdown_str.setter
-    # def markdown_str(self, img_details: Dict[str, Union[int, str]]) -> str:
-    #     self._markdown_str = self._get_md_url_str(img_details)
-        
-
-    def get_replace_str(self) -> str:
-        """return str for regex replace in page md content"""
-        # return f"[![{self.name}]({self.image_relative_path})]"
-        return self.image_relative_path
 
     @staticmethod
     def _get_md_url_str(img_data: Dict[str, Union[int, str]]) -> str:
@@ -185,30 +173,17 @@ def gzip_archive(self):
     def _update_image_links(self, page_data: bytes, image_nodes: List[ImageNode]) -> bytes:
         """regex replace links to local created directories"""
         for img_node in image_nodes:
-            img_meta_url = f"{self.api_urls["images"]}/{img_node.id}"
+            img_meta_url = f"{self.api_urls['images']}/{img_node.id}"
             img_details = common_util.http_get_request(img_meta_url,
                                                          self._headers, self.verify_ssl)
             
             img_node.set_markdown_content(img_details.json())
             if not img_node.markdown_str:
                 continue
-
-            # re_pattern_bytes = self._get_regex_expr(img_node.markdown_str)
-
-            # re_pattern_bytes = self._get_regex_expr(img_node.url)
-
             # 1 - what to replace, 2 - replace with, 3 is the data to replace
-            # re.sub(b'pfsense', b'lol', x.content)
-            print(img_node.markdown_str)
-            print(img_node.get_replace_str())
-            page_data = re.sub(img_node.markdown_str.encode(), img_node.get_replace_str().encode(), page_data)
-        # print(page_data)
+            page_data = re.sub(img_node.markdown_str.encode(), img_node.image_relative_path().encode(), page_data)
         return page_data
 
-        # string to bytes
-        # >>> k = 'lol'
-        # >>> k.encode()
-
     @property
     def file_extension_map(self) -> Dict[str, str]:
         """file extension metadata"""
@@ -225,10 +200,9 @@ def verify_ssl(self) -> bool:
         return self.asset_config.verify_ssl
 
     # @staticmethod
-    # def _get_regex_expr(image_str: str) -> bytes:
+    # def _get_regex_expr(url: str) -> bytes:
     #     # regex_str = fr"\[\!\[^$|.*\]\({url}\)\]"
-    #     # print(regex_str)
-    #     return re.compile(image_str.encode())
+    #     return re.compile(regex_str.encode())
 
     @staticmethod
     def _create_image_map(json_data: List[Dict[str, Union[str,int]]]) -> Dict[int, List[ImageNode]]:

From 39214aaef85debf07c5028934815b07c0a3205a0 Mon Sep 17 00:00:00 2001
From: pchang388 <pchang388@gmail.com>
Date: Wed, 22 Nov 2023 06:26:48 +0000
Subject: [PATCH 4/5] more code clean up

---
 .../archiver/page_archiver.py                    | 14 ++++++--------
 bookstack_file_exporter/archiver/util.py         | 16 ----------------
 2 files changed, 6 insertions(+), 24 deletions(-)

diff --git a/bookstack_file_exporter/archiver/page_archiver.py b/bookstack_file_exporter/archiver/page_archiver.py
index df1d572..e159bbb 100644
--- a/bookstack_file_exporter/archiver/page_archiver.py
+++ b/bookstack_file_exporter/archiver/page_archiver.py
@@ -3,13 +3,11 @@
 # pylint: disable=import-error
 from requests import Response
 
-
 from bookstack_file_exporter.exporter.node import Node
 from bookstack_file_exporter.archiver import util as archiver_util
 from bookstack_file_exporter.config_helper.config_helper import ConfigNode
 from bookstack_file_exporter.common import util as common_util
 
-
 _META_FILE_SUFFIX = "_meta.json"
 _TAR_SUFFIX = ".tar"
 _TAR_GZ_SUFFIX = ".tgz"
@@ -26,9 +24,7 @@
     "tgz": _TAR_GZ_SUFFIX
 }
 
-
 _IMAGE_DIR_NAME = "images"
-# _MARKDOWN_IMAGE_REGEX= re.compile(r"\[\!\[^$|.*\].*\]")
 _MARKDOWN_STR_CHECK = "markdown"
 
 class ImageNode:
@@ -54,13 +50,16 @@ def _get_image_name(self) -> str:
 
     @property
     def image_relative_path(self):
+        """return image path local to page directory"""
         return self._image_relative_path
 
     @property
     def markdown_str(self):
+        """return markdown url str to replace"""
         return self._markdown_str
-    
+
     def set_markdown_content(self, img_details: Dict[str, Union[int, str]]):
+        """provide image metadata to set markdown properties"""
         self._markdown_str = self._get_md_url_str(img_details)
 
     @staticmethod
@@ -121,7 +120,6 @@ def _archive_page(self, page: Node, export_format: str, data: bytes,
                       image_nodes: List[ImageNode] = None):
         page_file_name = f"{self.archive_base_path}/" \
             f"{page.file_path}/{page.name}{_FILE_EXTENSION_MAP[export_format]}"
-        # not yet implemented
         if self.modify_md and export_format == _MARKDOWN_STR_CHECK and image_nodes:
             data = self._update_image_links(data, image_nodes)
         self.write_data(page_file_name, data)
@@ -176,12 +174,12 @@ def _update_image_links(self, page_data: bytes, image_nodes: List[ImageNode]) ->
             img_meta_url = f"{self.api_urls['images']}/{img_node.id}"
             img_details = common_util.http_get_request(img_meta_url,
                                                          self._headers, self.verify_ssl)
-            
             img_node.set_markdown_content(img_details.json())
             if not img_node.markdown_str:
                 continue
             # 1 - what to replace, 2 - replace with, 3 is the data to replace
-            page_data = re.sub(img_node.markdown_str.encode(), img_node.image_relative_path().encode(), page_data)
+            page_data = re.sub(re.escape(img_node.markdown_str).encode(),
+                               img_node.image_relative_path.encode(), page_data)
         return page_data
 
     @property
diff --git a/bookstack_file_exporter/archiver/util.py b/bookstack_file_exporter/archiver/util.py
index bda4c1a..0980fb6 100644
--- a/bookstack_file_exporter/archiver/util.py
+++ b/bookstack_file_exporter/archiver/util.py
@@ -27,13 +27,6 @@ def write_tar(base_tar_dir: str, file_path: str, data: bytes):
         log.debug("Adding file: %s with size: %d bytes to tar file", tar_info.name, tar_info.size)
         tar.addfile(tar_info, fileobj=data_obj)
 
-# create files first for manipulation/changes and tar later
-# def write_file(file_path: str, data: bytes):
-#     """write byte data to a local file"""
-#     os.makedirs(os.path.dirname(file_path), exist_ok=True)
-#     with open(file_path, 'wb') as file_obj:
-#         file_obj.write(data)
-
 def get_json_bytes(data: Dict[str, Union[str, int]]) -> bytes:
     """dump dict to json file"""
     return json.dumps(data, indent=4).encode('utf-8')
@@ -55,12 +48,3 @@ def scan_archives(base_dir: str, extension: str) -> str:
     """scan export directory for archives"""
     file_pattern = f"{base_dir}_*{extension}"
     return glob.glob(file_pattern)
-
-# def find_file_matches(file_path: str, regex_expr: re.Pattern) -> List[str]:
-#     """find all matching lines for regex pattern"""
-#     matches=[]
-#     with open(file_path, encoding="utf-8") as open_file:
-#         for line in open_file:
-#             for match in re.finditer(regex_expr, line):
-#                 matches.append(match.group)
-#     return matches

From 319d69a45445f2a1d74f9edebaab703cc28e0e17 Mon Sep 17 00:00:00 2001
From: pchang388 <pchang388@gmail.com>
Date: Wed, 22 Nov 2023 01:36:41 -0500
Subject: [PATCH 5/5] bump version and prep docs for new release

---
 Makefile                  | 2 +-
 README.md                 | 8 ++++++--
 examples/config.yml       | 3 +++
 examples/minio_config.yml | 5 ++++-
 4 files changed, 14 insertions(+), 4 deletions(-)

diff --git a/Makefile b/Makefile
index 457f080..5d9e5f0 100644
--- a/Makefile
+++ b/Makefile
@@ -4,7 +4,7 @@ BASE_IMAGE_TAG=3.12-slim-bookworm
 IMAGE_NAME=homeylab/bookstack-file-exporter
 # keep this start sequence unique (IMAGE_TAG=)
 # github actions will use this to create a tag
-IMAGE_TAG=0.0.3
+IMAGE_TAG=1.0.0
 DOCKER_WORK_DIR=/export
 DOCKER_CONFIG_DIR=/export/config
 DOCKER_EXPORT_DIR=/export/dump
diff --git a/README.md b/README.md
index e4d77de..874f691 100644
--- a/README.md
+++ b/README.md
@@ -15,6 +15,7 @@ Table of Contents
     - [Modify Markdown Files](#modify-markdown-files)
 - [Object Storage](#object-storage)
     - [Minio](#minio-backups)
+- [Future Items](#future-items)
 
 ## Background
 _Features are actively being developed. See `Future Items` section for more details. Open an issue for a feature request._
@@ -76,6 +77,7 @@ formats:
 output_path: "bkps/"
 assets:
     export_images: false
+    modify_markdown: false
     export_meta: false
     verify_ssl: true
 ```
@@ -196,6 +198,7 @@ formats:
 output_path: "bkps/"
 assets:
     export_images: false
+    modify_markdown: false
     export_meta: false
     verify_ssl: true
  ```
@@ -228,6 +231,7 @@ minio:
 output_path: "bkps/"
 assets:
   export_images: true
+  modify_markdown: false
   export_meta: false
   verify_ssl: true
 keep_last: 5
@@ -368,7 +372,7 @@ bookstack_export_2023-11-20_08-00-29/programming/react/basics/images/NzZimage.pn
 bookstack_export_2023-11-20_08-00-29/programming/react/basics/images/Mymimage.png
 ```
 
-> **Note you may see old images in your exports. This is because, by default, Bookstack retains images/drawings that are uploaded even if no longer referenced on an active page. Admins can run `Cleanup Images` in the Maintenance Settings or via [CLI](https://www.bookstackapp.com/docs/admin/commands/#cleanup-unused-images) to remove them.
+**Note you may see old images in your exports. This is because, by default, Bookstack retains images/drawings that are uploaded even if no longer referenced on an active page. Admins can run `Cleanup Images` in the Maintenance Settings or via [CLI](https://www.bookstackapp.com/docs/admin/commands/#cleanup-unused-images) to remove them.**
 
 ### Modify Markdown Files
 **To use this feature, `assets.export_images` should be set to `true`**
@@ -426,7 +430,7 @@ minio:
 ## Future Items
 1. ~~Be able to pull images locally and place in their respective page folders for a more complete file level backup.~~
 2. ~~Include the exporter in a maintained helm chart as an optional deployment. The helm chart is [here](https://github.com/homeylab/helm-charts/tree/main/charts/bookstack).~~
-3. Be able to modify markdown links of images to local exported images in their respective page folders for a more complete file level backup.
+3. ~~Be able to modify markdown links of images to local exported images in their respective page folders for a more complete file level backup.~~
 4. Be able to pull attachments locally and place in their respective page folders for a more complete file level backup.
 5. Export S3 and more options.
 6. Filter shelves and books by name - for more targeted backups. Example: you only want to share a book about one topic with an external friend/user.
diff --git a/examples/config.yml b/examples/config.yml
index c02fb1e..2a12cb1 100644
--- a/examples/config.yml
+++ b/examples/config.yml
@@ -29,6 +29,9 @@ assets:
   # optional export of all the images used in a page(s).
   # omit this or set to false if not needed
   export_images: false
+  # optionally modify markdown files to replace image url links
+  # with local exported image paths
+  modify_markdown: false
   ## optional export of metadata about the page in a json file
   # this metadata contains general information about the page
   # like: last update, owner, revision count, etc.
diff --git a/examples/minio_config.yml b/examples/minio_config.yml
index 0e83e70..692cacd 100644
--- a/examples/minio_config.yml
+++ b/examples/minio_config.yml
@@ -58,7 +58,10 @@ assets:
   # optional export of all the images used in a page(s).
   # omit this or set to false if not needed
   export_images: false
-  ## optional export of metadata about the page in a json file
+  # optionally modify markdown files to replace image url links
+  # with local exported image paths
+  modify_markdown: false
+  # optional export of metadata about the page in a json file
   # this metadata contains general information about the page
   # like: last update, owner, revision count, etc.
   # omit this or set to false if not needed