From f80ed8fdeb87e3429728636e36b79eafe572eba0 Mon Sep 17 00:00:00 2001 From: pchang388 Date: Tue, 21 Nov 2023 20:47:39 -0500 Subject: [PATCH 1/5] update description of feature --- README.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 76c558d..0536722 100644 --- a/README.md +++ b/README.md @@ -17,14 +17,14 @@ Table of Contents ## Background _Features are actively being developed. See `Future Items` section for more details. Open an issue for a feature request._ -This tool provides a way to export [Bookstack](https://github.com/BookStackApp/BookStack) pages and their content (_text, images, metadata, etc._) into a relational directory-tree layout locally with an option to push to remote object storage locations. See [Backup Behavior](#backup-behavior) section for more details on how pages are organized. +This tool provides a way to export [Bookstack](https://github.com/BookStackApp/BookStack) pages and their content (_text, images, metadata, etc._) into a relational parent-child layout locally with an option to push to remote object storage locations. See [Backup Behavior](#backup-behavior) section for more details on how pages are organized. This small project was mainly created to run as a cron job in k8s but works anywhere. This tool allows me to export my docs in markdown, or other formats like pdf. I use Bookstack's markdown editor as default instead of WYSIWYG editor and this makes my notes portable anywhere even if offline. 
### Features What it does: -- Build relationships between Bookstack `Shelves/Books/Chapters/Pages` to create a relational directory-tree layout +- Discover and build relationships between Bookstack `Shelves/Books/Chapters/Pages` to create a relational parent-child layout - Export Bookstack pages and their content to a `.tgz` archive - Additional content for pages like their images and metadata and can be exported - YAML configuration file for repeatable and easy runs From b5966688a7fce05ba586950bd3107c4ab6d8a034 Mon Sep 17 00:00:00 2001 From: pchang388 Date: Wed, 22 Nov 2023 01:06:13 -0500 Subject: [PATCH 2/5] add ability to modify markdown files for image url to exported image path replacing --- README.md | 40 +++++- bookstack_file_exporter/archiver/archiver.py | 12 +- .../archiver/minio_archiver.py | 1 - .../archiver/page_archiver.py | 134 +++++++++++++----- 4 files changed, 145 insertions(+), 42 deletions(-) diff --git a/README.md b/README.md index 0536722..e4d77de 100644 --- a/README.md +++ b/README.md @@ -11,6 +11,8 @@ Table of Contents - [Options and descriptions](#options-and-descriptions) - [Environment variables](#valid-environment-variables) - [Backup Behavior](#backup-behavior) + - [Images](#images) + - [Modify Markdown Files](#modify-markdown-files) - [Object Storage](#object-storage) - [Minio](#minio-backups) @@ -27,6 +29,7 @@ What it does: - Discover and build relationships between Bookstack `Shelves/Books/Chapters/Pages` to create a relational parent-child layout - Export Bookstack pages and their content to a `.tgz` archive - Additional content for pages like their images and metadata and can be exported +- The exporter can also [Modify Markdown Files](#modify-markdown-files) to replace image links with local exported image paths for a more portable backup - YAML configuration file for repeatable and easy runs - Can be run via [Python](#run-via-pip) or [Docker](#run-via-docker) - Can push archives to remote object storage like 
[Minio](https://min.io/) @@ -244,6 +247,7 @@ More descriptions can be found for each section below: | `output_path` | `str` | `false` | Optional (default: `cwd`) which directory (relative or full path) to place exports. User who runs the command should have access to read/write to this directory. If not provided, will use current run directory by default | | `assets` | `object` | `false` | Optional section to export additional assets from pages. | | `assets.export_images` | `bool` | `false` | Optional (default: `false`), export all images for a page to an `image` directory within page directory. See [Backup Behavior](#backup-behavior) for more information on layout | +| `assets.modify_markdown` | `bool` | `false` | Optional (default: `false`), modify markdown files to replace image links with local exported image paths. This requires `assets.export_images` to be `true` in order to work. See [Modify Markdown Files](#modify-markdown-files) for more information. | `assets.export_meta` | `bool` | `false` | Optional (default: `false`), export of metadata about the page in a json file | | `assets.verify_ssl` | `bool` | `false` | Optional (default: `true`), whether or not to check ssl certificates when requesting content from Bookstack host | | `keep_last` | `int` | `false` | Optional (default: `None`), if exporter can delete older archives. valid values are:
- set to `-1` if you want to delete all archives after each run (useful if you only want to upload to object storage)
- set to `1+` if you want to retain a certain number of archives
- `0` will result in no action done | @@ -261,9 +265,12 @@ General - `MINIO_ACCESS_KEY` - `MINIO_SECRET_KEY` -### Backup Behavior +## Backup Behavior + +### Export File Backups are exported in `.tgz` format and generated based off timestamp. Export names will be in the format: `%Y-%m-%d_%H-%M-%S` (Year-Month-Day_Hour-Minute-Second). *Files are first pulled locally to create the tarball and then can be sent to object storage if needed*. Example file name: `bookstack_export_2023-09-22_07-19-54.tgz`. +### General The exporter can also do housekeeping duties and keep a configured number of archives and delete older ones. See `keep_last` property in the [Configuration](#options-and-descriptions) section. Object storage provider configurations include their own `keep_last` property for flexibility. For file names, `slug` names (from Bookstack API) are used, as such certain characters like `!`, `/` will be ignored and spaces replaced from page names/titles. @@ -349,6 +356,37 @@ Empty/New Pages will be ignored since they have not been modified yet from creat You may notice some directories (books) and/or files (pages) in the archive have a random string at the end, example - `nKA`: `user-and-group-management-nKA`. This is expected and is because there were resources with the same name created in another shelve and bookstack adds a string at the end to ensure uniqueness. +### Images + +### General +Images will be dumped in a separate directory, `images` within the page directory it belongs to. As shown earlier: + +``` +bookstack_export_2023-11-20_08-00-29/programming/react/basics/images/YKvimage.png +bookstack_export_2023-11-20_08-00-29/programming/react/basics/images/dwwimage.png +bookstack_export_2023-11-20_08-00-29/programming/react/basics/images/NzZimage.png +bookstack_export_2023-11-20_08-00-29/programming/react/basics/images/Mymimage.png +``` + +> **Note you may see old images in your exports. 
This is because, by default, Bookstack retains images/drawings that are uploaded even if no longer referenced on an active page. Admins can run `Cleanup Images` in the Maintenance Settings or via [CLI](https://www.bookstackapp.com/docs/admin/commands/#cleanup-unused-images) to remove them. + +### Modify Markdown Files +**To use this feature, `assets.export_images` should be set to `true`** + +The configuration item, `assets.modify_markdown`, can be set to `true` to modify markdown files to replace image url links with local exported image paths. This feature allows for you to make your `markdown` exports much more portable. + +Page (parent) -> Images (Children) relationships are created and then each image url is replaced with its own respective local export path. Example: +``` +## before +[![pool-topology-1.png](https://demo.bookstack/uploads/images/gallery/2023-07/scaled-1680-/pool-topology-1.png)](https://demo.bookstack/uploads/images/gallery/2023-07/pool-topology-1.png) + +## after +[![pool-topology-1.png](./images/pool-topology-1.png)](https://demo.bookstack/uploads/images/gallery/2023-07/pool-topology-1.png) +``` +This allows the images to be found locally within the export files, so your `markdown` docs display all their images as they normally would. + +**Note: This will work properly if your pages are using the notation used by Bookstack for Markdown image links, example: ` [![image alt text](Bookstack Markdown image URL link)](anchor/url link)` The `(anchor/url link)` is optional.** + ## Object Storage Optionally, target(s) can be specified to upload generated archives to a remote location. 
Supported object storage providers can be found below: - [Minio](#minio-backups) diff --git a/bookstack_file_exporter/archiver/archiver.py b/bookstack_file_exporter/archiver/archiver.py index d2d1f91..329343c 100644 --- a/bookstack_file_exporter/archiver/archiver.py +++ b/bookstack_file_exporter/archiver/archiver.py @@ -5,7 +5,7 @@ from bookstack_file_exporter.exporter.node import Node from bookstack_file_exporter.archiver import util -from bookstack_file_exporter.archiver.page_archiver import PageArchiver +from bookstack_file_exporter.archiver.page_archiver import PageArchiver, ImageNode from bookstack_file_exporter.archiver.minio_archiver import MinioArchiver from bookstack_file_exporter.config_helper.remote import StorageProviderConfig from bookstack_file_exporter.config_helper.config_helper import ConfigNode @@ -49,23 +49,23 @@ def get_bookstack_exports(self, page_nodes: Dict[int, Node]): self._get_page_files(page, page_image_meta) self._get_page_images(page.file_path, page_image_meta) - def _get_page_files(self, page_node: Node, image_meta: List[str]): + def _get_page_files(self, page_node: Node, image_meta: List[ImageNode]): """pull all bookstack pages into local files/tar""" log.debug("Exporting bookstack page data") self._page_archiver.archive_page(page_node, image_meta) - def _get_page_image_map(self) -> Dict[int, List[str]]: + def _get_page_image_map(self) -> Dict[int, List[ImageNode]]: if not self._page_archiver.export_images: log.debug("skipping image export based on user input") return {} return self._page_archiver.get_image_meta() - def _get_page_images(self, page_path: str, urls: List[str]): - if not urls: + def _get_page_images(self, page_path: str, img_nodes: List[ImageNode]): + if not img_nodes: log.debug("page has no images to pull") return log.debug("Exporting bookstack page images") - self._page_archiver.archive_page_images(page_path, img_nodes) def create_archive(self): """create tgz 
archive""" diff --git a/bookstack_file_exporter/archiver/minio_archiver.py b/bookstack_file_exporter/archiver/minio_archiver.py index 27b11e4..ccd8598 100644 --- a/bookstack_file_exporter/archiver/minio_archiver.py +++ b/bookstack_file_exporter/archiver/minio_archiver.py @@ -96,7 +96,6 @@ def _get_stale_objects(self, file_extension: str) -> List[MinioObject]: # last copy that remains if local is deleted log.debug("Minio 'keep_last' set to negative number, ignoring") return [] - # keep_last > 0 condition to_delete = [] if len(minio_objects) > self.keep_last: log.debug("Number of minio objects is greater than 'keep_last'") diff --git a/bookstack_file_exporter/archiver/page_archiver.py b/bookstack_file_exporter/archiver/page_archiver.py index 3fe9305..7537362 100644 --- a/bookstack_file_exporter/archiver/page_archiver.py +++ b/bookstack_file_exporter/archiver/page_archiver.py @@ -31,6 +31,61 @@ # _MARKDOWN_IMAGE_REGEX= re.compile(r"\[\!\[^$|.*\].*\]") _MARKDOWN_STR_CHECK = "markdown" +class ImageNode: + """ + ImageNode provides metadata and convenience for Bookstack images. + + Args: + :img_meta_data: = image meta data + + Returns: + :ImageNode: instance with attributes to help handle images. 
+ """ + def __init__(self, img_meta_data: Dict[str, Union[int, str]]): + self.id: int = img_meta_data['id'] + self.page_id: int = img_meta_data['uploaded_to'] + self.url: str = img_meta_data['url'] + self.name: str = self._get_image_name() + self._markdown_str = "" + self.image_relative_path: str = f"./{_IMAGE_DIR_NAME}/{self.name}" + + def _get_image_name(self) -> str: + return self.url.split('/')[-1] + + # def _get_markdown_str(self, img_details: Dict[str, Union[int, str]]) -> str: + # if 'content' in img_details: + # if _MARKDOWN_STR_CHECK in img_details['content']: + # print(img_details['content'][_MARKDOWN_STR_CHECK]) + # return self._get_md_url_str(img_details['content'][_MARKDOWN_STR_CHECK]) + # return "" + + @property + def markdown_str(self): + return self._markdown_str + + def set_markdown_content(self, img_details: Dict[str, Union[int, str]]): + self._markdown_str = self._get_md_url_str(img_details) + # @markdown_str.setter + # def markdown_str(self, img_details: Dict[str, Union[int, str]]) -> str: + # self._markdown_str = self._get_md_url_str(img_details) + + + def get_replace_str(self) -> str: + """return str for regex replace in page md content""" + # return f"[![{self.name}]({self.image_relative_path})]" + return self.image_relative_path + + @staticmethod + def _get_md_url_str(img_data: Dict[str, Union[int, str]]) -> str: + url_str = "" + if 'content' in img_data: + if _MARKDOWN_STR_CHECK in img_data['content']: + url_str = img_data['content'][_MARKDOWN_STR_CHECK] + # check to see if empty before doing find + if not url_str: + return "" + return url_str[url_str.find("(")+1:url_str.find(")")] + # pylint: disable=too-many-instance-attributes class PageArchiver: """ @@ -75,13 +130,12 @@ def archive_page(self, page: Node, self._archive_page_meta(page.name, page.file_path, page.meta) def _archive_page(self, page: Node, export_format: str, data: bytes, - image_urls: List[str] = None): + image_nodes: List[ImageNode] = None): page_file_name = 
f"{self.archive_base_path}/" \ f"{page.file_path}/{page.name}{_FILE_EXTENSION_MAP[export_format]}" - - # note yet implemented - # if export_format == _MARKDOWN_STR_CHECK and image_urls and self.modify_md: - # data = self._update_image_links(data, image_urls) + # not yet implemented + if self.modify_md and export_format == _MARKDOWN_STR_CHECK and image_nodes: + data = self._update_image_links(data, image_nodes) self.write_data(page_file_name, data) def _get_page_data(self, page_id: int, export_format: str): @@ -96,7 +150,7 @@ def _archive_page_meta(self, page_name: str, page_path: str, bytes_meta = archiver_util.get_json_bytes(meta_data) self.write_data(file_path=meta_file_name, data=bytes_meta) - def get_image_meta(self) -> Dict[int, List[str]]: + def get_image_meta(self) -> Dict[int, List[ImageNode]]: """Get all image metadata into a {page_number: [image_url]} format""" img_meta_response: Response = common_util.http_get_request( self.api_urls['images'], @@ -105,28 +159,14 @@ def get_image_meta(self) -> Dict[int, List[str]]: img_meta_json = img_meta_response.json()['data'] return self._create_image_map(img_meta_json) - @staticmethod - def _create_image_map(json_data: List[Dict[str, Union[str,int]]]) -> Dict[int, List[str]]: - image_page_map = {} - for image_node in json_data: - image_page_id = image_node['uploaded_to'] - image_url = image_node['url'] - if image_page_id in image_page_map: - image_page_map[image_page_id].append(image_url) - else: - image_page_map[image_page_id] = [image_url] - return image_page_map - - def archive_page_images(self, page_path: str, image_urls: List[str]): + def archive_page_images(self, page_path: str, image_nodes: List[ImageNode]): """pull images locally into a directory based on page""" # image_base_path = f"{self.archive_base_path}/{page_path}{_IMAGE_DIR_SUFFIX}" image_base_path = f"{self.archive_base_path}/{page_path}/{_IMAGE_DIR_NAME}" - for image_url in image_urls: - img_data: bytes = archiver_util.get_byte_response(image_url, 
self._headers, + for img_node in image_nodes: + img_data: bytes = archiver_util.get_byte_response(img_node.url, self._headers, self.verify_ssl) - # seems safer to use this instead of image['name'] field - img_file_name = image_url.split('/')[-1] - image_path = f"{image_base_path}/{img_file_name}" + image_path = f"{image_base_path}/{img_node.name}" self.write_data(image_path, img_data) def write_data(self, file_path: str, data: bytes): @@ -142,19 +182,32 @@ def gzip_archive(self): """provide the tar to gzip and the name of the gzip output file""" archiver_util.create_gzip(self.tar_file, self.archive_file) - def _update_image_links(self, page_data: bytes, urls: List[str]) -> bytes: + def _update_image_links(self, page_data: bytes, image_nodes: List[ImageNode]) -> bytes: """regex replace links to local created directories""" - # 1 - what to replace, 2 - replace with, 3 is the data to replace - # re.sub(b'pfsense', b'lol', x.content) + for img_node in image_nodes: + img_meta_url = f"{self.api_urls["images"]}/{img_node.id}" + img_details = common_util.http_get_request(img_meta_url, + self._headers, self.verify_ssl) + + img_node.set_markdown_content(img_details.json()) + if not img_node.markdown_str: + continue + + # re_pattern_bytes = self._get_regex_expr(img_node.markdown_str) + + # re_pattern_bytes = self._get_regex_expr(img_node.url) + + # 1 - what to replace, 2 - replace with, 3 is the data to replace + # re.sub(b'pfsense', b'lol', x.content) + print(img_node.markdown_str) + print(img_node.get_replace_str()) + page_data = re.sub(img_node.markdown_str.encode(), img_node.get_replace_str().encode(), page_data) + # print(page_data) + return page_data # string to bytes # >>> k = 'lol' # >>> k.encode() - pass - - def _valid_image_link(self): - """should contain bookstack host""" - pass @property def file_extension_map(self) -> Dict[str, str]: @@ -171,6 +224,19 @@ def verify_ssl(self) -> bool: """return whether or not to verify ssl for http requests""" return 
self.asset_config.verify_ssl + # @staticmethod + # def _get_regex_expr(image_str: str) -> bytes: + # # regex_str = fr"\[\!\[^$|.*\]\({url}\)\]" + # # print(regex_str) + # return re.compile(image_str.encode()) + @staticmethod - def _get_regex_expr(url: str) -> re.Pattern: - return re.compile(fr"\[\!\[^$|.*\].*{url}.*\]") + def _create_image_map(json_data: List[Dict[str, Union[str,int]]]) -> Dict[int, List[ImageNode]]: + image_page_map = {} + for img_meta in json_data: + img_node = ImageNode(img_meta) + if img_node.page_id in image_page_map: + image_page_map[img_node.page_id].append(img_node) + else: + image_page_map[img_node.page_id] = [img_node] + return image_page_map From f3cbf8b772fb5870d4bf1c9444832af4a4b37326 Mon Sep 17 00:00:00 2001 From: pchang388 Date: Wed, 22 Nov 2023 01:13:02 -0500 Subject: [PATCH 3/5] code clean up --- .../archiver/page_archiver.py | 44 ++++--------------- 1 file changed, 9 insertions(+), 35 deletions(-) diff --git a/bookstack_file_exporter/archiver/page_archiver.py b/bookstack_file_exporter/archiver/page_archiver.py index 7537362..df1d572 100644 --- a/bookstack_file_exporter/archiver/page_archiver.py +++ b/bookstack_file_exporter/archiver/page_archiver.py @@ -47,17 +47,14 @@ def __init__(self, img_meta_data: Dict[str, Union[int, str]]): self.url: str = img_meta_data['url'] self.name: str = self._get_image_name() self._markdown_str = "" - self.image_relative_path: str = f"./{_IMAGE_DIR_NAME}/{self.name}" + self._image_relative_path: str = f"./{_IMAGE_DIR_NAME}/{self.name}" def _get_image_name(self) -> str: return self.url.split('/')[-1] - - # def _get_markdown_str(self, img_details: Dict[str, Union[int, str]]) -> str: - # if 'content' in img_details: - # if _MARKDOWN_STR_CHECK in img_details['content']: - # print(img_details['content'][_MARKDOWN_STR_CHECK]) - # return self._get_md_url_str(img_details['content'][_MARKDOWN_STR_CHECK]) - # return "" + + @property + def image_relative_path(self): + return self._image_relative_path @property 
def markdown_str(self): @@ -65,15 +62,6 @@ def markdown_str(self): def set_markdown_content(self, img_details: Dict[str, Union[int, str]]): self._markdown_str = self._get_md_url_str(img_details) - # @markdown_str.setter - # def markdown_str(self, img_details: Dict[str, Union[int, str]]) -> str: - # self._markdown_str = self._get_md_url_str(img_details) - - - def get_replace_str(self) -> str: - """return str for regex replace in page md content""" - # return f"[![{self.name}]({self.image_relative_path})]" - return self.image_relative_path @staticmethod def _get_md_url_str(img_data: Dict[str, Union[int, str]]) -> str: @@ -185,30 +173,17 @@ def gzip_archive(self): def _update_image_links(self, page_data: bytes, image_nodes: List[ImageNode]) -> bytes: """regex replace links to local created directories""" for img_node in image_nodes: - img_meta_url = f"{self.api_urls["images"]}/{img_node.id}" + img_meta_url = f"{self.api_urls['images']}/{img_node.id}" img_details = common_util.http_get_request(img_meta_url, self._headers, self.verify_ssl) img_node.set_markdown_content(img_details.json()) if not img_node.markdown_str: continue - - # re_pattern_bytes = self._get_regex_expr(img_node.markdown_str) - - # re_pattern_bytes = self._get_regex_expr(img_node.url) - # 1 - what to replace, 2 - replace with, 3 is the data to replace - # re.sub(b'pfsense', b'lol', x.content) - print(img_node.markdown_str) - print(img_node.get_replace_str()) - page_data = re.sub(img_node.markdown_str.encode(), img_node.get_replace_str().encode(), page_data) - # print(page_data) + page_data = re.sub(img_node.markdown_str.encode(), img_node.image_relative_path().encode(), page_data) return page_data - # string to bytes - # >>> k = 'lol' - # >>> k.encode() - @property def file_extension_map(self) -> Dict[str, str]: """file extension metadata""" @@ -225,10 +200,9 @@ def verify_ssl(self) -> bool: return self.asset_config.verify_ssl # @staticmethod - # def _get_regex_expr(image_str: str) -> bytes: + # def 
_get_regex_expr(url: str) -> bytes: # # regex_str = fr"\[\!\[^$|.*\]\({url}\)\]" - # # print(regex_str) - # return re.compile(image_str.encode()) + # return re.compile(regex_str.encode()) @staticmethod def _create_image_map(json_data: List[Dict[str, Union[str,int]]]) -> Dict[int, List[ImageNode]]: From 39214aaef85debf07c5028934815b07c0a3205a0 Mon Sep 17 00:00:00 2001 From: pchang388 Date: Wed, 22 Nov 2023 06:26:48 +0000 Subject: [PATCH 4/5] more code clean up --- .../archiver/page_archiver.py | 14 ++++++-------- bookstack_file_exporter/archiver/util.py | 16 ---------------- 2 files changed, 6 insertions(+), 24 deletions(-) diff --git a/bookstack_file_exporter/archiver/page_archiver.py b/bookstack_file_exporter/archiver/page_archiver.py index df1d572..e159bbb 100644 --- a/bookstack_file_exporter/archiver/page_archiver.py +++ b/bookstack_file_exporter/archiver/page_archiver.py @@ -3,13 +3,11 @@ # pylint: disable=import-error from requests import Response - from bookstack_file_exporter.exporter.node import Node from bookstack_file_exporter.archiver import util as archiver_util from bookstack_file_exporter.config_helper.config_helper import ConfigNode from bookstack_file_exporter.common import util as common_util - _META_FILE_SUFFIX = "_meta.json" _TAR_SUFFIX = ".tar" _TAR_GZ_SUFFIX = ".tgz" @@ -26,9 +24,7 @@ "tgz": _TAR_GZ_SUFFIX } - _IMAGE_DIR_NAME = "images" -# _MARKDOWN_IMAGE_REGEX= re.compile(r"\[\!\[^$|.*\].*\]") _MARKDOWN_STR_CHECK = "markdown" class ImageNode: @@ -54,13 +50,16 @@ def _get_image_name(self) -> str: @property def image_relative_path(self): + """return image path local to page directory""" return self._image_relative_path @property def markdown_str(self): + """return markdown url str to replace""" return self._markdown_str - + def set_markdown_content(self, img_details: Dict[str, Union[int, str]]): + """provide image metadata to set markdown properties""" self._markdown_str = self._get_md_url_str(img_details) @staticmethod @@ -121,7 +120,6 @@ def 
_archive_page(self, page: Node, export_format: str, data: bytes, image_nodes: List[ImageNode] = None): page_file_name = f"{self.archive_base_path}/" \ f"{page.file_path}/{page.name}{_FILE_EXTENSION_MAP[export_format]}" - # not yet implemented if self.modify_md and export_format == _MARKDOWN_STR_CHECK and image_nodes: data = self._update_image_links(data, image_nodes) self.write_data(page_file_name, data) @@ -176,12 +174,12 @@ def _update_image_links(self, page_data: bytes, image_nodes: List[ImageNode]) -> img_meta_url = f"{self.api_urls['images']}/{img_node.id}" img_details = common_util.http_get_request(img_meta_url, self._headers, self.verify_ssl) - img_node.set_markdown_content(img_details.json()) if not img_node.markdown_str: continue # 1 - what to replace, 2 - replace with, 3 is the data to replace - page_data = re.sub(img_node.markdown_str.encode(), img_node.image_relative_path().encode(), page_data) + page_data = re.sub(img_node.markdown_str.encode(), + img_node.image_relative_path.encode(), page_data) return page_data @property diff --git a/bookstack_file_exporter/archiver/util.py b/bookstack_file_exporter/archiver/util.py index bda4c1a..0980fb6 100644 --- a/bookstack_file_exporter/archiver/util.py +++ b/bookstack_file_exporter/archiver/util.py @@ -27,13 +27,6 @@ def write_tar(base_tar_dir: str, file_path: str, data: bytes): log.debug("Adding file: %s with size: %d bytes to tar file", tar_info.name, tar_info.size) tar.addfile(tar_info, fileobj=data_obj) -# create files first for manipulation/changes and tar later -# def write_file(file_path: str, data: bytes): -# """write byte data to a local file""" -# os.makedirs(os.path.dirname(file_path), exist_ok=True) -# with open(file_path, 'wb') as file_obj: -# file_obj.write(data) - def get_json_bytes(data: Dict[str, Union[str, int]]) -> bytes: """dump dict to json file""" return json.dumps(data, indent=4).encode('utf-8') @@ -55,12 +48,3 @@ def scan_archives(base_dir: str, extension: str) -> str: """scan export 
directory for archives""" file_pattern = f"{base_dir}_*{extension}" return glob.glob(file_pattern) - -# def find_file_matches(file_path: str, regex_expr: re.Pattern) -> List[str]: -# """find all matching lines for regex pattern""" -# matches=[] -# with open(file_path, encoding="utf-8") as open_file: -# for line in open_file: -# for match in re.finditer(regex_expr, line): -# matches.append(match.group) -# return matches From 319d69a45445f2a1d74f9edebaab703cc28e0e17 Mon Sep 17 00:00:00 2001 From: pchang388 Date: Wed, 22 Nov 2023 01:36:41 -0500 Subject: [PATCH 5/5] bump version and prep docs for new release --- Makefile | 2 +- README.md | 8 ++++++-- examples/config.yml | 3 +++ examples/minio_config.yml | 5 ++++- 4 files changed, 14 insertions(+), 4 deletions(-) diff --git a/Makefile b/Makefile index 457f080..5d9e5f0 100644 --- a/Makefile +++ b/Makefile @@ -4,7 +4,7 @@ BASE_IMAGE_TAG=3.12-slim-bookworm IMAGE_NAME=homeylab/bookstack-file-exporter # keep this start sequence unique (IMAGE_TAG=) # github actions will use this to create a tag -IMAGE_TAG=0.0.3 +IMAGE_TAG=1.0.0 DOCKER_WORK_DIR=/export DOCKER_CONFIG_DIR=/export/config DOCKER_EXPORT_DIR=/export/dump diff --git a/README.md b/README.md index e4d77de..874f691 100644 --- a/README.md +++ b/README.md @@ -15,6 +15,7 @@ Table of Contents - [Modify Markdown Files](#modify-markdown-files) - [Object Storage](#object-storage) - [Minio](#minio-backups) +- [Future Items](#future-items) ## Background _Features are actively being developed. See `Future Items` section for more details. 
Open an issue for a feature request._ @@ -76,6 +77,7 @@ formats: output_path: "bkps/" assets: export_images: false + modify_markdown: false export_meta: false verify_ssl: true ``` @@ -196,6 +198,7 @@ formats: output_path: "bkps/" assets: export_images: false + modify_markdown: false export_meta: false verify_ssl: true ``` @@ -228,6 +231,7 @@ minio: output_path: "bkps/" assets: export_images: true + modify_markdown: false export_meta: false verify_ssl: true keep_last: 5 @@ -368,7 +372,7 @@ bookstack_export_2023-11-20_08-00-29/programming/react/basics/images/NzZimage.pn bookstack_export_2023-11-20_08-00-29/programming/react/basics/images/Mymimage.png ``` -> **Note you may see old images in your exports. This is because, by default, Bookstack retains images/drawings that are uploaded even if no longer referenced on an active page. Admins can run `Cleanup Images` in the Maintenance Settings or via [CLI](https://www.bookstackapp.com/docs/admin/commands/#cleanup-unused-images) to remove them. +**Note you may see old images in your exports. This is because, by default, Bookstack retains images/drawings that are uploaded even if no longer referenced on an active page. Admins can run `Cleanup Images` in the Maintenance Settings or via [CLI](https://www.bookstackapp.com/docs/admin/commands/#cleanup-unused-images) to remove them.** ### Modify Markdown Files **To use this feature, `assets.export_images` should be set to `true`** @@ -426,7 +430,7 @@ minio: ## Future Items 1. ~~Be able to pull images locally and place in their respective page folders for a more complete file level backup.~~ 2. ~~Include the exporter in a maintained helm chart as an optional deployment. The helm chart is [here](https://github.com/homeylab/helm-charts/tree/main/charts/bookstack).~~ -3. Be able to modify markdown links of images to local exported images in their respective page folders for a more complete file level backup. +3. 
~~Be able to modify markdown links of images to local exported images in their respective page folders for a more complete file level backup.~~ 4. Be able to pull attachments locally and place in their respective page folders for a more complete file level backup. 5. Export S3 and more options. 6. Filter shelves and books by name - for more targeted backups. Example: you only want to share a book about one topic with an external friend/user. diff --git a/examples/config.yml b/examples/config.yml index c02fb1e..2a12cb1 100644 --- a/examples/config.yml +++ b/examples/config.yml @@ -29,6 +29,9 @@ assets: # optional export of all the images used in a page(s). # omit this or set to false if not needed export_images: false + # optional modify markdown files to replace image url links + # with local exported image paths + modify_markdown: false ## optional export of metadata about the page in a json file # this metadata contains general information about the page # like: last update, owner, revision count, etc. diff --git a/examples/minio_config.yml b/examples/minio_config.yml index 0e83e70..692cacd 100644 --- a/examples/minio_config.yml +++ b/examples/minio_config.yml @@ -58,7 +58,10 @@ assets: # optional export of all the images used in a page(s). # omit this or set to false if not needed export_images: false - ## optional export of metadata about the page in a json file + # optional modify markdown files to replace image url links + # with local exported image paths + modify_markdown: false + # optional export of metadata about the page in a json file # this metadata contains general information about the page # like: last update, owner, revision count, etc. # omit this or set to false if not needed
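
The `modify_markdown` option documented in these config examples swaps each image URL in an exported markdown page for its `./images/<name>` path. The substitution can be sketched on its own roughly as follows. This is a minimal illustration, not the code from the series: the names `extract_md_url` and `replace_image_links` are hypothetical, and it assumes the image details payload carries a markdown string of the form `![alt](url)`, as `_get_md_url_str` in the patch implies. Unlike `_update_image_links`, the sketch passes the URL through `re.escape` before calling `re.sub`, so characters such as `.` or `?` in a URL are matched literally rather than as regex metacharacters.

```python
import re
from typing import List

IMAGE_DIR_NAME = "images"  # mirrors _IMAGE_DIR_NAME in page_archiver.py


def extract_md_url(markdown: str) -> str:
    """Return the text between the first '(' and the next ')', or '' if absent."""
    start = markdown.find("(")
    end = markdown.find(")", start + 1)
    if start == -1 or end == -1:
        return ""
    return markdown[start + 1:end]


def replace_image_links(page_data: bytes, markdown_snippets: List[str]) -> bytes:
    """Replace each image URL found in the snippets with its local export path."""
    for snippet in markdown_snippets:
        url = extract_md_url(snippet)
        if not url:
            continue
        # Local path is relative to the page directory, e.g. ./images/foo.png
        local_path = f"./{IMAGE_DIR_NAME}/{url.split('/')[-1]}".encode()
        # re.escape keeps '.', '?', '+' etc. in the URL from acting as regex syntax;
        # the lambda returns the replacement verbatim (no backslash processing).
        page_data = re.sub(re.escape(url.encode()),
                           lambda _match, path=local_path: path,
                           page_data)
    return page_data


if __name__ == "__main__":
    snippet = ("![pool-topology-1.png](https://demo.bookstack/uploads/images/"
               "gallery/2023-07/scaled-1680-/pool-topology-1.png)")
    page = (b"[" + snippet.encode() +
            b"](https://demo.bookstack/uploads/images/gallery/2023-07/pool-topology-1.png)")
    print(replace_image_links(page, [snippet]).decode())
```

Only the scaled image URL inside `![...](...)` is rewritten; the outer anchor link still points at the Bookstack host, which matches the before/after example in the README.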