From ad38c3b764d2e9cdd91f40f3bb19c409abb140fc Mon Sep 17 00:00:00 2001 From: pchang388 Date: Tue, 23 Jan 2024 05:03:50 -0500 Subject: [PATCH 01/35] update description --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index e118dcb..720651f 100644 --- a/README.md +++ b/README.md @@ -21,7 +21,7 @@ Table of Contents ## Background _If you encounter any issues, want to request an additional feature, or provide assistance, feel free to open a Github issue._ -This tool provides a way to export [Bookstack](https://github.com/BookStackApp/BookStack) pages and their content (_text, images, metadata, etc._) into a relational parent-child layout locally with an option to push to remote object storage locations. See [Backup Behavior](#backup-behavior) section for more details on how pages are organized. +This tool provides a way to export [Bookstack](https://github.com/BookStackApp/BookStack) pages and their content (_text, images, attachments, metadata, etc._) into a relational parent-child layout locally with an option to push to remote object storage locations. See [Backup Behavior](#backup-behavior) section for more details on how pages are organized. This small project was mainly created to run as a cron job in k8s but works anywhere. This tool allows me to export my docs in markdown, or other formats like pdf. I use Bookstack's markdown editor as default instead of WYSIWYG editor and this makes my notes portable anywhere even if offline. 
From 4c02d684129f86882a0b154936c57519c059be46 Mon Sep 17 00:00:00 2001 From: pchang388 Date: Tue, 23 Jan 2024 05:05:40 -0500 Subject: [PATCH 02/35] update description --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index e118dcb..9989b4b 100644 --- a/README.md +++ b/README.md @@ -21,7 +21,7 @@ Table of Contents ## Background _If you encounter any issues, want to request an additional feature, or provide assistance, feel free to open a Github issue._ -This tool provides a way to export [Bookstack](https://github.com/BookStackApp/BookStack) pages and their content (_text, images, metadata, etc._) into a relational parent-child layout locally with an option to push to remote object storage locations. See [Backup Behavior](#backup-behavior) section for more details on how pages are organized. +This tool provides a way to export [Bookstack](https://github.com/BookStackApp/BookStack) pages and their content (_text, images, metadata, etc._) into a relational parent-child layout locally with an option to push to remote object storage locations. See [Backup Behavior](#backup-behavior) section for more details on how pages are organized. Image and attachment links can also be modified in markdown exports to point to local exported paths. This small project was mainly created to run as a cron job in k8s but works anywhere. This tool allows me to export my docs in markdown, or other formats like pdf. I use Bookstack's markdown editor as default instead of WYSIWYG editor and this makes my notes portable anywhere even if offline. 
From 7b9e6c21e82ba10a9e8f415802b77db481a3073e Mon Sep 17 00:00:00 2001 From: pchang388 Date: Tue, 23 Jan 2024 05:07:14 -0500 Subject: [PATCH 03/35] update description --- README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/README.md b/README.md index 9989b4b..498345d 100644 --- a/README.md +++ b/README.md @@ -21,7 +21,7 @@ Table of Contents ## Background _If you encounter any issues, want to request an additional feature, or provide assistance, feel free to open a Github issue._ -This tool provides a way to export [Bookstack](https://github.com/BookStackApp/BookStack) pages and their content (_text, images, metadata, etc._) into a relational parent-child layout locally with an option to push to remote object storage locations. See [Backup Behavior](#backup-behavior) section for more details on how pages are organized. Image and attachment links can also be modified in markdown exports to point to local exported paths. +This tool provides a way to export [Bookstack](https://github.com/BookStackApp/BookStack) pages and their content (_text, images, attachments, metadata, etc._) into a relational parent-child layout locally with an option to push to remote object storage locations. See [Backup Behavior](#backup-behavior) section for more details on how pages are organized. Image and attachment links can also be modified in markdown exports to point to local exported paths. This small project was mainly created to run as a cron job in k8s but works anywhere. This tool allows me to export my docs in markdown, or other formats like pdf. I use Bookstack's markdown editor as default instead of WYSIWYG editor and this makes my notes portable anywhere even if offline. 
From 80749cbd3fc6959e2203d68ab26e00602144b7c0 Mon Sep 17 00:00:00 2001 From: pchang388 Date: Wed, 31 Jan 2024 02:36:54 -0500 Subject: [PATCH 04/35] update readme for suggestion by @intrepidsilence --- README.md | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/README.md b/README.md index 498345d..5ce6970 100644 --- a/README.md +++ b/README.md @@ -7,7 +7,7 @@ Table of Contents - [Using This Application](#using-this-application) - [Run via Pip](#run-via-pip) - [Run via Docker](#run-via-docker) - - [Authentication](#authentication) + - [Authentication and Permissions](#authentication-and-permissions) - [Configuration](#configuration) - [Backup Behavior](#backup-behavior) - [General](#general) @@ -174,9 +174,16 @@ docker run \ | `config` | `/export/config/config.yml` | A valid configuration file |`-v /local/yourpath/config.yml:/export/config/config.yml:ro`| | `dump` | `/export/dump` | Directory to place exports. **This is optional when using remote storage option(s)**. Omit if you don't need a local copy. | `-v /local/yourpath/bkps:/export/dump` | -### Authentication -**Note visibility of pages is based on user**, so use a user that has access to pages you want to back up. +### Authentication and Permissions +#### Permissions +**Note visibility of pages is based on user**, so use a user that has read access to pages and content you want to back up. 
*The role assigned to the user* should have the additional permissions for target pages and their content: +- `read` for all images and attachments + - For most users this may already be set - may be required to be set depending on storage option used +- `Export Content` (This can be found in `Edit Role --> System Permissions`) + - For most users this may already be set - may be required to be set if using custom roles + - If not set, you may see page contents showing as an HTML login page, as reported in this [issue](https://github.com/homeylab/bookstack-file-exporter/issues/35) +#### Token Authentication Ref: [https://demo.bookstackapp.com/api/docs#authentication](https://demo.bookstackapp.com/api/docs#authentication) Provide a tokenId and a tokenSecret as environment variables or directly in the configuration file. @@ -188,7 +195,7 @@ Env variables for credentials will take precedence over configuration file optio **For object storage authentication**, find the relevant sections further down in their respective sections. ### Configuration -_Ensure [Authentication](#authentication) has been set up beforehand for required credentials._ For a simple example to run quickly, refer to the one in the [Using This Application](#using-this-application) section. A full example is also shown below with descriptions. Optionally, look at `examples/` folder of the github repo for more examples. +_Ensure [Authentication](#authentication-and-permissions) has been set up beforehand for required credentials._ For a simple example to run quickly, refer to the one in the [Using This Application](#using-this-application) section. A full example is also shown below with descriptions. Optionally, look at `examples/` folder of the github repo for more examples. 
For object storage configuration, find more information in their respective sections - [Minio](#minio-backups) From 7dc399aa92d84aab7e4b037468bffd36d441af1c Mon Sep 17 00:00:00 2001 From: pchang388 Date: Fri, 5 Jul 2024 04:52:15 -0400 Subject: [PATCH 05/35] bump python version to 3.12.4, update pkg dependencies to latest --- .devcontainer/devcontainer.json | 5 +++-- Dockerfile | 2 +- Makefile | 2 +- setup.cfg | 6 +++--- 4 files changed, 8 insertions(+), 7 deletions(-) diff --git a/.devcontainer/devcontainer.json b/.devcontainer/devcontainer.json index 16d03db..82d6346 100644 --- a/.devcontainer/devcontainer.json +++ b/.devcontainer/devcontainer.json @@ -7,7 +7,7 @@ "features": { "ghcr.io/devcontainers/features/python:1": { "installTools": true, - "version": "3.12.1" + "version": "3.12.4" } }, "customizations": { @@ -15,7 +15,8 @@ "extensions": [ "streetsidesoftware.code-spell-checker", "ms-python.python", - "redhat.vscode-yaml" + "redhat.vscode-yaml", + "GitHub.copilot" ] } } diff --git a/Dockerfile b/Dockerfile index c368f25..c493943 100644 --- a/Dockerfile +++ b/Dockerfile @@ -1,5 +1,5 @@ ARG BASE_IMAGE=python -ARG BASE_IMAGE_TAG=3.12.1-slim-bookworm +ARG BASE_IMAGE_TAG=3.12.4-slim-bookworm FROM ${BASE_IMAGE}:${BASE_IMAGE_TAG} diff --git a/Makefile b/Makefile index 4eebcfd..93ae354 100644 --- a/Makefile +++ b/Makefile @@ -1,6 +1,6 @@ ## DOCKER BUILD VARS BASE_IMAGE=python -BASE_IMAGE_TAG=3.12-slim-bookworm +BASE_IMAGE_TAG=3.12.4-slim-bookworm IMAGE_NAME=homeylab/bookstack-file-exporter # keep this start sequence unique (IMAGE_TAG=) # github actions will use this to create a tag diff --git a/setup.cfg b/setup.cfg index dffaf0c..ef0c2bf 100644 --- a/setup.cfg +++ b/setup.cfg @@ -18,9 +18,9 @@ classifiers = python_requires = >=3.8 install_requires = Pyyaml >= 6.0.1 # https://pypi.org/project/PyYAML/ - Pydantic >= 2.5.3 # https://docs.pydantic.dev/latest/ - requests >= 2.31.0 # https://pypi.org/project/requests/ - minio >= 7.2.3 # https://pypi.org/project/minio/ + 
Pydantic >= 2.8.2 # https://docs.pydantic.dev/latest/ + requests >= 2.32.3 # https://pypi.org/project/requests/ + minio >= 7.2.7 # https://pypi.org/project/minio/ packages = find: [options.entry_points] From bee6f9511e643e55e5558c8ce49d5a5fbaabcced Mon Sep 17 00:00:00 2001 From: pchang388 Date: Sat, 6 Jul 2024 02:54:44 -0400 Subject: [PATCH 06/35] bump py version for actions --- .github/actions/python/action.yml | 2 +- .github/actions/tests/action.yml | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/.github/actions/python/action.yml b/.github/actions/python/action.yml index afef031..677c051 100644 --- a/.github/actions/python/action.yml +++ b/.github/actions/python/action.yml @@ -23,7 +23,7 @@ runs: - name: Set up Python uses: actions/setup-python@v3 with: - python-version: '3.12.1' + python-version: '3.12.4' - name: Install Dependencies shell: bash run: | diff --git a/.github/actions/tests/action.yml b/.github/actions/tests/action.yml index 3606371..3b7ed0c 100644 --- a/.github/actions/tests/action.yml +++ b/.github/actions/tests/action.yml @@ -8,7 +8,7 @@ runs: - name: Set up Python uses: actions/setup-python@v3 with: - python-version: '3.12.1' + python-version: '3.12.4' - name: Install dependencies shell: bash run: | From b68a9598f318f75d0043cc15a37a9188af1ba471 Mon Sep 17 00:00:00 2001 From: pchang388 Date: Tue, 29 Oct 2024 21:12:56 -0400 Subject: [PATCH 07/35] bump dep versions --- setup.cfg | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/setup.cfg b/setup.cfg index ef0c2bf..2cc3992 100644 --- a/setup.cfg +++ b/setup.cfg @@ -17,10 +17,10 @@ classifiers = [options] python_requires = >=3.8 install_requires = - Pyyaml >= 6.0.1 # https://pypi.org/project/PyYAML/ - Pydantic >= 2.8.2 # https://docs.pydantic.dev/latest/ + Pyyaml >= 6.0.2 # https://pypi.org/project/PyYAML/ + Pydantic >= 2.9.2 # https://docs.pydantic.dev/latest/ requests >= 2.32.3 # https://pypi.org/project/requests/ - minio >= 7.2.7 # 
https://pypi.org/project/minio/ + minio >= 7.2.10 # https://pypi.org/project/minio/ packages = find: [options.entry_points] From 92af2826fef0f007866e2378691d936f87ebc6fc Mon Sep 17 00:00:00 2001 From: pchang388 Date: Thu, 31 Oct 2024 00:46:49 -0400 Subject: [PATCH 08/35] attempt to add logic to skip an image or attachment export if their API call fails --- .../archiver/asset_archiver.py | 7 ++-- .../archiver/page_archiver.py | 35 ++++++++++++++----- bookstack_file_exporter/exporter/node.py | 1 + bookstack_file_exporter/run.py | 1 + 4 files changed, 33 insertions(+), 11 deletions(-) diff --git a/bookstack_file_exporter/archiver/asset_archiver.py b/bookstack_file_exporter/archiver/asset_archiver.py index 71f2583..dbea2bf 100644 --- a/bookstack_file_exporter/archiver/asset_archiver.py +++ b/bookstack_file_exporter/archiver/asset_archiver.py @@ -24,7 +24,7 @@ class AssetNode: AssetNode instance for use in other classes """ def __init__(self, meta_data: Dict[str, int | str | bool]): - self.id: int = meta_data['id'] + self.id_: int = meta_data['id'] self.page_id: int = meta_data['uploaded_to'] self.url: str = "" self.name: str = "" @@ -88,7 +88,7 @@ class AttachmentNode(AssetNode): def __init__(self, meta_data: Dict[str, Union[int, str, bool]], base_url: str): super().__init__(meta_data) - self.url: str = f"{base_url}/{self.id}" + self.url: str = f"{base_url}/{self.id_}" self.name = meta_data['name'] log.debug("Attachment node has generated url: %s", self.url) self._relative_path_prefix = f"{_ATTACHMENT_DIR_NAME}" @@ -140,7 +140,7 @@ def get_asset_nodes(self, asset_type: str) -> Dict[str, ImageNode | AttachmentNo def get_asset_data(self, asset_type: str, meta_data: Union[AttachmentNode, ImageNode]) -> Dict[str, str | bool | int | dict]: """Get asset data based on type""" - data_url = f"{self.api_urls[asset_type]}/{meta_data.id}" + data_url = f"{self.api_urls[asset_type]}/{meta_data.id_}" asset_data_response: Response = common_util.http_get_request( data_url, 
self._headers, @@ -164,6 +164,7 @@ def update_asset_links(self, asset_type, page_name: str, page_data: bytes, asset_nodes: List[ImageNode | AttachmentNode]) -> bytes: """update markdown links in page data""" for asset_node in asset_nodes: + # get metadata instead of raw data/bytes asset_data = self.get_asset_data(asset_type, asset_node) asset_node.set_markdown_content(asset_data) if not asset_node.markdown_str: diff --git a/bookstack_file_exporter/archiver/page_archiver.py b/bookstack_file_exporter/archiver/page_archiver.py index 4398e27..a1106b3 100644 --- a/bookstack_file_exporter/archiver/page_archiver.py +++ b/bookstack_file_exporter/archiver/page_archiver.py @@ -1,10 +1,12 @@ from typing import Union, List, Dict - +import logging from bookstack_file_exporter.exporter.node import Node from bookstack_file_exporter.archiver import util as archiver_util from bookstack_file_exporter.archiver.asset_archiver import AssetArchiver, ImageNode, AttachmentNode from bookstack_file_exporter.config_helper.config_helper import ConfigNode +log = logging.getLogger(__name__) + _META_FILE_SUFFIX = "_meta.json" _TAR_SUFFIX = ".tar" _TAR_GZ_SUFFIX = ".tgz" @@ -70,6 +72,16 @@ def archive_pages(self, page_nodes: Dict[int, Node]): page_images = image_nodes[page.id_] if page.id_ in attachment_nodes: page_attachments = attachment_nodes[page.id_] + failed_images = self.archive_page_assets("images", page.parent.file_path, + page.name, page_images) + failed_attach = self.archive_page_assets("attachments", page.parent.file_path, + page.name, page_attachments) + if failed_images: + # exclude from page_images so it doesn't attempt to get modified in markdown file + page_images = [img for img in page_images if img.id_ not in failed_images] + if failed_attach: + # exclude from page_attachments so it doesn't attempt to get modified in markdown file + page_attachments = [attach for attach in page_attachments if attach.id_ not in failed_attach] for export_format in self.export_formats: page_data = 
self._get_page_data(page.id_, export_format) if page_images and export_format == 'markdown': @@ -80,10 +92,6 @@ def archive_pages(self, page_nodes: Dict[int, Node]): page_data, page_attachments) self._archive_page(page, export_format, page_data) - self.archive_page_assets("images", page.parent.file_path, - page.name, page_images) - self.archive_page_assets("attachments", page.parent.file_path, - page.name, page_attachments) if self.asset_config.export_meta: self._archive_page_meta(page.file_path, page.meta) @@ -123,15 +131,26 @@ def _modify_markdown(self, asset_type: str, asset_nodes) def archive_page_assets(self, asset_type: str, parent_path: str, page_name: str, - asset_nodes: List[ImageNode | AttachmentNode]): + asset_nodes: List[ImageNode | AttachmentNode]) -> Dict[int, int]: """pull images locally into a directory based on page""" if not asset_nodes: - return + return {} + # use a map for faster lookup + failed_assets = {} node_base_path = f"{self.archive_base_path}/{parent_path}/" for asset_node in asset_nodes: - asset_data = self.asset_archiver.get_asset_bytes(asset_type, asset_node.url) + try: + asset_data = self.asset_archiver.get_asset_bytes(asset_type, asset_node.url) + except: + # probably unnecessary, but just in case + if asset_node.id_ not in failed_assets: + failed_assets[asset_node.id_] = 0 + # a 404 or other error occurred, skip this asset, already logged in http request exception + log.error(f"Failed to get image or attachment data for asset located at: {asset_node.url} - skipping") + continue asset_path = f"{node_base_path}/{asset_node.get_relative_path(page_name)}" self.write_data(asset_path, asset_data) + return failed_assets def write_data(self, file_path: str, data: bytes): """write data to a tar file diff --git a/bookstack_file_exporter/exporter/node.py b/bookstack_file_exporter/exporter/node.py index 6fa5a86..5abe41d 100644 --- a/bookstack_file_exporter/exporter/node.py +++ b/bookstack_file_exporter/exporter/node.py @@ -38,6 +38,7 @@ def 
__init__(self, meta: Dict[str, Union[str, int]], # for convenience/usage for exporter # self.name: str = self.meta['slug'] self.name = self.get_name(self.meta['slug'], self.meta['name']) + # id() is a built-in function and should not be used as a variable name self.id_: int = self.meta['id'] self._display_name = self.meta['name'] # children diff --git a/bookstack_file_exporter/run.py b/bookstack_file_exporter/run.py index 2ad787c..20f586d 100644 --- a/bookstack_file_exporter/run.py +++ b/bookstack_file_exporter/run.py @@ -53,4 +53,5 @@ def exporter(args: argparse.Namespace): # clean up the .tgz archive since it is already uploaded archive.clean_up() + log.info(f"Created file archive: {archive.archive_dir}.tgz") log.info("Completed run") From 43700558ae75a764289497db35f7627b73cbf771 Mon Sep 17 00:00:00 2001 From: pchang388 Date: Thu, 31 Oct 2024 01:04:47 -0400 Subject: [PATCH 09/35] update base image to use python 3.13.0 --- Dockerfile | 2 +- Makefile | 12 +++++++++++- 2 files changed, 12 insertions(+), 2 deletions(-) diff --git a/Dockerfile b/Dockerfile index c493943..fd258fa 100644 --- a/Dockerfile +++ b/Dockerfile @@ -1,5 +1,5 @@ ARG BASE_IMAGE=python -ARG BASE_IMAGE_TAG=3.12.4-slim-bookworm +ARG BASE_IMAGE_TAG=3.13.0-slim-bookworm FROM ${BASE_IMAGE}:${BASE_IMAGE_TAG} diff --git a/Makefile b/Makefile index 93ae354..8957bef 100644 --- a/Makefile +++ b/Makefile @@ -1,6 +1,6 @@ ## DOCKER BUILD VARS BASE_IMAGE=python -BASE_IMAGE_TAG=3.12.4-slim-bookworm +BASE_IMAGE_TAG=3.13.0-slim-bookworm IMAGE_NAME=homeylab/bookstack-file-exporter # keep this start sequence unique (IMAGE_TAG=) # github actions will use this to create a tag @@ -27,6 +27,16 @@ upload_testpypi: download_testpypi: python -m pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple bookstack-file-exporter +docker_build_simple: + docker build \ + --build-arg BASE_IMAGE=${BASE_IMAGE} \ + --build-arg BASE_IMAGE_TAG=${BASE_IMAGE_TAG} \ + --build-arg 
DOCKER_WORK_DIR=${DOCKER_WORK_DIR} \ + --build-arg DOCKER_CONFIG_DIR=${DOCKER_CONFIG_DIR} \ + --build-arg DOCKER_EXPORT_DIR=${DOCKER_EXPORT_DIR} \ + -t ${IMAGE_NAME}:${IMAGE_TAG} \ + --no-cache . + docker_build: docker buildx build \ --platform linux/amd64,linux/arm64 \ From cd553b7bf8d43b2bb311484468282ee05764d6ac Mon Sep 17 00:00:00 2001 From: pchang388 Date: Thu, 31 Oct 2024 01:05:51 -0400 Subject: [PATCH 10/35] bump test python versions to match --- .github/actions/python/action.yml | 2 +- .github/actions/tests/action.yml | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/.github/actions/python/action.yml b/.github/actions/python/action.yml index 677c051..e101b14 100644 --- a/.github/actions/python/action.yml +++ b/.github/actions/python/action.yml @@ -23,7 +23,7 @@ runs: - name: Set up Python uses: actions/setup-python@v3 with: - python-version: '3.12.4' + python-version: '3.13.0' - name: Install Dependencies shell: bash run: | diff --git a/.github/actions/tests/action.yml b/.github/actions/tests/action.yml index 3b7ed0c..97805a3 100644 --- a/.github/actions/tests/action.yml +++ b/.github/actions/tests/action.yml @@ -8,7 +8,7 @@ runs: - name: Set up Python uses: actions/setup-python@v3 with: - python-version: '3.12.4' + python-version: '3.13.0' - name: Install dependencies shell: bash run: | From 82fae8cfa588009666e037c8ca39235b1360bd11 Mon Sep 17 00:00:00 2001 From: pchang388 Date: Thu, 31 Oct 2024 01:32:15 -0400 Subject: [PATCH 11/35] pylint adjustments --- .devcontainer/devcontainer.json | 2 +- Makefile | 3 +++ .../archiver/page_archiver.py | 19 +++++++++++++------ bookstack_file_exporter/run.py | 2 +- 4 files changed, 18 insertions(+), 8 deletions(-) diff --git a/.devcontainer/devcontainer.json b/.devcontainer/devcontainer.json index 82d6346..14b78b0 100644 --- a/.devcontainer/devcontainer.json +++ b/.devcontainer/devcontainer.json @@ -7,7 +7,7 @@ "features": { "ghcr.io/devcontainers/features/python:1": { "installTools": true, - 
"version": "3.12.4" + "version": "3.13.0" } }, "customizations": { diff --git a/Makefile b/Makefile index 8957bef..3eef2ba 100644 --- a/Makefile +++ b/Makefile @@ -19,6 +19,9 @@ build: python -m pip install --upgrade build python -m build +lint: + pylint bookstack_file_exporter + upload_testpypi: python -m pip install --upgrade twine python -m twine upload --repository testpypi dist/* diff --git a/bookstack_file_exporter/archiver/page_archiver.py b/bookstack_file_exporter/archiver/page_archiver.py index a1106b3..84ba201 100644 --- a/bookstack_file_exporter/archiver/page_archiver.py +++ b/bookstack_file_exporter/archiver/page_archiver.py @@ -1,5 +1,7 @@ from typing import Union, List, Dict import logging +# pylint: disable=import-error +from requests.exceptions import HTTPError from bookstack_file_exporter.exporter.node import Node from bookstack_file_exporter.archiver import util as archiver_util from bookstack_file_exporter.archiver.asset_archiver import AssetArchiver, ImageNode, AttachmentNode @@ -76,12 +78,15 @@ def archive_pages(self, page_nodes: Dict[int, Node]): page.name, page_images) failed_attach = self.archive_page_assets("attachments", page.parent.file_path, page.name, page_attachments) + # exclude from page_images + # so it doesn't attempt to get modified in markdown file if failed_images: - # exclude from page_images so it doesn't attempt to get modified in markdown file page_images = [img for img in page_images if img.id_ not in failed_images] + # exclude from page_attachments + # so it doesn't attempt to get modified in markdown file if failed_attach: - # exclude from page_attachments so it doesn't attempt to get modified in markdown file - page_attachments = [attach for attach in page_attachments if attach.id_ not in failed_attach] + page_attachments = [attach for attach in page_attachments + if attach.id_ not in failed_attach] for export_format in self.export_formats: page_data = self._get_page_data(page.id_, export_format) if page_images and 
export_format == 'markdown': @@ -141,12 +146,14 @@ def archive_page_assets(self, asset_type: str, parent_path: str, page_name: str, for asset_node in asset_nodes: try: asset_data = self.asset_archiver.get_asset_bytes(asset_type, asset_node.url) - except: + except HTTPError: # probably unnecessary, but just in case if asset_node.id_ not in failed_assets: failed_assets[asset_node.id_] = 0 - # a 404 or other error occurred, skip this asset, already logged in http request exception - log.error(f"Failed to get image or attachment data for asset located at: {asset_node.url} - skipping") + # a 404 or other error occurred + # skip this asset + log.error("Failed to get image or attachment data " \ + "for asset located at: %s - skipping", asset_node.url) continue asset_path = f"{node_base_path}/{asset_node.get_relative_path(page_name)}" self.write_data(asset_path, asset_data) diff --git a/bookstack_file_exporter/run.py b/bookstack_file_exporter/run.py index 20f586d..f63a215 100644 --- a/bookstack_file_exporter/run.py +++ b/bookstack_file_exporter/run.py @@ -53,5 +53,5 @@ def exporter(args: argparse.Namespace): # clean up the .tgz archive since it is already uploaded archive.clean_up() - log.info(f"Created file archive: {archive.archive_dir}.tgz") + log.info("Created file archive: %s.tgz", archive.archive_dir) log.info("Completed run") From 6c7515ba637361b6d2fb89d0d4efbca6398c85b8 Mon Sep 17 00:00:00 2001 From: pchang388 Date: Wed, 6 Nov 2024 04:34:19 -0500 Subject: [PATCH 12/35] update readme for asset skip feature --- README.md | 6 +++++- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 5ce6970..732e6ff 100644 --- a/README.md +++ b/README.md @@ -87,7 +87,7 @@ assets: The exporter can be installed via pip and run directly. #### Python Version -_Note: This application is tested and developed on Python version `3.12.X`. 
The min required version is >= `3.8` but is recommended to install (or set up a venv) a `3.12.X` version._ +_Note: This application is tested and developed on Python version `3.13.X`. The min required version is >= `3.8` but it is recommended to install (or set up a venv with) a `3.13.X` version._ #### Examples ```bash @@ -386,6 +386,8 @@ bookstack_export_2023-11-28_06-24-25/programming/react/images/nextjs/tips.png **Note you may see old images in your exports. This is because, by default, Bookstack retains images/drawings that are uploaded even if no longer referenced on an active page. Admins can run `Cleanup Images` in the Maintenance Settings or via [CLI](https://www.bookstackapp.com/docs/admin/commands/#cleanup-unused-images) to remove them.** +If an API call to get an image or its metadata fails, the exporter will skip the image and log the error. If using the `modify_markdown` option, the image links in the document will be left untouched in their original form. All API calls are retried 3 times after initial failure. + ### Attachments Attachments will be dumped in a separate directory, `attachments` within the page parent (book/chapter) directory it belongs to. The relative path will be `{parent}/attachments/{page}/{attachment_name}`. As shown earlier: @@ -401,6 +403,8 @@ bookstack_export_2023-11-28_06-24-25/programming/react/attachments/nextjs/sample [Reference](https://demo.bookstackapp.com/api/docs#attachments-list) and excerpt from Bookstack API docs: > Get a listing of attachments visible to the user. The external property indicates whether the attachment is simple a link. A false value for the external property would indicate a file upload. +If an API call to get an attachment or its metadata fails, the exporter will skip the attachment and log the error. If using the `modify_markdown` option, the attachment links in the document will be left untouched in their original form. All API calls are retried 3 times after initial failure. 
+ ### Modify Markdown Files **To use this feature, `assets.export_images` should be set to `true` and/or `assets.export_attachments`** From 64a8717bac9d807411e529b46948e4f259f98c13 Mon Sep 17 00:00:00 2001 From: pchang388 Date: Wed, 6 Nov 2024 04:59:35 -0500 Subject: [PATCH 13/35] create user provided output_path directory if not exists --- README.md | 2 +- bookstack_file_exporter/archiver/archiver.py | 9 +++++++++ bookstack_file_exporter/archiver/util.py | 5 +++++ bookstack_file_exporter/run.py | 3 +++ 4 files changed, 18 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index 732e6ff..76b2192 100644 --- a/README.md +++ b/README.md @@ -248,7 +248,7 @@ More descriptions can be found for each section below: | `credentials.token_secret` | `str` | `true` if `credentials`| If `credentials` section is given, this should be a valid tokenSecret | | `additional_headers` | `object` | `false` | Optional section where key/value for pairs can be specified to use in Bookstack http request headers. | `formats` | `list` | `true` | Which export formats to use for Bookstack page content. Valid options are: `["markdown", "html", "pdf", "plaintext"]`| -| `output_path` | `str` | `false` | Optional (default: `cwd`) which directory (relative or full path) to place exports. User who runs the command should have access to read/write to this directory. If not provided, will use current run directory by default | +| `output_path` | `str` | `false` | Optional (default: `cwd`) which directory (relative or full path) to place exports. User who runs the command should have access to read/write to this directory. This directory and any parent directories will be attempted to be created if they do not exist. If not provided, will use current run directory by default.| | `assets` | `object` | `false` | Optional section to export additional assets from pages. 
| | `assets.export_images` | `bool` | `false` | Optional (default: `false`), export all images for a page to an `image` directory within page directory. See [Backup Behavior](#backup-behavior) for more information on layout | | `assets.export_attachments` | `bool` | `false` | Optional (default: `false`), export all attachments for a page to an `attachments` directory within page directory. See [Backup Behavior](#backup-behavior) for more information on layout | diff --git a/bookstack_file_exporter/archiver/archiver.py b/bookstack_file_exporter/archiver/archiver.py index 6af2024..7d64110 100644 --- a/bookstack_file_exporter/archiver/archiver.py +++ b/bookstack_file_exporter/archiver/archiver.py @@ -35,6 +35,15 @@ def __init__(self, config: ConfigNode): self._page_archiver = PageArchiver(self.archive_dir, self.config) self._remote_exports = {'minio': self._archive_minio, 's3': self._archive_s3} + def create_export_dir(self): + """create directory for archiving""" + if not self.config.user_inputs.output_path: + log.info("No output path specified, using current directory for archive") + return + log.info("Creating base directory for archive: %s", + self.config.user_inputs.output_path) + util.create_dir(self.config.user_inputs.output_path) + def get_bookstack_exports(self, page_nodes: Dict[int, Node]): """export all page content""" log.info("Exporting all bookstack page contents") diff --git a/bookstack_file_exporter/archiver/util.py b/bookstack_file_exporter/archiver/util.py index 0980fb6..e6bcb8f 100644 --- a/bookstack_file_exporter/archiver/util.py +++ b/bookstack_file_exporter/archiver/util.py @@ -7,6 +7,7 @@ from io import BytesIO import gzip import glob +from pathlib import Path from bookstack_file_exporter.common import util @@ -48,3 +49,7 @@ def scan_archives(base_dir: str, extension: str) -> str: """scan export directory for archives""" file_pattern = f"{base_dir}_*{extension}" return glob.glob(file_pattern) + +def create_dir(dir_path: str): + """create a 
directory if not exists""" + Path(dir_path).mkdir(parents=True, exist_ok=True) diff --git a/bookstack_file_exporter/run.py b/bookstack_file_exporter/run.py index f63a215..b69a35b 100644 --- a/bookstack_file_exporter/run.py +++ b/bookstack_file_exporter/run.py @@ -41,6 +41,9 @@ def exporter(args: argparse.Namespace): ## start archive ## archive: Archiver = Archiver(config) + # create export directory if not exists + archive.create_export_dir() + # get all page content for each page archive.get_bookstack_exports(page_nodes) From e93eb309ba6e9789333f19a2f45834743d5886a2 Mon Sep 17 00:00:00 2001 From: pchang388 Date: Wed, 6 Nov 2024 05:38:26 -0500 Subject: [PATCH 14/35] make create output dir error catchable for docker usage --- bookstack_file_exporter/archiver/archiver.py | 9 ++++++++- bookstack_file_exporter/archiver/page_archiver.py | 2 +- 2 files changed, 9 insertions(+), 2 deletions(-) diff --git a/bookstack_file_exporter/archiver/archiver.py b/bookstack_file_exporter/archiver/archiver.py index 7d64110..706ed56 100644 --- a/bookstack_file_exporter/archiver/archiver.py +++ b/bookstack_file_exporter/archiver/archiver.py @@ -42,7 +42,14 @@ def create_export_dir(self): return log.info("Creating base directory for archive: %s", self.config.user_inputs.output_path) - util.create_dir(self.config.user_inputs.output_path) + # in docker, this may fail if the user id is not the same as the host + try: + util.create_dir(self.config.user_inputs.output_path) + except PermissionError as perm_err: + log.warning("Failed to create base directory: %s", perm_err) + log.warning("This usually occurs in docker environments, \ + attempting to skip this step") + return def get_bookstack_exports(self, page_nodes: Dict[int, Node]): """export all page content""" diff --git a/bookstack_file_exporter/archiver/page_archiver.py b/bookstack_file_exporter/archiver/page_archiver.py index 84ba201..fba5047 100644 --- a/bookstack_file_exporter/archiver/page_archiver.py +++ 
b/bookstack_file_exporter/archiver/page_archiver.py @@ -142,7 +142,7 @@ def archive_page_assets(self, asset_type: str, parent_path: str, page_name: str, return {} # use a map for faster lookup failed_assets = {} - node_base_path = f"{self.archive_base_path}/{parent_path}/" + node_base_path = f"{self.archive_base_path}/{parent_path}" for asset_node in asset_nodes: try: asset_data = self.asset_archiver.get_asset_bytes(asset_type, asset_node.url) From 12929e3dcdd23152f9cc7e5db42b8029da0dfa8c Mon Sep 17 00:00:00 2001 From: pchang388 Date: Wed, 6 Nov 2024 05:51:01 -0500 Subject: [PATCH 15/35] minor fix on logging and output description --- README.md | 2 +- bookstack_file_exporter/archiver/archiver.py | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index 76b2192..a7487cf 100644 --- a/README.md +++ b/README.md @@ -248,7 +248,7 @@ More descriptions can be found for each section below: | `credentials.token_secret` | `str` | `true` if `credentials`| If `credentials` section is given, this should be a valid tokenSecret | | `additional_headers` | `object` | `false` | Optional section where key/value for pairs can be specified to use in Bookstack http request headers. | `formats` | `list` | `true` | Which export formats to use for Bookstack page content. Valid options are: `["markdown", "html", "pdf", "plaintext"]`| -| `output_path` | `str` | `false` | Optional (default: `cwd`) which directory (relative or full path) to place exports. User who runs the command should have access to read/write to this directory. This directory and any parent directories will be attempted to be created if they do not exist. If not provided, will use current run directory by default.| +| `output_path` | `str` | `false` | Optional (default: `cwd`) which directory (relative or full path) to place exports. User who runs the command should have access to read/write to this directory. 
This directory and any parent directories will be attempted to be created if they do not exist. If not provided, will use current run directory by default. If using docker, this option can be omitted. | | `assets` | `object` | `false` | Optional section to export additional assets from pages. | | `assets.export_images` | `bool` | `false` | Optional (default: `false`), export all images for a page to an `image` directory within page directory. See [Backup Behavior](#backup-behavior) for more information on layout | | `assets.export_attachments` | `bool` | `false` | Optional (default: `false`), export all attachments for a page to an `attachments` directory within page directory. See [Backup Behavior](#backup-behavior) for more information on layout | diff --git a/bookstack_file_exporter/archiver/archiver.py b/bookstack_file_exporter/archiver/archiver.py index 706ed56..096beba 100644 --- a/bookstack_file_exporter/archiver/archiver.py +++ b/bookstack_file_exporter/archiver/archiver.py @@ -47,8 +47,8 @@ def create_export_dir(self): util.create_dir(self.config.user_inputs.output_path) except PermissionError as perm_err: log.warning("Failed to create base directory: %s", perm_err) - log.warning("This usually occurs in docker environments, \ - attempting to skip this step") + log.warning("This usually occurs in docker environments" \ + "attempting to skip this step") return def get_bookstack_exports(self, page_nodes: Dict[int, Node]): From 6f0c843523822c8a7115ac5e4aa366d1d455d464 Mon Sep 17 00:00:00 2001 From: pchang388 Date: Wed, 6 Nov 2024 06:00:30 -0500 Subject: [PATCH 16/35] fix minor spacing issue on log message --- bookstack_file_exporter/archiver/archiver.py | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/bookstack_file_exporter/archiver/archiver.py b/bookstack_file_exporter/archiver/archiver.py index 096beba..969ee14 100644 --- a/bookstack_file_exporter/archiver/archiver.py +++ b/bookstack_file_exporter/archiver/archiver.py @@ -47,7 +47,7 @@ def 
create_export_dir(self): util.create_dir(self.config.user_inputs.output_path) except PermissionError as perm_err: log.warning("Failed to create base directory: %s", perm_err) - log.warning("This usually occurs in docker environments" \ + log.warning("This usually occurs in docker environments " \ "attempting to skip this step") return From 278bcd25b694ebd9ee00d5dc714c22113d10e353 Mon Sep 17 00:00:00 2001 From: pchang388 Date: Fri, 14 Feb 2025 02:21:19 -0500 Subject: [PATCH 17/35] bump python version to 3.13.2 and pkg dep to latest --- .devcontainer/devcontainer.json | 2 +- .github/actions/python/action.yml | 2 +- .github/actions/tests/action.yml | 2 +- Dockerfile | 2 +- Makefile | 2 +- README.md | 2 +- setup.cfg | 4 ++-- 7 files changed, 8 insertions(+), 8 deletions(-) diff --git a/.devcontainer/devcontainer.json b/.devcontainer/devcontainer.json index 14b78b0..8036bea 100644 --- a/.devcontainer/devcontainer.json +++ b/.devcontainer/devcontainer.json @@ -7,7 +7,7 @@ "features": { "ghcr.io/devcontainers/features/python:1": { "installTools": true, - "version": "3.13.0" + "version": "3.13.2" } }, "customizations": { diff --git a/.github/actions/python/action.yml b/.github/actions/python/action.yml index e101b14..774aa44 100644 --- a/.github/actions/python/action.yml +++ b/.github/actions/python/action.yml @@ -23,7 +23,7 @@ runs: - name: Set up Python uses: actions/setup-python@v3 with: - python-version: '3.13.0' + python-version: '3.13.2' - name: Install Dependencies shell: bash run: | diff --git a/.github/actions/tests/action.yml b/.github/actions/tests/action.yml index 97805a3..d267fca 100644 --- a/.github/actions/tests/action.yml +++ b/.github/actions/tests/action.yml @@ -8,7 +8,7 @@ runs: - name: Set up Python uses: actions/setup-python@v3 with: - python-version: '3.13.0' + python-version: '3.13.2' - name: Install dependencies shell: bash run: | diff --git a/Dockerfile b/Dockerfile index fd258fa..695426b 100644 --- a/Dockerfile +++ b/Dockerfile @@ -1,5 +1,5 @@ ARG 
BASE_IMAGE=python -ARG BASE_IMAGE_TAG=3.13.0-slim-bookworm +ARG BASE_IMAGE_TAG=3.13.2-slim-bookworm FROM ${BASE_IMAGE}:${BASE_IMAGE_TAG} diff --git a/Makefile b/Makefile index 3eef2ba..0ea5a7a 100644 --- a/Makefile +++ b/Makefile @@ -1,6 +1,6 @@ ## DOCKER BUILD VARS BASE_IMAGE=python -BASE_IMAGE_TAG=3.13.0-slim-bookworm +BASE_IMAGE_TAG=3.13.2-slim-bookworm IMAGE_NAME=homeylab/bookstack-file-exporter # keep this start sequence unique (IMAGE_TAG=) # github actions will use this to create a tag diff --git a/README.md b/README.md index a7487cf..5ecb969 100644 --- a/README.md +++ b/README.md @@ -87,7 +87,7 @@ assets: The exporter can be installed via pip and run directly. #### Python Version -_Note: This application is tested and developed on Python version `3.13.X`. The min required version is >= `3.8` but is recommended to install (or set up a venv) a `3.13.X` version._ +_Note: This application is tested and developed on Python version `3.13.2`. The min required version is >= `3.8` but is recommended to install (or set up a venv) a `3.13.2` version._ #### Examples ```bash diff --git a/setup.cfg b/setup.cfg index 2cc3992..4c25490 100644 --- a/setup.cfg +++ b/setup.cfg @@ -18,9 +18,9 @@ classifiers = python_requires = >=3.8 install_requires = Pyyaml >= 6.0.2 # https://pypi.org/project/PyYAML/ - Pydantic >= 2.9.2 # https://docs.pydantic.dev/latest/ + Pydantic >= 2.10.6 # https://docs.pydantic.dev/latest/ requests >= 2.32.3 # https://pypi.org/project/requests/ - minio >= 7.2.10 # https://pypi.org/project/minio/ + minio >= 7.2.15 # https://pypi.org/project/minio/ packages = find: [options.entry_points] From d58fa4312f687f42d3961e069fc2bb5eeb5dec64 Mon Sep 17 00:00:00 2001 From: pchang388 Date: Fri, 14 Feb 2025 04:27:59 -0500 Subject: [PATCH 18/35] add very basic scheduling --- bookstack_file_exporter/__main__.py | 2 +- .../config_helper/config_helper.py | 2 -- .../config_helper/models.py | 2 +- bookstack_file_exporter/run.py | 17 ++++++++++++++--- 
examples/docker-compose.yaml | 18 ++++++++++++++++++ 5 files changed, 34 insertions(+), 7 deletions(-) create mode 100644 examples/docker-compose.yaml diff --git a/bookstack_file_exporter/__main__.py b/bookstack_file_exporter/__main__.py index d91a0df..38fd292 100644 --- a/bookstack_file_exporter/__main__.py +++ b/bookstack_file_exporter/__main__.py @@ -9,7 +9,7 @@ def main(): args: argparse.Namespace = run_args.get_args() logging.basicConfig(format='%(asctime)s [%(levelname)s] %(message)s', level=run_args.get_log_level(args.log_level), datefmt='%Y-%m-%d %H:%M:%S') - run.exporter(args) + run.entrypoint(args) if __name__ == '__main__': diff --git a/bookstack_file_exporter/config_helper/config_helper.py b/bookstack_file_exporter/config_helper/config_helper.py index 80b822f..e89a861 100644 --- a/bookstack_file_exporter/config_helper/config_helper.py +++ b/bookstack_file_exporter/config_helper/config_helper.py @@ -57,8 +57,6 @@ def __init__(self, args: argparse.Namespace): self._token_id, self._token_secret = self._generate_credentials() self._headers = self._generate_headers() self._urls = self._generate_urls() - self._minio_access_key = "" - self._minio_secret_key = "" self._object_storage_config = self._generate_remote_config() def _generate_config(self, config_file: str) -> models.UserInput: diff --git a/bookstack_file_exporter/config_helper/models.py b/bookstack_file_exporter/config_helper/models.py index e82e5da..d984fcd 100644 --- a/bookstack_file_exporter/config_helper/models.py +++ b/bookstack_file_exporter/config_helper/models.py @@ -36,7 +36,7 @@ class UserInput(BaseModel): credentials: Optional[BookstackAccess] = None formats: List[Literal["markdown", "html", "pdf", "plaintext"]] output_path: Optional[str] = None - # export_meta: Optional[bool] = None assets: Optional[Assets] = Assets() minio: Optional[ObjectStorageConfig] = None keep_last: Optional[int] = None + run_interval: Optional[int] = None \ No newline at end of file diff --git 
a/bookstack_file_exporter/run.py b/bookstack_file_exporter/run.py index b69a35b..547760f 100644 --- a/bookstack_file_exporter/run.py +++ b/bookstack_file_exporter/run.py @@ -1,6 +1,7 @@ import argparse import sys import logging +import time from typing import Dict from bookstack_file_exporter.config_helper.config_helper import ConfigNode @@ -10,10 +11,20 @@ log = logging.getLogger(__name__) -def exporter(args: argparse.Namespace): - """export bookstack nodes and archive locally and/or remotely""" - ## get configuration from helper +def entrypoint(args: argparse.Namespace): + """entrypoint for export process""" + # get configuration from helper config = ConfigNode(args) + if config.user_inputs.run_interval: + while True: + exporter(config) + log.info(f"Waiting {config.user_inputs.run_interval} seconds for next run") + # sleep process state + time.sleep(config.user_inputs.run_interval) + exporter(config) + +def exporter(config: ConfigNode): + """export bookstack nodes and archive locally and/or remotely""" ## convenience vars bookstack_headers = config.headers diff --git a/examples/docker-compose.yaml b/examples/docker-compose.yaml new file mode 100644 index 0000000..23fd82a --- /dev/null +++ b/examples/docker-compose.yaml @@ -0,0 +1,18 @@ +name: bookstack-file-exporter +services: + bookstack-file-exporter: + image: homeylab/bookstack-file-exporter:latest + # use a uid/gid that has permissions to write to local dump directory + user: 1000:1000 + container_name: bookstack-file-exporter + environment: + - LOG_LEVEL=info + # example volumes shown + # change the left side of the ':' to your preferred files/dir + volumes: + - /opt/bookstack/bkps/config.yml:/export/config/config.yml:ro + - /opt/bookstack/bkps/archives:/export/dump + # can also pass env variables as a file + env_file: + - bkp.env + restart: always \ No newline at end of file From 20cfee191c590a75f163110d301f88b3085a6ea1 Mon Sep 17 00:00:00 2001 From: pchang388 Date: Fri, 14 Feb 2025 04:50:22 -0500 Subject: 
[PATCH 19/35] add run_interval feature documentation and examples --- README.md | 6 ++++++ bookstack_file_exporter/config_helper/models.py | 2 +- examples/config.yml | 5 +++++ examples/minio_config.yml | 7 ++++++- 4 files changed, 18 insertions(+), 2 deletions(-) diff --git a/README.md b/README.md index 5ecb969..710424a 100644 --- a/README.md +++ b/README.md @@ -150,6 +150,11 @@ docker run \ homeylab/bookstack-file-exporter:latest ``` +#### Docker Compose +When using the configuration option: `run_interval`, a docker compose set up could be used to run the exporter as an always running application. The exporter will sleep and wait until `{run_interval}` seconds has elapsed before subsequent runs. + +An example is shown in `examples/docker-compose.yaml` + #### Environment Variables See [Valid Environment Variables](#valid-environment-variables) for more options. @@ -256,6 +261,7 @@ More descriptions can be found for each section below: | `assets.export_meta` | `bool` | `false` | Optional (default: `false`), export of metadata about the page in a json file | | `assets.verify_ssl` | `bool` | `false` | Optional (default: `true`), whether or not to check ssl certificates when requesting content from Bookstack host | | `keep_last` | `int` | `false` | Optional (default: `None`), if exporter can delete older archives. valid values are:
- set to `-1` if you want to delete all archives after each run (useful if you only want to upload to object storage)
- set to `1+` if you want to retain a certain number of archives
- `0` will result in no action done | +| `run_interval` | `int` | `false` | Optional (default: `0`). If specified, exporter will run in a loop and pause for `{run_interval}` seconds before subsequent runs. Example: `86400` seconds = `24` hours or run once a day. Setting this property to `0` will disable looping | | `minio` | `object` | `false` | Optional [Minio](#minio-backups) configuration options. | #### Valid Environment Variables diff --git a/bookstack_file_exporter/config_helper/models.py b/bookstack_file_exporter/config_helper/models.py index d984fcd..153296e 100644 --- a/bookstack_file_exporter/config_helper/models.py +++ b/bookstack_file_exporter/config_helper/models.py @@ -39,4 +39,4 @@ class UserInput(BaseModel): assets: Optional[Assets] = Assets() minio: Optional[ObjectStorageConfig] = None keep_last: Optional[int] = None - run_interval: Optional[int] = None \ No newline at end of file + run_interval: Optional[int] = 0 \ No newline at end of file diff --git a/examples/config.yml b/examples/config.yml index cca6602..07df0bb 100644 --- a/examples/config.yml +++ b/examples/config.yml @@ -52,3 +52,8 @@ output_path: "bkps/" # set to 1+ if you want to retain a certain number of archives # set to 0 or comment out section if you want no action done keep_last: 5 +## optional - if specified exporter will run in a loop +# it will run and then pause for {run_interval} seconds before running again +# specify in seconds, example: 86400 seconds = 24 hours or run once a day +# omit/comment out or set to 0 if you just want a single run and exit +run_interval: 0 \ No newline at end of file diff --git a/examples/minio_config.yml b/examples/minio_config.yml index 3b36a08..56efe9d 100644 --- a/examples/minio_config.yml +++ b/examples/minio_config.yml @@ -82,4 +82,9 @@ clean_up: true # - this is useful if you only want to upload to object storage # set to 1+ if you want to retain a certain number of archives # set to 0 or comment out section if you want no action done
-keep_last: -1 \ No newline at end of file +keep_last: -1 +## optional - if specified exporter will run in a loop +# it will run and then pause for {run_interval} seconds before running again +# specify in seconds, example: 86400 seconds = 24 hours or run once a day +# omit/comment out or set to 0 if you just want a single run and exit +run_interval: 0 \ No newline at end of file From e807e5ad937cab97d284f693af6f250d311ef14b Mon Sep 17 00:00:00 2001 From: pchang388 Date: Fri, 14 Feb 2025 05:19:57 -0500 Subject: [PATCH 20/35] update full example and build commands --- Makefile | 6 +++++- README.md | 1 + setup.cfg | 2 +- 3 files changed, 7 insertions(+), 2 deletions(-) diff --git a/Makefile b/Makefile index 0ea5a7a..7fd717b 100644 --- a/Makefile +++ b/Makefile @@ -15,7 +15,7 @@ pip_build: pip_local_dev: python -m pip install -e . -build: +pip_build: python -m pip install --upgrade build python -m build @@ -30,6 +30,10 @@ upload_testpypi: download_testpypi: python -m pip install --index-url https://test.pypi.org/simple/ --extra-index-url https://pypi.org/simple bookstack-file-exporter +upload_realpypi: + python -m pip install --upgrade twine + python -m twine upload dist/* + docker_build_simple: docker build \ --build-arg BASE_IMAGE=${BASE_IMAGE} \ diff --git a/README.md b/README.md index 710424a..f33b6b8 100644 --- a/README.md +++ b/README.md @@ -240,6 +240,7 @@ assets: export_meta: false verify_ssl: true keep_last: 5 +run_interval: 0 ``` #### Options and Descriptions diff --git a/setup.cfg b/setup.cfg index 4c25490..296aa29 100644 --- a/setup.cfg +++ b/setup.cfg @@ -1,7 +1,7 @@ [metadata] name = bookstack-file-exporter # version will be replaced by IMAGE_TAG via Github Actions -version = 0.0.1 +version = 1.4.0 author = pchang388 # author_email = your@email.address url = https://github.com/homeylab/bookstack-file-exporter From 5831ec73b4ddc6254bbf009a3cbfd3f05f7a7a84 Mon Sep 17 00:00:00 2001 From: pchang388 Date: Fri, 14 Feb 2025 05:22:54 -0500 Subject: [PATCH 21/35]
update full example and build commands --- setup.cfg | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/setup.cfg b/setup.cfg index 296aa29..4c25490 100644 --- a/setup.cfg +++ b/setup.cfg @@ -1,7 +1,7 @@ [metadata] name = bookstack-file-exporter # version will be replaced by IMAGE_TAG via Github Actions -version = 1.4.0 +version = 0.0.1 author = pchang388 # author_email = your@email.address url = https://github.com/homeylab/bookstack-file-exporter From bd5ffff6084e01c24966229826c6484e84e429f8 Mon Sep 17 00:00:00 2001 From: pchang388 Date: Fri, 14 Feb 2025 05:33:49 -0500 Subject: [PATCH 22/35] update description on application mode --- README.md | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/README.md b/README.md index f33b6b8..b8fc1b3 100644 --- a/README.md +++ b/README.md @@ -36,6 +36,7 @@ What it does: - Can be run via [Python](#run-via-pip) or [Docker](#run-via-docker) - Can push archives to remote object storage like [Minio](https://min.io/) - Basic housekeeping option (`keep_last`) to keep a tidy archive destination +- Can run in application mode (always running) using `run_interval` property. Used for basic scheduling of backups. Supported backup targets are: @@ -262,7 +263,7 @@ More descriptions can be found for each section below: | `assets.export_meta` | `bool` | `false` | Optional (default: `false`), export of metadata about the page in a json file | | `assets.verify_ssl` | `bool` | `false` | Optional (default: `true`), whether or not to check ssl certificates when requesting content from Bookstack host | | `keep_last` | `int` | `false` | Optional (default: `None`), if exporter can delete older archives. valid values are:
- set to `-1` if you want to delete all archives after each run (useful if you only want to upload to object storage)
- set to `1+` if you want to retain a certain number of archives
- `0` will result in no action done | -| `run_interval` | `int` | `false` | Optional (default: `0`). If specified, exporter will run in a loop and pause for `{run_interval}` seconds before subsequent runs. Example: `86400` seconds = `24` hours or run once a day. Setting this property to `0` will disable looping | +| `run_interval` | `int` | `false` | Optional (default: `0`). If specified, exporter will run as an application and pause for `{run_interval}` seconds before subsequent runs. Example: `86400` seconds = `24` hours or run once a day. Setting this property to `0` will invoke a single run and exit. Used for basic scheduling of backups. | | `minio` | `object` | `false` | Optional [Minio](#minio-backups) configuration options. | #### Valid Environment Variables From 5a65036910f2a6154ab3dbfbefbc040131a4e608 Mon Sep 17 00:00:00 2001 From: pchang388 Date: Fri, 21 Feb 2025 01:38:47 -0500 Subject: [PATCH 23/35] avoid nested github action due to lack of support from pypi action --- .github/workflows/on_release.yml | 40 +++++++++++++++++++++++++++----- 1 file changed, 34 insertions(+), 6 deletions(-) diff --git a/.github/workflows/on_release.yml b/.github/workflows/on_release.yml index aa9aa72..3cbe760 100644 --- a/.github/workflows/on_release.yml +++ b/.github/workflows/on_release.yml @@ -54,9 +54,37 @@ jobs: timeout-minutes: 20 environment: 'PyPi' steps: - - name: Checkout - uses: actions/checkout@v4 - - name: Deploy release to PyPi - uses: ./.github/actions/python - with: - pypi_api_token: "${{ secrets.PYPI_API_TOKEN }}" \ No newline at end of file + - name: Get tag release without v + shell: bash + run: | + TAG=${{ github.ref_name }} + echo "VERSION=${TAG#v}" >> "$GITHUB_ENV" + echo "Tag without v is: ${TAG#v}" + - name: Update Release Tag + shell: bash + run: sed -i "s/^version = [^ ]*/version = ${{ env.VERSION }}/" setup.cfg + - name: Set up Python + uses: actions/setup-python@v3 + with: + python-version: '3.13.2' + - name: Install Dependencies + shell: bash +
run: | + python -m pip install --upgrade pip + pip install build + - name: Build Python Package + shell: bash + run: | + python -m pip install --upgrade build + python -m build + - name: Publish to PyPi + uses: pypa/gh-action-pypi-publish@release/v1 + with: + password: ${{ secrets.PYPI_API_TOKEN }} + skip-existing: true + # - name: Checkout + # uses: actions/checkout@v4 + # - name: Deploy release to PyPi + # uses: ./.github/actions/python + # with: + # pypi_api_token: "${{ secrets.PYPI_API_TOKEN }}" \ No newline at end of file From c34da95561b565c12feb17a11990e5ba488d8530 Mon Sep 17 00:00:00 2001 From: pchang388 Date: Fri, 21 Feb 2025 03:47:53 -0500 Subject: [PATCH 24/35] fix retry status force list --- bookstack_file_exporter/common/util.py | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/bookstack_file_exporter/common/util.py b/bookstack_file_exporter/common/util.py index 4591e9d..f2ff5f4 100644 --- a/bookstack_file_exporter/common/util.py +++ b/bookstack_file_exporter/common/util.py @@ -16,11 +16,11 @@ def http_get_request(url: str, headers: Dict[str, str], # {backoff factor} * (2 ** ({number of previous retries})) # {raise_on_status} if status falls in status_forcelist range # and retries have been exhausted. 
- # {status_force_list} 429 is supposed to be included - retries = Retry(total=3, + # {status_force_list} 413, 429, 503 defaults are overwritten with additional ones + retries = Retry(total=5, backoff_factor=0.5, raise_on_status=True, - status_forcelist=[ 500, 502, 503, 504 ]) + status_forcelist=[413, 429, 500, 502, 503, 504]) session.mount(url_prefix, HTTPAdapter(max_retries=retries)) response = session.get(url, headers=headers, verify=verify_ssl, timeout=timeout) except Exception as req_err: From 5e9b408e10e8d48243ac36e9f2de4449606edea0 Mon Sep 17 00:00:00 2001 From: pchang388 Date: Fri, 21 Feb 2025 04:23:54 -0500 Subject: [PATCH 25/35] fix missing github checkout step --- .github/workflows/on_release.yml | 2 ++ 1 file changed, 2 insertions(+) diff --git a/.github/workflows/on_release.yml b/.github/workflows/on_release.yml index 3cbe760..59e11de 100644 --- a/.github/workflows/on_release.yml +++ b/.github/workflows/on_release.yml @@ -54,6 +54,8 @@ jobs: timeout-minutes: 20 environment: 'PyPi' steps: + - name: Checkout + uses: actions/checkout@v4 - name: Get tag release without v shell: bash run: | From c191dfab82ac67bb72a338ac8963b7add8788b3c Mon Sep 17 00:00:00 2001 From: pchang388 Date: Sat, 22 Feb 2025 02:10:42 -0500 Subject: [PATCH 26/35] add helm option in docs --- README.md | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/README.md b/README.md index b8fc1b3..f774a02 100644 --- a/README.md +++ b/README.md @@ -7,6 +7,7 @@ Table of Contents - [Using This Application](#using-this-application) - [Run via Pip](#run-via-pip) - [Run via Docker](#run-via-docker) + - [Run via Helm](#run-via-helm) - [Authentication and Permissions](#authentication-and-permissions) - [Configuration](#configuration) - [Backup Behavior](#backup-behavior) @@ -174,6 +175,9 @@ docker run \ homeylab/bookstack-file-exporter:latest ``` +### Run via Helm +A helm chart can be used to run the exporter as a CronJob or Deployment resource. 
See [here](https://github.com/homeylab/helm-charts/tree/main/charts/bookstack-file-exporter) for more information on using the helm chart. + #### Bind Mounts | purpose | static docker path | description | example | | ------- | ------------------ | ----------- | ------- | From 74baa81105e84d515551199ef4fa6bd6b4cfec96 Mon Sep 17 00:00:00 2001 From: pchang388 Date: Sat, 22 Feb 2025 02:14:05 -0500 Subject: [PATCH 27/35] fix position of doc addition --- README.md | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/README.md b/README.md index f774a02..6d5354d 100644 --- a/README.md +++ b/README.md @@ -175,15 +175,15 @@ docker run \ homeylab/bookstack-file-exporter:latest ``` -### Run via Helm -A helm chart can be used to run the exporter as a CronJob or Deployment resource. See [here](https://github.com/homeylab/helm-charts/tree/main/charts/bookstack-file-exporter) for more information on using the helm chart. - #### Bind Mounts | purpose | static docker path | description | example | | ------- | ------------------ | ----------- | ------- | | `config` | `/export/config/config.yml` | A valid configuration file |`-v /local/yourpath/config.yml:/export/config/config.yml:ro`| | `dump` | `/export/dump` | Directory to place exports. **This is optional when using remote storage option(s)**. Omit if you don't need a local copy. | `-v /local/yourpath/bkps:/export/dump` | +### Run via Helm +A helm chart can be used to run the exporter as a CronJob or Deployment resource. See [here](https://github.com/homeylab/helm-charts/tree/main/charts/bookstack-file-exporter) for more information on using the helm chart. + ### Authentication and Permissions #### Permissions **Note visibility of pages is based on user**, so use a user that has read access to pages and content you want to back up. 
*The role assigned to the user* should have the additional permissions for target pages and their content: From 843daef90e0c0aee0ae51af20a83b15f12b5566b Mon Sep 17 00:00:00 2001 From: pchang388 Date: Wed, 5 Mar 2025 03:10:54 -0500 Subject: [PATCH 28/35] add new http config input functionality --- bookstack_file_exporter/archiver/archiver.py | 6 +- .../archiver/asset_archiver.py | 28 +++--- .../archiver/page_archiver.py | 15 ++-- bookstack_file_exporter/archiver/util.py | 6 +- bookstack_file_exporter/common/util.py | 85 +++++++++++-------- .../config_helper/config_helper.py | 4 +- .../config_helper/models.py | 15 +++- bookstack_file_exporter/exporter/exporter.py | 10 +-- bookstack_file_exporter/run.py | 15 ++-- 9 files changed, 100 insertions(+), 84 deletions(-) diff --git a/bookstack_file_exporter/archiver/archiver.py b/bookstack_file_exporter/archiver/archiver.py index 969ee14..098b1a0 100644 --- a/bookstack_file_exporter/archiver/archiver.py +++ b/bookstack_file_exporter/archiver/archiver.py @@ -9,6 +9,7 @@ from bookstack_file_exporter.archiver.minio_archiver import MinioArchiver from bookstack_file_exporter.config_helper.remote import StorageProviderConfig from bookstack_file_exporter.config_helper.config_helper import ConfigNode +from bookstack_file_exporter.common.util import HttpHelper log = logging.getLogger(__name__) @@ -22,17 +23,18 @@ class Archiver: Args: :config: = Configuration with user inputs and general options. + :http_client: = http helper functions with config from user inputs Returns: Archiver instance with attributes that are accessible for use for handling bookstack exports and remote uploads. 
""" - def __init__(self, config: ConfigNode): + def __init__(self, config: ConfigNode, http_client: HttpHelper): self.config = config # for convenience self.base_dir = config.base_dir_name self.archive_dir = self._generate_root_folder(self.base_dir) - self._page_archiver = PageArchiver(self.archive_dir, self.config) + self._page_archiver = PageArchiver(self.archive_dir, self.config, http_client) self._remote_exports = {'minio': self._archive_minio, 's3': self._archive_s3} def create_export_dir(self): diff --git a/bookstack_file_exporter/archiver/asset_archiver.py b/bookstack_file_exporter/archiver/asset_archiver.py index dbea2bf..1dcadf2 100644 --- a/bookstack_file_exporter/archiver/asset_archiver.py +++ b/bookstack_file_exporter/archiver/asset_archiver.py @@ -5,7 +5,7 @@ # pylint: disable=import-error from requests import Response -from bookstack_file_exporter.common import util as common_util +from bookstack_file_exporter.common.util import HttpHelper log = logging.getLogger(__name__) @@ -112,28 +112,24 @@ class AssetArchiver: Args: :urls: = api urls for images and attachments - :headers: = http headers for api requests :verify_ssl: = verify ssl for api requests + :http_client: = http helper functions with config from user inputs Returns: AssetArchiver instance for use in archiving images and attachments for a page """ - def __init__(self, urls: Dict[str, str], headers: Dict[str, str], - verify_ssl: bool): + def __init__(self, urls: Dict[str, str], http_client: HttpHelper): self.api_urls = urls - self.verify_ssl = verify_ssl - self._headers = headers self._asset_map = { 'images': self._create_image_map, 'attachments': self._create_attachment_map } + self.http_client = http_client def get_asset_nodes(self, asset_type: str) -> Dict[str, ImageNode | AttachmentNode]: """Get image or attachment helpers for a page""" - asset_response: Response = common_util.http_get_request( - self.api_urls[asset_type], - self._headers, - self.verify_ssl) + asset_response: Response = 
self.http_client.http_get_request( + self.api_urls[asset_type]) asset_json = asset_response.json()['data'] return self._asset_map[asset_type](asset_json) @@ -141,18 +137,14 @@ def get_asset_data(self, asset_type: str, meta_data: Union[AttachmentNode, ImageNode]) -> Dict[str, str | bool | int | dict]: """Get asset data based on type""" data_url = f"{self.api_urls[asset_type]}/{meta_data.id_}" - asset_data_response: Response = common_util.http_get_request( - data_url, - self._headers, - self.verify_ssl) + asset_data_response: Response = self.http_client.http_get_request( + data_url) return asset_data_response.json() def get_asset_bytes(self, asset_type: str, url: str) -> bytes: """Get raw asset data""" - asset_response: Response = common_util.http_get_request( - url, - self._headers, - self.verify_ssl) + asset_response: Response = self.http_client.http_get_request( + url) match asset_type: case "images": asset_data = asset_response.content diff --git a/bookstack_file_exporter/archiver/page_archiver.py b/bookstack_file_exporter/archiver/page_archiver.py index fba5047..0b6b745 100644 --- a/bookstack_file_exporter/archiver/page_archiver.py +++ b/bookstack_file_exporter/archiver/page_archiver.py @@ -6,6 +6,7 @@ from bookstack_file_exporter.archiver import util as archiver_util from bookstack_file_exporter.archiver.asset_archiver import AssetArchiver, ImageNode, AttachmentNode from bookstack_file_exporter.config_helper.config_helper import ConfigNode +from bookstack_file_exporter.common.util import HttpHelper log = logging.getLogger(__name__) @@ -33,17 +34,16 @@ class PageArchiver: Args: :archive_dir: = directory where data will be put into. - :config: = Configuration with user inputs and general options. + :http_client: = http helper functions with config from user inputs Returns: :PageArchiver: instance with methods to help collect page content from a Bookstack instance. 
""" - def __init__(self, archive_dir: str, config: ConfigNode) -> None: + def __init__(self, archive_dir: str, config: ConfigNode, http_client: HttpHelper) -> None: self.asset_config = config.user_inputs.assets self.export_formats = config.user_inputs.formats self.api_urls = config.urls - self._headers = config.headers # full path, bookstack-, and .tgz extension self.archive_file = f"{archive_dir}{_FILE_EXTENSION_MAP['tgz']}" # name of intermediate tar file before gzip @@ -51,8 +51,9 @@ def __init__(self, archive_dir: str, config: ConfigNode) -> None: # name of the base folder to use within the tgz archive (internal tar layout) self.archive_base_path = archive_dir.split("/")[-1] self.modify_md: bool = self._check_md_modify() - self.asset_archiver = AssetArchiver(self.api_urls, self._headers, - self.verify_ssl) + self.asset_archiver = AssetArchiver(self.api_urls, + http_client) + self.http_client = http_client def _check_md_modify(self) -> bool: # check to ensure they have asset_config defined, could be None @@ -107,8 +108,8 @@ def _archive_page(self, page: Node, export_format: str, data: bytes): def _get_page_data(self, page_id: int, export_format: str) -> bytes: url = f"{self.api_urls['pages']}/{page_id}/{_EXPORT_API_PATH}/{export_format}" - return archiver_util.get_byte_response(url=url, headers=self._headers, - verify_ssl=self.verify_ssl) + return archiver_util.get_byte_response(url=url, + http_client=self.http_client) def _archive_page_meta(self, page_path: str, meta_data: Dict[str, Union[str, int]]): meta_file_name = f"{self.archive_base_path}/{page_path}{_FILE_EXTENSION_MAP['meta']}" diff --git a/bookstack_file_exporter/archiver/util.py b/bookstack_file_exporter/archiver/util.py index e6bcb8f..21cf8af 100644 --- a/bookstack_file_exporter/archiver/util.py +++ b/bookstack_file_exporter/archiver/util.py @@ -9,13 +9,13 @@ import glob from pathlib import Path -from bookstack_file_exporter.common import util +from bookstack_file_exporter.common.util import 
HttpHelper log = logging.getLogger(__name__) -def get_byte_response(url: str, headers: Dict[str, str], verify_ssl: bool) -> bytes: +def get_byte_response(url: str, http_client: HttpHelper) -> bytes: """get byte response from http request""" - response = util.http_get_request(url=url, headers=headers, verify_ssl=verify_ssl) + response = http_client.http_get_request(url=url) return response.content # append to a tar file instead of creating files locally and then tar'ing after diff --git a/bookstack_file_exporter/common/util.py b/bookstack_file_exporter/common/util.py index f2ff5f4..c19b501 100644 --- a/bookstack_file_exporter/common/util.py +++ b/bookstack_file_exporter/common/util.py @@ -1,44 +1,59 @@ import logging -from typing import Dict +from typing import Dict, List # pylint: disable=import-error import requests # pylint: disable=import-error from requests.adapters import HTTPAdapter, Retry +from bookstack_file_exporter.config_helper.models import HttpConfig + log = logging.getLogger(__name__) -def http_get_request(url: str, headers: Dict[str, str], - verify_ssl: bool, timeout: int = 30) -> requests.Response: - """make http requests and return response object""" - url_prefix = should_verify(url) - try: - with requests.Session() as session: - # {backoff factor} * (2 ** ({number of previous retries})) - # {raise_on_status} if status falls in status_forcelist range - # and retries have been exhausted. 
- # {status_force_list} 413, 429, 503 defaults are overwritten with additional ones - retries = Retry(total=5, - backoff_factor=0.5, - raise_on_status=True, - status_forcelist=[413, 429, 500, 502, 503, 504]) - session.mount(url_prefix, HTTPAdapter(max_retries=retries)) - response = session.get(url, headers=headers, verify=verify_ssl, timeout=timeout) - except Exception as req_err: - log.error("Failed to make request for %s", url) - raise req_err - try: - #raise_for_status() throws an exception on codes 400-599 - response.raise_for_status() - except requests.exceptions.HTTPError as e: - # this means it either exceeded 50X retries in `http_get_request` handler - # or it returned a 40X which is not expected - log.error("Bookstack request failed with status code: %d on url: %s", - response.status_code, url) - raise e - return response +class HttpHelper: + def __init__(self, headers: Dict[str, str], + config: HttpConfig): + self.backoff_factor = config.backoff_factor + self.retry_codes = config.retry_codes + self.retry_count = config.retry_count + self.http_timeout = config.timeout + self.verify_ssl = config.verify_ssl + self._headers = headers + + + # more details on options: https://urllib3.readthedocs.io/en/stable/reference/urllib3.util.html + def http_get_request(self, url: str) -> requests.Response: + """make http requests and return response object""" + url_prefix = self.should_verify(url) + try: + with requests.Session() as session: + # {backoff factor} * (2 ** ({number of previous retries})) + # {raise_on_status} if status falls in status_forcelist range + # and retries have been exhausted. 
+ # {status_force_list} 413, 429, 503 defaults are overwritten with additional ones + retries = Retry(total=self.retry_count, + backoff_factor=self.backoff_factor, + raise_on_status=True, + status_forcelist=self.retry_codes) + session.mount(url_prefix, HTTPAdapter(max_retries=retries)) + response = session.get(url, headers=self._headers, verify=self.verify_ssl, + timeout=self.http_timeout) + except Exception as req_err: + log.error("Failed to make request for %s", url) + raise req_err + try: + #raise_for_status() throws an exception on codes 400-599 + response.raise_for_status() + except requests.exceptions.HTTPError as e: + # this means it either exceeded 50X retries in `http_get_request` handler + # or it returned a 40X which is not expected + log.error("Bookstack request failed with status code: %d on url: %s", + response.status_code, url) + raise e + return response -def should_verify(url: str) -> str: - """check if http or https""" - if url.startswith("https"): - return "https://" - return "http://" + @staticmethod + def should_verify(url: str) -> str: + """check if http or https""" + if url.startswith("https"): + return "https://" + return "http://" diff --git a/bookstack_file_exporter/config_helper/config_helper.py b/bookstack_file_exporter/config_helper/config_helper.py index e89a861..3fb512d 100644 --- a/bookstack_file_exporter/config_helper/config_helper.py +++ b/bookstack_file_exporter/config_helper/config_helper.py @@ -105,8 +105,8 @@ def _generate_remote_config(self) -> Dict[str, StorageProviderConfig]: def _generate_headers(self) -> Dict[str, str]: headers = {} # add additional_headers provided by user - if self.user_inputs.additional_headers: - for key, value in self.user_inputs.additional_headers.items(): + if self.user_inputs.http_config.additional_headers: + for key, value in self.user_inputs.http_config.additional_headers.items(): headers[key] = value # add default headers diff --git a/bookstack_file_exporter/config_helper/models.py 
b/bookstack_file_exporter/config_helper/models.py index 153296e..281015b 100644 --- a/bookstack_file_exporter/config_helper/models.py +++ b/bookstack_file_exporter/config_helper/models.py @@ -26,17 +26,26 @@ class Assets(BaseModel): export_attachments: Optional[bool] = False modify_markdown: Optional[bool] = False export_meta: Optional[bool] = False - verify_ssl: Optional[bool] = True + # verify_ssl: Optional[bool] = True + +class HttpConfig(BaseModel): + """YAML schema for user provided http settings""" + verify_ssl: Optional[bool] = False + timeout: Optional[int] = 30 + backoff_factor: Optional[float] = 2.5 + retry_codes: Optional[List[int]] = [413, 429, 500, 502, 503, 504] + retry_count: Optional[int] = 5 + additional_headers: Optional[Dict[str, str]] = {} # pylint: disable=too-few-public-methods class UserInput(BaseModel): """YAML schema for user provided configuration file""" host: str - additional_headers: Optional[Dict[str, str]] = None credentials: Optional[BookstackAccess] = None formats: List[Literal["markdown", "html", "pdf", "plaintext"]] output_path: Optional[str] = None assets: Optional[Assets] = Assets() minio: Optional[ObjectStorageConfig] = None keep_last: Optional[int] = None - run_interval: Optional[int] = 0 \ No newline at end of file + run_interval: Optional[int] = 0 + http_config: Optional[HttpConfig] = HttpConfig() \ No newline at end of file diff --git a/bookstack_file_exporter/exporter/exporter.py b/bookstack_file_exporter/exporter/exporter.py index 52702f2..fd1b3ed 100644 --- a/bookstack_file_exporter/exporter/exporter.py +++ b/bookstack_file_exporter/exporter/exporter.py @@ -5,7 +5,7 @@ from requests import Response from bookstack_file_exporter.exporter.node import Node -from bookstack_file_exporter.common import util +from bookstack_file_exporter.common.util import HttpHelper log = logging.getLogger(__name__) @@ -19,10 +19,9 @@ class NodeExporter(): Returns: NodeExporter instance to handle building shelve/book/chapter/page relations. 
""" - def __init__(self, api_urls: Dict[str, str], headers: Dict[str,str], verify_ssl: bool): + def __init__(self, api_urls: Dict[str, str], http_client: HttpHelper): self.api_urls = api_urls - self.headers = headers - self.verify_ssl = verify_ssl + self.http_client = http_client def get_all_shelves(self) -> Dict[int, Node]: """ @@ -38,8 +37,7 @@ def get_all_shelves(self) -> Dict[int, Node]: def _get_json_response(self, url: str) -> List[Dict[str, Union[str,int]]]: """get http response data in json format""" - response: Response = util.http_get_request(url=url, headers=self.headers, - verify_ssl=self.verify_ssl) + response: Response = self.http_client.http_get_request(url=url) return response.json() def _get_all_ids(self, url: str) -> List[int]: diff --git a/bookstack_file_exporter/run.py b/bookstack_file_exporter/run.py index 547760f..6ee971d 100644 --- a/bookstack_file_exporter/run.py +++ b/bookstack_file_exporter/run.py @@ -8,6 +8,7 @@ from bookstack_file_exporter.exporter.node import Node from bookstack_file_exporter.exporter.exporter import NodeExporter from bookstack_file_exporter.archiver.archiver import Archiver +from bookstack_file_exporter.common.util import HttpHelper log = logging.getLogger(__name__) @@ -26,23 +27,21 @@ def entrypoint(args: argparse.Namespace): def exporter(config: ConfigNode): """export bookstack nodes and archive locally and/or remotely""" - ## convenience vars - bookstack_headers = config.headers - api_urls = config.urls - unassigned_dir = config.unassigned_book_dir - verify_ssl = config.user_inputs.assets.verify_ssl - #### Export Data ##### # need to implement pagination for apis log.info("Beginning run") + ## Helper functions with user provided (or defaults) http config + http_client = HttpHelper(config.headers, config.user_inputs.http_config) + ## Use exporter class to get all the resources (pages, books, etc.) 
and their relationships log.info("Building shelve/book/chapter/page relationships") - export_helper = NodeExporter(api_urls, bookstack_headers, verify_ssl) + export_helper = NodeExporter(config.urls, http_client) ## shelves shelve_nodes: Dict[int, Node] = export_helper.get_all_shelves() ## books - book_nodes: Dict[int, Node] = export_helper.get_all_books(shelve_nodes, unassigned_dir) + book_nodes: Dict[int, Node] = export_helper.get_all_books(shelve_nodes, + config.unassigned_book_dir) ## pages page_nodes: Dict[int, Node] = export_helper.get_all_pages(book_nodes) if not page_nodes: From fb080db4686c69fe7cf770cf45808a8540845c97 Mon Sep 17 00:00:00 2001 From: pchang388 Date: Fri, 7 Mar 2025 06:50:22 +0000 Subject: [PATCH 29/35] add examples for new http config section --- bookstack_file_exporter/common/util.py | 2 +- examples/config.yml | 15 +++++++++++++-- examples/minio_config.yml | 21 +++++++++++++-------- 3 files changed, 27 insertions(+), 11 deletions(-) diff --git a/bookstack_file_exporter/common/util.py b/bookstack_file_exporter/common/util.py index c19b501..ffb4751 100644 --- a/bookstack_file_exporter/common/util.py +++ b/bookstack_file_exporter/common/util.py @@ -1,5 +1,5 @@ import logging -from typing import Dict, List +from typing import Dict # pylint: disable=import-error import requests # pylint: disable=import-error diff --git a/examples/config.yml b/examples/config.yml index 07df0bb..7051ac7 100644 --- a/examples/config.yml +++ b/examples/config.yml @@ -40,8 +40,19 @@ assets: # like: last update, owner, revision count, etc. 
# omit this or set to false if not needed export_meta: false - # optional whether or not to check ssl certificates when requesting content from Bookstack host - verify_ssl: true +# optional - can override default http_config +# if not required, you can omit/comment out section +# https://urllib3.readthedocs.io/en/stable/reference/urllib3.util.html +http_config: + verify_ssl: false + timeout: 30 + backoff_factor: 2.5 + retry_codes: [413, 429, 500, 502, 503, 504] + retry_count: 5 + additional_headers: + test: "test" + test2: "test2" + User-Agent: "test-agent" # directory to export to # relative or full path output_path: "bkps/" diff --git a/examples/minio_config.yml b/examples/minio_config.yml index 56efe9d..7a7961c 100644 --- a/examples/minio_config.yml +++ b/examples/minio_config.yml @@ -11,12 +11,6 @@ credentials: # set here or as env variable, BOOKSTACK_TOKEN_SECRET # env var takes precedence over below token_secret: "" -# additional headers to add, examples below -# if not required, you can omit/comment out -additional_headers: - test: "test" - test2: "test2" - User-Agent: "test-agent" # supported formats from bookstack below # specify one or more formats: @@ -69,8 +63,19 @@ assets: # like: last update, owner, revision count, etc. 
# omit this or set to false if not needed export_meta: false - # optional whether or not to check ssl certificates when requesting content from Bookstack host - verify_ssl: true +# optional - can override default http_config +# if not required, you can omit/comment out section +# https://urllib3.readthedocs.io/en/stable/reference/urllib3.util.html +http_config: + verify_ssl: false + timeout: 30 + backoff_factor: 2.5 + retry_codes: [413, 429, 500, 502, 503, 504] + retry_count: 5 + additional_headers: + test: "test" + test2: "test2" + User-Agent: "test-agent" # After uploading to object storage targets, choose to clean up local files # delete the archive from local filesystem # optional From 59035e0da8530568785d8d0aa1def5079c1a3f92 Mon Sep 17 00:00:00 2001 From: pchang388 Date: Fri, 7 Mar 2025 06:53:24 +0000 Subject: [PATCH 30/35] implement #45 - disable verbose warning logs for verify_ssl=false --- bookstack_file_exporter/common/util.py | 4 ++++ 1 file changed, 4 insertions(+) diff --git a/bookstack_file_exporter/common/util.py b/bookstack_file_exporter/common/util.py index ffb4751..71e55b4 100644 --- a/bookstack_file_exporter/common/util.py +++ b/bookstack_file_exporter/common/util.py @@ -1,5 +1,6 @@ import logging from typing import Dict +import urllib3 # pylint: disable=import-error import requests # pylint: disable=import-error @@ -9,6 +10,9 @@ log = logging.getLogger(__name__) +# disable TLS warnings if using verify_ssl=false +urllib3.disable_warnings() + class HttpHelper: def __init__(self, headers: Dict[str, str], config: HttpConfig): From 18978aa20f92ea254172e5c606266ec10d7d3f63 Mon Sep 17 00:00:00 2001 From: pchang388 Date: Fri, 7 Mar 2025 07:33:01 +0000 Subject: [PATCH 31/35] fix http_client param --- bookstack_file_exporter/archiver/asset_archiver.py | 1 - bookstack_file_exporter/run.py | 2 +- 2 files changed, 1 insertion(+), 2 deletions(-) diff --git a/bookstack_file_exporter/archiver/asset_archiver.py b/bookstack_file_exporter/archiver/asset_archiver.py 
index 1dcadf2..c119e7f 100644 --- a/bookstack_file_exporter/archiver/asset_archiver.py +++ b/bookstack_file_exporter/archiver/asset_archiver.py @@ -112,7 +112,6 @@ class AssetArchiver: Args: :urls: = api urls for images and attachments - :verify_ssl: = verify ssl for api requests :http_client: = http helper functions with config from user inputs Returns: diff --git a/bookstack_file_exporter/run.py b/bookstack_file_exporter/run.py index 6ee971d..a33b8a3 100644 --- a/bookstack_file_exporter/run.py +++ b/bookstack_file_exporter/run.py @@ -49,7 +49,7 @@ def exporter(config: ConfigNode): sys.exit(0) log.info("Beginning archive") ## start archive ## - archive: Archiver = Archiver(config) + archive: Archiver = Archiver(config, http_client) # create export directory if not exists archive.create_export_dir() From 6bd2f92ee3b08633e4b5f5d99ee9ab35d3a89ef5 Mon Sep 17 00:00:00 2001 From: pchang388 Date: Fri, 7 Mar 2025 02:38:35 -0500 Subject: [PATCH 32/35] add example for backoff_factor behavior --- examples/config.yml | 2 ++ examples/minio_config.yml | 2 ++ 2 files changed, 4 insertions(+) diff --git a/examples/config.yml b/examples/config.yml index 7051ac7..115d965 100644 --- a/examples/config.yml +++ b/examples/config.yml @@ -43,6 +43,8 @@ assets: # optional - can override default http_config # if not required, you can omit/comment out section # https://urllib3.readthedocs.io/en/stable/reference/urllib3.util.html +# default backoff_factor 2.5 means we wait 5, 10, 20, and then 40 seconds before our last retry +# - this should allow for per minute rate limits to be refreshed http_config: verify_ssl: false timeout: 30 diff --git a/examples/minio_config.yml b/examples/minio_config.yml index 7a7961c..acfc775 100644 --- a/examples/minio_config.yml +++ b/examples/minio_config.yml @@ -66,6 +66,8 @@ assets: # optional - can override default http_config # if not required, you can omit/comment out section # https://urllib3.readthedocs.io/en/stable/reference/urllib3.util.html +# default 
backoff_factor 2.5 means we wait 5, 10, 20, and then 40 seconds before our last retry +# - this should allow for per minute rate limits to be refreshed http_config: verify_ssl: false timeout: 30 From c6757bc46b7669ccfda96284d5ec458ff20ae08c Mon Sep 17 00:00:00 2001 From: pchang388 Date: Fri, 7 Mar 2025 02:46:17 -0500 Subject: [PATCH 33/35] lint changes --- bookstack_file_exporter/common/util.py | 11 ++++++++++- bookstack_file_exporter/config_helper/models.py | 2 +- bookstack_file_exporter/run.py | 2 +- 3 files changed, 12 insertions(+), 3 deletions(-) diff --git a/bookstack_file_exporter/common/util.py b/bookstack_file_exporter/common/util.py index 71e55b4..02cd807 100644 --- a/bookstack_file_exporter/common/util.py +++ b/bookstack_file_exporter/common/util.py @@ -14,6 +14,16 @@ urllib3.disable_warnings() class HttpHelper: + """ + HttpHelper provides an http request helper with config stored and retries built in + + Args: + :headers: = all headers to use for http requests + :config: = Configuration with user inputs for http requests + + Returns: + :HttpHelper: instance with methods to help with http requests. 
+ """ def __init__(self, headers: Dict[str, str], config: HttpConfig): self.backoff_factor = config.backoff_factor @@ -23,7 +33,6 @@ def __init__(self, headers: Dict[str, str], self.verify_ssl = config.verify_ssl self._headers = headers - # more details on options: https://urllib3.readthedocs.io/en/stable/reference/urllib3.util.html def http_get_request(self, url: str) -> requests.Response: """make http requests and return response object""" diff --git a/bookstack_file_exporter/config_helper/models.py b/bookstack_file_exporter/config_helper/models.py index 281015b..0f24b99 100644 --- a/bookstack_file_exporter/config_helper/models.py +++ b/bookstack_file_exporter/config_helper/models.py @@ -48,4 +48,4 @@ class UserInput(BaseModel): minio: Optional[ObjectStorageConfig] = None keep_last: Optional[int] = None run_interval: Optional[int] = 0 - http_config: Optional[HttpConfig] = HttpConfig() \ No newline at end of file + http_config: Optional[HttpConfig] = HttpConfig() diff --git a/bookstack_file_exporter/run.py b/bookstack_file_exporter/run.py index a33b8a3..e7b408e 100644 --- a/bookstack_file_exporter/run.py +++ b/bookstack_file_exporter/run.py @@ -19,7 +19,7 @@ def entrypoint(args: argparse.Namespace): if config.user_inputs.run_interval: while True: exporter(config) - log.info(f"Waiting {config.user_inputs.run_interval} seconds for next run") + log.info("Waiting %s seconds for next run", config.user_inputs.run_interval) # sleep process state time.sleep(config.user_inputs.run_interval) exporter(config) From a196272b5e5aa75d855efbbac671975ce679db4d Mon Sep 17 00:00:00 2001 From: pchang388 Date: Sat, 8 Mar 2025 02:31:16 -0500 Subject: [PATCH 34/35] update configuration defaults and checks --- .../config_helper/config_helper.py | 12 ++++++---- .../config_helper/models.py | 18 +++++++-------- .../config_helper/remote.py | 23 +++++++++++++++++++ examples/config.yml | 8 +------ 4 files changed, 40 insertions(+), 21 deletions(-) diff --git 
a/bookstack_file_exporter/config_helper/config_helper.py b/bookstack_file_exporter/config_helper/config_helper.py index 3fb512d..a1c3fe0 100644 --- a/bookstack_file_exporter/config_helper/config_helper.py +++ b/bookstack_file_exporter/config_helper/config_helper.py @@ -79,11 +79,8 @@ def _generate_config(self, config_file: str) -> models.UserInput: def _generate_credentials(self) -> Tuple[str, str]: # if user provided credentials in config file, load them - token_id = "" - token_secret = "" - if self.user_inputs.credentials: - token_id = self.user_inputs.credentials.token_id - token_secret = self.user_inputs.credentials.token_secret + token_id = self.user_inputs.credentials.token_id + token_secret = self.user_inputs.credentials.token_secret # check to see if env var is specified, if so, it takes precedence token_id = self._check_var(_BOOKSTACK_TOKEN_FIELD, token_id) @@ -98,8 +95,13 @@ def _generate_remote_config(self) -> Dict[str, StorageProviderConfig]: self.user_inputs.minio.access_key) minio_secret_key = self._check_var(_MINIO_SECRET_KEY_FIELD, self.user_inputs.minio.secret_key) + object_config["minio"] = StorageProviderConfig(minio_access_key, minio_secret_key, self.user_inputs.minio) + for platform, config in object_config.items(): + if not config.is_valid(platform): + error_str = "provided " + platform + " configuration is invalid" + raise ValueError(error_str) return object_config def _generate_headers(self) -> Dict[str, str]: diff --git a/bookstack_file_exporter/config_helper/models.py b/bookstack_file_exporter/config_helper/models.py index 0f24b99..2df2ca1 100644 --- a/bookstack_file_exporter/config_helper/models.py +++ b/bookstack_file_exporter/config_helper/models.py @@ -6,18 +6,18 @@ class ObjectStorageConfig(BaseModel): """YAML schema for minio configuration""" host: str - access_key: Optional[str] = None - secret_key: Optional[str] = None + access_key: Optional[str] = "" + secret_key: Optional[str] = "" bucket: str - path: Optional[str] = None + path: 
Optional[str] = "" region: str - keep_last: Optional[int] = None + keep_last: Optional[int] = 0 # pylint: disable=too-few-public-methods class BookstackAccess(BaseModel): """YAML schema for bookstack access credentials""" - token_id: str - token_secret: str + token_id: Optional[str] = "" + token_secret: Optional[str] = "" # pylint: disable=too-few-public-methods class Assets(BaseModel): @@ -41,11 +41,11 @@ class HttpConfig(BaseModel): class UserInput(BaseModel): """YAML schema for user provided configuration file""" host: str - credentials: Optional[BookstackAccess] = None + credentials: Optional[BookstackAccess] = BookstackAccess() formats: List[Literal["markdown", "html", "pdf", "plaintext"]] - output_path: Optional[str] = None + output_path: Optional[str] = "" assets: Optional[Assets] = Assets() minio: Optional[ObjectStorageConfig] = None - keep_last: Optional[int] = None + keep_last: Optional[int] = 0 run_interval: Optional[int] = 0 http_config: Optional[HttpConfig] = HttpConfig() diff --git a/bookstack_file_exporter/config_helper/remote.py b/bookstack_file_exporter/config_helper/remote.py index d97dd04..af3e425 100644 --- a/bookstack_file_exporter/config_helper/remote.py +++ b/bookstack_file_exporter/config_helper/remote.py @@ -1,5 +1,9 @@ +import logging + from bookstack_file_exporter.config_helper.models import ObjectStorageConfig +log = logging.getLogger(__name__) + ## convenience class ## able to work for minio, s3, etc. 
class StorageProviderConfig: @@ -21,6 +25,7 @@ def __init__(self, access_key: str, secret_key: str, config: ObjectStorageConfig self.config = config self._access_key = access_key self._secret_key = secret_key + self._valid_checker = {'minio': self._is_minio_valid()} @property def access_key(self) -> str: @@ -31,3 +36,21 @@ def access_key(self) -> str: def secret_key(self) -> str: """return secret key for use""" return self._secret_key + + def is_valid(self, storage_type: str) -> bool: + """check if object storage config is valid""" + return self._valid_checker[storage_type] + + def _is_minio_valid(self) -> bool: + """check if minio config is valid""" + # required values - keys already checked so skip + checks = { + "bucket": self.config.bucket, + "host": self.config.host + } + + for prop, check in checks.items(): + if not check: + log.error("%s is missing from minio configuration and is required", prop) + return False + return True diff --git a/examples/config.yml b/examples/config.yml index 115d965..71a7c8e 100644 --- a/examples/config.yml +++ b/examples/config.yml @@ -3,7 +3,7 @@ # if you put http here, it will try verify=false, not to check certs host: "https://bookstack.mydomain.org" # You could optionally set the bookstack token_id and token_secret here instead of env -# If using env vars instead you can omit/comment out this section +# If using env vars instead you can ignore this section credentials: # set here or as env variable, BOOKSTACK_TOKEN_ID # env var takes precedence over below @@ -11,12 +11,6 @@ credentials: # set here or as env variable, BOOKSTACK_TOKEN_SECRET # env var takes precedence over below token_secret: "" -# optional - additional headers to add, examples below -# if not required, you can omit/comment out section -additional_headers: - test: "test" - test2: "test2" - User-Agent: "test-agent" # supported formats from bookstack below # specify one or more formats: From 045b74e9dd2eb1e4650663b19038562f958c754e Mon Sep 17 00:00:00 2001 From: 
pchang388 Date: Sat, 8 Mar 2025 03:13:14 -0500 Subject: [PATCH 35/35] update configuration options and descriptions for http_config changes --- README.md | 65 ++++++++++++------- .../config_helper/models.py | 3 +- .../config_helper/remote.py | 4 +- examples/config.yml | 21 ++++-- examples/minio_config.yml | 28 ++++---- 5 files changed, 71 insertions(+), 50 deletions(-) diff --git a/README.md b/README.md index 6d5354d..5fa5732 100644 --- a/README.md +++ b/README.md @@ -17,6 +17,7 @@ Table of Contents - [Modify Markdown Files](#modify-markdown-files) - [Object Storage](#object-storage) - [Minio Backups](#minio-backups) + - [Potential Breaking Upgrades](#potential-breaking-upgrades) - [Future Items](#future-items) ## Background @@ -69,8 +70,8 @@ Simple example configuration: # config.yml host: "https://bookstack.yourdomain.com" credentials: - token_id: "" - token_secret: "" + token_id: "" + token_secret: "" formats: # md only example - markdown # - html @@ -78,11 +79,10 @@ formats: # md only example # - plaintext output_path: "bkps/" assets: - export_images: false - export_attachments: false - modify_markdown: false - export_meta: false - verify_ssl: true + export_images: false + export_attachments: false + modify_markdown: false + export_meta: false ``` ### Run via Pip @@ -205,7 +205,9 @@ Env variables for credentials will take precedence over configuration file optio **For object storage authentication**, find the relevant sections further down in their respective sections. ### Configuration -_Ensure [Authentication](#authentication-and-permissions) has been set up beforehand for required credentials._ For a simple example to run quickly, refer to the one in the [Using This Application](#using-this-application) section. A full example is also shown below with descriptions. Optionally, look at `examples/` folder of the github repo for more examples. 
+_Ensure [Authentication](#authentication-and-permissions) has been set up beforehand for required credentials._ For a simple example to run quickly, refer to the one in the [Using This Application](#using-this-application) section. + +A full example is also shown below. Optionally, look at `examples/` folder of the github repo for more examples with long descriptions. For object storage configuration, find more information in their respective sections - [Minio](#minio-backups) @@ -218,17 +220,21 @@ Below is an example configuration that shows example values for all possible opt ```yaml host: "https://bookstack.yourdomain.com" credentials: - token_id: "" - token_secret: "" -additional_headers: - test: "test" - test2: "test2" - User-Agent: "test-agent" + token_id: "" + token_secret: "" formats: - markdown - html - pdf - plaintext +http_config: + verify_ssl: false + timeout: 30 + backoff_factor: 2.5 + retry_codes: [413, 429, 500, 502, 503, 504] + retry_count: 5 + additional_headers: + User-Agent: "test-agent" minio: host: "minio.yourdomain.com" access_key: "" @@ -243,7 +249,6 @@ assets: export_attachments: true modify_markdown: false export_meta: false - verify_ssl: true keep_last: 5 run_interval: 0 ``` @@ -253,20 +258,25 @@ More descriptions can be found for each section below: | Configuration Item | Type | Required | Description | | ------------------ | ---- | -------- | ----------- | -| `host` | `str` | `true` | If `http/https` not specified in the url, defaults to `https`. Use `assets.verify_ssl` to disable certificate checking. | +| `host` | `str` | `true` | If `http/https` not specified in the url, defaults to `https`. Use `http_config.verify_ssl` to disable certificate checking. | | `credentials` | `object` | `false` | Optional section where Bookstack tokenId and tokenSecret can be specified. Env variable for credentials may be supplied instead. See [Authentication](#authentication) for more details. 
| -| `credentials.token_id` | `str`| `true` if `credentials` | If `credentials` section is given, this should be a valid tokenId | -| `credentials.token_secret` | `str` | `true` if `credentials`| If `credentials` section is given, this should be a valid tokenSecret | -| `additional_headers` | `object` | `false` | Optional section where key/value for pairs can be specified to use in Bookstack http request headers. +| `credentials.token_id` | `str`| `false` if specified through env var instead, otherwise `true` | A valid Bookstack tokenId. | +| `credentials.token_secret` | `str` | `false` if specified through env var instead, otherwise `true` | A valid Bookstack tokenSecret. | | `formats` | `list` | `true` | Which export formats to use for Bookstack page content. Valid options are: `["markdown", "html", "pdf", "plaintext"]`| | `output_path` | `str` | `false` | Optional (default: `cwd`) which directory (relative or full path) to place exports. User who runs the command should have access to read/write to this directory. This directory and any parent directories will be attempted to be created if they do not exist. If not provided, will use current run directory by default. If using docker, this option can be omitted. | | `assets` | `object` | `false` | Optional section to export additional assets from pages. | | `assets.export_images` | `bool` | `false` | Optional (default: `false`), export all images for a page to an `image` directory within page directory. See [Backup Behavior](#backup-behavior) for more information on layout | | `assets.export_attachments` | `bool` | `false` | Optional (default: `false`), export all attachments for a page to an `attachments` directory within page directory. See [Backup Behavior](#backup-behavior) for more information on layout | -| `assets.modify_markdown` | `bool` | `false` | Optional (default: `false`), modify markdown files to replace image links with local exported image paths. 
This requires `assets.export_images` to be `true` in order to work. See [Modify Markdown Files](#modify-markdown-files) for more information.
-| `assets.export_meta` | `bool` | `false` | Optional (default: `false`), export of metadata about the page in a json file |
-| `assets.verify_ssl` | `bool` | `false` | Optional (default: `true`), whether or not to check ssl certificates when requesting content from Bookstack host |
-| `keep_last` | `int` | `false` | Optional (default: `None`), if exporter can delete older archives. valid values are:<br>- set to `-1` if you want to delete all archives after each run (useful if you only want to upload to object storage)<br>- set to `1+` if you want to retain a certain number of archives<br>- `0` will result in no action done |
+| `assets.modify_markdown` | `bool` | `false` | Optional (default: `false`), modify markdown files to replace image links with local exported image paths. This requires `assets.export_images` to be `true` in order to work. See [Modify Markdown Files](#modify-markdown-files) for more information. |
+| `assets.export_meta` | `bool` | `false` | Optional (default: `false`), export of metadata about the page in a json file. |
+| `http_config` | `object` | `false` | Optional section to override default http configuration. |
+| `http_config.verify_ssl` | `bool` | `false` | Optional (default: `false`), whether or not to verify ssl certificates if using https. |
+| `http_config.timeout` | `int` | `false` | Optional (default: `30`), set the timeout, in seconds, for http requests. |
+| `http_config.retry_count` | `int` | `false` | Optional (default: `5`), the number of http retries after initial failure. |
+| `http_config.retry_codes` | `List[int]` | `false` | Optional (default: `[413, 429, 500, 502, 503, 504]`), which http response status codes trigger a retry. |
+| `http_config.backoff_factor` | `float` | `false` | Optional (default: `2.5`), set the backoff_factor for http request retries. Default backoff_factor `2.5` means we wait 5, 10, 20, and then 40 seconds (with default `http_config.retry_count: 5`) before our last retry. This should allow for per minute rate limits to be refreshed. |
+| `http_config.additional_headers` | `object` | `false` | Optional (default: `{}`), specify key/value pairs that will be added as additional headers to http requests. |
+| `keep_last` | `int` | `false` | Optional (default: `0`), if exporter can delete older archives. valid values are:<br>- set to `-1` if you want to delete all archives after each run (useful if you only want to upload to object storage)<br>- set to `1+` if you want to retain a certain number of archives<br>- `0` will result in no action done. |
 | `run_interval` | `int` | `false` | Optional (default: `0`). If specified, exporter will run as an application and pause for `{run_interval}` seconds before subsequent runs. Example: `86400` seconds = `24` hours or run once a day. Setting this property to `0` will invoke a single run and exit. Used for basic scheduling of backups. |
 | `minio` | `object` | `false` | Optional [Minio](#minio-backups) configuration options. |
@@ -468,7 +478,14 @@ minio:
 | `access_key` | `str` | `false` if specified through env var instead, otherwise `true` | Access key for the minio instance |
 | `secret_key` | `str` | `false` if specified through env var, otherwise `true` | Secret key for the minio instance |
 | `path` | `str` | `false` | Optional, path of the backup to use. Will use root bucket path if not set. `://bookstack-.tgz` |
-| `keep_last` | `int` | `false` | Optional (default: `None`), if exporter can delete older archives in minio.<br>- set to `1+` if you want to retain a certain number of archives<br>- `0` will result in no action done |
+| `keep_last` | `int` | `false` | Optional (default: `0`), if exporter can delete older archives in minio.<br>- set to `1+` if you want to retain a certain number of archives<br>- `0` will result in no action done |
+
+## Potential Breaking Upgrades
+Below are versions that have major changes to the way configuration or exporter runs.
+
+| Start Version | Target Version | Description |
+| ------------- | -------------- | ----------- |
+| `< 1.4.X` | `1.5.0` | `assets.verify_ssl` has been moved to `http_config.verify_ssl` and the default value has been updated to `false`. `additional_headers` has been moved to `http_config.additional_headers` |
 
 ## Future Items
 1. ~~Be able to pull images locally and place in their respective page folders for a more complete file level backup.~~
diff --git a/bookstack_file_exporter/config_helper/models.py b/bookstack_file_exporter/config_helper/models.py
index 2df2ca1..2e89649 100644
--- a/bookstack_file_exporter/config_helper/models.py
+++ b/bookstack_file_exporter/config_helper/models.py
@@ -5,7 +5,7 @@
 # pylint: disable=too-few-public-methods
 class ObjectStorageConfig(BaseModel):
     """YAML schema for minio configuration"""
-    host: str
+    host: Optional[str] = ""
     access_key: Optional[str] = ""
     secret_key: Optional[str] = ""
     bucket: str
@@ -26,7 +26,6 @@ class Assets(BaseModel):
     export_attachments: Optional[bool] = False
     modify_markdown: Optional[bool] = False
     export_meta: Optional[bool] = False
-    # verify_ssl: Optional[bool] = True
 
 class HttpConfig(BaseModel):
     """YAML schema for user provided http settings"""
diff --git a/bookstack_file_exporter/config_helper/remote.py b/bookstack_file_exporter/config_helper/remote.py
index af3e425..ddcc717 100644
--- a/bookstack_file_exporter/config_helper/remote.py
+++ b/bookstack_file_exporter/config_helper/remote.py
@@ -43,12 +43,10 @@ def is_valid(self, storage_type: str) -> bool:
 
     def _is_minio_valid(self) -> bool:
         """check if minio config is valid"""
-        # required values - keys already checked so skip
+        # required values - keys and bucket already checked so skip
         checks = {
-            "bucket": self.config.bucket,
             "host": self.config.host
         }
-
         for prop, check in checks.items():
             if not check:
                 log.error("%s is missing from minio configuration and is required", prop)
diff --git a/examples/config.yml b/examples/config.yml
index 71a7c8e..99481d4 100644
--- a/examples/config.yml
+++ b/examples/config.yml
@@ -3,14 +3,14 @@
 # if you put http here, it will try verify=false, not to check certs
 host: "https://bookstack.mydomain.org"
 # You could optionally set the bookstack token_id and token_secret here instead of env
-# If using env vars instead you can ignore this section
+# If using env vars instead you can leave values empty or omit this section
 credentials:
-    # set here or as env variable, BOOKSTACK_TOKEN_ID
-    # env var takes precedence over below
-    token_id: ""
-    # set here or as env variable, BOOKSTACK_TOKEN_SECRET
-    # env var takes precedence over below
-    token_secret: ""
+  # set here or as env variable, BOOKSTACK_TOKEN_ID
+  # env var takes precedence over below
+  token_id: ""
+  # set here or as env variable, BOOKSTACK_TOKEN_SECRET
+  # env var takes precedence over below
+  token_secret: ""
 
 # supported formats from bookstack below
 # specify one or more formats:
@@ -40,11 +40,18 @@ assets:
 # default backoff_factor 2.5 means we wait 5, 10, 20, and then 40 seconds before our last retry
 # - this should allow for per minute rate limits to be refreshed
 http_config:
+  # whether or not to verify ssl certificates if using https
   verify_ssl: false
+  # set http timeout in seconds for requests
   timeout: 30
+  # default backoff_factor 2.5 means we wait 5, 10, 20, and then 40 seconds before our last retry
+  # - this should allow for per minute rate limits to be refreshed
   backoff_factor: 2.5
+  # which status codes trigger retries
   retry_codes: [413, 429, 500, 502, 503, 504]
+  # number of retries
   retry_count: 5
+  # any additional headers to add to http requests
   additional_headers:
     test: "test"
     test2: "test2"
diff --git a/examples/minio_config.yml b/examples/minio_config.yml
index acfc775..517fa3d 100644
--- a/examples/minio_config.yml
+++ b/examples/minio_config.yml
@@ -3,14 +3,14 @@
 # if you put http here, it will try verify=false, not to check certs
 host: "https://bookstack.mydomain.org"
 # You could optionally set the bookstack token_id and token_secret here instead of env
-# If using env vars instead you can omit/comment out this section
+# If using env vars instead you can leave values empty or omit this section
 credentials:
-    # set here or as env variable, BOOKSTACK_TOKEN_ID
-    # env var takes precedence over below
-    token_id: ""
-    # set here or as env variable, BOOKSTACK_TOKEN_SECRET
-    # env var takes precedence over below
-    token_secret: ""
+  # set here or as env variable, BOOKSTACK_TOKEN_ID
+  # env var takes precedence over below
+  token_id: ""
+  # set here or as env variable, BOOKSTACK_TOKEN_SECRET
+  # env var takes precedence over below
+  token_secret: ""
 
 # supported formats from bookstack below
 # specify one or more formats:
@@ -66,23 +66,23 @@ assets:
 # optional - can override default http_config
 # if not required, you can omit/comment out section
 # https://urllib3.readthedocs.io/en/stable/reference/urllib3.util.html
-# default backoff_factor 2.5 means we wait 5, 10, 20, and then 40 seconds before our last retry
-# - this should allow for per minute rate limits to be refreshed
 http_config:
+  # whether or not to verify ssl certificates if using https
   verify_ssl: false
+  # set http timeout in seconds for requests
   timeout: 30
+  # default backoff_factor 2.5 means we wait 5, 10, 20, and then 40 seconds before our last retry
+  # - this should allow for per minute rate limits to be refreshed
   backoff_factor: 2.5
+  # which status codes trigger retries
   retry_codes: [413, 429, 500, 502, 503, 504]
+  # number of retries
   retry_count: 5
+  # any additional headers to add to http requests
   additional_headers:
     test: "test"
     test2: "test2"
     User-Agent: "test-agent"
-# After uploading to object storage targets, choose to clean up local files
-# delete the archive from local filesystem
-# optional
-# default = false if omitted
-clean_up: true
 # optional if specified exporter can delete older archives
 # valid values are:
 # set to -1 if you want to delete all archives after each run