feat: adapt rapidocr v3 and refactor code #99

Merged
merged 8 commits into from
Jun 22, 2025
Conversation

SWHL
Member

@SWHL SWHL commented Jun 22, 2025

No description provided.

@SWHL SWHL requested a review from Copilot June 22, 2025 02:07

@Copilot Copilot AI left a comment

Pull Request Overview

This PR refactors core engine initialization and adds support for RapidOCR v3, reorganizing model downloading and the preprocessing/postprocessing workflows.

  • Adapt PPTableStructurer and TableLabelDecode for new token handling
  • Introduce ModelProcessor for model path management
  • Refactor RapidTable to use rapidocr package and new dataclass configs

Reviewed Changes

Copilot reviewed 19 out of 35 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| rapid_table/table_structure/pp_structure/post_process.py | Adjusted label decoding and special-token handling |
| rapid_table/main.py | Refactored RapidTable init and CLI entrypoint |
| rapid_table/inference_engine/torch.py | Added stub for Torch inference session |
| rapid_table/inference_engine/onnxruntime/main.py | Updated ONNX Runtime session creation and error handling |

Comments suppressed due to low confidence (2)

rapid_table/main.py:45

  • [nitpick] Warning is incomplete/unhelpful; consider clarifying, e.g., 'rapidocr package is not installed; OCR step disabled, only table postprocessing will run.'
            logger.warning("rapidocr package is not installed, only table rec")
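
A minimal sketch of the kind of message this nitpick asks for, assuming the optional dependency is detected with the standard-library `importlib.util.find_spec`. The helper name `ocr_available` and the logger name are hypothetical and are not taken from the PR. The wording follows the reviewer's suggestion, and reading "table rec" as "table recognition" is an assumption.

```python
import importlib.util
import logging

# Hypothetical logger name, for illustration only.
logger = logging.getLogger("rapid_table_sketch")


def ocr_available(package: str = "rapidocr") -> bool:
    """Return True if the optional top-level package can be imported."""
    found = importlib.util.find_spec(package) is not None
    if not found:
        # State what is missing and what the consequence is.
        logger.warning(
            "%s package is not installed; OCR step disabled, "
            "only table recognition will run.",
            package,
        )
    return found
```

A caller could branch on the return value to decide whether to build the OCR engine at all, so the warning is emitted once at initialization rather than on every call.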

rapid_table/inference_engine/onnxruntime/main.py:70

  • The method is annotated to return np.ndarray but calls session.run, which returns a List[np.ndarray]; update the return type or convert the output accordingly.
    def __call__(self, input_content: np.ndarray) -> np.ndarray:
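
The two fixes the reviewer mentions can be sketched as follows. This is not the PR's code: `SessionWrapper`, `first_output`, and `EchoSession` are hypothetical names, and the stand-in session only mimics how ONNX Runtime's `InferenceSession.run` returns a list with one entry per model output.

```python
from typing import TYPE_CHECKING, Any, List

if TYPE_CHECKING:  # numpy is needed only for the annotations here
    import numpy as np


class SessionWrapper:
    """Hypothetical wrapper around an object with an ONNX Runtime style run()."""

    def __init__(self, session: Any, input_name: str) -> None:
        self.session = session
        self.input_name = input_name

    def __call__(self, input_content: "np.ndarray") -> "List[np.ndarray]":
        # Fix 1: annotate the list that run() actually returns.
        return self.session.run(None, {self.input_name: input_content})

    def first_output(self, input_content: "np.ndarray") -> "np.ndarray":
        # Fix 2: keep a single-array return type by indexing explicitly.
        return self(input_content)[0]


class EchoSession:
    """Test double: returns its single input wrapped in a one-element list."""

    def run(self, output_names, input_feed):
        return [next(iter(input_feed.values()))]
```

Which fix fits depends on the model: a multi-output model needs the list form, while a single-output model can expose the indexed form without changing callers.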

@SWHL SWHL added this to the v2.0.0 milestone Jun 22, 2025
@SWHL SWHL self-assigned this Jun 22, 2025
@SWHL SWHL merged commit 38eab97 into main Jun 22, 2025