core: support parsing back slash `\` in `parseKey` in FileStorage (JSON) #27587

fengyuentau · 2025-07-28T08:56:01Z

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

I agree to contribute to the project under Apache 2 License.
To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
The PR is proposed to the proper branch
There is a reference to the original bug report and related work
There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
The feature is well documented and sample code can be built with the project CMake

fengyuentau · 2025-07-28T08:57:22Z

With this PR, tokenizer.json from gpt2 and qwen3 can now be read by cv::FileStorage successfully.

If possible, this PR (along with #27579) should be merged and ported to 5.x soon, to support the ongoing development of tokenizer support (#27534).

modules/core/test/test_io.cpp

asmorkalov · 2025-07-28T10:51:57Z

@fengyuentau What about raw strings in python with triple quotes?

fengyuentau · 2025-07-28T10:54:34Z

> @fengyuentau What about raw strings in python with triple quotes?

@asmorkalov Thank you for the suggestion! It now works as expected with the `r` prefix in Python.

```python
# a.json: {"\"":1,"\\":59,"Ġ\"":366,"\\\\":6852}
import cv2 as cv
import json

# a.json
a = cv.FileStorage("a.json", cv.FileStorage_FORMAT_JSON)
print(type(a), a.getNode("\"").real(), a.getNode(r"\\").real())
# <class 'cv2.FileStorage'> 1.0 59.0

# the file contains a non-ASCII key, so open it explicitly as UTF-8
with open("a.json", "r", encoding="utf-8") as f:
    a = json.load(f)
    print(type(a), a["\""], a["\\"])
    # <class 'dict'> 1 59
```

debug print

modules/core/src/persistence_json.cpp

fengyuentau · 2025-07-28T12:00:23Z

@dkurt @asmorkalov I found multiple sources stating that the backslash must be escaped in JSON strings as well. So when Python prints the parsed keys with a double backslash, that is only for human readability; in memory the key is still a single backslash.

```python
# a.json: {"\"":1,"\\":59,"Ġ\"":366,"\\\\":6852}
import json

with open("a.json", "r", encoding="utf-8") as f:
    a = json.load(f)
    print(a.keys())
    # dict_keys(['"', '\\', 'Ġ"', '\\\\'])
    print(type(a), a["\""], a["\\"], a[r"\\"])
    # <class 'dict'> 1 59 6852
```

What do you think?
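The point above can be checked with only Python's standard `json` module (a small sketch using the ASCII keys of a.json, not part of the PR): the JSON text `"\\"` decodes to a one-character key, and the doubled backslash only appears because printing a container shows each key's `repr`.

```python
import json

# ASCII keys from a.json; the r-prefix keeps the backslashes exactly as in the file.
text = r'{"\"":1,"\\":59,"\\\\":6852}'
data = json.loads(text)

for key in data:
    # len() shows the in-memory length; repr() is what dict_keys printing displays.
    print(len(key), repr(key), data[key])
```

The key written as `"\\"` in the file has length 1, while `"\\\\"` has length 2, which is why `a["\\"]` and `a[r"\\"]` select different entries.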

fengyuentau · 2025-07-28T12:34:23Z

With the latest commit, we can do the following things correctly:

```python
import cv2 as cv
import json

# a.json
a = cv.FileStorage("a.json", cv.FileStorage_FORMAT_JSON)
print(a.getNode("\"").name(), a.getNode("\\").name(), a.getNode(r"\\").name())
# " \ \\
print(type(a), a.getNode("\"").real(), a.getNode("\\").real(), a.getNode(r"\\").real())
# <class 'cv2.FileStorage'> 1.0 59.0 6852.0

with open("a.json", "r", encoding="utf-8") as f:
    a = json.load(f)
    print(a.keys())
    # dict_keys(['"', '\\', 'Ġ"', '\\\\'])
    print(type(a), a["\""], a["\\"], a[r"\\"])
    # <class 'dict'> 1 59 6852

# tokenizer.json
a = cv.FileStorage("tokenizer.json", cv.FileStorage_FORMAT_JSON)
b = a.getNode("model")
c = b.getNode("vocab")
print(type(a), c.getNode("\"").real(), c.getNode("\\").real())
# <class 'cv2.FileStorage'> 1.0 59.0

with open("tokenizer.json", "r", encoding="utf-8") as f:
    a = json.load(f)
    b = a["model"]
    c = b["vocab"]
    print(type(a), c["\""], c["\\"])
    # <class 'dict'> 1 59
```

The only remaining difference is that our keys are not displayed the way Python shows the keys loaded by `json`. I think that is fine, since it is just a different way of presenting the same information.
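For readers following the escaping discussion, here is a minimal Python sketch of what decoding a quoted key involves. `parse_key` and `SIMPLE_ESCAPES` are hypothetical names; this is not the C++ `parseKey` in persistence_json.cpp, only an illustration of the standard JSON string escapes, cross-checked against Python's `json.loads`.

```python
import json

# Hypothetical table of single-character JSON escapes (not OpenCV code).
SIMPLE_ESCAPES = {'"': '"', '\\': '\\', '/': '/', 'b': '\b',
                  'f': '\f', 'n': '\n', 'r': '\r', 't': '\t'}


def parse_key(src, pos):
    """Parse a quoted key starting at src[pos] == '"'.

    Returns (decoded_key, index just past the closing quote).
    """
    if src[pos] != '"':
        raise ValueError(f"expected '\"' at {pos}")
    out = []
    i = pos + 1
    while i < len(src):
        ch = src[i]
        if ch == '"':              # an unescaped quote ends the key
            return ''.join(out), i + 1
        if ch == '\\':             # escape sequence: inspect the next character
            if i + 1 >= len(src):
                break
            esc = src[i + 1]
            if esc in SIMPLE_ESCAPES:
                out.append(SIMPLE_ESCAPES[esc])
                i += 2
            elif esc == 'u':       # \uXXXX; surrogate pairs are not combined here
                out.append(chr(int(src[i + 2:i + 6], 16)))
                i += 6
            else:
                raise ValueError(f"invalid escape \\{esc} at {i}")
        else:
            out.append(ch)
            i += 1
    raise ValueError("unterminated key")


# Cross-check against the standard library on the keys from a.json.
for key_src in [r'"\""', r'"\\"', r'"\\\\"', r'"Ġ\""']:
    key, end = parse_key(key_src, 0)
    assert key == json.loads(key_src) and end == len(key_src)
    print(repr(key))
```

The essential case for this PR is the `'\\'` entry: a backslash inside a key is consumed together with the character after it, so `"\\"` yields one backslash and an escaped quote no longer terminates the key early.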

asmorkalov · 2025-07-28T15:42:08Z

I would say this is a case where a real file in opencv_extra is better than a hard-coded solution. Hard-coding is error-prone; with a file, we do not need to re-check escape sequences and other syntax sugar of a particular programming language.

fengyuentau · 2025-07-29T07:00:06Z

> I would say, that it's the case, when real file on opencv_extra is better than hard-coded solution. It's error prone and with file with do not need to re-check escape sequences and other syntax sugar in particular programming language.

The current test uses a raw string literal; that should be clear enough for readability and maintainability.
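The raw-string argument can be illustrated in Python (the test in test_io.cpp is C++, where a raw string literal such as `R"(...)"` plays the same role; this snippet is only an analogy): a raw literal keeps the JSON text exactly as it appears on disk, while an ordinary literal needs every backslash doubled once more.

```python
import json

# The same JSON document written two ways.
raw = r'{"\\": 59, "\\\\": 6852}'          # backslashes exactly as in the .json file
plain = '{"\\\\": 59, "\\\\\\\\": 6852}'   # each backslash doubled for the Python lexer

assert raw == plain                        # identical text once Python unescapes it
print(json.loads(raw))                     # keys: one backslash and two backslashes
```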

feat: support back slash in parseKey

82e1fe8

fengyuentau added this to the 4.13.0 milestone Jul 28, 2025

fengyuentau requested review from vpisarev, asmorkalov and dkurt July 28, 2025 08:56

fengyuentau added the feature, category: core, and port to 5.x is needed labels Jul 28, 2025

dkurt reviewed Jul 28, 2025

modules/core/test/test_io.cpp (outdated, resolved)

fix: partially fix problems with quote

398b6a6

This comment was marked as outdated.

dkurt previously approved these changes Jul 28, 2025

dkurt reviewed Jul 28, 2025

modules/core/src/persistence_json.cpp (outdated, resolved)

dkurt reviewed Jul 28, 2025

modules/core/src/persistence_json.cpp (outdated, resolved)

fix: fix according to comments and fix accuracy

525b0a1

vpisarev approved these changes Jul 29, 2025

asmorkalov approved these changes Jul 29, 2025

asmorkalov assigned vpisarev Jul 29, 2025

asmorkalov merged commit 07cf36c into opencv:4.x Jul 29, 2025
54 of 55 checks passed

fengyuentau deleted the 4x/core/filestorage_json_support_backslash branch July 30, 2025 05:55

asmorkalov mentioned this pull request Jul 30, 2025

5.x merge 4.x #27604

Merged
