Load original file metadata when loading Xliff 1.2 files #29148

Merged (1 commit) on Jan 13, 2019

eternoendless

| Q             | A
| ------------- | ---
| Branch?       | master
| Bug fix?      | no
| New feature?  | no
| BC breaks?    | no
| Deprecations? | no
| Tests pass?   | yes
| Fixed tickets | n/a
| License       | MIT
| Doc PR        | n/a

At PrestaShop, we maintain our translation catalog automatically using an internal tool based on our [TranslationToolsBundle](https://github.com/PrestaShop/TranslationToolsBundle), which is capable of reverse-building a MessageCatalogue by parsing the source code and then saving it to Xliff files.

So far, this tool has only been capable of building catalogs from scratch. We are now moving to an incremental catalog, where we only add new wordings and keep old ones even if they are no longer present in the code (for backward compatibility). To do that, instead of starting from an empty MessageCatalogue, we load our current catalog using XliffFileLoader and use the resulting MessageCatalogue as a base. Easy peasy. But then we found a problem...
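
As a rough illustration of the incremental approach, here is a minimal sketch that reduces a catalogue to plain `id => translation` arrays (an assumption; the real workflow goes through MessageCatalogue). The `mergeIncremental()` helper is hypothetical: it keeps every existing entry, including wordings no longer found in the code, and only adds ids that are new.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: merge freshly extracted wordings into an existing
 * catalogue (message id => translation) without dropping entries that are
 * no longer found in the source code.
 *
 * @param array<string, string> $existing  catalogue loaded from the Xliff files
 * @param array<string, string> $extracted wordings extracted from the code
 * @return array<string, string>
 */
function mergeIncremental(array $existing, array $extracted): array
{
    // `+` keeps every key of the left operand and only adds the keys of the
    // right operand that are missing on the left: existing entries win and
    // wordings that disappeared from the code are preserved.
    return $existing + $extracted;
}

// 'Cancel' is no longer found in the code but must be kept.
$existing = ['Save' => 'Enregistrer', 'Cancel' => 'Annuler'];
$extracted = ['Save' => 'Save', 'New wording' => 'New wording'];

print_r(mergeIncremental($existing, $extracted));
```

The union operator is used rather than `array_merge()` because `array_merge()` would let the right-hand array overwrite existing string keys and would renumber integer keys.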

The Xliff 1.2 standard defines a list of `<trans-unit>` elements within a collection of `<file>` elements. The `<file>` element has a required attribute named `original`, which is supposed to contain the name of the file in which the wordings are used (at least in our case, it does). **This attribute is currently ignored by XliffFileLoader**.
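
To make the structure concrete, the sketch below parses a made-up XLIFF 1.2 document with PHP's SimpleXML. It shows that a flat `//xliff:trans-unit` query returns the units but no trace of the `<file>` they came from, whereas `original` can be read by iterating over the `<file>` elements. The file names and the `xliff` prefix are illustrative assumptions.

```php
<?php
declare(strict_types=1);

// Made-up XLIFF 1.2 document: `original` lives on <file>, not on <trans-unit>.
$xliff = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="admin/Product.php" source-language="en" datatype="plaintext">
    <body>
      <trans-unit id="1"><source>Save</source><target>Save</target></trans-unit>
    </body>
  </file>
  <file original="front/Cart.php" source-language="en" datatype="plaintext">
    <body>
      <trans-unit id="2"><source>Checkout</source><target>Checkout</target></trans-unit>
    </body>
  </file>
</xliff>
XML;

$root = simplexml_load_string($xliff);
if ($root === false) {
    throw new RuntimeException('Could not parse the sample document.');
}
// The document uses a default (unprefixed) namespace, so XPath needs a prefix.
$root->registerXPathNamespace('xliff', 'urn:oasis:names:tc:xliff:document:1.2');

// Fetching every <trans-unit> at once returns the units, but nothing on
// them records which <file> (and therefore which `original`) they came from.
$flatUnits = $root->xpath('//xliff:trans-unit');

// The `original` attribute has to be read from the <file> elements.
$originals = [];
foreach ($root->xpath('//xliff:file') as $file) {
    $originals[] = (string) $file['original'];
}

echo count($flatUnits), ' units in ', count($originals), ' files: ', implode(', ', $originals), PHP_EOL;
```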

This means that it is currently impossible to read an Xliff 1.2 file using XliffFileLoader and save it back to Xliff without losing data.

This Pull Request adds a new `file` entry to the messages' metadata (alongside `notes`, `target-attributes` and `id`). Right now it only contains `original`, but it could be extended to hold the other attributes of the `<file>` element if needed.
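
As a hedged illustration of why this metadata makes a lossless round trip possible, the sketch below assumes per-message metadata shaped like `['id' => ..., 'file' => ['original' => ...]]` (the exact layout is an assumption) and uses a hypothetical `groupByOriginal()` helper that a dumper could call to rebuild one `<file original="...">` element per group. Messages without a `file` entry fall back to an arbitrary placeholder name.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: group message ids by the `original` file recorded in
 * their metadata, so a dumper can emit one <file original="..."> per group.
 *
 * @param array<string, array> $metadata message id => metadata array
 * @return array<string, list<string>> original => list of message ids
 */
function groupByOriginal(array $metadata, string $fallback = 'file.ext'): array
{
    $groups = [];
    foreach ($metadata as $id => $meta) {
        // Messages without the new `file` entry fall back to a placeholder.
        $original = $meta['file']['original'] ?? $fallback;
        // Cast back to string: numeric-string ids become integer array keys.
        $groups[$original][] = (string) $id;
    }

    return $groups;
}

$metadata = [
    'Save' => ['id' => '1', 'file' => ['original' => 'admin/Product.php']],
    'Checkout' => ['id' => '2', 'file' => ['original' => 'front/Cart.php']],
    'Cancel' => ['id' => '3', 'file' => ['original' => 'admin/Product.php']],
    'Legacy' => ['id' => '4'], // e.g. loaded without any `file` information
];

print_r(groupByOriginal($metadata));
```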

This required a small change in the loader: it now loops through the `<file>` elements and fetches the `<trans-unit>` children of each one, instead of fetching all `<trans-unit>` elements at once.
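
The loader change can be sketched roughly as follows. This is a simplified, hypothetical `loadXliff12()` function built on SimpleXML, not Symfony's actual `XliffFileLoader` code: it queries `<file>` elements first, then the `<trans-unit>` children of each, and stores that file's `original` in every message's metadata. It also registers the `xliff` prefix on each `<file>` element before querying it, on the assumption (based on how SimpleXML appears to scope XPath namespace registrations to the object they were made on) that the root's registration is not inherited.

```php
<?php
declare(strict_types=1);

const XLIFF_12_NS = 'urn:oasis:names:tc:xliff:document:1.2';

/**
 * Hypothetical, simplified loader: walks <file> elements first, then their
 * <trans-unit> children, recording each unit's `original` file in metadata.
 *
 * @return array{messages: array<string, string>, metadata: array<string, array>}
 */
function loadXliff12(string $xml): array
{
    $root = simplexml_load_string($xml);
    if ($root === false) {
        throw new InvalidArgumentException('Invalid XML.');
    }
    $root->registerXPathNamespace('xliff', XLIFF_12_NS);

    $messages = [];
    $metadata = [];
    foreach ($root->xpath('//xliff:file') as $file) {
        $original = (string) $file['original'];
        // Register the prefix on this element too before querying from it.
        $file->registerXPathNamespace('xliff', XLIFF_12_NS);

        foreach ($file->xpath('.//xliff:trans-unit') as $unit) {
            $source = (string) $unit->source;
            // Fall back to <source> when the unit has no <target>.
            $messages[$source] = isset($unit->target) ? (string) $unit->target : $source;
            $metadata[$source] = [
                'id' => (string) $unit['id'],
                'file' => ['original' => $original],
            ];
        }
    }

    return ['messages' => $messages, 'metadata' => $metadata];
}

$sample = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<xliff version="1.2" xmlns="urn:oasis:names:tc:xliff:document:1.2">
  <file original="admin/Product.php" source-language="en" target-language="fr" datatype="plaintext">
    <body>
      <trans-unit id="1"><source>Save</source><target>Enregistrer</target></trans-unit>
    </body>
  </file>
  <file original="front/Cart.php" source-language="en" datatype="plaintext">
    <body>
      <trans-unit id="2"><source>Checkout</source></trans-unit>
    </body>
  </file>
</xliff>
XML;

$result = loadXliff12($sample);
print_r($result['metadata']);
```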

nicolas-grekas (Member) commented Nov 8, 2018

> This attribute is currently ignored by XliffFileLoader.

Does it mean we have a spec violation? Should we consider this a bug?

eternoendless (Author) replied:

> Does it mean we have a spec violation?

I guess it depends on where you draw the line regarding the scope of XliffFileLoader and MessageCatalogue. Is it just about loading the translatable messages and making them available to the Translator? Then it's not a bug.

In my opinion, loading metadata like notes is a "nice to have" that actually goes beyond the scope of MessageCatalogue. It could be considered a "leak" of vendor-specific (Xliff in this case) data into MessageCatalogue. In the end, I don't think it hurts, apart from a little overhead when reading the files and a small increase in the catalogue's memory footprint.

nicolas-grekas (Member) left a review

LGTM, with one minor comment. Thanks.

fabpot (Member) left a review

Three things should be done before merging:

  • Take Nicolas's comment into account
  • Rebase on current master
  • Add a note in the CHANGELOG.

eternoendless force-pushed the load-original-file-info branch from f29c56b to d3e586e on January 7, 2019 10:40
eternoendless (Author) commented Jan 7, 2019

Comment and rebase done 👍

I see that in the CHANGELOG, notes are grouped under released versions. Should I add a 4.3.0 section, or should I only add the note with no title?

stof (Member) commented Jan 7, 2019

This should go in 4.3.0, yes.

```php
// If the xlf file has another encoding specified, try to convert it because
// simple_xml will always return utf-8 encoded values
$target = $this->utf8ToCharset((string) (isset($translation->target) ? $translation->target : $translation->source), $encoding);
$file->registerXPathNamespace('xliff', $namespace);
```
Review comment (Member):

I don't think you need to register the namespace again. The registration on the root element should already be enough.

Reply (Author):

I think I recall having namespace issues without it.

fabpot force-pushed the load-original-file-info branch from e178ff1 to 4073319 on January 13, 2019 16:47
fabpot (Member) commented Jan 13, 2019

Thank you @eternoendless.

fabpot merged commit 4073319 into symfony:master on Jan 13, 2019
fabpot added a commit that referenced this pull request Jan 13, 2019
…es (eternoendless)

This PR was squashed before being merged into the 4.3-dev branch (closes #29148).

Commits
-------

4073319 Load original file metadata when loading Xliff 1.2 files
nicolas-grekas modified the milestones: next, 4.3 on Apr 30, 2019
fabpot mentioned this pull request on May 9, 2019
eternoendless deleted the load-original-file-info branch on March 12, 2021 10:12