-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Feature/bookmarks #445
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Feature/bookmarks #445
Conversation
Okay, a few things:
I'll see if there are some other comments I can make on actual commits. |
39066d8
to
464667a
Compare
Okay, there are a few other things, but let's address them on the next pass. Does this give you enough to noodle on for the moment? |
Okay, I agree that we can solve this in the oxml layer. However I do not see how finding a matching bookmarkEnd solves the issue of seperating two bookmarksStarts: It's likely that one wants to find the nearest <w:document>
<w:body>
<w:bookmarkStart w:name="test_one" w:id="1" />
<w:p>
<w:r>
<w:t>test</w:t>
</w:r>
</w:p>
<w:bookmarkStart w:name="test_two" w:id="2" />
<w:p>
<w:r>
<w:t>test_line_2</w:t>
</w:r>
</w:p>
<w:bookmarkEnd w:id="1" />
<w:bookmarkEnd w:id="2" />
<w:sectPr w:rsidRPr="0006063C" w:rsidR="00FC693F" w:rsidSect="00034616">
<w:pgSz w:w="12240" w:h="15840" />
<w:pgMar w:top="1440" w:right="1800" w:bottom="1440" w:left="1800" w:header="720" w:footer="720" w:gutter="0" />
<w:cols w:space="720" />
<w:docGrid w:linePitch="360" />
</w:sectPr>
</w:body>
</w:document> In order to solve this, I think we do need some find the corresponding If I understand your code correctly this should be something like this:" class CT_BookmarkStart(...):
def match_bookmarkstart(self, name):
"""Returns `w:bookmarkStart` element having matching name, None if not present."""
root_element = self.getroottree().getroot()
matching_bookmarkStarts = root_element.xpath(
'.//w:bookmarkStart[@w:name=\'%s\']' % name
)
if not matching_bookmarkStarts:
return None
return matching_bookmarkStarts[0] |
84d926e
to
2e9971f
Compare
So the latest commit is capable of both generating bookmarks assuming the closed bookmarkStart belongs to the bookmarkEnd and it can match bookmarks based on the name if one wants to add bookmarks in some unconventional order. Let me know what you think. On an other note if you rebase the branch you basically lose all changes, but maybe down the road, you come back on some choice made in the past. How do you manage this? Creat a local duplicate branch of some of the older versions squashed or dropped in the rebase? |
I don't see why we would need to find a document.end_bookmark(bookmark) The class Document(...):
...
def end_bookmark(self, bookmark):
"""Close *bookmark* at current end of document."""
self._element.add_bookmarkEnd(bookmark.id) |
This does set me thinking though. In the XML Semantics section, we need to discover and describe a few things:
I'm thinking the def end_bookmark(self, bookmark):
"""Close *bookmark* at current end of document."""
if bookmark.is_closed:
raise ValueError('cannot end closed bookmark')
self._element.add_bookmarkEnd(bookmark.id) |
If a client wants to close bookmarks in an unconventional order, they are free to do that (assuming any overlaps etc. turn out to be allowable); all they need do is keep track of the This is a natural thing to do and gives the client full flexibility, for example: outside_bookmark = document.start_bookmark('outside')
add_bookmark_content()
inside_bookmark = document.start_bookmark('inside')
add_nested_bookmark_content()
document.end_bookmark(inside_bookmark)
document.end_bookmark(outside_bookmark) If we try to be "helpful" and essentially guess which one they want, we only give ourselves more work, the frustration inherent in attempting to read minds, and ultimately remove flexibility from the user. I don't know if it has a catchy name like YAGNI does, but the principle is "code that attempts to be "helpful", never is."
Yes, this is quite true. Grooming is destructive. That occasionally comes to bite. I think your mitigation approach has to be consistent with your style. I generally just go for it and rewrite from memory if necessary. If there's a big chunk I think I might want I either comment it out and leave it in until a later grooming pass or copy it out to my |
The document is not compliant, and the bookmark is ignored, the id's should be unique. However, we might want to implement a raise for 'orphaned' bookmarkEnds?
This is possible, its also possible to generate such a structure using the word editor. (selected text, which is already part of another bookmark, and add it to another bookmark ) It's because of this nesting thing, that I made up this match bookmark start code: class CT_BookmarkEnd(BaseOxmlElement):
"""
The ``<w:bookmarkEnd>`` element
"""
bmrk_id = RequiredAttribute('w:id', ST_RelationshipId)
@property
def _next_id(self):
root = self.getroottree().getroot()
return str(len(root.xpath('.//w:bookmarkStart')))
def match_bookmarkstart(self, name):
"""Returns `w:bookmarkStart` element having matching name, None if not present."""
root_element = self.getroottree().getroot()
matching_bookmarkStarts = root_element.xpath(
'.//w:bookmarkStart[@w:name=\'%s\']' % name
)
if not matching_bookmarkStarts:
raise ValueError('Bookmark name not found')
return None
return matching_bookmarkStarts[0].bmrk_id passing the bookmark object back and forth is off course also a good suggestion. I didn't think this was a preferred approach, if it is, it cancels the need for such a 'match_bookmarkstart' method on the |
Okay, so then we definitely want a test with raise (as above) on Btw, the
How would one of these arise? If we get a document with orphaned bookmarkEnds, that's on the client. As long as we don't add (or leave) any ourselves, we're fine. If we have a This is one of a few good reasons to have |
A couple remarks on naming, just so we use the established vocabulary:
I think we've got a good start there, actually probably well toward 80%. Where we're weak is in the specifics of the protocol (which determine the API calls to be implemented, so are critical) and scrubbing through the conversation here to pull out the experimental details, like overlapping is okay, elements with same ID trigger repair error, etc. I'll have another look once you've updated it as far as you can. I expect we'll have some more conversation surrounding the protocol section.
Let's just put them all in There are a lot of moves to figure out for yourself when starting here, so I propose we start with the following single acceptance test and then we can see where we are when that one's passing; sometimes I add new ones in groups. Actually, this one should go in Given a Document object as document
Then document.bookmarks is a Bookmarks object
You'll be developing test-driven (as opposed to spiking, which you can feel free to continue to do). One consequence of this is you don't need to try to figure out what the unit tests will look like as a whole; they will emerge in a natural pattern as you go. Another consequence is that you will be inserting your acceptance test commit immediately after your analysis page commit. So while unit tests may be failing on your spike tip, they will not fail on your test-driven (TDD) tip. The acceptance test will be decorated with When your acceptance test is run, it will fail (or it's not a good test). Where it fails leads you to the first unit test. Let's get to that when we see it. I expect it will give rise to a commit named:
You can add these into your spike if you want, I don't mind either way. Where they will be required, is in every commit up to your TDD tip. |
One last thought. Since it looks like the protocol for each bookmark "parent" will be the same, and even the implementation will be the same (or can be made to be), I'm thinking we want to use a mixin for implementation. If you haven't used one of those before, it's a class that other classes inherit from, but don't "completely" subclass. So we would have for example: class Paragraph(Parented, BookmarkParent):
...
# and in another module
class BookmarkParent(object):
def start_bookmark(self, name):
...
def end_bookmark(self, bookmark)
... This way we don't have a bunch of redundant code hanging around and also reduces the number of unit tests required (if we're clever about how we do that). |
Hey @scanny , unfortunately I didn't have much time to work on this project. However I did look into your mixin proposal. I think this could work for the class BookmarkParent(object):
def start_bookmark(self, name):
bookmarkstart = self._element.add_bookmarkStart(name)
return Bookmark(bookmarkstart)
def end_bookmark(self, name=None):
bookmarkend = self._element.add_bookmarkEnd(name)
return Bookmark(bookmarkend) However, this leads to a problem in the class BookmarkParent(object):
def start_bookmark(self, name):
if hasattr(self, '_body'):
self._element = self._body
bookmarkstart = self._element.add_bookmarkStart(name)
return Bookmark(bookmarkstart)
def end_bookmark(self, bookmark=None):
if hasattr(self, '_body'):
self._element = self._body
bookmarkend = self._element.add_bookmarkEnd(bookmark)
if bookmark is None:
bookmarkend._next_id
else:
bookmarkend._id = bookmark._id
return Bookmark(bookmarkend) Also, the end_bookmark function has the optional argument Everything seems to be in working order. let me know what you think of this solution. Ben |
I'm thinking that
|
Ah, that would be an elegant solution for our "problem" I'll see if I have some time this evening to try and make it work. |
If I understand you correctly, you end up with: class BlockItemContainer(Parented, BookmarkParent): But this doesn't solve this error: Which isn't there when I keep the I do have to change, '_body' with '_element' in the _Body() class, so I end up with this code below: class _Body(BlockItemContainer):
"""
Proxy for ``<w:body>`` element in this document, having primarily a
container role.
"""
def __init__(self, body_elm, parent):
super(_Body, self).__init__(body_elm, parent)
self._body = body_elm
def clear_content(self):
"""
Return this |_Body| instance after clearing it of all content.
Section properties for the main document story, if present, are
preserved.
"""
self._body.clear_content()
return self
def add_bookmarkStart(self, name):
bookmarkStart = self._element._add_bookmarkStart()
bookmarkStart.name = name
bookmarkStart.bmrk_id = bookmarkStart._next_id
return bookmarkStart
def add_bookmarkEnd(self, bookmark=None):
bookmarkEnd = self._element._add_bookmarkEnd()
if bookmark is None:
bookmarkEnd.bmrk_id = bookmarkEnd._next_id
else:
bookmarkEnd.bmrk_id = bookmark._id
return bookmarkEnd But to be honest, I'm lost. I do not understand how I could get this to work. |
Ok, here are some ideas on how this would work, working from the outside in:
Does that make sense? |
1a77bf3
to
06439d0
Compare
Yes it did! Thanks a lot @scanny I completely overlooked the fact that we can offcourse relay the call using So I've found some time to implement the first working version, and rebased it onto the first stab at the documentation. I've also updated the 'TO DO' list: Plan:
Feature docsI've looked over our conversation and the following things came up:
Functional tests
Unit-testsSince we opted for the
However, there seem to be some tests that are failing? |
The code reproduces the XML documentation example, however my implementation starts at 1. whereas word starts at zero. This should probably be in line. doc = Document()
par = doc.add_paragraph()
par.add_run('Example')
bmrk = par.start_bookmark(name='sampleBookmark')
par.add_run(' text.')
par = doc.add_paragraph()
par.add_run('Example')
par.end_bookmark(bmrk)
par.add_run(' text.') <w:p>
<w:r>
<w:t>Example</w:t>
</w:r>
<w:bookmarkStart w:name="sampleBookmark" w:id="1" />
<w:r>
<w:t xml:space="preserve"> text.</w:t>
</w:r>
</w:p>
<w:p>
<w:r>
<w:t>Example</w:t>
</w:r>
<w:bookmarkEnd w:id="1" />
<w:r>
<w:t xml:space="preserve"> text.</w:t>
</w:r>
</w:p>
xsd excerp: <w:p>
<w:r>
<w:t>Example</w:t>
</w:r>
<w:bookmarkStart w:id="0" w:name="sampleBookmark" />
<w:r>
<w:t xml:space="preserve"> text.</w:t>
</w:r>
</w:p>
<w:p>
<w:r>
<w:t>Example</w:t>
</w:r>
<w:bookmarkEnd w:id="0" />
<w:r>
<w:t xml:space="preserve"> text.</w:t>
</w:r>
</w:p> |
As you probably have noticed, I've tried to do my first TDD tip, let me know whether this is how it should be done. regards, Ben |
I don't think I've seen a use case yet that would drive the need for this. It sounds like a good thing to have and would be easy enough to implement, but I propose we wait until we have a live need for it. While we're talking about use cases, have you thought about any higher-level use cases? Maybe captions would be a good place to start, so thinking through how a developer (or we) would use bookmarks to implement captions. So something like this added to the protocol section of the analysis document: >>> # ---add caption ---
>>> caption_bookmark = document.start_bookmark('Picture Caption')
>>> document.add_picture(...)
>>> document.end_bookmark(caption_bookmark) ...or whatever that would need to look like. I'm not sure where the caption text goes in but expect you've got an idea. Based on these use cases, we'll be able to prioritize individual methods and properties to accomplish meaningful new capabilities rather than just nice-to-have ones :)
Okay, this is interesting. I wonder how it tells where the bookmark ends? I'd be inclined for us to adopt a "non-greedy" policy and identify the first-encountered In any case, we should make sure that we do not add any bookmarks with duplicate keys. An implementation of Btw, did we already establish what, if anything, happens when the bookmark end appears before the bookmark start? This one's another for the experimental results (Word Behaviors) section of the analysis document
Good to know. Let's make sure this is stated in the Word behaviors section of the analysis document.
Yes. There are subtle differences in the case of the various XML-element-related names. The XML Schema names are always UpperCamelCase (with the prefix being all-caps, e.g. CT, ST, EG, etc.). These are always copied exactly when used as class names. Likewise lowerCamelCase for element tag names and attributes (and the variables that represent those elements and attributes). This convention provides useful visual cues for what kind of thing you're looking at in the code.
I'm inclined to make the second one just
As we noted earlier, there should be no failing tests on your TDD tip, and failing tests on the spike tip are of no concern. If an existing test starts failing on a particular TDD commit, you fix it then. Generally this will be because you changed existing code, not because you added some. |
Thanks for looking at my commits @scanny, it's unfortunate that I had a different approach, but looking at your implementation, I can definitely see the elegance. I didn't know that it was possible to override the _add_bookmarkStart method on OxmlElement level. I've created acceptance tests for the end_bookmark() method. I thought this part of the implementation was really straightforward, but please let me know if I overlooked any underlying complexities. As soon as we can get these commits merged, I would really like to continue to implement the |
@scanny |
Yeah, that looks about right to me. I suppose inside a |
@scanny I've tested this with inline bookmarks using the word editor and it results in different paragraph level elements "in between" the run elements: <w:p w:rsidR="009039EA" w:rsidRDefault="005A2B2B" w14:paraId="79A0AC1D" w14:textId="2DF56D06">
<w:r>
<w:t xml:space="preserve">Inline </w:t>
</w:r>
<w:bookmarkStart w:name="test" w:id="0" />
<w:r>
<w:t>bookmark</w:t>
</w:r>
<w:bookmarkEnd w:id="0" />
<w:r>
<w:t xml:space="preserve"> text</w:t>
</w:r>
</w:p> |
bdbd6fe
to
a04ef91
Compare
Dear @scanny I've added the TDD commits to support paragraph level bookmarks as well. If you could review them and tell me if there is something missing, that would be most appreciated. After these commits the acceptance tests are exhausted again. Do we need additional functionalities? Perhaps a delete bookmark but besides that one I would say we are getting close to finalizing this feature :D |
@scanny, did you find any time to take a look at my latest commits? Feedback would be much appreciated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here are some quick comments on this next commit. If you can get these in I'll see if I can move us along. Sorry I only have time to review one commit at the moment :)
a04ef91
to
57b6ffe
Compare
@scanny Any thoughts Steve? |
@scanny If you could take a look at the functional tests and perhaps provide some feedback on some of the other commits we can perhaps get this PR merged! |
@scanny Can you provide me with some feedback so we can some of this merged? |
@scanny it's been another 3 months, would it be possible to get a review on the PR? If you rather discuss the way forward in a google meet or zoom meeting, that would be okay as well. regards, Ben |
Hey there, I've researching the development of this feature. It would be really useful to have for my use case. Will this be merged? Is there anything I can help with? Thanks, |
Hi @drewh1991, I have every intention of getting this merged, however the progress stalled pending review of @scanny. I do think the biggest part of the work on this is done so it would be great to regain some momentum and get it merged. regards, Ben |
Dear @scanny,
I've created a pull request like you requested. I'm trying to figure out how to connect this pull request with the issue I created: #425
regards,