Skip to content

Feature/bookmarks #445

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 44 commits into
base: master
Choose a base branch
from

Conversation

Benjamin-T
Copy link

Dear @scanny,

I've created a pull request like you requested. I'm trying to figure out how to connect this pull request with the issue I created: #425

regards,

  • Ben

@scanny
Copy link
Contributor

scanny commented Nov 11, 2017

Okay, a few things:

  • I don't believe there's a way to "join" a pull request with an issue. I think the best you can do is to add a link from each to the other so it's easy to navigate. It won't matter much, though, once we get a little further along, as all the conversation will be in this thread rather than the other.

  • It's really important to get the rebase thing figured out so the changes you propose in any given commit reflect the final state of that code and are not reversed or modified in a subsequent commit. So if you change your mind about something, you go back to that commit, make the changes, amend the commit (git commit -v --amend --reuse-message=HEAD) and then rebase any later commits onto the one you've just amended.

    This gets really fast once you get the hang of it. What it produces is a successively refined set of "clean" commits that each accomplish a specific purpose. Each of these can be reviewed and merged individually as they pass muster. If you review historical commits you'll see this pattern over and over (except during some refactoring "periods"). A feature starts with a "docs: document XYZ analysis" commit, goes on to an "acpt: add scenarios for ..." acceptance test commit, then a series of individual property or method commits with its respective unit test. Each one is focused, relatively small (usually), and easy to validate.

    Otherwise, it's extremely difficult to see what the changes really are, because they're distributed across multiple commits. It would basically make it more work for the committer (me) than doing it myself.

    If you're a little bit wiggly on what rebasing is actually doing, I recommend this document: https://jwiegley.github.io/git-from-the-bottom-up/. It's the one that produced the "Aha!" moment for me way back when and delivered me into Git fluency (by understanding what I was doing, really, with each command).

I'll see if there are some other comments I can make on actual commits.

@Benjamin-T Benjamin-T force-pushed the feature/bookmarks branch 4 times, most recently from 39066d8 to 464667a Compare November 15, 2017 22:10
@scanny
Copy link
Contributor

scanny commented Nov 16, 2017

Okay, there are a few other things, but let's address them on the next pass. Does this give you enough to noodle on for the moment?

@Benjamin-T
Copy link
Author

Okay, I agree that we can solve this in the oxml layer. However I do not see how finding a matching bookmarkEnd solves the issue of seperating two bookmarksStarts:

It's likely that one wants to find the nearest bookmarkStart en add this ID to the bookmarkEnd. However, if I want to generate something like the example below, I have at the moment of inserting the bookmarkEnd two potential bookmarkStarts.

<w:document>
  <w:body>
    <w:bookmarkStart w:name="test_one" w:id="1" />
    <w:p>
      <w:r>
        <w:t>test</w:t>
      </w:r>
    </w:p>
    <w:bookmarkStart w:name="test_two" w:id="2" />
    <w:p>
      <w:r>
        <w:t>test_line_2</w:t>
      </w:r>
    </w:p>
    <w:bookmarkEnd w:id="1" />
    <w:bookmarkEnd w:id="2" />
    <w:sectPr w:rsidRPr="0006063C" w:rsidR="00FC693F" w:rsidSect="00034616">
      <w:pgSz w:w="12240" w:h="15840" />
      <w:pgMar w:top="1440" w:right="1800" w:bottom="1440" w:left="1800" w:header="720" w:footer="720" w:gutter="0" />
      <w:cols w:space="720" />
      <w:docGrid w:linePitch="360" />
    </w:sectPr>
  </w:body>
</w:document>

In order to solve this, I think we do need some find the corresponding bookmarkStart based on name.

If I understand your code correctly this should be something like this:"

class CT_BookmarkStart(...):
    def match_bookmarkstart(self, name):
        """Returns `w:bookmarkStart` element having matching name, None if not present."""
        root_element = self.getroottree().getroot() 
        matching_bookmarkStarts = root_element.xpath(
            './/w:bookmarkStart[@w:name=\'%s\']' % name
        )
        if not matching_bookmarkStarts:
            return None
        return matching_bookmarkStarts[0]

@Benjamin-T
Copy link
Author

So the latest commit is capable of both generating bookmarks assuming the closed bookmarkStart belongs to the bookmarkEnd and it can match bookmarks based on the name if one wants to add bookmarks in some unconventional order.

Let me know what you think.

On an other note if you rebase the branch you basically lose all changes, but maybe down the road, you come back on some choice made in the past. How do you manage this? Creat a local duplicate branch of some of the older versions squashed or dropped in the rebase?

@scanny
Copy link
Contributor

scanny commented Nov 17, 2017

Okay, I agree that we can solve this in the oxml layer. However I do not see how finding a matching bookmarkEnd solves the issue of seperating two bookmarksStarts:

It's likely that one wants to find the nearest bookmarkStart en add this ID to the bookmarkEnd. However, if I want to generate something like the example below, I have at the moment of inserting the bookmarkEnd two potential bookmarkStarts.

I don't see why we would need to find a bookmarkStart in order to close it. The client tells us explicitly which bookmark they want to close by passing it to us:

document.end_bookmark(bookmark)

The bookmark object knows where its bookmarkStart object is (bookmark._element) and even has an .id property on it, so implementing .end_bookmark() is almost trivial:

class Document(...):
    ...
    def end_bookmark(self, bookmark):
    """Close *bookmark* at current end of document."""
    self._element.add_bookmarkEnd(bookmark.id)

@scanny
Copy link
Contributor

scanny commented Nov 17, 2017

This does set me thinking though. In the XML Semantics section, we need to discover and describe a few things:

  • What, if any, is the allowable "nesting" behavior of bookmarks? In particular, can a bookmark appear entirely within another bookmark? Can bookmarked areas overlap or must they appear in strictly nested order?

  • What happens if a bookmarkEnd element appears before its matching bookmarkStart element in a document? Is this allowable (like the user selected a block of text from bottom to top) or does it perhaps trigger a repair error (or just get ignored and perhaps confuse a developer).

    I don't think we'll have to do anything special to avoid problems with this because ending a bookmark requires an open bookmark as a parameter, which by definition means the start precedes the end.

  • What happens if two bookmarkEnd elements appear with the same id? (I.e. the bookmark is closed "twice").

I'm thinking the Document.end_bookmark() method needs to check for the "already closed" condition and raise an exception in that case.

def end_bookmark(self, bookmark):
    """Close *bookmark* at current end of document."""
    if bookmark.is_closed:
        raise ValueError('cannot end closed bookmark')
    self._element.add_bookmarkEnd(bookmark.id)

@scanny
Copy link
Contributor

scanny commented Nov 17, 2017

So the latest commit is capable of both generating bookmarks assuming the closed bookmarkStart belongs to the bookmarkEnd and it can match bookmarks based on the name if one wants to add bookmarks in some unconventional order.

Let me know what you think.

If a client wants to close bookmarks in an unconventional order, they are free to do that (assuming any overlaps etc. turn out to be allowable); all they need do is keep track of the Bookmark objects they received on each start, and end those where they want to.

This is a natural thing to do and gives the client full flexibility, for example:

outside_bookmark = document.start_bookmark('outside')
add_bookmark_content()
inside_bookmark = document.start_bookmark('inside')
add_nested_bookmark_content()
document.end_bookmark(inside_bookmark)
document.end_bookmark(outside_bookmark)

If we try to be "helpful" and essentially guess which one they want, we only give ourselves more work, the frustration inherent in attempting to read minds, and ultimately remove flexibility from the user. I don't know if it has a catchy name like YAGNI does, but the principle is "code that attempts to be "helpful", never is."

On an other note if you rebase the branch you basically lose all changes, but maybe down the road, you come back on some choice made in the past. How do you manage this? Create a local duplicate branch of some of the older versions squashed or dropped in the rebase?

Yes, this is quite true. Grooming is destructive. That occasionally comes to bite.

I think your mitigation approach has to be consistent with your style. I generally just go for it and rewrite from memory if necessary. If there's a big chunk I think I might want I either comment it out and leave it in until a later grooming pass or copy it out to my _scratch/ directory (you'll see that's in the .gitignore file). I think the latter is probably better for a collaboration scenario so you don't have to explain a big red patch of commented stuff in the commit. But in the end you should do what you're comfortable with. I have occasionally kept a separate repo in like python-pptx-old/ or something as a peer project, just for potential scrap recovery while working through a big spike.

@Benjamin-T
Copy link
Author

Benjamin-T commented Nov 17, 2017

What happens if two bookmarkEnd elements appear with the same id? (I.e. the bookmark is closed "twice").

The document is not compliant, and the bookmark is ignored, the id's should be unique. However, we might want to implement a raise for 'orphaned' bookmarkEnds?

What, if any, is the allowable "nesting" behavior of bookmarks? In particular, can a bookmark appear > entirely within another bookmark? Can bookmarked areas overlap or must they appear in strictly nested order?

This is possible, its also possible to generate such a structure using the word editor. (selected text, which is already part of another bookmark, and add it to another bookmark )

It's because of this nesting thing, that I made up this match bookmark start code:

class CT_BookmarkEnd(BaseOxmlElement):
    """
    The ``<w:bookmarkEnd>`` element
    """
    bmrk_id = RequiredAttribute('w:id', ST_RelationshipId)

    @property
    def _next_id(self):
        root = self.getroottree().getroot()
        return str(len(root.xpath('.//w:bookmarkStart')))

    def match_bookmarkstart(self, name):
        """Returns `w:bookmarkStart` element having matching name, None if not present."""
        root_element = self.getroottree().getroot() 
        matching_bookmarkStarts = root_element.xpath(
            './/w:bookmarkStart[@w:name=\'%s\']' % name
        )
        if not matching_bookmarkStarts:
            raise ValueError('Bookmark name not found')
            return None
        return matching_bookmarkStarts[0].bmrk_id

passing the bookmark object back and forth is off course also a good suggestion. I didn't think this was a preferred approach, if it is, it cancels the need for such a 'match_bookmarkstart' method on the CT_BookmarkEnd element

@scanny
Copy link
Contributor

scanny commented Nov 17, 2017

What happens if two bookmarkEnd elements appear with the same id? (I.e. the bookmark is closed "twice").

The document is not compliant, and the bookmark is ignored, the id's should be unique.

Okay, so then we definitely want a test with raise (as above) on .end_bookmark() so we don't end up adding in a second bookmarkEnd for a given bookmark, even if the client asks for it.

Btw, the .is_closed property should search for the bookmarkEnd element each time it is called and not store a reference to it in the proxy. Otherwise it could get out of sync if there ended up being two proxy objects for the same bookmark (which could easily happen).

However, we might want to implement a raise for 'orphaned' bookmarkEnds?

How would one of these arise? If we get a document with orphaned bookmarkEnds, that's on the client. As long as we don't add (or leave) any ourselves, we're fine. If we have a Bookmark.remove() method (which I think we might want), it needs to get rid of both start and end. With .end_bookmark(bookmark) requiring a valid bookmark object as a parameter (which itself requires a bookmarkStart element), I don't see a way to create one with the API, even accidentally.

This is one of a few good reasons to have .end_bookmark() take a Bookmark object rather than a name as the bookmark identifier. The other I can think of is the ambiguity of name as a "key". Btw, did you work out whether bookmark name is enforced unique yet?

@scanny
Copy link
Contributor

scanny commented Nov 17, 2017

Plan:

I'm eager to continue, so I drafted a plan for the next steps that are required:

  • Feature docs
  • Functional tests
  • Unit tests
  • Doc strings

A couple remarks on naming, just so we use the established vocabulary:

  • The first one here is either the "analysis page" or the "enhancement proposal" (like a PEP). I generally call it the analysis page. It's always one page, and at this stage its all about capturing the analysis behind the feature (group), so it seems an apt enough name. In any case, it's the term you'll see used over and over again in the project conversation history.

  • We've used the term "acceptance tests" for the ones written in Cucumber and run with Behave. I think that's kind of common for Cucumber language tests, but in any case it's how you'll see them referred to in the conversation history.

Analysis page

I've committed an initial version, I'll update it according the current implementation, but I think it's already more or less okay.

I think we've got a good start there, actually probably well toward 80%. Where we're weak is in the specifics of the protocol (which determine the API calls to be implemented, so are critical) and scrubbing through the conversation here to pull out the experimental details, like overlapping is okay, elements with same ID trigger repair error, etc. I'll have another look once you've updated it as far as you can. I expect we'll have some more conversation surrounding the protocol section.

Acceptance tests

These tests are run using behave, I haven't used this before, but there are many examples in the package. However, I think we need to agree on which feature files are going to be required, since this implementation is accross the document, paragraph and run objects.

As with all other things in this package, my guess is that it needs to be thorough ;) therefore I think we require the following feature files:

  • run-add_bookmark_start.feature
  • run-add_bookmark_end.feature
  • par-add_bookmark_start.feature
  • par-add_bookmark_end.feature
  • doc-add_bookmark_start.feature
  • doc-add_bookmark_end.feature

Let's just put them all in bookmark.feature for now. I've gone back and forth and back again on how to organize acceptance tests, but where I've landed for now is put everything associated with an object type in the same feature. If it gets over a couple hundred lines, we'll find a way to break it apart neatly.

There are a lot of moves to figure out for yourself when starting here, so I propose we start with the following single acceptance test and then we can see where we are when that one's passing; sometimes I add new ones in groups.

Actually, this one should go in doc-access-collections.feature so you won't even need to create a new file initially:

Given a Document object as document
 Then document.bookmarks is a Bookmarks object

Unit-tests

Since we touched upon many other files, I think this will be a bit of work as well, or maybe you think of a more elegant way to implement the bookmark code, but there will be additions required to the following unittest files:

  • \tests\test_document.py
  • \tests\oxml\test_run.py
  • \tests\test_document.py
  • \tests\oxml\text.py
  • \tests\text\test_run.py
  • \tests\text\test_paragraph.py

And a new file should be added:

  • \tests\text\test_bookmark.py

However, there seem to be some tests that are failing?

You'll be developing test-driven (as opposed to spiking, which you can feel free to continue to do). One consequence of this is you don't need to try to figure out what the unit tests will look like as a whole; they will emerge in a natural pattern as you go.

Another consequence is that you will be inserting your acceptance test commit immediately after your analysis page commit. So while unit tests may be failing on your spike tip, they will not fail on your test-driven (TDD) tip. The acceptance test will be decorated with @wip so it can be skipped until it passes and doesn't count as a failing test for interim purposes.

When your acceptance test is run, it will fail (or it's not a good test). Where it fails leads you to the first unit test. Let's get to that when we see it. I expect it will give rise to a commit named: doc: add Document.bookmarks.

Docstrings

You mentioned that you've adopted a certain style of docstring? Anyway, I think I need to add the docstrings in one commit, and you can review them in one go.

You can add these into your spike if you want, I don't mind either way. Where they will be required, is in every commit up to your TDD tip.

@scanny
Copy link
Contributor

scanny commented Nov 17, 2017

One last thought. Since it looks like the protocol for each bookmark "parent" will be the same, and even the implementation will be the same (or can be made to be), I'm thinking we want to use a mixin for implementation. If you haven't used one of those before, it's a class that other classes inherit from, but don't "completely" subclass.

So we would have for example:

class Paragraph(Parented, BookmarkParent):
    ...

# and in another module
class BookmarkParent(object):
    def start_bookmark(self, name):
        ...
    def end_bookmark(self, bookmark)
        ...

This way we don't have a bunch of redundant code hanging around and also reduces the number of unit tests required (if we're clever about how we do that).

@Benjamin-T
Copy link
Author

Benjamin-T commented Nov 23, 2017

Hey @scanny , unfortunately I didn't have much time to work on this project. However I did look into your mixin proposal. I think this could work for the Run and Paragraph part, since these both share the self._element attribute. If we mixin the object like this:

class BookmarkParent(object):
    def start_bookmark(self, name):
        bookmarkstart = self._element.add_bookmarkStart(name)
        return Bookmark(bookmarkstart)

    def end_bookmark(self, name=None):
        bookmarkend = self._element.add_bookmarkEnd(name)
        return Bookmark(bookmarkend)

However, this leads to a problem in the _Body object. I used the following workaround:

class BookmarkParent(object):
    def start_bookmark(self, name):
        if hasattr(self, '_body'):
            self._element = self._body
        bookmarkstart = self._element.add_bookmarkStart(name)
        return Bookmark(bookmarkstart)

    def end_bookmark(self, bookmark=None):
        if hasattr(self, '_body'):
            self._element = self._body
        bookmarkend = self._element.add_bookmarkEnd(bookmark)
        if bookmark is None:
            bookmarkend._next_id
        else:
            bookmarkend._id = bookmark._id
        return Bookmark(bookmarkend)

Also, the end_bookmark function has the optional argument bookmark this means we can get rid of the more complicated match_bookmarkstart method I cooked up last week.

Everything seems to be in working order.

let me know what you think of this solution.

Ben

@scanny
Copy link
Contributor

scanny commented Nov 26, 2017

I'm thinking that BookmarkParent should be subclassed by docx.blkcntr.BlockItemContainer, rather than Body itself (Body subclasses BlockItemContainer).

BlockItemContainer already has an ._element attribute (which Body inherits, by the way), so no need for the if hasattr() bit. In general, any element "can" have an ._element attribute, so if we found a proxy object that didn't have that, we would just add it in the constructor, e.g. self._element = self._body = body. I realized part-way through development that it would be important to have a more abstract name for the proxied element, which is when I started using ._element instead of or in addition to for example ._body. I just didn't go back and add it everywhere it was missing; I just add it in as I encounter a need for it.

@Benjamin-T
Copy link
Author

Ah, that would be an elegant solution for our "problem" I'll see if I have some time this evening to try and make it work.

@Benjamin-T
Copy link
Author

If I understand you correctly, you end up with:

class BlockItemContainer(Parented, BookmarkParent):

But this doesn't solve this error: AttributeError: 'CT_Document' object has no attribute 'add_bookmarkStart'

Which isn't there when I keep the if hasattr() code in place.

I do have to change, '_body' with '_element' in the _Body() class, so I end up with this code below:

class _Body(BlockItemContainer):
    """
    Proxy for ``<w:body>`` element in this document, having primarily a
    container role.
    """

    def __init__(self, body_elm, parent):
        super(_Body, self).__init__(body_elm, parent)
        self._body = body_elm

    def clear_content(self):
        """
        Return this |_Body| instance after clearing it of all content.
        Section properties for the main document story, if present, are
        preserved.
        """
        self._body.clear_content()
        return self

    def add_bookmarkStart(self, name):
        bookmarkStart = self._element._add_bookmarkStart()
        bookmarkStart.name = name
        bookmarkStart.bmrk_id = bookmarkStart._next_id
        return bookmarkStart

    def add_bookmarkEnd(self, bookmark=None):
        bookmarkEnd = self._element._add_bookmarkEnd()
        if bookmark is None:
            bookmarkEnd.bmrk_id = bookmarkEnd._next_id
        else:
            bookmarkEnd.bmrk_id = bookmark._id
        return bookmarkEnd

But to be honest, I'm lost. I do not understand how I could get this to work.

@scanny
Copy link
Contributor

scanny commented Nov 29, 2017

Ok, here are some ideas on how this would work, working from the outside in:

  • Document has a start_bookmark() method. This is what you call as a developer using python-docx.

  • The bookmark is actually added in the w:body element, so Document.start_bookmark() delegates that job to its _Body object by calling the single line return self._body.start_bookmark(). You see this same pattern with other API methods of Document like .add_paragraph() etc.

  • _Body gets its .start_bookmark() method by inheriting it from BlockItemContainer. Its ._element attribute gets set to its w:body element in the super(_Body, self).__init__(body_elm, ...) call in its constructor (init method). This is because the ._element attribute is actually provided by BlockItemContainer and _Body inherits it.

  • So the error you should get, if any, once all this is wired correctly, is CT_Body object (or whatever the body element is called) has no attribute add_bookmarkStart. This is remedied by adding the bookmarkStart = ZeroOrMore(...) line to the CT_Body class.

Does that make sense?

@Benjamin-T
Copy link
Author

Benjamin-T commented Nov 30, 2017

Yes it did! Thanks a lot @scanny I completely overlooked the fact that we can offcourse relay the call using self._body.start_bookmark()

So I've found some time to implement the first working version, and rebased it onto the first stab at the documentation.

I've also updated the 'TO DO' list:

Plan:

  • Feature docs
  • Functional tests
  • Unit tests
  • Doc strings - Of course, these need review.

Feature docs

I've looked over our conversation and the following things came up:

  • Implement a raise for ended bookmarks.
    I've tried to do this, see the current commit.

  • Implement Remove bookmark feature
    No sure on the specifics on this one, do we still, but if I would have to guess, we need to implement this in the Bookmarks object?

  • Are bookmarks enforced unique?
    The word document allows for duplicate keys and it does not trigger a repair error if we there are.
    However, we could implement a warning when aduplicate key is added?

  • Are overlapping bookmarks okay?
    yes, and in the current implementation, it works.

  • Complex type names.
    The current implementation uses the names CT_bookmarkStart and CT_bookmarkEnd however, closer inspection of the XML schema excerpts shows that these probably need to be renamed to CT_Bookmark and CT_MarkupRange respectively.

         <xsd:element name="bookmarkStart"               type="CT_Bookmark"/>
         <xsd:element name="bookmarkEnd"                 type="CT_MarkupRange"/>

Functional tests

  • doc-access-collections.feature
  • doc-bookmark.feature

Unit-tests

Since we opted for the Mixin option, the amount of unittest files that either need revising or need to be created is reduced to two:

  • \tests\test_document.py
  • \tests\text\test_bookmarks.py

However, there seem to be some tests that are failing?

@Benjamin-T
Copy link
Author

The code reproduces the XML documentation example, however my implementation starts at 1. whereas word starts at zero. This should probably be in line.

doc = Document()
par = doc.add_paragraph()
par.add_run('Example')
bmrk = par.start_bookmark(name='sampleBookmark')
par.add_run(' text.')

par = doc.add_paragraph()
par.add_run('Example')
par.end_bookmark(bmrk)
par.add_run(' text.')
    <w:p>
      <w:r>
        <w:t>Example</w:t>
      </w:r>
      <w:bookmarkStart w:name="sampleBookmark" w:id="1" />
      <w:r>
        <w:t xml:space="preserve"> text.</w:t>
      </w:r>
    </w:p>
    <w:p>
      <w:r>
        <w:t>Example</w:t>
      </w:r>
      <w:bookmarkEnd w:id="1" />
      <w:r>
        <w:t xml:space="preserve"> text.</w:t>
      </w:r>
    </w:p>

xsd excerp:

  <w:p>
     <w:r>
       <w:t>Example</w:t>
     </w:r>
     <w:bookmarkStart w:id="0" w:name="sampleBookmark" />
     <w:r>
       <w:t xml:space="preserve"> text.</w:t>
     </w:r>
  </w:p>
  <w:p>
    <w:r>
      <w:t>Example</w:t>
    </w:r>
      <w:bookmarkEnd w:id="0" />
    <w:r>
      <w:t xml:space="preserve"> text.</w:t>
    </w:r>
  </w:p>  

@Benjamin-T
Copy link
Author

As you probably have noticed, I've tried to do my first TDD tip, let me know whether this is how it should be done.

regards,

Ben

@scanny
Copy link
Contributor

scanny commented Dec 1, 2017

  • Implement Remove bookmark feature
    Not sure on the specifics on this one, do we still, but if I would have to guess, we need to implement this in the Bookmarks object?

I don't think I've seen a use case yet that would drive the need for this. It sounds like a good thing to have and would be easy enough to implement, but I propose we wait until we have a live need for it.

While we're talking about use cases, have you thought about any higher-level use cases? Maybe captions would be a good place to start, so thinking through how a developer (or we) would use bookmarks to implement captions. So something like this added to the protocol section of the analysis document:

>>> # ---add caption ---
>>> caption_bookmark = document.start_bookmark('Picture Caption')
>>> document.add_picture(...)
>>> document.end_bookmark(caption_bookmark)

...or whatever that would need to look like. I'm not sure where the caption text goes in but expect you've got an idea.

Based on these use cases, we'll be able to prioritize individual methods and properties to accomplish meaningful new capabilities rather than just nice-to-have ones :)

  • Are bookmarks enforced unique?
    The word document allows for duplicate keys and it does not trigger a repair error if we there are.
    However, we could implement a warning when aduplicate key is added?

Okay, this is interesting. I wonder how it tells where the bookmark ends? I'd be inclined for us to adopt a "non-greedy" policy and identify the first-encountered bookmarkEnd having a matching ID as the end marker for a given bookmarkStart. This will be important when it comes to identifying the end marker of a bookmark.

In any case, we should make sure that we do not add any bookmarks with duplicate keys. An implementation of .next_available_bookmark_id should take care of that just fine.

Btw, did we already establish what, if anything, happens when the bookmark end appears before the bookmark start? This one's another for the experimental results (Word Behaviors) section of the analysis document

  • Are overlapping bookmarks okay?
    yes, and in the current implementation, it works.

Good to know. Let's make sure this is stated in the Word behaviors section of the analysis document.

  • Complex type names.
    The current implementation uses the names CT_bookmarkStart and CT_bookmarkEnd however, closer inspection of the XML schema excerpts shows that these probably need to be renamed to CT_Bookmark and CT_MarkupRange respectively.

Yes. There are subtle differences in the case of the various XML-element-related names. The XML Schema names are always UpperCamelCase (with the prefix being all-caps, e.g. CT, ST, EG, etc.). These are always copied exactly when used as class names. Likewise lowerCamelCase for element tag names and attributes (and the variables that represent those elements and attributes). This convention provides useful visual cues for what kind of thing you're looking at in the code.

Unit-tests

Since we opted for the Mixin option, the amount of unittest files that either need revising or need to be created is reduced to two:

  • \tests\test_document.py
  • \tests\text\test_bookmarks.py

I'm inclined to make the second one just tests/test_bookmarks.py. Bookmarks are not strictly related to text; they seem to be pretty much their own thing.

However, there seem to be some tests that are failing?

As we noted earlier, there should be no failing tests on your TDD tip, and failing tests on the spike tip are of no concern.

If an existing test starts failing on a particular TDD commit, you fix it then. Generally this will be because you changed existing code, not because you added some.

@Benjamin-T
Copy link
Author

Thanks for looking at my commits @scanny,

it's unfortunate that I had a different approach, but looking at your implementation, I can definitely see the elegance. I didn't know that it was possible to override the _add_bookmarkStart method on OxmlElement level.

I've created acceptance tests for the end_bookmark() method.

I thought this part of the implementation was really straightforward, but please let me know if I overlooked any underlying complexities.

As soon as we can get these commits merged, I would really like to continue to implement the start_bookmark() and end_bookmark() methods on the paragraph level.

@Benjamin-T
Copy link
Author

@scanny
I don't have the time right know to get the acceptance tests and unittests updated, but I've spiked the paragraph implementation. If you could glance over it and tell me whether you agree with it, I will make the tdd commits.

@scanny
Copy link
Contributor

scanny commented Sep 4, 2019

Yeah, that looks about right to me. I suppose inside a Paragraph (between runs) is the only place they can appear "inline". Is that your understanding?

@Benjamin-T
Copy link
Author

@scanny
Yes, I've committed the updated feature proposal. I've taken a look at the xsd schema and what I conclude from the schema excerpts is that the bookmarks can only be added in the 'block-item-container' and at paragraph level.

I've tested this with inline bookmarks using the word editor and it results in different paragraph level elements "in between" the run elements:

    <w:p w:rsidR="009039EA" w:rsidRDefault="005A2B2B" w14:paraId="79A0AC1D" w14:textId="2DF56D06">
      <w:r>
        <w:t xml:space="preserve">Inline </w:t>
      </w:r>
      <w:bookmarkStart w:name="test" w:id="0" />
      <w:r>
        <w:t>bookmark</w:t>
      </w:r>
      <w:bookmarkEnd w:id="0" />
      <w:r>
        <w:t xml:space="preserve"> text</w:t>
      </w:r>
    </w:p>

@Benjamin-T Benjamin-T force-pushed the feature/bookmarks branch 6 times, most recently from bdbd6fe to a04ef91 Compare September 7, 2019 05:53
@Benjamin-T
Copy link
Author

Dear @scanny

I've added the TDD commits to support paragraph level bookmarks as well. If you could review them and tell me if there is something missing, that would be most appreciated.

After these commits the acceptance tests are exhausted again. Do we need additional functionalities? Perhaps a delete bookmark but besides that one I would say we are getting close to finalizing this feature :D

@Benjamin-T
Copy link
Author

@scanny, did you find any time to take a look at my latest commits? Feedback would be much appreciated.
Regards,
Benjamin

@Benjamin-T Benjamin-T requested a review from scanny October 19, 2019 12:49
Copy link
Contributor

@scanny scanny left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here are some quick comments on this next commit. If you can get these in I'll see if I can move us along. Sorry I only have time to review one commit at the moment :)

@Benjamin-T
Copy link
Author

@scanny Any thoughts Steve?

@Benjamin-T
Copy link
Author

@scanny If you could take a look at the functional tests and perhaps provide some feedback on some of the other commits we can perhaps get this PR merged!

@Benjamin-T
Copy link
Author

@scanny Can you provide me with some feedback so we can some of this merged?

@Benjamin-T
Copy link
Author

@scanny it's been another 3 months, would it be possible to get a review on the PR? If you rather discuss the way forward in a google meet or zoom meeting, that would be okay as well.

regards,

Ben

@drewh1991
Copy link

Hey there, I've researching the development of this feature. It would be really useful to have for my use case. Will this be merged? Is there anything I can help with?

Thanks,
Drew

@Benjamin-T
Copy link
Author

Hi @drewh1991,

I have every intention of getting this merged, however the progress stalled pending review of @scanny. I do think the biggest part of the work on this is done so it would be great to regain some momentum and get it merged.

regards,

Ben

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants