-
Notifications
You must be signed in to change notification settings - Fork 1.2k
Feature: Add Bookmark #425
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Hi @Benjamin-T, yes, creating an analysis document (aka. enhancement proposal) is the right first step, and that is independently committable whether you go on to implement or not. To do that, you should fork this repo and create a new branch based on |
Thanks for the reply @scanny Therefore I would propose first to develop the bookmark code, and subsequently implement the captions and references.
Question: How do I add such a xml element? class CT_R(BaseOxmlElement): object? Interesting link: BookmarksWithin a WordprocessingML document, bookmarks refer to arbitrary regions of content which are bounded and XML:<w:p>
<w:r>
<w:t>Example</w:t>
</w:r>
<w:bookmarkStart w:id="0" w:name="sampleBookmark" />
<w:r>
<w:t xml:space="preserve"> text.</w:t>
</w:r>
</w:p>
<w:p>
<w:r>
<w:t>Example</w:t>
</w:r>
<w:bookmarkEnd w:id="0" />
<w:r>
<w:t xml:space="preserve"> text.</w:t>
</w:r>
</w:p> bookmarkStartThis element specifies the start of a bookmark within a WordprocessingML document. This start marker is bookmarkEndThis element specifies the end of a bookmark within a WordprocessingML document. This end marker is The following WordprocessingML illustrates an example of content which fufills this constraint: <w:p>
<w:r>
<w:t xml:space="preserve">This is sentence one.</w:t>
</w:r>
<w:bookmarkStart w:id="0" w:name="testing123"/>
<w:r>
<w:t>This is sentence two.</w:t>
</w:r>
</w:p>
<w:p>
<w:r>
<w:t xml:space="preserve">This </w:t>
</w:r>
<w:bookmarkEnd w:id="0"/>
<w:r>
<w:t>is sentence three.</w:t>
</w:r>
</w:p> |
Initial proposalI would place the method 'add_bookmark()' in the docx.oxml.runthe bookmarkStart element: from docx.oxml import OxmlElement
def add_bookmark_start(self, name='test_bookmark'):
bmrk = OxmlElement('w:bookmarkStart')
bmrk.set(qn('w:id'), '1')
bmrk.set(qn('w:name'), name)
return bmrk
The bookmarkEnd element:
def add_bookmark_end(self):
bmrk = OxmlElement('w:bookmarkEnd')
bmrk.set(qn('w:id'), '1')
return bmrk Challenges:
docx.text.runIn the def add_bookmark(self, name='test_bookmark'):
self._r.append(self._r.add_bookmark_start(name=name))
self._r.append(self._r.add_bookmark_end()) Simple test code:import docx
test = docx.Document()
par = test.add_paragraph()
test_run = par.add_run()
test_run.add_text('Test')
test_run.add_bookmark()
test.save('Test_bookmark.docx') |
FeatureI've never worked with behave, but this is my first shot at a feature text: The feature text:
After some googling, I understand running tests is done like this: Functional tests:
Unittests:
To Do:
|
Regarding how to add an element, in
|
Regarding the API/protocol for this, we'll need to answer a couple XML Schema-related questions, but the call to add a bookmark will definitely not be on a I'd be inclined (but still in brainstorming mode) to something like: bookmark = paragraph.start_bookmark('my bookmark name')
# ... add more runs, maybe more paragraphs
same_or_later_paragraph.end_bookmark(bookmark) A couple notes:
|
Regarding acceptance tests, these can only exercise the API, and therefore can only test what the API allows us to detect. A consequence of this is we need the "read" API before (usually just before) we need the "write" API. So you probably want to be thinking of how you find bookmarks in a document and what one would want to do with them once found. This is about the time I start taking a close look at the Microsoft Visual Basic API for Word to see how they're handled there. A Google search on 'vba word bookmark' is also a worthwhile exercise, to get a sense for the VBA calling protocol for bookmarks. This will often uncover unexpected complexities that would be very expensive to discover late in the process. https://msdn.microsoft.com/en-us/VBA/Word-VBA/articles/object-model-word-vba-reference |
First of all, thanks for the detailed response @scanny. WordprocessingML - documentationApparently, the bookmark XML predates the concepts of the XML: They present the following example, of a bookmark spanning a paragraph: <w:p>
<w:r>
<w:t>Example</w:t>
</w:r>
<w:bookmarkStart w:id="0" w:name="sampleBookmark" />
<w:r>
<w:t xml:space="preserve"> text.</w:t>
</w:r>
</w:p>
<w:p>
<w:r>
<w:t>Example</w:t>
</w:r>
<w:bookmarkEnd w:id="0" />
<w:r>
<w:t xml:space="preserve"> text.</w:t>
</w:r>
</w:p> It seems that therefore the element should be available at 'any' point. The document lists the possible parent objects as well. Its a large list at first, but a lot of it isn't implemented (yet) in the docx package. I've listed the possible parents, and grouped them a bit, sorted them based on the level and likelihood of implementation. - Ironically, the run object is practically the only object in the XML structure that isn't a possible parent of the bookmark object.
Requirements:For now just thought of two, probably more.
Custom Element ClassFor the bookmark to work, we would need two custom element classes: |
I've renamed the feature to add_bookmark only. As it looks like it's a better idea to first fully focus on the correct implementation of the bookmark object, and afterwards see how things like captions and cross references interact with the bookmarks. More detailed VBA function analysisLike you recommended, I've investigated the VBA bookmark documentation. This provided some more background and insight about what kind of behavior and functionality is expected from the bookmark object. BookmarksA collection of bookmark object that represent the bookmarks in the specified selection, range, or document. Also, the object returns the bookmark corresponding to the index, which is either a key or a number. The numbered index can be used, and the named indices can apparently be sorted alphabetically. Bookmark methods:There are three methods described:
Bookmark properties:There is a total of ten properties:
Properties Included or OmittedI think features 1, 3, 10 can be omitted, and I'm unsure about 8. Correct me if I'm wrong, but can I conclude from this that there should be some Container object for bookmarks? |
Bookmarks - ObjectContinuing on the bookmarks 'container' object, it has the following properties and methods: Properties
Methods:
|
Okay, regarding the parentage aspect, I'm thinking the key parent elements to start with are One possibility is to implement the bookmark functionality as a mixin, so it can be added to multiple "locations" without duplicating the code. That would also provide a uniform interface, regardless of the parent object. Maybe we call that |
Regarding the MS-API for bookmarks:
|
Regarding the MS-API for the Bookmarks collection:
As for the methods:
So in short, I think we're good with a |
Allright, summarizing: If I understand correctly we're going to develop a the Bookmark - ObjectAttributes / properties
These could be added in the 'future'. The Methods / setters
Bookmarks - ObjectThe bookmarks object will be placed within the Attributes / properties:
methods
I'm currently trying to develop some kind of basic version of the above. |
Without thinking about the python-docx package too much, I had something like this in mind. I'm familiar with a lot of the from docx.oxml.ns import qn
from docx.oxml import OxmlElement
class Bookmark(object):
def __init__(self):
self.bmrk_start = OxmlElement('w:bookmarkStart')
self.bmrk_end = OxmlElement('w:bookmarkEnd')
self.name = None
self.id = None
def set_id(self, id):
self.id = id
self.bmrk_start.set(qn('w:id'), str(id))
self.bmrk_end.set(qn('w:id'), str(id))
return self
def set_name(self, name):
self.name = name
return self
def start_bookmark(self):
return self.bmrk_start
def end_bookmark(self):
return self.bmrk_en
class Bookmarks(object):
def __init__(self):
self.bookmarks = []
self.mapping = []
def __iter__(self):
for bookmark in self.bookmarks:
yield bookmark
@property
def count(self):
return len(self.bookmarks)
def add_bookmark(self, name):
idx = self.count + 1
bm = Bookmark().set_id(idx)
bm.set_name(name)
self.bookmarks.append(bm)
self.mapping.append(name)
return bm
def get_bookmark(self, name):
if name in self.mapping:
return self.mapping.index(name)
def item(self, idx):
return self.bookmarks[idx] |
Okay, so lets start with your proposed interface for
|
Regarding implementation details, let's start at the "outside" with class Document(...):
# ---other properties and methods ...
@lazyproperty
def bookmarks(self):
return Bookmarks(self._element)
So on to the from collections import Sequence
class Bookmarks(Sequence):
def __init__(self, document_elm):
super(Bookmarks, self).__init__()
self._document = self._element = self.document_elm
def __iter__(self):
for bookmarkStart in self._bookmarkStarts:
yield Bookmark(bookmarkStart)
def __getitem__(self, idx):
bookmarkStart = self._bookmarkStarts[idx]
return Bookmark(bookmarkStart)
def __len__(self):
return len(self._bookmarkStarts)
def get(self, name, default=None):
for bookmarkStart in self._bookmarkStarts:
if bookmarkStart.name == name:
return Bookmark(bookmarkStart)
return default
@property
def _bookmarkStarts(self):
return self._document.xpath('.//w:bookmarkStart')
I'm out of time right now, let's leave it there and let me know how you digest this chunk, then we can get into the |
First of all, thanks for all the ideas and input. I really feel like I'm learning a lot here. And I really appreciate you taking the time to explain me some of the concepts. I know its a bit too soon, but I really wanted to give it a go, so I've tried to create the If you could please take a look and see if I'm on the right track? I really struggled with understanding how these
|
If you want to submit code for review, a pull request (PR) is the best way to do it. You might need to to a little research first. The key thing is to be able to make and submit changes without creating a new pull request each time. Basically, the way you do this is creating a new branch, based on master (on your fork) and making that branch the base of the PR. When you make changes, you rebase that branch to suit, and the PR is automatically updated (when you push that branch). If you're not familiar with rebasing with Git you'll need to study up; it's an essential skill for this kind of interaction. Then I can make comments directly on the code changes you submitted; you can then make changes, rebase the branch and push, and submit the revised code. Regarding your first stab:
I'd recommend you focus on getting the pull request bit working and show me where you are on the analysis document, in particular the XML Schema excerpts, and then we can come back to these two definitions. Btw, there's nothing wrong (in my mind) with jumping into the implementation, just to get straight in your mind how it's all going to work. I do that all the time (I call it 'spiking'). The only problem is thinking you're done once you have it working. That's where most folks stop and that's why they don't get a merge. From there you are in a good position to develop the acceptance and unit tests, flesh out the documentation required, and form it into a series of coherent commits that can be confidently committed without a lot of rework. All those things are required to actually add the feature without degrading the existing code base. |
Allright! thanks for the feedback, I guess i have some studying to do. I got rid of the set methods, however, the CT_BookmarkEnd and CT_BookmarkStart names do not rename to 'w:bookmarkEnd' and 'w:bookmarkStart'. But I dont understand why and how I can change this? class CT_BookmarkStart(BaseOxmlElement):
"""
``<w:bookmarkStart>`` element
"""
bookmark = OneOrMore('w:bookmarkStart', ST_String)
name = RequiredAttribute('w:name', ST_String)
bmrk_id = RequiredAttribute('w:id', ST_RelationshipId) def add_bookmark_start(self, id, name='test_bookmark'):
bmrk = CT_BookmarkStart()
bmrk.bookmark = 'w:bookmarkStart'
bmrk.bmrk_id = str(id)
bmrk.name = name
self.append(bmrk)
return bmrk This leads to the following xml: <w:p>
<w:r>
<w:t>Hello</w:t>
</w:r>
<CT_BookmarkStart w:id="2" w:name="text_to_be_bookmarked" />
<w:r>
<w:t>Hello</w:t>
</w:r>
<CT_BookmarkEnd w:id="2" />
</w:p> Clearly Im overlooking something... |
After hours of puzzeling and camparing code, I got it working! I'also created a pull request like you asked; I purposly committed the 'removed obsolete code' commit to check if the pull request is automatically updated with the new code. Let me know if I need to change some settings. the class CT_P(BaseOxmlElement):
"""
``<w:p>`` element, containing the properties and text for a paragraph.
"""
pPr = ZeroOrOne('w:pPr')
r = ZeroOrMore('w:r')
bookmarkStart = ZeroOrMore('w:bookmarkStart')
bookmarkEnd = ZeroOrMore('w:bookmarkEnd') the from __future__ import (
absolute_import, division, print_function, unicode_literals
)
from ..enum.style import WD_STYLE_TYPE
from .parfmt import ParagraphFormat
from .run import Run
from ..shared import Parented
from docx.shared import ElementProxy
from docx.oxml.bookmark import CT_BookmarkStart, CT_BookmarkEnd
class Paragraph(Parented):
"""
Proxy object wrapping ``<w:p>`` element.
"""
def __init__(self, p, parent):
super(Paragraph, self).__init__(parent)
self._p = self._element = p
def add_bookmark_start(self, id, name):
bmrk = self._p._add_bookmarkStart()
bmrk.name = name
bmrk.bmrk_id = str(id)
return self._p.append(bmrk)
def add_bookmark_end(self, id):
bmrk = self._p._add_bookmarkEnd()
bmrk.bmrk_id =str(id)
return self._p.append(bmrk) and the elements: class CT_BookmarkStart(BaseOxmlElement):
"""
``<w:bookmarkStart>`` element
"""
name = RequiredAttribute('w:name', ST_String)
bmrk_id = RequiredAttribute('w:id', ST_RelationshipId)
class CT_BookmarkEnd(BaseOxmlElement):
"""
The ``<w:bookmarkEnd>`` element
"""
bmrk_id = RequiredAttribute('w:id', ST_RelationshipId) |
Since I finally got a half decent version working, I'll start focussing more on the development of the analysis documentation. Also I just found out that 'location' of the feature is within the The reason I placed it in the paragraph type is that the bookmarks are placed in a paragraph. Therefore i assumed that that would be the logical location. This is interesting; <xsd:element name="bookmarkStart" type="CT_Bookmark"/>
<xsd:element name="bookmarkEnd" type="CT_MarkupRange"/> <xsd:complexType name="CT_Body"> <!-- denormalized -->
<xsd:sequence>
<xsd:choice minOccurs="0" maxOccurs="unbounded">
<xsd:element name="p" type="CT_P"/>
<xsd:element name="tbl" type="CT_Tbl"/>
<xsd:element name="sdt" type="CT_SdtBlock"/>
<xsd:element name="customXml" type="CT_CustomXmlBlock"/>
<xsd:element name="altChunk" type="CT_AltChunk"/>
<xsd:element name="proofErr" type="CT_ProofErr"/>
<xsd:element name="permStart" type="CT_PermStart"/>
<xsd:element name="permEnd" type="CT_Perm"/>
<xsd:element name="bookmarkStart" type="CT_Bookmark"/>
<xsd:element name="bookmarkEnd" type="CT_MarkupRange"/>
<xsd:element name="moveFromRangeStart" type="CT_MoveBookmark"/>
<xsd:element name="moveFromRangeEnd" type="CT_MarkupRange"/>
<xsd:element name="moveToRangeStart" type="CT_MoveBookmark"/>
<xsd:element name="moveToRangeEnd" type="CT_MarkupRange"/>
<xsd:element name="commentRangeStart" type="CT_MarkupRange"/>
<xsd:element name="commentRangeEnd" type="CT_MarkupRange"/>
<xsd:element name="customXmlInsRangeStart" type="CT_TrackChange"/>
<xsd:element name="customXmlInsRangeEnd" type="CT_Markup"/>
<xsd:element name="customXmlDelRangeStart" type="CT_TrackChange"/>
<xsd:element name="customXmlDelRangeEnd" type="CT_Markup"/>
<xsd:element name="customXmlMoveFromRangeStart" type="CT_TrackChange"/>
<xsd:element name="customXmlMoveFromRangeEnd" type="CT_Markup"/>
<xsd:element name="customXmlMoveToRangeStart" type="CT_TrackChange"/>
<xsd:element name="customXmlMoveToRangeEnd" type="CT_Markup"/>
<xsd:element name="ins" type="CT_RunTrackChange"/>
<xsd:element name="del" type="CT_RunTrackChange"/>
<xsd:element name="moveFrom" type="CT_RunTrackChange"/>
<xsd:element name="moveTo" type="CT_RunTrackChange"/>
<xsd:element ref="m:oMathPara"/>
<xsd:element ref="m:oMath"/>
</xsd:choice>
<xsd:element name="sectPr" minOccurs="0" maxOccurs="1" type="CT_SectPr"/>
</xsd:sequence>
</xsd:complexType> |
from the wml.xsd excerpts: <xsd:complexType name="CT_BookmarkRange">
<xsd:complexContent>
<xsd:extension base="CT_MarkupRange">
<xsd:attribute name="colFirst" type="ST_DecimalNumber" use="optional">
<xsd:annotation>
<xsd:documentation>First Table Column Covered By Bookmark</xsd:documentation>
</xsd:annotation>
</xsd:attribute>
<xsd:attribute name="colLast" type="ST_DecimalNumber" use="optional">
<xsd:annotation>
<xsd:documentation>Last Table Column Covered By Bookmark</xsd:documentation>
</xsd:annotation>
</xsd:attribute>
</xsd:extension>
</xsd:complexContent>
</xsd:complexType>
<xsd:complexType name="CT_Bookmark">
<xsd:complexContent>
<xsd:extension base="CT_BookmarkRange">
<xsd:attribute name="name" type="ST_String" use="required">
<xsd:annotation>
<xsd:documentation>Bookmark Name</xsd:documentation>
</xsd:annotation>
</xsd:attribute>
</xsd:extension>
</xsd:complexContent>
</xsd:complexType> <xsd:group name="EG_RangeMarkupElements">
<xsd:choice>
<xsd:element name="bookmarkStart" type="CT_Bookmark">
<xsd:annotation>
<xsd:documentation>Bookmark Start</xsd:documentation>
</xsd:annotation>
</xsd:element>
<xsd:element name="bookmarkEnd" type="CT_MarkupRange">
<xsd:annotation>
<xsd:documentation>Bookmark End</xsd:documentation>
</xsd:annotation>
</xsd:element>
<xsd:element name="moveFromRangeStart" type="CT_MoveBookmark">
<xsd:annotation>
<xsd:documentation>Move Source Location Container - Start</xsd:documentation>
</xsd:annotation>
</xsd:element>
<xsd:element name="moveFromRangeEnd" type="CT_MarkupRange">
<xsd:annotation>
<xsd:documentation>Move Source Location Container - End</xsd:documentation>
</xsd:annotation>
</xsd:element>
<xsd:element name="moveToRangeStart" type="CT_MoveBookmark">
<xsd:annotation>
<xsd:documentation>Move Destination Location Container - Start</xsd:documentation>
</xsd:annotation>
</xsd:element>
<xsd:element name="moveToRangeEnd" type="CT_MarkupRange">
<xsd:annotation>
<xsd:documentation>Move Destination Location Container - End</xsd:documentation>
</xsd:annotation>
</xsd:element>
<xsd:element name="commentRangeStart" type="CT_MarkupRange">
<xsd:annotation>
<xsd:documentation>Comment Anchor Range Start</xsd:documentation>
</xsd:annotation>
</xsd:element>
<xsd:element name="commentRangeEnd" type="CT_MarkupRange">
<xsd:annotation>
<xsd:documentation>Comment Anchor Range End</xsd:documentation>
</xsd:annotation>
</xsd:element>
<xsd:element name="customXmlInsRangeStart" type="CT_TrackChange">
<xsd:annotation>
<xsd:documentation>Custom XML Markup Insertion Start</xsd:documentation>
</xsd:annotation>
</xsd:element>
<xsd:element name="customXmlInsRangeEnd" type="CT_Markup">
<xsd:annotation>
<xsd:documentation>Custom XML Markup Insertion End</xsd:documentation>
</xsd:annotation>
</xsd:element>
<xsd:element name="customXmlDelRangeStart" type="CT_TrackChange">
<xsd:annotation>
<xsd:documentation>Custom XML Markup Deletion Start</xsd:documentation>
</xsd:annotation>
</xsd:element>
<xsd:element name="customXmlDelRangeEnd" type="CT_Markup">
<xsd:annotation>
<xsd:documentation>Custom XML Markup Deletion End</xsd:documentation>
</xsd:annotation>
</xsd:element>
<xsd:element name="customXmlMoveFromRangeStart" type="CT_TrackChange">
<xsd:annotation>
<xsd:documentation>Custom XML Markup Move Source Start</xsd:documentation>
</xsd:annotation>
</xsd:element>
<xsd:element name="customXmlMoveFromRangeEnd" type="CT_Markup">
<xsd:annotation>
<xsd:documentation>Custom XML Markup Move Source End</xsd:documentation>
</xsd:annotation>
</xsd:element>
<xsd:element name="customXmlMoveToRangeStart" type="CT_TrackChange">
<xsd:annotation>
<xsd:documentation>Custom XML Markup Move Destination Location Start</xsd:documentation>
</xsd:annotation>
</xsd:element>
<xsd:element name="customXmlMoveToRangeEnd" type="CT_Markup">
<xsd:annotation>
<xsd:documentation>Custom XML Markup Move Destination Location End</xsd:documentation>
</xsd:annotation>
</xsd:element>
</xsd:choice>
</xsd:group> |
Hmm, this is interesting. I haven't come across annotation and documentation strings in the schema before. Where did you pull these from? The schema files I use are in the code tree here (and in the directory just above): |
Not sure what you mean by the "feature". This term is usually used here to indicate the API methods and properties added to implement some useful behavior. But bookmarks appear to be part of both the body and paragraph objects and some others as well. So I'm not sure there's a clear "locale" for these, unlike most other objects in the code base. A bookmark can be added to the body ( So something to noodle on :) |
this is a really interesting document: |
Ah, of course, the ECMA spec, I had forgotten about that. It's been a while, so I'm not sure of the current status, but when I started the project I was using the ECMA spec until I started puzzling over why certain elements weren't in there or were named differently. The short story (at the time at least) is that Microsoft uses the ISO spec (ISO/IEC 29500). Anyway, the ECMA spec might be useful for reference, but I believe the ISO spec is still the definitive one. I'll be looking for XML Schema excerpts from the ISO spec/schemas. |
@scanny, I've studied up on the rebasing, and I think I got it figured out. It's a really neat trick! This makes collaboration a lot easier. Normally, I was always 'worried' about the number of commits for a certain fix, but with this rebasing you can combine / and reorder and hence restructure your work afterworks. Having said that, I've taken a stab at the feature documentation. See pull request With regard to the objects, we still had to figure out how to link the two parts of a Bookmark (bookmarkStart and bookmarkEnd) together, if there is like a paragraph in between. My suggestion would be to make two ways to add a bookmark; # This adds a paragraph containing the start and stop of the bookmark
document.add_bookmark(name='test', inplace=True)
# This places a bookmark start:
bookmark_obj_1 = document.add_bookmark(name='test_1')
bookmark_obj_1.start_bookmark()
# Start some new run, with text which maybe contains another bookmark:
par = document.add_paragraph()
bookmark_obj_2 = par.add_bookmark(name'test_2')
bookmark.start_bookmark()
# In this case, the bookmark contains the text, which could be referred to
par.add_run('sometext')
bookmark_obj_2.end_bookmark()
# Add the end of the first bookmark:
bookmark_obj_1.end_bookmark() This does mean that one has to correctly place the bookmark object to be able to succesfully add the end the bookmark. I'm not entirely sure about the exact implementation.
I'll try to get an example up and running. |
So, I've had my first struggles with git, but somehow managed to figure it out. I think it had something to do with the fact that I use both Sourcetree and the git shell commands. I think i forgot to fetch from all remotes and that is where the fun started.. What I've tried to do, is implement the Bookmarks and Bookmark object. And make them available in the document class. Currently they do generate a bookmark, but this bookmark is placed at the end of the Body. using this code: doc = Document()
asdf = doc.bookmark()
new_bookmark = asdf.add_bookmark()
new_bookmark.add_bookmark_start(name='test_1', id=1)
par = doc.add_paragraph()
par.add_run('test')
new_bookmark.add_bookmark_end(id=1)
new_bookmark_2 = asdf.add_bookmark()
new_bookmark_2.add_bookmark_start(name='test_2', id=2)
new_bookmark_2.add_bookmark_end(id=2)
doc.save('tab.docx') <w:document>
<w:body>
<w:p>
<w:r>
<w:t>test</w:t>
</w:r>
</w:p>
<w:sectPr w:rsidRPr="0006063C" w:rsidR="00FC693F" w:rsidSect="00034616">
<w:pgSz w:w="12240" w:h="15840" />
<w:pgMar w:top="1440" w:right="1800" w:bottom="1440" w:left="1800" w:header="720" w:footer="720" w:gutter="0" />
<w:cols w:space="720" />
<w:docGrid w:linePitch="360" />
</w:sectPr>
<w:bookmarkStart w:name="test_1" w:id="1" />
<w:bookmarkEnd w:id="1" />
<w:bookmarkStart w:name="test_2" w:id="2" />
<w:bookmarkEnd w:id="1" />
</w:body>
</w:document> I hope I have some time tomorrow to figure it out, I'm thinking that the bookmark should be within a paragraph instance. regards, |
Okay, so here's a little about what I think the code structure will look like. These sorts of things (proxy objects) all look the same in many respects. The term proxy object is just our name for objects like Let's start with the calling protocol I think we're after. This might bear more discussion, but this will do for a start: >>> document = Document()
>>> bookmark = document.start_bookmark(name='test')
>>> document.add_paragraph('test bookmark body')
<Paragraph object at 0x...>
>>> document.end_bookmark(bookmark) I've explicitly named the parameters, just for clarity. I expect there's no good reason not to allow them to be identified by position, e.g. So we'd want the code above to produce this XML: <w:document>
<w:body>
<w:bookmarkStart w:name="test" w:id="1"/>
<w:p>
<w:r>
<w:t>test</w:t>
</w:r>
</w:p>
<w:bookmarkEnd w:id="1"/>
<w:sectPr w:rsidRPr="0006063C" w:rsidR="00FC693F" w:rsidSect="00034616">
<w:pgSz w:w="12240" w:h="15840"/>
<w:pgMar w:top="1440" w:right="1800" w:bottom="1440" w:left="1800" w:header="720" w:footer="720" w:gutter="0"/>
<w:cols w:space="720"/>
<w:docGrid w:linePitch="360"/>
</w:sectPr>
</w:body>
</w:document> The first challenge is to get the bookmarkStart and End elements to appear above the sentinel bookmarkStart = ZeroOrMore('w:bookmarkStart', successors=('w:sectPr',)) The implementation in def start_bookmark(self, name):
bookmarkStart = self._body.add_bookmarkStart(name)
return Bookmark(bookmarkStart) This leaves some of the work to other objects. In particular, the bookmark id must be resolved and assigned by In addition, I expect there needs to be a This may be a good place to mention that a proxy object (like Note that Does that give you something to noodle on for next steps? |
It is a neat trick, isn't it! :) I call when I'm getting the commits cleaned up "grooming" the commit history/sequence. It's not always worthwhile in every situation, but when you're collaborating it's a really nice way to make your commit(s) reflect a clear, single focus and be separately mergeable. |
So I've succeeded in getting the elements above the sentinel object, but am now stuck on the item id's. my initial thought is making the Bookmarks object available within the _Body object. This allows for getting 'new' id's. However, I'm still trying to figure out a neat solution for identifying bookmarkStarts without bookmarkEnds. current approach:
The docx.document.py code: class Bookmark(ElementProxy):
def __init__(self, doc_element):
super(Bookmark, self).__init__(doc_element)
self._element = doc_element
@property
def _id(self):
return self._element.bmrk_id
@property
def _name(self):
return self._element.name class _Body(BlockItemContainer):
"""
Proxy for ``<w:body>`` element in this document, having primarily a
container role.
"""
def __init__(self, body_elm, parent):
super(_Body, self).__init__(body_elm, parent)
self._body = body_elm
self._parent = parent
def clear_content(self):
"""
Return this |_Body| instance after clearing it of all content.
Section properties for the main document story, if present, are
preserved.
"""
self._body.clear_content()
return self
def add_bookmarkStart(self, name):
bookmarkStart = self._body._add_bookmarkStart()
bookmarkStart.name = name
bookmarkStart.bmrk_id = self._parent._Bookmarks._next_id
return Bookmark(bookmarkStart)
def add_bookmarkEnd(self):
open_id = self._parent._Bookmarks._unclosed_bookmark
bookmarkEnd = self._body._add_bookmarkEnd()
bookmarkEnd.bmrk_id = open_id
return Bookmark(bookmarkEnd) and the Bookmarks class: class Bookmarks(Sequence):
def __init__(self, document_elm):
super(Bookmarks, self).__init__()
self._document = self._element = document_elm
def __iter__(self):
for bookmarkStart in self._bookmarkStarts:
yield Bookmark(bookmarkStart)
def __getitem__(self, idx):
bookmarkStart = self._bookmarkStarts[idx]
return Bookmark(bookmarkStart)
def __len__(self):
return len(self._bookmarkStarts)
def get(self, name, default=None):
for bookmarkStart in self._bookmarkStarts:
if bookmarkStart.name == name:
return Bookmark(bookmarkStart)
return default
@property
def _bookmarkStarts(self):
return self._document.xpath('.//w:bookmarkStart')
@property
def _bookmarkEnds(self):
return self._document.xpath('.//w:bookmarkEnd')
@property
def _next_id(self):
return str(len(self._bookmarkStarts)) |
Hmm, I'm not sure about this solution. This checks for open bookmarks, but if there are two bookmarks open there is no way of choosing one of the two. Therefore, we might need to go for a This is the current code outline: doc = Document()
bmark_obj = doc.bookmarks()
test = doc.start_bookmark('test_one')
par = doc.add_paragraph()
par.add_run('test')
doc.end_bookmark()
doc.start_bookmark('test_two')
par_2 = doc.add_paragraph()
par_2.add_run('test_line_2')
doc.end_bookmark()
doc.save('test.docx') results in: <w:document>
<w:body>
<w:bookmarkStart w:name="test_one" w:id="1" />
<w:p>
<w:r>
<w:t>test</w:t>
</w:r>
</w:p>
<w:bookmarkEnd w:id="1" />
<w:bookmarkStart w:name="test_two" w:id="2" />
<w:p>
<w:r>
<w:t>test_line_2</w:t>
</w:r>
</w:p>
<w:bookmarkEnd w:id="2" />
<w:sectPr w:rsidRPr="0006063C" w:rsidR="00FC693F" w:rsidSect="00034616">
<w:pgSz w:w="12240" w:h="15840" />
<w:pgMar w:top="1440" w:right="1800" w:bottom="1440" w:left="1800" w:header="720" w:footer="720" w:gutter="0" />
<w:cols w:space="720" />
<w:docGrid w:linePitch="360" />
</w:sectPr>
</w:body>
</w:document> |
We should probably keep all the conversation in the pull request going forward, so it's all in one piece. I'll respond to these here but after that, let's do any new ones in the Conversations tab of the pull request :) |
Okay, I don't completely understand your code for the "finding the ids" bit, but I'm thinking the solution is simpler than you think, basically that we use XPath to find the ids when we need them (and that we don't store them anywhere outside the XML). I'm also not sure we need an "unclosed" test. I'm open to argumentation to the contrary, but I'm not seeing it yet. It looks to me like there are a certain number of I do think we're on the wrong side of YAGNI here, working with methods and properties we don't have a clear need for yet. At times like these, I always return to the use-case/protocol part of the analysis and narrow my focus down to something entirely concrete. Basically answer the question "As a developer using python-docx, I need to be able to {do what exactly?}" Anyway, back to the "finding the bookmarkEnd elements" question. So say we have a class CT_BookmarkStart(...):
@property
def matching_bookmarkEnd(self):
"""Return first `w:bookmarkEnd` element having matching id, None if not present."""
root_element = self.getroottree().getroot()
matching_bookmarkEnds = root_element.xpath(
'.//w:bookmarkEnd[@w:id=\'%d\']' % self.id
)
if not matching_bookmarkEnds:
return None
return matching_bookmarkEnds[0] If you're available for a call or a Google hangout, it might be a faster way to work through some of these initial concepts, let me know :) |
As multiple issues have stated before, it would be great if we the captions supported in python-docx. Previous issues already should temporary fixes to similar problems like the IndexedEntry (#137) and the figure caption in (#359).
Also bookmarks have been investigated: (#109)
As I understood from: feature analysis: section columns and Feature: Paragraph.add_hyperlink() (wip)
Is that there are a few mandatory steps to take before a pull request is accepted:
Finally: Focus is key, one feature should only by a single functionality, to quote @scanny:
Now I've collected the available issues and pull request my question is, how to go from here. I would assume that we start at the beginning: Feature analysis
Adding the caption to a paragraph isn't directly the issue, the main goal is to create a feature which can be used together with a crossreference. Therefore the number of references or bookmarks that are created should be stored in an object. This object should be available to the caption method, in order to be support a sort of 'guessed figure/table number'. What I mean by this is best explained with an example:
The proposed solution in #359 works. Only he user first has to create a print view to force a fieldcode refresh to make the numbers visible. Therefore, it would be nice if the caption method could already place a "guessed" number. (Comparable as how Word behaves)
The text was updated successfully, but these errors were encountered: