Skip to content

Headers and Footers API #236

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 26 commits into from
Closed

Conversation

eupharis
Copy link
Contributor

@eupharis eupharis commented Dec 3, 2015

This PR is way not ready to actually be merged. Just want to get some feedback on the API before I go farther.

  • All existing tests pass
  • (Except for 3 hash comparison tests since I added a header and footer to tests/expanded_docx/. I skipped them for now.)
  • There are 5 new passing tests for header and 5 new passing tests for footer. There obviously need to be many, many more unittests before this is ready for primetime.
  • The code needs much cleanup.

Here's a sample:

header_style = document.styles.add_style('Bar', ...)
header_style.font = Pt(32)

document = Document()
header = document.add_header()
header_p = header.add_paragraph('foo', style='Bar')
header_p.add_run('alice')
header_p.add_run().add_picture(img_file, width, height)

Fuller API notes below in docs/header_footer.rst.

@eupharis
Copy link
Contributor Author

eupharis commented Dec 8, 2015

Woops. Looks like I had some python 3 issues. Fixed now.

@scanny
Copy link
Contributor

scanny commented Dec 8, 2015

Hi @eupharis, looks like you've made some good progress. It probably makes sense for you to sketch out the API you're thinking of so we can consider it directly. You've correctly identified that as the key part of getting a commit. It's the one thing we can't change once it's been put in so needs to be gotten right the first time :)

There are analysis pages under the docs directory, here's one that's related:
http://python-docx.readthedocs.org/en/latest/dev/analysis/features/sections.html

There are a few others around this one that are fairly good examples:
http://python-docx.readthedocs.org/en/latest/dev/analysis/features/text/font-color.html

I use the word "protocol" in there to denote describing the API by showing a typical usage of it, rather than just saying what methods there are on what objects. I think that's a pretty good way to capture the API you propose. It would take too long and be too error-prone to pull that out of your draft code; better to spell it out so we can discuss it easily.

I'm thinking one of these pages is a good place to gather that sort of thing because it will then become part of the documentation. It will also probably be the first item committed.

There are other examples, some well developed, in the analysis documentation for python-pptx; here's a recent one:
http://python-pptx.readthedocs.org/en/latest/dev/analysis/features/shp-hyperlink.html

I recommend just starting out with the bits that seem helpful for communicating your thinking. I see these as documentation for my own use when I'm writing them. Things I need to get straight in my head or details I need to dig out of the schema or Microsoft API or whatever that I know I'll need when I get into the code.

Then we can focus the discussion around a fairly well circumscribed area.

@eupharis
Copy link
Contributor Author

Ok cool! I just pushed a protocol here:

https://github.com/eupharis/python-docx/blob/master/docs/dev/analysis/features/header-footer.rst

It looks slightly better via local Sphinx but the github rendering is pretty good.

It includes a write-up of my notes of how the hell docx headers work. I will definitely reference it in the future.

I modeled the protocol format after this one:

http://python-docx.readthedocs.org/en/latest/dev/analysis/features/text/font-color.html

@eupharis
Copy link
Contributor Author

eupharis commented Jan 7, 2016

Ping!

@scanny
Copy link
Contributor

scanny commented Jan 26, 2016

Hi @eupharis, apologies, I'm not actively working on this project at the moment (other projects call as you will understand I'm sure), so it's taken a while to get to this.

Thanks for writing up the analysis page. This helps a lot in getting details ironed out before you go in to do the actual implementation. I've been surprised a bit to discover that the implementation generally turns out to be the easy part; it's the discovery, analysis, and resolution up front that take the time and all the noodling :)

The approach you suggest in the analysis page appears to be inconsistent with the existing approach for that sort of thing and also how Word handles the broader bit about headers and footers.

I just realized I had started some analysis work on headers/footers a while back. I just posted it here, you might want to have a look:
e8fb68e

I think the first thing is that headers and footers have a natural affinity for sections, rather than the document as a whole. Most documents probably only have a single section, but of course the API needs to work for the general case. So the first step in the protocol would be something like this I'm pretty sure:

>>> document = Document('xyx.docx')
>>> section = document.sections[-1]  #last section, for example
>>> odd_page_header = section.primary_header
>>> odd_page_header
<docx.header.Header object at 0x1029a6820>
>>>

Another item that jumps out is that there can be as many as three headers and/or footers for each section, one for odd pages (the primary header/footer), one for even pages (if laying out verso/recto for binding), and a possibly different first-page header/footer. So the API will need to both provide access to those and provide properties like has_first_page_header or whatever to allow discovery of what is defined.

I think you'll want to take a look at the MS API for headers and footers to see what they do. We usually find it pretty useful to model after that API, although there are important departures in certain cases to make things more Pythonic or just more rational. In any case it's useful to understand the MS API protocol as a key input to the analysis. I know in this case they maintain a collection accessed by indices 1, 2, or 3. I'm thinking we just make three properties (six actually when you include footers, e.g. .even_page_header that either return the appropriate section or None if one is not defined.

You'll also want to pull together the key parts of the XML schema along with examples/snippets. You can find plenty of examples of those in the other analysis pages. You might be able to reuse some of the bits from sections here, since they're so closely related:
https://github.com/eupharis/python-docx/blob/master/docs/dev/analysis/features/sections.rst

Also have a look at this issue for some ideas and prior work:
#104

Let us know how you go :)

@eupharis
Copy link
Contributor Author

Ok awesome! I totally understand about other projects hah.

This is all super helpful. I'm currently using this code on a project of mine, and I've run into the problem with this v1 version of headers not playing nicely with sections. The approach you outline above should work much better.

I've blocked out some time next week and will dig into this further!

@eupharis
Copy link
Contributor Author

Next week turned out to be rather optimistic ;)

I merged in all your feature/header commits. It looks like that new settingsPart work will be invaluable for adding <w:evenAndOddHeaders/> to settings.xml right?

I spelled out the XML that should exist for all the different all/even/odd/first header scenarios:

https://github.com/eupharis/python-docx/blob/master/docs/dev/analysis/features/header-footer.rst#all-pages-even-pages-odd-pages-first-page

I also updated the analysis I did so that header, even_page_header, and friends are all on Section not Document:

https://github.com/eupharis/python-docx/blob/master/docs/dev/analysis/features/header-footer.rst#candidate-protocol

Your headerfooter.rst still has a lot of good tidbits still that aren't merged into header-footer.rst yet. I figure we'll tackle that when we get closer.

I think we are getting close with the API?!?

@eupharis
Copy link
Contributor Author

I totally agree that defining all the API and understanding the XML structure and writing the analysis is the hard part. This is actually pretty simple to implement now. I could port all my document.add_header methods and tests pretty easily.

@scanny
Copy link
Contributor

scanny commented Feb 23, 2016

It's very counterintuitive that way, isn't it? It still surprises me when I do it :)

I'll take a look at your analysis page and provide some feedback.

@scanny
Copy link
Contributor

scanny commented Feb 23, 2016

Ok @eupharis, I've taken a look and have some remarks I want to add in as comments. What I'll need is to have this file in a commit of its own so I can make inline comments. And it needs to be the whole file, all at once. If there are changes spread across multiple commits it breaks up the text and it's impossible to trace through.

What I'm thinking probably makes sense is for you to create a new, separate pull request that we use for moving this to commit/merge. We can keep this one open as it might come in handy. The new one will be built up commit by commit, essentially getting each one ready to merge before bringing in the next one.

How's your Git? You'll be needing it here I expect. Let me know if you need help on some part as we go. Here's what we'll need.

The new branch will be called feature/header. It will be based (rooted) at python-openxml/master. All commits submitted for review and later merging will appear on that branch, which will be eupharis/feature/header once you push it to your repo. On your local it will likely be origin/feature/header, but that can vary depending on where you cloned from.

This first commit should be named: 'docs: document header/footer analysis'. It will remain a single commit until merging, at least on your GitHub repo. You can keep it any way you like locally as you're the only one who sees that and occasionally it makes sense to have multiple commits and then squash them or whatever before pushing.

In order to overwrite the pull request branch (eupharis/feature/header) with changes to an existing commit, you'll need to push with the --force option. After a forced push, the pull request will automatically show the updated version. The comments from me (often extensive) from the prior version of that commit will be gone I believe, so make sure you get what you need before force pushing :)

In general there will be no merges in the pull request branch; the commit history will remain entirely linear. Likewise, there will be no "fixed this, fixed that" commits. Changes will be rebased into place as necessary. If changes need to be made to a commit that's already been merged, we'll talk and work it out.

Take a browse through the commit history for the latent styles features added starting here: e7930e4

You'll see a pattern that happens over and over again, and should happen here too:

  • docs: document analysis (single commit)
  • acpt: add scenarios for #acceptance tests
  • style: add Object.property # contains one unit test and code to make that test pass
  • repeat for 2-6 atomic functionality bits, one commit each
  • after the last one, the acceptance test(s) pass, that's a wrap for that feature
  • rinse and repeat

You have to go back about that far to see it, because I did a bit of reorganization before the last release and that has a very different form (and it's not done very often :).

You'll notice a few things:

  • the commits are very atomic, there is only one thing happening in each one, but it's a whole thing.
  • there are no little back and forth fixing things up commits. If those happen, they're rebased out before the "merge".
  • There are no merges. The commit history is entirely linear, except at the tip where there may be various feature branches in progress at any given time.
  • The entire test suite passes on each and every commit. No exceptions. The currently active acceptance test is marked @wip until that feature is done so it is skipped by the main test script. Where it fails (when run in development) points to what commit needs to come next; the one that gets you past the acceptance test error and gives you a different error further along.

Let me know if you need more to go on. Otherwise I'll review and remark on your analysis page once it's up :)

@eupharis
Copy link
Contributor Author

Ok new PR created!

#263

I'm curious why we are basing it off of python-openxml/master. Are we going to merge in python-openxml/feature-header later?

@scanny
Copy link
Contributor

scanny commented Feb 23, 2016

Basing it off of feature/header is even better, by all means, let's do that :)

@eupharis
Copy link
Contributor Author

Perfect! Here we go:

#264

I spent a fair while this morning playing with git rebase and git rebase -i. I feel good about rebasing to keep up to speed if you push anything new to feature/header. And squashing to get nice commits!

@scanny
Copy link
Contributor

scanny commented May 26, 2016

Closing this one for now since we have a new, later one open for header/footer.

@scanny scanny closed this May 26, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants