
feature: Document.headers and footers #104


Closed
rotsee opened this issue Oct 30, 2014 · 45 comments

@rotsee

rotsee commented Oct 30, 2014

It would be incredibly useful to be able to get the page headers through Document.headers, or similar.

For now, I use a very ugly hack to get all header texts:

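from docx import Document
import xml.etree.ElementTree as ET

def get_headers(file):
    document = Document(file)
    namespace = dict(w="http://schemas.openxmlformats.org/wordprocessingml/2006/main")
    header_uri = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/header"
    headers = []
    # _document_part, _rels, _reltype, _target and _blob are python-docx internals
    # of this era (late 2014) and may not exist in later releases
    for rel, val in document._document_part._rels.items():
        if val._reltype == header_uri:
            root = ET.fromstring(val._target._blob)
            # collect every w:t in the header; find() alone returns only the first
            # one, or None when the header has no text (which used to raise)
            text = "".join(t.text or "" for t in root.findall(".//w:t", namespace))
            if text:
                headers.append(text)
    return headers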
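For comparison, here is a rough stdlib-only sketch of the same idea that does not touch python-docx internals at all, since a .docx is just a zip package. It assumes the main document part is word/document.xml (the usual location in Word-generated files, though strictly it should be resolved through _rels/.rels), follows each header relationship, and returns one string per header with paragraphs separated by newlines. The name read_header_texts is hypothetical.

```python
import posixpath
import zipfile
import xml.etree.ElementTree as ET

W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"
HEADER_RELTYPE = (
    "http://schemas.openxmlformats.org/officeDocument/2006/relationships/header"
)


def read_header_texts(docx):
    """Return the text of each header part in a .docx, one string per header.

    Assumes the main document part is word/document.xml, so its relationships
    live in word/_rels/document.xml.rels (true for Word-generated files).
    """
    headers = []
    with zipfile.ZipFile(docx) as pkg:
        rels = ET.fromstring(pkg.read("word/_rels/document.xml.rels"))
        for rel in rels.iter(REL_NS + "Relationship"):
            if rel.get("Type") != HEADER_RELTYPE:
                continue
            # Targets are usually relative to word/ ("header1.xml") but may be
            # absolute ("/word/header1.xml"); zip member names have no leading "/"
            target = rel.get("Target")
            if target.startswith("/"):
                partname = target.lstrip("/")
            else:
                partname = posixpath.normpath(posixpath.join("word", target))
            root = ET.fromstring(pkg.read(partname))
            # One line per w:p; the w:t elements inside a paragraph are concatenated
            lines = [
                "".join(t.text or "" for t in p.iter(W_NS + "t"))
                for p in root.iter(W_NS + "p")
            ]
            headers.append("\n".join(lines))
    return headers
```

Because it only needs zipfile and ElementTree, it does not depend on which python-docx version is installed.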
@scanny
Contributor

scanny commented Oct 30, 2014

Hi @rotsee,

@robline had been working on this feature over here:
https://github.com/robline/python-docx/commits/feature/headers

@robline: Can you give us an update on where you ended up and near-term prospects?

@rotsee
Author

rotsee commented Oct 31, 2014

That looks promising! Let me know if I can help somehow.

@danmilon

danmilon commented Nov 3, 2014

I need this feature and am available to help too!

@robline

robline commented Nov 3, 2014

hey Steve et al, I switched contracts so I haven't been working on it. I am slated to return to headers full time on Nov 26. I am happy to convo with anyone interested to talk through where my code is and look at next steps. I'd like to see this feature pushed through, and with a little effort it can be. I got the test framework running so now it is time to build failing tests and make them pass. I welcome anyone to take this and run with it, or assist me after the 26th, or whatever is best to get momentum.

@danmilon

danmilon commented Nov 3, 2014

That is great @robline! I'll take a closer look at your branch and see where it goes.

@danmilon

danmilon commented Nov 4, 2014

I picked up from your work, rebased it onto the latest master, and connected the dots here and there. It took me the whole day because I wasn't familiar with the codebase. Anyway, it has no docs and no tests, but you can give it a try. See danmilon/python-docx#feature-headers.

from docx import Document

d = Document('/tmp/doc_with_header.docx')

d.sections[0].headers[0].add_paragraph(text='moar header')
d.save('/tmp/moar_headers.docx')

What do you think about the API? I'll add footer support tomorrow hopefully.

@rotsee
Author

rotsee commented Nov 5, 2014

Great, I'll try it out later this week! The API looks fine, though.

@scanny
Contributor

scanny commented Nov 5, 2014

Header and footer access from the section will need to be provided by a specific property for each of the three possible types. A simple sequence will not be articulate enough:

Section.primary_header
Section.first_page_header
Section.even_page_header

There also need to be a couple of boolean properties on Section to determine whether the section has a different first-page header and whether it has distinct even-page headers.

Best practice is to document the proposed API on a page like this one (probably this specific one in this case), so it can be discussed and resolved before investing too much work in a wrong direction.
https://github.com/robline/python-docx/blob/feature/headers/docs/dev/analysis/features/headers.rst

See others here for more examples:
http://python-docx.readthedocs.org/en/latest/dev/analysis/index.html

This one might be a decent model:
http://python-docx.readthedocs.org/en/latest/dev/analysis/features/par-alignment.html

Microsoft API is usually a good-ish model to start with on API design.
http://msdn.microsoft.com/en-us/library/office/ff837487(v=office.15).aspx

The API is much more important to get exactly right than its implementation, because it's very rude to change it once it's published. Best to get that established early so your pull request doesn't require a lot of rework.
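To make the point about the three header types concrete: which header Word shows on a given page depends on two flags, so callers need to reach each header by type rather than by index. The sketch below encodes that rule as commonly described for WordprocessingML, assuming the first-page header wins over the even-page one and ignoring the inheritance of a missing header from the previous section. HeaderKind and header_kind_for_page are hypothetical names, not part of python-docx.

```python
from enum import Enum


class HeaderKind(Enum):
    """The three w:headerReference types a section can define."""
    DEFAULT = "default"  # primary header (odd pages when even/odd is on)
    FIRST = "first"      # section's first page, when titlePg is on
    EVEN = "even"        # even-numbered pages, when evenAndOddHeaders is on


def header_kind_for_page(page_number, first_in_section,
                         different_first_page, different_odd_even):
    """Return which of a section's headers applies to one page.

    different_first_page mirrors w:titlePg in the section's w:sectPr;
    different_odd_even mirrors w:evenAndOddHeaders in word/settings.xml.
    """
    if different_first_page and first_in_section:
        return HeaderKind.FIRST
    if different_odd_even and page_number % 2 == 0:
        return HeaderKind.EVEN
    return HeaderKind.DEFAULT


# A four-page, single-section document with both flags switched on:
kinds = [header_kind_for_page(n, n == 1, True, True) for n in range(1, 5)]
print([k.value for k in kinds])  # ['first', 'even', 'default', 'even']
```

With both flags off, every page falls through to DEFAULT, which is the only case a plain `headers[0]` can describe.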

@danmilon

danmilon commented Nov 7, 2014

That is right, thanks for the resources too! I'll do as you suggested.

@danmilon

Could you help me a bit with the evenAndOddHeaders flag? Where is it expected to appear?
The spec has this snippet, but it is not very informative.

<w:settings>
...
<w:evenAndOddHeaders />
...
</w:settings>

You also mentioned a flag that determines whether a section has a different first-page header. Could you elaborate on that?

@robline

robline commented Nov 17, 2014

I am away from my notes so this is from memory, and I will fill in specifics later using correct language, but... the headers part has 0-3 headers. Default header is the odd header. If there is an evenAndOddHeaders flag set, there should be two headers, one indicated as even and one as odd. There is also a third possible header, firstpage header, which overrides the odd header for the first page. I believe this same pattern can be applied to any of the document sections (of which there is always one by default, but could be many.) So my GUESS is that w:evenAndOddHeaders should be specified in the section.

I am back on this project starting Monday, so I'm happy to convo/code/whatever starting then.
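For reference, the spec snippet in the question above puts w:evenAndOddHeaders inside w:settings, i.e. the word/settings.xml part, which makes it a document-wide switch rather than a per-section one; the per-section "different first page" flag is w:titlePg inside each w:sectPr. A minimal stdlib sketch that reads both from the raw XML, assuming the caller has already pulled word/settings.xml and word/document.xml out of the package; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def _is_on(element):
    """Interpret an OOXML on/off element: absent -> False, bare -> True."""
    if element is None:
        return False
    val = element.get(W + "val", "true")
    return val.lower() not in ("false", "0", "off")


def even_and_odd_headers(settings_xml):
    """True if word/settings.xml turns on distinct even-page headers."""
    root = ET.fromstring(settings_xml)
    return _is_on(root.find(W + "evenAndOddHeaders"))


def title_pg_flags(document_xml):
    """One bool per w:sectPr in word/document.xml, in document order."""
    root = ET.fromstring(document_xml)
    return [_is_on(sect.find(W + "titlePg")) for sect in root.iter(W + "sectPr")]
```

With a zipfile.ZipFile open on the .docx, pass it pkg.read("word/settings.xml") and pkg.read("word/document.xml").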

@scanny
Contributor

scanny commented Nov 21, 2014

@danmilon, if you want to get a good feel for how the XML is laid out for various situations, opc-diag is probably a good tool to get familiar with. You can make a quick example Word file then browse the XML it produces, and then compare that to other examples you make, like with an even page header, etc.

I've always found the spec the barest of help in working through the initial analysis to resolve what behavior a feature should implement. I check it anyway, but I usually get a lot more from the XML schema and from inspecting Word-generated XML.
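Where installing opc-diag is not an option, a very rough stdlib stand-in for its browsing side is to list the parts of the zip package and pretty-print one of them. The helper names below are hypothetical, and this sketch does none of opc-diag's diffing.

```python
import zipfile
from xml.dom import minidom


def list_parts(docx):
    """Return the names of all parts (zip members) in the package."""
    with zipfile.ZipFile(docx) as pkg:
        return pkg.namelist()


def pretty_part(docx, partname):
    """Return one XML part, e.g. 'word/header1.xml', indented for reading."""
    with zipfile.ZipFile(docx) as pkg:
        return minidom.parseString(pkg.read(partname)).toprettyxml(indent="  ")
```

Saving two small files from Word, one with "Different Odd & Even Pages" turned on and one without, and comparing pretty_part(path, "word/settings.xml") for each shows where the flag lands.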

@scanny scanny changed the title Feature request: Retrieving headers and footers feature: Document.headers and footers Dec 18, 2014
@scanny scanny added the header label Feb 13, 2015
@MCopperhead

Hi! First of all, thanks for this great python lib.

Is there any estimate of when this feature for adding headers/footers will be available?

@scanny
Contributor

scanny commented Feb 19, 2015

Hi Vadim, well, we don't exactly have a development schedule because it's all based on when someone wants it bad enough to put in the work (or fund it). But this one is hovering near the top of the backlog. I'd say it's likely within about four to six months.

bjinwright pushed a commit to bjinwright/python-docx that referenced this issue Aug 21, 2015
@guilhermebr

Hi,

Any news on this issue? Does anyone have a fork where headers and footers work?

Regards

@mustash

mustash commented Oct 4, 2015

@scanny ,

Steve, it doesn't look like the work of bjinwright/python-docx or mikemaccana/python-docx has been merged into master here. Is that perhaps because adequate testing and regression have not been performed on their respective commits, or because of higher-priority items on this project's backlog? Thanks!

@scanny
Contributor

scanny commented Oct 4, 2015

The mikemaccana repo is a different code base altogether (although we kept the name), so that will never be merged. Mostly this version is a superset of the legacy version; any remaining features would have to be developed for this code base.

The bjinwright fork has only a single additional commit on it, and no tests I can see, so there would need to be quite a bit more work there to get the feature(s) it's implementing into shape for a merge.

@mhsiddiqui

Hi @scanny!
Is footer support working now, or is it still under development?

@AlbinoShadow

@scanny, still hoping that this is something that comes in 2016! For now, I'll work on a workaround.

@eismog

eismog commented May 27, 2016

@AlbinoShadow
About the header: do you have the workaround? Could you share it?

I want to replace the text in a docx file's header.
Who can help me?

@eismog

eismog commented May 27, 2016

@scanny

I have lots of docx files, and I want to replace the text in their headers.
Is there any way to do it?

Thanks

@scanny
Contributor

scanny commented May 27, 2016

This feature set is under active development by @eupharis on this pull request: #291

I don't believe there's a workaround at the moment, it would need to be a pretty hairy one given the involvement of new parts, but would certainly be possible. You can look at the analysis documents on that branch to get an idea what's involved.

And of course encouragement/+1s on that PR is always helpful in keeping spirits high as we move the development along :)
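For the narrower case asked about above, replacing text in headers that already exist, no new parts are involved, so a blunt stdlib workaround is possible. This sketch assumes header parts follow Word's usual naming (word/header1.xml, word/header2.xml, and so on) and UTF-8 encoding, and it only matches text that sits inside a single w:t element; Word often splits a phrase across several runs, in which case nothing is replaced. The name replace_header_text is hypothetical.

```python
import re
import zipfile
from xml.sax.saxutils import escape

# Assumption: header parts use Word's usual names, word/header1.xml etc.
HEADER_PART = re.compile(r"^word/header\d*\.xml$")


def replace_header_text(src, dst, old, new):
    """Copy the .docx `src` to `dst`, replacing `old` with `new` in header parts.

    Works on the raw XML bytes, so it only matches text that Word kept inside
    a single w:t element, and `old` must not also occur in the markup itself.
    """
    # Text content is XML-escaped in the part, so escape both strings the same way
    old_xml = escape(old).encode("utf-8")
    new_xml = escape(new).encode("utf-8")
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for item in zin.infolist():
            data = zin.read(item.filename)
            if HEADER_PART.match(item.filename):
                data = data.replace(old_xml, new_xml)
            # Reuse the original ZipInfo so names and compression are preserved
            zout.writestr(item, data)
```

For a batch of files, call it once per file with a separate output path; writing to the same path that is being read would corrupt the package.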

@eismog

eismog commented May 30, 2016

Thank you for your detailed answer.

@mayuryeola

I am trying to add a header to my document using python-docx.
Here is the code:

from docx import Document
d = Document()
d.sections[0].headers[0].add_paragraph(text='moar header')
d.save('C:/users/Ashwini/Desktop/CI2.docx')

I am getting an error: AttributeError: 'Section' object has no attribute 'headers'

Please help me out. Thanks in advance

@Sum4196

Sum4196 commented Jul 7, 2016

@mayuryeola this feature is being developed by @eupharis which can be seen under the pull request #291 as @scanny has stated in an above comment.

@scanny, after reviewing the pull request #291, it seems as if the feature just needs to be merged into the master branch? Please correct me if otherwise; I would like to use this, as I am using a workaround with pywin32 at the moment. Thank you to all who have contributed to the development of this project, especially you @scanny, thanks again :)

@scanny
Contributor

scanny commented Jul 9, 2016

@Sum4196 that PR is work in progress. It doesn't work quite yet but it's probably about half-way to doing the basics. Might be done in a month or so, just guessing :)

@Sum4196

Sum4196 commented Jul 11, 2016

Sounds great :) I can't wait to see it implemented. It will be very useful to me for a program I'm building. This is the first large program I'm building and if any help is needed for this I would gladly devote my time to it!!

@HuangKBAaron

HuangKBAaron commented Sep 25, 2016

Hi there, is it done yet? I just downloaded version 0.8.6, and when I use "section.header", it fails with "'Section' object has no attribute 'header'". I found other workarounds that use "document.xpath(...)", like here or here, but with 0.8.6 neither document._document_part nor document.xpath exists. Can you suggest how I can get the header of an existing docx for now? Many thanks!
@scanny @eupharis @Sum4196

@Sum4196

Sum4196 commented Sep 26, 2016

@HuangKBAaron, below is a function that I put together from other sources, plus a few pieces of my own, and used in a recent project. It creates a header and footer on every page in the document. I do not know how to put them on just the first page, but if you figure that out I would like to know. Hopefully this helps you in your situation while the header and footer feature is integrated into python-docx. Thanks for your hard work @scanny and @eupharis.

Note: This will only work on Windows with Microsoft Word installed, as it communicates with Word over COM (via pywin32).

import win32com.client as win32client

def write_header_and_footer(word_document, header_text, footer_text):
    win32gencache = win32client.gencache

    if win32gencache.is_readonly:
        # allow gencache to create the cached wrapper objects
        win32gencache.is_readonly = False

        # under py2exe the call in gencache to __init__() does not happen
        # so we use Rebuild() to force the creation of the gen_py folder
        win32gencache.Rebuild()

        # ensures that the gen_py directory is created before use
        win32gencache.GetGeneratePath()

        # you must ensure that the python...\win32com.client.gen_py dir does not exist
        # to allow creation of the cache in %temp%

    # ensures that the MS Word type library wrappers are generated and ready for use
    win32gencache.EnsureModule('{00020905-0000-0000-C000-000000000046}', 0, 8, 5)

    word = win32gencache.EnsureDispatch('Word.Application')
    word.Visible = False
    # word_document is the path of the .docx to modify
    doc = word.Documents.Open(word_document)
    section = doc.Sections(1)
    primary = win32client.constants.wdHeaderFooterPrimary  # same as index 1
    section.Headers(primary).Range.Text = header_text
    section.Headers(primary).PageNumbers.Add(2, True)
    section.Headers(primary).PageNumbers.NumberStyle = 8
    section.Footers(primary).Range.Text = footer_text
    doc.Save()
    doc.Close(True)
    word.Quit()

@neoyagami

Hi guys, thanks for all your hard work.
Right now I'm working on a project that will use this package, and an implementation of this API (header, footer) would be a really welcome addition.

Is there a release date for the PR to be merged?

@scanny
Contributor

scanny commented Feb 2, 2017

@neoyagami The only way features get a delivery date is if someone sponsors them. Contributed features are just that, and depend on the contributor's timetable.

@ccurvey

ccurvey commented May 23, 2017

My client might be willing to sponsor the effort to get this "over the finish line". Any idea how much would be required?

@scanny
Contributor

scanny commented May 23, 2017

@ccurvey I sent you an email to the email address on your profile. Let me know if you don't get it and we'll find another way to connect.

@baltechies

Has the PR been merged after 2 years, or is it still pending?
Or is there an easy way to add a header and footer to a doc file using the Django API?

@scanny
Contributor

scanny commented Nov 14, 2017

@baltechies work on this stalled at perhaps 30% of completion. If someone wants to pick it up or sponsor the feature it could probably be completed in a few weeks. Let me know if you're interested.

@ccurvey

ccurvey commented Nov 14, 2017 via email

@ccurvey

ccurvey commented Nov 14, 2017 via email

@baltechies

@scanny I was able to resolve my issue by using an empty template for the doc file. Otherwise, this library is awesome for creating doc files.

@ondrej-111
Contributor

Hello guys. I am new to contributing to Git projects, but I want to help resolve this issue. Yesterday I forked the project and started working on it. I have already implemented writing to headers and footers.
Example usage looks like:

document = Document(args['document'])
header = document.headers[0]
header.paragraphs[0].runs[0].text = 'hee'
footer = document.footers[0]
footer.paragraphs[0].runs[0].text = 'foo'
document.save(args['output'])

but my code is not covered by tests, yet.

@Lokkook

Lokkook commented Oct 27, 2018

Thank you @ondrej-111: I'm using your fork to clear footers in a bunch of docx files. It works well and you saved me a lot of time!

@ondrej-111
Contributor

Thank you @Lokkook.

I have now reworked my fork according to this specification:

https://github.com/python-openxml/python-docx/blob/master/docs/dev/analysis/features/header.rst

If anybody wants to use python-docx as described in the spec, it is possible on this branch:

feature/section_h&f

of my fork. I don't want to merge it into the master branch yet because somebody may be using it...

Next week I will implement tests and create PR.

@Sum4196

Sum4196 commented Nov 12, 2018

@ondrej-111 I am able to edit any existing header and footer of a document, however, creating them does not seem to work.

@ondrej-111
Contributor

Yes, sorry for the misunderstanding. It is not implemented yet; only editing existing headers and footers works. First I want to write some tests for that part, and then add the remaining header and footer functionality...

@Sum4196

Sum4196 commented Nov 12, 2018

I think it's a great start so far. I'm excited to see what you come up with in the future, thanks for the clarification.

@scanny
Contributor

scanny commented Jan 7, 2019

Header and footer support will be released with v0.8.8 in the next day or two. Until then a working version is available on the spike-header branch.

@scanny scanny closed this as completed Jan 7, 2019