Restart numbering of an ordered list in document. #25

Open
proofit404 opened this issue Mar 18, 2014 · 36 comments

Comments

@proofit404

We can easily add an ordered list item with document.add_paragraph(style='ListNumber'). But how can we restart the numbering?

@scanny scanny modified the milestones: v0.6.0, 0.6.3 May 1, 2014
@scanny scanny modified the milestones: v0.8.0, 0.6.3 May 13, 2014
@lafolle

lafolle commented Aug 6, 2014

+1

@ilnurgi

ilnurgi commented Aug 15, 2014

+1 :)

@downtown12

Hi @scanny. I have a problem with the workaround you described on stackoverflow.com. To restart the list numbering, you said one needs to locate the numbering definition of the style as well as the abstract numbering definition it points to. I tried but failed to locate these two definitions. Can you explain in detail how they can be located? I haven't found any documentation about them.

@scanny
Contributor

scanny commented Jul 6, 2015

Can you link to the SO response? That would save me from having to search for it.

@downtown12

Here is the link, the same one you gave in your answer on issue #87: http://stackoverflow.com/questions/23446268/python-docx-how-to-restart-list-lettering/23464442#23464442

@scanny
Contributor

scanny commented Jul 7, 2015

@downtown12 I don't have time to develop step-by-step instructions for you, but if you're looking for the right direction to start looking for yourself, these might be helpful:

  • The document object provides access to the numbering part for the document (numbering.xml). This call should get you there: document.part.numbering_part
  • The NumberingPart object provides access to a sequence of NumberingDefinition objects: numbering_part.numbering_definitions
  • The _NumberingDefinitions class looks like where API support ends at the moment. You can get the w:numbering element from it though, which is the parent of all the numbering definitions: numbering_definitions._numbering. From there you'll need to work with lxml calls to get its children using XPath and so on.
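
The chain above ends at the w:numbering element, so the remaining work is XML lookups. As a rough sketch of what to look for there, the snippet below maps each concrete w:num to the w:abstractNum it points to. It uses the standard library's ElementTree on a hand-written numbering.xml fragment, which is an assumption made to keep it self-contained; with python-docx the equivalent lookups would run against numbering_definitions._numbering. The helper name num_to_abstract is hypothetical.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Hand-written stand-in for word/numbering.xml (illustrative only).
NUMBERING_XML = f"""
<w:numbering xmlns:w="{W}">
  <w:abstractNum w:abstractNumId="0">
    <w:lvl w:ilvl="0"><w:start w:val="1"/><w:numFmt w:val="decimal"/></w:lvl>
  </w:abstractNum>
  <w:num w:numId="1"><w:abstractNumId w:val="0"/></w:num>
  <w:num w:numId="2"><w:abstractNumId w:val="0"/></w:num>
</w:numbering>
""".strip()


def num_to_abstract(numbering):
    """Map each concrete numId to the abstractNumId it points to."""
    mapping = {}
    for num in numbering.findall("w:num", NS):
        abstract = num.find("w:abstractNumId", NS)
        mapping[num.get(f"{{{W}}}numId")] = abstract.get(f"{{{W}}}val")
    return mapping


numbering = ET.fromstring(NUMBERING_XML)
print(num_to_abstract(numbering))
```

Paragraphs reference the concrete numId, and each concrete definition references an abstract one, which is why both have to be located.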

@downtown12

Thanks a lot @scanny. Your instructions helped me connect the attributes you mentioned above with the elements of numbering.xml. They are pretty helpful.

@yurac

yurac commented Sep 15, 2015

@downtown12, I also use this feature. All the infrastructure is already in python-docx, so enabling it is easy. Below is a patch. I also added accessors for numId, ilvl and ind.left for convenience.

diff --git a/docx/document.py b/docx/document.py
index 655a70e..4a0ec06 100644
--- a/docx/document.py
+++ b/docx/document.py
@@ -51,6 +51,12 @@ class Document(ElementProxy):
         paragraph.add_run().add_break(WD_BREAK.PAGE)
         return paragraph

+    def get_new_list(self, abstractNumId):
+        """
+        Returns a new numId that references given abstractNumId
+        """
+        return self.numbering.numbering_definitions.add_num(abstractNumId, True)
+
     def add_paragraph(self, text='', style=None):
         """
         Return a paragraph newly added to the end of the document, populated
@@ -157,6 +163,14 @@ class Document(ElementProxy):
         return self._part.styles

     @property
+    def numbering(self):
+        """
+        Provides access to the numbering part.
+        """
+        x=self._part.numbering_part
+        return self._part.numbering_part
+
+    @property
     def tables(self):
         """
         A list of |Table| instances corresponding to the tables in the
diff --git a/docx/oxml/numbering.py b/docx/oxml/numbering.py
index aeedfa9..097f3d6 100644
--- a/docx/oxml/numbering.py
+++ b/docx/oxml/numbering.py
@@ -96,13 +96,15 @@ class CT_Numbering(BaseOxmlElement):
     """
     num = ZeroOrMore('w:num', successors=('w:numIdMacAtCleanup',))

-    def add_num(self, abstractNum_id):
+    def add_num(self, abstractNum_id, restart=False):
         """
         Return a newly added CT_Num (<w:num>) element referencing the
         abstract numbering definition identified by *abstractNum_id*.
         """
         next_num_id = self._next_numId
         num = CT_Num.new(next_num_id, abstractNum_id)
+        if restart:
+            num.add_lvlOverride(ilvl=0).add_startOverride(1)
         return self._insert_num(num)

     def num_having_numId(self, numId):
diff --git a/docx/parts/numbering.py b/docx/parts/numbering.py
index e324c5a..186c6f8 100644
--- a/docx/parts/numbering.py
+++ b/docx/parts/numbering.py
@@ -43,5 +43,8 @@ class _NumberingDefinitions(object):
         super(_NumberingDefinitions, self).__init__()
         self._numbering = numbering_elm

+    def add_num(self, abstractNum_id, restart=False):
+        return self._numbering.add_num(abstractNum_id, restart).numId
+
     def __len__(self):
         return len(self._numbering.num_lst)
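
To make the effect of restart=True in the patch concrete, here is a minimal sketch of the XML it is meant to produce: a new w:num that points at an existing abstract definition and overrides the start value of level 0. It builds the element with the standard library's ElementTree instead of python-docx's own element classes, which is a simplification for illustration; make_num and qn are hypothetical local helpers.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W)


def qn(tag):
    """Expand a 'w:name' tag to Clark notation ({namespace}name)."""
    prefix, local = tag.split(":")
    assert prefix == "w"
    return f"{{{W}}}{local}"


def make_num(num_id, abstract_num_id, restart=False):
    """Build a <w:num> element; with restart=True, level 0 starts again at 1."""
    num = ET.Element(qn("w:num"), {qn("w:numId"): str(num_id)})
    ET.SubElement(num, qn("w:abstractNumId"), {qn("w:val"): str(abstract_num_id)})
    if restart:
        override = ET.SubElement(num, qn("w:lvlOverride"), {qn("w:ilvl"): "0"})
        ET.SubElement(override, qn("w:startOverride"), {qn("w:val"): "1"})
    return num


print(ET.tostring(make_num(3, 10, restart=True), encoding="unicode"))
```

Paragraphs that use the new numId then count from 1 again, while paragraphs that keep the old numId continue the earlier sequence.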

@downtown12

@yurac Thanks.
Actually, I solved my issue several days ago with a different workaround.
Instead of changing the original docx package, I added some features to my own app.
But I wrote a lot more code than you did.
I read the document_part as well as the numbering_part from the Document object that I was editing.
First I parse the document_part using lxml. Each time I extract a numbered list paragraph, I create an abstract number node for it in the numbering_part. That makes sure every list has its numbering restarted.
It seems all roads lead to Rome.
Thanks very much for showing me a new way.

@yurac

yurac commented Sep 19, 2015

@downtown12, I also thought about adding a new abstract number node per list, but was too lazy to implement it :)

Thanks!

@jmhansen

@yurac, do you plan to submit a pull request?

@yurac

yurac commented Sep 24, 2015

@nesnahnoj , I submitted one: #210

@jmhansen

Thanks @yurac. Can you explain how you are accessing/using numId, ilvl and ind.left? I've been using the same command described in the original post of this issue: document.add_paragraph(style='ListNumber'), but I want to solve for the following use case:

Add the list of strings [a, b, c, d, e, f] to a docx file in the following outline format:

  1. a
    1. b
    2. c
  2. d
    1. e
  3. f

@yurac

yurac commented Sep 24, 2015

You have to set ilvl and numId for each item. You can set them using the accessors in the pull request. You need to understand the class relationships in python-docx to do so. Generate a numId using the function in the pull request, passing the abstractNumId as the parameter. Get the abstractNumId from your template file. For your example you do not need to set ind.left. I will try to get you an example when I have time.
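
At the XML level, setting ilvl and numId on an item amounts to writing a w:numPr element into the paragraph's properties. The sketch below builds such paragraphs for the a to f outline with the standard library's ElementTree; it is an illustration of the target XML under that assumption, not the pull request's accessors. The helper names w and list_paragraph and the numId value 5 are made up for the example.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W)


def w(local):
    """Return the Clark-notation name for a WordprocessingML tag or attribute."""
    return f"{{{W}}}{local}"


def list_paragraph(text, num_id, level):
    """Build a <w:p> whose <w:numPr> carries the list level and numId."""
    p = ET.Element(w("p"))
    ppr = ET.SubElement(p, w("pPr"))
    num_pr = ET.SubElement(ppr, w("numPr"))
    # Schema order inside w:numPr is w:ilvl first, then w:numId.
    ET.SubElement(num_pr, w("ilvl"), {w("val"): str(level)})
    ET.SubElement(num_pr, w("numId"), {w("val"): str(num_id)})
    run = ET.SubElement(p, w("r"))
    ET.SubElement(run, w("t")).text = text
    return p


# (text, level) pairs for the requested outline; numId 5 is an arbitrary example.
outline = [("a", 0), ("b", 1), ("c", 1), ("d", 0), ("e", 1), ("f", 0)]
paragraphs = [list_paragraph(text, num_id=5, level=level) for text, level in outline]
print(ET.tostring(paragraphs[1], encoding="unicode"))
```

All six paragraphs share one numId, so they belong to the same list; only the level differs between outer and inner items.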

@yurac

yurac commented Sep 24, 2015

@nesnahnoj

@jeberger

This appears to work reasonably well for numbered lists, but not so well for multi-level bullet lists. From my tests, it looks like you need to have a w:abstractNum and a w:num in numbering.xml. Moreover, the w:abstractNum must have a w:lvl for each ilvl you use.

Those are used to define the list style (bullet character and indentation, mostly). When they are missing, it looks like Word picks reasonable defaults for a numbered list but not for a bullet list.
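
One way to catch this before opening the file in Word is to check which levels a list's abstract definition actually declares. The following is a minimal sketch using the standard library's ElementTree on a hand-written numbering.xml fragment; defined_levels is a hypothetical helper, and it deliberately ignores level definitions that a w:num may carry in its own w:lvlOverride.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}

# Illustrative fragment: a bullet list that only defines level 0.
NUMBERING_XML = f"""
<w:numbering xmlns:w="{W}">
  <w:abstractNum w:abstractNumId="7">
    <w:lvl w:ilvl="0"><w:numFmt w:val="bullet"/></w:lvl>
  </w:abstractNum>
  <w:num w:numId="1"><w:abstractNumId w:val="7"/></w:num>
</w:numbering>
""".strip()


def defined_levels(numbering, num_id):
    """Return the set of ilvl values declared for the list with this numId."""
    for num in numbering.findall("w:num", NS):
        if num.get(f"{{{W}}}numId") == str(num_id):
            abstract_id = num.find("w:abstractNumId", NS).get(f"{{{W}}}val")
            break
    else:
        return set()  # no w:num with that numId
    for abstract in numbering.findall("w:abstractNum", NS):
        if abstract.get(f"{{{W}}}abstractNumId") == abstract_id:
            return {int(lvl.get(f"{{{W}}}ilvl")) for lvl in abstract.findall("w:lvl", NS)}
    return set()


numbering = ET.fromstring(NUMBERING_XML)
used_levels = {0, 1}
print(sorted(used_levels - defined_levels(numbering, 1)))
```

Any level reported as missing is one where Word has no bullet character or indentation to apply.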

@yurac, PR #110 generates random values for numId and checks that they are not already used in the document text. However it does not check whether they are defined in numbering.xml. I suspect that this might lead to funny behaviour if you pick a number that has a definition in numbering.xml even though it is unused in the text. Especially if that definition is for the wrong list type…
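
A sketch of the stricter check suggested here: pick a numId that is free in both numbering.xml and the document text. It uses the standard library's ElementTree on hand-written fragments, and next_free_num_id is a hypothetical helper, not code from either pull request.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}
VAL = f"{{{W}}}val"
NUM_ID = f"{{{W}}}numId"

# numbering.xml defines numId 1 and 2 ...
NUMBERING_XML = f"""
<w:numbering xmlns:w="{W}">
  <w:abstractNum w:abstractNumId="0"/>
  <w:num w:numId="1"><w:abstractNumId w:val="0"/></w:num>
  <w:num w:numId="2"><w:abstractNumId w:val="0"/></w:num>
</w:numbering>
""".strip()

# ... while the document text references numId 3, which has no definition.
BODY_XML = f"""
<w:body xmlns:w="{W}">
  <w:p><w:pPr><w:numPr><w:ilvl w:val="0"/><w:numId w:val="3"/></w:numPr></w:pPr></w:p>
</w:body>
""".strip()


def next_free_num_id(numbering, body):
    """Smallest positive numId used neither in numbering.xml nor in the body."""
    # In numbering.xml the id is an attribute of w:num.
    taken = {int(num.get(NUM_ID)) for num in numbering.findall("w:num", NS)}
    # In the document text it is the w:val of a w:numId element.
    taken |= {int(el.get(VAL)) for el in body.iter(NUM_ID)}
    candidate = 1
    while candidate in taken:
        candidate += 1
    return candidate


numbering = ET.fromstring(NUMBERING_XML)
body = ET.fromstring(BODY_XML)
print(next_free_num_id(numbering, body))
```

Checking both places avoids reusing an id that is defined but unused, and also one that is used but undefined.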

@yurac

yurac commented Sep 25, 2015

@nesnahnoj here is code that should generate the example you asked for:

#!/usr/bin/env python

# Requires python-docx with the patch from the pull request applied
# (Document.get_new_list and the paragraph num_id / level accessors).
from docx import Document

document = Document("template.docx")

# Add the desired numbering styles to your template file and extract the
# abstractNumId from there. In this example, abstractNumId is 10.
ABSTRACT_NUM_ID = "10"

# (text, level) pairs describing the outline
ITEMS = [("a", 0), ("b", 1), ("c", 1), ("d", 0), ("e", 1), ("f", 0)]


def add_list(document, num_id):
    """Add ITEMS as list paragraphs that all share the list num_id."""
    for text, level in ITEMS:
        p = document.add_paragraph(text=text, style='ListParagraph')
        p.num_id = num_id
        p.level = level


# Add a list
add_list(document, document.get_new_list(ABSTRACT_NUM_ID))

# Add the same list once again. Requesting a new numId restarts the
# numbering at the outer level.
add_list(document, document.get_new_list(ABSTRACT_NUM_ID))

document.save("num.docx")

@yurac

yurac commented Sep 25, 2015

@jeberger multi-level bullet lists work for me the same way. I create an abstractNumId for such a list using Word and then reference it the same way I do for numbered lists.

@jeberger

@yurac and the result when opened in Word looks like:

  • 1st item of 1st level list
  • 1st item of 2nd level list
  • 2nd item of 2nd level list
  • 2nd item of 1st level list

instead of:

  • 1st item of 1st level list
    • 1st item of 2nd level list
    • 2nd item of 2nd level list
  • 2nd item of 1st level list

@jeberger

Note that this depends on your docx template. My results are with the default.docx that comes with v.0.8.5

@yurac

yurac commented Sep 25, 2015

@jeberger I do not use default.docx. I created a template and added a numbered multi-level list and a bulleted multi-level list in the style I want to use. Then I opened the template's numbering.xml component and found the corresponding abstractNumIds for the numbered list and the bulleted list I created. Now I always use these IDs together with the pull request I posted above, and I get the expected results for both numbered and bulleted multi-level lists.
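
Looking up those IDs does not have to be done by hand, since a .docx file is a zip archive. The sketch below lists every abstractNumId together with the number format of its level 0, assuming the numbering part sits at the usual path word/numbering.xml. It uses only the standard library; abstract_num_formats is a hypothetical helper, and the in-memory archive merely stands in for a real template file.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def abstract_num_formats(docx_file):
    """Return {abstractNumId: numFmt of level 0} read from word/numbering.xml."""
    with zipfile.ZipFile(docx_file) as package:
        root = ET.fromstring(package.read("word/numbering.xml"))
    formats = {}
    for abstract in root.findall("w:abstractNum", NS):
        fmt = None
        for lvl in abstract.findall("w:lvl", NS):
            if lvl.get(f"{{{W}}}ilvl") == "0":
                num_fmt = lvl.find("w:numFmt", NS)
                fmt = num_fmt.get(f"{{{W}}}val") if num_fmt is not None else None
        formats[abstract.get(f"{{{W}}}abstractNumId")] = fmt
    return formats


# Stand-in for a real template: an in-memory zip holding only numbering.xml.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    package.writestr(
        "word/numbering.xml",
        f"""<w:numbering xmlns:w="{W}">
  <w:abstractNum w:abstractNumId="10">
    <w:lvl w:ilvl="0"><w:numFmt w:val="decimal"/></w:lvl>
  </w:abstractNum>
  <w:abstractNum w:abstractNumId="11">
    <w:lvl w:ilvl="0"><w:numFmt w:val="bullet"/></w:lvl>
  </w:abstractNum>
</w:numbering>""",
    )
buffer.seek(0)
print(abstract_num_formats(buffer))
```

With a real template, pass its file name instead of the buffer. The format of level 0 is usually enough to tell the numbered definition from the bulleted one.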

@jeberger

That's more or less the point I wanted to make: this requires a specific docx template and code that is tailored to use the same IDs as this template. It will not work with a generic template, and you may have to change the IDs in the code each time you change the template.

@yurac

yurac commented Sep 25, 2015

Yes. I think a better approach would be adding abstractNumId per list. However there is currently no API for adding new abstractNumId so you can only use ones defined in your template
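
Since there is no API for it, adding an abstract definition means building the XML directly. The following is a rough sketch of that idea with the standard library's ElementTree: it appends a one-level decimal w:abstractNum with the next free abstractNumId. The function name add_decimal_abstract_num is hypothetical, the definition is deliberately minimal (no indentation or run properties), and how Word renders such a bare definition has not been verified here.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}
ET.register_namespace("w", W)

NUMBERING_XML = f"""
<w:numbering xmlns:w="{W}">
  <w:abstractNum w:abstractNumId="0"/>
  <w:num w:numId="1"><w:abstractNumId w:val="0"/></w:num>
</w:numbering>
""".strip()


def w(local):
    """Return the Clark-notation name for a WordprocessingML tag or attribute."""
    return f"{{{W}}}{local}"


def add_decimal_abstract_num(numbering):
    """Insert a one-level decimal <w:abstractNum> and return its new id."""
    existing = numbering.findall("w:abstractNum", NS)
    new_id = max((int(a.get(w("abstractNumId"))) for a in existing), default=-1) + 1
    abstract = ET.Element(w("abstractNum"), {w("abstractNumId"): str(new_id)})
    ET.SubElement(abstract, w("multiLevelType"), {w("val"): "singleLevel"})
    lvl = ET.SubElement(abstract, w("lvl"), {w("ilvl"): "0"})
    ET.SubElement(lvl, w("start"), {w("val"): "1"})
    ET.SubElement(lvl, w("numFmt"), {w("val"): "decimal"})
    ET.SubElement(lvl, w("lvlText"), {w("val"): "%1."})
    ET.SubElement(lvl, w("lvlJc"), {w("val"): "left"})
    # The schema expects all w:abstractNum elements before the first w:num,
    # so insert after the leading w:numPicBullet / w:abstractNum children.
    leading = (w("numPicBullet"), w("abstractNum"))
    children = list(numbering)
    index = next((i for i, c in enumerate(children) if c.tag not in leading), len(children))
    numbering.insert(index, abstract)
    return new_id


numbering = ET.fromstring(NUMBERING_XML)
new_id = add_decimal_abstract_num(numbering)
print(new_id, [child.tag.split("}")[1] for child in numbering])
```

A w:num that references the returned id is still needed before any paragraph can use the new definition.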

@scanny scanny removed this from the Sections milestone Apr 9, 2016
@akobler

akobler commented Jun 6, 2016

+1

@ameily

ameily commented Apr 18, 2017

I need this functionality for a project. It looks like there are two PRs open, #110 and #210, and both are awaiting tests. Can I step in and help and, if so, which PR should I go off of? I personally like the API in #110 better.

@scanny
Contributor

scanny commented Apr 21, 2017

Hi Adam, you're more than welcome to contribute. The key thing that stops most folks is getting the tests done. Python folks just don't seem to be test-driven as a whole and this project is strictly test-driven. No commit gets merged without tests. You can look at the commit history to get an idea of the granularity and flow of how these feature additions go. Generally look for a commit that starts with 'docs: document analysis for ...' followed by one starting 'acpt: add scenarios for xxx'. Then the implementation follows and then the pattern repeats.

The first step is developing the enhancement proposal, also known as the "analysis page". This is where the API is resolved and so it's a natural first step. It's also separately committable, whether you go on to develop the feature or not. After that, you can pick and choose whatever you want from the existing pull requests to use in yours.

Neither of the two PRs looks like it went in the right direction, which is typically what happens when you don't start with the analysis document. There's a lot to think through and inputs to be collected and understood, like the relevant XML Schema excerpts, Word's own behaviors with respect to lists, what the spec has to say about it (little generally :), and what the MS API is for doing the same from VBA.

Let me know if you need more to go on :)

@Sebastancho

Any updates on this issue, or can we contribute?

@cm-cm-cm-cm

Are there any updates to this?

@chaithanyaramkumar

chaithanyaramkumar commented Jun 20, 2018

How can I read the bullets or numbering in an existing document?
For example, if the input is
1.apple
2.boy
the output should be
['1.apple','2.boy']
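
The numbers are not stored in the document text; Word computes them from each paragraph's w:numPr and the numbering definitions. A heavily simplified sketch of that idea follows, using the standard library's ElementTree on a hand-written body fragment. It assumes a single-level decimal list that starts at 1, and it ignores levels, other number formats, start overrides, and numbering that comes from the paragraph style (as with style='ListNumber') rather than from the paragraph itself. The helper name numbered_items is hypothetical.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}
VAL = f"{{{W}}}val"

# Hand-written stand-in for the w:body of word/document.xml.
BODY_XML = f"""
<w:body xmlns:w="{W}">
  <w:p>
    <w:pPr><w:numPr><w:ilvl w:val="0"/><w:numId w:val="1"/></w:numPr></w:pPr>
    <w:r><w:t>apple</w:t></w:r>
  </w:p>
  <w:p>
    <w:pPr><w:numPr><w:ilvl w:val="0"/><w:numId w:val="1"/></w:numPr></w:pPr>
    <w:r><w:t>boy</w:t></w:r>
  </w:p>
  <w:p><w:r><w:t>plain text</w:t></w:r></w:p>
</w:body>
""".strip()


def numbered_items(body):
    """Return 'N.text' strings for list paragraphs, counting per numId."""
    counters = {}
    items = []
    for p in body.findall("w:p", NS):
        num_id = p.find("w:pPr/w:numPr/w:numId", NS)
        if num_id is None:
            continue  # not a directly numbered paragraph
        key = num_id.get(VAL)
        counters[key] = counters.get(key, 0) + 1
        text = "".join(t.text or "" for t in p.iter(f"{{{W}}}t"))
        items.append(f"{counters[key]}.{text}")
    return items


body = ET.fromstring(BODY_XML)
print(numbered_items(body))
```

With python-docx, the same w:numPr element is reachable per paragraph as paragraph._p.pPr.numPr, as the list_number function further down in this thread does.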

@madphysicist

madphysicist commented Jun 27, 2018

So I've made a thing that searches the existing abstract numbering schemes for the style of the current paragraph, and sets a reasonable abstract style based on that if possible:

def list_number(doc, par, prev=None, level=None, num=True):
    """
    Makes a paragraph into a list item with a specific level and
    optional restart.

    An attempt will be made to retrieve an abstract numbering style that
    corresponds to the style of the paragraph. If that is not possible,
    the default numbering or bullet style will be used based on the
    ``num`` parameter.

    Parameters
    ----------
    doc : docx.document.Document
        The document to add the list into.
    par : docx.paragraph.Paragraph
        The paragraph to turn into a list item.
    prev : docx.paragraph.Paragraph or None
        The previous paragraph in the list. If specified, the numbering
        and styles will be taken as a continuation of this paragraph.
        If omitted, a new numbering scheme will be started.
    level : int or None
        The level of the paragraph within the outline. If ``prev`` is
        set, defaults to the same level as in ``prev``. Otherwise,
        defaults to zero.
    num : bool
        If ``prev`` is :py:obj:`None` and the style of the paragraph
        does not correspond to an existing numbering style, this will
        determine whether or not the list will be numbered or bulleted.
        The result is not guaranteed, but is fairly safe for most Word
        templates.
    """
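    # ``level`` may still be None at this point, so fall back to zero (the
    # default for a new list) rather than searching for ilvl="None".
    xpath_options = {
        True: {'single': 'count(w:lvl)=1 and ', 'level': 0},
        False: {'single': '', 'level': 0 if level is None else level},
    }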

    def style_xpath(prefer_single=True):
        """
        The style comes from the outer-scope variable ``par.style.style_id``.
        """
        style = par.style.style_id
        return (
            'w:abstractNum['
                '{single}w:lvl[@w:ilvl="{level}"]/w:pStyle[@w:val="{style}"]'
            ']/@w:abstractNumId'
        ).format(style=style, **xpath_options[prefer_single])

    def type_xpath(prefer_single=True):
        """
        The type is from the outer-scope variable ``num``.
        """
        type = 'decimal' if num else 'bullet'
        return (
            'w:abstractNum['
                '{single}w:lvl[@w:ilvl="{level}"]/w:numFmt[@w:val="{type}"]'
            ']/@w:abstractNumId'
        ).format(type=type, **xpath_options[prefer_single])

    def get_abstract_id():
        """
        Select as follows:

            1. Match single-level by style (get min ID)
            2. Match exact style and level (get min ID)
            3. Match single-level decimal/bullet types (get min ID)
            4. Match decimal/bullet in requested level (get min ID)
            5. Fall back to 0
        """
        for fn in (style_xpath, type_xpath):
            for prefer_single in (True, False):
                xpath = fn(prefer_single)
                ids = numbering.xpath(xpath)
                if ids:
                    return min(int(x) for x in ids)
        return 0

    if (prev is None or
            prev._p.pPr is None or
            prev._p.pPr.numPr is None or
            prev._p.pPr.numPr.numId is None):
        if level is None:
            level = 0
        numbering = doc.part.numbering_part.numbering_definitions._numbering
        # Compute the abstract ID first by style, then by num
        anum = get_abstract_id()
        # Set the concrete numbering based on the abstract numbering ID
        num = numbering.add_num(anum)
        # Make sure to override the abstract continuation property
        num.add_lvlOverride(ilvl=level).add_startOverride(1)
        # Extract the newly-allocated concrete numbering ID
        num = num.numId
    else:
        if level is None:
            level = prev._p.pPr.numPr.ilvl.val
        # Get the previous concrete numbering ID
        num = prev._p.pPr.numPr.numId.val
    par._p.get_or_add_pPr().get_or_add_numPr().get_or_add_numId().val = num
    par._p.get_or_add_pPr().get_or_add_numPr().get_or_add_ilvl().val = level

This is in no way comprehensive or particularly robust, but it works pretty well for what I am trying to do.

@chaithanyaramkumar

thanks

@chaithanyaramkumar

I have a document that contains one table.
I need to detect whether the first row is a column header (true, otherwise false)
and whether the first column is a row header (true, otherwise false).
Please help.

@nitinkhosla79

nitinkhosla79 commented Apr 2, 2020

All kudos to @jlovegren0. He added low-level support for numbering styles.
We are able to generate multilevel bullet lists now.
PR: #582
We are helping him test his PR. We ask everyone to please try his API and sample code (in the PR comments) and share your success story or provide feedback.
Thank you all for your support.

@komawar

komawar commented Apr 2, 2020

Yes, excellent work indeed by @jlovegren0

Nested numbering lists work with correct numbering.

@komawar

komawar commented Apr 6, 2020

We have a good discussion on PR: #582
It should lay out the different possibilities for implementing this, along with some of the corner cases and challenges involved.

Again, praise for @jlovegren0 for all the crafted inputs and workarounds.

@VictorBancho

@yurac Thanks. Actually, I solved my issue several days ago with a different workaround. Instead of changing the original docx package, I added some features to my own app. But I wrote a lot more code than you did. I read the document_part as well as the numbering_part from the Document object that I was editing. First I parse the document_part using lxml. Each time I extract a numbered list paragraph, I create an abstract number node for it in the numbering_part. That makes sure every list has its numbering restarted. It seems all roads lead to Rome. Thanks very much for showing me a new way.

Hi @downtown12, I know it has been quite a few years since you posted this, but I have run into this issue and wanted to try your solution. Would you mind elaborating on how you parsed the document_part (which property?) using lxml and located the numbered list paragraphs? Furthermore, how did you create the abstract number nodes, and where and how did you insert them?

I've iterated through document.part._element and its children, but that didn't show anything insightful (no list number objects etc., even though the document has lists). It wasn't clear to me whether the python-docx docs explain how to do this.
