Skip to content

PathSimplifier fails to ignore CLOSEPOLY vertices #17914

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
brunobeltran opened this issue Jul 13, 2020 · 3 comments · Fixed by #28478
Closed

PathSimplifier fails to ignore CLOSEPOLY vertices #17914

brunobeltran opened this issue Jul 13, 2020 · 3 comments · Fixed by #28478
Labels
Difficulty: Medium https://matplotlib.org/devdocs/devel/contribute.html#good-first-issues Good first issue Open a pull request against these issues if there are no active ones! status: confirmed bug topic: path handling
Milestone

Comments

@brunobeltran
Copy link
Contributor

Bug report

Bug summary

This bug does not seem to have surfaced anywhere yet, but Path's with NaN vertices at CLOSEPOLY can come through PathNanRemover in a "correct" state but then PathSimplifier can sometimes break it:

Code for reproduction/Actual Outcome

>>> p = Path([(0, 0), (1, 0), (1, 1), (np.nan, np.nan)],
         [Path.MOVETO, Path.LINETO, Path.LINETO, Path.CLOSEPOLY])
>>> # correctly ignores CLOSEPOLY even though it's NaN
>>> p.cleaned(remove_nans=True)
Path(array([[ 0.,  0.],
       [ 1.,  0.],
       [ 1.,  1.],
       [nan, nan],
       [ 0.,  0.]]), array([ 1,  2,  2, 79,  0], dtype=uint8))
>>> # but now for some reason these NaN's are used to populate LINETO's
>>> p.cleaned(remove_nans=True, simplify=True)
Path(array([[ 0.,  0.],
       [ 1.,  0.],
       [ 1.,  1.],
       [nan, nan],
       [nan, nan],
       [ 0.,  0.]]), array([1, 2, 2, 2, 2, 0], dtype=uint8))

Expected outcome

The values of the vertices in a CLOSEPOLY should always be ignored, in favor of the most recent MOVETO's vertex values:

>>> p.cleaned(remove_nans=True, simplify=True)
Path(array([[ 0.,  0.],
       [ 1.,  0.],
       [ 1.,  1.],
       [0, 0],
       [ 0.,  0.]]), array([1, 2, 2, 2, 0], dtype=uint8))

Matplotlib version

  • Operating system:
  • Matplotlib version:
  • Matplotlib backend (print(matplotlib.get_backend())):
  • Python version:
  • Jupyter version (if applicable):
  • Other libraries:
@github-actions
Copy link

github-actions bot commented Aug 9, 2023

This issue has been marked "inactive" because it has been 365 days since the last comment. If this issue is still present in recent Matplotlib releases, or the feature request is still wanted, please leave a comment and this label will be removed. If there are no updates in another 30 days, this issue will be automatically closed, but you are free to re-open or create a new issue if needed. We value issue reports, and this procedure is meant to help us resurface and prioritize issues that have not been addressed yet, not make them disappear. Thanks for your help!

@github-actions github-actions bot added the status: inactive Marked by the “Stale” Github Action label Aug 9, 2023
@tacaswell
Copy link
Member

I think the bug is in the path simplification dealing with nan points (sorry for the non-contiguous numbering, only copied the interesting lines 😉 ) :

In [4]: p = Path([(0, 0), (1, 0), (1, 1), (np.nan, np.nan)],
   ...:          [Path.MOVETO, Path.LINETO, Path.LINETO, Path.CLOSEPOLY])
In [9]: p.cleaned(simplify=True)
Out[9]:
Path(array([[ 0.,  0.],
       [ 1.,  0.],
       [ 1.,  1.],
       [nan, nan],
       [nan, nan],
       [ 0.,  0.]]), array([1, 2, 2, 2, 2, 0], dtype=uint8))
In [13]: p.cleaned(simplify=True).cleaned(simplify=True)
Out[13]:
Path(array([[ 0.,  0.],
       [ 1.,  0.],
       [ 1.,  1.],
       [nan, nan],
       [nan, nan],
       [nan, nan],
       [ 0.,  0.]]), array([1, 2, 2, 2, 2, 2, 0], dtype=uint8))

In [14]: p.cleaned(simplify=True).cleaned(simplify=True)
Out[14]:
Path(array([[ 0.,  0.],
       [ 1.,  0.],
       [ 1.,  1.],
       [nan, nan],
       [nan, nan],
       [nan, nan],
       [ 0.,  0.]]), array([1, 2, 2, 2, 2, 2, 0], dtype=uint8))

In [15]: p.cleaned(simplify=True).cleaned(simplify=True).cleaned(simplify=True)
Out[15]:
Path(array([[ 0.,  0.],
       [ 1.,  0.],
       [ 1.,  1.],
       [nan, nan],
       [nan, nan],
       [nan, nan],
       [nan, nan],
       [ 0.,  0.]]), array([1, 2, 2, 2, 2, 2, 2, 0], dtype=uint8))

at it seems to insert an extra LINETO nan everytime it is run!

It is not surprising to me that the cleaning steps are not abealian so I am not sure I am too worried that

In [17]: p.cleaned(simplify=True).cleaned(remove_nans=True)
Out[17]:
Path(array([[0., 0.],
       [1., 0.],
       [1., 1.],
       [0., 0.]]), array([1, 2, 2, 0], dtype=uint8))

In [18]: p.cleaned(remove_nans=True).cleaned(simplify=True)
Out[18]:
Path(array([[ 0.,  0.],
       [ 1.,  0.],
       [ 1.,  1.],
       [nan, nan],
       [nan, nan],
       [ 0.,  0.]]), array([1, 2, 2, 2, 2, 0], dtype=uint8))

are different and passing both keywords does the second:

matplotlib/src/_path.h

Lines 1033 to 1045 in f800bf6

typedef agg::conv_transform<py::PathIterator> transformed_path_t;
typedef PathNanRemover<transformed_path_t> nan_removal_t;
typedef PathClipper<nan_removal_t> clipped_t;
typedef PathSnapper<clipped_t> snapped_t;
typedef PathSimplifier<snapped_t> simplify_t;
typedef agg::conv_curve<simplify_t> curve_t;
typedef Sketch<curve_t> sketch_t;
transformed_path_t tpath(path, trans);
nan_removal_t nan_removed(tpath, remove_nans, path.has_codes());
clipped_t clipped(nan_removed, do_clip, rect);
snapped_t snapped(clipped, snap_mode, path.total_vertices(), stroke_width);
simplify_t simplified(snapped, do_simplify, path.simplify_threshold());

The implemenation of the simplification is at

unsigned vertex(double *x, double *y)
{
unsigned cmd;
/* The simplification algorithm doesn't support curves or compound paths
so we just don't do it at all in that case... */
if (!m_simplify) {
return m_source->vertex(x, y);
}
/* idea: we can skip drawing many lines: we can combine
sequential parallel lines into a
single line instead of redrawing lines over the same
points. The loop below works a bit like a state machine,
where what it does depends on what it did in the last
looping. To test whether sequential lines are close to
parallel, I calculate the distance moved perpendicular to
the last line. Once it gets too big, the lines cannot be
combined. */
/* This code was originally written by Allan Haldane and I
have modified to work in-place -- meaning not creating an
entirely new path list each time. In order to do that
without too much additional code complexity, it keeps a
small queue around so that multiple points can be emitted
in a single call, and those points will be popped from the
queue in subsequent calls. The following block will empty
the queue before proceeding to the main loop below.
-- Michael Droettboom */
/* This code was originally written by Allan Haldane and
updated by Michael Droettboom. I have modified it to
handle anti-parallel vectors. This is done essentially
the same way as parallel vectors, but requires a little
additional book-keeping to track whether or not we have
observed an anti-parallel vector during the current run.
-- Kevin Rose */
if (queue_pop(&cmd, x, y)) {
return cmd;
}
/* The main simplification loop. The point is to consume only
as many points as necessary until something has been added
to the outbound queue, not to run through the entire path
in one go. This eliminates the need to allocate and fill
an entire additional path array on each draw. */
while ((cmd = m_source->vertex(x, y)) != agg::path_cmd_stop) {
/* if we are starting a new path segment, move to the first point
+ init */
if (m_moveto || cmd == agg::path_cmd_move_to) {
/* m_moveto check is not generally needed because
m_source generates an initial moveto; but it is
retained for safety in case circumstances arise
where this is not true. */
if (m_origdNorm2 != 0.0 && !m_after_moveto) {
/* m_origdNorm2 is nonzero only if we have a
vector; the m_after_moveto check ensures we
push this vector to the queue only once. */
_push(x, y);
}
m_after_moveto = true;
m_lastx = *x;
m_lasty = *y;
m_moveto = false;
m_origdNorm2 = 0.0;
m_dnorm2BackwardMax = 0.0;
m_clipped = true;
if (queue_nonempty()) {
/* If we did a push, empty the queue now. */
break;
}
continue;
}
m_after_moveto = false;
/* NOTE: We used to skip this very short segments, but if
you have a lot of them cumulatively, you can miss
maxima or minima in the data. */
/* Don't render line segments less than one pixel long */
/* if (fabs(*x - m_lastx) < 1.0 && fabs(*y - m_lasty) < 1.0) */
/* { */
/* continue; */
/* } */
/* if we have no orig vector, set it to this vector and
continue. this orig vector is the reference vector we
will build up the line to */
if (m_origdNorm2 == 0.0) {
if (m_clipped) {
queue_push(agg::path_cmd_move_to, m_lastx, m_lasty);
m_clipped = false;
}
m_origdx = *x - m_lastx;
m_origdy = *y - m_lasty;
m_origdNorm2 = m_origdx * m_origdx + m_origdy * m_origdy;
// set all the variables to reflect this new orig vector
m_dnorm2ForwardMax = m_origdNorm2;
m_dnorm2BackwardMax = 0.0;
m_lastForwardMax = true;
m_lastBackwardMax = false;
m_currVecStartX = m_lastx;
m_currVecStartY = m_lasty;
m_nextX = m_lastx = *x;
m_nextY = m_lasty = *y;
continue;
}
/* If got to here, then we have an orig vector and we just got
a vector in the sequence. */
/* Check that the perpendicular distance we have moved
from the last written point compared to the line we are
building is not too much. If o is the orig vector (we
are building on), and v is the vector from the last
written point to the current point, then the
perpendicular vector is p = v - (o.v)o/(o.o)
(here, a.b indicates the dot product of a and b). */
/* get the v vector */
double totdx = *x - m_currVecStartX;
double totdy = *y - m_currVecStartY;
/* get the dot product o.v */
double totdot = m_origdx * totdx + m_origdy * totdy;
/* get the para vector ( = (o.v)o/(o.o)) */
double paradx = totdot * m_origdx / m_origdNorm2;
double parady = totdot * m_origdy / m_origdNorm2;
/* get the perp vector ( = v - para) */
double perpdx = totdx - paradx;
double perpdy = totdy - parady;
/* get the squared norm of perp vector ( = p.p) */
double perpdNorm2 = perpdx * perpdx + perpdy * perpdy;
/* If the perpendicular vector is less than
m_simplify_threshold pixels in size, then merge
current x,y with the current vector */
if (perpdNorm2 < m_simplify_threshold) {
/* check if the current vector is parallel or
anti-parallel to the orig vector. In either case,
test if it is the longest of the vectors
we are merging in that direction. If it is, then
update the current vector in that direction. */
double paradNorm2 = paradx * paradx + parady * parady;
m_lastForwardMax = false;
m_lastBackwardMax = false;
if (totdot > 0.0) {
if (paradNorm2 > m_dnorm2ForwardMax) {
m_lastForwardMax = true;
m_dnorm2ForwardMax = paradNorm2;
m_nextX = *x;
m_nextY = *y;
}
} else {
if (paradNorm2 > m_dnorm2BackwardMax) {
m_lastBackwardMax = true;
m_dnorm2BackwardMax = paradNorm2;
m_nextBackwardX = *x;
m_nextBackwardY = *y;
}
}
m_lastx = *x;
m_lasty = *y;
continue;
}
/* If we get here, then this vector was not similar enough to the
line we are building, so we need to draw that line and start the
next one. */
/* If the line needs to extend in the opposite direction from the
direction we are drawing in, move back to we start drawing from
back there. */
_push(x, y);
break;
}
/* Fill the queue with the remaining vertices if we've finished the
path in the above loop. */
if (cmd == agg::path_cmd_stop) {
if (m_origdNorm2 != 0.0) {
queue_push((m_moveto || m_after_moveto) ? agg::path_cmd_move_to
: agg::path_cmd_line_to,
m_nextX,
m_nextY);
if (m_dnorm2BackwardMax > 0.0) {
queue_push((m_moveto || m_after_moveto) ? agg::path_cmd_move_to
: agg::path_cmd_line_to,
m_nextBackwardX,
m_nextBackwardY);
}
m_moveto = false;
}
queue_push((m_moveto || m_after_moveto) ? agg::path_cmd_move_to : agg::path_cmd_line_to,
m_lastx,
m_lasty);
m_moveto = false;
queue_push(agg::path_cmd_stop, 0.0, 0.0);
}
/* Return the first item in the queue, if any, otherwise
indicate that we're done. */
if (queue_pop(&cmd, x, y)) {
return cmd;
} else {
return agg::path_cmd_stop;
}
}

I suspect that the fix is to add a few lines of special-casing nan because nan is equal to nothing and all math involving nans results in nan so the "is the vector going the same direction enough" logic fails.

I'm marking this as a good first issue (there is no API design and a pretty clear tast case), but has medium difficulty.

To take this issue on you should know both Python and C++ pretty well (including templates, classes, and understanding the vertex iterator protocol). Review on this is likely to be slow.

@tacaswell tacaswell added this to the v3.9.0 milestone Aug 9, 2023
@tacaswell tacaswell added status: confirmed bug Difficulty: Medium https://matplotlib.org/devdocs/devel/contribute.html#good-first-issues Good first issue Open a pull request against these issues if there are no active ones! labels Aug 9, 2023
@github-actions
Copy link

github-actions bot commented Aug 9, 2023

Good first issue - notes for new contributors

This issue is suited to new contributors because it does not require understanding of the Matplotlib internals. To get started, please see our contributing guide.

We do not assign issues. Check the Development section in the sidebar for linked pull requests (PRs). If there are none, feel free to start working on it. If there is an open PR, please collaborate on the work by reviewing it rather than duplicating it in a competing PR.

If something is unclear, please reach out on any of our communication channels.

@github-actions github-actions bot removed the status: inactive Marked by the “Stale” Github Action label Aug 11, 2023
@tacaswell tacaswell modified the milestones: v3.9.0, v3.10.0 Mar 13, 2024
@r3kste r3kste mentioned this issue Jun 27, 2024
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Difficulty: Medium https://matplotlib.org/devdocs/devel/contribute.html#good-first-issues Good first issue Open a pull request against these issues if there are no active ones! status: confirmed bug topic: path handling
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants