Commit 16611e1

Adding pandas viz sprint
1 parent 563103e commit 16611e1

1 file changed

+70
-0
lines changed

_posts/2019-06-05-pandas-viz.md

Lines changed: 70 additions & 0 deletions
@@ -0,0 +1,70 @@
1+
---
2+
category: "london"
3+
title: "pandas visualisation"
4+
level: "All"
5+
time: "18:00"
6+
rsvp_link: https://www.meetup.com/Python-Sprints/events/261730665/
7+
project:
8+
sponsor: jpmorgan
9+
---
10+
11+
Plotting in `pandas` is straightforward, mainly through the `Series.plot()` and `DataFrame.plot()` methods.
12+
There is a plotting subsystem in `pandas` based on `matplotlib` that implements different types of
13+
plots (e.g. lines, bars, boxplots, kde...).
14+
15+
In recent months, several new projects have been created to address new use cases for visualising
16+
in `pandas`, for example generating interactive plots. Some of these projects are:
17+
18+
- https://hvplot.pyviz.org
19+
- https://github.com/PatrikHlobil/Pandas-Bokeh
20+
- https://github.com/altair-viz/pdvega/
21+
22+
These libraries monkey-patch `pandas` to make plotting convenient, so plotting can be
23+
done by using `DataFrame.hvplot()`, `DataFrame.plot_bokeh()`...
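
To make the monkey-patching approach concrete, here is a minimal sketch of the general technique. It is not taken from any of the libraries above: `Frame` is a hypothetical stand-in for `pandas.DataFrame` (so the snippet runs without any dependencies), and `plot_interactive` is an invented method name.

```python
class Frame:
    """Hypothetical stand-in for pandas.DataFrame, used to keep the sketch self-contained."""

    def __init__(self, data):
        self.data = data


def plot_interactive(self, kind="line"):
    """Hypothetical plotting function a third-party library might provide."""
    # A real library would build and return a figure; here we only describe it.
    return f"interactive {kind} plot of {len(self.data)} columns"


# The monkey patch: attach the function to the class after it has been defined,
# so every instance gains a new method without the class knowing about it.
Frame.plot_interactive = plot_interactive

frame = Frame({"a": [1, 2, 3], "b": [4, 5, 6]})
description = frame.plot_interactive(kind="bar")
print(description)
```

The drawback this illustrates is that each library adds its own method name and signature, so user code is tied to one specific plugin.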
24+
25+
But a better design would be to decouple the existing plotting code in pandas into an extension
26+
registered with the `pandas` extension capabilities, and to be able to select with an option which
27+
plotting backend the user wants to use. The resulting code would be:
28+
29+
```python
pandas.set_option('plotting.backend', 'hvplot')
df.plot()
```
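
One way such an option could drive dispatch is sketched below. This is a toy model under stated assumptions, not the pandas implementation: `_BACKENDS`, `_OPTIONS`, `register_backend`, `set_option` and `Frame` are all hypothetical names, and the "backends" simply return a tuple instead of drawing anything.

```python
# Hypothetical registry mapping backend names to plotting functions.
_BACKENDS = {}
# Hypothetical option store; "matplotlib" is assumed to be the default backend.
_OPTIONS = {"plotting.backend": "matplotlib"}


def register_backend(name, plot_func):
    """Make a plotting function available under a backend name."""
    _BACKENDS[name] = plot_func


def set_option(key, value):
    """Set an option, rejecting plotting backends that were never registered."""
    if key == "plotting.backend" and value not in _BACKENDS:
        raise ValueError(f"unknown plotting backend: {value!r}")
    _OPTIONS[key] = value


class Frame:
    """Hypothetical stand-in for pandas.DataFrame."""

    def __init__(self, data):
        self.data = data

    def plot(self, kind="line", **kwargs):
        # Look up the backend at call time, so changing the option
        # affects existing objects as well as new ones.
        backend = _BACKENDS[_OPTIONS["plotting.backend"]]
        return backend(self.data, kind=kind, **kwargs)


# Toy backends: each one only reports its name and the requested plot kind.
register_backend("matplotlib", lambda data, kind, **kw: ("matplotlib", kind))
register_backend("hvplot", lambda data, kind, **kw: ("hvplot", kind))

frame = Frame({"a": [1, 2, 3]})
default_result = frame.plot()
set_option("plotting.backend", "hvplot")
switched_result = frame.plot(kind="bar")
print(default_result, switched_result)
```

The point of the design is that user code only ever calls `plot()`; switching libraries is a one-line change to the option rather than a change to every call site.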
33+
34+
This architecture has several advantages:
35+
36+
- Developing new plotting backends for pandas becomes much simpler
37+
- Plotting backends share a common API
38+
- Users of pandas don't need to learn a new syntax for each plugin
39+
- Migrating existing code becomes trivial: a single line of code setting the backend option is enough
40+
- Internal pandas code becomes cleaner, with the plotting code fully decoupled from the rest
41+
42+
Work on this is already under way, with a first PR that decoupled the current plotting code:
43+
44+
https://github.com/pandas-dev/pandas/pull/26414
45+
46+
In this sprint we will continue that work through the following tasks:
47+
- Adding the option to select the backend
48+
- Defining and documenting the pandas plotting API
49+
- Porting existing libraries to the new plotting API
50+
- Documenting current pandas functionality
51+
52+
As usual, priority to join the sprint will be given to the following people:
53+
54+
- Experienced open source contributors
55+
- People from groups underrepresented in our sprints
56+
57+
Agenda
58+
------
59+
60+
- 6pm: Food and networking
61+
- 6:30pm: Presentation of the project and the sponsor
62+
- 6:45pm: Coding
63+
- 9pm: Retrospective presentations
64+
65+
66+
The day of the sprint
67+
---------------------
68+
69+
- Bring your own laptop if you can
70+
- Join the [Gitter channel](https://gitter.im/py-sprints/pandas-bokeh) of the sprint
