diff --git a/config.toml b/config.toml index 8883ce1..aabfbc4 100644 --- a/config.toml +++ b/config.toml @@ -3,7 +3,7 @@ languageCode = "en-us" title = "Matplotblog" theme = "aether" canonifyurls = true -paginate = 3 +paginate = 4 [params] head_img = "/mpl_logo.png" @@ -14,3 +14,6 @@ link1_description = "About" [taxonomies] category = "categories" + +[markup.goldmark.renderer] + unsafe = true diff --git a/content/posts/GSoC_2020_Final_Work_Product/index.md b/content/posts/GSoC_2020_Final_Work_Product/index.md new file mode 100644 index 0000000..7922e58 --- /dev/null +++ b/content/posts/GSoC_2020_Final_Work_Product/index.md @@ -0,0 +1,55 @@ +--- +title: "GSoC 2020 Work Product - Baseline Images Problem" +date: 2020-08-16T09:47:51+05:30 +draft: false +categories: ["News", "GSoC"] +description: "Final Work Product Report for the Google Summer of Code 2020 for the Baseline Images Problem" +displayInList: true +author: Sidharth Bansal +--- + +Google Summer of Code 2020 is completed. Hurray!! This post discusses about the progress so far in the three months of the coding period from 1 June to 24 August 2020 regarding the project `Baseline Images Problem` under `matplotlib` organisation under the umbrella of `NumFOCUS` organization. + +## Project Details: + +This project helps with the difficulty in adding/modifying tests which require a baseline image. Baseline images are problematic because +- Baseline images cause the repo size to grow rather quickly. +- Baseline images force matplotlib contributors to pin to a somewhat old version of FreeType because nearly every release of FreeType causes tiny rasterization changes that would entail regenerating all baseline images (and thus cause even more repo size growth). + +So, the idea is to not store the baseline images in the repository, instead to create them from the existing tests. + +## Creation of the matplotlib_baseline_images package + +We had created the `matplotlib_baseline_images` package. 
This package is placed in the sub-wheels directory so that more packages can be added to the same directory if needed in the future. The `matplotlib_baseline_images` package contains baseline images for both `matplotlib` and `mpl_toolkits`. +The package can be installed with `python3 -mpip install matplotlib_baseline_images`. + +## Creation of the matplotlib baseline image generation flag + +We successfully created the `generate_missing` command line flag for baseline image generation for `matplotlib` and `mpl_toolkits` in the previous months. Initially, it generated the `matplotlib` and the `mpl_toolkits` baseline images. We have now also modified the existing flow to generate any missing baseline images, for example those that come in from the `master` branch on doing `git pull` or `git checkout -b feature_branch`. + +Now, image generation at the time of a fresh install of matplotlib and the generation of missing baseline images both work with `python3 -m pytest lib/matplotlib matplotlib_baseline_image_generation` for the `lib/matplotlib` folder and `python3 -m pytest lib/mpl_toolkits matplotlib_baseline_image_generation` for the `lib/mpl_toolkits` folder. + +## Documentation + +We have written documentation explaining the following scenarios: +1. How to generate the baseline images on a fresh install of matplotlib? +2. How to generate the missing baseline images on fetching changes from master? +3. How to install the `matplotlib_baseline_images_package` to be used for testing by the developer? +4. How to intentionally change an image? + +## Links to the work done + +- [Issue](https://github.com/matplotlib/matplotlib/issues/16447) +- [Pull Request](https://github.com/matplotlib/matplotlib/pull/17793) +- [Blog Posts](https://matplotlib.org/matplotblog/categories/gsoc/) + +## Mentors + +- Thomas A Caswell +- Hannah +- Antony Lee + +I am grateful to be part of such a great community. 
Project is really interesting and challenging :) + +Thanks Thomas, Antony and Hannah for helping me to complete this project. + diff --git a/content/posts/GSoC_2021_Final/AitikGupta_GSoC.png b/content/posts/GSoC_2021_Final/AitikGupta_GSoC.png new file mode 100644 index 0000000..e769799 Binary files /dev/null and b/content/posts/GSoC_2021_Final/AitikGupta_GSoC.png differ diff --git a/content/posts/GSoC_2021_Final/index.md b/content/posts/GSoC_2021_Final/index.md new file mode 100644 index 0000000..c956f6e --- /dev/null +++ b/content/posts/GSoC_2021_Final/index.md @@ -0,0 +1,169 @@ +--- +title: "GSoC'21: Final Report" +date: 2021-08-17T17:36:40+05:30 +draft: false +categories: ["News", "GSoC"] +description: "Google Summer of Code 2021: Final Report - Aitik Gupta" +displayInList: true +author: Aitik Gupta + +resources: +- name: featuredImage + src: "AitikGupta_GSoC.png" + params: + showOnTop: true +--- + +**Matplotlib: Revisiting Text/Font Handling** + +To kick things off for the final report, here's a [meme](https://user-images.githubusercontent.com/43996118/129448683-bc136398-afeb-40ac-bbb7-0576757baf3c.jpg) to nudge about the [previous blogs](https://matplotlib.org/matplotblog/categories/gsoc/). +## About Matplotlib +Matplotlib is a comprehensive library for creating static, animated, and interactive visualizations, which has become a _de-facto Python plotting library_. + +Much of the implementation behind its font manager is inspired by [W3C](https://www.w3.org/) compliant algorithms, allowing users to interact with font properties like `font-size`, `font-weight`, `font-family`, etc. + +#### However, the way Matplotlib handled fonts and general text layout was not ideal, which is what Summer 2021 was all about. + +> By "not ideal", I do not mean that the library has design flaws, but that the design was engineered in the early 2000s, and is now _outdated_. 
+ +(..more on this later) + +### About the Project +(PS: here's [the link](https://docs.google.com/document/d/11PrXKjMHhl0rcQB4p_W9JY_AbPCkYuoTT0t85937nB0/view#heading=h.feg5pv3x59u2) to my GSoC proposal, if you're interested) + +Overall, the project was divided into two major subgoals: +1. Font Subsetting +2. Font Fallback + +But before we take each of them on, we should get an idea about some basic terminology for fonts (which are a _lot_, and are rightly _confusing_) + +The [PR: Clarify/Improve docs on family-names vs generic-families](https://github.com/matplotlib/matplotlib/pull/20346/files) brings about a bit of clarity about some of these terms. The next section has a linked PR which also explains the types of fonts and how that is relevant to Matplotlib. +## Font Subsetting +An easy-to-read guide on Fonts and Matplotlib was created with [PR: [Doc] Font Types and Font Subsetting](https://github.com/matplotlib/matplotlib/pull/20450), which is currently live at [Matplotlib's DevDocs](https://matplotlib.org/devdocs/users/fonts.html). + +Taking an excerpt from one of my previous blogs (and [the doc](https://matplotlib.org/devdocs/users/fonts.html#subsetting)): + +> Fonts can be considered as a collection of these glyphs, so ultimately the goal of subsetting is to find out which glyphs are required for a certain array of characters, and embed only those within the output. + +PDF, PS/EPS and SVG output document formats are special, as in **the text within them can be editable**, i.e, one can copy/search text from documents (for eg, from a PDF file) if the text is editable. + +### Matplotlib and Subsetting +The PDF, PS/EPS and SVG backends used to support font subsetting, _only for a few types_. What that means is, before Summer '21, Matplotlib could generate Type 3 subsets for PDF, PS/EPS backends, but it *could not* generate Type 42 / TrueType subsets. 
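To make the idea of subsetting concrete, here is a toy sketch of what a subsetter has to work out. It is only an illustration, not how Matplotlib or fontTools does it: the font is modelled as two hypothetical dictionaries (`TOY_CMAP` and `TOY_GLYPHS`), and `subset_glyphs` is a name made up for this example.

```python
# Toy model of a font (hypothetical data, real fonts store far more):
# a character map (character -> glyph name) and a glyph table
# (glyph name -> outline data).
TOY_CMAP = {"a": "a", "b": "b", "c": "c", "é": "eacute"}
TOY_GLYPHS = {
    ".notdef": "<outline of the fallback box>",
    "a": "<outline of a>",
    "b": "<outline of b>",
    "c": "<outline of c>",
    "eacute": "<outline of é>",
}


def subset_glyphs(text, cmap, glyphs):
    """Return only the glyphs needed to draw `text`."""
    needed = {".notdef"}  # always kept; drawn for unmapped characters
    for char in text:
        name = cmap.get(char)
        if name is not None:
            needed.add(name)
    return {name: glyphs[name] for name in sorted(needed)}


subset = subset_glyphs("cab", TOY_CMAP, TOY_GLYPHS)
print(sorted(subset))
```

Only the glyphs for `a`, `b` and `c` (plus the fallback box) survive; the unused `eacute` outline is left out, which is the whole point of embedding a subset instead of the full font.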
+ +With [PR: Type42 subsetting in PS/PDF](https://github.com/matplotlib/matplotlib/pull/20391) merged in, users can expect their PDF/PS/EPS documents to contain subsetted glyphs from the original fonts. + +This is especially beneficial for people who wish to use commercial (or [CJK](https://en.wikipedia.org/wiki/CJK_characters)) fonts. Licenses for many fonts ***require*** subsetting such that they can’t be trivially copied from the output files generated from Matplotlib. + +## Font Fallback +Matplotlib was designed to work with a single font at runtime. A user _could_ specify a `font.family`, which was supposed to correspond to [CSS](https://www.w3schools.com/cssref/pr_font_font-family.asp) properties, but that was only used to find a _single_ font present on the user's system. + +Once that font was found (and one almost always is, since Matplotlib ships with a set of default fonts), all the user text was rendered only through that font, which used to give out "tofu" whenever a character wasn't found. + +--- + +It might seem like an _outdated_ approach for text rendering, now that we have concepts like font-fallback, but these concepts weren't very well discussed in the early 2000s. Even getting a single font to work _was considered a hard engineering problem_. + +This was primarily because of the lack of **any standardization** for representation of fonts (Adobe had their own font representation, and so did Apple, Microsoft, etc.) + + +| ![Previous](https://user-images.githubusercontent.com/43996118/128605750-9d76fa4a-ce57-45c6-af23-761334d48ef7.png) | ![After](https://user-images.githubusercontent.com/43996118/128605746-9f79ebeb-c03d-407e-9e27-c3203a210908.png) | +|--------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------| +

+ Previous (notice Tofus) VS After (CJK font as fallback) +

+ +To migrate from a font-first approach to a text-first approach, there are multiple steps involved: + +### Parsing the whole font family +The very first (and crucial!) step is to get to a point where we have multiple font paths (ideally individual font files for the whole family). That is achieved with either: +- [PR: [with findfont diff] Parsing all families in font_manager](https://github.com/matplotlib/matplotlib/pull/20496), or +- [PR: [without findfont diff] Parsing all families in font_manager](https://github.com/matplotlib/matplotlib/pull/20549) + +Quoting one of my [previous](https://matplotlib.org/matplotblog/posts/gsoc_2021_prequarter/) blogs: +> Don’t break, a lot at stake! + +My first approach was to change the existing public `findfont` API to incorporate multiple filepaths. Since Matplotlib has a _very huge_ userbase, there's a high chance it would break a chunk of people's workflow: + +

+ FamilyParsingFlowChart + First PR (left), Second PR (right) +

+ +### FT2Font Overhaul +Once we get a list of font paths, we need to change the internal representation of a "font". Matplotlib has a utility called FT2Font, which is written in C++, and used with wrappers as a Python extension, which in turn is used throughout the backends. For all intents and purposes, it used to mean: ```FT2Font === SingleFont``` (if you're interested, here's a [meme](https://user-images.githubusercontent.com/43996118/128352387-76a3f52a-20fc-4853-b624-0c91844fc785.png) about how FT2Font was named!) + +But that is not the case anymore, here's a flowchart to explain what happens now: +

+ FamilyParsingFlowChart + Font-Fallback Algorithm +

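The fallback flow can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the real implementation lives in the C++ `FT2Font` class, whereas `ToyFont`, its `glyphs` set, `char_cache` and `find_font_for` are hypothetical names invented here, and a plain set of characters stands in for a real font's character map. The parent font keeps a cache from each character to the font that can draw it, and walks its fallback list only on a cache miss.

```python
class ToyFont:
    """Stand-in for a font: a name plus the set of characters it can draw."""

    def __init__(self, name, glyphs, fallbacks=()):
        self.name = name
        self.glyphs = set(glyphs)
        self.fallbacks = list(fallbacks)  # tried in order after this font
        self.char_cache = {}  # character -> font that draws it

    def find_font_for(self, char):
        """Return the first font in the chain that has `char`, else None."""
        if char in self.char_cache:
            return self.char_cache[char]
        for font in [self, *self.fallbacks]:
            if char in font.glyphs:
                self.char_cache[char] = font  # fill the parent's cache
                return font
        return None  # nothing in the chain has it: this would be "tofu"


cjk = ToyFont("ToyCJK", "多个汉字")
latin = ToyFont("ToyLatin", "abcdefghijklmnopqrstuvwxyz ", fallbacks=[cjk])

for char in "a多?":
    font = latin.find_font_for(char)
    print(repr(char), "->", font.name if font else "tofu")
```

Callers only ever talk to the parent (`latin` here), which mirrors the point that backends keep using a single font object while the lookup quietly picks the right font per character.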
+ +With [PR: Implement Font-Fallback in Matplotlib](https://github.com/matplotlib/matplotlib/pull/20740), every FT2Font object has a `std::vector fallback_list`, which is used for filling the parent cache, as can be seen in the self-explanatory flowchart. + +For simplicity, only one type of cache (character -> FT2Font) is shown, whereas in actual implementation there's 2 types of caches, one shown above, and another for glyphs (glyph_id -> FT2Font). + +> Note: Only the parent's APIs are used in some backends, so for each of the individual public functions like `load_glyph`, `load_char`, `get_kerning`, etc., we find the FT2Font object which has that glyph from the parent FT2Font cache! + +### Multi-Font embedding in PDF/PS/EPS +Now that we have multiple fonts to render a string, we also need to embed them for those special backends (i.e., PDF/PS, etc.). This was done with some patches to specific backends: +- [PR: Implement multi-font embedding for PDF Backend](https://github.com/matplotlib/matplotlib/pull/20804) +- [PR: Implement multi-font embedding for PS Backend](https://github.com/matplotlib/matplotlib/pull/20832) + +With this, one could create a PDF or a PS/EPS document with multiple fonts which are embedded (and subsetted!). + +## Conclusion +From small contributions to eventually working on a core module of such a huge library, the road was not what I had imagined, and I learnt a lot while designing solutions to these problems. + +#### The work I did would eventually end up affecting every single Matplotlib user. +...since all plots will work their way through the new codepath! + +I think that single statement is worth the whole GSoC project. 
+ +### Pull Request Statistics +For the sake of statistics (and to make GSoC sound a bit less intimidating), here's a list of contributions I made to Matplotlib before Summer '21, most of which are only a few lines of diff: + +| Created At | PR Title | Diff | Status | +|:------------: |------------------------------------------------------------------------------------------------------------------------- |:---------------: |:------: | +| Nov 2, 2020 | [Expand ScalarMappable.set_array to accept array-like inputs](https://github.com/matplotlib/matplotlib/pull/18870) | (+28 −4) | MERGED | +| Nov 8, 2020 | [Add overset and underset support for mathtext](https://github.com/matplotlib/matplotlib/pull/18916) | (+71 −0) | MERGED | +| Nov 14, 2020 | [Strictly increasing check with test coverage for streamplot grid](https://github.com/matplotlib/matplotlib/pull/18947) | (+54 −2) | MERGED | +| Jan 11, 2021 | [WIP: Add support to edit subplot configurations via textbox](https://github.com/matplotlib/matplotlib/pull/19271) | (+51 −11) | DRAFT | +| Jan 18, 2021 | [Fix over/under mathtext symbols](https://github.com/matplotlib/matplotlib/pull/19314) | (+7,459 −4,169) | MERGED | +| Feb 11, 2021 | [Add overset/underset whatsnew entry](https://github.com/matplotlib/matplotlib/pull/19497) | (+28 −17) | MERGED | +| May 15, 2021 | [Warn user when mathtext font is used for ticks](https://github.com/matplotlib/matplotlib/pull/20235) | (+28 −0) | MERGED | + +Here's a list of PRs I opened during Summer'21: +- [Status: ✅] [Clarify/Improve docs on family-names vs generic-families](https://github.com/matplotlib/matplotlib/pull/20346) +- [Status: ✅] [Add parse_math in Text and default it False for TextBox](https://github.com/matplotlib/matplotlib/pull/20367) +- [Status: ✅] [Type42 subsetting in PS/PDF](https://github.com/matplotlib/matplotlib/pull/20391) +- [Status: ✅] [[Doc] Font Types and Font Subsetting](https://github.com/matplotlib/matplotlib/pull/20450) +- [Status: 🚧] [[with findfont 
diff] Parsing all families in font_manager](https://github.com/matplotlib/matplotlib/pull/20496) +- [Status: 🚧] [[without findfont diff] Parsing all families in font_manager](https://github.com/matplotlib/matplotlib/pull/20549) +- [Status: 🚧] [Implement Font-Fallback in Matplotlib](https://github.com/matplotlib/matplotlib/pull/20740) +- [Status: 🚧] [Implement multi-font embedding for PDF Backend](https://github.com/matplotlib/matplotlib/pull/20804) +- [Status: 🚧] [Implement multi-font embedding for PS Backend](https://github.com/matplotlib/matplotlib/pull/20832) + + +## Acknowledgements +From learning about software engineering fundamentals from [Tom](https://github.com/tacaswell) to learning about nitty-gritty details about font representations from [Jouni](https://github.com/jkseppan); + +From learning through [Antony](https://github.com/anntzer)'s patches and pointers to receiving amazing feedback on these blogs from [Hannah](https://github.com/story645), it has been an adventure! 💯 + +_Special Mentions: [Frank](https://github.com/sauerburger), [Srijan](https://github.com/srijan-paul) and [Atharva](https://github.com/tfidfwastaken) for their helping hands!_ + +And lastly, _you_, the reader; if you've been following my [previous blogs](https://matplotlib.org/matplotblog/categories/gsoc/), or if you've landed at this one directly, I thank you nevertheless. (one last [meme](https://user-images.githubusercontent.com/43996118/126441988-5a2067fd-055e-44e5-86e9-4dddf47abc9d.png), I promise!) + +I know I speak for every developer out there, when I say ***it means a lot*** when you choose to look at their journey or their work product; it could as well be a tiny website, or it could be as big as designing a complete library! + +
+ +> I'm grateful to [Matplotlib](https://matplotlib.org/) (under the parent organisation: [NumFOCUS](https://numfocus.org/)), and of course, [Google Summer of Code](https://summerofcode.withgoogle.com/) for this incredible learning opportunity. + +Farewell, reader! :') + +

+ MatplotlibGSoC + Consider contributing to Matplotlib (Open Source in general) ❤️ +

+ +#### NOTE: This blog post is also available at my [personal website](https://aitikgupta.github.io/gsoc-final/). diff --git a/content/posts/GSoC_2021_Introduction/AitikGupta_GSoC.png b/content/posts/GSoC_2021_Introduction/AitikGupta_GSoC.png new file mode 100644 index 0000000..e769799 Binary files /dev/null and b/content/posts/GSoC_2021_Introduction/AitikGupta_GSoC.png differ diff --git a/content/posts/GSoC_2021_Introduction/index.md b/content/posts/GSoC_2021_Introduction/index.md new file mode 100644 index 0000000..dcc586d --- /dev/null +++ b/content/posts/GSoC_2021_Introduction/index.md @@ -0,0 +1,92 @@ +--- +title: "Aitik Gupta joins as a Student Developer under GSoC'21" +date: 2021-05-19T20:03:57+05:30 +draft: false +categories: ["News", "GSoC"] +description: "Introduction about Aitik Gupta, Google Summer of Code 2021 Intern under the parent organisation: NumFOCUS" +displayInList: true +author: Aitik Gupta + +resources: +- name: featuredImage + src: "AitikGupta_GSoC.png" + params: + showOnTop: true +--- + +**The day of result, was a very, very long day.** + +With this small writeup, I intend to talk about everything before _that day_, my experiences, my journey, and the role of Matplotlib throughout! + +## About Me +I am a third-year undergraduate student currently pursuing a Dual Degree (B.Tech + M.Tech) in Information Technology at Indian Institute of Information Technology, Gwalior. + +During my sophomore year, my interests started expanding in the domain of Machine Learning, where I learnt about various amazing open-source libraries like *NumPy*, *SciPy*, *pandas*, and *Matplotlib*! Gradually, in my third year, I explored the field of Computer Vision during my internship at a startup, where a big chunk of my work was to integrate their native C++ codebase to Android via JNI calls. + +To actuate my learnings from the internship, I worked upon my own research along with a [friend from my university](https://linkedin.com/in/aaditagarwal). 
The paper was accepted in CoDS-COMAD’21 and is published at ACM Digital Library. ([Link](https://dl.acm.org/doi/abs/10.1145/3430984.3430986), if anyone's interested) + +During this period, I also picked up the knack for open-source and started glaring at various issues (and pull requests) in libraries, including OpenCV [[contributions](https://github.com/opencv/opencv/issues?q=author%3Aaitikgupta+)] and NumPy [[contributions](https://github.com/numpy/numpy/issues?q=author%3Aaitikgupta+)]. + +I quickly got involved in Matplotlib’s community; it was very welcoming and beginner-friendly. + +**Fun fact: Its dev call was the very first I attended with people from all around the world!** + +## First Contributions +We all mess up, my [very first PR](https://github.com/opencv/opencv/pull/18440) to an organisation like OpenCV went horrible, till date, it looks like this: +![OpenCV_PR](https://user-images.githubusercontent.com/43996118/118848259-35d6e300-b8ec-11eb-8cdc-387e9f5a37a3.png) + +In all honesty, I added a single commit with only a few lines of diff. +> However, I pulled all the changes from upstream `master` to my working branch, whereas the PR was to be made on `3.4` branch. + +I'm sure I could've done tons of things to solve it, but at that time I couldn't do anything - imagine the anxiety! + +At this point when I look back at those fumbled PRs, I feel like they were important for my learning process. + +**Fun Fact: Because of one of these initial contributions, I got a shiny little badge [[Mars 2020 Helicopter Contributor](https://github.com/readme/nasa-ingenuity-helicopter)] on GitHub!** + + + + +## Getting started with Matplotlib +It was around initial weeks of November last year, I was scanning through `Good First Issue` and `New Feature` labels, I realised a pattern - most Mathtext related issues were unattended. 
+ +To make it simple, Mathtext is a part of Matplotlib which parses mathematical expressions and provides TeX-like outputs, for example: + + +I scanned the related source code to try to figure out how to solve those Mathtext issues. Eventually, with the help of maintainers reviewing the PRs and a lot of verbose discussions on GitHub issues/pull requests and on the [Gitter](https://gitter.im/matplotlib/matplotlib) channel, I was able to get my initial PRs merged! + +## Learning throughout the process +Most of us use libraries without understanding their underlying structure, which sometimes can cause downstream bugs! + +While I was studying Matplotlib's architecture, I figured that I could use the same ideology for one of my [own projects](https://aitikgupta.github.io/swi-ml/)! + +Matplotlib uses a global dictionary-like object named `rcParams`; I used a smaller interface, similar to rcParams, in [swi-ml](https://pypi.org/project/swi-ml/) - a small Python library I wrote, implementing a subset of ML algorithms, with a switchable backend. + + +## Where does GSoC fit? +Around January, I had a conversation with one of the maintainers (hey [Antony](https://github.com/anntzer)!) about the long list of issues with the current ways of handling texts/fonts in the library. + +After compiling them into an order, and after a few tweaks from maintainers, the [GSoC Idea-List](https://github.com/matplotlib/matplotlib/wiki/GSOC-2021-ideas) for Matplotlib was born. And so began my journey of building a strong proposal! 
+ +## About the Project +#### Proposal Link: [Google Docs](https://docs.google.com/document/d/11PrXKjMHhl0rcQB4p_W9JY_AbPCkYuoTT0t85937nB0/edit?usp=sharing) (will stay alive after GSoC), [GSoC Website](https://storage.googleapis.com/summerofcode-prod.appspot.com/gsoc/core_project/doc/6319153410998272_1617936740_GSoC_Proposal_-_Matplotlib.pdf?Expires=1621539234&GoogleAccessId=summerofcode-prod%40appspot.gserviceaccount.com&Signature=QU8uSdPnXpa%2FooDtzVnzclz809LHjh9eU7Y7iR%2FH1NM32CBgzBO4%2FFbMeDmMsoic91B%2BKrPZEljzGt%2Fx9jtQeCR9X4O53JJLPVjw9Bg%2Fzb2YKjGzDk0oFMRPXjg9ct%2BV58PD6f4De1ucqARLtHGjis5jhK1W08LNiHAo88NB6BaL8Q5hqcTBgunLytTNBJh5lW2kD8eR2WeENnW9HdIe53aCdyxJkYpkgILJRoNLCvp111AJGC3RLYba9VKeU6w2CdrumPfRP45FX6fJlrKnClvxyf5VHo3uIjA3fGNWIQKwGgcd1ocGuFN3YnDTS4xkX3uiNplwTM4aGLQNhtrMqA%3D%3D) (not so sure) + +### Revisiting Text/Font Handling +The aim of the project is divided into 3 subgoals: + +1. **Font-Fallback**: A redesigned text-first font interface - essentially parsing all family before rendering a "tofu". + + *(similar to specifying font-family in CSS!)* +2. **Font Subsetting**: Every exported PS/PDF would contain embedded glyphs subsetted from the whole font. + + *(imagine a plot with just a single letter "a", would you like it if the PDF you exported from Matplotlib to embed the whole font file within it?)* + +3. Most mpl backends would use the unified TeX exporting mechanism + +**Mentors** [Thomas A Caswell](https://github.com/tacaswell), [Antony Lee](https://github.com/anntzer), [Hannah](https://github.com/story645). + +Thanks a lot for spending time reading the blog! I'll be back with my progress in subsequent posts. + + +##### NOTE: This blog post is also available at my [personal website](https://aitikgupta.github.io/gsoc-intro/)! 
+ diff --git a/content/posts/GSoC_2021_MidTerm/AitikGupta_GSoC.png b/content/posts/GSoC_2021_MidTerm/AitikGupta_GSoC.png new file mode 100644 index 0000000..e769799 Binary files /dev/null and b/content/posts/GSoC_2021_MidTerm/AitikGupta_GSoC.png differ diff --git a/content/posts/GSoC_2021_MidTerm/index.md b/content/posts/GSoC_2021_MidTerm/index.md new file mode 100644 index 0000000..dece87c --- /dev/null +++ b/content/posts/GSoC_2021_MidTerm/index.md @@ -0,0 +1,88 @@ +--- +title: "GSoC'21: Mid-Term Progress" +date: 2021-07-02T08:32:05+05:30 +draft: false +categories: ["News", "GSoC"] +description: "Mid-Term Progress with Google Summer of Code 2021 project under NumFOCUS: Aitik Gupta" +displayInList: true +author: Aitik Gupta + +resources: +- name: featuredImage + src: "AitikGupta_GSoC.png" + params: + showOnTop: true +--- + +**"Aitik, how is your GSoC going?"** + +Well, it's been a while since I last wrote. But I wasn't spending time watching _Loki_ either! (that's a lie.) + +During this period the project took on some interesting (and stressful) curves, which I intend to talk about in this small writeup. +## New Mentor! +The first week of coding period, and I met one of my new mentors, [Jouni](https://github.com/jkseppan). Without him, along with [Tom](https://github.com/tacaswell) and [Antony](https://github.com/anntzer), the project wouldn't have moved _an inch_. + +It was initially Jouni's [PR](https://github.com/matplotlib/matplotlib/pull/18143) which was my starting point of the first milestone in my proposal, Font Subsetting. + +## What is Font Subsetting anyway? +As was proposed by Tom, a good way to understand something is to document your journey along the way! (well, that's what GSoC wants us to follow anyway right?) 
+ +Taking an excerpt from one of the paragraphs I wrote [here](https://github.com/matplotlib/matplotlib/blob/a94f52121cea4194a5d6f6fc94eafdfb03394628/doc/users/fonts.rst#subsetting): +> Font Subsetting can be used before generating documents, to embed only the _required_ glyphs within the documents. Fonts can be considered as a collection of these glyphs, so ultimately the goal of subsetting is to find out which glyphs are required for a certain array of characters, and embed only those within the output. + +Now this may seem straightforward, right? +#### Wrong. +The glyph programs can call their own subprograms, for example, characters like `ä` could be composed by calling subprograms for `a` and `¨`; or `→` could be composed by a program that changes the display matrix and calls the subprogram for `←`. + +Since the subsetter has to find out _all such subprograms_ being called by _every glyph_ included in the subset, this is a generally difficult problem! + +Something which one of my mentors said which _really_ stuck with me: +> Matplotlib isn't a font library, and shouldn't try to be one. + +It's really easy to fall into the trap of trying to do _everything_ within your own project, which ends up rather _hurting_ itself. + +Since this holds true even for Matplotlib, it uses external dependencies like [FreeType](https://www.freetype.org/), [ttconv](https://github.com/sandflow/ttconv), and newly proposed [fontTools](https://github.com/fonttools/fonttools) to handle font subsetting, embedding, rendering, and related stuff. + +PS: If that font stuff didn't make sense, I would recommend going through a friendly tutorial I wrote, which is all about [Matplotlib and Fonts](https://matplotlib.org/stable/users/fonts.html)! +## Unexpected Complications +Matplotlib uses an external dependency `ttconv` which was initially forked into Matplotlib's repository **in 2003**! 
+> ttconv was a standalone commandline utility for converting TrueType fonts to subsetted Type 3 fonts (among other features) written in 1995, which Matplotlib forked in order to make it work as a library. + +Over time, a lot of issues came up with it which were either hard to fix or didn't attract much attention. (See the above paragraph for a valid reason) + +One major utility which is still used is `convert_ttf_to_ps`, which takes a _font path_ as input and converts it into a Type 3 or Type 42 PostScript font, which can be embedded within PS/EPS output documents. The guide I wrote ([link](https://matplotlib.org/stable/users/fonts.html)) describes these types of fonts, the differences between them, etc. + +#### So we need to convert that _font path_ input to a _font buffer_ input. +Why do we need to? Type 42 subsetting isn't really supported by ttconv, so we use a new dependency called fontTools, whose 'full-time job' is to subset Type 42 fonts for us (among other things). + +> It provides us with a font buffer, however ttconv expects a font path to embed that font + +Easily enough, this can be done by Python's `tempfile.NamedTemporaryFile`: +```python +with tempfile.NamedTemporaryFile(suffix=".ttf") as tmp: + # fontdata is the subsetted buffer + # returned from fontTools + tmp.write(fontdata.getvalue()) + # flush so the bytes are on disk before + # ttconv reopens the file by name + tmp.flush() + + # TODO: allow convert_ttf_to_ps + # to input file objects (BytesIO) + convert_ttf_to_ps( + os.fsencode(tmp.name), + fh, + fonttype, + glyph_ids, + ) +``` + +***But this is far from a clean API in terms of separating \*reading\* the file from \*parsing\* the data.*** + +What we _ideally_ want is to pass the buffer down to `convert_ttf_to_ps`, and modify the embedding code of `ttconv` (written in C++). And _here_ we come across a lot of unexplored code, _which has hardly been touched since it was forked_. 
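The buffer-to-path workaround is easy to try in isolation. The sketch below uses only the standard library; `consume_path` is a hypothetical stand-in for a function like `convert_ttf_to_ps` that only accepts a file path, `consume_buffer` is likewise a made-up helper name, and the bytes are dummy data rather than a real font. One detail worth noting is the `flush()` call: without it, the data may still be sitting in Python's write buffer when the consumer reopens the file by name.

```python
import io
import os
import tempfile


def consume_path(path):
    """Hypothetical path-only API: reads the file and returns its size."""
    with open(path, "rb") as fh:
        return len(fh.read())


def consume_buffer(buffer):
    """Bridge an in-memory buffer to the path-only API via a temp file."""
    with tempfile.NamedTemporaryFile(suffix=".ttf") as tmp:
        tmp.write(buffer.getvalue())
        tmp.flush()  # make sure the bytes are on disk before reopening
        return consume_path(os.fsencode(tmp.name))


fontdata = io.BytesIO(b"\x00\x01\x00\x00" + b"dummy font bytes")
print(consume_buffer(fontdata))
```

Reopening a `NamedTemporaryFile` by name while it is still open works on Unix-like systems but not on Windows, so a portable version would probably write into a `tempfile.TemporaryDirectory` instead.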
+ +Funnily enough, just yesterday, after spending a lot of quality time, me and my mentors figured out that the **whole logging system of ttconv was broken**, all because of a single debugging function. 🥲 + +
+ +This is still an ongoing problem that we need to tackle over the coming weeks, hopefully by the next time I write one of these blogs, it gets resolved! + +Again, thanks a ton for spending time reading these blogs. :D +#### NOTE: This blog post is also available at my [personal website](https://aitikgupta.github.io/gsoc-mid/). diff --git a/content/posts/GSoC_2021_PreQuarter/AitikGupta_GSoC.png b/content/posts/GSoC_2021_PreQuarter/AitikGupta_GSoC.png new file mode 100644 index 0000000..e769799 Binary files /dev/null and b/content/posts/GSoC_2021_PreQuarter/AitikGupta_GSoC.png differ diff --git a/content/posts/GSoC_2021_PreQuarter/index.md b/content/posts/GSoC_2021_PreQuarter/index.md new file mode 100644 index 0000000..292495a --- /dev/null +++ b/content/posts/GSoC_2021_PreQuarter/index.md @@ -0,0 +1,92 @@ +--- +title: "GSoC'21: Pre-Quarter Progress" +date: 2021-07-19T07:32:05+05:30 +draft: false +categories: ["News", "GSoC"] +description: "Pre-Quarter Progress with Google Summer of Code 2021 project under NumFOCUS: Aitik Gupta" +displayInList: true +author: Aitik Gupta + +resources: +- name: featuredImage + src: "AitikGupta_GSoC.png" + params: + showOnTop: true +--- + +**“Well? Did you get it working?!”** + +Before I answer that question, if you're missing the context, check out my [previous blog](https://matplotlib.org/matplotblog/posts/gsoc_2021_midterm/)'s last few lines.. promise it won't take you more than 30 seconds to get the whole problem! + +With this short writeup, I intend to talk about _what_ we did and _why_ we did, what we did. XD + +## Ostrich Algorithm +Ring any bells? Remember OS (Operating Systems)? It's one of the core CS subjects which I bunked then and regret now. (╥﹏╥) + +The [wikipedia page](https://en.wikipedia.org/wiki/Ostrich_algorithm) has a 2-liner explaination if you have no idea what's an Ostrich Algorithm.. 
but I know most of y'all won't bother clicking it XD, so here goes: +> Ostrich algorithm is a strategy of ignoring potential problems by "sticking one's head in the sand and pretending there is no problem" + +An important thing to note: it is used when it is more **cost-effective** to _allow the problem to occur than to attempt its prevention_. + +As you might've guessed by now, we ultimately ended up with the *not-so-clean* API (more on this later). + +## What was the problem? +The highest level overview of the problem was: + +``` +❌ fontTools -> buffer -> ttconv_with_buffer +✅ fontTools -> buffer -> tempfile -> ttconv_with_file +``` +The first approach created corrupted outputs, however the second approach worked fine. A point to note here would be that *Method 1* is better in terms of separation of *reading* the file from *parsing* the data. + +1. [fontTools](https://github.com/fonttools/fonttools) handles the Type42 subsetting for us, whereas [ttconv](https://github.com/matplotlib/matplotlib/tree/master/extern/ttconv) handles the embedding. +2. `ttconv_with_buffer` is a modification to the original `ttconv_with_file`; that allows it to input a file buffer instead of a file-path + +You might be tempted to say: +> "Well, `ttconv_with_buffer` must be wrongly modified, duh." + +Logically, yes. `ttconv` was designed to work with a file-path and not a file-object (buffer), and modifying a codebase **written in 1998** turned out to be a larger pain than we anticipated. +#### It came to a point where one of my mentors decided to implement everything in Python! +He even did, but the efforts to get it to production / or to fix `ttconv` embedding were ⋙ to just get on with the second method. That damn ostrich really helped us get out of that debugging hell. 🙃 +## Font Fallback - initial steps +Finally, we're onto the second subgoal for the summer: [Font Fallback](https://www.w3schools.com/css/css_font_fallbacks.asp)! + +To give an idea about how things work right now: +1. 
User asks Matplotlib to use certain font families, specified by: +```python +matplotlib.rcParams["font.family"] = ["list", "of", "font", "families"] +``` +2. This list is used to search for available fonts on a user's system. +3. However, in current (and previous) versions of Matplotlib: +> As soon as a font is found by iterating over the font families, **all text** is rendered by that _and only that_ font. + +You can immediately see the problem with this approach; using the same font for every character will not render any glyph which isn't present in that font, and will instead spit out a rectangle called "tofu" (read the first line [here](https://www.google.com/get/noto/)). + +And fixing that is exactly the first milestone! That is, parsing the _entire list_ of font families to get an intermediate representation of a multi-font interface. +## Don't break, a lot at stake! +Imagine if you had the superpower to change the Python standard library's internal functions, _without_ consulting anybody. Let's say you wanted to write a solution by hooking in and changing the `str("dumb")` implementation so that it returns: +```ipython +>>> str("dumb") +["d", "u", "m", "b"] +``` +Pretty "dumb", right? xD + +For your usecase it might work fine, but it would also mean breaking the _entire_ Python userbase's workflow, not to mention the 1000000+ libraries that depend on the original functionality. + +On a similar note, Matplotlib has a public API known as `findfont(prop: str)`, which when given a string (or [FontProperties](https://matplotlib.org/stable/api/font_manager_api.html#matplotlib.font_manager.FontProperties)) finds you a font that best matches the given properties in your system. + +It is used throughout the library, as well as at multiple other places, including downstream libraries. Being naive as I was, I changed this function signature and submitted the [PR](https://github.com/matplotlib/matplotlib/pull/20496). 
🥲 + +Had an insightful discussion about this with my mentors, and soon enough raised the [other PR](https://github.com/matplotlib/matplotlib/pull/20549), which didn't touch the `findfont` API at all. + +--- + +One last thing to note: Even if we do complete the first milestone, we wouldn't be done yet, since this is just parsing the entire list to get multiple fonts.. + +We still need to migrate the library's internal implementation from **font-first** to **text-first**! + + +But that's for later, for now: +![OnceAgainThankingYou](https://user-images.githubusercontent.com/43996118/126441988-5a2067fd-055e-44e5-86e9-4dddf47abc9d.png) + +#### NOTE: This blog post is also available at my [personal website](https://aitikgupta.github.io/gsoc-pre-quarter/). diff --git a/content/posts/GSoC_2021_Quarter/AitikGupta_GSoC.png b/content/posts/GSoC_2021_Quarter/AitikGupta_GSoC.png new file mode 100644 index 0000000..6a0fb71 Binary files /dev/null and b/content/posts/GSoC_2021_Quarter/AitikGupta_GSoC.png differ diff --git a/content/posts/GSoC_2021_Quarter/index.md b/content/posts/GSoC_2021_Quarter/index.md new file mode 100644 index 0000000..128779e --- /dev/null +++ b/content/posts/GSoC_2021_Quarter/index.md @@ -0,0 +1,144 @@ +--- +title: "GSoC'21: Quarter Progress" +date: 2021-08-03T18:48:00+05:30 +draft: false +categories: ["News", "GSoC"] +description: "Quarter Progress with Google Summer of Code 2021 project under NumFOCUS: Aitik Gupta" +displayInList: true +author: Aitik Gupta + +resources: +- name: featuredImage + src: "AitikGupta_GSoC.png" + params: + showOnTop: true +--- + +**“Matplotlib, I want 多个汉字 in between my text.”** + +Let's say you asked Matplotlib to render a plot with some label containing 多个汉字 (multiple Chinese characters) in between your English text. + +Or conversely, let's say you use a Chinese font with Matplotlib, but you had English text in between (which is quite common). 
+ +> Assumption: the Chinese font doesn't have those English glyphs, and vice versa + +With this short writeup, I'll talk about what a migration from a font-first to a text-first approach in Matplotlib looks like, which ideally solves the above problem. +### Have the fonts? +Logically, the very first step to solving this would be to ask whether you _have_ multiple fonts, right? + +Matplotlib doesn't ship [CJK](https://en.wikipedia.org/wiki/List_of_CJK_fonts) (Chinese Japanese Korean) fonts, which would contain these Chinese glyphs. It does try to cover most ground with the [default font](https://matplotlib.org/stable/users/dflt_style_changes.html#normal-text) it ships with, however. + +So if you don't have a font to render your Chinese characters, go ahead and install one! Matplotlib will find your installed fonts (after rebuilding the cache, that is). +### Parse the fonts +This is where things get interesting, and what my [previous writeup](https://matplotlib.org/matplotblog/posts/gsoc_2021_prequarter/) was all about.. + +> Parsing the whole family to get multiple fonts for given font properties + +## FT2Font Magic! +To give you an idea about how things used to work for Matplotlib: +1. A single font was chosen _at draw time_ + (fixed: re [previous writeup](https://matplotlib.org/matplotblog/posts/gsoc_2021_prequarter/)) +2. Every character displayed in your document was rendered by only that font + (partially fixed: re _this writeup_) + +> FT2Font is a matplotlib-to-font module, which provides a high-level Python API to interact with a _single font's operations_ like read/draw/extract/etc. + +Being written in C++, the module needs wrappers around it to be converted into a [Python extension](https://docs.python.org/3/extending/extending.html) using Python's C-API. + +> It allows us to use C++ functions directly from Python! 
+ +So wherever you see a use of font within the library (by library I mean the readable Python codebase XD), you could have derived that: +``` +FT2Font === SingleFont +``` + +Things are a bit different now, however.. +## Designing a multi-font system +FT2Font is itself basically a wrapper around a library called [FreeType](https://www.freetype.org/), which is a freely available software library to render fonts. + 

+

+ FT2Font Naming +
How FT2Font was named
+
+

+ +In my initial proposal.. while looking around how FT2Font is structured, I figured: +``` +Oh, looks like all we need are Faces! +``` +> If you don't know what faces/glyphs/ligatures are, head over to [Text Hates You](https://gankra.github.io/blah/text-hates-you/). I can guarantee you'll definitely enjoy some real life examples of why text rendering is hard. 🥲 + +Anyway, if you already know what Faces are, it might strike you: + +If we already have all the faces we need from multiple fonts (let's say we created a child of FT2Font.. which only tracks the faces for its families), we should be able to render everything from that parent FT2Font, right? + +As I later figured out while chasing segfaults in implementing this design: +``` +Each FT2Font is linked to a single FT_Library object! +``` + +If you try to load the face/glyph/character (basically anything) from a different FT2Font object.. you'll run into serious segfaults (because one object linked to an `FT_Library` can't really access another object which has its own `FT_Library`). +```cpp +// face is linked to FT2Font; which is +// linked to a single FT_Library object +FT_Face face = this->get_face(); +FT_Get_Glyph(face->glyph, &placeholder); // works like a charm + +// somehow get another FT2Font's face +FT_Face family_face = this->get_family_member()->get_face(); +FT_Get_Glyph(family_face->glyph, &placeholder); // segfaults! +``` + +Realizing this took a good amount of time! After this I quickly came up with a recursive approach, wherein we: +1. Create a list of FT2Font objects within Python, and pass it down to FT2Font +2. FT2Font will hold pointers to its families via a \ + `std::vector fallback_list` +3. Find if the character we want is available in the current font + 1. If the character is available, use that FT2Font to render that character + 2. If the character isn't found, go to step 3 again, but now iterate through the `fallback_list` +4. That's it! 
+ +A quick overhaul of the above piece of code^ +```cpp +bool ft_get_glyph(FT_Glyph &placeholder) { +    // FT_Get_Glyph reads from the face's glyph slot and returns 0 on success +    FT_Error not_found = FT_Get_Glyph(this->get_face()->glyph, &placeholder); +    return !not_found; +} + +// within driver code: try each fallback font until one has the glyph +for (size_t i = 0; i < fallback_list.size(); i++) { +    bool was_found = fallback_list[i]->ft_get_glyph(placeholder); +    if (was_found) break; +} +``` + +With the idea surrounding this implementation, the [Agg backend](https://matplotlib.org/stable/api/backend_agg_api.html) is able to render a document (either through GUI, or a PNG) with multiple fonts! + 

+

+ ChineseInBetween +
PNG straight outta Matplotlib!
+
+
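
To make the recursive lookup described above a bit more tangible, here is a minimal pure-Python sketch of the same idea. Everything in it (`SketchFont`, `find_font_for`, `assign_fonts`, the font names) is made up for illustration and is not Matplotlib's API; the real logic lives in the C++ FT2Font extension and works on FreeType faces rather than sets of characters.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class SketchFont:
    """Hypothetical stand-in for FT2Font; NOT the real Matplotlib class."""
    name: str
    glyphs: Set[str]  # characters this font is able to draw
    fallback_list: List["SketchFont"] = field(default_factory=list)

    def find_font_for(self, char: str) -> Optional["SketchFont"]:
        """Return the first font (self, then its fallbacks) that has `char`."""
        if char in self.glyphs:
            return self
        for fallback in self.fallback_list:
            found = fallback.find_font_for(char)
            if found is not None:
                return found
        return None  # no font has it: this is where "tofu" would show up


def assign_fonts(text: str, primary: SketchFont) -> List[Tuple[str, Optional[str]]]:
    """Pair every character with the name of the font that would render it."""
    pairs = []
    for char in text:
        font = primary.find_font_for(char)
        pairs.append((char, font.name if font else None))
    return pairs


# A Latin-only primary font with a CJK-only fallback (both invented here).
cjk = SketchFont("SketchCJK", set("多个汉字"))
latin = SketchFont("SketchLatin", set("abcdefghijklmnopqrstuvwxyz "), [cjk])
print(assign_fonts("a 多", latin))
```

The point of the sketch is the text-first direction: we start from each character and ask which font can draw it, instead of picking one font up front and hoping it covers everything.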

+ +## Python C-API is hard, at first! +I've spent days at Python C-API's [argument doc](https://docs.python.org/3/c-api/arg.html), and it's hard to get what you need at first, ngl. + +But, with the help of some amazing people in the GSoC community ([@srijan-paul](https://srijan-paul.github.io/), [@atharvaraykar](https://atharvaraykar.me/)) and amazing mentors, blockers begone! + +## So are we done? +Oh no. XD + +Things work just fine for the Agg backend, but to generate a PDF/PS/SVG with multiple fonts is another story altogether! I think I'll save that for later. + +

+

+ ThankYouDwight +
If you've been following the progress so far, mayn you're awesome!
+
+

+ +#### NOTE: This blog post is also available at my [personal website](https://aitikgupta.github.io/gsoc-quarter/). diff --git a/content/posts/GSoC_Coding_Phase_Blog_2/index.md b/content/posts/GSoC_Coding_Phase_Blog_2/index.md new file mode 100644 index 0000000..0b84aab --- /dev/null +++ b/content/posts/GSoC_Coding_Phase_Blog_2/index.md @@ -0,0 +1,47 @@ +--- +title: "GSoC Coding Phase 1 Blog 2" +date: 2020-06-24T16:47:51+05:30 +draft: false +categories: ["News", "GSoC"] +description: "Progress Report for the second half of the Google Summer of Code 2020 Phase 1 for the Baseline Images Problem" +displayInList: true +author: Sidharth Bansal +--- + +Google Summer of Code 2020's first evaluation is about to complete. This post discusses about the progress so far in the last two weeks of the first coding period from 15 June to 30 June 2020. + +## Completion of the demo package + +We successfully created the demo app and uploaded it to the test.pypi. It contains the main and the secondary package. The main package is analogous to the matplotlib and secondary package is analogous to the matplotlib_baseline_images package as discussed in the previous blog. + +## Learning more about the Git and mpl workflow + +I came across another way to merge the master into the branch to resolve conflicts is by rebasing the master. I understood how to create modular commits inside a pull request for easy reviewal process and better understandability of the code. + +## Creation of the matplotlib_baseline_images package + +Then, we implemented the similar changes to create the `matplotlib_baseline_images` package. Finally, we were successful in uploading it to the [test.pypi](https://test.pypi.org/project/matplotlib.baseline-images/3.3.0rc1/#history). This package is involved in the `sub-wheels` directory so that more packages can be added in the same directory, if needed in future. The `matplotlib_baseline_images` package contain baseline images for both `matplotlib` and `mpl_toolkits`. 
+Some changes were required in the main `matplotlib` package's setup.py so that it will not take information from the packages present in the `sub-wheels` directory. + +## Symlinking the baseline images + +As the baseline images are moved out of the `lib/matplotlib` and `lib/mpl_toolkits` directories, we symlinked the locations where they are used, namely in `lib/matplotlib/testing/decorator.py`, `tools/triage_tests.py`, `lib/matplotlib/tests/__init__.py` and `lib/mpl_toolkits/tests/__init__.py`. + +## Creation of the tests/test_data directory + +Some test data present in `baseline_images` doesn't need to be moved to the `matplotlib_baseline_images` package. So, it is stored under the `lib/matplotlib/tests/test_data` folder. + +## Understanding Travis, AppVeyor and Azure Pipelines + +I came across the Continuous Integration tools used at mpl. We tried to install `matplotlib` followed by the `matplotlib_baseline_images` package on all three: Travis, AppVeyor and Azure Pipelines. + +## Future Goals + +Once the [current PR](https://github.com/matplotlib/matplotlib/pull/17557) is merged, we will move to the [Proposal for the baseline images problem](https://github.com/matplotlib/matplotlib/issues/16447). + +## Daily Meet-ups + +A meeting is held every day at [11:00pm IST](https://everytimezone.com/) via Zoom. Meeting notes are kept on HackMD. + +I am grateful to be part of such a great community. The project is really interesting and challenging :) Thanks Antony and Hannah for helping me so far. 
+ \ No newline at end of file diff --git a/content/posts/GSoC_Coding_Phase_Blog_3/index.md b/content/posts/GSoC_Coding_Phase_Blog_3/index.md new file mode 100644 index 0000000..34cd836 --- /dev/null +++ b/content/posts/GSoC_Coding_Phase_Blog_3/index.md @@ -0,0 +1,43 @@ +--- +title: "GSoC Coding Phase 2 Blog 1" +date: 2020-07-11T19:47:51+05:30 +draft: false +categories: ["News", "GSoC"] +description: "Progress Report for the first half of the Google Summer of Code 2020 Phase 2 for the Baseline Images Problem" +displayInList: true +author: Sidharth Bansal +--- + +Google Summer of Code 2020's first evaluation is completed. I passed!!! Hurray! Now we are in the mid way of the second evaluation. This post discusses about the progress so far in the first two weeks of the second coding period from 30 June to 12 July 2020. + +## Completion of the matplotlib_baseline_images package + +We successfully created the matplotlib_baseline_images package. It contains the matplotlib and the matplotlib toolkit baseline images. Symlinking is done for the baseline images, related changes for Travis, appvoyer, azure pipelines etc. are functional and tests/test_data is created as discussed in the previous blog. PR is reviewed and suggested work is done. + +## Modular approach towards removal of matplotlib baseline images + +We have divide the work in two parts. The first part is the generation of the baseline images discussed below. The second part is the modification of the baseline images which happens when some baseline images gets modified due to `git push` or `git merge`. Modification of baseline images will be further divided into two sub tasks: addition of new baseline image and the deletion of the previous baseline image. This will be discussed in the second half of the second phase of the Google Summer of Code 2020. 
+ +## Generation of the matplotlib baseline images + +After the changes proposed in the [previous PR](https://github.com/matplotlib/matplotlib/pull/17557), the developer will have no baseline images on fresh install of matplotlib. The developer would need to install the sub-wheel matplotlib_baseline_images package to get started with the testing part of the mpl. Now, we have started removing the use of the matplotlib_baseline_images package. It will require two steps as discussed above. +The images can be generated by the image comparison tests. Once these images are generated for the first time, then they can be used as the baseline images for the later times for comparison. This is the main principle adopted. The images are first created in the `result_images` directory. Then they will be moved to the `lib/matplotlib/tests/baseline_images` directory. Later on, running the pytests will start the image comparison. + +## Created commandline flags for baseline images creation + +I learned about the pytest hooks and fixtures. I build a command line flag `matplotlib_baseline_image_generation` which will create the baseline images in the `result_images` directory. The full command will be `python3 pytest --matplotlib_baseline_image_generation`. In order to do this, we have done changes in the `conftest.py` and also added markers to the `image_comparison` decorator. + +## Learning more about the Git and virtual environments + +I came to know about the git worktree and the scenarios in which we can use it. I also know more about virtual environments and their need in different scenarios. + +## Future Goals + +Once the generation of the baseline images is completed in the [current PR](https://github.com/matplotlib/matplotlib/pull/17793), we will move to the modification of the baseline images in the second half of the second coding phase. + +## Daily Meet-ups + +Monday to Thursday meeting initiated at [11:00pm IST](https://everytimezone.com/) via Zoom. 
Meeting notes are present at HackMD. + +I am grateful to be part of such a great community. Project is really interesting and challenging :) Thanks Thomas, Antony and Hannah for helping me so far. + \ No newline at end of file diff --git a/content/posts/GSoC_Coding_Phase_Blog_4/index.md b/content/posts/GSoC_Coding_Phase_Blog_4/index.md new file mode 100644 index 0000000..cb4d8f5 --- /dev/null +++ b/content/posts/GSoC_Coding_Phase_Blog_4/index.md @@ -0,0 +1,37 @@ +--- +title: "GSoC Coding Phase 2 Blog 2" +date: 2020-07-23T19:47:51+05:30 +draft: false +categories: ["News", "GSoC"] +description: "Progress Report for the second half of the Google Summer of Code 2020 Phase 2 for the Baseline Images Problem" +displayInList: true +author: Sidharth Bansal +--- + +Google Summer of Code 2020's second evaluation is about to complete. Now we are about to start with the final coding phase. This post discusses about the progress so far in the last two weeks of the second coding period from 13 July to 26 July 2020. + +## Modular approach towards removal of matplotlib baseline images + +We have divided the work in two parts as discussed in the [previous blog](https://matplotlib.org/matplotblog/posts/gsoc_coding_phase_blog_3/). The first part is the generation of the baseline images discussed below. The second part is the modification of the baseline images. The modification part will be implemented in the last phase of the Google Summer of Code 2020. + +## Generation of the matplotlib baseline images + + Now, we have started removing the use of the `matplotlib_baseline_images` package. After the changes proposed in the [previous PR](https://github.com/matplotlib/matplotlib/pull/17557), the developer will have no baseline images on fresh install of matplotlib. So, the developer would need to generate matplotlib baseline images locally to get started with the testing part of the mpl. 
+The images can be generated by the image comparison tests with use of `matplotlib_baseline_image_generation` flag from the command line. Once these images are generated for the first time, then they can be used as the baseline images for the later times for comparison. This is the main principle adopted. + +## Completion of the generation of images for the matplotlib directory + +We successfully created the `matplotlib_baseline_image_generation` flag in the beginning of the second evaluation but images were not created in the `baseline images` directory inside the `matplotlib` and `mpl_toolkits` directories, instead they were created in the `result_images` directory. So, we implemented this functionality. The images are created in the `lib/matplotlib/tests/baseline_images` directory directly now in the baseline image generation step. The baseline image generation step uses `python3 -mpytest lib/matplotlib --matplotlib_baseline_image_generation` command. Later on, running the pytests with `python3 -mpytest lib/matplotlib` will start the image comparison. + +Right now, the matplotlib_baseline_image_generation flag works for the matplotlib directory. We are trying to achieve the same functionality for the mpl_toolkits directory. + +## Future Goals + +Once the generation of the baseline images for `mpl_toolkits` directory is completed in the [current PR](https://github.com/matplotlib/matplotlib/pull/17793), we will move to the modification of the baseline images in the third coding phase. The addition of new baseline image and deletion of the old baseline image will also be implemented in the last phase of GSoC. Modification of baseline images will be further divided into two sub tasks: addition of new baseline image and the deletion of the previous baseline image. + + +## Daily Meet-ups + +Monday to Thursday meeting initiated at [11:00pm IST](https://everytimezone.com/) via Zoom. Meeting notes are present at HackMD. 
+ +I am grateful to be part of such a great community. Project is really interesting and challenging :) Thanks Thomas, Antony and Hannah for helping me so far. \ No newline at end of file diff --git a/content/posts/GSoC_Coding_Phase_Blog_5/index.md b/content/posts/GSoC_Coding_Phase_Blog_5/index.md new file mode 100644 index 0000000..af5ee3a --- /dev/null +++ b/content/posts/GSoC_Coding_Phase_Blog_5/index.md @@ -0,0 +1,40 @@ +--- +title: "GSoC Coding Phase 3 Blog 1" +date: 2020-08-08T09:47:51+05:30 +draft: false +categories: ["News", "GSoC"] +description: "Progress Report for the first half of the Google Summer of Code 2020 Phase 3 for the Baseline Images Problem" +displayInList: true +author: Sidharth Bansal +--- + +Google Summer of Code 2020's second evaluation is completed. I passed!!! Hurray! Now we are in the mid way of the last evaluation. This post discusses about the progress so far in the first two weeks of the third coding period from 26 July to 9 August 2020. + +## Completion of the modification logic for the matplotlib_baseline_images package + +We successfully created the `matplotlib_baseline_image_generation` command line flag for baseline image generation for `matplotlib` and `mpl_toolkits` in the previous months. It was generating the matplotlib and the matplotlib toolkit baseline images successfully. Now, we modified the existing flow to generate any missing baseline images, which would be fetched from the `master` branch on doing `git pull` or `git checkout -b feature_branch`. + +We initially thought of creating a command line flag `generate_baseline_images_for_test "test_a,test_b"`, but later on analysis of the approach, we came to the conclusion that the developer will not know about the test names to be given along with the flag. So, we tried to generate the missing images by `generate_missing` without the test names. This worked successfully. 
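
As a rough illustration of how such a flag can be wired up, here is a minimal sketch of a `conftest.py`-style hook. `pytest_addoption` and `parser.addoption` are pytest's standard way to register command line options; the `should_generate` helper and its behaviour are assumptions made for this example and are not taken from the actual PR.

```python
from pathlib import Path


def pytest_addoption(parser):
    """pytest hook: register the command line flag (would live in conftest.py)."""
    parser.addoption(
        "--matplotlib_baseline_image_generation",
        action="store_true",
        default=False,
        help="Generate missing baseline images instead of only comparing.",
    )


def should_generate(baseline_path, generation_enabled):
    """Hypothetical helper: generate only if the flag is set AND the image is missing."""
    return generation_enabled and not Path(baseline_path).exists()
```

With a check like this, the developer never has to name individual tests: any baseline image that is absent after fetching changes is simply regenerated on the next run with the flag.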
+ +## Adopting reusability and Do not Repeat Yourself (DRY) Principles + +Later, we refactored the `matplotlib_baseline_image_generation` and `generate_missing` command line flags into a single command line flag `matplotlib_baseline_image_generation`, as the logic was similar for both of them. Now, both the image generation on a fresh install of matplotlib and the generation of missing baseline images work with `python3 -mpytest lib/matplotlib --matplotlib_baseline_image_generation` for the `lib/matplotlib` folder and `python3 -mpytest lib/mpl_toolkits --matplotlib_baseline_image_generation` for the `lib/mpl_toolkits` folder. + +## Writing the documentation + +We have written documentation explaining the following scenarios: +1. How to generate the baseline images on a fresh install of matplotlib? +2. How to generate the missing baseline images on fetching changes from master? +3. How to install the `matplotlib_baseline_images_package` to be used for testing by the developer? +4. How to intentionally change an image? + +## Refactoring and improving the code quality before merging + +Right now, we are trying to refactor the code and maintain a clean git history. The [current PR](https://github.com/matplotlib/matplotlib/pull/17793) is under review. I am working on the suggested changes. We are trying to merge this :) + +## Daily Meet-ups + +Meetings are held Monday to Thursday at [11:00pm IST](https://everytimezone.com/) via Zoom. Meeting notes are kept on HackMD. + +I am grateful to be part of such a great community. The project is really interesting and challenging :) Thanks Thomas, Antony and Hannah for helping me so far. 
+ \ No newline at end of file diff --git a/content/posts/animated-fractals/header_image.png b/content/posts/animated-fractals/header_image.png new file mode 100644 index 0000000..c2f669c Binary files /dev/null and b/content/posts/animated-fractals/header_image.png differ diff --git a/content/posts/animated-fractals/index.md b/content/posts/animated-fractals/index.md new file mode 100644 index 0000000..0a15463 --- /dev/null +++ b/content/posts/animated-fractals/index.md @@ -0,0 +1,260 @@ +--- +title: "Animate Your Own Fractals in Python with Matplotlib" +date: 2020-07-04T00:06:36+02:00 +draft: false +description: "Discover the bizarre geometry of the fractals and learn how to make an animated visualization of these +marvels using Python and the Matplotlib's Animation API." +categories: ["tutorials"] +displayInList: true +author: Vladimir Ilievski +resources: +- name: featuredImage + src: "header_image.png" + params: + description: "Julia Set Fractal" + showOnTop: true +--- + +Imagine zooming an image over and over and never go out of finer details. It may sound bizarre but the mathematical +concept of [fractals](https://en.wikipedia.org/wiki/Fractal) opens the realm towards this intricating infinity. This +strange geometry exhibits the same or similar patterns irrespectively of the scale. We can see one fractal example +in the image above. + +The *fractals* may seem difficult to understand due to their peculiarity, but that's not the case. As Benoit Mandelbrot, +one of the founding fathers of the fractal geometry said in his legendary +[TED Talk](https://www.ted.com/talks/benoit_mandelbrot_fractals_and_the_art_of_roughness?language=en): + + +> A surprising aspect is that the rules of this geometry are extremely short. 
You crank the formulas several times and +at the end, you get things like this (pointing to a stunning plot) +> +> -- Benoit Mandelbrot + +In this tutorial blog post, we will see how to construct fractals in Python and animate them using the amazing +*Matplotlib's* Animation API. First, we will demonstrate the convergence of the *Mandelbrot Set* with an +enticing animation. In the second part, we will analyze one interesting property of the *Julia Set*. Stay tuned! + +# Intuition + +We all have an intuitive sense of the concept of similarity. We say two objects are similar to each other if they share +some common patterns. + +This notion is not limited to a comparison of two different objects. We can also compare different parts of the +same object. Take a leaf, for instance. We know very well that the left side matches the right side, i.e. the leaf +is symmetrical. + +In mathematics, this phenomenon is known as [self-similarity](https://en.wikipedia.org/wiki/Self-similarity). It means +a given object is similar (completely or to some extent) to some smaller part of itself. One remarkable example is the +[Koch Snowflake](https://isquared.digital/visualizations/2020-06-15-koch-curve/) as shown in the image below: + +![Koch Snowflake](snowflake.png) + +We can infinitely magnify some part of it and the same pattern will repeat over and over again. This is how fractal +geometry is defined. + +# Animated Mandelbrot Set + +The [Mandelbrot Set](https://en.wikipedia.org/wiki/Mandelbrot_set) is defined over the set of *complex numbers*. It consists +of all complex numbers **c**, such that the sequence **zᵢ₊₁ = zᵢ² + c, z₀ = 0** is bounded. That is, no matter how many +iterations we run, the absolute value of **zᵢ** never exceeds a given limit. At first sight, it might +seem odd and simple, but in fact, it has some mind-blowing properties. 
+ +The *Python* implementation is quite straightforward, as given in the code snippet below: + +```python +def mandelbrot(x, y, threshold): + """Calculates whether the number c = x + i*y belongs to the + Mandelbrot set. In order to belong, the sequence z[i + 1] = z[i]**2 + c + must not diverge after 'threshold' number of steps. The sequence diverges + if the absolute value of z[i+1] is greater than 4. + + :param float x: the x component of the initial complex number + :param float y: the y component of the initial complex number + :param int threshold: the number of iterations to considered it converged + """ + # initial conditions + c = complex(x, y) + z = complex(0, 0) + + for i in range(threshold): + z = z**2 + c + if abs(z) > 4.: # it diverged + return i + + return threshold - 1 # it didn't diverge +``` + +As we can see, we set the maximum number of iterations encoded in the variable `threshold`. If the magnitude of the +sequence at some iteration exceeds **4**, we consider it as diverged (**c** does not belong to the set) and return the +iteration number at which this occurred. If this never happens (**c** belongs to the set), we return the maximum +number of iterations. + +We can use the information about the number of iterations before the sequence diverges. All we have to do +is to associate this number to a color relative to the maximum number of loops. Thus, for all complex numbers +**c** in some lattice of the complex plane, we can make a nice animation of the convergence process as a function +of the maximum allowed iterations. + +One particular and interesting area is the *3x3* lattice starting at position -2 and -1.5 for the *real* and +*imaginary* axis respectively. We can observe the process of convergence as the number of allowed iterations increases. 
+This is easily achieved using the *Matplotlib's* Animation API, as shown with the following code: + +```python +import numpy as np +import matplotlib.pyplot as plt +import matplotlib.animation as animation + +x_start, y_start = -2, -1.5 # an interesting region starts here +width, height = 3, 3 # for 3 units up and right +density_per_unit = 250 # how many pixles per unit + +# real and imaginary axis +re = np.linspace(x_start, x_start + width, width * density_per_unit ) +im = np.linspace(y_start, y_start + height, height * density_per_unit) + +fig = plt.figure(figsize=(10, 10)) # instantiate a figure to draw +ax = plt.axes() # create an axes object + +def animate(i): + ax.clear() # clear axes object + ax.set_xticks([], []) # clear x-axis ticks + ax.set_yticks([], []) # clear y-axis ticks + + X = np.empty((len(re), len(im))) # re-initialize the array-like image + threshold = round(1.15**(i + 1)) # calculate the current threshold + + # iterations for the current threshold + for i in range(len(re)): + for j in range(len(im)): + X[i, j] = mandelbrot(re[i], im[j], threshold) + + # associate colors to the iterations with an iterpolation + img = ax.imshow(X.T, interpolation="bicubic", cmap='magma') + return [img] + +anim = animation.FuncAnimation(fig, animate, frames=45, interval=120, blit=True) +anim.save('mandelbrot.gif',writer='imagemagick') +``` + +We make animations in *Matplotlib* using the `FuncAnimation` function from the *Animation* API. We need to specify +the `figure` on which we draw a predefined number of consecutive `frames`. A predetermined `interval` expressed in +milliseconds defines the delay between the frames. + +In this context, the `animate` function plays a central role, where the input argument is the frame number, starting +from 0. It means, in order to animate we always have to think in terms of frames. Hence, we use the frame number +to calculate the variable `threshold` which is the maximum number of allowed iterations. 
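
To get a feel for how the frame number turns into an iteration budget, the short sketch below lists the thresholds produced by the same `round(1.15**(i + 1))` rule used in `animate`. The helper name `threshold_for_frame` is introduced here only for illustration.

```python
def threshold_for_frame(frame, growth=1.15):
    """Maximum number of iterations allowed at a given frame (0-based)."""
    return round(growth ** (frame + 1))


# one threshold per frame, for the 45 frames used in the animation above
thresholds = [threshold_for_frame(i) for i in range(45)]

print(thresholds[:10])   # the first few frames allow only a handful of iterations
print(thresholds[-1])    # the last frame allows far more
```

Because the growth is exponential, the first frames change slowly (several of them share the same threshold), while the later frames add many iterations at once, which is where the fine details of the set appear.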
+ +To represent our lattice we instantiate two arrays `re` and `im`: the former for the values on the *real* axis +and the latter for the values on the *imaginary* axis. The number of elements in these two arrays is defined by +the variable `density_per_unit` which defines the number of samples per unit step. The higher it is, the better +quality we get, but at the cost of heavier computation. + +Now, depending on the current `threshold`, for every complex number **c** in our lattice, we calculate the number of +iterations before the sequence **zᵢ₊₁ = zᵢ² + c, z₀ = 0** diverges. We save them in an initially empty matrix called `X`. +In the end, we *interpolate* the values in `X` and assign them a color drawn from a prearranged *colormap*. + +After cranking the `animate` function multiple times we get a stunning animation as depicted below: + +![Mandelbrot set animation](mandelbrot.gif) + +# Animated Julia Set + +The [Julia Set](https://en.wikipedia.org/wiki/Julia_set) is quite similar to the *Mandelbrot Set*. Instead of setting +**z₀ = 0** and testing whether for some complex number **c = x + i\*y** the sequence **zᵢ₊₁ = zᵢ² + c** is bounded, we +switch the roles a bit. We fix the value for **c**, we set an arbitrary initial condition **z₀ = x + i\*y**, and we +observe the convergence of the sequence. The *Python* implementation is given below: + +```python +def julia_quadratic(zx, zy, cx, cy, threshold): + """Calculates whether the number z[0] = zx + i*zy with a constant c = cx + i*cy + belongs to the Julia set. In order to belong, the sequence + z[i + 1] = z[i]**2 + c, must not diverge after 'threshold' number of steps. + The sequence diverges if the absolute value of z[i+1] is greater than 4. 
+
+    :param float zx: the x component of z[0]
+    :param float zy: the y component of z[0]
+    :param float cx: the x component of the constant c
+    :param float cy: the y component of the constant c
+    :param int threshold: the number of iterations after which the sequence is considered bounded
+    """
+    # initial conditions
+    z = complex(zx, zy)
+    c = complex(cx, cy)
+
+    for i in range(threshold):
+        z = z**2 + c
+        if abs(z) > 4.:  # it diverged
+            return i
+
+    return threshold - 1  # it didn't diverge
+```
+
+The setup is quite similar to the *Mandelbrot Set* implementation. The maximum number of iterations is
+denoted as `threshold`. If the magnitude of the sequence is never greater than **4**, the number **z₀** belongs to
+the *Julia Set* and vice-versa.
+
+The number **c** gives us the freedom to analyze its impact on the convergence of the sequence, given that the
+maximum number of iterations is fixed. One interesting family of values is **c = r cos α + i × r sin α**
+such that **r=0.7885** and **α ∈ \[0, 2π\]**.
+
+A good way to make this analysis is to create an animated visualization as the number **c** changes.
+This [improves our visual perception](https://isquared.digital/blog/2020-02-08-interactive-dataviz/) and
+understanding of such abstract phenomena in a captivating manner.
To do so, we use Matplotlib's *Animation API*, as
+demonstrated in the code below:
+
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+import matplotlib.animation as animation
+
+x_start, y_start = -2, -2  # an interesting region starts here
+width, height = 4, 4  # for 4 units up and right
+density_per_unit = 200  # how many pixels per unit
+
+# real and imaginary axis
+re = np.linspace(x_start, x_start + width, width * density_per_unit)
+im = np.linspace(y_start, y_start + height, height * density_per_unit)
+
+
+threshold = 20  # max allowed iterations
+frames = 100  # number of frames in the animation
+
+# we represent c as c = r*cos(a) + i*r*sin(a) = r*e^{i*a}
+r = 0.7885
+a = np.linspace(0, 2*np.pi, frames)
+
+fig = plt.figure(figsize=(10, 10))  # instantiate a figure to draw
+ax = plt.axes()  # create an axes object
+
+def animate(i):
+    ax.clear()  # clear axes object
+    ax.set_xticks([])  # clear x-axis ticks
+    ax.set_yticks([])  # clear y-axis ticks
+
+    X = np.empty((len(re), len(im)))  # the initial array-like image
+    cx, cy = r * np.cos(a[i]), r * np.sin(a[i])  # the value of c for this frame
+
+    # iterations for the given threshold
+    # (m, n index the lattice so that the frame number i is not shadowed)
+    for m in range(len(re)):
+        for n in range(len(im)):
+            X[m, n] = julia_quadratic(re[m], im[n], cx, cy, threshold)
+
+    img = ax.imshow(X.T, interpolation="bicubic", cmap='magma')
+    return [img]
+
+anim = animation.FuncAnimation(fig, animate, frames=frames, interval=50, blit=True)
+anim.save('julia_set.gif', writer='imagemagick')
+```
+
+The logic in the `animate` function is very similar to the previous example. We update the number **c** as a function
+of the frame number. Based on that we estimate the convergence of all complex numbers in the defined lattice, given the
+fixed `threshold` of allowed iterations. Same as before, we save the results in an initially empty matrix `X` and
+map them to a color relative to the maximum number of iterations.
The resulting animation is illustrated below:
+
+![Julia Set Animation](julia_set.gif)
+
+
+# Summary
+
+Fractals are truly mind-boggling structures, as we saw in this post. First, we gave a general intuition
+of fractal geometry. Then, we observed two types of fractals: the *Mandelbrot* and *Julia* sets. We implemented
+them in Python and made interesting animated visualizations of their properties.
diff --git a/content/posts/animated-fractals/julia_set.gif b/content/posts/animated-fractals/julia_set.gif
new file mode 100644
index 0000000..aeaab85
Binary files /dev/null and b/content/posts/animated-fractals/julia_set.gif differ
diff --git a/content/posts/animated-fractals/mandelbrot.gif b/content/posts/animated-fractals/mandelbrot.gif
new file mode 100644
index 0000000..19c0cb1
Binary files /dev/null and b/content/posts/animated-fractals/mandelbrot.gif differ
diff --git a/content/posts/animated-fractals/snowflake.png b/content/posts/animated-fractals/snowflake.png
new file mode 100644
index 0000000..023d596
Binary files /dev/null and b/content/posts/animated-fractals/snowflake.png differ
diff --git a/content/posts/book/book-cover.png b/content/posts/book/book-cover.png
new file mode 100644
index 0000000..443f56a
Binary files /dev/null and b/content/posts/book/book-cover.png differ
diff --git a/content/posts/book/book-gallery.png b/content/posts/book/book-gallery.png
new file mode 100644
index 0000000..18b0255
Binary files /dev/null and b/content/posts/book/book-gallery.png differ
diff --git a/content/posts/book/book.png b/content/posts/book/book.png
new file mode 100644
index 0000000..74983e2
Binary files /dev/null and b/content/posts/book/book.png differ
diff --git a/content/posts/book/index.md b/content/posts/book/index.md
new file mode 100644
index 0000000..f04aefe
--- /dev/null
+++ b/content/posts/book/index.md
@@ -0,0 +1,26 @@
+---
+title: "Newly released open access book"
+date: 2021-11-15T14:26:51+01:00
+draft: false
+description: "New open
access book released" +categories: ["News"] +displayInList: true +author: Nicolas P. Rougier +resources: +- name: featuredImage + src: "book-cover.png" + params: + description: "Book cover" + showOnTop: true +--- + +It's my great pleasure to announce that I've finished my book on matplotlib and it is now freely available at [www.labri.fr/perso/nrougier/scientific-visualization.html](https://www.labri.fr/perso/nrougier/scientific-visualization.html) while sources for the book are hosted at [github.com/rougier/scientific-visualization-book](https://github.com/rougier/scientific-visualization-book). + +## Abstract + +The Python scientific visualisation landscape is huge. It is composed of a myriad of tools, ranging from the most versatile and widely used down to the more specialised and confidential. Some of these tools are community based while others are developed by companies. Some are made specifically for the web, others are for the desktop only, some deal with 3D and large data, while others target flawless 2D rendering. In this landscape, Matplotlib has a very special place. It is a versatile and powerful library that allows you to design very high quality figures, suitable for scientific publishing. It also offers a simple and intuitive interface as well as an object oriented architecture that allows you to tweak anything within a figure. Finally, it can be used as a regular graphic library in order to design non‐scientific figures. This book is organized into four parts. The first part considers the fundamental principles of the Matplotlib library. This includes reviewing the different parts that constitute a figure, the different coordinate systems, the available scales and projections, and we’ll also introduce a few concepts related to typography and colors. The second part is dedicated to the actual design of a figure. 
After introducing some simple rules for generating better figures, we’ll then go on to explain the Matplotlib defaults and styling system before diving on into figure layout organization. We’ll then explore the different types of plot available and see how a figure can be ornamented with different elements. The third part is dedicated to more advanced concepts, namely 3D figures, optimization & animation. The fourth and final part is a collection of showcases. + +### Book gallery + +![](book-gallery.png) + diff --git a/content/posts/codeswitching-visualization/.ipynb_checkpoints/index-checkpoint.md b/content/posts/codeswitching-visualization/.ipynb_checkpoints/index-checkpoint.md new file mode 100644 index 0000000..040f522 --- /dev/null +++ b/content/posts/codeswitching-visualization/.ipynb_checkpoints/index-checkpoint.md @@ -0,0 +1,153 @@ +--- +title: "Visualizing Code-Switching with Step Charts in Matplotlib" +date: 2020-08-25T12:33:20-07:00 +description: "Learn how to easily create step charts through examining the multilingualism of pop group WayV" +categories: ["tutorials", "graphs"] +author: J (a.k.a. WayV Subs & Translations) +displayInList: true +draft: false + +resources: +- name: featuredImage + src: "Image1.png" + params: + showOnTop: false + +--- + +![](Image1.png) + +# Introduction + +Code-switching is the practice of alternating between two or more languages in the context of a single conversation, either consciously or unconsciously. As someone who grew up bilingual and is currently learning other languages, I find code-switching a fascinating facet of communication from not only a purely linguistic perspective, but also a social one. In particular, I've personally found that code-switching often helps build a sense of community and familiarity in a group and that the unique ways in which speakers code-switch with each other greatly contribute to shaping group dynamics. + +This is something that's evident in seven-member pop boy group WayV. 
Aside from their discography, artistry, and group chemistry, WayV is well-known among fans and many non-fans alike for their multilingualism and code-switching, which many fans have affectionately coined as "WayV language." Every member in the group is fluent in both Mandarin and Korean, and at least one member in the group is fluent in one or more of the following: English, Cantonese, Thai, Wenzhounese, and German. It's an impressive trait that's become a trademark of WayV as they've quickly drawn a global audience since their debut in January 2019. Their multilingualism is reflected in their music as well. On top of their regular album releases in Mandarin, WayV has also released singles in Korean and English, with their latest single "Bad Alive (English Ver.)" being a mix of English, Korean, and Mandarin. + +As an independent translator who translates WayV content into English, I've become keenly aware of the true extent and rate of WayV's code-switching when communicating with each other. In a lot of their content, WayV frequently switches between three or more languages every couple of seconds, a phenomenon that can make translating quite challenging at times, but also extremely rewarding and fun. I wanted to be able to present this aspect of WayV in a way that would both highlight their linguistic skills and present this dimension of their group dynamic in a more concrete, quantitative, and visually intuitive manner, beyond just stating that "they code-switch a lot." This prompted me to make step charts - perfect for displaying data that changes at irregular intervals but remains constant between the changes - in hopes of enriching the viewer's experience and helping make a potentially abstract concept more understandable and readily consumable. With a step chart, it becomes more apparent to the viewer the extent of how a group communicates, and cross-sections of the graph allow a rudimentary look into how multilinguals influence each other in code-switching. 
+
+# Tutorial
+This tutorial on creating step charts uses one of WayV's livestreams as an example. There were four members in this livestream and a total of eight languages/dialects spoken. I will go through the basic steps of creating a step chart that depicts the frequency of code-switching for just one member. A full code chunk that shows how to layer two or more step chart lines in one graph to depict code-switching for multiple members can be found near the end.
+
+## Dataset
+First, we import the required libraries and load the data into a Pandas dataframe.
+
+    import pandas as pd
+    import matplotlib.pyplot as plt
+    import seaborn as sns
+
+This dataset includes the timestamp of every switch (in seconds) and the language of switch for one speaker.
+
+    df_h = pd.read_csv("WayVHendery.csv")
+    HENDERY = df_h.reset_index()
+    HENDERY.head()
+
+
+| index | time | lang |
+| ---- |----|----|
+| 0 | 2 | ENG |
+| 1 | 3 | KOR |
+| 2 | 10 | ENG |
+| 3 | 13 | MAND|
+| 4 | 15 | ENG |
+
+
+## Plotting
+With the dataset loaded, we can now set up our graph in terms of determining the size of the figure, dpi, font size, and axes limits. We can also play around with the aesthetics, such as modifying the colors of our plot. These few simple steps easily transform the default all-white graph into a more visually appealing one.
+ +### Without Customization + fig, ax = plt.subplots(figsize = (20,12)) + +![](fig1.png) + +### With Customization + + sns.set(rc={'axes.facecolor':'aliceblue', 'figure.facecolor':'c'}) + fig, ax = plt.subplots(figsize = (20,12), dpi = 300) + + plt.xlabel("Duration of Instagram Live (seconds)", fontsize = 18) + plt.ylabel("Cumulative Number of Times of Code-Switching", fontsize = 18) + + plt.xlim(0, 570) + plt.ylim(0, 85) + +![](fig2.png) + + + + +Following this, we can make our step chart line easily with matplotlib.pyplot.step, in which we plot the x and y values and determine the text of the legend, color of the step chart line, and width of the step chart line. + + ax.step(HENDERY.time, HENDERY.index, label = "HENDERY", color = "palevioletred", linewidth = 4) + +![](fig3.png) + +## Labeling +Of course, we want to know not only how many switches there were and when they occurred, but also to what language the member switched. For this, we can write a for loop that labels each switch with its respective language as recorded in our dataset. + + for x,y,z in zip(HENDERY["time"], HENDERY["index"], HENDERY["lang"]): + label = z + ax.annotate(label, #text + (x,y), #label coordinate + textcoords = "offset points", #how to position text + xytext = (15,-5), #distance from text to coordinate (x,y) + ha = "center", #alignment + fontsize = 8.5) #font size of text + +![](fig4.png) + +## Final Touches +Now add a title, save the graph, and there you have it! + + plt.title("WayV Livestream Code-Switching", fontsize = 35) + + fig.savefig("wayv_codeswitching.png", bbox_inches = "tight", facecolor = fig.get_facecolor()) + +Below is the complete code for layering step chart lines for multiple speakers in one graph. You can see how easy it is to take the code for visualizing the code-switching of one speaker and adapt it to visualizing that of multiple speakers. 
In addition, you can see that I've intentionally left the title blank so I can incorporate external graphic adjustments after I created the chart in Matplotlib, such as the addition of my social media handle and the use of a specific font I wanted, which you can see in the final graph. With visualizations being all about communicating information, I believe using Matplotlib in conjunction with simple elements of graphic design can be another way to make whatever you're presenting that little bit more effective and personal, especially when you're doing so on social media platforms. + +## Complete Code for Step Chart of Multiple Speakers + + + # Initialize graph color and size + sns.set(rc={'axes.facecolor':'aliceblue', 'figure.facecolor':'c'}) + + fig, ax = plt.subplots(figsize = (20,12), dpi = 120) + + # Set up axes and labels + plt.xlabel("Duration of Instagram Live (seconds)", fontsize = 18) + plt.ylabel("Cumulative Number of Times of Code-Switching", fontsize = 18) + + plt.xlim(0, 570) + plt.ylim(0, 85) + + # Layer step charts for each speaker + ax.step(YANGYANG.time, YANGYANG.index, label = "YANGYANG", color = "firebrick", linewidth = 4) + ax.step(HENDERY.time, HENDERY.index, label = "HENDERY", color = "palevioletred", linewidth = 4) + ax.step(TEN.time, TEN.index, label = "TEN", color = "mediumpurple", linewidth = 4) + ax.step(KUN.time, KUN.index, label = "KUN", color = "mediumblue", linewidth = 4) + + # Add legend + ax.legend(fontsize = 17) + + # Label each data point with the language switch + for i in (KUN, TEN, HENDERY, YANGYANG): #for each dataset + for x,y,z in zip(i["time"], i["index"], i["lang"]): #looping within the dataset + label = z + ax.annotate(label, #text + (x,y), #label coordinate + textcoords = "offset points", #how to position text + xytext = (15,-5), #distance from text to coordinate (x,y) + ha = "center", #alignment + fontsize = 8.5) #font size of text + + # Add title (blank to leave room for external graphics) + plt.title("\n\n", fontsize 
= 35) + + # Save figure + fig.savefig("wayv_codeswitching.png", bbox_inches = "tight", facecolor = fig.get_facecolor()) + +![](Image1.png) +Languages/dialects: Korean (KOR), English (ENG), Mandarin (MAND), German (GER), Cantonese (CANT), Hokkien (HOKK), Teochew (TEO), Thai (THAI) + +186 total switches! That's approximately one code-switch in the group every 2.95 seconds. + +And voilà! There you have it: a brief guide on how to make step charts. While I utilized step charts here to visualize code-switching, you can use them to visualize whatever data you would like. Please feel free to contact me [here](https://twitter.com/WayVSubs2019) if you have any questions or comments. I hope you enjoyed this tutorial, and thank you so much for reading! \ No newline at end of file diff --git a/content/posts/codeswitching-visualization/Image1.png b/content/posts/codeswitching-visualization/Image1.png new file mode 100644 index 0000000..9329c0e Binary files /dev/null and b/content/posts/codeswitching-visualization/Image1.png differ diff --git a/content/posts/codeswitching-visualization/Image3.png b/content/posts/codeswitching-visualization/Image3.png new file mode 100644 index 0000000..9329c0e Binary files /dev/null and b/content/posts/codeswitching-visualization/Image3.png differ diff --git a/content/posts/codeswitching-visualization/fig1.png b/content/posts/codeswitching-visualization/fig1.png new file mode 100644 index 0000000..4fc9754 Binary files /dev/null and b/content/posts/codeswitching-visualization/fig1.png differ diff --git a/content/posts/codeswitching-visualization/fig2.png b/content/posts/codeswitching-visualization/fig2.png new file mode 100644 index 0000000..124f26e Binary files /dev/null and b/content/posts/codeswitching-visualization/fig2.png differ diff --git a/content/posts/codeswitching-visualization/fig3.png b/content/posts/codeswitching-visualization/fig3.png new file mode 100644 index 0000000..f4848f1 Binary files /dev/null and 
b/content/posts/codeswitching-visualization/fig3.png differ diff --git a/content/posts/codeswitching-visualization/fig4.png b/content/posts/codeswitching-visualization/fig4.png new file mode 100644 index 0000000..d5026b3 Binary files /dev/null and b/content/posts/codeswitching-visualization/fig4.png differ diff --git a/content/posts/codeswitching-visualization/fig5.png b/content/posts/codeswitching-visualization/fig5.png new file mode 100644 index 0000000..d0d5d5a Binary files /dev/null and b/content/posts/codeswitching-visualization/fig5.png differ diff --git a/content/posts/codeswitching-visualization/index.md b/content/posts/codeswitching-visualization/index.md new file mode 100644 index 0000000..5c91817 --- /dev/null +++ b/content/posts/codeswitching-visualization/index.md @@ -0,0 +1,153 @@ +--- +title: "Visualizing Code-Switching with Step Charts" +date: 2020-09-26T19:41:21-07:00 +description: "Learn how to easily create step charts through examining the multilingualism of pop group WayV" +categories: ["tutorials", "graphs"] +author: J (a.k.a. WayV Subs & Translations) +displayInList: true +draft: false + +resources: +- name: featuredImage + src: "Image1.png" + params: + showOnTop: false + +--- + +![](Image1.png) + +# Introduction + +Code-switching is the practice of alternating between two or more languages in the context of a single conversation, either consciously or unconsciously. As someone who grew up bilingual and is currently learning other languages, I find code-switching a fascinating facet of communication from not only a purely linguistic perspective, but also a social one. In particular, I've personally found that code-switching often helps build a sense of community and familiarity in a group and that the unique ways in which speakers code-switch with each other greatly contribute to shaping group dynamics. + +This is something that's evident in seven-member pop boy group WayV. 
Aside from their discography, artistry, and group chemistry, WayV is well-known among fans and many non-fans alike for their multilingualism and code-switching, which many fans have affectionately coined as "WayV language." Every member in the group is fluent in both Mandarin and Korean, and at least one member in the group is fluent in one or more of the following: English, Cantonese, Thai, Wenzhounese, and German. It's an impressive trait that's become a trademark of WayV as they've quickly drawn a global audience since their debut in January 2019. Their multilingualism is reflected in their music as well. On top of their regular album releases in Mandarin, WayV has also released singles in Korean and English, with their latest single "Bad Alive (English Ver.)" being a mix of English, Korean, and Mandarin. + +As an independent translator who translates WayV content into English, I've become keenly aware of the true extent and rate of WayV's code-switching when communicating with each other. In a lot of their content, WayV frequently switches between three or more languages every couple of seconds, a phenomenon that can make translating quite challenging at times, but also extremely rewarding and fun. I wanted to be able to present this aspect of WayV in a way that would both highlight their linguistic skills and present this dimension of their group dynamic in a more concrete, quantitative, and visually intuitive manner, beyond just stating that "they code-switch a lot." This prompted me to make step charts - perfect for displaying data that changes at irregular intervals but remains constant between the changes - in hopes of enriching the viewer's experience and helping make a potentially abstract concept more understandable and readily consumable. With a step chart, it becomes more apparent to the viewer the extent of how a group communicates, and cross-sections of the graph allow a rudimentary look into how multilinguals influence each other in code-switching. 
+ +# Tutorial +This tutorial on creating step charts uses one of WayV's livestreams as an example. There were four members in this livestream and a total of eight languages/dialects spoken. I will go through the basic steps of creating a step chart that depicts the frequency of code-switching for just one member. A full code chunk that shows how to layer two or more step chart lines in one graph to depict code-switching for multiple members can be found near the end. + +## Dataset +First, we import the required libraries and load the data into a Pandas dataframe. + + import pandas as pd + import matplotlib.pyplot as plt + import seaborn as sns + +This dataset includes the timestamp of every switch (in seconds) and the language of switch for one speaker. + + df_h = pd.read_csv("WayVHendery.csv") + HENDERY = df_h.reset_index() + HENDERY.head() + + +| index | time | lang | +| ---- |----|----| +| 0 | 2 | ENG | +| 1 | 3 | KOR | +| 2 | 10 | ENG | +| 3 | 13 | MAND| +| 4 | 15 | ENG | + + +## Plotting +With the dataset loaded, we can now set up our graph in terms of determining the size of the figure, dpi, font size, and axes limits. We can also play around with the aesthetics, such as modifying the colors of our plot. These few simple steps easily transform the default all-white graph into a more visually appealing one. 
+ +### Without Customization + fig, ax = plt.subplots(figsize = (20,12)) + +![](fig1.png) + +### With Customization + + sns.set(rc={'axes.facecolor':'aliceblue', 'figure.facecolor':'c'}) + fig, ax = plt.subplots(figsize = (20,12), dpi = 300) + + plt.xlabel("Duration of Instagram Live (seconds)", fontsize = 18) + plt.ylabel("Cumulative Number of Times of Code-Switching", fontsize = 18) + + plt.xlim(0, 570) + plt.ylim(0, 85) + +![](fig2.png) + + + + +Following this, we can make our step chart line easily with matplotlib.pyplot.step, in which we plot the x and y values and determine the text of the legend, color of the step chart line, and width of the step chart line. + + ax.step(HENDERY.time, HENDERY.index, label = "HENDERY", color = "palevioletred", linewidth = 4) + +![](fig3.png) + +## Labeling +Of course, we want to know not only how many switches there were and when they occurred, but also to what language the member switched. For this, we can write a for loop that labels each switch with its respective language as recorded in our dataset. + + for x,y,z in zip(HENDERY["time"], HENDERY["index"], HENDERY["lang"]): + label = z + ax.annotate(label, #text + (x,y), #label coordinate + textcoords = "offset points", #how to position text + xytext = (15,-5), #distance from text to coordinate (x,y) + ha = "center", #alignment + fontsize = 8.5) #font size of text + +![](fig4.png) + +## Final Touches +Now add a title, save the graph, and there you have it! + + plt.title("WayV Livestream Code-Switching", fontsize = 35) + + fig.savefig("wayv_codeswitching.png", bbox_inches = "tight", facecolor = fig.get_facecolor()) + +Below is the complete code for layering step chart lines for multiple speakers in one graph. You can see how easy it is to take the code for visualizing the code-switching of one speaker and adapt it to visualizing that of multiple speakers. 
In addition, you can see that I've intentionally left the title blank so I can incorporate external graphic adjustments after I created the chart in Matplotlib, such as the addition of my social media handle and the use of a specific font I wanted, which you can see in the final graph. With visualizations being all about communicating information, I believe using Matplotlib in conjunction with simple elements of graphic design can be another way to make whatever you're presenting that little bit more effective and personal, especially when you're doing so on social media platforms. + +## Complete Code for Step Chart of Multiple Speakers + + + # Initialize graph color and size + sns.set(rc={'axes.facecolor':'aliceblue', 'figure.facecolor':'c'}) + + fig, ax = plt.subplots(figsize = (20,12), dpi = 120) + + # Set up axes and labels + plt.xlabel("Duration of Instagram Live (seconds)", fontsize = 18) + plt.ylabel("Cumulative Number of Times of Code-Switching", fontsize = 18) + + plt.xlim(0, 570) + plt.ylim(0, 85) + + # Layer step charts for each speaker + ax.step(YANGYANG.time, YANGYANG.index, label = "YANGYANG", color = "firebrick", linewidth = 4) + ax.step(HENDERY.time, HENDERY.index, label = "HENDERY", color = "palevioletred", linewidth = 4) + ax.step(TEN.time, TEN.index, label = "TEN", color = "mediumpurple", linewidth = 4) + ax.step(KUN.time, KUN.index, label = "KUN", color = "mediumblue", linewidth = 4) + + # Add legend + ax.legend(fontsize = 17) + + # Label each data point with the language switch + for i in (KUN, TEN, HENDERY, YANGYANG): #for each dataset + for x,y,z in zip(i["time"], i["index"], i["lang"]): #looping within the dataset + label = z + ax.annotate(label, #text + (x,y), #label coordinate + textcoords = "offset points", #how to position text + xytext = (15,-5), #distance from text to coordinate (x,y) + ha = "center", #alignment + fontsize = 8.5) #font size of text + + # Add title (blank to leave room for external graphics) + plt.title("\n\n", fontsize 
= 35) + + # Save figure + fig.savefig("wayv_codeswitching.png", bbox_inches = "tight", facecolor = fig.get_facecolor()) + +![](Image1.png) +Languages/dialects: Korean (KOR), English (ENG), Mandarin (MAND), German (GER), Cantonese (CANT), Hokkien (HOKK), Teochew (TEO), Thai (THAI) + +186 total switches! That's approximately one code-switch in the group every 2.95 seconds. + +And voilà! There you have it: a brief guide on how to make step charts. While I utilized step charts here to visualize code-switching, you can use them to visualize whatever data you would like. Please feel free to contact me [here](https://twitter.com/WayVSubs2019) if you have any questions or comments. I hope you enjoyed this tutorial, and thank you so much for reading! diff --git a/content/posts/elementary-cellular-automata/ca-bar.png b/content/posts/elementary-cellular-automata/ca-bar.png new file mode 100644 index 0000000..a608eee Binary files /dev/null and b/content/posts/elementary-cellular-automata/ca-bar.png differ diff --git a/content/posts/elementary-cellular-automata/ca-thumb.png b/content/posts/elementary-cellular-automata/ca-thumb.png new file mode 100644 index 0000000..2fa60dc Binary files /dev/null and b/content/posts/elementary-cellular-automata/ca-thumb.png differ diff --git a/content/posts/elementary-cellular-automata/index.md b/content/posts/elementary-cellular-automata/index.md new file mode 100644 index 0000000..9d528a5 --- /dev/null +++ b/content/posts/elementary-cellular-automata/index.md @@ -0,0 +1,294 @@ +--- +title: "Elementary Cellular Automata" +date: 2020-07-14T15:48:23-04:00 +draft: false +description: "A brief tour through the world of elementary cellular automata" +categories: ["tutorials"] +displayInList: true +author: Eitan Lees +resources: +- name: featuredImage + src: "ca-thumb.png" + params: + description: "Rule 110" + showOnTop: false +--- + +[Cellular automata](https://en.wikipedia.org/wiki/Cellular_automaton) are discrete models, typically on a grid, 
which evolve in time. Each grid cell takes one of a finite set of states, such as 0 or 1, which is updated based on a certain set of rules. A specific cell uses information of the surrounding cells, called its _neighborhood_, to determine what changes should be made. In general, cellular automata can be defined in any number of dimensions. A famous two-dimensional example is [Conway's Game of Life](https://en.wikipedia.org/wiki/Conway%27s_Game_of_Life) in which cells "live" and "die", sometimes producing beautiful patterns.
+
+
+In this post we will be looking at a one-dimensional example known as [elementary cellular automaton](https://en.wikipedia.org/wiki/Elementary_cellular_automaton), popularized by [Stephen Wolfram](https://en.wikipedia.org/wiki/Stephen_Wolfram) in the 1980s.
+
+![](./ca-bar.png)
+
+Imagine a row of cells, arranged side by side, each of which is colored black or white. We label black cells 1 and white cells 0, resulting in an array of bits. As an example, let's consider a random array of 20 bits.
+
+
+```python
+import numpy as np
+
+rng = np.random.RandomState(42)
+data = rng.randint(0, 2, 20)
+
+print(data)
+```
+
+    [0 1 0 0 0 1 0 0 0 1 0 0 0 0 1 0 1 1 1 0]
+
+
+To update the state of our cellular automaton we will need to define a set of rules.
+A given cell \\(C\\) only knows about the state of its left and right neighbors, labeled \\(L\\) and \\(R\\) respectively. We can define a function or rule, \\(f(L, C, R)\\), which maps the three cell states to either 0 or 1.
+
+Since our input cells are binary values there are \\(2^3=8\\) possible inputs into the function.
+
+
+```python
+for i in range(8):
+    print(np.binary_repr(i, 3))
+```
+
+    000
+    001
+    010
+    011
+    100
+    101
+    110
+    111
+
+
+For each input triplet, we can assign 0 or 1 to the output. The output of \\(f\\) is the value which will replace the current cell \\(C\\) in the next time step. In total there are \\(2^{2^3} = 2^8 = 256\\) possible rules for updating a cell.
Stephen Wolfram introduced a naming convention, now known as the [Wolfram Code](https://en.wikipedia.org/wiki/Wolfram_code), for the update rules in which each rule is represented by an 8-bit binary number.
+
+For example, "Rule 30" can be constructed by first converting the number to binary and then building an array with one entry per bit:
+
+
+```python
+rule_number = 30
+rule_string = np.binary_repr(rule_number, 8)
+rule = np.array([int(bit) for bit in rule_string])
+print(rule)
+```
+
+    [0 0 0 1 1 1 1 0]
+
+
+By convention the Wolfram code associates the leading bit with '111' and the final bit with '000'. For rule 30 the relationship between the input, rule index and output is as follows:
+
+
+```python
+for i in range(8):
+    triplet = np.binary_repr(i, 3)
+    print(f"input:{triplet}, index:{7-i}, output {rule[7-i]}")
+```
+
+    input:000, index:7, output 0
+    input:001, index:6, output 1
+    input:010, index:5, output 1
+    input:011, index:4, output 1
+    input:100, index:3, output 1
+    input:101, index:2, output 0
+    input:110, index:1, output 0
+    input:111, index:0, output 0
+
+
+We can define a function which maps the input cell information to the associated rule index. Essentially we are converting the binary input to decimal and adjusting the index range.
+
+
+```python
+def rule_index(triplet):
+    L, C, R = triplet
+    index = 7 - (4*L + 2*C + R)
+    return int(index)
+```
+
+Now we can take in any input and look up the output based on our rule, for example:
+
+
+```python
+rule[rule_index((1, 0, 1))]
+```
+
+
+
+
+    0
+
+
+
+Finally, we can use NumPy to create a data structure containing all the triplets for our state array and apply the function across the appropriate axis to determine our new state.
+ + +```python +all_triplets = np.stack([ + np.roll(data, 1), + data, + np.roll(data, -1)] +) +new_data = rule[np.apply_along_axis(rule_index, 0, all_triplets)] +print(new_data) +``` + + [1 1 1 0 1 1 1 0 1 1 1 0 0 1 1 0 1 0 0 1] + + +That is the process for a single update of our cellular automata. + +To do many updates and record the state over time, we will create a function. + + +```python +def CA_run(initial_state, n_steps, rule_number): + rule_string = np.binary_repr(rule_number, 8) + rule = np.array([int(bit) for bit in rule_string]) + + m_cells = len(initial_state) + CA_run = np.zeros((n_steps, m_cells)) + CA_run[0, :] = initial_state + + for step in range(1, n_steps): + all_triplets = np.stack( + [ + np.roll(CA_run[step - 1, :], 1), + CA_run[step - 1, :], + np.roll(CA_run[step - 1, :], -1), + ] + ) + CA_run[step, :] = rule[np.apply_along_axis(rule_index, 0, all_triplets)] + + return CA_run +``` + + +```python +initial = np.array([0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0]) +data = CA_run(initial, 10, 30) +print(data) +``` + + [[0. 1. 0. 0. 0. 1. 0. 0. 0. 1. 0. 0. 0. 0. 1. 0. 1. 1. 1. 0.] + [1. 1. 1. 0. 1. 1. 1. 0. 1. 1. 1. 0. 0. 1. 1. 0. 1. 0. 0. 1.] + [0. 0. 0. 0. 1. 0. 0. 0. 1. 0. 0. 1. 1. 1. 0. 0. 1. 1. 1. 1.] + [1. 0. 0. 1. 1. 1. 0. 1. 1. 1. 1. 1. 0. 0. 1. 1. 1. 0. 0. 0.] + [1. 1. 1. 1. 0. 0. 0. 1. 0. 0. 0. 0. 1. 1. 1. 0. 0. 1. 0. 1.] + [0. 0. 0. 0. 1. 0. 1. 1. 1. 0. 0. 1. 1. 0. 0. 1. 1. 1. 0. 1.] + [1. 0. 0. 1. 1. 0. 1. 0. 0. 1. 1. 1. 0. 1. 1. 1. 0. 0. 0. 1.] + [0. 1. 1. 1. 0. 0. 1. 1. 1. 1. 0. 0. 0. 1. 0. 0. 1. 0. 1. 1.] + [0. 1. 0. 0. 1. 1. 1. 0. 0. 0. 1. 0. 1. 1. 1. 1. 1. 0. 1. 0.] + [1. 1. 1. 1. 1. 0. 0. 1. 0. 1. 1. 0. 1. 0. 0. 0. 0. 0. 1. 1.]] + + +## Let's Get Visual + +For larger simulations, interesting patterns start to emerge. To visualize our simulation results we will use the `ax.matshow` function. 
+ + +```python +import matplotlib.pyplot as plt +plt.rcParams['image.cmap'] = 'binary' + +rng = np.random.RandomState(0) +data = CA_run(rng.randint(0, 2, 300), 150, 30) + +fig, ax = plt.subplots(figsize=(16, 9)) +ax.matshow(data) +ax.axis(False); +``` + + +![png](output_18_0.png) + + +## Learning the Rules + +With the code set up to produce the simulation, we can now start to explore the properties of these different rules. Wolfram separated the rules into four classes which are outlined below. + + +```python +def plot_CA_class(rule_list, class_label): + rng = np.random.RandomState(seed=0) + fig, axs = plt.subplots(1, len(rule_list), figsize=(10, 3.5), constrained_layout=True) + initial = rng.randint(0, 2, 100) + + for i, ax in enumerate(axs.ravel()): + data = CA_run(initial, 100, rule_list[i]) + ax.set_title(f'Rule {rule_list[i]}') + ax.matshow(data) + ax.axis(False) + + fig.suptitle(class_label, fontsize=16) + + return fig, axs +``` + +### Class One +Cellular automata which rapidly converge to a uniform state. + + +```python +_ = plot_CA_class([4, 32, 172], 'Class One') +``` + + +![png](output_22_0.png) + + +### Class Two +Cellular automata which rapidly converge to a repetitive or stable state. + + +```python +_ = plot_CA_class([50, 108, 173], 'Class Two') +``` + + +![png](output_24_0.png) + + +### Class Three + +Cellular automata which appear to remain in a random state. + + +```python +_ = plot_CA_class([60, 106, 150], 'Class Three') +``` + + +![png](output_26_0.png) + + +### Class Four + +Cellular automata which form areas of repetitive or stable states, but also form structures that interact with each other in complicated ways. + + +```python +_ = plot_CA_class([54, 62, 110], 'Class Four') +``` + + +![png](output_28_0.png) + + +Amazingly, the interacting structures which emerge from rule 110 have been shown to be capable of [universal computation](https://en.wikipedia.org/wiki/Turing_machine). 
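
As an aside, the single-step update used earlier in this post can also be sketched without `np.apply_along_axis`, by computing every rule index at once with array arithmetic. This is only a minimal sketch under stated assumptions, not the post's implementation: `ca_step` is a hypothetical helper name, and it assumes a 1D array of 0/1 values with periodic (wrap-around) boundaries, matching the `np.roll` calls used in the post.

```python
import numpy as np


def ca_step(state, rule_number):
    """Advance a 1D binary state by one step (hypothetical helper, periodic boundaries)."""
    state = np.asarray(state, dtype=int)
    # Wolfram code: the leading bit is the output for '111', the final bit for '000'
    rule = np.array([int(bit) for bit in np.binary_repr(rule_number, 8)])
    left = np.roll(state, 1)    # left neighbor of every cell
    right = np.roll(state, -1)  # right neighbor of every cell
    # same mapping as rule_index in the post, applied to all cells at once
    index = 7 - (4 * left + 2 * state + right)
    return rule[index]


initial = np.array([0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0])
new_state = ca_step(initial, 30)
print(new_state)
# [1 1 1 0 1 1 1 0 1 1 1 0 0 1 1 0 1 0 0 1]
```

For Rule 30 and the 20-bit example state, this reproduces the first update printed earlier in the post. Because the state is cast to integers, the result can be used directly as an index on the next step.
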
+ +In all the examples above a random initial state was used, but another interesting case is when a single 1 is initialized with all other values set to zero. + + +```python +initial = np.zeros(300) +initial[300//2] = 1 +data = CA_run(initial, 150, 30) + +fig, ax = plt.subplots(figsize=(10, 5)) +ax.matshow(data) +ax.axis(False); +``` + + +![png](output_31_0.png) + + +For certain rules, the emergent structures interact in chaotic and interesting ways. + +I hope you enjoyed this brief look into the world of elementary cellular automata, and are inspired to make some pretty pictures of your own. diff --git a/content/posts/elementary-cellular-automata/output_18_0.png b/content/posts/elementary-cellular-automata/output_18_0.png new file mode 100644 index 0000000..e5e8d7b Binary files /dev/null and b/content/posts/elementary-cellular-automata/output_18_0.png differ diff --git a/content/posts/elementary-cellular-automata/output_22_0.png b/content/posts/elementary-cellular-automata/output_22_0.png new file mode 100644 index 0000000..c68f7c1 Binary files /dev/null and b/content/posts/elementary-cellular-automata/output_22_0.png differ diff --git a/content/posts/elementary-cellular-automata/output_24_0.png b/content/posts/elementary-cellular-automata/output_24_0.png new file mode 100644 index 0000000..484f196 Binary files /dev/null and b/content/posts/elementary-cellular-automata/output_24_0.png differ diff --git a/content/posts/elementary-cellular-automata/output_26_0.png b/content/posts/elementary-cellular-automata/output_26_0.png new file mode 100644 index 0000000..31c02d9 Binary files /dev/null and b/content/posts/elementary-cellular-automata/output_26_0.png differ diff --git a/content/posts/elementary-cellular-automata/output_28_0.png b/content/posts/elementary-cellular-automata/output_28_0.png new file mode 100644 index 0000000..e400a19 Binary files /dev/null and b/content/posts/elementary-cellular-automata/output_28_0.png differ diff --git 
a/content/posts/elementary-cellular-automata/output_31_0.png b/content/posts/elementary-cellular-automata/output_31_0.png new file mode 100644 index 0000000..8707dbc Binary files /dev/null and b/content/posts/elementary-cellular-automata/output_31_0.png differ diff --git a/content/posts/gsod-developing-matplotlib-entry-paths/index.md b/content/posts/gsod-developing-matplotlib-entry-paths/index.md new file mode 100644 index 0000000..3294ffe --- /dev/null +++ b/content/posts/gsod-developing-matplotlib-entry-paths/index.md @@ -0,0 +1,61 @@ +--- +title: "GSoD: Developing Matplotlib Entry Paths" +date: 2020-12-08T08:16:42-08:00 +draft: false +description: "This is my first post contribution to Matplotlib." +categories: ["GSoD"] +displayInList: true +author: Jerome Villegas +--- + +# Introduction + +This year’s Google Season of Docs (GSoD) provided me the opportunity to work with the open source organization, Matplotlib. In early summer, I submitted my proposal of Developing Matplotlib Entry Paths with the goal of improving the documentation with an alternative approach to writing. + +I had set out to identify with users more by providing real world contexts to examples and programming. My purpose was to lower the barrier of entry for others to begin using the Python library with an expository approach. I focused on aligning with users based on consistent derived purposes and a foundation of task-based empathy. + +The project began during the community bonding phase with learning the fundamentals of building documentation and working with open source code. I later generated usability testing surveys to the community and consolidated findings. From these results, I developed two new documents for merging into the Matplotlib repository, a Getting Started introductory tutorial and a lean Style Guide for the documentation. 
+ +# Project Report + +Throughout this year’s Season of Docs with Matplotlib, I learned a great deal about working on open source projects, provided contributions of surveying communities and interviewing subject matter experts in documentation usability testing, and produced a comprehensive introductory guide for improving entry-level content with an initiative style guide section. + +As a new user to Git and GitHub, I had a learning curve in getting started with building documentation locally on my machine. Working with cloning repositories and familiarizing myself with commits and pull requests took the bulk of the first few weeks on this project. However, with experiencing errors and troubleshooting broken branches, it was excellent to be able to lean on my mentors for resolving these issues. Platforms like Gitter, Zoom, and HackMD were key in keeping communication timely and concise. I was fortunate to be able to get in touch with the team to help me as soon as I had problems. + +With programming, I was not a completely fresh face to Python and Matplotlib. However, installing the library from the source and breaking down functionality to core essentials helped me grow in my understanding of not only the fundamentals, but also the terminology. Tackling everything through my own experience of using Python and then also having suggestions and advice from the development team accelerated the ideas and implementations I aimed to work towards. + +New formats and standards with reStructuredText files and Sphinx compatibility were unfamiliar avenues to me at first. In building documentation and reading through already written content, I adapted to making the most of the features available with the ideas I had for writing material suited for users new to Matplotlib. Making use of tables and code examples embedded allowed me to be more flexible in visual layout and navigation. 
+ +During the beginning stages of the project, I was able to incorporate usability testing for the current documentation. By reaching out to communities on Twitter, Reddit, and various Slack channels, I compiled and consolidated findings that helped shape the language and focus of new content to create. I summarized and shared the community’s responses in addition to separate informational interviews conducted with subject matter experts in my location. These data points helped in justifying and supporting decisions for the scope and direction of the language and content. + +At the end of the project, I completed our agreed-upon expectations for the documentation. The focused goal consisted of a Getting Started tutorial to introduce and give context to Matplotlib for new users. In addition, through the documentation as well as the meetings with the community, we acknowledged a missing element of a Style Guide. Though a comprehensive document for the entire library was out of the scope of the project, I put together, in conjunction with the featured task, a lean version that serves as a foundational resource for writing Matplotlib documentation. + +The two sections are part of a current pull request to merge into Matplotlib’s repository. I have already worked through smaller changes to the content and am working with the community in moving forward with the process. + +# Conclusion + +This Season of Docs proposal began as a vision of ideals I hoped to share and work towards with an organization and has become a technical writing experience full of growth and camaraderie. I am pleased with the progress I have made and cannot thank the team enough for the leadership and mentorship they provided. It is fulfilling and rewarding to both appreciate and be appreciated within a team. + +In addition, the opportunity put together by the team at Google to foster collaboration among skilled contributors cannot be overstated. 
Highlighting the accomplishments of these new teams raises the bar for the open source community. + +# Details + +## Acknowledgements + +Special thanks to Emily Hsu, Joe McEwen, and Smriti Singh for their time and responses, fellow Matplotlib Season of Docs writer Bruno Beltran for his insight and guidance, and the Matplotlib development team mentors Tim, Tom, and Hannah for their patience, support, and approachability for helping a new technical writer like me with my own Getting Started. + +## External Links + +- [Getting Started GSoD Pull Request](https://github.com/matplotlib/matplotlib/pull/18873) +- [Matplotlib User Survey](https://docs.google.com/forms/d/e/1FAIpQLSfPX13wXNOV5LM4OoHUYT3xtSZzVQ6I3ZA4cvz5P6DKuph4aw/viewform?usp=sf_link) +- [User Survey Responses](https://docs.google.com/spreadsheets/d/1z_bAu7hG-IgtFkM5uPezkUHQvi6gsWKxoDnh0Hz1K5U/edit?usp=sharing) +- [User Survey Open Questions](https://docs.google.com/spreadsheets/d/15EzVNmWVn2SjCUBc-Kt5Y0_entLgvWRMRYy8syt_-Xg/edit?usp=sharing) +- [HackMD GSoD Meeting Agenda](https://hackmd.io/cSNb2JhrSo26zJGag3bvLg) + +## About Me + +My name is [Jerome Villegas](https://www.linkedin.com/in/jeromefuertevillegas/) and I'm a technical writer based in Seattle. I've been in education and education-adjacent fields for several years before transitioning to the industry of technical communication. My career has taken me to Taiwan to teach English and work in publishing, then to New York City to work in higher education, and back to Seattle where I worked at a private school. + +Since leaving my job, I've taken to supporting my family while studying technical writing at the University of Washington and supplementing the knowledge with learning programming on the side. Along with a former classmate, the two of us have worked with the UX writing community in the Pacific Northwest. We host interview sessions, moderate sessions at conferences, and generate content analyzing trends and patterns in UX/tech writing. 
+ +In telling people what I've got going on in my life, you can find work I've done at my [personal site](https://jeromefvillegas.wordpress.com) and see what we're up to at [shift J](https://teamshiftj.wordpress.com). Thanks for reading! \ No newline at end of file diff --git a/content/posts/how-to-contribute/index.md b/content/posts/how-to-contribute/index.md index f50d8b9..ee06082 100644 --- a/content/posts/how-to-contribute/index.md +++ b/content/posts/how-to-contribute/index.md @@ -16,12 +16,22 @@ resources: Matplotblog relies on your contributions to it. We want to showcase all the amazing projects that make use of Matplotlib. In this post, we will see which steps you have to follow to add a post to our blog. -To manage your contributions, we will use [Git pull requests](https://yangsu.github.io/pull-request-tutorial/). So, if you have not done it already, you first need to clone [our Git repository](https://github.com/matplotlib/matplotblog), by typing the following in a terminal window: +To manage your contributions, we will use [Git pull requests](https://yangsu.github.io/pull-request-tutorial/). So, if you have not done it already, you first need to fork and clone [our Git repository](https://github.com/matplotlib/matplotblog), by clicking on the Fork button on the top right corner of the Github page, and then type the following in a terminal window: ``` -git clone https://github.com/matplotlib/matplotblog.git +git clone git@github.com:[USERNAME]/matplotblog.git +``` +where [USERNAME] should be replaced by your Github username. You now have to make sure that if you reuse this forked repository, it is up to date with the main Matplotblog repository. To do so, type the following: +``` +git remote add upstream https://github.com/matplotlib/matplotblog.git +``` + +You should now create a new branch, which will contain your changes. 
First, check out the master branch: +``` +git checkout master +git merge upstream/master ``` -Then, you should create a new branch, which will contain your changes. +and then create a new branch and check it out: ``` cd matplotblog @@ -83,11 +93,18 @@ hugo server ``` Then open the browser and visit [http://localhost:1313/matplotblog](http://localhost:1313/matplotblog) to make sure your post appears in the homepage. If you spot errors or something that you want to tune, go back to your index.md file and modify it. -When your post is ready to go, you can add it to the repository, commit and push the changes to your branch: +When your post is ready to go, you can add it to your local repository, commit and push the changes to your branch: ``` git add content/posts/my-fancy-title git commit -m "Added new blog post" git push ``` -Finally, submit a pull request to have our admins review your contribution and merge it to the master repository. That is it folks! +Finally, submit a **pull request** to have our admins review your contribution and merge it to the master repository. To do so, type the following: +``` +git checkout post-my-fancy-title +git rebase master +``` +and then go to the page for your fork on GitHub, select your development branch, and click the pull request button. Your pull request will automatically track the changes on your development branch and update. Further info on the pull request process is available [here](https://docs.github.com/en/enterprise/2.16/user/github/collaborating-with-issues-and-pull-requests/creating-a-pull-request-from-a-fork). + +That is it folks! 
diff --git a/content/posts/how-to-create-custom-tables/0_example.png b/content/posts/how-to-create-custom-tables/0_example.png new file mode 100644 index 0000000..e3434de Binary files /dev/null and b/content/posts/how-to-create-custom-tables/0_example.png differ diff --git a/content/posts/how-to-create-custom-tables/1_coordinate_space.png b/content/posts/how-to-create-custom-tables/1_coordinate_space.png new file mode 100644 index 0000000..d96312a Binary files /dev/null and b/content/posts/how-to-create-custom-tables/1_coordinate_space.png differ diff --git a/content/posts/how-to-create-custom-tables/2_adding_data.png b/content/posts/how-to-create-custom-tables/2_adding_data.png new file mode 100644 index 0000000..07af6a5 Binary files /dev/null and b/content/posts/how-to-create-custom-tables/2_adding_data.png differ diff --git a/content/posts/how-to-create-custom-tables/3_headers.png b/content/posts/how-to-create-custom-tables/3_headers.png new file mode 100644 index 0000000..1ba7039 Binary files /dev/null and b/content/posts/how-to-create-custom-tables/3_headers.png differ diff --git a/content/posts/how-to-create-custom-tables/4_gridlines.png b/content/posts/how-to-create-custom-tables/4_gridlines.png new file mode 100644 index 0000000..c6f3a99 Binary files /dev/null and b/content/posts/how-to-create-custom-tables/4_gridlines.png differ diff --git a/content/posts/how-to-create-custom-tables/5_highlight_column.png b/content/posts/how-to-create-custom-tables/5_highlight_column.png new file mode 100644 index 0000000..f01d64b Binary files /dev/null and b/content/posts/how-to-create-custom-tables/5_highlight_column.png differ diff --git a/content/posts/how-to-create-custom-tables/6_hide_axis.png b/content/posts/how-to-create-custom-tables/6_hide_axis.png new file mode 100644 index 0000000..d0db672 Binary files /dev/null and b/content/posts/how-to-create-custom-tables/6_hide_axis.png differ diff --git a/content/posts/how-to-create-custom-tables/6_title.png 
b/content/posts/how-to-create-custom-tables/6_title.png new file mode 100644 index 0000000..36a15ee Binary files /dev/null and b/content/posts/how-to-create-custom-tables/6_title.png differ diff --git a/content/posts/how-to-create-custom-tables/7_floating_axes.png b/content/posts/how-to-create-custom-tables/7_floating_axes.png new file mode 100644 index 0000000..4500f3c Binary files /dev/null and b/content/posts/how-to-create-custom-tables/7_floating_axes.png differ diff --git a/content/posts/how-to-create-custom-tables/8_sparklines.png b/content/posts/how-to-create-custom-tables/8_sparklines.png new file mode 100644 index 0000000..4830d7a Binary files /dev/null and b/content/posts/how-to-create-custom-tables/8_sparklines.png differ diff --git a/content/posts/how-to-create-custom-tables/header.jpeg b/content/posts/how-to-create-custom-tables/header.jpeg new file mode 100644 index 0000000..e21ee70 Binary files /dev/null and b/content/posts/how-to-create-custom-tables/header.jpeg differ diff --git a/content/posts/how-to-create-custom-tables/index.md b/content/posts/how-to-create-custom-tables/index.md new file mode 100644 index 0000000..5fa935f --- /dev/null +++ b/content/posts/how-to-create-custom-tables/index.md @@ -0,0 +1,235 @@ +--- +title: "How to create custom tables" +date: 2022-03-11T11:10:06Z +draft: false +description: A tutorial on how to create custom tables in Matplotlib which allow for flexible design and customization. +categories: ["tutorials"] +displayInList: true +author: Tim Bayer +resources: +- name: featuredImage + src: "header.jpeg" + params: + description: "header pic" + showOnTop: true +--- + +# Introduction + +This tutorial will teach you how to create custom tables in Matplotlib, which are extremely flexible in terms of the design and layout. You’ll hopefully see that the code is very straightforward! In fact, the main methods we will be using are `ax.text()` and `ax.plot()`. 
+ +I want to give a lot of credit to [Todd Whitehead](https://twitter.com/CrumpledJumper) who has created these types of tables for various Basketball teams and players. His approach to tables is nothing short of fantastic due to the simplicity in design and how he manages to effectively communicate data to his audience. I was very much inspired by his approach and wanted to be able to achieve something similar in Matplotlib. + +Before I begin with the tutorial, I wanted to go through the logic behind my approach as I think it's valuable and transferable to other visualizations (and tools!). + +With that, I would like you to **think of tables as highly structured and organized scatterplots**. Let me explain why: for me, scatterplots are the most fundamental chart type (regardless of tool). + +![Scatterplots](scatterplots.png) + +For example `ax.plot()` automatically "connects the dots" to form a line chart or `ax.bar()` automatically "draws rectangles" across a set of coordinates. Very often (again regardless of tool) we may not always see this process happening. The point is, it is useful to think of any chart as a scatterplot or simply as a collection of shapes based on xy coordinates. This logic / thought process can unlock a ton of *custom* charts as the only thing you need are the coordinates (which can be mathematically computed). + +With that in mind, we can move on to tables! So rather than plotting rectangles or circles we want to plot text and gridlines in a highly organized manner. + +We will aim to create a table like this, which I have posted on Twitter [here](https://twitter.com/TimBayer93/status/1476926897850359809). Note, the only elements added outside of Matplotlib are the fancy arrows and their descriptions. + +![Example](0_example.png) + + +# Creating a custom table + +Importing required libraries. 
+ +```python +import matplotlib as mpl +import matplotlib.patches as patches +from matplotlib import pyplot as plt +``` + +First, we will need to set up a coordinate space - I like two approaches: +1. working with the standard Matplotlib 0-1 scale (on both the x- and y-axis) or +2. an index system based on row / column numbers (this is what I will use here) + +I want to create a coordinate space for a table containing 6 columns and 10 rows - this means (similar to pandas row/column indices) each row will have an index between 0-9 and each column will have an index between 0-6 (this is technically 1 more column than what we defined but one of the columns with a lot of text will span two column “indices”) + +```python +# first, we'll create a new figure and axis object +fig, ax = plt.subplots(figsize=(8,6)) + +# set the number of rows and cols for our table +rows = 10 +cols = 6 + +# create a coordinate system based on the number of rows/columns +# adding a bit of padding on bottom (-1), top (1), right (0.5) +ax.set_ylim(-1, rows + 1) +ax.set_xlim(0, cols + .5) +``` + +![Empty Coordinate Space](1_coordinate_space.png) + +Now, the data we want to plot is sports (football) data. We have information about 10 players and some values against a number of different metrics (which will form our columns) such as goals, shots, passes etc. 
+ +```python +# sample data +data = [ + {'id': 'player10', 'shots': 1, 'passes': 79, 'goals': 0, 'assists': 1}, + {'id': 'player9', 'shots': 2, 'passes': 72, 'goals': 0, 'assists': 1}, + {'id': 'player8', 'shots': 3, 'passes': 47, 'goals': 0, 'assists': 0}, + {'id': 'player7', 'shots': 4, 'passes': 99, 'goals': 0, 'assists': 5}, + {'id': 'player6', 'shots': 5, 'passes': 84, 'goals': 1, 'assists': 4}, + {'id': 'player5', 'shots': 6, 'passes': 56, 'goals': 2, 'assists': 0}, + {'id': 'player4', 'shots': 7, 'passes': 67, 'goals': 0, 'assists': 3}, + {'id': 'player3', 'shots': 8, 'passes': 91, 'goals': 1, 'assists': 1}, + {'id': 'player2', 'shots': 9, 'passes': 75, 'goals': 3, 'assists': 2}, + {'id': 'player1', 'shots': 10, 'passes': 70, 'goals': 4, 'assists': 0} +] +``` + +Next, we will start plotting the table (as a structured scatterplot). I did promise that the code will be very simple, less than 10 lines really, here it is: + + +```python +# from the sample data, each dict in the list represents one row +# each key in the dict represents a column +for row in range(rows): + # extract the row data from the list + d = data[row] + + # the y (row) coordinate is based on the row index (loop) + # the x (column) coordinate is defined based on the order I want to display the data in + + # player name column + ax.text(x=.5, y=row, s=d['id'], va='center', ha='left') + # shots column - this is my "main" column, hence bold text + ax.text(x=2, y=row, s=d['shots'], va='center', ha='right', weight='bold') + # passes column + ax.text(x=3, y=row, s=d['passes'], va='center', ha='right') + # goals column + ax.text(x=4, y=row, s=d['goals'], va='center', ha='right') + # assists column + ax.text(x=5, y=row, s=d['assists'], va='center', ha='right') +``` + +![Adding data](2_adding_data.png) + +As you can see, we are starting to get a basic wireframe of our table. Let's add column headers to further make this *scatterplot* look like a table. 
+ +```python +# Add column headers +# plot them at height y=9.75 to decrease the space to the +# first data row (you'll see why later) +ax.text(.5, 9.75, 'Player', weight='bold', ha='left') +ax.text(2, 9.75, 'Shots', weight='bold', ha='right') +ax.text(3, 9.75, 'Passes', weight='bold', ha='right') +ax.text(4, 9.75, 'Goals', weight='bold', ha='right') +ax.text(5, 9.75, 'Assists', weight='bold', ha='right') +ax.text(6, 9.75, 'Special\nColumn', weight='bold', ha='right', va='bottom') +``` + +![Adding Headers](3_headers.png) + + +# Formatting our table + +The rows and columns of our table are now done. The only thing that is left to do is formatting - much of this is personal choice. The following elements I think are generally useful when it comes to good table design (more research [here](https://www.storytellingwithdata.com/blog/2019/10/29/how-i-improved-the-table)): + +Gridlines: Some level of gridlines are useful (less is more). Generally some guidance to help the audience trace their eyes or fingers across the screen can be helpful (this way we can *group* items too by drawing gridlines around them). + +```python +for row in range(rows): + ax.plot( + [0, cols + 1], + [row -.5, row - .5], + ls=':', + lw='.5', + c='grey' + ) + +# add a main header divider +# remember that we plotted the header row slightly closer to the first data row +# this helps to visually separate the header row from the data rows +# each data row is 1 unit in height, thus bringing the header closer to our +# gridline gives it a distinctive difference. +ax.plot([0, cols + 1], [9.5, 9.5], lw='.5', c='black') +``` + +![Adding Gridlines](4_gridlines.png) + +Another important element for tables in my opinion is highlighting the *key* data points. We already bolded the values that are in the "Shots" column but we can further shade this column to give it further importance to our readers. 
+ +```python +# highlight the column we are sorting by +# using a rectangle patch +rect = patches.Rectangle( + (1.5, -.5), # bottom left starting position (x,y) + .65, # width + 10, # height + ec='none', + fc='grey', + alpha=.2, + zorder=-1 +) +ax.add_patch(rect) +``` + +![Highlight column](5_highlight_column.png) + +We're almost there. The magic piece is `ax.axis('off')`. This hides the axis, axis ticks, labels and everything “attached” to the axes, which means our table now looks like a clean table! + +```python +ax.axis('off') +``` + +![Hide axis](6_hide_axis.png) + +Adding a title is also straightforward. + +```python +ax.set_title( + 'A title for our table!', + loc='left', + fontsize=18, + weight='bold' +) +``` + +![Title](6_title.png) + +# Bonus: Adding special columns + +Finally, if you wish to add images, sparklines, or other custom shapes and patterns then we can do this too. + +To achieve this we will use `fig.add_axes()` to create a new set of floating axes based on the figure coordinates (this is different from our axes coordinate system!). + +Remember that figure coordinates by default are between 0 and 1. [0,0] is the bottom left corner of the entire figure. If you’re unfamiliar with the differences between a figure and axes then check out [Matplotlib's Anatomy of a Figure](https://matplotlib.org/stable/gallery/showcase/anatomy.html) for further details. + +```python +newaxes = [] +for row in range(rows): + # offset each new axes by a set amount depending on the row + # this is probably the most fiddly aspect (TODO: some neater way to automate this) + newaxes.append( + fig.add_axes([.75, .725 - (row*.063), .12, .06]) + ) +``` + +You can see below what these *floating* axes will look like (I say floating because they’re on top of our main axis object). The only tricky thing is figuring out the xy (figure) coordinates for these. + +These *floating* axes behave like any other Matplotlib axes. 
Therefore, we have access to the same methods such as ax.bar(), ax.plot(), patches, etc. Importantly, each axis has its own independent coordinate system. We can format them as we wish. + +![Floating axes](7_floating_axes.png) + +```python +# plot dummy data as a sparkline for illustration purposes +# you can plot _anything_ here, images, patches, etc. +newaxes[0].plot([0, 1, 2, 3], [1, 2, 0, 2], c='black') +newaxes[0].set_ylim(-1, 3) + +# once again, the key is to hide the axis! +newaxes[0].axis('off') +``` + +![Sparklines](8_sparklines.png) + +That’s it, custom tables in Matplotlib. I did promise very simple code and an ultra-flexible design in terms of what you want / need. You can adjust sizes, colors and pretty much anything with this approach and all you need is simply a loop that plots text in a structured and organized manner. I hope you found it useful. Link to a Google Colab notebook with the code is [here](https://colab.research.google.com/drive/1JshATKxjs7NWz2U8Oy6xOJaLgjldC1CW) + diff --git a/content/posts/how-to-create-custom-tables/scatterplots.png b/content/posts/how-to-create-custom-tables/scatterplots.png new file mode 100644 index 0000000..5e3da1e Binary files /dev/null and b/content/posts/how-to-create-custom-tables/scatterplots.png differ diff --git a/content/posts/ipcc-sr15/IPCC-SR15-cover.jpg b/content/posts/ipcc-sr15/IPCC-SR15-cover.jpg new file mode 100644 index 0000000..56d2092 Binary files /dev/null and b/content/posts/ipcc-sr15/IPCC-SR15-cover.jpg differ diff --git a/content/posts/ipcc-sr15/index.md b/content/posts/ipcc-sr15/index.md new file mode 100644 index 0000000..4d1df3f --- /dev/null +++ b/content/posts/ipcc-sr15/index.md @@ -0,0 +1,96 @@ +--- +title: "Figures in the IPCC Special Report on Global Warming of 1.5°C (SR15)" +date: 2020-12-31T08:32:45+01:00 +draft: false +description: | + Many figures in the IPCC SR15 were generated using Matplotlib. 
+ The data and open-source notebooks were published to increase the transparency and reproducibility of the analysis. +categories: ["academia", "tutorials"] +displayInList: true +author: Daniel Huppmann + +resources: +- name: featuredImage + src: "IPCC-SR15-cover.jpg" + params: + description: "Cover page of the IPCC SR15" + showOnTop: false + +--- + +## Background + +
+![Cover of the IPCC SR15](IPCC-SR15-cover.jpg)
+ +The IPCC's *Special Report on Global Warming of 1.5°C* (SR15), published in October 2018, +presented the latest research on anthropogenic climate change. +It was written in response to the 2015 UNFCCC's "Paris Agreement" of + +> holding the increase in the global average temperature to well below 2 °C +> above pre-industrial levels and to pursue efforts to limit the temperature increase to 1.5 °C [...]". + +cf. [Article 2.1.a of the Paris Agreement](https://unfccc.int/process-and-meetings/the-paris-agreement/the-paris-agreement) + +As part of the SR15 assessment, an ensemble of quantitative, model-based scenarios +was compiled to underpin the scientific analysis. +Many of the headline statements widely reported by media +are based on this scenario ensemble, including the finding that + +> global net anthropogenic CO2 emissions decline by ~45% from 2010 levels by 2030 + +in all pathways limiting global warming to 1.5°C +(cf. [statement C.1](https://www.ipcc.ch/sr15/chapter/spm/) in the *Summary For Policymakers*). + +## Open-source notebooks for transparency and reproducibility of the assessment + +When preparing the SR15, the authors wanted to go beyond previous reports +not just regarding the scientific rigor and scope of the analysis, +but also establish new standards in terms of openness, transparency and reproducibility. + +The scenario ensemble was made accessible via an interactive *IAMC 1.5°C Scenario Explorer* +([link](http://data.ene.iiasa.ac.at/iamc-1.5c-explorer/#/workspaces)) in line with the +[FAIR principles for scientific data management and stewardship](https://www.go-fair.org/fair-principles/). +The process for compiling, validating and analyzing the scenario ensemble +was described in an open-access manuscript published in *Nature Climate Change* +(doi: [10.1038/s41558-018-0317-4](https://doi.org/10.1038/s41558-018-0317-4)). 
+ +In addition, the Jupyter notebooks generating many of the headline statements, +tables and figures (using Matplotlib) were released under an open-source license +to facilitate a better understanding of the analysis +and enable reuse for subsequent research. +The notebooks are available in [rendered format](https://data.ene.iiasa.ac.at/sr15_scenario_analysis) +and on [GitHub](https://github.com/iiasa/ipcc_sr15_scenario_analysis). + +
+![Figure 2.4 of the IPCC SR15](sr15-fig2.4.png)
+
+*Figure 2.4 of the IPCC SR15, showing the range of assumptions of socio-economic drivers across the IAMC 1.5°C Scenario Ensemble. Drawn with Matplotlib; the source code is available in the notebooks linked above.*
+
+![Figure 2.15 of the IPCC SR15](sr15-fig2.15.png)
+
+*Figure 2.15 of the IPCC SR15, showing the primary energy development in illustrative pathways. Drawn with Matplotlib; the source code is available in the notebooks linked above.*
+ +## A package for scenario analysis & visualization + +To facilitate reusability of the scripts and plotting utilities +developed for the SR15 analysis, we started the open-source Python package **pyam** +as a toolbox for working with scenarios from integrated-assessment and energy system models. + +The package is a wrapper for [pandas](https://pandas.pydata.org) and Matplotlib +geared for several data formats commonly used in energy modelling. +[Read the docs!](https://pyam-iamc.readthedocs.io) + + diff --git a/content/posts/ipcc-sr15/pyam-header.png b/content/posts/ipcc-sr15/pyam-header.png new file mode 100644 index 0000000..e1a67a7 Binary files /dev/null and b/content/posts/ipcc-sr15/pyam-header.png differ diff --git a/content/posts/ipcc-sr15/sr15-fig2.15.png b/content/posts/ipcc-sr15/sr15-fig2.15.png new file mode 100644 index 0000000..1e52d6f Binary files /dev/null and b/content/posts/ipcc-sr15/sr15-fig2.15.png differ diff --git a/content/posts/ipcc-sr15/sr15-fig2.4.png b/content/posts/ipcc-sr15/sr15-fig2.4.png new file mode 100644 index 0000000..4634846 Binary files /dev/null and b/content/posts/ipcc-sr15/sr15-fig2.4.png differ diff --git a/content/posts/pyplot-vs-object-oriented-interface/index.md b/content/posts/pyplot-vs-object-oriented-interface/index.md index 5dc7dd2..290fc0f 100644 --- a/content/posts/pyplot-vs-object-oriented-interface/index.md +++ b/content/posts/pyplot-vs-object-oriented-interface/index.md @@ -60,7 +60,7 @@ This interface shares a lot of similarities in syntax and methodology with MATLA import matplotlib.pyplot as plt plt.figure(figsize=(9,7), dpi=100) -plt.plot(distance,'bo-') +plt.plot(time,distance,'bo-') plt.xlabel("Time") plt.ylabel("Distance") plt.legend(["Distance"]) @@ -76,7 +76,7 @@ The plot shows how much distance was covered by the free-falling object with eac ```python plt.figure(figsize=(9,7), dpi=100) -plt.plot(velocity,'go-') +plt.plot(time, velocity,'go-') plt.xlabel("Time") plt.ylabel("Velocity") 
plt.legend(["Velocity"]) @@ -94,8 +94,8 @@ Let's try to see what kind of plot we get when we plot both distance and velocit ```python plt.figure(figsize=(9,7), dpi=100) -plt.plot(velocity,'g-') -plt.plot(distance,'b-') +plt.plot(time, velocity,'g-') +plt.plot(time, distance,'b-') plt.ylabel("Distance and Velocity") plt.xlabel("Time") plt.legend(["Distance", "Velocity"]) diff --git a/content/posts/python-graph-gallery.com/.DS_Store b/content/posts/python-graph-gallery.com/.DS_Store new file mode 100644 index 0000000..5008ddf Binary files /dev/null and b/content/posts/python-graph-gallery.com/.DS_Store differ diff --git a/content/posts/python-graph-gallery.com/annotations.png b/content/posts/python-graph-gallery.com/annotations.png new file mode 100644 index 0000000..8c22959 Binary files /dev/null and b/content/posts/python-graph-gallery.com/annotations.png differ diff --git a/content/posts/python-graph-gallery.com/boxplot.png b/content/posts/python-graph-gallery.com/boxplot.png new file mode 100644 index 0000000..59e0051 Binary files /dev/null and b/content/posts/python-graph-gallery.com/boxplot.png differ diff --git a/content/posts/python-graph-gallery.com/home-page-overview.png b/content/posts/python-graph-gallery.com/home-page-overview.png new file mode 100644 index 0000000..6616f9b Binary files /dev/null and b/content/posts/python-graph-gallery.com/home-page-overview.png differ diff --git a/content/posts/python-graph-gallery.com/index.md b/content/posts/python-graph-gallery.com/index.md new file mode 100644 index 0000000..a902975 --- /dev/null +++ b/content/posts/python-graph-gallery.com/index.md @@ -0,0 +1,70 @@ +--- +title: "The Python Graph Gallery: hundreds of python charts with reproducible code." +date: 2021-07-24T14:06:57+02:00 +draft: false +description: "The Python Graph Gallery is a website that displays hundreds of chart examples made with python. 
It goes from very basic to highly customized examples and is based on common viz libraries like matplotlib, seaborn or plotly."
+categories: ["tutorials", "graphs"]
+displayInList: true
+author: Yan Holtz
+resources:
+- name: featuredImage
+  src: "home-page-overview.png"
+  params:
+    description: "An overview of the gallery homepage"
+    showOnTop: false
+---
+
+Data visualization is a key step in a data science pipeline. [Python](https://www.python.org) offers great possibilities when it comes to representing some data graphically, but it can be hard and time-consuming to create the appropriate chart.
+
+The [Python Graph Gallery](https://www.python-graph-gallery.com) is here to help. It displays many examples, always providing the reproducible code, so you can build the desired chart in minutes.
+
+# About 400 charts in 40 sections
+
+The gallery currently provides more than [400 chart examples](https://www.python-graph-gallery.com/all-charts/). Those examples are organized in 40 sections, one for each chart type: [scatterplot](https://www.python-graph-gallery.com/scatter-plot/), [boxplot](https://www.python-graph-gallery.com/boxplot/), [barplot](https://www.python-graph-gallery.com/barplot/), [treemap](https://www.python-graph-gallery.com/treemap/) and so on. Those chart types are organized in 7 big families as suggested by [data-to-viz.com](https://www.data-to-viz.com): one for each visualization purpose.
+
+It is important to note that not only the most common chart types are covered. Lesser-known charts like [chord diagrams](https://www.python-graph-gallery.com/chord-diagram/), [streamgraphs](https://www.python-graph-gallery.com/streamchart/) or [bubble maps](https://www.python-graph-gallery.com/bubble-map/) are also available.
+
+![overview of the python graph gallery sections](sections-overview.png)
+
+# Master the basics
+
+Each section always starts with some very basic examples. They show how to build a chart type in a few seconds.
Applying the same technique to another dataset should then be very quick.
+
+For instance, the [scatterplot section](https://www.python-graph-gallery.com/scatter-plot/) starts with this [matplotlib](https://matplotlib.org/) example. It shows how to create a dataset with [pandas](https://pandas.pydata.org/) and plot it with the `plot()` function. The main graph arguments, like `linestyle` and `marker`, are described to make sure the code is understandable.
+
+[_blogpost overview_:](https://www.python-graph-gallery.com/130-basic-matplotlib-scatterplot)
+
+![a basic scatterplot example](scatterplot-example.png)
+
+# Matplotlib customization
+
+The gallery uses several libraries like [seaborn](https://www.python-graph-gallery.com/seaborn/) or [plotly](https://www.python-graph-gallery.com/plotly/) to produce its charts, but is mainly focused on matplotlib. Matplotlib is very flexible and lets you build almost any kind of chart.
+
+A [whole page](https://www.python-graph-gallery.com/matplotlib/) is dedicated to matplotlib. It describes how to solve recurring issues like customizing [axes](https://www.python-graph-gallery.com/191-custom-axis-on-matplotlib-chart) or [titles](https://www.python-graph-gallery.com/190-custom-matplotlib-title), adding [annotations](https://www.python-graph-gallery.com/193-annotate-matplotlib-chart) (see below) or even using [custom fonts](https://www.python-graph-gallery.com/custom-fonts-in-matplotlib).
+
+![annotation examples](annotations.png)
+
+The gallery is also full of non-straightforward examples. For instance, it has a [tutorial](https://www.python-graph-gallery.com/streamchart-basic-matplotlib) explaining how to build a streamchart with matplotlib. It is based on the `stackplot()` function and adds some smoothing to it:
+
+![stream chart with python and matplotlib](streamchart.png)
+
+Last but not least, the gallery also displays some publication-ready charts.
They usually involve a lot of matplotlib code, but showcase the fine-grained control one has over a plot.
+
+Here is an example with a post inspired by [Tuo Wang](https://www.r-graph-gallery.com/web-violinplot-with-ggstatsplot.html)'s work for the tidyTuesday project. (The code, translated from R, is available [here](https://www.python-graph-gallery.com/web-ggbetweenstats-with-matplotlib).)
+
+![python violin and boxplot example](boxplot.png)
+
+
+# Contributing
+
+The python graph gallery is an ever-growing project. It is open-source, with all its related code hosted on [github](https://github.com/holtzy/The-Python-Graph-Gallery).
+
+Contributions to the gallery are very welcome. Each blogpost is just a jupyter notebook, so suggestions should be very easy to make through issues or pull requests!
+
+# Conclusion
+
+The [python graph gallery](https://www.python-graph-gallery.com) is a project developed by [Yan Holtz](https://www.yan-holtz.com) in his free time. It can help you improve your technical skills when it comes to visualizing data with python.
+
+The gallery belongs to an ecosystem of educational websites. [Data to viz](https://www.data-to-viz.com) describes best practices in data visualization, while the [R](https://www.r-graph-gallery.com), [python](https://www.python-graph-gallery.com) and [d3.js](https://www.d3-graph-gallery.com) graph galleries provide technical help to build charts with the 3 most common tools.
+
+For any question regarding the project, please say hi on twitter at [@R_Graph_Gallery](https://twitter.com/R_Graph_Gallery)!
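
To make the "Master the basics" idea concrete, here is a minimal sketch in the spirit of the gallery's basic scatterplot. It is not the gallery's own code: the dataset, the column names `x` and `y`, and the axis labels are made up for illustration. It only shows how `linestyle` and `marker` turn `plot()` into a scatterplot.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical dataset: the values below are invented for this sketch
df = pd.DataFrame({
    "x": range(1, 11),
    "y": [3, 5, 4, 8, 7, 9, 12, 11, 14, 13],
})

fig, ax = plt.subplots()

# plot() draws a connected line by default; linestyle="none" removes the
# line and marker="o" leaves one dot per observation, i.e. a scatterplot
(points,) = ax.plot(df["x"], df["y"], linestyle="none", marker="o")

ax.set_xlabel("x")
ax.set_ylabel("y")
```

Call `plt.show()` or `fig.savefig(...)` afterwards to see the result. Swapping `marker="o"` for another marker code, or restoring a `linestyle`, is a quick way to explore what each argument does.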
diff --git a/content/posts/python-graph-gallery.com/scatterplot-example.png b/content/posts/python-graph-gallery.com/scatterplot-example.png new file mode 100644 index 0000000..99d0869 Binary files /dev/null and b/content/posts/python-graph-gallery.com/scatterplot-example.png differ diff --git a/content/posts/python-graph-gallery.com/sections-overview.png b/content/posts/python-graph-gallery.com/sections-overview.png new file mode 100644 index 0000000..7a0da60 Binary files /dev/null and b/content/posts/python-graph-gallery.com/sections-overview.png differ diff --git a/content/posts/python-graph-gallery.com/streamchart.png b/content/posts/python-graph-gallery.com/streamchart.png new file mode 100644 index 0000000..1990a51 Binary files /dev/null and b/content/posts/python-graph-gallery.com/streamchart.png differ diff --git a/content/posts/stellar-chart-alternative-radar-chart/index.md b/content/posts/stellar-chart-alternative-radar-chart/index.md new file mode 100644 index 0000000..55e9cd2 --- /dev/null +++ b/content/posts/stellar-chart-alternative-radar-chart/index.md @@ -0,0 +1,182 @@ +--- +title: "Stellar Chart, a Type of Chart to Be on Your Radar" +date: 2021-01-10T20:29:40Z +draft: false +description: "Learn how to create a simple stellar chart, an alternative to the radar chart." +categories: ["tutorials"] +displayInList: true +author: João Palmeiro +resources: + - name: featuredImage + src: "stellar_chart.png" + params: + description: "example of a stellar chart" + showOnTop: false +--- + +In May 2020, Alexandre Morin-Chassé published a blog post about the **stellar chart**. This type of chart is an (approximately) direct alternative to the **radar chart** (also known as web, spider, star, or cobweb chart) — you can read more about this chart [here](https://medium.com/nightingale/the-stellar-chart-an-elegant-alternative-to-radar-charts-ae6a6931a28e). 
+ +![Comparison of a radar chart and a stellar chart](radar_stellar_chart.png) + +In this tutorial, we will see how we can create a quick-and-dirty stellar chart. First of all, let's get the necessary modules/libraries, as well as prepare a dummy dataset (with just a single record). + +```python +from itertools import chain, zip_longest +from math import ceil, pi + +import matplotlib.pyplot as plt + +data = [ + ("V1", 8), + ("V2", 10), + ("V3", 9), + ("V4", 12), + ("V5", 6), + ("V6", 14), + ("V7", 15), + ("V8", 25), +] +``` + +We will also need some helper functions, namely a function to round up to the nearest 10 (`round_up()`) and a function to join two sequences (`even_odd_merge()`). In the latter, the values of the first sequence (a list or a tuple, basically) will fill the even positions and the values of the second the odd ones. + +```python +def round_up(value): + """ + >>> round_up(25) + 30 + """ + return int(ceil(value / 10.0)) * 10 + + +def even_odd_merge(even, odd, filter_none=True): + """ + >>> list(even_odd_merge([1,3], [2,4])) + [1, 2, 3, 4] + """ + if filter_none: + return filter(None.__ne__, chain.from_iterable(zip_longest(even, odd))) + + return chain.from_iterable(zip_longest(even, odd)) +``` + +That said, to plot `data` on a stellar chart, we need to apply some transformations, as well as calculate some auxiliary values. So, let's start by creating a function (`prepare_angles()`) to calculate the angle of each axis on the chart (`N` corresponds to the number of variables to be plotted). + +```python +def prepare_angles(N): + angles = [n / N * 2 * pi for n in range(N)] + + # Repeat the first angle to close the circle + angles += angles[:1] + + return angles +``` + +Next, we need a function (`prepare_data()`) responsible for adjusting the original data (`data`) and separating it into several easy-to-use objects. 
+ +```python +def prepare_data(data): + labels = [d[0] for d in data] # Variable names + values = [d[1] for d in data] + + # Repeat the first value to close the circle + values += values[:1] + + N = len(labels) + angles = prepare_angles(N) + + return labels, values, angles, N +``` + +Lastly, for this specific type of chart, we require a function (`prepare_stellar_aux_data()`) that, from the previously calculated angles, prepares two lists of auxiliary values: a list of **intermediate angles** for each pair of angles (`stellar_angles`) and a list of small **constant values** (`stellar_values`), which will act as the values of the variables to be plotted in order to achieve the **star-like shape** intended for the stellar chart. + +```python +def prepare_stellar_aux_data(angles, ymax, N): + angle_midpoint = pi / N + + stellar_angles = [angle + angle_midpoint for angle in angles[:-1]] + stellar_values = [0.05 * ymax] * N + + return stellar_angles, stellar_values +``` + +At this point, we already have all the necessary _ingredients_ for the stellar chart, so let's move on to the Matplotlib side of this tutorial. In terms of **aesthetics**, we can rely on a function (`draw_peripherals()`) designed for this specific purpose (feel free to customize it!). 
+ +```python +def draw_peripherals(ax, labels, angles, ymax, outer_color, inner_color): + # X-axis + ax.set_xticks(angles[:-1]) + ax.set_xticklabels(labels, color=outer_color, size=8) + + # Y-axis + ax.set_yticks(range(10, ymax, 10)) + ax.set_yticklabels(range(10, ymax, 10), color=inner_color, size=7) + ax.set_ylim(0, ymax) + ax.set_rlabel_position(0) + + # Both axes + ax.set_axisbelow(True) + + # Boundary line + ax.spines["polar"].set_color(outer_color) + + # Grid lines + ax.xaxis.grid(True, color=inner_color, linestyle="-") + ax.yaxis.grid(True, color=inner_color, linestyle="-") +``` + +To **plot the data** and orchestrate (almost) all the steps necessary to have a stellar chart, we just need one last function: `draw_stellar()`. + +```python +def draw_stellar( + ax, + labels, + values, + angles, + N, + shape_color="tab:blue", + outer_color="slategrey", + inner_color="lightgrey", +): + # Limit the Y-axis according to the data to be plotted + ymax = round_up(max(values)) + + # Get the lists of angles and variable values + # with the necessary auxiliary values injected + stellar_angles, stellar_values = prepare_stellar_aux_data(angles, ymax, N) + all_angles = list(even_odd_merge(angles, stellar_angles)) + all_values = list(even_odd_merge(values, stellar_values)) + + # Apply the desired style to the figure elements + draw_peripherals(ax, labels, angles, ymax, outer_color, inner_color) + + # Draw (and fill) the star-shaped outer line/area + ax.plot( + all_angles, + all_values, + linewidth=1, + linestyle="solid", + solid_joinstyle="round", + color=shape_color, + ) + + ax.fill(all_angles, all_values, shape_color) + + # Add a small hole in the center of the chart + ax.plot(0, 0, marker="o", color="white", markersize=3) +``` + +Finally, let's get our chart on a _blank canvas_ (figure). + +```python +fig = plt.figure(dpi=100) +ax = fig.add_subplot(111, polar=True) # Don't forget the projection! 
+ +draw_stellar(ax, *prepare_data(data)) + +plt.show() +``` + +![Example of a stellar chart](stellar_chart.png) + +It's done! Right now, you have an example of a stellar chart and the boilerplate code to add this type of chart to your _repertoire_. If you end up creating your own stellar charts, feel free to share them with the _world_ (and [me](https://twitter.com/joaompalmeiro)!). I hope this tutorial was useful and interesting for you! diff --git a/content/posts/stellar-chart-alternative-radar-chart/radar_stellar_chart.png b/content/posts/stellar-chart-alternative-radar-chart/radar_stellar_chart.png new file mode 100644 index 0000000..6699cc7 Binary files /dev/null and b/content/posts/stellar-chart-alternative-radar-chart/radar_stellar_chart.png differ diff --git a/content/posts/stellar-chart-alternative-radar-chart/stellar_chart.png b/content/posts/stellar-chart-alternative-radar-chart/stellar_chart.png new file mode 100644 index 0000000..1f73871 Binary files /dev/null and b/content/posts/stellar-chart-alternative-radar-chart/stellar_chart.png differ diff --git a/content/posts/unc-biol222/fox.png b/content/posts/unc-biol222/fox.png new file mode 100644 index 0000000..5d8307d Binary files /dev/null and b/content/posts/unc-biol222/fox.png differ diff --git a/content/posts/unc-biol222/index.md b/content/posts/unc-biol222/index.md new file mode 100644 index 0000000..24640da --- /dev/null +++ b/content/posts/unc-biol222/index.md @@ -0,0 +1,218 @@ +--- +title: "Art from UNC BIOL222" +date: 2021-11-19T08:46:00-08:00 +draft: false +description: "UNC BIOL222: Art created with Matplotlib" +categories: ["art", "academia"] +displayInList: true +author: Joseph Lucas +resources: +- name: featuredImage + src: "fox.png" + params: + description: "Emily Foster's Fox" + showOnTop: true +--- + +As part of the University of North Carolina BIOL222 class, [Dr. Catherine Kehl](https://twitter.com/tylikcat) asked her students to "use `matplotlib.pyplot` to make art." 
BIOL222 is Introduction to Programming, aimed at students with no programming background. The emphasis is on practical, hands-on active learning.
+
+The students completed the assignment with festive enthusiasm around Halloween. Here are some great examples:
+
+Harris Davis showed an affinity for pumpkins, opting to go 3D!
+![3D Pumpkin](pumpkin.png)
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+# get library for 3d plotting (only needed on older Matplotlib versions)
+from mpl_toolkits.mplot3d import Axes3D
+
+# make a pumpkin :)
+rho = np.linspace(0, 3*np.pi,32)
+theta, phi = np.meshgrid(rho, rho)
+r, R = .5, .5
+X = (R + r * np.cos(phi)) * np.cos(theta)
+Y = (R + r * np.cos(phi)) * np.sin(theta)
+Z = r * np.sin(phi)
+
+# make the stem
+theta1 = np.linspace(0,2*np.pi,90)
+r1 = np.linspace(0,3,50)
+T1, R1 = np.meshgrid(theta1, r1)
+X1 = R1 * .5*np.sin(T1)
+Y1 = R1 * .5*np.cos(T1)
+Z1 = -(np.sqrt(X1**2 + Y1**2) - .7)
+Z1[Z1 < .3] = np.nan
+Z1[Z1 > .7] = np.nan
+
+# Display the pumpkin & stem
+fig = plt.figure()
+# fig.gca(projection='3d') was removed in recent Matplotlib; add_subplot works instead
+ax = fig.add_subplot(projection = '3d')
+ax.set_xlim3d(-1, 1)
+ax.set_ylim3d(-1, 1)
+ax.set_zlim3d(-1, 1)
+ax.plot_surface(X, Y, Z, color = 'tab:orange', rstride = 1, cstride = 1)
+ax.plot_surface(X1, Y1, Z1, color = 'tab:green', rstride = 1, cstride = 1)
+plt.show()
+```
+
+Bryce Desantis stuck to the biological theme and demonstrated [fractal](https://en.wikipedia.org/wiki/Fractal) art.
+![Bryce Fern](leaf.png)
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+
+#Barnsley's Fern - Fractal; en.wikipedia.org/wiki/Barnsley_…
+
+#functions for each part of fern:
+#stem
+def stem(x,y):
+    return (0, 0.16*y)
+#smaller leaflets
+def smallLeaf(x,y):
+    return (0.85*x + 0.04*y, -0.04*x + 0.85*y + 1.6)
+#large left leaflets
+def leftLarge(x,y):
+    return (0.2*x - 0.26*y, 0.23*x + 0.22*y + 1.6)
+#large right leaflets
+def rightLarge(x,y):
+    return (-0.15*x + 0.28*y, 0.26*x + 0.24*y + 0.44)
+componentFunctions = [stem, smallLeaf, leftLarge, rightLarge]
+
+# number of data points and frequencies for parts of fern generated:
+#lists with all 75000 datapoints
+datapoints = 75000
+x, y = 0, 0
+datapointsX = []
+datapointsY = []
+#For 75,000 datapoints
+for n in range(datapoints):
+    FrequencyFunction = np.random.choice(componentFunctions, p=[0.01, 0.85, 0.07, 0.07])
+    x, y = FrequencyFunction(x,y)
+    datapointsX.append(x)
+    datapointsY.append(y)
+
+#Scatter plot & scaled down to 0.1 to show more definition:
+plt.scatter(datapointsX,datapointsY,s=0.1, color='g')
+#Title of Figure
+plt.title("Barnsley's Fern - Assignment 3")
+#Changing background color (gca() reuses the axes that holds the scatter plot;
+#plt.axes() would create a new, empty axes on top of it in current Matplotlib)
+ax = plt.gca()
+ax.set_facecolor("#d8d7bf")
+```
+
+Grace Bell got a little trippy with this rotationally symmetric art. It's pretty cool how she captured mouse events. It reminds us of a flower. What do you see?
+![Rotations](rotations.png)
+```python
+import matplotlib.pyplot as plt
+from matplotlib.tri import Triangulation
+from matplotlib.patches import Polygon
+import numpy as np
+
+#I found this sample code online and manipulated it to make the art piece!
+#was interested in because it combined what we used for functions as well as what we used for plotting with (x,y) +def update_polygon(tri): + if tri == -1: + points = [0, 0, 0] + else: + points = triang.triangles[tri] + xs = triang.x[points] + ys = triang.y[points] + polygon.set_xy(np.column_stack([xs, ys])) + +def on_mouse_move(event): + if event.inaxes is None: + tri = -1 + else: + tri = trifinder(event.xdata, event.ydata) + update_polygon(tri) + ax.set_title(f'In triangle {tri}') + event.canvas.draw() +#this is the info that creates the angles +n_angles = 14 +n_radii = 7 +min_radius = 0.1 #the radius of the middle circle can move with this variable +radii = np.linspace(min_radius, 0.95, n_radii) +angles = np.linspace(0, 2 * np.pi, n_angles, endpoint=False) +angles = np.repeat(angles[..., np.newaxis], n_radii, axis=1) +angles[:, 1::2] += np.pi / n_angles +x = (radii*np.cos(angles)).flatten() +y = (radii*np.sin(angles)).flatten() +triang = Triangulation(x, y) +triang.set_mask(np.hypot(x[triang.triangles].mean(axis=1), + y[triang.triangles].mean(axis=1)) + < min_radius) + +trifinder = triang.get_trifinder() + +fig, ax = plt.subplots(subplot_kw={'aspect': 'equal'}) +ax.triplot(triang, 'y+-') #made the color of the plot yellow and there are "+" for the data points but you can't really see them because of the lines crossing +polygon = Polygon([[0, 0], [0, 0]], facecolor='y') +update_polygon(-1) +ax.add_patch(polygon) +fig.canvas.mpl_connect('motion_notify_event', on_mouse_move) +plt.show() +``` + +As a bonus, did you like that fox in the banner? That was created (and well documented) by Emily Foster! 
+```python +import numpy as np +import matplotlib.pyplot as plt + +plt.axis('off') + +#head +xhead = np.arange(-50,50,0.1) +yhead = -0.007*(xhead*xhead) + 100 + +plt.plot(xhead, yhead, 'darkorange') + +#outer ears +xearL = np.arange(-45.8,-9,0.1) +yearL = -0.08*(xearL*xearL) -4*xearL + 70 + +xearR = np.arange(9,45.8,0.1) +yearR = -0.08*(xearR*xearR) + 4*xearR + 70 + +plt.plot(xearL, yearL, 'black') +plt.plot(xearR, yearR, 'black') + +#inner ears +xinL = np.arange(-41.1,-13.7,0.1) +yinL = -0.08*(xinL*xinL) -4*xinL + 59 + +xinR = np.arange(13.7,41.1,0.1) +yinR = -0.08*(xinR*xinR) + 4*xinR + 59 + +plt.plot(xinL, yinL, 'salmon') +plt.plot(xinR, yinR, 'salmon') + +# bottom of face +xfaceL = np.arange(-49.6,-14,0.1) +xfaceR = np.arange(14,49.3,0.1) +xfaceM = np.arange(-14,14,0.1) + +plt.plot(xfaceL, abs(xfaceL), 'darkorange') +plt.plot(xfaceR, abs(xfaceR), 'darkorange') +plt.plot(xfaceM, abs(xfaceM), 'black') + +#nose +xnose = np.arange(-14,14,0.1) +ynose = -0.03*(xnose*xnose) + 20 + +plt.plot(xnose, ynose, 'black') + +#whiskers +xwhiskR = [50, 70, 55, 70, 55, 70, 49.3] +xwhiskL = [-50, -70, -55, -70, -55, -70, -49.3] +ywhisk = [82.6, 85, 70, 65, 60, 45, 49.3] + +plt.plot(xwhiskR, ywhisk, 'darkorange') +plt.plot(xwhiskL, ywhisk, 'darkorange') + +#eyes +plt.plot(20,60, color = 'black', marker = 'o', markersize = 15) +plt.plot(-20,60,color = 'black', marker = 'o', markersize = 15) + +plt.plot(22,62, color = 'white', marker = 'o', markersize = 6) +plt.plot(-18,62,color = 'white', marker = 'o', markersize = 6) +``` + +We look forward to seeing these students continue in their plotting and scientific adventures! 
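
For readers who want to experiment with Bryce's fern, here is one possible variation, offered as a sketch rather than as the student's method. It stores the four affine maps as rows of coefficients and draws all the random picks up front from a seeded generator, so that every run produces the same fern. The names `TRANSFORMS`, `WEIGHTS` and `fern_points` are made up for this sketch; the coefficients and probabilities are copied from the fern script above.

```python
import numpy as np

# Each row holds (a, b, c, d, e, f) for the affine map
#   x' = a*x + b*y + e
#   y' = c*x + d*y + f
TRANSFORMS = np.array([
    [0.00, 0.00, 0.00, 0.16, 0.0, 0.00],   # stem
    [0.85, 0.04, -0.04, 0.85, 0.0, 1.60],  # smaller leaflets
    [0.20, -0.26, 0.23, 0.22, 0.0, 1.60],  # large left leaflets
    [-0.15, 0.28, 0.26, 0.24, 0.0, 0.44],  # large right leaflets
])
WEIGHTS = [0.01, 0.85, 0.07, 0.07]


def fern_points(n_points, seed=0):
    """Return an (n_points, 2) array of fern coordinates."""
    rng = np.random.default_rng(seed)
    # Pick every transform index in one call instead of once per iteration
    picks = rng.choice(len(TRANSFORMS), size=n_points, p=WEIGHTS)

    points = np.empty((n_points, 2))
    x, y = 0.0, 0.0
    for i, k in enumerate(picks):
        a, b, c, d, e, f = TRANSFORMS[k]
        # Both new coordinates are computed from the previous (x, y)
        x, y = a * x + b * y + e, c * x + d * y + f
        points[i] = x, y
    return points


pts = fern_points(75_000)
```

The result can be drawn the same way as before, for example with `plt.scatter(pts[:, 0], pts[:, 1], s=0.1, color='g')`. Changing `seed` gives a different but equally fern-shaped cloud of points.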
\ No newline at end of file diff --git a/content/posts/unc-biol222/leaf.png b/content/posts/unc-biol222/leaf.png new file mode 100644 index 0000000..448b82d Binary files /dev/null and b/content/posts/unc-biol222/leaf.png differ diff --git a/content/posts/unc-biol222/pumpkin.png b/content/posts/unc-biol222/pumpkin.png new file mode 100644 index 0000000..76eeaf7 Binary files /dev/null and b/content/posts/unc-biol222/pumpkin.png differ diff --git a/content/posts/unc-biol222/rotations.png b/content/posts/unc-biol222/rotations.png new file mode 100644 index 0000000..dd9c045 Binary files /dev/null and b/content/posts/unc-biol222/rotations.png differ diff --git a/content/posts/visualising-usage-using-batteries/Liverpool.png b/content/posts/visualising-usage-using-batteries/Liverpool.png new file mode 100644 index 0000000..4114444 Binary files /dev/null and b/content/posts/visualising-usage-using-batteries/Liverpool.png differ diff --git a/content/posts/visualising-usage-using-batteries/Liverpool_Usage_Chart.png b/content/posts/visualising-usage-using-batteries/Liverpool_Usage_Chart.png new file mode 100644 index 0000000..b7586e4 Binary files /dev/null and b/content/posts/visualising-usage-using-batteries/Liverpool_Usage_Chart.png differ diff --git a/content/posts/visualising-usage-using-batteries/battery.png b/content/posts/visualising-usage-using-batteries/battery.png new file mode 100644 index 0000000..4a131c1 Binary files /dev/null and b/content/posts/visualising-usage-using-batteries/battery.png differ diff --git a/content/posts/visualising-usage-using-batteries/data.PNG b/content/posts/visualising-usage-using-batteries/data.PNG new file mode 100644 index 0000000..89f472e Binary files /dev/null and b/content/posts/visualising-usage-using-batteries/data.PNG differ diff --git a/content/posts/visualising-usage-using-batteries/head_data.PNG b/content/posts/visualising-usage-using-batteries/head_data.PNG new file mode 100644 index 0000000..3f6d6cb Binary files /dev/null 
and b/content/posts/visualising-usage-using-batteries/head_data.PNG differ diff --git a/content/posts/visualising-usage-using-batteries/index.md b/content/posts/visualising-usage-using-batteries/index.md new file mode 100644 index 0000000..b056466 --- /dev/null +++ b/content/posts/visualising-usage-using-batteries/index.md @@ -0,0 +1,220 @@ +--- +title: "Battery Charts - Visualise usage rates & more" +date: 2021-08-19T16:52:58+05:30 +draft: false +description: A tutorial on how to show usage rates and more using batteries +categories: ["tutorials"] +displayInList: true +author: Rithwik Rajendran +resources: +- name: featuredImage + src: "Liverpool_Usage_Chart.png" + params: + description: "my image description" + showOnTop: true + +--- + +# Introduction + +I have been creating common visualisations like scatter plots, bar charts, beeswarms etc. for a while and thought about doing something different. Since I'm an avid football fan, I thought of ideas to represent players' usage or involvement over a period (a season, a couple of seasons). I have seen some cool visualisations like donuts which depict usage and I wanted to make something different and simple to understand. I thought about representing batteries as a form of player usage and it made a lot of sense. + +For players who have been barely used (played fewer minutes) show a ***large amount of battery*** present since they have enough energy left in the tank. And for heavily used players, do the opposite i.e. show ***drained or less amount of battery*** + +So, what is the purpose of a battery chart? You can use it to show usage, consumption, involvement, fatigue etc. (anything usage related). + +The image below is a sample view of how a battery would look in our figure, although a single battery isn't exactly what we are going to recreate in this tutorial. 
+
+![A sample visualisation](battery.png)
+
+# Tutorial
+
+Before jumping into the tutorial, note that the function can be tweaked to suit the number of subplots or any other size parameter. To build the figure, we will follow these steps one by one:
+
+1. Outline what we are going to plot
+2. Import the necessary libraries
+3. Write a function to draw the battery
+    - This is the function that will be called to plot the battery chart
+4. Read the data and plot the chart accordingly
+    - We will demonstrate it with an example
+
+
+## Plot Outline
+
+What is our use case?
+
+- We are given a dataset containing Liverpool's players and the minutes they played in the last 2 seasons (for whichever club they played for in that period). We will use this data for our visualisation.
+- The final visualisation is the featured image of this blog post. We will go through how to create it step by step.
+
+## Importing Libraries
+
+The first step is to import the libraries whose functions we will rely on.
+
+```python
+import pandas as pd
+import matplotlib.pyplot as plt
+from matplotlib.path import Path
+from matplotlib.patches import FancyBboxPatch, PathPatch, Wedge
+```
+
+The classes imported from `matplotlib.path` and `matplotlib.patches` will be used to draw the lines, rounded boxes and wedges that make up the battery.
+
+## Drawing the Battery - A function
+
+The next part is to define a function named `draw_battery()`, which will be used to draw the battery. Later on, we will call this function with specific parameters to build the figure we need.
The code below builds the battery - + +```python +def draw_battery(fig, ax, percentage=0, bat_ec="grey", + tip_fc="none", tip_ec="grey", + bol_fc="#fdfdfd", bol_ec="grey", invert_perc=False): + ''' + Parameters + ---------- + fig : figure + The figure object for the plot + ax : axes + The axes/axis variable of the figure. + percentage : int, optional + This is the battery percentage - size of the fill. The default is 0. + bat_ec : str, optional + The edge color of the battery/cell. The default is "grey". + tip_fc : str, optional + The fill/face color of the tip of battery. The default is "none". + tip_ec : str, optional + The edge color of the tip of battery. The default is "grey". + bol_fc : str, optional + The fill/face color of the lightning bolt. The default is "#fdfdfd". + bol_ec : str, optional + The edge color of the lightning bolt. The default is "grey". + invert_perc : bool, optional + A flag to invert the percentage shown inside the battery. The default is False. + + Returns + ------- + None. 
+ + ''' + try: + fig.set_size_inches((15,15)) + ax.set(xlim=(0, 20), ylim=(0, 5)) + ax.axis("off") + if invert_perc: + percentage = 100 - percentage + # color options - #fc3d2e red & #53d069 green & #f5c54e yellow + bat_fc = "#fc3d2e" if percentage <= 20 else "#53d069" if percentage >= 80 else "#f5c54e" + + ''' + Static battery and tip of battery + ''' + battery = FancyBboxPatch((5, 2.1), 10, 0.8, + "round, pad=0.2, rounding_size=0.5", + fc="none", ec=bat_ec, fill=True, + ls="-", lw=1.5) + tip = Wedge((15.35, 2.5), 0.2, 270, 90, fc="none", + ec=bat_ec, fill=True, + ls="-", lw=3) + ax.add_artist(battery) + ax.add_artist(tip) + + ''' + Filling the battery cell with the data + ''' + filler = FancyBboxPatch((5.1, 2.13), (percentage/10)-0.2, 0.74, + "round, pad=0.2, rounding_size=0.5", + fc=bat_fc, ec=bat_fc, fill=True, + ls="-", lw=0) + ax.add_artist(filler) + + ''' + Adding a lightning bolt in the centre of the cell + ''' + verts = [ + (10.5, 3.1), #top + (8.5, 2.4), #left + (9.5, 2.4), #left mid + (9, 1.9), #bottom + (11, 2.6), #right + (10, 2.6), #right mid + (10.5, 3.1), #top + ] + + codes = [ + Path.MOVETO, + Path.LINETO, + Path.LINETO, + Path.LINETO, + Path.LINETO, + Path.LINETO, + Path.CLOSEPOLY, + ] + path = Path(verts, codes) + bolt = PathPatch(path, fc=bol_fc, + ec=bol_ec, lw=1.5) + ax.add_artist(bolt) + except Exception: + import traceback + print("EXCEPTION FOUND!!! SAFELY EXITING!!! Find the details below:") + traceback.print_exc() + +``` + +## Reading the Data + +Once we have written the function, we can put it to use. For that, we need to feed in the required data. In our example, we have a dataset that lists Liverpool players and the minutes they have played in the past two seasons. The data was collected from Football Reference aka FBRef. + +We use the `read_excel()` function from the pandas library to read our dataset, which is stored as an Excel file. 
+ +```python +data = pd.read_excel("Liverpool Minutes Played.xlsx") +``` + +Now, let us look at the data by listing the first five rows of our dataset - + +```python +data.head() +``` +![The first 5 rows of our dataset](head_data.PNG) + +## Plotting our data + +Now that everything is ready, we go ahead and plot the data. We have 25 players in our dataset, so a 5 x 5 figure is the one to go for. We'll also add some headers and set the colours accordingly. + +```python +fig, ax = plt.subplots(5, 5, figsize=(5, 5)) +facecolor = "#00001a" +fig.set_facecolor(facecolor) +fig.text(0.35, 0.95, "Liverpool: Player Usage/Involvement", color="white", size=18, fontname="Libre Baskerville", fontweight="bold") +fig.text(0.25, 0.92, "Data from 19/20 and 20/21 | Battery percentage indicate usage | less battery = played more/ more involved", color="white", size=12, fontname="Libre Baskerville") +``` + +We have now filled in the appropriate headers, figure size etc. The next step is to plot all the axes, i.e. a battery for each and every player. `p` is the variable used to iterate through the dataframe and fetch each player's data. The `draw_battery()` function call plots the battery. We also add the required labels along with that - player name and usage rate/percentage in this case. + +```python +p = 0 #The variable that'll iterate through each row of the dataframe (for every player) +for i in range(0, 5): + for j in range(0, 5): + ax[i, j].text(10, 4, str(data.iloc[p, 0]), color="white", size=14, fontname="Lora", va='center', ha='center') + ax[i, j].set_facecolor(facecolor) + draw_battery(fig, ax[i, j], round(data.iloc[p, 8]), invert_perc=True) + ''' + Add the battery percentage as text if a label is required + ''' + ax[i, j].text(5, 0.9, "Usage - "+ str(int(100 - round(data.iloc[p, 8]))) + "%", fontsize=12, color="white") + p += 1 +``` + +Now that almost everything is done, we add some final touches; this part is completely optional. 
Since the visualisation is focused on Liverpool players, I add Liverpool's logo and my watermark. Crediting the data source/provider is also good practice, so we do that as well before displaying the plot. + +```python +from PIL import Image # Pillow, used to load the logo +import numpy as np +liv = Image.open('Liverpool.png', 'r') +liv = liv.resize((80, 80)) +liv = np.array(liv).astype(float) / 255 # np.float was removed in NumPy 1.24 +fig.figimage(liv, 30, 890) +fig.text(0.11, 0.08, "viz: Rithwik Rajendran/@rithwikrajendra", color="lightgrey", size=14, fontname="Lora") +fig.text(0.8, 0.08, "data: FBRef/Statsbomb", color="lightgrey", size=14, fontname="Lora") +plt.show() +``` + +So, we have the plot below. You can customise the design as you want in the `draw_battery()` function - change size, colours, shapes etc. + +![Usage_Chart_Liverpool](Liverpool_Usage_Chart.png) diff --git a/make_logo.py b/make_logo.py index 1ba67a5..1f62aa7 100644 --- a/make_logo.py +++ b/make_logo.py @@ -1,9 +1,8 @@ import numpy as np -import matplotlib as mpl import matplotlib.pyplot as plt import matplotlib.cm as cm import matplotlib.font_manager -from matplotlib.patches import Circle, Rectangle, PathPatch +from matplotlib.patches import Rectangle, PathPatch from matplotlib.textpath import TextPath import matplotlib.transforms as mtrans @@ -131,6 +130,7 @@ def make_logo(height_px, lw_bars, lw_grid, lw_border, rgrid, with_text=False): return fig, ax + make_logo(height_px=110, lw_bars=0.7, lw_grid=0.5, lw_border=1, rgrid=[1, 3, 5, 7], with_text=True) plt.savefig("mpl_logo.png")