Exploring Microsoft PowerPoint AI, Using Python
Here’s how to replicate the PowerPoint AI using Machine Learning and Python
A couple of days ago I was working on a PowerPoint presentation for my PhD research
and this happened:
https://towardsdatascience.com/exploring-powerpoint-ai-using-python-75f94d55f8f4 2/18
1/25/22, 2:16 PM Exploring Microsoft PowerPoint AI, using Python | by Piero Paialunga | Jan, 2022 | Towards Data Science
It was not the exact same image; it was actually way more explicit, with the x label, y
label, title and all of that, but that is not really important right now.
The very interesting thing is the Alt Text. The AI system of PowerPoint is not only able to
detect that we actually have a 2d plot (or Chart) but it recognizes that we are talking
about a boxplot!
Of course, I don’t know exactly how they do this, but since I work with Machine Learning
and Data Science all the days of my life, I can try to take a guess. As readers may
know, the technology most widely used to classify images is the
Convolutional Neural Network (CNN).
They may have used CNNs as a multi-class classifier. Here is an example of a Butterfly
image classifier (more than 70 species/classes). A far more complicated thing that they
may have done is image captioning. Nonetheless, CNNs are very likely used in their deep
learning algorithm, at the very minimum as the basic building blocks of something much
larger and more complex.
In this very small example I will show how to build a Machine Learning
model that distinguishes boxplots from other kinds of plots, for example
lineplots.
Let’s do this.
0. The Libraries
These are the libraries that I used for this notebook:
import keras
import numpy as np
import warnings
warnings.filterwarnings("ignore")
import pandas as pd
import tensorflow as tf
import random
from os.path import isfile, join
import math
# The imports below are not visible in the extracted source, but the code that follows needs them.
import matplotlib.pyplot as plt   # plt is used for every plot
import seaborn as sns             # sns.boxplot draws the boxplots
# Assumption: r comes from the random_word package; the source never shows where r is defined.
from random_word import RandomWords
r = RandomWords()
plt.style.use('ggplot')
plt.rcParams['font.family'] = 'sans-serif'
plt.rcParams['font.serif'] = 'Ubuntu'
plt.rcParams['font.size'] = 14
plt.rcParams['axes.labelsize'] = 12
plt.rcParams['axes.labelweight'] = 'bold'
plt.rcParams['axes.titlesize'] = 12
plt.rcParams['xtick.labelsize'] = 12
plt.rcParams['ytick.labelsize'] = 12
plt.rcParams['legend.fontsize'] = 12
plt.rcParams['figure.titlesize'] = 12
plt.rcParams['image.cmap'] = 'jet'
plt.rcParams['image.interpolation'] = 'none'
1. Data Generation
The really fun part of this notebook is actually the data generation. I tried to build the
lineplots and boxplots in the most general way possible, making up the x and y
labels and creating different lines and boxplots, again, in the most general way
possible.
With this setup you can create a virtually infinite number and variety of plots. I
created two classes of data and performed a binary classification, but you can slightly
modify the code and create multiple classes.
def plot_line():
    plt.figure(figsize=(10,10))
    n_max_line = 10
    n_line = np.random.choice(np.arange(1,n_max_line))
    x_max,x_min = 10,-10
    x_lims = [np.random.choice(np.arange(x_min,0,0.1)  # [rest of line cut off in the source]
    x = np.linspace(min(x_lims),max(x_lims),100)
    k_min,k_max = -5,5
    for n in range(n_line):
        pick_degree = np.random.choice(np.arange(1,5,  # [rest of line cut off in the source]
        y = 0
        # [the loop that defines `degree` is not visible in the source]
            k_random = np.random.choice(np.linspace(k  # [rest of line cut off in the source]
            y=y+k_random*x**degree
        plt.plot(x,y)
    plt.xlabel(r.get_random_word(),fontsize=35)
    plt.ylabel(r.get_random_word(),fontsize=35)
    #plt.savefig(savename)
The y axis shows polynomials with a random degree and random
coefficients.
Here is an example:
plot_line()
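Since several lines of the cell above are cut off, here is a minimal, self-contained sketch of the same idea. It is not the original code: the name plot_line_sketch, the helper random_label (a stand-in for r.get_random_word()), the inner loop over degrees and the uniform sampling of the coefficients are all my assumptions.

```python
import string

import matplotlib
matplotlib.use("Agg")  # headless backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)


def random_label(length=6):
    """Hypothetical stand-in for r.get_random_word(): a random lowercase string."""
    return "".join(rng.choice(list(string.ascii_lowercase), size=length))


def plot_line_sketch(savename=None, n_max_line=10, max_degree=4):
    """Draw a random number of random polynomials; return how many lines were drawn."""
    fig = plt.figure(figsize=(10, 10))
    n_line = int(rng.integers(1, n_max_line))      # between 1 and n_max_line - 1
    x = np.linspace(rng.uniform(-10, 0), rng.uniform(0, 10), 100)
    for _ in range(n_line):
        top_degree = int(rng.integers(1, max_degree + 1))
        y = np.zeros_like(x)
        # Assumed inner loop: one random coefficient per power of x.
        for degree in range(1, top_degree + 1):
            y = y + rng.uniform(-5, 5) * x**degree
        plt.plot(x, y)
    plt.xlabel(random_label(), fontsize=35)
    plt.ylabel(random_label(), fontsize=35)
    if savename is not None:
        plt.savefig(savename)
    n_drawn = len(plt.gca().lines)
    plt.close(fig)                                 # free the figure when generating many plots
    return n_drawn


n_drawn = plot_line_sketch()
```

Passing a file name such as 'lineplot_1.png' as savename writes the image to disk; leaving it out only builds and discards the figure.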
def plot_box():
    n_max_box = 5
    n_box = np.random.choice(np.arange(1,n_max_box))
    x_max,x_min = 10,-10
    column_names = []
    sigma_s = np.arange(1,10,1)
    column_values = []
    plt.figure(figsize=(10,10))
    for n in range(n_box):
        x_lims = [np.random.choice(np.arange(x_min,0,  # [rest of line cut off in the source]
        x = np.linspace(min(x_lims),max(x_lims),100)
        column_name = r.get_random_word()
        column_names.append(column_name)
        pick_sigma = x.mean()/np.random.choice(sigma_s)
        pick_sigma = np.abs(pick_sigma)
        column_values.append(np.random.normal(x.mean(  # [rest of line cut off in the source]
    column_values = np.array(column_values).T
    data = pd.DataFrame(column_values,columns=column_names)
    sns.boxplot(data=data)
The y axis shows samples from Gaussian distributions with random
standard deviations.
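Here is a runnable sketch of the boxplot generator under a few assumptions of mine. To keep it dependency-light it uses matplotlib's own boxplot instead of pandas and seaborn, and plain col_0, col_1, ... names instead of random words, so it is a simplified variant and not the original cell. The function name plot_box_sketch and its parameters are hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)


def plot_box_sketch(savename=None, n_max_box=5, n_samples=100):
    """Draw 1 to n_max_box - 1 boxes of Gaussian samples; return the sampled data."""
    n_box = int(rng.integers(1, n_max_box))
    samples = []
    for _ in range(n_box):
        # Same recipe as the text: the mean is the centre of a random x range,
        # and sigma is |mean| divided by a random integer between 1 and 9.
        x = np.linspace(rng.uniform(-10, 0), rng.uniform(0, 10), 100)
        sigma = abs(x.mean()) / rng.integers(1, 10)
        samples.append(rng.normal(x.mean(), sigma, size=n_samples))
    fig, ax = plt.subplots(figsize=(10, 10))
    ax.boxplot(samples)
    ax.set_xticklabels([f"col_{i}" for i in range(n_box)])
    if savename is not None:
        fig.savefig(savename)
    plt.close(fig)
    return np.array(samples).T  # shape (n_samples, n_box), one column per box


data = plot_box_sketch()
```

The returned array has one column per box, which is the same layout the DataFrame in the text uses.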
plot_box()
def plot_line(savename):
    plt.figure(figsize=(10,10))
    n_max_line = 10
    n_line = np.random.choice(np.arange(1,n_max_line))
    x_max,x_min = 10,-10
    x_lims = [np.random.choice(np.arange(x_min,0,0.1)  # [rest of line cut off in the source]
    x = np.linspace(min(x_lims),max(x_lims),100)
    k_min,k_max = -5,5
    for n in range(n_line):
        pick_degree = np.random.choice(np.arange(1,5,  # [rest of line cut off in the source]
        y = 0
        # [the loop that defines `degree` is not visible in the source]
            k_random = np.random.choice(np.linspace(k  # [rest of line cut off in the source]
            y=y+k_random*x**degree
        plt.plot(x,y)
    plt.xlabel(r.get_random_word(),fontsize=35)
    plt.ylabel(r.get_random_word(),fontsize=35)
    plt.savefig(savename)
    #plt.show()
    plt.close()
def plot_box(savename):
    n_max_box = 5
    n_box = np.random.choice(np.arange(1,n_max_box))
    x_max,x_min = 10,-10
    column_names = []
    sigma_s = np.arange(1,10,1)
    column_values = []
    # [the rest of the function is cut off in the source]
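def build_training_set(num):
    mypath = 'TrainingSet/'
    for n in range(1,num+1):
        plot_box(mypath+'boxplot_'+str(n)+'.png')
        # Not visible in the source; restored by analogy with build_test_set.
        plot_line(mypath+'lineplot_'+str(n)+'.png')
def extract_training_set():
    mypath = 'TrainingSet/'
    training_set_labels = []
    # [lines missing in the source: a loop that defines `file` for each entry in mypath]
        split_file = file.split('.')
        if split_file[-1]=='png':
            training_set_labels.append(split_file[0].  # [rest of line cut off in the source]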
Once these functions are defined, you can build your dataset like this:
build_training_set(50)
2 instance has been started
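The label extraction step is only partly visible above, so here is a small sketch of how the class name can be read back from the file names. It assumes the boxplot_N.png / lineplot_N.png naming used when saving the plots, and that the label is everything before the first underscore; the function name labels_from_folder is hypothetical.

```python
import os
import tempfile


def labels_from_folder(folder):
    """Return (paths, labels) for every .png in folder, label = text before the first '_'."""
    paths, labels = [], []
    for name in sorted(os.listdir(folder)):
        stem, ext = os.path.splitext(name)
        if ext.lower() != ".png":
            continue                      # skip anything that is not an image
        paths.append(os.path.join(folder, name))
        labels.append(stem.split("_")[0])
    return paths, labels


# Tiny demonstration on empty placeholder files instead of real plots.
with tempfile.TemporaryDirectory() as folder:
    for name in ["boxplot_1.png", "lineplot_1.png", "notes.txt"]:
        open(os.path.join(folder, name), "w").close()
    paths, labels = labels_from_folder(folder)

print(labels)  # ['boxplot', 'lineplot']
```

The returned paths can then be passed to whatever image loader you prefer to build the X_train array.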
plt.figure(figsize=(32,32))
for i in range(9):
    plt.subplot(3,3,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    x=np.random.randint(len(X_train))
    plt.imshow(X_train[x], cmap=plt.cm.binary)
    plt.xlabel(labels_train[x], fontsize=60)
plt.show()
The exact same process has to be repeated for the test set, and the string labels have to be
converted into something more readable for an ML model (sklearn will do this for you with
the so-called LabelEncoder):
def build_test_set(num):
    mypath = 'TestSet/'
    for n in range(1,num+1):
        plot_box(mypath+'boxplot_'+str(n)+'.png')
        plot_line(mypath+'lineplot_'+str(n)+'.png')
build_test_set(10)
1 instance has been started
Boxplot 1 has been stored!
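For reference, this is roughly what the label conversion looks like with sklearn's LabelEncoder. The short label list is made up for illustration; in the notebook it would be the list of strings extracted from the file names.

```python
from sklearn.preprocessing import LabelEncoder

# Illustrative labels only; the real ones come from the file names.
labels_train = ["boxplot", "lineplot", "boxplot", "lineplot"]

le = LabelEncoder()
y_train = le.fit_transform(labels_train)   # classes are sorted alphabetically, then numbered

print(list(le.classes_))                   # ['boxplot', 'lineplot']
print(list(y_train))                       # [0, 1, 0, 1]
print(list(le.inverse_transform([1, 0])))  # ['lineplot', 'boxplot']
```

The fitted encoder should be reused with le.transform on the test labels, so that both sets share the same mapping.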
The model I used is the same as the one in this article I published, where you can find more
details about how the architecture actually works.
size = X_train[0].shape[0]
classifier = Sequential()
# Step 1 - Convolution
classifier.add(Flatten())
classifier.summary()
Model: "sequential_4"
______________________________________________________
Layer (type) Output Shape
======================================================
conv2d_7 (Conv2D) (None, 718, 718, 3)
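The 718 in the summary above is consistent with the usual output-size rule for a convolution without padding. As a quick check, assuming 720-pixel inputs and a 3x3 kernel with stride 1 (neither is stated in the text, they are my guess), the arithmetic is:

```python
def conv_output_size(n_in, kernel, stride=1, padding=0):
    """Spatial size after a convolution: floor((n_in + 2*padding - kernel) / stride) + 1."""
    return (n_in + 2 * padding - kernel) // stride + 1


print(conv_output_size(720, 3))             # 718, no padding ("valid")
print(conv_output_size(720, 3, padding=1))  # 720, one pixel of padding keeps the size
```

The last number in the output shape, 3, is the number of filters of that layer, not the number of colour channels of the input.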
train_images=np.array(train_images)
test_images=np.array(test_images)
Train_images=[]  # not visible in the source; needed before the append below
Test_images=[]
for i in range(len(train_images)):
    a=train_images[i].reshape(size,size,3)
    Train_images.append(a)
Train_images=np.array(Train_images)
for j in range(len(test_images)):
    b=test_images[j].reshape(size,size,3)
    Test_images.append(b)
Test_images=np.array(Test_images)
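The two loops above reshape one image at a time. If every image has the same number of pixels, the same result can be obtained with a single reshape. The sketch below uses tiny fake images (size = 8) just to show the shapes; the helper name to_batch is hypothetical.

```python
import numpy as np


def to_batch(images, size):
    """Stack equally sized images into one array of shape (n_images, size, size, 3)."""
    return np.stack(images).reshape(-1, size, size, 3)


size = 8
# Four fake flattened RGB images, standing in for the real plots.
flat_images = [np.arange(size * size * 3, dtype=np.float32) for _ in range(4)]

batch = to_batch(flat_images, size)
print(batch.shape)  # (4, 8, 8, 3)
```

The -1 lets NumPy infer the number of images, so the same helper works for the training and the test set.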
Epoch 1/10
100/100 [==============================] - 12s 122ms/step - loss: 15.4656 - accuracy: 0.5200 - val_loss: 13.8420 - val_accuracy: 0.5000
Epoch 2/10
And as we can see, the final result is perfect. Even though this may sound exciting, I have to
say that the task is pretty easy (we can all tell a plot with boxes from
a plot with lines) and the model is more than sufficiently powerful (a little bit of
overkill here).
3. Final Results
As a final proof that the model correctly distinguishes boxplots from lineplots, here are
some examples:
y_pred = classifier.predict(X_test).astype(int)
y_pred_string = le.inverse_transform(y_pred)
y_pred_string
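One thing to keep in mind about .astype(int) on the predictions above: it truncates toward zero, so it only gives the right class when the outputs are already saturated at 0 or 1. A slightly more defensive variant, assuming the network ends in a single sigmoid unit (the text does not show the last layer), thresholds at 0.5 and flattens the result before mapping it back to strings. The probabilities and class names below are made up for illustration.

```python
import numpy as np

# Made-up sigmoid outputs with shape (n_samples, 1), as a Keras binary classifier would return.
probs = np.array([[0.03], [0.97], [0.51], [0.49]])

truncated = probs.astype(int).ravel()            # everything below 1.0 becomes 0
thresholded = (probs > 0.5).astype(int).ravel()  # class 1 whenever the probability exceeds 0.5

class_names = np.array(["boxplot", "lineplot"])  # same order a LabelEncoder would produce
print(truncated.tolist())                        # [0, 0, 0, 0]
print(thresholded.tolist())                      # [0, 1, 1, 0]
print(class_names[thresholded].tolist())         # ['boxplot', 'lineplot', 'lineplot', 'boxplot']
```

The ravel() call matters as well, since a 1D array of class indices is what LabelEncoder.inverse_transform expects.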
plt.figure(figsize=(32,32))
for i in range(9):
    plt.subplot(3,3,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    x=np.random.randint(len(X_test))
    plt.imshow(X_test[x], cmap=plt.cm.binary)