Exploring Microsoft PowerPoint AI, Using Python

The document describes how to use machine learning and Python to replicate the image recognition capabilities of Microsoft PowerPoint. It first discusses how PowerPoint can recognize that an image contains a box plot and provide an alt text description. It then shows how to build a machine learning model using convolutional neural networks (CNNs) to classify images as either box plots or line plots. Code is provided to generate random box plot and line plot images for use as training data for the CNN model. The goal is to demonstrate how basic image recognition can be achieved in a similar way to what PowerPoint utilizes.


1/25/22, 2:16 PM Exploring Microsoft PowerPoint AI, using Python | by Piero Paialunga | Jan, 2022 | Towards Data Science


Photo by Tadas Sar on Unsplash

Exploring Microsoft PowerPoint AI, using Python

Here’s how to replicate the PowerPoint AI using Machine Learning and Python

https://towardsdatascience.com/exploring-powerpoint-ai-using-python-75f94d55f8f4

Piero Paialunga 23 hours ago · 5 min read

A couple of days ago I was working on a PowerPoint presentation for my PhD research
and this happened:

Screenshot made by me. SEE THE ALT TEXT!


It was not the exact same image; the real one was more detailed, with the x label, y
label, title and all of that, but that is not really important right now.

The very interesting thing is the Alt Text. The AI system of PowerPoint is not only able to
detect that we actually have a 2d plot (or Chart) but it recognizes that we are talking
about a boxplot!

Of course, I don’t know exactly how they do this, but as I work with Machine Learning
and Data Science every day I can try to take a guess. As readers may know, the
technology most widely used to classify images is the Convolutional Neural Network
(CNN).

They may have used CNNs as a multi-class classifier. Here is an example of a Butterfly
image classifier (more than 70 species/classes). A far more complicated thing that they
may have done is image captioning. Either way, CNNs are most likely part of their deep
learning algorithm, at the very minimum as basic building blocks of something much
larger and more complex.

In this very small example I will show how it is possible to build a Machine Learning
model that helps you distinguish boxplots from other kinds of plots, for example
lineplots.

Let’s do this.

0. The Libraries
These are the libraries that I used for this notebook:

import keras
import matplotlib.pyplot as plt
import numpy as np
import warnings
warnings.filterwarnings("ignore")
from matplotlib import image
import pandas as pd
import tensorflow as tf
from tensorflow.keras import datasets, layers, models
import random
from os import listdir
from os.path import isfile, join
import math
import seaborn as sns
from sklearn.preprocessing import LabelEncoder
from keras.models import Sequential
from keras.layers import Conv2D
from keras.layers import MaxPooling2D
from keras.layers import Flatten
from keras.layers import Dense
from keras.layers import Dropout

# Assumed: the RandomWords import and the `r` instance used by the plotting
# functions are not visible in the extracted listing; this follows the
# random-word package.
from random_word import RandomWords
r = RandomWords()

plt.style.use('ggplot')
plt.rcParams['font.family'] = 'sans-serif'
plt.rcParams['font.serif'] = 'Ubuntu'
plt.rcParams['font.monospace'] = 'Ubuntu Mono'
plt.rcParams['font.size'] = 14
plt.rcParams['axes.labelsize'] = 12
plt.rcParams['axes.labelweight'] = 'bold'
plt.rcParams['axes.titlesize'] = 12
plt.rcParams['xtick.labelsize'] = 12
plt.rcParams['ytick.labelsize'] = 12
plt.rcParams['legend.fontsize'] = 12
plt.rcParams['figure.titlesize'] = 12
plt.rcParams['image.cmap'] = 'jet'
plt.rcParams['image.interpolation'] = 'none'
plt.rcParams['figure.figsize'] = (12, 10)

In a few words, I used keras, matplotlib, and a curious library known as RandomWords
that generates random English words. I used it to make up the x and y axis labels.

1. Data Generation
The really fun part of this notebook is the data generation. I tried to build the
lineplots and boxplots in the most general way possible: the x and y labels are made
up, and the lines and boxes themselves are drawn at random.

With this setup you can create a virtually unlimited number and variety of plots. I
created two classes of data and performed a binary classification, but you can slightly
modify the code and create multiple classes.

Let’s dive in:


1.1 Line Plots


The code that I used to create the line plot is the following:

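def plot_line():
    plt.figure(figsize=(10,10))
    n_max_line = 10
    n_line = np.random.choice(np.arange(1,n_max_line))
    x_max,x_min = 10,-10
    # Cut off in the source after the first element; the upper limit drawn
    # from (0, x_max) is an assumed completion.
    x_lims = [np.random.choice(np.arange(x_min,0,0.1)),
              np.random.choice(np.arange(0,x_max,0.1))]
    x = np.linspace(min(x_lims),max(x_lims),100)
    k_min,k_max = -5,5
    for n in range(n_line):
        # Cut off in the source after "np.arange(1,5,"; a step of 1 is assumed.
        pick_degree = np.random.choice(np.arange(1,5,1))
        y = 0
        for degree in range(pick_degree):
            # Cut off in the source after "np.linspace(k"; the arguments are assumed.
            k_random = np.random.choice(np.linspace(k_min,k_max,100))
            y=y+k_random*x**degree
        plt.plot(x,y)
    plt.xlabel(r.get_random_word(),fontsize=35)
    plt.ylabel(r.get_random_word(),fontsize=35)
    #plt.savefig(savename)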

It has different degrees of randomness:

The x axis label and y axis label have random names

The x axis limits are random

The y axis shows polynomials with a random degree and random coefficient values

The number of lines is random as well
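
To make the polynomial bullet concrete: the inner loop of plot_line() adds one term k·x^d for every d below the picked value, so a pick of 3 gives a quadratic. The sketch below is an illustration rather than the notebook's code. The `random_polynomial` name, the seeded generator and the uniform coefficient draw are assumptions, and the result is checked against NumPy's own polynomial evaluation.

```python
import numpy as np
from numpy.polynomial import polynomial as P


def random_polynomial(x, n_terms, rng, k_min=-5.0, k_max=5.0):
    """Return (coeffs, y) where y = sum_d coeffs[d] * x**d for d < n_terms.

    Hypothetical helper: the name, the uniform coefficient draw and the
    seeded generator are assumptions made for this sketch.
    """
    coeffs = rng.uniform(k_min, k_max, size=n_terms)
    y = np.zeros_like(x, dtype=float)
    for degree, k in enumerate(coeffs):
        # Same accumulation as the inner loop of plot_line()
        y = y + k * x**degree
    return coeffs, y


rng = np.random.default_rng(0)
x = np.linspace(-3.0, 4.0, 100)
coeffs, y = random_polynomial(x, n_terms=3, rng=rng)

# polyval takes coefficients in increasing order of degree, like the loop
assert np.allclose(y, P.polyval(x, coeffs))
print("degree of the curve:", len(coeffs) - 1)
```

Passing the generator in explicitly keeps the example reproducible, which is handy when you want to regenerate exactly the same dataset.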

Here is an example:


plot_line()

1.2 Box Plots


The code that I used to create the box plot is the following:

def plot_box():
    n_max_box = 5
    n_box = np.random.choice(np.arange(1,n_max_box))
    x_max,x_min = 10,-10
    column_names = []
    sigma_s = np.arange(1,10,1)
    column_values = []
    plt.figure(figsize=(10,10))
    for n in range(n_box):
        # Cut off in the source after "np.arange(x_min,0,"; the step and the
        # upper limit are an assumed completion.
        x_lims = [np.random.choice(np.arange(x_min,0,0.1)),
                  np.random.choice(np.arange(0,x_max,0.1))]
        x = np.linspace(min(x_lims),max(x_lims),100)
        column_name = r.get_random_word()
        column_names.append(column_name)
        # Cut off in the source after "sigma_"; sigma_s is the only matching name.
        pick_sigma = x.mean()/np.random.choice(sigma_s)
        pick_sigma = np.abs(pick_sigma)
        # Cut off in the source after "x.mean("; the scale and the sample size are assumed.
        column_values.append(np.random.normal(x.mean(),pick_sigma,100))
    column_values = np.array(column_values).T
    # Cut off in the source after "columns=column_"; column_names is the only matching name.
    data = pd.DataFrame(column_values,columns=column_names)
    sns.boxplot(data=data)

Different degrees of randomness here as well:

The x axis label and y axis label have random names

The x axis limits are random

The names of the quantities on the x axis are random

The y axis shows samples from Gaussian distributions with random values of the
standard deviation

The number of boxplots is random as well
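
The sampling step behind the boxes can be looked at in isolation. The sketch below is an illustration under assumptions, not the notebook's code: `sample_box_columns`, the seeded generator, the range of the means and the sample size of 100 are made up for the example. Each column gets a standard deviation equal to the absolute value of its mean divided by a random integer from 1 to 9, mirroring the sigma_s array.

```python
import numpy as np


def sample_box_columns(n_box, rng, n_samples=100):
    """Return an (n_samples, n_box) array, one Gaussian column per box.

    Hypothetical helper; names and defaults are assumptions for this sketch.
    """
    sigma_divisors = np.arange(1, 10)          # same values as sigma_s
    columns = []
    for _ in range(n_box):
        mean = rng.uniform(-5.0, 5.0)          # assumed range for the centre
        sigma = abs(mean) / rng.choice(sigma_divisors)
        columns.append(rng.normal(mean, sigma, size=n_samples))
    return np.array(columns).T                 # samples in rows, boxes in columns


rng = np.random.default_rng(42)
values = sample_box_columns(n_box=3, rng=rng)
print(values.shape)                            # (100, 3)

# The quartiles are what each box in the plot would display
q1, median, q3 = np.percentile(values, [25, 50, 75], axis=0)
print(np.round(median, 2))
```

The transposed array has one column per box, which is the layout a DataFrame-based box plot call expects.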


plot_box()

1.3 Training Set and Test Set

The code that I used to build the training set and test set is slightly different from
the code above, which I used to show you the results: each function now takes a file
name, saves the figure there and closes it. Here is what you will need:

Here you create the plots:

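def plot_line(savename):
    plt.figure(figsize=(10,10))
    n_max_line = 10
    n_line = np.random.choice(np.arange(1,n_max_line))
    x_max,x_min = 10,-10
    # Several lines are cut off in the source; completions marked "assumed"
    # are not taken from the original.
    x_lims = [np.random.choice(np.arange(x_min,0,0.1)),
              np.random.choice(np.arange(0,x_max,0.1))]          # upper limit assumed
    x = np.linspace(min(x_lims),max(x_lims),100)
    k_min,k_max = -5,5
    for n in range(n_line):
        pick_degree = np.random.choice(np.arange(1,5,1))         # step assumed
        y = 0
        for degree in range(pick_degree):
            k_random = np.random.choice(np.linspace(k_min,k_max,100))  # arguments assumed
            y=y+k_random*x**degree
        plt.plot(x,y)
    plt.xlabel(r.get_random_word(),fontsize=35)
    plt.ylabel(r.get_random_word(),fontsize=35)
    plt.savefig(savename)
    #plt.show()
    plt.close()

def plot_box(savename):
    n_max_box = 5
    n_box = np.random.choice(np.arange(1,n_max_box))
    x_max,x_min = 10,-10
    column_names = []
    sigma_s = np.arange(1,10,1)
    column_values = []
    # The source listing is cut off here. The rest is assumed to mirror
    # plot_box() from section 1.2, plus the savefig/close calls of plot_line(savename).
    plt.figure(figsize=(10,10))
    for n in range(n_box):
        x_lims = [np.random.choice(np.arange(x_min,0,0.1)),
                  np.random.choice(np.arange(0,x_max,0.1))]
        x = np.linspace(min(x_lims),max(x_lims),100)
        column_names.append(r.get_random_word())
        pick_sigma = np.abs(x.mean()/np.random.choice(sigma_s))
        column_values.append(np.random.normal(x.mean(),pick_sigma,100))
    column_values = np.array(column_values).T
    data = pd.DataFrame(column_values,columns=column_names)
    sns.boxplot(data=data)
    plt.savefig(savename)
    plt.close()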


Here you create num boxplots and num lineplots and store them. CREATE THE TrainingSet
AND TestSet FOLDERS FIRST OR IT WON’T WORK!

def build_training_set(num):
    mypath = 'TrainingSet/'
    for n in range(1,num+1):
        print('%i instance has been started'%(n))
        plot_box(mypath+'boxplot_'+str(n)+'.png')
        print('Boxplot %i has been stored!'%(n))
        plot_line(mypath+'lineplot_'+str(n)+'.png')
        # The listing is cut off here in the source; this line is taken from
        # build_test_set() and from the printed output.
        print('Lineplot %i has been stored!'%(n))
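
The two folders can also be created from Python before the first call, which avoids the failure that the warning in capitals refers to. A minimal sketch using only the standard library: the `make_dataset_folders` name and the `base_dir` argument are assumptions for the example, while the two folder names are the ones used in the code.

```python
import os
import tempfile


def make_dataset_folders(base_dir="."):
    """Create TrainingSet/ and TestSet/ under base_dir if they are missing.

    Hypothetical helper for this sketch; returns the two folder paths.
    """
    paths = [os.path.join(base_dir, name) for name in ("TrainingSet", "TestSet")]
    for path in paths:
        # exist_ok=True makes the call safe to repeat
        os.makedirs(path, exist_ok=True)
    return paths


# Demonstrated in a temporary directory so nothing is left behind
with tempfile.TemporaryDirectory() as tmp:
    created = make_dataset_folders(tmp)
    print([os.path.isdir(p) for p in created])
```

In the notebook itself, calling `make_dataset_folders()` with no argument would create both folders next to the notebook.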

Here you read them and label them:

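def extract_training_set():
    mypath = 'TrainingSet/'
    # Cut off in the source after "if isfile"; the usual listdir/isfile/join
    # idiom is assumed.
    onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]
    training_set_arrays = []
    training_set_labels = []
    for file in onlyfiles:
        split_file = file.split('.')
        if split_file[-1]=='png':
            # Cut off in the source after "split_file[0]."; splitting on '_' to
            # keep 'boxplot' or 'lineplot' is an assumed completion.
            training_set_labels.append(split_file[0].split('_')[0])
    # The rest of the listing (reading the images into training_set_arrays and
    # returning the results) is not visible in the source.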
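
Since the listing is cut off, here is one way the reading and labelling could be completed. It is a sketch under assumptions, not the author's function: `load_labelled_pngs` is a made-up name, the images are read with matplotlib's `imread`, and the label is taken to be the part of the file name before the first underscore (so `boxplot_3.png` becomes `boxplot`).

```python
import os
import tempfile

import numpy as np
from matplotlib import image


def load_labelled_pngs(folder):
    """Read every PNG in folder and derive its label from the file name.

    Hypothetical helper: returns (list of image arrays, list of label strings).
    """
    arrays, labels = [], []
    for name in sorted(os.listdir(folder)):
        stem, ext = os.path.splitext(name)
        if ext.lower() != ".png":
            continue                             # skip anything that is not a PNG
        arrays.append(image.imread(os.path.join(folder, name)))
        labels.append(stem.split("_")[0])        # 'lineplot_12' -> 'lineplot'
    return arrays, labels


# Small self-check with two tiny synthetic images
with tempfile.TemporaryDirectory() as tmp:
    for fname in ("boxplot_1.png", "lineplot_1.png"):
        image.imsave(os.path.join(tmp, fname), np.random.rand(8, 8, 3))
    arrays, labels = load_labelled_pngs(tmp)
    print(labels, arrays[0].shape)
```

Sorting the file names makes the order of images and labels deterministic, which `listdir` alone does not guarantee.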

After you define these functions, you build your dataset by doing this:

build_training_set(50)

1 instance has been started
Boxplot 1 has been stored!
Lineplot 1 has been stored!
2 instance has been started
Boxplot 2 has been stored!
Lineplot 2 has been stored!
3 instance has been started
Boxplot 3 has been stored!
Lineplot 3 has been stored!
4 instance has been started
Boxplot 4 has been stored!
Lineplot 4 has been stored!
5 instance has been started
Boxplot 5 has been stored!
Lineplot 5 has been stored!
6 instance has been started
Boxplot 6 has been stored!
Lineplot 6 has been stored!
7 instance has been started
Boxplot 7 has been stored!
Lineplot 7 has been stored!

Here are some examples of the training set:

plt.figure(figsize=(32,32))
for i in range(9):
    plt.subplot(3,3,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    x=np.random.randint(len(X_train))
    plt.imshow(X_train[x], cmap=plt.cm.binary)
    plt.xlabel(labels_train[x], fontsize=60)
plt.show()

The exact same process has to be repeated for the test set, and the string labels have
to be converted into numbers that an ML model can work with (sklearn does this for you
with its LabelEncoder class):

def build_test_set(num):
    mypath = 'TestSet/'
    for n in range(1,num+1):
        print('%i instance has been started'%(n))
        plot_box(mypath+'boxplot_'+str(n)+'.png')
        print('Boxplot %i has been stored!'%(n))
        plot_line(mypath+'lineplot_'+str(n)+'.png')
        print('Lineplot %i has been stored!'%(n))

build_test_set(10)

1 instance has been started
Boxplot 1 has been stored!
Lineplot 1 has been stored!
2 instance has been started
Boxplot 2 has been stored!
Lineplot 2 has been stored!
3 instance has been started
Boxplot 3 has been stored!
Lineplot 3 has been stored!
4 instance has been started
Boxplot 4 has been stored!
Lineplot 4 has been stored!
5 instance has been started
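
The label conversion mentioned before the listing can be sketched as follows. The example labels are made up. `LabelEncoder` sorts the class names, so with these two classes `boxplot` becomes 0 and `lineplot` becomes 1. The variable name `le` matches the one that appears later in the prediction code, but how the author fitted it is not shown, so treat the fitting step as an assumption.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Made-up string labels, as they would come out of the file names
labels_train = ["lineplot", "boxplot", "boxplot", "lineplot"]
labels_test = ["boxplot", "lineplot"]

le = LabelEncoder()
train_labels = le.fit_transform(labels_train)   # fit on the training labels only
test_labels = le.transform(labels_test)         # reuse the same mapping

print(le.classes_)        # ['boxplot' 'lineplot']
print(train_labels)       # [1 0 0 1]
print(le.inverse_transform(np.array([0, 1])))
```

Fitting once and reusing the encoder for the test labels guarantees that both sets share the same integer codes.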

2. Machine Learning Model


The Machine Learning model that we are going to use is basically a stack of
Convolutional layers and Max Pooling operations. It ends with a single sigmoid unit
that gives the probability that the image belongs to the class encoded as 1.

The model that I used is the same as the one in this article I published, where you can
find more details about how the structure actually works.

size = X_train[0].shape[0]
classifier = Sequential()
# Step 1 - Convolution
# Both Conv2D lines are cut off in the source after "input_shape = (size,".
# The 3 channels follow from the reshape(size,size,3) calls in the training
# code; any further arguments (for example an activation) are not visible.
classifier.add(Conv2D(3, (3, 3), input_shape = (size, size, 3)))
classifier.add(MaxPooling2D(pool_size = (2, 2)))
classifier.add(Conv2D(3, (3, 3), input_shape = (size, size, 3)))
classifier.add(MaxPooling2D(pool_size = (2, 2)))
classifier.add(Flatten())
#classifier.add(Dense(units = 32, activation = 'relu'))
classifier.add(Dense(units = 1, activation = 'sigmoid'))
# Compiling the CNN
classifier.summary()

Model: "sequential_4"
______________________________________________________
Layer (type)                 Output Shape
======================================================
conv2d_7 (Conv2D)            (None, 718, 718, 3)
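
The one visible row of the summary, (None, 718, 718, 3), can be checked by hand: a 3×3 convolution without padding removes one pixel on each side, and a 2×2 max pooling halves the size, rounding down. The sketch below follows the layer stack of the listing under some assumptions: an input of 720×720×3 (which is what the 718 in the summary implies), Keras' default 'valid' padding and a stride of 1. The function names are made up for the example.

```python
def conv_out(size, kernel=3):
    """Output side length of a 'valid' convolution with stride 1."""
    return size - kernel + 1


def pool_out(size, pool=2):
    """Output side length of a max pooling with stride equal to the pool size."""
    return size // pool


def conv_params(in_channels, filters, kernel=3):
    """Trainable parameters of a Conv2D layer: weights plus one bias per filter."""
    return (kernel * kernel * in_channels + 1) * filters


size, channels, filters = 720, 3, 3     # assumed input: 720 x 720 RGB images

s1 = conv_out(size)          # first Conv2D
s2 = pool_out(s1)            # first MaxPooling2D
s3 = conv_out(s2)            # second Conv2D
s4 = pool_out(s3)            # second MaxPooling2D
flat = s4 * s4 * filters     # Flatten
dense_params = flat + 1      # Dense(1): one weight per input plus a bias

print(s1, s2, s3, s4)                        # 718 359 357 178
print(flat, dense_params)                    # 95052 95053
print(conv_params(channels, filters))        # 84
```

Almost all of the parameters sit in the final dense unit, which is one reason the model is described as overkill for such a simple task.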

Here is how you train and test your model:

train_images, test_images = X_train,X_test
train_images=np.array(train_images)
test_images=np.array(test_images)
# Cut off in the source after "tes"; scaling the test images the same way is assumed.
train_images, test_images = train_images / 255.0, test_images / 255.0

Train_images=[]
Test_images=[]
for i in range(len(train_images)):
    a=train_images[i].reshape(size,size,3)
    Train_images.append(a)
Train_images=np.array(Train_images)
for j in range(len(test_images)):
    b=test_images[j].reshape(size,size,3)
    Test_images.append(b)
Test_images=np.array(Test_images)

train_images,test_images=Train_images, Test_images
# Cut off in the source after "loss = 'binary"; binary_crossentropy and the
# accuracy metric (which appears in the log) are assumed.
classifier.compile(optimizer = 'adam', loss = 'binary_crossentropy', metrics = ['accuracy'])

# Both lines of this call are cut off in the source; epochs=10 is taken from
# the log, test_labels is an assumed name.
history = classifier.fit(train_images, train_labels, epochs=10,
                         validation_data=(test_images, test_labels))

Train on 100 samples, validate on 20 samples
Epoch 1/10
100/100 [==============================] - 12s 122ms/step - loss: 15.4656 - accuracy: 0.5200 - val_loss: 13.8420 - val_accuracy: 0.5000
Epoch 2/10
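
The scaling and reshaping steps can also be written without the explicit loops. A minimal sketch, assuming the images arrive as a list of equally sized 8-bit RGB arrays; the `prepare_images` name and the tiny random inputs are made up for the example.

```python
import numpy as np


def prepare_images(images, size):
    """Stack images into one float array of shape (n, size, size, 3) in [0, 1].

    Hypothetical helper; assumes 8-bit pixel values (0 to 255).
    """
    batch = np.asarray(images, dtype=np.float32) / 255.0
    # reshape(-1, ...) lets NumPy infer the number of images
    return batch.reshape(-1, size, size, 3)


rng = np.random.default_rng(0)
fake_images = [rng.integers(0, 256, size=(4, 4, 3), dtype=np.uint8) for _ in range(5)]
batch = prepare_images(fake_images, size=4)
print(batch.shape, batch.min() >= 0.0, batch.max() <= 1.0)
```

The same helper can be applied to the training and the test images, so both go through identical preprocessing.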

And as we can see, the final result is perfect. Even if that may sound exciting, I have
to say that the task is pretty easy (we are all able to tell a plot with boxes from a
plot with lines) and the model is more than powerful enough (a little bit of overkill
here).

3. Final Results
As a final proof that the model correctly distinguishes boxplots from lineplots, here
are some examples:

# Threshold the sigmoid output at 0.5; a plain astype(int) would truncate every
# probability below 1.0 to 0.
y_pred = (classifier.predict(X_test) > 0.5).astype(int).ravel()
y_pred_string = le.inverse_transform(y_pred)
y_pred_string

array(['lineplot', 'boxplot', 'boxplot', 'lineplot',
       'lineplot'
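
A sigmoid output is a probability between 0 and 1, so turning it into a class label needs a threshold; a plain integer cast would truncate 0.99 to 0. The sketch below uses made-up probabilities instead of a trained model, and the 0.5 threshold and the class order (0 for boxplot, 1 for lineplot) are assumptions.

```python
import numpy as np

# Made-up sigmoid outputs with the (n, 1) shape that predict() returns
probs = np.array([[0.99], [0.02], [0.40], [0.75], [1.00]])

truncated = probs.astype(int).ravel()            # cast: only exactly 1.0 survives
thresholded = (probs > 0.5).astype(int).ravel()  # threshold at 0.5

class_names = np.array(["boxplot", "lineplot"])  # assumed encoding: 0, 1
print(truncated)                   # [0 0 0 0 1]
print(thresholded)                 # [1 0 0 1 1]
print(class_names[thresholded])
```

The two results only agree when the network output is saturated at exactly 0 or 1, so the threshold is the safer choice in general.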


And here are the plots:

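plt.figure(figsize=(32,32))
for i in range(9):
    plt.subplot(3,3,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    x=np.random.randint(len(X_test))
    plt.imshow(X_test[x], cmap=plt.cm.binary)
    # Cut off in the source after the second "%s"; the closing quote and the two
    # arguments (labels_test is an assumed name) are a completion.
    plt.title('Predicted label: "%s", Real label: "%s"'%(y_pred_string[x], labels_test[x]))
plt.show()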
