Problem Statement - 1 Movie Dataset Analysis
In this project, we aim to give a machine or an AI system the ability to get
rid of biases. Specifically, we aim to go beyond information retrieval, reason
over the multimodal dataset, and develop algorithms to remove the
§ User should be able to ask questions (in text format), and the output
should be text and/or image.
§ User may also provide an image as an input, and the output should be
the plot/points relevant to that image.
Enable a multimodal question-answering system that helps capture
information about the dataset.
Stage 1 - Extract the data from the Wikipedia-Data folder and extract the plot
text for each Bollywood movie. Using this data, one should be able to query the
dataset in natural language, and the output of the query should be
natural-language text or an image. The image can be taken from the image
data in the corresponding folder on GitHub.
Stage 2 – Take an image from the image-data folder on GitHub as input; the
output should be natural-language text corresponding to the image.
This text can be taken from Wikipedia-data, which contains the plot associated
with each image.
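The text side of Stage 1 can be sketched as a simple retrieval baseline: index
the plots, then return the plot that best matches a natural-language question.
This is only an illustration using TF-IDF and cosine similarity from the Python
standard library. The movie titles, plot strings, and function names below are
hypothetical placeholders, not entries from the dataset, and a real solution
would presumably load the extracted plot text and add the image lookup.

```python
import math
import re
from collections import Counter

# Hypothetical stand-in for plot text extracted from the Wikipedia data folder;
# these titles and plots are invented placeholders, not real dataset entries.
PLOTS = {
    "Movie A": "A young singer leaves her village to win a music contest in the city.",
    "Movie B": "Two brothers separated in childhood meet again as a police officer and a thief.",
    "Movie C": "A cricket team from a small village challenges the officers to a match.",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(plots):
    """Return per-movie TF-IDF vectors and the IDF table used to build them."""
    counts_by_title = {title: Counter(tokenize(plot)) for title, plot in plots.items()}
    doc_freq = Counter(word for counts in counts_by_title.values() for word in counts)
    n_docs = len(plots)
    # Smoothed IDF so that every known word keeps a positive weight.
    idf = {word: math.log((1 + n_docs) / (1 + df)) + 1.0 for word, df in doc_freq.items()}
    vectors = {
        title: {word: tf * idf[word] for word, tf in counts.items()}
        for title, counts in counts_by_title.items()
    }
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(word, 0.0) for word, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def answer(query, plots, vectors, idf):
    """Return the (title, plot) pair whose plot is most similar to the query."""
    # Words that never occur in any plot get weight 0 and do not affect the match.
    query_vec = {w: tf * idf.get(w, 0.0) for w, tf in Counter(tokenize(query)).items()}
    best = max(vectors, key=lambda title: cosine(query_vec, vectors[title]))
    return best, plots[best]


if __name__ == "__main__":
    vectors, idf = build_index(PLOTS)
    print(answer("Which movie is about two brothers?", PLOTS, vectors, idf))
```

A keyword baseline like this only retrieves; it does not reason over the plot,
so it is best read as a starting point to compare an AI-driven approach against.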
2. Probable Use case to implement- Convert the movie plot into an
entity-relationship graph in which each path traversal provides a different
story arc of the movie
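One way to picture the entity-relationship graph is as a directed graph built
from (subject, relation, object) triples, where each simple path between two
characters reads as one story arc. The sketch below assumes the triples have
already been extracted from a plot; the triples, entity names, and function
names are hypothetical and only illustrate the traversal.

```python
from collections import defaultdict

# Hypothetical (subject, relation, object) triples; in practice these would
# come from an entity and relation extraction step run over a movie plot.
TRIPLES = [
    ("Hero", "loves", "Heroine"),
    ("Hero", "fights", "Villain"),
    ("Heroine", "daughter_of", "Villain"),
    ("Villain", "arrested_by", "Police"),
]


def build_graph(triples):
    """Map each entity to its outgoing (relation, target entity) edges."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
    return dict(graph)


def story_arcs(graph, start, end, path=None, visited=None):
    """Yield every simple path from start to end as a list of triples."""
    path = path or []
    visited = (visited or set()) | {start}
    if start == end:
        yield path
        return
    for relation, nxt in graph.get(start, []):
        if nxt not in visited:  # skip entities already on this path to avoid cycles
            yield from story_arcs(graph, nxt, end, path + [(start, relation, nxt)], visited)


if __name__ == "__main__":
    graph = build_graph(TRIPLES)
    for arc in story_arcs(graph, "Hero", "Police"):
        print(" -> ".join(f"{s} {r} {o}" for s, r, o in arc))
```

Depth-first enumeration of simple paths is used here because it is short and
easy to follow; on a dense graph the number of paths grows quickly, so a real
system would likely cap the path length.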
Stage 1 - Extract the data from the wikipedia-data folder and extract the plot
text for each Bollywood movie. Using this data, one should be able to summarize
the movie plot in 5 lines.
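As a rough baseline for the five-line summary, an extractive approach can score
each sentence by how frequent its words are in the whole plot and keep the top
five in their original order. This is a sketch under that assumption, not the
intended AI-driven solution; the stop-word list and the function name are
illustrative choices.

```python
import re
from collections import Counter

# Small illustrative stop-word list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "was",
             "his", "her", "he", "she", "with", "for", "on", "by", "at"}


def content_words(text):
    """Lowercase word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]


def summarize(plot, max_sentences=5):
    """Return up to max_sentences sentences from the plot, in original order."""
    # Naive sentence split on whitespace that follows ., ! or ?
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", plot) if s.strip()]
    freq = Counter(content_words(plot))

    def score(sentence):
        tokens = content_words(sentence)
        # Average word frequency, so long sentences are not favoured by length alone.
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return [sentences[i] for i in keep]
```

Calling `summarize(plot_text)` on one extracted plot returns a list of at most
five sentences; a plot with fewer sentences is returned unchanged.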
3. Probable Use case to implement- The dataset has been used to show the bias
present in Bollywood
Design and develop an algorithm to remove gender bias in text.
Stage 1 – Extract the Wikipedia plot data from the Wikipedia-data folder and
try to construct a different, unbiased version of a story.
Stage 2 – Use an attention model to pinpoint various parts of the story and
then debias those parts. Further, show these nodes in an interactive
visualization.
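As a very rough illustration of the two stages, the sketch below builds a
"different version" of a story by swapping a handful of gendered terms and
records which spans were changed, which is the kind of information a
visualization could highlight. It is a naive word-list stand-in, not an
attention model and not a full debiasing method; the word list and function
name are assumptions made for this example.

```python
import re

# Small illustrative mapping; a real lexicon would be far larger. "her" is
# ambiguous (it can map to "his" or "him"), so it is deliberately left out.
SWAPS = {
    "he": "she", "she": "he",
    "man": "woman", "woman": "man",
    "hero": "heroine", "heroine": "hero",
    "father": "mother", "mother": "father",
    "son": "daughter", "daughter": "son",
}

# Whole-word, case-insensitive match on any listed term (longest terms first).
PATTERN = re.compile(
    r"\b(" + "|".join(sorted(SWAPS, key=len, reverse=True)) + r")\b",
    re.IGNORECASE,
)


def swap_gender(text):
    """Return the swapped text and the (start, end, word) spans that changed."""
    flagged = []

    def replace(match):
        word = match.group(0)
        swapped = SWAPS[word.lower()]
        if word[0].isupper():
            swapped = swapped.capitalize()
        # Offsets refer to positions in the original text.
        flagged.append((match.start(), match.end(), word))
        return swapped

    return PATTERN.sub(replace, text), flagged


if __name__ == "__main__":
    new_text, spans = swap_gender("The hero saves the woman. She thanks the man.")
    print(new_text)
    print(spans)
```

Swapping terms changes who does what in the story but does not judge whether a
passage is biased; in the brief's terms, that judgment is what the attention
model in Stage 2 is meant to supply.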
Develop an interesting visualization to explore this dataset.
Stage 1 – To explore the whole dataset, we look for innovative ideas and
applications that allow a user to explore it. This also includes providing an
interface that lets the user navigate to relevant parts of the dataset.
Stage 2 – The application should be able to flag the relevant parts of the
dataset and show them in an interactive visualization.
About Dataset
The dataset is a large multimodal dataset derived from multiple
sources. The data consists of:
Wikipedia Data - Contains text from plots of all movies from 1970 – 2017. The
plots are taken from Wikipedia.
Scripts Data – PDF scripts for 13 movies. The scripts contain complete
§ The solution should be AI-driven.
§ Participants should demonstrate at least one useful application through
a system demo.
§ The outcome should include a document explaining the thought process and
design approach used to arrive at the solution.
§ IBM Cloud
§ IBM Watson
§ App development framework for desktop (e.g. Python, Java) and mobile
(e.g. Android, iOS)
A: Python, Java
A: Android, iOS
A: Sign up on -
A: Yes. Each service comes with elaborate documentation that illustrates, step
by step, how to use the services available on IBM Cloud; follow the VIEW DOCS
link available on each service.
A: No