
Python Project

Uploaded by islemfatmagamha1
Copyright © All Rights Reserved

Startup Investment Predictor:

Leveraging Machine Learning to Identify the Best Startups to Invest In

Presented by
Ramy Lazghab & Islam Fatma Gamha
Introduction

Why This Project?
Investors often struggle to identify promising startups because structured data and insights are lacking.

Objectives
To develop a machine learning model that predicts the market worth of startups from critical factors such as company age, region, deal flow, markets and products, and investments per stage.

Project Objectives
1. Automate the evaluation of startups using data-driven techniques.
2. Provide actionable insights on the top startups to invest in.
3. Compare the performance of multiple machine learning models.
4. Offer transparency and reproducibility through clear data handling and predictions.
Methodology

Process:
We followed a methodology inspired by the CRISP-DM framework. We began with an in-depth analysis of the sector's specific characteristics to guide our data scraping strategy.

We organized our work into phases that follow the iterative CRISP-DM process. Our initial project assessment highlighted areas that needed more attention and revealed misconceptions. This let us adjust our approach and refine our focus to align better with the project goals.
Scraping - Dataset

Dataset Overview
The dataset includes startup attributes such as company name, stage, deal flow, region, creation date, and market value.
We scraped the data (name, stage, and deal flow) from AngelList.
We augmented the data by integrating Gemini.
We collected data on more than 160 startups.
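The deck does not describe how the scraped AngelList fields were combined with the Gemini-augmented attributes. The sketch below shows one plausible way, a left join on company name. The field names (`name`, `stage`, `dealflow`, `region`, `creation_date`) and the `merge_sources` helper are hypothetical. They are not the project's actual schema or code.

```python
from typing import Dict, List


def merge_sources(scraped: List[Dict], augmented: List[Dict],
                  key: str = "name") -> List[Dict]:
    """Left-join augmented attributes onto scraped records by company name.

    Hypothetical helper: scraped values win on conflicts, and startups with no
    augmented entry keep only their scraped fields (gaps are handled later).
    """
    # Normalize the join key so "Acme " and "acme" match.
    lookup = {rec[key].strip().lower(): rec for rec in augmented}
    merged = []
    for rec in scraped:
        extra = lookup.get(rec[key].strip().lower(), {})
        # Start from the augmented fields, then overwrite with scraped ones.
        merged.append({**extra, **rec})
    return merged


# Toy records for illustration only (not the project's data).
scraped = [{"name": "Acme", "stage": "Seed", "dealflow": 3},
           {"name": "Beta", "stage": "Series A", "dealflow": 7}]
augmented = [{"name": "acme", "region": "Europe", "creation_date": "2019-05-01"}]
merged = merge_sources(scraped, augmented)
```

A left join keeps every scraped startup even when augmentation failed for it. The missing values then flow into the missing-data handling step instead of silently shrinking the dataset.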
Data Preprocessing

Cleaning and Preparing the Data

Key preprocessing steps:

1. Converting the creation date to startup age.
2. Normalizing the numeric data.
3. Handling missing data, one-hot encoding categorical variables, and applying TF-IDF to the textual data.
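The steps above can be sketched in plain Python, under a few assumptions. Creation dates are taken to be ISO `YYYY-MM-DD` strings. "Normalization" is taken to mean min-max scaling. Missing numeric values are filled with the column mean. TF-IDF uses the textbook form, term frequency times `log(N / df)`. That form differs from scikit-learn's `TfidfVectorizer` default, which smooths the IDF and L2-normalizes each vector. All function names here are illustrative.

```python
import math
from collections import Counter
from datetime import date
from typing import Dict, List, Optional


def startup_age(creation_date: str, today: date) -> float:
    """Step 1: convert an ISO creation date (YYYY-MM-DD) into age in years."""
    created = date.fromisoformat(creation_date)
    return (today - created).days / 365.25


def impute_mean(values: List[Optional[float]]) -> List[float]:
    """Step 3a: replace missing numeric values (None) with the column mean."""
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]


def min_max(values: List[float]) -> List[float]:
    """Step 2: rescale a numeric column to [0, 1]; a constant column maps to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def one_hot(values: List[str]) -> List[List[int]]:
    """Step 3b: one-hot encode a categorical column (categories sorted)."""
    categories = sorted(set(values))
    return [[int(v == c) for c in categories] for v in values]


def tf_idf(docs: List[str]) -> List[Dict[str, float]]:
    """Step 3c: textbook TF-IDF, (count / doc length) * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({t: (c / len(tokens)) * math.log(n / df[t])
                        for t, c in counts.items()})
    return vectors


# Toy values for illustration only (not the project's data).
today = date(2024, 1, 1)
ages = [startup_age(d, today) for d in ["2020-01-01", "2023-01-01"]]
valuations = min_max(impute_mean([10.0, None, 30.0]))   # [0.0, 0.5, 1.0]
regions = one_hot(["Europe", "US", "Europe"])            # [[1, 0], [0, 1], [1, 0]]
descriptions = tf_idf(["fintech payments app", "health app"])
```

Imputation runs before scaling so that the filled values are scaled with the rest of the column. In this form, a term that appears in every document (here `app`) gets a TF-IDF weight of zero.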
Data Visualization

Data Analysis and Visualization
We analyzed the key relationships between features to get an overview of the dataset's characteristics.
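The deck does not say which statistics backed the relationship analysis. One common choice is a pairwise Pearson correlation matrix over the numeric features, which is often plotted as a heatmap. The sketch below assumes that technique, and the column names in the example are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length numeric columns."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two columns of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        raise ValueError("correlation is undefined for a constant column")
    return cov / (sx * sy)


def correlation_matrix(columns: Dict[str, List[float]]) -> Dict[Tuple[str, str], float]:
    """Pairwise correlations for every pair of named numeric columns."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


# Toy columns for illustration only (not the project's data).
corr = correlation_matrix({"age": [1.0, 2.0, 3.0, 4.0],
                           "market_value": [2.0, 4.0, 6.0, 8.0],
                           "dealflow": [4.0, 1.0, 3.0, 2.0]})
```

Pearson correlation only captures linear association. A weak value does not rule out a non-linear relationship, which models such as Random Forest can still exploit.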
Machine Learning Models

Models used and why:

1. Linear Regression: for its simplicity and interpretability.
2. Lasso Regression: to handle feature selection and regularization.
3. Support Vector Regression (SVR): to capture complex patterns.
4. Random Forest Regression: for its ability to handle non-linearity and estimate feature importance.
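To show why Lasso performs feature selection while plain Linear Regression does not, here is a minimal sketch of Lasso fitted by cyclic coordinate descent. It minimizes `(1 / (2n)) * ||y - b - Xw||^2 + alpha * ||w||_1`, the same objective scaling scikit-learn documents for its `Lasso`. With `alpha = 0` the penalty vanishes and the fit reduces to ordinary least squares. This is an illustration, not the project's code, and the toy data is made up.

```python
from typing import List, Tuple


def soft_threshold(rho: float, alpha: float) -> float:
    """Shrink rho toward zero by alpha; values inside [-alpha, alpha] become 0."""
    if rho > alpha:
        return rho - alpha
    if rho < -alpha:
        return rho + alpha
    return 0.0


def lasso_cd(X: List[List[float]], y: List[float], alpha: float,
             n_iter: int = 200) -> Tuple[float, List[float]]:
    """Lasso by cyclic coordinate descent; returns (intercept, weights).

    Minimizes (1 / (2n)) * ||y - b - Xw||^2 + alpha * ||w||_1.
    With alpha = 0 this is ordinary least squares (plain linear regression).
    """
    n, p = len(X), len(X[0])
    # Center features and target so the intercept can be recovered afterwards.
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    z = [sum(row[j] ** 2 for row in Xc) / n for j in range(p)]
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            if z[j] == 0.0:
                continue  # constant feature: leave its weight at 0
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(
                Xc[i][j] * (yc[i] - sum(Xc[i][k] * w[k] for k in range(p) if k != j))
                for i in range(n)
            ) / n
            w[j] = soft_threshold(rho, alpha) / z[j]
    intercept = y_mean - sum(x_mean[j] * w[j] for j in range(p))
    return intercept, w


# Toy data: y = 3 * x1 + 0.5 * x2 exactly (illustration only).
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
y = [3 * a + 0.5 * b for a, b in X]
b_ols, w_ols = lasso_cd(X, y, alpha=0.0)   # recovers roughly (0.0, [3.0, 0.5])
b_l1, w_l1 = lasso_cd(X, y, alpha=1.5)     # shrinks x1 and sets x2's weight to exactly 0
```

The soft-threshold step is what sets weak coefficients to exactly zero. This explains the "feature selection" role listed for Lasso. It also shows how a large `alpha` can over-shrink a model, so `alpha` is usually tuned by cross-validation.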
Model Performance
Metrics used: Mean Squared Error (MSE) and R² score.
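For reference, both metrics can be written in a few lines. MSE is the mean of the squared residuals. R² is `1 - SS_res / SS_tot`, which compares the model's squared error against always predicting the mean of the targets. These are the standard definitions, sketched here in plain Python; the project presumably relied on a library implementation, which the deck does not name.

```python
from typing import Sequence


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Squared Error: average squared residual (lower is better)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot.

    1.0 is a perfect fit, 0.0 matches always predicting the mean, and the
    score can be negative for models that do worse than the mean.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and the same length")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0.0:
        raise ValueError("R² is undefined when all targets are equal")
    return 1.0 - ss_res / ss_tot


# Toy values for illustration only.
score = r2_score([3.0, 5.0, 7.0], [2.5, 5.0, 7.5])   # 1 - 0.5 / 8 = 0.9375
```

On this scale, an R² of 0.69 means the model explains about 69% of the variance in market worth on the evaluation data.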
Key results: Linear Regression achieved the highest R² score (0.69) and the lowest MSE.
Comparison with the other models:

Linear Regression: R² = 0.69
Lasso Regression: R² = 0.10
Support Vector Regression: R² = 0.67
Random Forest Regression: R² = 0.60
What We Learned
Most influential factors: stage, region, and startup age significantly impact market worth.
Unexpected finding: deal flow had less influence than initially anticipated.
Benefits of combining models.
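The deck lists the benefits of combining models without describing a specific scheme. The simplest version is a (weighted) average of several regressors' predictions. The sketch below assumes that approach and treats each model as a plain callable. The two toy models are hypothetical stand-ins for fitted regressors.

```python
from typing import Callable, List, Optional, Sequence

# Hypothetical interface: a fitted model maps one feature vector to a prediction.
Model = Callable[[Sequence[float]], float]


def ensemble_predict(models: List[Model], x: Sequence[float],
                     weights: Optional[List[float]] = None) -> float:
    """Weighted average of several regressors' predictions for one sample."""
    if weights is None:
        weights = [1.0] * len(models)  # equal weights by default
    if len(weights) != len(models) or sum(weights) <= 0:
        raise ValueError("need one positive-sum weight per model")
    return sum(w * m(x) for w, m in zip(weights, models)) / sum(weights)


# Toy stand-ins for two fitted regressors (illustration only).
linear_like = lambda x: 2.0 * x[0]
forest_like = lambda x: 2.0 * x[0] + 1.0
avg = ensemble_predict([linear_like, forest_like], [3.0])                       # 6.5
weighted = ensemble_predict([linear_like, forest_like], [3.0], [3.0, 1.0])      # 6.25
```

Averaging helps most when the models make different errors. A linear model and a tree ensemble, for instance, fail in different ways. Weights could be chosen from validation scores such as the R² values above.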
Demo
Conclusion
This project provides a reliable, data-driven solution for startup investment decisions.
Machine learning can enhance investor decision-making and reduce risks.
Thank You
