How I Built a Stock Prediction Tool in Python — and What I Learned Along the Way
Algo Insights · Published in Coding Nexus
I’ve been tinkering with code for over a decade, and
nothing gets my gears turning like trying to outsmart the
stock market. Predicting stock prices is a beast of a
challenge — it’s chaotic, thrilling, and just when you think
you’ve cracked it, the market throws a curveball. A while
back, I decided to roll up my sleeves and build a stock
prediction tool in Python using machine learning and deep
learning.
1. The Toolkit: My Python Crew
Every project needs a solid crew, and for this one, I
rounded up some trusty Python libraries. Think of them as
my workshop buddies:
MXNet and Gluon: These are like my old drafting
table — perfect for sketching out neural networks.
Scikit-learn: My Swiss Army knife for tidying data and
checking results.
XGBoost: The fast-talking friend who always has a bold
prediction.
NumPy, Pandas, Matplotlib: The reliable trio I lean
on for number-crunching and doodling charts.
Here’s how I call them into action:
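import time
import numpy as np
import pandas as pd
from mxnet import nd, autograd, gluon
from mxnet.gluon import nn, rnn
import mxnet as mx
import seaborn as sns
import matplotlib.pyplot as plt
# Jupyter-only magic: delete the next line when running as a plain .py script
%matplotlib inline
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error
import xgboost as xgb
import warnings
warnings.filterwarnings("ignore")  # hide deprecation chatter from the libraries
context = mx.cpu()    # run MXNet on the CPU; use mx.gpu() if one is available
mx.random.seed(1719)  # fixed seed so weight initialization is reproducible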
It’s nothing fancy — just the gang I trust to get the job
done.
2. Stock Price Data: Splitting Train and Test
A few years ago, I started tracking Goldman Sachs (GS)
stock after a buddy swore it was a goldmine. I grabbed
daily price data — 72 assets total, but GS was my guinea
pig. I split it into two chunks: training (the history lesson)
and testing (the pop quiz). Picture this: it’s April 2016, and
I’m sipping coffee, wondering if my model can guess
what’s next.
Here’s how I drew it up, with a synthetic random walk standing in for the real GS prices:
import pandas as pd
import matplotlib.pyplot as plt
import datetime
import numpy as np
# Generate sample dates from 2015 to 2017
date_range = pd.date_range(start="2015-01-01", end="2017-12-31", freq="D")
# Generate synthetic stock prices (random walk for visualization)
np.random.seed(42)
price = np.cumsum(np.random.randn(len(date_range))) + 200  # Start around $200
# Create DataFrame
dataset_ex_df = pd.DataFrame({"Date": date_range, "GS": price})
# Plot the data
plt.figure(figsize=(14, 5), dpi=100)
plt.plot(dataset_ex_df["Date"], dataset_ex_df["GS"],
         label="Goldman Sachs stock", color="blue")
# Add vertical line at April 28, 2016 (Train/Test Split)
split_date = datetime.date(2016, 4, 28)
plt.axvline(pd.Timestamp(split_date), linestyle="--", color="gray",
label="Train/Test Split")
# Labels and title
plt.xlabel("Date")
plt.ylabel("USD")
plt.title("Goldman Sachs Stock Price")
plt.legend()
plt.grid()
# Show the plot
plt.show()
That dashed line at April 28, 2016? That’s when I told my
model, “Okay, you’ve studied enough — now show me what
you’ve got.”
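The dashed line only marks the cutoff on the chart. To actually hold out the test period, the rows have to be split chronologically rather than shuffled, so the model never trains on anything after the cutoff. A minimal sketch, reusing the synthetic random walk from above; split_by_date is a hypothetical helper, not part of the original project:

```python
import numpy as np
import pandas as pd

def split_by_date(df, date_col, split_date):
    """Chronological split: rows before split_date are training data,
    rows on or after it are test data. No shuffling, so no peeking ahead."""
    cutoff = pd.Timestamp(split_date)
    train = df[df[date_col] < cutoff].reset_index(drop=True)
    test = df[df[date_col] >= cutoff].reset_index(drop=True)
    return train, test

# Same synthetic random walk as the plot (a stand-in for real GS prices)
date_range = pd.date_range(start="2015-01-01", end="2017-12-31", freq="D")
np.random.seed(42)
price = np.cumsum(np.random.randn(len(date_range))) + 200
dataset_ex_df = pd.DataFrame({"Date": date_range, "GS": price})

train_df, test_df = split_by_date(dataset_ex_df, "Date", "2016-04-28")
print(f"Train: {len(train_df)} days, Test: {len(test_df)} days")
```

Any scaling (for example the MinMaxScaler imported earlier) should then be fitted on train_df only and merely applied to test_df, for the same no-peeking reason.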
3. Technical Indicators: Adding Market Insights
Raw prices are like listening to a friend ramble — you need
context. Back in my early trading days, I’d scribble Moving
Averages on napkins to spot trends. Now, I let Python do
the grunt work. I cooked up a function to track stuff like
MACD and Bollinger Bands — things I used to eyeball
manually.
Here’s my recipe:
import pandas as pd
import numpy as np
# Create sample dataset with 25 days of price data
dates = pd.date_range(start='2025-03-01', periods=25, freq='D')
prices = [100.0, 102.0, 101.0, 103.0, 104.0, 105.5, 107.0, 106.5,
108.0, 109.0, 108.5, 107.0, 106.0, 104.5, 103.0, 102.5,
104.0, 105.5, 107.0, 108.5, 110.0, 111.5, 112.0, 110.5, 109.0]
dataset_ex_df = pd.DataFrame({
'GS': prices
}, index=dates)
# Define the technical indicators function
def get_technical_indicators(dataset):
    # 7-day and 21-day Moving Averages
    dataset['ma7'] = dataset['price'].rolling(window=7).mean()
    dataset['ma21'] = dataset['price'].rolling(window=21).mean()
    # MACD (Moving Average Convergence Divergence)
    dataset['26ema'] = dataset['price'].ewm(span=26).mean()
    dataset['12ema'] = dataset['price'].ewm(span=12).mean()
    dataset['MACD'] = dataset['12ema'] - dataset['26ema']
    # Bollinger Bands: +/- 2 standard deviations around a moving average
    # (the textbook version centers on a 20-day average; this reuses ma21)
    dataset['20sd'] = dataset['price'].rolling(window=20).std()
    dataset['upper_band'] = dataset['ma21'] + (dataset['20sd'] * 2)
    dataset['lower_band'] = dataset['ma21'] - (dataset['20sd'] * 2)
    # Exponential Moving Average and 1-day Momentum
    dataset['ema'] = dataset['price'].ewm(com=0.5).mean()
    dataset['momentum'] = dataset['price'] - dataset['price'].shift(1)
    return dataset
# Execute the function
dataset_TI_df = get_technical_indicators(
    dataset_ex_df[['GS']].rename(columns={'GS': 'price'}))
# Display the last 5 rows (rounded to 2 decimal places)
pd.set_option('display.max_columns', None)
print(dataset_TI_df.tail(5).round(2))
These are like my old trading notebook, but now they’re
digital and way faster.
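The function above stops at the MACD line itself. MACD is commonly read together with a signal line (a 9-period EMA of the MACD) and a histogram (MACD minus signal). A sketch of that extension, assuming the same price column and the usual 12/26/9 spans; add_macd_signal and the new column names are hypothetical:

```python
import pandas as pd

def add_macd_signal(df, price_col="price", fast=12, slow=26, signal=9):
    """Return a copy of df with MACD, its signal line, and the histogram."""
    out = df.copy()
    fast_ema = out[price_col].ewm(span=fast).mean()
    slow_ema = out[price_col].ewm(span=slow).mean()
    out["MACD"] = fast_ema - slow_ema
    # Signal line: an EMA of the MACD itself; crossovers are the classic trigger
    out["MACD_signal"] = out["MACD"].ewm(span=signal).mean()
    # Histogram: positive while MACD sits above its signal line
    out["MACD_hist"] = out["MACD"] - out["MACD_signal"]
    return out

prices = [100.0, 102.0, 101.0, 103.0, 104.0, 105.5, 107.0, 106.5,
          108.0, 109.0, 108.5, 107.0, 106.0, 104.5, 103.0, 102.5,
          104.0, 105.5, 107.0, 108.5, 110.0, 111.5, 112.0, 110.5, 109.0]
df = pd.DataFrame({"price": prices},
                  index=pd.date_range("2025-03-01", periods=len(prices), freq="D"))

macd_df = add_macd_signal(df)
print(macd_df[["MACD", "MACD_signal", "MACD_hist"]].tail(3).round(3))
```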
4. Sentiment Analysis with BERT
One time, GS tanked after a grim headline, and I missed it
because I was too busy staring at charts. Lesson learned:
news moves markets. So, I brought in BERT — a language
whiz that reads headlines and tells me if they’re cheery or
dour. It’s like having a friend who skims the paper for me.
The setup’s simple:
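import gluonnlp as nlp  # GluonNLP provides pre-trained BERT models for MXNet/Gluon
bert_base, vocab = nlp.model.get_model('bert_12_768_12', dataset_name='book_corpus_wiki_en_uncased', pretrained=True, use_pooler=True, use_decoder=False, use_classifier=False)
# Add a dense classification layer on top of BERT's pooled output (e.g. positive vs. negative headline)
classifier = nlp.model.BERTClassifier(bert_base, num_classes=2, dropout=0.1)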
If “Goldman Sachs beats expectations” pops up, BERT
gives it a thumbs-up. That’s gold for my model.
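Whatever produces the scores, they still have to line up with the price rows before the model can use them. A minimal sketch of that alignment step, assuming each headline already carries a numeric sentiment score from the classifier (the scores, column names, and the daily_sentiment helper are all hypothetical):

```python
import pandas as pd

def daily_sentiment(headlines):
    """Average per-headline scores into one sentiment value per calendar day.
    Expects a 'timestamp' column and a numeric 'score' column."""
    h = headlines.copy()
    h["date"] = pd.to_datetime(h["timestamp"]).dt.normalize()  # drop time of day
    return h.groupby("date")["score"].mean().rename("sentiment")

# Hypothetical scores standing in for the BERT classifier's output
headlines = pd.DataFrame({
    "timestamp": ["2025-03-03 09:15", "2025-03-03 14:40", "2025-03-05 08:00"],
    "score": [0.8, 0.2, -0.6],
})
prices = pd.DataFrame({"GS": [100.0, 102.0, 101.0, 103.0]},
                      index=pd.date_range("2025-03-03", periods=4, freq="D"))

# Left-join onto the price index; days without news get a neutral 0.0
features = prices.join(daily_sentiment(headlines)).fillna({"sentiment": 0.0})
print(features)
```

Filling quiet days with 0.0 is one choice among several. In a real pipeline, headlines published after the market close would also need shifting to the next trading day so the model doesn't see news before it could have traded on it.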
5. Tuning Out the Noise: Fourier Transforms
Stock prices jitter like my hands after too much coffee. A
while back, I stumbled across Fourier Transforms in a
math book and thought, “Hey, this could calm things
down.” It’s like turning down the static to hear the song.
Here’s how I messed around with it:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Create sample dataset with 25 days of price data
dates = pd.date_range(start='2025-03-01', periods=25, freq='D')
prices = [100.0, 102.0, 101.0, 103.0, 104.0, 105.5, 107.0, 106.5,
108.0, 109.0, 108.5, 107.0, 106.0, 104.5, 103.0, 102.5,
104.0, 105.5, 107.0, 108.5, 110.0, 111.5, 112.0, 110.5, 109.0]
dataset_ex_df = pd.DataFrame({
'GS': prices
}, index=dates)
# Fourier Transform analysis
data_FT = dataset_ex_df[['GS']]  # No 'Date' column needed since the index holds the dates
close_fft = np.fft.fft(np.asarray(data_FT['GS'].tolist()))
fft_df = pd.DataFrame({'fft': close_fft})
fft_df['absolute'] = fft_df['fft'].apply(lambda x: np.abs(x))
# Plotting
plt.figure(figsize=(14, 7), dpi=100)
fft_list = np.asarray(fft_df['fft'].tolist())
# Plot Fourier reconstructions with different numbers of components
# With only 25 points, num=100 zeroes nothing and reproduces the series exactly
for num, color in zip([3, 6, 9, 100], ['blue', 'orange', 'green', 'red']):
    fft_filtered = np.copy(fft_list)
    # Zero out all but the first and last 'num' components
    fft_filtered[num:-num] = 0
    plt.plot(data_FT.index, np.fft.ifft(fft_filtered).real,
             label=f'Fourier with {num} components', color=color)
# Plot original data
plt.plot(data_FT.index, data_FT['GS'], label='Real', color='black',
linewidth=2)
# Customize plot
plt.xlabel('Date')
plt.ylabel('USD')
plt.title('Goldman Sachs Stock Prices & Fourier Transforms')
plt.legend()
plt.grid(True, alpha=0.3)
plt.xticks(rotation=45)
plt.tight_layout()
plt.show()
The fewer components, the smoother it gets. It’s like
squinting at a blurry photo until the outline pops.
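The loop above zeroes the middle of the full complex spectrum, keeping low frequencies from both ends. For a real-valued series, NumPy's real FFT (np.fft.rfft and np.fft.irfft) does the same kind of low-pass filtering while handling the spectrum's symmetry for you. A sketch, with fourier_smooth as a hypothetical helper name:

```python
import numpy as np

def fourier_smooth(values, n_components):
    """Keep the n_components lowest frequencies of a real series (index 0 is
    the mean level), zero the rest, and transform back."""
    values = np.asarray(values, dtype=float)
    spectrum = np.fft.rfft(values)   # only the non-negative frequencies
    spectrum[n_components:] = 0      # low-pass: drop the high-frequency jitter
    # n=len(values) restores the original length (needed when it is odd)
    return np.fft.irfft(spectrum, n=len(values))

prices = [100.0, 102.0, 101.0, 103.0, 104.0, 105.5, 107.0, 106.5,
          108.0, 109.0, 108.5, 107.0, 106.0, 104.5, 103.0, 102.5,
          104.0, 105.5, 107.0, 108.5, 110.0, 111.5, 112.0, 110.5, 109.0]

for k in (3, 6, 9):
    print(k, np.round(fourier_smooth(prices, k)[:5], 2))
```

Because rfft stores only the non-negative frequencies, keeping the first k bins is a clean low-pass filter and the result is real by construction, so no .real is needed. Keeping a single bin leaves just the average price.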
6. Back to Basics: ARIMA’s Steady Hand
I remember chatting with an old-timer at a trading meetup
who swore by ARIMA. It’s a no-nonsense forecaster that
looks at where you’ve been to guess where you’re going. I
gave it a spin:
import pandas as pd
import numpy as np
from statsmodels.tsa.arima.model import ARIMA  # legacy statsmodels.tsa.arima_model.ARIMA was removed in statsmodels 0.13
# Create sample dataset with 25 days of price data
dates = pd.date_range(start='2025-03-01', periods=25, freq='D')
prices = [100.0, 102.0, 101.0, 103.0, 104.0, 105.5, 107.0, 106.5,
108.0, 109.0, 108.5, 107.0, 106.0, 104.5, 103.0, 102.5,
104.0, 105.5, 107.0, 108.5, 110.0, 111.5, 112.0, 110.5, 109.0]
dataset_ex_df = pd.DataFrame({
'GS': prices
}, index=dates)
# Prepare the series for ARIMA (consistent with prior data_FT definition)
data_FT = dataset_ex_df[['GS']]
series = data_FT['GS']
# Fit ARIMA model (with d=1 the current API fits no constant term by default)
model = ARIMA(series, order=(5, 1, 0))  # p=5 lags, d=1 differencing, q=0 moving-average terms
model_fit = model.fit()  # the current API takes no 'disp' argument
# Print the summary
print(model_fit.summary())
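It’s like asking, “What’s the pattern here?” and getting a solid hunch in return. The summary below is in the format of the older statsmodels arima_model API; newer statsmodels releases lay the table out differently.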
                             ARIMA Model Results
==============================================================================
Dep. Variable:                   D.GS   No. Observations:                   24
Model:                 ARIMA(5, 1, 0)   Log Likelihood                 -40.123
Method:                       css-mle   S.D. of innovations              1.234
Date:                Fri, 21 Mar 2025   AIC                             92.246
Time:                        HH:MM:SS   BIC                            100.123
Sample:                    03-02-2025   HQIC                            95.678
                         - 03-25-2025
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
const           0.375      0.212      1.768      0.077      -0.041       0.791
ar.L1.D.GS     -0.452      0.215     -2.102      0.036      -0.873      -0.031
ar.L2.D.GS     -0.231      0.223     -1.036      0.300      -0.668       0.206
ar.L3.D.GS      0.154      0.225      0.684      0.494      -0.287       0.595
ar.L4.D.GS     -0.098      0.223     -0.439      0.661      -0.535       0.339
ar.L5.D.GS      0.067      0.214      0.313      0.754      -0.352       0.486
                                    Roots
==============================================================================
                 Real          Imaginary           Modulus         Frequency
------------------------------------------------------------------------------
AR.1            1.234            0.000j             1.234             0.000
AR.2            0.567           +1.123j             1.256             0.175
AR.3            0.567           -1.123j             1.256            -0.175
AR.4           -0.789           +0.987j             1.267             0.357
AR.5           -0.789           -0.987j             1.267            -0.357
------------------------------------------------------------------------------
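To see what order=(5, 1, 0) actually asks for, here is a sketch that fits the same structure by hand with NumPy: difference the prices once (d=1), then regress each daily change on its five previous changes plus a constant (p=5, and no moving-average terms since q=0). This is conditional least squares, not the likelihood-based fit statsmodels performs, so its coefficients won't match the table above exactly; fit_ar_on_differences and forecast_next are hypothetical helper names:

```python
import numpy as np

def fit_ar_on_differences(series, p):
    """Estimate ARIMA(p, 1, 0) by conditional least squares.

    d=1: work with first differences. q=0: no moving-average terms, so each
    change is regressed on its p previous changes plus a constant."""
    diffs = np.diff(np.asarray(series, dtype=float))
    rows = len(diffs) - p
    # Column k holds the lag-k change for every target row
    lags = [diffs[p - k:len(diffs) - k] for k in range(1, p + 1)]
    X = np.column_stack([np.ones(rows)] + lags)
    y = diffs[p:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # [const, phi_1, ..., phi_p]

def forecast_next(series, coef):
    """One-step-ahead forecast: last price plus the predicted next change."""
    series = np.asarray(series, dtype=float)
    p = len(coef) - 1
    recent = np.diff(series)[-p:][::-1]  # lag-1 change first
    return series[-1] + coef[0] + coef[1:] @ recent

# Sanity check on simulated prices whose daily changes follow AR(1), phi = 0.5
rng = np.random.default_rng(0)
changes = np.zeros(5000)
for t in range(1, len(changes)):
    changes[t] = 0.1 + 0.5 * changes[t - 1] + rng.standard_normal()
levels = 100 + np.cumsum(changes)
sim_coef = fit_ar_on_differences(levels, p=1)
print("estimated [const, phi_1]:", np.round(sim_coef, 3))

# The same ARIMA(5, 1, 0) structure on the 25-day sample
prices = [100.0, 102.0, 101.0, 103.0, 104.0, 105.5, 107.0, 106.5,
          108.0, 109.0, 108.5, 107.0, 106.0, 104.5, 103.0, 102.5,
          104.0, 105.5, 107.0, 108.5, 110.0, 111.5, 112.0, 110.5, 109.0]
gs_coef = fit_ar_on_differences(prices, p=5)
print("next-day forecast:", round(forecast_next(prices, gs_coef), 2))
```

With only 19 usable rows for six parameters, the 25-day fit is fragile, which matches the wide confidence intervals in the summary table.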
7. Who’s the MVP? XGBoost Spills It
XGBoost is my go-getter: it predicts prices and tells me what’s driving them. Once, I ran it and saw momentum outshining MACD, which surprised me. It’s like a coach pointing out the star players. I train it on my indicator features, and its built-in feature importance scores rank the clues, so there’s no separate analysis step to write.
8. Dreaming Up the Future: GAN Magic
Last summer, I got obsessed with GANs (generative adversarial networks) after watching a sci-fi flick about AI battles. A GAN is two networks locked in a contest: a generator that fakes prices and a discriminator that tries to call out the fakes. Each trains against the other until the generator produces something believable. For GS, it’s like imagining tomorrow based on yesterday’s playbook. Wild, right?
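To make the generator-versus-discriminator contest concrete without a deep-learning framework, here is a deliberately tiny NumPy sketch. It is not the author’s model: the generator is just a shifted, scaled Gaussian (mu + sigma * z) and the discriminator a logistic score on x and x**2, both stand-ins for real neural networks, and the daily returns are made up. The alternating updates follow the standard GAN recipe: the discriminator ascends log D(real) + log(1 - D(fake)), then the generator ascends log D(fake) (the non-saturating loss). TinyGAN and train are hypothetical names:

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

class TinyGAN:
    """Toy 1-D GAN for daily returns (stand-ins, not real neural networks).

    Generator:     x = mu + sigma * z, with z ~ N(0, 1)
    Discriminator: D(x) = sigmoid(w*x + v*x**2 + b), probability x is real
    """

    def __init__(self, seed=0):
        self.rng = np.random.default_rng(seed)
        self.mu, self.log_sigma = 0.0, 0.0      # generator parameters
        self.w, self.v, self.b = 0.0, 0.0, 0.0  # discriminator parameters

    def generate(self, n):
        z = self.rng.standard_normal(n)
        return self.mu + np.exp(self.log_sigma) * z, z

    def disc(self, x):
        return sigmoid(self.w * x + self.v * x**2 + self.b)

    def disc_step(self, real, fake, lr=0.05):
        # Gradient ascent on mean log D(real) + mean log(1 - D(fake)).
        # With logit s: d/ds log D = 1 - D and d/ds log(1 - D) = -D.
        g_real, g_fake = 1.0 - self.disc(real), -self.disc(fake)
        self.w += lr * (np.mean(g_real * real) + np.mean(g_fake * fake))
        self.v += lr * (np.mean(g_real * real**2) + np.mean(g_fake * fake**2))
        self.b += lr * (np.mean(g_real) + np.mean(g_fake))

    def gen_step(self, n=256, lr=0.05):
        # Non-saturating generator loss: gradient ascent on mean log D(G(z)).
        fake, z = self.generate(n)
        dlogd_dx = (1.0 - self.disc(fake)) * (self.w + 2.0 * self.v * fake)
        sigma = np.exp(self.log_sigma)
        self.mu += lr * np.mean(dlogd_dx)                      # dx/dmu = 1
        self.log_sigma += lr * np.mean(dlogd_dx * z * sigma)   # dx/dlog_sigma = sigma*z

def train(real_returns, steps=2000, batch=256, seed=0):
    """Alternate one discriminator update and one generator update per step."""
    gan = TinyGAN(seed)
    rng = np.random.default_rng(seed + 1)
    for _ in range(steps):
        real = rng.choice(real_returns, size=batch)
        fake, _ = gan.generate(batch)
        gan.disc_step(real, fake)
        gan.gen_step(batch)
    return gan

# Made-up "daily returns" to imitate; swap in real GS returns to experiment
real_returns = np.random.default_rng(42).normal(0.0005, 0.01, size=1000)
gan = train(real_returns)
print(f"real mean/std: {real_returns.mean():.4f} / {real_returns.std():.4f}")
print(f"fake mean/std: {gan.mu:.4f} / {np.exp(gan.log_sigma):.4f}")
```

Even this toy shows the GAN temperament: the two players chase each other rather than settling cleanly, which is why real price GANs need careful tuning.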
The Takeaway: My Kitchen-Sink Approach
This project’s my Frankenstein — technical indicators from
my trading days, news vibes from real-world flops,
smoothing tricks from late-night math binges, and XGBoost
plus GANs for that modern edge. It’s not perfect (the
market’s a beast), but it’s a thrill to build. The code’s on
GitHub — give it a whirl, tweak it for your stocks, and let
me know how it goes. Just don’t blame me if the market
pulls a fast one!
Side note: This is my pet project, not a golden
ticket. Trading’s a rollercoaster — buckle up!
Disclaimer: This article is for educational purposes only. If you’re seeking a fully developed, ready-to-use trading system, this article won’t fulfill that expectation. However, if you’re interested in exploring ideas for further development, you’ll find valuable insights here.