ChatGPT has so much potential. And it can be so frustrating to use. After yesterday’s success, I found myself stumbling today. Conceptually, today’s prompt was more complex than yesterday’s, though not, I think, much more so. Here’s the prompt:
write a python script which calculates the return on a group of stocks. Use yfinance to retrieve the data from Yahoo Finance. The user should enter the stock tickers, separated by a comma, and the start and end date for the price history. The script should retrieve the closing price data from Yahoo Finance for both dates, then compute a percentage return. Then calculate the average return, the median return, the skew, and the kurtosis. Output the data to an Excel file called return_data.xlsx
The first response yielded the following script:
import yfinance as yf
import pandas as pd
from statistics import mean, median, skew, kurtosis
def calculate_return(tickers, start_date, end_date):
# Retrieve stock data from Yahoo Finance
stocks = yf.download(tickers, start=start_date, end=end_date, group_by='ticker')
# Calculate return for each stock
stocks['Return'] = (stocks['Close'] - stocks['Close'].shift(1))/stocks['Close'].shift(1)
stocks = stocks.dropna()
# Calculate average, median, skew, and kurtosis of returns
avg_return = mean(stocks['Return'])
median_return = median(stocks['Return'])
skew = skew(stocks['Return'])
kurtosis = kurtosis(stocks['Return'])
# Output data to Excel file
stocks.to_excel('return_data.xlsx')
return avg_return, median_return, skew, kurtosis
tickers = input("Enter stock tickers separated by a comma: ")
start_date = input("Enter start date (YYYY-MM-DD): ")
end_date = input("Enter end date (YYYY-MM-DD): ")
avg_return, median_return, skew, kurtosis = calculate_return(tickers, start_date, end_date)
print("Average return: ", avg_return)
print("Median return: ", median_return)
print("Skew: ", skew)
print("Kurtosis: ", kurtosis)
ChatGPT helpfully explained:
This script will prompt the user to enter the stock tickers, separated by a comma, and the start and end date for the price history. It will use the yfinance library to retrieve the closing price data for the specified stocks and date range, and then calculate the percentage return for each stock. It will then calculate the average return, median return, skew, and kurtosis of the returns. Finally, it will output the data to an Excel file called return_data.xlsx.
Please note that for this script to work you need to have yfinance, pandas, openpyxl and statistics package installed.
I tried running the script, after dutifully installing yfinance
, pandas
, openpyxl
and statistics
. Alas, this did not work. No matter, I thought: ChatGPT has shown me before that it can take an error and debug its own code. So I dutifully provided it with information about the error:
Error: Exception has occurred: ImportError
cannot import name 'skew' from 'statistics' (C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3568.0_x64__qbz5n2kfra8p0\lib\statistics.py)
File "C:\Users\dave\Desktop\stock_yfinance_return.py", line 3, in <module>
from statistics import mean, median, skew, kurtosis
ImportError: cannot import name 'skew' from 'statistics' (C:\Program Files\WindowsApps\PythonSoftwareFoundation.Python.3.9_3.9.3568.0_x64__qbz5n2kfra8p0\lib\statistics.py)
It turns out that, notwithstanding the fact that skew and kurtosis are concepts in statistics, skew
and kurtosis
aren’t a thing in statistics
. But they are a thing in something called scipy.stats
, so: I dutifully installed scipy.stats
and had it re-write the script.
Here’s the latest script:
import yfinance as yf
import pandas as pd
from statistics import mean, median
from scipy.stats import skew, kurtosis
def calculate_return(tickers, start_date, end_date):
# Retrieve stock data from Yahoo Finance
stocks = yf.download(tickers, start=start_date, end=end_date, group_by='ticker')
# Calculate return for each stock
stocks['Return'] = (stocks['Close'] - stocks['Close'].shift(1))/stocks['Close'].shift(1)
stocks = stocks.dropna()
# Calculate average, median, skew, and kurtosis of returns
avg_return = mean(stocks['Return'])
median_return = median(stocks['Return'])
skew = skew(stocks['Return'])
kurtosis = kurtosis(stocks['Return'])
# Output data to Excel file
stocks.to_excel('return_data.xlsx')
return avg_return, median_return, skew, kurtosis
tickers = input("Enter stock tickers separated by a comma: ")
start_date = input("Enter start date (YYYY-MM-DD): ")
end_date = input("Enter end date (YYYY-MM-DD): ")
avg_return, median_return, skew, kurtosis = calculate_return(tickers, start_date, end_date)
print("Average return: ", avg_return)
print("Median return: ", median_return)
print("Skew: ", skew)
print("Kurtosis: ", kurtosis)
This threw a different error:
Error: Exception has occurred: KeyError
'Close'
KeyError: 'Close'
The above exception was the direct cause of the following exception:
File "C:\Users\dave\Desktop\import yfinance as yf.py", line 10, in calculate_return
stocks['Return'] = (stocks['Close'] - stocks['Close'].shift(1))/stocks['Close'].shift(1)
File "C:\Users\dave\Desktop\import yfinance as yf.py", line 24, in <module>
avg_return, median_return, skew, kurtosis = calculate_return(tickers, start_date, end_date)
KeyError: 'Close'
ChatGPT provided an error check in its updated script:
import yfinance as yf
import pandas as pd
from statistics import mean, median
from scipy.stats import skew, kurtosis
def calculate_return(tickers, start_date, end_date):
try:
# Retrieve stock data from Yahoo Finance
stocks = yf.download(tickers, start=start_date, end=end_date, group_by='ticker')
except ValueError as e:
print("Invalid tickers or problem with internet connection")
return
# Calculate return for each stock
stocks['Return'] = (stocks['Close'] - stocks['Close'].shift(1))/stocks['Close'].shift(1)
stocks = stocks.dropna()
# Calculate average, median, skew, and kurtosis of returns
avg_return = mean(stocks['Return'])
median_return = median(stocks['Return'])
skew = skew(stocks['Return'])
kurtosis = kurtosis(stocks['Return'])
# Output data to Excel file
stocks.to_excel('return_data.xlsx')
return avg_return, median_return, skew, kurtosis
tickers = input("Enter stock tickers separated by a comma: ")
start_date = input("Enter start date (YYYY-MM-DD): ")
end_date = input("Enter end date (YYYY-MM-DD): ")
avg_return, median_return, skew, kurtosis = calculate_return(tickers, start_date, end_date)
if avg_return:
print("Average return: ", avg_return)
print("Median return: ", median_return)
print("Skew: ", skew)
print("Kurtosis: ", kurtosis)
It helpfully explains:
In this script, I wrapped the call to yf.download() in a try-except block to catch any errors that might occur when retrieving the stock data. If an error is caught, the script will print an error message and return without proceeding with the calculations. This way, the script will not crash and will be able to handle any unexpected input.
And this time—no errors! So ChatGPT is able to debug its code to resolve errors. However. No spreadsheet was saved. ChatGPT now claims:
You are correct, I apologize for the oversight. The script as written will output the data to an Excel file called 'return_data.xlsx' using the to_excel() function from pandas. However, to do this you need to have the openpyxl package installed, which is a dependency for pandas to write to excel.
You can install openpyxl by running the command pip install openpyxl in your command prompt or terminal. Once openpyxl is installed, the to_excel() function will work as intended and will save the data to an Excel file.
It offered a “corrected” script:
import yfinance as yf
import pandas as pd
from statistics import mean, median
from scipy.stats import skew, kurtosis
def calculate_return(tickers, start_date, end_date):
try:
# Retrieve stock data from Yahoo Finance
stocks = yf.download(tickers, start=start_date, end=end_date, group_by='ticker')
except ValueError as e:
print("Invalid tickers or problem with internet connection")
return
# Calculate return for each stock
stocks['Return'] = (stocks['Close'] - stocks['Close'].shift(1))/stocks['Close'].shift(1)
stocks = stocks.dropna()
# Calculate average, median, skew, and kurtosis of returns
avg_return = mean(stocks['Return'])
median_return = median(stocks['Return'])
skew = skew(stocks['Return'])
kurtosis = kurtosis(stocks['Return'])
# Output data to Excel file
stocks.to_excel('return_data.xlsx')
return avg_return, median_return, skew, kurtosis
tickers = input("Enter stock tickers separated by a comma: ")
start_date = input("Enter start date (YYYY-MM-DD): ")
end_date = input("Enter end date (YYYY-MM-DD): ")
avg_return, median_return, skew, kurtosis = calculate_return(tickers, start_date, end_date)
if avg_return:
print("Average return: ", avg_return)
print("Median return: ", median_return)
print("Skew: ", skew)
print("Kurtosis: ", kurtosis)
I use scare quotes around “corrected” above because you’ll notice that, despite it claiming that the script required openpyxl
, its rewritten script does not use openpyxl
!
At this point, I gave up.
There’s an important lesson here, I think: as powerful as ChatGPT is, it’s not omnipotent. The technology is seductive precisely because when it works, it’s magical. When it doesn’t work, or more accurately, when it doesn’t work as we expect it to, it’s just another shitty piece of over-rated and over-hyped technology. (Cue Bryan Caplan’s pessimistic outlook.)
And yet—I suspect that someone with a slightly better understanding of Python than me would be able to look at the script and figure out quite quickly why it’s not working as one would expect it to. Unfortunately, I do not (yet) have that knowledge. Alas. So much promise.