##
## File: assignment11.py (STAT 3250)
## Topic: Assignment 11
##
## The file Stocks.zip is a zip file containing nearly 100 sets of price
## records for various stocks. A sample of the type of files contained
## in Stocks.zip is ABT.csv, which we have seen previously and is posted
## in recent course materials. Each file includes daily data for a specific
## stock, with stock ticker symbol given in the file name. Each line of
## a file includes the following:
##
## Date = date for recorded information
## Open = opening stock price
## High = high stock price
## Low = low stock price
## Close = closing stock price
## Volume = number of shares traded
## Adj Close = closing price adjusted for stock splits (ignored for this assignment)
## The time interval covered varies from stock to stock. For many files
## there are dates when the market was open but the data is not provided, so
## those records are missing. Note that some dates are not present because the
## market is closed on weekends and holidays. Those are not missing records.
## The Gradescope autograder will be evaluating your code on a subset
## of the set of files in the folder Stocks. Your code needs to automatically
## handle all assignments to the variables q1, q2, ... to accommodate the
## reduced set, so do not copy/paste things from the console window, and
## take care with hard-coding values.
## The autograder will contain a folder Stocks containing the stock data sets.
## This folder will be in the working directory so your code should be written
## assuming that is the case.
import pandas as pd # load pandas
import numpy as np # load numpy
pd.set_option('display.max_columns', 10) # Display 10 columns in console
## 1. Find the mean for the Open, High, Low, and Close entries for all
## records for all stocks. Give your results as a Series with index
## Open, High, Low, Close (in that order) and the co
esponding means
## as values.
q1 = None # Series of means of Open, High, Low, and Close
## 2. Find all stocks with an average Close price less than 30. Give you
## results as a Series with ticker symbol as index and average Close price.
## price as value. Sort the Series from lowest to highest average Close
## price. (Note: 'MSFT' is the ticker symbol for Microsoft. 'MSFT.csv',
## 'Stocks/MSFT.csv' and 'MSFT ' are not ticker symbols.)
q2 = None # Series of stocks with average close less than 30
## 3. Find the top-10 stocks in terms of the day-to-day volatility of the
## price, which we define to be the mean of the daily differences
## High - Low for each stock. Give your results as a Series with the
## ticker symbol as index and average day-to-day volatility as value.
## Sort the Series from highest to lowest average volatility.
q3 = None # Series of top-10 mean volatility
## 4. Repeat the previous problem, this time using the relative volatility,
## which we define to be the mean of
##
## XXXXXXXXXXHigh − Low)/(0.5(Open + Close))
##
## for each day. Provide your results as a Series with the same specifications
## as in the previous problem.
q4 = None # Series of top-10 mean relative volatility
## 5. For each day the market was open in October 2008, find the average
## daily Open, High, Low, Close, and Volume for all stocks that have
## records for October XXXXXXXXXXNote: The market is open on a given
## date if there is a record for that date in any of the files.)
## Give your results as a DataFrame with dates as index and columns of
## means Open, High, Low, Close, Volume (in that order). The dates should
## be sorted from oldest to most recent, with dates formatted (for example)
## XXXXXXXXXX, the same form as in the files.
q5 = None # DataFrame of means for each open day of Oct '08.
## 6. For 2011, find the date with the maximum average relative volatility
## for all stocks and the date with the minimum average relative
## volatility for all stocks. Give your results as a Series with
## the dates as index and co
esponding average relative volatility
## as values, with the maximum first and the minimum second.
q6 = None # Series of average relative volatilities
## 7. For XXXXXXXXXX, find the average relative volatility for all stocks on
## Monday, Tuesday, ..., Friday. Give your results as a Series with index
## 'Mon','Tue','Wed','Thu','Fri' (in that order) and co
esponding
## average relative volatility as values.
q7 = None # Series of average relative volatility by day of week
## 8. For each month of 2009, determine which stock had the maximum average
## relative volatility. Give your results as a Series with MultiIndex
## that includes the month (month number is fine) and co
esponding stock
## ticker symbol (in that order), and the average relative volatility
## as values. Sort the Series by month number 1, 2, ..., 12.
q8 = None # Series of maximum relative volatilities by month
## 9. The “Python Index” is designed to capture the collective movement of
## all of our stocks. For each date, this is defined as the average price
## for all stocks for which we have data on that day, weighted by the
## volume of shares traded for each stock. That is, for stock values
## S_1, S_2, ... with co
esponding volumes V_1, V_2, ..., the average
## weighted volume is
##
## XXXXXXXXXXS_1*V_1 + S_2*V_2 + ...)/(V_1 + V_2 + ...)
##
## Find the Open, High, Low, and Close for the Python Index for each date
## the market was open in January 2013.
## Give your results as a DataFrame with dates as index and columns of
## means Open, High, Low, Close (in that order). The dates should
## be sorted from oldest to most recent, with dates formatted (for example)
## XXXXXXXXXX, the same form as in the files.
q9 = None # DataFrame of Python Index values for each open day of Jan 2013.
## 10. For the years XXXXXXXXXXdetermine the top-8 month-year pairs in terms
## of average relative volatility of the Python Index. Give your results
## as a Series with MultiIndex that includes the month (month number is
## fine) and year (in that order), and the average relative volatility
## as values. Sort the Series by average relative volatility from
## largest to smallest.
q10 = None # Series of month-year pairs and average rel. volatilities
## 11. Each stock in the data set contains records starting at some date and
## ending at another date. In between the start and end dates there may be
## dates when the market was open but there is no record -- these are the
## missing records for the stock. For each stock, determine the percentage
## of records that are missing out of the total records that would be
## present if no records were missing. Give a Series of those stocks
## with less than 1.3% of records missing, with the stock ticker as index
## and the co
esponding percentage as values, sorted from lowest to
## highest percentage.
q11 = None # Series of stocks and percent missing