October 8
ECON3034 FINANCIAL ECONOMETRICS S2 2022
11.55pm on Wednesday 12 October 2022.
Part 1
Part 1 - Total number of marks: 32
The ‘Assignment_Question 1 EViews Workfile’ located under the ‘Assignment’ heading on
iLearn contains seven monthly return series for the period January 1980 – February 2020
(482 observations).
The following monthly return series appear in the file:
A. The excess return on a portfolio of stocks of U.S. companies in two different industries.
The first industry is food. Here the companies produce food products e.g., agricultural
crops and livestock, wholesale grocery and drink products, sugar and flour products,
fish, dairy and meat products etc. The second industry is transportation. Here the
companies supply transportation services e.g., airlines (passenger and freight), buses
(Greyhound), trucking and freight companies etc. The data on the portfolio industry
excess returns are from the website of Kenneth French under 17 Industry Portfolios.
• food_rf (Average monthly excess return on a portfolio of food industry stocks.
The food companies are listed on U.S. exchanges and the return is in excess of
the U.S. risk free rate).
• transp_rf (Average monthly excess return on a portfolio of transportation
industry stocks. The transportation companies are listed on U.S. exchanges and
the return is in excess of the U.S. risk free rate).
B. Returns on five pricing factors from Fama and French, also on the website of Kenneth
French.
• mkt_rf (Excess return on a weighted portfolio of all stocks in the U.S. market. It
is the U.S. Market Risk Premium).
• hml (High minus Low. Average monthly return on a portfolio of High Book-to-
Market (Value) stocks less the average monthly return on a portfolio of Low
Book-to-Market (Growth) stocks).
• smb (Small minus Big. Average monthly return on a portfolio of small
capitalization stocks less the average monthly return on a portfolio of large (big)
capitalization stocks).
• rmw (Robust minus Weak. Average monthly return on a portfolio of stocks for
companies with robust operating profitability less the average monthly return on
a portfolio of stocks for companies with weak operating profitability).
• cma (Conservative minus Aggressive. Average monthly return on a portfolio of
stocks of companies which invest conservatively less the average monthly return
on a portfolio of stocks of companies which invest aggressively).
To read about the factors mkt_rf, hml and smb, search under Fama-French three factor
model, and to read about the factors rmw and cma, search under Fama-French five factor
model.
Note: All of the returns are expressed in percent, e.g., 2.65% is represented by 2.65, not by
0.0265. Also, please see the document ‘How to save images from AppStream’ on our iLearn
site under ‘Assignment’ in order to obtain an image of a table or graph in EViews.
Answer the following questions based on this dataset:
1. Estimate the following regression for the excess return on the portfolio of food
industry stocks for the full sample 1980m01 to 2020m02 and include a table of
results from EViews.
$$\text{food\_rf}_t = \beta_1 + \beta_2\,\text{mkt\_rf}_t + \beta_3\,\text{hml}_t + \beta_4\,\text{smb}_t + \beta_5\,\text{rmw}_t + \beta_6\,\text{cma}_t + u_t$$
Are the estimated coefficients on the five factors jointly statistically significant at the
5% level?
(2 marks)
2. For the food industry stocks regression, perform a test of the hypothesis that
$\beta_3 = \beta_4$ against the alternative that $\beta_3 \neq \beta_4$ using EViews.
What do you conclude?
(2 marks)
3. Estimate the following regression for the excess return on the portfolio of
transportation industry stocks for the full sample 1980M01 to 2020M02 and include
a table of results from EViews.
$$\text{transp\_rf}_t = \beta_1 + \beta_2\,\text{mkt\_rf}_t + \beta_3\,\text{hml}_t + \beta_4\,\text{smb}_t + \beta_5\,\text{rmw}_t + \beta_6\,\text{cma}_t + u_t$$
Are the estimated coefficients on the five factors jointly statistically significant at the
5% level?
(2 marks)
4. For the transportation industry stocks regression, perform a test of the hypothesis
that $\beta_3 = \beta_4$ against the alternative that $\beta_3 \neq \beta_4$ using
EViews. What do you conclude?
(2 marks)
5. Compare the estimated value of the coefficient $\beta_2$ in the two regressions. Is
the estimated coefficient statistically different from one in each case? (Perform the
test in EViews). What do you conclude about the sensitivity of each portfolio to the
market risk premium?
(4 marks)
6. Compare the corresponding estimated coefficients on the other factors in terms of
their sign and significance in both regressions (i.e., compare the estimated
coefficient on hml in the food industry regression with that in the transportation
industry regression. Do similarly for the estimated coefficients on smb, rmw and cma).
What does this comparison suggest about the average characteristics of stocks
in the food and transportation industries? (Hint: For example, are food industry
stocks value stocks on average, etc?).
(10 marks)
7. Is the estimate of $\beta_1$ statistically significant at the 5% level in each of the
regressions? How do you interpret this result? (4 marks)
8. Perform White’s test (with no cross-product terms) for heteroscedasticity in the
estimated residuals from each regression. (Write out the null and alternative
hypotheses of the test, explain and provide the EViews output showing the results,
and clearly state the conclusion of the test. Use a 5% significance level).
(2 marks)
9. In view of the results of White’s test, should you be concerned about
heteroscedasticity and if so, what should you do? Would that change any of the
conclusions you reached in earlier questions? (4 marks)
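Question 5 asks whether the coefficient on the market factor differs from one. In EViews this is a coefficient restriction test, but for a single restriction the arithmetic reduces to a t-ratio. The sketch below is a minimal Python illustration, not the required EViews workflow. The estimate and standard error are hypothetical numbers, not output from the workfile, and the p-value uses the standard normal approximation, which is an assumption that is reasonable with 482 observations.

```python
from statistics import NormalDist


def t_test_against(estimate, std_error, null_value=1.0):
    """Two-sided test of H0: coefficient == null_value.

    Returns the t-ratio and a p-value from the standard normal
    approximation (an assumption that suits large samples).
    """
    t_ratio = (estimate - null_value) / std_error
    p_value = 2.0 * (1.0 - NormalDist().cdf(abs(t_ratio)))
    return t_ratio, p_value


# Hypothetical numbers for illustration only, not workfile output.
t_ratio, p_value = t_test_against(estimate=0.80, std_error=0.05)
reject_at_5_percent = p_value < 0.05
print(f"t = {t_ratio:.2f}, p = {p_value:.4f}, reject H0: {reject_at_5_percent}")
```

If heteroscedasticity turns out to matter (questions 8 and 9), the same ratio would be recomputed with a robust standard error in place of `std_error`.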
Part 2
Part 2 - Total number of marks: 28
The ‘Assignment_Question 2 EViews Workfile’ located under the ‘Assignment’ heading on
iLearn contains three monthly yield series for the period June 1993 – August 2022, i.e., for
1993M06 to 2022M08. The series tb_3yr is the yield to maturity on 3-year Australian
Treasury Bonds and the series bab_3m is the yield to maturity on Australian 3-month Bank
Accepted Bills. The series spread is the tb_3yr yield less the bab_3m yield. The data are
obtained from the Reserve Bank of Australia. The yields are represented, for example, as
5.95 which means 5.95% per year.
10. Conduct an ADF unit-root test on the spread series. Be sure to state the null and
alternative hypotheses for the test. Also conduct a KPSS unit root test and be sure to
state the null and alternative hypotheses for the test. Are the results from both tests
consistent with each other?
(4 marks)
11. Compute the ACF and PACF for the spread for the first 16 lags using EViews.
Comment on the pattern of the ACF and PACF and what they may suggest about the
ARMA time series model for the spread.
(4 marks)
12. Consider the following two models where $y_t$ denotes the series for spread:
Model 1: $y_t = c_1 + \phi_1 y_{t-1} + \theta_1 u_{t-1} + u_t$
Model 2: $y_t = c_2 + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \theta_1 u_{t-1} + \theta_2 u_{t-2} + u_t$
Estimate each model in EViews, comment on the significance of the coefficients
(apart from the constant) and select the best model using the Akaike Information
Criterion (AIC) and the Schwarz Bayesian Information Criterion (SBIC). (Hint: Sample
sizes need to be the same when comparing models with AIC or SBIC. That is,
estimate the models over the sample period 1993M08 to 2022M08 (i.e., beginning in
August 1993) because Model 2, which has the most AR lags, namely two, uses up
two observations for the lags in the estimation. In the equation estimation settings
box, change 1993M06 to 1993M08).
(4 marks)
13. Using EViews, compute the ACF and PACF of the residuals (out to 16 lags) from
Model 1 and Model 2, respectively. Based on the ACF and PACF of the estimated
residuals, do you prefer one model over the other?
(4 marks)
14. Estimate Model 1 for the sample 1993M08 to 2020M12 and generate a dynamic
forecast for the period 2021M01 to 2022M08. (Hint: First estimate Model 1 being
sure to specify 1993M08 2020M12 in the estimation settings box. Having estimated
the model, select the forecast tab, select dynamic forecast, set the forecast sample
to 2021M01 to 2022M08, and in the forecast name box, type spread_df).
(i) EViews generates a graph for the dynamic forecasts you generated together
with the two-standard error band. Present this graph and comment on the
convergence or otherwise of the forecast values and on the behaviour of the
two-standard error band.
(4 marks)
(ii) Graph the actual spread series and the dynamic forecast of the spread
series, i.e., spread_df, on the same graph for the period 2021M01 to
2022M08. (Hint: Click on Quick/Sample and specify the sample as 2021M01
2022M08. Click OK. Then, on the main menu bar at the top of the screen,
click on Object/New Object/Group and OK. In the list of series box, type
spread and spread_df and click OK. Then click on View/Graph to graph
both series together). Comment on the graph. (Hint: Refer to recent
movements in Australian interest rates).
(4 marks)
15. Now estimate Model 1 for the sample 1993M08 to 2020M12 and generate static
forecasts for the period 2021M01 to 2022M08. (Hint: Here select static forecasts,
and in the forecast name box, type spread_sf).
(i) EViews generates a graph for the static forecasts. Present this graph and
comment on it.
(2 marks)
(ii) Graph the actual spread series and the static forecast of the spread series,
i.e., spread_sf, on the same graph for the period 2021M01 to 2022M08.
Comment on the graph.
(2 marks)
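Questions 14 and 15 contrast dynamic and static forecasts from Model 1. As a rough illustration of why the dynamic path behaves the way it does, the Python sketch below writes Model 1 as y_t = c + phi*y_{t-1} + theta*u_{t-1} + u_t (the symbol names are an assumption) and iterates the forecast forward with future shocks set to zero. The parameter values and starting values are hypothetical, chosen only to show the shape of the path, and the horizon of 20 matches the 20 months from 2021M01 to 2022M08.

```python
def dynamic_forecast(c, phi, theta, last_y, last_u, horizon):
    """Multi-step forecasts for y_t = c + phi*y_{t-1} + theta*u_{t-1} + u_t.

    Future shocks are replaced by their expected value of zero, so the
    MA term only matters for the first step.
    """
    forecasts = []
    prev_y, prev_u = last_y, last_u
    for _ in range(horizon):
        next_y = c + phi * prev_y + theta * prev_u
        forecasts.append(next_y)
        prev_y, prev_u = next_y, 0.0  # later steps reuse the forecast itself
    return forecasts


# Hypothetical parameter values, chosen only to show the shape of the path.
c, phi, theta = 0.05, 0.9, 0.3
path = dynamic_forecast(c, phi, theta, last_y=-0.2, last_u=0.1, horizon=20)
long_run_mean = c / (1.0 - phi)
print([round(v, 3) for v in path[:5]], "->", round(long_run_mean, 3))
```

When the absolute value of phi is below one, the dynamic path moves geometrically toward c / (1 - phi) regardless of what the actual series does. A static forecast would instead pass the actual lagged value and the corresponding residual at every step, so it stays much closer to the data.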
===========================================================================
Lab 6: Laboratory Report Write-up
14 Oct
The Hypotheses
The hypotheses are the most important aspect of your Lab Report, since only information that
is relevant to your hypotheses should be included in your Lab Report. The decision regarding
which, or how many, hypotheses you will test is entirely up to you. However, the hypotheses
you decide to test with the data must be relevant to previous research/theories defined in
the literature that you will review/define in the introduction section of your Laboratory
Report.
A hypothesis is essentially a prediction about some effect or relationship that you can test
using the data. Because they are predictions based on a theory, hypotheses are usually stated
formally as an ‘if-then’ statement. For example, we could state a hypothesis to be tested as:
‘IF individuals with higher levels of Neuroticism are physiologically more sensitive to
negative stimuli, THEN Neuroticism score should be positively correlated with mean EDA, HR
and Respiration Rate in response to negative sounds.’
If you are expecting a particular result only because people have found that result previously,
then that is a simple expectation, it is not a hypothesis.
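The example hypothesis above predicts a positive correlation, so the statistic that tests it is a correlation coefficient between a personality score and a physiological response. The Python sketch below computes Pearson's r by hand to show what that number measures. The variable names and scores are made up for illustration and are not the class data; in practice JASP or SPSS would report r along with its p-value.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of the same length (at least 2)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up scores for illustration only; not the class data.
neuroticism = [20, 35, 50, 65, 80]
mean_eda_negative = [1.1, 1.4, 1.3, 1.9, 2.2]
r = pearson_r(neuroticism, mean_eda_negative)
print(f"r = {r:.3f}")
```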
The Data
There is a total of 31 variables in the study that we ran over the last two weeks. We measured
each participant’s sex and their score on each of the big 5 personality dimensions (that’s 6
variables). We also measured 5 different responses (Rated Valence, Rated Arousal, EDA, HR
and Respiration Rate) to 5 different kinds of sounds (neghigh, neglow, poshigh, poslow and
start). Combined, that makes another 25 variables for which we have data. Which and how
many of these variables you use in your analyses will depend on the hypotheses you decide
to test. Your selection of hypotheses should be based on your reading of the relationships
between these variables in the reading list for the Lab Report, and on your own reading, as
well as hypotheses you can logically derive from existing theories.
There is a spreadsheet on Black Board containing the data for all students who participated
in the experiment.
YOU WILL NOT BE ABLE TO INCLUDE ALL THESE VARIABLES OR THEIR RESULTS IN THE RESULTS
SECTION OF THE LAB REPORT. YOU SHOULD ONLY INCLUDE THOSE VARIABLES AND RESULTS
THAT ARE RELEVANT TO THE HYPOTHESES THAT YOU CHOOSE TO TEST.
The Design
The experiment was a mixed design in which we are interested in looking at the relationships
between a number of variables. Your experiment included one clear independent variable
(Sex), which is also a between-subjects variable. The personality scores can also be thought
of/used as between-subjects independent variables if they are used to categorise participants
into high/low O, C, E, A, and N. They can also be used/thought of as covariates – variables
that might also affect the dependent variables, or that might affect relationships between
other variables, or as continuous variables that you can correlate with other variables.
The study also contained two within-subjects independent variables that were manipulated
factorially. The sounds that were played were a 2 x 2 cross between two independent
(manipulated) variables, valence (positive or negative) and intensity (high or low), making the
4 sound variables poshigh, poslow, neghigh and neglow. There were 4 sounds that fit each of
those characteristics (making 16 sounds in total), that we averaged across to get a mean score
in each category for each participant.
The fifth kind of sound played was a sudden noise (named “start”) designed to produce a
startle response. It was just played 4 times at random in the sequence.
The other variables in our study are the dependent variables – the responses made to the
sounds, both self-report and physiological (Rated Valence, Rated Arousal, EDA, HR and
Respiration rate).
Statistical Analyses
It is always a good idea to have a look at the descriptive statistics for any variable you
include in your analysis, just to check that it is relatively normally distributed and has
been collected and recorded properly.
The table above shows an example of the descriptive statistics for the data from the
extraversion variable and the arousal ratings given to positive, high arousal stimuli (from data
I created) using SPSS (you could also get most of this info with formulas in excel, but SPSS, or
even better JASP, enables you to do it for all of the variables at once). I’ve provided all of this
info, and frequency histograms for each variable, in the JASP file online in which I analysed
your actual data. From the example above, produced using SPSS, we can see that, of all the
students providing data for this variable (n = 49), the mean E percentile was 52.88 (with
a SD of 24.16), and the data were reasonably normally distributed, since Skewness = -0.414.
You can use these kinds of results in a number of ways in your lab report; to describe the data
you are analyzing, to create graphs for use in your lab report, and to determine what type of
statistical analysis you will use to test your hypotheses.
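The quantities quoted above (n, mean, SD and skewness) can be reproduced by hand to see what each one measures. The Python sketch below is a minimal illustration using made-up percentile scores, not the class data. The skewness formula is the bias-adjusted sample version; that it matches what SPSS or JASP prints is an assumption worth checking against your own output.

```python
from math import sqrt


def describe(values):
    """Return n, mean, sample SD and adjusted Fisher-Pearson skewness."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least 3 observations")
    mean = sum(values) / n
    sd = sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    cubed_z = sum(((v - mean) / sd) ** 3 for v in values)
    skewness = n / ((n - 1) * (n - 2)) * cubed_z
    return {"n": n, "mean": mean, "sd": sd, "skewness": skewness}


# Made-up percentile scores for illustration only; not the class data.
extraversion = [12, 30, 41, 48, 55, 60, 66, 71, 79, 88]
print(describe(extraversion))
```

A skewness near zero indicates a roughly symmetric distribution, negative values a longer left tail, and positive values a longer right tail.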
In order to test the hypotheses you choose based on your readings, you will need to
understand the results of the appropriate statistical analyses of these variables. In the class
exercises below we will work through the statistical results provided to learn how to interpret
the JASP output of the results provided on Blackboard.
While the results below are limited to t-tests and correlations you are also welcome to
undertake your own analyses using the data files also provided on Blackboard. If you do
choose to do your own analyses (using JASP or any statistical package you like) your write-up
of the analyses must conform to the format of the Lab report described below.
=========================================================================
INTRODUCTION TO BAYESIAN DATA ANALYSIS (STAT3016/4116/7016)
Friday 21 October 2022, by 11:59pm
Problem 1 [20 marks]
The most recent statistics from Fundsquire show that 60 percent of Australian start-up businesses fail
within their first three years¹. Suppose survival data is collected from a random sample of 50 start-ups
that started since 01 January 2015 and the operational status of each company is recorded as at 30 June
2022. The data file "StartUp.csv" contains two variables:
• Status (X) - where $X_i = 1$ if company i is still operational as at 30 June 2022, and $X_i = 0$ if
company i is no longer operational as at 30 June 2022.
• Time (Y) - the time (in years) to complete shutdown of the company if $X_i = 0$, or the censoring
time ($c_i$) if $X_i = 1$.
Let $Z_i$ denote the true lifetime of company $i$. Assume $Z_i$ follows an Exponential distribution. The model
is
$$Y_i = \begin{cases} Z_i & \text{if } X_i = 0 \\ c_i & \text{if } X_i = 1 \text{ (that is, } Z_i > c_i\text{)} \end{cases} \qquad (1)$$
$$Z_1, \ldots, Z_n \mid \theta \overset{\text{iid}}{\sim} \text{Exp}(\theta)$$
So $Y_i$ is the observed survival or censoring time and $Z_i$ is the true (but not always directly observed)
survival time. If the start-up fails before the study end date, then $Y_i = Z_i$. If the start-up is still
operational at the study end date, then all we know is that $Z_i > c_i$, and the observed lifetime is equal to
the censoring time $c_i$. The parameter $\theta$ is the rate parameter for the start-up true survival time. In this
problem, the rate parameter $\theta$ and some of the true survival times $Z = (Z_1, \ldots, Z_n)$ are unknowns. Our
goal is to estimate the posterior density $p(\theta \mid x, y)$ (where $y = (y_1, \ldots, y_n)$, $x = (x_1, \ldots, x_n)$).
(a) [3 marks] Derive Jeffreys' prior for $\theta$ for the Exponential sampling model
$Z_1, \ldots, Z_n \mid \theta \overset{\text{iid}}{\sim} \text{Exp}(\theta)$. Is Jeffreys' prior a proper prior for this model?
(b) [2 marks] Assuming the prior you obtained in part (a), derive the full conditional posterior distribution
p(θ|z, x, y).
(c) [3 marks] Derive the full conditional posterior distribution $p(Z_i \mid \theta, z_{-i}, x, y)$.
(d) [4 marks] Implement a Gibbs sampling scheme that approximates the joint posterior distribution of
θ and Z given y and x using the conditional distributions you derived in parts (b) and (c). Insert
your computer code here.
(e) [3 marks] Provide autocorrelation and traceplots for θ and a selection of the $Z_i$’s belonging to
censored units. Comment on these plots. Also report the effective sample sizes.
(f) [2 marks] Provide a 95% posterior interval estimate for θ. What is the expected survival time of a
recent start-up in Australia given the data?
(g) [3 marks] Is the Exponential distribution a valid sampling distribution assumption for this data?
Run some checks to support your answer. Suggest how the model assumptions could be modified if
your checks are not satisfied.
¹ The Australian, 11 March 2022
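The data-augmentation scheme in Problem 1 alternates between drawing the rate given completed lifetimes and drawing the unobserved lifetimes given the rate. The Python sketch below shows one possible shape for such a sampler, under stated assumptions rather than as a model answer: it takes the prior to be proportional to 1/θ, so that θ given the completed lifetimes is Gamma with shape n and rate equal to their sum, and it uses the memoryless property so that a censored lifetime is its censoring time plus a fresh exponential draw. The data are simulated stand-ins with a hypothetical true rate of 0.3, not the contents of StartUp.csv, and the function name is illustrative.

```python
import random


def gibbs_censored_exponential(times, still_operating, n_iter, rng):
    """Gibbs sampler for exponential lifetimes with right censoring.

    Assumes the prior p(theta) proportional to 1/theta, which gives
    theta | z ~ Gamma(shape=n, rate=sum(z)), and uses the memoryless
    property so that a censored lifetime is c_i + Exp(theta).
    """
    n = len(times)
    z = list(times)  # start censored lifetimes at their censoring times
    theta_draws = []
    for _ in range(n_iter):
        # random.gammavariate takes (shape, scale), so scale = 1 / rate.
        theta = rng.gammavariate(n, 1.0 / sum(z))
        for i in range(n):
            if still_operating[i] == 1:
                z[i] = times[i] + rng.expovariate(theta)
        theta_draws.append(theta)
    return theta_draws


# Simulated stand-in for StartUp.csv (hypothetical values, true rate 0.3).
rng = random.Random(2022)
lifetimes = [rng.expovariate(0.3) for _ in range(50)]
censor_at = [rng.uniform(1.0, 7.0) for _ in range(50)]
y = [min(z_i, c_i) for z_i, c_i in zip(lifetimes, censor_at)]
x = [1 if z_i > c_i else 0 for z_i, c_i in zip(lifetimes, censor_at)]

draws = gibbs_censored_exponential(y, x, n_iter=5000, rng=rng)[500:]
posterior_mean = sum(draws) / len(draws)
print(f"posterior mean of theta: {posterior_mean:.3f}")
```

Under the same prior, the posterior mean can be checked against the number of observed failures divided by the sum of the observed times, which is a useful sanity check before looking at traceplots and effective sample sizes.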
Problem 2 [20 marks]
An outlier is an observation that lies outside the overall pattern of a distribution². Outliers are a
common occurrence in survey data sets. How to appropriately deal with outliers is open to debate. In
this problem we will look at the contaminated normal model as one approach to treat outliers that deviate
from a normal sampling distribution assumption.
Suppose we have data points $y_1, \ldots, y_n$. The assumed model is a normal model with mean $\mu$ and variance
$\sigma^2$. To allow for the possibility of outliers, the normal model is modified as follows:
$$y_i \mid \mu, \sigma^2, \delta_i, u_i \sim \text{Normal}(\mu + \delta_i u_i, \sigma^2)$$
and
$$\delta_i \mid \theta \sim \text{Bern}(\theta)$$
So if $\delta_i = 1$ then the $i$th observation is from a normal model with the same variance but location shifted
by the factor $u_i$.
Assume the following semiconjugate prior distributions:
$$\mu \sim \text{Normal}(\mu_0, \tau_0^2)$$
$$\sigma^2 \sim \text{InvGamma}(\nu_0/2, \nu_0\sigma_0^2/2)$$
$$u_i \sim \text{Normal}(0, \eta^2)$$
$$\theta \sim \text{Beta}(a, b)$$
(a) [2 marks] Derive the conditional posterior distribution of $\mu$.
(b) [2 marks] Derive the conditional posterior distribution of $\sigma^2$.
(c) [2 marks] Derive the conditional posterior distribution of $\delta_i$.
(d) [2 marks] Derive the conditional posterior distribution of $u_i$.
(e) [2 marks] Derive the conditional posterior distribution of $\theta$.
(f) [5 marks] The data set "FemaleLabour.csv" contains information on the female labour force
participation rate for different countries. Assuming the data follow a contaminated normal model as
specified above, write some R code to implement the Gibbs sampling algorithm to obtain posterior
draws of the model parameters using the full conditional distributions you derived in parts (a) to (e).
Insert your computer code here. Run your algorithm for at least 100000 iterations. Assume weakly
informative priors. You may ignore any missing values. Provide traceplots and autocorrelation plots
for $\mu$, $\sigma^2$ and $\theta$. Provide details on any burn-in period or thinning you applied to the sequence of
Gibbs sampler draws to improve the convergence diagnostics.
(g) [2 marks] Provide estimates of the posterior probability that each of the five smallest observations
are outliers.
(h) [3 marks] Display a plot of the marginal posterior density of $\mu$. Compare this plot to one produced
assuming a non-contaminated normal model, that is, $y_i \mid \mu, \sigma^2 \overset{\text{iid}}{\sim} \text{Normal}(\mu, \sigma^2)$.
² Moore, D. S. and McCabe, G. P. Introduction to the Practice of Statistics, 3rd ed. New York: W. H. Freeman, 1999.
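In the contaminated normal model of Problem 2, each outlier indicator is binary, so its conditional distribution follows from weighing two normal densities by the prior odds. The Python sketch below illustrates that single update step under the model as stated (shifted mean when the indicator is 1, unshifted mean otherwise). The function name and the numeric inputs are hypothetical, and this is one piece of a sampler rather than a full implementation.

```python
from statistics import NormalDist


def outlier_probability(y_i, mu, sigma, u_i, theta):
    """P(delta_i = 1 | y_i, mu, sigma, u_i, theta) by Bayes' rule.

    sigma is the standard deviation (square root of sigma^2).
    """
    shifted = theta * NormalDist(mu + u_i, sigma).pdf(y_i)
    unshifted = (1.0 - theta) * NormalDist(mu, sigma).pdf(y_i)
    return shifted / (shifted + unshifted)


# Hypothetical values for illustration only.
p = outlier_probability(y_i=5.0, mu=0.0, sigma=1.0, u_i=5.0, theta=0.1)
print(f"P(delta_i = 1 | ...) = {p:.5f}")
```

Within a Gibbs sweep the indicator would then be drawn as a Bernoulli variable with this probability. For observations very far from both means the two densities can underflow to zero, so a production version would compare log-densities instead.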
Problem 3 [15 marks (STAT3016); 20 marks (STAT4116/STAT7016)]
The data set "topgear.csv" contains information on cars featured on the website of the BBC television
show TopGear. The data set is a subset of the data set available in the R package crmReg. Please see
this link https://search.r-project.org/CRAN/refmans/crmReg/html/topgear.html for a description
of the variables in the data set. In this question you are going to fit a Bayesian linear regression model
to predict the response variable MPG (fuel consumption in miles per gallon). Note that there are missing
values in the data set (as indicated by an NA entry).
(a) [3 marks] Firstly, let’s ignore any missing values. Using all other variables as candidate predictors
write down the steps of your sampling algorithm to simultaneously perform Bayesian model selection
and obtain posterior draws of the linear regression model parameters. Be sure to use mathematical
expressions for your sequence of conditional posterior distributions. (You may ignore any interaction
terms or higher order terms and consider main effects only). Assume weakly informative priors.
(b) [3 marks] Write some computer code to implement your sampling algorithm in part (a). Insert your
computer code here.
(c) [2 marks] Which variables are more strongly predictive of MPG? Provide some output from your
sampling algorithm as evidence.
(d) [2 marks] Create a plot of posterior predictive residuals. Comment on any lack of fit of the linear
model from looking at the residual graph.
(e) [3 marks] Provide some diagnostic plots and other diagnostic measures to show convergence and
stationarity of your sampling algorithm.
(f) [2 marks] Discuss how your results might change if missing values are imputed at each iteration of
the sampling algorithm rather than ignored.
(g) [5 marks] [STAT4116/STAT7016 ONLY] State the additional steps in your sampling algorithm
that you would need to implement in order to impute missing values at each iteration. (Note, you
do not actually need to run the modified algorithm that you propose here).
Problem 4 [20 marks]
The data set SPIndex.csv contains percentage returns for the S&P 500 stock index over 1,250 days
(observations), from 2001 to 2005. The data set is part of the ISLR library in R. The variables in the
data set are:
• Year
• Lag 1 to Lag 5 - percentage returns for the five previous days
• Volume - the number of shares traded on the previous day
• Today - percentage return on that date
• Direction - whether the market was Up or Down on that date
Consider a logistic regression model for predicting Direction as a function of Lag 1 to Lag 5 and
Volume. In this question you are required to perform Bayesian model selection on the logistic regression
model.
(a) [2 marks] Write down the equation of the logistic regression model to be estimated. Be sure to
clearly define your notation for all parameters and data variables. Remember to include parameters
to enable Bayesian model selection.
(b) [2 marks] Specify the prior distributions (with reasons) that you will be assuming.
(c) [5 marks] Write out the steps of a Metropolis-Hastings algorithm that you will run to obtain posterior
draws of the parameters of your model.
(d) [5 marks] Implement the Metropolis-Hastings algorithm you wrote in part (c) to approximate the
posterior distribution of your model parameters. Apply thinning and a burn-in period to your
sequence of posterior draws as required. Aim to achieve an acceptance rate of between 20%-50%.
To achieve this, you might like to initially run your algorithm for 1000 iterations and check the
acceptance rate. If the acceptance rate is too low or too high, adjust the tuning parameter of
your proposal distribution accordingly. Report the tuning parameter values you tested and the
corresponding acceptance rates.
Note: You must write your own code to run your Metropolis-Hastings algorithm and not use
any existing computer package or function written specifically to perform Bayesian inference using
posterior simulation.
(e) [3 marks] Provide diagnostic plots to assess convergence of your Metropolis-Hastings algorithm
and whether the sequence of posterior draws approximates an independent sample from the target
posterior distribution. If the diagnostics are not satisfied, discuss how you could modify your
Metropolis-Hastings algorithm to improve the accuracy of your MCMC approximation. (Note,
please provide your diagnostic plots after any thinning or burn-in adjustment).
(f) [3 marks] Which variables are important predictors of Direction? Provide some MCMC estimates
or plot(s) to support your answer. Obtain posterior means and posterior confidence intervals for
the important variables thus identified.
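Part (d) of Problem 4 suggests short pilot runs to tune the proposal until the acceptance rate lands between 20% and 50%. The Python sketch below illustrates that tuning loop with a random-walk Metropolis sampler for a single scalar parameter. The target here is a standard normal stand-in, not the logistic regression posterior, and the function names and step sizes are hypothetical; the point is only how the acceptance rate responds to the proposal standard deviation.

```python
import math
import random


def random_walk_metropolis(log_target, start, step_sd, n_iter, rng):
    """Random-walk Metropolis for a scalar parameter.

    Returns the chain and the fraction of proposals accepted.
    """
    current = start
    current_lp = log_target(current)
    chain, accepted = [], 0
    for _ in range(n_iter):
        proposal = current + rng.gauss(0.0, step_sd)
        proposal_lp = log_target(proposal)
        # Symmetric proposal, so the ratio is just target(prop)/target(cur).
        log_ratio = proposal_lp - current_lp
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            current, current_lp = proposal, proposal_lp
            accepted += 1
        chain.append(current)
    return chain, accepted / n_iter


def toy_log_target(value):
    """Standard normal log-density up to a constant (a stand-in target)."""
    return -0.5 * value * value


rng = random.Random(3016)
rates = {}
for step_sd in (0.1, 2.4, 50.0):
    _, rates[step_sd] = random_walk_metropolis(
        toy_log_target, start=0.0, step_sd=step_sd, n_iter=1000, rng=rng
    )
    print(f"step_sd = {step_sd:>4}: acceptance rate = {rates[step_sd]:.2f}")
```

Very small steps are accepted almost always but explore slowly, while very large steps are rarely accepted, which is why the tuning parameter is adjusted until the rate falls in the intermediate range. For the assignment, `toy_log_target` would be replaced by the log-posterior of the logistic model, working on the log scale to avoid numerical underflow.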