My earlier paper mentioned in “Accidental Art” on phased entry has been postponed.   This postponement is a fortunate side effect of my successful publication in Regional Science and Urban Economics.  My successful publication has propelled into a direction of crime economics, rather than one of phased entry. Currently I’m doing some basic modeling on counter-terrorism, which makes me feel like a criminal mastermind. Amusingly enough, there’s still some some beautiful accidental art being churned out by my model, this time in Matlab!

In an attempt to branch out and see what other people do in terms of work, I’ve been creating a model for the Student Investment Club to simultaneously forecast GDP, CPI, and Unemployment.  While such a prediction is neither easy nor obvious, I made an attempt at it using some basic methodologies.  The dependent variables that I used in this case were guided by the preferences of the group, rather than by any particular theoretical model.  As such, I have very little faith in the model to be a powerful predictor on a fundamental level, but I do expect it to be correlated with the actual values.

Attached is my presentation (5-10 minutes) about my preliminary forecasts and findings. It is meant to be delivered to a nontechnical audience and is meant to be a partial (but not complete) disclosure of the problems with my approach. Below is a version of the model I am using to make such a forecast, complete with commentary.

SIC Presentation – Professional (Warm)

library(tseries)
library(quantmod)
library(systemfit)

# Old work functions as a great key
# gdp<-getSymbols(‘GDPC1′,src=’FRED’)
# indp<-getSymbols(‘INDPRO’, src=’FRED’)
# ism<-getSymbols(‘NAPM’, src=’FRED’)
# cap<-getSymbols(‘TCU’,src=’FRED’)
# wage<-getSymbols(‘AHETPI’,src=’FRED’) #Productivity? Proxy:wages
# ppi<-getSymbols(‘PPIACO’,src=’FRED’)
# unemploy<-U1RATENSA
# libor<-USDONTD156N
#
# cpi<-getSymbols(‘CPIAUCSL’,src=’FRED’)
# nom_pers_inc<-getSymbols(‘PINCOME’,src=’FRED’) #this might need to be real
# senti<-getSymbols(‘UMCSENT’,src=’FRED’)
# #demand<-getSymbols(‘DEMOTHCONS’,src=’FRED’)#Consumer demand? Proxy: request for more loans
# #cpi<-getSymbols(‘TCU’,src=’FRED’) #Total sales? Proxy: Change in buisness inventories

#Get the data
out<-NULL
b_names<-c(“GDPC1″,”INDPRO”,”NAPM”,”TCU”,”AHETPI”,”PPIACO”,”CPIAUCSL”,”PINCOME”,”UMCSENT”,”FEDFUNDS”,”U1RATENSA”)
getSymbols(b_names,src=’FRED’)
b<-list(GDPC1,INDPRO,NAPM,TCU,AHETPI,PPIACO,CPIAUCSL,PINCOME,UMCSENT,FEDFUNDS,U1RATENSA)
FEDFUNDS<-na.exclude(FEDFUNDS)
out<-lapply(b, aggregate, by=as.yearqtr, mean)

# Scale it appropriately.
series<-lapply(out,window,start=as.yearqtr(“2000 Q1″), end=as.yearqtr(“2013 Q1″))#trims to a consistant window.
series<-lapply(series,cbind)
series<-data.frame(series)
names(series)<-b_names
series<-log(series) #log the series
series<-as.ts(series) #need time series for this following operator:
series<-diff.ts(series[,c("GDPC1","INDPRO","TCU","AHETPI","PPIACO","CPIAUCSL","PINCOME","UMCSENT","FEDFUNDS","U1RATENSA")]) #first difference
lagGDP<-series[,"GDPC1"]
lagCPI<-series[,"CPIAUCSL"]
lagUNEMP<-series[,"U1RATENSA"]
series<-data.frame(series) #back to df
series$NAPM<-matrix(NAPM[(dim(NAPM)[1]+2-dim(series)[1]):dim(NAPM)[1]]) #Some may be stationary! series$lvl_UMCSENT<-matrix(UMCSENT[(dim(UMCSENT)[1]+2-dim(series)[1]):dim(UMCSENT)[1]])
series$lvl_TCU<-matrix(TCU[(dim(TCU)[1]+2-dim(series)[1]):dim(TCU)[1]]) series$lvl_NAPM<-matrix(NAPM[(dim(NAPM)[1]+2-dim(series)[1]):dim(NAPM)[1]])
series$lvl_FEDFUNDS<-matrix(FEDFUNDS[(dim(FEDFUNDS)[1]+2-dim(series)[1]):dim(FEDFUNDS)[1]]) series$t.index<-zooreg(series, start=as.yearqtr(“2000 Q1″),end=as.yearqtr(“2013 Q1″), frequency = 4) #need a time trend
series$quarter<-as.vector(seq(from=1,to=4, by=1)) # series$PINCOME_2<-(series$PINCOME)^2 #are these acceptable? # series$GDPC_2<-(series$GDPC1)^2 series_hold<-data.frame(series) # documentation http://cran.r-project.org/web/packages/systemfit/vignettes/systemfit.pdf series$Lead_GDPC1<-lag(zoo(lagGDP),k=+2, na.pad=TRUE)
series$Lead_CPIAUCSL<-lag(zoo(lagCPI),k=+2, na.pad=TRUE) series$Lead_U1RATENSA<-lag(zoo(lagUNEMP),k=+2, na.pad=TRUE) #impact takes at least 2 quarters. This is needed because we are missing CPI numbers for last quarter. Sentiment is delayed 6 months as propietary information. If it is set to +2, the estimates are to see what it would be like if we had the current info (pay for it).
eq1<- Lead_GDPC1 ~ INDPRO + lvl_NAPM + lvl_UMCSENT + GDPC1 + TCU + CPIAUCSL + FEDFUNDS + U1RATENSA + factor(quarter)
eq2<- Lead_CPIAUCSL ~ INDPRO + lvl_NAPM + lvl_UMCSENT + GDPC1 + TCU + CPIAUCSL + FEDFUNDS + U1RATENSA + factor(quarter)
eq3<- Lead_U1RATENSA ~ INDPRO + lvl_NAPM + lvl_UMCSENT + GDPC1 + TCU + CPIAUCSL + FEDFUNDS + U1RATENSA + factor(quarter)
eqsystem<-list(GDP=eq1,CPI=eq2,UNEMP=eq3)
# series<-data.frame(series)
fit<-systemfit(eqsystem, method=”SUR”, data=series)
pred<-predict(fit,series, se.pred=TRUE)
pred_ci<-predict(fit, series, interval=”confidence”, level=0.95) #note events are not normal.
plot(series$GDPC1, type=”l”, col=”darkgreen”, ylab=”% Change in GDP”, xlab=”Quarters (since 2000)”, main=”GDP forecast”) #the dimseries -40 gets me 10years. points(pred[1], type=”l”, col=”blue”, lty=5) points(pred_ci[,c(3)],type=”l”, col=”red”, lty=2) points(pred_ci[,c(2)],type=”l”, col=”red”, lty=2) legend(x=”bottomleft”,c(“Green= Actual GDP”,”Red= 95% CI”,”Blue=Forecast”), cex=0.90) plot(series$CPIAUCSL, type=”l”, col=”darkgreen”, ylab=”% Change in CPI”, xlab=”Quarters (since 2000)”,main=”CPI forecast”)
points(pred[3], type=”l”, col=”blue”, lty=5)
points(pred_ci[,5],type=”l”, col=”red”, lty=2)
points(pred_ci[,6],type=”l”, col=”red”, lty=2)
legend(x=”bottomleft”,c(“Green= Actual GDP”,”Red= 95% CI”,”Blue=Forecast”), cex=0.90)

plot(series$U1RATENSA, type=”l”, col=”darkgreen”, ylab=”% Change in UNEMP”, xlab=”Quarters (since 2000)”, main=”UNEMP forecast”) points(pred[5], type=”l”, col=”blue”, lty=5) points(pred_ci[,8],type=”l”, col=”red”, lty=2) points(pred_ci[,9],type=”l”, col=”red”, lty=2) legend(x=”bottomleft”,c(“Green= Actual GDP”,”Red= 95% CI”,”Blue=Forecast”), cex=0.90) summary(fit) tail(pred) pred<-rbind(0,rbind(0,pred)) pred_ci<-rbind(0,rbind(0,pred_ci)) tail(series[c("CPIAUCSL","GDPC1","U1RATENSA")]) I’m working on phased entry with spatial price discrimination. This got spat out, and I’m really enjoying the subtle patterns. As perhaps a mere interesting note, cbind when combined with as.vector can be a particularly RAM-intensive set of commands. I noted the following script excerpt caused my computer to quickly consume 11GB of RAM on a 300k entry dataset. : for(j in c(1:200)){ mod.out$coefficients$count[1:80]<-lim.my.df[1:80,j] mod.out$coefficients$zero[1:80]<-lim.my.df[81:160,j] a<-predict(mod.out,clip.1) b<-predict(mod.out,clip.mean) diff.j<-mean(a-b) # diff[,paste(i,".",j)]<-diff.j diff<-as.vector(cbind(diff,diff.j)) } The purpose of this script is to use bootstrapped coefficients generate an average partial effect between clip.1 and clip.mean. We will later use this to get a estimate of the standard errors of the APE. As it stands, it eats all my RAM quite promptly and causes the computer to crash. The following script, nearly identical, does not have this problem: for(j in c(1:200)){ mod.out$coefficients$count[1:80]<-lim.my.df[1:80,j] mod.out$coefficients\$zero[1:80]<-lim.my.df[81:160,j]
a<-predict(mod.out,clip.1)
b<-predict(mod.out,clip.mean)
diff.j<-mean(a-b)
# diff[,paste(i,".",j)]<-diff.j
diff<-cbind(diff,diff.j)
}
diff<-as.vector(diff)

And this works just fine! In fact, it barely  consumes 25% of my RAM.

Here’s a quick proof common to several books, but is of course, no means exclusive to any one of them. Recall that if you have a correctly specified log likelihood function, ${f(\theta_0)}$, the first derivative is that of the score function: ${s(\theta)=\nabla f(\theta)}$. In order to optimize, we set this first derivative equal to zero. So, ${E(\nabla f(\theta))=0}$. Let’s take a Taylor series expansion of this around ${\theta_0}$ , where ${\theta_0}$ is the true parameter, assuming the third derivative is not particularly relevant:

${E(\nabla f(\theta_0))=0\approx E(\nabla f(\theta))+E(\nabla^2 f(\theta))(\theta-\theta_0)}$

We’ve already assumed the left hand side is equal to 0 because we’re optimizing it.

${0 = E(\nabla f (\theta)) + E(\nabla^2 f (\theta))(\theta-\theta_0)}$

${-E(\nabla f (\theta))=E(\nabla^2 f (\theta))(\theta - \theta_0 )}$

${-E(\nabla^2 f (\theta))^{-1}E(\nabla f (\theta))=(\theta - \theta_0 )}$

So if we want to square this to estimate something, such as variance:

${-E(\nabla^2 f(\theta))^{-1} E(\nabla f(\theta)\nabla f(\theta)')E(\nabla^2f(\theta))^{-1} = (\theta-\theta_0)}$

But don’t panic! We recall from previous work that ${E( \nabla f(\theta) \nabla f(\theta))= E(\nabla^2 f(\theta))}$, the score function squared is, in expectation, the same as its derivative. So this simplifies very nicely into:

${-E(\nabla^2 f (\theta))^{-1} = (\theta - \theta_0 )^2}$

Then since we know that the first step, ${E(\theta - \theta_0 ) = 0}$, we have a mean of our estimation. We have a variance of our process, ${-E(\nabla^2 f(\theta))^{-1}=(\theta-\theta_0)^2}$. So by the central limit theorem we know that:

${\sqrt{N}(\theta - \theta_0)^2 \sim N(0, -E(\nabla^2 f(\theta))^{-1} )}$

I haven’t been posting much about how to use R, and simultaneously, I discovered a problem for which my intuition was suffering. Without further ado, I present to you a Prelim problem with the solution, as assisted by R.

Consider the latent variable model:

${y_i=\alpha d_i+\beta x_i + u_i, i=1,2,3...,n}$

Here, ${x_i}$ is exogenous and independent from ${u_i}$, and ${d_i}$ is a dummy variable indicating whether a person participates in a training program. The complication is that ${d_i}$ is endogenous (No! The crowd recoils in horror!). Specifically:

${d_i=1\{d^{*}_i>0\}}$, where ${d^{*}_i=w_i\gamma+v_i}$

Grant that ${u_i,v_i}$ are both independent of ${x_i,w_i}$. Suppose that:

${u_i=\delta v_i +\xi_i}$

Where ${u_i\sim N(0,1),\xi_i\sim N(0,\sigma^2_\xi)}$, and ${\xi_i}$ and ${v_i}$ are independent. You will only be permitted iid observations of ${x_i,y_i,d_i,w_i}$. Begin by deriving ${E(y|d=1,x,w)}$.

Ah, yes. The warm up question…

${E(y|x,w,d=1)=\alpha+\beta x_i + \delta E(v_i|d_i=1) + E(\xi_i)}$ ${E(y|x,w,d=1)=\alpha+\beta x_i + \delta E(v_i|w_i\gamma+v_i>0)}$ ${E(y|x,w,d=1)=\alpha+\beta x_i + \delta E(v_i|v_i>-w_i\gamma)}$ ${E(y|x,w,d=1)=\alpha+\beta x_i + \delta \sigma_\xi E(\frac{v_i}{\sigma_\xi}|\frac{v_i}{\sigma_\xi}>\frac{-w_i\gamma}{\sigma_\xi})}$

One can also take first derivatives with respect to x, to show that this is biased if we simply use OLS to estimate it.

Given your answer to the first question, show me how you would obtain consistant estimates of ${\beta}$ given access to Probit and OLS.

Don’t mind if I do!

N<-100000
w<-matrix(rnorm(N,0,1))
x<-matrix(rnorm(N,0,1))
v<-rnorm(N,0,1)
xi<-rnorm(N,0,1)
gamma<-rnorm(1,0,1)
alpha<-rnorm(1,0,1)
beta<-rnorm(1,0,1)
delta<-rnorm(1,0,1)
dstar<-matrix(w*gamma+v)
d<-matrix(rep(0,N))
for(i in 1:N){
if(dstar[i,]>0){d[i,]<-1}
if(dstar[i,]<0){d[i,]<-0}
}
u<-delta*v+xi
y<-alpha*d+x*beta+u

This generates the data very nicely, but makes some assumptions about all the variances. They’re all 1. So be it. I could replace them with random numbers, but this is not particularly exiting in this demonstration. The concept is that we probit the first stage using both w and x (while x is irrelevant, in application we would use it anyway). Then we use the first stage as an estimate of the probability of receiving alpha, then use that estimated probability in a second stage. Since the estimated probability is uncorrelated with the error terms ${v_i}$ and ${\xi_i}$, we expect to see unbiased estimates.

#Q2-Instrumenting
phase1<-glm(d ~ 0 + w + x, family = binomial(link = "probit"))
paste(coef(phase1)[1], gamma, sep=" =? ")
dhat<-predict(phase1)
phase2<-lm(y ~ 0 + dhat + x)
paste(coef(phase2)[1],"=?", alpha, ",", coef(phase2)[2], "=?", beta, sep="")

We will notice such an estimation is unbiased for beta, but not alpha. This might strike us as strange, except for that y actually has such a relationship that we have failed to model. The coefficient on ${\alpha}$ will be biased in the direction of ${\delta}$. Note that the estimate is missing any attempt to compensate for ${\delta v_i}$. This is correlated with . I am currently trying to work out the size of the bias, but I think I need to use MLE to do it, my basic techniques are struggling here. We can actually just do OLS directly, if we so choose, because we have assumed X is uncorrelated with W and therefore di.

Then the question was: Can we estimate this with least squares?

#Q3-2SLS
phase1<-lm(d ~ 0 + w + x)
paste(coef(phase1)[1], gamma, sep=" =? ")dhat<-predict(phase1)
phase2<-lm(y ~ 0 + dhat + x)
paste(coef(phase2)[1],"=?", alpha, ",", coef(phase2)[2], "=?", beta, sep="")

The proof for the unbiasedness of linear probability models (the first stage, not the whole model) begins by writing the pdf of the function:

${P(d|w)\sim \frac{\phi(\frac{w\gamma}{\sigma^2_v})}{1-\Phi(\frac{w\gamma}{\sigma^2_v})}=\lambda(\frac{w\gamma}{\sigma^2_v})}$ This is also called the inverse mills ratio. Taking logs and maximizing with respect to ${\gamma, \sigma_v^2}$, we get:

${\frac{d}{d\gamma}\ln(\lambda(\frac{w\gamma}{\sigma^2_v}))=\frac{1}{\lambda(\frac{w\gamma}{\sigma^2_v})}\lambda(\frac{w\gamma}{\sigma^2_v})(\frac{w\gamma}{\sigma^2_v}-\lambda(\frac{w\gamma}{\sigma^2_v}))\frac{w_i}{\sigma_v^2}=0}$

${\hat{\gamma}=\lambda(\frac{w\gamma}{\sigma^2_v})\frac{\sigma^2_v}{w}=E(\frac{d_i*\sigma^2_v}{w_i})}$

${\frac{d}{d\sigma^2_v}\ln(\lambda(\frac{w\gamma}{\sigma^2_v}))=\frac{1}{\lambda(\frac{w\gamma}{\sigma^2_v})}\lambda(\frac{w\gamma}{\sigma^2_v})(\frac{w\gamma}{\sigma^2_v}-\lambda(\frac{w\gamma}{\sigma^2_v}))w_i*\gamma=0}$

${\hat{\sigma}^2_v=\lambda(\frac{w\gamma}{\sigma^2_v})\frac{1}{w\gamma}=E(\frac{d_i}{w_i\gamma})}$

Unfortunately, we can only estimate these up to a scale. But if we normalize ${\sigma_v^2}$ to 1, then we can determine the scale, and ${\gamma=E(d)/E(w)}$ which is the same as the linear relationship, given iid errors. If the first stage is unbiased, and after estimating, ${\hat{d}_i}$ uncorrelated with ${v}$, then the error term ${v}$ no longer biases the estimate.

Is the 2SLS asymtotically efficient? If so, can you do it better?

#Q4-CFT
cfa1<-glm(d ~ 0 + w + x,family=binomial(link="probit"))
paste(coef(cfa1)[1], gamma, sep=" =? ")
yhat<-predict(cfa1,type="response")
vhat<-d*yhat-(1-d)*(1-yhat))
cfa2<-lm(y~0+d+x+vhat)
paste(coef(cfa2)[1],"=?",alpha,",", coef(cfa2)[2],"=?",beta,",",coef(cfa2)[3],"=?",delta*sd(v),sep="")

It’s unbiased, but not efficient. The 2SLS error term, ${\delta v_i+\xi_i}$ is larger than necessary, since we know something about ${\delta v_i}$. This control function approach is the most efficient estimator- it gives us a consistent estimator of ${\delta}$!. The trickest part of estimating the equation is noticing that our estimate of ${\hat{v}_i}$ cannot directly be the residuals from the first stage (trust me, I tried!), instead, the errors are actually determined by the generalized residual: the difference between the estimated probability of success (given success actually occured!) and estimated probability of failure (given that failure actually occurred!). The point is that the residual error, if correctly specified, is an estimate of source of the endogeniety. If you use it correctly, you are actually estimating the endogeniety. If ${\hat{\delta}}$ is insignificantly different than 0, it’s likely that the endogeniety is not an issue.

Also, some thanks to my source.

This old prelim question comes with a large number of preloaded information, but the theory is ultimately not important. Don’t be afraid of theory from other subfields- you’re in econometrics now, the ugliest, nastiest one of the bunch. Let us say you have an equation:

${\begin{bmatrix}\Delta y_{t}\\\Delta m_{t}\\\end{bmatrix}=\begin{bmatrix}\mu_{y}\\\mu_{m}\\\end{bmatrix}+\begin{bmatrix}\theta_{yy}(L)&\theta_{ym}(L)\\\theta_{my}(L)&\theta_{mm}(L)\\\end{bmatrix}\begin{bmatrix}\epsilon_{yt}\\\epsilon_{mt}\\\end{bmatrix}}$

Where (L) represents that the item is part of an infinitely long polynomial of lags.

If this is all in log specifications, the long run elasticity of M with respect to Y is ${ \gamma_{ym}=\frac{\theta_{my}}{\theta{yy}}}$

${\begin{bmatrix}1&-\lambda_{ym}\\ -\lambda_{my}&1\\ \end{bmatrix} \begin{bmatrix} \Delta y_{t}\\ \Delta m_{t}\\ \end{bmatrix} = \begin{bmatrix}c_y&\alpha_{1,yy}&\alpha_{1,ym}\\c_m&\alpha_{1,my}&\alpha_{1,mm}\\\end{bmatrix}\begin{bmatrix}1\\\Delta y_{t-1}\\\Delta m_{t-1}\\\end{bmatrix}+\begin{bmatrix}\epsilon_{yt}\\\epsilon_{mt}\\\end{bmatrix}}$

If this is all in log specifications, the long run elasticity of M with respect to Y is ${\gamma=\frac{\theta_{my}(1)}{\theta{yy}(1)}}$.

Show how this could be consistently estimated when this coefficient is known. If you need, assume that in this case ${\gamma}$ is 1. If we rename the merely complicating factors associating the mean of Y and M:

${\begin{bmatrix} \Delta y_{t}\\ \Delta m_{t}\\ \end{bmatrix} =\begin{bmatrix}\mu_{y}\\ \mu_{m}\\\end{bmatrix}+\frac{1}{1-\lambda_{ym}\lambda_{my}}\begin{bmatrix}1&\lambda_{ym}\\ \lambda_{my}&1\\ \end{bmatrix}\begin{bmatrix}\epsilon_{yt}\\\epsilon_{mt}\\\end{bmatrix}}$

${\begin{bmatrix} \Delta y_{t}\\ \Delta m_{t}\\\end{bmatrix}= \begin{bmatrix}\mu_{y}\\\mu_{m}\\\end{bmatrix}+ \begin{bmatrix}\frac{1}{1-\lambda_{ym} \lambda_{my}} \epsilon_{yt}+\frac{\lambda_{ym}}{{1-\lambda_{ym} \lambda_{my}}}\epsilon_{mt}\\\frac{\lambda_{my}}{1-\lambda_{ym} \lambda_{my}} \epsilon_{yt}+\frac{1}{{1-\lambda_{ym} \lambda_{my}}}\epsilon_{mt}\\\end{bmatrix}}$

Consequently, ${\theta_{yy}=\frac{1}{1-\lambda_{ym} \lambda_{my}}}$, since there are no lags on the process, and ${\theta_{my}=\frac{\lambda_{my}}{1-\lambda_{ym} \lambda_{my}}}$. The ratio of the two is: ${\lambda_{my}=1}$, so this simplifies some of our problems rather dramatically. Note ${\lambda_{my}}$ cannot be one as a result of this, else we divide by zero! Nor can their product have a modulus greater than one.

So our specification is thusly:

${\begin{bmatrix}1&-\lambda_{ym}\\ -1&1\\ \end{bmatrix} \begin{bmatrix} \Delta y_{t}\\ \Delta m_{t}\\ \end{bmatrix} = \begin{bmatrix}c_y&\alpha_{1,yy}&\alpha_{1,ym}\\c_m&\alpha_{1,my}&\alpha_{1,mm}\\\end{bmatrix}\begin{bmatrix}1\\\Delta y_{t-1}\\\Delta m_{t-1}\\\end{bmatrix}+\begin{bmatrix}\epsilon_{yt}\\\epsilon_{mt}\\\end{bmatrix}}$

We estimate the model as such:

${\begin{bmatrix} \Delta y_{t}\\ \Delta m_{t}\\\end{bmatrix}= \begin{bmatrix}\frac{1}{1-\lambda_{ym}}&\frac{\lambda_{ym}}{{1-\lambda_{ym}}}\\\frac{1}{1-\lambda_{ym}} &\frac{1}{{1-\lambda_{ym}}}\\\end{bmatrix}\begin{bmatrix}c_y&\alpha_{1,yy}&\alpha_{1,ym}\\c_m&\alpha_{1,my}&\alpha_{1,mm}\\\end{bmatrix}\begin{bmatrix}1\\\Delta y_{t-1}\\\Delta m_{t-1}\\\end{bmatrix}+ \begin{bmatrix}\frac{1}{1-\lambda_{ym}} \epsilon_{yt}+\frac{\lambda_{ym}}{{1-\lambda_{ym}}}\epsilon_{mt}\\\frac{1}{1-\lambda_{ym}} \epsilon_{yt}+\frac{1}{{1-\lambda_{ym}}}\epsilon_{mt}\\\end{bmatrix}}$

${\begin{bmatrix} \Delta y_{t}\\ \Delta m_{t}\\\end{bmatrix}= \pi \begin{bmatrix}1\\ \Delta y_{t-1}\\ \Delta m_{t-1}\\ \end{bmatrix} + \begin{bmatrix} \frac{1}{1-\lambda_{ym}} \epsilon_{yt}+\frac{\lambda_{ym}}{{1-\lambda_{ym}}}\epsilon_{mt}\\\frac{1}{1-\lambda_{ym}} \epsilon_{yt}+\frac{1}{{1-\lambda_{ym}}}\epsilon_{mt}\\ \end{bmatrix}}$

So to optimize this, we need to estimate: ${\sigma^2_u,\sigma^2_y,\lambda_{ym},\pi }$ and we can do this with MLE, noting that the likelihood function for the joint combination of our variables of interest are:

${ \mathcal{L}(\Delta y_t,\Delta m_t|past)=\frac{1}{(2\pi|\Omega|)^{-1/2}}+e^{-(y-\pi x)(2\Omega)^{-1}(y-\pi x)'} }$

Where: ${\Omega= \begin{bmatrix} \frac{1}{1-\lambda_{ym}} \epsilon_{yt}+\frac{\lambda_{ym}}{{1-\lambda_{ym}}}\epsilon_{mt}\\\frac{1}{1-\lambda_{ym}} \epsilon_{yt}+\frac{1}{{1-\lambda_{ym}}}\epsilon_{mt}\\ \end{bmatrix}\begin{bmatrix} \frac{1}{1-\lambda_{ym}} \epsilon_{yt}+\frac{\lambda_{ym}}{{1-\lambda_{ym}}}\epsilon_{mt}\\\frac{1}{1-\lambda_{ym}} \epsilon_{yt}+\frac{1}{{1-\lambda_{ym}}}\epsilon_{mt}\\ \end{bmatrix}'=\begin{bmatrix}\frac{\sigma^2_{m}}{({1-\lambda_{ym}})^2}+\frac{\sigma^2_{y}\lambda_{ym}^2}{({1-\lambda_{ym}})^2}&\frac{\sigma^2_{m}}{({1-\lambda_{ym}})^2}+\frac{\sigma^2_{y}\lambda_{ym}}{({1-\lambda_{ym}})^2}\\\frac{\sigma^2_{m}}{({1-\lambda_{ym}})^2}+\frac{\sigma^2_{y}\lambda_{ym}}{({1-\lambda_{ym}})^2}&\frac{\sigma^2_{m}}{({1-\lambda_{ym}})^2}+\frac{\sigma^2_{y}\lambda_{ym}}{({1-\lambda_{ym}})^2}\\\end{bmatrix}}$.

We know this has a solution because the rank (${B^{-1}*\Gamma}$ appears to have linearly independent rows unless we are struck with a coincidental multicolinearity) and order conditions (there are 2 equations and 1 variable is excluded, so 2-1=1, identification occurs) are satisfied. Here’s a good article on rank and order conditions, which are a pain in the neck. Ultimately, you should probably reduce the system and estimate it to a scale anyway, if some equations are identified.

We take logs of ${\mathcal{L}}$, differentiate it with respect to: ${\sigma^2_u,\sigma^2_y,\lambda_{ym},\pi }$, and set the derivatives equal to zero to estimate (and the second derivatives must be negative!).

In either case, if we know ${\gamma_{my}=\lambda_{my}}$ is equal to some number, we then estimate ${\gamma_{ym}=\frac{\theta_{ym}(1)}{\theta_{mm}(1)}=\lambda_{ym}}$

Assuming I haven’t made a mistake, the standard error of ${\theta_{ym}}$ could be found by one over the inverse of the second derivative of the score function wrt theta, the information matrix.

Post Script:
I’d like to send a moment to thank In Theory for developing Latex2WP, which I have recently discovered as infinitely useful to me.