DTVEM Package Illustration

Loading in the Dataset

This code will illustrate the R package (DTVEM) with simulated data available in the DTVEM package.

Click here to download and install the DTVEM package.

First load the DTVEM package.

library(DTVEM)

Next load the simulated data included in the DTVEM package, called exampledat1.

data(exampledat1)

Get a look at the file structure.

head(exampledat1)
##   Time         X ID
## 1    1 -1.076422  1
## 2    2 -1.904713  1
## 3    3  1.307456  1
## 4    4        NA  1
## 5    5 -4.224493  1
## 6    6        NA  1

Illustrating the Inadequacies of the Partial Autocorrelation Function

Next we want to explore lags with typical time series techniques. Run a partial autocorrelation function for the variable for each ID.

#DEFINE THE NUMBER OF UNIQUE IDs
uniqueIDs=unique(exampledat1$ID)
pacfmatrix=matrix(NA,nrow=length(uniqueIDs),ncol=10) #SAVE THE PACF RESULTS FOR THE FIRST 10 lags
#CREATE A FOR LOOP TO RUN OVER EACH ID
for(i in 1:length(uniqueIDs)){
  temporarydata=exampledat1[exampledat1$ID==uniqueIDs[i],] #EXTRACT THE DATA FOR A GIVEN INDIVIDUAL
  pacfout=pacf(temporarydata[,"X"],na.action = na.pass,plot=FALSE) #RUN THE PARTIAL AUTOCORRELATION FUNCTION WITHIN R, IGNORE THE MISSING DATA
  pacfmatrix[i,]=pacfout$acf[1:10]
}

Okay, we have the results, now we have to plot them. Each grey line will represent a PACF estimate for a given individual. The black lines will represent the mean PACF estimate.

plot(1:10,pacfmatrix[1,],xlab="Lag",ylab="PACF Estimate",type="l",col="grey",ylim=c(-1,1))
#FOR EACH UNIQUE ID
for(i in 1:length(uniqueIDs)){
  lines(1:10,pacfmatrix[i,],col="grey")
}
lines(1:10,colMeans(pacfmatrix,na.rm = TRUE)) #PLOT THE AVERAGE LAGGED ESTIMTES
poscrit=2/sqrt(14-1)
negcrit=-2/sqrt(14-1)
lines(1:10,rep(poscrit,10),lty=2) #PLOT POSITIVE CRITICAL VALUE
lines(1:10,rep(negcrit,10),lty=2) #PLOT NEGATIVE CRITICAL VALUE

The results look like for any given individual the PACF results are highly unreliable. However, the PACF does suggest that there are potentially significantly negative lags at lag 6, 7, 8, 9, and 10.

Showing the strength of DTVEM

Let’s see what DTVEM has to say about this.

out=LAG("X",differentialtimevaryingpredictors=c("X"),outcome=c("X"),data=exampledat1,ID="ID",Time="Time",k=9,standardized=FALSE,predictionstart = 1,predictionsend = 10,predictionsinterval = 1)
## OpenMx version: 2.8.3 [GIT v2.8.3]
## R version: R version 3.4.4 (2018-03-15)
## Platform: x86_64-w64-mingw32
## Default optimiser: CSOLNP
## NPSOL-enabled?: Yes
## OpenMP-enabled?: No
## Setting up the data for variable #1 of 1.
## Completed stage 1 of DTVEM, now identifying the significant peaks/valleys.
## DTVEM stage 1 identified the following variables as potential peaks or valleys: XlagonXlag1, XlagonXlag2, XlagonXlag3, XlagonXlag4 
## Removing group-based time of residuals
## 
## Begin fit attempt 1 of at maximum 11 tries
## 
##  Lowest minimum so far:  3274.70348452396
## 
## Solution found
## 
## Start values from best fit:
## 0.225059455018684,-0.237669113091979,0.275143492004439,0.0240822052277956,2.08939869379243
## Summary of multisujbject 
##  
## free parameters:
##          name matrix row col    Estimate  Std.Error A
## 1 XlagonXlag1   s1.A   1   1  0.22505946 0.05061653  
## 2 XlagonXlag2   s1.A   1   2 -0.23766911 0.04816207  
## 3 XlagonXlag3   s1.A   1   3  0.27514349 0.04666714  
## 4 XlagonXlag4   s1.A   1   4  0.02408221 0.05312983  
## 5      Xresid   s1.Q   1   1  2.08939869 0.10551147  
## 
## Model Statistics: 
##                |  Parameters  |  Degrees of Freedom  |  Fit (-2lnL units)
##        Model:              5                    895              3274.703
##    Saturated:             NA                     NA                    NA
## Independence:             NA                     NA                    NA
## Number of observations/statistics: 1400/900
## 
## Information Criteria: 
##       |  df Penalty  |  Parameters Penalty  |  Sample-Size Adjusted
## AIC:       1484.703               3284.703                       NA
## BIC:      -3208.880               3310.925                 3295.041
## CFI: NA 
## TLI: 1   (also known as NNFI) 
## RMSEA:  0  [95% CI (NA, NA)]
## Prob(RMSEA <= 0.05): NA
## To get additional fit indices, see help(mxRefModels)
## timestamp: 2018-08-18 10:58:28 
## Wall clock time: 7.172824 secs 
## optimizer:  CSOLNP 
## OpenMx version number: 2.8.3 
## Need help?  See help(mxSummary) 
## 
## [1] "Variables that should be controlled for are: XlagonXlag1"
## [2] "Variables that should be controlled for are: XlagonXlag2"
## [3] "Variables that should be controlled for are: XlagonXlag3"

## Completed stage 1 of DTVEM, now identifying the significant peaks/valleys.
## Completed stage 2 of DTVEM (wide format). Now passing to Stage 1.5. Controlling for the following lags in the varying-coefficient model:  XlagonXlag1, XlagonXlag2, XlagonXlag3
## Removing group-based time of residuals
## 
## Begin fit attempt 1 of at maximum 11 tries
## 
##  Lowest minimum so far:  3274.90917415861
## 
## Solution found
## 
## Start values from best fit:
## 0.236008799391563,-0.248531920453681,0.2835748499677,2.08271335137451

## Summary of multisujbject 
##  
## free parameters:
##          name matrix row col   Estimate  Std.Error A
## 1 XlagonXlag1   s1.A   1   1  0.2360088 0.04461888  
## 2 XlagonXlag2   s1.A   1   2 -0.2485319 0.04220480  
## 3 XlagonXlag3   s1.A   1   3  0.2835748 0.04300307  
## 4      Xresid   s1.Q   1   1  2.0827134 0.10448100  
## 
## Model Statistics: 
##                |  Parameters  |  Degrees of Freedom  |  Fit (-2lnL units)
##        Model:              4                    896              3274.909
##    Saturated:             NA                     NA                    NA
## Independence:             NA                     NA                    NA
## Number of observations/statistics: 1400/900
## 
## Information Criteria: 
##       |  df Penalty  |  Parameters Penalty  |  Sample-Size Adjusted
## AIC:       1482.909               3282.909                       NA
## BIC:      -3215.919               3303.886                  3291.18
## CFI: NA 
## TLI: 1   (also known as NNFI) 
## RMSEA:  0  [95% CI (NA, NA)]
## Prob(RMSEA <= 0.05): NA
## To get additional fit indices, see help(mxRefModels)
## timestamp: 2018-08-18 10:58:41 
## Wall clock time: 9.826756 secs 
## optimizer:  CSOLNP 
## OpenMx version number: 2.8.3 
## Need help?  See help(mxSummary) 
## 
## [1] "Variables that should be controlled for are: XlagonXlag1"
## [2] "Variables that should be controlled for are: XlagonXlag2"
## [3] "Variables that should be controlled for are: XlagonXlag3"
## This is the #2 out of a possible 10 passes 
## Completed loop between 1.5 and 2
## DTVEM analysis completed,congradulations!
## Contact the author - Nick Jacobson: njacobson88@gmail.com if you have any questions.
## Summary of multisujbject 
##  
## free parameters:
##          name matrix row col   Estimate  Std.Error A
## 1 XlagonXlag1   s1.A   1   1  0.2360088 0.04461888  
## 2 XlagonXlag2   s1.A   1   2 -0.2485319 0.04220480  
## 3 XlagonXlag3   s1.A   1   3  0.2835748 0.04300307  
## 4      Xresid   s1.Q   1   1  2.0827134 0.10448100  
## 
## Model Statistics: 
##                |  Parameters  |  Degrees of Freedom  |  Fit (-2lnL units)
##        Model:              4                    896              3274.909
##    Saturated:             NA                     NA                    NA
## Independence:             NA                     NA                    NA
## Number of observations/statistics: 1400/900
## 
## Information Criteria: 
##       |  df Penalty  |  Parameters Penalty  |  Sample-Size Adjusted
## AIC:       1482.909               3282.909                       NA
## BIC:      -3215.919               3303.886                  3291.18
## CFI: NA 
## TLI: 1   (also known as NNFI) 
## RMSEA:  0  [95% CI (NA, NA)]
## Prob(RMSEA <= 0.05): NA
## To get additional fit indices, see help(mxRefModels)
## timestamp: 2018-08-18 10:58:41 
## Wall clock time: 9.826756 secs 
## optimizer:  CSOLNP 
## OpenMx version number: 2.8.3 
## Need help?  See help(mxSummary) 
## 
##          name matrix row col   Estimate  Std.Error lbound ubound lboundMet
## 1 XlagonXlag1   s1.A   1   1  0.2360088 0.04461888     NA     NA     FALSE
## 2 XlagonXlag2   s1.A   1   2 -0.2485319 0.04220480     NA     NA     FALSE
## 3 XlagonXlag3   s1.A   1   3  0.2835748 0.04300307     NA     NA     FALSE
##   uboundMet     tstat       pvalue  sig
## 1     FALSE  5.289438 1.226928e-07 TRUE
## 2     FALSE -5.888712 3.892167e-09 TRUE
## 3     FALSE  6.594293 4.272871e-11 TRUE

Unlike the PACF, DTVEM identifies the correct lag structure, which is based on true positive lags at lag 1, true negative lags at lag 2, and true positive lags at lag 3.

Let’s plot the 3-Dimensional 1st phase of the results as an illustration.

library(mgcv) #MGCV is used to plot the vary-coefficient model in stage 1.
## Loading required package: nlme
## This is mgcv 1.8-23. For overview type 'help("mgcv-package")'.
vis.gam(out$stage1out$mod,xlab="Time Differences",ylab="X lagged",zlab="Estimate",theta=-30,ticktype="detailed") #theta is used to rotate the output, 

This clearly shows a positive lag at 1, a negative lag at 2, and a positive lag at 3.

Okay, the results of the final state-space model were:

out$OpenMxstage2out$OpenMxout
##          name matrix row col   Estimate  Std.Error lbound ubound lboundMet
## 1 XlagonXlag1   s1.A   1   1  0.2360088 0.04461888     NA     NA     FALSE
## 2 XlagonXlag2   s1.A   1   2 -0.2485319 0.04220480     NA     NA     FALSE
## 3 XlagonXlag3   s1.A   1   3  0.2835748 0.04300307     NA     NA     FALSE
##   uboundMet     tstat       pvalue  sig
## 1     FALSE  5.289438 1.226928e-07 TRUE
## 2     FALSE -5.888712 3.892167e-09 TRUE
## 3     FALSE  6.594293 4.272871e-11 TRUE

So, X predicts significantly itself positively 1 lag later, negatively 2 hours later, and positively again 3 hours later.

Related