When it comes to data analysis, if you’re anything like me you probably work across several different platforms. Depending on your analytical needs you might get basic descriptives from Excel, but use programs like Stata and R for more complex routines.
One of the frustrations that go with this form of data science is the need to transfer data from one program to another.
It’s straightforward to export data in .csv format and then import the data into a different program. But you may lose some important formatting, such as variable and value labels in the data set.
Programs such as Stat Transfer make it easy to convert data from one program format to another. But as with the .csv export, it takes valuable time to convert and transfer the data. And you end up with multiple copies of the same data set cluttering up your machine.
Wouldn’t it be way easier if you could just call one data analysis program from inside another? As a Stata user, I’ve often wished I could perform a quick analysis in R without having to go through all of this effort.
In this article, I’ll show you a method for writing your R code, running R, feeding it data, returning R output in a text file, and returning any changes in your dataset to Stata…all while working in Stata’s native environment. I’m doing this on a PC, so Mac users will need to forgive me.
Setting up the Data
I like to create the simplest path name I can, so for testing purposes I create a Stata folder right on the C:/ drive.
clear
set more off
log close _all
cd "c:/Stata/"
Once that’s done, I create a fake data set with a known correlation structure using Stata’s -corr2data- command.
set obs 100
matrix c = (1,-.5,0 \ -.5,1,.4 \ 0,.4,1)
corr2data x y z, corr(c)
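If you want to convince yourself the fake data came out right, a quick check (my suggestion here, not part of the original script) is to run Stata’s -correlate- command on the three new variables. Because -corr2data- builds data to match the requested correlations, the table should line up with matrix c.

```stata
* Sanity check: compare the sample correlations with the target matrix c
correlate x y z
matrix list c
```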
Now save your test data set in the temp folder you created above:
save "testout.dta"
file close _all
Writing the R Program
So much for the foreplay…now let’s have fun! Using Stata’s -file- command, we create a new file to hold the R code we want to run. This new file will be called test.R, but in Stata we’ll refer to it by the alias rcode.
file open rcode using test.R, write replace
Next, we tell Stata that we want to write something to our new file. That something is a list of R commands that will set a new working directory (c:/Stata), read in our dataset, run analyses, and return the augmented data.
Since we want to write a text file for R to run, we’ll need to enclose each command in Stata’s compound double quotes, `" and "' (notice the combination of single and double quotes). The quote combo is necessary because the R commands themselves contain double quotes.
Also notice that we need to end each line of the text writing process with _newline, except the last line. This tells Stata to create a new line in the text file. Finally, we finish writing the R program text file by using the -file close- command.
file write rcode ///
    `"setwd("c:/Stata/")"' _newline ///
    `"library(foreign)"' _newline ///
    `"data<-data.frame(read.dta("testout.dta"))"' _newline ///
    `"attach(data)"' _newline ///
    `"x2<-x*2"' _newline ///
    `"data2<-cbind(data,x2)"' _newline ///
    `"write.dta(data2,"testin.dta")"'
file close rcode
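If the compound quotes look odd, here is a tiny illustration you can try on its own. It is only a sketch using -display-, not part of the script: it prints one of the R lines to the Results window so you can see that the inner double quotes survive.

```stata
* Compound double quotes (`" to open, "' to close) let the string
* contain ordinary double quotes.
display `"setwd("c:/Stata/")"'

* With plain double quotes, Stata would treat the second quote mark as the
* end of the string, so the line would not parse as intended:
* display "setwd("c:/Stata/")"
```

The first command should echo the R line exactly as it will appear in test.R.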
Running R from Inside Stata
Stata can invoke an operating system window (i.e. a command prompt) using the -shell- command or, alternatively, the ! command. All you need to do is provide Stata with the complete path and filename of the program you want to run. Adding the code CMD BATCH tells R to run in batch mode. Finally, we run the R script by telling R to execute the contents of test.R.
shell "C:\Program Files\R\R-2.15.1\bin\x64\R.exe" CMD BATCH test.R
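Since R runs outside of Stata, Stata won’t tell you if the R script failed. One way to guard against that (an optional extra, not part of my original script) is to check that R actually wrote the file before you try to load it. This sketch uses Stata’s -confirm file- command and the file names from this example; the error message wording is just mine.

```stata
* Stop early if R did not produce the expected output file
capture confirm file "testin.dta"
if _rc {
    display as error "testin.dta not found; check test.Rout for R errors"
    exit 601
}
```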
Now we can read the output file from R back into Stata and summarize the changes to the dataset. I also clean up the directory by removing unneeded files using the -rm- command.
use testin.dta, clear
summarize
rm testout.dta
rm test.R
rm .RData
I leave the test.Rout file so we can see the log from R, including output and the run-time log.
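If you’d rather not leave Stata to read that log, one option (a small convenience I’m suggesting, assuming test.Rout is still in the working directory) is Stata’s -type- command, which prints a text file to the Results window.

```stata
* Show the R batch log inside Stata
type test.Rout
```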
Now you’ve got your original data (plus an extra variable) back in Stata, and you have a log of the R results from your script. If you want to use this example as a template to start calling R from Stata for your own analyses, I’m including the complete script and comments in the code box below (note: I added some -quietly- commands to keep your Stata log window a bit cleaner).
Let me know what you think about this, and happy coding!!
*// Set Working Directory
clear
set more off
log close _all
cd "c:/Stata/"

*// Create Data
set obs 100
matrix c = (1,-.5,0 \ -.5,1,.4 \ 0,.4,1)
corr2data x y z, corr(c)

*// Save in Stata (.dta) format for R to read
quietly: save "testout.dta"
quietly: file close _all

*// Write R Code
*// dependencies: foreign
quietly: file open rcode using test.R, write replace
quietly: file write rcode ///
    `"setwd("c:/Stata/")"' _newline ///
    `"library(foreign)"' _newline ///
    `"data<-data.frame(read.dta("testout.dta"))"' _newline ///
    `"attach(data)"' _newline ///
    `"x2<-x*2"' _newline ///
    `"data2<-cbind(data,x2)"' _newline ///
    `"write.dta(data2,"testin.dta")"'
quietly: file close rcode

*// Run R
quietly: shell "C:\Program Files\R\R-2.15.1\bin\x64\R.exe" CMD BATCH test.R

*// Read Revised Data Back to Stata
quietly: use testin.dta, clear
summarize

*// Clean up
rm testout.dta
rm test.R
rm .RData