How to Call R from Stata

When it comes to data analysis, if you’re anything like me, you probably work across several different platforms. Depending on your analytical needs, you might get basic descriptives from Excel but use programs like Stata and R for more complex routines.

One of the frustrations that come with working this way is the need to transfer data from one program to another.

It’s straightforward to export data in .csv format and then import it into a different program, but you may lose important formatting such as variable and value labels.

Programs such as Stat Transfer make it easy to convert data from one program’s format to another. But as with a .csv export, converting and transferring the data takes valuable time, and you end up with multiple copies of the same data set cluttering up your machine.

Wouldn’t it be way easier if you could just call one data analysis program from inside another? As a Stata user, I’ve often wished I could perform a quick analysis in R without having to go through all of this effort.

In this article, I’ll show you a method for writing your R code, feeding R your data, running it, saving R’s output to a text file, and bringing any changes to your dataset back into Stata, all while working in Stata’s native environment. I’m doing this on a Windows PC, so Mac users will need to forgive me.

Setting up the Data

I like to use the simplest path name I can, so for testing purposes I create a Stata folder right on the C: drive.

clear
set more off
log close _all
cd "c:/Stata/"

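Once that’s done, I create a fake data set with a known correlation structure using Stata’s -matrix- and -corr2data- commands. corr2data generates variables whose sample correlations match the matrix c exactly, with means of 0 and standard deviations of 1 by default.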

set obs 100
matrix c = (1,-.5,0 \ -.5,1,.4 \ 0,.4,1)
corr2data x y z, corr(c)
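Before handing the data to R, a quick look at the sample statistics confirms the fake data came out as intended. This check is my own addition, not part of the original script: the printed correlations should match the entries of c (up to float rounding), and each variable should have mean 0 and standard deviation 1.

```stata
* Optional check, not part of the original script:
* the printed correlations should match the entries of matrix c
correlate x y z

* corr2data's defaults give each variable mean 0 and standard deviation 1
summarize x y z
```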

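Now save your test data set in the Stata folder you created above. The -file close _all- line closes any file handles left open by an earlier run, so the -file open- command in the next step won’t fail.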

save "testout.dta", replace
file close _all
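One caveat if you are on a newer Stata than the one used here: the read.dta() function in R’s foreign package only reads .dta files up to the Stata 12 format, while Stata 13 and later save in a newer format by default. In that case R stops with an error and never writes testin.dta. Here is a sketch of a version-aware save, assuming you keep using foreign on the R side:

```stata
* Assumes R reads the file with foreign::read.dta(), which understands
* .dta formats only up to Stata 12
if c(stata_version) >= 14 {
    * Stata 14 and later: saveold can target the Stata 12 format explicitly
    saveold "testout.dta", replace version(12)
}
else {
    * Stata 13's saveold writes Stata 12 format; older versions write older formats
    saveold "testout.dta", replace
}
```

On Stata 12 and earlier, the plain save shown above already produces a file that read.dta() can open.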

Writing the R Program

So much for the foreplay… now let’s have fun! Using Stata’s -file- command, we create a new file to hold the R code we want to run. The file on disk is called test.R, but inside Stata we refer to it by the file handle rcode.

file open rcode using test.R, write replace

Next, we tell Stata what to write to the new file: a short list of R commands that set the working directory (c:/Stata), load the foreign package, read in our dataset, create a new variable x2 (twice x), and write the augmented data back out as testin.dta.

Since we want to write a text file for R to run, we need to wrap each R command in Stata’s compound double quotes, which open with `" and close with "' (notice the combination of single and double quotes). The compound quotes are necessary because the R code itself contains double quotes.

Notice also that each line except the last ends with _newline, which tells Stata to start a new line in the text file, followed by ///, which continues the Stata command onto the next line (/// only works inside a do-file, not in the Command window). Finally, we finish writing the R program with the -file close- command.

file write rcode ///
	`"setwd("c:/Stata/")"' _newline ///
	`"library(foreign)"' _newline ///
	`"data<-data.frame(read.dta("testout.dta"))"' _newline ///
	`"attach(data)"' _newline ///
	`"x2<-x*2"' _newline ///
	`"data2<-cbind(data,x2)"' _newline ///
	`"write.dta(data2,"testin.dta")"'
file close rcode
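Because file write passes its text through Stata’s macro expander first, you can splice Stata locals (a variable list, say) straight into the R code. The flip side is that any $ R needs must be escaped as \$, since Stata would otherwise read $name as a global macro. The sketch below writes to a hypothetical demo.R through a hypothetical handle demo; the locals depvar and indep are also just for illustration. Running type test.R the same way is a handy check on what Stata actually wrote in the main example.

```stata
* Illustration only: demo.R, the handle demo, and both locals are hypothetical
local depvar "y"
local indep "x + z"

file open demo using "demo.R", write replace
* The locals depvar and indep are substituted before the line is written
file write demo `"fit <- lm(`depvar' ~ `indep', data = data)"' _newline
* A bare dollar sign would start a global macro name; the backslash keeps it literal
file write demo `"print(summary(fit)\$coefficients)"' _newline
file close demo

* Show the finished R script: the lm() call should contain y and x + z,
* and the print() line should contain a literal dollar sign
type demo.R
```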

Running R from Inside Stata

Stata can run operating system commands (i.e., through a command prompt) using the -shell- command or its shorthand, !. All you need to do is give Stata the complete path and filename of the program you want to run, in quotes because the path contains spaces. The CMD BATCH arguments tell R (not Windows) to run in batch mode: R executes the commands in test.R and writes the results to test.Rout. The path below is for R 2.15.1, so adjust it to match the version installed on your machine.

shell "C:\Program Files\R\R-2.15.1\bin\x64\R.exe" CMD BATCH test.R
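Two small changes can make this step easier to maintain. Keeping the R path in a local (Rexe is a name I made up) means you only edit it in one place. And R CMD BATCH saves a .RData workspace by default; passing --no-save turns that off, in which case the later rm .RData line should be dropped because there will be no file to remove. A sketch:

```stata
* Assumption: edit this path to match the R version installed on your machine
local Rexe "C:\Program Files\R\R-2.15.1\bin\x64\R.exe"

* The quotes are required because the path contains a space;
* --no-save stops R from writing a .RData workspace when it exits
shell "`Rexe'" CMD BATCH --no-save test.R
```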

Now we can read the dataset R created (testin.dta) back into Stata and summarize it to see the changes. I also clean up the directory by removing files I no longer need with the -rm- command (a synonym for -erase-).

use testin.dta, clear
summarize
rm testout.dta
rm test.R
rm .RData
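Before moving on, it’s worth confirming that R’s change really made the round trip. A minimal check, assuming the dataset just loaded from testin.dta is still in memory: x2 should equal twice x in every observation, and the original variables should still show the correlations from c.

```stata
* x2 was computed in R as x*2, so it should match 2*x to rounding error
assert reldif(x2, 2*x) < 1e-6

* x, y, and z should still have the correlation structure set by matrix c
correlate x y z
```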

I keep the test.Rout file so we can read R’s log, which includes the commands, their output, and run-time information.
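The shell call doesn’t tell Stata whether the R script succeeded. If R fails (a wrong path, a missing package, an unreadable .dta file), the use command simply reports that testin.dta was not found, and a testin.dta left over from an earlier run could even be loaded by mistake. One way to guard against both, sketched with the same file names and R path as above:

```stata
* Remove any testin.dta from a previous run so stale output can't be loaded
capture erase "testin.dta"

shell "C:\Program Files\R\R-2.15.1\bin\x64\R.exe" CMD BATCH test.R

* Stop with a pointer to R's log if the script did not produce testin.dta
capture confirm file "testin.dta"
if _rc {
    display as error "testin.dta not found; R's log (test.Rout) follows"
    type test.Rout
    exit 601
}
use "testin.dta", clear
```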

Now you’ve got your original data (plus an extra variable) back in Stata, and you have a log of the R results from your script. If you want to use this example as a template to start calling R from Stata for your own analyses, I’m including the complete script and comments in the code box below (note: I added some -quietly- commands to keep your Stata log window a bit cleaner).

Let me know what you think about this, and happy coding!!

*// Set Working Directory
clear
set more off
log close _all
cd "c:/Stata/"

*// Create Data
set obs 100
matrix c = (1,-.5,0 \ -.5,1,.4 \ 0,.4,1)
corr2data x y z, corr(c)

*// Save test data in Stata format
quietly: save "testout.dta", replace
quietly: file close _all

*// Write R Code
*// dependencies: foreign
quietly: file open rcode using test.R, write replace
quietly: file write rcode ///
	`"setwd("c:/Stata/")"' _newline ///
	`"library(foreign)"' _newline ///
	`"data<-data.frame(read.dta("testout.dta"))"' _newline ///
	`"attach(data)"' _newline ///
	`"x2<-x*2"' _newline ///
	`"data2<-cbind(data,x2)"' _newline ///
	`"write.dta(data2,"testin.dta")"'
quietly: file close rcode

*// Run R
quietly: shell "C:\Program Files\R\R-2.15.1\bin\x64\R.exe" CMD BATCH test.R

*// Read Revised Data Back to Stata
quietly: use testin.dta, clear
summarize

*// Clean up
rm testout.dta
rm test.R
rm .RData

7 thoughts on “How to Call R from Stata”

  1. An interesting application of this could be to cross-validate models by running them through both their R and Stata estimators, with matching parameters. I have done this manually to detect differences in model defaults between gllamm and xtmelogit in Stata, and glmer in R, but I never took the time to chain the code as your technique makes possible.

    • You make a great point, Statauser. It looks like -rsource- was recently added to the SSC archive. Thank you for sharing. While this certainly makes it rather simple to run R code in Stata, users such as myself may want to keep all of their code for a project (both Stata and R) in a single file. Do you know if -rsource- will run inline code from a Stata .do file, or will it only run R code written in an external file? If it requires an external file, users may still want to use the method I describe here so they can embed the R code in the Stata .do file.

  2. Is there a way to pass the contents of a stata macro (say, a variable list, or a matrix of parameter estimates) to R using this method? I’ve been trying and have been unsuccessful.

  3. Hi!

    This is a really useful thing to be able to do.

    I did find an issue with it, though: the $ used to refer to a particular column of the Stata data frame has to be escaped, otherwise Stata replaces it. This could be got around using attach, of course.

    Another thing, although I guess this is more due to Hmisc:::stata.get, is that R changes the column names from e.g. date_of_birth to date.of.birth

    Cheers!

  4. Hello, thanks a lot for posting this. I need to implement a similar procedure, so I wanted to start by running your code, but it doesn’t seem to produce testin.dta, and I get the error “file testin.dta not found”. What might I be doing wrong?
