Expressing Pi in Your Favorite Statistical Software

To celebrate Pi Day, and provide some (hopefully) useful knowledge, I’ll show you how to represent Pi in your favorite statistical packages.

If your favorite isn’t on the list, I’m sorry…I can only do so much.

One thing to keep in mind about these examples is that most software packages use floating point arithmetic (FPA).

I won’t get into exactly what this means in this post. Just know that FPA will generally result in some rounding errors with highly precise numbers (i.e., lots of decimal places). However, below 16 decimal places, you can be reasonably assured that these packages return the same values.

Excel

=PI()

Note this function does not have any arguments. The value returned is accurate to 14 decimal places.

R

>pi

This returns pi to 6 decimal places. If you need more precision, you can get up to 15 decimal places with the following code (the leading integer 3 is the 16th digit):

>options(digits=16)
>pi

The digits option can go as high as 22, but R’s built-in value of pi is only accurate to 15 decimal places (see http://www.joyofpi.com/pi.html).

For greater precision, I recommend using the Rmpfr package (its Const() function can return pi at a specified bit precision). I set it to 256-bit precision and achieved accuracy up to 75 decimal places.

Stata

. di c(pi)
or
. di _pi

As with R, the default precision is 6 decimal places. If you need to increase the precision, you can format the constant for up to 16 decimal places.

. di %19.0g _pi

SAS

I know less about the nuances of representing Pi in SAS. But my research in the SAS documentation suggests that pi can be stored with precision above 16 decimal places.

The basic code is:

data _null_;
  pi=constant('pi');
  put pi=;
run;

SPSS

This may be the worst package to use for representing pi, as IBM still has not included pi as a system constant in the program. Instead, we get to make use of our knowledge of trigonometry (did you just cringe? I did.)…

If you dig back far enough in your memory, you might recall that tan(pi/4) = 1. Using the inverse tangent function (the arctangent), you can create a variable to represent pi:

compute pi = 4*ARTAN(1).

Hope you find this interesting and useful…Happy Pi Day!

Simulating Data with A Known Correlation Structure in Stata

Monte Carlo simulations are most commonly used to understand the properties of a particular statistic such as the mean, or an estimator like maximum likelihood (ML) regression methods.

The principle is straightforward. Create a data set with a known correlation or covariance structure. Then add in some random error and estimate your statistic or model.

Replicate this process 1,000 or 10,000 times – collecting the relevant information from each trial – and you’ll have a nice sampling distribution with which to evaluate the properties of your model or statistic.

The replication can be accomplished easily enough with a -forvalues- loop.

In this article, you’ll find out how to accomplish the other part of the task: creating a data set with a known correlation structure.
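To give you a taste, here is a minimal sketch of one common way to do it; the -drawnorm- approach, the 0.5 correlation, and the sample size are illustrative choices, not necessarily the exact setup from the full article:

* sketch: 500 draws from a bivariate normal with corr(x1, x2) = 0.5
clear
set seed 314
matrix C = (1, .5 \ .5, 1)
drawnorm x1 x2, n(500) corr(C)
correlate x1 x2

From there you would add random error, fit your model, and wrap the whole thing in a -forvalues- loop to build up the sampling distribution.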
Continue reading

Taking Stata for a Loop: -forvalues-

It doesn’t take long for new analysts to learn that copying and pasting code can really cut down the time needed to complete a job. This seems to be especially true when you need to create groups of new variables, or when you perform the same transformation on a set of fields.

The reality is that copying and pasting code in these instances is actually the long way of accomplishing a task. Sure, the code will be easy to read. But you could complete the same tasks in a fraction of the time.

Fortunately, Stata has a set of built-in tools to make this process easier.

This article will show you how to use the -forvalues- command in Stata in order to automate repetitive tasks. Learning how to use this tool will help make your data analysis code cleaner, shorter, and faster to write.
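As a quick preview, here is a minimal sketch; the income variables inc2001-inc2005 are made up purely for illustration:

* sketch: create a logged version of each yearly income variable
forvalues y = 2001/2005 {
    generate ln_inc`y' = ln(inc`y')
}

One loop, five new variables, and nothing to copy and paste.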
Continue reading

How to Preserve Missing Values with Stata’s Collapse Command

You are a code-writing machine.

That 3-day project you started this morning might actually be completed by the end of the day.

As your fingers fly across the keyboard, you think you can hear Stata singing your praise softly in the background.

Then IT happens…

Your program stops working right. The data begin looking like something from one of Lord Voldemort’s nightmares.

Your finely-tuned debugging skills kick in, and you track down the problem. That -collapse- command you issued a while back did something rather odd. It replaced all of the missing values in your data set with zeros!

But that’s not at all what you wanted! You wanted those to be missing values, not zeros.

Yep, we’ve all been there. Even the most seasoned Stata users get bit by this quirk every once in a while.

In this article, I show three ways Stata can treat missing values when using the -collapse- command and the sum() function.
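Here is a minimal, made-up example that reproduces the quirk; the full article walks through the ways to handle it:

* sketch: id 1 has only missing values of x, yet the sum comes back as 0
clear
input id x
1 .
1 .
2 5
end
collapse (sum) x, by(id)
list

After the -collapse-, the row for id 1 shows x = 0 rather than a missing value.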
Continue reading

How to Call R from Stata

When it comes to data analysis, if you’re anything like me, you probably work across several different platforms. Depending on your analytical needs, you might get basic descriptives from Excel, but use programs like Stata and R for more complex routines.

One of the frustrations that go with this form of data science is the need to transfer data from one program to another.

It’s straightforward to export data in .csv format and then import it into a different program. But you may lose some important formatting, such as variable and value labels, in the data set.

Programs such as Stat Transfer make it easy to convert data from one program’s format to another. But as with the .csv export, it takes valuable time to convert and transfer the data. And you end up with multiple copies of the same data set cluttering up your machine.

Wouldn’t it be way easier if you could just call one data analysis program from inside another? As a Stata user, I’ve often wished I could perform a quick analysis in R without having to go through all of this effort.

In this article, I’ll show you a method for writing your R code, running R, feeding it data, returning R output in a text file, and returning any changes in your dataset to Stata…all while working in Stata’s native environment. I’m doing this on a PC, so Mac users will need to forgive me.
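To give you the general idea before you dive in, here is a minimal sketch of the batch-mode piece on Windows; the R install path and file names are placeholders, and this is not necessarily the exact setup from the full article:

* sketch: run an R script in batch mode, then view its log from Stata
* (adjust the path to R.exe to match your own installation)
shell "C:\Program Files\R\R-4.3.1\bin\R.exe" CMD BATCH C:\project\analysis.R C:\project\analysis.Rout
type C:\project\analysis.Rout

The .Rout file holds the R console output, and the R script itself can read and write a data file that both programs understand, which is one simple way to make the data round trip.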
Continue reading

Adding Code Snippets to Your WordPress Posts

From time to time, I’ll be including code snippets of various programming languages in my posts. And I thought you might find it interesting to know how these are being created.

There are a few different methods for creating and highlighting code snippets. We can refer to them as the <code> method, the <pre> method, and the plugin method.

And in fact, I used some special codes to write the <code> and <pre> in the previous sentence…but we’ll get to that in just a minute. Let’s start with the main methods for introducing code snippets.

Continue reading

Data Analytics and the Three-Headed Monster

A recent theme in the blogosphere centers on how newcomers can get into the field of data science and statistical analysis. What are the necessary qualifications? And how can you go about getting those skills?

Unfortunately, the answers to these questions seem to present a quandary that was eloquently summed up by a comment I read on another blog whose name I’ve since forgotten (perhaps it was Chandoo):

You need experience to get a job as an analyst. But the only way to get experience is to work in a job as an analyst.

Employers today are asking for more from all of their employees. And data analysts are no exception. In fact, the pressure to produce more with less is pushing many employers to merge business functions across smaller workforces.

For the data scientist, and more importantly the aspiring marketing researcher or business intelligence analyst, there is a three-headed monster to contend with. Each head represents a different role that you will need to fulfill in your career.
Continue reading