An Introduction to R for Econometrics
This webpage accompanied an introduction to R delivered in the TA sessions for the 'Econometrics I' course of Jesús Crespo-Cuaresma at IHS Vienna in 2007.
It remains posted as a reference but is not as up-to-date as its collection of web references.
Contents:
- Install R & an R editor (under Windows)
- Web resources on learning R
- Tutorial material - two script files with basic R commands
- Homeworks for this course and sample solutions
Installing R and the SciViews-R editor (under Windows)
Installing R
- If you are sitting in front of an Institute desktop, use this automatic script to check whether R is already installed.
- If you don't have R installed, download it from the cran.r-project.org site. Windows users may directly download the newest "precompiled" version here.
Windows users: after the download, just double-click the file and choose the default options.
Install a GUI:
The default graphical user interface (GUI) delivered with R does not offer much in terms of functionality. I have used several independent R GUIs in the past and have now settled on RStudio. I have tested the following GUIs, all of which are open-source:
- RStudio is a handy and stable interface for any operating system. The interface is reminiscent of Matlab's GUI, and offers not only coding amenities, but also a nice integration of plots, help and a view of the variables in the workspace.
- The 'stable' version of Tinn-R from SciViews is a stable and lean GUI for R on Windows that I have used heavily in the past and still use. Its focus is on the script editor, with all the amenities known from other coding environments.
- JGR is a quite handy GUI that is independent of the operating system. In the past it had some issues with stability, but it is a very nice coding environment.
- RKward is an interface for Linux-KDE that is nice to work with and stable as well.
Some useful references for starting with R
- R for Beginners by Emmanuel Paradis: The best introduction for absolute newbies to programming (also available in French)
- Econometrics in R by Grant Farnsworth is perfect for the reader having some basic experience in statistical programming.
- The R video tutorial (09:30 min) by Dan Goldstein is quite helpful
- Vincent Zoonekynd provides an introductory course in HTML
- Beginners should print out the R reference card, which really helps in getting through.
- Take a look at the R website
- The R Wiki has some very good sections for beginners
- If you are used to Matlab, you may find David Hiebeler's R-Matlab dictionary useful.
- Quick-R is another good reference, especially if you have experience in Stata or SPSS.
- Some links for more advanced R-ing are to be found on Martin Feldkircher's homepage
Tutorials on R
'Lecture' material:
- First session (2007-01-30): The TA1 R script file with introductory commands and the CSV data file used in the TA Session. (It is best to download the R script file and open it in Tinn-R.)
Topics covered: R objects and their manipulation, data input, basic matrix algebra in R, random numbers, basic regression commands.
- Second session (2007-02-14): The TA2 R script file with commands on getting data, the lm object, plotting time series and programming. Moreover, the first and second data file used in the TA Session.
(Download these files and set your working directory accordingly with the command setwd.) There is also an additional 'data' file.
In addition to the material covered in the TA Session, I included a few more commands on plotting and time series (and many comments!). The section on programming has been expanded by many more examples which you may find useful.
Topics covered: matrix algebra, more on reading-in data and regression commands, packages, plotting charts, time series in R, programming functions (and vectorization).
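To give a flavour of the topics listed above, here is a minimal sketch. It uses simulated data rather than the TA session files, and all object names (x, y, X, beta_hat, demean, ...) are assumptions chosen for illustration. It is not an excerpt from the TA script files.

```r
# Simulated data (hypothetical; not the course data files)
set.seed(123)                      # make the random numbers reproducible
n <- 100
x <- rnorm(n)                      # regressor drawn from N(0, 1)
y <- 1 + 2 * x + rnorm(n)          # true intercept 1, true slope 2

# OLS by matrix algebra: solve (X'X) beta = X'y for beta
X <- cbind(1, x)                   # design matrix with a constant
beta_hat <- solve(t(X) %*% X, t(X) %*% y)

# The same regression via lm(); the result is an 'lm' object
fit <- lm(y ~ x)
print(coef(fit))

# Both routes should agree up to numerical precision
print(all.equal(as.numeric(beta_hat), as.numeric(coef(fit))))

# A small function, and a vectorized alternative to an explicit loop
demean <- function(v) v - mean(v)
x_dm <- demean(x)
loop_sq <- numeric(n)
for (i in seq_len(n)) loop_sq[i] <- x_dm[i]^2
vec_sq <- x_dm^2                   # same result without a loop
print(all.equal(loop_sq, vec_sq))

# A quarterly time series object starting in 2000 Q1
y_ts <- ts(y, start = c(2000, 1), frequency = 4)
print(window(y_ts, end = c(2000, 4)))
```

Calling plot(y_ts) afterwards would draw the series with a proper time axis. When working with the actual data files, you would first point R to the download folder with setwd() and then read the data, for example with read.csv().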
Information on lecture rooms
Most tutorials will be held in 6., Stumpergasse 56, mainly in the 'EDV Raum' on the ground floor. Only one lecture room (namely 'Schottenfeldgasse') is at 7., Schottenfeldgasse 29 on the third floor (ring if the office door is not open).
Homeworks in R
Requirements:
Homeworks in R have to be submitted as an R file via e-mail. This R file should work by direct execution, i.e. I should be able to type source("yourfile.R") and get the result printed on the console or into an output file.
For the first homework, you may also submit the necessary data file so that the routine works. For the subsequent homework, your R file should directly handle the data set given to you.
Homeworks will be graded along four dimensions:
- Correctness/Completeness: Your routine should deliver all requested results, and they should be correct.
- Usability: Upon typing source("yourfile.R") the results should be presented to the user somehow (e.g. printed on the console or into a file). Moreover the code should be flexible enough to be easily adjusted for other data, etc.
- Efficiency/Elegance: Your code should compute as quickly as possible, and should be structured such that its components may be easily split. Elegant coding (i.e. few lines for a complex solution) is awarded extra points.
- Readability: The comments in and the structure of your code should be good enough that it may be understood by someone 'speaking R'.
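One possible layout for such a file is sketched below. This is only an illustration under assumptions: the file names, the function run_homework() and the regression y ~ x are hypothetical, and the script generates its own small demo data in a temporary folder so that it runs anywhere.

```r
# yourfile.R: hypothetical homework skeleton (all names are illustrative)

# --- Settings: adjust only these lines to reuse the script on other data ---
# (tempdir() is used here only so that the demo runs anywhere)
data_file   <- file.path(tempdir(), "homework_data.csv")  # assumed file name
output_file <- file.path(tempdir(), "results.txt")        # assumed file name

# --- For illustration only: create a small demo data file if none exists ---
if (!file.exists(data_file)) {
  set.seed(1)
  demo <- data.frame(x = rnorm(50))
  demo$y <- 0.5 + 1.5 * demo$x + rnorm(50)
  write.csv(demo, data_file, row.names = FALSE)
}

# --- The actual routine, kept in one function so it is easy to split off ---
run_homework <- function(data_file, output_file = NULL) {
  dat <- read.csv(data_file)            # read the data set
  fit <- lm(y ~ x, data = dat)          # estimate the model
  res <- summary(fit)
  print(res)                            # results on the console
  if (!is.null(output_file)) {
    capture.output(print(res), file = output_file)  # and optionally in a file
  }
  invisible(fit)                        # return the fit without printing it again
}

fit <- run_homework(data_file, output_file)
```

Note the explicit print(): when a file is run via source(), results are not printed automatically the way they are when you type commands at the console, unless you call source() with echo = TRUE or print.eval = TRUE.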
Homework sample solutions:
- Homework sheet 1, no R exercises.
- Homework sheet 2, exercise 2: The data file in tab-separated format and a sample .R script exemplify the solution to the exercise.
To examine it, download the .R file and the data file, and type in R setwd("path to which you downloaded the files") and then source("es2ex2.R").
- Homework sheet 3, exercise 3: The data file in CSV format, simply stored out of the original Excel file.
The data contain many non-availables (NAs), and there are, in principle, two methods to address this problem:
- Cleaning the sample regressions from NAs individually:
This approach was chosen by all of the students. A commented sample solution outlining this approach can be found in SalaIMartin_es3_3.R.
To examine it, download it and the data file and execute it via the source()
command as above. Execution takes about 50 seconds on a Pentium M.
- Cleaning the data first, and then running the individual regressions:
This is the 'more right' approach, in my opinion. Moreover it lends itself to the use of the Frisch-Waugh theorem for the fixed regressors, which in turn enables the entire simulation to run in a few seconds. However, only 32 countries are free of NAs, so the data basis is quite weak.
Sample code for this approach is in the file SalaIMartinFrischWaugh_es3_3.R. It is quite similar to the solution above.
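The difference between the two approaches, and the Frisch-Waugh idea, can be sketched on simulated data. This is not the code from the sample solution files. The variables z1 and z2 (standing in for the fixed regressors), x (one candidate regressor) and all other names are assumptions made for illustration.

```r
# Simulated data with missing values (hypothetical, not the homework data)
set.seed(42)
n  <- 200
z1 <- rnorm(n)
z2 <- rnorm(n)                             # z1, z2: regressors kept in every regression
x  <- 0.5 * z1 + rnorm(n)                  # one candidate regressor
y  <- 1 + 0.8 * z1 - 0.3 * z2 + 0.6 * x + rnorm(n)
dat <- data.frame(y, z1, z2, x)
dat$x[sample(n, 20)]  <- NA                # inject some NAs
dat$z2[sample(n, 20)] <- NA

# Approach 1: each regression drops only the NAs of its own variables
# (lm() does this by default via na.action = na.omit)
per_reg <- lm(y ~ z1 + x, data = dat)

# Approach 2: clean the data first, keeping only rows without any NA
clean <- dat[complete.cases(dat), ]
print(c(per_regression = nobs(per_reg), cleaned_first = nrow(clean)))

# Full regression on the cleaned sample
full <- lm(y ~ z1 + z2 + x, data = clean)

# Frisch-Waugh: partial the fixed regressors out of y and x,
# then regress residuals on residuals (no intercept needed)
y_res <- resid(lm(y ~ z1 + z2, data = clean))
x_res <- resid(lm(x ~ z1 + z2, data = clean))
fw    <- lm(y_res ~ x_res - 1)

# The coefficient on x is the same in both regressions
b_full <- unname(coef(full)["x"])
b_fw   <- unname(coef(fw)["x_res"])
print(c(full = b_full, frisch_waugh = b_fw))
print(all.equal(b_full, b_fw))
```

With a common cleaned sample, the residuals of y on the fixed regressors need to be computed only once and can be reused for every candidate regressor, which is presumably where much of the speed-up comes from. Note that the standard errors reported by the residual regression differ slightly from those of the full regression, because the degrees of freedom are not adjusted for the regressors that were partialled out.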