Sunday, May 07, 2006

Here is a nice example showing how to sort a data frame (from r-help):

# Sort a data frame by multiple columns
d = data.frame(b=factor(c("Hi","Med","Hi","Low"), levels=c("Low","Med","Hi"), ordered=TRUE),
               x=c("A","D","A","C"), y=c(8,3,9,9), z=c(1,1,1,2))
d[order(d$b,d$z,d$y),]

See the full post here.
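
A small variation, in case you need some columns in descending order. This is only a sketch on the same toy data frame: negating a numeric column reverses its order, and xtfrm() turns the ordered factor into integer codes so it can be negated too. The names idx1 and idx2 are mine, not from the r-help post.

```r
d = data.frame(b=factor(c("Hi","Med","Hi","Low"), levels=c("Low","Med","Hi"), ordered=TRUE),
               x=c("A","D","A","C"), y=c(8,3,9,9), z=c(1,1,1,2))

# b ascending (Low < Med < Hi), then y descending within each level of b
idx1 = order(d$b, -d$y)
d[idx1,]

# b descending (xtfrm gives the factor's integer codes), then y ascending
idx2 = order(-xtfrm(d$b), d$y)
d[idx2,]
```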

Sunday, February 26, 2006

The ubiquitous putah creek dust busters

Taking a break from R...

In case anyone is interested, "the ubiquitous putah creek dust busters" are a band I put together to play on a Mark Olson/Creekdippers tribute album called "When We Sing Together." The album is just being released as an internet freebie. It's on dimeadozen.org, if you're lucky enough to have an account. We covered "Valentine King." I did the guitar parts and lead vocal, my wife made her debut as a drummer, Jen Faith did the backing vocal and Tariq Dinar played bass.

Leave a comment here if you want to contact me about this. I'm also on the vic-list.

Monday, January 02, 2006

R Code Editing

I do R work on Mac, Windows and *nix. Here are my editing preferences for the platforms.

For *nix platforms (including Mac OS X) I use Emacs with ESS. Emacs should be thought of as a long-term investment: time put into learning it will eventually pay off in higher productivity. Emacs does auto-indenting, syntax coloring and a million other things, including psychotherapy. I learned with the help of one of the many Emacs cheat sheets available on the web. I found one I liked, printed it and put it in a notebook. I then decided to do all my editing in Emacs. Every time I hit a point where I didn't know how to do something, I forced myself to find out how to do it in Emacs and wrote down the keystrokes in my notebook. Within a few days I had almost all of my favorite keystrokes recorded and could move around Emacs almost as quickly as I could in vi.

I've tried to use Emacs on Windows but haven't had great success with it. The keystrokes don't work the same way, and I couldn't get syntax highlighting in color. If I were willing to work on it I could probably figure out the problems, but I haven't really wanted to. Digging around in Windows funk is miserable. Luckily there is an editing alternative I like for Windows called Tinn-R.

Tinn-R is free and easy to use. Anyone who can handle Notepad should be able to figure it out.
A feature novice programmers appreciate is automatic bracket matching: when you open or close a parenthesis or a curly brace, Tinn-R highlights its match. Compared to Emacs it's a toy editor with some real limitations, and I still go into Windows Emacs from time to time, but I'd say it does 95% of what I need with a minimum of fuss. Another nice feature is that it integrates with the R GUI: you can highlight a section of code in the editor, for example, and send it to R for evaluation by clicking a toolbar button.

jEdit is another option. Written in Java, it has the advantage of being available on all platforms.
There is an R syntax package available for it and it is quite powerful. It's an IDE-type editor, not too hard to figure out if you've used those before. Some people find it too slow, but I've been happy with it for the most part. It used to crash and freeze a lot, but it is better now. It is a secondary tool for me, but still a useful one when I want an IDE-type interface.

Thursday, December 29, 2005

An Important Lesson on Package Building

Building a package is a topic I will cover at some time in the future. For now I will just describe a sort of bug I found in the package install system. I don't know that it's a real bug; it's mostly my own fault. The only thing R could have done better is give me a little more information on what I did wrong, rather than a cryptic parse error.

Here is one of the errors I received on running: R CMD INSTALL bpkg

...
preparing package bpkg for lazy loading
Error in .C("cme", nrows = as.integer(nrow(obj@cmdf@rmat)), cols.imat = as.integer(ncol(obj@cmdf@imat)), :
C entry point "cme" not in load table
...

This error looks useful, but it is misleading: it does not point at the real problem. A couple of other unrelated error messages would also pop up during the install.

It took me a while to discover the real problem: I was sourcing the package's files back into the package from a piece of test code I had written. For example, say I had three files in my package: a.R, b.R and c.R. In order to test part of the code, suppose I wrote another file called test.R, which looks like this:

# file test.R
source("a.R")
x = foo()
...

Doing this caused the R CMD INSTALL bpkg command to fail: the test file was sourcing, from inside the package, the very files that were being installed as part of it.
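
One way to catch this kind of mistake before running the install is to scan the package for stray source() calls. The following is only a rough sketch: find_source_calls is a hypothetical helper of my own, it assumes the offending test script sits in the package's R/ directory, and the bpkg_demo directory is a throwaway stand-in created in tempdir() for the demonstration.

```r
# Hypothetical helper: list lines in <pkg_dir>/R that call source()
# outside of a comment. Returns NULL when nothing is found.
find_source_calls <- function(pkg_dir) {
  r_files <- list.files(file.path(pkg_dir, "R"), pattern = "\\.[Rr]$",
                        full.names = TRUE)
  hits <- lapply(r_files, function(f) {
    lines <- readLines(f, warn = FALSE)
    idx <- grep("^[^#]*\\bsource\\(", lines, perl = TRUE)
    if (length(idx) == 0) return(NULL)
    data.frame(file = basename(f), line = idx, code = lines[idx],
               stringsAsFactors = FALSE)
  })
  do.call(rbind, hits)
}

# Demonstration on a throwaway directory laid out like the example above
pkg <- file.path(tempdir(), "bpkg_demo")
dir.create(file.path(pkg, "R"), recursive = TRUE, showWarnings = FALSE)
writeLines("foo = function() 42", file.path(pkg, "R", "a.R"))
writeLines(c("# file test.R", "source(\"a.R\")", "x = foo()"),
           file.path(pkg, "R", "test.R"))

res <- find_source_calls(pkg)
print(res)
```

Any file it reports is a candidate for moving out of the package, or for loading the installed package with library() instead of sourcing its files.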

Debugging package installs can be tricky. If you're writing a complex package I recommend using a source control system like CVS or SVN. Do frequent check-ins to make sure you can roll back to your best working version, and don't neglect to try installing your package on a regular basis. When the package install fails you don't always have helpful information to go on.

The way I had to debug my problem was rolling back my versions via CVS until I found a version that worked. I then investigated what had changed and was able to solve my problem.

Every serious developer should know something about version control. CVS is available for all the common platforms. A few hours of serious study will get anyone up to speed enough to use it effectively, if not completely.

Monday, November 28, 2005

R file chooser: file.choose

The file.choose command in R opens an OS native file selection box. It's a very convenient feature.

Instead of writing a read.table like this, for example:

my.data = read.table("/home/mingus/data/proj1.dat", header=TRUE)

You can use instead:

my.data = read.table(file.choose(), header=TRUE)

This is an indispensable feature if you're opening a lot of data files whose locations you're not absolutely sure of.
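
Since file.choose only makes sense in an interactive session, a script that also has to run unattended may want a fallback. Here is a minimal sketch of that idea. The wrapper read_chosen and the temporary demo file are my own inventions for illustration, not anything built into R.

```r
# Hypothetical wrapper: use the given path if there is one,
# otherwise fall back to the native file chooser.
read_chosen <- function(path = NULL, ...) {
  if (is.null(path)) {
    if (!interactive()) {
      stop("no path given and no interactive session for file.choose()")
    }
    path <- file.choose()
  }
  read.table(path, ...)
}

# Demonstration with a throwaway file so the example runs anywhere
tmp <- tempfile(fileext = ".dat")
write.table(data.frame(a = 1:3, b = c("x", "y", "z")), tmp, row.names = FALSE)
my.data <- read_chosen(tmp, header = TRUE)
print(my.data)
```

At the console, calling read_chosen(header=TRUE) with no path would open the file selection box as before.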

Another good command to check out for data frames is edit.

For example, the command

edit(my.data)

will pop up a crude but usable data editor. It's much more convenient for data editing than using command line tools alone.
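
One thing to watch for: edit returns the modified copy rather than changing the original, so the result has to be assigned to keep your changes. A short sketch, with a made-up data frame standing in for my.data and an interactive() guard so it does nothing harmful when run as a script:

```r
# Made-up data standing in for my.data
my.data <- data.frame(id = 1:3, score = c(2.5, 3.1, 4.8))

if (interactive()) {
  # Assign the result, otherwise the edits are only printed and then lost
  my.data <- edit(my.data)
}
```

The related command fix(my.data) does the edit and the assignment in one step.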

Saturday, November 19, 2005

Complex object composition

Suppose you have an S4 object which contains other S4 classes:

setClass("a",representation(foo="numeric"),contains=c("numeric"))

setClass("b",representation(bar="numeric",a = "a"),contains=c("numeric","a"))

setClass("c",representation(baz="numeric",b = "b"),contains=c("numeric","a","b"))

# now initialize a test

x = new("a",foo=1)
y = new("b",bar=2,a=x)
z = new("c",baz=3,b=y)

The question is how do you get at the foo slot in your z object?

One way to do it is to dereference all the way down through
the slots like so:

> z@b@a@foo
[1] 1

Another way, which I've had to use, is this:

> attr(z@b@a,"foo")
[1] 1
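
If the chain of slots gets long, another option is a small accessor generic, so that callers don't need to know how deep the nesting goes. This is only a sketch: the class names Inner, Middle and Outer and the generic getFoo are hypothetical, and the classes use plain composition without the contains= inheritance of the example above.

```r
library(methods)

# Hypothetical classes nested by composition only
setClass("Inner",  representation(foo = "numeric"))
setClass("Middle", representation(bar = "numeric", inner = "Inner"))
setClass("Outer",  representation(baz = "numeric", middle = "Middle"))

# Hypothetical accessor: each level delegates to the object it contains
setGeneric("getFoo", function(obj) standardGeneric("getFoo"))
setMethod("getFoo", "Inner",  function(obj) obj@foo)
setMethod("getFoo", "Middle", function(obj) getFoo(obj@inner))
setMethod("getFoo", "Outer",  function(obj) getFoo(obj@middle))

x2 = new("Inner", foo = 1)
y2 = new("Middle", bar = 2, inner = x2)
z2 = new("Outer", baz = 3, middle = y2)

getFoo(z2)
```

The design choice here is that if the nesting ever changes, only the methods need updating, not every place that reads foo.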

Saturday, November 12, 2005

Histogram with normal curve

Given a vector of numbers, this function will create a histogram with a normal
curve superimposed over it. The curve uses the mean and standard deviation of the data.

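hist.curve = function(vec){
  # Evaluate the normal density at 1000 quantiles of the data
  X <- quantile(vec, probs = seq(.001, .999, length.out = 1000))
  Y <- dnorm(X, mean = mean(vec), sd = sd(vec))
  # Density-scaled histogram, so the bars and the curve share a y-axis
  hist(vec, probability = TRUE)
  lines(X, Y, col = "red")
}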
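
Here is a variant sketch, with a usage example on simulated data. It differs from the function above in two ways that are my own choices: the curve is drawn on an evenly spaced grid rather than at the data quantiles, and the y-axis is stretched so the peak of the curve is never clipped. The name hist_normal and the simulated sample are made up for illustration.

```r
# Hypothetical variant: histogram with a fitted normal curve on a regular grid
hist_normal <- function(vec, breaks = "Sturges") {
  grid <- seq(min(vec), max(vec), length.out = 200)
  dens <- dnorm(grid, mean = mean(vec), sd = sd(vec))
  h <- hist(vec, breaks = breaks, plot = FALSE)
  # Make room for whichever is taller, the bars or the curve
  plot(h, freq = FALSE, ylim = c(0, max(h$density, dens)),
       main = "Histogram with normal curve")
  lines(grid, dens, col = "red")
  invisible(list(x = grid, y = dens))
}

# Usage on simulated data
set.seed(1)
sim <- rnorm(500, mean = 10, sd = 2)
out <- hist_normal(sim)
```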