First Foray into R

by gorthx

I was talking with [name redacted] about a side project the other day and he said “OMG you’re still using *gnuplot*?” So I figured I’d better get with the program and learn some R.

Luckily for me, Portland has an R users’ group, and they held a hackathon/workshop last week, newbies welcome. They’re a good group of people & I heartily recommend the workshop night. Special thanks to Homer for the personalized help and suggestions.

Prior to the meeting, I installed R (using the instructions here).

I wanted to graph some old running data since I had a good idea of what it was supposed to look like. Here’s how I ended up doing it. (NOTE: this is the result of my reading online tutorials, floundering around, & then asking the workshop mentors for help – there are more efficient ways to accomplish some of these tasks.)

Excerpt of my data file (date, distance, pace, kCal):

Fire up the R shell:

Then I loaded my file:
> mydata <- scan("miles_individual.dat", sep=",", what=list(date="", distance=0, pace="0:00", kcal=0))

And messed around with these commands to see what I had:

> names(mydata)
[1] "date" "distance" "pace" "kcal"
> str(mydata)
List of 4
$ date : chr [1:110] "01-Jan-2009" "04-Jan-2009" "06-Jan-2009" "08-Jan-2009" ...
$ distance: num [1:110] 2.36 6.4 2.77 3.59 4.49 1.94 2.16 2.39 0.94 2.51 ...
$ pace : chr [1:110] "9:51" "10:45" "11:00" "10:30" ...
$ kcal : num [1:110] 200 200 200 200 200 182 203 225 89 236 ...

Typing mydata by itself prints all the values I scanned in (the output’s a bit too large to show here).

Note that mydata$date is “chr”, which I’m guessing is “character”. Let’s convert those to actual dates:
mydata$date = strptime(mydata$date, "%d-%b-%Y")

and compare that to what I had before:

> str(mydata$date)
POSIXlt[1:110], format: "2009-01-01" "2009-01-04" "2009-01-06" "2009-01-08" ...
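As a small self-contained illustration of that format string, here is a sketch using the first few date strings from the str() output above. It assumes an English locale, since %b matches abbreviated month names in the current locale; the variable names are only for illustration.

```r
# Sample date strings taken from the str(mydata) output above.
raw_dates <- c("01-Jan-2009", "04-Jan-2009", "06-Jan-2009")

# %d = day of month, %b = abbreviated month name (locale-dependent),
# %Y = four-digit year.
parsed <- strptime(raw_dates, "%d-%b-%Y")

class(parsed)              # "POSIXlt" "POSIXt"
format(parsed, "%Y-%m-%d") # the same dates, printed year-first
```

A string that doesn’t match the format comes back as NA rather than raising an error, so it’s worth checking for NAs after a conversion like this.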

Unfortunately, using scan the way I did [1] left me with my values in a list (which is, of course, just what I told it to do); I need to convert them to a “data frame” so I can graph them.

> mydata =
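The conversion command above is incomplete as shown. One way to do this kind of conversion (a sketch under that assumption, not necessarily the command used at the workshop) is as.data.frame(), which turns a list of equal-length vectors into a data frame. The example builds a three-row stand-in for mydata from the values in the str() output; sample_list and sample_df are illustrative names.

```r
# A three-row stand-in for the scanned list, using values from str(mydata).
sample_list <- list(
  date     = c("01-Jan-2009", "04-Jan-2009", "06-Jan-2009"),
  distance = c(2.36, 6.4, 2.77),
  pace     = c("9:51", "10:45", "11:00"),
  kcal     = c(200, 200, 200)
)

# Parse the dates first, as in the post.
sample_list$date <- strptime(sample_list$date, "%d-%b-%Y")

# Each list element becomes a column; POSIXlt dates are stored as POSIXct.
sample_df <- as.data.frame(sample_list, stringsAsFactors = FALSE)

str(sample_df)
```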

Two more steps & I’m ready to graph:
Load the ggplot2 library:
> library("ggplot2")

Prep the graph:
> ggplot(mydata)

This pulled up a separate window with an empty graph.

I graphed distance first:
> ggplot(mydata, aes(x = date, y = distance)) + geom_line()

[Image: R sample line graph]

“aes” is short for “aesthetics”: it maps columns of the data to visual properties of the graph. I assigned date to the x-axis and distance to the y, and then specified a line to join the points. (Try leaving off the geom_line() specification and you’ll get a “No layers in plot” error.)

Then I tried a scatter representation:
> ggplot(mydata, aes(x = date, y = distance)) + geom_point()

Then, at Homer’s suggestion, I got fancy with it. I added a point to the graph indicating kcal, and made the size of that point reflect the value of the kcal [2]:
> ggplot(mydata, aes(x = date, y = distance)) + geom_line() + geom_point(aes(size = kcal))

Save the graph like so (dev.off() closes the png device, which is what finishes writing the file):
> dev.copy(png,"dist_plus_kcal.png")
> dev.off()
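ggplot2 also ships its own helper, ggsave(), which writes a plot straight to a file. A minimal sketch, assuming ggplot2 is installed; the runs data frame is a three-row stand-in built from the str() output, and the temporary output path and the 6 x 4 inch size are arbitrary choices for illustration.

```r
library(ggplot2)

# Three-row stand-in for the running data, using values from str(mydata).
runs <- data.frame(
  date     = as.Date(c("2009-01-01", "2009-01-04", "2009-01-06")),
  distance = c(2.36, 6.4, 2.77),
  kcal     = c(200, 200, 200)
)

# Same layers as above: a line, plus points sized by kcal.
p <- ggplot(runs, aes(x = date, y = distance)) +
  geom_line() +
  geom_point(aes(size = kcal))

# Write the plot to a temporary file; width and height are in inches by default.
out_file <- file.path(tempdir(), "dist_plus_kcal.png")
ggsave(out_file, plot = p, width = 6, height = 4)
```

Storing the plot in a variable and passing it explicitly means the saved file doesn’t depend on whichever plot happens to be on screen.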

And quit:

> q()

1 – comment: “Oh wow, I’ve never used scan that way before.” So, not the recommended method.
2 – This made me so excited.
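
On footnote 1: a more commonly seen way to load a comma-separated file is read.csv(), which returns a data frame directly, so the list-to-data-frame step isn’t needed. A minimal sketch, assuming the file has no header row and four columns in the order date, distance, pace, kcal; the temporary file and its three rows are stand-ins built from the str() output, not the actual data file.

```r
# Write a small stand-in file (three rows taken from the str() output).
sample_file <- tempfile(fileext = ".dat")
writeLines(c(
  "01-Jan-2009,2.36,9:51,200",
  "04-Jan-2009,6.4,10:45,200",
  "06-Jan-2009,2.77,11:00,200"
), sample_file)

# header = FALSE because the stand-in has no header row;
# col.names supplies the same names used with scan().
runs <- read.csv(sample_file, header = FALSE,
                 col.names = c("date", "distance", "pace", "kcal"),
                 stringsAsFactors = FALSE)

# Convert the date column; as.Date() uses the same format codes as strptime().
runs$date <- as.Date(runs$date, format = "%d-%b-%Y")

str(runs)
```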
