R

Aggie Econ Club R Presentation

"How to become bad at coding

But better than 99.9% of the population"

Or:

Become the Master of your Technology!

"It is the allowing machines to be our masters and not our servants that so injures the beauty of life nowadays..." William Morris, 1888

“Have nothing in your house that you do not know to be useful, or believe to be beautiful.”

William Morris

http://humanetech.com/ The Time Well Spent Movement

A few brief remarks:

I can't teach you "Coding" or Programming in 30 minutes, so I am not going to try.

Teach yourself programming in 10 years, not 21 days

by Peter Norvig (Director of Research at Google)

What I am going to try to do is introduce you to the idea of coding, show you some things that R (and many other languages) can do, and make you want to learn more!

RULE 1: Use the exact capitalization, spelling, and punctuation!!!

RULE 2: a. Copy and paste long sections of code

b. Type short ones

c. Edit the previous one if it is similar

d. You don't have to type anything that comes after a # sign: this symbol marks a comment

Pick your Poison:

- Download R 3.4.4 for Windows (62 MB)
- R Portable for Windows (Extract to USB Drive and run on any PC from your own drive!)

- Download R 3.4.4 for Mac

In the future, go directly to https://cloud.r-project.org/

You should also check out RStudio

Google's R Style Guide

Learn R Free within R! Intro to the "swirl" package (video introduction)

Text file with code for today: code.txt

Some other code to try: R plays "Let it Snow"

----------------------------

The code:

#A) Messing Around

seq(1,10)

sum(seq(1,10))

sum(seq(1,1000000))

sum(as.numeric(seq(1,1000000)))

mydata<-seq(1,1000, by=2)

head(mydata)

mydata[25]

summary(mydata)
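Why the `as.numeric()` in the sums above? `seq(1,1000000)` is stored as whole numbers (integers), and R integers top out at about 2.1 billion. On older versions of R, such as the 3.4.4 linked above, adding them up overflows and gives `NA` with a warning; newer releases may return the correct total on their own. Here is a small sketch of the underlying limit (the names `biggest`, `too.big` and `big.num` are made up for this example):

```r
# The largest whole number R can store as an integer
biggest <- .Machine$integer.max
biggest

# Adding 1 in integer arithmetic overflows: R gives NA plus a warning
too.big <- suppressWarnings(biggest + 1L)
too.big

# Converting to numeric (decimal numbers) first avoids the problem
big.num <- as.numeric(biggest) + 1
big.num

# The same idea applied to the sum from the presentation
sum(as.numeric(seq(1, 1000000)))
```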

#B) A "FOR" Loop

#Let's repeat ourselves! A basic "For Loop"

for(i in 1:15) {

  print("Aggie Pride!")

}

#A MORE INVOLVED LOOP!

#Let's take the 20th-30th items from our list of odd numbers (`mydata`) and square them, one at a time,

#save it to a new data item, and also print them to the screen.

#Initialize variable `my.sq` and run a repeating loop

my.sq <- 0

for(i in 20:30) {

  #put i-th element of `mydata` squared into `i minus 19th` position of `my.sq`

  my.sq[i-19] <- mydata[i]^2

  print(c(i,mydata[i],my.sq[i-19]))

}

#6 Let's look at the results again

my.sq
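Loops are good practice, but R can often do the same job in one line, because most operations work on a whole list of numbers at once ("vectorized"). This sketch repeats the loop above and checks that the one-line version gives the same answer; `my.sq.vec` is a name made up for the example.

```r
# Rebuild the odd numbers used in the presentation
mydata <- seq(1, 1000, by = 2)

# Loop version, as in the presentation (without the printing)
my.sq <- 0
for (i in 20:30) {
  my.sq[i - 19] <- mydata[i]^2
}

# Vectorized version: pick items 20 to 30 and square them all at once
my.sq.vec <- mydata[20:30]^2

# Check that both approaches agree
all.equal(my.sq, my.sq.vec)
```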

#-------------------------------------

#C) Make a list of all numbers from 1 to 1,000 not divisible by 2,3,4,5, 6 or 7

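#%% is the mod or modulo operator: x%%y gives the remainder when x is divided by y.

#Anything not divisible by 2 is also not divisible by 4 or 6, so we only need to test 2, 3, 5 and 7.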

mydata2<-seq(1,1000)

mylist<-mydata2[which(mydata2%%2!=0)]

mylist<-mylist[which(mylist%%3!=0)]

mylist<-mylist[which(mylist%%5!=0)]

#10

mylist<-mylist[which(mylist%%7!=0)]

#11

mylist

#12

length(mylist)
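The filtering steps above can also be written as a single test by joining the conditions with `&` ("and"). This is only a sketch of an alternative; `nums`, `keep` and `mylist.alt` are names made up for the example.

```r
# All whole numbers from 1 to 1,000
nums <- seq(1, 1000)

# TRUE where a number leaves a remainder for 2, 3, 5 and 7 alike
keep <- nums %% 2 != 0 & nums %% 3 != 0 & nums %% 5 != 0 & nums %% 7 != 0

# Keep only the numbers that passed every test
mylist.alt <- nums[keep]

head(mylist.alt)
length(mylist.alt)

# Confirm that 4 and 6 never needed their own test
all(mylist.alt %% 4 != 0 & mylist.alt %% 6 != 0)
```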

#D) Let's look at some data from the Titanic shipwreck

#13 install titanic package

install.packages("titanic")

#14

#load the package into memory for use

library(titanic)

#15

#make a copy of the main dataset to play with

data1<-titanic_train

#16

summary(data1)

#17

table(data1$Survived)

attach(data1)

table(Survived)
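A side note on `attach()`: it is handy in a short demo, but it can quietly hide other objects that share a name with one of your columns. One common alternative is `with()`, which looks inside the data frame for a single command only. The sketch below uses a small made-up data frame (`toy`, not the real Titanic data) so it runs without installing anything.

```r
# A tiny made-up data set, for illustration only
toy <- data.frame(
  Survived = c(1, 0, 0, 1, 1, 0),
  Sex      = c("female", "male", "male", "female", "male", "male")
)

# Three ways to get the same table without attach()
tab.dollar <- table(toy$Survived)          # spell out the data frame each time
tab.with   <- with(toy, table(Survived))   # look inside toy for this one call

tab.dollar
tab.with
```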

#18

prop.table(table(Survived))

#19

prop.table(table(Survived,Sex))

prop.table(table(Survived,Sex),1)
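The second argument of `prop.table()` decides what adds up to 100%: leave it out and the whole table sums to 1, use `1` and each row sums to 1, use `2` and each column sums to 1. A sketch on a small made-up data frame (`toy` is hypothetical, not the Titanic data):

```r
# A tiny made-up data set, for illustration only
toy <- data.frame(
  Survived = c(1, 0, 0, 1, 1, 0),
  Sex      = c("female", "male", "male", "female", "male", "male")
)

tab <- table(toy$Survived, toy$Sex)

overall <- prop.table(tab)      # every cell as a share of everyone
by.row  <- prop.table(tab, 1)   # shares within each Survived group
by.col  <- prop.table(tab, 2)   # shares within each Sex

overall
by.row
by.col
```

Reading `by.col` answers "what share of each sex survived?", while `by.row` answers "what share of survivors were of each sex?".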

#20

#tell R that some variables should be treated as categories (factors), making a copy

Survived1<-as.factor(Survived)

Pclass1<-as.factor(Pclass)

Sex1<-as.factor(Sex)

#21 A regression, so I don’t get hounded by people

summary(lm(Fare~Age+Pclass1+Survived1))

#22

#make some interesting graphs

plot(Pclass1,Survived1)

plot(Sex1,Survived1)

boxplot(Age)

boxplot(Age~Pclass1)

#23 two ways of getting average fare by ticket class

aggregate(Fare,by=list(Pclass1),FUN=mean)

tapply(Fare, Pclass1, mean)

#24 I don't like the output of the default "summary" command. Let's program a new one!

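sumstats <- function(y){

  #find which columns of the data frame are numeric
  nums <- sapply(y, is.numeric)

  #for each numeric column, compute five statistics (ignoring missing values)
  sumst <- sapply(y[,nums], function(x){

    sumstat <- c(mean(x,na.rm=TRUE),median(x,na.rm=TRUE),sd(x,na.rm=TRUE),min(x,na.rm=TRUE),max(x,na.rm=TRUE))

    names(sumstat) <- c("Mean","Median","SD","Min","Max")

    sumstat})

  #flip the table so each variable is a row and each statistic is a column
  aperm(sumst)

}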

#25

sumstats(data1)

#26

by(data1,Pclass1,sumstats)
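If you want to try the same idea without installing the titanic package, here is a sketch that runs on `iris`, a data set that ships with R. It is a lightly adjusted variant, not the version above: `sumstats2` and `res` are made-up names, `drop = FALSE` is added so a data frame with only one numeric column still works, and `t()` does the flipping.

```r
# Summary table: one row per numeric column, one column per statistic
sumstats2 <- function(y) {
  nums <- sapply(y, is.numeric)   # which columns are numeric?
  stats <- sapply(y[, nums, drop = FALSE], function(x) {
    c(Mean   = mean(x, na.rm = TRUE),
      Median = median(x, na.rm = TRUE),
      SD     = sd(x, na.rm = TRUE),
      Min    = min(x, na.rm = TRUE),
      Max    = max(x, na.rm = TRUE))
  })
  t(stats)   # transpose so variables become rows
}

# iris has four numeric columns and one category column (Species)
res <- sumstats2(iris)
res
```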

#27 A few more graphs

hist(Fare)

hist(Fare, breaks=100)

hist(Fare, breaks=100, xlim=c(0,100))

plot(Age~Fare)

plot(Age~Fare,pch=16)

plot(Age~Fare, col=Pclass,pch=16)

legend(400,70,legend=c("1st Class","2nd Class","3rd Class"),col=c(1,2,3), pch=16)

#28 another nice graph...

plot(Age~data1$Fare, col=Pclass,pch=1+15*Survived,xlim=c(-100,500))

legend(400,80,legend=c("1st Class","2nd Class","3rd Class"),col=c(1,2,3), pch=16, title="Survived")

legend(400,60,legend=c("1st Class","2nd Class","3rd Class"),col=c(1,2,3), pch=1,title="Perished")
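The `pch=1+15*Survived` part of the plot above is a small arithmetic trick. `Survived` is 0 or 1, so the formula gives plotting symbol 1 (open circle) for people who perished and 16 (filled circle) for survivors. A sketch with made-up values (`survived.flag` and `point.symbol` are names invented for the example):

```r
# Made-up survival flags: 0 = perished, 1 = survived
survived.flag <- c(0, 1, 1, 0)

# 1 + 15*0 = 1 (open circle), 1 + 15*1 = 16 (filled circle)
point.symbol <- 1 + 15 * survived.flag
point.symbol
```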

#29 Now let's see who these people in the graph are!

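#click on points in the open plot to label them; x and y must be in the same order as in the plot (Fare across, Age up)

identify(Fare, Age, labels=Name, tolerance=1)

#the same thing without relying on attach()

identify(data1$Fare,data1$Age,labels=data1$Name,tolerance=1)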

#30 Last trick for the day: You can read your own data into R in lots of ways. On Windows, the easiest way is to copy your data to the clipboard (select the data, then choose Edit > Copy or press Ctrl+C), then in R type:

mydata.b=read.delim("clipboard")

#now you can analyze data in R! Note: This is not considered a "professional" way to get your data into R, but it works!
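A more repeatable way to get data in is to read it from a file, which works the same on Windows and Mac. The sketch below writes a tiny made-up table to a temporary CSV file and reads it back, so it runs as is; the file and the names `tmp` and `mydata.c` exist only for this example.

```r
# Create a small CSV file in a temporary location (stand-in for your own file)
tmp <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, score = c(90, 85, 77)), tmp, row.names = FALSE)

# Read the file into a data frame
mydata.c <- read.csv(tmp)

# Look at what came in
str(mydata.c)
mean(mydata.c$score)

# Clean up the temporary file
unlink(tmp)
```

With your own data you would replace `tmp` with the path to your file, or use `read.csv(file.choose())` to pick the file from a dialog box.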
