R
Aggie Econ Club R Presentation
"How to become bad at coding
But better than 99.9% of the population"
Or:
Become the Master of your Technology!
"It is the allowing machines to be our masters and not our servants that so injures the beauty of life nowadays..." William Morris, 1888
“Have nothing in your house that you do not know to be useful, or believe to be beautiful.”
William Morris
The Time Well Spent Movement: http://humanetech.com/
A few brief remarks:
I can't teach you "coding" or programming in 30 minutes, so I am not going to try.
Teach yourself programming in 10 years, not 21 days
by Peter Norvig (Director of Research at Google)
What I am going to try to do is introduce you to the idea of coding, show you some things that R (and many other languages) can do, and make you want to learn more!
RULE 1: Use the exact capitalization, spelling, and punctuation!!!
RULE 2: a. Copy and paste long sections of code
b. Type short ones
c. Edit the previous one if it is similar
d. You don't have to type anything after a # sign; # marks a comment, and R ignores the rest of that line
Pick your Poison:
Download R 3.4.4 for Windows (62 MB)
R Portable for Windows (Extract to USB Drive and run on any PC from your own drive!)
In the future, go directly to https://cloud.r-project.org/
You should also check out RStudio
Learn R Free within R! Intro to the "swirl" package (video introduction)
Text file with code for today: code.txt
Some other code to try: R plays "Let it Snow"
----------------------------
The code:
#A) Messing Around
#1
seq(1,10)
sum(seq(1,10))
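#2 The total (500,000,500,000) is bigger than the largest "integer" R can store (about 2.1 billion).
#In R 3.4.4 the first line gives NA with an "integer overflow" warning (newer versions of R may print the answer).
#as.numeric() switches to decimal ("double") numbers, which can hold the total.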
sum(seq(1,1000000))
sum(as.numeric(seq(1,1000000)))
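#Optional check: Gauss's formula n*(n+1)/2 gives the same total without adding anything up.
#.Machine$integer.max is the largest value R can store as an "integer". (`n` is just an example name.)
.Machine$integer.max
n <- 1000000
n * (n + 1) / 2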
#3
mydata<-seq(1,1000, by=2)
head(mydata)
mydata[25]
summary(mydata)
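#Optional: a few more ways to look at a vector like mydata (the odd numbers 1, 3, ..., 999).
#mydata is recreated from #3 so these lines also run on their own.
mydata<-seq(1,1000, by=2)
length(mydata)
tail(mydata)
mydata[c(1,25,500)]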
#B) A "FOR" Loop
#Let's repeat ourselves! A basic "For Loop"
#4
for(i in 1:15) {
  print("Aggie Pride!")
}
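#Optional: the loop variable i takes a new value each time through. This loop adds up 1 to 10
#one step at a time, which should match sum(seq(1,10)) from #1. (`total` is just an example name.)
total <- 0
for(i in 1:10) {
  total <- total + i
}
total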
#A MORE INVOLVED LOOP!
#Let's take the 20th-30th items from our list of odd numbers (mydata, from #3), square them one at a time,
#save the squares to a new data item, and also print them to the screen.
#Initialize variable `my.sq` and run a repeating loop
#5
my.sq <- 0
for(i in 20:30) {
  #put the i-th element of `mydata`, squared, into position i minus 19 of `my.sq` (so i=20 goes into position 1)
  my.sq[i-19] <- mydata[i]^2
  print(c(i,mydata[i],my.sq[i-19]))
}
#6 Let's look at the results again
my.sq
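#Optional: R can also square all 11 numbers at once, without a loop ("vectorized" arithmetic).
#The result should match my.sq. `my.sq.fast` is just an example name; mydata is recreated from #3
#so these lines also run on their own.
mydata<-seq(1,1000, by=2)
my.sq.fast <- mydata[20:30]^2
my.sq.fast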
#-------------------------------------
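#C) Make a list of all numbers from 1 to 1,000 not divisible by 2, 3, 4, 5, 6 or 7
#(a number not divisible by 2 can't be divisible by 4 or 6 either, so we only need to check 2, 3, 5 and 7)
#%% is the mod (modulo) operator: x%%y is the remainder when x is divided by y, so x%%3!=0 means "x is not divisible by 3".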
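#Two quick examples of %% before we use it:
10 %% 3   #remainder is 1, so 10 is NOT divisible by 3
12 %% 4   #remainder is 0, so 12 IS divisible by 4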
#7
mydata2<-seq(1,1000)
mylist<-mydata2[which(mydata2%%2!=0)]
#8
mylist<-mylist[which(mylist%%3!=0)]
#9
mylist<-mylist[which(mylist%%5!=0)]
#10
mylist<-mylist[which(mylist%%7!=0)]
#11
mylist
#12
length(mylist)
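#Optional: the same list in one step. The & sign means "and", and a TRUE/FALSE vector can pick out
#elements directly, so which() is not strictly needed. The length should match length(mylist) above.
#`mylist.alt` is just an example name; mydata2 is recreated from #7 so these lines also run on their own.
mydata2<-seq(1,1000)
mylist.alt<-mydata2[mydata2%%2!=0 & mydata2%%3!=0 & mydata2%%5!=0 & mydata2%%7!=0]
length(mylist.alt)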
#D) Let's look at some data from the Titanic shipwreck
#13 install the titanic package (only needed once per computer)
install.packages("titanic")
#14
#load the package into memory for use
library(titanic)
#15
#make a copy of the main dataset to play with
data1<-titanic_train
#16
summary(data1)
#17
table(data1$Survived)
#attach() lets us type Survived instead of data1$Survived from here on
attach(data1)
table(Survived)
#18
prop.table(table(Survived))
#19
prop.table(table(Survived,Sex))
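#the ,1 makes each row add up to 1: among those who perished (0) and those who survived (1), what share were female or male
prop.table(table(Survived,Sex),1)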
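#Optional: R also ships with its own summarized Titanic data, a table of all 2,201 passengers and crew
#by Class, Sex, Age and Survived. margin.table() adds it up to a Sex-by-Survived table, and the ,1 in
#prop.table() makes each row add to 1, giving the share of men and of women who survived.
#This is a different dataset from titanic_train, so the numbers will differ. `sex.surv` is just an example name.
sex.surv <- margin.table(Titanic, c(2, 4))
sex.surv
prop.table(sex.surv, 1)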
#20
#tell R that some variables should be treated as categories (factors), making a copy
Survived1<-as.factor(Survived)
Pclass1<-as.factor(Pclass)
Sex1<-as.factor(Sex)
#21 A regression (Fare explained by Age, ticket class and survival), so I don’t get hounded by people
summary(lm(Fare~Age+Pclass1+Survived1))
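#Optional: why as.factor() matters here. For a factor, lm() creates a separate 0/1 "dummy" variable for
#each category except the first (the baseline), so Pclass1 gets coefficients for 2nd and 3rd class
#compared with 1st class. A sketch using R's built-in mtcars data instead (cyl = number of cylinders: 4, 6 or 8);
#`fit.example` is just an example name.
fit.example <- lm(mpg ~ wt + as.factor(cyl), data = mtcars)
coef(fit.example)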
#22
#make some interesting graphs
plot(Pclass1,Survived1)
plot(Sex1,Survived1)
boxplot(Age)
boxplot(Age~Pclass1)
#23 two ways of getting average fare by ticket class
aggregate(Fare,by=list(Pclass1),FUN=mean)
tapply(Fare, Pclass1, mean)
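#Optional: a third way, using a formula like the one in lm(). Sketched here on R's built-in mtcars data
#(average miles per gallon by number of cylinders); with the Titanic data the same idea would be
#aggregate(Fare~Pclass, data=data1, FUN=mean). `avg.formula` and `avg.tapply` are just example names.
avg.formula <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
avg.tapply <- tapply(mtcars$mpg, mtcars$cyl, mean)
avg.formula
avg.tapply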
#24 I don't like the output of the default "summary" command. Let's program a new one!
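sumstats <- function(y){
  #keep only the numeric columns
  nums <- sapply(y, is.numeric)
  #for each numeric column, compute five statistics, ignoring missing values (NA)
  sumst <- sapply(y[, nums, drop = FALSE], function(x){
    sumstat <- c(mean(x,na.rm=TRUE), median(x,na.rm=TRUE), sd(x,na.rm=TRUE),
                 min(x,na.rm=TRUE), max(x,na.rm=TRUE))
    names(sumstat) <- c("Mean","Median","SD","Min","Max")
    sumstat
  })
  #flip the table so each variable is a row and each statistic is a column
  t(sumst)
}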
#25
sumstats(data1)
#26
by(data1,Pclass1,sumstats)
#27 A few more graphs
hist(Fare)
hist(Fare, breaks=100)
hist(Fare, breaks=100, xlim=c(0,100))
plot(Age~Fare)
plot(Age~Fare,pch=16)
plot(Age~Fare, col=Pclass,pch=16)
legend(400,70,legend=c("1st Class","2nd Class","3rd Class"),col=c(1,2,3), pch=16)
#28 another nice graph...
#pch=1+15*Survived draws a filled dot (16) for survivors and an open circle (1) for those who perished
plot(Age~Fare, col=Pclass,pch=1+15*Survived,xlim=c(-100,500))
legend(400,80,legend=c("1st Class","2nd Class","3rd Class"),col=c(1,2,3), pch=16, title="Survived")
legend(400,60,legend=c("1st Class","2nd Class","3rd Class"),col=c(1,2,3), pch=1,title="Perished")
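#29 Now let's see who these people in the graph are! Click on points to label them with the passenger's name;
#press Esc (or right-click and choose Stop) when you are done.
#Fare is on the x axis and Age on the y axis, so Fare goes first.
identify(Fare, Age, labels=Name, tolerance=1)
#the same thing without relying on attach():
identify(data1$Fare,data1$Age,labels=data1$Name,tolerance=1)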
#30 Last trick for the day: You can read your own data into R in lots of ways. On Windows, the easiest way is to copy your data to the clipboard (select the data, e.g. in Excel, and press Ctrl+C or choose Edit > Copy), then in R type:
mydata.b=read.delim("clipboard")
#now you can analyze data in R! Note: This is not considered a "professional" way to get your data into R, but it works!
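#Optional sketch of a more "professional" route: keep your data in a .csv file and read it with read.csv().
#Here R's built-in mtcars data is first written to a temporary file (a stand-in for your own file) and read back.
#With your own data you would use something like read.csv("C:/path/to/yourfile.csv"), where that path is hypothetical.
#`myfile` and `mydata.c` are just example names.
myfile <- tempfile(fileext = ".csv")
write.csv(mtcars, myfile, row.names = FALSE)
mydata.c <- read.csv(myfile)
head(mydata.c)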