R

Aggie Econ Club R Presentation

"How to become bad at coding

But better than 99.9% of the population"

Or:

                                        Become the Master of your Technology!

"It is the allowing machines to be our masters and not our servants that so injures the beauty of life nowadays..."   William Morris, 1888 

“Have nothing in your house that you do not know to be useful, or believe to be beautiful.”  

William Morris

http://humanetech.com/      The Time Well Spent Movement

A few brief remarks:

Teach yourself programming in 10 years, not 21 days 

by Peter Norvig (Director of Research at Google)

RULE 1: Use the exact capitalization, spelling, and punctuation!!!

RULE 2: a. Copy and paste long sections of code

                b. Type short ones

                c. Edit the previous one if it is similar

                d. You don't have to type anything that follows a # sign; this symbol marks a comment, which R ignores

Pick your Poison:

In the future, go directly to https://cloud.r-project.org/ 

You should also check out RStudio 

Learn R Free within R! Intro to the "swirl" package (video introduction)

Text file with code for today:  code.txt

Some other code to try: R plays "Let it Snow"

----------------------------

The code:

#A) Messing Around

#1

  seq(1,10)

  sum(seq(1,10))

#2

  sum(seq(1,1000000))

  sum(as.numeric(seq(1,1000000)))
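#Why as.numeric() in #2? A small sketch, assuming a standard R build where integers are 32-bit: seq(1,1000000) is stored as integers, and integer arithmetic cannot go past .Machine$integer.max. Depending on your R version, the first sum in #2 may return NA with a warning; converting to numeric first avoids the limit. The names big, overflowed, big.num and total are only for illustration.

```r
# Largest value an R integer can hold (2^31 - 1 on standard builds)
big <- .Machine$integer.max

# Adding 1L overflows: R returns NA and issues a warning (suppressed here)
overflowed <- suppressWarnings(big + 1L)
print(overflowed)

# Converting to numeric (double precision) first avoids the overflow
big.num <- as.numeric(big) + 1
print(big.num)

# The same idea applied to the sum in #2
total <- sum(as.numeric(seq(1, 1000000)))
print(total)
```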

#3

  mydata<-seq(1,1000, by=2)

  head(mydata)

  mydata[25]

  summary(mydata)

#B) A "FOR" Loop

#Let's repeat ourselves!  A basic "For Loop"

 #4

for(i in 1:15) {

 print("Aggie Pride!")

}

#A MORE INVOLVED LOOP!

#Let's take the 20th-30th items from `mydata`, our list of odd numbers from 1 to 999, and square them, one at a time, 

#save it to a new data item, and also print them to the screen.

#Initialize variable `my.sq` and run a repeating loop

#5 

 my.sq <- 0

 for(i in 20:30) {

  #put i-th element of `mydata` squared into `i minus 19th` position of `my.sq`

  my.sq[i-19] <- mydata[i]^2

  print(c(i,mydata[i],my.sq[i-19]))

 }

#6 Let's look at the results again

 my.sq
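#The loop in #5 can also be written without a loop, because R arithmetic works on whole vectors at once. A minimal sketch, assuming the same odd numbers as in #3; the names odds, loop.sq and vec.sq are made up for this example.

```r
# Same odd numbers as in #3 (redefined so this runs on its own)
odds <- seq(1, 1000, by = 2)

# The loop version, as in #5
loop.sq <- 0
for (i in 20:30) {
  loop.sq[i - 19] <- odds[i]^2
}

# Vectorized version: pick items 20 to 30, then square them all at once
vec.sq <- odds[20:30]^2

# Check that both approaches give the same 11 values
print(all.equal(loop.sq, vec.sq))
```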

#-------------------------------------

#C) Make a list of all numbers from 1 to 1,000 not divisible by 2, 3, 4, 5, 6 or 7 (anything divisible by 4 or 6 is also divisible by 2, so we only need to test 2, 3, 5 and 7)

#%% is the mod or modulo operator.

#7

 mydata2<-seq(1,1000)  

 mylist<-mydata2[which(mydata2%%2!=0)]

#8

 mylist<-mylist[which(mylist%%3!=0)]

#9

 mylist<-mylist[which(mylist%%5!=0)]

#10

 mylist<-mylist[which(mylist%%7!=0)]

#11

 mylist

#12

 length(mylist)
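#Steps #7 to #10 can be combined into one filter by joining the conditions with &. This is a sketch of an alternative, not a replacement for the step-by-step version above; the names nums, keep and one.step are made up for this example.

```r
# All numbers from 1 to 1000 (redefined so this runs on its own)
nums <- seq(1, 1000)

# Keep a number only if it leaves a remainder for each of 2, 3, 5 and 7.
# Anything divisible by 4 or 6 is already divisible by 2, so those need no test.
keep <- nums %% 2 != 0 & nums %% 3 != 0 & nums %% 5 != 0 & nums %% 7 != 0
one.step <- nums[keep]

# Look at the first few survivors and count them
print(head(one.step))
print(length(one.step))
```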

#D) Let's look at some data from the Titanic shipwreck

#13   install titanic package

  install.packages("titanic")

#14

#load the package into memory for use

 library(titanic)

#15

#make a copy of the main dataset to play with

 data1<-titanic_train

#16

 summary(data1)

#17

 table(data1$Survived)

 attach(data1)

 table(Survived)
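#attach() lets you type Survived instead of data1$Survived. A possible alternative is with(), which does the same thing for a single command. The sketch below uses a tiny made-up data frame called toy (not the Titanic data) so it runs on its own.

```r
# Hypothetical toy data frame, just for illustration
toy <- data.frame(Survived = c(0, 1, 1, 0, 1))

# with() evaluates the expression using the columns of toy, no attach() needed
counts <- with(toy, table(Survived))
print(counts)

# Same counts when spelling out the data frame name
print(table(toy$Survived))
```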

#18

 prop.table(table(Survived))

#19

 prop.table(table(Survived,Sex))

 prop.table(table(Survived,Sex),1)
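#What does the 1 in prop.table(...,1) do? It picks the margin that the proportions are computed within. A small sketch using made-up toy data (NOT the real Titanic counts); the names toy, tab, by.row and by.col are only for illustration.

```r
# Hypothetical toy data, not the real Titanic counts
toy <- data.frame(
  Survived = c(0, 0, 0, 1, 1, 1, 1, 0),
  Sex      = c("male", "male", "male", "female",
               "female", "female", "male", "female")
)
tab <- table(toy$Survived, toy$Sex)

# No margin: every cell is a share of the grand total
print(prop.table(tab))

# Margin 1: proportions within each row (each row sums to 1)
by.row <- prop.table(tab, 1)
print(by.row)

# Margin 2: proportions within each column (each column sums to 1)
by.col <- prop.table(tab, 2)
print(by.col)
```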

#20

#tell R that some variables should be treated as categories (factors), making a copy

 Survived1<-as.factor(Survived)

 Pclass1<-as.factor(Pclass)

 Sex1<-as.factor(Sex)

#21 A regression, so I don’t get hounded by people

 summary(lm(Fare~Age+Pclass1+Survived1)) 

 

#22

#make some interesting graphs

 plot(Pclass1,Survived1)

 plot(Sex1,Survived1)

 boxplot(Age)

 boxplot(Age~Pclass1)

#23 two ways of getting average fare by ticket class

 aggregate(Fare,by=list(Pclass1),FUN=mean)

 tapply(Fare, Pclass1, mean)

#24 I don't like the output of the default "summary" command. Let's program a new one!

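 sumstats <- function(y){

  #find the numeric columns; the other columns are skipped

  nums <- sapply(y, is.numeric)

  #compute five statistics for each numeric column, ignoring missing values

  sumst <- sapply(y[,nums], function(x){

   sumstat <- c(mean(x,na.rm=TRUE), median(x,na.rm=TRUE), sd(x,na.rm=TRUE), min(x,na.rm=TRUE), max(x,na.rm=TRUE))

   names(sumstat) <- c("Mean","Median","SD","Min","Max")

   sumstat})

  #transpose so each variable is a row and each statistic is a column

  t(sumst)

 }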

#25

 sumstats(data1)

#26

 by(data1,Pclass1,sumstats)

#27 A few more graphs

 hist(Fare)

 hist(Fare, breaks=100)

 hist(Fare, breaks=100, xlim=c(0,100))

 plot(Age~Fare)

 plot(Age~Fare,pch=16)

 plot(Age~Fare, col=Pclass,pch=16)

 legend(400,70,legend=c("1st Class","2nd Class","3rd Class"),col=c(1,2,3), pch=16)

#28 another nice graph...

 plot(Age~data1$Fare, col=Pclass,pch=1+15*Survived,xlim=c(-100,500))

 legend(400,80,legend=c("1st Class","2nd Class","3rd Class"),col=c(1,2,3), pch=16, title="Survived")

 legend(400,60,legend=c("1st Class","2nd Class","3rd Class"),col=c(1,2,3), pch=1,title="Perished")

#29 Now let's see who these people in the graph are!

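 #the plot above has Fare on the x-axis and Age on the y-axis, so identify() needs them in that order

 identify(Fare, Age, labels=Name, tolerance=1)

 #the same call without relying on attach()

 identify(data1$Fare,data1$Age,labels=data1$Name,tolerance=1)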

#30  Last trick for the day: you can read your own data into R in lots of ways. On Windows, the easiest way is to copy your data to the clipboard (select the data, then choose Edit > Copy or press Ctrl+C), then in R type:

 mydata.b=read.delim("clipboard")

#now you can analyze data in R! Note: This is not considered a "professional" way to get your data into R, but it works!
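#One more repeatable option is to save your data as a CSV file and read it with read.csv(). A minimal sketch, assuming a comma-separated file with a header row; the temporary file, the made-up example table and the name mydata.c only stand in for your own file and data.

```r
# Write a small made-up table to a temporary CSV file (stands in for your own file)
tmp <- tempfile(fileext = ".csv")
example <- data.frame(id = 1:3, score = c(90, 85.5, 77))
write.csv(example, tmp, row.names = FALSE)

# Read it back; with your own data, replace tmp with the path to your file
mydata.c <- read.csv(tmp)

# Check what was read in
str(mydata.c)
print(summary(mydata.c))
```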