R

Aggie Economic Association 
R Presentation #2 (October 25, 2018)

"Using RStudio with R to explore Stats!"



Step 1) Download and Install R, unless you have already (32 bit version).
In the future, go directly to https://cloud.r-project.org/ to get the latest version

Step 2) Download and Install RStudio, unless you have already. 

Step 3) Download the R Script File we will be running today: 

Step 4) Create a new RStudio Project, and Open the R Script File.

Generating Random Numbers: 
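
The linked material for this step is not reproduced on this page. As a minimal sketch of random number generation in base R (the seed value, sample sizes, and variable names below are arbitrary choices for illustration, not taken from the presentation):

```r
# Fix the seed so the "random" draws are the same every time the script runs
set.seed(2018)

# 5 draws from a uniform distribution on [0, 1]
u <- runif(5)

# 1000 draws from a normal distribution with mean 0 and standard deviation 1
z <- rnorm(1000, mean = 0, sd = 1)

# 10 simulated rolls of a fair six-sided die
rolls <- sample(1:6, size = 10, replace = TRUE)

print(u)
print(rolls)
summary(z)
```

Changing or removing the set.seed() line gives a different set of draws on each run.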


Aggie Econ Assn. R Presentation #1 (Given April 17, 2018)

"How to become bad at coding
But better than 99.9% of the population"

Or:

                                        Become the Master of your Technology!
"It is the allowing machines to be our masters and not our servants that so injures the beauty of life nowadays..."   William Morris, 1888 

“Have nothing in your house that you do not know to be useful, or believe to be beautiful.”  
William Morris

A few brief remarks:
  • I can't teach you "Coding" or Programming in 30 minutes, so I am not going to try.
by Peter Norvig (Director of Research at Google)
  • What I am going to try to do is introduce you to the idea of coding, show you some things that R (and many other languages) can do, and make you want to learn more!
RULE 1: Use the exact capitalization, spelling, and punctuation!!!
RULE 2: a. Copy and paste long sections of code
                b. Type short ones
                c. Edit the previous one if it is similar
                d. You don't have to type anything that follows a # sign; this symbol marks comments
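
As a small illustration of both rules (the variable name `total` is made up for this sketch): R treats upper and lower case as different names, and it ignores everything after a # sign.

```r
# This whole line is a comment, so R skips it
total <- sum(1:10)   # text after the # is ignored too

print(total)

# R is case-sensitive: `Total` is not the same name as `total`
exists("total")   # TRUE, because we just created it
exists("Total")   # FALSE in a fresh session, since nothing by that name exists
```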

http://humanetech.com/
      The Time Well Spent Movement
Pick your Poison:
In the future, go directly to https://cloud.r-project.org/ 

You should also check out RStudio 


Learn R Free within R! Intro to the "swirl" package (video introduction)
Text file with code for today:  code.txt
Some other code to try: R plays "Let it Snow"
----------------------------
The code:

#A) Messing Around
#1
  seq(1,10)
  sum(seq(1,10))

#2
  sum(seq(1,1000000))              #integer overflow: returns NA with a warning
  sum(as.numeric(seq(1,1000000)))  #converting to numeric (double) avoids the overflow

#3
  mydata<-seq(1,1000, by=2)
  head(mydata)
  mydata[25]
  summary(mydata)


#B) A "FOR" Loop
#Let's repeat ourselves!  A basic "For Loop"
 #4

for(i in 1:15) {
 print("Aggie Pride!")
}

#A MORE INVOLVED LOOP!
#Let's take the 20th-30th items of `mydata` (the odd numbers from 1 to 999) and square them, one at a time,
#save them to a new data item, and also print them to the screen.
#Initialize variable `my.sq` and run a repeating loop
#5 
 my.sq <- 0
 for(i in 20:30) {
  #put i-th element of `mydata` squared into `i minus 19th` position of `my.sq`
  my.sq[i-19] <- mydata[i]^2
  print(c(i,mydata[i],my.sq[i-19]))
 }

#6 Let's look at the results again
 my.sq

#-------------------------------------
#C) Make a list of all numbers from 1 to 1,000 not divisible by 2,3,4,5, 6 or 7
#%% is the mod or modulo operator.

#7
 mydata2<-seq(1,1000)  
 mylist<-mydata2[which(mydata2%%2!=0)]
#8
 mylist<-mylist[which(mylist%%3!=0)]
#9
 mylist<-mylist[which(mylist%%5!=0)]
#10
 mylist<-mylist[which(mylist%%7!=0)]
#11
 mylist
#12
 length(mylist)

#D) Let's look at some data from the Titanic shipwreck
#13   install titanic package
  install.packages("titanic")

#14
#load the package into memory for use
 library(titanic)

#15
#make a copy of the main dataset to play with
 data1<-titanic_train

#16
 summary(data1)

#17
 table(data1$Survived)
 attach(data1)
 table(Survived)

#18
 prop.table(table(Survived))

#19
 prop.table(table(Survived,Sex))
 prop.table(table(Survived,Sex),1)


#20
#tell R that some variables should be treated as categories (factors), making a copy
 Survived1<-as.factor(Survived)
 Pclass1<-as.factor(Pclass)
 Sex1<-as.factor(Sex)

#21 A regression, so I don’t get hounded by people
 summary(lm(Fare~Age+Pclass1+Survived1)) 
 
#22
#make some interesting graphs
 plot(Pclass1,Survived1)
 plot(Sex1,Survived1)
 boxplot(Age)
 boxplot(Age~Pclass1)

#23 two ways of getting average fare by ticket class
 aggregate(Fare,by=list(Pclass1),FUN=mean)
 tapply(Fare, Pclass1, mean)

#24 I don't like the output of the default "summary" command. Let's program a new one!
 sumstats <- function(y){
  #keep only the numeric columns
  nums <- sapply(y, is.numeric)
  sumst <- sapply(y[,nums,drop=FALSE], function(x){
   sumstat <- c(mean(x,na.rm=TRUE),median(x,na.rm=TRUE),sd(x,na.rm=TRUE),min(x,na.rm=TRUE),max(x,na.rm=TRUE))
   names(sumstat) <- c("Mean","Median","SD","Min","Max")
   sumstat
  })
  #transpose so each variable is a row and each statistic is a column
  aperm(sumst)
 }

#25
 sumstats(data1)

#26
 by(data1,Pclass1,sumstats)

#27 A few more graphs
 hist(Fare)
 hist(Fare, breaks=100)
 hist(Fare, breaks=100, xlim=c(0,100))
 plot(Age~Fare)
 plot(Age~Fare,pch=16)
 plot(Age~Fare, col=Pclass,pch=16)
 legend(400,70,legend=c("1st Class","2nd Class","3rd Class"),col=c(1,2,3), pch=16)

#28 another nice graph...
 plot(Age~data1$Fare, col=Pclass,pch=1+15*Survived,xlim=c(-100,500))
 legend(400,80,legend=c("1st Class","2nd Class","3rd Class"),col=c(1,2,3), pch=16, title="Survived")
 legend(400,60,legend=c("1st Class","2nd Class","3rd Class"),col=c(1,2,3), pch=1,title="Perished")

#29 Now let's see who these people in the graph are!
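 #x and y must match the plot: Fare is on the horizontal axis, Age on the vertical
 identify(Fare, Age, labels=Name, tolerance=1)
 #the same call without relying on attach():
 identify(data1$Fare,data1$Age,labels=data1$Name,tolerance=1)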

#30  Last trick for the day: You can read your own data into R in lots of ways.  On Windows, the easiest way is to copy your data to the clipboard (select data, and hit edit, copy or control-c), then in R type:
 mydata.b=read.delim("clipboard")

#now you can analyze data in R! Note: This is not considered a "professional" way to get your data into R, but it works!
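
A more reproducible alternative is to save the data as a .csv file and read it with read.csv(), so the script itself records where the data came from. The sketch below is self-contained: it writes a small made-up table to a temporary file and reads it back. The data, the file path, and the name `mydata.c` are assumptions for illustration only; in practice you would replace `path` with the location of your own file.

```r
# Made-up example data, written to a temporary .csv file
example <- data.frame(name = c("A", "B", "C"), score = c(90, 85, 77))
path <- tempfile(fileext = ".csv")
write.csv(example, path, row.names = FALSE)

# Read the file back in; replace `path` with the location of your own file
mydata.c <- read.csv(path)

# Check the structure, then analyze as usual
str(mydata.c)
mean(mydata.c$score)
```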
