Report: Gene Expression profiling of paediatric AML

 

This report is from a one-week assignment from my Masters course. 

Question: Read Ross et al. and replicate their analysis. Their datasets are available online. 

Note that, due to time constraints, the analysis was performed on a truncated version of the original dataset.

 

AIM:

The aim of this experiment is to identify paediatric AML subtypes based on the gene expression profiles of leukaemic blasts. Subtype identification can facilitate the development of personalised treatment options, specific to a patient's gene expression profile, and could, by extension, improve cure rates for AML patients.

 

Introduction:

The 5 known risk groups of AML (subtypes) covered in this experiment:

  • t(15;17)[PML-RAR]
  • t(8;21)[AML1-ETO]
  • inv16[CBF-MYH11]
  • MLL chimeric fusion genes
  • acute megakaryocytic morphology (FAB-M7)

In order to classify samples into distinct subtypes (classes), oligonucleotide microarrays were used to analyse the expression of more than 22,000 genes in diagnostic leukaemic blasts from 83 paediatric AML patients. The data was divided randomly into two sets: a training set (62 samples) and a test set (21 samples). The training data was used to build a support vector machine (SVM) that classifies samples into their subtype classes, and the test data was used to validate this SVM.

Table 1: Paediatric AML subgroup distribution

Genetic Subgroup            Train   Test
t(15;17)[PML-RAR]             11      4
t(8;21)[AML1-ETO]             16      5
inv16[CBF-MYH11]              10      4
MLL chimeric fusion genes     18      5
(FAB-M7)                       7      3
Total number of samples       62     21
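The splitting code itself is not part of the script below, so here is a minimal sketch of how such a stratified random split could be done in R. The subgroup counts come from Table 1; the seed, the subtype labels, and the variable names are hypothetical, not the authors' actual code.

```r
# Hypothetical sketch of a stratified random split (counts from Table 1)
set.seed(1)                                      # for reproducibility only
# subgroup sizes, train + test combined
subtype <- rep(c("t(15;17)", "t(8;21)", "inv16", "MLL", "FAB-M7"),
               times = c(15, 21, 14, 23, 10))
# how many of each subgroup go into the training set
train.n <- c("t(15;17)" = 11, "t(8;21)" = 16, "inv16" = 10,
             "MLL" = 18, "FAB-M7" = 7)
# draw training indices within each subgroup; the remainder is the test set
train.idx <- unlist(lapply(names(train.n),
                           function(s) sample(which(subtype == s), train.n[s])))
test.idx  <- setdiff(seq_along(subtype), train.idx)
length(train.idx)   # 62
length(test.idx)    # 21
```

Sampling within each subgroup (rather than over all 83 samples at once) keeps the subtype proportions of Table 1 in both sets.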

Step 0 : Before we begin – Load Libraries & Train Data

#===============================================================================
#       Load libraries
#===============================================================================
 
# Clear workspace
rm(list=ls())
 
# Close any open graphics devices
graphics.off()
 
library(affy)
library(affyPLM)
library(genefilter)
library(limma)
library(e1071)
library(Biobase)
library(annaffy)
 
#===============================================================================
#       Load Train data from affymetrix '.CEL' files
#===============================================================================
 
 cat("\n\n Loading Train Data...\n\n")
 
# first create an 'AnnotatedDataFrame' object from an experiment description file,
# then read the corresponding '.CEL' files
Train.Expt <- read.AnnotatedDataFrame("train_set.txt", header=TRUE, row.names=1, sep="\t")
Train.Data <- ReadAffy(filenames=rownames(pData(Train.Expt)), phenoData=Train.Expt, verbose=TRUE)
 
# check the data
dim(Train.Data)     #712 712
show(Train.Data)
pData(Train.Data)
head(exprs(Train.Data)) #first lines of expression data contained in Train.Data

Step 1: Quality assessment of raw data

Data quality was assessed at the 'probe level'. The plots clearly show that the data requires pre-processing.

#===============================================================================
#      Step 1: QC plots on raw train data
#===============================================================================
 
cat("\n\nHit a key to proceed to QC plots on raw data...
      \nGraphs will be saved in the current directory\n\n")
scan("")
 
#creating a vector for colours
pdat<-pData(Train.Data)
colour<-NULL
 
for(i in 1:length(pdat$Sample.type)){
  clas<-pdat$Sample.type[i]
  if(clas=="inv16-"){
    colour=c(colour,"green")
  } else if(clas=="MLL-"){
    colour=c(colour,"cyan")
  } else if(clas=="Other-M7-"){
    colour=c(colour,"#B580FE")
  } else if(clas=="t(15:17)-"){
    colour=c(colour,"magenta")
  }else if(clas=="t(8:21)-"){
    colour=c(colour,"yellow")
  } else{
      colour=c(colour,"white")
  }
} 
 
 
# box-and-whisker plot
tiff(filename="boxplot_Raw_TrainData.tif")
boxplot(Train.Data, col= colour, las=3, cex.axis=0.5, main = "Boxplot of raw data")
legend(-2, 15, horiz = TRUE, c("inv16-", "MLL-", "Other-M7-", "t(15:17)-", "t(8:21)-"),
       fill = c("green", "cyan", "#B580FE", "magenta", "yellow"))
dev.off()
 
# Data distribution per array
tiff(filename="histogram_Raw_TrainData.tif")
hist(Train.Data,type="l", col=colour,main = "Histogram of raw data")
legend(12, 1.5, c("inv16-", "MLL-", "Other-M7-", "t(15:17)-", "t(8:21)-"),
       fill = c("green", "cyan", "#B580FE", "magenta", "yellow"))
dev.off()
 
# Chip images (for the first six arrays)
par(mfrow=c(2,3))
for (i in 1:6){                     #for (i in 1:ncol(exprs(Train.Data))){
image(Train.Data[,i])
}
 
# to save time, just save one image
tiff(filename="ChipImage1_Raw_TrainData.tif")
image(Train.Data[,1])
dev.off()
 
# RNA degradation
tiff(filename="RNAdeg_Raw_TrainData.tif")
RNA.deg <- AffyRNAdeg(Train.Data)
plotAffyRNAdeg(RNA.deg)
dev.off()
 
cat("\n\n the plots have been saved in the current directory...\n\n")

Figure 1: the box plot shows an uneven distribution - the data requires pre-processing, especially normalisation

Figure 2: the histogram also shows the uneven distribution, confirming the need for pre-processing

Figure 3: the RNA degradation plot shows the negative gradients at the right end, again confirming that pre-processing is necessary

Step 2: Pre-processing – background correction, normalisation, summarisation

  • Background correction: RMA
  • Normalisation: quantile
  • summarisation: Median.polish

The plots show that the dataset has improved vastly. Nonetheless, there still seems to be a small degree of variability between the samples, suggesting that filtering is required.

#===============================================================================
#     Step 2:  Preprocessing of Train data
#===============================================================================
 
cat("\n\nHit a key to proceed to Normalisation of Train Data set...\n\n")
scan("")
 
# Close any open graphics devices
graphics.off()
 
 
Train.eset <- threestep(Train.Data,background.method="RMA.2",
                    normalize.method="quantile",summary.method="median.polish")
                    
 
#===============================================================================
#        QC on pre-processed data at probe set level (i.e. genes)
#===============================================================================
 
cat("\n\nProceeding to QC on pre-processed data...
      \nGraphs will be saved in the current directory\n\n")
 
 
dim(Train.eset)       #Features:  22283  Samples:  62 
               
# Boxplot
tiff(filename="boxplots_norm_TrainData.tif")
boxplot(as.data.frame(exprs(Train.eset)), las=3, cex.axis=0.5, col=colour, 
        pch=".",main = "Boxplots normalised data") 
        legend(-2, 14, horiz = TRUE, c("inv16-", "MLL-", "Other-M7-", "t(15:17)-", "t(8:21)-"),
         fill = c("green", "cyan", "#B580FE", "magenta", "yellow"))                    
dev.off()
 
# Scatter matrix of first 6 arrays against one another
tiff(filename="scatter_norm_TrainData.tif")
pairs(exprs(Train.eset)[,1:6], pch=".",main="Scatter plots",col=colour) 
dev.off()
 
# MA plots of first 6 arrays against one another (log-intensity vs log-ratio)
tiff(filename="MA_plot_norm_TrainData.tif")
mva.pairs(exprs(Train.eset)[,1:6], col=colour)  
dev.off()
 
cat("\n\n the plots have been saved in the current directory...\n\n")

Figure 4: the normalised boxplot - a much more uniform distribution

Figure 5: scatter plots of the first 6 samples - the data appears normalised

Figure 6: the MA plots look fairly good now that the data is normalised, but the data still needs to be filtered

Step 3: Filtering

This step reduced the number of genes by selecting those of potential interest. Three filtering steps:

  1. remove all genes that showed no or very low expression (absent calls) in all samples
  2. retain only the genes with an expression level above log2(10) in every sample, a reasonable cut-off given the scale of the dataset
  3. analyse the variance of each probe set across samples and select the top 1000 most variable genes.
#===============================================================================
#      Step 3: Filter
#===============================================================================
 
cat("\n\nHit a key to proceed to Filtering & getting a subset...\n\n")
scan("")
 
# 
# Filter out genes that are absent in all samples
# 
 
#Performs the Wilcoxon signed rank-based gene expression presence/absence
eset.mas5 <- mas5calls(Train.Data)
 
#create a function
mascallsfilter <- function(cutoff = "A", number){
          function(x){
          sum((x) == (cutoff)) != number
          }
    } 
#filtering the absent genes       
m5cfun <- mascallsfilter(number=dim(eset.mas5)[2])     
mcfun <- filterfun(m5cfun)
mcsub<-genefilter(eset.mas5,mcfun)
sum(mcsub) #16073
 
#a subset of samples (it no longer has any absent genes)
X <- exprs(Train.eset[mcsub,])
dim(X)        # 16073    62
 
 
# 
# Retain genes with an expression level above log2(10) in all samples
# log2(10) seemed a reasonable cut-off given the scale of the data
# 
 
#create a function
kfilt <- kOverA(dim(X)[2],log2(10))
kff <- filterfun(kfilt)
mksub <- genefilter(X, kff)
sum(mksub) # 13029
 
 
#a subset of samples (no absent genes, and no genes below log2(10) in any sample)
#note: mksub indexes the 16073-gene subset X, not the full 22283-gene expression set
X <- X[mksub,]
dim(X)             # 13029    62
 
#
# filtering for top 1000 genes with highest variance 
#
 
 
#assess the variance of each probe set across all samples
vars <- apply(X, 1, var)          
#filter for the most variable genes (top 1000 by variance)
hvsub <- vars > quantile(vars, (nrow(X) - 1000)/nrow(X))
sum(hvsub)      #1000
 
#===============================================================================
#       subset of training data (after filtering)
#===============================================================================
 
# subset the RMA expression set with the genes selected above
X <- exprs(Train.eset)[mcsub,]      # remove absent genes (16073 left)
X <- X[mksub,]                      # remove genes below the log2(10) cut-off
dim(X)      # 13029    62
TrainSub <- Train.eset[mcsub,][mksub,]
exprs(TrainSub) <- X
 
 
X <- X[hvsub,]             ### only retain the 1000 most variable genes
dim(X)      # 1000   62
TrainSub <- TrainSub[hvsub,]       
exprs(TrainSub) <- X

Step 4:  Exploratory analysis (HCA) and Heatmap

The genes selected after filtering were analysed using an unsupervised two-dimensional hierarchical clustering algorithm (HCA) to assess the major groupings of the subtypes based exclusively on their gene expression profiles. The clusters identified most of the major subgroups.

#===============================================================================
#      Step 4:  Exploratory analysis (HCA) and Heatmap
#===============================================================================
 
cat("\n\nProceeding to QC on filtered data...
      \nGraphs will be saved in the current directory\n\n")
 
 
A<-t(exprs(TrainSub))      
dim(A)                                #62 1000
 
label<-as.vector(TrainSub$Sample.type)
 
#creating a vector for colours
pdat<-pData(TrainSub)
colour<-NULL
for(i in 1:length(pdat$Sample.type)){
  clas<-pdat$Sample.type[i]
  if(clas=="inv16-"){
    colour=c(colour,"green")
  } else if(clas=="MLL-"){
    colour=c(colour,"cyan")
  } else if(clas=="Other-M7-"){
    colour=c(colour,"#B580FE")
  } else if(clas=="t(15:17)-"){
    colour=c(colour,"magenta")
  }else if(clas=="t(8:21)-"){
    colour=c(colour,"yellow")
  } else{
      colour=c(colour,"white")
  }
}
 
#computes the distance matrix 
my.dist <- function(x) dist(x, method="euclidean")
 
#Hierarchical cluster analysis on the filtered genes 
my.hclust <- function(d) hclust(d, method="ward")
 
#create and save heatmap
tiff(filename="HeatMap.tif")
heatmap(A, distfun=my.dist, hclustfun=my.hclust, labRow=label, RowSideColors=colour, main="Heatmap of filtered Data")
dev.off()
 
#create hierarchical cluster analysis dendrogram
d <- as.dist(dist(A, method="euclidean"))
fit <- hclust(d, method="ward")
 
#save hierarchical cluster analysis dendrogram
tiff(filename="Hclust.tif")
plot(fit, hang=-1, labels=label)
dev.off()
 
 
#PCA
#pca <- prcomp(A, scale=T) # Performs principal component analysis after scaling the data.
#summary(pca) # Prints variance summary for all principal components.
#library(scatterplot3d) # Loads 3D library.
#scatterplot3d(pca$x, pch=20, color=colour)
 
cat("\n\n the plots have been saved in the current directory...\n\n")

Figure 7: the heatmap, showing the clusters

Figure 8: unsupervised cluster analysis of paediatric AMLs - the clusters are highlighted by colour

Step 5: Statistical analysis for class-discriminating genes

To identify expression signatures for each subtype, the class-discriminating genes were distinguished from the 1000 genes selected after filtering by applying parametric tests and ranking genes by the resulting p-values. Steps:

  • A design matrix representing the experimental design was created.
  • A linear model was fit to the data set.
  • A contrast matrix was generated specifying the contrasts of interest, i.e. the pairwise differences between the subtypes.
  • The contrasts were fit to the linear model, which was used to compute moderated t-statistics with the empirical Bayes method (eBayes). The eBayes analysis is analogous to the standard single-sample t-test, except that it borrows information across genes.
  • The raw p-values were corrected for multiple testing.
  • The genes with adjusted p-values less than 0.05 (483 altogether) were selected and used for classification.
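For reference, the moderated t-statistic computed in the eBayes step can be written out explicitly. This is limma's formulation (the symbols follow Smyth, 2004, and do not appear in the code below): the gene-wise sample variance is shrunk towards a prior estimated from all genes.

```latex
\tilde{t}_{gj} = \frac{\hat{\beta}_{gj}}{\tilde{s}_g \sqrt{v_{gj}}},
\qquad
\tilde{s}_g^{2} = \frac{d_0\, s_0^{2} + d_g\, s_g^{2}}{d_0 + d_g}
```

Here \hat{\beta}_{gj} is the estimated log fold-change for gene g and contrast j, s_g^2 is the gene-wise residual variance with d_g degrees of freedom, and s_0^2, d_0 are the prior variance and degrees of freedom estimated from the whole data set. Under the null hypothesis, \tilde{t}_{gj} follows a t-distribution with d_0 + d_g degrees of freedom, which is how borrowing information across genes gains power over the ordinary t-test.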
#===============================================================================
#        Step 5: Statistical Analysis  for class discriminating genes
#===============================================================================
 
cat("\n\nHit a key to proceed to Statistical analysis using ebayes...\n\n")
scan("")
 
pda<-pData(TrainSub)
clas<-pda$Sample.type
 
 
# create design matrix
design <- model.matrix(~-1+factor(clas))         
colnames(design)<- c("inv16","MLL","Other_M7","t15_17","t8_21")
 
 
# fit linear model to the data
fit.train <- lmFit(TrainSub, design)
dim(fit.train) #  1000    5
 
# create contrasts (group comparisons) of interest, in this case 10 comp.
contrast.matrix <- makeContrasts( 
  inv16 - MLL,          inv16 - Other_M7,   inv16 - t15_17,   inv16 - t8_21,
  MLL - Other_M7,       MLL - t15_17,       MLL - t8_21,
  Other_M7 - t15_17,    Other_M7 - t8_21,   t15_17 - t8_21,   levels=design)
dim(contrast.matrix) #5 10
 
# fit contrasts to linear model
fit2.train <- contrasts.fit(fit.train, contrast.matrix)
dim(fit2.train)  #  1000   10
 
# Given a series of related parameter estimates, compute moderated t-statistics 
eb.train <- eBayes(fit2.train)
 
dim(eb.train)   # 1000   10
#write.table(eb.train, file="eb.train.txt", quote=F, sep="\t")
 
 
 
# correct raw p-values for multiple testing (Benjamini & Hochberg) 
# and give top-ranked genes               
top.all <- topTable(eb.train,n=nrow(exprs(TrainSub)),adjust="BH",coef=2)
 
 
 
 
#===============================================================================
#        Discriminating Genes
#===============================================================================
 
#get a subset of only the discriminating genes
class.sub<-top.all$ID[top.all$adj.P.Val<0.05]
length(class.sub) #483
 
Train.sub <- TrainSub[class.sub,]  
 
 
#===============================================================================
#        Visualisation of results 
#===============================================================================
 
cat("\n\nProceeding to QC on results from ebayes...
      \nplots will be saved in the current directory\n\n")
 
 
#observing unadjusted p-values
tiff(filename="histpvalues.tif")
hist(top.all$P.Value,main='Histogram of P-values')
dev.off()
 
#comparing unadjusted and adjusted p-values
tiff(filename="pvalues.tif")
plot(top.all$P.Value,top.all$adj.P.Val,main="p values vs adjusted p values")    
abline(0,1)
dev.off()
 
 
# Moderated t-statistic
tiff(filename="ebayes.tif")                           
qqt(eb.train$t,df=eb.train$df.prior+eb.train$df.residual,
                            main="Moderated t-statistic")
abline(0,1)    #  Points off the line may be differentially expressed
dev.off()
 
 
#Volcano plot of log2 fold-changes versus significance (+ highlight top 25)
tiff(filename="volcano.tif")
plot(top.all$logFC,-log10(top.all$P.Value),pch="+",cex=0.1,
      xlab="Log Fold-Change from non-filtered eBayes",
      ylab="-log10 P-Value from non-filtered eBayes",
      main='Volcano Plot')
t <- is.element(top.all$ID,top.all$ID[top.all$adj.P.Val<0.05])
points(top.all$logFC[t],-log10(top.all$P.Value[t]),pch=16,col='red') 
t <- is.element(top.all$ID,top.all$ID[top.all$adj.P.Val<0.05][1:25])
points(top.all$logFC[t],-log10(top.all$P.Value[t]),pch=7,col='black')
legend(-4.5,69,
c('sig. eBayes (p.value<0.05)','top 25 eBayes (p.value<0.05)','other'),
pch=c(16,7,16),col=c('red','black','grey85'),cex=1)
dev.off()
 
 
#MA plot
tiff(filename="MA_plot.tif")
M <- top.all$logFC #log fold-change
A <- top.all$AveExpr # ave. expr. level
mat <- cbind(A,M)           
plotMA(mat,main="MA plot")                   
abline(h=3,col="red")
dev.off()
 
cat("\n\n the plots have been saved in the current directory...\n\n")

Figure 9: p-values plotted against adjusted p-values; the adjusted p-values were used in this analysis

Figure 10: histogram of p-values - most genes have low p-values; the genes with adjusted p-values above 0.05, classed as non-significant and removed, make up the right tail of the histogram

Figure 11: moderated t-statistic plot - the moderated t-test was used to discriminate genes

Figure 12: MA plot - looks normal; it is centred at M=0 and there is no trend in the M vs. A relationship

Figure 13: volcano plot, highlighting significant genes in red

CLASSIFICATION Step 6: SUPPORT VECTOR MACHINES (SVM)

SVM models are a popular classification method for highly multivariate data like this. The SVM was trained with type 'C-classification' and a linear kernel. Once the SVM was trained, the number of misclassified samples was checked on the training set itself; the result was zero misclassifications. The performance of the SVM was also assessed by 10-fold cross-validation on the training set, which gave an overall accuracy of about 98%.

#===============================================================================
#      CLASSIFICATION
#===============================================================================
 
cat("\n\nHit a key to proceed to Classification of samples using SVM...\n\n")
scan("")
 
#===============================================================================
#   Step 6:. SUPPORT VECTOR MACHINES
#===============================================================================
 
 
#===============================================================================
#      Train SVM
#===============================================================================
#library(e1071)
#library(Biobase)
 
cat("\n\nTraining the SVM using results from Train data set...\n\n")
 
Xm <- t(exprs(Train.sub))
dim(Xm)      #62 483
 
resp <- phenoData(Train.sub)$Sample.type     
length(resp) #62
 
#train a support vector machine.  
svm.model <- svm(Xm, resp, type="C-classification", kernel="linear")
 
#View the model.
print(svm.model)    
summary(svm.model)
 
 
#===============================================================================
#       Estimate training error
#===============================================================================
 
cat("\n\nEstimating training error...\n\n")
 
#Check if the model is able to classify your *training* data.
 
# test on training set
trpred <- predict(svm.model, Xm) 
 
# count differences       
sum( trpred != resp)   #0 (no misclassification)    
 
# confusion matrix   
table(resp, trpred)                                   
   
 
#===============================================================================
#       10 fold cross-validation
#===============================================================================
 
cat("\n\nCross Validation to assess training performance...\n\n")
 
trcv <- svm(Xm, resp, type="C-classification", kernel="linear",cross=10)
summary(trcv)      #accuracy: 98.3871

Step 7:   Predict classes of test set

The SVM was then applied to the test data set to determine the true accuracy of the classification; the result was zero misclassifications. This validates that the SVM classifies samples into their respective subtypes with high accuracy.

#===============================================================================
#    Step 7:   Predict classes of test set
#===============================================================================
 
 
#=======================
#      Modify test set
#=======================
 
cat("\n\nLoading, normalization and filtering test data set...\n\n")
 
#Load data
Test.Expt <- read.AnnotatedDataFrame("test_set.txt", header=TRUE, row.names=1, sep="\t")
Test.Data <- ReadAffy(filenames=rownames(pData(Test.Expt)), phenoData=Test.Expt, verbose=TRUE) 
 
#pre-processing
Test.eset <- threestep(Test.Data,background.method="RMA.2",normalize.method="quantile",summary.method="median.polish")  
 
#filter
Xt <- exprs(Test.eset)
TestSub <- Test.eset[class.sub,]         
exprs(TestSub) <- Xt[class.sub,]     
      
 
#=========================
#      Predict classes 
#=========================
 
 
cat("\n\nPredicting the classes of the test set...\n\n")
 
 
#Validate dataset. 
Xmtr <- t(exprs(TestSub))
tepred <- predict(svm.model, Xmtr) 
 
resptr <- phenoData(TestSub)$Sample.type          
sum(tepred != resptr)       #0 (no misclassification)                    
 
table(tepred, resptr)

 

Table 2: Diagnostic accuracies (%) for paediatric AML

Genetic Subgroup            Training (n = 62)   Test (n = 21)
t(15;17)[PML-RAR]                  100               100
t(8;21)[AML1-ETO]                  100               100
inv16[CBF-MYH11]                   100               100
MLL chimeric fusion genes          100               100
(FAB-M7)                           100               100
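The script above prints the confusion matrices but not the percentages in Table 2. As a minimal sketch (with the training subgroup counts from Table 1 standing in for a real confusion matrix; the variable names are hypothetical), the per-class accuracies are the diagonal counts divided by the row totals:

```r
# Hypothetical confusion matrix: diagonal counts from Table 1, zero
# off-diagonal entries, i.e. no misclassification
classes <- c("inv16-", "MLL-", "Other-M7-", "t(15:17)-", "t(8:21)-")
conf <- diag(c(10, 18, 7, 11, 16))
dimnames(conf) <- list(truth = classes, predicted = classes)
# per-class accuracy in percent: correct calls / all samples of that class
acc <- 100 * diag(conf) / rowSums(conf)
acc   # 100 for every subtype
```

The same calculation applied to `table(resp, trpred)` or `table(tepred, resptr)` reproduces the training and test columns of Table 2.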

Step 8: Random replacement of classes from train set

A test was performed by randomly replacing the labels on the training set and redoing the SVM classification on the mixed-up samples. The result gave a very low accuracy, usually less than 40%.

#===============================================================================
#   Step 8:    Random replacement of classes from train set
#===============================================================================
 
cat("\n\nRandom replacement of classes from train set...\n\n")
 
newlabs <- sample(c(rep("A",10), rep("B",18),rep("C",7), 
                            rep("D",11), rep("E",16)), 62)
 
trainlabs<-phenoData(Train.eset)$Sample.type
 
table(newlabs, trainlabs)
funnysvm <- svm(Xm, newlabs, type="C-classification", kernel="linear")
 
fpred <- predict(funnysvm, Xm)
sum( fpred != newlabs)
table(fpred, newlabs)
 
#cross validation
trfcv <- svm(Xm, newlabs, type="C-classification", kernel="linear",cross=10)
 
summary(trfcv)               # accuracy: ~35%

Step 9:  Classification of the other data set (Adult samples)

 
#===============================================================================
#     Step 9:  Classification of the other data set  (ADULT SET)
#===============================================================================
 
cat("\n\nHit a key to proceed to Classification of the other data set ...\n\n")
scan("")
 
#=======================
#      Modify data set
#=======================
 
#Load data
other.Expt <- read.AnnotatedDataFrame("other_set.txt", header=TRUE, row.names=1, sep="\t")
other.Data <- ReadAffy(filenames=rownames(pData(other.Expt)), phenoData=other.Expt, verbose=TRUE) 
 
#pre-processing
other.eset <- threestep(other.Data,background.method="RMA.2",normalize.method="quantile",summary.method="median.polish")
 
#filter and get subset
Xo <- exprs(other.eset)
OtherSub <- other.eset[class.sub,]         
exprs(OtherSub) <- Xo[class.sub,]           
 
 
#=========================
#      Predict classes 
#=========================
 
#predict
oXm <- t(exprs(OtherSub))
opred <- predict(svm.model, oXm) 
write.table(opred, file="Otherpred.txt", quote=F, sep="\t") 
 
#cross validation 
o<- svm(oXm,opred, type="C-classification", kernel="linear",cross=10)
 
summary(o)
 
cat("\n\nClassification of the other data set complete ! \n\n")

The dataset with adult samples came with 5 additional paediatric cases of T-ALL with MLL translocations, hence that dataset has 25 samples altogether. 10-fold cross-validation gave a total accuracy of 88%.

Table 3: Case distribution for the other data set - our results vs. the results from the paper

Genetic Subgroup            T-ALL cases   Adult cases      Adult cases   Difference
                                          (our results)    (paper)       (adult cases only)
t(15;17)[PML-RAR]                0              4               4              0
t(8;21)[AML1-ETO]                0              7               6              1
inv16[CBF-MYH11]                 0              3               2              1
MLL chimeric fusion genes        5              6               6              0
(FAB-M7)                         0              0               2              0
Total                            5             20              20             2 samples mismatched (from FAB-M7)

 

These results are slightly different from those in the paper. This is probably because different techniques were used in the paper, such as SAM and ANN instead of eBayes and SVM, and because, due to time constraints, we used a truncated version of the original datasets.
