Web supplementary information for "Mapping eQTL networks with mixed graphical Markov models" by I. Tur, A. Roverato and R. Castelo, Genetics, in press

In this web supplementary information we provide R scripts and data to help reproducing the results contained in our paper "Mapping eQTL networks with mixed graphical models", published by the journal Genetics. To run the code from the figures you would have first to run the code at the bottom with the whole analysis which can take a long time. You can spare that step by loading the following resulting objects:

cross.Brem.RDataBrem and Kruglyak (2005) data as R/qtl cross object.
param.RDataParameter object derived from March 2014 annotations.
eqtlnet.q0.fdr.nrr.sel.RDataFinal resulting object from the analysis.
RBPids.txtRNA-binding protein ORF identifiers downloaded from Biomart.


These are the scripts for the figures.

Figure 1:Figure1.R
Figure 2:Figure2.R
Figure 3:Figure3.R
Figure 4:The PDF file qpgraphComparison.pdf contains a vignette describing all the steps to obtain the performance comparison between qpgraph and qtlhot (CMSTs) shown in in this figure. A tar ball with the code and all the intermediate resulting files can be found in this link.
Figure 5:Figure5.R
Figure 6:Figure6.R

In the paper we use the currently under development version of qpgraph which will become release in the coming month of October, 2014. Using this version of the software and the data we estimated the eQTL network with the following function calls. These functions are still undocumented but they will be in the forthcoming release of Bioconductor:

> library(qtl)
> library(qpgraph)
> library(snow)
> library(rlecuyer)
> library(Rmpi)
> library(BiocParallel)

> load("cross.Brem.RData") ## load objects 'cross.Brem' and 'pMap'

> param <- eQTLnetworkEstimationParam(cross.Brem, physicalMap=pMap,
+            geneAnnotation="TxDb.Scerevisiae.UCSC.sacCer3.sgdGene")

> eqtlnet.q0 <- eQTLnetworkEstimate(param, ~ marker + gene,
+                 verbose=TRUE,
+                 BPPARAM=SnowParam(workers=10, type="MPI"))
> eqtlnet.q0
eQTLnetwork object:
Organism: Saccharomyces cerevisiae
Genome: sacCer3
Gene annotation: UCSC Table sgdGene
Input size: 1857 markers 6141 genes
Model formula:  ~marker + gene

> eqtlnet.q0.fdr <- eQTLnetworkEstimate(param,
+                     estimate=eqtlnet.q0,
+                     p.value=0.01, method="fdr")
> eqtlnet.q0.fdr
eQTLnetwork object:
  Organism: Saccharomyces cerevisiae
  Genome: sacCer3
  Gene annotation: UCSC Table sgdGene
  Input size: 1857 markers 6141 genes
  Model formula:  ~marker + gene
  G^(0): 7998 vertices and 2295829 edges corresponding to
         92710 eQTL and 2203119 gene-gene associations meeting
         a fdr-adjusted p-value < 0.01
         and involving 6141 genes and 1857 eQTLs

> eqtlnet.q0.fdr.nrr <- eQTLnetworkEstimate(param,
+                         ~ marker + gene | gene(q=25,50,75,100),
+                         estimate=eqtlnet.q0.fdr, verbose=TRUE,
+                         BPPARAM=SnowParam(workers=10, type="MPI"))

> eqtlnet.q0.fdr.nrr <- eQTLnetworkEstimate(param,
+                         estimate=eqtlnet.q0.fdr.nrr, epsilon=0.1)
> eqtlnet.q0.fdr.nrr
eQTLnetwork object:
  Organism: Saccharomyces cerevisiae
  Genome: sacCer3
  Gene annotation: UCSC Table sgdGene
  Input size: 1857 markers 6141 genes
  Model formula:  ~marker | gene(q = 0, 25, 50, 75, 100) 
  G^(0,25,50,75,100): 7998 vertices and 5297 edges corresponding to
                      3498 eQTL and 1799 gene-gene associations meeting
                      a fdr-adjusted p-value < 0.01,
                      a non-rejection rate epsilon < 0.10
                      and involving 6141 genes and 1857 eQTLs

> ## number of genes involved in at least one eQTL
> eqtls <- alleQTL(eqtlnet.q0.fdr.nrr)
> length(unique(eqtls$gene))
[1] 450
> ## median number of eQTLs per gene
> median(sapply(split(eqtls$QTL, eqtls$gene), length))
[1] 6
> ## fraction of genes with > 10 eQTLs on the same chromosome
> eQTLxChr <- split(eqtls, eqtls$chrom)
> eQTLxGenexChr <- lapply(eQTLxChr, function(x) split(x$QTL, x$gene))
> neQTLxGenexChr <- unlist(lapply(eQTLxGenexChr, sapply, length),
+                          use.names=FALSE)
> mean(neQTLxGenexChr > 10)
[1] 0.2737307

> eqtlnet.q0.fdr.nrr.sel <- eQTLnetworkEstimate(param,
+                             estimate=eqtlnet.q0.fdr.nrr, alpha=0.05)
> eqtlnet.q0.fdr.nrr.sel
eQTLnetwork object:
  Organism: Saccharomyces cerevisiae
  Genome: sacCer3
  Gene annotation: UCSC Table sgdGene
  Input size: 1857 markers 6141 genes
  Model formula:  ~marker | gene(q = 0, 25, 50, 75, 100) 
  G^(0,25,50,75,100,*): 1781 vertices and 2300 edges corresponding to
                        500 eQTL and 1799 gene-gene associations meeting
                        a fdr-adjusted p-value < 0.01,
                        a non-rejection rate epsilon < 0.10,
                        a forward eQTL selection significance level alpha < 0.05
                        and involving 1391 genes and 389 loci

> ## number of genes involved in at least one eQTL
> seleqtls <- alleQTL(eqtlnet.q0.fdr.nrr.sel)
> length(unique(seleqtls$gene))
[1] 450
> ## distribution of the number of eQTLs per gene
> table(sapply(split(seleqtls$QTL, seleqtls$gene), length))

  1   2   3 
402  46   2 
> ## number of genes without any eQTL
>length(geneNames(eqtlnet.q0.fdr.nrr.sel))-length(unique(seleqtls$gene))
[1] 941
  

R session information

The code in the previous R scripts was run with the following configuration of R:
> sessionInfo()
R version 3.1.0 (2014-04-10)
Platform: x86_64-unknown-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF8       LC_NUMERIC=C             
 [3] LC_TIME=en_US.UTF8        LC_COLLATE=en_US.UTF8    
 [5] LC_MONETARY=en_US.UTF8    LC_MESSAGES=en_US.UTF8   
 [7] LC_PAPER=en_US.UTF8       LC_NAME=C                
 [9] LC_ADDRESS=C              LC_TELEPHONE=C           
[11] LC_MEASUREMENT=en_US.UTF8 LC_IDENTIFICATION=C      

attached base packages:
[1] qpgraph_1.21.25 vimcom_0.9-93   setwidth_1.0-3 
[4] colorout_1.0-2 

other attached packages:
[1] qpgraph_1.99.2 vimcom_0.9-93   setwidth_1.0-3 
[4] colorout_1.0-2 

loaded via a namespace (and not attached):
 [1] annotate_1.43.5          AnnotationDbi_1.27.10   
 [3] BatchJobs_1.3            BBmisc_1.7              
 [5] Biobase_2.25.0           BiocGenerics_0.11.5     
 [7] BiocParallel_0.99.19     biomaRt_2.21.1          
 [9] Biostrings_2.33.14       bitops_1.0-6            
[11] brew_1.0-6               checkmate_1.4           
[13] codetools_0.2-9          DBI_0.3.0               
[15] digest_0.6.4             fail_1.2                
[17] foreach_1.4.2            genefilter_1.47.6       
[19] GenomeInfoDb_1.1.19      GenomicAlignments_1.1.29
[21] GenomicFeatures_1.17.14  GenomicRanges_1.17.40   
[23] GGBase_3.27.3            graph_1.43.0            
[25] grid_3.1.0               IRanges_1.99.28         
[27] iterators_1.0.7          lattice_0.20-29         
[29] Matrix_1.1-4             mvtnorm_1.0-0           
[31] parallel_3.1.0           qtl_1.33-7              
[33] RCurl_1.95-4.3           Rgraphviz_2.9.1         
[35] Rsamtools_1.17.33        RSQLite_0.11.4          
[37] rtracklayer_1.25.16      S4Vectors_0.2.4         
[39] sendmailR_1.1-2          snpStats_1.15.4         
[41] splines_3.1.0            stats4_3.1.0            
[43] stringr_0.6.2            survival_2.37-7         
[45] tools_3.1.0              XML_3.98-1.1            
[47] xtable_1.7-4             XVector_0.5.8           
[49] zlibbioc_1.11.1