C2G2 Fall 2013

How Fli1 and PDEF affect specific networks/pathways

 * Students: Connor
 * Fli1 and PDEF data with some correction (or test different methods to account for low-read data, e.g., false discovery rate, confidence intervals, or another approach). Bob removed low reads using an FDR of 0.05 in Partek.
 * Venn diagram finalization
 * Repeat Pathway analyses
 * Review real time data and tabulate vs read number and fold changes
 * Extend our validated gene set and examine galaxy pipeline vs Partek

Fli1 knockdown data (parental MCF10A, shcontrol, Fli1 sh1, Fli1 sh2); want to see how genes identified by loss of function compare with gain-of-function genes (MDA-MB-231 with Fli1 expression)
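
The loss-of-function versus gain-of-function comparison, and the Venn diagram task, both reduce to set overlaps. A minimal sketch follows; the function name `venn_regions` and the gene identifiers are hypothetical placeholders, not results from the Fli1 data.

```python
def venn_regions(loss_of_function, gain_of_function):
    """Return the three regions of a two-set Venn diagram as sorted lists."""
    lof = set(loss_of_function)
    gof = set(gain_of_function)
    return {
        "lof_only": sorted(lof - gof),   # changed only in the knockdown
        "shared": sorted(lof & gof),     # changed in both experiments
        "gof_only": sorted(gof - lof),   # changed only with Fli1 expression
    }


# Placeholder identifiers for illustration only.
knockdown_hits = ["geneA", "geneB", "geneC"]
overexpression_hits = ["geneB", "geneC", "geneD"]

regions = venn_regions(knockdown_hits, overexpression_hits)
for name, genes in regions.items():
    print(name, len(genes), genes)
```

The region sizes are the numbers a Venn diagram maker would display, so this can serve as a cross-check on the finalized figure.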
== Is there a difference between DF and R lung cancer (at the total RNA expression level as well as splice variants); difference between non-tumor and tumor samples. Will define lung cancer samples in terms of ras or EGFR mutation or ALK4 fusions (Lung Data) ==

RNA-seq exon count data summarized in SAM format files from TopHat will be explored using the Bioconductor packages ‘DESeq’ and ‘DEXSeq’ (34, 35). Tests of differential alternative splicing and differential gene expression between samples from the aggressive phenotype and the disease-free phenotype will be modeled using a generalized linear model assuming a negative binomial distribution of the exon counts. The Benjamini-Hochberg approach will be used to control the false discovery rate at 20% (38) and identify splice variants for follow-up in Aims 1B and 1C. An alternative method for quantifying expression of different splice variants is Cufflinks (39, 40).

34. Anders S, Huber W. Differential expression analysis for sequence count data. Genome Biol. 2010;11(10):R106. PMCID: 3218662.
35. Anders S, Reyes A, Huber W. Detecting differential usage of exons from RNA-seq data. Genome Res. 2012;22(10):2008-17. PMCID: 3460195.

 * Students: Connor, Tori, and Matt
 * Normal vs Tumor RNA (disease free vs relapse) expression
 * Connor and/or Matt might be interested in running our standard differential gene expression analysis with cufflinks/tuxedo
 * Heatmap /distribution of RNA differences between DF v R (low read)
 * Gene signature study: use subset for discovery and then some for testing signature; other approaches (PAUL?)
 * Pathway guide
 * Connor
 * Effect of read depth on data
 * High read data for alternative splicing differences (Bob has been working on this using Partek)
 * Compare alternative splicing programs.
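
The Benjamini-Hochberg step named in the Lung Data analysis plan can be illustrated outside of Bioconductor. This is a minimal sketch of the standard step-up procedure, not the DESeq/DEXSeq implementation; the function name `benjamini_hochberg` and the example p-values are made up for illustration.

```python
def benjamini_hochberg(p_values, fdr=0.20):
    """Return a list of booleans marking which hypotheses are rejected."""
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * fdr.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            max_rank = rank
    # Reject every hypothesis ranked at or below that k.
    rejected = [False] * m
    for idx in order[:max_rank]:
        rejected[idx] = True
    return rejected


# Made-up p-values, one per exon or gene.
example_p = [0.001, 0.04, 0.03, 0.5, 0.9]
print(benjamini_hochberg(example_p, fdr=0.20))
print(benjamini_hochberg(example_p, fdr=0.05))
```

Running the same p-values at 20% and at 5% shows how much the candidate list shrinks under the stricter threshold used for the Partek low-read filter.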

ShedLab

 * Transfer duplicates of genomic data from HMS Orchestra account to secure C2G2 account (Tori? -- rel small but critical file mgmt tasks).
 * Create figs for caretta brain DEG paper w/pathway apps, Venn diagram maker, R Studio; data already analyzed (Connor 1-cr Ind Study, co-auth) share results for discussion and cross ref
 * Begin validation qPCRs for brain DEGs (open slot, wet work at HML or downtown)
 * Begin hosting Markov model Dfam de novo repeat annotation on C2G2 cluster for schistosome snail genome project (Kelsey, 6-cr senior thesis, co-auth)
 * Build Galaxy wrapper for Dfam pipe (Kelsey+Matt)

Bioinformatics: Can we develop a predictive model for DF and R using deep-learning?

 * I think this would make a very solid addition to Tori's bachelor thesis. It might also be of interest to Matt.

Bioinformatics: Can post-processing improve the accuracy of RNA-seq quantification (i.e., improve on Cufflinks output)?

 * I saw an interesting paper at a conference about a program called GeneScissors. The program works on the Cufflinks output to improve its accuracy. We could look into this paper more closely. Similar approaches could possibly be taken with splice variants, etc.

Bioinformatics: What is the effect of read depth on differential gene expression, assembly, and splice variants?

 * This could be incorporated into the parameters paper.
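
One way to start on the read-depth question without resequencing is to thin an existing count table and rerun the downstream analysis at each depth. The sketch below assumes per-gene read counts are available as a dictionary; `thin_counts` and the example counts are hypothetical. It keeps each read independently with a fixed probability (binomial thinning), which approximates subsampling the raw reads for expression counts but does not capture effects on assembly.

```python
import random


def thin_counts(counts, fraction, seed=0):
    """Keep each read with probability `fraction`, independently per read."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    thinned = {}
    for gene, n in counts.items():
        thinned[gene] = sum(1 for _ in range(n) if rng.random() < fraction)
    return thinned


# Made-up counts for illustration.
full_depth = {"geneA": 1000, "geneB": 50, "geneC": 3}
for fraction in (1.0, 0.5, 0.1):
    print(fraction, thin_counts(full_depth, fraction))
```

Low-count genes drop toward zero first, which is the behavior the low-read comparisons are meant to characterize.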

Bioinformatics: What are the critical parameters in next-generation pipelines?

 * This is a general question that we've been pursuing. I think we could work towards an investigation paper that everyone could contribute to. We'll need someone to take on the point role.

Bioinformatics: How can de novo assemblies be compared and possibly merged?

 * This also leads to how to improve the run-time and accuracy of de novo assembly. Matt has done a lot with assembly now, and he may be interested in pursuing this a little further. Or at the very least, we can find what others have done to compare and merge assemblies.
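
A first-pass comparison between assemblies usually starts with summary statistics such as N50. The sketch below computes N50 from a list of contig lengths; the function name `n50` and the example lengths are illustrative, and N50 alone says nothing about correctness, so it would be one metric among several.

```python
def n50(contig_lengths):
    """Return the length L such that contigs of length >= L hold at least half of all bases."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("no contigs given")


# Made-up contig lengths for two hypothetical assemblies.
assembly_a = [2, 3, 4, 5, 6, 7, 8, 9, 10]
assembly_b = [1, 1, 1, 1, 50]
print("assembly A N50:", n50(assembly_a))
print("assembly B N50:", n50(assembly_b))
```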

Bioinformatics: Can we create novel ways to identify gene sets and perform pathway analysis?

 * This focuses on improving either the visualization or accuracy of pathway algorithms. There is an interesting article about finding correlations among variables that we could use to incorporate some type of visualization. I also liked Starr's idea of something more like a three-dimensional histogram. We can also incorporate Cytoscape into this. Tori has experience with it.

Bifrost

 * Students: Jeremy
 * Verify that Dropbox is working
 * Test out client program
 * Integrate with R
 * Fix wrapper so variable number of files are downloaded

Cyberinfrastructure

 * Set up Blast2Go and wwwblast server
 * Fix trans-abyss
 * Fix cuffdiff outputs
 * Kelly's groomer problem

Collaborator Projects - We have several ongoing projects that are being driven by others

 * Whale (Demetri)
 * Striped Bass (Bob Chapman and Paul)
 * Dolphin (Fran)
 * Project is just beginning. Right now the data is stored on Amazon storage.
 * High density, strand-specific RNA-seq analysis of the Pacific Whiteleg Shrimp, Litopenaeus vannamei (Jill Johnson)
 * Hypoxia in coastal waters has increased dramatically worldwide due to growth in human populations in these areas, causing adverse impacts to many estuarine organisms. In coastal areas of the southeastern U.S., elevated CO2, called hypercapnia, co-occurs with hypoxia. Hypercapnia causes significant acidification and presents its own challenges for aquatic organisms. Acclimation to hypoxia in the Pacific whiteleg shrimp involves regulation at the level of the transcriptome. Previous microarray results suggest that the hypoxia-specific transcriptomic signature is reduced or reversed with the addition of elevated carbon dioxide to the system. Specifically, significant changes were identified in the transcription of genes encoding the respiratory pigment hemocyanin, antioxidants, and the machinery of protein synthesis, a transcript profile that is consistent with acclimation to hypoxia. In the present study we use high-throughput RNA sequencing (RNA-Seq) to explore the regulation of transcriptionally based response, acclimation, and resiliency to low oxygen/high CO2 conditions in Litopenaeus vannamei, with particular focus on the two known subunits of the copper-containing respiratory pigment, hemocyanin (Hc). mRNA of juvenile L. vannamei exposed to normoxia (n = 18), hypoxia (n = 18), or hypercapnic hypoxia (n = 15) was pooled and sequenced in a strand-specific manner on the Illumina HiSeq 2500 platform. A total of 4.5 x 10^8 single-end 100 bp high-quality (>Q30) raw reads were generated, and 27,976 contigs with a mean length of 1054 bp (262 bp minimum; 40,543 bp maximum) were assembled using the de novo assembler Trinity (N50: 4264 bp). Raw reads were mapped back onto the Trinity assembly using the short-read aligner Bowtie (81.2% RMBT).
Although verification of the number of transcripts encoded in the genome is not possible in the absence of an annotated genome, the average absolute depth of read coverage across all transcripts was 1642X (Cufflinks: 17X minimum; 542,751X maximum). Due to the extensive depth of coverage, new isoforms of the large Hc subunit have been identified. Work is underway to assess Hc subunit usage in relation to low oxygen and high CO2 conditions. (NSF IOS-1147008)
 * Current status is to compare assemblies and to try and run trans-abyss
 * Dinoflagellate, Karenia brevis, analyzed by Kelly Fridey (adviser is Fran)
 * Karenia brevis is a dinoflagellate responsible for harmful algal blooms in the Gulf of Mexico that cause extensive marine animal mortalities and human illnesses. K. brevis blooms are particularly damaging when they persist at high density in coastal waters over long periods of time. To gain insight into how K. brevis cells cope with changing coastal conditions, the current project was undertaken to define the mechanisms regulating dinoflagellate stress responses. Previous work in our laboratory revealed a lack of transcriptional activation of stress genes under conditions that induced stress proteins. This is consistent with an emerging view that dinoflagellate gene expression is regulated predominantly at the post-transcriptional level, in part by differential rates of translation. Translational activity can be measured using polysome profiles. Polysomes are messenger RNAs (mRNAs) with multiple ribosomes attached and represent the mRNA pool being actively translated, whereas ribosome-free mRNAs are translationally inactive. In this study, triplicate cultures of K. brevis were exposed to a 5°C heat shock for a short time course (0, 30, or 60 min) to determine their translational response to heat stress. Sucrose density gradient fractionation was used to separate polysomes from ribosome-free RNA, detected at an absorbance of 254 nm. The abundance of polysomes decreased rapidly in response to heat shock, indicating a suppression of translation, with the lowest polysome abundance found at 60 minutes of exposure. RNA was isolated from the translationally active fractions at each time point, and RNA-seq analysis was performed on an Illumina HiSeq 2000 sequencer at a depth of 15 million reads per sample. A reference K. brevis transcriptome was assembled from 113 million Illumina reads (50 bp, paired-end) using Trinity. This assembly contains 127K unique contigs.
Quantitative read mapping to the reference transcriptome, currently in progress, will assess whether stress response gene transcripts are specifically recruited to the actively translated RNA pool following heat shock.