Analysis of DNA methylation patterns relies increasingly on sequencing-based profiling methods. The four most frequently used sequencing-based technologies are the bisulfite-based methods MethylC-seq and reduced representation bisulfite sequencing (RRBS), and the enrichment-based techniques methylated DNA immunoprecipitation sequencing (MeDIP-seq) and methylated DNA binding domain sequencing (MBD-seq). We applied all four methods to biological replicates of human embryonic stem cells to assess their genome-wide CpG coverage, resolution, cost, concordance and the influence of CpG density and genomic context. The methylation levels assessed by the two bisulfite methods were concordant (their difference did not exceed a given threshold) for 82% of CpGs and 99% of the non-CpG cytosines. Using binary methylation calls, the two enrichment methods were 99% concordant and regions assessed by all four methods were 97% concordant. We combined MeDIP-seq with methylation-sensitive restriction enzyme (MRE-seq) sequencing for comprehensive methylome coverage at lower cost. This, along with RNA-seq and ChIP-seq of the ES cells, enabled us to detect regions with allele-specific epigenetic states, identifying most known imprinted regions and new loci with monoallelic epigenetic marks and monoallelic expression.
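The concordance criterion described above (two methylation levels agree when their difference does not exceed a given threshold) can be illustrated with a minimal Python sketch. The threshold of 0.25, the function name, and the toy site values are assumptions made for illustration only; the abstract does not state the threshold or the data format used.

```python
def concordance(levels_a, levels_b, max_diff=0.25):
    """Fraction of sites covered by both methods whose methylation
    levels differ by at most max_diff (hypothetical threshold)."""
    shared = levels_a.keys() & levels_b.keys()
    if not shared:
        raise ValueError("no sites covered by both methods")
    agree = sum(
        1 for site in shared
        if abs(levels_a[site] - levels_b[site]) <= max_diff
    )
    return agree / len(shared)


# Toy per-site methylation levels (0.0 to 1.0), keyed by (chromosome, position).
methylc = {("chr1", 100): 0.90, ("chr1", 150): 0.10,
           ("chr1", 200): 0.55, ("chr2", 50): 0.80}
rrbs = {("chr1", 100): 0.85, ("chr1", 150): 0.50,
        ("chr1", 200): 0.60, ("chr3", 10): 0.20}

# Only the three chr1 sites are covered by both methods; two of them agree.
toy_concordance = concordance(methylc, rrbs)
```

Restricting the comparison to sites covered by both methods matters here because RRBS samples only a subset of the CpGs that MethylC-seq can reach.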
Components of the ERK cascade are recruited to genes, but it remains unknown how they are regulated at these sites. The RNA-binding protein heterogeneous nuclear ribonucleoprotein (hnRNP) K interacts with kinases and is found along genes including the mitogen-inducible early response gene EGR-1. Here, we used chromatin immunoprecipitations to study co-recruitment of hnRNP K and ERK cascade activity along the EGR-1 gene. These measurements revealed that the spatiotemporal binding patterns of ERK cascade transducers (GRB2, SOS, B-Raf, MEK, and ERK) at the EGR-1 locus resemble those of both hnRNP K and RNA polymerase II (Pol II). Inhibition of EGR-1 transcription with either serum-responsive factor knockdown or 5,6-dichloro-1-β-D-ribofuranosylbenzimidazole altered recruitment of all of the above ERK cascade components along this locus in a manner that mirrored the changes in Pol II and hnRNP K profiles. siRNA knockdown of hnRNP K decreased the levels of active MEK and ERK at the EGR-1 locus, changes associated with decreased levels of elongating pre-mRNA and less efficient splicing. The hnRNP K dependence and pattern of ERK cascade activation at the c-MYC locus differed from those at EGR-1. Ribonucleoprotein immunoprecipitations revealed that hnRNP K was associated with the EGR-1 but not c-MYC mRNAs. These data suggest a model where Pol II transcription-driven recruitment of hnRNP K along the EGR-1 locus compartmentalizes activation of the ERK cascade at these genes, events that regulate synthesis of mature mRNA.
In Saccharomyces cerevisiae, all ends of telomeric DNA contain telomeric repeats of (TG(1-3)), but the number and position of subtelomeric X and Y' repeat elements vary. Using chromatin immunoprecipitation and genome-wide analyses, we here demonstrate that the subtelomeric X and Y' elements have distinct structural and functional properties. Y' elements are transcriptionally active and highly enriched in nucleosomes, whereas X elements are repressed and devoid of nucleosomes. In contrast to X elements, the Y' elements also lack the classical hallmarks of heterochromatin, such as high Sir3 and Rap1 occupancy as well as low levels of histone H4 lysine 16 acetylation. Our analyses suggest that the presence of X and Y' elements governs chromatin structure and transcription activity at individual chromosome ends.
Nicotiana tabacum leaves are covered by trichomes involved in the secretion of large amounts of secondary metabolites, some of which play a major role in plant defense. However, little is known about the metabolic pathways that operate in these structures. We undertook a proteomic analysis of N. tabacum trichomes in order to identify their protein complement. Efficient trichome isolation was obtained by abrading frozen leaves. After homogenization, soluble proteins and a microsomal fraction were prepared by centrifugation. Gel-based and gel-free proteomic analyses were then performed. 2-DE analysis of soluble proteins led to the identification of 1373 protein spots, which were digested and analyzed by MS/MS, leading to 680 unique identifications. Both the soluble proteins and the microsomal fraction were analyzed by LC MALDI-MS/MS after trypsin digestion, leading to 858 identifications, many of which had not been identified after 2-DE, indicating that the two methods complement each other. Many enzymes putatively involved in secondary metabolism were identified, including enzymes involved in the synthesis of terpenoid precursors and in acyl sugar production. Several transporters were also identified, some of which might be involved in secondary metabolite transport. Various (a)biotic stress response proteins were also detected, supporting the role of trichomes in plant defense.
Complementary techniques that deepen information content and minimize reagent costs are required to realize the full potential of massively parallel sequencing. Here, we describe a resequencing approach that directs focus to genomic regions of high interest by combining hybridization-based purification of multi-megabase regions with sequencing on the Illumina Genome Analyzer (GA). The capture matrix is created by a microarray on which probes can be programmed as desired to target any non-repeat portion of the genome, while the method requires only a basic familiarity with microarray hybridization. We present a detailed protocol suitable for 1-2 μg of input genomic DNA and highlight key design tips with which high specificity (>65% of reads stem from enriched exons) and high sensitivity (98% targeted base pair coverage) can be achieved. We have successfully applied this to the enrichment of coding regions, in both human and mouse, ranging from 0.5 to 4 Mb in length. From genomic DNA library production to base-called sequences, this procedure takes approximately 9-10 days inclusive of array captures and one Illumina flow cell run.
BACKGROUND: In order to identify new virulence determinants in Y. pseudotuberculosis a comparison between its genome and that of Yersinia pestis was undertaken. This reveals dozens of pseudogenes in Y. pestis, which are still putatively functional in Y. pseudotuberculosis and may be important in the enteric lifestyle. One such gene, YPTB1572 in the Y. pseudotuberculosis IP32953 genome sequence, encodes a protein with similarity to invasin, a classic adhesion/invasion protein, and to intimin, the attaching and effacing protein from enteropathogenic (EPEC) and enterohaemorrhagic (EHEC) Escherichia coli. RESULTS: We termed YPTB1572 Ifp (Intimin family protein) and show that it is able to bind directly to human HEp-2 epithelial cells. Cysteine and tryptophan residues in the C-terminal region of intimin that are essential for function in EPEC and EHEC are conserved in Ifp. Protein binding occurred at distinct foci on the HEp-2 cell surface and can be disrupted by mutation of a single cysteine residue at the C-terminus of the protein. Temporal expression analysis using lux reporter constructs revealed that ifp is expressed at late log phase at 37°C in contrast to invasin, suggesting that Ifp is a late stage adhesin. An ifp defined mutant showed a reduction in adhesion to HEp-2 cells and was attenuated in the Galleria mellonella infection model. CONCLUSION: A new Y. pseudotuberculosis adhesin has been identified and characterised. This Ifp is a new member in the family of invasin/intimin outer membrane adhesins.
Despite the power of massively parallel sequencing platforms, a drawback is the short length of the sequence reads produced. We demonstrate that short reads can be locally assembled into longer contigs using paired-end sequencing of restriction-site associated DNA (RAD-PE) fragments. We use this RAD-PE contig approach to identify single nucleotide polymorphisms (SNPs) and determine haplotype structure in threespine stickleback and to sequence E. coli and stickleback genomic DNA with overlapping contigs of several hundred nucleotides. We also demonstrate that adding a circularization step allows the local assembly of contigs up to 5 kilobases (kb) in length. The ease of assembly and accuracy of the individual contigs produced from each RAD site sequence suggest that RAD-PE sequencing is a useful way to convert genome-wide short reads into individually-assembled sequences hundreds or thousands of nucleotides long.
Over the course of more than a century of laboratory experimentation, Bacillus subtilis has become "domesticated," losing its ability to carry out many behaviors characteristic of its wild ancestors. One such characteristic is the ability to form architecturally complex communities, referred to as biofilms. Previous work has shown that the laboratory strain 168 forms markedly attenuated biofilms compared with the wild strain NCIB3610 (3610), even after repair of a mutation in sfp (a gene involved in surfactin production) previously known to impair biofilm formation. Here, we show that in addition to the sfp mutation, mutations in epsC, swrA, and degQ are necessary and sufficient to explain the inability of the laboratory strain to produce robust biofilms. Finally, we show that the architecture of the biofilm is markedly influenced by a large plasmid present in 3610 but not 168 and that the effect of the plasmid can be attributed to a gene we designate rapP. When rapP is introduced into 168 together with wild-type alleles of sfp, epsC, swrA, and degQ, the resulting repaired laboratory strain forms biofilms that are as robust as and essentially indistinguishable in architecture from those of the wild strain, 3610. Thus, domestication of B. subtilis involved the accumulation of four mutations and the loss of a plasmid-borne gene.
Turning genetic discoveries identified in genome-wide association (GWA) studies into biological mechanisms is an important challenge in human genetics. Many GWA signals map outside exons, suggesting that the associated variants may lie within regulatory regions. We applied the formaldehyde-assisted isolation of regulatory elements (FAIRE) method in a megakaryocytic and an erythroblastoid cell line to map active regulatory elements at known loci associated with hematological quantitative traits, coronary artery disease, and myocardial infarction. We showed that the two cell types exhibit distinct patterns of open chromatin and that cell-specific open chromatin can guide the finding of functional variants. We identified an open chromatin region at chromosome 7q22.3 in megakaryocytes but not erythroblasts, which harbors the common non-coding sequence variant rs342293 known to be associated with platelet volume and function. Resequencing of this open chromatin region in 643 individuals provided strong evidence that rs342293 is the only putative causative variant in this region. We demonstrated that the C- and G-alleles differentially bind the transcription factor EVI1 affecting PIK3CG gene expression in platelets and macrophages. A protein-protein interaction network including up- and down-regulated genes in Pik3cg knockout mice indicated that PIK3CG is associated with gene pathways with an established role in platelet membrane biogenesis and thrombus formation. Thus, rs342293 is the functional common variant at this locus; to the best of our knowledge this is the first such variant to be elucidated among the known platelet quantitative trait loci (QTLs). Our data suggested a molecular mechanism by which a non-coding GWA index SNP modulates platelet phenotype.
Targeted sequencing is a cost-efficient way to obtain answers to biological questions in many projects, but the choice of the enrichment method to use can be difficult. In this study we compared two hybridization methods for target enrichment for massively parallel sequencing and single nucleotide polymorphism (SNP) discovery, namely Nimblegen sequence capture arrays and the SureSelect liquid-based hybrid capture system. We prepared sequencing libraries from three HapMap samples using both methods, sequenced the libraries on the Illumina Genome Analyzer, mapped the sequencing reads back to the genome, and called variants in the sequences. 74-75% of the sequence reads originated from the targeted region in the SureSelect libraries and 41-67% in the Nimblegen libraries. We could sequence up to 99.9% and 99.5% of the regions targeted by capture probes from the SureSelect libraries and from the Nimblegen libraries, respectively. The Nimblegen probes covered 0.6 Mb more of the original 3.1 Mb target region than the SureSelect probes. In each sample, we called more SNPs and detected more novel SNPs from the libraries that were prepared using the Nimblegen method. Thus the Nimblegen method gave better results when judged by the number of SNPs called, but this came at the cost of more over-sampling.
The regulation of neutrophil lifespan by induction of apoptosis is critical for maintaining an effective host response and preventing excessive inflammation. The hypoxia-inducible factor (HIF) oxygen-sensing pathway has a major effect on the susceptibility of neutrophils to apoptosis, with a marked delay in cell death observed under hypoxic conditions. HIF expression and transcriptional activity are regulated by the oxygen-sensitive prolyl hydroxylases (PHD1-3), but the role of PHDs in neutrophil survival is unclear. We examined PHD expression in human neutrophils and found that PHD3 was strongly induced in response to hypoxia and inflammatory stimuli in vitro and in vivo. Using neutrophils from mice deficient in Phd3, we demonstrated a unique role for Phd3 in prolonging neutrophil survival during hypoxia, distinct from other hypoxia-associated changes in neutrophil function and metabolic activity. Moreover, this selective defect in neutrophil survival occurred in the presence of preserved HIF transcriptional activity but was associated with upregulation of the proapoptotic mediator Siva1 and loss of its binding target Bcl-xL. In vivo, using an acute lung injury model, we observed increased levels of neutrophil apoptosis and clearance in Phd3-deficient mice compared with WT controls. We also observed reduced neutrophilic inflammation in an acute mouse model of colitis. These data support what we believe to be a novel function for PHD3 in regulating neutrophil survival in hypoxia and may enable the development of new therapeutics for inflammatory disease.
BACKGROUND: Different high-throughput nucleic acid sequencing platforms are currently available but a trade-off currently exists between the cost and number of reads that can be generated versus the read length that can be achieved. METHODOLOGY/PRINCIPAL FINDINGS: We describe an experimental and computational pipeline yielding millions of reads that can exceed 200 bp with quality scores approaching that of traditional Sanger sequencing. The method combines an automatable gel-less library construction step with paired-end sequencing on a short-read instrument. With appropriately sized library inserts, mate-pair sequences can overlap, and we describe the SHERA software package that joins them to form a longer composite read. CONCLUSIONS/SIGNIFICANCE: This strategy is broadly applicable to sequencing applications that benefit from low-cost high-throughput sequencing, but require longer read lengths. We demonstrate that our approach enables metagenomic analyses using the Illumina Genome Analyzer, with low error rates, and at a fraction of the cost of pyrosequencing.
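The idea of joining overlapping mate-pair sequences into a longer composite read can be sketched in Python as follows. This is not the SHERA algorithm, which is not specified in the abstract; it is a simplified illustration that ignores quality scores, and the function names, the minimum overlap of 10 bases, and the 10% mismatch tolerance are all assumptions.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def merge_pair(read1, read2, min_overlap=10, max_mismatch_frac=0.1):
    """Join a forward read and its mate into one composite read.

    The mate is reverse-complemented, then the longest suffix of read1
    that matches a prefix of the mate (within the mismatch tolerance)
    is taken as the overlap. Returns None if no overlap is found.
    Where the reads disagree inside the overlap, read1's base is kept.
    """
    mate = reverse_complement(read2)
    for overlap in range(min(len(read1), len(mate)), min_overlap - 1, -1):
        tail, head = read1[-overlap:], mate[:overlap]
        mismatches = sum(a != b for a, b in zip(tail, head))
        if mismatches <= max_mismatch_frac * overlap:
            return read1 + mate[overlap:]
    return None


# Toy 40-nt library insert sequenced from both ends with 28-nt reads,
# so the two reads overlap by 16 nt in the middle.
fragment = "AAAAAAAAAAAA" + "CGTACGGATCCTGACT" + "TTTTTTTTTTTT"
read1 = fragment[:28]
read2 = reverse_complement(fragment[12:])
composite = merge_pair(read1, read2)
```

A production tool would typically use the per-base quality scores to choose between disagreeing bases and to score candidate overlaps, which this sketch does not attempt.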
The western black cottonwood (Populus trichocarpa) was the first tree to have its genome fully sequenced and has emerged as the model species for the study of secondary growth and wood formation. It is also a good candidate species for the production of lignocellulosic biofuels. Here, we present and make available to the research community the results of the sequencing of the transcriptome of developing xylem in 20 accessions with high-throughput next generation sequencing technology. We found over 0.5 million putative single nucleotide polymorphisms (SNPs) in 26,595 genes that are expressed in developing secondary xylem. More than two-thirds of all SNPs were found in annotated exons, with 18% and 14% in regions of the genome annotated as introns and intergenic, respectively, where only 3% and 4% of sequence reads mapped. This suggests that the current annotation of the poplar genome is remarkably incomplete and that there are many transcripts and novel genes waiting to be annotated. We hope that this resource will stimulate further research in expression profiling, detection of alternative splicing and adaptive evolution in poplar.
To exploit contemporary sequencing technologies for targeted genetic analyses, we developed a hybridization enrichment strategy for DNA capture that uses PCR products as subgenomic traps. We applied this strategy to 115 kilobases of the human genome encompassing 47 genes implicated in cardiovascular disease. Massively parallel sequencing of captured subgenomic libraries interrogated 99.8% of targeted nucleotides ≥20 times (approximately 40,000-fold enrichment), enabling sensitive and specific detection of sequence variation and copy-number variation.
Terminal osseous dysplasia (TOD) is an X-linked dominant male-lethal disease characterized by skeletal dysplasia of the limbs, pigmentary defects of the skin, and recurrent digital fibroma with onset in female infancy. After performing X-exome capture and sequencing, we identified a mutation at the last nucleotide of exon 31 of the FLNA gene as the most likely cause of the disease. The variant c.5217G>A was found in six unrelated cases (three families and three sporadic cases) and was not found in 400 control X chromosomes, pilot data from the 1000 Genomes Project, or the FLNA gene variant database. In the families, the variant segregated with the disease, and it was transmitted four times from a mildly affected mother to a more seriously affected daughter. We show that, because of nonrandom X chromosome inactivation, the mutant allele was not expressed in patient fibroblasts. RNA expression of the mutant allele was detected only in cultured fibroma cells obtained from 15-year-old surgically removed material. The variant activates a cryptic splice site, removing the last 48 nucleotides from exon 31. At the protein level, this results in a loss of 16 amino acids (p.Val1724_Thr1739del), predicted to remove a sequence at the surface of filamin repeat 15. Our data show that TOD is caused by this single recurrent mutation in the FLNA gene.
Screening large numbers of target regions in multiple DNA samples for sequence variation is an important application of next-generation sequencing but an efficient method to enrich the samples in parallel has yet to be reported. We describe an advanced method that combines DNA samples using indexes or barcodes prior to target enrichment to facilitate this type of experiment. Sequencing libraries for multiple individual DNA samples, each incorporating a unique 6-bp index, are combined in equal quantities, enriched using a single in-solution target enrichment assay and sequenced in a single reaction. Sequence reads are parsed based on the index, allowing sequence analysis of individual samples. We show that the use of indexed samples does not impact on the efficiency of the enrichment reaction. For three- and nine-indexed HapMap DNA samples, the method was found to be highly accurate for SNP identification. Even with sequence coverage as low as 8x, 99% of sequence SNP calls were concordant with known genotypes. Within a single experiment, this method can sequence the exonic regions of hundreds of genes in tens of samples for sequence and structural variation using as little as 1 μg of input DNA per sample.
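Parsing pooled reads by their index, as described above, can be sketched in a few lines of Python. The sketch assumes the 6-bp index is the first six bases of each read and requires an exact match; the abstract does not say where the index is read from or whether mismatches are tolerated, and the index sequences and sample names below are hypothetical.

```python
from collections import defaultdict


def demultiplex(reads, index_to_sample, index_len=6):
    """Group reads by sample using a leading index sequence.

    The index is stripped from each read. Reads whose index is not in
    index_to_sample are collected under the key "unassigned".
    """
    bins = defaultdict(list)
    for read in reads:
        index, insert = read[:index_len], read[index_len:]
        sample = index_to_sample.get(index, "unassigned")
        bins[sample].append(insert)
    return dict(bins)


# Hypothetical indexes and pooled reads.
index_to_sample = {"ACGTAC": "sample_1", "TGCATG": "sample_2"}
pooled_reads = ["ACGTACGGATTC", "TGCATGCCTTAA", "ACGTACTTTGGG", "NNNNNNACGTAC"]
by_sample = demultiplex(pooled_reads, index_to_sample)
```

Keeping unmatched reads in a separate bin, rather than discarding them, makes it easy to check how much of the run was lost to index sequencing errors.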
Chromatin is regulated by cross talk among different histone modifications, which can occur between residues within the same tail or different tails in the nucleosome. The latter is referred to as trans-tail regulation, and the best-characterized example of this is the dependence of H3 methylation on H2B ubiquitylation. Here we describe a novel form of trans-tail regulation of histone modifications involving the N-terminal tail of histone H2A. Mutating or deleting residues in the N-terminal tail of H2A reduces H2B ubiquitylation and H3K4 methylation but does not affect the recruitment of the modifying enzymes, Rad6/Bre1 and COMPASS, to genes. The H2A tail is required for the incorporation of Cps35 into COMPASS, and increasing the level of ubiquitylated H2B in H2A tail mutants suppresses the H3K4 methylation defect, suggesting that the H2A tail regulates H2B-H3 cross talk. We mapped the region primarily responsible for this regulation to the H2A repression domain, HAR. The HAR and K123 of H2B are in close proximity to each other on the nucleosome, suggesting that they form a docking site for the ubiquitylation machinery. Interestingly, the HAR is partially occluded by nucleosomal DNA, suggesting that the function of the H2A cross talk pathway is to restrict histone modifications to nucleosomes altered by transcription.
The messenger RNA of the intronless CEBPA gene is translated into distinct protein isoforms through the usage of consecutive translation initiation sites. These translational isoforms have distinct functions in the regulation of differentiation and proliferation due to the presence of different N-terminal sequences. Here, we describe the function of an N-terminally extended protein isoform of CCAAT enhancer-binding protein α (C/EBPα) that is translated from an alternative non-AUG initiation codon. We show that a basic amino-acid motif within its N-terminus is required for nucleolar retention and for interaction with nucleophosmin (NPM). In the nucleoli, extended-C/EBPα occupies the ribosomal DNA (rDNA) promoter and associates with the Pol I-specific factors upstream-binding factor 1 (UBF-1) and SL1 to stimulate rRNA synthesis. Furthermore, during differentiation of HL-60 cells, endogenous expression of extended-C/EBPα is lost concomitantly with nucleolar C/EBPα immunostaining probably reflecting the reduced requirement for ribosome biogenesis in differentiated cells. Finally, overexpression of extended-C/EBPα induces an increase in cell size. Altogether, our results suggest that control of rRNA synthesis is a novel function of C/EBPα adding to its role as key regulator of cell growth and proliferation.
The transcription factor CCAAT/enhancer-binding protein alpha (C/EBPalpha) coordinates proliferation arrest and the differentiation of myeloid progenitors, adipocytes, hepatocytes, keratinocytes, and cells of the lung and placenta. C/EBPalpha transactivates lineage-specific differentiation genes and inhibits proliferation by repressing E2F-regulated genes. The myeloproliferative C/EBPalpha BRM2 mutant serves as a paradigm for recurrent human C-terminal bZIP C/EBPalpha mutations that are involved in acute myeloid leukemogenesis. BRM2 fails to repress E2F and to induce adipogenesis and granulopoiesis. The data presented here show that, independently of pocket proteins, C/EBPalpha interacts with the dimerization partner (DP) of E2F and that C/EBPalpha-E2F/DP interaction prevents both binding of C/EBPalpha to its cognate sites on DNA and transactivation of C/EBP target genes. The BRM2 mutant, in addition, exhibits enhanced interaction with E2F-DP and reduced affinity toward DNA and yet retains transactivation potential and differentiation competence that becomes exposed when E2F/DP levels are low. Our data suggest a tripartite balance between C/EBPalpha, E2F/DP, and pocket proteins in the control of proliferation, differentiation, and tumorigenesis.
Multiple myeloma (MM) is a genetically heterogeneous disease, which to date remains fatal. Finding a common mechanism for initiation and progression of MM continues to be challenging. By means of integrative genomics, we identified an underexpressed gene signature in MM patient cells compared to normal counterpart plasma cells. This profile was enriched for previously defined H3K27-tri-methylated genes, targets of the Polycomb group (PcG) proteins in human embryonic fibroblasts. Additionally, the silenced gene signature was more pronounced in ISS stage III MM compared to stage I and II. Using chromatin immunoprecipitation (ChIP) assay on purified CD138+ cells from four MM patients and on two MM cell lines, we found enrichment of H3K27me3 at genes selected from the profile. As the data implied that the Polycomb-targeted gene profile would be highly relevant for pharmacological treatment of MM, we used two compounds to chemically revert the H3K27-tri-methylation mediated gene silencing. The S-adenosylhomocysteine hydrolase inhibitor 3-Deazaneplanocin (DZNep) and the histone deacetylase inhibitor LBH589 (Panobinostat) reactivated the expression of genes repressed by H3K27me3, depleted cells of the PRC2 component EZH2 and induced apoptosis in human MM cell lines. In the immunocompetent 5T33MM in vivo model for MM, treatment with LBH589 resulted in gene upregulation, reduced tumor load and increased overall survival. Taken together, our results reveal a common gene signature in MM, mediated by gene silencing via the Polycomb repressor complex. The importance of the underexpressed gene profile in MM tumor initiation and progression should be subjected to further studies.