Forward and reverse reads in paired-end sequencing

This topic is incredibly easy to get confused about, so here is as clear an explanation as I can muster. It will start out big picture and then get into the weeds.

Sequencing by synthesis, which is how most commercially available high-throughput sequencing technologies work as of December 2012 (see notes on sequencing technologies), always synthesizes the new strand (which becomes your read) in a 5′-to-3′ direction. That's because this is how DNA polymerase works in our cells (indeed, in every living thing's cells), and sequencing relies on DNA polymerase. Since the new strand is synthesized 5′-to-3′, you are working your way up the template strand in a 3′-to-5′ direction.

In these technologies, you PCR amplify the individual DNA fragments once they have hybridized to flowcells or beads. This means you end up with both strands of DNA. If you were to read both of the strands from their respective 3′ ends at once, you'd be getting two different sequences and your results would be uninterpretable. To avoid this problem, sequencing technologies ligate non-complementary adapters to the 3′ and 5′ ends of DNA fragments, so that the primer for one adapter only begins synthesis on one strand and not on its complement. In conventional paired-end sequencing, you simply sequence using the adapter for one end, and then once you're done you start over, sequencing using the adapter for the other end.

Assuming 100 bp reads, this means your two reads are the reverse complement of the 100 3′-most bases of the Watson strand and the Crick strand; these reads are assumed to be identical to the 100 5′-most bases of the Crick strand and Watson strand, respectively. Therefore when you open your FASTQ files and look at a pair of reads, the sequences you see are, conceptually, pointing towards each other on opposite strands. When you align them to the genome, one read should align to the forward strand, and the other should align to the reverse strand at a higher base pair position than the first one, so that they are pointed towards one another. This is known as an "FR" read: forward/reverse, in that order.

This is all for conventional paired-end sequencing. Some specialized technologies, such as using circularized DNA fragments to create large insert jumping libraries, switch things around so that your reads ought to align in an "RF" position: reverse/forward, in that order. This is different from FR because it means the reverse read aligned at a lower base pair position than the forward read, and thus that they are pointing away from one another.

But if you're just doing conventional paired-end sequencing (e.g. Illumina), your reads are supposed to align FR, and if they instead align RF, FF or RR, that's a problem and often indicates the reads aligned incorrectly (though it could also mean they aligned correctly and that a real inversion or translocation exists in the sample's genome; see notes from Devin Absher's talk on calling structural variants). If read pairs don't align FR, most aligners will flag them as "not a proper pair" in the SAM/BAM file by zeroing the FLAG 0x2 bit (proper pair flag) (see SAM spec). Heng Li, author of BWA, states here that BWA will only set the 'proper pair flag' to 1 for Illumina reads aligned FR (for SOLiD it allows FF or RR).
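
To make the FR/RF/FF/RR terminology concrete, here is a minimal Python sketch that classifies a pair's orientation from a single SAM record. The FLAG bit values come from the SAM spec; the function names `pair_orientation` and `is_proper_pair` are my own, purely for illustration. It makes a few simplifying assumptions: both mates align to the same reference sequence, and the leftmost alignment positions (the POS and PNEXT fields) are good enough to decide which read comes first. Real aligners also look at things like insert size before setting the proper pair bit, so treat this as a way to think about orientation, not as a reimplementation of what BWA does.

```python
# FLAG bits as defined in the SAM specification.
PAIRED = 0x1         # read is part of a pair
PROPER_PAIR = 0x2    # aligner considers the pair properly aligned
UNMAPPED = 0x4       # this read is unmapped
MATE_UNMAPPED = 0x8  # the mate is unmapped
REVERSE = 0x10       # this read aligned to the reverse strand
MATE_REVERSE = 0x20  # the mate aligned to the reverse strand


def is_proper_pair(flag):
    """Return True if the aligner set the proper pair bit (0x2)."""
    return bool(flag & PROPER_PAIR)


def pair_orientation(flag, pos, pnext):
    """Classify a pair as 'FR', 'RF', 'FF' or 'RR' from one SAM record.

    flag  -- the SAM FLAG field (integer)
    pos   -- leftmost aligned position of this read (SAM POS)
    pnext -- leftmost aligned position of the mate (SAM PNEXT)

    Assumes both mates are on the same reference sequence.
    Returns None if the read is unpaired or either mate is unmapped.
    """
    if not flag & PAIRED or flag & (UNMAPPED | MATE_UNMAPPED):
        return None

    this_strand = "R" if flag & REVERSE else "F"
    mate_strand = "R" if flag & MATE_REVERSE else "F"

    # The read at the lower base pair position is listed first.
    if pos < pnext:
        left, right = this_strand, mate_strand
    elif pos > pnext:
        left, right = mate_strand, this_strand
    else:
        # Same start position: list the forward read first so that both
        # records of the pair get the same label (a simplification).
        left, right = sorted((this_strand, mate_strand))
    return left + right


if __name__ == "__main__":
    # Hypothetical records: (FLAG, POS, PNEXT)
    for flag, pos, pnext in [(99, 100, 300), (147, 300, 100), (81, 100, 300)]:
        print(flag, pair_orientation(flag, pos, pnext), is_proper_pair(flag))
```

For example, a record with FLAG 99 at position 100 whose mate sits at position 300 is the forward read of an FR pair, and its mate (FLAG 147) gets the same FR label. A record with FLAG 81 at position 100 and mate at 300 is a reverse read sitting to the left of a forward mate, which is RF.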