Package org.snpeff.fileIterator
Class VcfFileIterator
Opens a VCF file and iterates over all entries (i.e. VCF lines in the file)
Format: VCF 4.1
Reference: http://www.1000genomes.org/wiki/Analysis/Variant%20Call%20Format/vcf-variant-call-format-version-41
Old 4.0 format: http://www.1000genomes.org/wiki/doku.php?id=1000_genomes:analysis:vcf4.0
1. CHROM chromosome: an identifier from the reference genome. All entries for a specific CHROM should form a contiguous block within the VCF file. (Alphanumeric String, Required)
2. POS position: The reference position, with the 1st base having position 1. Positions are sorted numerically, in increasing order, within each reference sequence CHROM. (Integer, Required)
3. ID semi-colon separated list of unique identifiers where available. If this is a dbSNP variant it is encouraged to use the rs number(s). No identifier should be present in more than one data record. If there is no identifier available, then the missing value should be used. (Alphanumeric String)
4. REF reference base(s): Each base must be one of A,C,G,T,N. Bases should be in uppercase. Multiple bases are permitted. The value in the POS field refers to the position of the first base in the String. For InDels, the reference String must include the base before the event (which must be reflected in the POS field). (String, Required).
5. ALT comma separated list of alternate non-reference alleles called on at least one of the samples. Options are base Strings made up of the bases A,C,G,T,N, or an angle-bracketed ID String ("<ID>"). If there are no alternative alleles, then the missing value should be used. Bases should be in uppercase. (Alphanumeric String; no whitespace, commas, or angle-brackets are permitted in the ID String itself)
6. QUAL phred-scaled quality score for the assertion made in ALT. i.e. give -10log_10 prob(call in ALT is wrong). If ALT is 0/0 (no variant) then this is -10log_10 p(variant), and if ALT is not "." this is -10log_10 p(no variant). High QUAL scores indicate high confidence calls. Although traditionally people use integer phred scores, this field is permitted to be a floating point to enable higher resolution for low confidence calls if desired. (Numeric)
7. FILTER filter: PASS if this position has passed all filters, i.e. a call is made at this position. Otherwise, if the site has not passed all filters, a semicolon-separated list of codes for filters that fail. e.g. "q10;s50" might indicate that at this site the quality is below 10 and the number of samples with data is below 50% of the total number of samples. "0" is reserved and should not be used as a filter String. If filters have not been applied, then this field should be set to the missing value. (Alphanumeric String)
8. INFO additional information: (Alphanumeric String) INFO fields are encoded as a semicolon-separated series of short keys with optional values in the format: <key>=<data>[,data]. Arbitrary keys are permitted, although the following sub-fields are reserved (albeit optional):
- AA ancestral allele
- AC allele count in genotypes, for each ALT allele, in the same order as listed
- AF allele frequency for each ALT allele in the same order as listed: use this when estimated from primary data, not called genotypes
- AN total number of alleles in called genotypes
- BQ RMS base quality at this position
- CIGAR cigar string describing how to align an alternate allele to the reference allele
- DB dbSNP membership
- DP combined depth across samples, e.g. DP=154
- END end position of the variant described in this record (esp. for CNVs)
- H2 membership in hapmap2
- MQ RMS mapping quality, e.g. MQ=52
- MQ0 Number of MAPQ == 0 reads covering this record
- NS Number of samples with data
- SB strand bias at this position
- SOMATIC indicates that the record is a somatic mutation, for cancer genomics
- VALIDATED validated by follow-up experiment
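As an illustration of the eight fixed columns described above, the following minimal sketch splits one tab-separated data line, parses the INFO column into a key/value map, and converts a phred-scaled QUAL into an error probability. It is self-contained and does not use VcfFileIterator or VcfEntry: the class VcfLineSketch, its methods, and the sample line are hypothetical, and header lines, genotype columns, and validation are ignored.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of SnpEff: handles the 8 fixed VCF columns of one line. */
final class VcfLineSketch {
    static final String MISSING_VALUE = "."; // the VCF "missing value"

    /** Split a data line into its 8 fixed, tab-separated columns (extra columns are dropped). */
    static String[] fixedColumns(String line) {
        String[] cols = line.split("\t", -1);
        if (cols.length < 8) {
            throw new IllegalArgumentException("Expected at least 8 columns: " + line);
        }
        return Arrays.copyOf(cols, 8);
    }

    /** Parse INFO ("key=data[,data];flag;...") into an ordered map; flags map to "". */
    static Map<String, String> parseInfo(String info) {
        Map<String, String> map = new LinkedHashMap<>();
        if (info.isEmpty() || info.equals(MISSING_VALUE)) return map;
        for (String item : info.split(";")) {
            if (item.isEmpty()) continue;
            int eq = item.indexOf('=');
            if (eq < 0) map.put(item, "");   // flag without a value, such as DB
            else map.put(item.substring(0, eq), item.substring(eq + 1));
        }
        return map;
    }

    /** Convert a phred-scaled QUAL into P(call is wrong) = 10^(-QUAL/10). */
    static double errorProbability(double qual) {
        return Math.pow(10.0, -qual / 10.0);
    }

    public static void main(String[] args) {
        // Illustrative line only; columns are CHROM POS ID REF ALT QUAL FILTER INFO
        String line = "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB;H2";
        String[] c = fixedColumns(line);
        int pos = Integer.parseInt(c[1]);    // 1-based reference position
        String[] alts = c[4].split(",");     // comma-separated ALT alleles
        System.out.println(c[0] + ":" + pos + " " + c[3] + " -> " + String.join(",", alts));
        System.out.println("INFO = " + parseInfo(c[7]));
        System.out.println("P(error) = " + errorProbability(Double.parseDouble(c[5])));
    }
}
```

Keeping INFO values as raw strings defers the comma split for multi-valued keys such as AC or AF until a caller knows how many ALT alleles to expect.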
Warning: You can have more than one variant (and variant type) per VCF line (i.e. VcfEntry), e.g.:
TTG -> TTGTG,T Insertion of 'TG' and deletion of 'TG'
TA -> T,TT Deletion of 'A' and SNP (A replaced by T)
T -> TTTTGTG,TTTTG,TTGTG Insertion of 'TTTGTG', insertion of 'TTTG' and insertion of 'TGTG'
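The three examples above can be reproduced with a small sketch that compares REF against each ALT allele after trimming the bases they share at the start. This is only an illustration under that assumption, not the algorithm SnpEff uses internally; the class AltDecomposeSketch and its describe method are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch, not SnpEff's implementation: one description per ALT allele. */
final class AltDecomposeSketch {

    /** Describe each comma-separated ALT allele relative to REF after trimming the shared prefix. */
    static List<String> describe(String ref, String alts) {
        List<String> out = new ArrayList<>();
        for (String alt : alts.split(",")) {
            int p = 0; // number of leading bases shared by REF and this ALT
            while (p < ref.length() && p < alt.length() && ref.charAt(p) == alt.charAt(p)) p++;
            String r = ref.substring(p); // what remains of REF
            String a = alt.substring(p); // what remains of ALT
            if (r.isEmpty() && a.isEmpty()) out.add("no change");
            else if (r.isEmpty()) out.add("insertion of '" + a + "'");
            else if (a.isEmpty()) out.add("deletion of '" + r + "'");
            else if (r.length() == 1 && a.length() == 1) out.add("SNP (" + r + " replaced by " + a + ")");
            else out.add("substitution of '" + r + "' by '" + a + "'");
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(describe("TTG", "TTGTG,T"));
        System.out.println(describe("TA", "T,TT"));
        System.out.println(describe("T", "TTTTGTG,TTTTG,TTGTG"));
    }
}
```

Because every ALT allele is evaluated independently, a single line can yield different variant types, which is why code consuming a VcfEntry should not assume one variant per line.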
Author: pcingola
Field Summary
Fields inherited from class org.snpeff.fileIterator.MarkerFileIterator
createChromos, genome, ignoreChromosomeErrors, inOffset
Constructor Summary
VcfFileIterator(BufferedReader reader)
VcfFileIterator(String fileName)
VcfFileIterator(String fileName, Genome genome)
Method Summary
static VcfFileIterator fromString(String vcfLines): Create a VcfFileIterator from a string containing VCF lines
getSampleNames: Get sample names
getVcfHeader: Get VcfHeader
protected void init()
boolean isExpandIub()
boolean isHeadeSection()
parse
parseVcfLine(String line): Parse a line from a VCF file
readField: Read a field and return a value
readHeader: Read only header info
protected VcfEntry readNext(): Read next element
void setCreateChromos(boolean createChromos)
void setErrorIfUnsorted(boolean errorIfUnsorted)
void setExpandIub(boolean expandIub)
void setInOffset(int inOffset)
void setParseNow(boolean parseNow): Should we parse vcfEntries later? (lazy parsing)
void setVcfHeader(VcfHeader header): Set header
Methods inherited from class org.snpeff.fileIterator.MarkerFileIterator
getChromosome, getGenome, init, isIgnoreChromosomeErrors, loadMarkers, parsePosition, sanityCheckChromo, setIgnoreChromosomeErrors
Methods inherited from class org.snpeff.fileIterator.FileIterator
close, countNewLineChars, getFilePointer, getLine, getLineNum, guessNewLineChars, hasNext, hasSeek, isDebug, iterator, load, next, readLine, ready, remove, seek, setAutoClose, setDebug, setVerbose, toString
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, wait
Methods inherited from interface java.lang.Iterable
forEach, spliterator
Methods inherited from interface java.util.Iterator
forEachRemaining
Field Details
MISSING
Constructor Details
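VcfFileIterator
public VcfFileIterator()
VcfFileIterator
VcfFileIterator(BufferedReader reader)
VcfFileIterator
VcfFileIterator(String fileName)
VcfFileIterator
VcfFileIterator(String fileName, Genome genome)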
Method Details
fromString
Create a VcfFileIterator from a string containing VCF lines
getSampleNames
Get sample names
getVcfHeader
Get VcfHeader
init
protected void init()
isExpandIub
public boolean isExpandIub()
isHeadeSection
public boolean isHeadeSection()
parse
parseVcfLine
Parse a line from a VCF file
readField
Read a field and return a value
readHeader
Read only header info
readNext
Description copied from class: FileIterator
Read next element
Specified by: readNext in class FileIterator<VcfEntry>
setCreateChromos
public void setCreateChromos(boolean createChromos)
Overrides: setCreateChromos in class MarkerFileIterator<VcfEntry>
setErrorIfUnsorted
public void setErrorIfUnsorted(boolean errorIfUnsorted)
setExpandIub
public void setExpandIub(boolean expandIub)
setInOffset
public void setInOffset(int inOffset)
Overrides: setInOffset in class MarkerFileIterator<VcfEntry>
setParseNow
public void setParseNow(boolean parseNow)
Should we parse vcfEntries later? (lazy parsing)
setVcfHeader
Set header
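The lazy parsing mentioned for setParseNow(boolean parseNow) can be pictured with the following minimal sketch: a record keeps the raw line and splits it only when a field is first requested. How VcfEntry actually defers parsing is not described on this page, so the class LazyRecordSketch and all of its members are hypothetical.

```java
/** Hypothetical illustration of lazy parsing; not SnpEff's VcfEntry. */
final class LazyRecordSketch {
    private final String line;   // raw tab-separated line, kept until needed
    private String[] cols;       // null until the line has been split
    private int parseCount = 0;  // how many times the line was actually split

    LazyRecordSketch(String line, boolean parseNow) {
        this.line = line;
        if (parseNow) parse();   // eager mode: pay the parsing cost up front
    }

    /** Split the line once; later calls return immediately. */
    private void parse() {
        if (cols != null) return;
        cols = line.split("\t", -1);
        parseCount++;
    }

    String chrom() { parse(); return cols[0]; }
    int pos() { parse(); return Integer.parseInt(cols[1]); }
    boolean isParsed() { return cols != null; }
    int parseCount() { return parseCount; }

    public static void main(String[] args) {
        LazyRecordSketch lazy = new LazyRecordSketch("1\t100\t.\tA\tG\t.\t.\t.", false);
        System.out.println("parsed before access: " + lazy.isParsed());
        System.out.println(lazy.chrom() + ":" + lazy.pos());
        System.out.println("parsed after access: " + lazy.isParsed());
    }
}
```

Deferring the split pays off when many lines are read but only a few are inspected; the trade-off is that a malformed line is reported at first access rather than when it is read.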