Package org.apache.hadoop.mapred
Class SequenceFileInputFilter<K,V>
java.lang.Object
org.apache.hadoop.mapred.FileInputFormat<K,V>
org.apache.hadoop.mapred.SequenceFileInputFormat<K,V>
org.apache.hadoop.mapred.SequenceFileInputFilter<K,V>
- All Implemented Interfaces:
InputFormat<K,V>
A class that allows a MapReduce job to work on a sample of sequence files.
The sample is determined by the filter class set for the job.
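To make the idea of a filter-decided sample concrete, the following stand-alone sketch mimics the three criteria named in the nested class summary: a record-number test (record# % f == 0), a regex match on the key, and an MD5-based test. It uses only the Java standard library and is not the Hadoop implementation. The class and method names are hypothetical. Treating "matching" as a full-string match is an assumption, as is the exact way the MD5 digest is compared with the frequency.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.regex.Pattern;

/** Stand-alone sketch of three sampling criteria; not the Hadoop classes. */
final class FilterCriteriaSketch {

    /** Percent-style criterion from the summary: record# % f == 0. */
    static boolean acceptByRecordNumber(long recordNumber, int frequency) {
        requirePositive(frequency);
        return recordNumber % frequency == 0;
    }

    /** Regex-style criterion. ASSUMPTION: the whole key must match the pattern. */
    static boolean acceptByRegex(String key, Pattern pattern) {
        return pattern.matcher(key).matches();
    }

    /**
     * MD5-style criterion. ASSUMPTION: the first 8 digest bytes are read as a
     * long, and the record is kept when that value is divisible by frequency.
     */
    static boolean acceptByMd5(String key, int frequency) {
        requirePositive(frequency);
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                    .digest(key.getBytes(StandardCharsets.UTF_8));
            long hash = 0;
            for (int i = 0; i < 8; i++) {
                hash = (hash << 8) | (digest[i] & 0xffL);
            }
            return hash % frequency == 0;
        } catch (NoSuchAlgorithmException e) {
            // MD5 is one of the algorithms every Java platform must provide.
            throw new IllegalStateException("MD5 digest unavailable", e);
        }
    }

    private static void requirePositive(int frequency) {
        if (frequency <= 0) {
            throw new IllegalArgumentException("frequency must be positive: " + frequency);
        }
    }
}
```

With a frequency of 10, the record-number test keeps records 0, 10, 20, and so on, which is roughly one record in ten. The MD5-based test depends only on the key, so the same key is always kept or always dropped, regardless of its position in the file.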
-
Nested Class Summary
Nested Classes
Modifier and Type | Class | Description
static interface | org.apache.hadoop.mapred.SequenceFileInputFilter.Filter | Filter interface.
static class | org.apache.hadoop.mapred.SequenceFileInputFilter.FilterBase | Base class for Filters.
static class | org.apache.hadoop.mapred.SequenceFileInputFilter.MD5Filter | Returns a set of records by examining the MD5 digest of each record's key against a filtering frequency f.
static class | org.apache.hadoop.mapred.SequenceFileInputFilter.PercentFilter | Returns a percentage of records. The percentage is determined by a filtering frequency f using the criterion record# % f == 0.
static class | org.apache.hadoop.mapred.SequenceFileInputFilter.RegexFilter | Filters records by matching the key against a regex.
Nested classes/interfaces inherited from class org.apache.hadoop.mapred.FileInputFormat
FileInputFormat.Counter
-
Field Summary
Fields inherited from class org.apache.hadoop.mapred.FileInputFormat
INPUT_DIR_NONRECURSIVE_IGNORE_SUBDIRS, INPUT_DIR_RECURSIVE, LOG, NUM_INPUT_FILES
-
Constructor Summary
Constructors
SequenceFileInputFilter()
-
Method Summary
Modifier and Type | Method | Description
RecordReader<K,V> | getRecordReader(InputSplit split, JobConf job, Reporter reporter) | Create a record reader for the given split.
static void | setFilterClass(Configuration conf, Class filterClass) | Set the filter class.
Methods inherited from class org.apache.hadoop.mapred.SequenceFileInputFormat
listStatus
Methods inherited from class org.apache.hadoop.mapred.FileInputFormat
addInputPath, addInputPathRecursively, addInputPaths, computeSplitSize, getBlockIndex, getInputPathFilter, getInputPaths, getSplitHosts, getSplits, isSplitable, makeSplit, makeSplit, setInputPathFilter, setInputPaths, setInputPaths, setMinSplitSize
-
Constructor Details
-
SequenceFileInputFilter
public SequenceFileInputFilter()
-
-
Method Details
-
getRecordReader
public RecordReader<K,V> getRecordReader(InputSplit split, JobConf job, Reporter reporter) throws IOException
Create a record reader for the given split.
- Specified by:
getRecordReader in interface InputFormat<K,V>
- Overrides:
getRecordReader in class SequenceFileInputFormat<K,V>
- Parameters:
split - file split
job - job configuration
reporter - reporter who sends report to task tracker
- Returns:
RecordReader
- Throws:
IOException
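As a rough illustration of what a filtering record reader does, the sketch below wraps a source of key/value records and skips every record whose key the filter rejects. It is a standard-library analogy, not the Hadoop RecordReader API. The class name `FilteringReaderSketch` and the use of `Iterator` and `Predicate` are assumptions made for the example.

```java
import java.util.Iterator;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.function.Predicate;

/** Stand-alone sketch: an iterator that passes on only records with accepted keys. */
final class FilteringReaderSketch<K, V> implements Iterator<Map.Entry<K, V>> {
    private final Iterator<Map.Entry<K, V>> source;
    private final Predicate<K> filter;
    private Map.Entry<K, V> pending; // next accepted record, or null if none is buffered

    FilteringReaderSketch(Iterator<Map.Entry<K, V>> source, Predicate<K> filter) {
        this.source = source;
        this.filter = filter;
    }

    @Override
    public boolean hasNext() {
        // Advance past rejected records until one is accepted or the source ends.
        while (pending == null && source.hasNext()) {
            Map.Entry<K, V> candidate = source.next();
            if (filter.test(candidate.getKey())) {
                pending = candidate;
            }
        }
        return pending != null;
    }

    @Override
    public Map.Entry<K, V> next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        Map.Entry<K, V> result = pending;
        pending = null;
        return result;
    }
}
```

The consumer of such a reader sees only the sampled records and does not need to know that a filter is in place.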
-
setFilterClass
public static void setFilterClass(Configuration conf, Class filterClass)
Set the filter class.
- Parameters:
conf - application configuration
filterClass - filter class
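One common way to pass a class through a job configuration is to record its name under a property and instantiate it reflectively where it is needed. The sketch below shows that pattern with `java.util.Properties` standing in for the configuration object. It is an analogy under stated assumptions, not the Hadoop code: the property name `sketch.filter.class`, the class `FilterConfigSketch`, and the example filter are all hypothetical.

```java
import java.util.Properties;
import java.util.function.Predicate;

/** Stand-alone sketch of storing a filter class in a configuration by name. */
final class FilterConfigSketch {
    /** Hypothetical property name; the key Hadoop uses is not stated on this page. */
    static final String FILTER_CLASS_KEY = "sketch.filter.class";

    /** Example filter with a public no-argument constructor. */
    public static final class NonEmptyKeyFilter implements Predicate<String> {
        @Override
        public boolean test(String key) {
            return !key.isEmpty();
        }
    }

    /** Records the filter class by name in the configuration. */
    static void setFilterClass(Properties conf, Class<? extends Predicate<String>> filterClass) {
        conf.setProperty(FILTER_CLASS_KEY, filterClass.getName());
    }

    /** Reads the class name back and creates a filter instance reflectively. */
    static Predicate<String> createFilter(Properties conf) {
        String className = conf.getProperty(FILTER_CLASS_KEY);
        if (className == null) {
            throw new IllegalStateException("no filter class configured");
        }
        try {
            Class<?> raw = Class.forName(className);
            @SuppressWarnings("unchecked")
            Predicate<String> filter =
                    (Predicate<String>) raw.getDeclaredConstructor().newInstance();
            return filter;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot instantiate " + className, e);
        }
    }
}
```

Storing the class name rather than an instance keeps the configuration a set of plain strings, which is convenient when it has to be shipped to the tasks that read the input.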
-