Package org.apache.hadoop.io
Class SequenceFile.Sorter
java.lang.Object
org.apache.hadoop.io.SequenceFile.Sorter
Enclosing class: SequenceFile
Sorts key/value pairs in a sequence-format file.
For best performance, applications should make sure that the Writable.readFields(DataInput) implementation of their keys is
very efficient. In particular, it should avoid allocating memory.
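To illustrate the allocation advice, the following JDK-only sketch shows a key whose readFields method overwrites a reusable byte buffer. It does not allocate a new object or array for every record. ReusableBytesKey, RoundTrip, and roundTrip are hypothetical names. A real key would implement org.apache.hadoop.io.WritableComparable, which is left out here so the sketch compiles without Hadoop. (Hadoop's own Text class reuses its internal byte array in a similar way.)

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical result holder for the demo helper below.
final class RoundTrip {
    final List<String> decoded = new ArrayList<>();
    int buffersUsed; // distinct byte[] buffers the reading key went through
}

// Hypothetical key type. A real key would implement
// org.apache.hadoop.io.WritableComparable; omitted so this compiles with the JDK only.
class ReusableBytesKey {
    private byte[] bytes = new byte[16]; // reused across records, grown only when needed
    private int length;

    void set(String s) {
        byte[] b = s.getBytes(StandardCharsets.UTF_8);
        ensureCapacity(b.length);
        System.arraycopy(b, 0, bytes, 0, b.length);
        length = b.length;
    }

    // Same shape as Writable.write(DataOutput): length prefix, then the bytes.
    void write(DataOutput out) throws IOException {
        out.writeInt(length);
        out.write(bytes, 0, length);
    }

    // Same shape as Writable.readFields(DataInput): overwrite this instance in place.
    // A new array is allocated only if the record is larger than the current buffer.
    void readFields(DataInput in) throws IOException {
        int n = in.readInt();
        ensureCapacity(n);
        in.readFully(bytes, 0, n);
        length = n;
    }

    private void ensureCapacity(int n) {
        if (bytes.length < n) {
            bytes = new byte[Math.max(n, bytes.length * 2)];
        }
    }

    byte[] buffer() { return bytes; }

    @Override
    public String toString() {
        return new String(bytes, 0, length, StandardCharsets.UTF_8);
    }

    // Demo helper: serialize the values, then read them all back through ONE key instance.
    static RoundTrip roundTrip(String... values) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bos);
            ReusableBytesKey writerKey = new ReusableBytesKey();
            for (String v : values) {
                writerKey.set(v);
                writerKey.write(out);
            }
            out.flush();
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bos.toByteArray()));
            ReusableBytesKey readerKey = new ReusableBytesKey(); // shared by every record
            RoundTrip result = new RoundTrip();
            byte[] last = null;
            for (int i = 0; i < values.length; i++) {
                readerKey.readFields(in);
                result.decoded.add(readerKey.toString());
                if (readerKey.buffer() != last) { // count buffer (re)allocations
                    result.buffersUsed++;
                    last = readerKey.buffer();
                }
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

When three short keys are read through one instance, a single buffer is used. A new array is allocated only when a record is longer than the current buffer.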

Nested Class Summary
Modifier and Type | Class | Description
static interface | SequenceFile.Sorter.RawKeyValueIterator | The interface to iterate over raw keys/values of SequenceFiles.
class | SequenceFile.Sorter.SegmentDescriptor | This class defines a merge segment.

Constructor Summary
Constructor | Description
Sorter(FileSystem fs, Class<? extends WritableComparable> keyClass, Class valClass, Configuration conf) | Sort and merge files containing the named classes.
Sorter(FileSystem fs, RawComparator comparator, Class keyClass, Class valClass, Configuration conf) | Sort and merge using an arbitrary RawComparator.
Sorter(FileSystem fs, RawComparator comparator, Class keyClass, Class valClass, Configuration conf, SequenceFile.Metadata metadata) | Sort and merge using an arbitrary RawComparator.

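Method Summary
Modifier and Type | Method | Description
SequenceFile.Writer | cloneFileAttributes(Path inputFile, Path outputFile, Progressable prog) | Clones the attributes (such as compression) of the input file and creates a corresponding Writer.
int | getFactor() | Get the number of streams to merge at once.
int | getMemory() | Get the total amount of buffer memory, in bytes.
SequenceFile.Sorter.RawKeyValueIterator | merge(List<SequenceFile.Sorter.SegmentDescriptor> segments, Path tmpDir) | Merges the list of segments of type SegmentDescriptor.
SequenceFile.Sorter.RawKeyValueIterator | merge(Path[] inNames, boolean deleteInputs, int factor, Path tmpDir) | Merges the contents of files passed in Path[].
SequenceFile.Sorter.RawKeyValueIterator | merge(Path[] inNames, boolean deleteInputs, Path tmpDir) | Merges the contents of files passed in Path[], using the maximum merge factor that is already set.
void | merge(Path[] inFiles, Path outFile) | Merge the provided files.
SequenceFile.Sorter.RawKeyValueIterator | merge(Path[] inNames, Path tempDir, boolean deleteInputs) | Merges the contents of files passed in Path[].
void | setFactor(int factor) | Set the number of streams to merge at once.
void | setMemory(int memory) | Set the total amount of buffer memory, in bytes.
void | setProgressable(Progressable progressable) | Set the progressable object in order to report progress.
void | sort(Path[] inFiles, Path outFile, boolean deleteInput) | Perform a file sort from a set of input files into an output file.
void | sort(Path inFile, Path outFile) | The backwards-compatible interface to sort.
SequenceFile.Sorter.RawKeyValueIterator | sortAndIterate(Path[] inFiles, Path tempDir, boolean deleteInput) | Perform a file sort from a set of input files and return an iterator.
void | writeFile(SequenceFile.Sorter.RawKeyValueIterator records, SequenceFile.Writer writer) | Writes records from a RawKeyValueIterator into the file represented by the passed writer.
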
Constructor Details
Sorter
public Sorter(FileSystem fs, Class<? extends WritableComparable> keyClass, Class valClass, Configuration conf)
Sort and merge files containing the named classes.
- Parameters:
  fs - the input FileSystem.
  keyClass - the key class of the input files.
  valClass - the value class of the input files.
  conf - the Configuration to use.

Sorter
public Sorter(FileSystem fs, RawComparator comparator, Class keyClass, Class valClass, Configuration conf)
Sort and merge using an arbitrary RawComparator.
- Parameters:
  fs - the input FileSystem.
  comparator - the RawComparator used to order keys.
  keyClass - the key class of the input files.
  valClass - the value class of the input files.
  conf - the Configuration to use.

Sorter
public Sorter(FileSystem fs, RawComparator comparator, Class keyClass, Class valClass, Configuration conf, SequenceFile.Metadata metadata)
Sort and merge using an arbitrary RawComparator.
- Parameters:
  fs - the input FileSystem.
  comparator - the RawComparator used to order keys.
  keyClass - the key class of the input files.
  valClass - the value class of the input files.
  conf - the Configuration to use.
  metadata - the SequenceFile.Metadata to use.

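A RawComparator lets the sorter order records by comparing serialized key bytes directly, so keys do not need to be deserialized for each comparison. The JDK-only sketch below shows the idea for keys stored as 4-byte big-endian signed ints, which is the layout written by DataOutput.writeInt. RawBytesComparator and RawIntKeyComparator are hypothetical stand-ins. RawBytesComparator only mirrors the compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) method of org.apache.hadoop.io.RawComparator.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-in mirroring the byte-level method of org.apache.hadoop.io.RawComparator.
interface RawBytesComparator {
    int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2);
}

// Orders keys serialized as 4-byte big-endian signed ints (the DataOutput.writeInt format)
// without creating any key objects.
final class RawIntKeyComparator implements RawBytesComparator {
    private static int readInt(byte[] b, int start) {
        return ((b[start] & 0xff) << 24)
             | ((b[start + 1] & 0xff) << 16)
             | ((b[start + 2] & 0xff) << 8)
             |  (b[start + 3] & 0xff);
    }

    @Override
    public int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        // Decode the signed value first; the lengths are always 4 for this key type.
        return Integer.compare(readInt(b1, s1), readInt(b2, s2));
    }

    // Demo helper: big-endian encoding, the same byte order as DataOutput.writeInt.
    static byte[] encode(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    // Demo helper: unsigned lexicographic byte comparison, shown only for contrast.
    static int compareUnsignedBytes(byte[] a, byte[] b) {
        for (int i = 0; i < Math.min(a.length, b.length); i++) {
            int cmp = Integer.compare(a[i] & 0xff, b[i] & 0xff);
            if (cmp != 0) {
                return cmp;
            }
        }
        return Integer.compare(a.length, b.length);
    }
}
```

Decoding the int before comparing matters. A plain unsigned byte-by-byte comparison would place -5 after 3, because the encoding of a negative int starts with 0xFF. In Hadoop itself, WritableComparator provides static helpers such as readInt(byte[], int) for this kind of decoding.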
Method Details
setFactor
public void setFactor(int factor)
Set the number of streams to merge at once.
- Parameters:
  factor - the maximum number of streams to merge at once (the merge fan-in).

getFactor
public int getFactor()
Get the number of streams to merge at once.
- Returns:
  the number of streams to merge at once.

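The factor bounds how many segments a single merge pass can combine. Together with the number of sorted segments, it therefore determines how many passes are needed. The sketch below estimates this count under a simplified model: each intermediate pass merges groups of up to factor segments into one, and a final pass merges whatever remains. This model is an assumption and not necessarily the exact schedule Hadoop uses. MergePassEstimate is a hypothetical name.

```java
// Rough model (an assumption, not necessarily Hadoop's exact merge schedule):
// each intermediate pass merges groups of up to `factor` segments into one segment,
// and a final pass merges the remaining <= factor segments.
final class MergePassEstimate {
    static int passes(int segments, int factor) {
        if (factor < 2) {
            throw new IllegalArgumentException("factor must be at least 2");
        }
        if (segments <= 1) {
            return 0; // nothing to merge
        }
        int passes = 1; // the final merge
        while (segments > factor) {
            segments = (segments + factor - 1) / factor; // ceil(segments / factor)
            passes++;
        }
        return passes;
    }
}
```

Under this model, 1,000 segments take three passes with a factor of 10 (1,000 to 100 to 10, then the final merge). With a factor of 100 they take only two, at the cost of keeping more streams open at once.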
setMemory
public void setMemory(int memory)
Set the total amount of buffer memory, in bytes.
- Parameters:
  memory - the total buffer memory, in bytes.

getMemory
public int getMemory()
Get the total amount of buffer memory, in bytes.
- Returns:
  the total amount of buffer memory, in bytes.

setProgressable
public void setProgressable(Progressable progressable)
Set the progressable object in order to report progress.
- Parameters:
  progressable - the Progressable to report progress to.

sort
public void sort(Path[] inFiles, Path outFile, boolean deleteInput) throws IOException
Perform a file sort from a set of input files into an output file.
- Parameters:
  inFiles - the files to be sorted.
  outFile - the sorted output file.
  deleteInput - whether the input files should be deleted as they are read.
- Throws:
  IOException - raised on errors performing I/O.

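The following is a minimal end-to-end sketch of sort(Path[], Path, boolean) on the local file system. It assumes hadoop-common is on the classpath and uses IntWritable keys with Text values. SortExample, sortKeys, and the temporary file names are hypothetical. The sketch writes to an output path that does not exist yet, because the sorter may refuse to overwrite an existing file.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Hypothetical demo class; requires hadoop-common on the classpath.
class SortExample {
    // Writes the keys to a SequenceFile, sorts it with SequenceFile.Sorter,
    // and returns the keys read back from the sorted output.
    static List<Integer> sortKeys(int... keys) {
        try {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.getLocal(conf);
            String dir = Files.createTempDirectory("seqsort").toString();
            Path in = new Path(dir, "in.seq");
            Path out = new Path(dir, "sorted.seq"); // must not exist yet

            // 1. Write unsorted records.
            try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                    SequenceFile.Writer.file(in),
                    SequenceFile.Writer.keyClass(IntWritable.class),
                    SequenceFile.Writer.valueClass(Text.class))) {
                for (int k : keys) {
                    writer.append(new IntWritable(k), new Text("value-" + k));
                }
            }

            // 2. Sort by key; 'false' keeps the input file.
            SequenceFile.Sorter sorter =
                new SequenceFile.Sorter(fs, IntWritable.class, Text.class, conf);
            sorter.sort(new Path[] {in}, out, false);

            // 3. Read the sorted keys back, reusing one key/value pair.
            List<Integer> result = new ArrayList<>();
            IntWritable key = new IntWritable();
            Text value = new Text();
            try (SequenceFile.Reader reader =
                     new SequenceFile.Reader(conf, SequenceFile.Reader.file(out))) {
                while (reader.next(key, value)) {
                    result.add(key.get());
                }
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, sortKeys(42, 7, 19, 3) should return [3, 7, 19, 42]. The input file is kept because deleteInput is false.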
sortAndIterate
public SequenceFile.Sorter.RawKeyValueIterator sortAndIterate(Path[] inFiles, Path tempDir, boolean deleteInput) throws IOException
Perform a file sort from a set of input files and return an iterator.
- Parameters:
  inFiles - the files to be sorted.
  tempDir - the directory where temporary files are created during the sort.
  deleteInput - whether the input files should be deleted as they are read.
- Returns:
  a RawKeyValueIterator over the sorted records.
- Throws:
  IOException - raised on errors performing I/O.

sort
public void sort(Path inFile, Path outFile) throws IOException
The backwards-compatible interface to sort.
- Parameters:
  inFile - the input file to sort.
  outFile - the sorted output file.
- Throws:
  IOException - raised on errors performing I/O.

merge
public SequenceFile.Sorter.RawKeyValueIterator merge(List<SequenceFile.Sorter.SegmentDescriptor> segments, Path tmpDir) throws IOException
Merges the list of segments of type SegmentDescriptor.
- Parameters:
  segments - the list of SegmentDescriptors.
  tmpDir - the directory to write temporary files into.
- Returns:
  a RawKeyValueIterator over the merged records.
- Throws:
  IOException - raised on errors performing I/O.

merge
public SequenceFile.Sorter.RawKeyValueIterator merge(Path[] inNames, boolean deleteInputs, Path tmpDir) throws IOException
Merges the contents of files passed in Path[], using the maximum merge factor that is already set.
- Parameters:
  inNames - the array of path names.
  deleteInputs - true if the input files should be deleted when no longer needed.
  tmpDir - the directory to write temporary files into.
- Returns:
  a RawKeyValueIterator over the merged records.
- Throws:
  IOException - raised on errors performing I/O.

merge
public SequenceFile.Sorter.RawKeyValueIterator merge(Path[] inNames, boolean deleteInputs, int factor, Path tmpDir) throws IOException
Merges the contents of files passed in Path[].
- Parameters:
  inNames - the array of path names.
  deleteInputs - true if the input files should be deleted when no longer needed.
  factor - the factor that will be used as the maximum merge fan-in.
  tmpDir - the directory to write temporary files into.
- Returns:
  a RawKeyValueIterator over the merged records.
- Throws:
  IOException - raised on errors performing I/O.

merge
public SequenceFile.Sorter.RawKeyValueIterator merge(Path[] inNames, Path tempDir, boolean deleteInputs) throws IOException
Merges the contents of files passed in Path[].
- Parameters:
  inNames - the array of path names.
  tempDir - the directory for creating temporary files during the merge.
  deleteInputs - true if the input files should be deleted when no longer needed.
- Returns:
  a RawKeyValueIterator over the merged records.
- Throws:
  IOException - raised on errors performing I/O.

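The merge methods combine inputs that are each already sorted. The underlying technique is a k-way merge that keeps the current smallest record of every input in a priority queue, and it can be sketched with the JDK alone. This is a generic illustration, not Hadoop's internal merge code. KWayMerge and Head are hypothetical names, and the sketch merges in-memory lists of ints rather than files.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.PriorityQueue;

final class KWayMerge {
    // Current head element of one sorted input, plus the rest of that input.
    private static final class Head {
        final int value;
        final Iterator<Integer> rest;

        Head(int value, Iterator<Integer> rest) {
            this.value = value;
            this.rest = rest;
        }
    }

    // Merges inputs that are each already sorted ascending. The queue holds at most
    // one element per input, so extra memory is O(k) for k inputs.
    static List<Integer> merge(List<List<Integer>> sortedInputs) {
        PriorityQueue<Head> queue =
            new PriorityQueue<>((a, b) -> Integer.compare(a.value, b.value));
        for (List<Integer> input : sortedInputs) {
            Iterator<Integer> it = input.iterator();
            if (it.hasNext()) {
                queue.add(new Head(it.next(), it));
            }
        }
        List<Integer> out = new ArrayList<>();
        while (!queue.isEmpty()) {
            Head smallest = queue.poll();      // smallest head across all inputs
            out.add(smallest.value);
            if (smallest.rest.hasNext()) {     // refill from the same input
                queue.add(new Head(smallest.rest.next(), smallest.rest));
            }
        }
        return out;
    }
}
```

For example, merging [1, 4, 9], [2, 3, 10], [], and [5] yields [1, 2, 3, 4, 5, 9, 10]. When there are more inputs than the merge factor, a factor-limited merge would combine them over several rounds, writing intermediate results to the temporary directory.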
cloneFileAttributes
public SequenceFile.Writer cloneFileAttributes(Path inputFile, Path outputFile, Progressable prog) throws IOException
Clones the attributes (such as compression) of the input file and creates a corresponding Writer.
- Parameters:
  inputFile - the path of the input file whose attributes should be cloned.
  outputFile - the path of the output file.
  prog - the Progressable to report status to during the file write.
- Returns:
  a Writer for the output file with the cloned attributes.
- Throws:
  IOException - raised on errors performing I/O.

writeFile
public void writeFile(SequenceFile.Sorter.RawKeyValueIterator records, SequenceFile.Writer writer) throws IOException
Writes records from a RawKeyValueIterator into the file represented by the passed writer.
- Parameters:
  records - the RawKeyValueIterator to read records from.
  writer - the Writer created earlier (for example, by cloneFileAttributes).
- Throws:
  IOException - raised on errors performing I/O.

merge
public void merge(Path[] inFiles, Path outFile) throws IOException
Merge the provided files.
- Parameters:
  inFiles - the array of input path names.
  outFile - the final output file.
- Throws:
  IOException - raised on errors performing I/O.
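The following is a minimal sketch of merge(Path[], Path) for two files that are each already sorted by key. It assumes hadoop-common is on the classpath and uses IntWritable keys with Text values. MergeExample, writeSorted, mergeTwo, and the file names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Hypothetical demo class; requires hadoop-common on the classpath.
class MergeExample {
    // Writes keys, which the caller must pass in ascending order, to a new SequenceFile.
    private static void writeSorted(Configuration conf, Path path, int... sortedKeys)
            throws IOException {
        try (SequenceFile.Writer writer = SequenceFile.createWriter(conf,
                SequenceFile.Writer.file(path),
                SequenceFile.Writer.keyClass(IntWritable.class),
                SequenceFile.Writer.valueClass(Text.class))) {
            for (int k : sortedKeys) {
                writer.append(new IntWritable(k), new Text("v" + k));
            }
        }
    }

    // Merges two pre-sorted files into one and returns the merged keys.
    static List<Integer> mergeTwo(int[] first, int[] second) {
        try {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.getLocal(conf);
            String dir = Files.createTempDirectory("seqmerge").toString();
            Path a = new Path(dir, "a.seq");
            Path b = new Path(dir, "b.seq");
            Path out = new Path(dir, "merged.seq"); // must not exist yet
            writeSorted(conf, a, first);
            writeSorted(conf, b, second);

            SequenceFile.Sorter sorter =
                new SequenceFile.Sorter(fs, IntWritable.class, Text.class, conf);
            sorter.merge(new Path[] {a, b}, out);

            // Read the merged keys back, reusing one key/value pair.
            List<Integer> keys = new ArrayList<>();
            IntWritable key = new IntWritable();
            Text value = new Text();
            try (SequenceFile.Reader reader =
                     new SequenceFile.Reader(conf, SequenceFile.Reader.file(out))) {
                while (reader.next(key, value)) {
                    keys.add(key.get());
                }
            }
            return keys;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

merge does not sort its inputs, so unsorted inputs should be sorted first (for example with sort). The same result should also be achievable with the lower-level methods documented above. Obtain a RawKeyValueIterator from merge(Path[], boolean, Path), create a writer with cloneFileAttributes, and pass both to writeFile. Close the writer afterwards.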