Package org.apache.hadoop.hdfs
Class DFSOutputStream
java.lang.Object
java.io.OutputStream
org.apache.hadoop.fs.FSOutputSummer
org.apache.hadoop.hdfs.DFSOutputStream
- All Implemented Interfaces:
Closeable, Flushable, AutoCloseable, org.apache.hadoop.fs.CanSetDropBehind, org.apache.hadoop.fs.StreamCapabilities, org.apache.hadoop.fs.Syncable
- Direct Known Subclasses:
DFSStripedOutputStream
@Private
public class DFSOutputStream
extends org.apache.hadoop.fs.FSOutputSummer
implements org.apache.hadoop.fs.Syncable, org.apache.hadoop.fs.CanSetDropBehind, org.apache.hadoop.fs.StreamCapabilities
DFSOutputStream creates files from a stream of bytes.
The client application writes data that is cached internally by
this stream. Data is broken up into packets, each typically 64K
in size. A packet comprises chunks; each chunk is typically 512
bytes and has an associated checksum. When the client application
fills up the currentPacket, it is enqueued into the dataQueue of
DataStreamer. DataStreamer is a thread that picks up packets from
the dataQueue and sends them to the first datanode in the pipeline.
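The sizing described above can be sketched with a small calculation. The constants below are illustrative assumptions mirroring common HDFS defaults (512-byte chunks, 4-byte CRC checksums, a ~64K packet body); the real values come from configuration and DataChecksum, not from this class's constants.

```java
// Sketch of the packet/chunk sizing described above (assumed defaults).
public class PacketSizing {
    static final int BYTES_PER_CHECKSUM = 512; // typical chunk payload
    static final int CHECKSUM_SIZE = 4;        // CRC32/CRC32C checksum is 4 bytes
    static final int BODY_SIZE = 64 * 1024;    // typical packet body budget

    public static void main(String[] args) {
        // Each chunk on the wire carries its payload plus its checksum.
        int chunkSize = BYTES_PER_CHECKSUM + CHECKSUM_SIZE;
        // How many whole chunks fit in one packet (cf. computePacketChunkSize).
        int chunksPerPacket = Math.max(BODY_SIZE / chunkSize, 1);
        int packetSize = chunkSize * chunksPerPacket;
        System.out.println(chunksPerPacket + " chunks, " + packetSize + " bytes");
    }
}
```

With these assumed defaults a packet holds 127 chunks, slightly under the 64K budget.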
-
Nested Class Summary
Nested classes/interfaces inherited from interface org.apache.hadoop.fs.StreamCapabilities
org.apache.hadoop.fs.StreamCapabilities.StreamCapability -
Field Summary
Fields
protected final long blockSize
protected final ByteArrayManager byteArrayManager
protected final int bytesPerChecksum
protected final AtomicReference<CachingStrategy> cachingStrategy
protected int chunksPerPacket
protected volatile boolean closed
protected DFSPacket currentPacket
protected final DFSClient dfsClient
protected final long fileId
protected long initialFileSize
protected long lastFlushOffset
protected int packetSize
protected boolean shouldSyncBlock
protected final String src
protected org.apache.hadoop.hdfs.DataStreamer streamer
Fields inherited from interface org.apache.hadoop.fs.StreamCapabilities
ABORTABLE_STREAM, DROPBEHIND, HFLUSH, HSYNC, IOSTATISTICS, IOSTATISTICS_CONTEXT, PREADBYTEBUFFER, READAHEAD, READBYTEBUFFER, UNBUFFER, VECTOREDIO, VECTOREDIO_BUFFERS_SLICED -
Constructor Summary
Constructors
protected DFSOutputStream(DFSClient dfsClient, String src, HdfsFileStatus stat, EnumSet<org.apache.hadoop.fs.CreateFlag> flag, org.apache.hadoop.util.Progressable progress, org.apache.hadoop.util.DataChecksum checksum, String[] favoredNodes, boolean createStreamer)
Construct a new output stream for creating a file. -
Method Summary
Modifier and Type, Method, Description
protected void adjustChunkBoundary()
If the reopened file did not end at a chunk boundary and the preceding write filled up its partial chunk.
protected void checkClosed()
void close()
Closes this output stream and releases any system resources associated with this stream.
protected void closeImpl()
protected void closeThreads(boolean force)
protected void completeFile(ExtendedBlock last)
protected void computePacketChunkSize(int psize, int csize)
protected DFSPacket createPacket(int packetSize, int chunksPerPkt, long offsetInBlock, long seqno, boolean lastPacketInBlock)
Use ByteArrayManager to create buffer for non-heartbeat packets.
protected org.apache.hadoop.tracing.TraceScope createWriteTraceScope()
protected void flushInternal()
Waits till all existing data is flushed and confirmations received from datanodes.
protected long flushInternalWithoutWaitingAck()
protected EnumSet<AddBlockFlag> getAddBlockFlags()
int getCurrentBlockReplication()
Note that this is not a public API; use HdfsDataOutputStream.getCurrentBlockReplication() instead.
org.apache.hadoop.fs.FileEncryptionInfo getFileEncryptionInfo()
long getFileId()
long getInitialLen()
Returns the size of a file as it was when this stream was opened.
int getNumCurrentReplicas()
Deprecated.
protected org.apache.hadoop.hdfs.DataStreamer getStreamer()
Returns the data streamer object.
boolean hasCapability(String capability)
void hflush()
Flushes out to all replicas of the block.
void hsync()
void hsync(EnumSet<HdfsDataOutputStream.SyncFlag> syncFlags)
The expected semantics are that all data has been flushed out to all replicas and all replicas have done a posix fsync equivalent.
protected void recoverLease(boolean recoverLeaseOnCloseException)
If recoverLeaseOnCloseException is true and an exception occurs when closing a file, recover the lease.
void setArtificialSlowdown(long period)
void setChunksPerPacket(int value)
void setDropBehind(Boolean dropBehind)
protected void start()
String toString()
protected void writeChunk(byte[] b, int offset, int len, byte[] checksum, int ckoff, int cklen)
protected void writeChunk(ByteBuffer buffer, int len, byte[] checksum, int ckoff, int cklen)
Methods inherited from class org.apache.hadoop.fs.FSOutputSummer
convertToByteStream, flush, flushBuffer, flushBuffer, getBufferedDataSize, getChecksumSize, getDataChecksum, resetChecksumBufSize, setChecksumBufSize, write, write
Methods inherited from class java.io.OutputStream
nullOutputStream, write
-
Field Details
-
dfsClient
protected final DFSClient dfsClient
-
byteArrayManager
protected final ByteArrayManager byteArrayManager
-
closed
protected volatile boolean closed -
src
protected final String src
-
fileId
protected final long fileId -
blockSize
protected final long blockSize -
bytesPerChecksum
protected final int bytesPerChecksum -
currentPacket
protected DFSPacket currentPacket
-
streamer
protected org.apache.hadoop.hdfs.DataStreamer streamer -
packetSize
protected int packetSize -
chunksPerPacket
protected int chunksPerPacket -
lastFlushOffset
protected long lastFlushOffset -
initialFileSize
protected long initialFileSize -
shouldSyncBlock
protected boolean shouldSyncBlock -
cachingStrategy
protected final AtomicReference<CachingStrategy> cachingStrategy
-
-
Constructor Details
-
DFSOutputStream
protected DFSOutputStream(DFSClient dfsClient, String src, HdfsFileStatus stat, EnumSet<org.apache.hadoop.fs.CreateFlag> flag, org.apache.hadoop.util.Progressable progress, org.apache.hadoop.util.DataChecksum checksum, String[] favoredNodes, boolean createStreamer)
Construct a new output stream for creating a file.
-
-
Method Details
-
createPacket
protected DFSPacket createPacket(int packetSize, int chunksPerPkt, long offsetInBlock, long seqno, boolean lastPacketInBlock) throws InterruptedIOException
Use ByteArrayManager to create buffer for non-heartbeat packets.
- Throws:
InterruptedIOException
-
checkClosed
- Specified by:
checkClosed in class org.apache.hadoop.fs.FSOutputSummer
- Throws:
IOException
-
getPipeline
-
computePacketChunkSize
protected void computePacketChunkSize(int psize, int csize) -
createWriteTraceScope
protected org.apache.hadoop.tracing.TraceScope createWriteTraceScope()
- Overrides:
createWriteTraceScope in class org.apache.hadoop.fs.FSOutputSummer
-
writeChunk
protected void writeChunk(byte[] b, int offset, int len, byte[] checksum, int ckoff, int cklen) throws IOException
- Specified by:
writeChunk in class org.apache.hadoop.fs.FSOutputSummer
- Throws:
IOException
-
writeChunk
protected void writeChunk(ByteBuffer buffer, int len, byte[] checksum, int ckoff, int cklen) throws IOException
- Throws:
IOException
-
adjustChunkBoundary
protected void adjustChunkBoundary()
If the reopened file did not end at a chunk boundary and the preceding write filled up its partial chunk, tell the summer to generate full crc chunks from now on. -
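As a rough illustration of the partial-chunk case this method handles, the arithmetic below shows how a file reopened for append at a length that is not chunk-aligned leaves a partial chunk for the next write to fill; the offset used here is a hypothetical example, not an HDFS constant.

```java
// Sketch of the partial-chunk arithmetic behind adjustChunkBoundary().
public class ChunkBoundary {
    public static void main(String[] args) {
        final int bytesPerChecksum = 512;   // assumed chunk payload size
        long appendOffset = 1300;           // hypothetical reopened-file length
        // Bytes already sitting in the last, partial chunk.
        long partial = appendOffset % bytesPerChecksum;
        // Bytes the next write must supply before full chunks resume.
        long toFill = (partial == 0) ? 0 : bytesPerChecksum - partial;
        System.out.println(partial + " bytes in partial chunk, " + toFill + " to fill");
    }
}
```

Once the partial chunk is filled, the summer can go back to producing full-size crc chunks.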
hasCapability
- Specified by:
hasCapability in interface org.apache.hadoop.fs.StreamCapabilities
- Overrides:
hasCapability in class org.apache.hadoop.fs.FSOutputSummer
-
hflush
Flushes out to all replicas of the block. The data is in the buffers of the DNs but not necessarily in the DNs' OS buffers. It is a synchronous operation. When it returns, it guarantees that flushed data becomes visible to new readers. It is not guaranteed that data has been flushed to persistent store on the datanode. Block allocations are persisted on the namenode.
- Specified by:
hflush in interface org.apache.hadoop.fs.Syncable
- Throws:
IOException
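A minimal usage sketch of the visibility guarantee above, assuming a reachable HDFS cluster and the public FileSystem API (DFSOutputStream itself is @Private and is normally reached through FSDataOutputStream; the path used here is hypothetical):

```java
// Sketch only: requires a running HDFS cluster and Hadoop client jars.
import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HflushExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        try (FSDataOutputStream out = fs.create(new Path("/tmp/hflush-demo"))) {
            out.write("hello".getBytes(StandardCharsets.UTF_8));
            // After hflush() returns, new readers can see the bytes, but the
            // data may not yet be on persistent storage at the datanodes.
            out.hflush();
        }
    }
}
```

Use hsync() instead when durability, not just visibility, is required.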
-
hsync
- Specified by:
hsync in interface org.apache.hadoop.fs.Syncable
- Throws:
IOException
-
hsync
The expected semantics are that all data has been flushed out to all replicas and all replicas have done a posix fsync equivalent, i.e. the OS has flushed it to the disk device (but the disk may have it in its cache). Note that only the current block is flushed to the disk device. To guarantee durable sync across block boundaries the stream should be created with CreateFlag.SYNC_BLOCK.
- Parameters:
syncFlags - Indicates the semantics of the sync. Currently used to specify whether or not to update the block length in the NameNode.
- Throws:
IOException
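A hedged sketch of the flagged variant, assuming a live cluster whose default FileSystem is HDFS (so the stream can be cast to HdfsDataOutputStream); the path is hypothetical, and for durability across block boundaries the file would additionally need to be created with CreateFlag.SYNC_BLOCK via the long-form create() overload:

```java
// Sketch only: requires a running HDFS cluster and Hadoop client jars.
import java.util.EnumSet;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.client.HdfsDataOutputStream;
import org.apache.hadoop.hdfs.client.HdfsDataOutputStream.SyncFlag;

public class HsyncExample {
    public static void main(String[] args) throws Exception {
        FileSystem fs = FileSystem.get(new Configuration());
        try (FSDataOutputStream out = fs.create(new Path("/tmp/hsync-demo"))) {
            out.write(new byte[] {1, 2, 3});
            // Force replicas to fsync the current block and ask the
            // NameNode to record the new file length.
            ((HdfsDataOutputStream) out).hsync(EnumSet.of(SyncFlag.UPDATE_LENGTH));
        }
    }
}
```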
-
getNumCurrentReplicas
Deprecated. Use HdfsDataOutputStream.getCurrentBlockReplication() instead.
- Throws:
IOException
-
getCurrentBlockReplication
Note that this is not a public API; use HdfsDataOutputStream.getCurrentBlockReplication() instead.
- Returns:
- the number of valid replicas of the current block
- Throws:
IOException
-
flushInternal
Waits till all existing data is flushed and confirmations received from datanodes.
- Throws:
IOException
-
start
protected void start() -
closeThreads
- Throws:
IOException
-
close
Closes this output stream and releases any system resources associated with this stream.
- Specified by:
close in interface AutoCloseable
- Specified by:
close in interface Closeable
- Overrides:
close in class OutputStream
- Throws:
IOException
-
closeImpl
- Throws:
IOException
-
recoverLease
protected void recoverLease(boolean recoverLeaseOnCloseException)
If recoverLeaseOnCloseException is true and an exception occurs when closing a file, recover the lease. -
completeFile
- Throws:
IOException
-
setArtificialSlowdown
@VisibleForTesting public void setArtificialSlowdown(long period) -
setChunksPerPacket
@VisibleForTesting public void setChunksPerPacket(int value) -
getInitialLen
public long getInitialLen()
Returns the size of a file as it was when this stream was opened. -
getAddBlockFlags
-
getFileEncryptionInfo
public org.apache.hadoop.fs.FileEncryptionInfo getFileEncryptionInfo()- Returns:
- the FileEncryptionInfo for this stream, or null if not encrypted.
-
flushInternalWithoutWaitingAck
- Throws:
IOException
-
setDropBehind
- Specified by:
setDropBehind in interface org.apache.hadoop.fs.CanSetDropBehind
- Throws:
IOException
-
getFileId
@VisibleForTesting public long getFileId() -
getNamespace
-
getUniqKey
-
getStreamer
protected org.apache.hadoop.hdfs.DataStreamer getStreamer()
Returns the data streamer object. -
toString
-