Class DFSOutputStream

java.lang.Object
java.io.OutputStream
org.apache.hadoop.fs.FSOutputSummer
org.apache.hadoop.hdfs.DFSOutputStream
All Implemented Interfaces:
Closeable, Flushable, AutoCloseable, org.apache.hadoop.fs.CanSetDropBehind, org.apache.hadoop.fs.StreamCapabilities, org.apache.hadoop.fs.Syncable
Direct Known Subclasses:
DFSStripedOutputStream

@Private public class DFSOutputStream extends org.apache.hadoop.fs.FSOutputSummer implements org.apache.hadoop.fs.Syncable, org.apache.hadoop.fs.CanSetDropBehind, org.apache.hadoop.fs.StreamCapabilities
DFSOutputStream creates files from a stream of bytes. The client application writes data that is cached internally by this stream. Data is broken up into packets, each typically 64K in size. A packet comprises chunks. Each chunk is typically 512 bytes and has an associated checksum. When a client application fills up the currentPacket, it is enqueued into the dataQueue of DataStreamer. DataStreamer is a thread that picks up packets from the dataQueue and sends them to the first datanode in the pipeline.
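The packet and chunk flow described above can be sketched with standard-library types only. This is a simplified model under stated assumptions, not the actual implementation: `PacketPipelineSketch`, its `Packet` class, and the `headerLen` parameter (33 in the example) are hypothetical names and values chosen for illustration, CRC32 stands in for whichever checksum the stream is configured with, and the real DFSPacket and DataStreamer additionally handle acknowledgements, heartbeats, and pipeline recovery.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.zip.CRC32;

/** Illustrative model only; not the real DFSPacket or DataStreamer. */
class PacketPipelineSketch {
    /** Hypothetical stand-in for a packet: a group of chunks plus their checksums. */
    static final class Packet {
        final long seqno;
        final boolean last;
        final List<byte[]> chunks = new ArrayList<>();
        final List<Long> checksums = new ArrayList<>();
        Packet(long seqno, boolean last) { this.seqno = seqno; this.last = last; }
    }

    /** Chunks that fit in one packet after subtracting an assumed header length. */
    static int chunksPerPacket(int packetSize, int bytesPerChecksum,
                               int checksumSize, int headerLen) {
        int bodySize = packetSize - headerLen;
        int chunkSize = bytesPerChecksum + checksumSize;
        return Math.max(bodySize / chunkSize, 1);
    }

    /** Splits data into checksummed chunks and enqueues each packet once it is full. */
    static void write(byte[] data, int bytesPerChecksum, int chunksPerPacket,
                      BlockingQueue<Packet> dataQueue) throws InterruptedException {
        long seqno = 0;
        Packet current = new Packet(seqno++, false);
        for (int off = 0; off < data.length; off += bytesPerChecksum) {
            int len = Math.min(bytesPerChecksum, data.length - off);
            byte[] chunk = new byte[len];
            System.arraycopy(data, off, chunk, 0, len);
            CRC32 crc = new CRC32();
            crc.update(chunk, 0, len);
            current.chunks.add(chunk);
            current.checksums.add(crc.getValue());
            if (current.chunks.size() == chunksPerPacket) {
                dataQueue.put(current);               // packet is full: hand it over
                current = new Packet(seqno++, false);
            }
        }
        if (!current.chunks.isEmpty()) {
            dataQueue.put(current);                   // trailing partial packet
        }
        dataQueue.put(new Packet(seqno, true));       // empty end-of-stream marker
    }

    /** Drains the queue on a separate thread, as a streamer would; returns bytes seen. */
    static long stream(BlockingQueue<Packet> dataQueue) throws InterruptedException {
        long[] total = {0};
        Thread streamer = new Thread(() -> {
            try {
                while (true) {
                    Packet p = dataQueue.take();
                    if (p.last) break;
                    for (byte[] c : p.chunks) total[0] += c.length;
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        streamer.start();
        streamer.join();
        return total[0];
    }

    public static void main(String[] args) throws Exception {
        BlockingQueue<Packet> queue = new LinkedBlockingQueue<>();
        // 64K packet, 512-byte chunks, 4-byte checksums, assumed 33-byte header.
        int cpp = chunksPerPacket(64 * 1024, 512, 4, 33);
        write(new byte[200_000], 512, cpp, queue);
        System.out.println(cpp + " chunks per packet, streamed " + stream(queue) + " bytes");
    }
}
```

The queue decouples the writer from the network sender: the writer only blocks on filling packets, while the consumer thread drains them in sequence order.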
  • Field Details

    • dfsClient

      protected final DFSClient dfsClient
    • byteArrayManager

      protected final ByteArrayManager byteArrayManager
    • closed

      protected volatile boolean closed
    • src

      protected final String src
    • fileId

      protected final long fileId
    • blockSize

      protected final long blockSize
    • bytesPerChecksum

      protected final int bytesPerChecksum
    • currentPacket

      protected DFSPacket currentPacket
    • streamer

      protected org.apache.hadoop.hdfs.DataStreamer streamer
    • packetSize

      protected int packetSize
    • chunksPerPacket

      protected int chunksPerPacket
    • lastFlushOffset

      protected long lastFlushOffset
    • initialFileSize

      protected long initialFileSize
    • shouldSyncBlock

      protected boolean shouldSyncBlock
    • cachingStrategy

      protected final AtomicReference<CachingStrategy> cachingStrategy
  • Constructor Details

    • DFSOutputStream

      protected DFSOutputStream(DFSClient dfsClient, String src, HdfsFileStatus stat, EnumSet<org.apache.hadoop.fs.CreateFlag> flag, org.apache.hadoop.util.Progressable progress, org.apache.hadoop.util.DataChecksum checksum, String[] favoredNodes, boolean createStreamer)
      Construct a new output stream for creating a file.
  • Method Details

    • createPacket

      protected DFSPacket createPacket(int packetSize, int chunksPerPkt, long offsetInBlock, long seqno, boolean lastPacketInBlock) throws InterruptedIOException
      Uses ByteArrayManager to create the buffer for non-heartbeat packets.
      Throws:
      InterruptedIOException
    • checkClosed

      protected void checkClosed() throws IOException
      Specified by:
      checkClosed in class org.apache.hadoop.fs.FSOutputSummer
      Throws:
      IOException
    • getPipeline

      @VisibleForTesting public DatanodeInfo[] getPipeline()
    • computePacketChunkSize

      protected void computePacketChunkSize(int psize, int csize)
    • createWriteTraceScope

      protected org.apache.hadoop.tracing.TraceScope createWriteTraceScope()
      Overrides:
      createWriteTraceScope in class org.apache.hadoop.fs.FSOutputSummer
    • writeChunk

      protected void writeChunk(byte[] b, int offset, int len, byte[] checksum, int ckoff, int cklen) throws IOException
      Specified by:
      writeChunk in class org.apache.hadoop.fs.FSOutputSummer
      Throws:
      IOException
    • writeChunk

      protected void writeChunk(ByteBuffer buffer, int len, byte[] checksum, int ckoff, int cklen) throws IOException
      Throws:
      IOException
    • adjustChunkBoundary

      protected void adjustChunkBoundary()
      If the reopened file did not end at a chunk boundary and the preceding write filled up its partial chunk, tells the summer to generate full CRC chunks from now on.
    • hasCapability

      public boolean hasCapability(String capability)
      Specified by:
      hasCapability in interface org.apache.hadoop.fs.StreamCapabilities
      Overrides:
      hasCapability in class org.apache.hadoop.fs.FSOutputSummer
    • hflush

      public void hflush() throws IOException
      Flushes buffered data out to all replicas of the block. The data is in the buffers of the datanodes (DNs) but not necessarily in the DNs' OS buffers. This is a synchronous operation: when it returns, the flushed data is guaranteed to be visible to new readers, but it is not guaranteed to have been flushed to persistent storage on the datanodes. Block allocations are persisted on the namenode.
      Specified by:
      hflush in interface org.apache.hadoop.fs.Syncable
      Throws:
      IOException
    • hsync

      public void hsync() throws IOException
      Specified by:
      hsync in interface org.apache.hadoop.fs.Syncable
      Throws:
      IOException
    • hsync

      public void hsync(EnumSet<HdfsDataOutputStream.SyncFlag> syncFlags) throws IOException
      The expected semantics are that all data has been flushed out to all replicas and all replicas have done the equivalent of a POSIX fsync, i.e. the OS has flushed the data to the disk device (but the disk may still hold it in its cache). Note that only the current block is flushed to the disk device. To guarantee durable sync across block boundaries, the stream should be created with CreateFlag.SYNC_BLOCK.
      Parameters:
      syncFlags - Indicates the semantics of the sync. Currently used to specify whether or not to update the block length in the NameNode.
      Throws:
      IOException
    • getNumCurrentReplicas

      @Deprecated public int getNumCurrentReplicas() throws IOException
      Throws:
      IOException
    • getCurrentBlockReplication

      public int getCurrentBlockReplication() throws IOException
      Note that this is not a public API; use HdfsDataOutputStream.getCurrentBlockReplication() instead.
      Returns:
      the number of valid replicas of the current block
      Throws:
      IOException
    • flushInternal

      protected void flushInternal() throws IOException
      Waits until all existing data is flushed and confirmations are received from the datanodes.
      Throws:
      IOException
    • start

      protected void start()
    • closeThreads

      protected void closeThreads(boolean force) throws IOException
      Throws:
      IOException
    • close

      public void close() throws IOException
      Closes this output stream and releases any system resources associated with this stream.
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
      Overrides:
      close in class OutputStream
      Throws:
      IOException
    • closeImpl

      protected void closeImpl() throws IOException
      Throws:
      IOException
    • recoverLease

      protected void recoverLease(boolean recoverLeaseOnCloseException)
      If recoverLeaseOnCloseException is true and an exception occurs when closing a file, recovers the lease.
    • completeFile

      protected void completeFile(ExtendedBlock last) throws IOException
      Throws:
      IOException
    • setArtificialSlowdown

      @VisibleForTesting public void setArtificialSlowdown(long period)
    • setChunksPerPacket

      @VisibleForTesting public void setChunksPerPacket(int value)
    • getInitialLen

      public long getInitialLen()
      Returns the size of the file as it was when this stream was opened.
    • getAddBlockFlags

      protected EnumSet<AddBlockFlag> getAddBlockFlags()
    • getFileEncryptionInfo

      public org.apache.hadoop.fs.FileEncryptionInfo getFileEncryptionInfo()
      Returns:
      the FileEncryptionInfo for this stream, or null if not encrypted.
    • flushInternalWithoutWaitingAck

      protected long flushInternalWithoutWaitingAck() throws IOException
      Throws:
      IOException
    • setDropBehind

      public void setDropBehind(Boolean dropBehind) throws IOException
      Specified by:
      setDropBehind in interface org.apache.hadoop.fs.CanSetDropBehind
      Throws:
      IOException
    • getFileId

      @VisibleForTesting public long getFileId()
    • getNamespace

      @VisibleForTesting public String getNamespace()
    • getUniqKey

      @VisibleForTesting public String getUniqKey()
    • getStreamer

      protected org.apache.hadoop.hdfs.DataStreamer getStreamer()
      Returns the data streamer object.
    • toString

      public String toString()
      Overrides:
      toString in class Object
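
The difference between hflush and hsync described in the method details above suggests a common calling pattern: flush after every record so new readers can see it, and sync less often because it is the more expensive guarantee. The sketch below shows only that pattern, under stated assumptions. `SyncPatternSketch`, `SyncableLike`, `RecordingStream`, and `writeLog` are hypothetical names; `SyncableLike` is a local stand-in mirroring the `hflush()` and `hsync()` methods of org.apache.hadoop.fs.Syncable so the example compiles without Hadoop, and the fake stream records calls instead of performing any I/O.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Illustrative calling pattern only; it does not talk to HDFS. */
class SyncPatternSketch {
    /** Hypothetical stand-in for the hflush/hsync pair of Syncable. */
    interface SyncableLike {
        void hflush() throws IOException;
        void hsync() throws IOException;
    }

    /** In-memory fake that records each call with the byte count written so far. */
    static final class RecordingStream extends ByteArrayOutputStream implements SyncableLike {
        final List<String> calls = new ArrayList<>();
        @Override public void hflush() { calls.add("hflush@" + size()); }
        @Override public void hsync() { calls.add("hsync@" + size()); }
    }

    /**
     * Writes one record per line. Every record is hflush-ed (visibility to new
     * readers); every syncEvery-th record is also hsync-ed (fsync-equivalent on
     * the replicas), and a final hsync runs once all records are written.
     */
    static <S extends OutputStream & SyncableLike> void writeLog(
            S out, List<String> records, int syncEvery) throws IOException {
        int written = 0;
        for (String record : records) {
            out.write((record + "\n").getBytes(StandardCharsets.UTF_8));
            out.hflush();
            if (++written % syncEvery == 0) {
                out.hsync();
            }
        }
        out.hsync();
    }

    public static void main(String[] args) throws IOException {
        RecordingStream out = new RecordingStream();
        writeLog(out, Arrays.asList("a", "b", "c"), 2);
        System.out.println(out.calls);
    }
}
```

In real client code the stream would typically be the FSDataOutputStream returned by the file system, which exposes `hflush()` and `hsync()`; how often to sync is an application-level trade-off between durability and throughput.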