Class DFSInputStream

java.lang.Object
java.io.InputStream
org.apache.hadoop.fs.FSInputStream
org.apache.hadoop.hdfs.DFSInputStream
All Implemented Interfaces:
Closeable, AutoCloseable, org.apache.hadoop.fs.ByteBufferPositionedReadable, org.apache.hadoop.fs.ByteBufferReadable, org.apache.hadoop.fs.CanSetDropBehind, org.apache.hadoop.fs.CanSetReadahead, org.apache.hadoop.fs.CanUnbuffer, org.apache.hadoop.fs.HasEnhancedByteBufferAccess, org.apache.hadoop.fs.PositionedReadable, org.apache.hadoop.fs.Seekable, org.apache.hadoop.fs.StreamCapabilities
Direct Known Subclasses:
DFSStripedInputStream

@Private public class DFSInputStream extends org.apache.hadoop.fs.FSInputStream implements org.apache.hadoop.fs.ByteBufferReadable, org.apache.hadoop.fs.CanSetDropBehind, org.apache.hadoop.fs.CanSetReadahead, org.apache.hadoop.fs.HasEnhancedByteBufferAccess, org.apache.hadoop.fs.CanUnbuffer, org.apache.hadoop.fs.StreamCapabilities, org.apache.hadoop.fs.ByteBufferPositionedReadable
DFSInputStream provides bytes from a named file. It handles negotiation with the namenode and the various datanodes as necessary.
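A DFSInputStream is normally obtained through FileSystem.open(Path) rather than constructed directly. Because it extends java.io.InputStream, the standard read loop applies; a minimal sketch of that loop, using a ByteArrayInputStream as a stand-in for a real HDFS stream:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadLoop {
    // Drain any InputStream (a DFSInputStream in real use) into a byte array.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        int n;
        // read(byte[], int, int) returns -1 at end of stream.
        while ((n = in.read(buf, 0, buf.length)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }
}
```

The same loop works unchanged against a DFSInputStream, since all of the HDFS-specific behavior (block location lookup, datanode failover) is hidden behind the InputStream contract.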
  • Field Details

    • tcpReadsDisabledForTesting

      @VisibleForTesting public static boolean tcpReadsDisabledForTesting
    • dfsClient

      protected final DFSClient dfsClient
    • closed

      protected AtomicBoolean closed
    • src

      protected final String src
    • verifyChecksum

      protected final boolean verifyChecksum
    • currentLocatedBlock

      protected LocatedBlock currentLocatedBlock
    • pos

      protected long pos
    • blockEnd

      protected long blockEnd
    • locatedBlocks

      protected LocatedBlocks locatedBlocks
    • cachingStrategy

      protected CachingStrategy cachingStrategy
    • readStatistics

      protected final ReadStatistics readStatistics
    • infoLock

      protected final Object infoLock
    • failures

      protected int failures
      This variable tracks the number of failures since the start of the most recent user-facing operation; it should therefore be reset whenever the user makes a call on this stream. If at any point during the retry logic the failure count exceeds a threshold, the errors are thrown back to the operation. Specifically, it counts the number of times the client has gone back to the namenode to get a new list of block locations, and is capped at maxBlockAcquireFailures.
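The retry pattern described for this counter can be sketched in plain Java. The cap name mirrors maxBlockAcquireFailures from the description above; the fetcher interface and the rest are a hypothetical simplification, not the actual implementation:

```java
import java.io.IOException;

public class BlockAcquireRetry {
    // Stand-in for the configured maxBlockAcquireFailures cap.
    static final int MAX_BLOCK_ACQUIRE_FAILURES = 3;

    // Simplified stand-in for "read, refetching block locations on failure".
    interface LocationFetcher { boolean tryRead(); }

    // The failure counter is reset at the start of each user-facing
    // operation; once it reaches the cap, the error is thrown back.
    static int readWithRetries(LocationFetcher fetcher) throws IOException {
        int failures = 0; // reset per user-facing call
        while (true) {
            if (fetcher.tryRead()) {
                return failures;
            }
            failures++;
            if (failures >= MAX_BLOCK_ACQUIRE_FAILURES) {
                throw new IOException("Could not obtain block: exceeded "
                        + MAX_BLOCK_ACQUIRE_FAILURES + " location refetches");
            }
            // In the real client, this is where a fresh block-location
            // list would be fetched from the namenode before retrying.
        }
    }
}
```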
  • Method Details

    • addToLocalDeadNodes

      protected void addToLocalDeadNodes(DatanodeInfo dnInfo)
    • removeFromLocalDeadNodes

      protected void removeFromLocalDeadNodes(DatanodeInfo dnInfo)
    • getLocalDeadNodes

      protected ConcurrentHashMap<DatanodeInfo,DatanodeInfo> getLocalDeadNodes()
    • getDFSClient

      protected DFSClient getDFSClient()
    • getFileLength

      public long getFileLength()
    • getCurrentDatanode

      public DatanodeInfo getCurrentDatanode()
      Returns the datanode from which the stream is currently reading.
    • getCurrentBlock

      public ExtendedBlock getCurrentBlock()
      Returns the block containing the target position.
    • getAllBlocks

      public List<LocatedBlock> getAllBlocks() throws IOException
      Return the collection of blocks that have already been located.
      Throws:
      IOException
    • getSrc

      protected String getSrc()
    • getLocatedBlocks

      protected LocatedBlocks getLocatedBlocks()
    • getBlockAt

      protected LocatedBlock getBlockAt(long offset) throws IOException
      Get block at the specified position. Fetch it from the namenode if not cached.
      Parameters:
      offset - block corresponding to this offset in file is returned
      Returns:
      located block
      Throws:
      IOException
    • fetchBlockAt

      protected LocatedBlock fetchBlockAt(long offset) throws IOException
      Fetch a block from the namenode and cache it.
      Throws:
      IOException
    • getBlockReader

      protected BlockReader getBlockReader(LocatedBlock targetBlock, long offsetInBlock, long length, InetSocketAddress targetAddr, org.apache.hadoop.fs.StorageType storageType, DatanodeInfo datanode) throws IOException
      Throws:
      IOException
    • close

      public void close() throws IOException
      Close it down!
      Specified by:
      close in interface AutoCloseable
      Specified by:
      close in interface Closeable
      Overrides:
      close in class InputStream
      Throws:
      IOException
    • read

      public int read() throws IOException
      Specified by:
      read in class InputStream
      Throws:
      IOException
    • readWithStrategy

      protected int readWithStrategy(org.apache.hadoop.hdfs.ReaderStrategy strategy) throws IOException
      Throws:
      IOException
    • getCurrentBlockLocationsLength

      protected int getCurrentBlockLocationsLength()
    • read

      public int read(@Nonnull byte[] buf, int off, int len) throws IOException
      Read up to len bytes into the given buffer, starting at offset off.
      Overrides:
      read in class InputStream
      Throws:
      IOException
    • read

      public int read(ByteBuffer buf) throws IOException
      Specified by:
      read in interface org.apache.hadoop.fs.ByteBufferReadable
      Throws:
      IOException
    • getBestNodeDNAddrPair

      protected org.apache.hadoop.hdfs.DFSInputStream.DNAddrPair getBestNodeDNAddrPair(LocatedBlock block, Collection<DatanodeInfo> ignoredNodes)
      Get the best node from which to stream the data.
      Parameters:
      block - LocatedBlock, containing nodes in priority order.
      ignoredNodes - Do not choose nodes in this collection (may be null)
      Returns:
      The DNAddrPair of the best node. Null if no node can be chosen.
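The selection described above (nodes in priority order, skipping ignored and locally dead nodes, null when exhausted) reduces to a short scan. The string node type and method name below are hypothetical simplifications of LocatedBlock and DNAddrPair:

```java
import java.util.Collection;
import java.util.List;
import java.util.Set;

public class BestNodeChooser {
    // Return the first (highest-priority) node that is neither ignored nor
    // known-dead, or null if every candidate is excluded.
    static String chooseBestNode(List<String> nodesInPriorityOrder,
                                 Collection<String> ignoredNodes,
                                 Set<String> localDeadNodes) {
        for (String node : nodesInPriorityOrder) {
            if (ignoredNodes != null && ignoredNodes.contains(node)) continue;
            if (localDeadNodes.contains(node)) continue;
            return node;
        }
        return null; // caller falls back to refetching block locations
    }
}
```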
    • reportLostBlock

      protected void reportLostBlock(LocatedBlock lostBlock, Collection<DatanodeInfo> ignoredNodes)
      Warn the user of a lost block.
    • fetchBlockByteRange

      protected void fetchBlockByteRange(LocatedBlock block, long start, long end, ByteBuffer buf, DFSUtilClient.CorruptedBlocks corruptedBlocks, Map<InetSocketAddress,List<IOException>> exceptionMap) throws IOException
      Throws:
      IOException
    • refreshLocatedBlock

      protected LocatedBlock refreshLocatedBlock(LocatedBlock block) throws IOException
      Refresh cached block locations.
      Parameters:
      block - The currently cached block locations
      Returns:
      Refreshed block locations
      Throws:
      IOException
    • getHedgedReadOpsLoopNumForTesting

      @VisibleForTesting public long getHedgedReadOpsLoopNumForTesting()
    • tokenRefetchNeeded

      protected static boolean tokenRefetchNeeded(IOException ex, InetSocketAddress targetAddr)
      Should the block access token be refetched on an exception?
      Parameters:
      ex - Exception received
      targetAddr - Target datanode address from where exception was received
      Returns:
      true if the block access token has expired or is invalid and should be refetched
    • read

      public int read(long position, byte[] buffer, int offset, int length) throws IOException
      Read bytes starting from the specified position.
      Specified by:
      read in interface org.apache.hadoop.fs.PositionedReadable
      Overrides:
      read in class org.apache.hadoop.fs.FSInputStream
      Parameters:
      position - start read from this position
      buffer - read buffer
      offset - offset into buffer
      length - number of bytes to read
      Returns:
      actual number of bytes read
      Throws:
      IOException
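This positioned read follows the PositionedReadable contract: it reads at an absolute offset and must not move the stream's current position. The same contract can be illustrated with a plain NIO FileChannel, whose read(ByteBuffer, long) is likewise position-independent:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PositionedReadDemo {
    // Perform one positioned read and report {bytes read, stream position
    // after}. Like DFSInputStream.read(long, byte[], int, int), the read
    // happens at an absolute offset and leaves the current position alone.
    static long[] preadAt(Path file, long offset, int len) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            ByteBuffer buf = ByteBuffer.allocate(len);
            int n = ch.read(buf, offset); // absolute read; position untouched
            return new long[] { n, ch.position() };
        }
    }
}
```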
    • reportCheckSumFailure

      protected void reportCheckSumFailure(DFSUtilClient.CorruptedBlocks corruptedBlocks, int dataNodeCount, boolean isStriped)
      DFSInputStream reports checksum failures. For replicated blocks, the logic is as follows:
      Case I: the client has tried multiple datanodes and at least one attempt has succeeded. We report the other failures as corrupted blocks to the namenode.
      Case II: the client has tried all datanodes, but every attempt failed. We report only if the total number of replicas is 1; otherwise we do not report, since the failures may be due to the client itself being unable to read.
      For erasure-coded blocks, each block in corruptedBlockMap is an internal block of a block group, and there is usually only one DataNode corresponding to each internal block. In that case we simply report the corrupted blocks to the NameNode and skip the logic above.
      Parameters:
      corruptedBlocks - map of corrupted blocks
      dataNodeCount - number of data nodes that host the block replicas
      isStriped - whether the blocks are erasure-coded (striped)
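The replicated-block rules above reduce to a small predicate. This is a hypothetical restatement in plain Java, not the actual method, which also forwards the corrupted-block list to the namenode:

```java
public class ChecksumReportPolicy {
    // For replicated blocks: report corrupt replicas when at least one
    // datanode succeeded (Case I), or when every attempt failed but there
    // was only a single replica to begin with (Case II). Erasure-coded
    // (striped) internal blocks are always reported.
    static boolean shouldReport(boolean anyAttemptSucceeded,
                                int dataNodeCount,
                                boolean isStriped) {
        if (isStriped) {
            return true;
        }
        return anyAttemptSucceeded || dataNodeCount == 1;
    }
}
```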
    • skip

      public long skip(long n) throws IOException
      Overrides:
      skip in class InputStream
      Throws:
      IOException
    • seek

      public void seek(long targetPos) throws IOException
      Seek to a new arbitrary location.
      Specified by:
      seek in interface org.apache.hadoop.fs.Seekable
      Specified by:
      seek in class org.apache.hadoop.fs.FSInputStream
      Throws:
      IOException
    • seekToNewSource

      public boolean seekToNewSource(long targetPos) throws IOException
      Seek to given position on a node other than the current node. If a node other than the current node is found, then returns true. If another node could not be found, then returns false.
      Specified by:
      seekToNewSource in interface org.apache.hadoop.fs.Seekable
      Specified by:
      seekToNewSource in class org.apache.hadoop.fs.FSInputStream
      Throws:
      IOException
    • getPos

      public long getPos()
      Specified by:
      getPos in interface org.apache.hadoop.fs.Seekable
      Specified by:
      getPos in class org.apache.hadoop.fs.FSInputStream
    • available

      public int available() throws IOException
      Return the number of remaining available bytes if it is less than or equal to Integer.MAX_VALUE; otherwise, return Integer.MAX_VALUE.
      Overrides:
      available in class InputStream
      Throws:
      IOException
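The clamping rule above is simply a bounded conversion of the long remaining-byte count into an int; a one-line sketch:

```java
public class AvailableClamp {
    // available() must return an int, so the remaining-byte count
    // (a long, e.g. fileLength - pos) is capped at Integer.MAX_VALUE.
    static int clampToInt(long remaining) {
        return (int) Math.min(remaining, Integer.MAX_VALUE);
    }
}
```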
    • markSupported

      public boolean markSupported()
      We definitely don't support marks.
      Overrides:
      markSupported in class InputStream
    • mark

      public void mark(int readLimit)
      Overrides:
      mark in class InputStream
    • reset

      public void reset() throws IOException
      Overrides:
      reset in class InputStream
      Throws:
      IOException
    • read

      public int read(long position, ByteBuffer buf) throws IOException
      Specified by:
      read in interface org.apache.hadoop.fs.ByteBufferPositionedReadable
      Throws:
      IOException
    • readFully

      public void readFully(long position, ByteBuffer buf) throws IOException
      Specified by:
      readFully in interface org.apache.hadoop.fs.ByteBufferPositionedReadable
      Throws:
      IOException
    • getReadStatistics

      public ReadStatistics getReadStatistics()
      Get statistics about the reads which this DFSInputStream has done.
    • clearReadStatistics

      public void clearReadStatistics()
      Clear statistics about the reads which this DFSInputStream has done.
    • getFileEncryptionInfo

      public org.apache.hadoop.fs.FileEncryptionInfo getFileEncryptionInfo()
    • closeCurrentBlockReaders

      protected void closeCurrentBlockReaders()
    • setReadahead

      public void setReadahead(Long readahead) throws IOException
      Specified by:
      setReadahead in interface org.apache.hadoop.fs.CanSetReadahead
      Throws:
      IOException
    • setDropBehind

      public void setDropBehind(Boolean dropBehind) throws IOException
      Specified by:
      setDropBehind in interface org.apache.hadoop.fs.CanSetDropBehind
      Throws:
      IOException
    • read

      public ByteBuffer read(org.apache.hadoop.io.ByteBufferPool bufferPool, int maxLength, EnumSet<org.apache.hadoop.fs.ReadOption> opts) throws IOException, UnsupportedOperationException
      Specified by:
      read in interface org.apache.hadoop.fs.HasEnhancedByteBufferAccess
      Throws:
      IOException
      UnsupportedOperationException
    • releaseBuffer

      public void releaseBuffer(ByteBuffer buffer)
      Specified by:
      releaseBuffer in interface org.apache.hadoop.fs.HasEnhancedByteBufferAccess
    • unbuffer

      public void unbuffer()
      Specified by:
      unbuffer in interface org.apache.hadoop.fs.CanUnbuffer
    • hasCapability

      public boolean hasCapability(String capability)
      Specified by:
      hasCapability in interface org.apache.hadoop.fs.StreamCapabilities
    • maybeRegisterBlockRefresh

      protected void maybeRegisterBlockRefresh()
      Many DFSInputStreams can be opened and closed in quick succession, in which case they would be registered/deregistered but never need to be refreshed. Defers registering with the located block refresher, in order to avoid an additional source of unnecessary synchronization for short-lived DFSInputStreams.
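The deferred registration described above can be sketched as a one-time compareAndSet guard called lazily from the read path. The refresher interface here is a hypothetical stand-in, not the actual located block refresher API:

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class DeferredRegistration {
    // Stand-in for the periodic block-location refresher.
    interface Refresher { void register(DeferredRegistration stream); }

    private final AtomicBoolean registered = new AtomicBoolean(false);
    private final Refresher refresher;

    DeferredRegistration(Refresher refresher) {
        this.refresher = refresher;
    }

    // Called from the read path instead of eagerly at open time, so
    // short-lived streams that never read are never registered; the
    // compareAndSet guarantees at most one registration per stream.
    void maybeRegisterBlockRefresh() {
        if (registered.compareAndSet(false, true)) {
            refresher.register(this);
        }
    }
}
```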