Package org.apache.hadoop.hdfs
Class DFSInputStream
java.lang.Object
java.io.InputStream
org.apache.hadoop.fs.FSInputStream
org.apache.hadoop.hdfs.DFSInputStream
- All Implemented Interfaces:
Closeable, AutoCloseable, org.apache.hadoop.fs.ByteBufferPositionedReadable, org.apache.hadoop.fs.ByteBufferReadable, org.apache.hadoop.fs.CanSetDropBehind, org.apache.hadoop.fs.CanSetReadahead, org.apache.hadoop.fs.CanUnbuffer, org.apache.hadoop.fs.HasEnhancedByteBufferAccess, org.apache.hadoop.fs.PositionedReadable, org.apache.hadoop.fs.Seekable, org.apache.hadoop.fs.StreamCapabilities
- Direct Known Subclasses:
DFSStripedInputStream
@Private
public class DFSInputStream
extends org.apache.hadoop.fs.FSInputStream
implements org.apache.hadoop.fs.ByteBufferReadable, org.apache.hadoop.fs.CanSetDropBehind, org.apache.hadoop.fs.CanSetReadahead, org.apache.hadoop.fs.HasEnhancedByteBufferAccess, org.apache.hadoop.fs.CanUnbuffer, org.apache.hadoop.fs.StreamCapabilities, org.apache.hadoop.fs.ByteBufferPositionedReadable
DFSInputStream provides bytes from a named file. It handles
negotiation of the namenode and various datanodes as necessary.
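Because the class is annotated @Private, applications normally reach it indirectly: FileSystem.open on an HDFS path returns an FSDataInputStream that wraps a DFSInputStream. A minimal usage sketch (the configuration and path are placeholders, and a running HDFS cluster behind fs.defaultFS is assumed):

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OpenExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // fs.defaultFS must point at an HDFS cluster for a DFSInputStream
    // to be created under the hood; the path below is a placeholder.
    try (FileSystem fs = FileSystem.get(conf);
         FSDataInputStream in = fs.open(new Path("/tmp/example.txt"))) {
      byte[] buf = new byte[4096];
      int n = in.read(buf);   // streaming read; advances getPos()
      in.seek(0);             // DFSInputStream supports arbitrary seek
    }
  }
}
```

The wrapper delegates seek, positioned reads, unbuffer, and the other capabilities listed above to the underlying DFSInputStream.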
-
Nested Class Summary
Nested classes/interfaces inherited from interface org.apache.hadoop.fs.StreamCapabilities
org.apache.hadoop.fs.StreamCapabilities.StreamCapability -
Field Summary
Fields
- protected long blockEnd
- protected CachingStrategy cachingStrategy
- protected AtomicBoolean closed
- protected LocatedBlock currentLocatedBlock
- protected final DFSClient dfsClient
- protected int failures : This variable tracks the number of failures since the start of the most recent user-facing operation.
- protected final Object infoLock
- protected LocatedBlocks locatedBlocks
- protected long pos
- protected final ReadStatistics readStatistics
- protected final String src
- static boolean tcpReadsDisabledForTesting
- protected final boolean verifyChecksum
Fields inherited from interface org.apache.hadoop.fs.StreamCapabilities
ABORTABLE_STREAM, DROPBEHIND, HFLUSH, HSYNC, IOSTATISTICS, IOSTATISTICS_CONTEXT, PREADBYTEBUFFER, READAHEAD, READBYTEBUFFER, UNBUFFER, VECTOREDIO, VECTOREDIO_BUFFERS_SLICED -
Method Summary
Modifier and Type / Method / Description
- protected void addToLocalDeadNodes(DatanodeInfo dnInfo)
- int available() : Return the size of the remaining available bytes if the size is less than or equal to Integer.MAX_VALUE; otherwise, return Integer.MAX_VALUE.
- void clearReadStatistics() : Clear statistics about the reads which this DFSInputStream has done.
- void close() : Close it down!
- protected void closeCurrentBlockReaders()
- protected LocatedBlock fetchBlockAt(long offset) : Fetch a block from the namenode and cache it.
- protected void fetchBlockByteRange(LocatedBlock block, long start, long end, ByteBuffer buf, DFSUtilClient.CorruptedBlocks corruptedBlocks, Map<InetSocketAddress, List<IOException>> exceptionMap)
- List<LocatedBlock> getAllBlocks() : Return the collection of blocks that have already been located.
- protected org.apache.hadoop.hdfs.DFSInputStream.DNAddrPair getBestNodeDNAddrPair(LocatedBlock block, Collection<DatanodeInfo> ignoredNodes) : Get the best node from which to stream the data.
- protected LocatedBlock getBlockAt(long offset) : Get block at the specified position.
- protected BlockReader getBlockReader(LocatedBlock targetBlock, long offsetInBlock, long length, InetSocketAddress targetAddr, org.apache.hadoop.fs.StorageType storageType, DatanodeInfo datanode)
- ExtendedBlock getCurrentBlock() : Returns the block containing the target position.
- protected int getCurrentBlockLocationsLength()
- DatanodeInfo getCurrentDatanode() : Returns the datanode from which the stream is currently reading.
- protected DFSClient getDFSClient()
- org.apache.hadoop.fs.FileEncryptionInfo getFileEncryptionInfo()
- long getFileLength()
- long getHedgedReadOpsLoopNumForTesting()
- protected ConcurrentHashMap<DatanodeInfo,DatanodeInfo> getLocalDeadNodes()
- protected LocatedBlocks getLocatedBlocks()
- long getPos()
- ReadStatistics getReadStatistics() : Get statistics about the reads which this DFSInputStream has done.
- protected String getSrc()
- boolean hasCapability(String capability)
- void mark(int readLimit)
- boolean markSupported() : We definitely don't support marks.
- protected void maybeRegisterBlockRefresh() : Many DFSInputStreams can be opened and closed in quick succession, in which case they would be registered/deregistered but never need to be refreshed.
- int read()
- int read(byte[] buf, int off, int len) : Read the entire buffer.
- int read(long position, byte[] buffer, int offset, int length) : Read bytes starting from the specified position.
- int read(long position, ByteBuffer buf)
- int read(ByteBuffer buf)
- ByteBuffer read(org.apache.hadoop.io.ByteBufferPool bufferPool, int maxLength, EnumSet<org.apache.hadoop.fs.ReadOption> opts)
- void readFully(long position, ByteBuffer buf)
- protected int readWithStrategy(org.apache.hadoop.hdfs.ReaderStrategy strategy)
- protected LocatedBlock refreshLocatedBlock(LocatedBlock block) : Refresh cached block locations.
- void releaseBuffer(ByteBuffer buffer)
- protected void removeFromLocalDeadNodes(DatanodeInfo dnInfo)
- protected void reportCheckSumFailure(DFSUtilClient.CorruptedBlocks corruptedBlocks, int dataNodeCount, boolean isStriped) : DFSInputStream reports checksum failure.
- protected void reportLostBlock(LocatedBlock lostBlock, Collection<DatanodeInfo> ignoredNodes) : Warn the user of a lost block.
- void reset()
- void seek(long targetPos) : Seek to a new arbitrary location.
- boolean seekToNewSource(long targetPos) : Seek to given position on a node other than the current node.
- void setDropBehind(Boolean dropBehind)
- void setReadahead(Long readahead)
- long skip(long n)
- protected static boolean tokenRefetchNeeded(IOException ex, InetSocketAddress targetAddr) : Should the block access token be refetched on an exception?
- void unbuffer()
Methods inherited from class org.apache.hadoop.fs.FSInputStream
readFully, readFully, toString, validatePositionedReadArgsMethods inherited from class java.io.InputStream
nullInputStream, read, readAllBytes, readNBytes, readNBytes, skipNBytes, transferToMethods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, wait, wait, waitMethods inherited from interface org.apache.hadoop.fs.PositionedReadable
maxReadSizeForVectorReads, minSeekForVectorReads, readVectored, readVectored
-
Field Details
-
tcpReadsDisabledForTesting
@VisibleForTesting public static boolean tcpReadsDisabledForTesting -
dfsClient
-
closed
-
src
-
verifyChecksum
protected final boolean verifyChecksum -
currentLocatedBlock
-
pos
protected long pos -
blockEnd
protected long blockEnd -
locatedBlocks
-
cachingStrategy
-
readStatistics
-
infoLock
-
failures
protected int failures
This variable tracks the number of failures since the start of the most recent user-facing operation. That is to say, it should be reset whenever the user makes a call on this stream, and if at any point during the retry logic the failure count exceeds a threshold, the errors will be thrown back to the operation. Specifically, this counts the number of times the client has gone back to the namenode to get a new list of block locations, and is capped at maxBlockAcquireFailures.
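The reset/threshold contract described above can be sketched in plain Java (an illustrative class, not Hadoop's actual implementation; the name maxBlockAcquireFailures follows the text):

```java
// Illustrative sketch of the failure-counting contract described above;
// not the actual DFSInputStream code.
class FailureCounter {
    private final int maxBlockAcquireFailures;
    private int failures;

    FailureCounter(int maxBlockAcquireFailures) {
        this.maxBlockAcquireFailures = maxBlockAcquireFailures;
    }

    // Called at the start of each user-facing operation.
    void startOperation() {
        failures = 0;
    }

    // Called each time the client goes back to the namenode for a
    // fresh list of block locations; throws once the cap is exceeded.
    void recordRefetch() throws java.io.IOException {
        failures++;
        if (failures > maxBlockAcquireFailures) {
            throw new java.io.IOException(
                "Could not obtain block after " + failures + " attempts");
        }
    }
}
```

A fresh user call resets the counter, so only retries within a single operation accumulate toward the cap.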
-
-
Method Details
-
addToLocalDeadNodes
-
removeFromLocalDeadNodes
-
getLocalDeadNodes
-
getDFSClient
-
getFileLength
public long getFileLength() -
getCurrentDatanode
Returns the datanode from which the stream is currently reading. -
getCurrentBlock
Returns the block containing the target position. -
getAllBlocks
Return the collection of blocks that have already been located.- Throws:
IOException
-
getSrc
-
getLocatedBlocks
-
getBlockAt
Get block at the specified position. Fetch it from the namenode if not cached.- Parameters:
offset - block corresponding to this offset in file is returned- Returns:
- located block
- Throws:
IOException
-
fetchBlockAt
Fetch a block from the namenode and cache it.- Throws:
IOException
-
getBlockReader
protected BlockReader getBlockReader(LocatedBlock targetBlock, long offsetInBlock, long length, InetSocketAddress targetAddr, org.apache.hadoop.fs.StorageType storageType, DatanodeInfo datanode) throws IOException - Throws:
IOException
-
close
Close it down!- Specified by:
close in interface AutoCloseable- Specified by:
close in interface Closeable- Overrides:
close in class InputStream- Throws:
IOException
-
read
- Specified by:
read in class InputStream- Throws:
IOException
-
readWithStrategy
- Throws:
IOException
-
getCurrentBlockLocationsLength
protected int getCurrentBlockLocationsLength() -
read
Read the entire buffer.- Overrides:
read in class InputStream- Throws:
IOException
-
read
- Specified by:
read in interface org.apache.hadoop.fs.ByteBufferReadable- Throws:
IOException
-
getBestNodeDNAddrPair
protected org.apache.hadoop.hdfs.DFSInputStream.DNAddrPair getBestNodeDNAddrPair(LocatedBlock block, Collection<DatanodeInfo> ignoredNodes)
Get the best node from which to stream the data.- Parameters:
block - LocatedBlock, containing nodes in priority order.
ignoredNodes - Do not choose nodes in this array (may be null)- Returns:
- The DNAddrPair of the best node. Null if no node can be chosen.
-
reportLostBlock
Warn the user of a lost block. -
fetchBlockByteRange
protected void fetchBlockByteRange(LocatedBlock block, long start, long end, ByteBuffer buf, DFSUtilClient.CorruptedBlocks corruptedBlocks, Map<InetSocketAddress, List<IOException>> exceptionMap) throws IOException- Throws:
IOException
-
refreshLocatedBlock
Refresh cached block locations.- Parameters:
block - The currently cached block locations- Returns:
- Refreshed block locations
- Throws:
IOException
-
getHedgedReadOpsLoopNumForTesting
@VisibleForTesting public long getHedgedReadOpsLoopNumForTesting() -
tokenRefetchNeeded
Should the block access token be refetched on an exception?- Parameters:
ex - Exception received
targetAddr - Target datanode address from where the exception was received- Returns:
- true if the block access token has expired or is invalid and should be refetched
-
read
Read bytes starting from the specified position.- Specified by:
read in interface org.apache.hadoop.fs.PositionedReadable- Overrides:
read in class org.apache.hadoop.fs.FSInputStream- Parameters:
position - start read from this position
buffer - read buffer
offset - offset into buffer
length - number of bytes to read- Returns:
- actual number of bytes read
- Throws:
IOException
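The positioned-read contract above (a read at an absolute offset that leaves the stream position untouched) is the same one java.nio's FileChannel offers, which lets it be demonstrated without an HDFS cluster:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Illustrates the PositionedReadable contract with plain java.nio:
// a positioned read does not move the channel's current position,
// just as DFSInputStream.read(position, ...) leaves getPos() unchanged.
public class PreadDemo {
    public static String preadDemo() throws IOException {
        Path tmp = Files.createTempFile("pread", ".bin");
        try {
            Files.write(tmp, "hello world".getBytes(StandardCharsets.US_ASCII));
            try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
                ByteBuffer buf = ByteBuffer.allocate(5);
                ch.read(buf, 6);                // positioned read at offset 6
                long posAfter = ch.position();  // still 0: position untouched
                return new String(buf.array(), StandardCharsets.US_ASCII)
                    + ":" + posAfter;
            }
        } finally {
            Files.delete(tmp);
        }
    }
}
```

Like FileChannel's positioned read, DFSInputStream's may return fewer bytes than requested; the readFully variants inherited from FSInputStream loop until the buffer is filled.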
-
reportCheckSumFailure
protected void reportCheckSumFailure(DFSUtilClient.CorruptedBlocks corruptedBlocks, int dataNodeCount, boolean isStriped)
DFSInputStream reports checksum failure. For replicated blocks, we have the following logic:
Case I: the client has tried multiple data nodes and at least one of the attempts has succeeded. We report the other failures as corrupted blocks to the namenode.
Case II: the client has tried all data nodes and every attempt failed. We only report if the total number of replicas is 1. We do not report otherwise, since the failures may be due to the client itself being unable to read.
For erasure-coded blocks, each block in corruptedBlockMap is an internal block in a block group, and there is usually only one DataNode corresponding to each internal block. In that case we simply report the corrupted blocks to the NameNode and skip the logic above.- Parameters:
corruptedBlocks - map of corrupted blocks
dataNodeCount - number of data nodes that contain the block replicas
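The Case I/Case II logic above can be condensed into a small decision function (an illustrative sketch, not the actual Hadoop code; the method and parameter names are assumptions):

```java
// Sketch of the reporting decision described above (illustrative only):
// returns true when corrupted replicas should be reported to the NameNode.
class ChecksumReportPolicy {
    static boolean shouldReport(boolean isStriped,
                                boolean anyNodeSucceeded,
                                int dataNodeCount) {
        if (isStriped) {
            // Erasure-coded: each corrupted entry is an internal block with
            // (usually) a single DataNode behind it, so always report.
            return true;
        }
        if (anyNodeSucceeded) {
            // Case I: at least one replica was readable, so the failures
            // point at genuinely corrupt replicas.
            return true;
        }
        // Case II: every replica failed; only report when there is a single
        // replica, since otherwise the client itself may be unable to read.
        return dataNodeCount == 1;
    }
}
```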
-
skip
- Overrides:
skip in class InputStream- Throws:
IOException
-
seek
Seek to a new arbitrary location.- Specified by:
seek in interface org.apache.hadoop.fs.Seekable- Specified by:
seek in class org.apache.hadoop.fs.FSInputStream- Throws:
IOException
-
seekToNewSource
Seek to given position on a node other than the current node. If a node other than the current node is found, then returns true. If another node could not be found, then returns false.- Specified by:
seekToNewSource in interface org.apache.hadoop.fs.Seekable- Specified by:
seekToNewSource in class org.apache.hadoop.fs.FSInputStream- Throws:
IOException
-
getPos
public long getPos()- Specified by:
getPos in interface org.apache.hadoop.fs.Seekable- Specified by:
getPos in class org.apache.hadoop.fs.FSInputStream
-
available
Return the size of the remaining available bytes if the size is less than or equal to Integer.MAX_VALUE; otherwise, return Integer.MAX_VALUE.- Overrides:
available in class InputStream- Throws:
IOException
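The clamping described above can be shown in plain Java (an illustrative sketch, not the actual implementation): the remaining byte count is a long that may exceed what an int can hold, so it is capped at Integer.MAX_VALUE.

```java
// Sketch of the available() clamping contract: the remaining byte count
// is computed in long arithmetic and capped at Integer.MAX_VALUE.
class AvailableClamp {
    static int available(long fileLength, long pos) {
        long remaining = fileLength - pos;
        return (int) Math.min(remaining, Integer.MAX_VALUE);
    }
}
```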
-
markSupported
public boolean markSupported()
We definitely don't support marks.- Overrides:
markSupported in class InputStream
-
mark
public void mark(int readLimit) - Overrides:
mark in class InputStream
-
reset
- Overrides:
reset in class InputStream- Throws:
IOException
-
read
- Specified by:
read in interface org.apache.hadoop.fs.ByteBufferPositionedReadable- Throws:
IOException
-
readFully
- Specified by:
readFully in interface org.apache.hadoop.fs.ByteBufferPositionedReadable- Throws:
IOException
-
getReadStatistics
Get statistics about the reads which this DFSInputStream has done. -
clearReadStatistics
public void clearReadStatistics()
Clear statistics about the reads which this DFSInputStream has done. -
getFileEncryptionInfo
public org.apache.hadoop.fs.FileEncryptionInfo getFileEncryptionInfo() -
closeCurrentBlockReaders
protected void closeCurrentBlockReaders() -
setReadahead
- Specified by:
setReadahead in interface org.apache.hadoop.fs.CanSetReadahead- Throws:
IOException
-
setDropBehind
- Specified by:
setDropBehind in interface org.apache.hadoop.fs.CanSetDropBehind- Throws:
IOException
-
read
public ByteBuffer read(org.apache.hadoop.io.ByteBufferPool bufferPool, int maxLength, EnumSet<org.apache.hadoop.fs.ReadOption> opts) throws IOException, UnsupportedOperationException - Specified by:
read in interface org.apache.hadoop.fs.HasEnhancedByteBufferAccess- Throws:
IOException
UnsupportedOperationException
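A hedged usage sketch of this enhanced byte-buffer read through the public FSDataInputStream wrapper (the path is a placeholder and a running HDFS cluster is assumed). Every buffer obtained this way must be handed back through releaseBuffer:

```java
import java.nio.ByteBuffer;
import java.util.EnumSet;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.ReadOption;
import org.apache.hadoop.io.ElasticByteBufferPool;

public class EnhancedReadExample {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    ElasticByteBufferPool pool = new ElasticByteBufferPool();
    try (FSDataInputStream in = fs.open(new Path("/tmp/example.txt"))) {
      ByteBuffer buf = in.read(pool, 1024 * 1024,
          EnumSet.of(ReadOption.SKIP_CHECKSUMS));
      if (buf != null) {       // null signals end of stream
        try {
          // consume buf ...
        } finally {
          in.releaseBuffer(buf);  // always return the buffer
        }
      }
    }
  }
}
```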
-
releaseBuffer
- Specified by:
releaseBuffer in interface org.apache.hadoop.fs.HasEnhancedByteBufferAccess
-
unbuffer
public void unbuffer()- Specified by:
unbuffer in interface org.apache.hadoop.fs.CanUnbuffer
-
hasCapability
- Specified by:
hasCapability in interface org.apache.hadoop.fs.StreamCapabilities
-
maybeRegisterBlockRefresh
protected void maybeRegisterBlockRefresh()
Many DFSInputStreams can be opened and closed in quick succession, in which case they would be registered/deregistered but never need to be refreshed. Defers registering with the located block refresher in order to avoid an additional source of unnecessary synchronization for short-lived DFSInputStreams.
-