Class DatanodeManager

java.lang.Object
org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager

@Private @Evolving public class DatanodeManager extends Object
Manages datanodes, including decommissioning and other activities.
  • Method Details

    • initSlowPeerTracker

      public void initSlowPeerTracker(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.util.Timer timer, boolean dataNodePeerStatsEnabled)
      Determines whether the slow peer tracker should be enabled. If dataNodePeerStatsEnabled is true, the slow peer tracker is initialized.
      Parameters:
      conf - The configuration to use while initializing slowPeerTracker.
      timer - Timer object for slowPeerTracker.
      dataNodePeerStatsEnabled - Whether slow peer tracking should be enabled.
    • restartSlowPeerCollector

      public void restartSlowPeerCollector(long interval)
    • getNetworkTopology

      public org.apache.hadoop.net.NetworkTopology getNetworkTopology()
      Returns:
      the network topology.
    • getDatanodeAdminManager

      @VisibleForTesting public DatanodeAdminManager getDatanodeAdminManager()
    • getHostConfigManager

      public HostConfigManager getHostConfigManager()
    • setHeartbeatExpireInterval

      @VisibleForTesting public void setHeartbeatExpireInterval(long expiryMs)
    • getFSClusterStats

      @VisibleForTesting public FSClusterStats getFSClusterStats()
    • getBlockInvalidateLimit

      @VisibleForTesting public int getBlockInvalidateLimit()
    • getDatanodeStatistics

      public DatanodeStatistics getDatanodeStatistics()
      Returns:
      the datanode statistics.
    • setAvoidSlowDataNodesForReadEnabled

      public void setAvoidSlowDataNodesForReadEnabled(boolean enable)
    • getEnableAvoidSlowDataNodesForRead

      @VisibleForTesting public boolean getEnableAvoidSlowDataNodesForRead()
    • setMaxSlowpeerCollectNodes

      public void setMaxSlowpeerCollectNodes(int maxNodes)
    • getMaxSlowpeerCollectNodes

      @VisibleForTesting public int getMaxSlowpeerCollectNodes()
    • sortLocatedBlocks

      public void sortLocatedBlocks(String targetHost, List<org.apache.hadoop.hdfs.protocol.LocatedBlock> locatedBlocks)
      Sort the non-striped located blocks by distance to the target host. For striped blocks, only decommissioned/decommissioning/stale/slow nodes are moved to the bottom. For example, assume the storage list d0, d1, d2, d3, d4, d5, d6, d7, d8, d9 maps to block indices 0, 1, 2, 3, 4, 5, 6, 7, 8, 2. Here the internal block b2 is duplicated, located on both d2 and d9. If d2 is a decommissioning node, d2 and d9 should be switched in the storage list. After sorting the locations, the corresponding block indices and block tokens are updated as well.
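The reordering described above can be sketched with plain arrays. `moveBadNodesToBottom`, the storage-name strings, and the `badNodes` set below are hypothetical stand-ins for `DatanodeStorageInfo` objects and the decommissioned/decommissioning/stale/slow check; the actual implementation sorts with a comparator and also rebuilds block tokens.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class StripedSortSketch {
    // Hypothetical sketch: move "bad" storages (decommissioned/decommissioning/
    // stale/slow) to the bottom, keeping each storage's block index aligned.
    public static void moveBadNodesToBottom(String[] storages, int[] blockIndices,
                                            Set<String> badNodes) {
        int tail = storages.length - 1;
        for (int i = storages.length - 1; i >= 0; i--) {
            if (badNodes.contains(storages[i])) {
                // Swap the storage and its block index into the current tail slot.
                String s = storages[i]; storages[i] = storages[tail]; storages[tail] = s;
                int b = blockIndices[i]; blockIndices[i] = blockIndices[tail]; blockIndices[tail] = b;
                tail--;
            }
        }
    }

    public static void main(String[] args) {
        // The example from the javadoc: b2 is duplicated on d2 and d9,
        // and d2 is decommissioning.
        String[] storages = {"d0","d1","d2","d3","d4","d5","d6","d7","d8","d9"};
        int[] indices     = { 0,  1,  2,  3,  4,  5,  6,  7,  8,  2 };
        moveBadNodesToBottom(storages, indices, new HashSet<>(Set.of("d2")));
        // d2 and d9 are switched: [d0, d1, d9, d3, d4, d5, d6, d7, d8, d2]
        System.out.println(Arrays.toString(storages));
    }
}
```

With only d2 marked, this reproduces the documented swap of d2 and d9, and the block indices stay aligned with their storages throughout.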
    • getDatanodeByHost

      public DatanodeDescriptor getDatanodeByHost(String host)
      Returns:
      the datanode descriptor for the host.
    • getDatanodeByHostName

      public DatanodeDescriptor getDatanodeByHostName(String hostname)
      Parameters:
      hostname - hostname of the datanode
      Returns:
      the datanode descriptor for the host.
    • getDatanodeByXferAddr

      public DatanodeDescriptor getDatanodeByXferAddr(String host, int xferPort)
      Returns:
      the datanode descriptor for the host.
    • getDatanodes

      public Set<DatanodeDescriptor> getDatanodes()
      Returns:
      the datanode descriptors for all nodes.
    • getHost2DatanodeMap

      public org.apache.hadoop.hdfs.server.blockmanagement.Host2NodesMap getHost2DatanodeMap()
      Returns:
      the Host2NodesMap
    • getDatanode

      public DatanodeDescriptor getDatanode(String datanodeUuid)
      Get a datanode descriptor given the corresponding datanode UUID.
    • getDatanode

      public DatanodeDescriptor getDatanode(org.apache.hadoop.hdfs.protocol.DatanodeID nodeID) throws UnregisteredNodeException
      Get a datanode descriptor by datanode ID.
      Parameters:
      nodeID - datanode ID
      Returns:
      DatanodeDescriptor or null if the node is not found.
      Throws:
      UnregisteredNodeException
    • getDatanodeStorageInfos

      public DatanodeStorageInfo[] getDatanodeStorageInfos(org.apache.hadoop.hdfs.protocol.DatanodeID[] datanodeID, String[] storageIDs, String format, Object... args) throws UnregisteredNodeException
      Throws:
      UnregisteredNodeException
    • removeDatanode

      public void removeDatanode(org.apache.hadoop.hdfs.protocol.DatanodeID node) throws UnregisteredNodeException
      Remove a datanode
      Throws:
      UnregisteredNodeException
    • getDatanodesSoftwareVersions

      public HashMap<String,Integer> getDatanodesSoftwareVersions()
    • resolveNetworkLocation

      public List<String> resolveNetworkLocation(List<String> names)
      Resolve network locations for the specified hosts.
      Returns:
      the network locations if available; otherwise null.
    • registerDatanode

      public void registerDatanode(DatanodeRegistration nodeReg) throws DisallowedDatanodeException, UnresolvedTopologyException
      Register the given datanode with the namenode. NB: the given registration is mutated and given back to the datanode.
      Parameters:
      nodeReg - the datanode registration
      Throws:
      DisallowedDatanodeException - if the registration request is denied because the datanode does not match includes/excludes
      UnresolvedTopologyException - if the registration request is denied because resolving datanode network location fails.
    • refreshNodes

      public void refreshNodes(org.apache.hadoop.conf.Configuration conf) throws IOException
      Rereads conf to get the hosts and exclude-list file names, rereads those files to update the hosts and exclude lists, and checks whether any of the hosts have changed state.
      Throws:
      IOException
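The state changes after a refresh can be illustrated with a hypothetical classifier over the reread include/exclude sets. `classify` and its return labels are illustrative, not part of this API; the one real semantic assumed is that in HDFS an empty include list admits all hosts.

```java
import java.util.Set;

public class HostListRefreshSketch {
    // Hypothetical sketch: classify a host after the include/exclude files are
    // reread. A host in excludes starts decommissioning; a host absent from a
    // non-empty include list is no longer allowed to register.
    public static String classify(String host, Set<String> includes, Set<String> excludes) {
        if (excludes.contains(host)) {
            return "DECOMMISSIONING";
        }
        if (!includes.isEmpty() && !includes.contains(host)) {
            return "DISALLOWED";
        }
        return "NORMAL";
    }
}
```

For instance, a node newly added to the exclude file would classify as DECOMMISSIONING on the next refresh, while one dropped from a non-empty include file would classify as DISALLOWED.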
    • getNumLiveDataNodes

      public int getNumLiveDataNodes()
      Returns:
      the number of live datanodes.
    • getNumDeadDataNodes

      public int getNumDeadDataNodes()
      Returns:
      the number of dead datanodes.
    • getNumOfDataNodes

      public int getNumOfDataNodes()
      Returns:
      the number of datanodes.
    • getDecommissioningNodes

      public List<DatanodeDescriptor> getDecommissioningNodes()
      Returns:
      list of datanodes where decommissioning is in progress.
    • getEnteringMaintenanceNodes

      public List<DatanodeDescriptor> getEnteringMaintenanceNodes()
      Returns:
      list of datanodes that are entering maintenance.
    • shouldAvoidStaleDataNodesForWrite

      public boolean shouldAvoidStaleDataNodesForWrite()
      Whether stale datanodes should be avoided as targets on the write path. The result of this function may change if the number of stale datanodes eclipses a configurable threshold.
      Returns:
      whether stale datanodes should be avoided on the write path
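The description implies a ratio check: stale nodes are avoided as write targets only while their count stays under a configured fraction of live nodes. A minimal sketch, assuming a hypothetical `staleRatioLimit` parameter loosely modeled on `dfs.namenode.write.stale.datanode.ratio`; the real check also requires the avoid-stale-for-write feature to be enabled.

```java
public class StaleWritePolicySketch {
    // Illustrative: avoid stale datanodes for writes only while the number of
    // stale nodes stays at or below a configured fraction of live nodes
    // (cf. dfs.namenode.write.stale.datanode.ratio). Names are hypothetical.
    public static boolean shouldAvoidStaleForWrite(int numStaleNodes, int numLiveNodes,
                                                   double staleRatioLimit) {
        if (numLiveNodes == 0) {
            return false; // no live nodes to choose among, so nothing to avoid
        }
        return numStaleNodes <= numLiveNodes * staleRatioLimit;
    }
}
```

This captures why the result "may change": as more nodes turn stale and the count eclipses the threshold, avoiding them all would leave too few targets, so the policy flips off.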
    • getBlocksPerPostponedMisreplicatedBlocksRescan

      public long getBlocksPerPostponedMisreplicatedBlocksRescan()
    • getHeartbeatInterval

      public long getHeartbeatInterval()
    • getHeartbeatRecheckInterval

      public long getHeartbeatRecheckInterval()
    • getNumStaleNodes

      public int getNumStaleNodes()
      Returns:
      the current number of stale DataNodes, as detected by the HeartbeatManager.
    • getNumStaleStorages

      public int getNumStaleStorages()
      Get the number of content-stale storages.
    • fetchDatanodes

      public void fetchDatanodes(List<DatanodeDescriptor> live, List<DatanodeDescriptor> dead, boolean removeDecommissionNode)
      Fetch live and dead datanodes.
    • getDatanodeListForReport

      public List<DatanodeDescriptor> getDatanodeListForReport(org.apache.hadoop.hdfs.protocol.HdfsConstants.DatanodeReportType type)
      For generating datanode reports
    • getAllSlowDataNodes

      public List<DatanodeDescriptor> getAllSlowDataNodes()
    • handleHeartbeat

      public DatanodeCommand[] handleHeartbeat(DatanodeRegistration nodeReg, org.apache.hadoop.hdfs.server.protocol.StorageReport[] reports, String blockPoolId, long cacheCapacity, long cacheUsed, int xceiverCount, int xmitsInProgress, int failedVolumes, VolumeFailureSummary volumeFailureSummary, @Nonnull org.apache.hadoop.hdfs.server.protocol.SlowPeerReports slowPeers, @Nonnull org.apache.hadoop.hdfs.server.protocol.SlowDiskReports slowDisks) throws IOException
      Handle heartbeat from datanodes.
      Throws:
      IOException
    • handleLifeline

      public void handleLifeline(DatanodeRegistration nodeReg, org.apache.hadoop.hdfs.server.protocol.StorageReport[] reports, long cacheCapacity, long cacheUsed, int xceiverCount, int failedVolumes, VolumeFailureSummary volumeFailureSummary) throws IOException
      Handles a lifeline message sent by a DataNode.
      Parameters:
      nodeReg - registration info for DataNode sending the lifeline
      reports - storage reports from DataNode
      cacheCapacity - cache capacity at DataNode
      cacheUsed - cache used at DataNode
      xceiverCount - estimated count of transfer threads running at DataNode
      failedVolumes - count of failed volumes at DataNode
      volumeFailureSummary - info on failed volumes at DataNode
      Throws:
      IOException - if there is an error
    • setBalancerBandwidth

      public void setBalancerBandwidth(long bandwidth) throws IOException
      Tell all datanodes to use a new, non-persistent bandwidth value for dfs.datanode.balance.bandwidthPerSec. A system administrator can tune the balancer bandwidth dynamically by running "dfsadmin -setBalancerBandwidth <newbandwidth>", which updates the 'bandwidth' variable with the new value for each node. Once the heartbeat command has been issued to update the value on a given datanode, the value is set back to 0.
      Parameters:
      bandwidth - Balancer bandwidth in bytes per second for all datanodes.
      Throws:
      IOException
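The "set back to 0 once the heartbeat command is issued" behavior is a one-shot pending value. A minimal sketch of that pattern with hypothetical names; in the real code the value is held per node rather than in one shared field.

```java
import java.util.concurrent.atomic.AtomicLong;

public class BalancerBandwidthSketch {
    // Illustrative one-shot pending value: set by the admin, consumed on the
    // next heartbeat, then reset to 0 so the command is sent only once.
    private final AtomicLong pendingBandwidth = new AtomicLong(0);

    public void setBalancerBandwidth(long bytesPerSecond) {
        pendingBandwidth.set(bytesPerSecond);
    }

    /** Returns the pending value once, then 0 on subsequent heartbeats. */
    public long consumeOnHeartbeat() {
        return pendingBandwidth.getAndSet(0);
    }
}
```

A zero pending value then means "no bandwidth command to piggyback on this heartbeat", which is why the field is reset after the command is issued.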
    • markAllDatanodesStaleAndSetKeyUpdateIfNeed

      public void markAllDatanodesStaleAndSetKeyUpdateIfNeed()
    • clearPendingQueues

      public void clearPendingQueues()
      Clear any actions that are queued up to be sent to the DNs on their next heartbeats. This includes block invalidations, recoveries, and replication requests.
    • resetLastCachingDirectiveSentTime

      public void resetLastCachingDirectiveSentTime()
      Reset the lastCachingDirectiveSentTimeMs field of all the DataNodes we know about.
    • toString

      public String toString()
      Overrides:
      toString in class Object
    • clearPendingCachingCommands

      public void clearPendingCachingCommands()
    • setShouldSendCachingCommands

      public void setShouldSendCachingCommands(boolean shouldSendCachingCommands)
    • setHeartbeatInterval

      public void setHeartbeatInterval(long intervalSeconds)
    • setHeartbeatRecheckInterval

      public void setHeartbeatRecheckInterval(int recheckInterval)
    • setBlockInvalidateLimit

      public void setBlockInvalidateLimit(int configuredBlockInvalidateLimit)
    • getSlowPeersReport

      public String getSlowPeersReport()
      Retrieve information about slow peers as JSON. Returns null if we are not tracking slow peers.
      Returns:
      slow peer information as a JSON string, or null if slow peers are not being tracked.
    • getSlowPeersUuidSet

      public Set<String> getSlowPeersUuidSet()
      Returns the UUIDs of all currently tracked slow peers.
      Returns:
      the set of datanode UUIDs tracked as slow peers.
    • getSlowNodesUuidSet

      public static Set<String> getSlowNodesUuidSet()
      Returns the UUIDs of all tracked slow datanodes.
      Returns:
      the set of UUIDs of datanodes tracked as slow.
    • getSlowPeerTracker

      @VisibleForTesting public SlowPeerTracker getSlowPeerTracker()
      Use only for testing.
    • getSlowDiskTracker

      @VisibleForTesting public SlowDiskTracker getSlowDiskTracker()
      Use only for testing.
    • addSlowPeers

      @VisibleForTesting public void addSlowPeers(String dnUuid)
    • getSlowDisksReport

      public String getSlowDisksReport()
      Retrieve information about slow disks as JSON. Returns null if we are not tracking slow disks.
      Returns:
      slow disk information as a JSON string, or null if slow disks are not being tracked.
    • getDatanodeStorageReport

      public org.apache.hadoop.hdfs.server.protocol.DatanodeStorageReport[] getDatanodeStorageReport(org.apache.hadoop.hdfs.protocol.HdfsConstants.DatanodeReportType type)
      Generates datanode reports for the given report type.
      Parameters:
      type - type of the datanode report
      Returns:
      array of DatanodeStorageReports
    • getDatanodeMap

      @VisibleForTesting public Map<String,DatanodeDescriptor> getDatanodeMap()
    • setMaxSlowPeersToReport

      public void setMaxSlowPeersToReport(int maxSlowPeersToReport)
    • isSlowPeerCollectorInitialized

      @VisibleForTesting public boolean isSlowPeerCollectorInitialized()
    • getSlowPeerCollectionInterval

      @VisibleForTesting public long getSlowPeerCollectionInterval()