Package org.apache.hadoop.hdfs.tools
Class DiskBalancerCLI
java.lang.Object
org.apache.hadoop.conf.Configured
org.apache.hadoop.hdfs.tools.DiskBalancerCLI
- All Implemented Interfaces:
org.apache.hadoop.conf.Configurable, org.apache.hadoop.util.Tool
public class DiskBalancerCLI
extends org.apache.hadoop.conf.Configured
implements org.apache.hadoop.util.Tool
DiskBalancer is a tool that ensures data is spread evenly across volumes of the same storage type.
For example, if you have 3 disks holding 100 GB, 600 GB, and 200 GB respectively, this tool will ensure that each disk holds 300 GB.
This tool can be run while datanodes are fully functional.
At a very high level, DiskBalancer computes a set of moves that will equalize disk utilization, and the datanode then executes those moves.
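The arithmetic behind the three-disk example can be sketched as follows. This is a minimal illustration, not the DiskBalancer planner: the class `BalanceSketch` and its methods are hypothetical names, and the sketch assumes all disks have the same capacity, so that "even" simply means an equal number of gigabytes on each disk.

```java
import java.util.Arrays;

// Hypothetical illustration only; not part of the Hadoop API.
final class BalanceSketch {

    // Even share of data per disk, in GB, assuming equal-capacity disks.
    static long idealPerDisk(long[] usedGb) {
        return Arrays.stream(usedGb).sum() / usedGb.length;
    }

    // Positive entries must shed data; negative entries must receive data.
    static long[] deltas(long[] usedGb) {
        long ideal = idealPerDisk(usedGb);
        return Arrays.stream(usedGb).map(u -> u - ideal).toArray();
    }

    public static void main(String[] args) {
        long[] used = {100, 600, 200};
        System.out.println("ideal  = " + idealPerDisk(used));
        System.out.println("deltas = " + Arrays.toString(deltas(used)));
    }
}
```

For the disks above, the even share is 300 GB: the 600 GB disk has to give up 300 GB, and the other two disks receive 200 GB and 100 GB.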
-
Field Summary
Fields
Modifier and Type Field - Description
static final String BANDWIDTH - Specifies the maximum disk bandwidth to use per second.
static final String BEFORE_TEMPLATE - Template for the Before File.
static final String CANCEL - Cancels a running plan.
static final int DEFAULT_TOP - Specifies the default top number of nodes to be processed.
static final String EXECUTE - Executes a given plan file on the target datanode.
static final String HELP - Help for the program.
static final String MAXERROR - Specifies the maximum errors to tolerate.
static final String NODE - Name or address of the node to execute against.
static final String OUTFILE - Output file name, for commands like report, plan etc.
static final String PLAN - Computes a plan for a given set of nodes.
static final String PLAN_TEMPLATE - Template for the plan file. It is node.plan.json.
static final int PLAN_VERSION
static final String QUERY - Reports the status of disk balancer operation.
static final String REPORT - The report command prints out a disk fragmentation report about the data cluster.
static final String SKIPDATECHECK - Skips the date check (by default, a plan is valid for 24 hours) and forces execution of the plan.
static final String THRESHOLD - Percentage of data unevenness that we are willing to live with.
static final String TOP - Specifies the top number of nodes to be processed.
static final String VERBOSE - Runs the command in verbose mode.
-
Constructor Summary
Constructors
Constructor - Description
DiskBalancerCLI(org.apache.hadoop.conf.Configuration conf) - Construct a DiskBalancer.
DiskBalancerCLI(org.apache.hadoop.conf.Configuration conf, PrintStream printStream)
-
Method Summary
Modifier and Type Method - Description
static org.apache.commons.cli.Options getCancelOptions() - Returns Cancel Options.
getCurrentCommand() - Gets current command associated with this instance of DiskBalancer.
static org.apache.commons.cli.Options getExecuteOptions() - Returns execute options.
static org.apache.commons.cli.Options getHelpOptions() - Returns help options.
static org.apache.commons.cli.Options getPlanOptions() - Returns Plan options.
static org.apache.commons.cli.Options getQueryOptions() - Returns Query Options.
static org.apache.commons.cli.Options getReportOptions() - Returns Report Options.
static void main(String[] argv) - Main for the DiskBalancer Command handling.
int run(String[] args) - Execute the command with the given arguments.
Methods inherited from class org.apache.hadoop.conf.Configured
getConf, setConf
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
Methods inherited from interface org.apache.hadoop.conf.Configurable
getConf, setConf
-
Field Details
-
PLAN
Computes a plan for a given set of nodes.
-
OUTFILE
Output file name, for commands like report, plan etc. This is an optional argument; by default, diskbalancer writes all its output to /system/reports/diskbalancer of the cluster it is operating against.
-
HELP
Help for the program.
-
THRESHOLD
Percentage of data unevenness that we are willing to live with. For example, a value of 10 indicates that a deviation of +/- 10% from the ideal storage target is acceptable.
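One way to read the threshold is as a tolerance band around the ideal target. The sketch below is illustrative only and is not the DiskBalancer implementation: `ThresholdSketch`, `withinThreshold`, and the use of plain GB figures are assumptions made for the example.

```java
// Hypothetical illustration only; not part of the Hadoop API.
final class ThresholdSketch {

    // True when `used` lies within +/- thresholdPercent of `ideal`.
    static boolean withinThreshold(double used, double ideal, double thresholdPercent) {
        double allowed = ideal * thresholdPercent / 100.0;
        return Math.abs(used - ideal) <= allowed;
    }

    public static void main(String[] args) {
        // With an ideal of 300 GB and a threshold of 10, the band is 270..330 GB.
        System.out.println(withinThreshold(320, 300, 10));
        System.out.println(withinThreshold(600, 300, 10));
    }
}
```

Under this reading, a disk holding 320 GB against an ideal of 300 GB would be left alone, while a disk holding 600 GB would need data moved off it.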
-
BANDWIDTH
Specifies the maximum disk bandwidth to use per second.
-
MAXERROR
Specifies the maximum errors to tolerate.
-
EXECUTE
Executes a given plan file on the target datanode.
-
SKIPDATECHECK
Skips the date check (by default, a plan is valid for 24 hours) and forces execution of the plan.
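The effect of this option can be sketched as a simple age check. The following is a hedged illustration, not the datanode's actual logic: `PlanAgeSketch` and `mayExecute` are hypothetical names, and treating a plan that is exactly 24 hours old as still valid is an assumption.

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical illustration only; not part of the Hadoop API.
final class PlanAgeSketch {

    // Validity window taken from the description above.
    static final Duration VALIDITY = Duration.ofHours(24);

    // A plan may run if the date check is skipped or the plan is still fresh.
    static boolean mayExecute(Instant planCreated, Instant now, boolean skipDateCheck) {
        if (skipDateCheck) {
            return true;
        }
        return Duration.between(planCreated, now).compareTo(VALIDITY) <= 0;
    }

    public static void main(String[] args) {
        Instant created = Instant.parse("2024-01-01T00:00:00Z");
        Instant later = created.plus(Duration.ofHours(25));
        System.out.println(mayExecute(created, later, false));
        System.out.println(mayExecute(created, later, true));
    }
}
```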
-
REPORT
The report command prints out a disk fragmentation report about the data cluster. By default it prints the names of the DEFAULT_TOP machines with the highest nodeDataDensity (DiskBalancerDataNode#getNodeDataDensity) values. These are the nodes that deviate most from the ideal data distribution.
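Selecting the nodes to report can be thought of as a "top N by density" query. The sketch below uses only the Java standard library and is not the report command's code: `TopNodesSketch`, `topByDensity`, and the sample node names and density values are all made up for illustration.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical illustration only; not part of the Hadoop API.
final class TopNodesSketch {

    // Returns the names of the `top` nodes with the highest density, highest first.
    static List<String> topByDensity(Map<String, Double> densityByNode, int top) {
        return densityByNode.entrySet().stream()
                .sorted(Map.Entry.<String, Double>comparingByValue(Comparator.reverseOrder()))
                .limit(top)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Made-up density values for three hypothetical datanodes.
        Map<String, Double> densities = Map.of("dn1", 0.1, "dn2", 0.9, "dn3", 0.5);
        System.out.println(topByDensity(densities, 2));
    }
}
```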
-
TOP
Specifies the top number of nodes to be processed.
-
DEFAULT_TOP
public static final int DEFAULT_TOP
Specifies the default top number of nodes to be processed.
-
NODE
Name or address of the node to execute against.
-
VERBOSE
Runs the command in verbose mode.
-
PLAN_VERSION
public static final int PLAN_VERSION
-
QUERY
Reports the status of disk balancer operation.
-
CANCEL
Cancels a running plan.
-
BEFORE_TEMPLATE
Template for the Before File. It is node.before.json.
-
PLAN_TEMPLATE
Template for the plan file. It is node.plan.json.
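Reading "node" in node.before.json and node.plan.json as a placeholder for the datanode's name, the resulting file names could be built as in the sketch below. This interpretation is an assumption, and `PlanFileNameSketch` with its two helpers is hypothetical; the actual values of BEFORE_TEMPLATE and PLAN_TEMPLATE are not shown on this page.

```java
// Hypothetical illustration only; not part of the Hadoop API.
final class PlanFileNameSketch {

    // Name of the snapshot file, assuming the pattern <node>.before.json.
    static String beforeFileName(String nodeName) {
        return nodeName + ".before.json";
    }

    // Name of the plan file, assuming the pattern <node>.plan.json.
    static String planFileName(String nodeName) {
        return nodeName + ".plan.json";
    }

    public static void main(String[] args) {
        // "dn1.example.com" is a made-up node name.
        System.out.println(beforeFileName("dn1.example.com"));
        System.out.println(planFileName("dn1.example.com"));
    }
}
```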
-
-
Constructor Details
-
DiskBalancerCLI
public DiskBalancerCLI(org.apache.hadoop.conf.Configuration conf)
Construct a DiskBalancer.
- Parameters:
conf - the Hadoop configuration to use.
-
DiskBalancerCLI
-
-
Method Details
-
main
Main for the DiskBalancer command handling.
- Parameters:
argv - System Args Strings[]
- Throws:
Exception
-
run
Execute the command with the given arguments.
- Specified by:
run in interface org.apache.hadoop.util.Tool
- Parameters:
args - command specific arguments.
- Returns:
exit code.
- Throws:
Exception
-
getPlanOptions
public static org.apache.commons.cli.Options getPlanOptions()
Returns Plan options.
- Returns:
Options.
-
getHelpOptions
public static org.apache.commons.cli.Options getHelpOptions()
Returns help options.
- Returns:
help options.
-
getExecuteOptions
public static org.apache.commons.cli.Options getExecuteOptions()
Returns execute options.
- Returns:
execute options.
-
getQueryOptions
public static org.apache.commons.cli.Options getQueryOptions()
Returns Query Options.
- Returns:
query Options
-
getCancelOptions
public static org.apache.commons.cli.Options getCancelOptions()
Returns Cancel Options.
- Returns:
Options
-
getReportOptions
public static org.apache.commons.cli.Options getReportOptions()
Returns Report Options.
- Returns:
Options
-
getCurrentCommand
Gets current command associated with this instance of DiskBalancer.
-