Class JobHistoryUtils
java.lang.Object
org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils
-
Field Summary
Fields
Modifier and Type   Field   Description
static final String CONF_FILE_NAME_SUFFIX - Suffix for configuration files.
static final org.apache.hadoop.fs.permission.FsPermission HISTORY_DONE_DIR_PERMISSION - Permissions for the history done dir and derivatives.
static final org.apache.hadoop.fs.permission.FsPermission HISTORY_DONE_DIR_UMASK - Umask for the done dir and derivatives.
static final org.apache.hadoop.fs.permission.FsPermission HISTORY_DONE_FILE_PERMISSION
static final org.apache.hadoop.fs.permission.FsPermission HISTORY_INTERMEDIATE_DONE_DIR_PERMISSIONS - Permissions for the intermediate done directory.
static final org.apache.hadoop.fs.permission.FsPermission HISTORY_STAGING_DIR_PERMISSIONS - Permissions for the history staging dir while JobInProgress.
static final org.apache.hadoop.fs.permission.FsPermission HISTORY_STAGING_USER_DIR_PERMISSIONS - Permissions for the user directory under the staging directory.
static final String JOB_HISTORY_FILE_EXTENSION - Job History File extension.
static final int SERIAL_NUMBER_DIRECTORY_DIGITS
static final String SUMMARY_FILE_NAME_SUFFIX - Suffix for summary files.
static final Pattern TIMESTAMP_DIR_PATTERN
static final String TIMESTAMP_DIR_REGEX
static final int VERSION
-
Constructor Summary
Constructors
JobHistoryUtils()
-
Method Summary
Modifier and Type   Method   Description
static String doneSubdirsBeforeSerialTail()
static org.apache.hadoop.fs.PathFilter getConfFileFilter() - Gets a PathFilter which would match configuration files.
static String getConfiguredHistoryIntermediateDoneDirPrefix(org.apache.hadoop.conf.Configuration conf) - Gets the configured directory prefix for intermediate done history files.
static org.apache.hadoop.fs.permission.FsPermission getConfiguredHistoryIntermediateUserDoneDirPermissions(org.apache.hadoop.conf.Configuration conf) - Gets the configured permissions for the user directories in the intermediate done directory.
static String getConfiguredHistoryServerDoneDirPrefix(org.apache.hadoop.conf.Configuration conf) - Gets the configured directory prefix for Done history files.
static String getConfiguredHistoryStagingDirPrefix(org.apache.hadoop.conf.Configuration conf, String jobId) - Gets the configured directory prefix for In Progress history files.
static List<org.apache.hadoop.fs.FileStatus> getHistoryDirsForCleaning(org.apache.hadoop.fs.FileContext fc, org.apache.hadoop.fs.Path root, long cutoff) - Looks for the dirs to clean.
static org.apache.hadoop.fs.PathFilter getHistoryFileFilter() - Gets a PathFilter which would match job history file names.
static String getHistoryIntermediateDoneDirForUser(org.apache.hadoop.conf.Configuration conf) - Gets the user directory for intermediate done history files.
static String getIntermediateConfFileName(JobId jobId) - Get the done configuration file name for a job.
static String getIntermediateSummaryFileName(JobId jobId) - Get the done summary file name for a job.
static org.apache.hadoop.mapreduce.JobID getJobIDFromHistoryFilePath(String pathString) - Returns the jobId from a job history file name.
static org.apache.hadoop.fs.Path getPreviousJobHistoryPath(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.yarn.api.records.ApplicationAttemptId applicationAttemptId)
static org.apache.hadoop.fs.Path getStagingConfFile(org.apache.hadoop.fs.Path logDir, JobId jobId, int attempt) - Gets the conf file path for jobs in progress.
static org.apache.hadoop.fs.Path getStagingJobHistoryFile(org.apache.hadoop.fs.Path dir, String jobId, int attempt) - Get the job history file path for non Done history files.
static org.apache.hadoop.fs.Path getStagingJobHistoryFile(org.apache.hadoop.fs.Path dir, JobId jobId, int attempt) - Get the job history file path for non Done history files.
static String getTimestampPartFromPath(String path) - Extracts the timestamp component from the path.
static String historyLogSubdirectory(JobId id, String timestampComponent, String serialNumberFormat) - Gets the history subdirectory based on the jobId, timestamp and serial number format.
static boolean isValidJobHistoryFileName(String pathString) - Checks whether the provided path string is a valid job history file.
static int jobSerialNumber(JobId id) - Computes a serial number used as part of directory naming for the given jobId.
static List<org.apache.hadoop.fs.FileStatus> localGlobber(org.apache.hadoop.fs.FileContext fc, org.apache.hadoop.fs.Path root, String tail)
static List<org.apache.hadoop.fs.FileStatus> localGlobber(org.apache.hadoop.fs.FileContext fc, org.apache.hadoop.fs.Path root, String tail, org.apache.hadoop.fs.PathFilter filter)
static List<org.apache.hadoop.fs.FileStatus> localGlobber(org.apache.hadoop.fs.FileContext fc, org.apache.hadoop.fs.Path root, String tail, org.apache.hadoop.fs.PathFilter filter, AtomicBoolean hasFlatFiles)
static String serialNumberDirectoryComponent(JobId id, String serialNumberFormat) - Gets the serial number part of the path based on the jobId and serialNumber format.
static boolean shouldCreateNonUserDirectory(org.apache.hadoop.conf.Configuration conf)
static String timestampDirectoryComponent(long millisecondTime) - Gets the timestamp component based on millisecond time.
-
Field Details
-
HISTORY_STAGING_DIR_PERMISSIONS
public static final org.apache.hadoop.fs.permission.FsPermission HISTORY_STAGING_DIR_PERMISSIONS
Permissions for the history staging dir while JobInProgress.
-
HISTORY_STAGING_USER_DIR_PERMISSIONS
public static final org.apache.hadoop.fs.permission.FsPermission HISTORY_STAGING_USER_DIR_PERMISSIONS
Permissions for the user directory under the staging directory.
-
HISTORY_DONE_DIR_PERMISSION
public static final org.apache.hadoop.fs.permission.FsPermission HISTORY_DONE_DIR_PERMISSION
Permissions for the history done dir and derivatives.
-
HISTORY_DONE_FILE_PERMISSION
public static final org.apache.hadoop.fs.permission.FsPermission HISTORY_DONE_FILE_PERMISSION
-
HISTORY_DONE_DIR_UMASK
public static final org.apache.hadoop.fs.permission.FsPermission HISTORY_DONE_DIR_UMASK
Umask for the done dir and derivatives.
-
HISTORY_INTERMEDIATE_DONE_DIR_PERMISSIONS
public static final org.apache.hadoop.fs.permission.FsPermission HISTORY_INTERMEDIATE_DONE_DIR_PERMISSIONS
Permissions for the intermediate done directory.
-
CONF_FILE_NAME_SUFFIX
public static final String CONF_FILE_NAME_SUFFIX
Suffix for configuration files.
-
SUMMARY_FILE_NAME_SUFFIX
public static final String SUMMARY_FILE_NAME_SUFFIX
Suffix for summary files.
-
JOB_HISTORY_FILE_EXTENSION
public static final String JOB_HISTORY_FILE_EXTENSION
Job History File extension.
-
VERSION
public static final int VERSION
-
SERIAL_NUMBER_DIRECTORY_DIGITS
public static final int SERIAL_NUMBER_DIRECTORY_DIGITS
-
TIMESTAMP_DIR_REGEX
public static final String TIMESTAMP_DIR_REGEX
-
TIMESTAMP_DIR_PATTERN
public static final Pattern TIMESTAMP_DIR_PATTERN
-
-
Constructor Details
-
JobHistoryUtils
public JobHistoryUtils()
-
-
Method Details
-
isValidJobHistoryFileName
public static boolean isValidJobHistoryFileName(String pathString)
Checks whether the provided path string is a valid job history file.
Parameters:
pathString - the path to be checked.
Returns:
true if the path is a valid job history filename, false otherwise.
-
getJobIDFromHistoryFilePath
public static org.apache.hadoop.mapreduce.JobID getJobIDFromHistoryFilePath(String pathString) throws IOException
Returns the jobId from a job history file name.
Parameters:
pathString - the path string.
Returns:
the JobId
Throws:
IOException - if the filename format is invalid.
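As a rough illustration of the contract above (take the file name from a path, return the job id, throw IOException on a malformed name), here is a minimal stand-alone sketch. It does not use Hadoop classes and is not the library's implementation. The assumed file name layout (a `job_<clusterTimestamp>_<sequence>` prefix, further fields separated by `-`, and a `.jhist` extension) and the names `JobIdFromPathSketch` and `jobIdFromHistoryFilePath` are hypothetical.

```java
import java.io.IOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class JobIdFromPathSketch {
    // ASSUMPTION: the history file name starts with job_<digits>_<digits>,
    // continues with "-"-separated fields, and ends in ".jhist".
    private static final Pattern HISTORY_FILE =
        Pattern.compile("^(job_\\d+_\\d+)-.*\\.jhist$");

    // Returns the job id prefix of the last path segment, or throws if the
    // segment does not match the assumed layout.
    static String jobIdFromHistoryFilePath(String pathString) throws IOException {
        String fileName = pathString.substring(pathString.lastIndexOf('/') + 1);
        Matcher m = HISTORY_FILE.matcher(fileName);
        if (!m.matches()) {
            throw new IOException("Invalid job history file name: " + fileName);
        }
        return m.group(1);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(jobIdFromHistoryFilePath(
            "/done/2024/01/15/000000/job_1700000000000_0001-alice-wordcount.jhist"));
    }
}
```

A validity check in the style of isValidJobHistoryFileName could reuse the same pattern and return `m.matches()` instead of throwing.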
-
getConfFileFilter
public static org.apache.hadoop.fs.PathFilter getConfFileFilter()
Gets a PathFilter which would match configuration files.
Returns:
the path filter, a PathFilter for matching conf files.
-
getHistoryFileFilter
public static org.apache.hadoop.fs.PathFilter getHistoryFileFilter()
Gets a PathFilter which would match job history file names.
Returns:
the path filter, a PathFilter matching job history files.
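The two filters above select files by name. A minimal sketch of that idea follows, using `java.util.function.Predicate<String>` in place of Hadoop's PathFilter so it runs without Hadoop on the classpath. The suffix values `_conf.xml` and `.jhist` are assumptions (this page does not show the constant values), and `SuffixFilterSketch`, `suffixFilter` and `select` are hypothetical names.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

final class SuffixFilterSketch {
    // ASSUMED values; the real constants' values are not shown on this page.
    static final String CONF_SUFFIX = "_conf.xml";
    static final String HISTORY_EXTENSION = ".jhist";

    // Builds a filter that accepts names ending with the given suffix.
    static Predicate<String> suffixFilter(String suffix) {
        return name -> name.endsWith(suffix);
    }

    // Applies a filter to a directory listing, keeping the original order.
    static List<String> select(List<String> names, Predicate<String> filter) {
        return names.stream().filter(filter).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList(
            "job_1_0001_conf.xml", "job_1_0001-a-b.jhist", "job_1_0001.summary");
        System.out.println(select(names, suffixFilter(CONF_SUFFIX)));
        System.out.println(select(names, suffixFilter(HISTORY_EXTENSION)));
    }
}
```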
-
getConfiguredHistoryStagingDirPrefix
public static String getConfiguredHistoryStagingDirPrefix(org.apache.hadoop.conf.Configuration conf, String jobId) throws IOException
Gets the configured directory prefix for In Progress history files.
Parameters:
conf - the configuration for the job
jobId - the id of the job the history file is for.
Returns:
A string representation of the prefix.
Throws:
IOException
-
getConfiguredHistoryIntermediateDoneDirPrefix
public static String getConfiguredHistoryIntermediateDoneDirPrefix(org.apache.hadoop.conf.Configuration conf)
Gets the configured directory prefix for intermediate done history files.
Parameters:
conf - the configuration object
Returns:
A string representation of the prefix.
-
getConfiguredHistoryIntermediateUserDoneDirPermissions
public static org.apache.hadoop.fs.permission.FsPermission getConfiguredHistoryIntermediateUserDoneDirPermissions(org.apache.hadoop.conf.Configuration conf)
Gets the configured permissions for the user directories in the intermediate done directory. The user and the group both need full permissions; this method enforces that.
Parameters:
conf - The configuration object
Returns:
FsPermission of the user directories
-
getConfiguredHistoryServerDoneDirPrefix
public static String getConfiguredHistoryServerDoneDirPrefix(org.apache.hadoop.conf.Configuration conf)
Gets the configured directory prefix for Done history files.
Parameters:
conf - the configuration object
Returns:
the done history directory
-
getHistoryIntermediateDoneDirForUser
public static String getHistoryIntermediateDoneDirForUser(org.apache.hadoop.conf.Configuration conf) throws IOException
Gets the user directory for intermediate done history files.
Parameters:
conf - the configuration object
Returns:
the intermediate done directory for jobhistory files.
Throws:
IOException
-
shouldCreateNonUserDirectory
public static boolean shouldCreateNonUserDirectory(org.apache.hadoop.conf.Configuration conf) -
getStagingJobHistoryFile
public static org.apache.hadoop.fs.Path getStagingJobHistoryFile(org.apache.hadoop.fs.Path dir, JobId jobId, int attempt)
Get the job history file path for non Done history files.
-
getStagingJobHistoryFile
public static org.apache.hadoop.fs.Path getStagingJobHistoryFile(org.apache.hadoop.fs.Path dir, String jobId, int attempt)
Get the job history file path for non Done history files.
-
getIntermediateConfFileName
public static String getIntermediateConfFileName(JobId jobId)
Get the done configuration file name for a job.
Parameters:
jobId - the jobId.
Returns:
the conf file name.
-
getIntermediateSummaryFileName
public static String getIntermediateSummaryFileName(JobId jobId)
Get the done summary file name for a job.
Parameters:
jobId - the jobId.
Returns:
the summary file name.
-
getStagingConfFile
public static org.apache.hadoop.fs.Path getStagingConfFile(org.apache.hadoop.fs.Path logDir, JobId jobId, int attempt)
Gets the conf file path for jobs in progress.
Parameters:
logDir - the log directory prefix.
jobId - the jobId.
attempt - attempt number for this job.
Returns:
the conf file path for jobs in progress.
-
serialNumberDirectoryComponent
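public static String serialNumberDirectoryComponent(JobId id, String serialNumberFormat)
Gets the serial number part of the path based on the jobId and serial number format.
Parameters:
id - the jobId.
serialNumberFormat - the serial number format.
Returns:
the serial number part of the path based on the jobId and serial number format.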
-
getTimestampPartFromPath
public static String getTimestampPartFromPath(String path)
Extracts the timestamp component from the path.
Parameters:
path - the path.
Returns:
the timestamp component from the path
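A hedged sketch of how a timestamp component might be pulled out of a path with a regular expression is shown below. The pattern assumes the YYYY/MM/DD directory layout described under getHistoryDirsForCleaning. The actual value of TIMESTAMP_DIR_REGEX is not shown on this page, so the pattern, the class name `TimestampPartSketch`, and the choice to return null when nothing matches are all assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TimestampPartSketch {
    // ASSUMPTION: the timestamp component is four digits, two digits and
    // two digits separated by "/" (YYYY/MM/DD).
    static final Pattern TIMESTAMP_DIR = Pattern.compile("\\d{4}/\\d{2}/\\d{2}");

    // Returns the first YYYY/MM/DD run found in the path, or null if none.
    static String timestampPart(String path) {
        Matcher m = TIMESTAMP_DIR.matcher(path);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(timestampPart("/history/done/2024/01/15/000123/job.jhist"));
    }
}
```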
-
historyLogSubdirectory
public static String historyLogSubdirectory(JobId id, String timestampComponent, String serialNumberFormat)
Gets the history subdirectory based on the jobId, timestamp and serial number format.
Parameters:
id - the jobId.
timestampComponent - the timestamp component.
serialNumberFormat - the serial number format.
Returns:
the history subdirectory based on the jobId, timestamp and serial number format
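To make the relationship between the timestamp component, the serial number component and the resulting subdirectory concrete, here is a small stand-alone sketch. It is an illustration under stated assumptions, not the library's code: it assumes the serial directory name is the leading digits of the formatted job sequence number, that components are joined with `/`, and that a plain `int jobSequence` stands in for JobId. The `%09d` format and the width of 6 in the usage are illustrative values only.

```java
final class HistorySubdirSketch {
    // Formats the sequence number and keeps only its leading digits.
    // ASSUMPTION: the directory component is a prefix of the formatted number.
    static String serialDirComponent(int jobSequence, String serialNumberFormat, int digits) {
        String formatted = String.format(serialNumberFormat, jobSequence);
        return formatted.substring(0, Math.min(digits, formatted.length()));
    }

    // Joins the timestamp component and the serial component into one
    // subdirectory string with a trailing separator.
    static String historySubdir(int jobSequence, String timestampComponent,
                                String serialNumberFormat, int digits) {
        return timestampComponent + "/"
            + serialDirComponent(jobSequence, serialNumberFormat, digits) + "/";
    }

    public static void main(String[] args) {
        // Illustrative values: 9-digit zero padding, 6 leading digits kept.
        System.out.println(historySubdir(1234567, "2024/01/15", "%09d", 6));
    }
}
```

Keeping only the leading digits groups many consecutive jobs into one directory, which bounds the number of entries per directory level.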
-
timestampDirectoryComponent
public static String timestampDirectoryComponent(long millisecondTime)
Gets the timestamp component based on millisecond time.
Parameters:
millisecondTime - the time in milliseconds.
Returns:
the timestamp component based on millisecond time
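One way to derive a date-based directory component from a millisecond timestamp is sketched below. It assumes the YYYY/MM/DD layout described under getHistoryDirsForCleaning and takes the time zone as an explicit parameter so the result is reproducible; which zone the real method uses is not stated on this page. `TimestampComponentSketch` and `timestampComponent` are hypothetical names.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

final class TimestampComponentSketch {
    // ASSUMPTION: the component is the calendar date rendered as yyyy/MM/dd.
    private static final DateTimeFormatter DAY_DIR =
        DateTimeFormatter.ofPattern("yyyy/MM/dd");

    // Converts epoch milliseconds to a date directory string in the given zone.
    static String timestampComponent(long millisecondTime, ZoneId zone) {
        return DAY_DIR.withZone(zone).format(Instant.ofEpochMilli(millisecondTime));
    }

    public static void main(String[] args) {
        // 1705276800000 ms is 2024-01-15T00:00:00Z.
        System.out.println(timestampComponent(1705276800000L, ZoneOffset.UTC));
    }
}
```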
-
doneSubdirsBeforeSerialTail
public static String doneSubdirsBeforeSerialTail()
-
jobSerialNumber
public static int jobSerialNumber(JobId id)
Computes a serial number used as part of directory naming for the given jobId.
Parameters:
id - the jobId.
Returns:
the serial number used as part of directory naming for the given jobId
-
localGlobber
public static List<org.apache.hadoop.fs.FileStatus> localGlobber(org.apache.hadoop.fs.FileContext fc, org.apache.hadoop.fs.Path root, String tail) throws IOException
Throws:
IOException
-
localGlobber
public static List<org.apache.hadoop.fs.FileStatus> localGlobber(org.apache.hadoop.fs.FileContext fc, org.apache.hadoop.fs.Path root, String tail, org.apache.hadoop.fs.PathFilter filter) throws IOException
Throws:
IOException
-
localGlobber
public static List<org.apache.hadoop.fs.FileStatus> localGlobber(org.apache.hadoop.fs.FileContext fc, org.apache.hadoop.fs.Path root, String tail, org.apache.hadoop.fs.PathFilter filter, AtomicBoolean hasFlatFiles) throws IOException
Throws:
IOException
-
getPreviousJobHistoryPath
public static org.apache.hadoop.fs.Path getPreviousJobHistoryPath(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.yarn.api.records.ApplicationAttemptId applicationAttemptId) throws IOException
Throws:
IOException
-
getHistoryDirsForCleaning
public static List<org.apache.hadoop.fs.FileStatus> getHistoryDirsForCleaning(org.apache.hadoop.fs.FileContext fc, org.apache.hadoop.fs.Path root, long cutoff) throws IOException
Looks for the dirs to clean. The folder structure is YYYY/MM/DD/Serial, so the directories to clean can be found more efficiently by comparing the cutoff timestamp with the date encoded in the folder structure.
Parameters:
fc - done dir FileContext
root - folder for completed jobs
cutoff - The cutoff for the max history age
Returns:
The list of directories for cleaning
Throws:
IOException
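The date comparison described for getHistoryDirsForCleaning can be sketched without touching a file system. The example below works on plain relative path strings of the form YYYY/MM/DD/Serial and keeps those whose date is on or before the cutoff date. It is a simplified illustration, not the Hadoop implementation: the inclusive treatment of the cutoff day, the explicit time zone parameter, and the names `CleaningCandidatesSketch` and `dirsForCleaning` are assumptions.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class CleaningCandidatesSketch {
    // Returns the directories whose YYYY/MM/DD prefix is not after the
    // cutoff date. Entries that do not follow the layout are skipped.
    static List<String> dirsForCleaning(List<String> dirs, long cutoffMillis, ZoneId zone) {
        LocalDate cutoffDate = Instant.ofEpochMilli(cutoffMillis).atZone(zone).toLocalDate();
        List<String> result = new ArrayList<>();
        for (String dir : dirs) {
            String[] parts = dir.split("/");
            if (parts.length < 4) {
                continue; // needs year, month, day and serial components
            }
            LocalDate dirDate;
            try {
                dirDate = LocalDate.of(Integer.parseInt(parts[0]),
                    Integer.parseInt(parts[1]), Integer.parseInt(parts[2]));
            } catch (RuntimeException e) {
                continue; // non-numeric or impossible date: not a history dir
            }
            // ASSUMPTION: the cutoff day itself is still a cleaning candidate.
            if (!dirDate.isAfter(cutoffDate)) {
                result.add(dir);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> dirs = Arrays.asList(
            "2024/01/14/000000", "2024/01/15/000001", "2024/01/16/000002");
        // Cutoff 1705276800000 ms is 2024-01-15T00:00:00Z.
        System.out.println(dirsForCleaning(dirs, 1705276800000L, ZoneOffset.UTC));
    }
}
```

Because the date is encoded in the directory names, whole subtrees newer than the cutoff can be ruled out by name alone, without listing the files inside them.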
-