Class JobHistoryUtils

java.lang.Object
org.apache.hadoop.mapreduce.v2.jobhistory.JobHistoryUtils

@Private @Unstable public class JobHistoryUtils extends Object
  • Field Details

    • HISTORY_STAGING_DIR_PERMISSIONS

      public static final org.apache.hadoop.fs.permission.FsPermission HISTORY_STAGING_DIR_PERMISSIONS
      Permissions for the history staging dir while JobInProgress.
    • HISTORY_STAGING_USER_DIR_PERMISSIONS

      public static final org.apache.hadoop.fs.permission.FsPermission HISTORY_STAGING_USER_DIR_PERMISSIONS
      Permissions for the user directory under the staging directory.
    • HISTORY_DONE_DIR_PERMISSION

      public static final org.apache.hadoop.fs.permission.FsPermission HISTORY_DONE_DIR_PERMISSION
      Permissions for the history done dir and derivatives.
    • HISTORY_DONE_FILE_PERMISSION

      public static final org.apache.hadoop.fs.permission.FsPermission HISTORY_DONE_FILE_PERMISSION
    • HISTORY_DONE_DIR_UMASK

      public static final org.apache.hadoop.fs.permission.FsPermission HISTORY_DONE_DIR_UMASK
      Umask for the done dir and derivatives.
    • HISTORY_INTERMEDIATE_DONE_DIR_PERMISSIONS

      public static final org.apache.hadoop.fs.permission.FsPermission HISTORY_INTERMEDIATE_DONE_DIR_PERMISSIONS
      Permissions for the intermediate done directory.
    • CONF_FILE_NAME_SUFFIX

      public static final String CONF_FILE_NAME_SUFFIX
      Suffix for configuration files.
    • SUMMARY_FILE_NAME_SUFFIX

      public static final String SUMMARY_FILE_NAME_SUFFIX
      Suffix for summary files.
    • JOB_HISTORY_FILE_EXTENSION

      public static final String JOB_HISTORY_FILE_EXTENSION
      Job History File extension.
    • VERSION

      public static final int VERSION
    • SERIAL_NUMBER_DIRECTORY_DIGITS

      public static final int SERIAL_NUMBER_DIRECTORY_DIGITS
    • TIMESTAMP_DIR_REGEX

      public static final String TIMESTAMP_DIR_REGEX
    • TIMESTAMP_DIR_PATTERN

      public static final Pattern TIMESTAMP_DIR_PATTERN
  • Constructor Details

    • JobHistoryUtils

      public JobHistoryUtils()
  • Method Details

    • isValidJobHistoryFileName

      public static boolean isValidJobHistoryFileName(String pathString)
      Checks whether the provided path string is a valid job history file.
      Parameters:
      pathString - the path to be checked.
      Returns:
      true if the path is a valid job history filename, false otherwise.
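      As a rough illustration of what such a check can look like, the sketch below treats a path string as a job history file when it ends with the history file extension. The class name, the helper name, and the ".jhist" value are assumptions made for this example; this page names JOB_HISTORY_FILE_EXTENSION but does not show its value or the actual validation logic.

```java
final class HistoryFileNameSketch {
    // Assumed value for illustration only; not taken from this page.
    static final String ASSUMED_EXTENSION = ".jhist";

    // Hypothetical helper: accepts any non-null path string that ends with the extension.
    static boolean looksLikeHistoryFile(String pathString) {
        return pathString != null && pathString.endsWith(ASSUMED_EXTENSION);
    }

    public static void main(String[] args) {
        System.out.println(looksLikeHistoryFile("/done/job_1_0001.jhist"));
        System.out.println(looksLikeHistoryFile("/done/job_1_0001_conf.xml"));
    }
}
```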
    • getJobIDFromHistoryFilePath

      public static org.apache.hadoop.mapreduce.JobID getJobIDFromHistoryFilePath(String pathString) throws IOException
      Returns the jobId from a job history file name.
      Parameters:
      pathString - the path string.
      Returns:
      the JobId
      Throws:
      IOException - if the filename format is invalid.
    • getConfFileFilter

      public static org.apache.hadoop.fs.PathFilter getConfFileFilter()
      Gets a PathFilter which would match configuration files.
      Returns:
      the PathFilter for matching conf files.
    • getHistoryFileFilter

      public static org.apache.hadoop.fs.PathFilter getHistoryFileFilter()
      Gets a PathFilter which would match job history file names.
      Returns:
      the PathFilter for matching job history files.
    • getConfiguredHistoryStagingDirPrefix

      public static String getConfiguredHistoryStagingDirPrefix(org.apache.hadoop.conf.Configuration conf, String jobId) throws IOException
      Gets the configured directory prefix for in-progress history files.
      Parameters:
      conf - the configuration for the job
      jobId - the id of the job the history file is for.
      Returns:
      A string representation of the prefix.
      Throws:
      IOException
    • getConfiguredHistoryIntermediateDoneDirPrefix

      public static String getConfiguredHistoryIntermediateDoneDirPrefix(org.apache.hadoop.conf.Configuration conf)
      Gets the configured directory prefix for intermediate done history files.
      Parameters:
      conf - the configuration object
      Returns:
      A string representation of the prefix.
    • getConfiguredHistoryIntermediateUserDoneDirPermissions

      public static org.apache.hadoop.fs.permission.FsPermission getConfiguredHistoryIntermediateUserDoneDirPermissions(org.apache.hadoop.conf.Configuration conf)
      Gets the configured directory permissions for the user directories in the intermediate done directory. The user and the group both need full permissions; this method enforces that.
      Parameters:
      conf - The configuration object
      Returns:
      FsPermission of the user directories
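      The enforcement described above can be pictured with plain octal modes. This is a minimal sketch, taking the two classes that need full permissions to be the owning user and the group; it uses int modes instead of FsPermission, and the class and method names are hypothetical.

```java
final class UserDoneDirPermissionSketch {
    // Forces the user and group bits to rwx (7) and leaves the "other" bits
    // and any sticky bit exactly as configured.
    static int enforceUserAndGroupAll(int configuredMode) {
        return configuredMode | 0770;
    }

    // Renders a mode as a zero-padded octal string for display.
    static String toOctal(int mode) {
        return String.format("%04o", mode);
    }

    public static void main(String[] args) {
        System.out.println(toOctal(enforceUserAndGroupAll(0750)));
        System.out.println(toOctal(enforceUserAndGroupAll(0705)));
    }
}
```

      A configured value that already grants rwx to user and group passes through unchanged, so the check is safe to apply unconditionally.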
    • getConfiguredHistoryServerDoneDirPrefix

      public static String getConfiguredHistoryServerDoneDirPrefix(org.apache.hadoop.conf.Configuration conf)
      Gets the configured directory prefix for Done history files.
      Parameters:
      conf - the configuration object
      Returns:
      the done history directory
    • getHistoryIntermediateDoneDirForUser

      public static String getHistoryIntermediateDoneDirForUser(org.apache.hadoop.conf.Configuration conf) throws IOException
      Gets the user directory for intermediate done history files.
      Parameters:
      conf - the configuration object
      Returns:
      the intermediate done directory for jobhistory files.
      Throws:
      IOException
    • shouldCreateNonUserDirectory

      public static boolean shouldCreateNonUserDirectory(org.apache.hadoop.conf.Configuration conf)
    • getStagingJobHistoryFile

      public static org.apache.hadoop.fs.Path getStagingJobHistoryFile(org.apache.hadoop.fs.Path dir, JobId jobId, int attempt)
      Get the job history file path for non Done history files.
    • getStagingJobHistoryFile

      public static org.apache.hadoop.fs.Path getStagingJobHistoryFile(org.apache.hadoop.fs.Path dir, String jobId, int attempt)
      Get the job history file path for non Done history files.
    • getIntermediateConfFileName

      public static String getIntermediateConfFileName(JobId jobId)
      Get the done configuration file name for a job.
      Parameters:
      jobId - the jobId.
      Returns:
      the conf file name.
    • getIntermediateSummaryFileName

      public static String getIntermediateSummaryFileName(JobId jobId)
      Get the done summary file name for a job.
      Parameters:
      jobId - the jobId.
      Returns:
      the summary file name.
    • getStagingConfFile

      public static org.apache.hadoop.fs.Path getStagingConfFile(org.apache.hadoop.fs.Path logDir, JobId jobId, int attempt)
      Gets the conf file path for jobs in progress.
      Parameters:
      logDir - the log directory prefix.
      jobId - the jobId.
      attempt - attempt number for this job.
      Returns:
      the conf file path for jobs in progress.
    • serialNumberDirectoryComponent

      public static String serialNumberDirectoryComponent(JobId id, String serialNumberFormat)
      Gets the serial number part of the path based on the jobId and serialNumber format.
      Parameters:
      id - the jobId.
      serialNumberFormat - the format string applied to the serial number.
      Returns:
      the serial number part of the path based on the jobId and serial number format.
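      One way to read this: the serial number is formatted to a fixed width and a leading run of digits becomes the directory name, so jobs with nearby serial numbers land in the same directory. The sketch below works under that assumption. The "%09d" format and the digit count of 6 are illustrative values (this page lists SERIAL_NUMBER_DIRECTORY_DIGITS without its value), and the class and method names are hypothetical.

```java
final class SerialNumberDirSketch {
    // Assumed number of leading digits kept for the directory name.
    static final int ASSUMED_DIRECTORY_DIGITS = 6;

    // Formats the serial number, then keeps only the leading digits.
    // The format must produce at least ASSUMED_DIRECTORY_DIGITS characters.
    static String directoryComponent(int serialNumber, String serialNumberFormat) {
        String formatted = String.format(serialNumberFormat, serialNumber);
        return formatted.substring(0, ASSUMED_DIRECTORY_DIGITS);
    }

    public static void main(String[] args) {
        System.out.println(directoryComponent(1234, "%09d"));
        System.out.println(directoryComponent(123456789, "%09d"));
    }
}
```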
    • getTimestampPartFromPath

      public static String getTimestampPartFromPath(String path)
      Extracts the timestamp component from the path.
      Parameters:
      path - the path to extract the timestamp component from.
      Returns:
      the timestamp component from the path
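      A regular expression is a natural fit for this kind of extraction. The sketch below assumes the timestamp component has the form yyyy/MM/dd with '/' as the separator; the pattern, the class name, and the choice to return null when nothing matches are assumptions of this example, not the documented values of TIMESTAMP_DIR_REGEX or TIMESTAMP_DIR_PATTERN.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TimestampPartSketch {
    // Assumed pattern: four-digit year, two-digit month, two-digit day.
    static final Pattern ASSUMED_TIMESTAMP_DIR = Pattern.compile("\\d{4}/\\d{2}/\\d{2}");

    // Returns the first yyyy/MM/dd run found in the path, or null when there is none.
    static String timestampPart(String path) {
        Matcher matcher = ASSUMED_TIMESTAMP_DIR.matcher(path);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(timestampPart("/history/done/2023/11/14/000001/job.jhist"));
        System.out.println(timestampPart("/history/done/misc"));
    }
}
```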
    • historyLogSubdirectory

      public static String historyLogSubdirectory(JobId id, String timestampComponent, String serialNumberFormat)
      Gets the history subdirectory based on the jobId, timestamp and serial number format.
      Parameters:
      id - the jobId.
      timestampComponent - the timestamp part of the path.
      serialNumberFormat - the format string applied to the serial number.
      Returns:
      the history subdirectory based on the jobId, timestamp and serial number format
    • timestampDirectoryComponent

      public static String timestampDirectoryComponent(long millisecondTime)
      Gets the timestamp component based on millisecond time.
      Parameters:
      millisecondTime - the time in milliseconds.
      Returns:
      the timestamp component based on millisecond time
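      To make the mapping concrete, the sketch below turns a millisecond timestamp into a year/month/day directory component, matching the YYYY/MM/DD layout this page describes for the done directory. It fixes the time zone to UTC so the result is reproducible, which is an assumption of this example; the real method may use a different zone. The class and method names are hypothetical.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

final class TimestampDirSketch {
    // UTC is assumed here so that the same input always yields the same directory.
    private static final DateTimeFormatter YEAR_MONTH_DAY =
            DateTimeFormatter.ofPattern("uuuu/MM/dd").withZone(ZoneOffset.UTC);

    // Converts milliseconds since the epoch into a "yyyy/MM/dd" string.
    static String timestampDir(long millisecondTime) {
        return YEAR_MONTH_DAY.format(Instant.ofEpochMilli(millisecondTime));
    }

    public static void main(String[] args) {
        System.out.println(timestampDir(0L)); // prints 1970/01/01
        System.out.println(timestampDir(System.currentTimeMillis()));
    }
}
```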
    • doneSubdirsBeforeSerialTail

      public static String doneSubdirsBeforeSerialTail()
    • jobSerialNumber

      public static int jobSerialNumber(JobId id)
      Computes a serial number used as part of directory naming for the given jobId.
      Parameters:
      id - the jobId.
      Returns:
      the serial number used as part of directory naming for the given jobId
    • localGlobber

      public static List<org.apache.hadoop.fs.FileStatus> localGlobber(org.apache.hadoop.fs.FileContext fc, org.apache.hadoop.fs.Path root, String tail) throws IOException
      Throws:
      IOException
    • localGlobber

      public static List<org.apache.hadoop.fs.FileStatus> localGlobber(org.apache.hadoop.fs.FileContext fc, org.apache.hadoop.fs.Path root, String tail, org.apache.hadoop.fs.PathFilter filter) throws IOException
      Throws:
      IOException
    • localGlobber

      public static List<org.apache.hadoop.fs.FileStatus> localGlobber(org.apache.hadoop.fs.FileContext fc, org.apache.hadoop.fs.Path root, String tail, org.apache.hadoop.fs.PathFilter filter, AtomicBoolean hasFlatFiles) throws IOException
      Throws:
      IOException
    • getPreviousJobHistoryPath

      public static org.apache.hadoop.fs.Path getPreviousJobHistoryPath(org.apache.hadoop.conf.Configuration conf, org.apache.hadoop.yarn.api.records.ApplicationAttemptId applicationAttemptId) throws IOException
      Throws:
      IOException
    • getHistoryDirsForCleaning

      public static List<org.apache.hadoop.fs.FileStatus> getHistoryDirsForCleaning(org.apache.hadoop.fs.FileContext fc, org.apache.hadoop.fs.Path root, long cutoff) throws IOException
      Looks for the dirs to clean. The folder structure is YYYY/MM/DD/Serial so we can use that to more efficiently find the directories to clean by comparing the cutoff timestamp with the timestamp from the folder structure.
      Parameters:
      fc - done dir FileContext
      root - folder for completed jobs
      cutoff - The cutoff for the max history age
      Returns:
      The list of directories for cleaning
      Throws:
      IOException
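      The idea of pruning by directory name rather than by inspecting every file can be sketched without a file system. The example below works on an in-memory list of relative paths of the form YYYY/MM/DD/Serial and keeps those whose date is on or before the cutoff day. The class and method names, the UTC time zone, and the "on or before" rule are assumptions of this sketch; a real cleaner would still need to check individual file timestamps inside the selected directories.

```java
import java.time.Instant;
import java.time.LocalDate;
import java.time.ZoneOffset;
import java.util.ArrayList;
import java.util.List;

final class CleaningCandidatesSketch {
    // Returns the directories whose YYYY/MM/DD prefix is not after the cutoff day (UTC).
    // Entries with fewer than three path components are skipped.
    static List<String> candidates(List<String> dateDirs, long cutoffMillis) {
        LocalDate cutoffDay =
                Instant.ofEpochMilli(cutoffMillis).atZone(ZoneOffset.UTC).toLocalDate();
        List<String> result = new ArrayList<>();
        for (String dir : dateDirs) {
            String[] parts = dir.split("/");
            if (parts.length < 3) {
                continue;
            }
            LocalDate day = LocalDate.of(
                    Integer.parseInt(parts[0]),
                    Integer.parseInt(parts[1]),
                    Integer.parseInt(parts[2]));
            if (!day.isAfter(cutoffDay)) {
                result.add(dir);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> dirs = List.of(
                "2023/11/13/000001", "2023/11/14/000002", "2023/11/15/000003");
        // 1_700_000_000_000L falls on 2023-11-14 in UTC.
        System.out.println(candidates(dirs, 1_700_000_000_000L));
    }
}
```

      Because the date is encoded in the path, whole subtrees newer than the cutoff are excluded without listing their contents.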