Class SimpleExponentialTaskRuntimeEstimator

java.lang.Object
org.apache.hadoop.mapreduce.v2.app.speculate.SimpleExponentialTaskRuntimeEstimator
All Implemented Interfaces:
TaskRuntimeEstimator

public class SimpleExponentialTaskRuntimeEstimator extends Object
A task runtime estimator based on exponential smoothing.
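
The summary above names exponential smoothing without showing an update rule. The following is a minimal sketch of simple exponential smoothing applied to a task's progress rate, under stated assumptions: the class `SmoothedRate`, the smoothing factor `alpha`, and the way a total runtime is derived from the smoothed rate are all hypothetical and are not taken from the Hadoop implementation.

```java
/** Hypothetical illustration of exponential smoothing; not part of Hadoop. */
final class SmoothedRate {
    private final double alpha;        // smoothing factor in (0, 1]; assumed parameter
    private final long startTime;      // time the attempt started, in ms
    private double rate = Double.NaN;  // smoothed progress per ms; NaN until first sample
    private double lastProgress = 0.0; // last reported progress in [0, 1]
    private long lastTimestamp;        // time of the last accepted report, in ms

    SmoothedRate(double alpha, long startTime) {
        if (!(alpha > 0.0 && alpha <= 1.0)) {
            throw new IllegalArgumentException("alpha must be in (0, 1]");
        }
        this.alpha = alpha;
        this.startTime = startTime;
        this.lastTimestamp = startTime;
    }

    /** Folds a new progress report into the smoothed rate. */
    void update(double progress, long timestamp) {
        long dt = timestamp - lastTimestamp;
        if (dt <= 0) {
            return; // ignore duplicate or out-of-order reports
        }
        double sample = (progress - lastProgress) / dt;
        // The first sample seeds the estimate; later samples are blended in.
        rate = Double.isNaN(rate) ? sample : alpha * sample + (1.0 - alpha) * rate;
        lastProgress = progress;
        lastTimestamp = timestamp;
    }

    /** Estimated total runtime in ms, including elapsed time, or -1 if unknown. */
    long estimatedRuntime() {
        if (Double.isNaN(rate) || rate <= 0.0) {
            return -1L;
        }
        long elapsed = lastTimestamp - startTime;
        double remaining = (1.0 - lastProgress) / rate;
        return elapsed + Math.round(remaining);
    }
}
```

A larger `alpha` makes the estimate react faster to recent progress reports; a smaller one damps short-lived slowdowns.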
  • Field Details

    • context

      protected AppContext context
    • startTimes

      protected final Map<org.apache.hadoop.mapreduce.v2.api.records.TaskAttemptId,Long> startTimes
    • mapperStatistics

      protected final Map<Job,DataStatistics> mapperStatistics
    • reducerStatistics

      protected final Map<Job,DataStatistics> reducerStatistics
    • doneTasks

      protected final Set<Task> doneTasks
  • Constructor Details

    • SimpleExponentialTaskRuntimeEstimator

      public SimpleExponentialTaskRuntimeEstimator()
  • Method Details

    • contextualize

      public void contextualize(org.apache.hadoop.conf.Configuration conf, AppContext context)
      Specified by:
      contextualize in interface TaskRuntimeEstimator
    • estimatedRuntime

      public long estimatedRuntime(org.apache.hadoop.mapreduce.v2.api.records.TaskAttemptId id)
      Description copied from interface: TaskRuntimeEstimator
      Estimate a task attempt's total runtime. Includes the time already elapsed.
      Parameters:
      id - the TaskAttemptId of the attempt we are asking about
      Returns:
      our best estimate of the attempt's runtime, or -1 if we don't have enough information yet to produce an estimate.
    • estimatedNewAttemptRuntime

      public long estimatedNewAttemptRuntime(org.apache.hadoop.mapreduce.v2.api.records.TaskId id)
      Description copied from interface: TaskRuntimeEstimator
      Estimates how long a new attempt on this task will take if we start one now.
      Specified by:
      estimatedNewAttemptRuntime in interface TaskRuntimeEstimator
      Parameters:
      id - the TaskId of the task we are asking about
      Returns:
      our best estimate of a new attempt's runtime, or -1 if we don't have enough information yet to produce an estimate.
    • hasStagnatedProgress

      public boolean hasStagnatedProgress(org.apache.hadoop.mapreduce.v2.api.records.TaskAttemptId id, long timeStamp)
      Description copied from interface: TaskRuntimeEstimator
      Returns true if the estimator has no update records for a threshold time window. This helps to identify task attempts that are stalled at the beginning of execution.
      Parameters:
      id - the TaskAttemptId of the attempt we are asking about
      timeStamp - the time of the report we compare with
      Returns:
      true if the task attempt has no progress for a given time window
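
      The stagnation check described above can be sketched as a comparison between the time of the last recorded update and the report time. This is an illustration under assumptions only: `StagnationTracker`, `recordUpdate`, the string attempt ids, the `thresholdMs` parameter, and the choice to return false for unknown attempts are hypothetical and do not reflect the Hadoop implementation.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration of a stagnation check; not part of Hadoop. */
final class StagnationTracker {
    private final long thresholdMs; // assumed length of the time window, in ms
    private final Map<String, Long> lastUpdate = new HashMap<>();

    StagnationTracker(long thresholdMs) {
        this.thresholdMs = thresholdMs;
    }

    /** Remembers when an attempt last produced an update record. */
    void recordUpdate(String attemptId, long timestamp) {
        lastUpdate.put(attemptId, timestamp);
    }

    /** True if no update was recorded within the threshold window before timeStamp. */
    boolean hasStagnatedProgress(String attemptId, long timeStamp) {
        Long last = lastUpdate.get(attemptId);
        if (last == null) {
            return false; // assumption: attempts we know nothing about are not flagged
        }
        return timeStamp - last > thresholdMs;
    }
}
```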
    • runtimeEstimateVariance

      public long runtimeEstimateVariance(org.apache.hadoop.mapreduce.v2.api.records.TaskAttemptId id)
      Description copied from interface: TaskRuntimeEstimator
      Computes the width of the error band of our estimate of the task runtime as returned by TaskRuntimeEstimator.estimatedRuntime(TaskAttemptId).
      Parameters:
      id - the TaskAttemptId of the attempt we are asking about
      Returns:
      the width of the error band of our runtime estimate for the attempt, or -1 if we don't have enough information yet to produce an estimate.
    • updateAttempt

      public void updateAttempt(TaskAttemptStatusUpdateEvent.TaskAttemptStatus status, long timestamp)
      Specified by:
      updateAttempt in interface TaskRuntimeEstimator
    • enrollAttempt

      public void enrollAttempt(TaskAttemptStatusUpdateEvent.TaskAttemptStatus status, long timestamp)
      Specified by:
      enrollAttempt in interface TaskRuntimeEstimator
    • attemptEnrolledTime

      public long attemptEnrolledTime(org.apache.hadoop.mapreduce.v2.api.records.TaskAttemptId attemptID)
      Specified by:
      attemptEnrolledTime in interface TaskRuntimeEstimator
    • dataStatisticsForTask

      protected DataStatistics dataStatisticsForTask(org.apache.hadoop.mapreduce.v2.api.records.TaskId taskID)
    • thresholdRuntime

      public long thresholdRuntime(org.apache.hadoop.mapreduce.v2.api.records.TaskId taskID)
      Description copied from interface: TaskRuntimeEstimator
      Find a maximum reasonable execution wallclock time. Includes the time already elapsed. If the projected total execution time for this task ever exceeds its reasonable execution time, we may speculate it.
      Specified by:
      thresholdRuntime in interface TaskRuntimeEstimator
      Parameters:
      taskID - the TaskId of the task we are asking about
      Returns:
      the task's maximum reasonable runtime, or Long.MAX_VALUE if we don't have enough information to rule out any runtime, however long.
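
      The relationship between the estimate and the threshold can be sketched as a simple comparison. This is a hedged illustration, not the speculator's actual logic: `SpeculationCheck` and `maySpeculate` are hypothetical names, and the sketch assumes MAX_VALUE means `Long.MAX_VALUE`, given the `long` return type.

```java
/** Hypothetical illustration of comparing an estimate to a threshold; not part of Hadoop. */
final class SpeculationCheck {
    private SpeculationCheck() {
    }

    /**
     * Returns true if the projected total runtime exceeds the maximum
     * reasonable runtime. Both values are in ms and include elapsed time.
     */
    static boolean maySpeculate(long estimatedRuntime, long thresholdRuntime) {
        if (estimatedRuntime < 0) {
            return false; // -1 means there is not enough information for an estimate
        }
        // A threshold of Long.MAX_VALUE can never be exceeded, so an
        // uninformed threshold never triggers speculation.
        return estimatedRuntime > thresholdRuntime;
    }
}
```

      For example, `SpeculationCheck.maySpeculate(1500L, 1000L)` is true, while any estimate compared against `Long.MAX_VALUE` yields false.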