首页手记【Spark Core】任务执行机制和Task源码浅析2

【Spark Core】任务执行机制和Task源码浅析2

标签：

Spark

引言

上一小节《任务执行机制和Task源码浅析1》介绍了Executor的注册过程。
这一小节，我将从Executor端，就接收LaunchTask消息之后Executor的执行任务过程进行介绍。

1. Executor的launchTasks函数

DriverActor提交任务，发送LaunchTask指令给CoarseGrainedExecutorBackend，接收到指令之后，让它内部的executor来发起任务，即调用空闲的executor的launchTask函数。
下面是CoarseGrainedExecutorBackend中receiveWithLogging的部分代码：

    case LaunchTask(data) =>      if (executor == null) {
        logError("Received LaunchTask command but executor was null")        System.exit(1)
      } else {        val ser = env.closureSerializer.newInstance()        val taskDesc = ser.deserialize[TaskDescription](data.value)
        logInfo("Got assigned task " + taskDesc.taskId)
        executor.launchTask(this, taskId = taskDesc.taskId, attemptNumber = taskDesc.attemptNumber,
          taskDesc.name, taskDesc.serializedTask)
      }

Executor执行task：

  def launchTask(
      context: ExecutorBackend,
      taskId: Long,
      attemptNumber: Int,
      taskName: String,
      serializedTask: ByteBuffer) {    val tr = new TaskRunner(context, taskId = taskId, attemptNumber = attemptNumber, taskName,
      serializedTask)
    runningTasks.put(taskId, tr)
    threadPool.execute(tr)
  }

Executor内部维护一个线程池，可以跑多个task，每一个提交的task都会包装成TaskRunner交由threadPool执行。

2. TaskRunner的run方法

run方法中val value = task.run(taskAttemptId = taskId, attemptNumber = attemptNumber)是真正执行task中的任务。

下面是TaskRunner中run方法的部分代码：

      try {        val (taskFiles, taskJars, taskBytes) = Task.deserializeWithDependencies(serializedTask)
        updateDependencies(taskFiles, taskJars)        // 反序列化Task
        task = ser.deserialize[Task[Any]](taskBytes, Thread.currentThread.getContextClassLoader)        // If this task has been killed before we deserialized it, let's quit now. Otherwise,
        // continue executing the task.
        if (killed) {          // Throw an exception rather than returning, because returning within a try{} block
          // causes a NonLocalReturnControl exception to be thrown. The NonLocalReturnControl
          // exception will be caught by the catch block, leading to an incorrect ExceptionFailure
          // for the task.
          throw new TaskKilledException
        }

        attemptedTask = Some(task)
        logDebug("Task " + taskId + "'s epoch is " + task.epoch)
        env.mapOutputTracker.updateEpoch(task.epoch)        // Run the actual task and measure its runtime.
        // 运行Task, 具体可以去看ResultTask和ShuffleMapTask
        taskStart = System.currentTimeMillis()        val value = task.run(taskAttemptId = taskId, attemptNumber = attemptNumber)        val taskFinish = System.currentTimeMillis()        // If the task has been killed, let's fail it.
        if (task.killed) {          throw new TaskKilledException
        }        // 对结果进行序列化
        val resultSer = env.serializer.newInstance()        val beforeSerialization = System.currentTimeMillis()        val valueBytes = resultSer.serialize(value)        val afterSerialization = System.currentTimeMillis()        // 更新任务的相关监控信息，会反映到监控页面上的
        for (m <- task.metrics) {
          m.setExecutorDeserializeTime(taskStart - deserializeStartTime)
          m.setExecutorRunTime(taskFinish - taskStart)
          m.setJvmGCTime(gcTime - startGCTime)
          m.setResultSerializationTime(afterSerialization - beforeSerialization)
        }        val accumUpdates = Accumulators.values        // 对结果进行再包装，包装完再进行序列化
        val directResult = new DirectTaskResult(valueBytes, accumUpdates, task.metrics.orNull)        val serializedDirectResult = ser.serialize(directResult)        val resultSize = serializedDirectResult.limit        // directSend = sending directly back to the driver
        val serializedResult = {          if (maxResultSize > 0 && resultSize > maxResultSize) {
            logWarning(s"Finished $taskName (TID $taskId). Result is larger than maxResultSize " +              s"(${Utils.bytesToString(resultSize)} > ${Utils.bytesToString(maxResultSize)}), " +              s"dropping it.")
            ser.serialize(new IndirectTaskResult[Any](TaskResultBlockId(taskId), resultSize))
          } else if (resultSize >= akkaFrameSize - AkkaUtils.reservedSizeBytes) {            // 如果中间结果的大小超过了spark.akka.frameSize（默认是10M）的大小，就要提升序列化级别了，超过内存的部分要保存到硬盘的
            val blockId = TaskResultBlockId(taskId)
            env.blockManager.putBytes(
              blockId, serializedDirectResult, StorageLevel.MEMORY_AND_DISK_SER)
            logInfo(              s"Finished $taskName (TID $taskId). $resultSize bytes result sent via BlockManager)")
            ser.serialize(new IndirectTaskResult[Any](blockId, resultSize))
          } else {
            logInfo(s"Finished $taskName (TID $taskId). $resultSize bytes result sent to driver")
            serializedDirectResult
          }
        }        // 将任务完成和taskresult,通过statusUpdate报告给driver
        execBackend.statusUpdate(taskId, TaskState.FINISHED, serializedResult)

      } catch {        //异常处理代码，略去...
      } finally {        // 清理为ResultTask注册的shuffle内存，最后把task从正在运行的列表当中删除
        // Release memory used by this thread for shuffles
        env.shuffleMemoryManager.releaseMemoryForThisThread()        // Release memory used by this thread for unrolling blocks
        env.blockManager.memoryStore.releaseUnrollMemoryForThisThread()        // Release memory used by this thread for accumulators
        Accumulators.clear()
        runningTasks.remove(taskId)
      }
    }

3. Task执行过程

TaskRunner会启动一个新的线程，我们看一下run方法中的调用过程：
TaskRunner.run --> Task.run --> Task.runTask --> RDD.iterator --> RDD.computeOrReadCheckpoint --> RDD.compute。

Task的run函数代码：

  /**
   * Called by Executor to run this task.
   *
   * @param taskAttemptId an identifier for this task attempt that is unique within a SparkContext.
   * @param attemptNumber how many times this task has been attempted (0 for the first attempt)
   * @return the result of the task
   */
  final def run(taskAttemptId: Long, attemptNumber: Int): T = {
    context = new TaskContextImpl(stageId = stageId, partitionId = partitionId,
      taskAttemptId = taskAttemptId, attemptNumber = attemptNumber, runningLocally = false)    TaskContextHelper.setTaskContext(context)
    context.taskMetrics.setHostname(Utils.localHostName())
    taskThread = Thread.currentThread()    if (_killed) {
      kill(interruptThread = false)
    }    try {
      runTask(context)
    } finally {
      context.markTaskCompleted()      TaskContextHelper.unset()
    }
  }

ShuffleMapTask和ResultTask分别实现了不同的runTask函数。

ShuffleMapTask的runTask函数代码：

  override def runTask(context: TaskContext): MapStatus = {    // Deserialize the RDD using the broadcast variable.
    val ser = SparkEnv.get.closureSerializer.newInstance()    val (rdd, dep) = ser.deserialize[(RDD[_], ShuffleDependency[_, _, _])](      ByteBuffer.wrap(taskBinary.value), Thread.currentThread.getContextClassLoader)    //此处的taskBinary即为在org.apache.spark.scheduler.DAGScheduler#submitMissingTasks序列化的task的广播变量取得的  

    metrics = Some(context.taskMetrics)    var writer: ShuffleWriter[Any, Any] = null
    try {      val manager = SparkEnv.get.shuffleManager
      writer = manager.getWriter[Any, Any](dep.shuffleHandle, partitionId, context)
      writer.write(rdd.iterator(partition, context).asInstanceOf[Iterator[_ <: Product2[Any, Any]]])      // 将rdd计算的结果写入memory或者disk  
      return writer.stop(success = true).get
    } catch {      case e: Exception =>        try {          if (writer != null) {
            writer.stop(success = false)
          }
        } catch {          case e: Exception =>
            log.debug("Could not stop writer", e)
        }        throw e
    }
  }

ResultTask的runTask函数代码：

  override def runTask(context: TaskContext): U = {    // Deserialize the RDD and the func using the broadcast variables.
    val ser = SparkEnv.get.closureSerializer.newInstance()    val (rdd, func) = ser.deserialize[(RDD[T], (TaskContext, Iterator[T]) => U)](      ByteBuffer.wrap(taskBinary.value), Thread.currentThread.getContextClassLoader)

    metrics = Some(context.taskMetrics)
    func(context, rdd.iterator(partition, context))
  }

4. Task状态更新

Task执行是通过TaskRunner来运行，它需要通过ExecutorBackend和Driver通信，通信消息是StatusUpdate：

Task运行之前，告诉Driver当前Task的状态为TaskState.RUNNING。

Task运行之后，告诉Driver当前Task的状态为TaskState.FINISHED，并返回计算结果。
如果Task运行过程中发生错误，告诉Driver当前Task的状态为TaskState.FAILED，并返回错误原因。
如果Task在中途被Kill掉了，告诉Driver当前Task的状态为TaskState.FAILED。

5. Task执行完毕

Task执行完毕，在TaskRunner的run函数中，通过statusUpdate通知ExecuteBackend，结果保存在DirectTaskResult中。
SchedulerBackend接收到StatusUpdate之后做如下判断：如果任务已经成功处理，则将其从监视列表中删除。如果整个作业中的所有任务都已经完成，则将占用的资源释放。
TaskSchedulerImpl将当前顺利完成的任务放入完成队列，同时取出下一个等待运行的Task。

下面CoarseGrainedSchedulerBackend是中处理StatusUpdate消息的代码：

      case StatusUpdate(executorId, taskId, state, data) =>        //statusUpdate函数处理处理从taskset删除已完成的task等工作
        scheduler.statusUpdate(taskId, state, data.value)        if (TaskState.isFinished(state)) {
          executorDataMap.get(executorId) match {            case Some(executorInfo) =>
              executorInfo.freeCores += scheduler.CPUS_PER_TASK
              makeOffers(executorId)            case None =>              // Ignoring the update since we don't know about the executor.
              logWarning(s"Ignored task status update ($taskId state $state) " +                "from unknown executor $sender with ID $executorId")
          }
        }

scheduler.statusUpdate函数进行如下步骤：

TaskScheduler通过TaskId找到管理这个Task的TaskSetManager（负责管理一批Task的类），从TaskSetManager里面删掉这个Task，并把Task插入到TaskResultGetter（负责获取Task结果的类）的成功队列里；

TaskResultGetter获取到结果之后，调用TaskScheduler的handleSuccessfulTask方法把结果返回；
TaskScheduler调用TaskSetManager的handleSuccessfulTask方法，处理成功的Task；
TaskSetManager调用DAGScheduler的taskEnded方法，告诉DAGScheduler这个Task运行结束了，如果这个时候Task全部成功了，就会结束TaskSetManager。

DAGScheduler在taskEnded方法里触发CompletionEvent事件，在处理CompletionEvent消息事件中调用DAGScheduler的handleTaskCompletion函数，针对ResultTask和ShuffleMapTask区别对待结果：
1）ResultTask：
job的numFinished加1，如果numFinished等于它的分片数，则表示任务该Stage结束，标记这个Stage为结束，最后调用JobListener（具体实现在JobWaiter）的taskSucceeded方法，把结果交给resultHandler（经过包装的自己写的那个匿名函数）处理，如果完成的Task数量等于总任务数，任务退出。
2）ShuffleMapTask：

调用Stage的addOutputLoc方法，把结果添加到Stage的outputLocs列表里

如果该Stage没有等待的Task了，就标记该Stage为结束
把Stage的outputLocs注册到MapOutputTracker里面，留个下一个Stage用
如果Stage的outputLocs为空，表示它的计算失败，重新提交Stage
找出下一个在等待并且没有父亲的Stage提交

转载请注明作者Jason Ding及其出处

作者：JasonDing
链接：https://www.jianshu.com/p/afd51966eac8

点击查看更多内容

为 TA 点赞

若觉得本文不错，就分享一下吧！

评论

评论

共同学习，写下你的评论

评论加载中...

展开查看更多评论

作者其他优质文章

正在加载中

慕虎7371278

手记
篇

粉丝

203

获赞与收藏

873

关注作者，订阅最新文章

相关文章推荐

【Spark Core】任务执行机制和Task源码浅析1

【Spark Core】TaskScheduler源码与任务提交原理浅析2

【Spark Core】TaskScheduler源码与任务提交原理浅析1

【Spark Core】从作业提交到任务调度完整生命周期浅析

Spark-Core源码精读(12)、Task的提交流程分析

阅读免费教程

后端通用面试教程

41个小节 30273 342

网络编程入门教程

20个小节 12461 235

Pandas 入门教程

25个小节 18362 330

推荐

评论

收藏

共同学习，写下你的评论



感谢您的支持，我会继续努力的～

扫码打赏，你说多少就多少

赞赏金额会直接到老师账户

支付方式

打开微信扫一扫，即可进行扫码打赏哦

今天注册有机会得

100积分直接送

付费专栏免费学

大额优惠券免费领

立即参与放弃机会

点击
抽奖

慕课手记新用户专享福利

恭喜你，你的运气太好了，居然抽中了 100个积分！

恭喜你，抽中了价值元的专栏！

太棒了，直接落到你账户里！

积分商城里的罗技鼠标、机械键盘、
Kindle 阅读器、小米平衡车
Apple iPad （10.2英寸）、大额优惠券
在等着你去兑换了噢

作者：

免费赠送

兑换码：1111222211 复制

优惠券可用于购买实战课、体系课
无门槛使用

先去看看，有什么好东西马上兑换我爱学习，选课去


热搜

最近搜索清空