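In Spark Streaming, fault tolerance on the Driver side mainly comes down to the fault tolerance of three components: ReceiverTracker, Dstream.graph, and JobGenerator.
First, ReceiverTracker. Its fault tolerance consists mainly of writing the block metadata it receives to the WAL (write-ahead log). Look at the addBlock method (the code below is ReceivedBlockTracker.addBlock, which ReceiverTracker delegates to):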
def addBlock(receivedBlockInfo: ReceivedBlockInfo): Boolean = {
  try {
    val writeResult = writeToLog(BlockAdditionEvent(receivedBlockInfo))
    if (writeResult) {
      synchronized {
        getReceivedBlockQueue(receivedBlockInfo.streamId) += receivedBlockInfo
      }
      logDebug(s"Stream ${receivedBlockInfo.streamId} received " +
        s"block ${receivedBlockInfo.blockStoreResult.blockId}")
    } else {
      logDebug(s"Failed to acknowledge stream ${receivedBlockInfo.streamId} receiving " +
        s"block ${receivedBlockInfo.blockStoreResult.blockId} in the Write Ahead Log.")
    }
    writeResult
  } catch {
    case NonFatal(e) =>
      logError(s"Error adding block $receivedBlockInfo", e)
      false
  }
}
The writeToLog method is what performs the WAL write. Here is its code:
private def writeToLog(record: ReceivedBlockTrackerLogEvent): Boolean = {
  if (isWriteAheadLogEnabled) {
    logTrace(s"Writing record: $record")
    try {
      writeAheadLogOption.get.write(ByteBuffer.wrap(Utils.serialize(record)),
        clock.getTimeMillis())
      true
    } catch {
      case NonFatal(e) =>
        logWarning(s"Exception thrown while writing record: $record to the WriteAheadLog.", e)
        false
    }
  } else {
    true
  }
}
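It first checks whether the WAL is enabled, using the value of isWriteAheadLogEnabled. When the WAL is disabled, writeToLog simply returns true, so callers proceed as if the write had succeeded: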
private[streaming] def isWriteAheadLogEnabled: Boolean = writeAheadLogOption.nonEmpty
Next, look at writeAheadLogOption:
private val writeAheadLogOption = createWriteAheadLog()
Then the createWriteAheadLog() method:
private def createWriteAheadLog(): Option[WriteAheadLog] = {
  checkpointDirOption.map { checkpointDir =>
    val logDir = ReceivedBlockTracker.checkpointDirToLogDir(checkpointDirOption.get)
    WriteAheadLogUtils.createLogForDriver(conf, logDir, hadoopConf)
  }
}
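The WAL directory is derived from the configured checkpoint directory through ReceivedBlockTracker.checkpointDirToLogDir. Because checkpointDirOption is an Option, there is at most one checkpoint directory. The driver-side WAL is created only when a checkpoint directory has been set (for example with ssc.checkpoint(...)). Otherwise writeAheadLogOption is None and the WAL is disabled.
Only after the WAL write succeeds is receivedBlockInfo added to the in-memory queue returned by getReceivedBlockQueue. If the write fails, addBlock returns false and the block is not tracked.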
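The gating and ordering described above can be modelled outside Spark. This is a minimal, hypothetical sketch. SimpleWal, InMemoryWal, FailingWal and BlockTrackerSketch are made-up names, not Spark APIs, and records are plain strings instead of serialized ReceivedBlockTrackerLogEvents. It shows two things: a missing WAL (None) counts as a successful write, and a failed write leaves the in-memory queue untouched.

```scala
import scala.collection.mutable
import scala.util.control.NonFatal

// Hypothetical minimal WAL interface; not Spark's WriteAheadLog API.
trait SimpleWal {
  def write(record: String): Unit
}

// Keeps records in memory so that the write order can be inspected.
class InMemoryWal extends SimpleWal {
  val records: mutable.ArrayBuffer[String] = mutable.ArrayBuffer.empty[String]
  def write(record: String): Unit = records += record
}

// Simulates a WAL whose writes always fail (for example, storage unavailable).
class FailingWal extends SimpleWal {
  def write(record: String): Unit = throw new java.io.IOException("WAL unavailable")
}

// Hypothetical stand-in for the addBlock path of ReceivedBlockTracker.
class BlockTrackerSketch(walOption: Option[SimpleWal]) {
  private val queues = mutable.Map.empty[Int, mutable.Queue[String]]

  // Like isWriteAheadLogEnabled: enabled exactly when a WAL instance exists.
  def isWalEnabled: Boolean = walOption.nonEmpty

  def queue(streamId: Int): mutable.Queue[String] =
    queues.getOrElseUpdate(streamId, mutable.Queue.empty[String])

  // Like writeToLog: true when the WAL is disabled, false if the write throws.
  private def writeToLog(record: String): Boolean = walOption match {
    case Some(wal) =>
      try { wal.write(record); true }
      catch { case NonFatal(_) => false }
    case None => true
  }

  // Like addBlock: log first, then update the in-memory queue only on success.
  def addBlock(streamId: Int, blockId: String): Boolean = {
    val written = writeToLog(s"BlockAddition($streamId,$blockId)")
    if (written) synchronized { queue(streamId) += blockId }
    written
  }
}
```

Logging before mutating memory means every block the tracker holds in its queues has already been written to the log. A restarted driver can therefore rebuild the same queues by reading the log back.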
Second, look at the allocateBlocksToBatch method of ReceivedBlockTracker:
def allocateBlocksToBatch(batchTime: Time): Unit = synchronized {
  if (lastAllocatedBatchTime == null || batchTime > lastAllocatedBatchTime) {
    val streamIdToBlocks = streamIds.map { streamId =>
      (streamId, getReceivedBlockQueue(streamId).dequeueAll(x => true))
    }.toMap
    val allocatedBlocks = AllocatedBlocks(streamIdToBlocks)
    if (writeToLog(BatchAllocationEvent(batchTime, allocatedBlocks))) {
      timeToAllocatedBlocks.put(batchTime, allocatedBlocks)
      lastAllocatedBatchTime = batchTime
    } else {
      logInfo(s"Possibly processed batch $batchTime need to be processed again in WAL recovery")
    }
  } else {
    // This situation occurs when:
    // 1. WAL is ended with BatchAllocationEvent, but without BatchCleanupEvent,
    // possibly processed batch job or half-processed batch job need to be processed again,
    // so the batchTime will be equal to lastAllocatedBatchTime.
    // 2. Slow checkpointing makes recovered batch time older than WAL recovered
    // lastAllocatedBatchTime.
    // This situation will only occurs in recovery time.
    logInfo(s"Possibly processed batch $batchTime need to be processed again in WAL recovery")
  }
}
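For each stream (one per receiver), it fetches that stream's queue via getReceivedBlockQueue and drains it completely with dequeueAll. The results are collected into streamIdToBlocks, which is then wrapped: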
val allocatedBlocks = AllocatedBlocks(streamIdToBlocks)
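allocatedBlocks is the set of block metadata collected for a given batch time. It is handed to the job of the corresponding batch interval (batchDuration), which uses it when it runs. The allocation is written to the WAL before it is used. If the driver fails, recovery can then tell how far the data has been processed:
val allocatedBlocks = AllocatedBlocks(streamIdToBlocks)
if (writeToLog(BatchAllocationEvent(batchTime, allocatedBlocks))) {
  timeToAllocatedBlocks.put(batchTime, allocatedBlocks)
  lastAllocatedBatchTime = batchTime
} else {
  logInfo(s"Possibly processed batch $batchTime need to be processed again in WAL recovery")
}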
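The allocation step can be modelled as below, including the guard that skips batch times at or before lastAllocatedBatchTime. This is a simplified, hypothetical sketch. AllocatorSketch is not a Spark class, batch times are plain millisecond Longs, and the WAL is an in-memory buffer whose writes always succeed. In Spark, the allocation is recorded only if writeToLog returns true.

```scala
import scala.collection.mutable

// Hypothetical, simplified model of allocateBlocksToBatch (not a Spark class).
class AllocatorSketch(streamIds: Seq[Int]) {
  // Stand-in for the WAL; appends here never fail.
  val log: mutable.ArrayBuffer[String] = mutable.ArrayBuffer.empty[String]
  val queues: Map[Int, mutable.Queue[String]] =
    streamIds.map(id => id -> mutable.Queue.empty[String]).toMap
  val timeToAllocatedBlocks: mutable.Map[Long, Map[Int, Seq[String]]] =
    mutable.Map.empty[Long, Map[Int, Seq[String]]]
  private var lastAllocatedBatchTime: Option[Long] = None

  /** Returns true if a new allocation was made for batchTime. */
  def allocate(batchTime: Long): Boolean = synchronized {
    if (lastAllocatedBatchTime.forall(batchTime > _)) {
      // Drain every stream's queue: these blocks now belong to this batch.
      val allocated: Map[Int, Seq[String]] =
        streamIds.map(id => id -> queues(id).dequeueAll(_ => true).toList).toMap
      // Write-ahead step first, then update the in-memory state.
      log += s"BatchAllocation($batchTime)"
      timeToAllocatedBlocks.put(batchTime, allocated)
      lastAllocatedBatchTime = Some(batchTime)
      true
    } else {
      // Same or older batch time: per the comment in the source, this only
      // happens during recovery. No new allocation is made here.
      false
    }
  }
}
```

Because each allocation drains the queues, a block is assigned to at most one batch. The time guard makes a repeated call for the same batch time a no-op.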
Third, look at the cleanupOldBatches method. It removes the metadata of batches that are no longer needed from memory and then deletes the corresponding WAL data. Before deleting anything, it first writes the list of batches to be removed to the WAL as well:
def cleanupOldBatches(cleanupThreshTime: Time, waitForCompletion: Boolean): Unit = synchronized {
  require(cleanupThreshTime.milliseconds < clock.getTimeMillis())
  val timesToCleanup = timeToAllocatedBlocks.keys.filter { _ < cleanupThreshTime }.toSeq
  logInfo("Deleting batches " + timesToCleanup)
  if (writeToLog(BatchCleanupEvent(timesToCleanup))) {
    timeToAllocatedBlocks --= timesToCleanup
    writeAheadLogOption.foreach(_.clean(cleanupThreshTime.milliseconds, waitForCompletion))
  } else {
    logWarning("Failed to acknowledge batch clean up in the Write Ahead Log.")
  }
}
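The cleanup order can be sketched as follows: log the BatchCleanupEvent first, then drop the in-memory metadata. CleanupSketch is an illustrative, hypothetical name and times are millisecond Longs. Deleting the WAL files themselves is only indicated by a comment.

```scala
import scala.collection.mutable

// Hypothetical, simplified model of cleanupOldBatches (not a Spark class).
class CleanupSketch(initial: Map[Long, Seq[String]]) {
  // Stand-in for the WAL; appends here never fail.
  val log: mutable.ArrayBuffer[String] = mutable.ArrayBuffer.empty[String]
  val timeToAllocatedBlocks: mutable.Map[Long, Seq[String]] =
    mutable.Map.empty[Long, Seq[String]]
  timeToAllocatedBlocks ++= initial

  /** Removes batches strictly older than threshTime and returns their times. */
  def cleanupOldBatches(threshTime: Long): Seq[Long] = synchronized {
    val timesToCleanup = timeToAllocatedBlocks.keys.filter(_ < threshTime).toList.sorted
    // 1. Record the cleanup in the WAL before touching any state.
    log += s"BatchCleanup(${timesToCleanup.mkString(",")})"
    // 2. Drop the metadata from memory.
    timeToAllocatedBlocks --= timesToCleanup
    // 3. Spark would now call clean(...) on the WAL to delete old log data (omitted here).
    timesToCleanup
  }
}
```

The cleanup event is written first, so a driver replaying the WAL after a crash also replays the deletion. It does not resurrect batches that had already been cleaned up.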
To summarize, the three WAL writes above correspond to the three events below. Together they make up the fault tolerance of ReceiverTracker:
/** Trait representing any event in the ReceivedBlockTracker that updates its state. */
private[streaming] sealed trait ReceivedBlockTrackerLogEvent
private[streaming] case class BlockAdditionEvent(receivedBlockInfo: ReceivedBlockInfo)
  extends ReceivedBlockTrackerLogEvent
private[streaming] case class BatchAllocationEvent(time: Time, allocatedBlocks: AllocatedBlocks)
  extends ReceivedBlockTrackerLogEvent
private[streaming] case class BatchCleanupEvent(times: Seq[Time])
  extends ReceivedBlockTrackerLogEvent
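Every state change is one of these three events. After a restart, the tracker's state is therefore the result of replaying the logged events in order. The sketch below illustrates that idea with simplified mirrors of the event types. LogEvent, BlockAddition, BatchAllocation, BatchCleanup and TrackerState are hypothetical names, and this is not Spark's actual recovery code.

```scala
// Hypothetical, simplified mirrors of the three event types.
sealed trait LogEvent
final case class BlockAddition(streamId: Int, blockId: String) extends LogEvent
final case class BatchAllocation(time: Long, blocks: Map[Int, Seq[String]]) extends LogEvent
final case class BatchCleanup(times: Seq[Long]) extends LogEvent

// State to rebuild: unallocated blocks per stream, and batch time -> allocated blocks.
final case class TrackerState(
    unallocated: Map[Int, Vector[String]],
    allocated: Map[Long, Map[Int, Seq[String]]])

object ReplaySketch {
  val empty: TrackerState = TrackerState(Map.empty, Map.empty)

  def applyEvent(s: TrackerState, e: LogEvent): TrackerState = e match {
    case BlockAddition(id, block) =>
      val q = s.unallocated.getOrElse(id, Vector.empty[String])
      s.copy(unallocated = s.unallocated.updated(id, q :+ block))
    case BatchAllocation(time, blocks) =>
      // Allocation drains whole queues, so the allocated streams' queues become empty.
      val remaining = s.unallocated.map { case (id, q) =>
        id -> (if (blocks.contains(id)) Vector.empty[String] else q)
      }
      s.copy(unallocated = remaining, allocated = s.allocated + (time -> blocks))
    case BatchCleanup(times) =>
      s.copy(allocated = s.allocated -- times)
  }

  // Replay the log from the beginning to reconstruct the state.
  def replay(events: Seq[LogEvent]): TrackerState =
    events.foldLeft(empty)((s, e) => applyEvent(s, e))
}
```

A sealed trait lets the compiler check that replay handles every event type.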
Next, the fault tolerance of Dstream.graph and JobGenerator, starting with the generateJobs method of JobGenerator:
private def generateJobs(time: Time) {
  // ... SparkEnv has been removed.
  SparkEnv.set(ssc.env)
  Try {
    // allocate received blocks to batch
    jobScheduler.receiverTracker.allocateBlocksToBatch(time)
    graph.generateJobs(time) // generate jobs using allocated block
  } match {
    case Success(jobs) =>
      // Get the input metadata for this batch
      val streamIdToInputInfos = jobScheduler.inputInfoTracker.getInfo(time)
      // Submit the JobSet
      jobScheduler.submitJobSet(JobSet(time, jobs, streamIdToInputInfos))
    case Failure(e) =>
      jobScheduler.reportError("Error generating jobs for time " + time, e)
  }
  eventLoop.post(DoCheckpoint(time, clearCheckpointDataLater = false))
}
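The control flow of generateJobs can be reduced to the following hypothetical sketch. GenerateJobsSketch and its string event buffer stand in for JobGenerator and its eventLoop. DoCheckpoint is posted after the Try/match, so it is sent whether job generation succeeded or failed.

```scala
import scala.collection.mutable
import scala.util.{Failure, Success, Try}

// Hypothetical stand-in for JobGenerator.generateJobs; `events` replaces the eventLoop.
object GenerateJobsSketch {
  val events: mutable.ArrayBuffer[String] = mutable.ArrayBuffer.empty[String]

  def generateJobs(time: Long, generate: Long => Seq[String]): Unit = {
    Try(generate(time)) match {
      case Success(jobs) => events += s"SubmitJobSet($time,${jobs.size})"
      case Failure(e)    => events += s"ReportError($time,${e.getMessage})"
    }
    // Posted after the match, so a checkpoint is requested on success and on failure.
    events += s"DoCheckpoint($time)"
  }
}
```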
After the jobs are generated, a DoCheckpoint message is posted regardless of whether generation succeeded. The message eventually invokes the doCheckpoint method:
private def doCheckpoint(time: Time, clearCheckpointDataLater: Boolean) {
  if (shouldCheckpoint && (time - graph.zeroTime).isMultipleOf(ssc.checkpointDuration)) {
    logInfo("Checkpointing graph for time " + time)
    ssc.graph.updateCheckpointData(time)
    checkpointWriter.write(new Checkpoint(ssc, time), clearCheckpointDataLater)
  }
}
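doCheckpoint is invoked for every generated batch. It only writes a checkpoint when the time elapsed since zeroTime is a multiple of the checkpoint interval. The arithmetic is sketched below with plain millisecond Longs. The 2 s batch interval and 10 s checkpoint interval are assumed values for illustration. shouldCheckpointAt is a hypothetical helper, not Spark's Time.isMultipleOf.

```scala
object CheckpointIntervalSketch {
  // Mirrors the doCheckpoint condition (time - zeroTime).isMultipleOf(checkpointDuration)
  // using plain millisecond values.
  def shouldCheckpointAt(timeMs: Long, zeroTimeMs: Long, checkpointDurationMs: Long): Boolean =
    (timeMs - zeroTimeMs) % checkpointDurationMs == 0

  def main(args: Array[String]): Unit = {
    val zero = 0L
    val batch = 2000L       // assumed batch interval: 2 s
    val checkpoint = 10000L // assumed checkpoint interval: 10 s
    val batchTimes = (1 to 10).map(i => zero + i * batch)
    val checkpointed = batchTimes.filter(t => shouldCheckpointAt(t, zero, checkpoint))
    println(checkpointed) // Vector(10000, 20000)
  }
}
```

With these values, only every fifth batch triggers updateCheckpointData and checkpointWriter.write. The other batches post DoCheckpoint but skip the write.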
What updateCheckpointData and checkpointWriter.write actually do will be covered in a follow-up post.
Author: 海纳百川_spark
Link: https://www.jianshu.com/p/5397bc160c6b