Spark Streaming Dynamic Resource Allocation documentation (unofficial feature)

Required configuration

Enable DRA (Dynamic Resource Allocation) with the following parameter:

spark.streaming.dynamicAllocation.enabled=true

Set the minimum and maximum number of executors:

spark.streaming.dynamicAllocation.minExecutors=0
spark.streaming.dynamicAllocation.maxExecutors=50
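
If you would rather set these keys in code than on the command line, a minimal sketch could look like the following. The keys and values are the ones listed above; the object name DraConfExample and the application name are placeholders chosen for illustration, and the settings only have an effect on a Spark build that includes this unofficial feature.

```scala
import org.apache.spark.SparkConf

object DraConfExample {
  // Builds a SparkConf carrying the three required DRA settings.
  // "dra-example" is a placeholder application name.
  def conf(): SparkConf = new SparkConf()
    .setAppName("dra-example")
    .set("spark.streaming.dynamicAllocation.enabled", "true")
    .set("spark.streaming.dynamicAllocation.minExecutors", "0")
    .set("spark.streaming.dynamicAllocation.maxExecutors", "50")
}
```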

Optional configuration

These parameters do not have to be set; each one already has a reasonable default.

Enable debug logging:

spark.streaming.dynamicAllocation.debug=true

Set the delay, in rounds, before DRA takes effect:

spark.streaming.dynamicAllocation.delay.rounds=10

Set the number of batch periods DRA looks back at when computing the required resources:

spark.streaming.dynamicAllocation.rememberBatchSize=1

Set the pace at which DRA releases resources:

spark.streaming.dynamicAllocation.releaseRounds=5

Set the proportion of extra resources DRA keeps in reserve:

spark.streaming.dynamicAllocation.reserveRate=0.2

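DRA algorithm

When reducing resources, DRA uses a heuristic. Based on the processing times of the previous batch periods, it computes the amount of resources that must be kept (A), then tentatively releases the surplus over several rounds (B). Every computation period repeats steps A and B, so the executor count eventually converges to a specific value.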

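As soon as a delay occurs, DRA immediately requests spark.streaming.dynamicAllocation.maxExecutors executors from YARN so that the delay is eliminated as quickly as possible. The surplus is then released gradually by the reduction procedure, letting the application settle into a stable state.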

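When resources are reduced, the removed executors are blocked immediately (within a few milliseconds or even nanoseconds) and receive no further tasks; YARN then removes them asynchronously.

When resources are added, the timing is decided by YARN.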
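
The behaviour described in this section can be sketched as a pure function. This is not the actual implementation: the names DraSketch, DraConf, executorsToKeep and nextTarget are hypothetical, and the formula used for step A (scale the current executor count by the busiest remembered batch's share of the batch interval, then add the reserve) is an assumption, since the text does not give one. The list of processing times stands in for the batches remembered via rememberBatchSize.

```scala
object DraSketch {
  // Mirrors the settings named in the text; defaults are the values shown there.
  final case class DraConf(
      minExecutors: Int = 0,
      maxExecutors: Int = 50,
      releaseRounds: Int = 5,
      reserveRate: Double = 0.2)

  /** Step A (assumed formula): executors to keep, including the reserve. */
  def executorsToKeep(current: Int, processingMs: Seq[Long], batchMs: Long, conf: DraConf): Int = {
    val busiest = processingMs.max.toDouble
    val needed = math.ceil(current * (busiest / batchMs) * (1 + conf.reserveRate)).toInt
    math.min(conf.maxExecutors, math.max(conf.minExecutors, needed))
  }

  /** One computation period: returns the executor count to aim for next. */
  def nextTarget(current: Int, processingMs: Seq[Long], batchMs: Long, conf: DraConf): Int = {
    if (processingMs.isEmpty) {
      current // nothing observed yet, leave the allocation alone
    } else if (processingMs.exists(_ > batchMs)) {
      conf.maxExecutors // a batch ran longer than the interval: jump to the maximum
    } else {
      val keep = executorsToKeep(current, processingMs, batchMs, conf)
      if (keep >= current) current
      else {
        // Step B: release only a slice of the surplus in each round.
        val step = math.max(1, math.ceil((current - keep).toDouble / conf.releaseRounds).toInt)
        math.max(keep, current - step)
      }
    }
  }
}
```

Calling nextTarget once per batch period with the latest processing times gives the jump-up-then-step-down pattern described above: a delayed batch returns maxExecutors at once, while a comfortable batch lowers the target by a fraction of the surplus.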

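Notes

Make sure the application package you build does not contain any Spark components. Otherwise your settings will appear to have no effect, because at runtime the spark-core and spark-streaming jars bundled in your application are used instead.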
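
One common way to keep Spark out of the application package, assuming the project is built with sbt and packaged as a fat jar, is to mark the Spark dependencies as "provided". The version number below is a placeholder; match it to the Spark build running on your cluster.

```scala
// build.sbt (sketch): "provided" dependencies are available at compile time
// but are left out of the assembled application jar.
val sparkVersion = "2.1.0" // placeholder, not taken from the article

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"      % sparkVersion % "provided",
  "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided"
)
```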

Some tuning suggestions

If, once the system has stabilized, manual observation shows that resources could be reduced further, try lowering:

spark.streaming.dynamicAllocation.releaseRounds=5
spark.streaming.dynamicAllocation.reserveRate=0.2

Keep releaseRounds at 2 or above and reserveRate at 0.05 or above to avoid thrashing.
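
If the tuning values come from user input or a deployment script, a small guard can keep them at or above these floors. The helper below is a hypothetical sketch (DraTuning and clamp are names made up for illustration); only the two keys and the two floor values come from the text.

```scala
object DraTuning {
  // Recommended lower bounds from the tuning advice.
  val MinReleaseRounds = 2
  val MinReserveRate = 0.05

  /** Returns both settings, raised to the recommended floors where needed. */
  def clamp(releaseRounds: Int, reserveRate: Double): Map[String, String] = Map(
    "spark.streaming.dynamicAllocation.releaseRounds" ->
      math.max(MinReleaseRounds, releaseRounds).toString,
    "spark.streaming.dynamicAllocation.reserveRate" ->
      math.max(MinReserveRate, reserveRate).toString
  )
}
```

The resulting map can be passed to SparkConf.setAll or turned into --conf arguments.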

Test code

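import org.apache.spark.SparkConf
// TestInputStream is the helper class listed after this object. It reads
// private[streaming] members, so it has to be compiled inside the
// org.apache.spark.streaming package.
import org.apache.spark.streaming.{Seconds, StreamingContext, TestInputStream}

object IamGod {
  def main(args: Array[String]): Unit = {

    def createContext: StreamingContext = {
      val conf = new SparkConf().setAppName("DRA Test")
      val ssc = new StreamingContext(conf, Seconds(30))

      // Three phases of 30 batches each. Every batch holds a single sleep time
      // in milliseconds: 10-19 s, then 30-39 s (at or above the 30 s batch
      // interval, which causes delay), then 20-29 s.
      val items1 = Seq.fill(30)(Seq((10 + scala.util.Random.nextInt(10)) * 1000))
      val items2 = Seq.fill(30)(Seq((30 + scala.util.Random.nextInt(10)) * 1000))
      val items3 = Seq.fill(30)(Seq((20 + scala.util.Random.nextInt(10)) * 1000))

      val fileInput = new TestInputStream[Int](ssc, items1 ++ items2 ++ items3, 10)

      // Simulate the processing cost by sleeping for the given time.
      val logs = fileInput.map(f => Thread.sleep(f.toLong))

      logs.foreachRDD { rdd =>
        // count() is an action, so it forces the map (and the sleep) to run.
        rdd.count()
      }

      ssc
    }

    val ssc = createContext

    ssc.start()
    ssc.awaitTermination()

  }

}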

The code above uses the following test class:

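// Must live in this package: the class reads private[streaming] members
// (zeroTime, ssc, ssc.scheduler, ssc.sc).
package org.apache.spark.streaming

import scala.reflect.ClassTag

import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.dstream.InputDStream
import org.apache.spark.streaming.scheduler.StreamInputInfo

class TestInputStream[T: ClassTag](_ssc: StreamingContext, input: Seq[Seq[T]], numPartitions: Int)
  extends InputDStream[T](_ssc) {

  override def start(): Unit = {}

  override def stop(): Unit = {}

  override def compute(validTime: Time): Option[RDD[T]] = {
    logInfo("Computing RDD for time " + validTime)
    // Pick the input batch that corresponds to this batch time.
    val index = ((validTime - zeroTime) / slideDuration - 1).toInt
    val selectedInput = if (index < input.size) input(index) else Seq[T]()

    // lets us test cases where RDDs are not created
    if (selectedInput == null) {
      return None
    }

    // Report the input data's information to InputInfoTracker for testing
    val inputInfo = StreamInputInfo(id, selectedInput.length.toLong)
    ssc.scheduler.inputInfoTracker.reportInfo(validTime, inputInfo)

    val rdd = ssc.sc.makeRDD(selectedInput, numPartitions)
    logInfo("Created RDD " + rdd.id + " with " + selectedInput)
    Some(rdd)
  }
}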

Author: 祝威廉
Link: https://www.jianshu.com/p/9c6cdf429b52
