A Brief Introduction of Kyuubi Architecture

Tags: Spark

Kyuubi Architecture

Kyuubi is an enhanced edition of Apache Spark's built-in Thrift JDBC/ODBC Server. It is mainly designed for running SQL directly against a cluster in which all components, including HDFS, YARN, Hive MetaStore, and Kyuubi itself, are secured. The main purpose of Kyuubi is to realize an architecture that not only speeds up SQL queries using the Spark SQL engine but also stays as compatible as possible with HiveServer2's behavior. Thus, Kyuubi uses the same protocol as HiveServer2, documented as the HiveServer2 Thrift API, for client-server communication, and a user-session-level mechanism for instantiating, registering, caching, and recycling SparkContexts to implement multi-tenancy.

kyuubi_architecture

Unified Interface

Because Kyuubi uses the same protocol as HiveServer2, it supports all kinds of JDBC/ODBC clients, as well as user applications written against this Thrift API, as shown in the picture above. Tom the cat can use various types of clients to create connections to the Kyuubi Server. Each connection is bound to a SparkSession instance, which also contains an independent HiveMetaStoreClient for interacting with the Hive MetaStore Server. Tom can set session-level configurations for each connection without the connections affecting one another.

Runtime Resource Resiliency

Kyuubi does not occupy any resources from the cluster manager (YARN) during startup, and it gives all resources back to YARN when no active session is interacting with a SparkContext. In addition, Spark Dynamic Resource Allocation lets us allocate resources dynamically within a SparkContext, a.k.a. a YARN application.

Kyuubi Dynamic Resource Requesting

Session Level Resource Configurations
Kyuubi supports setting all Spark/Hive/Hadoop configurations, such as spark.executor.cores and spark.executor.memory, in the connection string; these values are used to initialize the SparkContext.

Example

jdbc:hive2://<host>:<port>/;hive.server2.proxy.user=tom#spark.yarn.queue=theque;spark.executor.instances=3;spark.executor.cores=3;spark.executor.memory=10g
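
The layout of the example above (session variables after "/;", Spark configurations after "#") can be assembled programmatically. The following is a minimal sketch, not part of Kyuubi: the helper name build_kyuubi_url, the host kyuubi.example.com, and the port 10000 are placeholders chosen for illustration.

```python
def build_kyuubi_url(host, port, session_vars, spark_confs):
    """Assemble a hive2-style JDBC URL (hypothetical helper).

    session_vars are placed after '/;' and spark_confs after '#',
    mirroring the layout of the example connection string.
    """
    session_part = ";".join(f"{k}={v}" for k, v in session_vars.items())
    conf_part = ";".join(f"{k}={v}" for k, v in spark_confs.items())
    url = f"jdbc:hive2://{host}:{port}/;{session_part}"
    if conf_part:
        url += f"#{conf_part}"
    return url


# Placeholder host and port; the settings are the ones from the example.
url = build_kyuubi_url(
    "kyuubi.example.com",
    10000,
    {"hive.server2.proxy.user": "tom"},
    {
        "spark.yarn.queue": "theque",
        "spark.executor.instances": "3",
        "spark.executor.cores": "3",
        "spark.executor.memory": "10g",
    },
)
print(url)
```

Keeping the two groups in separate mappings makes it harder to put a Spark setting on the wrong side of the "#" by accident.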

Kyuubi Dynamic SparkContext Cache

Kyuubi implements a SparkSessionCacheManager to control the instantiating, registering, caching, reusing, and recycling of SparkSession/SparkContext instances. Each user has one and only one SparkContext instance in the Kyuubi Server, created when the user connects to the server for the first time. It is cached in the SparkSessionCacheManager for the whole connection lifetime and for a while after all of the user's connections have closed.

impersonation

All connections belonging to the same user share this SparkContext to generate their own SparkSessions.
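
The caching behavior described in this section (one shared context per user, kept for a while after the last connection closes) can be illustrated with a toy reference-counted cache. This is only a sketch of the idea and not Kyuubi's SparkSessionCacheManager: the class UserContextCache, its method names, and the 60-second timeout are all hypothetical, and a plain object stands in for a SparkContext.

```python
import time


class UserContextCache:
    """Toy per-user cache: one shared context per user, reference counted."""

    def __init__(self, factory, idle_timeout_s, clock=time.monotonic):
        self._factory = factory            # builds a context for a user name
        self._idle_timeout_s = idle_timeout_s
        self._clock = clock
        self._entries = {}                 # user -> [context, open_sessions, last_close_time]

    def open_session(self, user):
        """Return the user's context, creating it on the first connection."""
        entry = self._entries.get(user)
        if entry is None:
            entry = [self._factory(user), 0, self._clock()]
            self._entries[user] = entry
        entry[1] += 1
        return entry[0]

    def close_session(self, user):
        """Drop one reference and record when the user last disconnected."""
        entry = self._entries[user]
        entry[1] -= 1
        entry[2] = self._clock()

    def evict_idle(self):
        """Remove contexts with no open sessions that idled past the timeout."""
        now = self._clock()
        expired = [
            user
            for user, (_, sessions, last_close) in self._entries.items()
            if sessions == 0 and now - last_close >= self._idle_timeout_s
        ]
        for user in expired:
            del self._entries[user]
        return expired


# A fake clock keeps the demonstration deterministic.
fake_now = [0.0]
cache = UserContextCache(lambda user: object(), idle_timeout_s=60,
                         clock=lambda: fake_now[0])
first = cache.open_session("tom")
second = cache.open_session("tom")   # same user, same shared context
```

In this sketch a periodic call to evict_idle plays the role of the recycling step; a real server would also have to stop the underlying context when evicting it.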

Spark Dynamic Resource Allocation

Spark provides a mechanism to dynamically adjust the resources your application occupies based on the workload. It means that your application may give resources back to the cluster if they are no longer used and request them again later when there is demand. This feature is particularly useful if multiple applications share resources in your Spark cluster.

Please refer to Dynamic Resource Allocation to see more.

Please refer to Dynamic Allocation Configuration to learn how to configure.
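
As a rough illustration of what such a configuration can look like, the sketch below renders a handful of Spark's dynamic allocation properties as spark-defaults.conf lines. The property names are standard Spark settings, but the values are illustrative assumptions rather than recommendations, and the helper to_spark_defaults is hypothetical.

```python
# Illustrative values only; tune them for your own cluster.
dynamic_allocation_confs = {
    "spark.dynamicAllocation.enabled": "true",
    # An external shuffle service is the usual companion setting on YARN.
    "spark.shuffle.service.enabled": "true",
    "spark.dynamicAllocation.minExecutors": "0",
    "spark.dynamicAllocation.maxExecutors": "20",
    "spark.dynamicAllocation.executorIdleTimeout": "60s",
}


def to_spark_defaults(confs):
    """Render a mapping as whitespace-separated spark-defaults.conf lines."""
    return "\n".join(f"{key} {value}" for key, value in confs.items())


print(to_spark_defaults(dynamic_allocation_confs))
```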

With these features, Kyuubi allows us to use computing resources more efficiently.

Security

Authentication

Please refer to the Authentication/Security Guide in the online documentation for an overview on how to enable security for Kyuubi.

Authorization

Kyuubi can be integrated with Spark Authorizer to offer row/column-level access control. Kyuubi does not explicitly support the spark-authorizer plugin yet; for an example, you may refer to Spark Branch Authorized.

authorization

High Availability

Multiple Kyuubi Server instances can register themselves with ZooKeeper when spark.kyuubi.ha.enabled=true, and clients can then find a Kyuubi Server through ZooKeeper. When a client requests a server instance, ZooKeeper returns a randomly selected registered instance. This feature offers:

High Availability
Load Balancing
Rolling Upgrade

HA Configurations

Name	Default	Description
spark.kyuubi.ha.enabled	false	Whether KyuubiServer supports dynamic service discovery for its clients. To support this, each KyuubiServer instance currently registers itself with ZooKeeper when it is brought up. JDBC/ODBC clients should use the ZooKeeper ensemble, spark.kyuubi.ha.zk.quorum, in their connection string.
spark.kyuubi.ha.zk.quorum	none	Comma-separated list of ZooKeeper servers to talk to when KyuubiServer supports service discovery via ZooKeeper.
spark.kyuubi.ha.zk.namespace	kyuubiserver	The parent node in ZooKeeper used by KyuubiServer when supporting dynamic service discovery.
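
On the client side, a connection string for service discovery might be assembled as in the sketch below. It assumes that Kyuubi clients follow the HiveServer2 JDBC convention of serviceDiscoveryMode and zooKeeperNamespace parameters, which this article does not state explicitly; the helper build_ha_url and the ZooKeeper host names are hypothetical.

```python
def build_ha_url(zk_servers, namespace="kyuubiserver"):
    """Build a discovery-mode JDBC URL (hypothetical helper).

    zk_servers is a list of 'host:port' strings, i.e. the value of
    spark.kyuubi.ha.zk.quorum split on commas. The namespace default
    matches the default of spark.kyuubi.ha.zk.namespace.
    """
    quorum = ",".join(zk_servers)
    # Assumption: same parameter names as the HiveServer2 JDBC driver.
    return (
        f"jdbc:hive2://{quorum}/;"
        f"serviceDiscoveryMode=zooKeeper;zooKeeperNamespace={namespace}"
    )


ha_url = build_ha_url(["zk1:2181", "zk2:2181", "zk3:2181"])
print(ha_url)
```

Because the URL names the ZooKeeper ensemble instead of a single server, clients do not need to change their connection string when individual Kyuubi Server instances are added or taken down.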

Kyuubi Internal

Kyuubi's internals are simple to understand, as shown in the picture below. We may talk about them in more detail later.

kyuubi_internal

Author: 风景不美
Link: https://www.jianshu.com/p/b046a623f038
