The startup flow of the Spark Master and Worker has been described before. In both cases, the first step of the main method is to build an RpcEnv, which is the core of Spark's message passing. This post analyzes Spark RPC in detail.
First, look at the similar code that Master and Worker use to build the RpcEnv:
Master:
val rpcEnv = RpcEnv.create(SYSTEM_NAME, host, port, conf, securityMgr)
val masterEndpoint = rpcEnv.setupEndpoint(ENDPOINT_NAME,
  new Master(rpcEnv, rpcEnv.address, webUiPort, securityMgr, conf))
// masterEndpoint is the endpoint ref; this call sends a message to the corresponding RpcEndpoint, which here is the Master
val portsResponse = masterEndpoint.askWithRetry[BoundPortsResponse](BoundPortsRequest)
Worker:
val rpcEnv = RpcEnv.create(systemName, host, port, conf, securityMgr)
val masterAddresses = masterUrls.map(RpcAddress.fromSparkURL(_))
rpcEnv.setupEndpoint(ENDPOINT_NAME, new Worker(rpcEnv, webUiPort, cores, memory,
  masterAddresses, ENDPOINT_NAME, workDir, conf, securityMgr))
Two steps stand out here: RpcEnv.create and rpcEnv.setupEndpoint. The sections below analyze each in turn.
RpcEnv.create
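The flow of RpcEnv.create is roughly as follows:
[Figure: flow chart of RpcEnv.create]
At the bottom, it starts a Netty server and opens the Netty side of the communication (server = transportContext.createServer(host, port, bootstraps)).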
rpcEnv.setupEndpoint
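All Spark messages are in fact handled by the RpcEnv, which dispatches each one to the corresponding endpoint. An RpcEndpointRef is a reference to an RpcEndpoint: to send a message to an RpcEndpoint, you first need to obtain its RpcEndpointRef.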
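To make the endpoint/ref split concrete, here is a minimal sketch. It is not Spark's implementation: ToyEnv, ToyEndpoint, ToyEndpointRef and Recorder are hypothetical names invented for illustration, and delivery is synchronous and in-process. It only shows that callers hold a ref, and that the env looks up the real endpoint by name.

```scala
import scala.collection.mutable

object EndpointRefSketch {
  // Hypothetical stand-in for RpcEndpoint: it only receives messages.
  trait ToyEndpoint {
    def receive(message: Any): Unit
  }

  // Hypothetical stand-in for RpcEnv: owns the endpoints and routes messages by name.
  final class ToyEnv {
    private val endpoints = mutable.Map.empty[String, ToyEndpoint]

    // Registers the endpoint and hands back a ref; callers never touch the endpoint itself.
    def setupEndpoint(name: String, endpoint: ToyEndpoint): ToyEndpointRef = {
      require(!endpoints.contains(name), s"There is already an endpoint called $name")
      endpoints(name) = endpoint
      new ToyEndpointRef(name, this)
    }

    private[EndpointRefSketch] def deliver(name: String, message: Any): Unit =
      endpoints(name).receive(message)
  }

  // Hypothetical stand-in for RpcEndpointRef: a handle that forwards to the env.
  final class ToyEndpointRef(val name: String, env: ToyEnv) {
    def send(message: Any): Unit = env.deliver(name, message)
  }

  // A sample endpoint that records what it received.
  final class Recorder extends ToyEndpoint {
    val received = mutable.ArrayBuffer.empty[Any]
    def receive(message: Any): Unit = { received += message; () }
  }
}
```

Sending through the ref returned by setupEndpoint ends up in the Recorder registered under that name, which is the same indirection the real RpcEndpointRef provides across processes.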
Taking Master as an example:
val masterEndpoint = rpcEnv.setupEndpoint(ENDPOINT_NAME, new Master(rpcEnv, rpcEnv.address, webUiPort, securityMgr, conf))
The new Master instance is an RpcEndpoint. The call goes to the setupEndpoint method of the NettyRpcEnv class:
dispatcher.registerRpcEndpoint(name, endpoint)
From there it goes to the registerRpcEndpoint method of the Dispatcher class:
def registerRpcEndpoint(name: String, endpoint: RpcEndpoint): NettyRpcEndpointRef = {
  // The Dispatcher holds the NettyRpcEnv, so the address is available as nettyEnv.address.
  // nettyEnv.address is the address (host and port) on which this NettyRpcEnv was started.
  val addr = RpcEndpointAddress(nettyEnv.address, name)
  // Create the endpointRef. Here it is the RpcEndpointRef for the Master; it is actually a NettyRpcEndpointRef.
  val endpointRef = new NettyRpcEndpointRef(nettyEnv.conf, addr, nettyEnv)
  synchronized {
    if (stopped) {
      throw new IllegalStateException("RpcEnv has been stopped")
    }
    // Check whether endpoints already has an EndpointData under this name; if not, add it.
    if (endpoints.putIfAbsent(name, new EndpointData(name, endpoint, endpointRef)) != null) {
      throw new IllegalArgumentException(s"There is already an RpcEndpoint called $name")
    }
    val data = endpoints.get(name)
    // Record the endpoint-to-ref mapping in endpointRefs.
    endpointRefs.put(data.endpoint, data.ref)
    // Put data on the receivers queue, where the thread pool picks it up and processes its messages.
    receivers.offer(data) // for the OnStart message
  }
  endpointRef
}
Several fields of Dispatcher are important:
// endpoints is a thread-safe ConcurrentMap; the key is the name and the value is the EndpointData
private val endpoints: ConcurrentMap[String, EndpointData] =
  new ConcurrentHashMap[String, EndpointData]
// endpointRefs holds the one-to-one mapping from RpcEndpoint to RpcEndpointRef
private val endpointRefs: ConcurrentMap[RpcEndpoint, RpcEndpointRef] =
  new ConcurrentHashMap[RpcEndpoint, RpcEndpointRef]
// Track the receivers whose inboxes may contain messages.
// receivers is a queue; the Dispatcher's thread pool consumes the entries in receivers
private val receivers = new LinkedBlockingQueue[EndpointData]
EndpointData consists of a name, an RpcEndpoint and a NettyRpcEndpointRef, and it instantiates an Inbox. When the Inbox is constructed, it puts OnStart on its messages queue as the first message. This is why onStart() runs right after the RpcEndpoint constructor has finished.
private class EndpointData(
    val name: String,
    val endpoint: RpcEndpoint,
    val ref: NettyRpcEndpointRef) {
  val inbox = new Inbox(ref, endpoint)
}
NettyRpcEnv has two methods for serialization and deserialization, because its messages have to travel over the network:
private[netty] def serialize(content: Any): ByteBuffer = {
  javaSerializerInstance.serialize(content)
}
private[netty] def deserialize[T: ClassTag](client: TransportClient, bytes: ByteBuffer): T = {
  NettyRpcEnv.currentClient.withValue(client) {
    deserialize { () =>
      javaSerializerInstance.deserialize[T](bytes)
    }
  }
}
The Worker process starts the same way: setupEndpoint creates the mapping between the Worker and its NettyRpcEndpointRef.
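The registration path above (putIfAbsent on the name, an inbox whose first message is OnStart, and an offer onto the receivers queue) can be sketched in isolation. This is a simplified model under stated assumptions, not the Dispatcher itself: ToyDispatcher, ToyEndpointData, ToyInbox and Payload are hypothetical names, and the endpoint and ref objects are left out.

```scala
import java.util.concurrent.{ConcurrentHashMap, LinkedBlockingQueue}
import scala.collection.mutable

object RegistrySketch {
  sealed trait ToyMessage
  case object OnStart extends ToyMessage
  final case class Payload(content: Any) extends ToyMessage

  // Each inbox starts with OnStart queued, so it is seen before any other message.
  final class ToyInbox {
    private val messages = mutable.Queue[ToyMessage](OnStart)
    def post(m: ToyMessage): Unit = synchronized { messages.enqueue(m); () }
    def poll(): Option[ToyMessage] = synchronized {
      if (messages.isEmpty) None else Some(messages.dequeue())
    }
  }

  // Hypothetical counterpart of EndpointData, reduced to a name and an inbox.
  final class ToyEndpointData(val name: String) {
    val inbox = new ToyInbox
  }

  final class ToyDispatcher {
    private val endpoints = new ConcurrentHashMap[String, ToyEndpointData]
    val receivers = new LinkedBlockingQueue[ToyEndpointData]

    def register(name: String): ToyEndpointData = {
      val data = new ToyEndpointData(name)
      // putIfAbsent returns the previous value, so non-null means the name is taken.
      if (endpoints.putIfAbsent(name, data) != null) {
        throw new IllegalArgumentException(s"There is already an RpcEndpoint called $name")
      }
      receivers.offer(data) // schedule the inbox so that OnStart gets processed
      data
    }
  }
}
```

Registering the same name twice throws, and polling a freshly registered inbox yields OnStart ahead of anything posted later, which mirrors why onStart() is the first thing an endpoint handles.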
RPC communication
First, look at two important methods of RpcEndpointRef:
/**
 * Sends a one-way asynchronous message. Fire-and-forget semantics.
 */
def send(message: Any): Unit
/**
 * Send a message to the corresponding [[RpcEndpoint.receiveAndReply)]] and return a [[Future]] to
 * receive the reply within the specified timeout.
 *
 * This method only sends the message once and never retries.
 */
def ask[T: ClassTag](message: Any, timeout: RpcTimeout): Future[T]
The first is send. The source describes it as a one-way asynchronous message: it is sent and no reply is expected.
ask differs from send in that it needs a reply: it sends a message to the target endpoint, which processes it and then replies. The target may be local or remote.
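The difference between the two call styles can be illustrated with a small sketch built on scala.concurrent.Promise. EchoEndpoint and ToyRef are hypothetical names, the reply is produced synchronously, and there is no timeout handling, so this only models the shape of send versus ask rather than what NettyRpcEndpointRef does.

```scala
import scala.concurrent.{Future, Promise}

object SendAskSketch {
  // Hypothetical endpoint: receive handles one-way messages, receiveAndReply answers asks.
  final class EchoEndpoint {
    @volatile var lastOneWay: Option[Any] = None
    def receive(message: Any): Unit = { lastOneWay = Some(message) }
    def receiveAndReply(message: Any, reply: Any => Unit): Unit = reply(s"echo: $message")
  }

  final class ToyRef(endpoint: EchoEndpoint) {
    // Fire-and-forget: nothing comes back to the caller.
    def send(message: Any): Unit = endpoint.receive(message)

    // Request/response: the caller gets a Future that the endpoint's reply completes.
    def ask(message: Any): Future[Any] = {
      val promise = Promise[Any]()
      endpoint.receiveAndReply(message, reply => { promise.trySuccess(reply); () })
      promise.future
    }
  }
}
```

A caller of ask would typically wait on or chain the returned Future, while a caller of send has nothing to wait on.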
Here we pick one piece of code from Master to analyze:
case Heartbeat(workerId, worker) =>
  idToWorker.get(workerId) match {
    case Some(workerInfo) =>
      workerInfo.lastHeartbeat = System.currentTimeMillis()
    case None =>
      if (workers.map(_.id).contains(workerId)) {
        logWarning(s"Got heartbeat from unregistered worker $workerId." +
          " Asking it to re-register.")
        worker.send(ReconnectWorker(masterUrl))
      } else {
        logWarning(s"Got heartbeat from unregistered worker $workerId." +
          " This worker was never registered, so ignoring the heartbeat.")
      }
  }
The logic is as follows. The Worker periodically sends heartbeats to the Master. If the Master cannot find a workerInfo for the workerId, it checks whether the workers set contains that workerId; if it does, the Master sends the Worker a request to reconnect.
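The three outcomes of the heartbeat handler can be written as a pure function, which makes the branching easier to see. This is only an illustration: the Action type and the onHeartbeat helper are hypothetical, and the two sets stand in for the ids found in idToWorker and in workers.

```scala
object HeartbeatSketch {
  sealed trait Action
  case object UpdateTimestamp extends Action  // worker is registered: refresh lastHeartbeat
  case object AskToReRegister extends Action  // known but not registered: send ReconnectWorker
  case object Ignore extends Action           // never seen: log a warning and drop the heartbeat

  // registered: ids present in idToWorker; known: ids present in the workers set.
  def onHeartbeat(workerId: String, registered: Set[String], known: Set[String]): Action =
    if (registered.contains(workerId)) UpdateTimestamp
    else if (known.contains(workerId)) AskToReRegister
    else Ignore
}
```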
In worker.send(ReconnectWorker(masterUrl)), worker is an RpcEndpointRef, and in practice a NettyRpcEndpointRef. The call then reaches:
override def send(message: Any): Unit = {
  require(message != null, "Message is null")
  nettyEnv.send(RequestMessage(nettyEnv.address, this, message)) // RequestMessage(senderAddress, receiver, content)
}
RequestMessage:
/**
 * The message that is sent from the sender to the receiver.
 */
private[netty] case class RequestMessage(
    senderAddress: RpcAddress, receiver: NettyRpcEndpointRef, content: Any)
Next comes:
private[netty] def send(message: RequestMessage): Unit = {
  val remoteAddr = message.receiver.address
  if (remoteAddr == address) {
    // Message to a local RPC endpoint.
    try {
      dispatcher.postOneWayMessage(message)
    } catch {
      case e: RpcEnvStoppedException => logWarning(e.getMessage)
    }
  } else {
    // Message to a remote RPC endpoint.
    postToOutbox(message.receiver, OneWayOutboxMessage(serialize(message)))
  }
}
That is, the address of the message's receiver is compared with the local address: if they are equal it is a local RPC, otherwise a remote RPC. Here the Master node is sending a message to a Worker node, so this is the remote case. The two cases are described separately below.
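The routing decision reduces to one address comparison. The sketch below models it with hypothetical types (Address, Route, LocalDispatch, RemoteOutbox) and made-up host names; it does not perform any delivery, it only returns which path a message would take.

```scala
object RoutingSketch {
  final case class Address(host: String, port: Int)

  sealed trait Route
  case object LocalDispatch extends Route                   // hand straight to the local dispatcher
  final case class RemoteOutbox(target: Address) extends Route // serialize and queue for the network

  // Case-class equality compares host and port, as the address check in send does.
  def route(self: Address, receiver: Address): Route =
    if (receiver == self) LocalDispatch else RemoteOutbox(receiver)
}
```

With a Master at one address and a Worker at another, a message from the Master to the Worker ref takes the RemoteOutbox path, while a message the Master sends to itself is dispatched locally without serialization.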
(1)remote RPC:
private def postToOutbox(receiver: NettyRpcEndpointRef, message: OutboxMessage): Unit = {
  if (receiver.client != null) {
    message.sendWith(receiver.client)
  } else {
    require(receiver.address != null,
      "Cannot send message to client endpoint with no listen address.")
    val targetOutbox = {
      val outbox = outboxes.get(receiver.address)
      if (outbox == null) {
        val newOutbox = new Outbox(this, receiver.address)
        val oldOutbox = outboxes.putIfAbsent(receiver.address, newOutbox)
        if (oldOutbox == null) {
          newOutbox
        } else {
          oldOutbox
        }
      } else {
        outbox
      }
    }
    if (stopped.get) {
      // It's possible that we put `targetOutbox` after stopping. So we need to clean it.
      outboxes.remove(receiver.address)
      targetOutbox.stop()
    } else {
      targetOutbox.send(message)
    }
  }
}
A remote RPC send ultimately goes out through TransportClient:
/**
 * Sends an opaque message to the RpcHandler on the server-side. The callback will be invoked
 * with the server's response or upon any failure.
 *
 * @param message The message to send.
 * @param callback Callback to handle the RPC's reply.
 * @return The RPC's id.
 */
public long sendRpc(ByteBuffer message, final RpcResponseCallback callback) {
In other words, the data is sent through the Netty framework to the RpcHandler on the remote server, which handles it.
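The Javadoc above says that sendRpc returns an RPC id and that the callback is invoked with the server's response. One common way to connect the two is a table of pending callbacks keyed by id. The sketch below shows that idea only; ToyClient, ToyCallback, the pending table and handleResponse are hypothetical, nothing is written to a network, and it should not be read as how TransportClient is implemented.

```scala
import java.util.concurrent.atomic.AtomicLong
import scala.collection.concurrent.TrieMap

object CallbackSketch {
  // Hypothetical callback with a success and a failure path.
  trait ToyCallback {
    def onSuccess(response: String): Unit
    def onFailure(e: Throwable): Unit
  }

  final class ToyClient {
    private val nextId = new AtomicLong(0L)
    private val pending = TrieMap.empty[Long, ToyCallback]

    // Registers the callback under a fresh id and returns that id.
    def sendRpc(message: String, callback: ToyCallback): Long = {
      val id = nextId.incrementAndGet()
      pending.put(id, callback)
      id // a real client would now write the id and the message to the connection
    }

    // Called when a response for `id` arrives; each callback fires at most once.
    def handleResponse(id: Long, response: String): Boolean =
      pending.remove(id) match {
        case Some(callback) =>
          callback.onSuccess(response)
          true
        case None => false
      }
  }
}
```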
NettyRpcHandler then receives the message and posts it to the Inbox, where the thread pool consumes it:
/** Posts a message sent by a remote endpoint. */
def postRemoteMessage(message: RequestMessage, callback: RpcResponseCallback): Unit = {
  val rpcCallContext = new RemoteNettyRpcCallContext(nettyEnv, callback, message.senderAddress)
  val rpcMessage = RpcMessage(message.senderAddress, message.content, rpcCallContext)
  postMessage(message.receiver.name, rpcMessage, (e) => callback.onFailure(e))
}
Consumption by the thread pool:
/** Message loop used for dispatching messages. */
private class MessageLoop extends Runnable {
  override def run(): Unit = {
    NettyRpcEnv.rpcThreadFlag.value = true
    try {
      while (true) {
        try {
          val data = receivers.take()
          if (data == PoisonPill) {
            // Put PoisonPill back so that other MessageLoops can see it.
            receivers.offer(PoisonPill)
            return
          }
          data.inbox.process(Dispatcher.this)
        } catch {
          case NonFatal(e) => logError(e.getMessage, e)
        }
      }
    } catch {
      case ie: InterruptedException => // exit
    }
  }
}
The code that processes remote messages:
/**
 * Process stored messages.
 */
def process(dispatcher: Dispatcher): Unit = {
  var message: InboxMessage = null
  inbox.synchronized {
    if (!enableConcurrent && numActiveThreads != 0) {
      return
    }
    message = messages.poll()
    if (message != null) {
      numActiveThreads += 1
    } else {
      return
    }
  }
  while (true) {
    safelyCall(endpoint) {
      message match {
        case RpcMessage(_sender, content, context) =>
          try {
            endpoint.receiveAndReply(context).applyOrElse[Any, Unit](content, { msg =>
              throw new SparkException(s"Unsupported message $message from ${_sender}")
            })
          } catch {
            case NonFatal(e) =>
              context.sendFailure(e)
              // Throw the exception -- this exception will be caught by the safelyCall function.
              // The endpoint's onError function will be called.
              throw e
          }
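The consumption pattern in MessageLoop and Inbox.process (block on a queue, stop on a PoisonPill that is put back for other loops, and fall back through applyOrElse when the handler does not match) can be tried in a self-contained sketch. ToyLoop, Item and Work are hypothetical names; unlike the excerpts above, the fallback returns a string instead of throwing, and there is a single queue rather than one inbox per endpoint.

```scala
import java.util.concurrent.{ConcurrentLinkedQueue, LinkedBlockingQueue}

object MessageLoopSketch {
  // A queue item: either content for the handler or the shutdown marker.
  sealed trait Item
  final case class Work(content: Any) extends Item
  case object PoisonPill extends Item

  final class ToyLoop(handler: PartialFunction[Any, String]) extends Runnable {
    val receivers = new LinkedBlockingQueue[Item]
    val results = new ConcurrentLinkedQueue[String]

    override def run(): Unit = {
      var running = true
      while (running) {
        receivers.take() match {
          case PoisonPill =>
            // Put the pill back so that any other loop sharing the queue also stops.
            receivers.offer(PoisonPill)
            running = false
          case Work(content) =>
            // applyOrElse uses the fallback when the handler is not defined for this content.
            results.add(handler.applyOrElse(content, (c: Any) => s"unsupported: $c"))
        }
      }
    }
  }
}
```

Running one ToyLoop on a thread with a handler that only accepts strings shows both paths: a string is handled, any other value goes through the fallback, and the loop exits once it takes the PoisonPill.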
Author: kason_zhang
Link: https://www.jianshu.com/p/bda13682889f