首页手记 IDEA开发Spark Streaming

IDEA开发Spark Streaming

标签：

Spark

本人使用spark2.1.0， scala 2.11.0 hadoop 2.7.3。打开IDEA新建一个Maven工程，
去Spark官网查找maven对spark的配置，比如寻找kafka的配置，maven中的使用

Source  ArtifactKafka   spark-streaming-kafka-0-8_2.11Flume   spark-streaming-flume_2.11Kinesis spark-streaming-kinesis-asl_2.11 [Amazon Software License]

最后我们在pom.xml中的配置如下：

<?xml version="1.0" encoding="UTF-8"?><project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>com.kason.spark</groupId>
    <artifactId>spark_platform_learn</artifactId>
    <version>1.0-SNAPSHOT</version>
    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <spark.version>2.1.0</spark.version>
        <scala.version>2.11</scala.version>
        <hadoop.version>2.7.3</hadoop.version>
    </properties>
    <dependencies>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-sql_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-hive_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-client</artifactId>
            <version>2.6.0</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-streaming-kafka-0-8_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-mllib_${scala.version}</artifactId>
            <version>${spark.version}</version>
        </dependency>
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>5.1.39</version>
        </dependency>
        <dependency>
            <groupId>junit</groupId>
            <artifactId>junit</artifactId>
            <version>4.12</version>
        </dependency>
    </dependencies>

    <!-- maven官方 http://repo1.maven.org/maven2/  或 http://repo2.maven.org/maven2/ （延迟低一些） -->
    <repositories>
        <repository>
            <id>central</id>
            <name>Maven Repository Switchboard</name>
            <layout>default</layout>
            <url>http://repo2.maven.org/maven2</url>
            <snapshots>
                <enabled>false</enabled>
            </snapshots>
        </repository>
    </repositories>

    <build>
        <sourceDirectory>src/main/scala</sourceDirectory>
        <testSourceDirectory>src/test/scala</testSourceDirectory>

        <plugins>
            <plugin>
                <!-- MAVEN 编译使用的JDK版本 -->
                <groupId>org.apache.maven.plugins</groupId>
                <artifactId>maven-compiler-plugin</artifactId>
                <version>3.3</version>
                <configuration>
                    <source>1.8</source>
                    <target>1.8</target>
                    <encoding>UTF-8</encoding>
                </configuration>
            </plugin>
        </plugins>
    </build></project>

SparkStreaming Demo code:

package com.scala.action.streamingimport org.apache.spark.SparkConfimport org.apache.spark.streaming.{Seconds, StreamingContext}/**
  * Created by kason_zhang on 4/7/2017.
  */object SparkStreaming {

  def main(args: Array[String]): Unit = {

    val conf = new SparkConf().setMaster("local[3]").setAppName("BasicStreamingExample")
    val ssc = new StreamingContext(conf, Seconds(5))

    val lines = ssc.socketTextStream("10.64.24.78" , 9999)
    val words = lines.flatMap(_.split(" "))
    val wc = words.map(x => (x, 1)).reduceByKey((x, y) => x + y)
    wc.print
    wc.saveAsTextFiles("D:\\work\\cloud\\log\\word.txt")
    println("pandas: sscstart")
    ssc.start()
    println("pandas: awaittermination")
    ssc.awaitTermination()
    println("pandas: done!")
  }

}

Centos 安装nc(netstat)

yum install nc

启动监听端口Server
nc -lk 9999
防火墙开启9999端口

[kason@kason Desktop]$ suPassword: [root@kason Desktop]# /sbin/iptables -I INPUT -p tcp --dport 9999 -j ACCEPT[root@kason Desktop]# /etc/init.d/iptables saveiptables: Saving firewall rules to /etc/sysconfig/iptables:[  OK  ]
[root@kason Desktop]# service iptables restartiptables: Setting chains to policy ACCEPT: filter          [  OK  ]iptables: Flushing firewall rules:                         [  OK  ]iptables: Unloading modules:                               [  OK  ]iptables: Applying firewall rules:                         [  OK  ]
[root@kason Desktop]# vi /etc/init.d/iptables [root@kason Desktop]# /etc/init.d/iptables status

之后Linux下执行nc -lk 9999然后就可以在下面输入要发送的数据

[kason@kason Desktop]$ nc -lk 9999hell owoo
goo ndate
better best
goo ndate
vi /e   
hello world
hello world out
hell o
kkl portal
hell o kkl

IDEA的输出结果如下：

-------------------------------------------
Time: 1491875390000 ms
-------------------------------------------
(o,1)
(kkl,1)
(hell,1)

因此使用IDEA开发一个简单的demo就完成了。

作者：kason_zhang
链接：https://www.jianshu.com/p/fd28dcdf3a1b

点击查看更多内容

为 TA 点赞

若觉得本文不错，就分享一下吧！

评论

评论

共同学习，写下你的评论

评论加载中...

展开查看更多评论

作者其他优质文章

正在加载中

慕码人8056858

手记
篇

粉丝

350

获赞与收藏

1323

关注作者，订阅最新文章

相关文章推荐

大数据分析技术与实战之 Spark Streaming

Spark Streaming

Kafka+Spark Streaming实现单词数量的实时统计

Spark Streaming

9.Spark Streaming

阅读免费教程

后端通用面试教程

41个小节 30273 342

网络编程入门教程

20个小节 12461 235

Pandas 入门教程

25个小节 18362 330

推荐

评论

收藏

共同学习，写下你的评论



感谢您的支持，我会继续努力的～

扫码打赏，你说多少就多少

赞赏金额会直接到老师账户

支付方式

打开微信扫一扫，即可进行扫码打赏哦

今天注册有机会得

100积分直接送

付费专栏免费学

大额优惠券免费领

立即参与放弃机会

点击
抽奖

慕课手记新用户专享福利

恭喜你，你的运气太好了，居然抽中了 100个积分！

恭喜你，抽中了价值元的专栏！

太棒了，直接落到你账户里！

积分商城里的罗技鼠标、机械键盘、
Kindle 阅读器、小米平衡车
Apple iPad （10.2英寸）、大额优惠券
在等着你去兑换了噢

作者：

免费赠送

兑换码：1111222211 复制

优惠券可用于购买实战课、体系课
无门槛使用

先去看看，有什么好东西马上兑换我爱学习，选课去


热搜

最近搜索清空

IDEA开发Spark Streaming

相关文章推荐

阅读免费教程