『 Spark 』1. Introduction to Spark

Tags:

Spark

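Preface

This series combines my notes from learning Spark, my understanding of the reference articles, and lessons from my own hands-on use of Spark. I am writing it only to organize my personal study notes, not to produce a tutorial, so it follows my own understanding and leaves out details that are not essential. To go deeper, read the reference articles and the official documentation.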

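Second, this series starts from Spark 1.6.0, the latest release line at the time of writing. Spark is evolving quickly, so it is worth recording the version.
Finally, if you find any mistakes, please leave a comment; I will reply to every comment within 24 hours. Thank you.
Tips: if a figure is hard to read, you can: 1. zoom in on the page; 2. open the image in a new tab to view the original.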

1. How to introduce Spark to others

Apache Spark™ is a fast and general engine for large-scale data processing.

Apache Spark is a fast and general-purpose cluster computing system.
It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs.
It also supports a rich set of higher-level tools, including:

Spark SQL for SQL and structured data processing, extended with DataFrames and Datasets
MLlib for machine learning
GraphX for graph processing
Spark Streaming for stream data processing

2. Some background on how Spark came about

introduction-to-spark-1.jpg

introduction-to-spark-2.jpg

Spark started in 2009 and was open sourced in 2010. Unlike the various specialized systems [Hadoop, Storm], Spark's goal was to:

generalize MapReduce to support new applications within the same engine;
- it works with the Hadoop ecosystem: it can run on Hadoop, Mesos, standalone, or in the cloud, and it can access diverse data sources including HDFS, Cassandra, HBase, and S3.
speed up iterative computation compared with Hadoop;
- use memory plus disk, instead of disk alone, as the data storage medium
- design a new programming model, the RDD, which makes data processing more elegant [RDD transformations, actions, distributed jobs, stages, and tasks]
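
To make the transformation/action distinction in the RDD model concrete, here is a minimal single-machine sketch in plain Python. It is not Spark's API: `ToyRDD`, its methods, and the `square` helper are hypothetical names invented for illustration, and the sketch ignores partitioning, distribution, stages, and tasks entirely. It only shows the idea that transformations record a step lazily while actions trigger the actual computation.

```python
from functools import reduce


class ToyRDD:
    """Hypothetical single-machine stand-in for an RDD (not Spark's API)."""

    def __init__(self, source, ops=()):
        self._source = source    # the original data, never mutated
        self._ops = tuple(ops)   # recorded transformations (the lineage)

    # Transformations: record the step and return a new ToyRDD; no work is done.
    def map(self, fn):
        return ToyRDD(self._source, self._ops + (("map", fn),))

    def filter(self, pred):
        return ToyRDD(self._source, self._ops + (("filter", pred),))

    def _compute(self):
        # Replay the recorded lineage over the source data.
        data = iter(self._source)
        for kind, fn in self._ops:
            data = map(fn, data) if kind == "map" else filter(fn, data)
        return data

    # Actions: trigger the computation and return a plain Python value.
    def collect(self):
        return list(self._compute())

    def reduce(self, fn):
        return reduce(fn, self._compute())


calls = []  # records every element that square() actually processes


def square(x):
    calls.append(x)
    return x * x


numbers = ToyRDD(range(1, 6))
evens_squared = numbers.map(square).filter(lambda x: x % 2 == 0)
calls_before_action = len(calls)  # nothing has run yet

result = evens_squared.collect()                  # first action
total = evens_squared.reduce(lambda a, b: a + b)  # second action replays the lineage
```

In this toy, `square` is not called until `collect()` runs, and the second action recomputes everything from the source. That recomputation is the cost that keeping intermediate data in memory is meant to avoid for iterative workloads.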

introduction-to-spark-4.jpg

introduction-to-spark-5.jpg

3. Why choose Spark

It is designed, implemented, and used as a set of libraries rather than as separate specialized systems;
- this makes it much more useful and maintainable

introduction-to-spark-3.jpg

Historically, it was designed as an improvement on Hadoop and Storm, so it has a strong lineage;
It has good documentation, an active community, real products built on it, and strong momentum;
It provides SQL, DataFrames, Datasets, a machine learning library, a graph computing library, and an actively growing set of third-party libraries; it is easy to use and covers many use cases across many fields;
It supports ad-hoc exploration, which speeds up data exploration and pre-processing and helps you build ETL and data processing jobs.

4. Next

The next post briefly introduces the basic concepts in Spark that you need to understand deeply.

Author: litaotao
Link: https://www.jianshu.com/p/d6d2acbd87fa
