Cluster Analysis with Iris Dataset

Data Science Day 19:

In supervised learning, we specify the possible categorical values and train models to recognize patterns. However, *what if we don't have existing classified data for the model to learn from?*

[Image: Radfotosonn / Pixabay]

When we model the data to discover how it clusters based on certain attributes, that is unsupervised learning.

Clustering analysis is one of the unsupervised techniques: rather than learning by example, it learns by observation.

There are three general types of clustering methods: partitioning, hierarchical, and density-based.

1. Partitioning: n objects are grouped into k ≤ n disjoint clusters.
Partitioning methods are based on a distance measure: they apply iterative relocation until some distance-based error metric is minimized.

2. Hierarchical: clusters are either combined (agglomerative) or split (divisive) in a stepwise fashion, based on some measure (distance, density, or continuity).

Agglomerative clustering starts with each point in its own cluster and combines them in steps; divisive clustering starts with all the data in one cluster and divides it up.

3. Density-based: clusters are formed from regions of high point density, and density serves as the measure of cluster "goodness".
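Going back to the partitioning method, the "iterative relocation" idea can be sketched as a minimal K-Means loop in plain Python. The function name `kmeans`, the toy 2-D points, and the choice of the first k points as initial centers are assumptions made for illustration only; library implementations use smarter initialization.

```python
import math

def kmeans(points, k, max_iter=100):
    """Minimal K-Means sketch: assign points to the nearest center, then move centers."""
    # Assumption: the first k points serve as the initial centers.
    centers = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest center.
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j]))
            for p in points
        ]
        # Update step: each center moves to the mean of its assigned points.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep the center of an empty cluster
        converged = new_labels == labels and new_centers == centers
        labels, centers = new_labels, new_centers
        if converged:
            break
    # Within-cluster sum of squared distances: the error metric being minimized.
    sse = sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))
    return labels, centers, sse

# Hypothetical toy data: two well-separated groups of 2-D points.
toy = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (0.9, 1.1), (5.1, 4.9), (4.8, 5.2)]
labels, centers, sse = kmeans(toy, k=2)
print(labels)
```

The loop stops once neither the assignments nor the centers change, which is the point where relocation no longer reduces the error.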

Example with Iris Dataset

1. **Partitioning: K-Means, k=3**

image

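import matplotlib.pyplot as plt
from sklearn import datasets
from sklearn.cluster import KMeans

# Iris dataset
iris = datasets.load_iris()
x = iris.data
y = iris.target
# Fit K-Means with k=3 (this step is missing from the original listing; n_init and random_state are assumed values)
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit(x).labels_

# Plotting
fig = plt.figure(1, figsize=(7, 7))
ax = fig.add_subplot(projection="3d")
ax.view_init(elev=48, azim=134)
ax.scatter(x[:, 3], x[:, 0], x[:, 2],
           c=labels.astype(float), edgecolor="k", s=50)
ax.set_xlabel("Petal width")
ax.set_ylabel("Sepal length")
ax.set_zlabel("Petal length")
plt.title("Iris Clustering K Means=3", fontsize=14)
plt.show()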
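Since Iris ships with the true species in `iris.target`, one way to sanity-check the K-Means result is to cross-tabulate cluster labels against species. This is a rough sketch and not part of the original notes; `n_init=10` and `random_state=0` are assumed values chosen only for reproducibility.

```python
import numpy as np
from sklearn import datasets
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

iris = datasets.load_iris()
x, y = iris.data, iris.target

# Assumed settings: n_init and random_state are fixed only for reproducibility.
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(x)
labels = kmeans.labels_

# Contingency table: rows are true species, columns are K-Means cluster ids.
table = np.zeros((3, 3), dtype=int)
for species, cluster in zip(y, labels):
    table[species, cluster] += 1
print(table)

# Adjusted Rand index: 1.0 means perfect agreement, values near 0 mean random labeling.
ari = adjusted_rand_score(y, labels)
print(f"Adjusted Rand index: {ari:.2f}")
```

Cluster ids are arbitrary, so the table should be read column by column rather than along the diagonal.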

2. **Hierarchical**

image

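import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import dendrogram, linkage

# Hierarchical clustering with Ward linkage
hier = linkage(x, "ward")
max_d = 7.08  # distance at which the dendrogram is cut
plt.figure(figsize=(25, 10))
plt.title('Iris Hierarchical Clustering Dendrogram')
plt.xlabel('Species')
plt.ylabel('distance')
dendrogram(
    hier,
    truncate_mode='lastp',  # show only the last p merged clusters
    p=50,                   # number of merged clusters to show
    leaf_rotation=90.,      # rotate the x-axis labels
    leaf_font_size=8.,      # font size of the x-axis labels
)
plt.axhline(y=max_d, c='k')  # horizontal line at the cut distance
plt.show()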
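The horizontal line at `max_d` marks where the tree is cut. As a sketch of how that cut turns into flat cluster labels, SciPy's `fcluster` can be applied to the same kind of linkage matrix with the `distance` criterion. The value 7.08 comes from the listing above; everything else here is an illustrative assumption.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from sklearn import datasets

x = datasets.load_iris().data

# Ward linkage on the raw measurements, as in the dendrogram listing.
hier = linkage(x, "ward")

# Cut the tree at distance max_d: merges above this height are undone.
max_d = 7.08
flat = fcluster(hier, t=max_d, criterion="distance")

# fcluster labels start at 1; count how many points fall in each cluster.
ids, sizes = np.unique(flat, return_counts=True)
print(dict(zip(ids.tolist(), sizes.tolist())))
```

Raising `max_d` merges more branches into fewer clusters, and lowering it does the opposite.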

3. **Density-based: DBSCAN**

image

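import matplotlib.pyplot as plt
from sklearn.cluster import DBSCAN
from sklearn.decomposition import PCA

# Fit DBSCAN with its default parameters (eps=0.5, min_samples=5)
dbscan = DBSCAN()
dbscan.fit(x)
# Project the 4-D data onto 2 principal components, for plotting only
pca = PCA(n_components=2).fit(x)
pca_2d = pca.transform(x)
for i in range(0, pca_2d.shape[0]):
    if dbscan.labels_[i] == 0:
        c1 = plt.scatter(pca_2d[i, 0], pca_2d[i, 1], c='r', marker='+')
    elif dbscan.labels_[i] == 1:
        c2 = plt.scatter(pca_2d[i, 0], pca_2d[i, 1], c='g', marker='o')
    elif dbscan.labels_[i] == -1:  # label -1 marks noise points
        c3 = plt.scatter(pca_2d[i, 0], pca_2d[i, 1], c='b', marker='*')

plt.legend([c1, c2, c3], ['Cluster 1', 'Cluster 2', 'Noise'])
plt.title('DBSCAN finds 2 clusters and Noise')
plt.show()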
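DBSCAN is run here with scikit-learn's defaults (`eps=0.5`, `min_samples=5`), and it marks noise points with the label -1. A quick way to check how many clusters and noise points it found, sketched as an illustration rather than as part of the original notes:

```python
import numpy as np
from sklearn import datasets
from sklearn.cluster import DBSCAN

x = datasets.load_iris().data

# Default parameters: eps=0.5 (neighborhood radius), min_samples=5.
dbscan = DBSCAN().fit(x)
labels = dbscan.labels_

# Label -1 is reserved for noise; every other label is a cluster id.
n_clusters = len(set(labels.tolist()) - {-1})
n_noise = int(np.sum(labels == -1))
print(f"clusters: {n_clusters}, noise points: {n_noise}")

# Core samples are points with at least min_samples neighbors within eps.
print(f"core samples: {len(dbscan.core_sample_indices_)}")
```

Unlike K-Means, the number of clusters is not passed in; it falls out of the density settings, so changing `eps` or `min_samples` can change both the cluster count and the amount of noise.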

Many thanks to Dr. Rumbaugh for the clustering analysis notes!

Happy studying!

Author: 乌然娅措
Link: https://www.jianshu.com/p/90aed81f9fee
