Read the data file moon.csv. A sample of the data is shown below:

X1          X2          Label
0.682988    -0.3452     1
2.04099     -0.13161    1
-0.13975    0.454391    1
0.885748    0.286342    0
1.182459    -0.49771    1
1.019208    -0.42547    1
0.389362    -0.24712    1
-0.56038    0.712681    0
0.901844    0.053931    0
0.206748    0.006208    1
1.323523...

To complete these tasks you need a few data analysis and machine learning libraries, such as pandas, matplotlib, and sklearn. Here is one possible solution:

    import pandas as pd
    import matplotlib.pyplot as plt
    from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN
    from sklearn.metrics import silhouette_score

    # Read the CSV file
    data = pd.read_csv("moon.csv")
    features = data[['X1', 'X2']]

    # Draw a scatter plot of the original labels
    plt.scatter(data[data['Label'] == 0]['X1'], data[data['Label'] == 0]['X2'], color='blue', label='Label 0')
    plt.scatter(data[data['Label'] == 1]['X1'], data[data['Label'] == 1]['X2'], color='red', label='Label 1')
    plt.xlabel('X1')
    plt.ylabel('X2')
    plt.legend()
    plt.show()

    def majority_accuracy(true_labels, cluster_labels):
        # Cluster ids are arbitrary (cluster 0 need not mean Label 0), so map each
        # cluster to its most common true label before counting correct points.
        # DBSCAN noise points (cluster id -1) are counted as misclassified.
        pairs = pd.DataFrame({'true': list(true_labels), 'cluster': list(cluster_labels)})
        pairs = pairs[pairs['cluster'] != -1]
        if pairs.empty:
            return 0.0
        return pd.crosstab(pairs['cluster'], pairs['true']).max(axis=1).sum() / len(true_labels)

    def clustered_silhouette(points, cluster_labels):
        # The silhouette coefficient needs at least two clusters; noise points are
        # left out so that they are not scored as a cluster of their own.
        keep = cluster_labels != -1
        if len(set(cluster_labels[keep])) < 2:
            return float('nan')
        return silhouette_score(points[keep], cluster_labels[keep])

    # Cluster with K-means and generate new labels (fixed seed for repeatable runs)
    kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
    kmeans.fit(features)
    new_labels_kmeans = kmeans.labels_
    data['New_Label_KMeans'] = new_labels_kmeans

    # Proportion classified correctly and silhouette coefficient (K-means)
    accuracy_kmeans = majority_accuracy(data['Label'], new_labels_kmeans)
    silhouette_coef_kmeans = clustered_silhouette(features, new_labels_kmeans)

    # Cluster with hierarchical clustering and generate new labels
    hierarchical_clustering = AgglomerativeClustering(n_clusters=2)
    new_labels_hierarchical = hierarchical_clustering.fit_predict(features)
    data['New_Label_Hierarchical'] = new_labels_hierarchical

    # Proportion classified correctly and silhouette coefficient (hierarchical clustering)
    accuracy_hierarchical = majority_accuracy(data['Label'], new_labels_hierarchical)
    silhouette_coef_hierarchical = clustered_silhouette(features, new_labels_hierarchical)

    # Cluster with DBSCAN and generate new labels (-1 marks noise points)
    dbscan = DBSCAN(eps=0.3, min_samples=5)
    new_labels_dbscan = dbscan.fit_predict(features)
    data['New_Label_DBSCAN'] = new_labels_dbscan

    # Proportion classified correctly and silhouette coefficient (DBSCAN)
    accuracy_dbscan = majority_accuracy(data['Label'], new_labels_dbscan)
    silhouette_coef_dbscan = clustered_silhouette(features, new_labels_dbscan)

    # Write the new cluster labels back to the original CSV file
    data.to_csv("moon.csv", index=False)

    print(f"K-means accuracy: {accuracy_kmeans}, Silhouette coefficient: {silhouette_coef_kmeans}")
    print(f"Hierarchical clustering accuracy: {accuracy_hierarchical}, Silhouette coefficient: {silhouette_coef_hierarchical}")
    print(f"DBSCAN accuracy: {accuracy_dbscan}, Silhouette coefficient: {silhouette_coef_dbscan}")
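Cluster ids carry no meaning of their own: an algorithm may call one moon 0 in one run and 1 in the next, so agreement with Label is best measured in a way that does not depend on the numbering. The following is a minimal sketch of one such check, not part of the solution above. It assumes SciPy is installed, and the helper name one_to_one_accuracy is hypothetical. It matches each cluster to at most one true label so that the number of agreeing points is as large as possible, and it also computes the adjusted Rand index from sklearn, which needs no matching at all.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import adjusted_rand_score


def one_to_one_accuracy(true_labels, cluster_labels):
    """Accuracy under the best one-to-one mapping of cluster ids to true labels.

    Hypothetical helper for illustration. Points with cluster id -1
    (DBSCAN noise) are never matched, so they count as misclassified.
    """
    true_labels = np.asarray(true_labels)
    cluster_labels = np.asarray(cluster_labels)
    classes = np.unique(true_labels)
    clusters = np.unique(cluster_labels[cluster_labels != -1])
    if clusters.size == 0:
        return 0.0
    # table[i, j] = number of points with true label classes[i] in cluster clusters[j]
    table = np.array([
        [np.sum((true_labels == c) & (cluster_labels == k)) for k in clusters]
        for c in classes
    ])
    # linear_sum_assignment minimizes total cost, so negate counts to maximize matches
    rows, cols = linear_sum_assignment(-table)
    return table[rows, cols].sum() / len(true_labels)


truth = [0, 0, 0, 1, 1, 1]
flipped = [1, 1, 1, 0, 0, 0]  # same grouping, opposite ids

naive = float(np.mean(np.array(truth) == np.array(flipped)))
matched = one_to_one_accuracy(truth, flipped)
ari = adjusted_rand_score(truth, flipped)
print(naive, matched, ari)
```

In the toy example the grouping is perfect but the ids are swapped, so a direct == comparison reports no agreement while the matched accuracy and the adjusted Rand index both report full agreement. On the real data, passing data['Label'] and any of the New_Label_* columns to the helper gives a numbering-independent figure.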

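Make sure the required libraries are installed and that the code and the data file are in the same directory. Running the code displays the scatter plot, prints the proportion of correctly classified points and the silhouette coefficient for each clustering algorithm, and writes the new cluster labels to moon.csv.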
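If moon.csv is not at hand, a stand-in file with the same three columns can be generated for trying the code out. This is only a sketch under an assumption: the sample rows resemble scikit-learn's two-moons toy data, but the real file's size and noise level are not stated, so n_samples, noise, and seed below are guesses, and make_moon_frame is a hypothetical helper name.

```python
import pandas as pd
from sklearn.datasets import make_moons


def make_moon_frame(n_samples=200, noise=0.1, seed=0):
    """Build a two-moons DataFrame with the columns X1, X2 and Label.

    The default sizes and noise level are assumptions, not values taken
    from the original moon.csv.
    """
    points, labels = make_moons(n_samples=n_samples, noise=noise, random_state=seed)
    return pd.DataFrame({"X1": points[:, 0], "X2": points[:, 1], "Label": labels})


frame = make_moon_frame()
print(frame.head())
```

To use it, save the frame with frame.to_csv("moon_synthetic.csv", index=False) and point pd.read_csv at that file. The separate file name is a suggestion so that a real moon.csv is not overwritten.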


Content provided by the 零声教学 AI assistant; the question was submitted by a student.

Please credit the source when reposting: https://golang.0voice.com/?id=14432
