Sure, here is a code implementation for the task you described:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN
from sklearn.metrics import silhouette_score
# Load the data
data = pd.read_csv('moon.csv')
# Draw a scatter plot colored by the true label
plt.scatter(data['X1'], data['X2'], c=data['Label'])
plt.xlabel('X1')
plt.ylabel('X2')
plt.title('Scatter Plot')
plt.show()
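# Cluster with K-means and store the new labels
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
kmeans_labels = kmeans.fit_predict(data[['X1', 'X2']])
data['KMeans_Label'] = kmeans_labels
# Fraction of correctly classified points. Cluster ids are arbitrary, so also
# score the flipped assignment (assumes 'Label' holds 0/1) and keep the better one.
acc_kmeans = (data['Label'] == data['KMeans_Label']).mean()
correct_kmeans = max(acc_kmeans, 1 - acc_kmeans)
# Silhouette coefficient
silhouette_kmeans = silhouette_score(data[['X1', 'X2']], data['KMeans_Label'])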
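# Cluster with hierarchical (agglomerative) clustering and store the new labels
hierarchical = AgglomerativeClustering(n_clusters=2)
hierarchical_labels = hierarchical.fit_predict(data[['X1', 'X2']])
data['Hierarchical_Label'] = hierarchical_labels
# Fraction of correctly classified points. Cluster ids are arbitrary, so also
# score the flipped assignment (assumes 'Label' holds 0/1) and keep the better one.
acc_hierarchical = (data['Label'] == data['Hierarchical_Label']).mean()
correct_hierarchical = max(acc_hierarchical, 1 - acc_hierarchical)
# Silhouette coefficient
silhouette_hierarchical = silhouette_score(data[['X1', 'X2']], data['Hierarchical_Label'])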
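# Cluster with DBSCAN and store the new labels (-1 marks noise points)
dbscan = DBSCAN(eps=0.3, min_samples=5)
dbscan_labels = dbscan.fit_predict(data[['X1', 'X2']])
data['DBSCAN_Label'] = dbscan_labels
# Fraction of correctly classified points, taking the better of the two 0/1
# assignments (assumes 'Label' holds 0/1); noise and extra clusters count as wrong.
correct_dbscan = max((data['Label'] == data['DBSCAN_Label']).mean(),
                     (data['Label'] == 1 - data['DBSCAN_Label']).mean())
# Silhouette coefficient; silhouette_score raises an error unless there are
# at least 2 distinct labels, so fall back to NaN in that case.
n_dbscan_labels = data['DBSCAN_Label'].nunique()
silhouette_dbscan = (silhouette_score(data[['X1', 'X2']], data['DBSCAN_Label'])
                     if 1 < n_dbscan_labels < len(data) else float('nan'))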
# Write the new labels back to the original file (this overwrites moon.csv)
data.to_csv('moon.csv', index=False)
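print("K-means - correct classification ratio:", correct_kmeans, "silhouette coefficient:", silhouette_kmeans)
print("Hierarchical clustering - correct classification ratio:", correct_hierarchical, "silhouette coefficient:", silhouette_hierarchical)
print("DBSCAN - correct classification ratio:", correct_dbscan, "silhouette coefficient:", silhouette_dbscan)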
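Make sure the required dependencies (pandas, matplotlib, and scikit-learn) are installed before running the code. The code draws a scatter plot, clusters the data with K-means, hierarchical clustering, and DBSCAN, and then computes the correct classification ratio and the silhouette coefficient for each method. Finally, the new cluster labels are written back to the original data file.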
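One point worth keeping in mind when reading the correct classification ratio: the ids a clustering algorithm returns are arbitrary, so cluster 0 may well correspond to true label 1. Below is a minimal sketch of one way to align cluster ids with the true labels before counting matches. The helper name `aligned_accuracy` and its `noise_label` parameter are hypothetical (they are not part of scikit-learn), and the brute-force search over mappings is only reasonable for a handful of clusters.

```python
from itertools import permutations


def aligned_accuracy(true_labels, cluster_labels, noise_label=-1):
    """Best accuracy over all one-to-one mappings between cluster ids and classes.

    Hypothetical helper for illustration. Points carrying `noise_label`
    (DBSCAN uses -1) and points in unmatched clusters count as misclassified.
    """
    true_labels = list(true_labels)
    cluster_labels = list(cluster_labels)
    if len(true_labels) != len(cluster_labels):
        raise ValueError("label sequences must have the same length")
    if not true_labels:
        raise ValueError("label sequences must not be empty")

    classes = list(set(true_labels))
    clusters = list(set(cluster_labels) - {noise_label})

    # Enumerate every one-to-one mapping cluster id -> class.
    if len(clusters) <= len(classes):
        mappings = (dict(zip(clusters, perm))
                    for perm in permutations(classes, len(clusters)))
    else:
        # More clusters than classes: some clusters stay unmatched.
        mappings = (dict(zip(perm, classes))
                    for perm in permutations(clusters, len(classes)))

    best_hits = 0
    for mapping in mappings:
        hits = sum(1 for t, c in zip(true_labels, cluster_labels)
                   if c in mapping and mapping[c] == t)
        best_hits = max(best_hits, hits)
    return best_hits / len(true_labels)
```

With the script above, it could be called as `aligned_accuracy(data['Label'], data['DBSCAN_Label'])`. For larger numbers of clusters, a Hungarian-algorithm matching would scale better than trying every permutation.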
Content provided by the 零声教学 AI assistant; the question was submitted by a student.




