To complete this task, you can use Python's data analysis and visualization libraries, such as pandas, matplotlib, and scikit-learn. Here is a basic example:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN
from sklearn.metrics import silhouette_score
# Load the data
data = pd.read_csv("moon.csv")
# Draw a scatter plot colored by the true labels
plt.scatter(data['X1'], data['X2'], c=data['Label'])
plt.xlabel('X1')
plt.ylabel('X2')
plt.title('Scatter Plot with True Labels')
plt.show()
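# Run K-means clustering, ignoring the true labels (random_state fixed for reproducibility)
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
data['KMeans_Label'] = kmeans.fit_predict(data[['X1', 'X2']])
# Compute the proportion classified correctly. Cluster IDs are arbitrary, so map
# each cluster to its most common true label before counting matches.
kmeans_table = pd.crosstab(data['KMeans_Label'], data['Label'])
correct_kmeans = kmeans_table.max(axis=1).sum() / len(data)
# Compute the silhouette coefficient
silhouette_kmeans = silhouette_score(data[['X1', 'X2']], data['KMeans_Label'])
print("K-Means Correct Classification Rate:", correct_kmeans)
print("K-Means Silhouette Score:", silhouette_kmeans)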
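# Hierarchical (agglomerative) clustering
hierarchical = AgglomerativeClustering(n_clusters=2)
data['Hierarchical_Label'] = hierarchical.fit_predict(data[['X1', 'X2']])
# Compute the proportion classified correctly. Cluster IDs are arbitrary, so map
# each cluster to its most common true label before counting matches.
hierarchical_table = pd.crosstab(data['Hierarchical_Label'], data['Label'])
correct_hierarchical = hierarchical_table.max(axis=1).sum() / len(data)
# Compute the silhouette coefficient
silhouette_hierarchical = silhouette_score(data[['X1', 'X2']], data['Hierarchical_Label'])
print("Hierarchical Correct Classification Rate:", correct_hierarchical)
print("Hierarchical Silhouette Score:", silhouette_hierarchical)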
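# DBSCAN clustering (eps and min_samples depend on the data scale and may need tuning)
dbscan = DBSCAN(eps=0.3, min_samples=5)
data['DBSCAN_Label'] = dbscan.fit_predict(data[['X1', 'X2']])
# Compute the proportion classified correctly. DBSCAN labels noise points as -1;
# count them as misclassified and map each remaining cluster to its most common true label.
dbscan_table = pd.crosstab(data['DBSCAN_Label'], data['Label'])
correct_dbscan = dbscan_table.drop(index=-1, errors='ignore').max(axis=1).sum() / len(data)
# Compute the silhouette coefficient on non-noise points; it needs at least two clusters
clustered = data[data['DBSCAN_Label'] != -1]
if clustered['DBSCAN_Label'].nunique() > 1:
    silhouette_dbscan = silhouette_score(clustered[['X1', 'X2']], clustered['DBSCAN_Label'])
else:
    silhouette_dbscan = float('nan')
print("DBSCAN Correct Classification Rate:", correct_dbscan)
print("DBSCAN Silhouette Score:", silhouette_dbscan)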
# Write the data with the new cluster labels to a new CSV file (moon.csv itself is left unchanged)
data.to_csv('moon_with_labels.csv', index=False)
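Because cluster IDs are assigned arbitrarily, comparing them to the true labels with plain equality can report a low rate even when the grouping is perfect. The sketch below shows one possible way to score agreement that does not depend on how the IDs are numbered: a best one-to-one matching of cluster IDs to true labels, plus scikit-learn's adjusted Rand index. The helper name `matched_accuracy` and the toy arrays are hypothetical and are not part of the script above.

```python
import numpy as np
import pandas as pd
from scipy.optimize import linear_sum_assignment
from sklearn.metrics import adjusted_rand_score


def matched_accuracy(true_labels, cluster_labels):
    """Accuracy under the best one-to-one mapping of cluster IDs to true labels."""
    # Rows are cluster IDs, columns are true labels, cells are point counts.
    table = pd.crosstab(np.asarray(cluster_labels), np.asarray(true_labels)).to_numpy()
    # Hungarian matching picks the cluster-to-label pairing with the most agreements.
    rows, cols = linear_sum_assignment(table, maximize=True)
    return table[rows, cols].sum() / len(true_labels)


# Hypothetical toy data: the same grouping, but with the cluster IDs swapped.
truth = np.array([0, 0, 0, 1, 1, 1])
flipped = np.array([1, 1, 1, 0, 0, 0])

naive = float(np.mean(truth == flipped))          # plain equality
matched = float(matched_accuracy(truth, flipped))  # ID-invariant accuracy
ari = adjusted_rand_score(truth, flipped)          # ID-invariant by construction
print(naive, matched, ari)
```

In the script above, the same helper could be called as `matched_accuracy(data['Label'], data['KMeans_Label'])`, assuming the column names used there.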
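Note that this is only a basic example; you can modify and extend it to suit your needs. Also make sure the required libraries are installed and that the data file moon.csv is in the same directory as the code.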
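If moon.csv is not at hand, a stand-in file can be generated to try the script. This is only a sketch under assumptions: it supposes the file has the columns X1, X2, and Label that the script reads, and it uses scikit-learn's `make_moons` with illustrative values for the sample count and noise. The function name `write_demo_moons` and the output file name moon_demo.csv are hypothetical, chosen so that a real moon.csv is not overwritten.

```python
import pandas as pd
from sklearn.datasets import make_moons


def write_demo_moons(path, n_samples=300, noise=0.05, seed=0):
    """Write a synthetic two-moons dataset with columns X1, X2, Label."""
    # make_moons returns an (n_samples, 2) array of points and 0/1 class labels.
    points, labels = make_moons(n_samples=n_samples, noise=noise, random_state=seed)
    frame = pd.DataFrame({"X1": points[:, 0], "X2": points[:, 1], "Label": labels})
    frame.to_csv(path, index=False)
    return frame


demo = write_demo_moons("moon_demo.csv")
print(demo.shape)
print(demo["Label"].value_counts().to_dict())
```

To run the clustering script against this file, point `pd.read_csv` at moon_demo.csv instead of moon.csv.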
Content provided by the 零声教学 AI assistant; the question was submitted by a student.




