# ML: The t-Distributed Stochastic Neighbor Embedding (t-SNE) Algorithm

May 5, 2021

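t-distributed stochastic neighbor embedding (t-SNE) is a nonlinear dimensionality reduction technique that is well suited to embedding high-dimensional data in a low-dimensional space of two or three dimensions for visualization.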
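
As a quick illustration of what "embedding in two dimensions" means in practice, here is a minimal sketch. It assumes scikit-learn's bundled `load_digits` data (8x8 digit images, not the MNIST file used later in this post), and the 300-sample subset is an arbitrary choice to keep the run short.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# load_digits ships with scikit-learn: each 8x8 image is flattened to 64 features.
X_digits, y_digits = load_digits(return_X_y=True)

# Assumption for this sketch: a small subset so the example runs quickly.
X_small = X_digits[:300]

# Perplexity must be smaller than the number of samples.
embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X_small)

# Each 64-dimensional sample now has a 2-dimensional coordinate.
print(X_small.shape, "->", embedding.shape)
```

The two output columns have no meaning on their own. Only the relative positions of the points are intended to be read.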

How does t-SNE work?

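t-SNE is a nonlinear dimensionality reduction algorithm that finds patterns in data based on the similarity of data points described by their features. The similarity between two points is calculated as the conditional probability that point A would choose point B as its neighbor. The algorithm then looks for a low-dimensional layout whose pairwise similarities match the high-dimensional ones as closely as possible.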
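
The conditional-probability idea can be sketched in a few lines of NumPy. This is a simplified illustration, not the scikit-learn implementation: the helper name `conditional_probabilities` is made up for this example, and it uses one fixed `sigma` for every point, whereas t-SNE tunes a separate bandwidth per point to match the chosen perplexity.

```python
import numpy as np

def conditional_probabilities(X, sigma=1.0):
    """Return P where P[i, j] approximates p(j | i) under a Gaussian centred on point i."""
    # Squared Euclidean distance between every pair of rows.
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    sq_dists = np.maximum(sq_dists, 0.0)  # guard against tiny negative round-off

    # Unnormalised Gaussian affinities; a point never picks itself as a neighbour.
    affinities = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(affinities, 0.0)

    # Normalise each row so it becomes a probability distribution over neighbours.
    return affinities / affinities.sum(axis=1, keepdims=True)

# Hypothetical toy data: 5 points with 3 features each.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(5, 3))
P_demo = conditional_probabilities(X_demo)

# Every row sums to 1: point i distributes its "neighbour votes" over the others.
print(P_demo.sum(axis=1))
```

Nearby points receive a large share of a row's probability mass and distant points receive almost none, which is why the method concentrates on preserving local neighbourhoods.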

### Applying t-SNE to the MNIST dataset

``````
# Import the necessary modules.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sn  # used for the final scatter plot
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler
``````

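``````
# Read the data using pandas.
# NOTE: the file name is an assumption; the original omits the read call.
# Point it at a CSV copy of MNIST with a 'label' column plus pixel columns.
df = pd.read_csv("mnist_train.csv")

# Print the first five rows of df.
print(df.head())

# Save the labels into a variable named labels.
labels = df["label"]

# Drop the label column and store the pixel data in data.
data = df.drop("label", axis=1)
``````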

``````
# Data preprocessing: standardize the data.
from sklearn.preprocessing import StandardScaler

standardized_data = StandardScaler().fit_transform(data)

print(standardized_data.shape)
``````
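
To see what the standardization step does, here is a small sketch on a made-up matrix (the `toy` array is hypothetical, not MNIST). Each column is rescaled to zero mean and unit variance. A constant column, like the always-blank border pixels in digit images, has no variance to rescale and simply becomes all zeros.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy matrix: 4 samples, 3 "pixel" columns; the last column is constant.
toy = np.array([[0.0, 10.0, 0.0],
                [2.0, 20.0, 0.0],
                [4.0, 30.0, 0.0],
                [6.0, 40.0, 0.0]])

scaled = StandardScaler().fit_transform(toy)

print(scaled.mean(axis=0))  # column means after scaling
print(scaled.std(axis=0))   # column standard deviations after scaling
```

Without this step, features with large numeric ranges would dominate the pairwise distances that t-SNE relies on.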

``````
# t-SNE
# Use only the first 1000 points, because t-SNE
# takes a long time on 15K points.
data_1000 = standardized_data[0:1000, :]
labels_1000 = labels[0:1000]

# Configure the parameters:
#   number of components = 2
#   default perplexity = 30
#   default learning rate = 200 (older scikit-learn releases; newer ones default to "auto")
#   default maximum number of iterations for the optimization = 1000
model = TSNE(n_components=2, random_state=0)

tsne_data = model.fit_transform(data_1000)

# Create a new DataFrame that
# helps us plot the result data.
tsne_data = np.vstack((tsne_data.T, labels_1000)).T
tsne_df = pd.DataFrame(data=tsne_data, columns=("Dim_1", "Dim_2", "label"))

# Plot the t-SNE result.
# (seaborn 0.9+ uses `height`; older releases called this argument `size`.)
sn.FacetGrid(tsne_df, hue="label", height=6).map(