# Regression Analysis Tutorial: Logistic Regression in Python, with Example and Code

September 18, 2021, 23:19:24

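Logistic regression models a binary dependent variable, that is, one with only two possible outcomes:

• '1' for true/success; or
• '0' for false/failure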
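As a rough illustration of how a logistic model arrives at a 1 or a 0, the sketch below passes a linear score through the sigmoid function and applies a 0.5 threshold. The scores and the names `sigmoid` and `classify` are hypothetical, chosen only for illustration; scikit-learn handles this step internally.

```python
import math

def sigmoid(z: float) -> float:
    """Map a real-valued score z to a probability in (0, 1).

    Illustrative only: not hardened against overflow for very large negative z.
    """
    return 1.0 / (1.0 + math.exp(-z))

def classify(z: float, threshold: float = 0.5) -> int:
    """Return 1 (true/success) if the probability reaches the threshold, else 0."""
    return 1 if sigmoid(z) >= threshold else 0

# Hypothetical linear scores, chosen only for illustration
for z in (-2.0, 0.0, 2.0):
    print(f"z={z:+.1f}  p={sigmoid(z):.3f}  class={classify(z)}")
```

A negative score gives a probability below 0.5 (class 0), and a positive score gives a probability above 0.5 (class 1).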

## Steps to Apply Logistic Regression in Python

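### Step 1: Gather the Data

As a simple example, suppose your goal is to build a logistic regression model in Python that determines whether a candidate will be admitted to a prestigious university.

In this model:

• The dependent variable represents whether a person is admitted (1) or not (0); and
• The 3 independent variables are the GMAT score, the GPA, and the years of work experience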

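### Step 2: Import the Needed Python Packages

Make sure the following packages are installed:

• pandas: used to create the DataFrame that holds the dataset
• sklearn (installed as scikit-learn): used to build the logistic regression model
• seaborn: used to draw the confusion matrix
• matplotlib: used to display charts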

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn import metrics
import seaborn as sn
import matplotlib.pyplot as plt
```
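Because results can differ between library versions, it may help to confirm what is installed before going further. The helper below is a minimal sketch; the name `installed_versions` is hypothetical and not part of any of these libraries.

```python
import importlib

# Import names used in this tutorial ("sklearn" is installed as "scikit-learn")
PACKAGES = ["pandas", "sklearn", "seaborn", "matplotlib"]

def installed_versions(names):
    """Return {import name: version string, or None if the package cannot be imported}."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
            versions[name] = getattr(module, "__version__", "unknown")
        except ImportError:
            versions[name] = None
    return versions

for name, version in installed_versions(PACKAGES).items():
    print(f"{name}: {version if version is not None else 'not installed'}")
```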

### Step 3: Build a DataFrame

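```python
import pandas as pd

# One entry per candidate (40 candidates in total)
candidates = {'gmat': [780,750,690,710,680,730,690,720,740,690,610,690,710,680,770,610,580,650,540,590,620,600,550,550,570,670,660,580,650,660,640,620,660,660,680,650,670,580,590,690],
              'gpa': [4,3.9,3.3,3.7,3.9,3.7,2.3,3.3,3.3,1.7,2.7,3.7,3.7,3.3,3.3,3,2.7,3.7,2.7,2.3,3.3,2,2.3,2.7,3,3.3,3.7,2.3,3.7,3.3,3,2.7,4,3.3,3.3,2.3,2.7,3.3,1.7,3.7],
              'work_experience': [3,4,3,5,4,6,1,4,5,1,3,5,6,4,3,1,4,6,2,3,2,1,4,1,2,6,4,2,6,5,1,2,4,6,5,1,2,1,4,5],
              # 1 = admitted, 0 = not admitted (assumed labels, not present in the original listing)
              'admitted': [1,1,0,1,0,1,0,1,1,0,0,1,1,0,1,0,0,1,0,0,1,0,0,0,0,1,1,0,1,1,0,0,1,1,1,0,0,0,0,1]
              }

df = pd.DataFrame(candidates, columns=['gmat', 'gpa', 'work_experience', 'admitted'])

print(df)
```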
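Before modeling, a quick sanity check of the feature columns can catch typos or missing values. The sketch below rebuilds only the three feature columns listed above (the variable name `features` is an assumption) and prints basic summaries.

```python
import pandas as pd

# Feature columns copied from the DataFrame above
features = pd.DataFrame({
    'gmat': [780,750,690,710,680,730,690,720,740,690,610,690,710,680,770,610,580,650,540,590,620,600,550,550,570,670,660,580,650,660,640,620,660,660,680,650,670,580,590,690],
    'gpa': [4,3.9,3.3,3.7,3.9,3.7,2.3,3.3,3.3,1.7,2.7,3.7,3.7,3.3,3.3,3,2.7,3.7,2.7,2.3,3.3,2,2.3,2.7,3,3.3,3.7,2.3,3.7,3.3,3,2.7,4,3.3,3.3,2.3,2.7,3.3,1.7,3.7],
    'work_experience': [3,4,3,5,4,6,1,4,5,1,3,5,6,4,3,1,4,6,2,3,2,1,4,1,2,6,4,2,6,5,1,2,4,6,5,1,2,1,4,5],
})

print(features.shape)         # (number of rows, number of columns)
print(features.describe())    # count, mean, min, max and quartiles per column
print(features.isna().sum())  # number of missing values per column
```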

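### Step 4: Create the Logistic Regression in Python

Now, set the independent variables (represented as X) and the dependent variable (represented as y):

```python
X = df[['gmat', 'gpa', 'work_experience']]
y = df['admitted']
```

Then apply train_test_split. With test_size=0.25, the model is tested on 25% of the dataset and trained on the remaining 75%:

```python
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
```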
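To see what the split does to a 40-row dataset, the sketch below uses stand-in arrays (`X_demo` and `y_demo` are placeholders, not the admissions data). It prints the sizes of the two subsets, 30 and 10, and checks that reusing the same random_state reproduces the same split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in data: 40 rows, 3 features, and 40 placeholder labels (not the tutorial's values)
X_demo = np.arange(120).reshape(40, 3)
y_demo = np.arange(40) % 2

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.25, random_state=0)
print(len(X_tr), len(X_te))  # sizes of the training and test sets

# The same random_state reproduces the same split
X_tr2, X_te2, _, _ = train_test_split(X_demo, y_demo, test_size=0.25, random_state=0)
print(np.array_equal(X_te, X_te2))
```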

Apply the logistic regression as follows:

```python
logistic_regression = LogisticRegression()
logistic_regression.fit(X_train, y_train)
y_pred = logistic_regression.predict(X_test)
```
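Besides hard 0/1 predictions, a fitted LogisticRegression can also report class probabilities through predict_proba. The sketch below uses synthetic data from make_classification as a stand-in (it is not the admissions dataset), so the printed numbers are illustrative only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (40 rows, 3 features); not the admissions dataset
X_demo, y_demo = make_classification(n_samples=40, n_features=3, n_informative=3,
                                     n_redundant=0, random_state=0)

model = LogisticRegression()
model.fit(X_demo, y_demo)

proba = model.predict_proba(X_demo)  # shape (40, 2); columns follow model.classes_
labels = model.predict(X_demo)       # hard 0/1 predictions

# For a binary model, predict() returns the class with the larger probability
print(model.classes_)
print(proba[:3].round(3))
print(labels[:3])
```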

Then use the code below to build the confusion matrix:

```python
confusion_matrix = pd.crosstab(y_test, y_pred, rownames=['Actual'], colnames=['Predicted'])
sn.heatmap(confusion_matrix, annot=True)
```
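As an alternative to pd.crosstab, scikit-learn provides sklearn.metrics.confusion_matrix, which returns the counts as an array. The sketch below uses hypothetical labels for 10 test rows, made up for illustration, to show how the four cells are read out.

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels for 10 test rows, made up for illustration
y_true_demo = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred_demo = [1, 1, 1, 1, 0, 0, 0, 0, 0, 1]

# Rows are actual classes, columns are predicted classes, both ordered [0, 1]
cm = confusion_matrix(y_true_demo, y_pred_demo, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()
print(cm)
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
```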

For the final part, print the accuracy and plot the confusion matrix:

```python
print('Accuracy: ', metrics.accuracy_score(y_test, y_pred))
plt.show()
```

Putting all the code components together:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn import metrics
import seaborn as sn
import matplotlib.pyplot as plt

candidates = {'gmat': [780,750,690,710,680,730,690,720,740,690,610,690,710,680,770,610,580,650,540,590,620,600,550,550,570,670,660,580,650,660,640,620,660,660,680,650,670,580,590,690],
              'gpa': [4,3.9,3.3,3.7,3.9,3.7,2.3,3.3,3.3,1.7,2.7,3.7,3.7,3.3,3.3,3,2.7,3.7,2.7,2.3,3.3,2,2.3,2.7,3,3.3,3.7,2.3,3.7,3.3,3,2.7,4,3.3,3.3,2.3,2.7,3.3,1.7,3.7],
              'work_experience': [3,4,3,5,4,6,1,4,5,1,3,5,6,4,3,1,4,6,2,3,2,1,4,1,2,6,4,2,6,5,1,2,4,6,5,1,2,1,4,5],
              # 1 = admitted, 0 = not admitted (assumed labels, not present in the original listing)
              'admitted': [1,1,0,1,0,1,0,1,1,0,0,1,1,0,1,0,0,1,0,0,1,0,0,0,0,1,1,0,1,1,0,0,1,1,1,0,0,0,0,1]
              }

df = pd.DataFrame(candidates, columns=['gmat', 'gpa', 'work_experience', 'admitted'])
# print(df)

# Independent variables (X) and dependent variable (y)
X = df[['gmat', 'gpa', 'work_experience']]
y = df['admitted']

# Train on 75% of the dataset, test on the remaining 25%
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

logistic_regression = LogisticRegression()
logistic_regression.fit(X_train, y_train)
y_pred = logistic_regression.predict(X_test)

# Confusion matrix of actual vs. predicted labels, drawn as a heatmap
confusion_matrix = pd.crosstab(y_test, y_pred, rownames=['Actual'], colnames=['Predicted'])
sn.heatmap(confusion_matrix, annot=True)

print('Accuracy: ', metrics.accuracy_score(y_test, y_pred))
plt.show()
```

For the test set, the confusion matrix shows:

• TP = True Positives = 4
• TN = True Negatives = 4
• FP = False Positives = 1
• FN = False Negatives = 1
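The four counts above are enough to work out the accuracy by hand: (TP + TN) / total = (4 + 4) / 10 = 0.8. The short sketch below repeats that arithmetic and, as an extra not covered in the text, also derives precision and recall from the same counts.

```python
# Counts taken from the confusion matrix above
tp, tn, fp, fn = 4, 4, 1, 1

total = tp + tn + fp + fn
accuracy = (tp + tn) / total  # share of all predictions that are correct
precision = tp / (tp + fp)    # share of predicted positives that are correct
recall = tp / (tp + fn)       # share of actual positives that are found

print(f"total={total} accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```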

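## Diving Deeper into the Results

To look more closely at the test set and the predictions, add the following two statements:

• print(X_test)
• print(y_pred)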

The code then looks like this:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

candidates = {'gmat': [780,750,690,710,680,730,690,720,740,690,610,690,710,680,770,610,580,650,540,590,620,600,550,550,570,670,660,580,650,660,640,620,660,660,680,650,670,580,590,690],
              'gpa': [4,3.9,3.3,3.7,3.9,3.7,2.3,3.3,3.3,1.7,2.7,3.7,3.7,3.3,3.3,3,2.7,3.7,2.7,2.3,3.3,2,2.3,2.7,3,3.3,3.7,2.3,3.7,3.3,3,2.7,4,3.3,3.3,2.3,2.7,3.3,1.7,3.7],
              'work_experience': [3,4,3,5,4,6,1,4,5,1,3,5,6,4,3,1,4,6,2,3,2,1,4,1,2,6,4,2,6,5,1,2,4,6,5,1,2,1,4,5],
              # 1 = admitted, 0 = not admitted (assumed labels, not present in the original listing)
              'admitted': [1,1,0,1,0,1,0,1,1,0,0,1,1,0,1,0,0,1,0,0,1,0,0,0,0,1,1,0,1,1,0,0,1,1,1,0,0,0,0,1]
              }

df = pd.DataFrame(candidates, columns=['gmat', 'gpa', 'work_experience', 'admitted'])

X = df[['gmat', 'gpa', 'work_experience']]
y = df['admitted']

# Train on 75% of the dataset, test on the remaining 25%
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

logistic_regression = LogisticRegression()
logistic_regression.fit(X_train, y_train)
y_pred = logistic_regression.predict(X_test)

print(X_test)  # test dataset
print(y_pred)  # predicted values
```
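Two separate printouts are hard to compare row by row. One option is to place the actual and predicted labels next to the test features, as sketched below. The helper `compare_predictions` and the three demo rows are hypothetical; in the tutorial's code you would pass X_test, y_test and y_pred instead.

```python
import pandas as pd

def compare_predictions(X_test, y_test, y_pred):
    """Return the test features with actual and predicted labels side by side."""
    result = X_test.copy()
    result['actual'] = list(y_test)      # positional, so the index of y_test does not matter
    result['predicted'] = list(y_pred)
    result['correct'] = result['actual'] == result['predicted']
    return result

# Hypothetical test rows and labels, made up for illustration
X_demo = pd.DataFrame({'gmat': [700, 600, 650],
                       'gpa': [3.5, 2.5, 3.0],
                       'work_experience': [4, 1, 3]},
                      index=[7, 21, 30])
table = compare_predictions(X_demo, y_test=[1, 0, 0], y_pred=[1, 0, 1])
print(table)
print('correct:', int(table['correct'].sum()), 'of', len(table))
```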

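## Checking the Prediction for a New Set of Data

Suppose you have a new set of data with 5 new candidates. Your goal is to use the existing logistic regression model to predict whether these candidates will be admitted.

The new data can be captured in a second DataFrame called df2: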

```python
new_candidates = {'gmat': [590,740,680,610,710],
                  'gpa': [2,3.7,3.3,2.3,3],
                  'work_experience': [3,4,6,1,5]
                  }

df2 = pd.DataFrame(new_candidates, columns=['gmat', 'gpa', 'work_experience'])
```
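New data should have the same columns, in the same order, as the data used for training. The sketch below is one way to enforce that with plain pandas before calling predict. The helper `align_features` and the constant `FEATURES` are hypothetical names introduced for illustration.

```python
import pandas as pd

FEATURES = ['gmat', 'gpa', 'work_experience']  # column order used for training

def align_features(new_df, features=FEATURES):
    """Return new_df restricted to the training columns, in training order."""
    missing = [col for col in features if col not in new_df.columns]
    if missing:
        raise ValueError(f"new data is missing columns: {missing}")
    return new_df[features]

# Same 5 candidates as above, but with the columns supplied in a different order
shuffled = pd.DataFrame({'gpa': [2, 3.7, 3.3, 2.3, 3],
                         'work_experience': [3, 4, 6, 1, 5],
                         'gmat': [590, 740, 680, 610, 710]})
df2_aligned = align_features(shuffled)
print(list(df2_aligned.columns))
```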

Here is the complete code to get the predictions for the 5 new candidates:

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

candidates = {'gmat': [780,750,690,710,680,730,690,720,740,690,610,690,710,680,770,610,580,650,540,590,620,600,550,550,570,670,660,580,650,660,640,620,660,660,680,650,670,580,590,690],
              'gpa': [4,3.9,3.3,3.7,3.9,3.7,2.3,3.3,3.3,1.7,2.7,3.7,3.7,3.3,3.3,3,2.7,3.7,2.7,2.3,3.3,2,2.3,2.7,3,3.3,3.7,2.3,3.7,3.3,3,2.7,4,3.3,3.3,2.3,2.7,3.3,1.7,3.7],
              'work_experience': [3,4,3,5,4,6,1,4,5,1,3,5,6,4,3,1,4,6,2,3,2,1,4,1,2,6,4,2,6,5,1,2,4,6,5,1,2,1,4,5],
              # 1 = admitted, 0 = not admitted (assumed labels, not present in the original listing)
              'admitted': [1,1,0,1,0,1,0,1,1,0,0,1,1,0,1,0,0,1,0,0,1,0,0,0,0,1,1,0,1,1,0,0,1,1,1,0,0,0,0,1]
              }

df = pd.DataFrame(candidates, columns=['gmat', 'gpa', 'work_experience', 'admitted'])

X = df[['gmat', 'gpa', 'work_experience']]
y = df['admitted']

# Hold out 25% for testing; to train on the full dataset instead,
# skip the split and fit on X and y directly (predictions may then differ)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

logistic_regression = LogisticRegression()
logistic_regression.fit(X_train, y_train)

# New candidates to score with the fitted model
new_candidates = {'gmat': [590,740,680,610,710],
                  'gpa': [2,3.7,3.3,2.3,3],
                  'work_experience': [3,4,6,1,5]
                  }

df2 = pd.DataFrame(new_candidates, columns=['gmat', 'gpa', 'work_experience'])
y_pred = logistic_regression.predict(df2)

print(df2)
print(y_pred)
```