Correlation Coefficients and Covariance in pandas (Titanic)

Published on Aug. 22, 2023, 12:10 p.m.

Correlation coefficients and covariance in pandas, using the Titanic training data.

import pandas as pd
import numpy as np
df=pd.read_csv("/content/train.csv")
df

     PassengerId  Survived  Pclass  Name                                             Sex     Age   SibSp  Parch  Ticket            Fare     Cabin  Embarked
0    1            0         3       Braund, Mr. Owen Harris                          male    22.0  1      0      A/5 21171         7.2500   NaN    S
1    2            1         1       Cumings, Mrs. John Bradley (Florence Briggs Th…  female  38.0  1      0      PC 17599          71.2833  C85    C
2    3            1         3       Heikkinen, Miss. Laina                           female  26.0  0      0      STON/O2. 3101282  7.9250   NaN    S
3    4            1         1       Futrelle, Mrs. Jacques Heath (Lily May Peel)     female  35.0  1      0      113803            53.1000  C123   S
4    5            0         3       Allen, Mr. William Henry                         male    35.0  0      0      373450            8.0500   NaN    S
...
886  887          0         2       Montvila, Rev. Juozas                            male    27.0  0      0      211536            13.0000  NaN    S
887  888          1         1       Graham, Miss. Margaret Edith                     female  19.0  0      0      112053            30.0000  B42    S
888  889          0         3       Johnston, Miss. Catherine Helen “Carrie”         female  NaN   1      2      W./C. 6607        23.4500  NaN    S
889  890          1         1       Behr, Mr. Karl Howell                            male    26.0  0      0      111369            30.0000  C148   C
890  891          0         3       Dooley, Mr. Patrick                              male    32.0  0      0      370376            7.7500   NaN    Q

891 rows × 12 columns

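1. Covariance
Covariance indicates only the direction of a linear relationship; its value can range from negative infinity to positive infinity.
In other words, a positive covariance means that as one variable increases the other tends to increase as well (positive correlation); a negative covariance means that as one variable increases the other tends to decrease (negative correlation); a covariance of 0 means the two variables have no linear relationship.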

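Note: the absolute value of the covariance does not reflect the strength of the linear relationship, because it depends on the scale (units and range) of the variables.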
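
The scale dependence described in the note can be seen in a small sketch. The height and weight values below are made up for illustration and are not taken from the Titanic data; the only pandas calls used are Series.cov and Series.corr.

```python
import pandas as pd

# Hypothetical illustration data (not from the Titanic set).
height_m = pd.Series([1.50, 1.60, 1.70, 1.80, 1.90])   # height in metres
weight_kg = pd.Series([50.0, 58.0, 65.0, 74.0, 80.0])  # weight in kilograms

# The same heights expressed in centimetres.
height_cm = height_m * 100

cov_m = height_m.cov(weight_kg)
cov_cm = height_cm.cov(weight_kg)

# Changing the unit multiplies the covariance by 100,
# although the relationship between the variables is unchanged.
print(cov_m, cov_cm)

# The correlation coefficient is not affected by the change of unit.
corr_m = height_m.corr(weight_kg)
corr_cm = height_cm.corr(weight_kg)
print(corr_m, corr_cm)
```

Only the sign of the covariance carries over between the two versions; its magnitude follows the units.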

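2. Correlation coefficient
The correlation coefficient indicates both the direction and the strength of a linear relationship; its value lies in [-1, 1].

In other words, a positive correlation coefficient means that as one variable increases the other tends to increase; a negative value means that as one variable increases the other tends to decrease; a value of 0 means the two variables have no linear relationship.
In addition, the closer the absolute value of the correlation coefficient is to 1, the stronger the linear relationship.
As a common rule of thumb, a linear relationship is considered to exist when the absolute value of the correlation coefficient exceeds 2/sqrt(N), where N is the number of sample points.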
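
As a rough sketch of the two ideas above, the Pearson correlation coefficient can be written as the covariance divided by the product of the two standard deviations, and the 2/sqrt(N) rule of thumb becomes a one-line check. The helper names pearson_from_cov and exceeds_rule_of_thumb and the sample values are hypothetical, chosen only for this illustration.

```python
import numpy as np
import pandas as pd

def pearson_from_cov(x: pd.Series, y: pd.Series) -> float:
    """Pearson r = cov(x, y) / (std(x) * std(y)), using complete pairs only."""
    pair = pd.concat([x, y], axis=1).dropna()
    a, b = pair.iloc[:, 0], pair.iloc[:, 1]
    # Series.cov and Series.std both use ddof=1 by default, so they are consistent.
    return float(a.cov(b) / (a.std() * b.std()))

def exceeds_rule_of_thumb(r: float, n: int) -> bool:
    """True when |r| is larger than the 2/sqrt(N) threshold."""
    return bool(abs(r) > 2 / np.sqrt(n))

# Hypothetical sample values, roughly on a straight line.
x = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
y = pd.Series([2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1])

r = pearson_from_cov(x, y)
n = len(x)
print(r, 2 / np.sqrt(n), exceeds_rule_of_thumb(r, n))
```

For a sample of 891 rows, as in the table above, the threshold works out to roughly 0.067. Pairs that involve a column with missing values are computed on fewer rows, so their effective N is smaller.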


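Covariance tells you the direction of the relationship between two variables: positive, negative, or no linear relationship.
The correlation coefficient tells you both the direction and the strength of that relationship.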

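Computing covariance

dataframe.cov(): computes the covariance between every pair of numeric columns. In pandas 2.0 and later, non-numeric columns such as Name or Sex are no longer dropped automatically, so pass numeric_only=True.

df.cov(numeric_only=True)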

              PassengerId      Survived        Pclass           Age         SibSp         Parch          Fare
PassengerId  66231.000000     -0.626966     -7.561798    138.696504    -16.325843     -0.342697    161.883369
Survived        -0.626966      0.236772     -0.137703     -0.551296     -0.018954      0.032017      6.221787
Pclass          -7.561798     -0.137703      0.699015     -4.496004      0.076599      0.012429    -22.830196
Age            138.696504     -0.551296     -4.496004    211.019125     -4.163334     -2.344191     73.849030
SibSp          -16.325843     -0.018954      0.076599     -4.163334      1.216043      0.368739      8.748734
Parch           -0.342697      0.032017      0.012429     -2.344191      0.368739      0.649728      8.661052
Fare           161.883369      6.221787    -22.830196     73.849030      8.748734      8.661052   2469.436846

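Computing the correlation coefficient

dataframe.corr(): computes the correlation coefficient between every pair of numeric columns. As with cov(), pass numeric_only=True in pandas 2.0 and later so that non-numeric columns are skipped.

df.corr(numeric_only=True)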

              PassengerId      Survived        Pclass           Age         SibSp         Parch          Fare
PassengerId      1.000000     -0.005007     -0.035144      0.036847     -0.057527     -0.001652      0.012658
Survived        -0.005007      1.000000     -0.338481     -0.077221     -0.035322      0.081629      0.257307
Pclass          -0.035144     -0.338481      1.000000     -0.369226      0.083081      0.018443     -0.549500
Age              0.036847     -0.077221     -0.369226      1.000000     -0.308247     -0.189119      0.096067
SibSp           -0.057527     -0.035322      0.083081     -0.308247      1.000000      0.414838      0.159651
Parch           -0.001652      0.081629      0.018443     -0.189119      0.414838      1.000000      0.216225
Fare             0.012658      0.257307     -0.549500      0.096067      0.159651      0.216225      1.000000

series.corr(series): computes the correlation coefficient between two specified variables

df['Age'].corr(df['Pclass'])
-0.36922601531551724
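
The pairwise counterpart for covariance is Series.cov. The sketch below uses a small made-up frame (the name toy and its values are hypothetical, not the Titanic data) to check that the pairwise results match the corresponding entries of the matrix results, including when one column has a missing value.

```python
import numpy as np
import pandas as pd

# Small hypothetical frame; the values are made up for illustration.
toy = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0, 35.0, np.nan, 54.0],
    "Pclass": [3, 1, 3, 1, 3, 1],
})

# Pairwise covariance and correlation between two columns.
# Rows where either value is missing are ignored.
pair_cov = toy["Age"].cov(toy["Pclass"])
pair_corr = toy["Age"].corr(toy["Pclass"])

# The same numbers appear as off-diagonal entries of the matrix results.
matrix_cov = toy.cov().loc["Age", "Pclass"]
matrix_corr = toy.corr().loc["Age", "Pclass"]

print(pair_cov, matrix_cov)
print(pair_corr, matrix_corr)
```

Both results are negative here because, in this toy data, the older passengers are in the lower-numbered classes.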

https://colab.research.google.com/drive/1fEha3cjo3noLYCnLO8-RCIEGMjW6nxnq#scrollTo=6O1CulKF16lL