Pandas DataFrames: Reversing 'one-hot' encoding
Published on Aug. 22, 2023, 12:12 p.m.
I want to go from this data frame which is essentially one hot encoded .
In [2]: pd.DataFrame({"monkey":[0,1,0],"rabbit":[1,0,0],"fox":[0,0,1]})
Out[2]:
fox monkey rabbit
0 0 0 1
1 0 1 0
2 1 0 0
3 0 0 0
4 0 0 0
In [3]: pd.DataFrame({"animal":["monkey","rabbit","fox"]})
Out[3]:
animal
0 monkey
1 rabbit
2 fox
To this one it is encoded ‘ reverse’ one-hot.
df.idxmax(axis=1)
This chooses a column label for each row. It chooses the label with the maximum value.Since the data are 1s and 0s, it will pick .
In [40]: s = pd.Series(['dog', 'cat', 'dog', 'bird', 'fox', 'dog'])
In [41]: s
Out[41]:
0 dog
1 cat
2 dog
3 bird
4 fox
5 dog
dtype: object
In [42]: pd.get_dummies(s)
Out[42]:
bird cat dog fox
0 0.0 0.0 1.0 0.0
1 0.0 1.0 0.0 0.0
2 0.0 0.0 1.0 0.0
3 1.0 0.0 0.0 0.0
4 0.0 0.0 0.0 1.0
5 0.0 0.0 1.0 0.0
In [43]: pd.get_dummies(s).idxmax(1)
Out[43]:
0 dog
1 cat
2 dog
3 bird
4 fox
5 dog
dtype: object