Pandas DataFrames: Reversing 'one-hot' encoding

Published on Aug. 22, 2023, 12:12 p.m.

I want to go from this data frame, which is essentially one-hot encoded:

    In [2]: pd.DataFrame({"monkey":[0,1,0],"rabbit":[1,0,0],"fox":[0,0,1]})

       fox  monkey  rabbit
    0    0       0       1
    1    0       1       0
    2    1       0       0

To this one, which is 'reverse' one-hot encoded:

    In [3]: pd.DataFrame({"animal":["rabbit","monkey","fox"]})

       animal
    0  rabbit
    1  monkey
    2     fox


Calling idxmax(axis=1) on the data frame chooses a column label for each row: the label of the column with the maximum value. Since the data are 1s and 0s, it picks the column that holds the 1.
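Applied to the data frame from the question, the idea can be sketched as follows. This is a minimal illustration, not the only way to do it; the variable names df, animal and decoded are chosen here for the example, and the "animal" column name is taken from the desired output in the question.

```python
import pandas as pd

# The one-hot encoded frame from the question.
df = pd.DataFrame({"monkey": [0, 1, 0], "rabbit": [1, 0, 0], "fox": [0, 0, 1]})

# For each row, take the label of the column holding the largest value (the 1).
animal = df.idxmax(axis=1)

# Wrap the resulting Series in a one-column frame named "animal".
decoded = animal.to_frame(name="animal")
print(decoded)
```

Row 0 has its 1 under rabbit, row 1 under monkey and row 2 under fox, so the decoded column reads rabbit, monkey, fox.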

In [40]: s = pd.Series(['dog', 'cat', 'dog', 'bird', 'fox', 'dog'])

In [41]: s
0     dog
1     cat
2     dog
3    bird
4     fox
5     dog
dtype: object

In [42]: pd.get_dummies(s)
   bird  cat  dog  fox
0   0.0  0.0  1.0  0.0
1   0.0  1.0  0.0  0.0
2   0.0  0.0  1.0  0.0
3   1.0  0.0  0.0  0.0
4   0.0  0.0  0.0  1.0
5   0.0  0.0  1.0  0.0

In [43]: pd.get_dummies(s).idxmax(axis=1)
0     dog
1     cat
2     dog
3    bird
4     fox
5     dog
dtype: object
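One caveat with this approach: a row that contains only 0s has no 1 to find, and idxmax then returns the first column label, because every value ties for the maximum. The sketch below shows one possible way to guard against that, assuming each valid row contains exactly one 1. The extra all-zero row and the names naive, has_label and safe are made up for this illustration.

```python
import pandas as pd

# Same animals as in the question, plus a hypothetical fourth row of all zeros.
df = pd.DataFrame(
    {"monkey": [0, 1, 0, 0], "rabbit": [1, 0, 0, 0], "fox": [0, 0, 1, 0]},
    columns=["monkey", "rabbit", "fox"],
)

# idxmax alone labels the all-zero row with the first column.
naive = df.idxmax(axis=1)

# Keep a label only where the row holds exactly one 1; other rows become NaN.
has_label = df.sum(axis=1) == 1
safe = naive.where(has_label)
print(safe)
```

Whether such rows should become missing values, a default category, or raise an error depends on the data; the mask above only makes the ambiguous rows visible.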