Pandas DataFrames: Reversing 'one-hot' encoding

Published on Aug. 22, 2023, 12:12 p.m.

I want to go from this data frame which is essentially one hot encoded .

 In [2]: pd.DataFrame({"monkey":[0,1,0],"rabbit":[1,0,0],"fox":[0,0,1]})

    Out[2]:
       fox  monkey  rabbit
    0    0       0       1
    1    0       1       0
    2    1       0       0
    3    0       0       0
    4    0       0       0

    In [3]: pd.DataFrame({"animal":["monkey","rabbit","fox"]})
    Out[3]:
       animal
    0  monkey
    1  rabbit
    2     fox

To this one it is encoded ‘ reverse’ one-hot.


df.idxmax(axis=1)

This chooses a column label for each row. It chooses the label with the maximum value.Since the data are 1s and 0s, it will pick .

In [40]: s = pd.Series(['dog', 'cat', 'dog', 'bird', 'fox', 'dog'])

In [41]: s
Out[41]:
0     dog
1     cat
2     dog
3    bird
4     fox
5     dog
dtype: object

In [42]: pd.get_dummies(s)
Out[42]:
   bird  cat  dog  fox
0   0.0  0.0  1.0  0.0
1   0.0  1.0  0.0  0.0
2   0.0  0.0  1.0  0.0
3   1.0  0.0  0.0  0.0
4   0.0  0.0  0.0  1.0
5   0.0  0.0  1.0  0.0

In [43]: pd.get_dummies(s).idxmax(1)
Out[43]:
0     dog
1     cat
2     dog
3    bird
4     fox
5     dog
dtype: object