How to remove duplicate lines from a text file using the Linux command line?

Published on Aug. 22, 2023, 12:17 p.m.

To remove duplicate lines from a text file using Linux command line, you can use the sort and uniq commands together. Here’s the command:

sort file.txt | uniq > output_file.txt

This command sorts the lines in file.txt and pipes them to uniq, which removes adjacent duplicate lines. Because sorting groups identical lines together, every duplicate ends up adjacent and is removed. The result is written to output_file.txt.
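For example, given a hypothetical file.txt containing:

banana
apple
banana
cherry
apple

the pipeline produces sorted, de-duplicated output:

$ sort file.txt | uniq
apple
banana
cherry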

Alternatively, you can use the awk command to remove duplicates while preserving the original order of the lines:

awk '!seen[$0]++' file.txt > output_file.txt

This one-line Awk program builds an associative array called "seen" that counts how many times each line ($0) has occurred. The expression !seen[$0]++ is true only the first time a line appears, so each line is printed exactly once, in its original position.
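With the same hypothetical file.txt as above, the awk version keeps the first occurrence of each line and preserves the original order:

$ awk '!seen[$0]++' file.txt
banana
apple
cherry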

Note that uniq only removes adjacent duplicate lines, so the sort step can be skipped only if the file is already sorted (or the duplicates happen to be adjacent). If you want to remove duplicates without changing the order of the lines, use the awk approach above instead.
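As a shorthand, sort can also sort and de-duplicate in a single step with the -u flag, making the separate uniq call unnecessary:

sort -u file.txt > output_file.txt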

Also, be aware of resource use on very large files: sort spills to temporary files on disk when the input exceeds its memory buffer, while the awk approach holds one copy of every unique line in memory, which can be substantial if most lines are distinct. In such cases, you may need to tune these tools or use more specialized scripts to remove duplicates efficiently.
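For instance, if you are using GNU sort, you can control its memory buffer with -S and its temporary directory with -T. This is a minimal sketch; the 1G buffer size and the /var/tmp path are illustrative values you would adjust for your system:

sort -u -S 1G -T /var/tmp file.txt > output_file.txt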