How to remove duplicate lines from a text file using the Linux command line?
Published on Aug. 22, 2023, 12:17 p.m.
To remove duplicate lines from a text file using the Linux command line, you can use the sort and uniq commands together. Here's the command:
sort file.txt | uniq > output_file.txt
This command sorts the lines in file.txt and then pipes them to the uniq command, which removes consecutive duplicate lines; because the input is sorted, all duplicates end up consecutive. The result is written to output_file.txt.
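For example, given a throwaway sample file (the contents below are just for illustration):

$ printf 'cherry\napple\ncherry\nbanana\napple\n' > file.txt
$ sort file.txt | uniq
apple
banana
cherry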
Alternatively, you can use the awk command to achieve the same result while preserving the original order of the lines:
awk '!seen[$0]++' file.txt > output_file.txt
This one-liner builds an associative array called seen, keyed by the full line ($0). The expression !seen[$0]++ is true only the first time a given line appears, so awk prints each line once and silently skips later repeats.
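Run on the same sample file, it keeps the first occurrence of each line in its original position rather than sorting:

$ awk '!seen[$0]++' file.txt
cherry
apple
banana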
Note that uniq only removes consecutive duplicate lines, which is why the sort step is needed in the first command. You can skip sort if the file is already sorted; if you need to keep the original line order, use the awk approach instead.
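If sorted output is acceptable for your use case, sort can also deduplicate in a single step with its -u flag, which is equivalent to piping through uniq:

sort -u file.txt > output_file.txt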
Also, be mindful of resource use on large files. sort spills to temporary files on disk, so it can handle inputs larger than memory, but the awk approach keeps every unique line in memory and can exhaust RAM when the file has many distinct lines. In such cases, prefer the sort-based commands or a more specialized tool.
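As a minimal tuning sketch (assuming GNU sort; the 512M buffer size is an arbitrary example), forcing byte-wise comparison and enlarging the sort buffer often speeds up large runs:

LC_ALL=C sort -u -S 512M file.txt > output_file.txt

Here LC_ALL=C makes sort compare raw bytes instead of applying locale-aware collation, which is usually much faster, and -S sets how much memory sort may use before spilling to disk.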