How do I remove duplicate lines from a file?

Sort the file so that identical lines end up next to each other, then pipe the output to uniq, which collapses each run of adjacent duplicates into a single line. For example: sort filename.txt | uniq.
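As a minimal sketch of that pipeline, assuming a small made-up file (the name fruits.txt and its contents are hypothetical), the commands below show the effect. The standard `sort -u` option gives the same result in one step when no other uniq option is needed.

```shell
# Work in a throwaway directory; fruits.txt is a hypothetical sample file.
workdir=$(mktemp -d)
printf '%s\n' pear apple pear banana apple > "$workdir/fruits.txt"

# Sort so that identical lines become adjacent, then drop the repeats.
deduped=$(LC_ALL=C sort "$workdir/fruits.txt" | uniq)
printf '%s\n' "$deduped"

# sort -u sorts and removes duplicates in a single command.
deduped_short=$(LC_ALL=C sort -u "$workdir/fruits.txt")
```

Setting `LC_ALL=C` is only there to make the sort order predictable across systems; it is not required for deduplication.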

Can uniq count duplicate lines?

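Yes, use 'uniq -c filename' to prefix each line with the number of times it occurs. Because uniq counts only adjacent repeats, sort the input first to get one total per distinct line. For example: sort words.txt | uniq -c will show each distinct word and its count.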
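A common follow-up is ranking lines by frequency. The sketch below assumes a made-up word list and pipes the counts through `sort -rn`. The `awk` step is only there to strip the padding that uniq puts in front of each count, which differs between implementations.

```shell
# Hypothetical sample data: one word per line, unsorted.
words=$(printf '%s\n' cat dog cat bird cat dog)

# Sort, count each distinct word, then rank by count (highest first).
ranked=$(printf '%s\n' "$words" | LC_ALL=C sort | uniq -c | sort -rn | awk '{print $1, $2}')
printf '%s\n' "$ranked"
```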

uniq Command

What does the uniq command do in Linux?

The uniq command filters out or reports repeated lines in a file. It is most effective when used on a sorted file, as it only detects adjacent duplicate lines.

For this reason, uniq is often used in conjunction with the sort command.
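To make the adjacency rule concrete, here is a small sketch on made-up input in which one value repeats on non-neighbouring lines. The variable names are illustrative only.

```shell
# Hypothetical input: "a" appears three times, but only the last two are adjacent.
input=$(printf '%s\n' a b a a)

# Unsorted: only the adjacent pair at the end is collapsed, so "a" still appears twice.
unsorted=$(printf '%s\n' "$input" | uniq)

# Sorted first: every copy of "a" is adjacent, so exactly one remains.
sorted=$(printf '%s\n' "$input" | LC_ALL=C sort | uniq)

printf 'unsorted:\n%s\nsorted:\n%s\n' "$unsorted" "$sorted"
```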

Syntax

uniq [OPTION]... [INPUT [OUTPUT]]

Description

uniq reads from the specified INPUT (or standard input if none is given) and writes to the specified OUTPUT (or standard output if none is given). It compares adjacent lines and, based on the options, either removes duplicates, reports only duplicates, or reports only unique lines.
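The INPUT and OUTPUT operands are easy to misread, so a short sketch may help. It assumes hypothetical file names in a temporary directory. Note that the second operand is written to, not read from.

```shell
# Work in a throwaway directory; the file names are hypothetical.
workdir=$(mktemp -d)
printf '%s\n' alpha alpha beta gamma gamma > "$workdir/sorted.txt"

# The second operand is the OUTPUT file, not another input; it is overwritten.
uniq "$workdir/sorted.txt" "$workdir/deduped.txt"

# With no operands, uniq reads standard input and writes standard output.
from_stdin=$(uniq < "$workdir/sorted.txt")

cat "$workdir/deduped.txt"
```

Passing two data files to uniq in the hope of deduplicating both would silently replace the second one, which is why redirection or a pipe is often the safer habit.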

Common uses include:

Removing duplicate lines from a sorted file
Counting the occurrences of each unique line
Displaying only lines that are not repeated
Displaying only duplicate lines

Common Options

Option	Description
`-c`, `--count`	Prefix lines by the number of occurrences
`-d`, `--repeated`	Only print duplicate lines, one for each group of identical lines
`-u`, `--unique`	Only print lines that are not repeated
`-i`, `--ignore-case`	Ignore differences in case when comparing
`-s N`, `--skip-chars=N`	Skip the first N characters of each line when comparing
`-w N`, `--check-chars=N`	Compare no more than N characters in lines
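The `-s` and `-w` options are easier to follow with data. The sketch below uses made-up log and menu lines. It assumes GNU uniq, since `-w` is a GNU extension rather than part of POSIX.

```shell
# Hypothetical log lines: a 5-character timestamp followed by a message.
log=$(printf '%s\n' '09:00 disk full' '09:05 disk full' '09:10 cpu high')

# -s 5 skips the timestamp, so the two "disk full" lines compare equal.
by_message=$(printf '%s\n' "$log" | uniq -s 5)

# -w 5 compares only the first 5 characters of each line.
menu=$(printf '%s\n' 'apple pie' 'apple tart' 'banana split')
by_prefix=$(printf '%s\n' "$menu" | uniq -w 5)

printf '%s\n' "$by_message" "$by_prefix"
```

In both cases uniq prints the first line of each group in full; the options change only what is compared, not what is printed.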

Examples

Remove duplicate lines (requires sorted input)

sort words.txt | uniq

Sorts 'words.txt' and then removes adjacent duplicate lines, printing one copy of each distinct line.

Count occurrences of each unique line

sort words.txt | uniq -c

Sorts 'words.txt' and then counts the occurrences of each distinct line, displaying the count before each line.

Show only unique lines (non-repeated)

sort words.txt | uniq -u

Sorts 'words.txt' and then displays only the lines that are not repeated (appear only once).

Show only duplicate lines

sort words.txt | uniq -d

Sorts 'words.txt' and then displays one copy of each line that appears more than once.
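The difference between -u and -d is clearest when both run on the same input. This is a small sketch on made-up data; the variable names are illustrative.

```shell
# Hypothetical sample: "red" appears three times, "blue" twice, "green" once.
sample=$(printf '%s\n' red blue red green blue red)

# -u keeps only lines that occur exactly once.
only_once=$(printf '%s\n' "$sample" | LC_ALL=C sort | uniq -u)

# -d keeps one copy of each line that occurs more than once.
repeated=$(printf '%s\n' "$sample" | LC_ALL=C sort | uniq -d)

printf 'once: %s\n' "$only_once"
printf 'repeated:\n%s\n' "$repeated"
```

Taken together, the two outputs contain every line that plain uniq would print, split into the non-repeated and the repeated ones.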

Ignore case when checking for duplicates

sort -f words.txt | uniq -i

Sorts 'words.txt' ignoring case, then treats lines that differ only in case as duplicates and prints one line from each group.
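As a rough illustration on made-up data, the sketch below compares a case-sensitive and a case-insensitive run. Which spelling survives in each group depends on how sort orders the tied lines, so the example only checks how many lines remain.

```shell
# Hypothetical sample with the same words in different capitalisation.
names=$(printf '%s\n' Apple banana APPLE apple Banana)

# Case-sensitive: all five spellings are distinct, so nothing is removed.
strict_count=$(printf '%s\n' "$names" | LC_ALL=C sort | uniq | wc -l)

# Case-insensitive: sort -f groups the spellings, uniq -i collapses each group.
folded=$(printf '%s\n' "$names" | LC_ALL=C sort -f | uniq -i)
folded_count=$(printf '%s\n' "$folded" | wc -l)

printf 'case-sensitive: %d lines, case-insensitive: %d lines\n' "$strict_count" "$folded_count"
```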