Thursday, May 22, 2008

Delete Windows/DOS carriage return characters from text files

Different operating systems may use different characters to mark a line break. Unix/Linux uses a single Line Feed (LF) character. Windows/DOS uses 2 characters: Carriage Return followed by Line Feed (CR/LF). Classic Mac OS (before OS X) uses a single CR.

Nowadays, it is a reality that we operate on multiple platforms. If you transfer a text file created on a Windows machine to a Linux machine, the file will contain those extra Carriage Return characters. Some Linux programs run just fine with those characters in their input, but some are less forgiving.
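
Before converting anything, it can help to confirm that a file really contains Carriage Return characters. The following is a minimal sketch, not part of the original recipe: the file name sample_crlf.txt and the variable cr are made up for illustration, and only standard tools (printf, grep, od) are used.

```shell
# Create a small sample file with Windows-style (CR/LF) line endings.
# sample_crlf.txt is a hypothetical file name used only for this demo.
printf 'first line\r\nsecond line\r\n' > sample_crlf.txt

# Store a literal CR in a variable (command substitution strips only
# trailing newlines, so the CR survives).
cr=$(printf '\r')

# Count the lines that contain at least one CR.
grep -c "$cr" sample_crlf.txt

# Dump the bytes: od -c displays CR as \r and LF as \n.
od -c sample_crlf.txt
```

If grep reports 0 lines, the file has no CR characters and needs no conversion.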

Below are various ways to remove the Carriage Return characters from each line of a text file:

  • dos2unix
    $ dos2unix input.txt 
    dos2unix: converting file input.txt to UNIX format ...


    dos2unix will convert and overwrite the input file by removing the CR characters.

    Be warned that dos2unix is not pre-installed by default in all Linux distributions. If you have a RedHat-based distribution (e.g., CentOS), you are safe.

    On my Debian Etch system, I had to install a package named tofrodos, and even then, dos2unix is just a soft link to another program, fromdos. See the next command.

  • fromdos
    fromdos and the corresponding todos reside in a package named tofrodos.

    To install,
    $ apt-get install tofrodos  


    To run fromdos,
     $ fromdos input.txt 


    Note that fromdos will overwrite the input.txt file.

  • tr

    $ tr -d '\r' < input.txt > output.txt
    $ cp output.txt input.txt


    \r is the carriage return character.

    tr -d removes the specified character (\r in this case) from the standard input.

    tr deals with the standard input and standard output only. So, tr cannot write directly to the original input file (input.txt): an intermediate file (output.txt) is needed.

  • sed
    $ sed -i -e 's/\r//g' input.txt 


    The advantage of sed over tr is that it can edit the file in place, so there is no need to create an intermediate file. This is done by the -i option. Note that -i and the \r escape are supported by GNU sed, the version found on Linux systems; other sed implementations may behave differently.

    If you want to keep a backup of the original input.txt, attach a suffix to the -i option like this:
    $ sed -i.bak -e 's/\r//g' input.txt 


    -i.bak makes a backup by appending the suffix .bak to the original file name, resulting in input.txt.bak

  • perl
    $ perl -i.bak -pe 's/\r//g' input.txt



If your system has dos2unix or fromdos installed, then using either one is probably the simplest. Otherwise, tr is a safe bet: it is available on all Linux systems, provided you don't mind the extra step of copying the intermediate file back. If you absolutely want a one-liner, then either sed or perl with in-place editing (-i) will do the job.
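
If the extra copy step of the tr approach is the only objection, it can be hidden in a small shell function. This is a sketch under stated assumptions rather than a standard tool: strip_cr is a hypothetical name, demo.txt is a throwaway file, and mktemp is assumed to be available (it is on typical Linux systems, though it is not required by POSIX).

```shell
# strip_cr FILE: remove all CR characters from FILE via a temporary file.
# strip_cr is a made-up helper name for this example.
strip_cr() {
    tmp=$(mktemp) || return 1
    if tr -d '\r' < "$1" > "$tmp"; then
        # Write back with cat so the original file keeps its
        # permissions and ownership.
        cat "$tmp" > "$1"
        status=$?
    else
        status=1
    fi
    rm -f "$tmp"
    return "$status"
}

# Usage: build a CR/LF sample, convert it, and inspect the bytes.
printf 'one\r\ntwo\r\n' > demo.txt
strip_cr demo.txt
od -c demo.txt
```

The original file is overwritten only after tr has finished successfully, so a failed conversion leaves it untouched.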



4 comments:

Loïc said...

just type Ctrl+V then Ctrl+M and it will create a ^M

so in vim or whatever

%s/Ctrl + v Ctrl+m//g => %s/^M//g

delta said...

The -i switch can also be used with sed:
$ sed -e 's/\r//g' input.txt -i.bak

Peter Leung said...

Thanks, Delta. You're right. I have modified the article to reflect the truth you pointed out.

cetinz said...

thanks,

fromdos works.

But is there any way to solve this in Windows?