svn – Valid UTF-8 data followed by invalid UTF-8 sequence
This happened to me some time ago while working with a Subversion repository that contained some files created on Windows with a strange encoding. Running a simple svn status gave me this error message:
$ svn status
svn: Valid UTF-8 data (hex: 4b) followed by invalid UTF-8 sequence (hex: fc 63 68 65)
I searched for a solution, but the best tip I could find was to remove the files causing the error. That was not an option for me: I needed those files.
Discover the encoding of the filenames
So if you want to keep your files and use them in Subversion, you first need to find the directory in which they are stored. I just cd’d into each directory in turn, checking whether the error message still appeared (it may not be stylish, but it worked for me). If the error message reports a valid UTF-8 sequence longer than 4b, you can try with:
echo "\x4b\x6f\x72\x5f" | xargs -0 printf
and it will give you the starting part of the corrupted filename (Kor_ in this example).
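Walking into every directory by hand gets tedious in a large working copy. As an alternative, here is a minimal sketch that lists every path whose name is not valid UTF-8. It assumes find and iconv are installed; find_non_utf8_names is a hypothetical helper name of mine, not something svn provides.

```shell
# List every path under a directory (default: .) whose name is not valid UTF-8.
# find_non_utf8_names is a hypothetical helper, not part of svn.
find_non_utf8_names() {
    # -exec ... {} + passes the paths as arguments, so odd bytes and spaces survive.
    find "${1:-.}" -exec sh -c '
        for path do
            # iconv exits with an error when its input is not valid UTF-8;
            # in that case print the offending path.
            printf "%s" "$path" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1 ||
                printf "%s\n" "$path"
        done
    ' sh {} +
}
```

Calling find_non_utf8_names from the top of the working copy should point straight at the directories worth inspecting. The names are printed as raw bytes, so your terminal will probably show the broken characters as question marks.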
Once you have spotted the directory containing the files causing the error, you need to discover the encoding of the filenames. A simple ls should be enough to spot the strange files; to discover the encoding of their names you can use the file command like this (if you have only one such file you can try replacing the * with the filename, but I don’t know if it works, since in my case all the files in that directory had the wrong encoding):
$ ls * | file -
/dev/stdin: ISO-8859 text
Under normal circumstances (Debian/Ubuntu) the filenames should be UTF-8, as this simple test shows:
$ echo "test" > /tmp/cioé.txt
$ ls /tmp/cioé.txt | file -
/dev/stdin: UTF-8 Unicode text
In my situation the filenames turned out to be latin1 while the content of the files was UTF-8. To be sure of the latter I used the isutf8 command, which gives no output if everything is fine.
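One way to sanity-check a guessed filename encoding is to decode the bytes quoted in the svn error message under that encoding and see whether a sensible word comes out. The sketch below assumes iconv is installed; decode_hex_as is a hypothetical helper name, and the bytes are the ones from the error at the top of this post.

```shell
# Decode a list of hex bytes under a guessed encoding and print them as UTF-8.
# decode_hex_as is a hypothetical helper: decode_hex_as ENCODING HEX...
# The body runs in a subshell ( ... ) so its variables do not leak.
decode_hex_as() (
    encoding=$1
    shift
    for byte in "$@"; do
        # Turn each hex byte into an octal escape, which any POSIX printf understands.
        printf "\\$(printf '%03o' "0x$byte")"
    done | iconv -f "$encoding" -t UTF-8
    echo
)

decode_hex_as ISO-8859-1 4b fc 63 68 65
# prints: Küche
```

Read as latin1 (ISO-8859-1), the byte fc is ü, so the name that svn choked on starts with Küche. That fits a latin1 filename; a wrong guess would typically give a garbled or implausible word.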
Convert the filenames to utf-8 encoding
At this point you need to convert the filenames to UTF-8. This is easily achieved with convmv (its man page says it “converts filenames from one encoding to another”, which is exactly what we need). The usage is simple:
convmv -f latin1 -t utf-8 *
and this is an example output:
Your Perl version has fleas #37757 #49830
Starting a dry run without changes...
mv "./Cefal�txt" "./Cefalù.txt"
mv "./Badem�el-Badschr�ke.txt" "./Bademöbel-Badschränke.txt"
mv "./Blutzuckermessger�e.txt" "./Blutzuckermessgeräte.txt"
Beware that by default this is only a dry run: you have to add the --notest flag to actually change the filenames.
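If convmv is not available, roughly the same rename can be sketched with find and iconv. This is not convmv and not what I actually used; it is a fallback that assumes the broken names are latin1 (ISO-8859-1). The function name rename_latin1_names and its --apply switch are hypothetical, chosen to mirror convmv’s dry-run-first behaviour.

```shell
# Rename latin1 filenames under a directory to UTF-8 (a fallback sketch, not convmv).
# Usage: rename_latin1_names DIR           -> dry run, only prints the planned moves
#        rename_latin1_names DIR --apply   -> actually renames
rename_latin1_names() {
    # -depth handles a directory's contents before the directory itself,
    # so renaming a directory never invalidates paths still to be processed.
    find "${1:-.}" -mindepth 1 -depth -exec sh -c '
        mode=$1
        shift
        for path do
            base=${path##*/}
            parent=${path%/*}
            # Names that are already valid UTF-8 are left alone.
            if printf "%s" "$base" | iconv -f UTF-8 -t UTF-8 >/dev/null 2>&1; then
                continue
            fi
            new=$(printf "%s" "$base" | iconv -f ISO-8859-1 -t UTF-8) || continue
            # Never overwrite an existing file.
            if [ -e "$parent/$new" ]; then
                continue
            fi
            if [ "$mode" = "--apply" ]; then
                mv -- "$path" "$parent/$new"
            else
                printf "mv \"%s\" \"%s\"\n" "$path" "$parent/$new"
            fi
        done
    ' sh "${2:-dry-run}" {} +
}
```

As with convmv, it is worth reading the dry-run output before passing --apply, because a wrong guess at the source encoding produces names that are valid UTF-8 but still wrong.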
Hope this helps.