Re: opening files with unicode characters in the file name on windows

From: Mathias Dahl
Subject: Re: opening files with unicode characters in the file name on windows
Date: 04 Aug 2004 16:27:05 +0200
User-agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.3.50

"Eli Zaretskii" writes:

> Your original message said ``file names with Unicode characters''.
> Can you tell what characters are those, and why do you think they
> are encoded in some Unicode-related encoding, like UTF-16?  Can you
> look at the file's name as recorded in the directory with some
> low-level tool that actually shows the byte values that encode the
> file's name?

I have done some investigation, and I am fairly sure UTF-16 is the
encoding used. The following VBScript program (sorry for pasting
non-Emacs-related stuff here) loops through all files in a folder and,
for every file name containing a character whose value is above 255,
displays a list of its characters and their Unicode code points:

' -- TestUnicodeFileNames.vbs ---

Option Explicit

' --------- Main program starts

Dim sFileName
Dim oFSO
Dim oFile

Set oFSO = CreateObject("Scripting.FileSystemObject")

' Check the name of every file in the folder.
For Each oFile In oFSO.GetFolder("c:\document\my docs").Files
  sFileName = oFile.Name
  checkUnicodeFileName sFileName
Next

Set oFSO = Nothing

' --------- Main program ends

' Shows a message box if fileName contains a character above U+00FF.
Private Sub checkUnicodeFileName(fileName)

  Dim i
  Dim c
  Dim n

  For i = 1 to Len(fileName)

    c = Mid(fileName, i, 1)
    n = AscW(c)
    ' AscW returns a signed 16-bit value, so code units from &H8000 up
    ' come back negative; map them into the 0..65535 range.
    If n < 0 Then n = n + 65536

    If n > 255 Then
      MsgBox "File name contains unicode characters: " & _
             Chr(10) & Chr(10) & _
             "File name: " & fileName & _
             Chr(10) & Chr(10) & _
             "Characters and their unicode code points:" & _
             Chr(10) & Chr(10) & _
             getStringInfo(fileName)
      Exit Sub
    End If

  Next

End Sub

' Returns a tab-separated table of each character and its code in hex.
Private Function getStringInfo(s)
  Dim i
  Dim n
  Dim c
  Dim h
  Dim result

  result = "Char" & Chr(9) & "U+NNNN" & Chr(10) & Chr(10)

  For i = 1 to Len(s)
    c = Mid(s, i, 1)
    n = AscW(c)
    h = Hex(n)
    result = result & c & Chr(9) & Right("0000" & h, 4) & Chr(10)
  Next

  getStringInfo = result

End Function

' -- TestUnicodeFileNames.vbs ends here ---
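For readers without Windows Script Host, roughly the same check can be sketched in Python 3. This is a hypothetical port, not the script above: string_info, has_wide_chars and report_folder are names made up for illustration, and the demo only formats the example name instead of scanning a folder.

```python
import os


def string_info(s: str) -> str:
    """Build a 'Char<TAB>U+NNNN' table like getStringInfo in the script."""
    lines = ["Char\tU+NNNN", ""]
    for ch in s:
        # ord() gives the code point; pad the hex value to at least 4 digits.
        lines.append(f"{ch}\t{ord(ch):04X}")
    return "\n".join(lines)


def has_wide_chars(name: str) -> bool:
    """True if any character lies outside Latin-1 (code point above 255)."""
    return any(ord(ch) > 255 for ch in name)


def report_folder(folder: str) -> None:
    """Print a report for every regular file whose name has such a character."""
    # With a str argument, Python 3 on Windows uses the wide-character
    # (UTF-16) directory APIs, so the names arrive intact.
    for name in os.listdir(folder):
        path = os.path.join(folder, name)
        if os.path.isfile(path) and has_wide_chars(name):
            print("File name contains unicode characters:\n")
            print(f"File name: {name}\n")
            print("Characters and their unicode code points:\n")
            print(string_info(name))


if __name__ == "__main__":
    # Demo on the example name only; no folder is scanned here.
    print(string_info("pravda_правда.txt"))
```

One design difference: Python iterates over code points, while VBScript's Mid and AscW work on UTF-16 code units, so a character above U+FFFF would appear once here but as two surrogate halves in the message box. On Windows, report_folder(r"c:\document\my docs") would scan the same folder as the script.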

The script's output looks like this (you may not see the actual
characters here; I see them if I use a "Unicode font" for message boxes):

File name contains unicode characters: 

File name: pravda_правда.txt

Characters and their unicode code points:

Char    U+NNNN

p       0070
r       0072
a       0061
v       0076
d       0064
a       0061
_       005F
п       043F
р       0440
а       0430
в       0432
д       0434
а       0430
.       002E
t       0074
x       0078
t       0074
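Eli asked for the byte values actually recorded in the directory. Assuming the volume is NTFS, which stores each file name as a sequence of 16-bit UTF-16 code units in little-endian order, the bytes for this name can be sketched in Python as below. utf16le_rows is a hypothetical helper; it does not read the disk, it only shows what those code units look like: p comes out as 70 00, while п comes out as 3F 04.

```python
from typing import List


def utf16le_rows(name: str) -> List[str]:
    """One 'char<TAB>U+NNNN<TAB>bytes' row per character of name."""
    rows = []
    for ch in name:
        # Little-endian UTF-16: low byte first. BMP characters take 2 bytes;
        # characters above U+FFFF take 4 (a surrogate pair).
        data = ch.encode("utf-16-le")
        rows.append(f"{ch}\tU+{ord(ch):04X}\t{data.hex(' ').upper()}")
    return rows


if __name__ == "__main__":
    name = "pravda_правда.txt"
    for row in utf16le_rows(name):
        print(row)
    # The whole name as a flat UTF-16LE byte sequence.
    print(name.encode("utf-16-le").hex(" ").upper())
```

A tool that dumps the raw directory entry should show the same byte pairs if the name is indeed stored as UTF-16.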

