coreutils

Re: Is there a way to print unicode characters and the actual code?


From: Peng Yu
Subject: Re: Is there a way to print unicode characters and the actual code?
Date: Sat, 24 Feb 2018 20:12:01 -0600

> $ od -An -tx1 -ta -tc <<< 'exámple'
>   65  78  c3  a1  6d  70  6c  65  0a
>    e   x   C   !   m   p   l   e  nl
>    e   x 303 241   m   p   l   e  \n

For now, I have written some Python code to do this. For each character
it prints the decoded code point and the UTF-8 encoding, each in hex and
in binary, as TSV.

The expression `c if ord(c)>31 else repr(str(c)).strip("'")` is hacky. I
am not sure whether there is a good way to get escapes such as \f and \b
the way `od` does.
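
One possible way to get `od -c` style escapes is an explicit lookup table
of the C escapes, with an octal fallback for the remaining control
characters. This is only a sketch in Python 3: the names C_ESCAPES and
show_char are made up for illustration, and the three-digit octal fallback
is an assumption modeled on what `od -c` prints, not taken from its source.

```python
# Sketch (Python 3): print control characters roughly the way `od -c` does.
# C_ESCAPES and show_char are hypothetical names, not part of any library.
C_ESCAPES = {
    '\0': '\\0',
    '\a': '\\a',
    '\b': '\\b',
    '\t': '\\t',
    '\n': '\\n',
    '\v': '\\v',
    '\f': '\\f',
    '\r': '\\r',
}

def show_char(c):
    """Return a printable representation of the single character c."""
    if c in C_ESCAPES:
        return C_ESCAPES[c]
    if ord(c) < 32 or ord(c) == 127:
        # Assumed fallback for other control characters: three-digit octal.
        return '%03o' % ord(c)
    return c

for c in '\f\b\x01a':
    print(show_char(c))
```

A table keeps the mapping explicit, at the cost of listing every escape by
hand; anything not in the table and not a control character is passed
through unchanged.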

$ cat dumpunicode0.py
#!/usr/bin/env python
# vim: set noexpandtab tabstop=2 shiftwidth=2 softtabstop=-1 fileencoding=utf-8:

import sys

for line in sys.stdin:
    # Python 2: lines are byte strings, so decode them to unicode first.
    for c in line.decode('utf-8'):
        # UTF-8 bytes joined in reversed order, so c3 a1 is shown as 0xa1c3.
        utf8_encode = '0x' + ''.join(
                ['%x' % ord(x) for x in reversed(c.encode('utf-8'))]
                )
        print '\t'.join(
                (
                    c if ord(c)>31 else repr(str(c)).strip("'")
                    , '0x%x' % ord(c)
                    , bin(ord(c))
                    , utf8_encode
                    , bin(int(utf8_encode, base=16))
                    )
                )
$ ./dumpunicode0.py <<< á
á    0xe1    0b11100001    0xa1c3    0b1010000111000011
\n    0xa    0b1010    0xa    0b1010
$ printf '\f'| od -xc
0000000    000c
         \f
0000001
$ printf '\f'| ./dumpunicode0.py
\x0c    0xc    0b1100    0xc    0b1100
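
For comparison, here is a rough Python 3 sketch of the same per-character
dump. It is not the script above: dump_char is a hypothetical helper name,
and, unlike dumpunicode0.py, it keeps the UTF-8 bytes in stream order
(c3 a1 rather than a1 c3). To keep the example self-contained it works on a
string literal instead of reading stdin.

```python
# Sketch (Python 3): one TSV row per character.
# dump_char is a hypothetical name used only for this illustration.

def dump_char(c):
    """Return TSV fields: char, code point (hex, bin), UTF-8 bytes (hex, bin)."""
    encoded = c.encode('utf-8')
    # Non-printable characters are shown via repr(), e.g. \n or \x0c.
    shown = c if c.isprintable() else repr(c).strip("'")
    return '\t'.join((
        shown,
        '0x%x' % ord(c),
        bin(ord(c)),
        '0x' + encoded.hex(),                 # bytes in stream order
        bin(int.from_bytes(encoded, 'big')),  # same bytes as one integer
    ))

# Demo on a literal; iterating over the lines of sys.stdin would work the same way.
for c in '\u00e1\n':
    print(dump_char(c))
```

In Python 3 the text read from stdin is already decoded, so the explicit
line.decode('utf-8') step is not needed there, assuming the locale encoding
is UTF-8.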

-- 
Regards,
Peng


