[Octave-bug-tracker] [bug #53842] Handle m-files with arbitrary characte
From: Andrew Janke
Subject: [Octave-bug-tracker] [bug #53842] Handle m-files with arbitrary character encoding
Date: Sun, 24 Jun 2018 21:57:10 -0400 (EDT)
User-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36
Follow-up Comment #10, bug #53842 (project octave):
> Even if we solved the strlen issue, it doesn't seem to be easy to read
> UTF-16 or UTF-32 files with std::fgets (e.g. stops reading at single byte \n
> which could be part of a valid 2-byte or 4-byte character).
> Does anyone have an idea what we could do?
I think you'd need to sniff the BOM at the beginning of the file when first
opening it, and switch to using `std::wifstream` or `std::fgetws` for UTF-16
and UTF-32 encoded files, converting the strings to UTF-8 as they are pulled
from the input.
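A minimal sketch of the BOM sniffing step, assuming the first few bytes of the file have already been read into a buffer. The names `Encoding` and `sniff_bom` are made up for illustration and are not taken from the Octave sources. A file with no BOM comes back as `Unknown`, and the caller would have to fall back to some default encoding.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names for illustration; not part of Octave.
enum class Encoding { Unknown, Utf8, Utf16LE, Utf16BE, Utf32LE, Utf32BE };

// Inspect the first bytes of a file and report which BOM, if any, is present.
// bom_len receives the number of bytes the BOM occupies (0 when none is found).
Encoding
sniff_bom (const unsigned char *buf, std::size_t n, std::size_t& bom_len)
{
  assert (buf != nullptr || n == 0);
  bom_len = 0;

  // UTF-32 must be tested before UTF-16, because the UTF-32LE BOM
  // (FF FE 00 00) begins with the UTF-16LE BOM (FF FE).
  if (n >= 4 && buf[0] == 0xFF && buf[1] == 0xFE
      && buf[2] == 0x00 && buf[3] == 0x00)
    {
      bom_len = 4;
      return Encoding::Utf32LE;
    }
  if (n >= 4 && buf[0] == 0x00 && buf[1] == 0x00
      && buf[2] == 0xFE && buf[3] == 0xFF)
    {
      bom_len = 4;
      return Encoding::Utf32BE;
    }
  if (n >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
    {
      bom_len = 3;
      return Encoding::Utf8;
    }
  if (n >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
    {
      bom_len = 2;
      return Encoding::Utf16LE;
    }
  if (n >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
    {
      bom_len = 2;
      return Encoding::Utf16BE;
    }

  return Encoding::Unknown;
}
```

One ambiguity remains in this sketch: a UTF-16LE file whose first character after the BOM is U+0000 would be reported as UTF-32LE.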
I think that implies replacing the whole `octave_gets` use with a file input
object that abstracts away the file encoding and internally switches between
`std::fgets` and `std::fgetws`, since `std::FILE *` does not carry character
encoding info (I think).
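A rough sketch of what such a file input object could look like. Everything here is hypothetical: `encoded_line_reader` is an invented name, it wraps a seekable `std::istream` rather than a `std::FILE *` to keep the example short, and it handles only UTF-8 and UTF-16 (UTF-32 is left out). It also decodes the 16-bit code units by hand instead of calling `std::fgetws`, because the width of `wchar_t` differs between platforms.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical class, not part of Octave: returns every line as UTF-8,
// whatever the (BOM-detected) encoding of the underlying byte stream.
class encoded_line_reader
{
public:
  explicit encoded_line_reader (std::istream& in) : m_in (in)
  {
    char head[3] = {0, 0, 0};
    m_in.read (head, 3);
    std::streamsize n = m_in.gcount ();
    auto b = [&] (int i) { return static_cast<unsigned char> (head[i]); };
    std::streamoff skip = 0;
    if (n >= 2 && b (0) == 0xFF && b (1) == 0xFE)
      { m_enc = utf16le; skip = 2; }
    else if (n >= 2 && b (0) == 0xFE && b (1) == 0xFF)
      { m_enc = utf16be; skip = 2; }
    else if (n >= 3 && b (0) == 0xEF && b (1) == 0xBB && b (2) == 0xBF)
      skip = 3;
    m_in.clear ();                     // a short file may have hit EOF
    m_in.seekg (skip, std::ios::beg);  // continue right after the BOM
  }

  // Read one line without its '\n' (a '\r' is kept); false at end of input.
  bool getline (std::string& line)
  {
    line.clear ();
    if (m_enc == utf8)
      return static_cast<bool> (std::getline (m_in, line));
    bool got_any = false;
    char32_t cp;
    while (next_code_point (cp))
      {
        got_any = true;
        if (cp == U'\n')
          break;
        append_utf8 (line, cp);
      }
    return got_any;
  }

private:
  enum encoding { utf8, utf16le, utf16be };

  // Read one 16-bit code unit in the detected byte order.
  bool read_unit (std::uint16_t& u)
  {
    int b0 = m_in.get ();
    int b1 = m_in.get ();
    if (! m_in)
      return false;
    u = static_cast<std::uint16_t> (m_enc == utf16le ? (b0 | (b1 << 8))
                                                     : ((b0 << 8) | b1));
    return true;
  }

  // Decode one code point, combining surrogate pairs. Malformed input
  // becomes U+FFFD (the unit after a bad high surrogate is dropped here).
  bool next_code_point (char32_t& cp)
  {
    std::uint16_t u, lo;
    if (! read_unit (u))
      return false;
    cp = u;
    if (u >= 0xD800 && u <= 0xDBFF)
      {
        if (read_unit (lo) && lo >= 0xDC00 && lo <= 0xDFFF)
          cp = 0x10000 + ((char32_t (u) - 0xD800) << 10) + (lo - 0xDC00);
        else
          cp = 0xFFFD;
      }
    else if (u >= 0xDC00 && u <= 0xDFFF)
      cp = 0xFFFD;
    return true;
  }

  static void append_utf8 (std::string& s, char32_t cp)
  {
    assert (cp <= 0x10FFFF);
    if (cp < 0x80)
      s += static_cast<char> (cp);
    else if (cp < 0x800)
      {
        s += static_cast<char> (0xC0 | (cp >> 6));
        s += static_cast<char> (0x80 | (cp & 0x3F));
      }
    else if (cp < 0x10000)
      {
        s += static_cast<char> (0xE0 | (cp >> 12));
        s += static_cast<char> (0x80 | ((cp >> 6) & 0x3F));
        s += static_cast<char> (0x80 | (cp & 0x3F));
      }
    else
      {
        s += static_cast<char> (0xF0 | (cp >> 18));
        s += static_cast<char> (0x80 | ((cp >> 12) & 0x3F));
        s += static_cast<char> (0x80 | ((cp >> 6) & 0x3F));
        s += static_cast<char> (0x80 | (cp & 0x3F));
      }
  }

  std::istream& m_in;
  encoding m_enc = utf8;
};
```

Because the UTF-16 path compares whole code units against `'\n'`, a character such as U+010A (bytes `0A 01` in UTF-16LE) does not end the line early, which is the failure mode described for `std::fgets` above.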
_______________________________________________________
Reply to this item at:
<http://savannah.gnu.org/bugs/?53842>
_______________________________________________
Message sent via Savannah
https://savannah.gnu.org/