octave-bug-tracker
From: Andrew Janke
Subject: [Octave-bug-tracker] [bug #53842] Handle m-files with arbitrary character encoding
Date: Sun, 24 Jun 2018 21:57:10 -0400 (EDT)

Follow-up Comment #10, bug #53842 (project octave):

> Even if we solved the strlen issue, it doesn't seem to be easy to read
> UTF-16 or UTF-32 files with std::fgets (e.g. stops reading at single byte \n
> which could be part of a valid 2-byte or 4-byte character).
> Does anyone have an idea what we could do?

I think you'd need to sniff the BOM at the beginning of the file when first
opening it, and switch to using `std::wifstream` or `std::fgetws` for UTF-16
and UTF-32 encoded files, converting the strings to UTF-8 as they are pulled
from the input.
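
A minimal sketch of the BOM sniffing step, assuming the first few bytes of the
file have already been read into a buffer. The names `bom_encoding`,
`bom_result` and `detect_bom` are hypothetical and not part of Octave; the byte
patterns are the standard Unicode byte order marks.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical names for illustration only; nothing here is Octave API.
enum class bom_encoding { none, utf8, utf16_le, utf16_be, utf32_le, utf32_be };

struct bom_result
{
  bom_encoding encoding;  // encoding implied by the BOM, or none
  std::size_t length;     // number of BOM bytes to skip before decoding
};

inline bom_result
detect_bom (const unsigned char *data, std::size_t len)
{
  assert (data != nullptr || len == 0);

  // Test the 4-byte marks first: the UTF-32 LE BOM (FF FE 00 00) begins
  // with the UTF-16 LE BOM (FF FE).  A UTF-16 LE file whose first character
  // is U+0000 would be misdetected here; this sketch accepts that ambiguity.
  if (len >= 4 && data[0] == 0xFF && data[1] == 0xFE
      && data[2] == 0x00 && data[3] == 0x00)
    return { bom_encoding::utf32_le, 4 };

  if (len >= 4 && data[0] == 0x00 && data[1] == 0x00
      && data[2] == 0xFE && data[3] == 0xFF)
    return { bom_encoding::utf32_be, 4 };

  if (len >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
    return { bom_encoding::utf8, 3 };

  if (len >= 2 && data[0] == 0xFF && data[1] == 0xFE)
    return { bom_encoding::utf16_le, 2 };

  if (len >= 2 && data[0] == 0xFE && data[1] == 0xFF)
    return { bom_encoding::utf16_be, 2 };

  // No BOM: the caller has to fall back to some default encoding.
  return { bom_encoding::none, 0 };
}
```

A caller would read up to four bytes after opening the file, pass them to
`detect_bom`, and then seek to `length` before handing the stream to whichever
decoder matches `encoding`.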

I think that implies replacing the whole `octave_gets` use with a file input
object that abstracts away the file encoding and internally switches between
`std::fgets` and `std::fgetws`, since, as far as I know, a `std::FILE *` does
not carry character encoding info.
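
To illustrate what the UTF-16 branch of such an object might do, here is a
rough sketch that reads one line from a UTF-16 byte stream and returns it as
UTF-8. Note that it swaps in a different technique from the one suggested
above: it decodes 2-byte code units by hand rather than calling `std::fgetws`,
because wide-character input depends on the active C locale. It also takes a
`std::istream` instead of a `std::FILE *` to keep the example short. All
function names are hypothetical, and the BOM is assumed to have been skipped
already.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Append code point CP to OUT as UTF-8.
inline void
append_utf8 (std::string& out, std::uint32_t cp)
{
  assert (cp <= 0x10FFFF);

  if (cp < 0x80)
    out += static_cast<char> (cp);
  else if (cp < 0x800)
    {
      out += static_cast<char> (0xC0 | (cp >> 6));
      out += static_cast<char> (0x80 | (cp & 0x3F));
    }
  else if (cp < 0x10000)
    {
      out += static_cast<char> (0xE0 | (cp >> 12));
      out += static_cast<char> (0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char> (0x80 | (cp & 0x3F));
    }
  else
    {
      out += static_cast<char> (0xF0 | (cp >> 18));
      out += static_cast<char> (0x80 | ((cp >> 12) & 0x3F));
      out += static_cast<char> (0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char> (0x80 | (cp & 0x3F));
    }
}

// Read one 2-byte code unit.  Returns false at EOF or on a truncated unit.
inline bool
read_unit (std::istream& in, bool big_endian, std::uint16_t& unit)
{
  char b[2];
  if (! in.read (b, 2))
    return false;

  const unsigned b0 = static_cast<unsigned char> (b[0]);
  const unsigned b1 = static_cast<unsigned char> (b[1]);
  unit = static_cast<std::uint16_t> (big_endian ? (b0 << 8) | b1
                                                : (b1 << 8) | b0);
  return true;
}

// Read one line of UTF-16 from IN and store it in LINE as UTF-8, keeping the
// trailing "\n" if there is one (as std::fgets does).  Returns false if
// nothing could be read.  The line ends only at a whole U+000A code unit,
// never at a stray 0x0A byte inside another character.
inline bool
getline_utf16_as_utf8 (std::istream& in, bool big_endian, std::string& line)
{
  line.clear ();
  bool got_any = false;
  std::uint16_t u;

  while (read_unit (in, big_endian, u))
    {
      got_any = true;
      std::uint32_t cp = u;

      if (u >= 0xD800 && u <= 0xDBFF)
        {
          // High surrogate: a low surrogate must follow.  If it does not,
          // this sketch emits U+FFFD and drops the unit it just consumed.
          std::uint16_t lo;
          if (read_unit (in, big_endian, lo) && lo >= 0xDC00 && lo <= 0xDFFF)
            cp = 0x10000 + ((static_cast<std::uint32_t> (u) - 0xD800) << 10)
                 + (static_cast<std::uint32_t> (lo) - 0xDC00);
          else
            cp = 0xFFFD;
        }
      else if (u >= 0xDC00 && u <= 0xDFFF)
        cp = 0xFFFD;  // unpaired low surrogate

      append_utf8 (line, cp);
      if (cp == '\n')
        break;
    }

  return got_any;
}
```

For example, the little-endian bytes `61 00 0A 01 0A 00` ("a", U+010A,
newline) come back as the single UTF-8 line `"a\xC4\x8A\n"`: the `0A` byte
inside U+010A does not end the line, which is exactly the case that trips up a
byte-oriented `std::fgets`.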

    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?53842>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/