[Octave-bug-tracker] [bug #53842] Handle m-files with arbitrary characte

From:	Andrew Janke
Subject:	[Octave-bug-tracker] [bug #53842] Handle m-files with arbitrary character encoding
Date:	Sun, 24 Jun 2018 21:57:10 -0400 (EDT)
User-agent:	Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/67.0.3396.87 Safari/537.36

Follow-up Comment #10, bug #53842 (project octave):

> Even if we solved the strlen issue, it doesn't seem to be easy to read
> UTF-16 or UTF-32 files with std::fgets (e.g. stops reading at single byte \n
> which could be part of a valid 2-byte or 4-byte character).
> Does anyone have an idea what we could do?

I think you'd need to sniff the BOM at the beginning of the file when first
opening it, and switch to using `std::wifstream` or `std::fgetws` for UTF-16
and UTF-32 encoded files, converting the strings to UTF-8 as they are pulled
from the input.
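
A rough sketch of the BOM check, assuming the file was just opened in binary mode. The names `bom_encoding`, `bom_result`, `detect_bom`, and `sniff_bom` are made up for illustration and are not part of Octave. UTF-32 LE is tested before UTF-16 LE because both start with FF FE, which means a UTF-16 LE file whose first character is U+0000 would be misdetected; files without a BOM are reported as `none` and would need some other heuristic or default.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical names for illustration; not Octave's API.
enum class bom_encoding { none, utf8, utf16_le, utf16_be, utf32_le, utf32_be };

struct bom_result
{
  bom_encoding encoding;
  std::size_t length;  // number of BOM bytes to skip
};

// Inspect up to the first four bytes of a buffer for a byte order mark.
inline bom_result
detect_bom (const unsigned char *buf, std::size_t n)
{
  // UTF-32 must be checked before UTF-16: FF FE 00 00 also starts with FF FE.
  if (n >= 4 && buf[0] == 0xFF && buf[1] == 0xFE
      && buf[2] == 0x00 && buf[3] == 0x00)
    return { bom_encoding::utf32_le, 4 };
  if (n >= 4 && buf[0] == 0x00 && buf[1] == 0x00
      && buf[2] == 0xFE && buf[3] == 0xFF)
    return { bom_encoding::utf32_be, 4 };
  if (n >= 3 && buf[0] == 0xEF && buf[1] == 0xBB && buf[2] == 0xBF)
    return { bom_encoding::utf8, 3 };
  if (n >= 2 && buf[0] == 0xFF && buf[1] == 0xFE)
    return { bom_encoding::utf16_le, 2 };
  if (n >= 2 && buf[0] == 0xFE && buf[1] == 0xFF)
    return { bom_encoding::utf16_be, 2 };
  return { bom_encoding::none, 0 };
}

// Read the start of a freshly opened binary stream, classify it, and
// leave the stream positioned just past the BOM (or at offset 0).
inline bom_result
sniff_bom (std::FILE *f)
{
  unsigned char buf[4];
  std::size_t n = std::fread (buf, 1, sizeof buf, f);
  bom_result r = detect_bom (buf, n);
  std::fseek (f, static_cast<long> (r.length), SEEK_SET);
  return r;
}
```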

I think that implies replacing the whole `octave_gets` use with a file input
object that abstracts away the file encoding and internally switches between
`std::fgets` and `std::fgetws`, since `std::FILE *` does not carry character
encoding info (I think).
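
To make the idea concrete, here is a minimal sketch of such a file input object. It is only an illustration: `encoded_line_reader` and `text_encoding` are hypothetical names, not Octave code, and instead of `std::fgetws` (whose behavior depends on the C locale and on the width of `wchar_t`) it decodes UTF-16/UTF-32 code units by hand. It assumes the stream is opened in binary mode and already positioned past any BOM, and it replaces malformed input with U+FFFD.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical sketch; names are illustrative, not Octave's API.
enum class text_encoding { utf8, utf16_le, utf16_be, utf32_le, utf32_be };

class encoded_line_reader
{
public:
  encoded_line_reader (std::FILE *f, text_encoding enc)
    : m_file (f), m_enc (enc) { }

  // Read one line (including the trailing '\n', if present) as UTF-8.
  // Returns false once nothing more can be read.
  bool getline (std::string& out)
  {
    out.clear ();
    if (m_enc == text_encoding::utf8)
      {
        // Bytes pass through: 0x0A never occurs inside a multi-byte sequence.
        int c;
        while ((c = std::fgetc (m_file)) != EOF)
          {
            out.push_back (static_cast<char> (c));
            if (c == '\n')
              break;
          }
      }
    else
      {
        // Compare whole code points, so a 0x0A byte inside a wider
        // code unit does not end the line.
        std::uint32_t cp;
        while (read_code_point (cp))
          {
            append_utf8 (out, cp);
            if (cp == 0x0A)
              break;
          }
      }
    return ! out.empty ();
  }

private:
  // Read one code unit of `width` bytes in the given byte order.
  bool read_unit (int width, bool le, std::uint32_t& unit)
  {
    unsigned char b[4];
    std::size_t w = static_cast<std::size_t> (width);
    if (std::fread (b, 1, w, m_file) != w)
      return false;
    unit = 0;
    for (int i = 0; i < width; i++)
      unit = (unit << 8) | b[le ? width - 1 - i : i];
    return true;
  }

  bool read_code_point (std::uint32_t& cp)
  {
    const bool le = (m_enc == text_encoding::utf16_le
                     || m_enc == text_encoding::utf32_le);
    const bool is16 = (m_enc == text_encoding::utf16_le
                       || m_enc == text_encoding::utf16_be);
    if (! read_unit (is16 ? 2 : 4, le, cp))
      return false;
    if (is16 && cp >= 0xD800 && cp <= 0xDBFF)
      {
        // High surrogate: combine with the following low surrogate.
        // Simplification: a non-matching following unit is consumed.
        std::uint32_t lo;
        if (read_unit (2, le, lo) && lo >= 0xDC00 && lo <= 0xDFFF)
          cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
        else
          cp = 0xFFFD;
      }
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
      cp = 0xFFFD;  // out of range or lone surrogate
    return true;
  }

  static void append_utf8 (std::string& s, std::uint32_t cp)
  {
    if (cp < 0x80)
      s.push_back (static_cast<char> (cp));
    else
      {
        // Number of continuation bytes, then lead byte, then the rest.
        int n = cp < 0x800 ? 1 : cp < 0x10000 ? 2 : 3;
        static const unsigned char lead[] = { 0x00, 0xC0, 0xE0, 0xF0 };
        s.push_back (static_cast<char> (lead[n] | (cp >> (6 * n))));
        for (int i = n - 1; i >= 0; i--)
          s.push_back (static_cast<char> (0x80 | ((cp >> (6 * i)) & 0x3F)));
      }
  }

  std::FILE *m_file;
  text_encoding m_enc;
};
```

The caller only ever sees UTF-8 lines, so whatever consumes the output of `octave_gets` today would not need to know which encoding the file on disk used.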

    _______________________________________________________

Reply to this item at:

  <http://savannah.gnu.org/bugs/?53842>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/
