From: |
Nicholas Jankowski |
Subject: |
[Octave-bug-tracker] [bug #57596] Should the "len" argument of "fgetl" and "fgets" mean bytes or characters? |
Date: |
Tue, 18 Feb 2020 17:18:43 -0500 (EST) |
Follow-up Comment #3, bug #57596 (project octave):
TL;DR: the LEN argument in Matlab counts characters, not bytes, even for
multibyte characters. Octave should probably emulate that for compatibility
reasons.
From a compatibility standpoint: the Matlab help gives the syntax as
fgets(FID, NCHAR) and specifically uses the word "character" to describe that
input parameter. The help also says it reads characters using the encoding
scheme associated with the file, as set by fopen.
Using a UTF-8 test file [1], the first multibyte line is:
You should see the Greek word 'kosme': "κόσμε"
Checking in Matlab 2019a:
>> abc=fopen("UTF-8 test file.html",'r','n',"UTF-8");
>> for idx=1:45,disp(fgets(abc)),end
<trimming output to reach multibyte test chars>
>> disp(fgets(abc,47));
You should see the Greek word 'kosme': "κ
>> disp(fgets(abc,3));
όσμ
Without opening the file as UTF-8, reading that whole line gives:
You should see the Greek word 'kosme': "κόσμε"
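To make the byte/character distinction concrete, here is a minimal sketch in Python. It is not Matlab or Octave code and says nothing about how either implements fgets. The helpers read_chars and read_bytes are hypothetical names, and I am assuming the accented omicron is U+03CC; the test file may use a different code point, which would change the byte counts.

```python
import io

# Assumption: the accented omicron is U+03CC. A different code point
# would change the byte count below.
word = "\u03ba\u03cc\u03c3\u03bc\u03b5"   # the Greek word from the test line
line = '"' + word + '"\n'
raw = line.encode("utf-8")

def read_chars(data, n):
    """Hypothetical helper: read at most n characters of one line."""
    stream = io.TextIOWrapper(io.BytesIO(data), encoding="utf-8", newline="")
    return stream.readline(n)

def read_bytes(data, n):
    """Hypothetical helper: read at most n bytes of one line."""
    return io.BytesIO(data).readline(n)

print(len(line), len(raw))                  # 8 characters, 13 bytes
print(read_chars(raw, 2).encode("utf-8"))   # quote plus the complete first letter
print(read_bytes(raw, 2))                   # quote plus only half of the first letter
```

With a limit of 2, the character-based read returns the quote and the whole first Greek letter, while the byte-based read stops in the middle of that letter's two-byte sequence. That is the difference a character-counting LEN would avoid.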
[1] https://www.w3.org/2001/06/utf-8-wrong/UTF-8-test.html
_______________________________________________________
Reply to this item at:
<https://savannah.gnu.org/bugs/?57596>