From: Nicholas Jankowski
Subject: [Octave-bug-tracker] [bug #57596] Should the "len" argument of "fgetl" and "fgets" mean bytes or characters?
Date: Tue, 18 Feb 2020 17:18:43 -0500 (EST)
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.130 Safari/537.36

Follow-up Comment #3, bug #57596 (project octave):

TL;DR - the LEN argument in Matlab counts characters, not bytes, even for
multibyte characters. Octave should probably emulate that for compatibility.

From a compatibility standpoint: Matlab's help file gives the signature as
fgets(FID, NCHAR), and it specifically uses the word "character" to describe
that input parameter. The help also says it reads characters using the
encoding scheme associated with the file, as set by fopen.

Using a UTF-8 test file [1], the first multibyte line is:

You should see the Greek word 'kosme':       "κόσμε"   

Checking in Matlab R2019a:

>> abc=fopen("UTF-8 test file.html",'r','n',"UTF-8");
>> for idx=1:45,disp(fgets(abc)),end

<trimming output to reach multibyte test chars>

>> disp(fgets(abc,47));
You should see the Greek word 'kosme':       "κ
>> disp(fgets(abc,3));

Without opening the file as UTF-8, reading that whole line gives:

You should see the Greek word 'kosme':       "κόσμε"
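
To make the byte/character difference concrete, here is a minimal Octave sketch of the arithmetic behind that fgets(abc,47) call. It is only an illustration, not how fgets is implemented anywhere: the variable names are made up, and the UTF-8 bytes of "κ" are spelled out by hand so the result does not depend on how the script file itself is encoded.

```octave
% Sketch: byte count vs. character count for the part of the test line
% returned by fgets(abc,47) above. All names here are hypothetical.
prefix = ["You should see the Greek word 'kosme':", repmat(" ", 1, 7), '"'];
kappa  = char([206 186]);     % UTF-8 encoding of U+03BA: 0xCE 0xBA
s      = [prefix, kappa];

% Octave char arrays store one byte per element, so numel gives bytes.
n_bytes = numel(s);

% A UTF-8 continuation byte looks like 10xxxxxx. Every other byte starts
% a new character, so counting those bytes gives the character count.
n_chars = sum(bitand(double(s), 192) ~= 128);

printf("bytes: %d, characters: %d\n", n_bytes, n_chars);
```

This should report 48 bytes and 47 characters. In other words, a LEN of 47 counted in bytes would stop after the first byte of "κ", in the middle of the character, whereas Matlab's character-based count returns the whole "κ".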

[1] https://www.w3.org/2001/06/utf-8-wrong/UTF-8-test.html

