[Gnash-commit] gnash ChangeLog libbase/utf8.h server/edit

From:	Benjamin Wolsey
Subject:	[Gnash-commit] gnash ChangeLog libbase/utf8.h server/edit_text...
Date:	Thu, 07 Feb 2008 16:15:34 +0000

CVSROOT:        /sources/gnash
Module name:    gnash
Changes by:     Benjamin Wolsey <bwy>   08/02/07 16:15:34

Modified files:
        .              : ChangeLog 
        libbase        : utf8.h 
        server         : edit_text_character.cpp 

Log message:
        Comments and minor cleanup.

CVSWeb URLs:
http://cvs.savannah.gnu.org/viewcvs/gnash/ChangeLog?cvsroot=gnash&r1=1.5586&r2=1.5587
http://cvs.savannah.gnu.org/viewcvs/gnash/libbase/utf8.h?cvsroot=gnash&r1=1.7&r2=1.8
http://cvs.savannah.gnu.org/viewcvs/gnash/server/edit_text_character.cpp?cvsroot=gnash&r1=1.148&r2=1.149

Patches:
Index: ChangeLog
===================================================================
RCS file: /sources/gnash/gnash/ChangeLog,v
retrieving revision 1.5586
retrieving revision 1.5587
diff -u -b -r1.5586 -r1.5587
--- ChangeLog   7 Feb 2008 12:41:39 -0000       1.5586
+++ ChangeLog   7 Feb 2008 16:15:32 -0000       1.5587
@@ -1,3 +1,9 @@
+2008-02-07 Benjamin Wolsey <address@hidden>
+
+       * libbase/utf8.h: document utf8 code.
+       * server/edit_text_character.cpp: add comments, drop pointless
+         cast.  
+
 2008-02-07 Sandro Santilli <address@hidden>
 
        * server/asobj/LoadVars.cpp: fix confusing message (loading XML..)

Index: libbase/utf8.h
===================================================================
RCS file: /sources/gnash/gnash/libbase/utf8.h,v
retrieving revision 1.7
retrieving revision 1.8
diff -u -b -r1.7 -r1.8
--- libbase/utf8.h      6 Feb 2008 15:21:34 -0000       1.7
+++ libbase/utf8.h      7 Feb 2008 16:15:34 -0000       1.8
@@ -25,31 +25,69 @@
 #include <string>
 #include <boost/cstdint.hpp> // for boost::?int??_t
 
+/// Utilities to convert between std::string and std::wstring.
+/// Strings in Gnash are generally stored as std::strings.
+/// We have to deal, however, with characters larger than standard
+/// ASCII (128), which can be encoded in two different ways.
+///
+/// SWF 6 and later use UTF-8, encoded as multibyte characters and
+/// allowing many thousands of unique codes. Multibyte characters are 
+/// difficult to handle, as their length - used for many string
+/// operations - is not certain without parsing the string.
+/// Converting the string to a wstring (generally a uint32_t - how
+/// many codes the reference player can deal with is unknown)
+/// facilitates string operations, as the length of the string
+/// is equal to the number of valid characters. 
+/// 
+/// SWF5 and earlier, however, used the ISO-8859 specification,
+/// allowing the standard 128 ASCII characters plus 128 extra
+/// characters that depend on the particular subset of ISO-8859.
+/// Characters are 8 bits, not the ASCII standard 7. SWF5 cannot
+/// handle multi-byte characters without special functions.
+///
+/// It is important that SWF5 can distinguish between the two encodings,
+/// so we cannot convert all strings to UTF-8.
+///
+/// Presently, this code is used for the AS String object,
+/// edit_text_character, ord() and chr().
 
 namespace utf8
 {
-       // Converts a UTF-8 encoded std::string with multibyte characters into
-       // a std::wstring.
+       /// Converts a canonical std::string with multibyte characters into
+       /// a std::wstring.
+       /// @ param str the canonical string to convert
+       /// @ param version the SWF version, used to decide how to decode the string.
+       //
+       /// For SWF5, UTF-8 (or any other) multibyte encoded characters are
+       /// converted char by char, mangling the string. 
        DSOEXPORT std::wstring decodeCanonicalString(const std::string& str, int version);
 
-       // Converts a std::wstring into a UTF-8 encoded std::string.
+       /// Converts a std::wstring into canonical std::string, depending on
+       /// version.
+       /// @ param wstr the wide string to convert
+       /// @ param version the SWF version, used to decide how to encode the string.
+       ///
+       /// For SWF 5, each character is stored as an 8-bit (at least) char, rather
+       /// than converting it to a canonical UTF-8 byte sequence. Gnash can then
+       /// distinguish between 8-bit characters, which it handles correctly, and 
+       /// multi-byte characters, which are regarded as multiple characters for
+       /// string methods. 
        DSOEXPORT std::string encodeCanonicalString(const std::wstring& wstr, int version);
 
-       // Return the next Unicode character in the UTF-8 encoded
-       // string.  Invalid UTF-8 sequences produce a U+FFFD character
-       // as output.  Advances string iterator past the character
-       // returned, unless the returned character is '\0', in which
-       // case the iterator does not advance.
+       /// Return the next Unicode character in the UTF-8 encoded
+       /// string.  Invalid UTF-8 sequences produce a U+FFFD character
+       /// as output.  Advances string iterator past the character
+       /// returned, unless the returned character is '\0', in which
+       /// case the iterator does not advance.
        boost::uint32_t decodeNextUnicodeCharacter(std::string::const_iterator& it);
 
-       // Encodes the given UCS character into the given UTF-8
-       // buffer.  Writes the data starting at buffer[offset], and
-       // increments offset by the number of bytes written.
-       //
-       // May write up to 6 bytes, so make sure there's room in the
-       // buffer!
+       /// Encodes the given wide character into a canonical
+       /// string, theoretically up to 6 chars in length.
        std::string encodeUnicodeCharacter(boost::uint32_t ucs_character);
        
+       /// Encodes the given wide character into an at least 8-bit character,
+       /// allowing storage of Latin1 (ISO-8859-1) characters. This
+       /// is the format of SWF5 and below.
        std::string encodeLatin1Character(boost::uint32_t ucsCharacter);
 }
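
The length problem described in the utf8.h comments can be shown with a small standalone sketch. It does not use Gnash's code: encodeUtf8, encodeLatin1 and countCodePoints are hypothetical helpers written only for this illustration, and the handling of code points above 0xFF in encodeLatin1 (keep the low 8 bits) is an assumption, not necessarily what encodeLatin1Character does. Two background facts are worth keeping in mind when reading it: Latin-1 coincides with the first 256 Unicode code points, and although the original UTF-8 definition allowed sequences of up to 6 bytes, RFC 3629 restricts it to 4 bytes (U+10FFFF), which is the range the sketch covers.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not Gnash's implementation): encode one code point
// as UTF-8. Covers the RFC 3629 range; larger values become U+FFFD.
std::string encodeUtf8(std::uint32_t cp)
{
    std::string out;
    if (cp > 0x10FFFF) cp = 0xFFFD;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return out;
}

// Hypothetical helper: store a code point as a single 8-bit char, as in
// the SWF5 (Latin-1) case. Assumption: only the low 8 bits are kept.
std::string encodeLatin1(std::uint32_t cp)
{
    return std::string(1, static_cast<char>(cp & 0xFF));
}

// Count code points in a UTF-8 string by skipping continuation bytes
// (those of the form 10xxxxxx). Assumes the input is valid UTF-8.
std::size_t countCodePoints(const std::string& s)
{
    std::size_t n = 0;
    for (unsigned char c : s) {
        if ((c & 0xC0) != 0x80) ++n;
    }
    return n;
}
```

For U+00E9, encodeUtf8 yields two bytes while encodeLatin1 yields one, so std::string::size() only matches the character count in the Latin-1 case. This is the reason the comments give for converting to std::wstring before doing string operations.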
 

Index: server/edit_text_character.cpp
===================================================================
RCS file: /sources/gnash/gnash/server/edit_text_character.cpp,v
retrieving revision 1.148
retrieving revision 1.149
diff -u -b -r1.148 -r1.149
--- server/edit_text_character.cpp      6 Feb 2008 15:20:57 -0000       1.148
+++ server/edit_text_character.cpp      7 Feb 2008 16:15:34 -0000       1.149
@@ -594,8 +594,15 @@
                {
                        std::wstring s = _text;
 
-                       // id.keyCode is the unique gnash::key::code for a character
-                       uint32_t c = (uint32_t) id.keyCode;
+                       // id.keyCode is the unique gnash::key::code for a character/key.
+                       // The maximum value is about 265, including function keys.
+                       // It seems that typing in characters outside the Latin-1 set
+                       // (256 character codes, identical to the first 256 of UTF-8)
+                       // is not supported, though a much greater number UTF-8 codes can be
+                       // stored and displayed. See utf.h for more information.
+                       // This is a limit on the number of key codes, not on the
+                       // capacity of strings.
+                       gnash::key::code c = id.keyCode;
 
                        // maybe _text is changed in ActionScript
                        m_cursor = imin(m_cursor, _text.size());
@@ -647,9 +654,11 @@
                                        break;
 
                                default:
-                                       wchar_t t = (wchar_t) gnash::key::codeMap[c][key::ASCII];
+                                       wchar_t t = static_cast<wchar_t>(gnash::key::codeMap[c][key::ASCII]);
                                        if (t != 0)
                                        {
+                                               // Insert one copy of the character
+                                               // at the cursor position.
                                                s.insert(m_cursor, 1, t);
                                                m_cursor++;
                                        }
@@ -1318,9 +1327,17 @@
                        last_space_glyph = rec.m_glyphs.size();
                }
 
-               { // need a sub-scope to avoid the 'goto' in TAB handling to cross
+               {
+               // need a sub-scope to avoid the 'goto' in TAB handling to cross
                  // initialization of the 'index' variable
-               int index = _font->get_glyph_index((boost::uint16_t) code, _embedFonts);
+
+               // The font table holds up to 65535 glyphs. Casting from uint32_t
+               // would, in the event that the code is higher than 65535, result
+               // in the wrong character being chosen. It isn't clear whether this
+               // would ever happen, but UTF-8 conversion code can deal with codes
+               // up to 2^32; if they are valid, the code table will have to be
+               // enlarged.
+               int index = _font->get_glyph_index(static_cast<boost::uint16_t>(code), _embedFonts);
 
                IF_VERBOSE_MALFORMED_SWF (
                    if (index == -1)
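
The narrowing concern raised in the last hunk can be illustrated in isolation. The sketch below is not Gnash code: narrowToGlyphCode and fitsInGlyphTable are hypothetical names, and the range check is only one possible guard a caller might add. It shows that converting a 32-bit code to a 16-bit one keeps the low 16 bits, so a code above 65535 silently maps to a different character.

```cpp
#include <cstdint>

// Hypothetical helper: the same narrowing conversion the patch applies
// before the glyph lookup. Unsigned narrowing is well defined in C++ and
// keeps the value modulo 65536.
std::uint16_t narrowToGlyphCode(std::uint32_t code)
{
    return static_cast<std::uint16_t>(code);
}

// Hypothetical guard: true when the code survives the narrowing unchanged.
bool fitsInGlyphTable(std::uint32_t code)
{
    return code <= 0xFFFF;
}
```

With these definitions, narrowToGlyphCode(0x1F600) gives 0xF600, which is an unrelated code point, while anything up to 0xFFFF passes through unchanged. A check such as fitsInGlyphTable before the lookup would let the caller report the problem instead of drawing the wrong glyph.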
