[Gnash-commit] gnash ChangeLog libbase/utf8.h server/edit_text...
From: Benjamin Wolsey
Subject: [Gnash-commit] gnash ChangeLog libbase/utf8.h server/edit_text...
Date: Thu, 07 Feb 2008 16:15:34 +0000
CVSROOT: /sources/gnash
Module name: gnash
Changes by: Benjamin Wolsey <bwy> 08/02/07 16:15:34
Modified files:
. : ChangeLog
libbase : utf8.h
server : edit_text_character.cpp
Log message:
Comments and minor cleanup.
CVSWeb URLs:
http://cvs.savannah.gnu.org/viewcvs/gnash/ChangeLog?cvsroot=gnash&r1=1.5586&r2=1.5587
http://cvs.savannah.gnu.org/viewcvs/gnash/libbase/utf8.h?cvsroot=gnash&r1=1.7&r2=1.8
http://cvs.savannah.gnu.org/viewcvs/gnash/server/edit_text_character.cpp?cvsroot=gnash&r1=1.148&r2=1.149
Patches:
Index: ChangeLog
===================================================================
RCS file: /sources/gnash/gnash/ChangeLog,v
retrieving revision 1.5586
retrieving revision 1.5587
diff -u -b -r1.5586 -r1.5587
--- ChangeLog 7 Feb 2008 12:41:39 -0000 1.5586
+++ ChangeLog 7 Feb 2008 16:15:32 -0000 1.5587
@@ -1,3 +1,9 @@
+2008-02-07 Benjamin Wolsey <address@hidden>
+
+ * libbase/utf8.h: document utf8 code.
+ * server/edit_text_character.cpp: add comments, drop pointless
+ cast.
+
2008-02-07 Sandro Santilli <address@hidden>
* server/asobj/LoadVars.cpp: fix confusing message (loading XML..)
Index: libbase/utf8.h
===================================================================
RCS file: /sources/gnash/gnash/libbase/utf8.h,v
retrieving revision 1.7
retrieving revision 1.8
diff -u -b -r1.7 -r1.8
--- libbase/utf8.h 6 Feb 2008 15:21:34 -0000 1.7
+++ libbase/utf8.h 7 Feb 2008 16:15:34 -0000 1.8
@@ -25,31 +25,69 @@
#include <string>
#include <boost/cstdint.hpp> // for boost::?int??_t
+/// Utilities to convert between std::string and std::wstring.
+/// Strings in Gnash are generally stored as std::strings.
+/// We have to deal, however, with characters larger than standard
+/// ASCII (128), which can be encoded in two different ways.
+///
+/// SWF 6 and later use UTF-8, encoded as multibyte characters and
+/// allowing many thousands of unique codes. Multibyte characters are
+/// difficult to handle, as their length - used for many string
+/// operations - is not certain without parsing the string.
+/// Converting the string to a wstring (generally a uint32_t - how
+/// many codes the reference player can deal with is unknown)
+/// facilitates string operations, as the length of the string
+/// is equal to the number of valid characters.
+///
+/// SWF5 and earlier, however, used the ISO-8859 specification,
+/// allowing the standard 128 ASCII characters plus 128 extra
+/// characters that depend on the particular subset of ISO-8859.
+/// Characters are 8 bits, not the ASCII standard 7. SWF5 cannot
+/// handle multi-byte characters without special functions.
+///
+/// It is important that SWF5 can distinguish between the two encodings,
+/// so we cannot convert all strings to UTF-8.
+///
+/// Presently, this code is used for the AS String object,
+/// edit_text_character, ord() and chr().
namespace utf8
{
- // Converts a UTF-8 encoded std::string with multibyte characters into
- // a std::wstring.
+ /// Converts a canonical std::string with multibyte characters into
+ /// a std::wstring.
+ /// @ param str the canonical string to convert
+ /// @ param version the SWF version, used to decide how to decode the string.
+ //
+ /// For SWF5, UTF-8 (or any other) multibyte encoded characters are
+ /// converted char by char, mangling the string.
DSOEXPORT std::wstring decodeCanonicalString(const std::string& str,
int version);
- // Converts a std::wstring into a UTF-8 encoded std::string.
+ /// Converts a std::wstring into canonical std::string, depending on
+ /// version.
+ /// @ param wstr the wide string to convert
+ /// @ param version the SWF version, used to decide how to encode the string.
+ ///
+ /// For SWF 5, each character is stored as an 8-bit (at least) char, rather
+ /// than converting it to a canonical UTF-8 byte sequence. Gnash can then
+ /// distinguish between 8-bit characters, which it handles correctly, and
+ /// multi-byte characters, which are regarded as multiple characters for
+ /// string methods.
DSOEXPORT std::string encodeCanonicalString(const std::wstring& wstr,
int version);
- // Return the next Unicode character in the UTF-8 encoded
- // string. Invalid UTF-8 sequences produce a U+FFFD character
- // as output. Advances string iterator past the character
- // returned, unless the returned character is '\0', in which
- // case the iterator does not advance.
+ /// Return the next Unicode character in the UTF-8 encoded
+ /// string. Invalid UTF-8 sequences produce a U+FFFD character
+ /// as output. Advances string iterator past the character
+ /// returned, unless the returned character is '\0', in which
+ /// case the iterator does not advance.
boost::uint32_t decodeNextUnicodeCharacter(std::string::const_iterator& it);
- // Encodes the given UCS character into the given UTF-8
- // buffer. Writes the data starting at buffer[offset], and
- // increments offset by the number of bytes written.
- //
- // May write up to 6 bytes, so make sure there's room in the
- // buffer!
+ /// Encodes the given wide character into a canonical
+ /// string, theoretically up to 6 chars in length.
std::string encodeUnicodeCharacter(boost::uint32_t ucs_character);
+ /// Encodes the given wide character into an at least 8-bit character,
+ /// allowing storage of Latin1 (ISO-8859-1) characters. This
+ /// is the format of SWF5 and below.
std::string encodeLatin1Character(boost::uint32_t ucsCharacter);
}
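
The documentation added to utf8.h above rests on one observation: in UTF-8 the byte length of a std::string is not the number of characters, whereas a wide string holds one element per character. A minimal sketch of that difference follows. The names decodeUtf8Sketch and encodeLatin1Sketch are hypothetical and are not the Gnash functions declared in utf8.h; the decoder does not reject overlong forms or surrogates, and replacing unrepresentable code points with '?' in the Latin-1 path is an assumption made only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper, not Gnash's implementation: decode a UTF-8 byte
// string into one 32-bit code point per character. Malformed or truncated
// sequences yield U+FFFD, mirroring the behaviour documented for
// decodeNextUnicodeCharacter(). Overlong forms are NOT rejected here.
std::u32string decodeUtf8Sketch(const std::string& bytes)
{
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i++]);
        std::uint32_t cp = 0;
        int extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else { out.push_back(U'\uFFFD'); continue; }

        bool ok = true;
        for (int n = 0; n < extra; ++n) {
            if (i >= bytes.size() ||
                (static_cast<unsigned char>(bytes[i]) & 0xC0) != 0x80) {
                ok = false;  // leave the offending byte for the next pass
                break;
            }
            cp = (cp << 6) | (static_cast<unsigned char>(bytes[i++]) & 0x3F);
        }
        out.push_back(ok ? static_cast<char32_t>(cp) : U'\uFFFD');
    }
    // Every pass consumes at least one byte and emits exactly one code
    // point, so the character count never exceeds the byte count.
    assert(out.size() <= bytes.size());
    return out;
}

// Hypothetical helper: store each code point in a single byte, as the
// SWF5 path is described as doing. Code points above 0xFF do not fit in
// Latin-1; substituting '?' is an assumption of this sketch.
std::string encodeLatin1Sketch(const std::u32string& text)
{
    std::string out;
    for (char32_t cp : text) {
        out.push_back(cp <= 0xFF ? static_cast<char>(cp) : '?');
    }
    return out;
}
```

With these helpers, the five-byte UTF-8 string "caf\xC3\xA9" decodes to four code points, and re-encoding those four code points as Latin-1 gives the four-byte string "caf\xE9". This is why string methods that count characters give different answers depending on which encoding the stored bytes are assumed to use.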
Index: server/edit_text_character.cpp
===================================================================
RCS file: /sources/gnash/gnash/server/edit_text_character.cpp,v
retrieving revision 1.148
retrieving revision 1.149
diff -u -b -r1.148 -r1.149
--- server/edit_text_character.cpp 6 Feb 2008 15:20:57 -0000 1.148
+++ server/edit_text_character.cpp 7 Feb 2008 16:15:34 -0000 1.149
@@ -594,8 +594,15 @@
{
std::wstring s = _text;
- // id.keyCode is the unique gnash::key::code for a character
- uint32_t c = (uint32_t) id.keyCode;
+ // id.keyCode is the unique gnash::key::code for a character/key.
+ // The maximum value is about 265, including function keys.
+ // It seems that typing in characters outside the Latin-1 set
+ // (256 character codes, identical to the first 256 of UTF-8)
+ // is not supported, though a much greater number UTF-8 codes can be
+ // stored and displayed. See utf.h for more information.
+ // This is a limit on the number of key codes, not on the
+ // capacity of strings.
+ gnash::key::code c = id.keyCode;
// maybe _text is changed in ActionScript
m_cursor = imin(m_cursor, _text.size());
@@ -647,9 +654,11 @@
break;
default:
- wchar_t t = (wchar_t) gnash::key::codeMap[c][key::ASCII];
+ wchar_t t = static_cast<wchar_t>(gnash::key::codeMap[c][key::ASCII]);
if (t != 0)
{
+ // Insert one copy of the character
+ // at the cursor position.
s.insert(m_cursor, 1, t);
m_cursor++;
}
@@ -1318,9 +1327,17 @@
last_space_glyph = rec.m_glyphs.size();
}
- { // need a sub-scope to avoid the 'goto' in TAB handling to cross
+ {
+ // need a sub-scope to avoid the 'goto' in TAB handling to cross
// initialization of the 'index' variable
- int index = _font->get_glyph_index((boost::uint16_t) code, _embedFonts);
+
+ // The font table holds up to 65535 glyphs. Casting from uint32_t
+ // would, in the event that the code is higher than 65535, result
+ // in the wrong character being chosen. It isn't clear whether this
+ // would ever happen, but UTF-8 conversion code can deal with codes
+ // up to 2^32; if they are valid, the code table will have to be
+ // enlarged.
+ int index = _font->get_glyph_index(static_cast<boost::uint16_t>(code), _embedFonts);
IF_VERBOSE_MALFORMED_SWF (
if (index == -1)
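
The comment added in the last hunk warns that casting a 32-bit code to boost::uint16_t selects the wrong glyph for codes above 65535. A small sketch of why: the cast keeps only the low 16 bits. The function names below are hypothetical and do not exist in Gnash, and the range check is not part of this patch; it only illustrates one way the condition could be detected, reusing -1 as the "no glyph" value that the surrounding code already tests for.

```cpp
#include <cassert>
#include <cstdint>

// What the cast in the patch does to a code: unsigned narrowing is
// modular, so only the low 16 bits survive.
std::uint16_t truncateToGlyphCode(std::uint32_t code)
{
    return static_cast<std::uint16_t>(code);
}

// Hypothetical guard, NOT in the patch: return -1 for codes that cannot
// be represented in 16 bits instead of silently mapping them to a
// different character.
int checkedGlyphCode(std::uint32_t code)
{
    if (code > 0xFFFF) return -1;
    return static_cast<int>(code);
}

// Worked example: U+1F600 lies above 65535, so the plain cast turns it
// into 0xF600, an unrelated code, while the guard reports it.
void demoTruncation()
{
    assert(truncateToGlyphCode(0x1F600) == 0xF600);
    assert(checkedGlyphCode(0x1F600) == -1);
    assert(checkedGlyphCode(0x00E9) == 0xE9);  // in-range codes pass through
}
```

Whether such codes ever reach this call is, as the patch comment says, not clear; the sketch only shows what would happen if they did.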