Re: [Chicken-janitors] #1182: utf8 egg silently accepts invalid byte sequences


From: Chicken Trac
Subject: Re: [Chicken-janitors] #1182: utf8 egg silently accepts invalid byte sequences
Date: Mon, 30 Mar 2015 20:40:09 -0000

#1182: utf8 egg silently accepts invalid byte sequences
-------------------------+--------------------------------------------------
  Reporter:  syn         |       Owner:  ashinn  
      Type:  defect      |      Status:  reopened
  Priority:  major       |   Milestone:  someday 
 Component:  extensions  |     Version:  4.9.x   
Resolution:              |    Keywords:  utf8    
-------------------------+--------------------------------------------------
Changes (by syn):

  * status:  closed => reopened
  * resolution:  invalid =>


Comment:

 Replying to [comment:9 ashinn]:
 > Yes, a validation predicate is a long-standing todo.
 > I might get around to it soon, patches are also welcome.

 I'm attaching a patch which adds a `utf8-validation` module along with
 some rudimentary sanity tests. It exports only the predicate we discussed,
 named `utf8-string?`. I put it in a separate module because I couldn't
 think of a better name and didn't want to make the main `utf8` module
 un-prefixable. Perhaps you have a better idea?

 The validation algorithm is based on The Unicode Standard, Version 7.0 -
 Core Specification, Table 3-7 (Well-Formed UTF-8 Byte Sequences), p. 125.
 It performs reasonably well when compiled with `-O2 -specialize` or `-O3`
 (around an order of magnitude slower than an implementation of the same
 algorithm in C). I provide it under the same license as the `utf8` egg,
 so feel free to include it.
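
 For readers without the standard at hand, the rules in Table 3-7 can be
 sketched roughly as follows. This is an illustrative Python rendering, not
 the attached Scheme patch; the function name `is_utf8_string` is
 hypothetical and only mirrors the `utf8-string?` predicate. The lead byte
 determines how many continuation bytes follow and which range the first
 of them must fall in; every later continuation byte must be in 80..BF.

```python
def is_utf8_string(data: bytes) -> bool:
    """Return True if `data` is well-formed UTF-8 per Unicode Table 3-7.

    Hypothetical helper for illustration; not the code from the patch.
    """
    i, n = 0, len(data)
    while i < n:
        b = data[i]
        if b <= 0x7F:                      # 00..7F: single-byte (ASCII)
            i += 1
            continue
        # For each lead byte: number of continuation bytes, and the
        # allowed range [lo, hi] of the FIRST continuation byte.
        if 0xC2 <= b <= 0xDF:
            need, lo, hi = 1, 0x80, 0xBF
        elif b == 0xE0:                    # excludes overlong 3-byte forms
            need, lo, hi = 2, 0xA0, 0xBF
        elif 0xE1 <= b <= 0xEC or 0xEE <= b <= 0xEF:
            need, lo, hi = 2, 0x80, 0xBF
        elif b == 0xED:                    # excludes surrogates D800..DFFF
            need, lo, hi = 2, 0x80, 0x9F
        elif b == 0xF0:                    # excludes overlong 4-byte forms
            need, lo, hi = 3, 0x90, 0xBF
        elif 0xF1 <= b <= 0xF3:
            need, lo, hi = 3, 0x80, 0xBF
        elif b == 0xF4:                    # excludes values above U+10FFFF
            need, lo, hi = 3, 0x80, 0x8F
        else:                              # 80..C1 and F5..FF never lead
            return False
        if i + need >= n:                  # sequence truncated at the end
            return False
        if not lo <= data[i + 1] <= hi:
            return False
        for j in range(i + 2, i + need + 1):
            if not 0x80 <= data[j] <= 0xBF:
                return False
        i += need + 1
    return True
```

 The restricted ranges after E0, ED, F0 and F4 are what reject overlong
 encodings, surrogate code points and values beyond U+10FFFF, which a
 check of the bit patterns alone would let through.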

 > It would in theory be possible to validate every input
 > to every utf8 operation, but I have no intention of doing
 > so, for performance reasons and because people may
 > currently be using invalid utf8 in "safe" ways already.

 Yep, I totally agree with that!

-- 
Ticket URL: <http://bugs.call-cc.org/ticket/1182#comment:10>
CHICKEN Scheme <http://www.call-with-current-continuation.org/>
CHICKEN Scheme is a compiler for the Scheme programming language.
