guile-user

Re: Converting a part of byte vector to UTF-8 string


From: Nala Ginrut
Subject: Re: Converting a part of byte vector to UTF-8 string
Date: Wed, 15 Jan 2014 12:59:16 +0800

hi there!

On Tue, 2014-01-14 at 00:17 +0100, Panicz Maciej Godek wrote:
> Another option would be to use
> (substring (utf8->string buffer) 0 n)
> 
> This one works, but according to the manual, the
> string is "newly allocated", so it's unnecessary overhead.
> 

Actually, substring is COW (copy-on-write), so you don't have to
worry: the characters are only copied if one of the strings is modified
later. You may also try substring/shared, which never copies the
characters at all. But please be careful about the side effects in your
context ;-)
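
To make the side effect concrete, here is a minimal sketch (the variable
names are only for illustration). A substring made with substring is
detached from the original on the first write, while one made with
substring/shared keeps writing through to it:

```scheme
;; string-copy gives a mutable string; string literals are read-only.
(define s (string-copy "hello world"))

;; Copy-on-write: modifying the substring leaves S untouched.
(define cow (substring s 0 5))
(string-set! cow 0 #\Y)
s       ; still "hello world"
cow     ; "Yello"

;; Shared storage: modifying the substring also modifies S.
(define shared (substring/shared s 0 5))
(string-set! shared 0 #\J)
s       ; now "Jello world"
shared  ; "Jello"
```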

> What would be the best solution?
> 

IMO, no matter whether you use substring or substring/shared in this
context, you still have to allocate a new string for the whole buffer
first. The reason is that we don't have something like
bytevector/shared to slice the bytevector before converting it.

But IIRC a bytevector in Guile is similar to a C array, which means you
can avoid copying when you slice a bytevector, provided you handle the
array pointer properly.
So one may take advantage of that.
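
As a rough sketch of that idea: (system foreign) lets you take a pointer
into a bytevector at a byte offset and wrap it as another bytevector
that aliases the same memory. The helper name bytevector-slice/shared is
made up for this example, and there is no bounds checking, so treat it
as an illustration only:

```scheme
(use-modules (system foreign)     ; bytevector->pointer, pointer->bytevector
             (rnrs bytevectors))  ; string->utf8, utf8->string, ...

;; Hypothetical helper: a view of LEN bytes of BV starting at byte START.
;; The bytes are not copied; the result aliases BV's memory.
;; No bounds checking is done here.
(define (bytevector-slice/shared bv start len)
  (pointer->bytevector (bytevector->pointer bv start) len))

(define whole (string->utf8 "hello world"))
(define slice (bytevector-slice/shared whole 6 5))

(utf8->string slice)                              ; "world"

;; Writing through the slice changes the original bytevector too.
(bytevector-u8-set! slice 0 (char->integer #\W))
(utf8->string whole)                              ; "hello World"
```

Only a small wrapper object is created for the slice; the price is the
same aliasing side effect as with substring/shared.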

But I can't say you can avoid allocation when you convert a bytevector
to a string, because both utf8->string and pointer->string will allocate
anyway.

(Anyone, please correct me if I'm wrong!)

Here's my black magic:
-------------------------------cut------------------------------
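(use-modules (system foreign)     ; bytevector->pointer, pointer->string
             (rnrs bytevectors))  ; bytevector-length, string->utf8

;; START and END are counted in units of SIZE bytes, not in characters,
;; so this is only correct when every character occupies SIZE bytes.
(define* (bv->string/partly bv #:optional (start 0)
                                          (end #f)
                                          (size 1)
                                          (encoding "utf-8"))
  (let ((len (if end
                 (* size (- end start))
                 (- (bytevector-length bv) (* size start))))
        (addr (+ (pointer-address (bytevector->pointer bv))
                 (* size start))))
    ;; No bounds checking: a bad START or END reads outside the bytevector.
    (pointer->string (make-pointer addr) len encoding)))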
-------------------------------end--------------------------------

(define bv (string->utf8 "我了个去啊"))
;; NOTE: each of these Chinese characters takes 3 bytes in UTF-8, so size==3
(bv->string/partly bv 2 4 3)
==> "个去"

;; And for plain ASCII characters, size==1
(define bv2 (string->utf8 "hello world"))
(bv->string/partly bv2 0 5)
==> "hello"


But I have to give a warning again: when you try to avoid allocation
overhead, you have to face the risk of side effects. Personally, I'd
prefer the purely functional way. ;-P
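
For comparison, a purely functional version could look like the sketch
below. The name bv->string/copy is just made up for this mail; it takes
byte offsets (not character indices), copies the range into a temporary
bytevector and decodes that, so it pays for one extra copy but never
touches raw pointers:

```scheme
(use-modules (rnrs bytevectors))  ; make-bytevector, bytevector-copy!, ...

;; Hypothetical helper: decode bytes START (inclusive) to END (exclusive)
;; of BV as UTF-8, going through a temporary copy of that range.
(define (bv->string/copy bv start end)
  (let* ((len (- end start))
         (tmp (make-bytevector len)))
    (bytevector-copy! bv start tmp 0 len)
    (utf8->string tmp)))

(define bv (string->utf8 "我了个去啊"))
(bv->string/copy bv 6 12)   ; ==> "个去" (bytes 6..11, 3 bytes per character)
```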

> TIA
> M




