[PATCH] org-export: Remove zero-width space escapes during export

From: Ihor Radchenko
Subject: [PATCH] org-export: Remove zero-width space escapes during export
Date: Tue, 26 Jul 2022 20:59:18 +0800

K K <k_foreign@outlook.com> writes:

> My use case is to emphasize Chinese characters without spaces being inserted, 
> even those zero-width spaces. For example "中文*测*试" should be enough to 
> emphasize "测".
> I am using zero-width spaces right now, and it works fine in org-mode 
> buffers, but if exported to latex-pdf files, the U+200B ZERO WIDTH SPACE 
> character will not be zero-width for certain fonts. So I hope not to use that 
> character.

This is a bug. Escape symbols do not affect export in most common
scenarios, but your report adds yet another case where a zero-width
space actually alters the export result.

I am attaching a tentative patch that makes Org export remove
zero-width spaces when those spaces actually separate object
boundaries.
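
To illustrate the idea (this is not the patch itself): in the parse
tree, the text on either side of a markup object is stored as its own
plain-text string, so an escaping zero-width space ends up at the very
start or the very end of such a string. Below is a minimal sketch that
works on plain strings rather than parse-tree nodes; the helper name
`my/strip-boundary-zwsp' is hypothetical and exists only for this
example.

```emacs-lisp
;; Hypothetical helper, not part of the patch: remove one U+200B
;; ZERO WIDTH SPACE from the start and from the end of STRING.
;; Zero-width spaces in the middle of STRING are left untouched.
(defun my/strip-boundary-zwsp (string)
  "Return STRING without a leading or trailing zero-width space."
  (replace-regexp-in-string
   "\u200b\\'" ""                       ; trailing escape, if any
   (replace-regexp-in-string
    "\\`\u200b" ""                      ; leading escape, if any
    string)))

;; Text preceding a markup object: the trailing escape is dropped.
(equal (my/strip-boundary-zwsp "This\u200b") "This")

;; Text following a markup object: the leading escape is dropped.
(equal (my/strip-boundary-zwsp "\u200btest.\n") "test.\n")

;; A zero-width space inside the text is kept as is.
(equal (my/strip-boundary-zwsp "This\u200bis") "This\u200bis")
```

Writing the character as "\u200b" keeps the otherwise invisible escape
visible in the source.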

Any objections?

> On Tue, 26 Jul 2022 09:26:42 +0800, Ihor Radchenko wrote:
>> Another idea we have discussed is using something similar to Markdown
>> format: **bold**, //italics//, __underline__, etc. It is less verbose
>> compared to the special blocks, which should be valuable for
>> Japanese/Chinese/other languages with no spaces between words.
> By the way, it seems that my use case has already been implemented by 
> markdown-mode. In a markdown-mode buffer "中文**测**试" will certainly make "测" 
> bold.

The idea was indeed inspired by Markdown.
However, Markdown is different: **bold** is the official syntax to
indicate bold markup. Though things are more complex in reality:
https://www.markdownguide.org/basic-syntax/ Markdown has its own edge
cases.


From 5764b41b858bff3d56dcb24741cf550a7e245d36 Mon Sep 17 00:00:00 2001
From: Ihor Radchenko <yantar92@gmail.com>
Date: Tue, 26 Jul 2022 20:50:47 +0800
Subject: [PATCH] org-export: Remove zero-width space escapes during export

* lisp/ox.el (org-export--remove-escaped): New function removing
zero-width spaces when they separate object boundaries.
(org-export-as): Call `org-export--remove-escaped'.
* testing/lisp/test-ox.el (test-org-export/remove-escaped): New test.
---
 lisp/ox.el              | 22 ++++++++++++++++++++++
 testing/lisp/test-ox.el | 13 +++++++++++++
 2 files changed, 35 insertions(+)

diff --git a/lisp/ox.el b/lisp/ox.el
index 40ad7ae4e..de034fd22 100644
--- a/lisp/ox.el
+++ b/lisp/ox.el
@@ -2916,6 +2916,25 @@ (defun org-export--remove-uninterpreted-data (data info)
   ;; Return modified parse tree.
+(defun org-export--remove-escaped (data info)
+  "Remove escape symbols from plain-text in DATA.
+DATA is a parse tree or a secondary string.  INFO is a plist
+containing export options.  DATA is modified by side effect and
+returned by the function."
+  (org-element-map data '(plain-text)
+    (lambda (string)
+      (let (processed-string)
+        (setq processed-string
+              (replace-regexp-in-string "\\`\u200b" "" string))
+        (setq processed-string
+              (replace-regexp-in-string "\u200b\\'" "" processed-string))
+        (unless (equal string processed-string)
+          (org-element-insert-before processed-string string)
+          (org-element-extract-element string))))
+    info nil nil t)
+  ;; Return modified parse tree.
+  data)
 (defun org-export-as
     (backend &optional subtreep visible-only body-only ext-plist)
@@ -3046,6 +3065,9 @@ (defun org-export-as
           ;; communication channel.
           (org-export--prune-tree tree info)
           (org-export--remove-uninterpreted-data tree info)
+           ;; Remove zero-width spaces that escape Org syntax
+           ;; elements.
+           (org-export--remove-escaped tree info)
           ;; Call parse tree filters.
           (setq tree
diff --git a/testing/lisp/test-ox.el b/testing/lisp/test-ox.el
index 7c71b6e24..ea4fce363 100644
--- a/testing/lisp/test-ox.el
+++ b/testing/lisp/test-ox.el
@@ -982,6 +982,19 @@ (ert-deftest test-org-export/uninterpreted ()
                             (section . (lambda (s c i) c))))
             nil nil nil '(:with-sub-superscript {}))))))
+(ert-deftest test-org-export/remove-escaped ()
+  "Test removing escape symbols."
+  ;; Remove zero-width space around markup.
+  (should
+   (equal "This*is*test.\n"
+          (org-test-with-temp-text "This\u200b*is*\u200btest.\n"
+            (org-export-as (org-test-default-backend)))))
+  ;; Do not remove zero-width space in other places.
+  (should
+   (equal "This\u200bis\u200btest.\n"
+          (org-test-with-temp-text "This\u200bis\u200btest.\n"
+            (org-export-as (org-test-default-backend))))))
 (ert-deftest test-org-export/export-scope ()
   "Test all export scopes."
   ;; Subtree.
