[Savannah-register-public] [task #5084] Submission of Anamika Press Code

From: Abhishek Choudhary
Subject: [Savannah-register-public] [task #5084] Submission of Anamika Press Code for Indian Script Representation
Date: Fri, 30 Dec 2005 02:46:43 +0000
User-agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows 98; digit_may2002)


                 Summary: Submission of Anamika Press Code for Indian Script
                 Project: Savannah Administration
            Submitted by: hi_pedler
            Submitted on: Fri 12/30/05 at 02:46
         Should Start On: Fri 12/30/05 at 00:00
   Should be Finished on: Mon 01/09/06 at 00:00
                Category: Project Approval
                Priority: 5 - Normal
                  Status: None
                 Privacy: Public
             Assigned to: None
        Percent Complete: 0%
             Open/Closed: Open
                  Effort: 0.00



A new project has been registered at Savannah.
The project account will remain inactive until a site admin approves or
discards the registration.


While this item is useful for tracking the registration process, approving
or discarding the registration must be done using the specific "Group
Administration" page, which is accessible only to site administrators who
are actually logged in as site administrators (superuser):


######### REGISTRATION DETAILS ######### 

Full Name:
  Anamika Press Code for Indian Script Representation

System Group Name:

  non-GNU software & documentation

  GNU General Public License V2 or later

  APCISR (Anamika Press Code for Indian Script Representation) is a GPL'd set
of representational semantics and associated algorithms, implemented using
GCC, for compositional syllabic Indian scripts, along with the corresponding
set of graphemes. It was developed independently by the authors for use with
fixed-width console (text-mode) applications. This is the only FOSS / GPL'd
software that provides this functionality; all other alternatives, such as
CDAC's GIST, are commercial. APCISR uses a 9-grid format to extract the common
features of the Brahmi-derived Indian scripts. Each feature forms a specific
grapheme. The 9-grid consists of three rows, viz. Urdha, Madhya and Nimna,
and three columns, viz. Matrik, Lipik and Purak. The Indian script symbols
are mapped to their constituent graphemes in one table, and the graphemes
are mapped to the corresponding glyphs (character codes) in another table.
Hence, converting codes such as ISCII to APCISR is a two-step procedure. The
first step (synthesis) consists of combining the grapheme maps of the
different Indian symbols, which is algorithmically intensive, while the
second step is a straightforward O(n) lookup procedure for obtaining the
character values of the corresponding graphemes.
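
The two tables described above can be sketched as follows. This is only an
illustration in Python, not the project's GCC-based implementation: the
symbol names, grapheme ids and character codes are all hypothetical
placeholders, and the synthesis step that combines several symbols is left
out, so a single symbol is looked up and rendered on its own.

```python
ROWS = ("Urdha", "Madhya", "Nimna")   # the three rows of the 9-grid
COLS = ("Matrik", "Lipik", "Purak")   # the three columns of the 9-grid

# Table 1 (hypothetical contents): symbol -> 3x3 grid of grapheme ids.
# Grapheme id 0 stands for "no grapheme in this cell".
SYMBOL_TO_GRID = {
    "KA": ((0, 1, 0),
           (0, 2, 3),
           (0, 0, 0)),
    "I_MATRA": ((4, 4, 0),
                (5, 0, 0),
                (0, 0, 0)),
}

# Table 2 (hypothetical contents): grapheme id -> character code.
GRAPHEME_TO_CODE = {0: 0x20, 1: 0x80, 2: 0x81, 3: 0x82, 4: 0x83, 5: 0x84}


def lookup_symbol(name):
    """First table: fetch the 9-grid grapheme map of a single symbol."""
    grid = SYMBOL_TO_GRID[name]
    if len(grid) != len(ROWS) or any(len(row) != len(COLS) for row in grid):
        raise ValueError("grapheme map must be a 3x3 grid")
    return grid


def render_grid(grid):
    """Second table: one lookup per cell, so the cost is linear in the cells."""
    return [[GRAPHEME_TO_CODE[g] for g in row] for row in grid]
```

For example, render_grid(lookup_symbol("KA")) gives three rows of character
codes, one per row of the 9-grid, ready to be written to a text-mode screen.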

The explanation of the synthesis step requires us to distinguish between the
look-up map (LM) and the working map (WM). The LM is a simple 9-grid grapheme
map, while the WM consists of three rows of three or more columns, with three
cursors marking the Matrik, Lipik and Purak columns, each of which can move
independently of the others. The LM grapheme maps also contain other related
properties of the Indian script symbols, such as how incorporating the LM
into the WM moves the cursors of the WM. This forms the basis for a set of
semantic rules for the synthesis step. For example, upon encountering a half
consonant, the Matrik remains constant while the Lipik and Purak are shifted
right by one place, making the previous Purak the current Lipik and
introducing a new column to the right of the WM, which becomes the new Purak.
A normal consonant cursor shift consists of the existing Purak becoming the
new Matrik, along with the introduction of two new columns to the right of
the WM for the new Lipik and Purak. A normal matra causes no cursor shift.
The LM grapheme table also contains mappings for character combinations
(sanyuktakshara or juktakshara), which are treated as a single symbol.

Once the position of the cursors has been determined, the LM values are
logically AND-ed with the corresponding WM values. However, some scripts
deviate from these generalisations and require the inclusion of specific
rules, which are economically accommodated at the end of the synthesis step.

The process of APCISR conversion is reversible; however, the step of
character-code rendering introduces some ambiguity that prohibits proper
reconstruction, because more than one grapheme may use the same character
code. This issue can be addressed by using larger character pages. However,
our objectives do not require APCISR to be reversible, as the rendering is
done just-in-time, and using the same character code or glyph for different
graphemes allows the extended ASCII code page to accommodate the glyphs while
maintaining common graphic symbols such as box and shaded bars, with the
7-bit code page remaining constant.

The conversion of the standard international numerals to the corresponding
script has been kept optional. APCISR is currently supported on VGA and
compatible graphics
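
The cursor rules for the working map can be sketched as follows. This is a
minimal illustration in Python under several assumptions, not the project's
actual code: the class and method names are hypothetical, cells are modelled
as integer bitmasks, an empty cell is assumed to hold 0xFF so that AND-ing an
LM value into it leaves that value unchanged, and the Purak is assumed to
always be the rightmost column. Whether a cursor shift is applied before or
after a symbol is merged is not stated in the description, so the sketch
leaves that choice to the caller.

```python
BLANK = 0xFF  # assumed "empty cell" value; x & BLANK == x for 8-bit x


class WorkingMap:
    """Three rows (Urdha, Madhya, Nimna) and a growing number of columns."""

    def __init__(self):
        self.rows = [[BLANK] * 3 for _ in range(3)]
        # Column indices of the three cursors.
        self.matrik, self.lipik, self.purak = 0, 1, 2

    def _add_columns(self, count):
        for row in self.rows:
            row.extend([BLANK] * count)

    def shift_half_consonant(self):
        # Matrik stays put; the previous Purak becomes the Lipik and one
        # new column on the right becomes the Purak.
        self._add_columns(1)
        self.lipik = self.purak
        self.purak = len(self.rows[0]) - 1

    def shift_consonant(self):
        # The existing Purak becomes the Matrik; two new columns on the
        # right become the Lipik and the Purak.
        self._add_columns(2)
        self.matrik = self.purak
        self.lipik = self.matrik + 1
        self.purak = self.matrik + 2

    def shift_matra(self):
        # A normal matra causes no cursor shift.
        pass

    def cursors(self):
        return (self.matrik, self.lipik, self.purak)

    def merge(self, lm):
        # AND each LM column into the WM column under the matching cursor.
        targets = (self.matrik, self.lipik, self.purak)
        for r in range(3):
            for k, col in enumerate(targets):
                self.rows[r][col] &= lm[r][k]
```

Starting from the cursors (0, 1, 2), a half consonant moves them to (0, 2, 3)
and a following normal consonant moves them to (3, 4, 5), with the WM growing
from three to six columns along the way.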

The source code is currently available from


Reply to this item at:


  Message sent via/by Savannah
