## [Axiom-developer] Tex(t) chunk inclusion and citation

From: William Sit
Subject: [Axiom-developer] Tex(t) chunk inclusion and citation
Date: Fri, 06 Dec 2002 03:30:29 -0500

Tim wrote:

> Further, I need two behaviors for text chunks.
>
> First, I need to be able to essentially \include a text chunk from
> another document (or maybe a whole text root) into the current
> document. This is to ensure that information that is widely needed is
> not copied everywhere (otherwise some copies will be out of date).
>
> I also need this ability in order to construct "booklets" without
> copying the original sources but also without including the original
> sources in total as \include would do. Think of a booklet as a thread
> thru pamphlets with text along the threads. This will allow one to
> write a chapter introduction, include some text and code chunks from
> some pamphlet, and then write a summary or problems section (for a
> textbook).

Obvious problems:

This would be especially delicate when the text chunk produces source
that would be published, say, not within the open-source or Axiom
journal, but outside these.

(2) technical.

There has to be some very restrictive convention on text chunks that are
TeX/LaTeX source (and similarly for other types of source that support
macros). For example, if \renewcommand\xxx{...} is allowed within a text
chunk, then when other text chunks are assembled, the assembler will
need to be smart enough to save the current definition of xxx before
entering that chunk and restore it afterwards -- but this may create
problems too, because the chunk author may have intended
\renewcommand\xxx{...} to have a larger scope than the text chunk. The
problem does not exist for the author, because the author knows and
presumably verifies the assembled output. But when the text chunk is
\include'd by other authors, it will be like calling a subroutine that
resets global variables as a side effect. Other commands that define
macros are equally problematic.

One solution may be to limit the scope of macros to the text chunk
itself, if the text chunk is intended for reuse. But how does the text
assembler know the previous environment without being the TeX compiler?
(Perhaps there is a TeX command to query the definition of a particular
macro? I am no TeX/LaTeX expert, but as we will see, such a facility
would be very useful. Let's call it the macroResolver for now. Such a
program is not too difficult for TeX gurus to write.)
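
As a rough illustration of the first thing such a tool would need, the Python sketch below lists the macros a text chunk defines or redefines, so that an assembler could at least warn before the chunk is reused elsewhere. The function name find_macro_definitions and the regular expression are assumptions made for illustration. It only recognizes the simple forms \newcommand, \renewcommand, \providecommand and \def, and it does not look up current definitions. (TeX's own \show and \meaning primitives do display a macro's current definition, but only from inside a TeX run.)

```python
import re

# Matches \newcommand, \renewcommand, \providecommand (optionally starred)
# and \def, capturing the control sequence being defined.
_DEFINITION = re.compile(
    r"\\(?:newcommand|renewcommand|providecommand|def)\*?\s*\{?\s*(\\[A-Za-z@]+)"
)


def find_macro_definitions(chunk_text):
    """Return the macro names a TeX/LaTeX chunk defines, in order of appearance."""
    return [match.group(1) for match in _DEFINITION.finditer(chunk_text)]


if __name__ == "__main__":
    chunk = r"\renewcommand\xxx{...} Some text. \newcommand{\norm}[1]{\|#1\|}"
    for name in find_macro_definitions(chunk):
        print("chunk (re)defines", name)
```

A chunk for which this list is empty is safe to reuse as far as macro side effects go; anything else needs either a scoping convention or a real macroResolver.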

How would code chunk assemblers solve this problem? In Axiom, we are
allowed to define macros (say, textual string substitutions with ==>,
not to mention == ). So again we need an equivalent macroResolver.

> Second, I need to \cite a text chunk in a woven document. Then I
> can cross-reference the documents so a reader mechanism can follow
> the citations. This comes from pondering ideas of how to read the
> .dvi files in some rational way. I'm tempted to automatically
> construct an html tree on top of the dvi files but I'm not happy
> with this solution for the long term.

Citation can presumably be done by creating a protocol that identifies
the location of the source. So the problem is defining what "location"
is, perhaps in a way similar to a URL. But there is more. You are
contemplating citing not just a DOCUMENT but a TEXT CHUNK, the
equivalent of a chapter-and-verse citation. So each citable text chunk
must have its own "URL" (let's call it a UCL, for chunk locator). If
you are using .dvi as the source (because you want to avoid compiling
the source again), you will need to embed the UCL info into the .dvi
file, which requires modifying TeX (not an elegant solution). The html
tree construct would work for citing an entire DOCUMENT, because the
entire .dvi file is retrieved and displayed. The user must then manually
scroll to find the right place. Not every .dvi (or .ps) viewer supports
searching. A .pdf version would be easier for the user, since
(a) we don't have to maintain Acrobat Reader and (b) it has good
searching facilities.

However, this is not a good solution.

Returning to the text chunk citation problem, it would seem that the
logical place for the booklet reader to obtain the information would be
the booklet/pamphlet source. Then the assemblers need an option to
output only the citable portion of the text chunk. This is probably
doable. For TeX, the assembler will have to create on the fly a TeX
source that resolves all the user-created macros appearing in the
text chunk, using the macroResolver. (As we will see later, this is not
enough: we also need a labelResolver and, worse, a symbolResolver, but
let's ignore that for now.) For code, a similar thing has to be done
when macros are used. So a possible scenario is:

booklet 1:
...
(a)=some tag that identifies the type of chunk for that particular
assembler
... some chunk
(b)=some tag that identifies that this is the beginning of a citable
portion
(c)=some tag that identifies an external name for this citable chunk,
say the name is BIBITEM
(d)=citable portion of chunk
(e)=some tag that identifies that this is the end of a citable chunk
... more chunk
(f)=some tag that identifies the end of the chunk

booklet 2:
...
(z)=some tag that cites a chunk using BIBITEM
...

When the assembler (for the type of chunk) compiles booklet 1 the normal
way, it creates in a citation database an entry in the form of a pair
(BIBITEM, UCL). The UCL contains information on the name of the booklet,
the pamphlet containing the chunk, the type of chunk (a), and the tags
(b) and (c). The BIBITEM is published and used by other booklets for
citation.
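
The (BIBITEM, UCL) pair described above can be sketched as a small data structure. This is only a minimal illustration in Python: the class names UCL and CitationDatabase, the field names, and the rule that a BIBITEM may not be re-registered for a different chunk are all assumptions, not part of any existing assembler.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class UCL:
    """Chunk locator: where a citable chunk lives (field names are hypothetical)."""
    booklet: str     # name of the booklet
    pamphlet: str    # pamphlet containing the chunk
    chunk_type: str  # tag (a): selects the assembler, e.g. "tex" or "code"
    begin_tag: str   # tag (b): start of the citable portion
    name_tag: str    # tag (c): external name of the citable portion


class CitationDatabase:
    """Maps each published BIBITEM to the UCL recorded when booklet 1 is compiled."""

    def __init__(self):
        self._entries = {}

    def register(self, bibitem, ucl):
        existing = self._entries.get(bibitem)
        if existing is not None and existing != ucl:
            # Assumed policy: one BIBITEM names exactly one chunk.
            raise ValueError(f"{bibitem!r} already names a different chunk")
        self._entries[bibitem] = ucl

    def lookup(self, bibitem):
        return self._entries[bibitem]


if __name__ == "__main__":
    db = CitationDatabase()
    db.register("BIBITEM", UCL("booklet1", "pamphlet1", "tex", "(b)", "(c)"))
    print(db.lookup("BIBITEM"))
```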

When the assembler (for the type of chunk) compiles booklet 2 the normal
way, it outputs a citation in the form of [BIBITEM] say.

When a user wants to open this [BIBITEM] while reading booklet 2, and
say clicks on it, the booklet reader spawns a process that, in order,

(1) searches the citation database for the UCL corresponding to the
BIBITEM.

(2) finds out from the UCL the type of chunk BIBITEM is and invokes the
associated assembler, giving it the source booklet file, the pamphlet,
and the tags (b) and (c). The assembler resolves all the macros within
the citable portion using the macroResolver. If the type is TeX, it
sends an abbreviated source containing just the citable chunk, with all
macros resolved, to the TeX compiler, which outputs a .dvi and sends it
to the booklet reader for rendering (printer/screen). If the citable
chunk is code, the abbreviated source can be sent to the booklet reader
directly, or to a code formatter first.
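
Steps (1) and (2) can be sketched as follows, leaving out macro resolution and the TeX run. Everything concrete here is an assumption for illustration: the marker lines %%begin-cite NAME and %%end-cite stand in for tags (b), (c) and (e), the database is a plain dictionary from BIBITEM to a (booklet, chunk type) pair, and booklet sources are held in memory rather than read from files.

```python
BEGIN_MARK = "%%begin-cite"  # hypothetical stand-in for tags (b) and (c)
END_MARK = "%%end-cite"      # hypothetical stand-in for tag (e)


def extract_citable(source, name):
    """Return the text between '%%begin-cite NAME' and the next '%%end-cite'."""
    collected = []
    inside = False
    for line in source.splitlines():
        stripped = line.strip()
        if not inside:
            if stripped == f"{BEGIN_MARK} {name}":
                inside = True
        elif stripped == END_MARK:
            return "\n".join(collected)
        else:
            collected.append(line)
    raise LookupError(f"no complete citable portion named {name!r}")


def open_citation(bibitem, database, sources):
    """Look up the locator (step 1), then cut out the citable portion (step 2)."""
    booklet, chunk_type = database[bibitem]
    text = extract_citable(sources[booklet], bibitem)
    # A real assembler would now resolve macros and, for TeX, run the compiler.
    return chunk_type, text


if __name__ == "__main__":
    booklet1 = "\n".join([
        "... some chunk",
        "%%begin-cite BIBITEM",
        "Theorem. Every citable chunk has a name.",
        "%%end-cite",
        "... more chunk",
    ])
    database = {"BIBITEM": ("booklet1", "tex")}
    print(open_citation("BIBITEM", database, {"booklet1": booklet1}))
```

The returned chunk type tells the booklet reader whether the text still has to go through the TeX compiler or can be shown (or code-formatted) directly.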

The same mechanism may be used for references in TeX (this is not to be
confused with citations: a reference \ref{label} is a pointer to a
labeled location within the document, such as a Theorem, an equation,
section, etc; a citation \cite{bibitem} is a pointer to something
external to the document). A labelResolver can resolve the references
easily using the .aux file, so a reference within a citable chunk can be
compiled correctly. It would also be desirable, and it is possible, to
let the reader jump to a labeled point in a pop-up window (wouldn't that
save us all the page turning back and forth?) if we treat certain
references (labeled items) as citable. The processing of citable chunks
needs to be modified to handle labels external to, but referenced
within, the chunks.
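
A labelResolver along these lines might look like the Python sketch below. It relies on the \newlabel{key}{{number}{page}} lines that LaTeX writes to the .aux file; the function names parse_aux_labels and resolve_refs are assumptions, and the pattern does not handle label numbers that themselves contain braces. Unknown labels are rendered as ??, mimicking what LaTeX prints for an undefined reference.

```python
import re

# \newlabel{key}{{number}{page}...}: extra fields (e.g. from hyperref) are ignored.
_NEWLABEL = re.compile(r"\\newlabel\{([^}]*)\}\{\{([^}]*)\}\{([^}]*)\}")
_REF = re.compile(r"\\ref\{([^}]*)\}")


def parse_aux_labels(aux_text):
    """Map each label key found in .aux text to its (number, page) pair."""
    return {key: (number, page) for key, number, page in _NEWLABEL.findall(aux_text)}


def resolve_refs(chunk_text, labels):
    """Replace every \\ref{key} in a chunk with the number recorded for key."""

    def substitute(match):
        entry = labels.get(match.group(1))
        return entry[0] if entry is not None else "??"

    return _REF.sub(substitute, chunk_text)


if __name__ == "__main__":
    aux = "\n".join([
        r"\newlabel{eq:main}{{3}{7}}",
        r"\newlabel{thm:a}{{2.1}{5}}",
    ])
    labels = parse_aux_labels(aux)
    print(resolve_refs(r"By Theorem \ref{thm:a} and equation (\ref{eq:main})", labels))
```

With references replaced by their numbers, a citable chunk can be compiled on its own and still display the same numbering as the original document.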

But resolving the labels only preserves the display the cited chunks
had in the original document. The definitions of symbols within
the cited chunks are usually not available. (A symbol is any
mathematical symbol typeset in math mode.) The same is true for a
referenced equation or theorem. Even if we make a reference citable so
that we can jump to it the way a citation does, the referenced or cited
item is not useful if its symbols are meaningless. This is probably the
hardest problem to resolve: in a TeX document, unlike in code source,
the symbols are not identified at all. There seems to be no way to
write a symbolResolver for TeX without revolutionizing the way math
papers are typeset.

So I leave you with an open problem, and readers of booklets will still
have to scroll and search manually for the meaning of symbols.

We can severely restrict how a citable chunk or reference can be written
-- they need to be self-contained.

William