[5985] take first display text for a given sort key
From: karl
Subject: [5985] take first display text for a given sort key
Date: Wed, 24 Dec 2014 17:34:07 +0000
Revision: 5985
http://svn.sv.gnu.org/viewvc/?view=rev&root=texinfo&revision=5985
Author: karl
Date: 2014-12-24 17:34:06 +0000 (Wed, 24 Dec 2014)
Log Message:
-----------
take first display text for a given sort key
Modified Paths:
--------------
trunk/texindex/GNUmakefile
trunk/texindex/ti.twjr
Modified: trunk/texindex/GNUmakefile
===================================================================
--- trunk/texindex/GNUmakefile 2014-12-24 16:36:09 UTC (rev 5984)
+++ trunk/texindex/GNUmakefile 2014-12-24 17:34:06 UTC (rev 5985)
@@ -2,7 +2,7 @@
TEXI = ti.texi
AWK = texindex.awk
-all: $(AWK) html ti.pdf check
+all: check $(AWK) html ti.pdf
$(TEXI): $(SOURCE)
rm -f $@; $(GAWK) ./jrweave $(SOURCE) >$(TEXI) || rm -f $@; chmod a-w $@
Modified: trunk/texindex/ti.twjr
===================================================================
--- trunk/texindex/ti.twjr 2014-12-24 16:36:09 UTC (rev 5984)
+++ trunk/texindex/ti.twjr 2014-12-24 17:34:06 UTC (rev 5985)
@@ -9,7 +9,7 @@
@c Better brace handling, this texindex is needed to process!
@allowindexbraces
-@c merge the function and variable indexes into the concept index,
+@c Merge the function and variable indexes into the concept index,
@c but without the code font; in the index entries we'll do the
@c font management ourselves. Also merge in the chunk definition
@c and reference entries, which jrweave creates for us.
@@ -140,14 +140,14 @@
The page number for this entry.
@item
-The actual text to go into the printed index, which can include markup.
+The display text to be shown in the printed index, which can include markup.
@end itemize
The braces are balanced in all cases, although for use by this program,
-braces can be included in the sort key by escaping them with the
-@dfn{command character}. This is the first character on the line. It
-is either the backslash used by @TeX{} (@samp{\}) or the at sign used by
-Texinfo (@samp{@@}).
+literal braces (not necessarily balanced) can be included in the sort
+key by escaping them with the @dfn{command character}. This is the
+first character on the line. It is either the backslash used by @TeX{}
+(@samp{\}) or the at sign used by Texinfo (@samp{@@}).
The job is to sort the entries, and merge those which are identical
except for the page numbers. Thus, for the above two entries, the
@@ -177,10 +177,11 @@
@enumerate 1
@item
-The mapping of sort key to display text should be unique, with only the
-line number changing each time. If the same sort key has two different
-display texts, it means that different markup was used, probably
-inadvertently. For example, suppose you have the following input:
+If a given sort key has more than one display text, we only take the
+first (this matches the behavior of C @command{texindex}). Put another
+way, if the same sort key has two different display texts, it means that
+different markup was used, probably inadvertently, and we just take the
+first. As an example, consider these two Texinfo commands:
@example
@@cindex @@address@hidden()@} function
@@ -189,7 +190,7 @@
@end example
@noindent
-This produces the following output from @file{texinfo.tex},
+They produce the following output via @file{texinfo.tex},
which in turn is the input to @command{texindex}:
@example
@@ -197,12 +198,13 @@
address@hidden() address@hidden@address@hidden@{\code @{field_split()@}
address@hidden
@end example
address@hidden This is ok, and the entries will be processed separately; the
-results will be visible in the final index as two identical appearing
-entries (most likely with different page numbers). This should cause
-the document author to search for entries that are identical with
-respect to text but that differ in their use of Texinfo markup.
address@hidden The result will be a single entry, using @code{\file},
+accumulating the page numbers.
+@example
+\entry @{\file @{field_split()@} address@hidden@{2, address@hidden
+@end example
+
@item
@cindex roman numerals
For the same sort key and text, page numbers will be monotonically
@@ -214,12 +216,12 @@
@end enumerate
An additional requirement, for ease of deployment, is that the program
-be written in portable @command{awk}, and not use features that are
-found only in GNU @command{awk} (@command{gawk}). For our purposes,
-``portable'' means ``new'' @command{awk} as defined in the 1988 book by
-Aho, Weinberger and Kernighan. This gives us functions,
-multidimensional arrays and a number of other important features over
-the original @command{awk} shipped with V7 Unix.
+be written in portable @command{awk}, and not use features found only in
+GNU @command{awk} (@command{gawk}). For our purposes, ``portable''
+means ``new'' @command{awk} as defined in the 1988 book by Aho,
+Weinberger and Kernighan. This gives us functions, multidimensional
+arrays and a number of other important features over the original
+@command{awk} shipped with V7 Unix.
@node High-level organization
@chapter High-level organization
@@ -261,8 +263,8 @@
For the first line of the generated output, we hardwire our intended
output file name and how it got made. We do not use a @samp{#!} header
-because, being a GNU program, this needs to accept the @option{--help}
-and @option{--version} options. This cannot be done with a standalone
+because, being a GNU program, we need to accept the @option{--help} and
+@option{--version} options. This cannot be done with a standalone
@code{awk} script; we need a shell wrapper, and hence, the @code{awk}
script itself need not be executable, and it's simpler not to worry
about the location of the @code{awk} program.
@@ -460,7 +462,7 @@
backslashes in @code{sub()}.
@<Remove leading @code{\entry}@>=
-$0 = substr($0, 7) # remove leading \entry
+$0 = substr($0, 7) # remove leading \entry
@
@node Get the initial
@@ -530,7 +532,7 @@
@<Name the fields@>=
key = fields[1]
-linenum = fields[2]
+pagenum = fields[2]
text = fields[3]
@
@@ -542,25 +544,41 @@
@cindex @code{Keys} array
@cindex @code{Entries} variable
@cindex @code{Data} array
-We use a traditional @command{awk} multidimensional array to store the
-various bits and pieces. The subscripts are based on the sort key, and
-the parts are the @code{"linenum"}, the output @code{"text"}, and the
-@code{"initial"}. In addition, the key is stored as data in the
address@hidden array. This array is sorted later on.
+We use a traditional @command{awk} multidimensional array, @code{Data}.
+The sort key from the input is invariant across entries, so we use that
+as the basis for the keys in the @code{Data} array.
+The @code{Data} values are the output @code{"text"}, the
+@code{"pagenum"} list, and the @code{"initial"}.
-The key and the text are invariant across entries; only the line number
-changes, so we use the key and text as the unique index into
address@hidden
+In the event that a particular key has more than one associated output
+text, we'll keep the first and ignore the remainder (this is the same
+behavior as the C implementation). @xref{Requirements}.
+For page numbers, we merely append the page number field from the input,
+preceded by a comma and space, unless that page number was already the
+last that's been stored. (We're assuming the page numbers don't jump
+around, which, in fact, they don't, so we don't need a more complex
+approach.)
+
address@hidden @code{Entries}
+In addition to all this updating of the @code{Data} array, the key is
+stored in the @code{Entries} array the first time it is seen; this array
+is sorted later on.
+
@<Store the data for this line in the @code{Data} array@>=
if (! ((key, "text") in Data)) {
# first time we've seen this full line
Keys[++Entries] = key
- Data[key, "linenum"] = linenum
Data[key, "text"] = text
+ Data[key, "pagenum"] = pagenum
Data[key, "initial"] = initial
} else
- Data[key, "linenum"] = Data[key, "linenum"] ", " linenum
+ # seen this key before; add the current pagenum
+ # unless we've already seen that too.
+ if ( Data[key, "pagenum"] != pagenum \
+ && Data[key, "pagenum"] !~ (", " pagenum "$")) {
+ Data[key, "pagenum"] = Data[key, "pagenum"] ", " pagenum
+ }
@
@node Check for more than one initial
@@ -714,7 +732,7 @@
printf("%centry {%s}{%s}\n",
Command_char,
Data[Keys[i], "text"],
- Data[Keys[i], "linenum"]) > Output_file
+ Data[Keys[i], "pagenum"]) > Output_file
}
close(Output_file)
}
@@ -750,8 +768,8 @@
@cindex quicksort
@cindex Hoare, C.A.R.
-Sorting uses a standard Quick Sort, with the @code{less_than()} function
-supplying the comparison.
+Sorting uses a standard quicksort algorithm, with a @code{less_than()}
+function (defined in the next function) supplying the comparison.
@cindex @code{less_than()} function
@cindex @code{quicksort()} function
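
Illustration (not part of the commit): the merge rule introduced in this
revision can be tried in isolation with the sketch below. It reuses the
names Data, Keys, Entries, key, pagenum and text from the diff, but the
input format is an assumption made for brevity: tab-separated
"key, page, text" records instead of real \entry lines, so none of the
brace parsing, initials handling or sorting from ti.twjr is reproduced.
The function name merge_entries and the page numbers in the sample input
are hypothetical.

```shell
#!/bin/sh
# Sketch only: first display text wins for a given sort key, page numbers
# accumulate, and a page that is already last in the list is not repeated.
# Assumed input: one record per line, "key<TAB>page<TAB>text".
merge_entries() {
    awk -F '\t' '
    {
        key = $1; pagenum = $2; text = $3
        if (! ((key, "text") in Data)) {
            # first time this sort key is seen: remember its display text
            Keys[++Entries] = key
            Data[key, "text"] = text
            Data[key, "pagenum"] = pagenum
        } else if (Data[key, "pagenum"] != pagenum &&
                   Data[key, "pagenum"] !~ (", " pagenum "$")) {
            # key seen before: keep the old text, append a new page number
            Data[key, "pagenum"] = Data[key, "pagenum"] ", " pagenum
        }
    }
    END {
        # print in first-seen order (the real program sorts Keys first)
        for (i = 1; i <= Entries; i++)
            printf("%s -> %s: %s\n", Keys[i],
                   Data[Keys[i], "text"], Data[Keys[i], "pagenum"])
    }'
}

# Same sort key, two different markups, one repeated page (made-up pages).
printf 'field_split()\t2\t\\file {field_split()}\nfield_split()\t7\t\\code {field_split()}\nfield_split()\t7\t\\code {field_split()}\n' |
    merge_entries
```

With that sample input the sketch should print a single line,
"field_split() -> \file {field_split()}: 2, 7": the \code variant is
dropped and page 7 is recorded once. Only constructs available in "new"
awk are used, in keeping with the portability requirement in the text.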