Re: [lmi] Unknown fields in table input text files

From: Vadim Zeitlin
Subject: Re: [lmi] Unknown fields in table input text files
Date: Sat, 20 Feb 2016 13:57:14 +0100

On Sat, 20 Feb 2016 04:12:25 +0000 Greg Chicares <address@hidden> wrote:

GC> On 2016-02-20 03:16, Vadim Zeitlin wrote:
GC> > 
GC> >  I decided to extend my tests checking that all tables in qx_ins and 
GC> > databases survive the round trip through the new table code to also do the
GC> > same for the tables in qx_ann and got several failures due to the presence
GC> > of unknown "fields" in some of the tables here.
GC> If it's not too difficult, could you share that with me, so that I can
GC> run the same test against all proprietary tables? It doesn't have to be
GC> polished; I would only need to run it once.

 Thinking more about this, why not include this in the table_tool itself as
some --verify option? I think this could be useful and it would be a more
convenient way to test. Let me just do this and send you the tool...

GC> >  One of them looks like a real field as it's present in several files: 
GC> > the "Editor: " one. I don't know at all what to do about it as there is no
GC> > corresponding field in the binary format, so there doesn't seem to be any
GC> > way to store the value of this field in it.
GC> Please tell me the number of a 'qx_ann' table that has this field so that
GC> I can examine it. I don't remember ever seeing "Editor:" in these files.

 It occurs in the following tables:

893 894 895 896 897 898 952 953 954 955 956 957 958 959 960 961 962 963 964 965 
966 967 968 969 970 971 972 973 974 975 976 977 978 979 980 981 982 983 984

 It is actually part of "Comments:" in the binary files, but it surely
looks like just another header (similar to e.g. "Contributor") in the text

GC> I have two suggestions:
GC> (1) Build a whitelist of header names, and reject anything not on the list.
GC> I imagine that this list will be short; I thought they were enumerated
GC> in the 1990s code, and perhaps also in the HLP or GID documentation.

 Would we include "WARNING" in this whitelist?
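
 For illustration only, suggestion (1) could be sketched roughly as below.
This is a minimal Python sketch, not the actual table code: the names
KNOWN_HEADERS and check_header_name are hypothetical, and the list holds
only the two header names mentioned in this thread rather than the real,
complete set.

```python
# Hypothetical whitelist: only the header names mentioned in this thread.
# The real list would have to come from the 1990s code or the format docs.
KNOWN_HEADERS = frozenset({"Comments", "Contributor"})


def check_header_name(name, known=KNOWN_HEADERS):
    """Return the header name unchanged if whitelisted, else raise ValueError."""
    if name not in known:
        raise ValueError(f"unknown header name: {name!r}")
    return name
```

 With such a list, whether "Editor" or "WARNING" is accepted depends purely
on whether someone adds it to the set, which is exactly the open question.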

GC> (2) Use a regex like /[A-Za-z0-9]* *[A-Za-z0-9]*:/ on the assumption that
GC> header names consist of one or two words followed by a colon. Deem any
GC> colon that occurs later in the line to be content rather than markup.

 Yes, I definitely need to do this to avoid at least the obvious false
positives. The trouble with "Editor:" and "WARNING:" is that they're not
really obvious, are they?
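
 To make the problem concrete, here is a minimal Python sketch of
suggestion (2). It is an assumption-laden variant of the quoted regex, not
the table code itself: the pattern is anchored at the start of the line and
requires at least one character per word (the quoted version would also
match a bare ":"), and the names HEADER_RE and split_header are
hypothetical.

```python
import re

# One or two alphanumeric words, then a colon, anchored at line start.
# Everything after the first such colon is captured as the value, so any
# later colon in the line is treated as content rather than markup.
HEADER_RE = re.compile(r"^([A-Za-z0-9]+(?: +[A-Za-z0-9]+)?): ?(.*)$")


def split_header(line):
    """Return (name, value) if the line looks like a header, else None."""
    m = HEADER_RE.match(line)
    if m is None:
        return None
    return m.group(1), m.group(2)
```

 A line such as "Contributor: a: b" splits into the name "Contributor" and
the value "a: b", while a line with three or more words before its first
colon is not taken for a header. But "Editor: x" and "WARNING: x" both
still match, so the pattern alone cannot tell them from real headers.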

