Re: [lmi] Unknown fields in table input text files

From: Greg Chicares
Subject: Re: [lmi] Unknown fields in table input text files
Date: Sat, 20 Feb 2016 04:12:25 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Icedove/38.5.0

On 2016-02-20 03:16, Vadim Zeitlin wrote:
>  I decided to extend my tests checking that all tables in qx_ins and qx_cso
> databases survive the round trip through the new table code to also do the
> same for the tables in qx_ann and got several failures due to the presence
> of unknown "fields" in some of the tables here.

If it's not too difficult, could you share that with me, so that I can
run the same test against all proprietary tables? It doesn't have to be
polished; I would only need to run it once.

>  One of them looks like a real field as it's present in several files: it's
> the "Editor: " one. I don't know at all what to do about it as there is no
> corresponding field in the binary format, so there doesn't seem to be any
> way to store the value of this field in it.

Please tell me the number of a 'qx_ann' table that has this field so that
I can examine it. I don't remember ever seeing "Editor:" in these files.

>  Another one is not a field at all, but just something that looks like
> one: a couple of tables have lines starting with "WARNING: " in their
> "Construction method" description. I'm not sure what to do about this one
> either: should I specifically make an exception for this word? Or ignore
> any unknown "fields"? The latter seems dangerous, as typos in the field
> names could go unnoticed. Ideal would be to have some way to escape the
> colon, e.g. by doubling it, but even if I introduced support for this in
> the new code, it still wouldn't be able to deal with the text files
> produced by the old version.

I have two suggestions:

(1) Build a whitelist of header names, and reject anything not on the list.
I imagine that this list will be short; I thought they were enumerated
in the 1990s code, and perhaps also in the HLP or GID documentation.

(2) Use a regex like /[A-Za-z0-9]* *[A-Za-z0-9]*:/ on the assumption that
header names consist of one or two words followed by a colon. Deem any
colon that occurs later in the line to be content rather than markup.
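
As a rough sketch of how (1) and (2) might fit together (in Python only for
brevity, not as a proposal for the actual implementation): the header names in
KNOWN_HEADERS are placeholders, since apart from "Construction method" the
real list isn't given in this thread, and the function names are hypothetical.
The regex is a tightened variant of the one above: anchored at the start of
the line, and requiring at least one character per word, because the
unanchored form with '*' would match a bare colon anywhere in the line.

```python
import re

# Placeholder whitelist: only "Construction method" is taken from this
# thread; the other names are assumptions for illustration.
KNOWN_HEADERS = {"Table number", "Table name", "Construction method"}

# One or two alphanumeric words at the start of the line, then a colon.
# Everything after that first colon is captured as content.
HEADER_RE = re.compile(r"^([A-Za-z0-9]+(?: [A-Za-z0-9]+)?):\s*(.*)$")

def match_header(line):
    """Return (name, value) if the line looks like a header, else None."""
    m = HEADER_RE.match(line)
    if m is None:
        return None
    return m.group(1), m.group(2)

def check_header(name):
    """Reject any header name that is not on the whitelist."""
    if name not in KNOWN_HEADERS:
        raise ValueError(f"unknown header name: {name!r}")
    return name

# A colon later in the line is treated as content, not markup.
print(match_header("Construction method: see note: details"))
# "WARNING" looks like a header to the regex, so the whitelist is what
# catches it.
print(match_header("WARNING: values are illustrative"))
```

Whether a name that fails the whitelist should be an outright error or should
be folded back into the preceding field's text is a separate decision that
this sketch doesn't settle.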
