gnucobol-users
Re: [open-cobol-list] Sort utility with OC


From: Roger While
Subject: Re: [open-cobol-list] Sort utility with OC
Date: Fri, 08 Jun 2007 10:25:26 +0200

This is the wrong approach.
Sort is already implemented in OC/TC and the runtime
hooks are usable.
Speaking for OC, we can sort on any combination of
any field types for any type of file.
We do not want to reinvent a sort algo.
In the OC 0.33 prerelease, I have implemented a
list merge technique that has complexity O(n log n).
This will be performed in memory (configurable) if possible
otherwise reverting to temporary flat files.
Note that this technique does not suffer from "worst case"
like other algos.
(Modulo external stuff like memory allocation)
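
To illustrate the point about worst-case behaviour, here is a minimal sketch of a bottom-up list merge sort. It is not the OC 0.33 code; the names `merge` and `merge_sort` are made up for this example. Every pass halves the number of runs, so the amount of work is O(n log n) whatever the input order.

```python
def merge(left, right, key):
    """Merge two already-sorted lists into one sorted list."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        # '<=' takes from the left run on ties, which keeps the sort stable.
        if key(left[i]) <= key(right[j]):
            out.append(left[i])
            i += 1
        else:
            out.append(right[j])
            j += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out


def merge_sort(records, key=lambda r: r):
    """Bottom-up merge sort: start with runs of length 1 and merge pairwise."""
    runs = [[r] for r in records]
    while len(runs) > 1:
        merged = []
        for k in range(0, len(runs), 2):
            if k + 1 < len(runs):
                merged.append(merge(runs[k], runs[k + 1], key))
            else:
                merged.append(runs[k])  # odd run out, carried to the next pass
        runs = merged
    return runs[0] if runs else []
```

Already-sorted or reverse-sorted input goes through the same number of passes as random input, which is the property referred to above.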

The only thing we need to define is the external syntax.
Knocking up a lexer is then trivial and can be designed
to produce an API upon which both TC and OC can build.
The runtime implementation will be hooked up by David
and myself.

Interesting note about IP.
MF's current doc for mfsort, when describing field types,
has a link to IBM's DFSORT.

Roger



David,
This conversation seems to have tailed off so I thought I would give it another stir.

I took a look at CSORT and a further look at the GNU sort.
My feeling is that the GNU sort is a much better platform to build out from.
It contains a lot of very useful infrastructure,
e.g. code for memory buffer management, process management,
intermediate sort file management, an efficient sort algorithm,
sort, merge or check capability, etc.

Agreed, it is targeted at LF(CR) delimited variable length records,
but that corresponds to LINE SEQUENTIAL in the Open-Cobol environment.
I think it would be straightforward to enhance it to support Open-Cobol
SEQUENTIAL (i.e. records prefixed with record length in binary)
or RELATIVE files.
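
As a rough illustration of the difference from newline-delimited input, the sketch below reads and writes length-prefixed records. The prefix layout (a 2-byte big-endian length) is an assumption made for the example only; the real Open-Cobol SEQUENTIAL layout would need to be checked. `read_prefixed` and `write_prefixed` are hypothetical names.

```python
import io
import struct

PREFIX = ">H"  # ASSUMPTION: 2-byte big-endian record length, for illustration only


def write_prefixed(stream, record):
    """Write one record preceded by its binary length."""
    stream.write(struct.pack(PREFIX, len(record)))
    stream.write(record)


def read_prefixed(stream):
    """Yield records from a stream of length-prefixed records."""
    size = struct.calcsize(PREFIX)
    while True:
        header = stream.read(size)
        if not header:
            return  # clean end of file
        if len(header) < size:
            raise ValueError("truncated length prefix")
        (length,) = struct.unpack(PREFIX, header)
        body = stream.read(length)
        if len(body) < length:
            raise ValueError("truncated record")
        yield body


buf = io.BytesIO()
for rec in (b"PEAR", b"APPLE\nPIE", b"FIG"):
    write_prefixed(buf, rec)
buf.seek(0)
records = list(read_prefixed(buf))
```

Because the length is carried in the prefix, a record may contain a newline byte (as `b"APPLE\nPIE"` does) without being split, which a line-oriented reader could not handle.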

Agreed, it is designed for variable length records and delimited keys,
but that is a more difficult problem than fixed offset fixed size keys.
UN*X sort has to scan through each record looking for the keys.
Again I think it would be straightforward to enhance it to support
fixed offset fixed sized keys for the Open Cobol data types.
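
A small sketch of why fixed keys are the easier case: with fixed-length records, key extraction is a plain slice and no scanning for delimiters is needed. The record layout below (4-byte id followed by a 5-byte name) and the helper names are invented for the example.

```python
def split_fixed(data, record_length):
    """Cut a byte string into fixed-length records."""
    if len(data) % record_length != 0:
        raise ValueError("data is not a whole number of records")
    return [data[i:i + record_length] for i in range(0, len(data), record_length)]


def fixed_key(offset, length):
    """Return a key function that slices a fixed offset/length field."""
    return lambda record: record[offset:offset + length]


# Hypothetical layout: bytes 0-3 are an id, bytes 4-8 are a name.
data = b"0003CAROL0001ALICE0002BOB  "
records = split_fixed(data, 9)

by_id = sorted(records, key=fixed_key(0, 4))
by_name = sorted(records, key=fixed_key(4, 5))
```

This compares the key bytes as they are; numeric COBOL types such as packed decimal would need a type-aware comparison instead of a raw byte compare.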

It also includes all the GNU mechanisms including a suite of test cases
that could be built on.
The GNU mechanisms do provide some rigor and portability,
but working within them may be the biggest challenge.

I am still leaning towards Open Cobol's "-std=<dialect>" mechanism
to specify different forms of sort parameters.
I don't know much about IP issues, but Open Cobol seems to get away
with this approach.
One of the <dialect>s could be 'unix' to continue with the existing
capabilities.
You say you have a YACC grammar for SORT syntax.
If you want to post it somewhere that I can pick up,
then I will take a look at how much can be easily implemented.

Regards,
Bob Moenck


-----Original Message-----
From: David Essex [mailto:address@hidden]
Sent: May 29, 2007 12:57 AM
To: open-cobol-list
Cc: Moenck, Robert
Subject: Re: [open-cobol-list] Sort utility within OC


Robert Moenck wrote:

 > There seems to be a lot of interest in this issue.
 > So much, that I am not sure which E-mail to respond to.
 > I arbitrarily picked this one to add my two cents to
 > the pile.

Well, for lack of a better forum, perhaps you should post your views on
the OC mailing list. To which, I took the liberty of forwarding this reply.


 > In random order:
 > 1)
 > I agree with the modular concept you suggest.
 > My inclination is to use as many tools as are available
 > for a project like this.
 > Since we are talking about Open Cobol, we are thinking about
 > *NIX environments.
 > That being the case we should be able to leverage the *NIX
 > tool set where appropriate.
 > For example, a Mainframe Sort utility has a lot of reformatting,
 > filtering and post-processing capability.
 > I would look to the *NIX tool set to provide this.
 > It may be that you need some sort of wrapper to build shell
 > pipelines, and reformatting or filtering scripts to run behind
 > the scenes (for folks not too comfortable in the *NIX environment),
 > but I would try to "stand on other people's shoulders" for
 > something like this.
 > This wrapper could be part 1) that you identified.

The UN*X tools are designed to work with LF(CR) delimited variable length
records. And fields are usually delimited by white-space (tabs, spaces,
etc).

Main-frame tools (COBOL) are designed to work with fixed length records
and with variable length records prefixed by a binary length.

So the two methodologies are not really compatible.
It can be done, but it would be easier to change the data from
main-frame like data types to UN*X like data types.
This can be done with a simple COBOL program.


 > 2)
 > That said, then the biggest challenge would be the core sort
 > utility.
 > I took a quick skim of the GNU sort.c code.
 > It seems to me that adding comparison routines for COBOL data
 > types (e.g. packed decimal, etc.) would be less painful than
 > reinventing all the management functions (e.g. temp file
 > handling, etc.).
 > Perhaps some code could be scarfed from Open-Cobol's libcob.
 > This in turn uses the GNU MP multi-precision arithmetic package,
 > so some analysis is required.

The core of the SORT utility would require comparison and sum
functions for COBOL data types. Both of these functions are available in
the OC run-time library.
So in essence all you would need to do is create the RT compatible
structures, use the OC RT functions, and then move the data.
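
To show what "compare and sum" means for one COBOL data type, here is a sketch that decodes packed decimal (COMP-3) into an integer. It is not the OC run-time API; `unpack_comp3` is a hypothetical helper. It assumes the usual layout of two BCD digits per byte with the sign in the final nibble (0xC positive, 0xD negative, 0xF unsigned); other sign codes are rejected here, and the implied decimal point is ignored, so only fields of the same scale should be compared.

```python
def unpack_comp3(raw):
    """Decode a packed decimal (COMP-3) field into a Python int."""
    if not raw:
        raise ValueError("empty field")
    digits = []
    for byte in raw[:-1]:
        digits.append(byte >> 4)
        digits.append(byte & 0x0F)
    digits.append(raw[-1] >> 4)       # last byte: one digit ...
    sign_nibble = raw[-1] & 0x0F      # ... followed by the sign nibble
    if any(d > 9 for d in digits):
        raise ValueError("invalid BCD digit")
    if sign_nibble not in (0x0C, 0x0D, 0x0F):
        raise ValueError("unsupported sign nibble")
    value = 0
    for d in digits:
        value = value * 10 + d
    return -value if sign_nibble == 0x0D else value


fields = [bytes([0x12, 0x3C]), bytes([0x00, 0x5D]), bytes([0x04, 0x2F])]
ordered = sorted(fields, key=unpack_comp3)    # comparison
total = sum(unpack_comp3(f) for f in fields)  # sum
```

A raw byte compare would put the negative value `00 5D` first only by accident of its digits; decoding first is what makes the ordering numerically correct.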

I don't know how useful the GNU sort code would be under the
circumstances. But I suspect it would be easier to adapt basic core sort
algorithms. These sources are available on the NET.
You could have multiple sort algorithms if you wanted to do so.

Personally, I think a simple merge (tape) sort would be a good start.
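
For reference, a merge (tape) sort can be sketched in a few lines: sort chunks that fit in memory, write each as a run to a temporary file, then merge the runs sequentially. This is an illustration only, not CSORT or OC code; `external_sort` and `max_in_memory` are names invented here, and the input is assumed to be newline-terminated text lines.

```python
import heapq
import tempfile


def external_sort(lines, max_in_memory=1000):
    """Sort newline-terminated lines using sorted runs in temporary files."""
    runs = []
    chunk = []

    def flush():
        # Sort the current chunk and write it out as one run.
        chunk.sort()
        run = tempfile.TemporaryFile(mode="w+")
        run.writelines(chunk)
        run.seek(0)
        runs.append(run)
        chunk.clear()

    for line in lines:
        chunk.append(line)
        if len(chunk) >= max_in_memory:
            flush()
    if chunk:
        flush()

    try:
        # heapq.merge reads each run sequentially, like tapes being merged.
        yield from heapq.merge(*runs)
    finally:
        for run in runs:
            run.close()


result = list(external_sort(["c\n", "a\n", "d\n", "b\n"], max_in_memory=2))
```

Each run is only ever read front to back, which is why the same approach worked on tape drives and works with plain temporary files.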

 > ...
 > 4)
 > With regard to what syntax should be supported, my inclination
 > is to be as useful/general as possible.
 > Perhaps something like Open Cobol's -std=<dialect> parm could
 > be used, and various syntaxes or capabilities could be supported.

Yes, I think that could be done.
But there are some IP issues which need to be considered with
this option.


 > 5)
 > I confess to being a Perl fan and am sorry that you have had
 > bad experiences with it.
 > I would propose it for the wrapper in an initial implementation
 > because it is part of the *NIX tool set and comes with a coral
 > reef of enhancements.
 > For example, there are parser modules that could process the
 > SORT syntax given a grammar for them.
 > Using Perl a prototype Sort utility could be put together quickly.
 > Given a prototype, others might be tempted to try it out and provide
 > feedback which could lead to improvements.
 > 6)
 > In the long run, once ideas and design have gelled, things could
 > be rewritten in C (say).
 > 7)
 > To my mind an important first step is to identify a "minimal useful
 > capability" for a sort utility.
 > This would be a collection of enough features that others would use
 > the sort utility as opposed to hand coding something.
 > This "minimal useful capability" could be provided as a prototype
 > and we could start getting feedback from users.
 > Maybe Roger's comments apply here.

I have implemented a SORT syntax, using YACC, which is mostly complete.
The critical part however, in whatever language is used, is the ability to
compare and sum COBOL data types.

This functionality could be implemented in a Perl module.

In my view however, it would be less work to use the OC run-time and
write it in C.

BTW, there is a minimal SORT utility called CSORT (1).
It uses a tape sort, and it was originally implemented for DOS, but it
does run on UN*X and Win32, with minor modifications.

Cheers




