From: SciFi
Subject: [Pan-users] Here's how I use open-source tools expressly designed for uploading binary files to Usenet.
Date: Thu, 16 Jun 2011 17:17:27 +0000 (UTC)
User-agent: Pan/0.135 (Tomorrow I'll Wake Up and Scald Myself with Tea; GIT 5acc022 release (address@hidden); x86_64-apple-darwin10.7.0; gcc-4.2.1 (build 5666 (dot 3) 32-bit))


Hi,

I might be using MacOSX, but I try doing everything the *ix/X11 way,
including uploading binary files to the world-wide Usenet.

There *are* several tools available for *ix users
to do file-uploads to Usenet.

(heh, I've always wanted to document how I do this
 without any of those crazy GUI apps that hide everything going on)

In the realm of *ix systems,
taking small(er) pieces of code
and putting them together
to do a larger task
is the very nature of the beast.  ;)

Since before the imhotep82 tree became available
(and THANK YOU Mr.Mueller, K.Haley, and the others),
I have been using the yencee project at SourceForge
http://yencee.sourceforge.net/
http://sourceforge.net/projects/yencee/
(or from your repo if it's available there)
for the actual uploading to NNTP servers.
Its main code is written in Perl
(using the well-tested NNTP functions therein)
with a C module to do the actual yEnc encoding.
This is /very/ customizable,
which means ya gotta learn some things
about NNTP and file distribution etc.
before making effective use of it
(and not p!$$ing other users off <g>).

To make the files ready for uploading,
sometimes I use the standard CLI compression tools
such as RAR, 7z, etc.
(if a "family" of files need to be combined, such as DVD VOBs),
with the options to split the archive up into "parts" (or "volumes")
that will fit under the NNTP server's article size limit.
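
For example (a rough sketch only; the file names and the
1400k volume size are illustrative, chosen to leave headroom
under a ~1.5-megabyte article limit):

  rar a -v1400k -m0 myshow.rar VIDEO_TS/*.VOB    # -v = volume size, -m0 = store
  7z  a -v1400k -mx=0 myshow.7z VIDEO_TS/*.VOB   # same idea with 7z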

Mostly when uploading a single big file (AVI, MPEG, etc.)
(which is already compressed by its very nature),
I use the GNU coreutils 'split' command
<http://www.gnu.org/software/coreutils/>
(or from your repo if it's available there)
or one of the simple 'bsplit.c' programs easily found around the web
(such as these versions:
<http://www.linuxforums.org/forum/ubuntu-linux/148798-split-command-need-some-help.html#post707718>
<http://iraf.noao.edu/iraf/ftp/util/bsplit.c>),
again to fit each "part" under the NNTP server's article size limit.
The other-end user (downloading your post)
will use simple concatenation tools
to restore the original file on his/her/its end.
(*ix users simply use 'cat',
while m$ users can use 'copy /b',
on all files in proper order.)
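
A bare-bones sketch of both ends of that round trip
(file names and sizes purely illustrative;
more on the suffix numbering just below):

  # poster's end: 3-digit numeric suffixes, ~1.4 MB per piece
  split -b 1400K -d -a 3 bigmovie.avi bigmovie.avi.
  # gives bigmovie.avi.000, bigmovie.avi.001, bigmovie.avi.002, ...

  # downloader's end, *ix:
  cat bigmovie.avi.* > bigmovie.avi

  # downloader's end, m$ (all parts, in proper order):
  copy /b bigmovie.avi.000+bigmovie.avi.001+bigmovie.avi.002 bigmovie.avi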

I'll discuss the article-size topic further below;
it's very important to understand.

But let me get to the numbered suffix topic.
One of the things about m$ users
is that they want the first numbered split file
e.g. with the .000 suffix
to be some kind of "index" for the other numbered files
(to be used in m$ apps such as MasterSplitter etc.).
Of course, *ix was here on this planet before any m$ system,
with .000 being the _first_ split file,
full of required useful data for reconstructing the original file,
not their "index" file of any kind.
So, invariably, the .000 file causes confusion
with the way Usenet has been taken over by m$ users.
I've found the GNU-coreutils 'split.c' module is written
in a very ugly manner, and can't be easily altered to change the way
its suffix-generator code is fashioned.
Instead, to try remedying this,
I have recently switched to the simple 'bsplit.c' file-splitter
(one of those mentioned above)
and applied a very minor patch to it
to increase the number of digits in the generated suffix (from 2 to 3)
and to cause the first numbered split file to start with .001
with *no* Zero file .000 generated at all --
I'm sure this is a simple-enough exercise for you-all to perform.  ;)
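
For anyone who'd rather not patch the C at all, here is the same
idea as a rough, untested shell sketch (the script name and the
default part size are my own illustrative choices):
3-digit suffixes starting at .001, with no .000 ever created.

  #!/bin/sh
  # usage: bsplit001.sh FILE [PARTSIZE]   (hypothetical name)
  FILE=$1
  PARTSIZE=${2:-1433600}          # bytes per part; match your article limit
  SIZE=$(wc -c < "$FILE")
  PARTS=$(( (SIZE + PARTSIZE - 1) / PARTSIZE ))
  i=1
  while [ "$i" -le "$PARTS" ]; do
      dd if="$FILE" of="$(printf '%s.%03d' "$FILE" "$i")" \
         bs="$PARTSIZE" skip=$((i - 1)) count=1 2>/dev/null
      i=$((i + 1))
  done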

What about generating PAR2 recovery blocks?
Get the parchive project at SourceForge
http://parchive.sourceforge.net/
http://sourceforge.net/projects/parchive/
(or from your repo if it's available there).
Again, this is highly customizable,
to the point ya really need to learn how to use it properly.
The main point here is
to match the par2 "blocksize"
to the split-up "partsize"
(or an even-multiple)
so each "part" can be reconstructed as an entity
if needed by the other-end user (downloading your post).
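
A minimal sketch of that matching, assuming the par2cmdline tool
(file names and sizes illustrative; note a PAR2 block size
must be a multiple of 4):

  PARTSIZE=1433600                  # the same number the splitter used
  par2 create -s$PARTSIZE -r10 bigmovie.avi.par2 bigmovie.avi.0*
  # -s = block size, -r = percent redundancy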

Let's go back to the article size topic.

I happen to be a "member" of an unorganized group of Usenet posters
who believe we need to take full advantage of the article size
and fill-it-all-up per "message" posted
when it comes to uploading binary files.
Most posters don't realize this,
because their GUI app does all the hidden work,
and they *never* fill up each article anywhere near the max possible.
Besides, those software authors are following age-old "rules"
on setting the article size way too small for today's systems.

How do you find out the maximum article size
your NNTP server is designed to hold?
This info is likely located in their FAQ section
or in their BBS/forums for that service
(sometimes paying users can access a special section).
Else, open a support ticket with their staff, and ASK them.  ;)

For example, I can use Astraweb and Giganews
(I pay for unlimited accounts on both services).
Astraweb's published info on article size is here:
<http://helpdesk.astraweb.com/index.php?_m=knowledgebase&_a=viewarticle&kbarticleid=25>
which gives us about 1.5-megabytes per article, max,
including headers etc.

For Giganews, you need to be a paid member
to access their 'private' newsgroups,
then search around for their article size info.
Turns out GN seems to have a limit of 4 megabytes per article,
which is probably too large to propagate reliably around the world.

I would then use Astraweb's smaller max size
which seems to go everywhere dandily
(yes I've tested with a few Usenet friends on other systems).

Why do we worry about using the whole article size as much as possible?
Let me ask a related question:
Are you tired of waiting for the "new headers" to come over?
If posters would use a bigger article size,
there would not need to be so-many split files
and therefore not so-many headers
for the same file.
See?:  Fewer headers for the same file, faster operations overall.  ;)

There's another reason for lessening the headers stored on a server.
In relation to the "retention wars"
going on between several Usenet providers,
the number of headers is already higher
than the U.S. Debt.  (lol)  (well maybe not /that/ high… yet…)
Giganews warned about this a few years ago:
<http://www.giganews.com/news/article/64-bit-Usenet.html>
At least one newsgroup is already feeling this pinch
(the infamous boneless group)
and this has already caused software problems in many newsreaders.
(I believe/hope the newer Pan versions have the fixes
we discussed back then.)

As a comparison, to explain the problem in detail:
Most Usenet posts I've seen,
using the popular m$ apps,
will use about 300 kilobytes per article
(this is easily seen inside NZB files btw).
Considering Astraweb's lower max-size limit,
these posters ought to set up their software
to pack about 4 times MORE lines per article.
This would generate 4 times FEWER headers
(IOW one-quarter the usual number of headers)
thus keeping the huge number problem from occurring so quickly
in other large newsgroups.
(not to mention quicker "new headers" for you!)
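
To put rough numbers on it (sizes purely illustrative,
math done with the shell's own arithmetic):

  $ echo $(( 700*1024*1024 / (300*1024) ))    # ~700 MB file at ~300 KB/article
  2389
  $ echo $(( 700*1024*1024 / (1200*1024) ))   # same file at ~1.2 MB/article
  597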


To get all these pieces of code working together,
I've written some shell scripts
that use simple built-in math functions
to compute the parms needed
to be passed to the CLI tools mentioned above.
This tames the "nature of the beast".  ;)
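
The general shape of such a glue script might be something like
this rough sketch (every number, name, and the overhead guess
are illustrative assumptions, not my exact setup):

  #!/bin/sh
  FILE=$1
  ARTICLE_MAX=1500000               # your server's max article size, bytes
  OVERHEAD=5                        # % set aside for headers + yEnc expansion (a guess)
  PARTSIZE=$(( ARTICLE_MAX * (100 - OVERHEAD) / 100 ))
  PARTSIZE=$(( PARTSIZE / 4 * 4 ))  # keep it a multiple of 4 for par2

  sh ./bsplit001.sh "$FILE" "$PARTSIZE"    # the .001-style splitter sketched earlier
  par2 create -s"$PARTSIZE" -r10 "$FILE".par2 "$FILE".0*
  # ...then feed the parts to yencee for the actual NNTP upload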

BTW
We've actually used these scripts & tools
to do the posts for certain TV series etc. in the recent past,
which should still be available on "good" servers.


Final words:
I consider Usenet to be the very-first "cloud storage system"
that was ever invented.
With the "retention wars" going on,
wherein hopefully you won't ever lose your recent-ish postings
(a big reason people will do "reposts"),
it seems these companies have that idea, too.

If Usenet starts to falter
because of SPT (Stupid Political Tricks) and such,
I for one will be a very loud human, hint-hint.

…


Footnotes:
(I think I better be pedantic here
 rather than mushy.  ;)
 Nomenclature lesson:
 When I use the term 'article',
 it also means 'message'.
 Each one needs a separate Message-ID to be locatable by the NNTP server.
 These terms do /not/ mean "the posted file"
 as you'll see in most newsreaders
 since large files are split into smaller articles
 to fit under the NNTP server's size limit.
 Most newsreaders, including Pan,
 will denote how-many articles comprise the shown file
 by the final '(number)' in the Subject line
 in the Header Pane view.
 [This also requires certain logic in the poster's software
   which uses Message-ID and References header lines
  to "join-up" the "thread" of articles
  that comprise the posted file.
  The aforementioned yencee project, among others,
  /will/ join-up the articles in this fashion.])
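
(For illustration only, with made-up Message-IDs:
 part 2 of a 3-part post might carry headers along these lines,
 which is what lets a newsreader thread it back to part 1:

   Subject: bigmovie.avi (2/3)
   Message-ID: <part2.abc123@example.invalid>
   References: <part1.abc123@example.invalid>
)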

{Does Mr.Mueller's new code in Pan do all this I've mentioned above?
 How do we adjust the "article size" etc. in his new code?
 I've not tested his file-upload functions yet.}





