Re: [Bug-wget] Hello again

From: michael
Subject: Re: [Bug-wget] Hello again
Date: Fri, 12 Oct 2018 13:58:11 +0300

Hello Darshit Shah,

Converting a CMS to static HTML pages is not a solution that suits everyone. 
Some sites that want to stay 'dynamic' and retain "backward flik-flak" abilities 
might not use wget2 and will keep their CMS or software behavior.

Many people creating a website use a CMS to generate the site because it keeps 
the website uniform and lets every change made in the GUI apply site-wide. 
Those people might want a static website because it is faster to download 
(a Google SEO factor) and much more secure: it hides the CMS location and 
prevents login attempts.

If those people want to retain features such as RSS feeds, we might be able to 
tell them how to do so.

If a website contains hidden pages that are reachable only through JavaScript 
code, the programmer might create a shell script that calls wget2 with the 
location of each hidden page.
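
A minimal sketch of that idea, written in Python instead of shell so that the 
command can be inspected before anything is run. The site URL, the hidden page 
paths, and the output directory are made-up placeholders, and the option set 
(--mirror, --convert-links, --page-requisites, --directory-prefix) is only an 
assumption about what a static copy would need; adjust it to the site.

```python
import shlex
from urllib.parse import urljoin


def build_command(base_url, hidden_paths, out_dir="static-site"):
    """Return one wget2 argument list covering the site and its hidden pages.

    hidden_paths holds pages that no plain HTML link points to, so the
    crawler has to be told about them explicitly.
    """
    command = [
        "wget2",
        "--mirror",             # recurse through the links wget2 can see
        "--convert-links",      # rewrite links for the local copy
        "--page-requisites",    # also fetch images, CSS and similar files
        "--directory-prefix", out_dir,
        base_url,
    ]
    # Append each hidden page as an extra start URL.
    command += [urljoin(base_url, path) for path in hidden_paths]
    return command


if __name__ == "__main__":
    # Hypothetical site and hidden pages, for illustration only.
    cmd = build_command(
        "https://example.com/",
        ["/promo/landing.html", "archive/2018.html"],
    )
    # Print the command instead of running it.
    print(shlex.join(cmd))
```

Passing the returned list to subprocess.run(cmd, check=True) would perform the 
actual download.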

Have a good weekend!


-----Original Message-----
From: 'Darshit Shah' <address@hidden> 
Sent: Thursday, 11 October, 2018 12:35 PM
To: address@hidden
Cc: address@hidden
Subject: Re: [Bug-wget] Hello again

* address@hidden <address@hidden> [181009 17:12]:
> Hello Darshit Shah,
> Thank you for your welcome message. I am glad to be part of your project!
> I don't understand the term "javascript engine". AFAIK javascript is code that 
> runs on the browser side, and we have no problem fetching it.
Exactly! Javascript is code that is executed on the client side and hence
requires a javascript engine which interprets the code and executes it.
However, Wget does not and will not package a javascript engine in order to run
those scripts. This means, sites where Javascript is used to create hyperlinks
won't work well when scraped through Wget.
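
To illustrate the point, here is a small sketch of what a crawler without a 
javascript engine sees. It uses Python's standard html.parser, not Wget's 
actual parser, and the page content is invented for the example. The link 
written in the markup is found; the one the script would create at run time 
is not.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags present in the static markup."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


# Invented page: one ordinary link, one link built by a script.
PAGE = """
<html><body>
<a href="/about.html">About</a>
<script>
  var a = document.createElement("a");
  a.href = "/hidden.html";
  document.body.appendChild(a);
</script>
</body></html>
"""


def static_links(html):
    """Return the links visible without executing any script."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    print(static_links(PAGE))
```

The script body is treated as plain text, so /hidden.html never shows up as 
a link to follow.
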
> There might be "ajax" issues with sites that rely on it. Ajax is used heavily 
> by programmers, and they will have to take some action on their site to 
> incorporate the engine.

Similarly, sites that use Javascript to show menus or create AJAX requests are
usually not amenable to being scraped as a static HTML page.
> POST requests for comments and mail will need to be taken care of so they will 
> work on a static site. One solution is to use a hosted supplier that will 
> carry out the task and deliver spam removal as well.
> I think I will be able to write a howto document on that.
> Michael
> -----Original Message-----
> From: Darshit Shah <address@hidden> 
> Sent: Tuesday, 9 October, 2018 2:52 PM
> To: address@hidden
> Cc: address@hidden
> Subject: Re: [Bug-wget] Hello again
> Hi Michael,
> Nice to hear from you again. I vaguely remember a mention of someone who wanted
> to work on this feature. When deciding to make this work, please remember that
> any of this can only work if the site does not rely on Javascript; which given
> Wordpress is a difficult thing. The reason for this is that we do _not_ intend
> to ship a javascript engine along with Wget2. It is too large, unwieldy and too
> much of a maintenance nightmare. However, if the site can work without
> Javascript, then I would assume that Wget2 can already handle making a static
> copy. If it can't handle something, please let us know / file a bug report
> about it.
> Of course, I welcome you to work on Wget2 as you see fit. And we would love to
> look at any contributions you can make. We will also try and help you out as
> much as possible when dealing with the codebase.
> About the dev setup, I only use vim and gdb to work with Wget. As Tim has
> already mentioned, he uses Netbeans and might be able to help you out.
> You also mentioned something about the lib/ directory. That is an
> auto-generated dir with compatibility libs that you don't need to care about.
> All the code for Wget2 is in src/ and the code for the library is in libwget/.
> Those are the two main directories you need to care about. And of course tests/
> for the tests.
> * address@hidden <address@hidden> [181008 21:22]:
> > 
> > Hello again,
> > 
> > My name is Michael. I approached you about a year ago.
> > 
> > I am interested in making wget2 a tool that can convert content management
> > systems (like WordPress) output to HTML. This actually limits the content
> > management system to generate the website every time it is changed, and the
> > presentation is done using the HTTP server only.
> > 
> > This is an important feature as it reduces security risks, such as a hacker
> > penetrating the site and installing viruses or stealing data.
> > It also allows the website to be delivered much faster as no PHP code needs
> > to run in order to deliver the content. Google already announced that site
> > download speed is a factor in its SEO evaluation.
> > 
> > I will be able to work for 3 hours every week on the project. I do need some
> > guidance from you.
> > 
> > I have started to configure the Netbeans IDE, as using a debugger can help me
> > delve into the code much faster. There are some issues with Netbeans. Do
> > you use an IDE? Which one?
> > 
> > Best regards,
> > 
> > Michael
> > 
> > 
> > 
> > 
> -- 
> Thanking You,
> Darshit Shah
> PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6

Thanking You,
Darshit Shah
PGP Fingerprint: 7845 120B 07CB D8D6 ECE5 FF2B 2A17 43ED A91A 35B6
