Re: webarchive saved web pages
From: Fred Kiefer
Subject: Re: webarchive saved web pages
Date: Sat, 22 Nov 2014 23:53:33 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.2.0
I have this small tool that I have been using for ages to convert
property lists into different formats to analyse them in detail.
I never thought it would be useful for anybody else, but feel free to
try it on your webarchives.
You will end up with another property list that is just as readable by
GNUstep as the original one, but its content might be easier for you to
read. You will see, just as Ivan described, that there is one
WebMainResource plus a list of WebSubresources, all with the same
structure. Under the key WebResourceData of each you will find the file
content. I did not look at how these files are encoded; judging from
the original file, I would expect them to be another binary property
list. Maybe they are in NSFileWrapper serialized format, a format that
GNUstep is still missing.
Fred
On 17.11.2014 at 16:23, Ivan Vučica wrote:
> It's a binary plist. I'm unaware of a plist editor or converter
> targeting GNUstep as such, but I think GNUstep does support bplists.
> Hence, you should be able to write a converter/extractor yourself.
>
> However, I'm looking at an arbitrary .webarchive I found online. I'm only
> guessing the structure and contents from viewing this file using 'less'.
> So -- and I'm only guessing -- it seems to me that files don't have a
> "local" name or ID, and aren't referenced using one. That is, HTML
> doesn't seem to be rewritten to use a local file. I would suspect that
> Safari intercepts resource loads and serves content stored in
> NSDictionary, which (if I'm guessing right) is ingenious.
>
> It does mean a trivially 'unpacked' file will not find its local copy of
> resources.
>
> Perhaps the right way to do this is to take a web browser apart and hack
> it to access appropriate files in the .webarchive when a resource load
> is requested. Or even to build a GNUstep browser. If you're enthusiastic
> enough, Chromium Embedded Framework sounds like a neat way to build a
> GNUstep browser.
> https://code.google.com/p/chromiumembedded/
>
> On Mon, Nov 17, 2014 at 2:58 PM, Gerold Rupprecht <geroldr@bluewin.ch
> <mailto:geroldr@bluewin.ch>> wrote:
>
> Hi,
>
> Has anyone been able to unarchive a "webarchive" formatted file?
>
> Apparently it is made with the Safari browser. Are there any options
> other than installing Safari and Wine on a GNU/Linux machine?
>
> Any other advice?
>
> Thanks,
>
> Gerold
Attachment: plconv.m (text data)