Re: Correspondence between web-pages and Info-pages

emacs-devel

From:	Kelly Dean
Subject:	Re: Correspondence between web-pages and Info-pages
Date:	Tue, 30 Dec 2014 11:17:45 +0000

Stefan Monnier wrote:
> Hey, I think this is a great idea: replace the "(emacs)Title"
> syntax with a URL.  When passed to Info, these URL would be redirected
> to the local Info pages.
>
> The main downside is that those URLs would take up more space.  But the
> upside is not just greater exposure of our HTML manuals to search
> engines, but also the removal of the ad-hoc (info "(emacs)Title") syntax.

Don't overlook two important parts of this: using the same name both for user 
input and for display, and using different names for different formats of a 
page (Info vs. HTML).

Web browsers have some useful navigation features:
0. An address bar, which shows the name of the currently displayed page.
1. A drop-down menu that shows the sequence of visited pages for the current 
buffer, and the current position within that sequence.
2. In the address bar, you can enter a new name and press enter to open that 
page.
3. The name shown is the same string as the string you enter to open the page 
by name.
4. You can copy the name that's shown.
5. Because of the preceding three features, you can save the name into a text 
file that you use as a list of bookmarks, paste the name back into the address 
bar to return to the page, and use the name to cite the page so your readers 
can open it; IOW, you can use the name to link to the page.
6. The name can include a hash mark and section name at the end, so that when 
you open the page, the browser jumps to the named section.

Emacs's Info browser has feature #0, but lacks the rest. Emacs's Info-history 
command partially provides #1, but doesn't show the actual link sequence that's 
traversed by Info-history-back and Info-history-forward. Instead of #2, Emacs 
makes you remember a command («g», for Info-goto-node) for entering the name of 
the page to open. Regarding #3, for example, I'm currently viewing the page 
with the shown name ⌜(elisp)Top > Keymaps > Translation Keymaps⌝[0], but that's 
effectively like an HTML page title; it isn't the name used for opening the 
page.

([0]: I actually had to manually transcribe that name, because incredibly, 
Emacs lacks feature #4. See bug #19471.)

Features #1 and #2 would be nice to have but aren't essential, #4 is essential 
but fortunately is easy to implement, and #6 is unnecessary if pages aren't too 
long. But the lack of #3, and consequently of #5, is the major problem. If you 
adopt URL syntax for page names, be sure to not only use it for Info-goto-node, 
but also display it in the address bar in the Info browser, e.g. 
⌜http://gnu.org/emacs/24.4/docs/elisp/keymaps/translation_keymaps⌝, regardless 
of whatever other syntax (e.g. ⌜(elisp)Translation Keymaps⌝ as the short name) 
might also be usable to open the page. For #2, have the address bar be 
editable, and have Info-goto-node simply move focus to it.

There was a proposal somewhere in this ginormous thread to use the same name 
for both an Info page and an HTML page, and serve the Info page from the local 
cache but the HTML page via HTTP from the official server. That's a bad idea, 
because then the name's scope isn't global; instead, what the name resolves to 
depends on which system (local or remote-official) is queried.

If you try to fix that by relying on the User-agent or some other request 
header to choose which format to return, and having Emacs cache and use the 
format returned by sending ⌜Info⌝ for that header and having web browsers use 
the format returned by sending any other value for that header, then the URL is 
no longer the name of the page; instead, the URL+header is the name, which is a 
facepalm-inducing convention that's already a widespread plague that Emacs 
shouldn't exacerbate, akin to using URL+source-ip for page names in order to 
balkanize the web (conspicuous offenders include Google and CloudFlare).

You could instead conflate the protocol name and the page type name and say 
⌜info:gnu.org/emacs/24.4/docs/elisp/keymaps/translation_keymaps⌝ if you want. 
That would still enable feature #3. Or instead append a ⌜.info⌝ extension to 
the end of the name, as is commonly done with HTML, though that could be 
misleading if the page doesn't have its own dedicated Info file. Both of these 
require you to replace the ⌜info⌝ in the name by ⌜http⌝ or ⌜html⌝ before 
sending the name to non-Emacs users.

I propose a cleaner solution: have the name with no type extension resolve to a 
redirect. Do the redirect client-side, not server-side: serve a consistent response 
to all clients (regardless of request headers), containing both a standard HTTP 
redirect that web browsers will follow, and a new Info-file header that Info 
browsers will follow (web browsers will ignore it). The former points to a page 
with the same name but with ⌜.html⌝ appended, and the latter to the Info file 
that contains the requested Info page. This way, the extensionless URL is 
effectively the name of a directory from which browsers automatically choose 
one of two files, but the URL alone, not the URL plus a header, is the name of 
the directory, and the files have their own URLs.

When you receive page URLs from non-Emacs users, it's easy enough to chop off 
the ⌜.html⌝ extension. When you send them page URLs without the extension, 
their browsers will automatically redirect.

For example, if your browser (web or Info) sends this query for a documentation 
page:
GET /emacs/24.4/docs/elisp/keymaps/translation_keymaps HTTP/1.0
Host: gnu.org

then the response is:
HTTP/1.0 302 Found
Location: http://gnu.org/emacs/24.4/docs/elisp/keymaps/translation_keymaps.html
Info-file: http://gnu.org/emacs/24.4/docs/elisp.info

Web browsers will redirect to the URL in the Location header.
Info browsers will:
0. Fetch the file named in the Info-file header.
1. Chop the ⌜.info⌝ extension from the value of the Info-file header to get 
Info-base.
2. Chop Info-base from the front of the original page URL to get the name of 
the page (⌜/keymaps/translation_keymaps⌝ in this case) within the Info file.
3. Load that page from the file.
4. In the address bar, display the original page URL.
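
The name-derivation steps above can be sketched roughly as follows. This is 
only an illustration in Python, not a proposed implementation; the function 
name info_page_name is made up here, and the check that the page name starts 
at a ⌜/⌝ boundary is an extra assumption that the steps don't state.

```python
INFO_EXT = ".info"

def info_page_name(page_url, info_file_url):
    """Return the name of the page within the Info file.

    page_url is the original extensionless page URL; info_file_url is the
    value of the proposed Info-file response header.
    """
    if not info_file_url.endswith(INFO_EXT):
        raise ValueError("Info-file header does not name a .info file")
    # Chop the .info extension to get Info-base.
    info_base = info_file_url[: -len(INFO_EXT)]
    if not page_url.startswith(info_base):
        raise ValueError("page URL is not under Info-base")
    # Chop Info-base from the front of the page URL.
    name = page_url[len(info_base):]
    # Assumption: the remainder is empty or begins a new path segment.
    if name and not name.startswith("/"):
        raise ValueError("page URL is not under Info-base")
    return name

page = "http://gnu.org/emacs/24.4/docs/elisp/keymaps/translation_keymaps"
info_file = "http://gnu.org/emacs/24.4/docs/elisp.info"
print(info_page_name(page, info_file))
```

The address bar would keep showing the original page URL; only the lookup 
inside the fetched file uses the derived name.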

Info can send all web requests through a cache. Distribute Emacs with the cache 
preloaded with Info files, including the original URL for each of those files.
When Info queries the cache for a cached Info file, the cache returns a file 
descriptor for that file.
When Info queries for a noncached Info file, the cache downloads and caches it 
and returns a descriptor.
When Info queries for any URL that starts with a string matching the URL of 
a cached Info file (excluding the ⌜.info⌝ extension), and the query URL itself 
doesn't have a filename extension, the cache generates and returns ⌜Info-file: 
X⌝ where X is the URL of the Info file. Info then processes this as a redirect.
When Info queries for anything else, the cache sends the query to the named 
server and returns the response to Info. If the response is a redirect, Info 
processes it.
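
The four cache cases above might look something like this as a library. It's 
a toy sketch in Python under several assumptions: the class name InfoCache, 
the fetch callable, and the tagged tuples it returns are all invented for 
illustration; it hands back file contents rather than a file descriptor; and 
it matches prefixes only at a ⌜/⌝ boundary, which is stricter than what's 
described above.

```python
import posixpath
from urllib.parse import urlsplit

INFO_EXT = ".info"

class InfoCache:
    """Toy in-memory cache for Info files (illustrative only)."""

    def __init__(self, fetch):
        self.fetch = fetch  # hypothetical callable: URL -> bytes from the network
        self.files = {}     # Info-file URL -> file contents

    def preload(self, info_url, data):
        """Register an Info file shipped with Emacs under its original URL."""
        self.files[info_url] = data

    def query(self, url):
        # Cases 1 and 2: an Info file, cached already or downloaded on first use.
        if url.endswith(INFO_EXT):
            if url not in self.files:
                self.files[url] = self.fetch(url)
            return ("file", self.files[url])
        # Case 3: an extensionless URL under the base of a cached Info file
        # yields a generated "Info-file: X" redirect, with no network traffic.
        _, ext = posixpath.splitext(urlsplit(url).path)
        if not ext:
            for info_url in self.files:
                base = info_url[: -len(INFO_EXT)]
                if url == base or url.startswith(base + "/"):
                    return ("info-file", info_url)
        # Case 4: anything else is sent to the named server.
        return ("response", self.fetch(url))
```

A caller that gets back ("info-file", X) would then query the cache again for 
X and process the result as a redirect.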

This way, no network traffic is necessary for cached files. This also lets the 
same cache serve web browsers, not just Info browsers. The cache could be 
preloaded with HTML files for people who really don't like Info, and both Info 
and HTML files for people who like both. The cache doesn't need to be a server; 
it can just be a library, like sqlite is, and integrated into Emacs if only 
Info and Eww use it.

Indirecting through the Info-file header enables splitting or combining Info 
files without affecting the page URLs. E.g. elisp.info could be split up so 
that «keymaps», etc. are in separate files, or elisp.info, emacs.info, and all 
the other Info files could be combined into one big docs.info file, but with 
either of those changes, the page URLs would remain unchanged.

It doesn't matter whether URLs are used in Info files (or in Texinfo files), or 
the Info browser just translates the names for input and display. What matters 
for users is just the Info browser's UI. But if Info files use only relative 
names, then the browser must know the original URL of the file in order to 
construct the URL for each page and show that name in the address bar. 
Therefore, the browser can't just search a path on the local system to find 
Info files, like it currently does when the user runs e.g. ⌜(info "(elisp)")⌝, 
unless the file format is changed to include its own URL. Alternatively, and 
more cleanly, the browser could just query the cache and have the cache do the 
search, and the cache can return ⌜Info-file: X⌝ if it finds a match, which Info 
then processes as a redirect.

For any query without a version number embedded in the name, the server should 
respond with a redirect to the same name but with the latest version number 
embedded. This makes it easy to check for updates, and to link to the 
always-latest version of a page.
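
As a rough sketch of that versionless redirect, here it is in Python. It 
assumes that the version number, when present, is the second path segment (as 
in ⌜/emacs/24.4/docs/...⌝) and that it consists of dot-separated digits; the 
function name versioned_path is invented for this example.

```python
import re

# Assumption: versions look like "24.4", i.e. dot-separated digits.
VERSION_RE = re.compile(r"^\d+(\.\d+)*$")

def versioned_path(path, latest):
    """Return the redirect target for a versionless path, else None.

    "/emacs/docs/elisp" with latest "24.4" maps to "/emacs/24.4/docs/elisp".
    A path that already embeds a version needs no redirect.
    """
    parts = path.split("/")  # "/emacs/docs" -> ["", "emacs", "docs"]
    if len(parts) > 2 and VERSION_RE.match(parts[2]):
        return None
    # Insert the latest version right after the first path segment.
    return "/".join(parts[:2] + [latest] + parts[2:])

print(versioned_path("/emacs/docs/elisp/keymaps/translation_keymaps", "24.4"))
```

The server would send the returned path in a redirect response, and serve the 
page normally when the function returns None.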

For non-English manuals, there's no need to embed the language name in the URL; 
just use the source-ip address of the request to choose which version to serve, 
like Google does. (Just checking if anybody is still awake.)