emacs-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

recognizing a file by scanning it


From: Thien-Thi Nguyen
Subject: recognizing a file by scanning it
Date: Sun, 27 Apr 2008 13:36:50 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/23.0.60 (gnu/linux)

Some time back (within the last half-year or so) there was discussion
about Emacs being able to recognize file types by scanning content
rather than (or in addition to) using name-based heuristics.

One model for such a capability is the external command file(1), which
takes as its data a magic(5) file containing (possibly-chained) rules
specifying where and what to look for in the target file in order to
make a match, and additionally what to display on match.

For example, here is a fragment of ~/.magic ("|"-prefixed):

|# Emacs 18 - this is always correct, but not very magical.
|0      string  \012(         Emacs v18 byte-compiled Lisp data
|# Emacs 19+ - ver. recognition added by Ian Springer
|# Also applies to XEmacs 19+ .elc files; could tell them apart if we had regexp
|# support or similar - Chris Chittleborough <address@hidden>
|0      string  ;ELC
|>4     byte    >19
|>4     byte    <32           Emacs/XEmacs v%d byte-compiled Lisp data

I have written a Scheme program to translate this into sexps amenable
to both Scheme and Emacs Lisp `read'.  To continue the example:

|(0 0 string (= . "\n(") "Emacs v18 byte-compiled Lisp data")
|(0 0 string (= . ";ELC") "")
|(1 4 byte (> 19) "")
|(1 4 byte (< 32) "Emacs/XEmacs v%d byte-compiled Lisp data")

(See <http://www.gnuvola.org/data/> for the complete translation.)

The Scheme program also mimics basic file(1) functionality; it can
recognize an unknown bag of bytes using the rules in either the original
magic(5) format or the translated-to-sexps variant, displaying output
indistinguishable (for the most part) from that of "file -n -N".

|$ ls="src/temacs etc/images/info.pbm lisp/startup.el lisp/startup.elc"
|$ for f in $ls ; do file -n -N $f ; ttn-do magic $f ; done
|src/temacs: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for 
GNU/Linux 2.4.1, dynamically linked (uses shared libs), for GNU/Linux 2.4.1, 
not stripped
|src/temacs: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV)
|etc/images/info.pbm: Netpbm PBM "rawbits" image data
|etc/images/info.pbm: Netpbm PBM "rawbits" image data
|lisp/startup.el: Lisp/Scheme program text
|lisp/startup.el: Lisp/Scheme program text
|lisp/startup.elc: Emacs/XEmacs v23 byte-compiled Lisp data
|lisp/startup.elc: Emacs/XEmacs v23 byte-compiled Lisp data

Although it lacks advanced file(1) functionality (integrated ELF
grokking, charset guesstimation, fancy printf(3) output, etc), i
consider it complete enough to be a good starting point for a port to
Emacs Lisp.  (Indeed, Emacs is much nicer for implementing such features
as charset guesstimation.)

But before continuing, i would like to discover if anyone else is
working on something similar, to avoid (more?) duplicate effort.

thi




reply via email to

[Prev in Thread] Current Thread [Next in Thread]