emacs-orgmode
From: Christophe Pouzat
Subject: Re: [O] Efficiency of Org v. LaTeX v. Word ---LOOK AT THE DATA!
Date: Sun, 28 Dec 2014 22:40:24 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.3.0

Hi all,

After seeing Ken's mail:

On 26/12/2014 at 23:47, Ken Mankoff wrote:
People here might be interested in a publication from [2014-12-19 Fri]
available at http://dx.doi.org/10.1371/journal.pone.0115069

Title: An Efficiency Comparison of Document Preparation Systems Used
in Academic Research and Development

Summary: Word users are more efficient and make fewer errors than even
experienced LaTeX users.

Someone here should repeat the experiment and add Org into the mix, perhaps
Org -> ODT and/or Org -> LaTeX, and see if it helps or hurts. I assume
Org would trump LaTeX, but would Org -> ODT or Org -> X -> DOCX (via
pandoc) beat straight Word?

   -k.


and some of the replies it triggered on the list, I went to check the paper. Like many of you, I found some "results" puzzling, in particular:
1. the use of bar graphs when the data would be better displayed directly (that alone qualifies the paper as "low quality" for me);
2. the larger error bars observed for LaTeX compared to Word;
3. the systematic inverse relationship between the heights of the blue and pink bars.

So I went to figshare, downloaded the data and looked at them. A quick and dirty "analysis" is attached to this mail in PDF format (generated with Org, of course, and that awful software called LaTeX!); the source Org file can be found at the bottom of this mail. I used R for the figures (and I'm sure the authors of the paper will criticize me for not using Excel, with which, as everyone knows, errors are generated much more efficiently).

I managed to understand the inverse relationship in point 3 above: the authors considered 3 types of mistakes / errors:
1. Formatting errors and typos.
2. Orthographic and grammatical errors.
3. Missing words and signs.
Following the mail of Tom (Dye) on the list and on the Plos web site, I would argue that formatting errors in LaTeX are bona fide bugs. But the point I want to make is that the third source accounts for 80% of the total errors (what is shown as pink bars in the paper), and the authors clearly counted whatever the subjects did not have time to type as an error of this type. Said differently, the blue and pink bars systematically show the same thing by construction! The second type of error is not a LaTeX issue (and in fact does not differ significantly from the Word case) but an "environment" issue (what spelling checker did the LaTeX users have access to?).
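To make the accounting point concrete, here is a minimal R sketch on made-up numbers (nothing from the study's data set): if every untyped word is counted as a "missing words and signs" error, then that error count is a deterministic function of the completion fraction, so the two quantities must mirror each other.

```r
## Synthetic illustration (made-up numbers, not the study's data):
## if every untyped word is counted as a "missing words and signs"
## error, the error count is determined by the completion fraction.
set.seed(1)
total_words <- 500                   # hypothetical length of the text to copy
fraction_typed <- runif(10, 0.4, 1)  # 10 simulated subjects
missing <- round(total_words * (1 - fraction_typed))
cor(fraction_typed, missing)         # close to -1, by construction
```

The correlation is essentially -1 whatever the simulated completion fractions are, which is the sense in which the blue and pink bars carry the same information.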

There is another strange thing in the table-copy case. In both the expert and the novice LaTeX groups, one subject out of 10 produced 0% of the table yet still managed to produce 22 typographic errors!

The overall worse performance of LaTeX users remains to be explained and, as mentioned in one of the mails on the list, it does not make sense, at least for the continuous-text exercise. The methods section of the paper is too vague, but my guess is that some LaTeX users tried to reproduce the exact layout of the text they had to copy, something LaTeX is definitely not designed to provide quickly.

One more point: how many of you could specify your total number of hours of experience with LaTeX (or with any other software you currently use)? That is what the subjects of this study had to specify...

Let me know what you think,

Christophe

My org buffer:

#+TITLE: An Efficiency Comparison of Document Preparation Systems Used in Academic Research and Development: A Re-analysis.
#+DATE: <2014-12-28 dim.>
#+AUTHOR: Christophe Pouzat
#+EMAIL: address@hidden
#+OPTIONS: ':nil *:t -:t ::t <:t H:3 \n:nil ^:t arch:headline
#+OPTIONS: author:t c:nil creator:comment d:(not "LOGBOOK") date:t
#+OPTIONS: e:t email:nil f:t inline:t num:t p:nil pri:nil stat:t
#+OPTIONS: tags:t tasks:t tex:t timestamp:t toc:nil todo:t |:t
#+CREATOR: Emacs 24.4.1 (Org mode 8.2.10)
#+DESCRIPTION:
#+EXCLUDE_TAGS: noexport
#+KEYWORDS:
#+LANGUAGE: en
#+SELECT_TAGS: export
#+LaTeX_HEADER: \usepackage{alltt}
#+LaTeX_HEADER: \usepackage[usenames,dvipsnames]{xcolor}
#+LaTeX_HEADER: \renewenvironment{verbatim}{\begin{alltt} \scriptsize \color{Bittersweet} \vspace{0.2cm} }{\vspace{0.2cm} \end{alltt} \normalsize \color{black}}
#+LaTeX_HEADER: \definecolor{lightcolor}{gray}{.55}
#+LaTeX_HEADER: \definecolor{shadecolor}{gray}{.85}
#+LaTeX_HEADER: \usepackage{minted}
#+LaTeX_HEADER: \hypersetup{colorlinks=true}

#+NAME: org-latex-set-up
#+BEGIN_SRC emacs-lisp :results silent :exports none
(setq org-latex-listings 'minted)
(setq org-latex-minted-options
      '(("bgcolor" "shadecolor")
        ("fontsize" "\\scriptsize")))
(setq org-latex-pdf-process
      '("pdflatex -shell-escape -interaction nonstopmode -output-directory %o %f"
        "biber %b"
        "pdflatex -shell-escape -interaction nonstopmode -output-directory %o %f"
        "pdflatex -shell-escape -interaction nonstopmode -output-directory %o %f"))
#+END_SRC

* Introduction
This is a re-analysis of the data presented in [[http://dx.doi.org/10.1371/journal.pone.0115069][An Efficiency Comparison of Document Preparation Systems Used in Academic Research and Development]]. My "interest" in this paper was triggered by a discussion on the [[http://article.gmane.org/gmane.emacs.orgmode/93655][emacs org mode mailing list]]. Ignoring the "message" of the paper, what struck me was the systematic use of bar graphs, a way of displaying data that *should never be used*: when many observations are available, a box plot does a much better job, and when, as in the present paper, few observations are available (10 in each of the 4 categories), a direct display or even a simple table does a *much better* job. Since the data are available both on the Plos web site and on [[http://figshare.com/articles/_An_Efficiency_Comparison_of_Document_Preparation_Systems_Used_in_Academic_Research_and_Development_/1275631][figshare]], I decided to re-analyze them.
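As an aside, the difference between the two displays is easy to demonstrate with a self-contained sketch on synthetic numbers (nothing here comes from the study): two groups can produce similar-looking bars while their spreads differ wildly; plotting the 10 points of each group directly keeps that information.

#+BEGIN_SRC R :exports both
## Synthetic data (not the study's): comparable means, very different spreads.
set.seed(42)
a <- rnorm(10, mean = 50, sd = 2)    # tight group
b <- rnorm(10, mean = 50, sd = 20)   # dispersed group
## the bar-graph view reduces each group to a single height
c(mean(a), mean(b))
## the direct display shows every observation
stripchart(list(tight = a, dispersed = b), vertical = TRUE,
           method = "jitter", pch = 1, ylab = "value")
#+END_SRC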

* Getting the data, etc.

We get the data with:

#+BEGIN_SRC sh
wget http://files.figshare.com/1849394/S1_Materials.xlsx
#+END_SRC

#+RESULTS:
Using, for instance, [[http://dag.wiee.rs/home-made/unoconv/][unoconv]], we can convert the =Excel= file into a friendlier =csv= file:

#+BEGIN_SRC sh
unoconv -f csv S1_Materials.xlsx
#+END_SRC

#+RESULTS:
We then read the data with =R='s =read.csv= function (note the =dec=","= argument: the file uses decimal commas):

#+NAME: data-table
#+BEGIN_SRC R :session *R* :results silent
efficiency <- read.csv("S1_Materials.csv",header=TRUE,dec=",")
#+END_SRC
The description of this table is obtained with:

#+BEGIN_SRC sh :exports both :results output
wget http://files.figshare.com/1849395/S2_Materials.txt
cat "S2_Materials.txt"
#+END_SRC

* Making some figures
We can now make a figure out of the same data as figures 4, 5 and 6 of the paper but showing the actual data. We start with the "continuous text" exercise. We represent, in each of the four categories, each of the 10 individuals by a number between 0 and 9. Some horizontal jitter has been added to avoid overlaps. Category 1 corresponds to expert =Word= users; 2 to novice =Word= users; 3 to expert \LaTeX{} users; 4 to novice \LaTeX{} users:

#+HEADER: :file continuous.png :width 1000 :height 1000
#+BEGIN_SRC R :session *R* :exports both :results output graphics
layout(matrix(1:4,nc=2,byrow=TRUE))
par(cex=2)
plot(c(1,4),c(0,100),type="n",xlim=c(0.5,4.5),ylim=c(0,100),
     xlab="User category",ylab="",main="Fraction of text")
with(efficiency,
     sapply(1:4,
            function(k) points(runif(10,k-0.2,k+0.2),
                               PROZENT1[Kenntnisse==k],
                               pch = paste(0:9))))

with(efficiency,
     plot(c(1,4),c(0,100),type="n",
          xlim=c(0.5,4.5),ylim=range(FEHLERSFT),xlab="User category",
          ylab="",main="Formatting errors and typos"))
with(efficiency,
     sapply(1:4,
            function(k) points(runif(10,k-0.2,k+0.2),
                               FEHLERSFT[Kenntnisse==k],
                               pch = paste(0:9))))

with(efficiency,
     plot(c(1,4),c(0,100),type="n",xlim=c(0.5,4.5),
          ylim=range(FEHLEROFT),xlab="User category",ylab="",
          main="Orthographic and grammatical mistakes"))
with(efficiency,
     sapply(1:4,
            function(k) points(runif(10,k-0.2,k+0.2),
                               FEHLEROFT[Kenntnisse==k],
                               pch = paste(0:9))))

with(efficiency,
plot(c(1,4),c(0,100),type="n",xlim=c(0.5,4.5),ylim=range(FEHLENDFT),
          xlab="User category",ylab="",main="Missing words and signs"))
with(efficiency,
     sapply(1:4,
            function(k) points(runif(10,k-0.2,k+0.2),
                               FEHLENDFT[Kenntnisse==k],
                               pch = paste(0:9))))
#+END_SRC


Notice that the number of "missing words and signs" exactly mirrors the fraction of written text. We will see that this observation also holds for the two following exercises. The "missing words and signs" count is always roughly ten times as large as the two other sources of mistakes. This explains the inverse relationship between the blue and pink bars in each of the 3 figures.
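A quick numerical check of this mirroring (assuming the =efficiency= data frame read above is still in the =*R*= session; column names as in the =csv= file):

#+BEGIN_SRC R :session *R* :exports both
## correlation between fraction produced and "missing words and signs",
## for the text, table and equation exercises; strongly negative values
## support the "same thing by construction" reading
with(efficiency,
     c(text     = cor(PROZENT1, FEHLENDFT),
       table    = cor(PROZENT2, FEHLENDT),
       equation = cor(PROZENT3, FEHLENDFOR)))
#+END_SRC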

Let's keep going with the "table exercise":

#+HEADER: :file table.png :width 1000 :height 1000
#+BEGIN_SRC R :session *R* :exports both :results output graphics
layout(matrix(1:4,nc=2,byrow=TRUE))
par(cex=2)
plot(c(1,4),c(0,100),type="n",xlim=c(0.5,4.5),ylim=c(0,100),
     xlab="User category",ylab="",main="Fraction of text")
with(efficiency,sapply(1:4,
                       function(k) points(runif(10,k-0.2,k+0.2),
                                          PROZENT2[Kenntnisse==k],
                                          pch = paste(0:9))))

with(efficiency,plot(c(1,4),c(0,100),type="n",xlim=c(0.5,4.5),
                     ylim=range(FEHLERST),xlab="User category",
                     ylab="",main="Formatting errors and typos"))
with(efficiency,sapply(1:4,
                       function(k) points(runif(10,k-0.2,k+0.2),
                                          FEHLERST[Kenntnisse==k],
                                          pch = paste(0:9))))

with(efficiency,plot(c(1,4),c(0,100),type="n",xlim=c(0.5,4.5),
                     ylim=range(FEHLEROT),xlab="User category",
                     ylab="",main="Orthographic and grammatical mistakes"))
with(efficiency,sapply(1:4,
                       function(k) points(runif(10,k-0.2,k+0.2),
                                          FEHLEROT[Kenntnisse==k],
                                          pch = paste(0:9))))

with(efficiency,plot(c(1,4),c(0,100),type="n",xlim=c(0.5,4.5),
                     ylim=range(FEHLENDT),xlab="User category",ylab="",
                     main="Missing words and signs"))
with(efficiency,sapply(1:4,
                       function(k) points(runif(10,k-0.2,k+0.2),
                                          FEHLENDT[Kenntnisse==k],
                                          pch = paste(0:9))))
#+END_SRC

We also see a strange thing here: among both the expert and the novice \LaTeX{} users, one individual did not write anything but still managed to produce 22 "formatting errors and typos" (!), though luckily no orthographic or grammatical errors...

#+BEGIN_SRC R :session *R* :exports both
with(efficiency,cbind(c(PROZENT2[Kenntnisse==3][10],
                        FEHLERST[Kenntnisse==3][10],
                        FEHLEROT[Kenntnisse==3][10],
                        FEHLENDT[Kenntnisse==3][10]),
                      c(PROZENT2[Kenntnisse==4][7],
                        FEHLERST[Kenntnisse==4][7],
                        FEHLEROT[Kenntnisse==4][7],
                        FEHLENDT[Kenntnisse==4][7])))
#+END_SRC


Now for the "equations" exercise:

#+HEADER: :file equation.png :width 1000 :height 1000
#+BEGIN_SRC R :session *R* :exports both :results output graphics
layout(matrix(1:4,nc=2,byrow=TRUE))
par(cex=2)
plot(c(1,4),c(0,100),type="n",xlim=c(0.5,4.5),ylim=c(0,100),
     xlab="User category",ylab="",main="Fraction of text")
with(efficiency,
     sapply(1:4,
            function(k) points(runif(10,k-0.2,k+0.2),
                               PROZENT3[Kenntnisse==k],
                               pch = paste(0:9))))

with(efficiency,
     plot(c(1,4),c(0,100),type="n",xlim=c(0.5,4.5),
          ylim=range(FEHLERSFOR),xlab="User category",ylab="",
          main="Formatting errors and typos"))
with(efficiency,
     sapply(1:4,
            function(k) points(runif(10,k-0.2,k+0.2),
                               FEHLERSFOR[Kenntnisse==k],
                               pch = paste(0:9))))

with(efficiency,
plot(c(1,4),c(0,100),type="n",xlim=c(0.5,4.5),ylim=range(FEHLEROFOR),
          xlab="User category",ylab="",
          main="Orthographic and grammatical mistakes"))
with(efficiency,
     sapply(1:4,
            function(k) points(runif(10,k-0.2,k+0.2),
                               FEHLEROFOR[Kenntnisse==k],
                               pch = paste(0:9))))

with(efficiency,
     plot(c(1,4),c(0,100),type="n",xlim=c(0.5,4.5),
          ylim=range(FEHLENDFOR),xlab="User category",ylab="",
          main="Missing words and signs"))
with(efficiency,
     sapply(1:4,
            function(k) points(runif(10,k-0.2,k+0.2),
                               FEHLENDFOR[Kenntnisse==k],
                               pch = paste(0:9))))
#+END_SRC



--
A Master Carpenter has many tools and is expert with most of them. If you only 
know how to use a hammer, every problem starts to look like a nail. Stay away 
from that trap.

Richard B Johnson.

--

Christophe Pouzat
MAP5 - Mathématiques Appliquées à Paris 5
CNRS UMR 8145
45, rue des Saints-Pères
75006 PARIS
France

tel: +33142863828
mobile: +33662941034
web: http://xtof.disque.math.cnrs.fr

Attachment: EfficiencyComparison.pdf
Description: Adobe PDF document

