Wednesday, November 25, 2009

Spaces in TeX, part II

Finally back. A while ago, I covered spaces in TeX and why they don't always appear. Today's much shorter post covers a few additional places where spaces do not appear. The most obvious place one might expect a space, a vertical one this time, is from consecutive blank lines. The thought goes: if one blank line ends a paragraph, then surely two blank lines will also leave an empty line between paragraphs. This reasoning is incorrect. Recall from before that TeX produces a \par token when it sees an end of line character in state N. What TeX does with a \par token depends on its mode. There are six modes: vertical, internal vertical, horizontal, restricted horizontal, math, and display math. To simplify matters, let's consider only vertical and horizontal. (Internal vertical and restricted horizontal modes are similar enough to vertical and horizontal for our purposes.) TeX starts in vertical mode and switches to horizontal mode (doing things like indentation) to typeset a paragraph. When it sees a \par token, it returns to vertical mode. In vertical mode, \par tokens are ignored. Thus, while two blank lines produce two \par tokens, the first causes TeX to return to vertical mode and the second is ignored. To get additional space, use the TeX primitive \vskip or the LaTeX macro \vspace in vertical mode.

There are a few other places where TeX ignores spaces, but they are encountered fairly rarely, so I'll mention them only briefly. When TeX expects a number, the number can be given as a sequence of digits followed by an optional space, which TeX absorbs. That optional space is actually important in one situation: changing the category code of a character and then using that character immediately. While scanning the number, TeX reads the next character to see whether it is another digit, and that character is tokenized with its old category code before the assignment happens; ending the number with a space avoids the problem. Even less frequently encountered, at least by me, is starting a paragraph with an \hbox. In vertical mode, \hbox does not start a paragraph at all: TeX adds the box to the vertical list, so it sits on its own line without indentation, and the text that follows starts a new paragraph.
Using the LaTeX macro \mbox, which leaves vertical mode first, solves this problem, as does starting the paragraph explicitly with \indent or \noindent, as desired. And I think that wraps up all I have to say about spaces in TeX.
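To see these effects side by side, here is a small test document. It is only a sketch assuming a standard LaTeX installation; the active | and its "[active bar]" meaning are purely my illustrative choice. Compile it and compare the output with what the comments predict.

```latex
\documentclass{article}
\usepackage[T1]{fontenc}% so an ordinary | prints as a bar
% Give the active | a meaning; its catcode is restored by \endgroup.
\begingroup \catcode`\|=13 \gdef|{[active bar]}\endgroup
\begin{document}
First paragraph.


% Two blank lines above give two \par tokens; the second arrives in
% vertical mode and is ignored, so no extra space appears here.
Second paragraph.

\vspace{\baselineskip}% extra space must be asked for explicitly
Third paragraph.

% The space after 13 ends the number, so | is read after the
% catcode change and is active: this prints "[active bar]".
\begingroup \catcode`\|=13 | \par \endgroup

% No space: TeX reads | while looking for more digits, before the
% catcode change takes effect, so an ordinary | is printed.
\begingroup \catcode`\|=13| \par \endgroup

% In vertical mode \hbox is added to the vertical list, so "Boxed"
% sits alone on an unindented line and "text." starts a new paragraph.
\hbox{Boxed} text.

% \mbox runs \leavevmode first, so an indented paragraph starts and
% the box becomes part of its first line.
\mbox{Boxed} text.
\end{document}
```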

Friday, October 16, 2009

A quick interlude into fonts

Dealing with fonts in LaTeX is one of the hardest aspects of using it. Fortunately, if one does not want to use Knuth's Computer Modern fonts, there is a very simple way to change font families. Full details are here, but for a publication that requires a Times roman typeface, one should use
\usepackage{mathptmx}
\usepackage[scaled=.92]{helvet}
\usepackage{courier}
which sets the roman and math fonts in Times, the sans serif font in Helvetica (scaled to 92% so that it better matches the other fonts), and Courier for the typewriter family. These three look nice together. In addition, the output font encoding can be changed to T1 with
\usepackage[T1]{fontenc}
which is recommended. For more details, see the link above.
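Putting these lines together, a minimal test document might look like the following. It is just a sketch: the sample text is mine, and it assumes the mathptmx, helvet, and courier packages are installed (they are part of the standard PSNFSS bundle).

```latex
\documentclass{article}
\usepackage[T1]{fontenc}          % T1 output encoding
\usepackage{mathptmx}             % Times for roman text and math
\usepackage[scaled=.92]{helvet}   % Helvetica for sans serif, scaled down
\usepackage{courier}              % Courier for typewriter text
\begin{document}
Roman text in Times, \textsf{sans serif text in Helvetica},
\texttt{typewriter text in Courier}, and math in Times:
$\sin^2 x + \cos^2 x = 1$.
\end{document}
```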

Wednesday, October 14, 2009

Spaces in TeX

Spaces appear all over .tex files but only some of them appear as actual spaces in the output. To understand this, we first need to understand how TeX reads lines of input.

When reading input, TeX is in one of three states: state N when it is at the beginning of a new line, state M when it is in the middle of a line, and state S when it is skipping spaces. TeX discards space characters in every state except M. Basically, TeX starts in state N and, on the first nonspace character (actually, it's slightly more complicated, but for the purposes of this post, just treat tabs as spaces), it switches to state M. In state M, each character read is turned into a token, except that a control sequence is turned into a single token (again, it's more complicated than that, but this will suffice). When a space is encountered, TeX creates a space token and enters state S. A nonspace character brings TeX back into state M. As an example, consider the line of input:
Hello      \TeX!
TeX begins in state N and then upon reading the H transitions into state M and produces an H token. Then e, l, l, and o tokens are produced in turn.

Upon reading the first space, TeX produces a space token and enters state S. The rest of the spaces up to the \ are ignored. When TeX reads the \, it scans the rest of the control sequence and produces a single \TeX token. Finally, TeX produces a ! token. I haven't said what state TeX enters after it scans a control sequence. The answer depends on the type of control sequence. If the first character after the \ is not a letter, for example a symbol like @ or #, then TeX produces a control symbol token (for example, \@ or \#). In this case, TeX enters (or remains in) state M. If instead the first character after the \ is a letter, then TeX reads a control word consisting of the \ and all the letters that follow, and then enters state S. This explains why TeX ignores spaces after control words like \TeX or \bf. So in the example above, since \TeX is a control word, TeX enters state S after reading it and then immediately enters state M when it reads the !.
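Here's a quick way to check this behavior yourself: a sketch of a test document (the sample words are arbitrary) showing the space swallowed after a control word, three common ways to keep it, and a control symbol that leaves the following space alone.

```latex
\documentclass{article}
\begin{document}
% After the control word \TeX, TeX is in state S, so this space is
% skipped and the output runs together as "TeXis".
\TeX is

% Three common ways to keep the space after a control word:
% an empty group, a control space, or braces around the word.
\TeX{} is, \TeX\ is, {\TeX} is

% After the control symbol \#, TeX stays in state M, so the space
% that follows is kept: "# sign".
\# sign
\end{document}
```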

Before we move on, there are two points I skipped. First, before TeX starts processing a line of input, it deletes all space characters at the right end of the line and appends a carriage return character, which by default is the end of line character. Second, to conclude the discussion of a single line, we need to know what happens with comment characters and end of line characters. For a comment character, everything on the rest of the input line is thrown away, and TeX starts on the next line of input in state N. For an end of line character, TeX throws away all remaining input on the line (just like a comment) and then does one of three things. If TeX is in state N, it produces a \par token. If TeX is in state M, it produces a space token. If TeX is in state S, it ignores the end of line character.

Let's consider the implications of how the end of line character is handled. In state N, it produces a new paragraph, which is why entering a blank line in your TeX source gives you a new paragraph. In state M, it produces a space, which is why we can sprinkle newlines (almost) anywhere we like in our source and get spaces. If spaces are being skipped, for example after a control word (but not a control symbol!), then the end of line does nothing. [Okay, one final lie above: after a control space, that is, a \ followed by a space, TeX enters state S. This is so that \ followed by two spaces does not produce two space tokens.] To summarize, when TeX reads a line of input, it
  1. removes trailing spaces and adds a carriage return,
  2. enters state N,
  3. reads characters, creating tokens and changing states as described above, until it
  4. reaches the end of line character which is either turned into a \par token, a space token, or ignored, depending on the current state.
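These end of line rules explain a classic trap when writing macros. The following sketch (the macro names \greet and \tight are made up for illustration) shows an end of line turning into a space in state M, a % suppressing it, and an end of line being ignored in state S after a control word.

```latex
\documentclass{article}
% The line break after "Hello," happens in state M, so it becomes a
% space token: \greet{world} gives "Hello, world!".
\newcommand\greet[1]{Hello,
  #1!}
% The % discards the end of line (and the next line's leading spaces
% are skipped in state N): \tight{world} gives "Hello,world!".
\newcommand\tight[1]{Hello,%
  #1!}
\begin{document}
\greet{world}

\tight{world}

% The line ends right after a control word, in state S, so the end of
% line is ignored and the output runs together as "TeXis".
\TeX
is
\end{document}
```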

This is not the end of the story, as there are situations where TeX ignores space tokens and \par tokens, a.k.a. "why don't I get more blank lines when I enter more blank lines in my source?" However, this post is long enough, so I'll put that off for now and discuss modes next time.

Monday, October 5, 2009

Defining new math operators

Defining a new math operator that behaves like \sin or \lim is very easy with the amsmath package. It provides a \DeclareMathOperator macro, usable only in the preamble, for declaring a new operator. Its starred version behaves like \lim with respect to subscripts, placing them below the operator in displayed formulas. For example:
\DeclareMathOperator\arcsec{arcsec}
\DeclareMathOperator*\Lim{Lim}
In addition, \operatorname or \operatorname* can be used for one-off operator names that don't warrant defining a new control sequence. These are better than using \mathrm to typeset operator names, if only because spacing is handled correctly in the presence or absence of parentheses.
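A minimal document using these declarations might look like this; \operatorname{sgn} and \operatorname*{ess\,sup} are just illustrative names of mine. In the output, look for the thin space between \arcsec and x, which disappears before a parenthesis, and for the subscript on \Lim, which sits beside the operator inline but goes underneath in the displayed formula.

```latex
\documentclass{article}
\usepackage{amsmath}
% \DeclareMathOperator may only be used in the preamble.
\DeclareMathOperator\arcsec{arcsec}
% Starred: subscripts go underneath in display style, like \lim.
\DeclareMathOperator*\Lim{Lim}
\begin{document}
Inline: $\arcsec x$, $\arcsec(x)$, and $\Lim_{n\to\infty} a_n$.
\[
  \Lim_{n\to\infty} a_n + \operatorname{sgn} x
  + \operatorname*{ess\,sup}_{t} f(t)
\]
\end{document}
```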

Friday, October 2, 2009

Getting publication quality tables is easy

One of the problems with reading about typography is that I start to see the shortcomings in others' work and, far more importantly, in my own. Creating tables is one area where this is certainly true. In my experience, nearly every LaTeX document that contains a table contains an ugly table. I'm not sure why, but everybody seems to want to make tables that look like the following.
\begin{tabular}{|c|c|c|}
\hline
A & B & C\\
\hline\hline
foo & bar & baz\\
\hline
zab & rab & oof\\
\hline
\end{tabular}
(As usual, try this out here.) There's no reason at all for each cell to be boxed that way. I suspect this comes from seeing too many of the ugly HTML tables one gets by default. Fortunately, the solution is very simple: use the booktabs package. I strongly encourage anyone writing a table to read its documentation (pdf). Usage is very simple.
\begin{tabular}{ccc}
\toprule
A & B & C\\
\midrule
foo & bar & baz\\
zab & rab & oof\\
\bottomrule
\end{tabular}
(You'll need to select the booktabs package in the previewer above to try this out.) Notice that in addition to looking better, this actually requires less work to produce! There is also a \cmidrule that works like \cline but is more flexible and can be used in adjacent columns; however, I rarely find a use for any rules but the three above. Finally, the author of booktabs gives two guidelines for making publication-quality tables: 1) do not use vertical rules, and 2) do not use double rules. These are excellent guidelines. Follow them.
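For the rare case where \cmidrule is useful, here is a sketch of a grouped header; the column contents are placeholders. The (r) and (l) trims shorten the two adjacent rules at their facing ends so they don't run together, which is exactly where \cline falls short.

```latex
\documentclass{article}
\usepackage{booktabs}
\begin{document}
\begin{tabular}{cccc}
  \toprule
  \multicolumn{2}{c}{First} & \multicolumn{2}{c}{Second} \\
  % Two partial rules in adjacent columns, trimmed where they meet.
  \cmidrule(r){1-2}\cmidrule(l){3-4}
  A & B & C & D \\
  \midrule
  foo & bar & baz & qux \\
  zab & rab & oof & xuq \\
  \bottomrule
\end{tabular}
\end{document}
```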

Friday, September 25, 2009

Numbering every paragraph

One occasionally finds cause to number every graf in a document. Well, I never have, but I've seen it done. A neat little trick, mentioned in passing in an exercise in the TeXbook, makes this easy. It relies on the TeX primitive \everypar, which is essentially a token register: it holds a list of tokens that can be used again and again. What exactly a token is is a topic for another post. When TeX starts a new paragraph (or, more precisely, when it enters horizontal mode), it does two things. First, it inserts an empty box of width \parindent (this is the indentation at the start of every graf), and then it processes the tokens in \everypar before going on to the rest of the tokens that make up the graf. The upshot is that we can make TeX do something at the start of every graf, just after it inserts the indentation box. To use this, do something like the following.
\newcounter{grafcounter}
\setcounter{grafcounter}{0}
\everypar={\addtocounter{grafcounter}{1}%
        \arabic{grafcounter}. }
Thus, at the start of every graf, 1 will be added to our counter, and the value of the counter, typeset in arabic numerals and followed by a period and a space, will be prepended to the graf. This is close, but not exactly what we want: we'd really like the graf number to appear in the margin. To do this, we use \llap, which stands for Left overLAP. The argument to \llap is typeset in a box of zero width that sticks out to the left, overlapping whatever is already there. In this case, we can change the \everypar line to be
\everypar={\addtocounter{grafcounter}{1}%
        \llap{\arabic{grafcounter}.\hskip2\parindent}}
which will cause the graf number to end a single \parindent into the margin: one \parindent cancels the initial indentation and the second moves left into the margin. If there are already tokens in \everypar, we might not want to lose them. In this case, we can use \expandafter to append tokens, similar to how we appended to a macro.
\everypar=\expandafter{\the\everypar
        \addtocounter{grafcounter}{1}%
        \llap{\arabic{grafcounter}.\hskip2\parindent}}
Of course, we might want the other tokens to follow our graf numbering. Prepending, however, is a topic for another day.
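For reference, here is the whole trick as a complete document, a sketch assuming the standard article class. One caveat I'm aware of: LaTeX itself assigns \everypar in some places, for example right after sectioning commands and inside list environments, and those assignments replace our tokens, so the numbering stops. This example deliberately avoids both.

```latex
\documentclass{article}
\newcounter{grafcounter}% starts at 0
% Number each graf in the left margin: \llap backs up over the
% \parindent indentation box and one more \parindent into the margin.
\everypar={\addtocounter{grafcounter}{1}%
  \llap{\arabic{grafcounter}.\hskip2\parindent}}
\begin{document}
This is the first graf. Its number should appear in the left margin.

This is the second graf, which should be numbered 2.

This is the third graf, which should be numbered 3.
\end{document}
```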

Thursday, September 24, 2009

Floating figures

One of the most frequently asked questions about LaTeX seems to be: How does one put a figure right here? The answer is, of course, very simple: don't put material you don't want to float in a floating environment. This is the LaTeX equivalent of "Doctor, it hurts when I do this." Shortly after hearing this answer, the questioner usually responds that she had no idea that \includegraphics (you are using \includegraphics to include your graphics, aren't you?) could be used outside of the figure environment. There is nothing special about \includegraphics from TeX's point of view: it simply creates a box, and everything is a box (unless it's glue, or a penalty, or a whatsit, or a...). Want to stick a graphic right here? No problem: \includegraphics{foo} will insert foo.eps if you're running latex or foo.{jpg,png,pdf} if you're running pdflatex. Well, that's simple, but something's missing. In fact, two things are missing. First, one cannot simply use \caption to produce a caption for this graphic. Second, one cannot reference this graphic using \label and \ref, since this requires, among other things, a \caption. Here's the rule I use when deciding whether to make a graphic float. If I want to give it a caption or reference it in the text, then I make it float and give it a caption. If I don't reference it in the text and it has no caption, then I might not let it float, since it might seem out of place if it floated. Of course, there are very few reasons to put a graphic in a scholarly publication without referencing it. This is a fairly simple rule to follow, and it solves the problem. At this point, there are two things I feel I should mention. The first is that there is a package that allows one to place a figure environment right here; however, using it violates my rule, so I'll say nothing more about it. 
The second is that LaTeX2e has a bug in its float placement algorithm when using the twocolumn document class option (at least with the standard classes) together with the figure* environment. This bug can cause two-column figures to be placed out of order with respect to single-column figures. The fix is very simple: put \usepackage{fixltx2e} in the preamble of your document (between \documentclass and \begin{document}). The package actually does more than fix that bug and should probably be used in every LaTeX document. (Since the 2015 LaTeX release, these fixes are part of the LaTeX kernel itself, so newer installations no longer need the package.)
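Here is a sketch of both cases in one document, assuming a graphics file named foo sits next to the .tex file (foo.eps for latex, or foo.pdf, foo.png, or foo.jpg for pdflatex). The center environment around the first \includegraphics only centers the box; it does not float.

```latex
\documentclass{article}
\usepackage{graphicx}
\begin{document}
% Not a float: \includegraphics just makes a box, placed right here.
% There is no caption, so there is nothing to \label or \ref.
\begin{center}
  \includegraphics[width=0.5\linewidth]{foo}
\end{center}

% A float: LaTeX may move it, but it can have a caption and a label.
\begin{figure}
  \centering
  \includegraphics[width=0.5\linewidth]{foo}
  \caption{The same graphic as a floating figure.}
  \label{fig:foo}
\end{figure}

Figure~\ref{fig:foo} can be referenced because it has a caption.
\end{document}
```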

Wednesday, September 23, 2009

Appending to a macro

On occasion, one wishes to modify an existing macro by appending additional text or macros. For a simple example, suppose we have a complicated macro \foo and we'd like to replace all instances of \foo with \foo bar, that is, the macro \foo followed by the three tokens "bar". Potentially, one could replace every instance of \foo with "\foo bar", but this isn't always feasible, especially if \foo is normally expanded by some other macro in a package or class file. A first attempt at a solution, which won't work, is to try
\def\foo{\foo bar}
This doesn't work for the obvious reason that expanding \foo expands \foo, which expands \foo, which then expands... A second, more successful attempt is to use \let along with a helper macro.
\let\fooHelper\foo
\def\foo{\fooHelper bar}
This works but seems inelegant in that you have to keep the \fooHelper macro around: any change made to it changes \foo. What we really want is a way to expand \foo in the replacement text of our first attempt above. We can use the TeX primitive \expandafter for this. \expandafter is an expandable primitive that causes the second token following it to be expanded one "level" before the first. That is, \expandafter\a\b has the effect of expanding \b once and then expanding \a. For example,
\def\a{A}
\def\b{\c}
\def\c{C}
\expandafter\a\b
has the following expansion:
  • \expandafter\a\b
  • \a\c
  • A\c
  • AC
Since \expandafter is itself expandable, multiple \expandafters can be used to completely alter the order of expansion. For our purposes, we need
\expandafter\def\expandafter\foo\expandafter{\foo bar}
This expands \foo in the replacement text before making the new definition of \foo. The LaTeX kernel provides \g@addto@macro, which takes two arguments and globally appends its second argument to the first. That is, \g@addto@macro\foo{bar} performs the assignment above, but globally. This has the drawback that it can't be used inside a group when we want the definition to stay local to that group. In a later post, I'll show how to add replacement text in places other than the end.
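To watch the definitions change, the following sketch logs \meaning\foo after each step with \typeout; the starting definition \def\foo{Foo} and the extra words baz and qux are just for illustration. Because \g@addto@macro has @ in its name, it must be wrapped in \makeatletter and \makeatother in a .tex file. The comments give the log line I expect for each step, showing that the global append survives its group while the local one does not.

```latex
\documentclass{article}
\def\foo{Foo}
% Append "bar": the \expandafter chain expands \foo once, inside the
% braces, before \def sees the replacement text.
\expandafter\def\expandafter\foo\expandafter{\foo bar}
\typeout{1: \meaning\foo}% logs 1: macro:->Foobar
\makeatletter
\begingroup
  \g@addto@macro\foo{baz}% global, so it survives \endgroup
\endgroup
\makeatother
\typeout{2: \meaning\foo}% logs 2: macro:->Foobarbaz
\begingroup
  % The \expandafter version is local when used inside a group.
  \expandafter\def\expandafter\foo\expandafter{\foo qux}%
  \typeout{3: \meaning\foo}% logs 3: macro:->Foobarbazqux
\endgroup
\typeout{4: \meaning\foo}% logs 4: macro:->Foobarbaz
\begin{document}
\foo
\end{document}
```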

Tuesday, September 22, 2009

Reason for being and variable hanging indentation

People often ask me LaTeX questions: "How can I do X?" for any number of X. Some questions are easy, some are hard, and some seem to have little practical value. However, I occasionally find myself implementing a solution, giving the answer, and then forgetting about it, only to find someone else asking a similar question later. My purpose here is to collect some of these, or maybe just neat tricks I hear about, and record them. These are mostly for my own reference, but if anyone else can use them, well, good. Most of these will likely be horrible hacks to some degree or another; don't hold it against me. The first one I want to write about was a request for paragraphs where the first line is not indented and subsequent lines are indented to line up with the second word of the first line. For example:
Here is the first line of the graf
     and here is the second
     and here is the third
and so forth. Personally, I think this would look horrible, since multiple grafs would be indented by differing amounts, but since this was the question, so be it! Here is my solution.
\def\whee#1 {%
  \noindent
  \setbox0\hbox{#1 }%
  \hangindent\wd0
  \hangafter1
  \box0
}%
\whee This is my first graf which as you can
see is not indented but all lines following the
first are indented to the ``is.''

\whee Subsequent grafs, however, will look
very strange since the hanging indentation
doesn't match the first graf.
(Feel free to copy and paste this into Troy Henderson's handy LaTeX Previewer.) The first thing to note is that I used \def rather than \newcommand. The reason is that I wanted a feature of TeX's macro definitions that LaTeX doesn't expose. (In fact, I used other TeXisms, and that's likely to be the case in most of these posts.) In this case, I wanted delimited parameters, which Knuth describes on page 203 of the TeXbook and which I may talk about in a later post. For our purposes, the space after the #1 means that the \whee macro takes one argument consisting of everything up to the first space. Thus, in the first use of \whee above, "This" is the argument to the macro. Next, I saved the argument to \whee (the first word of the graf), followed by a space, in a box, and set the hanging indentation to the width of that box. Setting \hangafter to 1 says to apply that indentation to every line after the first, through the end of the graf. Finally, the box is typeset to produce the first word followed by a space. There are two major flaws with this approach. The first flaw is, as I pointed out in the example, that it looks really awful. The second flaw is more subtle. The space typeset after the first word has its "natural" width, without any of the stretching or shrinking used to make the rest of the line look good. As a result, the spacing between the first and second words is not consistent with the rest of the graf. I don't know any good way to deal with the second problem. Rather than simply typesetting the box, it could be "unboxed" first, which would allow the space to stretch or shrink, but then subsequent lines would not line up correctly with the second word.
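To see the second flaw, the sketch below typesets the post's \whee next to a variant, \wheeunboxed (a hypothetical name of mine), that uses \unhbox0 instead of \box0, which is the "unboxing" idea just mentioned. Both definitions sit in the preamble here; the sample text is arbitrary and only needs to be long enough to wrap.

```latex
\documentclass{article}
% The macro from the post: the first word and its space are typeset
% as a rigid box, so that space keeps its natural width.
\def\whee#1 {%
  \noindent
  \setbox0\hbox{#1 }%
  \hangindent\wd0
  \hangafter1
  \box0
}%
% Hypothetical variant: \unhbox0 returns the word and space to the
% paragraph, so the space may stretch or shrink like any other, but
% the second word may then drift away from the hanging indentation.
\def\wheeunboxed#1 {%
  \noindent
  \setbox0\hbox{#1 }%
  \hangindent\wd0 % width is measured before \unhbox empties box 0
  \hangafter1
  \unhbox0
}%
\begin{document}
\whee This first graf uses the macro from the post, so the first word
is boxed and the space after it keeps its natural width while every
other space on the line may stretch or shrink.

\wheeunboxed This second graf unboxes the first word instead, so its
space can stretch or shrink along with the others, and the following
lines may no longer line up exactly with the second word.
\end{document}
```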