Ten Tips for Writing CS Papers, Part 1

Sebastian Nowozin - Sun 29 November 2015 -

As a non-native English speaker I can relate to the challenge of writing concise and clear English. Scientific writing is particularly challenging because the audience is only partially known at the time of writing: at best, the paper will still be read in 10 or 20 years from the time of writing by people from all over the world.

Learning to write papers well takes a long time and is achieved mostly by practice, that is, writing and publishing papers. But to improve your writing at a faster pace you can actively reflect on certain patterns and writing habits you may have.

Below I compiled a short list of some best practices from my own experience and preference, with more following in a second part. This list is by no means exhaustive and has a certain bias towards computer science publications. However, I hope it will serve as an inspiration to improve your writing.

I provide some examples of poor writing from published papers. To avoid offending anyone, I select the examples from my own published papers.

1. Use Simple Language

Concepts and ideas in scientific papers can at times be complex but the writing used to describe them should remain simple. Simple writing has short sentences, a clear logical structure, and uses minimal jargon. Writing papers is not poetry but still requires you to pay attention to the language you use.

Computer science does not seem to have an overly large problem with complex writing, possibly due to a large number of non-native English speakers. Or perhaps there is a strong desire to be understood by the writers; other academic fields are more challenged.

Yet, I have frequently seen non-native English speaking junior authors, perhaps when writing their first paper, who attempt to copy style from their native language. At least for native German speakers (like me) this would often lead to comparatively complex writing in terms of sentence lengths and less than optimal didactics in terms of presenting the abstract before the concrete.

If still in doubt whether using simple language is a good idea, check this Ig-Nobel-prize-winning work: (Oppenheimer, "Consequences of Erudite Vernacular Utilized Irrespective of Necessity: Problems with Using Long Words Needlessly", Applied Cognitive Psychology, 2006).

2. State your Contribution

The key contribution of most published papers falls into exactly one out of the following three categories.

  1. Insight: you have an explanation for something that is already there.
  2. Performance: you can do something better.
  3. Capability: you can do something that could not be done before.

If you know which category your paper falls into this, emphasize this aspect early in the paper, ideally in the abstract. This sets the tone and expectations for the remainder of the paper.

3. See Everything as a Facet on the Contribution

Every scientific paper claims a contribution over previous work. Once you have stated the contribution clearly, the rest of the paper is there just to support the contribution: The introduction motivates the need for your contribution. The related work section differentiates prior work against your claimed contribution. The method section typically provides a description of the contribution. The experiments verifies that your contribution works as advertised. Etcetera.

The point is: the contribution anchors everything else in the paper. If the contribution is clear, every part of the paper should make sense and become a different facet or view onto the contribution.

There are two common ways how this simple structure is violated, leading to a poorly written paper. The first way is to not clearly state the contribution, leaving it ambiguous during the whole paper. In such papers some method may be described, some experiments may be performed, but the higher goal does not emerge. At the end of the paper, the reader may agree with all statements of the paper and still wonder what he should make of it.

The second way to violate the structure is less severe: a long description of another method or work is added to the paper. I have seen this frequently with junior authors who have just learned about a cool method and want to showcase their understanding. Such description may even be interesting to a reader of the paper, but it is orthogonal to the contribution of the paper thus has negative value and is best removed.

4. Consider Using a Page-1 Figure

Consider using an explanatory figure on page one of the paper. This was started in the SIGGRAPH community with the work of Randy Pausch, but has slowly spread to other communities.

The main purpose of a page one figure is to provide a gist of the paper, much like a "visual abstract". It highlights what is important and sets the right expectations. It is also visually engaging and whets the appetite of the reader.

What makes a good page one figure? 1. Simplicity: You need to be able to understand it in 20 seconds or less. 2. Being self-contained: All relevant information should be in the figure or the figure caption. The figure caption should be short.

Many papers benefit from the addition of a page one figure, but there are some exceptions, for example in theory papers it could appear out of place.

5. Avoid the Passive Voice

You can write clear English in both the active and passive voice. A historical note on this is available in this essay on active vs passive voice in scientific writing:

"More than a century ago, scientists typically wrote in an active style that included the first-person pronouns I and we. Beginning in about the 1920s, however, these pronouns became less common as scientists adopted a passive writing style.

Considered to be objective, impersonal, and well suited to science writing, the passive voice became the standard style for medical and scientific journal publications for decades.

...

Nowadays, most medical and scientific style manuals support the active over the passive voice."

The reason for this change is simple: most people find text written in the active voice easier to read and more engaging. Duke university published a guide on scientific writing that contains a long discussion on the active versus passive voice.

In my writing there are very few exceptions were a passive voice may be more appropriate, for example when discussing prior work ("The relationship between iron intake and lifespan of parrots was studied by Miller and Smith.") or when discussing experimental results ("The test error remained small even when the regularization strength was decreased."), but even for these two examples we can find an alternative active formulation ("Miller and Smith studied the relationship between iron intake and lifespan of parrots.") and ("Even when we decreased the regularization strength the test error remained small."). The use of the passive voice in these two exceptions conveys an impersonal attitude that may be justified when discussing the work of others or reporting (as opposed to interpreting) experimental results.

Here is a real example from a ICCV 2007 paper of mine (page 4):

The dual problem has a limited number of variables, but a huge number of constraints. Such a linear program can be solved efficiently by the constraint generation technique: Starting with an empty hypothesis set, the hypothesis whose constraint (6) is violated the most is identified and added iteratively. Each time a hypothesis is added, the optimal solution is updated by solving the restricted dual problem.

I highlight all the passive formulations. Here is a rewrite of the paragraph using only the active voice:

The dual problem has a limited number of variables, but a huge number of constraints. We can solve such a linear program efficiently by the constraint generation technique: Starting with an empty hypothesis set, we identify the hypothesis with the largest constraint violation in (6) and add the hypothesis to the hypothesis set. Each time we add a hypothesis, we also update the optimal solution by solving the restricted dual problem.

I made a few minor changes such as changing the word order and adding the noun ("to the hypothesis set") for added clarity. I hope you agree that the second version is easier to read.

The next part is available now.