If you’ve spent the weekend enjoying a good book in the May sun, you might not have been thinking about mathematics. But if you were reading a book in a Western language, there’s a good chance that the commas and periods in your favorite novel followed a very predictable mathematical pattern.
Complex systems researchers in Poland studied works of literature across seven European languages and showed that the placement of punctuation was very consistent within each language. Even between languages, sentence punctuation seemed to follow the same mathematical rules.
The researchers looked at texts in English, French, Spanish, Italian, Russian, German and Polish. They chose these languages because each has at least 50 million speakers and authors writing in each have won at least five Nobel Prizes in Literature. In other words, they’re all very popular and established languages.
They used texts from over two hundred books written in these languages. These included well-known books such as Umberto Eco’s “The Name of the Rose”, Günter Grass’s “The Tin Drum”, Tolstoy’s “Anna Karenina”, Mary Shelley’s “Frankenstein”, Lewis Carroll’s “Alice in Wonderland” and Kafka’s “The Trial”, among many others.
The researchers, from the Institute of Nuclear Physics at the Polish Academy of Sciences, measured the distance between consecutive punctuation marks (for example, between two periods or commas), counted in words. They found sentence fragments of many different lengths, but most fragments are short and only very few are extremely long. If you plot how often each length occurs, you’ll see a peak close to the beginning of the graph (the “short sentence fragment” side), followed by a long tail. This shape is well described by a Weibull distribution.
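To make the measurement concrete, here is a minimal Python sketch of how the gaps between punctuation marks could be counted. It is an illustration, not the researchers’ actual code: the function name `fragment_lengths`, the set of marks treated as punctuation and the sample sentence are all assumptions made for this example.

```python
import re
from collections import Counter

# Assumption: these marks count as punctuation. The study's exact set may differ.
PUNCTUATION = ".,;:!?"

def fragment_lengths(text):
    """Return the number of words between consecutive punctuation marks."""
    # Split the text wherever one or more punctuation marks occur in a row.
    fragments = re.split(f"[{re.escape(PUNCTUATION)}]+", text)
    # Count whitespace-separated words in each fragment.
    lengths = [len(fragment.split()) for fragment in fragments]
    # Drop empty fragments, such as the one after the final period.
    return [n for n in lengths if n > 0]

# Hypothetical sample text, invented for this example.
sample = "It was late, and the rain had stopped. Nobody spoke; the road was empty."
lengths = fragment_lengths(sample)

# Tally how often each fragment length occurs; on a whole novel, this tally
# is the graph described above.
histogram = Counter(lengths)
print(sorted(histogram.items()))
```

Run on a full book rather than two sentences, the tally would show many short fragments and a thin tail of long ones.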
The Weibull distribution is found in many different phenomena. For example, it can describe the time someone spends on a website (most people click away after a few seconds, and almost nobody stays for hours) and it’s used by weather forecasters to predict wind speed distribution. It also describes the time it takes for components to fail, which manufacturers can use to decide how long to provide warranty cover for. And now the Weibull distribution apparently also predicts how long a sentence fragment is in Western languages.
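For readers who want to see the shape for themselves, the following Python sketch evaluates one common discrete form of the Weibull distribution, in which the chance that a fragment is exactly k words long is q^((k-1)^beta) - q^(k^beta), with q = 1 - p. The choice of this particular form, the function names and the parameter values are assumptions for illustration only and are not taken from the study.

```python
def discrete_weibull_pmf(k, p, beta):
    """Probability that a fragment is exactly k words long (k = 1, 2, 3, ...)."""
    q = 1.0 - p
    return q ** ((k - 1) ** beta) - q ** (k ** beta)

def punctuation_chance(k, p, beta):
    """Chance that a mark follows word k, given there was none after words 1..k-1."""
    q = 1.0 - p
    return 1.0 - q ** (k ** beta - (k - 1) ** beta)

# Hypothetical parameter values, chosen only for illustration.
p, beta = 0.1, 1.3

probabilities = [discrete_weibull_pmf(k, p, beta) for k in range(1, 201)]
print(f"total probability for lengths 1 to 200: {sum(probabilities):.6f}")

for k in (1, 5, 10, 20):
    print(k, round(punctuation_chance(k, p, beta), 3))
```

With beta greater than 1, the chance of meeting a punctuation mark grows the longer a fragment has already run, which is one way to read the rarity of very long unpunctuated stretches.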
The pattern was similar between languages, but with some variations due to differences in language patterns. Some languages naturally use more words than others, but still have similar rules about where to place commas and periods.
When the researchers compared translated texts of the same literary works, they confirmed that the punctuation distribution matched the language of the translation and not that of the original text. That also shows that it’s not just a quirk of the author to use punctuation a certain way. Even when translators tried to preserve the original voice of the author, they still used punctuation according to the rules of the target language.
The study also found that English and Spanish were less strict than the other languages when it came to punctuation. According to the researchers, one possible explanation is that these languages have more formal sentence constructions, so the meaning stays clear even without all of the required punctuation.
So when you next pick up a novel, pay attention to the punctuation. It’s not a coincidence that you’ll rarely see an extremely long sentence without a single period or comma. It’s just the Weibull distribution in action.