University of Alaska Fairbanks
Geophysical Institute

Ronni Grapenthin - Toolbox

check_repeats - find repeated words in your text file (e.g., LaTeX)

I often do sloppy copy'n'paste, which results in formulations like "the the text" that are really hard to find during proofreading. I searched the web and found a regular expression that does the job, then put it into a script that takes a text file as its argument, calls grep, and shows the repeated words highlighted in color along with their line numbers. It's plenty simple and hardly worth the effort of putting it up here, but maybe it's useful for some people. It's handy to include the script in the makefile used to generate .ps/.pdf files from the tex file:

	all: 
		latex paper
		bibtex paper
		latex paper
		latex paper
		dvips paper.dvi
		check_repeats paper.tex
		ps2pdf paper.ps
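
For readers who want a feel for the in-line check described above, here is a minimal sketch. It is not the contents of the actual check_repeats script: the function name find_inline_repeats and the exact pattern are assumptions made for illustration. It relies on GNU grep, since \b, \w and backreferences in -E patterns are GNU extensions.

```shell
#!/bin/sh
# Sketch only: the pattern and the function name are assumptions,
# not the contents of the actual check_repeats script.

# Print every line of file "$1" that contains a word immediately
# followed by itself, prefixed with its line number (-n).
# The repeated words are colored when the output goes to a terminal.
find_inline_repeats() {
    grep -n -E --color=auto '\b(\w+)[[:space:]]+\1\b' "$1"
}

# Check every file named on the command line (does nothing without arguments).
for f in "$@"; do
    echo "Checking for repeated words in a line of $f:"
    find_inline_repeats "$f"
done
```

The word boundaries (\b) around the pattern keep it from flagging pairs like "the then" or "this is", where one word merely ends or starts with the other.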
	

I hacked in an addition that catches word repeats over linebreaks, i.e. the same word at the end of one line and the beginning of the following line. There should be an easier way to get this working with sed. However, while fiddling with this I found the awk+grep solution, which was faster to implement. Going over the file twice might not be efficient, but I intended to make it obvious that some repeats occur over line breaks. Here's an example that can be downloaded:

	> check_repeats test.txt 
	Checking for repeated words in a line of test.txt:
	   1   :bla hello hello
	   4   :repeated words works now now
 
	Checking for repeated words over linebreaks of test.txt:
	   00002 -00003: ... see 		see whether checking for
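
The linebreak check can be sketched in a similar way. This is an awk-only illustration, not the awk+grep code in the actual script; the function name find_linebreak_repeats and the output format are assumptions. It compares whitespace-separated words only, so a word followed by punctuation ("see." versus "see") is not reported, and an empty line between two lines resets the comparison.

```shell
#!/bin/sh
# Sketch only: not the actual awk+grep code in check_repeats.

# Report places in file "$1" where the last word of one line equals
# the first word of the following line.
find_linebreak_repeats() {
    awk '
        # Compare the first word of this line with the last word
        # remembered from the previous line.
        NR > 1 && NF > 0 && $1 == last {
            printf "%d-%d: ... %s | %s ...\n", NR - 1, NR, last, $1
        }
        # Remember the last word of this line; an empty line resets it.
        { last = (NF > 0) ? $NF : "" }
    ' "$1"
}

# Check every file named on the command line (does nothing without arguments).
for f in "$@"; do
    echo "Checking for repeated words over linebreaks of $f:"
    find_linebreak_repeats "$f"
done
```

Keeping this as a separate pass over the file mirrors the idea above: repeats that span a line break are reported on their own, apart from the in-line ones.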

	

Download: check_repeats and test.txt

ronni <at> gi <dot> alaska <dot> edu | Last modified: October 26 2011 19:21.