Cleaning and Manipulating Text

Posted December 21, 2011

Written content provided by clients can be horrendous. Beyond the substance of the material itself, the actual text is usually full of inconsistencies, unnecessary characters and strange formatting. Most of the following tips work in any text editor, but I’ll be focusing on using InDesign, since that’s where I do most of my text manipulation.

Importing text

The first step after receiving content is to pull it into InDesign. There are several options here:

Copy and paste

This is usually the simplest way to bring in text…copy and paste and you’re done. Depending on your clipboard handling preferences, InDesign will keep or remove rich text formatting. If the preference is set to keep the formatting, you can use the Paste Without Formatting command to strip out the formatting and insert plain text.

Drag and drop

This works just like a regular Paste operation.

Place text

The Place command can be used just like copy and paste or drag and drop, but can give you several more options.

Selecting the Show Import Options checkbox will pull up a dialog box with more fine-tuned control of the import, including:

  • Importing table of contents
  • Importing footnotes
  • Converting tables to text
  • Removing styles and formatting from text and tables
  • Converting bullets and numbers to text

I won’t delve into all the options, but you can read more about them in the online InDesign help documents.

Regardless of how you choose to import the text, I almost always recommend stripping formatting and building styles from scratch. The source material has to be pretty meticulously crafted to give much advantage to importing and mapping existing styles, which is almost never the case.

Viewing hidden characters

Once the text is in place, we can get down to work. Start by turning on hidden characters – either hit Command Option I (Control Alt I on PC) or choose Type > Show Hidden Characters. Now you can see the non-printing characters we’ll be manipulating – most commonly paragraph breaks, line breaks, spaces and tabs.

These now-visible characters display as light-blue symbols.

Find/replace

This is where the magic happens. Hit Command F or choose Edit > Find/Change. This dialog is named Find/Change because it can manipulate objects and formatting in addition to text, but that’s fodder for another article.

Note: Unfortunately, the View Hidden Characters setting doesn't carry over into the Find/Change dialog, so you'll just have to trust that your hidden characters are there when you type them.

The most common operation I perform here is removing double-spaces. Many people were taught to space twice between sentences, but that is no longer the standard. Feel free to let your clients know they don’t have to double-space anymore, but trust me, running this simple command is much easier than getting people to override years of habit and muscle-memory.

Most common find/replace operations:

Operation                                  Find what field    Change to field
Remove double-spaces                       [Space][Space]     [Space]
Remove multiple tabs                       [Tab][Tab]         [Tab]
Replace exclamation points with periods    !                  .

Many users of Word or other word processors don’t realize they can set tab stops and instead hit [Tab] as many times as they need to get items in line. This can wreak havoc if you plan to convert text into a table. Simply replace those multiple tabs with a single tab so you don’t have to worry about inadvertently creating blank columns during the conversion.

Note: you usually have to run the find/change operation several times to remove every occurrence of multiple spaces and tabs. Just keep clicking the Change All button until it returns with the message "Search is completed. 0 replacement(s) found."
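
If you're curious why one pass isn't enough, here's a rough Python sketch of the idea. It is not InDesign code, and the function names are made up for illustration; it assumes Change All behaves like a plain literal replacement that doesn't re-scan the text it has just changed. A run of five spaces becomes three, then two, then one, so it takes three passes plus a final one that finds nothing. The last function shows the pattern-based alternative, which collapses any run in a single pass.

```python
import re


def change_all_once(text, find, change):
    """One 'Change All' pass: literal, left-to-right, non-overlapping matches."""
    count = text.count(find)  # counts non-overlapping occurrences
    return text.replace(find, change), count


def change_all_until_clean(text, find, change):
    """Repeat the pass until it reports zero replacements."""
    passes = 0
    while True:
        text, count = change_all_once(text, find, change)
        if count == 0:
            return text, passes
        passes += 1


def collapse_runs(text):
    """Pattern-based alternative: collapse any run of 2+ spaces or 2+ tabs."""
    text = re.sub(r" {2,}", " ", text)
    return re.sub(r"\t{2,}", "\t", text)


if __name__ == "__main__":
    sample = "a     b"  # five spaces between the letters
    cleaned, passes = change_all_until_clean(sample, "  ", " ")
    print(repr(cleaned), "after", passes, "passes")
    print(repr(collapse_runs("x\t\t\ty  z")))
```

The pattern-based version is the same trick a GREP search can do, which is one reason GREP (covered at the end of this post) is worth learning.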

One of my pet peeves is the unnecessary use of exclamation points. This can turn into a philosophical argument quickly, but I think this bit of punctuation should only be used to express true joy or excitement and should be used exceedingly sparingly. Clients use them because they think everything is important and exciting. However, I’ve found good reasons to use exclamation points few and far between in most commercial work. It’s almost always a safe bet to find/replace them and just review the copy later.

Changing Case

If you’ve found yourself re-typing long strings of all-caps text in lower-case or mixed-case, this feature will bring joy to your heart. Simply select your text and click Type > Change Case. There are options to adjust type to lower-case, sentence case (first letter of the first word capitalized) or title case (first letter of every word capitalized).

I recommend changing the case even on text you plan to set in all-caps anyway. All-caps should be set in a character or paragraph style instead of set using [Caps Lock] – this gives you much more flexibility with formatting in case you change your mind in the future.
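
For reference, here's a small Python sketch of what those three case conversions do to a string. The helper names are hypothetical and the rules are deliberately simplified (a single sentence, words separated by single spaces), so don't read it as a description of how InDesign implements the command.

```python
def to_sentence_case(text):
    """Lowercase everything, then capitalize only the first letter."""
    lowered = text.lower()
    return lowered[:1].upper() + lowered[1:]


def to_title_case(text):
    """Capitalize the first letter of every space-separated word."""
    return " ".join(word[:1].upper() + word[1:].lower() for word in text.split(" "))


if __name__ == "__main__":
    headline = "ANNUAL REPORT FOR THE BOARD"
    print(headline.lower())
    print(to_sentence_case(headline))
    print(to_title_case(headline))
```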

GREP

GREP is an incredibly powerful tool for searching for patterns in text and manipulating them. It originates from a UNIX command line tool and has made its way into word processing software packages. The syntax has a steep learning curve and looks like gibberish the first time you see it, but it’s a big time-saver once you get the hang of it.

A couple of examples include standardizing the formatting of phone numbers or transposing first and last names in a long list.
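
To make those two examples concrete, here's a minimal sketch using Python's regular expressions, which follow the same general idea as GREP. The patterns, the sample data and the function names are my own assumptions for illustration: ten-digit US-style phone numbers and simple two-word names. Real lists are messier, so treat these as starting points. One difference to be aware of: Python refers to captured groups in the replacement as \1, \2, while InDesign's Change to field uses $1, $2.

```python
import re

# Three digits, three digits, four digits, with optional parentheses
# and spaces, dots or hyphens in between.
PHONE = re.compile(r"\(?(\d{3})\)?[ .-]*(\d{3})[ .-]*(\d{4})")

# A line made up of exactly two words separated by one space.
FIRST_LAST = re.compile(r"^(\w+) (\w+)$", re.MULTILINE)


def standardize_phones(text):
    """Rewrite every matched phone number as 555-123-4567."""
    return PHONE.sub(r"\1-\2-\3", text)


def transpose_names(text):
    """Turn 'First Last' lines into 'Last, First'."""
    return FIRST_LAST.sub(r"\2, \1", text)


if __name__ == "__main__":
    print(standardize_phones("(555) 123-4567\n555.123.4567\n555 123 4567"))
    print(transpose_names("John Smith\nJane Doe"))
```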

I don’t have a great deal of experience with it and have only used it for fairly simple tasks, but there are several great resources for learning more about it online. Here are a few I’ve found incredibly helpful:

I also recommend digging through the flyout menu in the dialog box to learn some of the syntax. Once you see how search strings are assembled and you’ve run a few operations, you’ll start seeing how combining a few of them can start turning your search strings into powerful tools.