Converting legacy content


A new content strategy may result in changes across several facets of content:

  • File storage format
  • Organization
  • Writing style
  • Output formats

Converting from an old file format to a new one, and delivering new and different output formats, can usually be automated. Reorganizing and rewriting content, however, requires the dedicated attention of a content creator and cannot be automated.

Most of our customers do at least some legacy content conversion, even if they are planning significant changes to their content approach. Here are just a few of the options:

  • Convert everything into the new system.
  • Identify high-priority content and convert it. For example, you might convert only content for the flagship product, or only for products that you expect will have significant updates.
  • Just-in-time conversion. As new projects are scheduled, find the related content and convert it.
  • Assess for conversion. Somebody reads legacy content page by page to determine what information is good enough to convert.
  • Convert nothing. If updates are required to old content, use the old system.

It will come as no surprise that we recommend assessing the costs and benefits of these various options to determine your legacy content conversion strategy.

Just-in-time conversion strategy

Content conversion is an ongoing, difficult challenge for many organizations. Some companies specialize in this area and use a wide variety of strategies. Some write custom scripts to automate as much as possible; others throw warm bodies at the project until they reach the throughput required to meet the deadline. Commercial and open-source tools that support file format conversion are also available.

Whatever strategy you choose, be aware that no conversion is perfect. Always plan for a strong quality assurance and proofreading phase toward the end of the process. Some common conversion challenges include the following:

  • Conversion is based on formatting that is present in the source documents. Documents whose formatting does not conform to the standard (formatting “exceptions”) cause problems in conversion. Consistent use of templates/stylesheets in legacy documents makes conversion faster and more accurate.
  • In some cases, the new content requires information that simply is not present in the source documents. For example, the new metadata structure always requires the author's name, but the author's name does not appear anywhere in the original source file. Another problem occurs when you implement information typing: if the various section types, such as Procedure and Reference, all correspond to a single Heading2 tag in the original files, the formatting alone cannot tell them apart. Assigning the correct information types may require manual intervention.
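To make these challenges concrete, here is a minimal Python sketch of style-based conversion, assuming the legacy document has already been extracted into (style, text) pairs. The style names, target element names, `STYLE_MAP`, and the `convert` function are hypothetical illustrations, not any particular tool's API. Paragraphs whose style is missing from the mapping are reported as formatting exceptions, every Heading2 is queued for a person to assign an information type, and required metadata that the source lacks (here, the author's name) is flagged rather than guessed.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from legacy paragraph styles to target element names.
STYLE_MAP = {
    "Heading1": "title",
    "Heading2": "section",  # information type (Procedure, Reference, ...) is unknown
    "Body": "p",
    "Bullet": "li",
}

# Hypothetical metadata the new structure always requires.
REQUIRED_METADATA = ("author",)


@dataclass
class ConversionResult:
    elements: list = field(default_factory=list)          # (element, text) pairs
    exceptions: list = field(default_factory=list)        # (paragraph index, style) with no mapping
    needs_typing: list = field(default_factory=list)      # paragraph indexes of untyped sections
    missing_metadata: list = field(default_factory=list)  # required keys absent from the source


def convert(paragraphs, metadata):
    """Map (style, text) paragraphs to target elements and flag items for human review."""
    result = ConversionResult()
    for i, (style, text) in enumerate(paragraphs):
        element = STYLE_MAP.get(style)
        if element is None:
            # Formatting "exception": the style does not follow the template.
            result.exceptions.append((i, style))
            continue
        if style == "Heading2":
            # A Heading2 could be a Procedure, a Reference, and so on; a person must decide.
            result.needs_typing.append(i)
        result.elements.append((element, text))
    for key in REQUIRED_METADATA:
        if not metadata.get(key):
            # The source never contained this value, so it cannot be converted.
            result.missing_metadata.append(key)
    return result


legacy = [
    ("Heading1", "Installing the widget"),
    ("Heading2", "Before you begin"),
    ("Body", "Unpack the box."),
    ("BodyBoldRed", "Warning: heavy!"),  # locally invented style, not in the template
]
r = convert(legacy, {"title": "Widget guide"})
print(r.exceptions)        # [(3, 'BodyBoldRed')]
print(r.needs_typing)      # [1]
print(r.missing_metadata)  # ['author']
```

The sketch never tries to guess a missing value or an information type; it only builds a review queue, which is where the quality assurance phase picks up.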

The best-case scenario for automated legacy document conversion is approximately 95 percent accuracy. Significant human effort is required to address the remaining five percent.
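A quick estimate shows why the remaining five percent matters. The following sketch assumes a hypothetical corpus of 10,000 paragraphs, takes the best-case 95 percent figure at face value, and assumes an arbitrary three minutes of human effort per fix; all of these inputs, and the function names, are illustrative rather than measured.

```python
# Illustrative inputs only: corpus size, accuracy, and minutes per fix are assumptions.
CORPUS_PARAGRAPHS = 10_000
BEST_CASE_ACCURACY_PCT = 95
MINUTES_PER_FIX = 3  # hypothetical average time to find and correct one error


def manual_fixes_needed(total_items: int, accuracy_pct: int = BEST_CASE_ACCURACY_PCT) -> int:
    """Items the automated pass is expected to get wrong, rounded up."""
    # Integer ceiling division avoids floating-point drift (1 - 0.95 is not exactly 0.05).
    return -(-total_items * (100 - accuracy_pct) // 100)


def review_hours(total_items: int, accuracy_pct: int = BEST_CASE_ACCURACY_PCT,
                 minutes_per_fix: float = MINUTES_PER_FIX) -> float:
    """Hours of human effort needed to correct the items the conversion got wrong."""
    return manual_fixes_needed(total_items, accuracy_pct) * minutes_per_fix / 60


print(manual_fixes_needed(CORPUS_PARAGRAPHS))  # 500
print(review_hours(CORPUS_PARAGRAPHS))         # 25.0
```

Swap in your own corpus size and per-fix estimate; the point is that a small percentage of a large corpus is still a substantial amount of manual work.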

We recommend against assigning document conversion to your staff as their introduction to the new content strategy.


