The HTML to DITA migration tool ships in the demo/ directory
of the toolkit, and does not make use of the common toolkit processing
for DITA content.
The DITA Open Toolkit release 1.2 or above provides an HTML to DITA
migration tool, which migrates HTML files to DITA files. This
migration tool originates from the h2d code published with
Robert D. Anderson's how-to articles on developerWorks.
The migration tool is in the demo\h2d directory.
You can use it separately because it is not integrated into
the main transformation of the toolkit. The version in the toolkit
is more recent, but refer to the articles for details of the
program and for information on how to extend it. There are
links to the articles at the bottom of this page.
Preconditions
The preconditions to consider before using the migration
tool are listed below:
- The HTML file content must be divided among concepts,
tasks, and reference articles. If not, the HTML files
should be reworked before migrating.
- This migration tool is intended for topics. The HTML page
should contain a single section without any nested
sections.
- DITA architecture is focused on topics. Information that is
written for books needs to be redesigned in order
to fit into a topic-based architecture.
- This migration utility only works with valid XHTML files.
HTML files must be cleaned up using HTML Tidy or
another utility before processing.
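As a quick pre-flight check for the last precondition, the following is a minimal sketch in Python that tests whether a page parses as XML at all. It is not part of the migration tool, and the function name well_formedness_error is a hypothetical helper chosen for illustration. It checks well-formedness only, not validity against an XHTML DTD, and the standard-library parser does not read the external DTD, so named entities such as &nbsp; would also be reported as errors.

```python
import xml.etree.ElementTree as ET


def well_formedness_error(xhtml_text):
    """Return None if the text parses as XML, else the parser's message.

    Hypothetical helper: a passing result means "well-formed", which is
    necessary for valid XHTML but does not prove validity.
    """
    try:
        ET.fromstring(xhtml_text)
    except ET.ParseError as err:
        return str(err)
    return None


# A well-formed page and a typical "tag soup" page with an unclosed <br>.
good = "<html><head><title>T</title></head><body><p>Text</p></body></html>"
bad = "<html><body><p>Unclosed paragraph<br></body></html>"

for name, text in (("good", good), ("bad", bad)):
    problem = well_formedness_error(text)
    print(name, "OK" if problem is None else "needs cleanup: " + problem)
```

Pages that fail this check are candidates for cleanup with HTML Tidy or a similar utility before migration.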
Post conditions
There are also some post conditions to consider after processing:
- In some cases, the tool cannot determine the correct way to migrate
content and places it in a <required-cleanup> element. You
should fix such elements in the output DITA files.
- Check the output DITA files. Compare them with the source
HTML files and check that their content is equivalent.
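To find the output files that still need manual attention, a short script can count <required-cleanup> elements. The sketch below is an illustration in Python, not part of the tool. The function name count_required_cleanup and the sample topic are assumptions made for this example, and it assumes the output is well-formed XML that uses no entities defined only in the DTD.

```python
import xml.etree.ElementTree as ET


def count_required_cleanup(dita_text):
    """Count <required-cleanup> elements in one migrated DITA document."""
    root = ET.fromstring(dita_text)
    return sum(1 for _ in root.iter("required-cleanup"))


# Hypothetical migrated topic containing one block the tool could not place.
sample = """<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">
<topic id="sample">
  <title>Sample</title>
  <body>
    <p>Migrated text.</p>
    <required-cleanup><p>Content the tool could not place.</p></required-cleanup>
  </body>
</topic>"""

remaining = count_required_cleanup(sample)
print("required-cleanup elements to fix:", remaining)
```

A count of zero does not replace the comparison against the source HTML. It only shows that the tool flagged nothing for cleanup.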
Known limitations
- Xalan does not allow the public and system IDs to be set
dynamically from a variable. When Xalan is used as the default XSLT
processor, the output will contain:
<!DOCTYPE topic PUBLIC "{$publicid}" "{$systemid}">
Use Saxon as the processor to avoid this problem. For more information
on this problem, see the section "Other general migration notes"
in the first developerWorks article.
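If files have already been migrated with Xalan, the unexpanded placeholder can be detected with a simple text search. This is a minimal Python sketch, not part of the tool. The names PLACEHOLDER_DOCTYPE and has_unresolved_doctype are hypothetical, and the pattern assumes the placeholder appears exactly as in the DOCTYPE quoted above.

```python
import re

# Matches a DOCTYPE whose public or system ID is still an unexpanded
# "{$publicid}" or "{$systemid}" placeholder.
PLACEHOLDER_DOCTYPE = re.compile(
    r'<!DOCTYPE\s+[^>]*"\{\$(?:publicid|systemid)\}"[^>]*>'
)


def has_unresolved_doctype(dita_text):
    """Return True if the document still carries the placeholder DOCTYPE."""
    return PLACEHOLDER_DOCTYPE.search(dita_text) is not None


broken = '<!DOCTYPE topic PUBLIC "{$publicid}" "{$systemid}">\n<topic id="a"/>'
fixed = ('<!DOCTYPE topic PUBLIC "-//OASIS//DTD DITA Topic//EN" "topic.dtd">\n'
         '<topic id="a"/>')

print(has_unresolved_doctype(broken))
print(has_unresolved_doctype(fixed))
```

Files flagged this way can be regenerated with Saxon or have their DOCTYPE corrected by hand.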
Extension points
The HTML2DITA migration tool supports extension in the
following ways:
- The genidattribute template can be
overridden to change the method for creating the topic
ID.
- The gentitlealts template can be
overridden to change how titles are generated.
- Override the corresponding section of the tool to preserve the
semantics of the source when the <div> or <span>
element is used in regular structures.
- You can also migrate to another specialized DTD by
overriding the original templates based on the specific DTD
and your required output.