I have received many emails asking about HTML to VoiceXML conversion: the idea of converting a web page into a voice-enabled page, and of using what VoiceXML offers to provide new capabilities for people with special needs. The problem you will face is that HTML and XHTML were designed with visual presentation in mind, so switching from a visually described page to a voice-described one requires agreeing on some conventions before we can provide an adequate tool.
Different people read a web page differently, depending on their needs: one is looking for a specific piece of information, another wants to explore the whole content of the page. So in this article I want to explore the different aspects of a web page in order to make the conversion from XHTML to VoiceXML easier.


Content is king
To begin with, we also have to keep in mind that many pages on the web are based on images or animation, and for now such pages cannot be considered for this kind of conversion. To make things easier, we can start with content pages only and set multimedia content aside: images, videos, Flash and so on.
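As a rough illustration of setting multimedia aside, the sketch below removes media elements from an XHTML tree with Python's standard xml.etree.ElementTree module. The MEDIA_TAGS list and the strip_media name are assumptions made for this example. It also assumes the page is already well-formed XHTML and contains no HTML-only entities such as &nbsp;, which this parser would reject.

```python
import xml.etree.ElementTree as ET

# Elements treated as multimedia here; this list is an assumption, adjust as needed.
MEDIA_TAGS = {"img", "object", "embed", "video", "audio", "applet"}


def local_name(tag):
    """Return the tag name without its XML namespace part."""
    return tag.rsplit("}", 1)[-1]


def strip_media(root):
    """Remove multimedia elements in place, keeping the text that follows them."""
    for parent in list(root.iter()):
        for child in list(parent):
            if local_name(child.tag) not in MEDIA_TAGS:
                continue
            index = list(parent).index(child)
            if child.tail:
                # Text after the removed element is re-attached to what precedes it.
                if index == 0:
                    parent.text = (parent.text or "") + child.tail
                else:
                    previous = parent[index - 1]
                    previous.tail = (previous.tail or "") + child.tail
            parent.remove(child)
    return root


doc = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<p>Hello <img src="a.png" alt="x"/>world</p>'
    '</body></html>'
)
root = strip_media(ET.fromstring(doc))
print("".join(root.itertext()))
```

Only the elements themselves are dropped. The surrounding text stays in reading order, which is what a voice rendering needs.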
XHTML in place of HTML
HTML pages are often not well formed, and HTML itself is not XML compliant, so we are going to adopt XHTML as our standard. Converting HTML to XHTML is not difficult: you can simply run tidy (tidy -asxml page.html > page.xhtml) and apply the conversion rules to the resulting page.
What’s on the webpage?
You can consider that a page has three basic elements: title, menu and content. Keep in mind, though, that some pages have no title, or no menu. This is a very simplified view, but it will help us in the conversion process.
A page can have several titles. The most important is certainly the page title, if available; next we look at the H1 tags inside the page. Failing that, for instance when the headings inside the page relate to sub-content and not to the page as a whole, we can use the URL as the title of the page.
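The fallback order for the title (page title, then H1, then URL) can be sketched in a few lines of Python. This is only an illustration: the function name pick_title is made up for the example, and it assumes the input is well-formed XHTML in the standard XHTML namespace, as tidy produces.

```python
import xml.etree.ElementTree as ET

# Namespace used by XHTML documents.
NS = {"x": "http://www.w3.org/1999/xhtml"}


def clean(element):
    """Collect the text inside an element and collapse runs of whitespace."""
    return " ".join("".join(element.itertext()).split())


def pick_title(xhtml, url):
    """Return the page title, else the first non-empty H1, else the URL."""
    root = ET.fromstring(xhtml)
    title = root.find("x:head/x:title", NS)
    if title is not None and clean(title):
        return clean(title)
    for heading in root.findall(".//x:h1", NS):
        if clean(heading):
            return clean(heading)
    return url


page = (
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    '<head><title> My  page </title></head>'
    '<body><h1>Heading</h1></body></html>'
)
print(pick_title(page, "http://example.com/"))
```

Deciding whether an H1 describes the whole page or only part of it is not attempted here. The sketch simply takes the first non-empty one.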
Detecting the menu is, in my opinion, the most difficult task, especially if a page has many menus and links: which one should be considered the main menu? There are ways to approach this problem; they might not always work, but they can at least provide a way to describe where you can go inside the website.
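To give an idea of what such an approach could look like, here is one possible heuristic, offered purely as an assumption and not as a settled rule: treat as the main menu the longest list (ul or ol) in which every item contains a link as a direct child. The function name guess_main_menu is hypothetical, and the input is again assumed to be well-formed XHTML.

```python
import xml.etree.ElementTree as ET

NS = {"x": "http://www.w3.org/1999/xhtml"}


def clean(element):
    """Collect the text inside an element and collapse runs of whitespace."""
    return " ".join("".join(element.itertext()).split())


def guess_main_menu(xhtml):
    """Return (label, href) pairs for the list that looks most like a menu.

    Heuristic (an assumption): the longest list whose items all hold a link.
    """
    root = ET.fromstring(xhtml)
    best = []
    for lst in root.findall(".//x:ul", NS) + root.findall(".//x:ol", NS):
        items = lst.findall("x:li", NS)
        links = []
        for item in items:
            anchor = item.find("x:a[@href]", NS)  # link that is a direct child
            if anchor is not None:
                links.append((clean(anchor), anchor.get("href")))
        # Keep only lists made entirely of links, and prefer the longest one.
        if items and len(links) == len(items) and len(links) > len(best):
            best = links
    return best


page = (
    '<html xmlns="http://www.w3.org/1999/xhtml"><body>'
    '<ul><li><a href="/">Home</a></li><li><a href="/news">News</a></li></ul>'
    '<p>Some content</p>'
    '</body></html>'
)
for label, href in guess_main_menu(page):
    print(label, href)
```

Pages that build their menu from tables or div elements would defeat this heuristic, so an empty result should be treated as "no menu found" and not as an error.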
Then comes the page content, which should be the rest of the page. How to present it and make it readable to the end user is another story.
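To make the title / menu / content split more concrete, the sketch below shows one way the three parts might be laid out as a VoiceXML document: the title and paragraphs become prompts in a form, and the menu becomes a VoiceXML menu with one choice per link. The function name page_to_vxml, the ids and the prompt wording are assumptions made for the example. It also assumes that each link already points to a converted VoiceXML page.

```python
import xml.etree.ElementTree as ET

VXML_NS = "http://www.w3.org/2001/vxml"


def page_to_vxml(title, menu, paragraphs):
    """Build a small VoiceXML document from a title, a menu and text paragraphs.

    menu is a list of (label, target) pairs; targets are assumed to be
    VoiceXML pages that were converted beforehand.
    """
    vxml = ET.Element("vxml", {"version": "2.1", "xmlns": VXML_NS})
    form = ET.SubElement(vxml, "form", {"id": "page"})
    block = ET.SubElement(form, "block")
    # Read the title first, then each paragraph of content.
    for text in [title, *paragraphs]:
        ET.SubElement(block, "prompt").text = text
    if menu:
        # After the content, hand over to the menu so the caller can navigate.
        ET.SubElement(block, "goto", {"next": "#main_menu"})
        menu_el = ET.SubElement(vxml, "menu", {"id": "main_menu"})
        prompt = ET.SubElement(menu_el, "prompt")
        prompt.text = "Where do you want to go? "
        ET.SubElement(prompt, "enumerate")  # reads out the available choices
        for label, target in menu:
            ET.SubElement(menu_el, "choice", {"next": target}).text = label
    return ET.tostring(vxml, encoding="unicode")


print(page_to_vxml(
    "My page",
    [("Home", "home.vxml"), ("News", "news.vxml")],
    ["First paragraph of the page."],
))
```

Reading every paragraph in sequence is the simplest possible presentation. It says nothing yet about how to let the caller skip straight to the part they want.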
Conclusion
In this first article about HTML/XHTML conversion to VoiceXML, I wanted to explore some aspects of a web page and how we can convert it easily into a voice-enabled application. Of course, there are many voice browsers, mostly DOM-based, that read content aloud for people with disabilities. That approach is not difficult to implement if we want to, but I find it impractical for telephony applications.
That’s why, in the next article, we’ll look in more detail at XHTML pages and at how we can simplify them and make them easy to read through VoiceXML browsers. This will save us from reading the whole page and let us go straight to the content we are looking for.