History
transmachina's technology development began in 1999 as a final project for an undergraduate computer science degree. The original prototype was a simple rule-based, top-down parsing system and covered English, French and Spanish.
From the early days, the goal was to build an abstract, unilateral system that could scale linearly as new languages and vocabulary were added. Sun's Java programming language was chosen as the development platform because of its scalability, industry support and ease of integration into enterprise applications.
A major problem was finding accurate, multilingual semantic data. The lack of such data severely limits the ability of language technology applications to process complex, real-life text.
As a result, significant effort has gone into the development of a flexible, comprehensive, network-based language database. Its creation was made possible by Princeton University's WordNet database, which has provided the skeleton of our new database. Also important was the availability of Wikipedia's content data in XML format, which has been imported and analysed by our tools to create an accurate model of different languages. The model includes the most common words and phrases, the phrases that often appear together, and a rich set of noun phrases and their translations.
Development continues, with the models being continuously improved. The results so far can be explored and used via the project site sprawk.com.
