A Machine Learning Approach to Japanese Composition Scoring and the Design of a Writing Aid System
Automatic composition scoring is extremely complex for any language, because natural language is itself a complex system. To evaluate a text written in natural language, we need to consider many dimensions, such as word features, grammatical features, semantic features, and text structure. Even human raters cannot always grade a composition accurately, because different people hold different opinions about the same article. Nevertheless, a composition scoring system can greatly assist language learners by helping them improve through the process of producing written output. Although it is still difficult for machines to evaluate a composition directly at the semantic and pragmatic levels, especially for Japanese, Chinese, and other languages of high-context cultures, a machine can evaluate a passage at the word and grammar levels, which can assist both composition raters and language learners. For foreign language learners in particular, lexical and syntactic content is usually of greater concern. In our experiments, we did the following: 1) We use word segmentation tools and dictionaries to segment an article into words, extract word features, and generate a word-complexity feature for the article; a bag-of-words (BoW) technique is used to extract theme features. 2) We design a Turing-complete automata model and create more than 300 automata for the grammar patterns that appear in the JLPT examination, and use these automata to extract grammar features. 3) We propose a statistical approach for scoring compositions on a specific theme, in which the final score depends on all the writings submitted to the system. 4) We design a grammar hint function for language learners, so that they know which grammar patterns they can currently use.
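To make the grammar-feature and relative-scoring ideas concrete, the following is a minimal sketch and not the system described here. It assumes the composition has already been segmented into tokens, it replaces the automata with a much simpler fixed-sequence pattern matcher, and it uses a percentile rank as one possible way for a score to depend on all submitted writings. The pattern table, the token boundaries, and the function names are all hypothetical.

```python
from bisect import bisect_left, bisect_right

# Hypothetical grammar patterns (not the paper's automata): each pattern is a
# sequence of token sets that must match consecutive tokens.
PATTERNS = {
    "te_iru": [{"て", "で"}, {"いる", "います"}],
    "nakereba_naranai": [{"なければ"}, {"ならない", "なりません"}],
}


def count_pattern(tokens, pattern):
    """Count the positions where the token sequence matches the pattern."""
    n = len(pattern)
    return sum(
        all(tokens[i + j] in pattern[j] for j in range(n))
        for i in range(len(tokens) - n + 1)
    )


def grammar_features(tokens):
    """Map each pattern name to the number of times it occurs in the tokens."""
    return {name: count_pattern(tokens, p) for name, p in PATTERNS.items()}


def distinct_grammar_count(tokens):
    """Number of distinct patterns used at least once (an assumed feature)."""
    return sum(1 for c in grammar_features(tokens).values() if c > 0)


def percentile_score(value, all_values):
    """Percentile rank (0 to 100) of value among all submitted values.

    Ties count as half, so a value equal to every submission scores 50.
    """
    ordered = sorted(all_values)
    below = bisect_left(ordered, value)
    equal = bisect_right(ordered, value) - below
    return 100.0 * (below + 0.5 * equal) / len(ordered)


if __name__ == "__main__":
    # Pre-segmented toy compositions; the segmentation is assumed, not computed.
    submissions = [
        ["本", "を", "読ん", "で", "います"],
        ["行か", "なければ", "ならない", "。", "食べ", "て", "いる"],
        ["今日", "は", "晴れ", "です"],
    ]
    values = [distinct_grammar_count(t) for t in submissions]
    for tokens, v in zip(submissions, values):
        print("".join(tokens), v, round(percentile_score(v, values), 1))
```

Because the score is a rank within the pool, it shifts as new compositions on the same theme are submitted, which is the property the statistical approach relies on.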