Less is More: Summary of Long Instructions is Better for Program Synthesis
Despite the success of large pre-trained language models (LMs) such as Codex, they show below-par performance on larger and more complicated programming questions. We show that LMs benefit from summarized versions of complicated questions. Our findings show that the superfluous information often present in problem descriptions, such as human characters, background stories, and names (which are included to help humans understand a task), does not help models understand the task. To this end, we create a meta-dataset from the frequently used APPS dataset for the program synthesis task. Our meta-dataset consists of human-written and synthesized summaries of the long and complicated programming questions. Experimental results on Codex show that our proposed approach outperforms the baseline by 8.13% in strict accuracy. Our analysis shows that summaries significantly improve performance for introductory-level (9.86%) questions. However, the improvement is small (around 2%) for competitive programming questions, implying scope for future research.
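To make the setup concrete, here is a minimal sketch of a summarize-then-synthesize pipeline together with strict accuracy as defined by the APPS benchmark (a problem counts as solved only if the generated program passes every one of its test cases). The helper `lm_complete` and the summarization prompt are hypothetical stand-ins for a call to a code LM such as Codex, not the paper's actual implementation.

```python
import subprocess

# Illustrative prompt, not the paper's; the idea is to strip narrative
# fluff and keep only the task-relevant specification.
SUMMARIZE_PROMPT = (
    "Summarize the following programming problem, keeping only the "
    "input/output specification and constraints:\n\n"
)

def lm_complete(prompt: str) -> str:
    """Hypothetical stand-in for a code LM call (e.g., Codex).
    Replace with a real API call in practice."""
    raise NotImplementedError

def synthesize(question: str) -> str:
    # Stage 1: summarize the long question to remove superfluous detail.
    summary = lm_complete(SUMMARIZE_PROMPT + question)
    # Stage 2: generate a program conditioned on the summary alone.
    return lm_complete(summary + "\n\n# Python 3 solution:\n")

def strict_accuracy(solutions, test_suites, timeout=4.0):
    """APPS-style strict accuracy: the fraction of problems whose
    generated program passes ALL of its test cases.

    solutions: list of Python source strings, one per problem.
    test_suites: list of [(stdin, expected_stdout), ...] per problem.
    """
    solved = 0
    for src, tests in zip(solutions, test_suites):
        passed = True
        for stdin, expected in tests:
            try:
                result = subprocess.run(
                    ["python3", "-c", src], input=stdin,
                    capture_output=True, text=True, timeout=timeout,
                )
            except subprocess.TimeoutExpired:
                passed = False
                break
            if result.stdout.strip() != expected.strip():
                passed = False
                break
        solved += passed
    return solved / len(solutions)
```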