JiuZhang 2.0: A Unified Chinese Pre-trained Language Model for Multi-task Mathematical Problem Solving

06/19/2023
by Wayne Xin Zhao, et al.

Although pre-trained language models (PLMs) have recently advanced research progress in mathematical reasoning, they are not specifically designed as capable multi-task solvers: they suffer from the high cost of multi-task deployment (one model copy per task) and from inferior performance on complex mathematical problems in practical applications. To address these issues, we propose JiuZhang 2.0, a unified Chinese PLM designed specifically for multi-task mathematical problem solving. Our idea is to maintain a moderate-sized model and employ cross-task knowledge sharing to improve its capacity in a multi-task setting. Specifically, we construct a Mixture-of-Experts (MoE) architecture for modeling mathematical text, so as to capture the mathematical knowledge common across tasks. To optimize the MoE architecture, we design multi-task continual pre-training and multi-task fine-tuning strategies for multi-task adaptation. These training strategies effectively decompose the knowledge in the task data and establish cross-task sharing via expert networks. To further improve the general capacity for solving different complex tasks, we leverage large language models (LLMs) as complementary models that iteratively refine the solutions generated by our PLM, via in-context learning. Extensive experiments demonstrate the effectiveness of our model.
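To make the MoE design concrete, below is a minimal PyTorch sketch of a mixture-of-experts feed-forward layer with top-1 token routing, in the spirit of the architecture described above. The expert count, hidden sizes, and routing rule are illustrative assumptions, not the configuration reported for JiuZhang 2.0.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoEFeedForward(nn.Module):
    """Mixture-of-Experts feed-forward layer with top-1 token routing.

    Illustrative only: the number of experts, hidden sizes, and routing
    strategy are assumptions, not the JiuZhang 2.0 configuration.
    """

    def __init__(self, d_model: int = 768, d_hidden: int = 3072, num_experts: int = 4):
        super().__init__()
        self.experts = nn.ModuleList(
            nn.Sequential(
                nn.Linear(d_model, d_hidden),
                nn.GELU(),
                nn.Linear(d_hidden, d_model),
            )
            for _ in range(num_experts)
        )
        # The router scores each token against every expert.
        self.router = nn.Linear(d_model, num_experts)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model)
        gate_logits = self.router(x)                   # (B, T, num_experts)
        gate_probs = F.softmax(gate_logits, dim=-1)
        top_prob, top_idx = gate_probs.max(dim=-1)     # top-1 routing per token
        out = torch.zeros_like(x)
        for e, expert in enumerate(self.experts):
            mask = top_idx == e                        # tokens routed to expert e
            if mask.any():
                out[mask] = top_prob[mask].unsqueeze(-1) * expert(x[mask])
        return out


if __name__ == "__main__":
    layer = MoEFeedForward()
    tokens = torch.randn(2, 16, 768)   # a toy batch of token embeddings
    print(layer(tokens).shape)         # torch.Size([2, 16, 768])
```

Routing each token to a single expert keeps per-token compute close to that of one dense feed-forward layer while letting different experts specialize, which is the cross-task knowledge-sharing mechanism the abstract refers to.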
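The LLM-based refinement step can likewise be sketched as a simple loop: the PLM produces a draft solution, and an LLM is prompted in-context to revise it until the draft stops changing or a round budget is exhausted. The callables `plm_solve` and `llm_refine`, the prompt wording, and the stopping rule below are hypothetical stand-ins; the paper's actual prompting protocol is not reproduced here.

```python
from typing import Callable

def iterative_refine(
    problem: str,
    plm_solve: Callable[[str], str],        # hypothetical: the fine-tuned PLM
    llm_refine: Callable[[str], str],       # hypothetical: an LLM called via in-context prompting
    max_rounds: int = 3,
) -> str:
    """Iteratively refine the PLM's draft solution with a complementary LLM."""
    solution = plm_solve(problem)
    for _ in range(max_rounds):
        # Assumed prompt format; the paper's actual in-context template may differ.
        prompt = (
            "Problem:\n" + problem
            + "\n\nDraft solution:\n" + solution
            + "\n\nRevise the draft, correcting any mathematical errors:"
        )
        revised = llm_refine(prompt)
        if revised.strip() == solution.strip():
            break  # converged: the LLM made no further changes
        solution = revised
    return solution
```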

