What's in a Name? Evaluating Assembly-Part Semantic Knowledge in Language Models through User-Provided Names in CAD Files

by   Peter Meltzer, et al.

Semantic knowledge of part-part and part-whole relationships in assemblies is useful for a variety of tasks from searching design repositories to the construction of engineering knowledge bases. In this work we propose that the natural language names designers use in Computer Aided Design (CAD) software are a valuable source of such knowledge, and that Large Language Models (LLMs) contain useful domain-specific information for working with this data as well as other CAD and engineering-related tasks. In particular we extract and clean a large corpus of natural language part, feature and document names and use this to quantitatively demonstrate that a pre-trained language model can outperform numerous benchmarks on three self-supervised tasks, without ever having seen this data before. Moreover, we show that fine-tuning on the text data corpus further boosts the performance on all tasks, thus demonstrating the value of the text data which until now has been largely ignored. We also identify key limitations to using LLMs with text data alone, and our findings provide a strong motivation for further work into multi-modal text-geometry models. To aid and encourage further work in this area we make all our data and code publicly available.


SKILL: Structured Knowledge Infusion for Large Language Models

Large language models (LLMs) have demonstrated human-level performance o...

Exploring the Curious Case of Code Prompts

Recent work has shown that prompting language models with code-like repr...

Intelligent requirements engineering from natural language and their chaining toward CAD models

This paper assumes that design language plays an important role in how d...

Improving the Out-Of-Distribution Generalization Capability of Language Models: Counterfactually-Augmented Data is not Enough

Counterfactually-Augmented Data (CAD) has the potential to improve langu...

Text Encoders Lack Knowledge: Leveraging Generative LLMs for Domain-Specific Semantic Textual Similarity

Amidst the sharp rise in the evaluation of large language models (LLMs) ...

ImPaKT: A Dataset for Open-Schema Knowledge Base Construction

Large language models have ushered in a golden age of semantic parsing. ...

LMPriors: Pre-Trained Language Models as Task-Specific Priors

Particularly in low-data regimes, an outstanding challenge in machine le...

Please sign up or login with your details

Forgot password? Click here to reset