Current large language models have shown an inability to provide age-appropriate answers to students across the K-12 spectrum when given limited context about the student. This has consequences for applications in the education sector, where these technologies may be used as chatbots that assist with learning and help alleviate stressors in the industry. This report addresses the limitation by using Direct Preference Optimisation (DPO) to fine-tune two model architectures on a novel dataset of preference pairs. The dataset contains questions annotated with the student’s grade, drawn from the K-12 range, and has been evaluated for suitability by an independent teacher. The optimised models showed a significant improvement in providing age-appropriate responses when assessed by a separate, independent primary school teacher and a graduate student. The study highlighted difficulties in learning age-appropriate answers to questions from middle school students and linked these difficulties to a lack of understanding of the student’s exact knowledge level. The models were also compared with the untrained base model on common problem-solving benchmarks. The optimised models performed on par with the base model, demonstrating that improvements in age-appropriate responses do not sacrifice the models’ problem-solving capabilities.
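To make the idea of fine-tuning on preference pairs more concrete, the following is a minimal sketch of the standard Direct Preference Optimisation loss for a single pair, not the report's actual training code. The example pair, its field names (`prompt`, `chosen`, `rejected`, a layout commonly used for preference data), the log-probability values, and the `beta` value are all assumptions made for illustration and are not taken from the report's dataset.

```python
import math

# Hypothetical preference pair: the grade is supplied as context in the prompt.
# The wording is invented for illustration and is not from the report's dataset.
example_pair = {
    "prompt": "A grade 2 student asks: Why is the sky blue?",
    "chosen": "Sunlight is made of many colours. The air spreads the blue light around the most, so the sky looks blue.",
    "rejected": "Rayleigh scattering intensity is inversely proportional to the fourth power of the wavelength.",
}

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair.

    Each argument is the summed log-probability of a full response under
    either the model being trained (policy) or the frozen reference model.
    beta=0.1 is an assumed value, not one reported by the study.
    """
    # How much more the policy favours the chosen answer over the rejected
    # one, relative to the reference model.
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    z = beta * margin
    # -log(sigmoid(z)) written in a numerically stable form.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))

# Made-up log-probabilities: the policy has shifted towards the chosen answer.
loss = dpo_loss(-12.0, -15.0, -13.0, -14.0)
```

The loss equals log 2 when the policy and reference models agree, and it falls as the policy assigns relatively more probability to the preferred (here, more age-appropriate) answer than the reference model does.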
For more details, you can read the full report here: Preference modelling of language models for age-appropriate responses to questions in K-12 environments (PDF)