The Psychometric Foundation of the TOEFL® Essentials™ Test

By Venessa Manna

The TOEFL® Essentials™ test was developed to provide valid and reliable information about a test taker’s English-language proficiency using a design that is targeted to their proficiency level, with a format that is friendly and engaging, and that requires a brief test-taking time. The psychometric foundation and approach used to measure test takers’ English-language proficiency levels are critical to ensuring accurate measurement of a test taker’s English-language skills. A more comprehensive discussion of the psychometrics underlying the TOEFL Essentials test is provided in the Design Framework for the TOEFL® Essentials Test 2021.

Using a multistage adaptive approach to measurement

To provide high-quality, precise and efficient measurement across a broad range of language proficiency levels, the Listening, Reading, and Writing sections of the TOEFL Essentials test are designed as multistage adaptive tests (MSTs). That is, each of these sections consists of two parts (stages), and performance on the first part determines the tasks delivered in the second part. A key advantage of the MST approach is that it allows for targeted assembly of test content combined with rigorous psychometric review and review by expert assessment specialists of all sections before administration. In addition, this approach facilitates efficient incorporation of the task-based design seen across all tests in the TOEFL® Family of Assessments, whereby test tasks reflect those that test takers are likely to encounter when using English in academic and daily-life contexts.

Application of MST in the TOEFL Essentials test

The MST for each section of the TOEFL Essentials test, with the exception of the Speaking section, consists of two parts. The first part, or stage, delivered to test takers consists of tasks considered to be of average difficulty, with the second part containing tasks that are at a level of difficulty that “adapts” according to performance on the first part. For example, if a test taker performs very well on the first part of the Listening section, the second part of the Listening section delivered to the test taker will be at a higher level of difficulty.

The content in the second stage of the Listening and Reading sections is classified into three levels of difficulty (low, medium and high). For the Writing section, the second stage is classified into two levels of difficulty (low and medium/high). Test tasks in the medium/high difficulty second stage are designed to be accessible to individuals across a broad range of proficiency, and their scoring rubrics differentiate between medium and high proficiency levels.
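The two-stage routing described above can be illustrated with a minimal sketch. The section names and second-stage difficulty levels come from the text; everything else is an assumption made for illustration only. In particular, the function name `route_second_stage`, the use of a proportion-correct score for the first stage, and the equal-width cut points are hypothetical and do not represent the operational routing rules of the TOEFL Essentials test.

```python
# Second-stage difficulty levels per adaptive section, as described in the text.
# The Speaking section is linear (nonadaptive), so it has no entry here.
SECOND_STAGE_LEVELS = {
    "Listening": ("low", "medium", "high"),
    "Reading": ("low", "medium", "high"),
    "Writing": ("low", "medium/high"),
}


def route_second_stage(section: str, stage1_proportion: float) -> str:
    """Choose a second-stage difficulty level from first-stage performance.

    ASSUMPTION: performance is summarized as a proportion in [0, 1] and the
    cut points split that range into equal-width bands. The real routing
    rules are not given in the source.
    """
    if section not in SECOND_STAGE_LEVELS:
        raise ValueError(f"{section!r} is not an adaptive section")
    if not 0.0 <= stage1_proportion <= 1.0:
        raise ValueError("stage1_proportion must be between 0 and 1")

    levels = SECOND_STAGE_LEVELS[section]
    # Map the proportion onto a band index; a perfect score stays in the top band.
    index = min(int(stage1_proportion * len(levels)), len(levels) - 1)
    return levels[index]


if __name__ == "__main__":
    for section, score in [("Listening", 0.9), ("Reading", 0.5), ("Writing", 0.4)]:
        print(section, score, "->", route_second_stage(section, score))
```

Under these assumed cut points, a strong first-stage Listening performance is routed to the high-difficulty second stage, which mirrors the Listening example given earlier.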

In contrast to the Listening, Reading and Writing sections, the Speaking section employs a nonadaptive, or linear, approach in which all test takers receive the same test questions for the entire section. The test tasks are designed to be accessible across a wide range of proficiency and give the test taker many opportunities to demonstrate their speaking proficiency. A range of difficulty combined with multiple measurement opportunities makes it possible to cover the full range of language proficiency without the need for separate stages.

Employing innovative psychometric and statistical methods

For each test section, both established and innovative psychometric and statistical methods are employed to help ensure consistency of test difficulty and comparability of scores across test versions. On each of the four test sections, test takers can receive scores that range from 1 to 12. The scoring for the Listening, Reading, and Writing sections takes into consideration performance across the two parts, as well as the difficulty level of the second part delivered. Scores for the Speaking section are based on overall performance on all tasks.

It is important to note that all aspects of these TOEFL Essentials test designs are guided by the rich legacy of ETS research and development in the field of psychometrics and language testing, including innovation in design and implementation of MST for large-scale assessments. As with all ETS assessments, the ETS Standards for Quality and Fairness and extensive pilot testing with a diverse range of test takers informed the details and implementation of the final design to best meet the needs of test takers and score users, while maintaining the psychometric integrity of the test scores. Statistical monitoring and research studies will be ongoing postlaunch to support the continued validity and reliability of test scores.

To learn more about the TOEFL Essentials test, visit https://www.ets.org/s/toefl-essentials/score-users/

Venessa Manna is Executive Director of Psychometric Analysis and Research (PAR) at ETS.