The workshops will take place on the 13th of November from 09:00 to 16:30.
To attend a workshop you must select it during the registration process.
Is Assessment Fair?
Isabel Nisbet and Stuart Shaw
Fairness in assessment is both complex and contentious. Assessment experts may disagree on whether scores from a particular testing program are fair. However, most assessment experts agree that fairness is a fundamental aspect of validity. As a consequence, fairness has been elevated to a greater position of prominence in the assessment literature, so much so that it is now considered one of the three primary measurement standards that must be met to legitimise a proposed test (the other two being validity and reliability). In this workshop, we will distinguish between different uses of “fair” which have relevance to assessment. We will then identify some of the “lenses” used to examine fairness in assessment (measurement, legal, social justice and philosophy) and suggest a framework of questions which can be applied to each lens. Each approach will be subject to a common set of questions investigating whether there is an established consensus on fairness, whether that consensus should be questioned, what the areas of dispute are, and what the implications are for the other lenses. The research will culminate in a fairness agenda for the 21st century.
IRT in R made easy
Remco Feskens and Jesse Koops
Item Response Theory (IRT) is a general statistical theory about item and test performance and how that performance relates to the abilities measured by the items in a test. IRT provides a flexible framework which can be used to obtain comparable ability estimates even when different examinees answered different questions. For this reason, among others, IRT has become the method of choice for many organizations.
In recent years, R has become the standard software platform for data manipulation, analysis and visualization. Many statistical and psychometric functions are available in R, and there are several packages for doing IRT analyses. Unlike, for example, SPSS, R has no standard GUI menus; instead, analyses are carried out by typing commands. This is a hurdle for many analysts, although programming in R is not inherently more complex than clicking buttons in SPSS.
To overcome the programming hurdle, we will start with a gentle introduction to R. After that, we will introduce dexter, an R package which can be used to analyze test data using IRT and Classical Test Theory (CTT) techniques. The theoretical foundations of IRT will be concisely explained, and participants will perform IRT analyses on PISA data and/or their own data.
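The core idea of the abstract above — that IRT links response probabilities to an ability parameter, so that examinees who answered different items can still be placed on a common scale — can be illustrated with a toy sketch of the Rasch (one-parameter logistic) model. This is a minimal illustration in Python rather than R, and it is not part of dexter or the workshop materials; the grid-search estimator is a deliberately simple stand-in for the estimation routines a real package provides.

```python
import math

def rasch_probability(theta, b):
    """Probability of a correct response under the Rasch (1PL) model:
    P(X = 1) = 1 / (1 + exp(-(theta - b))), where theta is ability
    and b is item difficulty."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def log_likelihood(theta, responses, difficulties):
    """Log-likelihood of a 0/1 response vector, using only the items
    this examinee actually took."""
    ll = 0.0
    for x, b in zip(responses, difficulties):
        p = rasch_probability(theta, b)
        ll += math.log(p) if x == 1 else math.log(1.0 - p)
    return ll

def estimate_theta(responses, difficulties):
    """Maximum-likelihood ability estimate by a coarse grid search over
    [-4, 4]. Because the likelihood is built from whichever items were
    answered, examinees who saw different booklets land on one scale."""
    grid = [g / 100 for g in range(-400, 401)]
    return max(grid, key=lambda t: log_likelihood(t, responses, difficulties))

# An examinee whose ability matches the item difficulty has a 50%
# chance of answering correctly; higher ability raises that chance.
p_equal = rasch_probability(theta=0.0, b=0.0)   # 0.5
theta_hat = estimate_theta([1, 1, 0], [-1.0, 0.0, 1.0])
```

In this sketch the estimate for two correct answers out of three items of difficulties -1, 0 and 1 lands near 0.8, illustrating how the pattern of successes on known-difficulty items, not the raw total alone, determines the ability estimate.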
Innovative on-screen assessment—exploring item types and paper-to-digital transition
Caroline Jongkamp and Rebecca Hamer
Since the introduction of computer-based assessment (CBA), significant progress has been made in the development of constrained or closed response CBA items. However, the development and implementation of highly interactive, dynamic assessment items aimed at assessing complex thinking skills appears challenging. In the previous AEA SIG workshop, participants explored two models classifying digital items on design characteristics. These models helped participants understand existing options but lacked the link between item type and the assessment of complex thinking skills. Using working on-screen test items from a variety of sources, participants will collaboratively explore models to classify existing computer-based assessment items by type and identify links between item types and assessment objectives. The initial transition to digital assessment often involves reformatting an existing paper-based assessment (PBA) item into a digital format, a transition often presenting its own significant challenges. Participants will design one or more items for a digital environment aligned to their own field of work, experiencing the variety of choices involved in designing digital items. The SIG E-Assessment pre-conference workshop will interest anyone currently involved in preparing for the digital migration of PBA and those interested in the development of CBA items for assessing complex thinking skills.
Introduction to Differential Item Functioning (DIF) analysis with R and ShinyItemAnalysis
Patrícia Martinková and Adéla Hladká (née Drabinová)
Differential Item Functioning (DIF) analysis is an analytic method useful for identifying potentially biased items in assessments. Because simply comparing two groups’ total scores can lead to incorrect conclusions about test fairness, many DIF detection methods have been proposed, both those based on total scores and those based on Item Response Theory (IRT) models (Martinková, Drabinová et al., 2017).
The workshop will offer an introduction to differential item functioning (DIF) detection from a practical point of view. We will introduce the most widely used DIF detection methods, with their pros and cons, and we will focus on their application in practice using real data examples. The free statistical software R and its packages difNLR, difR, deltaPlotR, and mirt will be used throughout the sessions. The ShinyItemAnalysis package will provide an interactive, user-friendly interface helpful for those who are new to R.
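One classical total-score-based DIF method mentioned in the literature the abstract cites is the Mantel-Haenszel procedure (implemented, among others, in the difR package). A minimal Python sketch, not part of the workshop materials, shows the core computation: examinees are stratified by total score, and a common odds ratio compares reference- and focal-group success on one item within each stratum.

```python
import math
from collections import defaultdict

def mantel_haenszel_dif(scores, groups, item_responses):
    """Mantel-Haenszel common odds ratio (alpha) for a single item.
    Examinees are stratified by total test score; `groups` holds
    'ref' or 'focal', `item_responses` holds 0/1 item scores.
    alpha near 1 suggests no DIF; the second return value is the
    ETS delta-scale statistic, -2.35 * ln(alpha)."""
    # Per stratum: A = ref correct, B = ref incorrect,
    #              C = focal correct, D = focal incorrect.
    strata = defaultdict(lambda: {'A': 0, 'B': 0, 'C': 0, 'D': 0})
    for s, g, x in zip(scores, groups, item_responses):
        cell = strata[s]
        if g == 'ref':
            cell['A' if x else 'B'] += 1
        else:
            cell['C' if x else 'D'] += 1
    num = den = 0.0
    for cell in strata.values():
        n = cell['A'] + cell['B'] + cell['C'] + cell['D']
        if n == 0:
            continue
        num += cell['A'] * cell['D'] / n
        den += cell['B'] * cell['C'] / n
    alpha = num / den
    return alpha, -2.35 * math.log(alpha)

# Balanced toy data: both groups succeed at the same rate within
# the (single) score stratum, so alpha should be 1 and delta 0.
scores = [5] * 40
groups = ['ref'] * 20 + ['focal'] * 20
responses = [1] * 10 + [0] * 10 + [1] * 10 + [0] * 10
alpha, delta = mantel_haenszel_dif(scores, groups, responses)
```

Stratifying by total score is what distinguishes this from a naive comparison of group means: groups are compared only among examinees of comparable overall proficiency.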
Developing selected response test items
The purpose of this workshop is to present and discuss guidance on developing selected response (SR) items. The guidance is based on a synthesis of available research literature on SR item writing, relevant aspects of cognitive psychology (including models of language comprehension and working memory capacity) and the presenter’s own experience of high-stakes test development across primary and secondary education in the UK.
Guidelines will focus on a range of issues including language accessibility, the central role played by distractors in affecting the difficulty and validity of SR items, unintentional cues that can betray the correct answers, and the assessment of higher-order skills. While most SR guidelines and research are based specifically on conventional multiple choice questions, this workshop will address the full range of SR item types. Structural differences between different SR item types can influence their difficulty and proneness to different validity issues.
The workshop will also consider the use of SR e-assessment item types, the use of SR items in diagnostic assessments, and how evidence from item trialling can be used to identify problematic items, in particular through distractor analysis.