Some Excerpts from and Comments on The Andrich Report

by Steve Kessell and an anonymous helper

"The key recommendation in this report is that for both school based and external assessments, analytic marking of the traditional kind using marking keys that arise directly out of the assessment tasks be used for student assessment for each course unit, and for each course as a whole at the end of Year 12. A related recommendation is that, simultaneously, a rating of student performance into the five generic levels of achievement that arise out of the standards of the outcome statements of a course be used as part of the assessment. The former provides marks for the assessment and measurement of students at a relatively micro level suitable for constructing a tertiary entrance rank according to the policies of the Curriculum Council; the latter provides ratings for classification at a relatively macro level suitable for monitoring the general progress of students and the operation of a course. The two assessment processes, distinguished by their level of precision and relevance, are complementary and can be combined and integrated. By taking advantage of this complementarity, the Curriculum Council can genuinely advance the communication of educational achievement in Western Australia".

COMMENT:

The point is that outcome descriptions and levels are not suitable for direct assessment, full stop. [Note the references below to traditional marking for feedback to students as well as tertiary entrance. This is saying that there is very limited use for coarse and 'generic' levels.]

On the inherent arbitrariness and limitations of levels, Andrich says:

"Clearly, in any assessment and measurement, there is a need to have both consistency and a high enough level of precision for the task at hand. In particular, in some aspects of monitoring progress associated with OBE, there is a premium placed on teachers classifying students into levels. However, these classifications have inherent elements of uncertainty which may be too great for certain purposes."

COMMENT: Like calculating a TER that is essentially meaningless.

"The learning areas and courses of study do not in themselves have any particular reasons for being in exactly 8 levels, nor to have outcomes that are from 2 to 4 in number, with a majority having 4. There must be administrative and organisational reasons for imposing these constraints, and can be useful in these terms. However they are not inherent to the courses of study. Furthermore, because they are organisational and administrative structures at a very general and abstract level, they cannot determine every form of assessment for every purpose."

On why "ratings relative to levels" [BANDS] are totally invalid and should be abolished:

"Ratings relative to levels

"One of the proposals for the school based assessment that is intended to relate the levels to the performances and attempt to give the required precision is to have a student performance assessed against four outcomes and then rated against levels into three further categories. Specifically, a student rated at level 6 for example, would be further rated into sublevels of such as 6-, 6, 6+ which would be given ratings of 6.2, 6.5 and 6.8 respectively. Similarly, a student might be rated at 5 and then rated further at 5.2, 5.5 or 5.8, and so on. Then the assessments on say two aspects from an outcome may be 6.2 and 5.8, and these would be averaged to 6.00 to two decimal places to provide an assessment for the outcome. This would be the basis for the school based assessments that would be provided to the Curriculum Council to be integrated with the external assessments.

"Clearly, and consistently with Recommendation 8, I am recommending strongly against this process for all courses for the following specific and additional reasons. First, as indicated already in several places, the levels are generic, that is, general and abstract, and cover a wide range of achievement, both in breadth and in the range between levels. Therefore, the tasks that are provided for assessment will not fall naturally into the 3 sub-levels any more than they will fall naturally into levels on the various aspects that are assessed, or the outcomes to be assessed on different tasks. Further, even if two different tasks assess the same outcome, there is no guarantee that the evidence of the achievement of the outcome, and the degree of its achievement, will be the same.

"Second, and very importantly, this kind of approach gives the impression that the distance between levels is the same in some sense – that is, that the difference in achievement between levels 5 and 6 is the same as between levels 6 and 7. As revealed... this is not the case... It would be unfortunate if the impression is given that these qualitative distinctions between levels, which are ordinal, are somehow measurements which are equidistant. The impression that the numerical scores given to assessments do not need to be transformed to implement policies should not be promoted. Teacher assessments need to be internally consistent and valid, but need not be in particular measurement units to the degree necessary for tertiary selection.

"Third, the issue is compounded by the application of the same generic values across outcomes within a course and between the courses. It was noted already that there is an intention to make the levels across outcomes and across courses of the same order of intellectual demand. This is justifiable at the organisational level of courses as a general working framework for various purposes of teaching and learning, but it would be misleading to suggest that this can be achieved at the measurement level, especially at a precise enough level for tertiary selection and detailed student feedback. It would perpetuate a misunderstanding about the use of numbers in educational assessment and measurement."

"Fourth, the generic descriptors seem to be for communication and understanding amongst teachers and experts in the field. They seem not to be the best descriptors for communicating with students, parents and the community. I believe this is the source of some of the unfortunate press – that the formal language used within the profession for its own communication, is considered the language for communicating with students, parents and the community. For example, the descriptors [of levels] seem not ideal for communicating with students without further operationalisation as would be carried out with analytic marking keys. Further, because of the uncertainties described earlier, attempts to assess directly against outcomes for purposes of precise assessment can be inordinately time consuming..."

"There seems not to be general conventions in the history of the areas of learning that has these in any levels, let alone every area into exactly 8 levels..."

"The problem of the arbitrariness of levels becomes greater when considering the aspects of outcomes. As the level of assessment becomes finer and incorporates more aspects, the idea that the same number of arbitrary levels can be applied to every aspect of an assessment task becomes less and less tenable. If the levels are arbitrary and do not match aspects of a task, then markers will have to distinguish amongst other aspects of the task and make an artificial classification, which in turn will lead to artificial consistency."


COMMENT:

In other words, bands are meaningless at best and misleading at worst.
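To make the "equidistant" objection concrete, here is a small sketch in Python. It is an illustration only, not anything taken from the report. The PROPOSED codes are the sublevel values quoted above (n-, n, n+ coded as n.2, n.5, n.8). The ALTERNATIVE coding and the two students are hypothetical: the alternative keeps exactly the same rank order of ratings but assumes, purely for the sake of argument, a larger gap between level 5 and level 6.

```python
from decimal import Decimal

# Sublevel codes as described in the quoted proposal: n-, n, n+ -> n.2, n.5, n.8.
PROPOSED = {
    "5-": Decimal("5.2"), "5": Decimal("5.5"), "5+": Decimal("5.8"),
    "6-": Decimal("6.2"), "6": Decimal("6.5"), "6+": Decimal("6.8"),
}

# Hypothetical alternative coding (an assumption for illustration only):
# the rank order of the six ratings is unchanged, but level 6 is placed
# further from level 5 and its sublevels are closer together.
ALTERNATIVE = {
    "5-": Decimal("5.2"), "5": Decimal("5.5"), "5+": Decimal("5.8"),
    "6-": Decimal("7.0"), "6": Decimal("7.1"), "6+": Decimal("7.2"),
}

def mean_score(ratings, coding):
    """Average the numeric codes assigned to a list of ordinal ratings."""
    return sum(coding[r] for r in ratings) / len(ratings)

# Two hypothetical students, each rated on two aspects of one outcome.
student_x = ["5+", "6-"]
student_y = ["5", "6+"]

for name, coding in (("proposed", PROPOSED), ("alternative", ALTERNATIVE)):
    x = mean_score(student_x, coding)
    y = mean_score(student_y, coding)
    ahead = "Y" if y > x else "X"
    print(f"{name}: X={x} Y={y} -> student {ahead} ranks higher")
```

Under the proposed codes student X averages 6.0 (the 6.2 and 5.8 case from the quotation) and student Y averages 6.15, so Y is ahead. Under the alternative codes X averages 6.4 and Y averages 6.35, so X is ahead. Nothing about the ratings themselves changed; only the assumed spacing did. That is the sense in which averaging ordinal levels manufactures a ranking out of an arbitrary choice of numbers.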

Following from the above, a key theme is that:

"consistency of classification can be achieved readily at the expense of precision of measurement."

The point above is that the levels are extremely broad, and even if consistency is achieved, it is at the expense of precision.

For example, a number of people could easily agree on the heights of children to the nearest metre. However, this consistency is achieved at the expense of precision. There will be much less agreement if judgements are made in centimetres, but much greater precision. The same applies with levels. Even if consistency is achieved, there is no precision. This is consistent with what many have noted -- there are only eight levels for 12 years of schooling. How can these possibly be useful? They're too crude. And if you measure in metres, and then attempt to report a TER to two decimal places, you are now doing the equivalent of converting from "estimated metres" to TENTHS of a MM!
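The height analogy can be sketched in a few lines of Python. The height estimates below are invented for illustration (two hypothetical raters, six hypothetical children); the only point is to show how agreement and precision trade off as the unit changes.

```python
# Hypothetical height estimates, in centimetres, made by two raters
# for the same six children. The numbers are invented for illustration.
rater_a = [112, 125, 131, 140, 118, 136]
rater_b = [114, 124, 133, 139, 118, 138]

def agreement(a, b, unit_cm):
    """Fraction of children on whom both raters agree after rounding
    each estimate to the nearest multiple of unit_cm."""
    rounded_a = [round(x / unit_cm) for x in a]
    rounded_b = [round(x / unit_cm) for x in b]
    matches = sum(1 for x, y in zip(rounded_a, rounded_b) if x == y)
    return matches / len(a)

def distinct_values(a, unit_cm):
    """Number of different reported values, i.e. how many children
    the scale can actually tell apart."""
    return len({round(x / unit_cm) for x in a})

for label, unit in (("nearest metre", 100), ("nearest centimetre", 1)):
    print(label,
          "agreement:", round(agreement(rater_a, rater_b, unit), 2),
          "distinct values:", distinct_values(rater_a, unit))
```

With these made-up numbers the raters agree on every child to the nearest metre, but all six children receive the same value, so the scale distinguishes nobody. To the nearest centimetre the raters agree on only one child in six, yet every child gets a different value. Perfect consistency has been bought with zero precision.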

The following refers to the consequences of arbitrary and crude descriptions for assessment:

"... if the classification system is (a) arbitrary and incommensurate with the task, (b) crude relative to distinctions that markers can perceive, and (c) interference in the assessment of different aspects or outcomes on each other is precipitated in one way or another, then the task of assessment is also potentially difficult for the assessors to carry out .... Artificial consistency is an inevitable refuge for assessors who have difficulty marking the performances and the different aspect of the performances on their merits. It also cannot be overlooked, that in justifying a mark to a student when the mark is generated artificially, becomes extremely difficult."

COMMENT: That is, there is no way you could defend this assessment if challenged.

On measuring assessment:

"Common elementary measurements in every day experience and in the physical sciences have a well defined origin and an arbitrary but well defined unit... This arbitrariness of the origin and unit is understood by children in primary schools and is part of the mathematics curriculum... With the typical measurement of temperature in either Fahrenheit or Centigrade, both the arbitrary origins and the arbitrary units are different. Often measurements in one scale need to be converted to measurements in another scale as in the case of the measurement of mass and temperature where conventional uses of one or the other in different jurisdictions are different.

"In education and the social sciences, measurements are used in a way which approximates the use of measurements in the physical sciences. However, the unit and the origin of most assessments are unique to those assessments - there is no natural origin of zero knowledge for example, and no well defined unit such as a pound or a kilogram for mass for measuring the amount of knowledge in any course of study. Ironically, although measurement in education and social sciences has even more arbitrariness and certainly less conventional agreement on the unit and origin of scale, social measurement seems not to be a topic in any school curriculum. This deficiency tends to persist in university curricula and only in some units within some degrees are they broached. Compounding the irony is that there is a tendency for greater belief in the consistency of origin and units in social measurement than there is in physical measurement where their arbitrariness is made explicit. Cementing the irony is that there are abundant examples of quantification in the social sciences that lend themselves to this study, including of course assessment and measurement of student achievement and the current TER process itself."

COMMENT:

What is being said here is that levels are not only misunderstood as being a scale with a well-defined unit and origin (which they clearly are not), but worse, they're misunderstood as being a scale that has the same unit and origin irrespective of the instrument used for assessment, or even the learning area assessed! That is, there is a bizarre belief that these vague statements manage to do what has not even been possible in physics -- define a scale in a way that makes it possible to measure in a unit that does not depend on the instruments and procedures used to measure!!
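The temperature example in the quotation can be made concrete with a short Python sketch. The conversion formula is the standard one; the function name and the two sample temperatures are chosen here for illustration only.

```python
def celsius_to_fahrenheit(c):
    """Standard conversion: a different unit (9/5) and a different origin (+32)."""
    return c * 9 / 5 + 32

cool, warm = 10, 20  # two sample temperatures in degrees Celsius

ratio_c = warm / cool
ratio_f = celsius_to_fahrenheit(warm) / celsius_to_fahrenheit(cool)

print("Celsius:   ", cool, warm, "ratio", ratio_c)
print("Fahrenheit:", celsius_to_fahrenheit(cool), celsius_to_fahrenheit(warm),
      "ratio", ratio_f)
```

The same two temperatures read 10 and 20 on one scale and 50 and 68 on the other, so "twice as much" on one scale is only 1.36 times as much on the other. In physics the unit and origin are at least agreed, so the conversion is known exactly. For two different assessments no such agreed conversion exists, which is why numbers from different tasks, outcomes or courses cannot simply be treated as if they shared a scale.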

"School based assessments

"As already indicated, for purposes of tertiary entrance selection [his emphasis], analytic assessments should take precedence over outcome level classifications, and the levels assigned to students should be considered relevant at the general levels at which they are specified, that is, in general monitoring of the teaching and learning. They are too crude for refined assessments..."

"The differences from the present organisation of Years 11 and 12 studies seem to arise from the differences at the organisational structure of the courses, and the teaching and learning, not at the level of assessment in general or for tertiary selection in particular. As indicated early in the report, it is not considered that OBE is characterised by the methods of selection that are relevant for tertiary selection in Western Australia."

"The level statements are generic and they should be only used at the same general level to guide the teaching, learning and assessment, and not to make precise assessments at a finer level of scale necessary for other purposes [such as a TER]... Finer precision of measurement cannot be generated from assessments than the precision inherent in the [original] assessments... Marks must arise from the analytic marking of a task, and these marks mapped on to the levels; the process cannot be reversed [his emphasis]."

And if the CC does not "get it right"?

Professor Andrich suggests: "The [university] selection process is sufficiently significant that if it is not accounted for credibly within the schooling framework, and tertiary institutions decide to initiate an independent selection process, then that selection process will inevitably impact even more than the present process does on the teaching and learning in post compulsory education."