Highlights from The Exam Man podcast - Season 2, Episode 4

In early September we spoke to Amanda Spielman, who is very well known in the world of education and has held several very high-profile roles. One that many people will know less well is that she was Chair of Ofqual between 2011 and 2016. So we thought she would be a fantastic person to speak to about her experience and understanding of exams and assessment. We also quizzed her on what she hoped would come out of the upcoming Curriculum and Assessment Review. We deliberately wanted to focus on exams rather than Ofsted, but Ofsted inspections are themselves a form of assessment, so it did lead us there in the end.

We're aware that you've had a very interesting life and career, so we'd love to hear about it - particularly about your experience of moving from finance into education, and what appealed to you about that move.

My name's Amanda Spielman. People probably know me best because I was Ofsted Chief Inspector for seven years, but before that I was also Chair of the Exam Regulator, Ofqual, for more than six years. And before that, I was part of the founding management team at one of the big academy chains, Ark Schools. So, I've been in education for 23 years now. It's actually a much longer career in education than my previous career in finance. I've never regretted that switch I made in my late 30s, to move into education.

Fascinating as Ofsted was, what I learnt at Ofqual about assessment - about its complexities, the opportunities, all the risks and downsides - is, I think, an incredibly interesting and under-discussed aspect of education. There's been a huge amount of discussion in the education world over the last decade about pedagogy, and quite a lot more recently about curriculum - I think I've been part of the reason for that. But assessment remains under-discussed, and it's so important, and that's why I'm really happy to be here talking about it today.

You recently wrote an article for Schools Week about the upcoming Curriculum and Assessment Review, which is due to report in 2025. In it, you set out a number of priorities for the assessment aspect of that review. But I was interested in something you said towards the start, about your experience of the last reforms that took place in assessment. You said that there were strengths and flaws in these - so I was wondering if you could elaborate on that a little for us. What were some of the strengths, and some of the flaws, of the reforms that took place between 2011 and 2016?

The new national curriculum had been put in place in 2012 with a lot of evidence underpinning it - a really serious attempt to crystallise what children should know and be able to do at the end of secondary education. The exam reform programme was trying to translate that faithfully into the new GCSEs, to create coherence for children through secondary education. It was very much subject by subject: there was a subject content group deciding on the meat of each subject, and then a process for designing the assessments. The translation into the classroom - developing the curriculum resources, lesson planning, training teachers - was all picked up after that. So it was very much a sequential undertaking. And therein lies, I think, an opportunity to do things differently.

To make exam qualifications work well in the classroom, you really need a lot of different kinds of expertise brought together from the beginning. You need the curriculum expertise, the people who really understand how to put together a curriculum that makes for a coherent education experience, and the people who understand what you can and can't do with assessment, and how far you can push it to achieve the different purposes. And also, what the trade-offs are going to be between the different assessment purposes.

You need the people who are going to teach it to say: hang on, this piece isn't going to work in practice, this bit is going to be really hard to motivate children through. And writing resources is a time-consuming undertaking that can only be done by people who properly understand the curriculum and its pedagogical implications. So it really does need a range of skills, and there is almost nobody who has all of those within one subject, let alone across the board. So one of the really important takeaways for me, at the completion of the programme, was that next time around it would be better to think about them in a more unified way. Announcing a curriculum and assessment review is a good first step, but you really have to make sure that that is how it's thought about in practice: joining them up, not treating it as a curriculum piece, then an assessment piece, and then turning it into classroom programmes.

I noticed in your article you talked a bit about how assessment can actually be used to generate good curriculum as well. So that's maybe looking at it from the other end.


People get quite fixated on the end outputs: partly the certification for pupils themselves, partly the results that feed into the accountability machine, partly the ranking that helps selective institutions decide who they're going to admit to which courses. And people forget the extent to which the actual taught curriculum is defined by the content and form of those assessments.

You can have a national curriculum that says children are going to be taught a wide range of literary texts over Key Stage 3 and Key Stage 4. But if you then design an assessment that just tests pupils on one book and three poems, then for many children the experience will be being taught one book and three poems. So you really have to think in a cold, hard way, not just about what the enthusiasts will do with a curriculum and assessment, but also about the path of least resistance that hard-pressed teachers - with limited bandwidth, lots of problems to handle from parents, lots of behaviour difficulties, lots of SEND difficulties - will feel pressured to default to over time. Because that's what the enacted curriculum will become.

Well-designed assessment can protect against that by sampling the domain widely enough, and by having sufficient unpredictability that there's a real downside to skipping chunks you think aren't going to be assessed. So you really have to think about the stuff that's uncomfortable, because nobody wants to think that people will take shortcuts; but under pressure, when people have got a lot on their plate from a lot of directions, it's only rational to simplify. I'm not criticising people for doing this at all - I'm saying it's inevitable if an assessment is poorly designed. So it's really important to think about assessment as curriculum definition and protection, as well as something that provides results and grades at the far end.

Talking about types of assessment, what the previous reforms did was to reduce the amount of coursework in academic qualifications and also to move from modular assessment to terminal assessment in GCSEs and A-levels. Are those trade-offs or are they worthwhile things in themselves?

When coursework is put under pressure at the far end, by being used for something high stakes - whether high stakes for a school or for a pupil - it may be that that puts more pressure on it than it can bear. That's compounded, I think, for many kinds of coursework in a world with artificial intelligence. We know from the surveys being done of university students, for example, that an extraordinary number are already using AI to do, or to help them do, pieces of university coursework which count towards degrees.

And at one level people say, they're going to be using it in the adult world, so why does it matter? But it does matter because learning is a cumulative process. If you bypass the intellectual effort of synthesising and drawing out and applying what you're meant to be studying, then you're not actually going to learn it. So you're going to come out with no more knowledge than you had at the outset. And people often point at other systems and say, X country has coursework in its equivalent of A levels. But then you look across at their higher education systems and you will see, for example, that they've got a system where everybody who achieves a basic diploma is eligible to sign up for the university course of their choice. And the real weeding out happens at the end of the first year of university, when if people fail exams they're out. Typically, you find that they've got some kind of highly demanding selection test for the things that are always oversubscribed, like medicine, law etc.

Many, many countries have non-selective first-level admission into university, so there isn't the same pressure. But as things stand, with all the pressure of accountability on schools and all the incentives that AI offers pupils, I think protecting curriculum and teaching, and making sure that the grades that come out at the far end truly reflect real educational achievement, is an impossible challenge for coursework. It's one of those things that looks and feels lovely, and yet it's too hard in practice.

I knew somebody who had been one of the architects of English coursework in GCSEs in his days at the National Strategies, and who was a chair of governors. He said: I was part of the pilot, it worked brilliantly, people produced great work, it was really motivating. And I said: go to the school where you're chair of governors, sit down with the head of English, and really unpack how the coursework is being done, especially for the lower sets, and how much class time through Key Stage 4 is going on doing and redoing and redoing that coursework. He came back to me a few weeks later, having done exactly that, and said: you were right, and I couldn't believe what had happened. We thought we had such a great model, and it turned into something completely different that wasn't valuable for most children.

And there's an extra little wrinkle there, which was that the children whose curriculum, whose teaching, whose experience got most distorted by coursework tended to be those in lower sets. So it was less visible to the kinds of politicians, journalists, whoever, whose children tended to be in the top sets, who did the coursework once or at worst twice, so who didn't experience that stripping out of so much of the education.

That's really interesting, because the argument I hear most against coursework is the one at the top end: that students from middle-class family backgrounds, where the parents are highly educated, can get extra help, so there's a kind of gaming of coursework that can go on. But the argument you've laid out there is actually a more interesting one.

Controlled assessment was an incredible burden on schools, and I actually think it came about through a slight misdiagnosis. The assumption was that the problem was advantaged children getting too much help at home with coursework. But the reality was that in many, many schools, the curriculum had been completely reshaped towards making sure that every child got the intended number of marks on the coursework to hit the target grade. Controlled assessment did nothing to defuse that.

I know many, many teachers absolutely cheered when coursework was removed. For so many of them, it had become an incredible burden. I think that's one of the dangers now: because it was taken out nine years ago, we've got a lot of young teachers who don't really have any sense of quite how it distorted education, and quite how it was used to put incredible pressure on teachers to get the required number of marks on every piece of coursework.

Was the thinking similar with modular exams as well? Because they were happening regularly, there would be an over-focus on each module rather than the broad curriculum?

Yes. When almost every term for about three years becomes an exam term, you've got the curriculum disruption: time switched away from teaching a coherent, cumulative curriculum towards direct preparation for that particular module and what's likely to be tested in it. That's quite disruptive to the flow of secondary education. Also, what we now know is that you don't just get taught something, learn it, remember it and - boom - move on to the next thing. Everything we now know about retrieval practice, partial forgetting and consolidation points to a sustained, cumulative process: first learning and practising something, then drawing on that knowledge again. You keep reusing things, and that's how you consolidate your learning.

And modularisation cuts right across that, because it disrupts what we now know from cognitive science about how we learn and consolidate knowledge so that we can really draw on it throughout our lives. It encouraged learning a topic for a term, answering the questions, dumping it out of your memory and getting on to the next thing. So by the end of two years, if you've done something in four different modules, most of them many months ago and never gone back to, you will simply know a lot less than the person who's done the synoptic course and done internal practice tests in the different elements along the way. They've been taught in a way that draws on the different components, and the accumulation of those components, as they go through, and they expect to be assessed on the totality of the curriculum at the end, not just the most recent slice. It makes for a different learning experience. I think the whole idea came about almost by default, drawing on the idea of college credit in the US: if you can chop things up for college credit, wouldn't it be great to make it possible for everybody to move institutions, move courses, just grab a module and move on? But it was the cart before the horse. It threw out the education baby in pursuit of a secondary idea, a secondary purpose.

Do you think that modularisation prioritises knowledge over skills? If you don't have ongoing modular assessment, you have the opportunity to develop the skills across a long period of time, alongside the knowledge.

That's absolutely right. Totally agree. If you chop things up into little modules, and a single GCSE module was really quite a small number of taught hours, and you have to do your skill development only within what's been taught in that module, that's a much, much more limited base for developing anything. You can do much more coherent skill development off the back of the knowledge content of a broader curriculum, which also just makes the experience more interesting. It gives teachers much more scope to design genuinely interesting tasks that give students the chance to do things that they're likely to find interesting in themselves as well as educationally valuable.

You talk in your article about the different purposes of different assessments. Could you explain a little about that - in particular, about assessments that are used for accountability versus those used for pupil certification?


Let's take a GCSE, for example. It's a wide-range assessment. It's designed partly to say, in due course, to employers what somebody knows and can do. But the more immediate use for most pupils is to signal to FE colleges and sixth forms, for their next courses, whereabouts in the distribution they sit - whether they're likely to be able to cope with a particular level of course, and particular course content, from age 16. Are they going to be doing broadly an A level course, a T level course, another applied general level 3 course, a level 2 course, or an entry level or level 1 course? Broadly, there's a GCSE grade average that pretty much corresponds with each of those thresholds.

That's a really important function. For that to work, GCSEs have got to give a reasonably reliable indication across quite a wide grade range. At the very top end there are people who are clearly going to be able to cope with A levels; we probably don't need a huge amount of precision between As and A stars there. At the other end, there are people who are clearly going to be heading into an entry level or level 1 course. But in between, it is important - and only fair, if course places are rationed - to make sure that you've got something that is reasonably good at doing that sorting.

To take another kind of test, let's say a selection test for a grammar school - an unfashionable topic, but an important one to understand. Unlike a GCSE, you don't need to construct a test that is valid and reliable across the entire range. If your test aims to select about the top 25 percent of achievers, then you're not very worried about discriminating among the top 15 percent of the population, and you're not very worried about differentiating among the bottom two-thirds, because you know the bottom two-thirds are not in scope and the top 15 percent are definite. So really, you're looking to assess reliably across about 15 percent of the range - roughly five to 10 percent either side of the cut-off - so that you make good decisions at the margin. So you don't put in really easy and really, really hard test items that won't help you differentiate in that range, because they would just take up time. You'd make children spend more time sitting, either doing questions that were really easy for them, or struggling with questions that were really hard, without getting better information about who to admit and who not to. So technically, you design something completely different.

Another kind of test is a competence test, which is much more common in vocational education, or something like a driving test. It’s particularly relevant to any sort of vocational assessment where you're giving people the power to kill people or to harm people, as for example, if you're certifying people as electricians. The kinds of assessment you design for those are not designed to assess people across a wide range in saying this is a D grade electrician and this is a B grade electrician and this is an A star electrician. They are designed to say, is this person competent to go out and fit electrical stuff or not?


So whereas your GCSE can simply sample across the domain, your test for an electrician has to test them on every skill, make sure they're sound on every skill, and allow only a small margin for things that can go wrong on the day - because otherwise you will accidentally sign people off as competent to do electrical things that can be dangerous if done wrongly. So the way you decide what to test from the whole syllabus or specification, the kinds of questions you set, how much they range from easy to hard or focus on a certain point in the middle, and whether you set a pass mark or threshold - all of those are completely driven by the purpose you're assessing for.

And accountability tests are different again, and a lot of people don't understand this. They're different because you're not trying to get a reliable fix on every single pupil. At GCSE, you're talking about fairness to the individual. If you are running something as an accountability test on a secondary school with a year group of 200, you've got the benefit of averaging. To get a decent fix on the average achievement of children in that school, you can use a test that is much less precise for each individual child, because at the end of the day you're dividing by 200. It's like getting the average weight of a jar of beans: you don't need to weigh each bean individually. You can weigh the jar of 200 beans and divide by 200 to get the average weight of a bean. Key Stage 2 tests are an example of a test that is actually designed for reliability at the level of the school.
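[Editor's note: the jar-of-beans point is just the statistics of averaging - measurement noise shrinks with the square root of the group size. A minimal simulation sketches it; all the numbers here (the score scale, the noise level, the cohort of 200) are invented for illustration, not real Key Stage 2 data.]

```python
import random

random.seed(42)

# Hypothetical sketch: a test that is noisy at pupil level can still
# give a precise school-level average.
COHORT = 200
TRUE_SCORES = [random.gauss(100, 10) for _ in range(COHORT)]
MEASUREMENT_SD = 8  # assumed noise in any single pupil's test result

def observed(true_score):
    """One noisy test administration for one pupil."""
    return true_score + random.gauss(0, MEASUREMENT_SD)

results = [observed(s) for s in TRUE_SCORES]

# Error for one individual pupil vs error in the school-wide average:
pupil_error = abs(results[0] - TRUE_SCORES[0])
school_error = abs(sum(results) / COHORT - sum(TRUE_SCORES) / COHORT)

print(f"error for one pupil:     {pupil_error:.2f}")
print(f"error in school average: {school_error:.2f}")
```

With per-pupil noise of 8 marks, the expected error in the average over 200 pupils is about 8/√200 ≈ 0.6 marks - which is why a test too imprecise to grade an individual can still measure a school reasonably well.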

Do you think that the purpose of Key Stage 2 tests is well understood?

I don't think it is well understood. The original purpose is very clearly about assessing schools, but they've come to be seen as somehow definitive of children's achievement. In particular, using them as starting points for secondary schools to set target grades for the end of Key Stage 4 is really concerning, because they're simply not that precise for an individual child. It's possible to set either too low or too high an expectation if you don't properly understand that there can easily be a significant element of under-measurement for any one pupil. That isn't the Key Stage 2 test getting it wrong, because individual precision isn't what it's designed for: it's designed to be reliable when results are averaged over the number of children in a school. So I do worry. One of the things that happens as soon as you create an assessment, even if it's clearly stated to be designed for one purpose, is that people will start using it for others.

One of the things I often see with Key Stage 2 tests is confusion about whether it's a measure or a target. It's Goodhart's law, isn't it: once a measure becomes a target, it ceases to be a good measure. Is that something you've been aware of with Key Stage 2 tests?

It's a difficulty with all tests, with any kind of accountability measure. It was a difficulty with the old five-plus measure; it's a difficulty with Progress 8. Over time, measures always wear themselves out, no matter how good they are, as the system orients itself towards delivering on that measure. That was part of why I redesigned inspection to look at what sits underneath results: to help counterbalance the tendency to over-focus on the current measure, whatever it happens to be, and to make sure that good results are being achieved through genuinely good education.

How do you feel on that, in relation to inspections, about the removal of one-word grades?

There's a big government agenda of transparency and accountability. Care homes, GPs, hospitals, prisons and police forces, as well as schools, are all inspected, and essentially the same set of grades is used for all of them. So I don't think it is really about the precise wording of grades. I think it's about the significance they have, which of course is determined by the consequences that central government, local government, employers and people's managers hang on them. There are some really difficult conversations that, as a nation, we've been very reluctant to have here, about how you balance the interests of children - the people using the service, often vulnerable people - against those of the adults who work in it. There are no easy answers, and I wouldn't want to pretend that there are.

I watched an interview that you did with Nick Ferrari on LBC in preparation for this interview, and I was really impressed. He was pushing you on the change to the nine to one grading. I don't know whether you remember this, it was a while ago.


I do remember that, because he tried to tie me in knots over explaining that basically two of the old grades were turning into three of the new ones, or vice versa. He was out to tie me in knots.

He was trying to get you to do the conversions, wasn't he, from the old grades to the new? I thought you explained it very well: you said we need to discriminate better, so we need to expand the range of grades. I wondered whether, in relation to Ofsted and the grades that schools were receiving, you thought that was an issue - that maybe there wasn't a broad enough range to discriminate effectively?

There have been times when there have been more grades. In assessment, to discriminate more, you have to put more into the assessment process: if you judge more things, or try to discriminate more finely on a scale, you have to do more assessment. The trend of the last 20 years in government has been to take resource out of inspection - about three-quarters of the resource for school inspection has gone in that time. All of secondary inspection, up to and including the Chief Inspector's slice, for 4,000 secondary schools, is now done with the budget of one secondary school. So it's an interesting intellectual exercise, but unless government changes policy to allow a more expansive inspection process, you simply couldn't add more discrimination without sacrificing reliability.

I guess it's a trade-off as well then, isn't it? Because you would place more inspection burden on schools too, wouldn't you - you'd need to be there for longer, visit more often, all those sorts of things.

There are some really difficult choices around the lowest achievers. On the one hand, there is a laudable desire to make sure that aspiration isn't capped for anybody, and that children have the opportunity to progress as far as they're capable. But on the other hand, it's often hard for a child who has little realistic possibility of getting beyond the lowest grades in a GCSE course either to find it a rewarding course of study in itself, or to come out with a grade that indicates anything to employers about what they know or can do - because it's assembled from a few marks picked up here and there, not from a clear indication that they can cope with, say, quite a lot of basic maths.

And also, it's not going to get them above the most basic level of courses at FE college. So there is a bit of a moral, philosophical question about what is right. And we've gone to and fro nationally between having separate curricula and qualifications for lower achievers, and wanting a unified system in which everybody has the chance to progress all the way.

There isn't a perfect answer, but I do think there's an argument for having some curricula and qualifications that give lower achievers the chance at age 16 - not just as an add-on taken in college after they've got a poor grade at GCSE - to earn a decent grade in a qualification that covers a more limited domain, and lets them show that they've achieved a level of mastery over it. There's an example people sometimes draw on that goes a long way back. The Royal Society of Arts, interestingly, had a whole suite of qualifications mainly used by people going into clerical occupations: arithmetic, shorthand, typing - the kinds of things such people would find valuable to have certified. RSA arithmetic was a well-designed course, and it was often used for teaching sets who were not considered likely to get a decent grade at O level, but who could nevertheless show good achievement in the more limited domain of arithmetic.

That qualification had really high labour market value: it was absolutely the thing you would want on your CV to help you get a job. So there have been times when we've not been so focused on single-level qualifications, but have been more comfortable with multiple levels. If that's what's valuable to the students themselves, I think it's important that we try to find ways to make sure it's the first thing they do, rather than something they do after they've failed, or gone through the misery of feeling they failed, at GCSE - keeping people motivated, feeling there's an opportunity to really achieve on the right curriculum and get recognition for it.

I think there's been very little evaluation, but have you ever come across the Teacher Tapp blog? I've looked at this, and it was really striking how successful the current set of GCSEs has been. On whether they were better preparation for A levels, I think languages were 12 to 1 in favour of the new GCSEs over the old ones. English was the grumpiest subject, and even there it was only just under 2 to 1 in favour of the new ones. And there were very similar results on whether teachers liked teaching them better: far more teachers like teaching the new ones than the old ones.

The range from the happiest to the least happy subjects possibly links to where thinking about what makes a good classroom experience sits most at odds with what gives valid and reliable assessment. When I look at an English language GCSE paper, I can see, on the one hand, why the assessment design works as it does, but I can also see why it's not always going to translate into something that's actually interesting and enjoyable in class. So that links back round to the point I made at the beginning: join up curriculum, pedagogy and assessment from the very start, if you want to make something that people can genuinely see, and will acknowledge, is making education better.

Would you say that's your one big wish for this Curriculum and Assessment Review - that of everything you'd like to see come out of it, it would be that joined-up approach?

If I'm allowed wishes, I'm going to have two. One is the join-up, and the other is: don't throw babies out with the bathwater. Even that limited bit of evaluation from Teacher Tapp shows that, in the main, the current set of GCSEs was a huge advance on its predecessors. Teachers have got their heads around them, as they've got their heads around the national curriculum - don't create a lot of upheaval in places where it's not needed. Intelligent iteration of the things that need iterating should be the name of the game.

Amanda, thank you so much for giving us your time. We really, really appreciate it.

An hour and a half talking about assessment is my idea of heaven!

Well, ours too, as you can tell.


It's been a real pleasure to talk to you both.

To listen to and read more interviews like this, subscribe to The Exam Man - the #1 podcast about exams and assessment.

To find out more about Examscreen, please contact us: Examscreen