I spend almost all of my professional life doing one of two things: observing the work of trainee teachers on Celta initial teacher training courses, or observing the work of Celta tutors in my role as an appointed Assessor for the Celta award.

Both of these jobs present various challenges and raise many questions, but in the end they boil down to this:

“Is what I am looking at any good?”*

Whether a lesson taught by a trainee teacher is any good, or whether a training course run by a team of teacher trainers is any good, is obviously a much easier question to pose than to answer.

Valuing judgements

First of all, it is a subjective decision, which means that it entails a value judgement: it is predicated on underlying beliefs about what “good teaching” or “good training” is.

The values from which these judgements arise may be reasonable, or they may not; they may be widely shared within the professional community, or they may not; they may be consciously held, or they may not.

Either way, they matter, because it is on these values that decisions affecting the future lives of people who wish to become – or remain – education professionals rest.

It is perhaps in response to the fact that judgements about quality of teaching are fundamentally value judgements that teaching award bodies like Cambridge English seek to objectify the process.

The performance criteria, grading descriptors and standardisation processes that we can see multiplying across education all seek to limit or control the degree to which personal beliefs lead to variation in the evaluation process.

But why would anyone attempt to limit or control this? What is the objective?


Quality is what everyone is after. Criterion referencing and similar instruments are means through which quality may be managed, and quality, as W. Edwards Deming convinced so many of those in power, is “the elimination of variation.”

Whether or not this definition of quality is one that we would wish to accept is an interesting question, but whatever our personal answers to that question, the reality of education today is predicated not only on this definition of quality, but also on its a priori goodness.

Consequently, the courses on which I work are undeniably – and increasingly – influenced by it.

Let’s look at how a teaching award like the Celta goes about defining and measuring teaching quality.

“What do you get if you multiply six by nine?”

There are currently 42** discrete assessment criteria in which a trainee teacher on a Celta course needs to show adequate and consistent competence, by the end of the course, in order to pass.

There are no criteria directly related to learning: criteria such as “The learners learnt something about the language that they did not know before” or “The learners extended their control of a specific aspect of systems/skills” do not exist.

The criteria limit themselves exclusively to the direct behaviours and actions of the teacher.

These criteria cover a range of teaching-related behaviours, from stating the explicit aims for lessons and stages within them (4a) to managing the learning process so that aims are achieved (5d). There are even criteria relating to whether, and to what degree, a trainee respects health and safety regulations (5a) or copyright (4d).

While each of these criteria is arguably interesting, valid and appropriate, together they raise at least as many questions as they answer when it comes to deciding whether to award a candidate a pass grade at the end of a course, and which of the available pass grades (PASS, PASS B or PASS A) to award.

These questions include, but are not limited to, the following:

  • Does a trainee need to be generally at standard in every single criterion in order to be eligible for a pass grade? If they – for example – were satisfactory in all criteria except respecting local copyright law, should they fail?
  • How many criteria does a candidate need to be above standard in to be eligible for a PASS B? How many for a PASS A?
  • Are all criteria created equal, or are some criteria more equal than others? Is “managing the learning process so that lesson aims are achieved” more significant in terms of grading than “starting and finishing the lesson on time?”

“What does ‘some’ mean?”

I have discussed such questions with a range of tutors and assessors for the Celta award over the years, and there appears to me to be no consensus on any of them.

It may have been this, together with the individual queries to its Helpdesk arising from such disagreement between colleagues, that prompted Cambridge English recently to issue a pilot set of new grade descriptors intended to help centres reach appropriate end-of-course grading decisions.

In a nutshell, they are:

For a PASS grade, the candidate needs to meet all the assessment criteria.

For a PASS B grade, the candidate needs to meet all the assessment criteria, and exceed strongly the minimum requirements for doing so in some criteria.

For a PASS A grade, the candidate needs to meet all, and to exceed strongly the minimum requirements for meeting most, of the assessment criteria.

While this is a welcome attempt to provide a simple, shared framework for trainers to reach grade decisions, its obvious vagueness raises several key questions:

  • Does “most” here literally mean “more than 50%”? In other words, would a candidate definitely need to be considered above standard in at least 22 of the 42 assessment criteria for consideration of a PASS A at the end of the course? Or does “most” here really just mean something less precise, closer to “quite a lot”?
  • How many criteria equal “some”? In other words, would being above standard in ten criteria justify a PASS B? What about five? What about three? If the word “most” from earlier does not mean “more than 50%”, then where does “some” end, and “most” begin?
  • Are all criteria created equal? Can a candidate, for example, receive a PASS B or PASS A without being above standard in “analysing language with relation to meaning, form and phonology to an appropriate depth” (2e), “providing a range of appropriate practice” (2g), “sensitively correcting learner errors” (2b), or “managing the learning process so that aims are achieved” (5d)?

These questions all permit categorical, yes-or-no answers, and at least the first two are directly relevant to virtually all grading decisions.

Arguably, the final question is unlikely to become an issue in practice, but it is at least theoretically possible for a candidate to excel in a wide range of criteria without being especially strong in those listed above. Yet aren’t those four criteria the very essence of what it means to teach language?

Would we be happy about being required by a descriptor to award a higher grade to a teacher who was only average in those areas, regardless of how excellently they implemented health and safety regulations, respected local copyright and started and finished lessons on time?

“Thank you for your enquiry…”

Although these questions call for yes-or-no answers from those who drafted the descriptors, when I put them to Cambridge English a few months ago, I did not get yes-or-no answers.

In the end, and while I understood and appreciated the desire on the part of Cambridge English to respect the professional judgement of those who implement their awards, I was, frankly, left none the wiser by their response.

And this is, incidentally, before even posing the trickier question of what constitutes “at standard” or “above standard” achievement of any given criterion, whether at a given stage of the course or overall, measured over time. Since my induction as a Celta trainer, I have not been involved in a single standardisation procedure for tutors and assessors that has not raised more questions about these issues than it has answered.

So, not for the first time, I was prompted to think about a practical working alternative.

A grading guideline is only useful to the degree that it is easy to understand, agree with, and implement. Therefore, a few years ago I created the following simple set of descriptors of my own, which I have been loosely employing ever since, and which I would like to share with you here.

Effectiveness, Efficiency & Elegance

For a PASS grade, a candidate’s work needs to be EFFECTIVE.

Effectiveness is basically a matter of getting a job done.

Imagine that you need to hang a picture, and in order to do this you hammer a nail into a wall: whether or not this has been effective is a question of whether or not the nail is sufficiently embedded in the wall to support the weight of the picture.

How you got the nail into the wall (with a hammer, with a shoe, with a tortoise, with your bare hands) is irrelevant.

In a similar way, if a teacher needs to teach their students about the past simple, and in order to do this uses a basic context (like “my day yesterday”) to introduce the language, highlights and checks understanding of meaning, form and phonology to some degree, and provides some practice and some feedback on the students’ work, then whether or not this has been effective is a question of whether or not the students get progressively better at employing the new structure over time, even within the small scale of a lesson.

As a bare minimum, for a trainee teacher to leave a course with approval to teach, I would want them at least to be generally effective in what they do.

For a PASS B, a candidate’s work needs to be EFFICIENT.

Efficiency is basically a matter of not wasting time, resources or effort in getting a job done.

If you need to hammer a nail into a wall, you could conceivably get the job done with your bare hands or with a tortoise, but using a hammer will probably take less time, and be easier (on you, and on the tortoise***).

In a similar way, if a teacher needs to teach the past simple, they could spend 75% or more of the available time setting up an elaborate guided discovery task based on an authentic reading text accessed via a running dictation employing some autocue technology on a set of tablet devices or an IWB – and they might succeed in teaching the students about the past simple by doing so. They might even succeed in providing some degree of practice.

In other words, this approach could be effective – but there would clearly have been much more efficient ways of getting this job done, ways which would have left more time available for practice, for feedback and for reflection.

It is possible to do something effectively without doing it efficiently; it is impossible (I would suggest) to do something efficiently without also doing it effectively.

For a PASS A, a candidate’s work needs to be ELEGANT.

Elegance is basically a matter of aesthetic grace in getting a job done.

Elegance is what you see when you are looking at effectiveness combined with efficiency and poise****.

Most people can jump off a box; some people achieve it more elegantly than others. Many people can swim; some people glide through the water.

Elegance is something that is hard to describe, but that no one finds hard to recognise: you always know it when you see it.

In just the same way, you know elegance in teaching when you see it.

  • When a teacher leverages an already emerging conversation at the outset of a lesson to draw attention to the need for, and the form of, the past simple, that’s elegant.
  • When a teacher makes use of an already established context to provide the real-world reason for a controlled practice task, and thus saves this grammar practice from being a simple mechanical exercise, that’s elegant.
  • When a teacher spins several varieties of practice from the basic fabric of a single gap-fill from a book they are using, that’s elegant.
  • When a teacher usefully gives way in a lesson to a student who is able to do a job the teacher might otherwise have done themselves, that’s elegant.
  • When a teacher solves a class grouping change with the simplest move possible, where moving one learner changes the game for everyone else, that’s elegant.
  • When a teacher’s lesson plan is short but descriptively and logically clear, that’s elegant.
  • When a teacher doesn’t employ CCQs (concept checking questions) or ICQs (instruction checking questions) because what they have done before has made them unnecessary, that’s elegant.
  • When learners realise they have learnt something before they notice that they were being taught something, that’s elegant.

An elegant solution

These descriptors are certainly subjective, are clearly predicated on my own values and are therefore partial, but they at least have the virtues of:

Simplicity – Each descriptor is a single word.

Clarity – There is no technical jargon; each descriptor has a very widely understood and shared definition even for laypeople.

Scalability – They apply to all levels of performance, from fleeting moments within a lesson to overall performance over time within a course.

So these days, when I watch teachers in action, reflect on the quality of their lessons and how they engaged in their work, or consider their work over time, and before I cross-check what I have seen against the 42 discrete assessment criteria laid down by the examining body I work for (or against its recent pilot descriptors, which were well-intentioned but ultimately no clearer to me than previous guidelines), I first ask myself the following question:

Was what I saw this teacher do effective, or was it efficient, or was it elegant?

This may well not be the perfect question to ask, and perhaps you can help me find a better one, but it is, so far at least, the most effective, efficient and – dare I say it – elegant solution to the grading problem that I have found.


* I would like to acknowledge and thank David Young and Sinéad Laffan for the conversation in which this question arose as a topic, and which spurred me into finally writing this long-postponed post.

** Proof, if proof were needed, that someone involved in the design of the Celta award was a Douglas Adams fan.

*** No tortoises were harmed in the making of this blog post.

**** Appropriately, perhaps, when I checked the definition of poise in my computer’s built-in dictionary, this is the working example (albeit of the verb usage) that it gave me: teachers are poised to resume their attack on government school tests.


  1. Phil Wade

    Great post.

    I remember there was one woman on my CELTA who was clearly going to get an A from day one. I often see the same on any course. These A folk seem to have a lot of time and energy to go the extra mile, and come in with a certain level to build on. I think I take a long time to absorb things and understand them, but then I make great leaps. I don’t accept things without pulling them apart. Thus, I don’t think I do well on courses, as I’m not quick enough. During my CELTA, I was average, as I was travelling a lot and trying to read and understand all the input sessions and develop a teaching style. It felt alien to just repeat 100% of what I learned in the morning, cheating almost. However, by the end, I surprised my tutors by showing confidence and the first spark of creativity. One remarked that she would like to see how I developed from the last day on.

    Did I get an A? Nope. Just a pass, but I am one of the few who got a FT job and survived. The A woman went on to do some summer work, I think, and was later unemployed. The school had a policy of only hiring A students, so I think that was her goal, but there were too many candidates.

    On the DELTA, there was a lot of time dedicated to understanding the criteria and how to get good marks which isn’t what I wanted.

    I can honestly say that nobody has ever remarked on my grades for any teaching course, so I really don’t know how useful an A would be. Maybe if you apply for the DELTA soon after, or for a teacher training post.

    • Anthony

      Thanks for commenting, Phil. You raise a very important question which I would also like to address in a later post, namely: who are grades for?

      • Chiew

        Looking forward to reading that future post, Anthony. Coincidentally, in his latest post, Dale mentioned, indirectly, how important an A pass was, which disappointed me somewhat. I have no comments to make regarding grades…nothing which I haven’t made before, anyway.

        • Anthony

          Thanks for the comment, Chiew. As a recruiter, I can understand Dale preferring pass A applicants, at least in the first 6 months post-qualification, mainly because this indicates he or she will require little to no support in order to work a full timetable. This is really all that a pass A is intended to tell employers – the degree to which a job applicant in the first 6 months post-training will require mentoring resources in order to function. Any school manager or DoS worth their salt knows that after this time all bets are off in terms of whether or not an applicant with a pass A will be better than a teacher who left the course with a pass. Grades have a very short half-life, in other words.

          What managers and DoSes also need to be clear about while preferring pass A applicants is that they are as rare as elephant eggs. Statistically about 3-4% of candidates reach the level required for a pass A year on year (trend, NOT quota). School staff rooms are full of teachers who do valued work for their students and employers and who got the chance despite the fact that they never got an A or a distinction on their teaching exam: I’m one of them.

  2. Sandy Millin

    Hi Anthony,
    As someone who’s just finishing their first CELTA course as a trainer, I have found the differences between the grades quite difficult. I can normally feel if it’s a clear standard/above standard, for example, but the borderlines are difficult. These three criteria seem a much clearer way of thinking about it.
    I’d also wondered, like you, about the weighting of the criteria, and I often thought about exactly which category some things fall under. For example, is excessive teacher talking time about not ‘adjusting to the students’ level of language’? Or about not ‘employing effective techniques to fulfil the aims of the lesson’? (Those are off the top of my head, so may not be the exact wording.)
    A lot to get my head around!
    Thanks for writing this,

    • Anthony

      Hi Sandy, thanks for stopping by. Apologies for the delay in approving your comment, but I was “away from my desk”, as they say.

      Appropriately grading teacher performance on a course like Celta is certainly the most challenging regular task for a trainer as far as I’m concerned. As you say, it’s on the borderline between an above standard and an at standard lesson (if the centre bothers with that distinction – I’ve dropped it, myself), or between an at standard and a below standard lesson, where wailing and gnashing of teeth can start to be heard emanating from the trainer office after TP. I’ve quite literally lost nights of sleep over this kind of decision. Considering the emotional impact that an N can have for a trainee, it’s simply too important a decision to take lightly.

      And of course, making a single overall decision about how to grade someone whose performance may have been stellar in some areas and mediocre in others is hard too – what is the best, most just reflection of their performance during the course (as opposed to wishful visions of their “potential” afterwards)? All hard questions, and ones which, for me, haven’t really gotten any easier over time.

      And your point about how appropriately to criterise (to coin a new word) a given observable behaviour is spot-on; I suspect this difficulty is why criterion 5d exists!

      Glad you found this piece worth reading, so thank you again for commenting.

      It’s a great job, doing what we do, albeit one not without its challenges and frustrations. Welcome on board. Maybe we’ll get to continue this discussion during an assessment visit someday!

