Thursday, April 13, 2006

On Evaluation - Some Musings at the Near-End of the Semester

Today, Dean Dad has a post about evaluations of instructors from his perspective as an administrator. I left a comment over there that wasn't particularly well thought out (sorry, Dean Dad) but I've been thinking a lot about the post since I left the lame comment, and so I thought I'd do a more deliberate response over here. Also, I thought I might get something out of doing a more deliberate response when I'm not in the whirlwind of reading the results of my own course evaluations or cursing the incomprehensible statistical information about my evaluations. At the moment, I've got some critical distance. My evaluations are waiting to be distributed, the semester is coming to a close, and I'm not fretting over what happens with them - not yet.

Dean Dad begins:

"Evaluating faculty is one of the most important parts of my job, yet some of the most basic information needed to do it right isn’t available."

Ok, this inspires my first point: I think that we need to draw a distinction between the evaluation of faculty and the evaluations of particular courses by students. Why do I draw this distinction? Because I think that when we conflate these two kinds of evaluation we run the risk of disempowering both students and faculty:
1.) The faculty member becomes responsible for all of the learning that happens in the course, which inspires the faculty member to teach to the instrument of evaluation. When this happens, the faculty member loses the power to experiment with his/her teaching and to take risks as a teacher.
2.) The student becomes characterized as a passive receptacle of information, whose role in his or her education is defined by the reviews that he/she metes out for his/her instructors. Did the instructor deposit the right amount of wisdom in the student, according to the student? Fill in bubble A. Not exactly a model that inspires students to take ownership over their own educations.
3.) Sometimes a student's response to a course (say, a required course for the major, like the British Survey) is a response to the material of that course and not to the instructor. Yes, we might say that it's the instructor's charge to bring even the most inaccessible material to the students and to make them like it, but I think that's wrong. And sometimes, when students don't like the material - or if they feel overwhelmed by it - they will give a poor evaluation of the instructor when really what they are responding to is the course itself. The instructor may have little power to change the course itself, especially if it is a core of the major, and so the faculty member is then screwed.

Thus, I think it would be entirely valuable to find a way to make distinct the evaluation of faculty as instructors and the evaluation of courses, though I fear that this, with human nature being what it is, is impossible. So let's table this idea and move on to the kinds of instruments for evaluation that Dean Dad describes:

Student Evaluations
I think that it's important to note that students are not experts on classroom instruction nor do they have the experience necessarily to know what constitutes good pedagogy. Thus, when we read student evaluations, we are really reading students' evaluations of their experiences in the course and not objective evaluations of the quality of instruction. Thus, students might give very high marks to a professor who provides copies of all class notes in the form of PowerPoint slides online because students read this as the instructor being prepared or as the instructor being very user-friendly to students who can't make it to class or who have other commitments. I, on the other hand, might say that this does not constitute good pedagogy because it gives students the impression that they should passively consume the instructor's notes, it encourages memorization and does not encourage critical thinking, and it makes for limited interaction between instructor and student. Which interpretation has more legitimacy? I would argue that the student's perception is less legitimate, but yet that student's evaluation may ultimately mean that the professor who makes all notes available will have higher evaluations than I, the nasty lady who expects her students to take notes, will. I'm not saying that we should do away with student evaluations, but I do question their utility in the evaluation of professors, and I'm not sure that these evaluations should carry substantial weight in promotion and tenure decisions. (Oh, and I should add that one of the reasons I am very cynical about student evaluations is that at my university there is a widespread problem with young, female instructors getting low evaluations, and everybody is just like, "oh yeah, that's really a problem," and nothing is done about it and it sucks.)

Peer Observation
Dean Dad notes that peer observation is useless as an evaluative tool because the observation reports are uniformly glowing. I think he's right. But I think that the reason why they are uniformly glowing is precisely because they are used as an evaluative tool by administrators. Ultimately, these evaluations are not about improving teaching, they are about weeding out bad teaching, or at least that is the perception. One will not give honest critique in this situation - at least not in the formal letter - because one doesn't want to be responsible for doing one's colleagues wrong. Also, one probably wants to err on the side of niceness - one probably only observed one class, and the colleague being observed could have been having an "off" day, etc. Again, it would be different if these evaluations were not used by administrators but rather were for the use of faculty. As it is, they are just one more hoop to jump through on the path to tenure, and an annoying and disruptive hoop at that.

Student Grades
I do think that grades can be an indicator of... something... if they are uniformly very high or uniformly very low. Again, though, I wonder: how do grades show something about good teaching if they are in the "normal" range? To my mind, what matters a hell of a lot more than grades is giving students feedback on their work, which is not the same thing. How do we measure that? Also, I tend to allow revision with no penalty in writing classes, which means that the grades in those courses tend to skew higher than the usual curve. Would I be penalized for this under a system in which grades were evaluated as an indicator of good teaching? Would I be accused of grade inflation? Even though the students work their asses off to get those grades? Or let's say that you have a bad class one semester, and the grades skew low. Does that necessarily say something about the instructor? The idea that grades tell something about teaching seems the equivalent to saying that test scores say something about good teaching. I'm not sure that either is true, and the idea of using these to measure the "accountability" of instructors seems specious.

Course Attrition Rates
Hmmm. I'm not sure, again, whether this would be a meaningful measure of teacher effectiveness in all but the most extreme cases. Yes, there might be that instructor who has incredibly high drop rates over time across a range of courses. In that case, this would be a meaningful statistic. But what about everybody who falls in the middle?

The point is, I suppose, that I don't think there is a way to quickly and easily assess the quality of instruction, and I think that part of the problem is that assessment from the top-down is geared (or is perceived to be geared) not as a mechanism by which to improve instruction but as a mechanism through which to get rid of dead weight. The emphasis is not on developing quality teachers and on encouraging quality teaching but instead on proving that students are getting their money's worth and on justifying the existence of instructors, who, as we all know, are out frolicking in the sunshine instead of worrying about their teaching, preparing for class, grading, serving on a thousand and one committees, mentoring students, etc.

I suppose the thing is, if evaluation is supposed to be a means by which instructors get honest feedback on their teaching and are encouraged to improve their teaching through that feedback, then it should be disconnected from professional advancement. If it is going to be connected to professional advancement, evaluations of courses should be course-specific - or at the very least discipline-specific - and individual departments and instructors should have some say in the criteria through which courses are evaluated; evaluations of instructors should be just that - evaluations of their abilities as instructors, and they should be disconnected from course-specific questions.

I don't know how to do any of this, and I might be wrong about half of it. But I suppose that I often think that evaluation, assessment, self-assessment, etc. get in the way of my being a good teacher and becoming a better teacher.


Anonymous said...

"at my university there is a widespread problem with young, female instructors getting low evaluations, and everybody is just like, "oh yeah, that's really a problem," and nothing is done about it"

that does suck. and it very much irritates me when things like this go on.

rwellor said...

Just a few points.

First, fill in any other job in your quote "I often think that evaluation, assessment, self-assessment, etc. get in the way of my being a good teacher and becoming a better teacher" and you will see how bizarre it might sound to some ears. Surely administrators and District Offices could make exactly the same argument? And it would be perceived as extremely self-serving by most faculty and staff. And rightly so. It is a case for personal exceptionalism that would need a very large amount of evidence to support.

As to the way you attacked evaluation here? This kind of reductive argument can be made for or against any evaluative scheme. It is the miracle of the human mind that we come up with evaluation schemes (now and then) that combine elements. I mean, what would you make of an administrator who broke down his evaluation in this way?

I mean, I can make counter arguments (good, bad, indifferent) to each point by separating from a larger goal and evaluative schema.

1) Faculty should not "experiment" because it is upon students. Some experiments fail and in this case the instructor *is* responsible for the failures in the class.

2) A student cannot be a "passive receptacle" at the same time that they have coercive power (which is what you seem to think) over instructors through evaluation.

3) And sometimes it isn't. So only compare like courses to rule content-based problems out.

How in the world can you say that an instructor who rightly evaluates a peer as ineffective is doing the other instructor a "wrong." Imagine if Administrative evaluations worked that way (and if they do? they are wrong as well). I thought tenure was supposed to protect the steely-eyed critical faculties and beliefs of instructors?

You admit they might be an indication of an elliptical "something" but then discard the possibility that this something can be reasonably interpreted. As to the threat that one class with an outlying grade curve can adversely affect evaluations? Build protection into the evaluation -- when you pick criteria off one by one, as you've done here, you effectively pretend they stand alone.

I'd argue that they are meaningful across the bell-curve, but if you need to take a stance in which they are only important in "the most extreme cases," I'm fine with that. Roll it in as only applicable in extreme cases.

It can be done... faculty just (as in the case of peer evaluation) don't want to be subject to it.


Dr. Crazy said...

To respond to rwellor -

I understand why you read my post as you did - as an attack on all evaluation and as a resistance to being evaluated. Let me qualify my position:

1. The reality is that evaluation is part of the gig. I am going to be evaluated on my teaching, one way or the other, and I'm not against that. I suppose my point, in responding to Dean Dad's post, which specifically sought a quick and easy way to assess good teaching, was to talk about the fallibility of finding quick and easy solutions to evaluating something that has so many components.

2. One of my main points here (though poorly articulated perhaps) is that it's not entirely clear what evaluation is FOR. My university is not (at least not yet) a corporation. Thus, one message that is sent to faculty about evaluation is that it's there in order to provide feedback so that one can improve his or her teaching. Then, there is a contrary message that it's not there as something that one should look to for ideas about how to improve but rather that it is a measure of one's success - an outcomes-based model. The issue I think is one of definition. If we see evaluation as about assessing outcomes as the end goal, then it should work in one way (as I suggested, divorcing evaluations of specific courses from evaluations of instructors, for example) or if it is intended to work in the other, as productive feedback, then it should probably work in another. In a nutshell, I guess I think the problem is that there are two sometimes contradictory messages about what evaluations are supposed to be, and this puts instructors between a rock and a hard place, feeling like they need to manipulate the instrument to save their asses while at the same time they read the evaluations like tea leaves to try to figure out (often from questions that tell them very little about their actual teaching) how to become better teachers. This seems screwed up to me.

3. I don't think that students have "coercive power" over faculty. Clearly the power relations between student and teacher are weighted in favor of the teacher. This post talks about a lot more than just the student evaluation part of this whole evaluation of good teaching question, and what I do say about student evaluations really relates more to how they are used and by whom than to what the students say on them.

4. To respond to what you say about peer evaluation of teaching, I would say that when people evaluate administrators they do not do so by attending one meeting that the administrator runs and then extrapolating from there about how good of an administrator he/she is. From what I experienced in graduate school with this (at my current institution we do not do this) and from what I've heard from friends, this is what often happens in the "observation" portion of such evaluations, and thus, if that's the deal, yes, I'd feel like I was doing somebody wrong to say "Dr X is a bad teacher" because of what I observed on one day (again, though, I think we're talking about people who fall in the low middle here - not somebody who truly is atrocious). I think that the reality is that most people are ok teachers - that truly horrible teachers are pretty rare (as rare as truly great teachers) and that we're talking less about the difference between apples and oranges than about the difference between varieties of apples when we talk about this stuff.

Ok, this comment is way too long, but I did want to respond. Oh, and one last thing: I think that we make a mistake if we think that constant self-reflection or assessment by others necessarily leads to improvement - in ANY context. I'm not saying all self-reflection or assessment by others is bad, but I do think that it can ultimately overwhelm its original intent as we add more and more kinds of analysis in an attempt to find a "true" picture. So yes, all of the assessment CAN get in the way of good teaching. I also think that the idea that we shouldn't experiment in the classroom because "some experiments fail" leads to stale teaching (which of course, would mean students would learn less, and then they would give low marks on evaluations, and then I'd be screwed anyway, even though I tried to play it safe). I'm not talking about experimentation for the sake of experimentation - I'm talking about taking responsible risks in the classroom by trying new things for good pedagogical reasons with making a better learning experience for the student as the focus.

Konibono said...

Another great post. There's definitely something wrong with a system that makes teachers (sub)consciously hope for a few real lunkheads or no shows to soften the curve.

In some cases, there are times when an entire group simply masters the material of the course. Is it wrong to give all of them decent grades? In the system we function within? No, of course not.

I try to teach my students, and when they prove that they are learning the things the course sets out as goals, they get rewarded. How about that as "outcome-based" education?

Shaun Huston said...

I agree that one problem with evaluations at most schools, mine included, is that it is not always clear what they are going to be used for. There are always mixed and overlapping messages about their relationship to promotion and tenure, hiring and re-hiring for adjuncts, etc.

There are three things I've experienced with evaluation:

1) Where student evaluations are concerned, one or two thoughtful, qualitative comments are worth piles of statistical data from standardized questions, at least where course design and pedagogy are concerned.

2) Evaluation instruments that I design for my specific courses are far more useful to me than the institutional forms that get distributed every year. For the most part those forms assume a standard lecture format that I rarely employ in the classroom.

3) I am without question my biggest critic. I see flaws in my teaching that no one else does. The year before my tenure review, my colleagues in the division personnel committee had to tell me to be more positive about myself in my application for tenure and promotion. I suspect that I am not the only one for whom this is true, but I also suspect that teacher self-evaluation would be dismissed as "unreliable" by many in administration (which brings us back to the question of why we evaluate in the first place - to improve? to punish? to reward? to empower students? to see who we can fire? to satisfy the institutional imperatives of state bureaucracies or boards of trustees?)

rwellor said...

Dr. Crazy,

The "what are they used for" point is an excellent one and to me the main one. I'm currently in marketing and outreach (NOTE: I have taught at the high-school level, been an instructional assistant in a WRAC lab, tutored, and taught two classes at Community College with a waiver. I am also currently in a Master's Program in English because I want to teach it. So I'm not at all anti-teaching) and the main problem with most everything we do in my CC district is we never sit down and explicitly state what we are trying to achieve. When we do, things typically become much easier.

That would also get to your point that evaluation (like instruction of students) does not always result in 'improvement' (however that is initially defined). But even if instructors weren't drinking at that point, you would have led them to the water. ;-)

As to peer evaluation... I buy that short, discrete peer evaluation is much more likely to introduce errors. To me this is a procedural problem with how the tactic is applied, not the tactic itself.

What boggles me is when instructors who routinely (especially on the "soft" side of instruction) talk about the importance of teaching critical thinking (rather than rote instruction) turn around and rather mysteriously contend that what they do cannot or should not be subject to any critical thinking itself.

Part of the problem where I sit (California) is that everyone in academia is intensely defensive of their positions and prerogatives and interprets any attempt at evaluation or change as an attack. This is partly a sensible approach, since budgets are insufficient and you'd better protect what you have, not to mention the fact that in the absence of a decent evaluative structure evaluation is often pulled out as a weapon by administrators.

But to me this is all just an argument to step out of our boxes for a moment and try to work on a system that achieves our goals (as you note, our goals need to be consensual and clear).


In my district administrators haven't been evaluated in about 10 years and the arguments they make about specific plans pretty much run parallel to the ones you have made. The problem is systemic, I think.

whew! lotta typing...

Dr. Crazy said...

Just time for a quick response to the following:

"What boggles me is when instructors who routinely (especially on the "soft" side of instruction) talk about the importance of teaching critical thinking (rather than rote instruction) turn around and rather mysteriously contend that what they do cannot or should not be subject to any critical thinking itself."

I think that this is a misimpression. I think (though maybe I'm just speaking for myself on this, others will have to chime in with their views) that I, as a humanities person, would WELCOME evaluation that included any sort of room for critical thinking about what I actually do as a teacher. The problem is that most instruments that are offered for such evaluation are not based on what happens in my classroom but rather on what happens in a more PowerPoint, lecture-oriented sort of classroom. Or a lab. Or a class in which there are right answers, like math or statistics. And so I think that resistance comes from the fact that disciplinarity is ignored in the process of arriving at instruments for evaluation, faculty feel like they are not adequately evaluated by those instruments - like there is absolutely no critical thinking used in approaching individual teaching situations but rather a set of numbers that are meaningless is generated - and like then one has to teach to the instrument if one doesn't want to be axed (or bring in candy on evaluation day, or use weird subliminal message techniques throughout the semester in order to get students to respond the way that one wants them to on the evaluations, etc.). In other words, I don't think it's that instructors in the "soft" disciplines (terminology I find offensive, by the way, as if what I do is fluff and that there is no need for it in the world - not that you're saying that, but just generally) think they are above assessment but that they would like to be assessed in terms that actually make sense. How would a teacher in a huge intro to psych class, with 150 students say, react to course evaluations which asked questions about group dynamics in the class or about interactive learning activities during class time? Not well, I suspect. Well, that's how I feel about some of the questions on my forms - like they've got nothing to do with pedagogically sound strategies that I use in my classes.

Dena Marie said...

I'm now happily installed in a cc that actually *does* use evaluations for instructional improvement, *BUT* my previous institution was quite the opposite! You got a swift kick in the ass, rap on the chops, or (in some cases) a pink-slip for scores anywhere below 95% on any one of their instruments of evaluative torture!

For example, you were put in the hot-seat if your class grade averages were below 95%, your student eval scores were below 95%, your class attrition rates were below 95%... well, you get the picture. And by hot-seat, I mean that the head honchos would suddenly decide that a "peer evaluation" was overdue, thus tearing you apart on paper. The repercussions of this were many (which is why I left that god-awful place!)... many instructors inflated grades, failed to meet course objectives that whiny students deemed unfair or too hard, and developed inappropriate buddy-buddy relationships with students for popularity points.

My name is Dena, I was employed by a private career college, and this was my story. Now where are the fucking donuts and coffee they usually serve at these damn support group meetings??!! :-)

--Dena Marie