This interesting report, released in April by the Brookings Institution, provides a balanced perspective on teacher evaluation systems and concrete tools for determining whether a system “passes muster.” From the report…
The present report offers advice on how to determine the degree to which an evaluation system is successful in the face of those imperfections and limitations. We address the connection between the reliability of an evaluation system and its ability to accurately identify exceptional teachers for special action, e.g., for a salary bonus if they are exceptionally good or for remedial action or dismissal if they are exceptionally bad. Reliability is not the only issue arising from the use of value-added measures. In particular, designers of evaluation systems and policymakers have to address biases that are introduced by differences in the contexts in which different teachers work. However, in this report, we focus on the issue of reliability.
We build our presentation around a proposal we put forth in a previous report, America’s Teacher Corps, calling for the creation of national recognition for teachers deemed effective based on approved state and local evaluation systems.[vi] The three design features of that proposal were:
- Promoting the use of teacher evaluation systems to identify and reward excellence
Whereas most of the focus of teacher evaluation systems using value-added has been on the identification and removal of ineffective teachers, we believe that such systems can also have a major impact by identifying and promoting excellence through recognition of exceptionally strong classroom performance.
- Flexibility on the components that would need to be a part of a teacher evaluation system and how those components would be weighted
There is no consensus on the degree to which teacher performance should be judged based on student gains on standardized achievement tests. Supporters of test-based measures would seek to expand standardized testing to virtually all grades and subjects and weight the results heavily in personnel decisions about teachers. Opponents question the validity of state assessments as measures of student learning and the accuracy and reliability of value-added indicators at the classroom level. They typically prefer observational measures, e.g., ratings of teachers’ classroom performance by master teachers. Our proposal for a system to identify highly-effective teachers is agnostic about the relative weight of test-based measures vs. other components in a teacher evaluation system. It requires only that the system include a spread of verifiable and comparable teacher evaluations, be sufficiently reliable and valid to identify persistently superior teachers, and incorporate student achievement on standardized assessments as at least some portion of the evaluation system for teachers in those grades and subjects in which all students are tested.
- Involving a light hand from levels of government above the school district
A central premise of our previous report is that buy-in from teachers and utilization of their expertise are most likely if the design of an evaluation system occurs at a level at which they feel they have real influence. In most cases this will be the local school district where they work. We expect wariness from teachers, even with respect to a system intended only to identify and reward excellence, if the design of that system is subject to considerable control from Washington or the state level. Further, we doubt that there is much of an appetite within Congress for the creation of a federal bureaucracy devoted to the fine-grained oversight of state and local teacher evaluation systems. And we doubt there is sufficient capacity within state-level education bureaucracies to carry out such oversight even if there is a political will to do so.
Suppose a state or the federal government wanted to fund a program whereby individual school districts could provide a bonus or other rewards to their exceptionally effective teachers. This requires a system of evaluation that meaningfully differentiates among teachers based on their performance. Similarly, suppose that a state wanted to encourage districts to differentiate the teaching profession so that new teachers started with one set of responsibilities but could be promoted into more complex and challenging roles as they demonstrated capability in the job. This reform, again, requires evaluations to determine different levels of teaching performance. Given the great variation in design and quality of district evaluation systems and the practical and political constraints on states or the federal government producing uniformity in those systems, how could state or federal funds for such recognition programs be fairly distributed?
In this report we address the question of how a state or the federal government could achieve a sufficiently uniform standard for dispensing funds for the recognition of exceptional teachers without imposing a uniform evaluation system on participating school districts. In particular, we address the role of the state or federal government in assessing the reliability of local evaluation systems. We demonstrate that the quality of the measures and the quantity of data affect reliability and determine the number of teachers a system can identify as exceptional. A school district that wants to recognize the top quartile of teachers cannot simply identify 25 of every 100 teachers as being in the top 25 percent: we show that when imperfections in the measurement system are taken into account, only some portion of the true top 25 percent can be identified with confidence. Further, that portion would be greater in an identically sized district that has better measures and more data.
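To make the link between reliability and the number of identifiable teachers concrete, here is a minimal sketch. It is not the report's method or its calculator. It assumes a simple normal model in which a teacher's true effectiveness is standard normal, the observed score is the true value plus independent normal noise, and reliability is the share of observed-score variance due to true differences. The function name `share_identified` and the 90 percent confidence threshold are illustrative choices, not taken from the report.

```python
from math import sqrt
from statistics import NormalDist


def share_identified(reliability, top_share=0.25, confidence=0.90):
    """Share of all teachers who can be placed in the true top `top_share`
    with at least `confidence` posterior probability.

    Assumed model (hypothetical, for illustration only):
      true effect T ~ N(0, 1), observed score X = T + noise,
      reliability = Var(T) / Var(X), with 0 < reliability <= 1.
    Under this model T | X ~ N(reliability * X, 1 - reliability).
    """
    std_normal = NormalDist()
    cutoff = std_normal.inv_cdf(1 - top_share)   # true-score cutoff for the top group
    z_conf = std_normal.inv_cdf(confidence)      # z-value for the required confidence
    # A teacher is flagged when P(T > cutoff | X) >= confidence. Solving for X
    # and standardizing X (whose variance is 1 / reliability) gives this threshold.
    threshold = (cutoff + z_conf * sqrt(1 - reliability)) / sqrt(reliability)
    return 1 - std_normal.cdf(threshold)


if __name__ == "__main__":
    for r in (0.3, 0.5, 0.7, 0.9, 1.0):
        print(f"reliability={r:.1f}: {share_identified(r):.1%} of teachers flagged")
```

Under these assumptions, a perfectly reliable system flags the full 25 percent, and the flagged share shrinks as reliability falls, which is the pattern the passage describes.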
Although we provide a solution to what may seem to be a narrowly-focused administrative challenge, i.e., funding a teacher recognition program from the state or federal level, the underlying approach we offer has more general uses to which we will turn in the final section of this report.
…click here for the complete report and a link to the “Passing Muster Calculator”