1. Framing the Problem
The first four papers of this suite have addressed the structural and rhetorical conditions of authority within the assembly: the alignment between metaphor and corporate form, the New Testament patterns of authority and their non-transferable limits, the proper scope of warning language, and the specific drift of family rhetoric into business analogy. Each of these examinations has touched, in passing, on the question of how the work of the ministry is evaluated. The present paper takes that question up directly.
The work of the ministry is real work. The teaching of doctrine, the care of souls, the oversight of the flock, the administration of the assembly’s life — these are activities that can be done well, badly, or somewhere in between. They are activities that can be improved, refined, taught, and learned. The texts that describe the office of elder do so in terms that presuppose that the work has a quality, that the quality matters, and that those entrusted with the work are accountable for how they have done it. “Let the elders that rule well be counted worthy of double honour, especially they who labour in the word and doctrine” (1 Timothy 5:17) presupposes that some elders rule well and some, by implication, do not. “Take heed therefore unto yourselves, and to all the flock, over the which the Holy Ghost hath made you overseers, to feed the church of God” (Acts 20:28) places the seriousness of the work in the foreground. The texts do not authorize the abandonment of evaluation; they authorize its faithful exercise.
The difficulty addressed by this paper arises in a particular setting that is common to many assemblies in the present generation: the setting in which evangelistic success — the conversion of large numbers, dramatic numerical growth, public visibility — is not expected, is not promised, and in many cases is theologically discounted. The reasons for the discounting vary by tradition. In some assemblies, the conviction is that the work of calling is the Father’s prerogative (John 6:44, 65), and that the assembly is responsible for faithfulness to the truth rather than for the production of converts. In other assemblies, the conviction is eschatological — that the present age is one of preparation rather than ingathering, and that the great harvest belongs to a coming time. In still others, the conviction is more cautiously practical — that numerical metrics distort pastoral work, that growth-orientation has historically corrupted the assembly’s witness, and that an unhurried and unanxious posture toward outcomes is itself part of faithfulness.
Each of these convictions has biblical support and pastoral integrity. The discounting of evangelistic outcome metrics is not, in itself, a flaw in the assembly’s self-understanding. The difficulty is what happens to evaluation when the most readily measurable outcomes are discounted and nothing comparably concrete is put in their place. A measurement vacuum opens. Evaluation does not, on that account, stop occurring; evaluation always occurs, because human institutions cannot operate without some assessment of who is doing what well. What change are the criteria. In the absence of explicit, defensible, biblically grounded criteria, evaluation drifts toward what is observable: tone, demeanor, deference to those higher in the structure, conformity to internal expectations, the absence of complaint. These are not evaluations of skill; they are evaluations of posture, and they cannot do the work that evaluation of skill is meant to do.
The thesis of this paper is that when outcome metrics are discounted without alternatives put in their place, evaluation drifts toward tone and loyalty, and that a balanced scorecard grounded in the New Testament’s own categories can preserve both pastoral care and mission clarity. The work of the paper is to describe the measurement vacuum and what fills it when nothing better is supplied, to construct a balanced scorecard that names the actual dimensions of ministerial work, to identify and resist the substitution of loyalty for faithfulness, to propose feedback structures that gather information from those who have it without putting them in adversarial relation to those evaluated, and to provide concrete deliverables that an assembly can adapt for its own use.
The work is undertaken with awareness that evaluation in ministry is delicate. Ministers are not employees in the ordinary sense; the office rests on a calling, the relationships are personal, and the temptation to let evaluation become either harsh or perfunctory is constant. The proposal here is neither harsh nor perfunctory. It is disciplined, and the discipline is in the service of the work itself — the work the texts describe as feeding the flock, watching over souls, keeping the doctrine sound, and being examples to those over whom the Holy Spirit has placed the elders. That work deserves evaluation worthy of it, and the assembly’s failure to provide such evaluation is not a kindness to its ministers; it is a withholding of one of the conditions under which the work can be done well over a lifetime.
2. The Measurement Vacuum
When evangelistic outcome metrics are discounted and no alternative metrics are explicitly developed, the resulting situation is not an absence of evaluation. It is the presence of evaluation under unstated criteria. Several features characterize this situation, and each is worth describing in turn.
The first feature is the persistence of evaluative instincts under conditions of metric scarcity. Human beings cannot avoid evaluating one another’s work, especially in settings of close cooperation and shared purpose. Members of an assembly form impressions of their ministers; ministers form impressions of one another; regional administrators form impressions of those they oversee. These impressions are continuous and largely automatic. They cannot be turned off. What can be done is to discipline them — to require that they be formed on grounds that are explicit, examinable, and defensible — or to leave them undisciplined, in which case they are formed on whatever grounds happen to be available. The choice is not between evaluation and no evaluation. The choice is between disciplined evaluation and undisciplined evaluation.
The second feature is the drift toward observable proxies. When the criteria are not explicit, the criteria that come to operate are those that are most readily observed. In the ministerial setting, what is most readily observed is not the quality of pastoral care given in private, the depth of preparation behind a sermon, or the long-term formation of members under a particular ministry’s care. What is most readily observed is the sermon’s effect in the moment, the minister’s manner in interaction with peers and superiors, the absence of public complaints from the congregation, and the minister’s apparent alignment with the prevailing direction of the assembly. None of these is the actual work of ministry, but each is what an evaluator can see without effort, and each becomes, in the absence of better criteria, what evaluation tracks.
The third feature is the inversion of evaluative direction. The patterns examined in Papers 2 and 3 establish that the New Testament places the primary direction of accountability between leaders and Christ, with mutual exhortation among members and recognized congregational agency in discernment, selection, discipline, and financial visibility. When the measurement vacuum opens, the direction of evaluation tends to invert. Members are evaluated by their ministers for their compliance, their support, their absence of complaint; ministers are evaluated by their superiors for similar qualities relative to those above them. The direction in which the evaluative gaze travels becomes downward and inward, and the upward and outward direction the texts authorize — the testing of teaching against Scripture, the visibility of significant decisions to the body, the accountability of leaders to those they serve — atrophies for lack of structural support.
The fourth feature is the substitution of identity for output. Without explicit criteria for what good ministerial work looks like, the question “is this minister doing well?” becomes increasingly difficult to distinguish from the question “is this minister one of us?” The latter question is answerable; the former is not, in the absence of criteria. The result is that evaluation becomes a function of belonging. A minister who is recognized as one of us is, by that recognition, evaluated favorably; a minister whose belonging is in doubt is, by that doubt, evaluated unfavorably, regardless of the actual quality of the work being done. This is not a deliberate substitution. It is what happens when the question of work-quality has no available answer and the question of belonging is the only question available.
The fifth feature is the cumulative cost of unmeasured work. Work that is not measured tends, over time, not to be done. This is not because workers are lazy; it is because attention follows feedback, and feedback follows measurement. A minister whose pastoral visitation is unmeasured will, over years, find that the time spent on pastoral visitation is not visible to anyone, while the time spent on activities that are visible — public speaking, attendance at conferences, presence at meetings — produces feedback. The minister’s effort drifts toward what produces feedback. The pastoral visitation that no one measured slowly becomes the pastoral visitation that is no longer being done, and the loss is invisible until it is severe. This is one of the slow institutional costs of the measurement vacuum, and it is paid not by the ministers — who are, on the whole, doing what their environment rewards — but by the members whose pastoral care has quietly diminished.
These five features together describe the cost of failing to develop explicit ministerial evaluation under conditions where the most readily measurable outcomes are theologically discounted. The cost is not solved by the addition of evangelistic outcome metrics that the theology of the assembly does not endorse. It is solved by the development of alternative metrics that match what the work actually is and that can be defended on the assembly’s own theological grounds.
3. The Balanced Scorecard for Ministry
The constructive proposal of this paper is a balanced scorecard for ministerial evaluation, organized around four dimensions of the work that the New Testament itself names. The scorecard is balanced in two senses. It is balanced across the dimensions, so that no one dimension dominates the others. And it is balanced between input and output, between effort and result, in a way that does not collapse the evaluation into either pure activity-counting or pure outcome-measurement. Each dimension is described below in terms of what it measures, how it can be measured without reducing it to crude proxies, and what safeguards prevent it from being misused.
Dimension 1: Care. The first dimension is the pastoral care of those entrusted to the minister. This is the work named in 1 Peter 5:2 — “Feed the flock of God which is among you, taking the oversight thereof” — and in Acts 20:28 — “Take heed therefore unto yourselves, and to all the flock, over the which the Holy Ghost hath made you overseers.” It includes visitation of the sick and the elderly, counseling of the troubled, response to crises, follow-up on those who have withdrawn from fellowship, attention to families undergoing transitions, and the ordinary day-by-day attentiveness that the office requires.
Care is, by its nature, not quantifiable in the way commercial outputs are. The number of visits made does not, by itself, indicate whether the visits were of any pastoral value; the time spent in counseling does not indicate whether the counseling was sound. Quantitative measures of care are real but partial; they describe the visible texture of effort without describing the quality of what was done. A faithful evaluation of care therefore combines several kinds of evidence. The frequency and pattern of pastoral contact — whether members in particular categories of need are being attended to with appropriate regularity — is one kind of evidence. The reports of those who have received care, gathered through the feedback structures discussed in section five, are another. The judgment of fellow ministers who have observed the minister’s pastoral work, where such observation is appropriate, is a third. The minister’s own account, given in periodic conversation with those who oversee, of what pastoral situations have been encountered and how they have been addressed, is a fourth. None of these alone is sufficient; together they give a textured picture of the care being given.
The safeguards on this dimension are important. The evaluation of care must not become an interrogation of pastoral confidences; what is shared in pastoral encounters belongs to the encounter and not to the evaluation. The evaluation must not produce a count-driven distortion in which ministers prioritize the accumulation of visits over the quality of presence. And the evaluation must recognize that some care is by nature long-term and slow-fruiting; a minister who has carried a difficult pastoral situation for years may have, in the moment of evaluation, less to show than one who has handled many briefer matters, and the longer carrying may itself be the more excellent work.
Dimension 2: Teaching. The second dimension is the teaching of the assembly. This is the work named in 2 Timothy 4:2 — “Preach the word; be instant in season, out of season; reprove, rebuke, exhort with all longsuffering and doctrine” — and in Titus 1:9 — “Holding fast the faithful word as he hath been taught, that he may be able by sound doctrine both to exhort and to convince the gainsayers.” It includes the preparation and delivery of sermons, the teaching of classes, the conduct of Bible studies, the answering of questions raised by members, and the cumulative work of forming the assembly’s understanding of Scripture and doctrine over time.
Teaching is more readily evaluable than care, because its products are more public and its quality is more directly testable against Scripture. Faithful evaluation of teaching combines three kinds of assessment. The first is fidelity: does the teaching represent what Scripture actually says, in context, with attention to the texts’ own concerns and the framework of the gospel? This is a question of doctrine, and it requires that those evaluating the teaching be themselves competent in Scripture; it cannot be delegated to evaluators whose only concern is whether the teaching aligns with current institutional emphasis. The second is clarity: does the teaching communicate, in language the hearers can follow, what it intends to communicate? Clear teaching can be tested by asking hearers what they understood; opacity can be detected by the same test. The third is comprehensiveness over time: does the minister’s teaching, taken as a whole over months and years, cover the range of biblical truth — the gospel, the moral law, the covenants, the prophetic witness, the practical instruction of the apostles — without reducing to a narrow band of topics or a partisan emphasis on a few favored themes?
The safeguards on the teaching dimension are particularly important. The evaluation of teaching can become, in a corrupted form, the enforcement of internal consensus rather than the testing of fidelity to Scripture. The two are not the same, and they can come apart. A minister who is teaching what Scripture clearly teaches but at variance with current institutional emphasis is doing the work the office requires; a minister who is teaching current institutional emphasis but at variance with what Scripture clearly teaches is not. The evaluation must be capable of recognizing the difference, which means it must be conducted by those whose primary loyalty is to Scripture rather than to institutional continuity. Where this safeguard is absent, the teaching dimension becomes the most easily corrupted of the four, because teaching is the dimension most directly visible and therefore most readily managed by those who would manage perception rather than substance.
Dimension 3: Formation. The third dimension is the formation of members under the minister’s care — the long-term development of those entrusted to the ministry into mature, faithful, capable members of the body of Christ. This is the work described in Ephesians 4:11–13, where the gifts of pastors and teachers are given “for the perfecting of the saints, for the work of the ministry, for the edifying of the body of Christ; till we all come in the unity of the faith, and of the knowledge of the Son of God, unto a perfect man, unto the measure of the stature of the fulness of Christ.” Formation is the dimension that distinguishes ministry from religious entertainment or chaplaincy in the limited sense; it asks whether those who have been under a minister’s care are growing.
Formation is the most difficult dimension to evaluate, because growth is gradual, individual, and not always visible in the timeframe of evaluation. The evidence available is indirect. Are members under this ministry deepening in their understanding of Scripture over time? Are they exercising the gifts they have received? Are they bearing the burdens of others, loving their neighbors, raising their children in the faith, taking up the responsibilities of mature membership? Are those who have been under the ministry for a decade visibly more equipped than those who are newly arrived? The answers are not simple; many factors besides the ministry contribute to a member’s formation, and a minister cannot be held simply responsible for outcomes that depend on the member’s own response. But the question of whether the ministry contributes to formation, in the aggregate and over time, is a real question, and it is one that members themselves can speak to with reasonable accuracy when they are asked in structured ways.
The safeguards on this dimension are similarly important. Formation cannot be reduced to participation rates in particular programs, attendance at particular events, or any other proxy that mistakes activity for growth. It also cannot be reduced to subjective impressions of a member’s “spiritual maturity” detached from observable practice. Faithful evaluation of formation looks at the trajectory of those under the minister’s care over years, with attention to multiple indicators — including the testimony of members themselves about what has helped them grow and where they remain stuck — and resists the temptation to substitute measurable proxies for the actual phenomenon being measured.
Dimension 4: Mission. The fourth dimension is mission — the carrying out of whatever evangelistic and outreach responsibilities the assembly has, in faithful proportion to its actual calling. This dimension is the most contested in the setting this paper has framed, because the assembly’s discounting of outcome metrics is most pointed here. The proposal of this paper is that the discounting of outcomes does not entail the discounting of the work itself. The minister whose assembly does not expect mass conversions can still be evaluated on whether he is doing the outreach work the assembly’s understanding of mission requires — whether the work is being done at all, whether it is being done with care, whether the witness offered is faithful to the gospel, whether the minister’s life among those outside the assembly commends or compromises that witness.
The evaluation of mission proceeds by asking what the assembly understands its mission to be, and then asking whether the minister’s work is consistent with that understanding. If the mission is to maintain a faithful witness to the truth as opportunity arises, the evaluation asks whether the minister is taking such opportunities and conducting himself in them with appropriate care. If the mission includes deliberate outreach efforts of particular kinds, the evaluation asks whether those efforts are being carried out and what their texture is. The evaluation does not ask whether large numbers have been converted, because the assembly’s theology has explicitly placed that question outside the scope of what the minister can be held responsible for. But it does ask whether the work the minister can be held responsible for — the offering of witness, the doing of outreach in whatever form the assembly has determined, the maintenance of a life that adorns rather than discredits the gospel — is being done.
The safeguards on this dimension protect against two corruptions. The first is the corruption of overcounting effort: treating the mere existence of outreach activities as sufficient evidence that mission is being attended to, regardless of the texture of those activities. The second is the corruption of theology-driven evasion: using the discounting of outcomes as cover for the absence of effort, on the grounds that since outcomes do not matter, the work itself is dispensable. Both corruptions are real, and the second is more dangerous in the setting this paper has framed, because it has theological cover that the first does not. The evaluation of mission therefore requires honest distinction between what the assembly’s theology actually discounts (results that depend on God’s calling and the response of others) and what it does not discount (the faithful work of the minister in offering witness in the form the assembly’s understanding endorses).
The four dimensions together — care, teaching, formation, mission — describe the work the New Testament names for those who hold the office of elder, overseer, or minister. A balanced scorecard tracks all four; it does not allow any one to dominate, and it does not allow the discounting of any one to become the discounting of evaluation altogether. The scorecard is not a substitute for the texts; it is an instrument that makes the texts’ own concerns operative in the evaluation of the work.
4. Avoiding Loyalty Substitution
The most consequential corruption of ministerial evaluation in the setting this paper has framed is the substitution of loyalty for faithfulness. The two are easily confused, and the confusion is one of the predictable pathologies that follows from the measurement vacuum. A clear account of the difference is therefore necessary if the balanced scorecard is to do the work it is designed to do.
Faithfulness, in the New Testament sense, is the disposition of the steward who is true to what has been entrusted to him. Paul’s instruction in 1 Corinthians 4:2 — “Moreover it is required in stewards, that a man be found faithful” — places faithfulness at the center of the steward’s evaluation. The steward is faithful to the householder, which means faithful to what the householder has given him to do and faithful to the householder’s purposes in giving it. In the ministerial application, faithfulness is fidelity to Scripture, to the gospel, to the work the office requires, to the souls entrusted to the minister’s care, and ultimately to Christ as the head of the body. Faithfulness can be evaluated, and the New Testament does evaluate it; the texts speak of faithful and unfaithful stewards, and the difference between them is a difference in their relation to what they have been entrusted with.
Loyalty, in the sense that comes to operate in the corrupted form of evaluation, is something different. It is the disposition of the subordinate toward the institution and its current direction — willingness to support, defer, refrain from challenge, and align with prevailing emphasis. Loyalty in this sense is not biblically named as a virtue of the steward; it is, at most, a contingent good that may or may not align with faithfulness depending on whether the institution’s direction is itself aligned with what the steward is faithful to. When the institution’s direction is faithful, loyalty and faithfulness coincide, and the difference does not show. When the institution’s direction departs from faithfulness in some respect, the two come apart, and the question of which one is being evaluated becomes acute.
The substitution occurs when evaluation, having lost the explicit criteria for faithfulness, defaults to the more readily observable criterion of loyalty. A minister who is faithful but inconveniently so — who teaches what Scripture clearly says even when it cuts against current emphasis, who raises questions about practices that seem to him at variance with the texts, who declines to enforce expectations that he judges biblically unwarranted — appears, under the substituted criterion, as disloyal. A minister who is loyal but unfaithful in some respect — who teaches what is currently endorsed without rigorous reference to Scripture, who suppresses concerns rather than raising them, who enforces expectations regardless of their scriptural basis — appears, under the same substituted criterion, as the model of what the office requires. The evaluation, conducted under loyalty as the operative criterion, rewards the second and penalizes the first. This is not because the evaluators are corrupt; it is because the criterion is.
Several diagnostic markers indicate that loyalty has substituted for faithfulness in a particular evaluative culture. Each is worth naming.
The first marker is the praise of conformity rather than the praise of work. When ministers are commended in language that emphasizes their alignment with direction, their support of leadership, their absence of friction — rather than in language that names the substantive quality of their pastoral care, teaching, formation work, or mission engagement — the criterion in operation is loyalty. The praise pattern is itself diagnostic. Substantive praise names substantive work; loyalty praise names posture.
The second marker is the response to legitimate disagreement. In a culture where faithfulness is the operative criterion, a minister who raises a substantive concern receives a substantive response; the concern is engaged on its merits; the eventual judgment, whatever it is, is grounded in argument. In a culture where loyalty has substituted, a minister who raises a substantive concern is treated as having raised a question about his loyalty, and the response addresses that question rather than the substance. The pattern from Paper 3 — the rhetorical drift by which disagreement is recategorized as rebellion — operates here as a particular case of the larger loyalty substitution.
The third marker is the career trajectory of those who have raised substantive concerns and those who have not. Over years, the pattern of who advances and who does not within an institution reveals what is actually being rewarded. If those who have raised substantive concerns find their advancement halted, their assignments narrowed, their visibility reduced — while those who have raised no such concerns find advancement, assignment, and visibility flowing reliably — the operative criterion is loyalty regardless of what is said about faithfulness in formal documents. The pattern is observable and, over time, unmistakable.
The fourth marker is the internal language of evaluation itself. When ministers are spoken of, in evaluative settings, in terms that center on their relation to leadership — “he’s been very supportive,” “she’s a real team player,” “he understands where we’re going” — rather than in terms that center on the substance of their work — “his preparation is rigorous,” “her counseling has helped many,” “his teaching has formed several leaders in the local body” — the criterion in operation can be inferred from the vocabulary. The vocabulary follows the criterion; the criterion can be read off the vocabulary.
The recovery of faithfulness as the operative criterion requires explicit attention to each of these markers. Praise must be reformed to name substantive work. Response to disagreement must be reformed to engage the substance. Career trajectories must be reviewed to ensure that those who have raised legitimate concerns are not, by that fact, disadvantaged in their assignments and advancement. The internal language of evaluation must be retrained to name the dimensions of the work rather than the postures of the worker. None of these reforms is dramatic in any single case; together, over time, they restore the criterion the texts authorize.
A further observation is necessary, because the proposal can be misunderstood. The argument that loyalty must not substitute for faithfulness is not an argument that ministers should be free of any expectation of cooperation with the body’s direction. The body has direction; cooperation is necessary to its functioning; ministers who are persistently uncooperative without substantive ground are not, by virtue of their uncooperativeness, faithful. The point is that cooperation is a contingent good that should not be confused with the work being evaluated, and that the threshold for treating uncooperativeness as a fault must include serious examination of whether the uncooperativeness is groundless — which it sometimes is — or whether it is the response of a faithful steward to a direction he judges, on substantive grounds, to be at variance with the texts. The same uncooperative behavior can be evidence of either condition, and the criterion of faithfulness is what allows the distinction to be made.
5. Feedback Loops
The balanced scorecard requires evidence, and the evidence requires sources. Members of the assembly are among the most important sources of evidence about ministerial care, teaching effectiveness, and formation, because they are the recipients of the work and have direct knowledge of its texture from the receiving end. Fellow ministers and those who oversee are sources for other dimensions, including peer accountability and observation of work that members do not see directly. The development of feedback loops that gather this evidence honestly and use it constructively is the structural counterpart to the scorecard itself.
Several principles govern the design of these feedback loops.
The first principle is that feedback should not be adversarial in framing. The purpose of gathering member feedback is not to give members a tool for grievance against ministers, and the practice of gathering feedback should not produce that effect. Adversarial framing — treating feedback as a complaint mechanism, structuring it as the bringing of charges, positioning members against ministers in a contest — corrupts both the feedback and the relationships it touches. Constructive framing presents feedback as the body’s care for itself, its care for those who serve, and its responsibility before God for the quality of the work being done in its midst. Members are invited to contribute what they can see; ministers are positioned as those for whom the feedback is a service; the framing throughout is of mutual benefit rather than opposition.
The second principle is that feedback should be solicited rather than left to volunteer initiative. When feedback is left to volunteer initiative, what arrives is biased toward complaint — those most dissatisfied are most likely to speak — and toward the cases most easily articulated. Members whose pastoral care has been excellent often do not think to say so; members whose teaching experience has been formative often do not think to say so; the steady ordinary excellence of ministerial work goes unreported, while the sharper edges of dissatisfaction reach the evaluators. Solicited feedback — periodic, structured, and inclusive of the full range of members — produces a more accurate picture, and it does so without requiring members to decide on their own whether a concern is significant enough to be raised formally.
The third principle is that feedback should be confidential where appropriate, and identifiable where appropriate, with the distinction made explicit. Some kinds of feedback — observations about pastoral care, comments on teaching, reports on what has helped a member grow — can be given more honestly when the giver knows that the feedback will not be attached to him personally in the minister’s hearing. Other kinds of feedback — formal concerns about doctrine or conduct, requests that require a response — must be identifiable because they cannot be addressed without conversation with the giver. The feedback structure should make the distinction clear: which kinds are gathered confidentially, which kinds require identification, and how the two are kept appropriately separate. Members who do not know which is which will default to caution, and the feedback will be impoverished accordingly.
The fourth principle is that feedback should reach those competent to act on it, and the action taken should be visible in some form to those who provided the feedback. Feedback that disappears into a process whose workings are invisible to its sources produces, over time, a sense among members that their input is not real — that giving it changes nothing — and the feedback dries up. Feedback whose handling is visible at the appropriate level — without breaching the privacy of any minister whose work is the subject — sustains the practice over time. The visibility need not be detailed. It needs to communicate that input was received, that it was considered, and that it has informed the assembly’s thinking in some way the giver can recognize.
The fifth principle is that feedback should be one input among several, not the sole determinant of evaluation. Members can speak to what they have experienced; they cannot speak to dimensions of the work they do not see. A minister’s preparation, his pastoral attention to those whose situations the wider body does not know about, his work with fellow ministers, his administrative responsibilities, his own character and conduct in settings the members do not observe — these are dimensions that require other sources of evidence. The use of member feedback as the sole determinant produces a distortion in the opposite direction from the loyalty substitution: it can elevate ministers who are publicly skilled but privately deficient, or penalize ministers whose private excellence is not visible to those whose feedback is sought. Member feedback is essential and irreplaceable; it is also partial, and the evaluation must combine it with other evidence to be balanced.
A further observation concerns the texture of the feedback itself. The questions asked of members shape what they can answer, and the questions can be asked well or poorly. Questions that ask members to rate their minister on a scale of 1 to 10 produce data that is essentially uninterpretable, because it is unclear what dimension is being rated and what the scale means. Questions that ask members to describe what their minister has done well, what they wish he had done differently, what teaching has been most helpful to them, what pastoral encounter has been formative — these produce data that is interpretable, because the data is anchored in particular reported experiences. The design of the feedback instrument is itself part of the work, and a poorly designed instrument is worse than no feedback at all, because it produces the appearance of evidence without the substance.
6. Deliverable: KPI Set with Definitions and Safeguards
The first deliverable of this paper is a set of key performance indicators corresponding to the four dimensions of the balanced scorecard, each with its definition, its means of measurement, and the safeguards that protect it from corruption. The set is illustrative; particular assemblies will adapt the indicators to their own circumstances and emphasize different aspects of the work accordingly.
Under the dimension of care, the indicators are the regularity of pastoral contact with members in defined categories of need (the elderly, the chronically ill, those in crisis, those who have recently joined, those who have withdrawn from fellowship), the responsiveness of pastoral attention when situations arise (whether the minister is reachable and reaches out when needed, within appropriate timeframes), and the quality of pastoral encounter as reported by those who have received it (gathered through the feedback structures of section five, anchored in the recipients’ specific accounts of what helped). The means of measurement are records the minister himself maintains of pastoral contact, reports from members gathered through structured feedback, and the judgment of fellow ministers where appropriate observation has occurred. The safeguards are the protection of pastoral confidentiality (records describe contact, not content), the avoidance of count-driven distortion (regularity rather than total number is the metric, and quality is weighted equally), and the recognition of long-term carrying as legitimate care.
Under the dimension of teaching, the indicators are fidelity of doctrine to Scripture as evaluated by competent peers and against the texts directly, clarity as reported by hearers and observed by evaluators familiar with the audience, and comprehensiveness over time as evaluated by review of the minister’s teaching across months and years for breadth across the range of biblical truth. The means of measurement are review of recorded or written teaching by competent peers, structured feedback from members on what has been understood and what has been formative, and periodic examination of teaching coverage against a rough framework of biblical material. The safeguards are the requirement that fidelity be evaluated against Scripture rather than against current institutional emphasis, the protection of legitimate distinctive emphasis that may not match the prevailing direction but remains within scriptural fidelity, and the avoidance of evaluation by those whose primary loyalty is to institutional consensus rather than to the texts.
Under the dimension of formation, the indicators are the trajectory of members under the minister’s care over years (whether members are growing in understanding, in exercise of gifts, in mature responsibility), the testimony of members about what has helped them grow, and the visible difference between members who have been under the ministry for a decade and those who are newly arrived. The means of measurement are longitudinal observation, structured feedback from members specifically on formation experiences, and the kind of qualitative review that asks whether the body under this ministry is, in the aggregate, being built up. The safeguards are the recognition that formation depends on many factors besides the ministry, the avoidance of crude proxies (program participation, event attendance) for the actual phenomenon of growth, and the willingness to evaluate over timeframes that are long enough for formation to be visible — typically several years rather than a single annual cycle.
Under the dimension of mission, the indicators are the existence and texture of whatever outreach activities the assembly’s understanding of mission endorses, the consistency of the minister’s life with the witness he offers (his conduct toward those outside the assembly being consistent with the gospel he proclaims), and the faithfulness of the witness offered in its content (the gospel preached being the gospel of Scripture, in proportion and emphasis). The means of measurement are review of outreach activities undertaken, observation of the minister’s conduct in settings outside the assembly’s internal life, and examination of the content of witness offered. The safeguards are explicit theological honesty about what is and is not being measured (faithfulness to mission as the assembly understands it, not numerical results that the theology has placed outside the minister’s responsibility), and resistance to the corruption that would use the discounting of results as cover for the absence of work.
Across all four dimensions, two cross-cutting safeguards apply. The first is that no minister should be evaluated solely on a single dimension; the balanced scorecard is balanced precisely so that strength in one area does not excuse weakness in another and weakness in one area does not eclipse strength in others. The second is that evaluation should be conducted with the seriousness the work deserves, on schedules that allow patterns to be visible, and with the kind of personal engagement between evaluator and evaluated that produces understanding rather than mere data — the New Testament’s pattern of evaluation is relational, and the scorecard does not displace that pattern but disciplines it.
7. Deliverable: Reporting Template That Avoids “Proof-of-Nothing Work”
The second deliverable is a template for periodic ministerial reports that gathers evidence about the work without producing the institutional pathology that elsewhere has been called “proof-of-nothing work” — the production of documentation whose function is to demonstrate that activity has occurred without conveying anything substantive about what was done. The template is designed to extract real information with reasonable economy of effort, and to resist the corruption that turns reporting into a performance of compliance.
The template covers a defined period — a quarter, a half-year, a year, depending on the assembly’s rhythm — and is organized around the four dimensions of the scorecard.
For care, the report describes the texture of pastoral attention given during the period: the categories of members attended to, the patterns of contact, the significant pastoral situations carried (described in terms that protect confidentiality but convey what was carried), the situations that proved difficult and how they were handled, and the situations the minister judges he could have handled better and what he is doing in response. The report does not list every contact made; it describes the pattern and gives examples that convey what the work has been.
For teaching, the report describes the topics taught, the texts addressed, the questions encountered from members and the responses offered, the areas in which the minister judges his preparation has been particularly fruitful, and the areas in which he judges his preparation needs strengthening. The report includes, where appropriate, copies of significant teaching pieces or notes for review by competent peers. It does not list every sermon delivered; it describes the trajectory of teaching across the period and engages reflectively with its quality.
For formation, the report describes what the minister has observed about the development of those under his care: members who have grown noticeably in particular ways, situations of struggle that have been worked through, the body’s movement toward or away from the marks of mature membership the minister is watching for. The report acknowledges the limits of what the minister can claim about formation — many factors contribute, and the minister’s own assessment of his contribution must be modest — and offers his honest reading of where the body is and where he hopes to see growth.
For mission, the report describes the outreach activity undertaken in whatever form the assembly’s mission endorses, the texture of the minister’s life among those outside the assembly during the period, the content of witness offered, and the minister’s reflection on the consistency of his life with the witness he carries. The report does not claim results that the theology has placed outside its scope; it does report on the work the minister is responsible for.
A final section invites the minister’s own reflection on his work during the period: what he has learned, what he is wrestling with, where he wants to grow, what support or correction he is requesting from those who oversee. This section is not optional. It is the place where the minister speaks for himself, and it is one of the protections against the report becoming an exercise in institutional performance.
The template is designed to produce reports that are substantively informative, reasonable in the time they require to produce, and honest in their texture. It is also designed to make clear that the report is not the evaluation; the report is one input into evaluation, alongside member feedback, peer observation, and direct conversation between the minister and those who oversee. A report that is honest about a difficult quarter is more valuable than a report that has been polished into the appearance of uniform success, and the evaluation culture must reward the first and treat the second with the appropriate skepticism.
8. Conclusion
When the most readily measurable outcomes of ministerial work are theologically discounted, the question of how the work is to be evaluated does not disappear; it becomes more difficult, and the cost of failing to address it explicitly is the drift of evaluation toward criteria that cannot do the work the New Testament requires. The drift is not toward no evaluation but toward evaluation under unstated criteria — tone, demeanor, conformity, loyalty — that are not what the texts authorize and that cannot, over time, sustain the actual work of ministry that the texts describe.
The proposal of this paper has been a balanced scorecard that names the four dimensions of the work — care, teaching, formation, mission — that the New Testament itself names, with indicators and safeguards for each, supported by feedback structures that gather honest evidence from those who have it and disciplined reporting that avoids the institutional pathology of documentation as performance. The proposal is not radical; the texts have always asked the assembly to evaluate the work of its ministers, and the categories the proposal uses are the texts’ own. What the proposal offers is the discipline of making the evaluation explicit, balanced, and sustainable in a setting where the absence of explicit alternatives has allowed the criterion of loyalty to substitute for the criterion of faithfulness.
The cost of failing to develop such an evaluation is borne by the ministers whose work is poorly understood, by the members whose care is invisibly diminished as ministerial effort drifts toward what produces feedback, and by the assembly whose capacity to know whether its work is being done well is impaired in proportion to the vagueness of its evaluation. The cost of developing it is the discipline of attention — the willingness to ask what is being measured, to defend the criteria, to gather the evidence honestly, and to act on what is found. The cost is not small, but it is the cost of caring about the work, and the work is worth caring about.
The paper that follows takes up a question that has been adjacent to several of the discussions in this suite and now requires direct treatment: the boundary between divine governance and human administration, and the theological care required when an ecclesial arrangement claims to participate in or instantiate the government of God. The connection to the present paper is direct. The criterion of faithfulness, as opposed to loyalty, presupposes that there is a ground to which the steward is faithful that is not identical to any human institution; the criterion of loyalty presupposes that the institution is itself the ground. Which presupposition is operative depends, in the end, on the theological account of where the government of God is and how it relates to whatever human administration is at hand. That theological account is what the next paper undertakes to clarify.
