VI. Conclusions

There are obviously many ways to avoid human error, for example: distinctive and consistent labelling of equipment, control panels and documents; displaying information concerning the state of the plant so that the operator easily understands it and does not make a faulty diagnosis; and designing systems to give unambiguous responses to operator actions so that incorrect actions can be easily identified. Systems should also be designed to limit the need for human intervention, overcome failures due to human causes or at least minimise their consequences.

Human factor studies are now advancing rapidly in many countries. Greater attention is being paid to human needs in designing equipment, and efforts are being made to learn from experience in order to correct past errors.

In 1984, the NEA decided to collect information relating to human factor issues by means of a Newsletter to which Member countries contribute descriptions of current projects. Through this Newsletter and regular meetings of the various NEA committees and task forces, the role of the human factor is being thoroughly studied.

Current NEA work in the human factor area is focussed on:

The need for operators to be better trained to understand what happens during plant emergencies, including the use of simulators;
Analysis of the misinterpretation of plant status by operators; and
Evaluating the use of digital computers in the control room.

NEA Issue Brief: An analysis of principal nuclear issues

No. 2, January 1988

The human factor in nuclear power plant operation

I. What is the human factor?

In a complex industrial facility such as a nuclear power plant, the majority of the tasks are performed by machines. But man is, of course, involved to a great extent in their design, testing, maintenance and operation. The performance of a person working within a complex mechanical system depends on that person's capabilities, limitations and attitudes, as well as on the quality of instructions and training provided. The interface between a machine and its operators in any industrial project is usually known as the human factor.

II. What human factor issues must be considered?

Human error can occur at every stage in the life of a nuclear facility, and a variety of methods must therefore be used to detect and prevent it. Among the most important aspects are the following:

Task analysis

Because some tasks are assigned to machines and others to humans, it is important to know how these functions are allocated and the exact description of the tasks assigned to humans. The functions and tasks of plant operators and maintenance personnel, and how their activities are coordinated, must be fully understood. A task analysis can determine what personnel are needed, how they should be selected, what should be included in training programmes, and other technical issues. In some countries, a specific database to analyse operator tasks has been developed to assist management in selecting, testing and training personnel, and in evaluating control room instrumentation and procedures.

Personnel Hiring and Organisation

A person's skills, personality and experience must be carefully reviewed during the hiring process to determine which candidates are best suited to operate and maintain a nuclear facility. For this purpose, nuclear plant managers can make use of the techniques developed for the selection of airline pilots and others. By listing the qualifications thought to be related to a particular position and comparing them with the qualifications of those who have excelled at that task, a job description can be drawn up and periodically analysed to verify its effectiveness. Since a nuclear power plant runs continuously, personnel must work in shifts. Shift duration and rotation, alertness and other efficiency issues are therefore as important in the nuclear industry as they are in other safety-conscious industries such as the airlines. Quality management of plant staff is also highly important, because the way in which the work is organised, staffed, supervised, evaluated and rewarded will determine the effectiveness, productivity and safety of the facility. For example, the way in which information is transferred from one shift to the next can significantly affect the safety of the plant.

Operator Training and Testing

The lack of proper training, and of proper operational procedures, has been a major cause of human error in the nuclear industry. It was a principal factor in the accidents at Three Mile Island (TMI) in the U.S. (1979) and Chernobyl in the USSR (1986). Greater emphasis is now being placed on such training issues as the use of simulators, case studies, computer-assisted training, team training techniques and better evaluation of training programmes.

An examination system, carefully planned and executed by experienced personnel, is another important element. Examinations based on a task analysis can help ensure that all requisite skills and knowledge are included, while reducing the possibility that an operator will be required to demonstrate skills or knowledge that are not necessary to perform the job.

Procedures

In addition to training, operators also follow written instructions, called procedures, for carrying out normal plant operations, particularly those which are very complex, not often performed, or which depend on access to large amounts of numerical data. Procedures for normal and emergency operations must be technically accurate, well-defined and entirely comprehensible. The presentation of procedures for routine maintenance, calibration and testing of equipment differs from that of operating procedures. For example, they should include more detail while remaining clear and concise, especially if the task is not often repeated. The NEA has provided practical advice along these lines to its Member countries.

Operators also frequently use other aids, such as computers, electronic displays and computer-based information systems which inform them of the status of the plant and alert them to any changes in that status. This equipment should be designed to give clear and unambiguous indication of the need for action (e.g., an alarm).

In recent years, the form and content of the procedures provided to operators to cope with emergency situations (known as Emergency Operating Procedures, or EOPs) have come under increased scrutiny. When the plant approaches a degraded condition, these procedures provide operator guidance on how to verify the adequacy of critical safety functions, and how to restore and maintain them when they are degraded. An NEA Task Force, established in 1983, reviewed these emergency procedures and compiled and assessed descriptions of EOP-related practices. As a result, the need to give more consideration to the human factor in this area was recognised.

Control Room Design and Layout

Errors by control room personnel have often been caused by designs that did not take human limitations into account. During the accident at TMI, many alarms went off simultaneously, and operators were unable to monitor the plant adequately. Since this accident, many modifications have been carried out in existing plants to reduce the probability of design-induced error. Improvements in control room design, layout, and work environment can lead to the prevention of accidents or better management of accidents if they occur, although care must also be taken to avoid causing human errors by changing layouts or designs with which the operator is familiar.

Reporting

It is also important to compile statistical data on the number and kind of human errors which occur in nuclear power plants through the proper use of a well-designed reporting system. Each time an event occurs which is out of the ordinary, a form is completed describing the event, its probable cause and other pertinent information. If human error data is correctly entered on this form, it can help to assess the likelihood of accidents and to evaluate changes in control room procedures and training programmes. However, it is difficult to obtain complete accuracy without some form of protection for those reporting the incident. The subject of reporting is discussed in greater detail in Section V.

Equipment Design, Maintenance and Testing

Human errors occur when machines are improperly designed or built or when they are poorly maintained. Errors in system design can only be eliminated by a thorough evaluation or testing prior to operation. A preliminary study in the planning stage should explicitly determine how the system may fail and what safeguards have been incorporated by the designer to prevent or mitigate such failures.

Accidents which could jeopardize public health and safety may be caused by human error during maintenance and testing activities, the most spectacular example, of course, being the Chernobyl accident in the USSR. This is a particularly vulnerable area because safety systems are designed primarily to cope with human error during normal plant operations and not during the testing and maintenance of the safety systems themselves. Some problems can be resolved by improving the identification of equipment and access to it, providing better technical manuals and written procedures, and by designing better tools and instruments. Maintenance errors can be reduced further by improving the work environment -- for example, by avoiding extreme temperatures, noise and inadequate lighting. Human error during test and calibration activities has also been attributed to inadequate organisation of these activities, design of the equipment or limitations of the maintenance personnel.

III. How can human reliability be assessed?

Overall system reliability in a nuclear power plant is more often dependent on individuals than on the equipment. Although a human reliability assessment has the same objective as an assessment of equipment reliability, in the latter case, logical methods are used to study the structure of the system and the role of the designed safeguards. For human reliability assessment, there are no equivalent methods for identifying significant potential human failures on a purely logical basis, and great reliance is placed on the expertise and experience of the assessor.

It is difficult to evaluate human performance quantitatively because a decision can be affected by many psychological factors. For example, individuals may vary in their performance of well-defined tasks, depending on their familiarity with the task, their state of fatigue, what other tasks have to be performed, a changing physical environment at work or a tense psychological environment at home, and many other factors. Nevertheless, NEA Member countries have recognised the need for a classification system to identify and define human errors, and in 1983, the Group of Experts on Human Error Data and Assessment suggested the principal elements of such a system. A three-level model of human thought processes was developed, and different types of mental error were identified for each level: errors in trained skills, such as clumsiness; errors in learned rules, such as forgetfulness; and errors in creative thinking, such as incorrect interpretation of an event. All of these can cause critical mistakes in operating a nuclear power plant.

IV. How can human error be analysed?

A Probabilistic Safety, or Risk, Assessment is the method used by the nuclear industry to calculate and compare different accident scenarios and to identify those areas of greatest concern. Since human beings are infinitely complex, predicting their performance is particularly difficult. Nevertheless, if it could be done, even with limited accuracy, it would contribute greatly to such a safety assessment.

A method has been developed by the nuclear industry to help estimate the probable occurrence of procedural errors, based on an extensive task analysis of each human action evaluated. This method concentrates on mechanical tasks, with little analysis of the thinking behind human actions. For example, it identifies errors in reading and implementing emergency operating procedures but not errors caused by faulty knowledge or reasoning during an event. The latter are called cognitive errors.

For analysing and quantifying human errors made by operators responding to an accident sequence, cognitive errors must be explicitly considered. Under accident conditions, an operator must first diagnose the nature of the accident before selecting the appropriate procedures and recovery action. Errors of diagnosis are more frequent than procedural errors or those which result from misread instruments. Cognitive errors can be divided into four categories:

Making an incorrect diagnosis of an accident situation and continuing to act on it despite information from the plant that contradicts the diagnosis. This frequently occurs when an operator makes a firm decision early in the accident sequence and dismisses contradictory information as instrument error.
The opposite behaviour is seen when an operator frequently changes a response decision without any technical basis for such changes. This failure to follow a systematic course of action can be brought about by the stress of having to make an urgent and vital decision.
A third category of error arises from the limitations of short-term memory. In order for information such as instrument readings or procedures to be remembered over a short period of time, it must be repeated at frequent intervals. Under the stress of accident conditions, such information can be forgotten as new facts or tasks are added.
With time, operators subconsciously learn to respond quickly and correctly to the normal operating behaviour of a plant by building up a simplified mental image of the interactions and responses of the system. However, when a reactor experiences an abnormal event, operators may fail to recognise the differences between the plant's unfamiliar behaviour and their expectations, and may respond incorrectly.

When an accident sequence occurs, operators may:

Fail to realise that an event has occurred,
Fail to diagnose the event correctly and identify proper responses to it, or
Fail to take timely or proper corrective actions.

Designing systems so that they increase the time available for operators to respond to abnormal conditions can help resolve these problems. When operators realise that the plant is not responding as expected, they will then have time to analyse the situation and implement the proper corrective actions.
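
The three failure modes listed above can be combined into a rough estimate of the chance that operators do not recover from an event. The following is a minimal sketch, not a method described in this brief. It assumes that detection, diagnosis and corrective action are independent stages, and the function name and the probabilities used are purely illustrative.

```python
def operator_failure_probability(p_detect_fail, p_diagnose_fail, p_act_fail):
    """Chance that at least one of the three stages fails.

    Assumes (hypothetically) that the stages are independent, so the
    chance of overall success is the product of the stage successes.
    """
    for p in (p_detect_fail, p_diagnose_fail, p_act_fail):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie between 0 and 1")
    p_success = (1 - p_detect_fail) * (1 - p_diagnose_fail) * (1 - p_act_fail)
    return 1 - p_success


# Illustrative values only, not measured data.
p_total = operator_failure_probability(0.01, 0.05, 0.02)
print(f"{p_total:.4f}")
```

In a model of this kind, giving operators more time to respond would be represented by smaller failure probabilities for the diagnosis and action stages.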

It is hard to assess these errors by the data bank approach used for procedural errors because of the difficulty of observing diagnostic and other hidden thought processes. The alternative is to use the judgement of individuals who have experienced these errors in plant or simulator situations, or who have other appropriate knowledge. This can help assess the likelihood of human failure. Such individuals may be plant designers, operators, trainers, human factor specialists, risk analysts, or others who have expertise in the area and who are experienced in quantitative thinking.

In 1984, an NEA Task Force reviewed detailed descriptions of systems used by Member countries to identify potentially significant human actions. The Task Force's finding that assessment methods rely extensively on the knowledge and experience of the persons who performed the analysis led the NEA, in 1985, to a closer study of the use of expert judgement to quantify human reliability.
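
One simple way to combine the estimates of several experts is to take their geometric mean, which suits probabilities that span several orders of magnitude. This is an illustrative sketch only: the brief does not say which aggregation rule the NEA study examined, and the function name and the estimates below are hypothetical.

```python
import math


def aggregate_expert_estimates(estimates):
    """Combine expert estimates of an error probability by geometric mean."""
    if not estimates:
        raise ValueError("at least one estimate is required")
    if any(not 0.0 < p <= 1.0 for p in estimates):
        raise ValueError("each estimate must be in the interval (0, 1]")
    # Averaging the logarithms is equivalent to taking the geometric mean.
    mean_log = sum(math.log10(p) for p in estimates) / len(estimates)
    return 10 ** mean_log


# Hypothetical estimates from three assessors for the same operator action.
combined = aggregate_expert_estimates([0.001, 0.01, 0.1])
print(f"{combined:.3g}")
```

Averaging on a logarithmic scale keeps a single high estimate from dominating the result, which an arithmetic mean would allow.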

V. How can human factor data be collected?

The need for qualitative information to support conventional statistical error analysis has been demonstrated. There are at least three ways to collect such information: by in-depth event reports submitted by plant personnel; by on-site investigation of significant abnormal events carried out by experienced human factor experts; and by the use of simulators.

The NEA Group of Experts mentioned in Section III recommended a system of collecting information based on detailed reports, submitted by plant personnel, on the circumstances leading up to an incident, together with the use of teams of specialists to analyse selected important events in greater detail. Among the categories of information recommended for inclusion in incident reports were:

The exact nature of the error (e.g., omission of task or action, wrong action, wrong piece of equipment);
Factors relating to the general work situation (was the task routine or unfamiliar, performed under difficult physical working conditions, on night-shift, etc.);
Which mental function failed (wrong decision made, wrong action taken);
Why it failed (the person was distracted, had the wrong information, was ill); and
How it failed (describes the psychological mechanism involved, such as absent-mindedness).
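The categories listed above could be recorded in a structured form so that reports from different plants can be compared and counted. The following is a minimal sketch under that assumption; the field names, the sample entries and the tally function are hypothetical and do not represent an NEA reporting format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class IncidentReport:
    """One human-error incident, using the five categories listed above."""
    error_nature: str      # e.g. omission of task, wrong action
    work_situation: str    # e.g. routine task, night-shift
    failed_function: str   # which mental function failed
    cause: str             # why it failed
    mechanism: str         # how it failed (psychological mechanism)


def tally(reports, field_name):
    """Count how often each value of one category appears in the reports."""
    return Counter(getattr(report, field_name) for report in reports)


# Hypothetical reports, invented purely for illustration.
reports = [
    IncidentReport("omission of task", "night-shift",
                   "wrong action taken", "distracted", "absent-mindedness"),
    IncidentReport("wrong piece of equipment", "routine task",
                   "wrong action taken", "wrong information", "absent-mindedness"),
    IncidentReport("wrong action", "unfamiliar task",
                   "wrong decision made", "wrong information", "misdiagnosis"),
]

for value, count in tally(reports, "cause").most_common():
    print(f"{value}: {count}")
```

Keeping each category as a separate field allows simple counts of this kind, which can point analysts towards recurring causes that merit an on-site investigation.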

In June 1984, an NEA Working Group studied the methods used in Member countries to analyse events in nuclear power plants involving human error. The results show that some countries have set up a specific system for analysing these incidents, and that site visits are the most effective way to gather information and identify root causes. Written reports seldom contain enough information for the purpose. In some countries, a human performance evaluation specialist is responsible for the analysis of unplanned reactor events, and for making recommendations to correct the root causes of human performance problems.

Simulators are also used to accumulate human error data in the performance of individual tasks during abnormal events. However, there are systematic differences between the training simulator situation and a real incident. In a simulator situation, the operator naturally has no need to fear that a high risk event will occur or that serious consequences will follow, and therefore does not experience the same stress as during a real event. Furthermore, because of the high cost of simulator time, a simulated accident is limited to 30 minutes instead of the several hours that a real incident could take.

VI. Conclusions

There are many ways to avoid human error. Examples include distinctive and consistent labelling of equipment, control panels and documents; displaying information on the state of the plant so that the operator easily understands it and does not make a faulty diagnosis; and designing systems to give unambiguous responses to operator actions so that incorrect actions can be easily identified. Systems should also be designed to limit the need for human intervention and to overcome failures due to human causes, or at least minimise their consequences.

Human factor studies are now advancing rapidly in many countries. Greater attention is being paid to human needs in designing equipment, and efforts are being made to learn from experience in order to correct past errors. In 1984, the NEA decided to collect information relating to human factor issues by means of a Newsletter to which Member countries contribute descriptions of current projects. Through this Newsletter and regular meetings of the various NEA committees and task forces, the role of the human factor is being thoroughly studied. Current NEA work in the human factor area is focussed on: the need for operators to be better trained to understand what happens during plant emergencies, including the use of simulators; analysis of the misinterpretation of plant status by operators; and evaluating the use of digital computers in the control room.