Introduction
The availability of reliable data is indispensable in manufacturing. If we cannot trust the
data we measured, we cannot improve our processes and cannot define appropriate actions. Just as we validate our manufacturing
processes with capability studies, testing and measurement, we have to validate our measurement processes as well. Proper measurement and
testing is the basis of quality and
Statistical Process Control (SPC). The
variation of products has two components: one is the manufacturing process variation, while the other is the variation of the measurement. In
other words, the influencing factors of the final measured value are the measured sample (true value) and the measurement uncertainty
(measurement system variation).
Every measurement and test system has a so-called "measurement uncertainty", which can introduce measurement
errors. Our intention must be to reduce this uncertainty as much as possible, to increase the quality of the measurement data we gather. In
order to validate our measurement systems, we need to be sure that our system is capable of measuring, hence we need to perform
Measurement System Analysis (MSA).
Generally, MSA focuses on variable characteristics (continuous data), but there are also methods to
analyse a system that evaluates attribute characteristics (e.g. OK / NOK or GO / NO-GO). During an attribute inspection or test, appraisers
and test systems need to classify parts (see the difference between variable and attribute characteristics in the following table).
Variable and attribute characteristics

| Examples of variable characteristics (continuous data) | Examples of attribute characteristics |
| --- | --- |
| Diameter of a caliber in mm | Blue or green |
| Width of a coated surface in microns | Pass or Fail |
| Capacity of a capacitor in microfarads (uF) | Number of pre-defined defects on a die-cast aluminum part |
Our intention is always to reach the highest possible reliability of these classifications
(e.g. all appraisers classify good parts as good, bad parts as bad, and do not misclassify any of them).
Source: qMindset.com
Key Features
In order to answer the following questions, we need to perform an Attribute Agreement Analysis (AAA) or an
Attribute R&R:
- Do the appraisers (operators) agree with themselves?
- Do the appraisers agree with each other?
- Do the appraisers agree with the reference?
To answer these questions, we need to evaluate the relationship between the decisions mathematically.
Complex Attribute Agreement Analysis (Example 1): we have 3 operators (m) evaluating 10 parts (n)
with 2 trials (r). Remark: for an effective AAA we should use at least 20 parts, and the number of trials should be at least 2.
Step 1: Collect the decision results of each evaluation on a data collection sheet for each operator, for
each part and for each trial. The reference value is indicated in the rightmost column. Number 1 means pass, while number 0 means fail.
A-1 means the first trial of operator A.
AAA data collection sheet

| Part | A-1 | A-2 | B-1 | B-2 | C-1 | C-2 | Reference |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 3 | 0 | 1 | 1 | 0 | 1 | 0 | 0 |
| 4 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 5 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 6 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 7 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| 8 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 9 | 1 | 1 | 1 | 1 | 0 | 1 | 1 |
| 10 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
Step 2: Calculate repeatability for each appraiser (operator).
%Repeatability = 100 * (number of parts where all trials agree) / n
where n = the number of parts.
In the case of appraiser A, the %Repeatability is 90% (100 * 9 / 10), as the operator classified 9 parts
with the same result during the two trials.
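The repeatability calculation can be sketched in a few lines of Python; the data is the collection sheet from Step 1, and the variable names are illustrative, not from the source:

```python
# AAA data from Step 1: per appraiser, two trials of pass(1)/fail(0) for 10 parts
trials = {
    "A": ([1, 1, 0, 1, 0, 1, 1, 0, 1, 1],   # trial A-1
          [1, 1, 1, 1, 0, 1, 1, 0, 1, 1]),  # trial A-2
    "B": ([1, 1, 1, 1, 0, 1, 1, 0, 1, 1],
          [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]),
    "C": ([1, 1, 1, 1, 0, 1, 1, 0, 0, 1],
          [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]),
}

def repeatability(trial1, trial2):
    """% of parts where both trials give the same classification."""
    agree = sum(t1 == t2 for t1, t2 in zip(trial1, trial2))
    return 100 * agree / len(trial1)

for appraiser, (t1, t2) in trials.items():
    print(appraiser, repeatability(t1, t2))  # A: 90.0, B: 90.0, C: 80.0
```

Note that appraiser C scores only 80%, as C contradicted himself on parts 3 and 9.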
Step 3: Calculate reproducibility. First, we create the cross tabulation for each observer pair (A-B, B-C
and A-C).
Cross tabulation (Source: qMindset.com)
We fill in the table based on the inspection results:
- a = 4, which means that the operators agreed four times in evaluating a part as fail (0).
- b = 1, so there was one trial where operator A chose fail (0) and operator B chose pass (1).
- c = 1, so there was one additional trial where operator A chose pass (1) but operator B chose fail (0).
- d = 14, which means that both operators evaluated a part as pass (1) fourteen times.
Next, we sum the rows and columns, receiving the totals (a + b), (a + c), (c + d) and (b + d). The
total sum is N. We need these numbers for calculating the proportions (p0 and pE) and the kappa value:
p0 = (a + d) / N, the observed proportion of agreement;
pE = [(a + b)(a + c) + (c + d)(b + d)] / N², the proportion of agreement expected by chance;
K = (p0 - pE) / (1 - pE).
The kappa value gives us the statistical inter-rater agreement (proposed by Cohen); in other words, it is the
proportion of agreement corrected for chance. When K = -1, the appraisers totally disagree; K = 0 means no agreement above what is expected by chance;
K = 1 means perfect (or total) agreement.
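The cross tabulation and kappa for the appraiser pair A-B can be sketched in Python, pairing trial A-1 with B-1 and A-2 with B-2 (illustrative names, not from the source):

```python
def cohen_kappa(x, y):
    """Cohen's kappa for two raters' pass(1)/fail(0) decisions."""
    n = len(x)
    a = sum(xi == 0 and yi == 0 for xi, yi in zip(x, y))  # both fail
    b = sum(xi == 0 and yi == 1 for xi, yi in zip(x, y))  # x fail, y pass
    c = sum(xi == 1 and yi == 0 for xi, yi in zip(x, y))  # x pass, y fail
    d = sum(xi == 1 and yi == 1 for xi, yi in zip(x, y))  # both pass
    p0 = (a + d) / n                                      # observed agreement
    pe = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2   # chance agreement
    return (p0 - pe) / (1 - pe)

# Appraiser A vs B: 20 paired decisions (A-1 with B-1, then A-2 with B-2)
a_decisions = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1] + [1, 1, 1, 1, 0, 1, 1, 0, 1, 1]
b_decisions = [1, 1, 1, 1, 0, 1, 1, 0, 1, 1] + [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
print(round(cohen_kappa(a_decisions, b_decisions), 3))  # 0.733
```

With a = 4, b = 1, c = 1, d = 14 this gives p0 = 0.9, pE = 0.625 and K ≈ 0.73, which by the thresholds below counts as sufficient agreement for the A-B pair.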
Sum of observed proportions, expected proportions and the kappa value (Source: qMindset.com; Jacob Cohen: "A coefficient of agreement for nominal scales")
The following thresholds are key to deciding about inter-rater agreement:

Inter-rater agreement thresholds

| Inter-rater agreement (kappa) | Appraiser agreement |
| --- | --- |
| K > 0.75 | Good agreement (sufficient) |
| 0.4 < K < 0.75 | Improvement necessary |
| K < 0.4 | Poor agreement; restructuring of the decision making is necessary |
Step 4: Calculate the appraiser effectiveness for each appraiser (operator) with the following formula:
%Effectiveness = 100 * (number of correct decisions) / (total opportunities for decision)
In the case of appraiser A, the %Effectiveness is 95% (100 * 19 / 20), as the appraiser classified the part
correctly 19 times out of the overall 20 trial opportunities (10 parts * 2 trials).
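The effectiveness of appraiser A can be checked against the Step 1 sheet with a minimal Python sketch (illustrative names, not from the source):

```python
# Reference values and appraiser A's two trials from the Step 1 sheet
reference = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
a1 = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]  # trial A-1
a2 = [1, 1, 1, 1, 0, 1, 1, 0, 1, 1]  # trial A-2

def effectiveness(reference, *trials):
    """% of all decisions (every trial) that match the reference value."""
    correct = sum(t[i] == ref for t in trials for i, ref in enumerate(reference))
    return 100 * correct / (len(reference) * len(trials))

print(effectiveness(reference, a1, a2))  # 95.0
```

Appraiser A's only wrong decision is part 3 in the second trial (pass instead of the reference fail), hence 19 correct decisions out of 20.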
Simple Attribute R&R (Example 2): we have 2 operators (m) evaluating 10 parts (n) with 2 trials (r).
Simple attribute R&R collection sheet

| Sample | Attribute | Appraiser 1, Exp. 1 | Appraiser 1, Exp. 2 | Appraiser 2, Exp. 1 | Appraiser 2, Exp. 2 | Complete R&R | R&R vs Attribute |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | Blue | Blue | Blue | Blue | Blue | OK | OK |
| 2 | Blue | Blue | Blue | Blue | Blue | OK | OK |
| 3 | Blue | Blue | Blue | Blue | Blue | OK | OK |
| 4 | Red | Blue | Red | Red | Red | NOK | NOK |
| 5 | Blue | Blue | Blue | Blue | Blue | OK | OK |
| 6 | Blue | Blue | Blue | Blue | Blue | OK | OK |
| 7 | Red | Blue | Blue | Blue | Blue | OK | NOK |
| 8 | Blue | Blue | Blue | Blue | Blue | OK | OK |
| 9 | Blue | Blue | Blue | Blue | Blue | OK | OK |
| 10 | Blue | Blue | Blue | Blue | Blue | OK | OK |

Accuracy: Appraiser 1 = 80%, Appraiser 2 = 90%; Complete R&R = 90%; R&R vs Attribute = 80%.
Repeatability: Appraiser 1 = 90%, Appraiser 2 = 100%.
Meaning of variables and their calculations:
- Accuracy = number of correctly rated parts (in all trials) / number of rated parts (e.g. 8 / 10 = 80%).
- Repeatability = number of parts rated the same in both trials / number of rated parts (e.g. 9 / 10 = 90%).
- Complete R&R = number of parts rated the same by all appraisers / number of rated parts (e.g. 9 / 10 = 90%).
- R&R vs Attribute = number of parts rated the same by all operators and matching the reference / number of rated parts (e.g. 8 / 10 = 80%).
What do the numbers tell us?
- The accuracy of Appraiser 1 is 80%, as he evaluated 8 / 10 samples correctly (both times).
- The accuracy of Appraiser 2 is 90%, as he evaluated 9 / 10 samples correctly (both times).
- The repeatability of Appraiser 1 is 90%, as he disagreed with himself on one sample out of 10.
- The repeatability of Appraiser 2 is 100%, as his second decision always matched his first one.
- The complete R&R of the two appraisers is 90%, as the appraisers agreed with each other on 9 / 10 samples.
- But the R&R including the hit rate against the real attribute is only 80%, as the two appraisers made at least one wrong decision in the
case of two samples.
- Important: the required level of the R&R or the R&R vs Attribute score depends on the maturity of the company. There are organizations
that request 100% R&R vs Attribute, i.e. a "full hit rate".
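The four Example 2 metrics can be reproduced with a short Python sketch over the collection sheet (the variable names are illustrative, not from the source):

```python
# Example 2 data: reference colour plus two trials per appraiser
reference = ["Blue", "Blue", "Blue", "Red", "Blue", "Blue", "Red", "Blue", "Blue", "Blue"]
app1 = (["Blue"] * 10,                                                        # trial 1
        ["Blue", "Blue", "Blue", "Red", "Blue", "Blue", "Blue", "Blue", "Blue", "Blue"])
app2 = (["Blue", "Blue", "Blue", "Red", "Blue", "Blue", "Blue", "Blue", "Blue", "Blue"],
        ["Blue", "Blue", "Blue", "Red", "Blue", "Blue", "Blue", "Blue", "Blue", "Blue"])

n = len(reference)

def pct(count):
    return 100 * count / n

# Accuracy: parts rated correctly in BOTH trials
acc1 = pct(sum(t1 == t2 == ref for t1, t2, ref in zip(*app1, reference)))
acc2 = pct(sum(t1 == t2 == ref for t1, t2, ref in zip(*app2, reference)))

# Repeatability: parts where an appraiser agrees with himself
rep1 = pct(sum(t1 == t2 for t1, t2 in zip(*app1)))
rep2 = pct(sum(t1 == t2 for t1, t2 in zip(*app2)))

# Complete R&R: parts where all four ratings agree
rr = pct(sum(len({a, b, c, d}) == 1 for a, b, c, d in zip(*app1, *app2)))

# R&R vs Attribute: all four ratings agree AND match the reference
rr_attr = pct(sum(len({a, b, c, d}) == 1 and a == ref
                  for a, b, c, d, ref in zip(*app1, *app2, reference)))

print(acc1, acc2, rep1, rep2, rr, rr_attr)  # 80.0 90.0 90.0 100.0 90.0 80.0
```

Sample 4 costs Appraiser 1 both accuracy and repeatability, while sample 7 (both appraisers consistently wrong) only shows up in the R&R vs Attribute score, which is why Complete R&R alone can be misleading.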
Source: qMindset.com
Hints
Before conducting any improvement or intervention in your process, be sure that the observed deviation is
the result of the process and not of your measurement system. In addition, SPC is not efficient if your measurement system is incapable, as the
measured results are distorted and may be far away from the true value.
Prior to the AAA or Attribute R&R study, select parts from the true variation range; otherwise your study will
be worthless. Why? Because we analyse and evaluate the measurement system, not the parts. If you choose very similar products (with
very low variation), the measurement system variation will have a much higher contribution to the whole variation, even though this does not
reflect reality.
Operators must not know which part they measure or evaluate, to avoid any bias that would affect their
decision making.
Source: qMindset.com