Year: 2020 | Volume: 63 | Issue: 5 | Page: 25-29
A grading dilemma; Gleason scoring system: Are we sufficiently compatible? A multi center study
Yelda Dere1, Özgür Ilhan Çelik1, Serkan Yasar Çelik1, Sümeyye Ekmekçi2, Gözde Evcim3, Fatma Pehlivan4, Anıl Ağalar5, Hasan Deliktaş6, Nil Çulhacı7
1 Department of Pathology, Faculty of Medicine, Mugla Sitki Kocman University, Izmir, Turkey
2 Department of Pathology, Tepecik Training and Research Hospital, Izmir, Turkey
3 Department of Pathology, Çiğli Region Education Hospital, Izmir, Turkey
4 Department of Pathology, Tinaztepe Special Hospital, Izmir, Turkey
5 Department of Pathology, Faculty of Medicine, 9 Eylul University, Izmir, Turkey
6 Department of Urology, Faculty of Medicine, Mugla Sitki Kocman University, Muğla, Turkey
7 Department of Pathology, Faculty of Medicine, Adnan Menderes University, Aydin, Turkey
Date of Web Publication: 26-Feb-2020
Abstract
Objective: Gleason scoring is the grading system that most strongly predicts the prognosis of prostate cancer. However, even though it is one of the most commonly used systems, variable interobserver agreement rates have pushed uropathologists to update the definitions of the Gleason patterns. In this study, we aimed to determine the interobserver agreement variability among 7 general pathologists and one expert uropathologist from 6 different centers. Methods: A set of 50 Hematoxylin & Eosin-stained slides from 41 patients diagnosed with prostate cancer was reviewed by 8 different pathologists. The pathologists were also grouped according to whether they completed residency at the same institute or worked at the same center. The Gleason scores of all pathologists and of the subgroups were then compared for interobserver variability by Fleiss' and Cohen's kappa tests using R v3.2.4. Results: Eight pathologists from 6 different centers reviewed all the slides. One of them was an expert uropathologist with 18 years of experience. Among the 7 general pathologists, 4 had more than 5 years of surgical pathology experience, whilst 3 had less than 5 years. Fleiss' kappa was 0.54 for the primary Gleason pattern and 0.44 for the total Gleason score (moderate agreement). Fleiss' kappa was 0.45 for the grade grouping system. Conclusion: Assigning a Gleason score can be problematic because of variable interobserver agreement rates among pathologists, even though the patterns are considered well defined.
Keywords: Gleason score, interobserver variability, prostate cancer
How to cite this article: Dere Y, Çelik ÖI, Çelik SY, Ekmekçi S, Evcim G, Pehlivan F, Ağalar A, Deliktaş H, Çulhacı N. A grading dilemma; Gleason scoring system: Are we sufficiently compatible? A multi center study. Indian J Pathol Microbiol 2020;63, Suppl S1:25-9
How to cite this URL: Dere Y, Çelik ÖI, Çelik SY, Ekmekçi S, Evcim G, Pehlivan F, Ağalar A, Deliktaş H, Çulhacı N. A grading dilemma; Gleason scoring system: Are we sufficiently compatible? A multi center study. Indian J Pathol Microbiol [serial online] 2020 [cited 2023 Sep 24];63, Suppl S1:25-9. Available from: https://www.ijpmonline.org/text.asp?2020/63/5/25/279531
Introduction
Prostate cancer is the most common cancer in men and the second most common cause of cancer death.[1] Gleason scoring is the most widely recommended and used system for grading prostate cancer.[2]
The Gleason grading system is taught to all pathology residents within the first year of their training, during which they encounter many prostatic needle biopsies diagnosed as adenocarcinoma with an assigned Gleason score. After residency, general pathologists likewise review many prostatic needle biopsies, diagnose prostatic adenocarcinomas, and assign Gleason scores to these tumors. However, the exact reasons for interobserver variability are unclear and have been attributed to many different factors, such as lack of education or lack of experience.[3],[4]
Prostatic needle biopsy is the most widely used diagnostic approach for patients with high prostate-specific antigen (PSA) levels, in our country as well as worldwide. The Gleason scores assigned to prostate cancer biopsies help urologists design an optimal treatment strategy for their patients. Thus, as one of the most important predictive factors in prostate cancer, the Gleason score is sometimes the only factor guiding treatment decisions. Its reproducibility becomes even more important when patients decide to be treated in different centers, yet the reproducibility of the Gleason score among pathologists is widely variable.
Given these differing interobserver agreement rates, we decided to assess the interobserver reproducibility of Gleason grading of prostate cancer among 8 pathologists from 6 different centers and to set out our approach to this problem.
Methods
A set of 50 Hematoxylin & Eosin-stained glass slides diagnosed as prostatic adenocarcinoma was prepared for scoring. The cases were selected randomly from hospital records. All of the slides were prostatic needle biopsies performed by the same urologist. The only exclusion criterion was unavailability of a patient's slides because the patient had chosen to be treated in a different center. In preparing the set, all cases were renumbered from 1 to 50 to simplify scoring and statistical analysis. The same slides were reviewed by all the pathologists to prevent diagnostic discrepancies due to serial sectioning.
In addition, a simple questionnaire designed to assess histopathological approach and personal experience was prepared. The main aim of the questionnaire was to assess the pathologists' approach to scoring, such as tertiary pattern usage in needle biopsies, immunohistochemical antibodies used in scoring, and the main additional parameters given in pathology reports.
The set of glass slides and the questionnaire were then circulated to all 8 participating pathologists one by one, with a requested return time of one month. The scores were then exported to an Excel file for statistical analysis.
The pathologists were also subgrouped according to whether they worked at the same center or had completed residency at the same center, as these factors may affect agreement rates.
The total number of slides with exact agreement, defined as slides for which all participants gave the same primary, secondary, and total Gleason scores, was also noted. Patterns were assigned according to the 2014 ISUP revision.[5]
Kappa values for interobserver agreement were calculated for the primary Gleason score, the secondary Gleason score, and the total Gleason score, in addition to kappa values for each subgroup and for grade grouping. Kappa was calculated using R v3.2.4 (x64), and its strength was interpreted as follows: 0.00-0.20 very low agreement, 0.21-0.40 low agreement, 0.41-0.60 moderate agreement, 0.61-0.80 high agreement, and 0.81-1.00 perfect agreement.[6]
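The study computed Fleiss' kappa in R v3.2.4. As an illustrative sketch of the statistic itself (following Fleiss 1971[6], not the authors' actual code), a plain-Python version might look like this. Rows are slides, columns are score categories, and each entry counts how many raters chose that category:

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a subjects-by-categories count matrix.

    Assumes every subject was rated by the same number of raters.
    """
    n_subjects = len(counts)
    n_raters = sum(counts[0])                      # raters per subject (constant)
    n_categories = len(counts[0])
    total = n_subjects * n_raters

    # Overall proportion of all assignments falling in each category
    p_j = [sum(row[j] for row in counts) / total for j in range(n_categories)]

    # Per-subject observed agreement
    P_i = [(sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]
    P_bar = sum(P_i) / n_subjects                  # mean observed agreement
    P_e = sum(p * p for p in p_j)                  # expected chance agreement

    return (P_bar - P_e) / (1 - P_e)
```

For example, a matrix in which all raters agree on every slide yields kappa = 1.0; in this study the analogous computation over the 8 raters' pattern assignments gave the values reported in the Results.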
This study was supported by a project from the Scientific Research Projects Management Unit of Muğla Sıtkı Koçman University (Grant number: 15/086) and approved by the ethics committee of the same center (19/2015), and written informed consent was obtained from all patients included in the study.
Results
Among the participating pathologists, one was an expert uropathologist with 18 years of experience working at a university hospital. One of the general pathologists had also had a special interest in uropathology for two years at her center. The others were general pathologists with 1 to 8 years of experience. Three of them worked at the same university hospital, 2 at different training and research hospitals, 1 at a state hospital, and the last one at a private hospital. Three of the pathologists had less than 5 years of surgical pathology experience, whilst 5 had more than 5 years. Three pathologists had completed residency at the same university hospital, another 3 at the same training and research hospital, and 2 at another training and research hospital.
According to the questionnaire, 7 pathologists (87.5%) had attended a course focusing on this area, and 3 of them (37.5%) had taken a course within the last year. However, one of these 7 pathologists (12.5%) answered that they still needed such a course. Four pathologists (50%) considered experience the most important factor affecting agreement, 1 pathologist (12.5%) considered up-to-date knowledge the most important, and the other 3 (37.5%) reported both.
The statistical analysis showed a Fleiss' kappa of 0.54 (moderate agreement) for the primary Gleason pattern, 0.34 (low agreement) for the secondary Gleason pattern, and 0.44 (moderate agreement) for the total Gleason score. For grade grouping as proposed by Epstein et al.,[5] Fleiss' kappa was 0.45 (moderate agreement) across all the pathologists.
Among the 50 slides, 24 were "consensus" cases in which 70% of the pathologists gave the same score. The three most common Gleason scores in these cases were 3 + 3 = 6 (10 cases) [Figure 1], 3 + 4 = 7 (9 cases) [Figure 2], and 4 + 4 = 8 (3 cases).

Figure 1: Histopathological appearance of an example of the consensus cases for Gleason score 3 + 3 = 6

Figure 2: Histopathological appearance of an example of the consensus cases for Gleason score 3 + 4 = 7
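The "consensus" criterion above (the same score from 70% of the raters) could be sketched as follows. This is an illustrative reconstruction, not the authors' code: it assumes "70%" means at least 70%, and the function names are hypothetical.

```python
from collections import Counter

def is_consensus(scores, threshold=0.7):
    """True when at least `threshold` of the raters gave the same score.

    `scores` is one slide's list of total Gleason scores, one per rater.
    """
    most_common_count = Counter(scores).most_common(1)[0][1]
    return most_common_count / len(scores) >= threshold

def count_consensus(slide_scores):
    """Count consensus slides in a list of per-slide score lists."""
    return sum(is_consensus(s) for s in slide_scores)
```

With 8 raters, a slide needs the same score from at least 6 of them (6/8 = 0.75) to qualify under this reading, since 5/8 = 0.625 falls below 70%.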
After further subgrouping, kappa values among pathologists working at the same center were 0.59 for the primary Gleason pattern and 0.48 for the total Gleason score. Working at the same center thus appeared to increase the level of agreement.
Having completed residency at the same center was one of the most important factors for the level of agreement, but the most important factor was having completed residency and working at the same center [Table 1].

Table 1: Kappa values for primary Gleason pattern and total Gleason score for every
Discussion
Interobserver variability is relatively common in scoring systems in general pathology practice, and Gleason grading is one of these systems. It almost always causes problems, one of the most frequent being confusion among patients and clinicians. Clinicians typically face this confusion when patients seek a second opinion on the diagnosis or the scores. The importance of this variability lies in the resulting differences in the treatment options or prognostic predictions suggested by clinicians.
Thus, reducing interobserver variability should be a target for every uropathology association worldwide.[3],[4],[7],[8] Interobserver variability may depend on many factors, such as lack of experience or knowledge, lack of opportunity to obtain a second opinion, unawareness of new revisions of grading systems, failure to follow the current literature, and inability to attend courses because of financial or time constraints. The most dangerous cause is the overconfidence often seen in new residents, especially in problematic cases. In addition, undergrading of prostatic carcinoma is reported as one of the most common problems in Gleason grading.[9],[10]
In a study from Iran, Abdollahi et al. reported a kappa value of 0.25 before and 0.52 after a web-based education program; web-based education was found attractive because of its low cost.[11] Web-based education is relatively uncommon in our country; however, many pathologists attend interactive courses and educational seminars arranged by different pathology societies. Griffiths et al. likewise reported a low kappa value (0.33) before and moderate agreement (k = 0.41) after a teaching session among United Kingdom (UK) pathologists.[12] Allsbrook et al. worked on "consensus" cases, found moderate agreement among 41 general pathologists (k = 0.43), and pointed out that one of the most important factors affecting agreement is learning Gleason scoring at a course.[3] These studies may reflect the impact of education on pathologists' diagnostic approach.
Working together at the same center for a long time can also favor better agreement, since a pathologist is only human, and a second opinion that supports one's own view or offers a new idea is almost always needed and welcomed. But the rotten apple injures its neighbours: if you work at the same center with a colleague over the years, you may notice that your choices gradually become similar. For this reason, professors and expert pathologists should be accessible and modest, so that any general pathologist or uropathologist can ask them for a second opinion without fear of giving offense. Fortunately, obtaining a second opinion or consultation is readily accepted in our country.
The new Gleason grading system has already come into use worldwide, including in our country.[5] When Fleiss' kappa was calculated according to this grade grouping system, agreement was moderate (k = 0.45). This result was slightly better than those of other studies in the literature.[13]
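The grade grouping referred to here maps Gleason pattern combinations onto five prognostic grade groups, as defined in the 2014 ISUP revision.[5] A minimal sketch of that mapping (the function name and structure are illustrative, not from the study):

```python
def grade_group(primary, secondary):
    """Map primary + secondary Gleason patterns to the 2014 ISUP grade group."""
    total = primary + secondary
    if total <= 6:
        return 1                          # Grade group 1: Gleason score <= 6
    if total == 7:
        return 2 if primary == 3 else 3   # 3+4=7 -> group 2; 4+3=7 -> group 3
    if total == 8:
        return 4                          # 4+4, 3+5, or 5+3
    return 5                              # Gleason scores 9-10
```

Note that the mapping is not a function of the total score alone: 3 + 4 = 7 and 4 + 3 = 7 land in different grade groups, which is one reason agreement on the primary pattern matters so much.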
Muezzinoglu et al. reported that, in the evaluation of prostatic biopsy samples, the use of the tertiary pattern may differ between centers.[14] The same study also showed that reporting of the Gleason score differs even among members of the uropathology working group of Turkey; for example, 46.4% of them report an individual Gleason score for each biopsy core. In another study from Turkey, Ozdamar et al. reported an interobserver variability of 70.8%, whereas other authors have reported results ranging from 0.43 to 0.70[15] [Table 2].

Table 2: Other studies focusing on interobserver agreement of Gleason score among needle biopsies
Rodriguez-Urrego et al. studied these variabilities in microscopic versus digital assessment of the biopsies and reported similar interobserver agreement for the two methods.[7] This could increase the use of telepathology and digital consultation, which allow easy access to experts. However, under the conditions of our country, digital slide scanning is available only in senior pathology centers. This limitation could become an opportunity if experts used the method to evaluate the approach of general pathologists and educate them via web-based seminars.
Conclusion
The interobserver variability of Gleason scoring is still a common problem among general pathologists and uropathologists, as well as urologists. This problem may be overcome by web-based education, free worldwide access to articles specializing in this subject, free dissemination of new revisions via the internet or journals, and continued on-site meetings among pathologists. More courses are therefore needed that focus strictly on newly announced scoring revisions, or simply on scoring biopsies, in which multiple attendants can observe the same slide sections at the same time. For the management of prostate cancer, awareness of this problem may clarify the treatment pathway in cases given different scores by different pathologists.
Financial support and sponsorship
This study was supported by a project from the Scientific Research Projects Management Unit of Muğla Sıtkı Koçman University (Grant number: 15/086).
Conflicts of interest
There are no conflicts of interest.
References
1. Billis A, Guimaraes MS, Freitas LL, Meirelles L, Magna LA, Ferreira U. The impact of the 2005 International Society of Urological Pathology consensus conference on standard Gleason grading of prostatic carcinoma in needle biopsies. J Urol 2008;180:548-52.
2. The Royal College of Pathologists. Standards and Minimum Datasets for Reporting Common Cancers. Minimum Dataset for Prostate Cancer Histopathology Reports. London: The Royal College of Pathologists; 2000.
3. Allsbrook WC, Mangold KA, Johnson MH, Lane RB, Lane CG, Epstein JI. Interobserver reproducibility of Gleason grading of prostatic carcinoma: General pathologists. Hum Pathol 2001;32:81-8.
4. Allsbrook WC Jr, Mangold KA, Johnson MH, Lane RB, Lane CG, Amin MB, et al. Interobserver reproducibility of Gleason grading of prostatic carcinoma: Urologic pathologists. Hum Pathol 2001;32:74-80.
5. Epstein JI, Egevad L, Amin MB, Delahunt B, Srigley JR, Humphrey PA; Grading Committee. The 2014 International Society of Urological Pathology (ISUP) Consensus conference on Gleason grading of prostatic carcinoma: Definition of grading patterns and proposal for a new grading system. Am J Surg Pathol 2016;40:244-52.
6. Fleiss JL. Measuring nominal scale agreement among many raters. Psychol Bull 1971;76:378-82.
7. Rodriguez-Urrego PA, Cronin AM, Al-Ahmadie HA, Gopalan A, Tickoo SK, Reuter VE, et al. Interobserver and intraobserver reproducibility in digital and routine microscopic assessment of prostate needle biopsies. Hum Pathol 2011;42:68-74.
8. Egevad L, Ahmad AS, Algaba F, Berney DM, Boccon-Gibod L, Compérat E, et al. Standardization of Gleason grading among 337 European pathologists. Histopathology 2013;62:247-56.
9. Rubin MA, Dunn R, Kambham PA, Misick CP, O'Toole KM. Should a Gleason score be assigned to a minute focus of carcinoma on prostate biopsy? Am J Surg Pathol 2000;24:1634-40.
10. Steinberg DM, Sauvageot J, Piantadosi S, Epstein JI. Correlation of prostate needle biopsy and radical prostatectomy Gleason grade in academic and community settings. Am J Surg Pathol 1997;21:566-76.
11. Abdollahi A, Sheikhbahaei S, Meysamie A, Bakhshandeh M, Hosseinzadeh H. Inter-observer reproducibility before and after web-based education in the Gleason grading of the prostate adenocarcinoma among the Iranian pathologists. Acta Med Iran 2014;52:370-4.
12. Griffiths DF, Melia J, McWilliam LJ, Ball RY, Grigor K, Harnden P, et al. A study of Gleason score interpretation in different groups of UK pathologists; techniques for improving reproducibility. Histopathology 2006;48:655-62.
13. Ozkan TA, Eruyar AT, Cebeci OO, Memik O, Ozcan L, Kuskonmaz I. Interobserver variability in Gleason histological grading of prostate cancer. Scand J Urol 2016;50:420-4.
14. Muezzinoglu B, Yorukoglu K. Current practice in handling and reporting prostate needle biopsies: Results of a Turkish survey. Pathol Res Pract 2015;211:374-80.
15. Ozdamar SO, Sarikaya S, Yildiz L, Atilla MK, Kandemir B, Yildiz S. Intraobserver and interobserver reproducibility of WHO and Gleason histologic grading systems in prostatic adenocarcinomas. Int Urol Nephrol 1996;28:73-7.
16. Lessels AM, Burnett RA, Howatson SR, Lang S, Lee FD, McLaren KM, et al. Observer variability in the histopathological reporting of needle biopsy specimens of the prostate. Hum Pathol 1997;28:646-9.
17. Melia J, Moseley R, Ball RY, Griffiths DFR, Grigor K, Harnden P, et al. A UK-based investigation of inter- and intra-observer reproducibility of Gleason grading of prostatic biopsies. Histopathology 2006;48:644-54.
18. Oyama T, Allsbrook WC Jr, Kurokawa K, Matsuda H, Segawa A, Sano T, et al. A comparison of interobserver reproducibility of Gleason grading of prostatic carcinoma in Japan and the United States. Arch Pathol Lab Med 2005;129:1004-10.
19. Veloso SG, Lima MF, Salles PG, Berenstein CK, Scalon JD, Bambirra EA. Interobserver agreement of Gleason score and modified Gleason score in needle biopsy and in surgical specimen of prostate cancer. Int Braz J Urol 2007;33:639-51.
20. Bori R, Salamon F, Móczár C, Cserni G. Interobserver reproducibility of Gleason grading in prostate biopsy samples. Orv Hetil 2013;154:1219-25.
Correspondence Address: Yelda Dere, Department of Pathology, Faculty of Medicine, Mugla Sitki Kocman University, Mugla, Turkey
DOI: 10.4103/IJPM.IJPM_288_18
This article has been cited by:
1. Okubo Y, Yamamoto Y, Sato S, Yoshioka E, Suzuki M, Washimi K, Osaka K, Suzuki T, Yokose T, Kishida T, Miyagi Y. Diagnostic significance of reassessment of prostate biopsy specimens by experienced urological pathologists at a high-volume institution. Virchows Archiv. 2022.
2. Alkadi R, Abdullah O, Werghi N. The Classification Power of Classical and Intra-voxel Incoherent Motion (IVIM) Fitting Models of Diffusion-weighted Magnetic Resonance Images: An Experimental Study. Journal of Digital Imaging. 2022.
3. Agrawal R. From Editor's desk. Indian Journal of Pathology and Microbiology. 2020;63(5):1.