Indian Journal of Pathology and Microbiology
Year : 2020  |  Volume : 63  |  Issue : 5  |  Page : 25-29
A grading dilemma; Gleason scoring system: Are we sufficiently compatible? A multi center study

1 Department of Pathology, Faculty of Medicine, Mugla Sitki Kocman University, Izmir, Turkey
2 Department of Pathology, Tepecik Training and Research Hospital, Izmir, Turkey
3 Department of Pathology, Çiğli Region Education Hospital, Izmir, Turkey
4 Department of Pathology, Tinaztepe Special Hospital, Izmir, Turkey
5 Department of Pathology, Faculty of Medicine, 9 Eylul University, Izmir, Turkey
6 Department of Urology, Faculty of Medicine, Mugla Sitki Kocman University, Muğla, Turkey
7 Department of Pathology, Faculty of Medicine, Adnan Menderes University, Aydin, Turkey


Date of Web Publication26-Feb-2020


Objective: Gleason scoring is the grading system that strongly predicts the prognosis of prostate cancer. However, even though it is one of the most commonly used systems, varying interobserver agreement rates push uropathologists to update the definitions of the Gleason patterns. In this study, we aimed to determine interobserver agreement among 7 general pathologists and one expert uropathologist from 6 different centers. Methods: A set of 50 Hematoxylin & Eosin-stained slides from 41 patients diagnosed with prostate cancer was reviewed by 8 different pathologists. The pathologists were also grouped according to whether they completed residency at the same institute or worked at the same center. The Gleason scores of all pathologists and of the subgroups were then compared for interobserver variability using Fleiss' and Cohen's kappa tests in R v3.2.4. Results: Eight pathologists from 6 different centers reviewed all the slides. One was an expert uropathologist with 18 years of experience. Among the 7 general pathologists, 4 had more than 5 years of surgical pathology experience, whereas 3 had less. Fleiss' kappa was 0.54 for the primary Gleason pattern and 0.44 for the total Gleason score (moderate agreement), and 0.45 for the grade grouping system. Conclusion: Assigning a Gleason score can be problematic because of differing interobserver agreement rates among pathologists, even though the patterns are considered well defined.

Keywords: Gleason score, interobserver variability, prostate cancer

How to cite this article:
Dere Y, Çelik İ, Çelik SY, Ekmekçi S, Evcim G, Pehlivan F, Ağalar A, Deliktaş H, Çulhacı N. A grading dilemma; Gleason scoring system: Are we sufficiently compatible? A multi center study. Indian J Pathol Microbiol 2020;63, Suppl S1:25-9


   Introduction Top

Prostate cancer is the most common type of cancer in men and the second most common cause of cancer-related death.[1] Gleason scoring is the most widely recommended and used system for grading prostate cancer.[2]

The Gleason grading system is taught to all pathology residents within the first year of their training, during which they encounter many prostatic needle biopsies diagnosed as adenocarcinoma with an assigned Gleason score. After residency, general pathologists likewise review many prostatic needle biopsies, diagnose prostatic adenocarcinoma, and assign Gleason scores. However, the exact reasons for interobserver variability remain unclear and have been attributed to many factors, such as lack of education or experience.[3],[4]

In our country, as worldwide, prostatic needle biopsy is the most widely used diagnostic approach for patients with high prostate-specific antigen (PSA) levels. The Gleason scores assigned to prostate cancer biopsies help urologists draw up an optimal treatment strategy for their patients. As one of the most important predictive factors in prostate cancer, the Gleason score is sometimes the only factor guiding treatment decisions. Its reproducibility becomes even more important when patients decide to be treated at different centers; yet the reproducibility of the Gleason score among pathologists is widely variable.

Prompted by these differing interobserver agreement rates, we assessed the interobserver reproducibility of Gleason grading of prostate cancer among 8 pathologists from 6 different centers and present our approach to this problem.

   Methods Top

A set of 50 Hematoxylin & Eosin-stained glass slides diagnosed as prostatic adenocarcinoma was prepared for scoring. The cases were selected randomly from hospital records. All slides were from prostatic needle biopsies performed by the same urologist. The only exclusion criterion was unavailability of slides because the patient had chosen to be treated at a different center. In preparing the set, all cases were renumbered from 1 to 50 to simplify scoring and statistical analysis. The same slides were reviewed by all pathologists in order to prevent diagnostic discrepancies due to serial sectioning.

In addition, a short questionnaire assessing histopathological approach and personal experience was prepared, and the slide set and questionnaire were sent to all 8 participating pathologists. The questionnaire mainly addressed the pathologists' scoring practices, such as the use of tertiary patterns in needle biopsies, the immunohistochemical antibodies used for scoring, and the main additional parameters given in pathology reports.

The set of glass slides and the questionnaire were circulated to the participating pathologists one by one, with a requested return time of one month. The scores were then exported to an Excel file for statistical analysis.

The pathologists were also subgrouped according to whether they worked at the same center or had completed residency at the same center, as these factors may affect agreement rates.

The total number of slides with exact agreement, defined as slides to which all participants assigned the same primary, secondary, and total Gleason scores, was also noted. Patterns were assigned according to the 2014 ISUP revision.[5]
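The 2014 ISUP revision also defines the grade grouping used later in this study. As an illustrative sketch (not code from the study), the mapping from a Gleason score to its ISUP grade group can be written as:

```python
def isup_grade_group(primary, secondary):
    """2014 ISUP grade group for a Gleason score (Epstein et al.).

    Grade group 1: Gleason score <= 6; 2: 3 + 4 = 7; 3: 4 + 3 = 7;
    4: Gleason score 8; 5: Gleason score 9-10.
    """
    total = primary + secondary
    if total <= 6:
        return 1
    if total == 7:
        # Gleason 7 is split by the predominant (primary) pattern.
        return 2 if primary == 3 else 3
    if total == 8:
        return 4
    return 5  # Gleason 9-10
```

Note that two tumors with the same total score of 7 fall into different grade groups depending on whether pattern 3 or pattern 4 predominates, which is exactly where primary-pattern disagreement between observers changes the reported grade group.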

Kappa values for interobserver agreement were calculated for the primary, secondary, and total Gleason scores, as well as for each subgroup and for grade grouping. Kappa was calculated using R v3.2.4 (x64), and its strength was interpreted as follows: 0.00-0.20 very low agreement; 0.21-0.40 low agreement; 0.41-0.60 moderate agreement; 0.61-0.80 high agreement; 0.81-1.00 perfect agreement.[6]
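The study computed these statistics in R v3.2.4. As a minimal sketch of what Fleiss' kappa measures (illustrative only, not the authors' code), the statistic and the interpretation bands above can be implemented as follows:

```python
from collections import Counter

def fleiss_kappa(ratings):
    """Fleiss' kappa for a fixed panel of raters.

    ratings[i] is the list of category labels (e.g. Gleason scores) that
    each rater assigned to subject (slide) i; every subject must be rated
    by the same number of raters.
    """
    n_subjects = len(ratings)
    n_raters = len(ratings[0])
    # Overall proportion of all assignments falling in each category.
    totals = Counter()
    for row in ratings:
        totals.update(row)
    p_j = [t / (n_subjects * n_raters) for t in totals.values()]
    # Mean observed per-subject agreement across rater pairs.
    p_bar = sum(
        (sum(c * c for c in Counter(row).values()) - n_raters)
        / (n_raters * (n_raters - 1))
        for row in ratings
    ) / n_subjects
    p_e = sum(p * p for p in p_j)  # agreement expected by chance
    return (p_bar - p_e) / (1 - p_e)

def agreement_strength(kappa):
    """Interpretation bands used in the study."""
    for upper, label in [(0.20, "very low"), (0.40, "low"),
                         (0.60, "moderate"), (0.80, "high"),
                         (1.00, "perfect")]:
        if kappa <= upper:
            return label
```

Unlike Cohen's kappa, which compares exactly two raters, Fleiss' kappa handles the full eight-rater panel at once, which is why it is the headline statistic here.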

This study was supported by a project of the Scientific Research Projects Management Unit of Muğla Sıtkı Koçman University (Grant number: 15/086) and approved by the ethics committee of the same center (19/2015); written informed consent was obtained from all patients included in the study.

   Results Top

Among the participating pathologists, one was an expert uropathologist with 18 years of experience working at a university hospital. One of the general pathologists had also focused on uropathology for two years at her center. The others were general pathologists with 1 to 8 years of experience. Three worked at the same university hospital, 2 at different training and research hospitals, 1 at a state hospital, and 1 at a private hospital. Three of the pathologists had less than 5 years of surgical pathology experience, whereas 5 had more. Three pathologists had completed their residency at the same university hospital, another 3 at the same training and research hospital, and 2 at another training and research hospital.

According to the questionnaire, 7 pathologists (87.5%) had attended a course on this topic, and 3 of them (37.5%) had taken a course within the last year; however, one of these 7 (12.5%) reported still needing a course. Four pathologists (50%) considered experience the most important factor affecting agreement, 1 (12.5%) considered up-to-date knowledge most important, and the other 3 (37.5%) reported both.

Statistical analysis showed a Fleiss' kappa of 0.54 for the primary Gleason pattern (moderate agreement), 0.34 for the secondary Gleason pattern (low agreement), and 0.44 for the total Gleason score (moderate agreement). For the grade grouping proposed by Epstein et al.,[5] the Fleiss' kappa across all pathologists was 0.45 (moderate agreement).

Among the 50 slides, 24 were “consensus” cases, in which at least 70% of the pathologists gave the same score. The three most common Gleason scores among these cases were 3 + 3 = 6 (10 cases) [Figure 1], 3 + 4 = 7 (9 cases) [Figure 2], and 4 + 4 = 8 (3 cases).
Figure 1: A histopathological appearance of an example of consensus cases for Gleason score 3 + 3 = 6

Figure 2: A histopathological appearance of an example of consensus cases for Gleason score 3 + 4 = 7

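The 70% consensus rule applied to the slide set above can be sketched as follows (an illustrative sketch under the paper's stated definition, not the authors' code):

```python
from collections import Counter

def is_consensus(scores, threshold=0.70):
    """True when at least `threshold` of raters gave the same score.

    scores: the total Gleason scores one slide received, one per pathologist.
    """
    top = Counter(scores).most_common(1)[0][1]  # size of the largest bloc
    return top / len(scores) >= threshold

def count_consensus(slides, threshold=0.70):
    """Number of consensus slides in a set of slides."""
    return sum(is_consensus(s, threshold) for s in slides)
```

With 8 raters, 6 of 8 identical scores (75%) clears the 70% threshold, while 5 of 8 (62.5%) does not.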

After further subgrouping, kappa values among pathologists working at the same center were 0.59 for the primary Gleason pattern and 0.48 for the total Gleason score. Working at the same center was thus a factor that increased the level of agreement.

Having completed residency at the same center was one of the most important factors for the level of agreement, but the strongest factor was having both trained and worked at the same center [Table 1].
Table 1: Kappa values for primary Gleason pattern and total Gleason score for each subgroup


   Discussion Top

Interobserver variability is particularly common in scoring systems in general pathology practice, and Gleason grading is one of them. It almost always causes problems, one of the most frequent being confusion among patients and clinicians. Clinicians typically face this confusion when patients seek a second opinion on the diagnosis or the score. The importance of this variability lies in the differences in treatment options or prognostic predictions that clinicians may suggest.

Thus, reducing interobserver variability should be a target for every uropathology association worldwide.[3],[4],[7],[8] Variability may depend on many factors, such as lack of experience or knowledge, lack of access to a second opinion, unawareness of new revisions of grading systems, not following the current literature, or not attending courses because of financial or time constraints. The most dangerous cause is overconfidence, generally seen in new residents, especially in problematic cases. In addition, undergrading of prostatic carcinoma is reported as one of the most common problems in Gleason grading.[9],[10]

In a study from Iran, Abdollahi et al. reported a kappa value of 0.25 before and 0.52 after web-based education, an approach found attractive because of its low cost.[11] Web-based education is relatively uncommon in our country; instead, many pathologists attend interactive courses and educational seminars arranged by various pathology societies. Similarly, Griffiths et al. reported a low kappa (0.33) before and moderate agreement (k = 0.41) after a teaching session among United Kingdom (UK) pathologists.[12] Allsbrook et al. studied “consensus” cases, found moderate agreement among 41 general pathologists (k = 0.43), and pointed out that one of the most important factors affecting agreement was having learned Gleason scoring at a course.[3] These studies reflect the impact of education on pathologists' diagnostic approach.

Working together at the same center for a long time can also favor better agreement: a pathologist is a human being, and a second opinion that supports one's view or offers a new idea is almost always needed and welcome. If you work at the same center with a colleague over the years, you may notice that your choices gradually converge; of course, influence can spread the other way too, as a rotten apple injures its neighbours. Professors and expert pathologists should therefore be accessible and approachable, so that any general pathologist or uropathologist can seek a second opinion from them without embarrassment. Fortunately, second opinions and consultations are readily accepted in our country.

The new Gleason grading system has already come into use worldwide, including in our country.[5] When Fleiss' kappa was calculated for this grade grouping system, agreement was moderate (k = 0.45), slightly better than in other studies in the literature.[13]

Muezzinoglu et al. reported that, in the evaluation of prostatic biopsy samples, the use of tertiary patterns may differ among centers.[14] The same study also showed that Gleason score reporting differs even among members of the uropathology working group of Turkey; for example, 46.4% of them report an individual Gleason score for each biopsy core. In another study from Turkey, Ozdamar et al. reported 70.8% interobserver variability, whereas other authors have reported kappa values ranging from 0.43 to 0.70[15] [Table 2].
Table 2: Other studies focusing interobserver agreement of Gleason score among needle biopsies

Click here to view

Rodriguez-Urrego et al. studied these variabilities in microscopic versus digital assessment of biopsies and reported similar interobserver agreement with the two methods.[7] This may increase the use of telepathology and digital consultation, which allow easy access to experts. In our country, however, digital slide scanning is available only in senior pathology centers. This limitation could become an opportunity if experts used the method to assess general pathologists' approach and educate them through web-based seminars.

   Conclusion Top

The interobserver variability of Gleason scoring remains a common problem for general pathologists and uropathologists, as well as for urologists. It may be overcome through web-based education, free worldwide access to articles on this subject, free dissemination of new revisions via the internet or journals, and continued on-site meetings among pathologists. More courses are therefore needed that focus strictly on newly announced scoring revisions, or simply on scoring biopsies, in which multiple attendants can observe the same slide sections simultaneously. In the management of prostate cancer, awareness of this problem may guide treatment in cases that receive different scores from different pathologists.

Financial support and sponsorship

This study was supported by a project of the Scientific Research Projects Management Unit of Muğla Sıtkı Koçman University (Grant number: 15/086).

Conflicts of interest

There are no conflicts of interest.

   References Top

1. Billis A, Guimaraes MS, Freitas LL, Meirelles L, Magna LA, Ferreira U. The impact of the 2005 International Society of Urological Pathology consensus conference on standard Gleason grading of prostatic carcinoma in needle biopsies. J Urol 2008;180:548-52.
2. The Royal College of Pathologists. Standards and Minimum Datasets for Reporting Common Cancers. Minimum Dataset for Prostate Cancer Histopathology Reports. London: The Royal College of Pathologists; 2000.
3. Allsbrook WC, Mangold KA, Johnson MH, Lane RB, Lane CG, Epstein JI. Interobserver reproducibility of Gleason grading of prostatic carcinoma: General pathologists. Hum Pathol 2001;32:81-8.
4. Allsbrook WC Jr, Mangold KA, Johnson MH, Lane RB, Lane CG, Amin MB, et al. Interobserver reproducibility of Gleason grading of prostatic carcinoma: Urologic pathologists. Hum Pathol 2001;32:74-80.
5. Epstein JI, Egevad L, Amin MB, Delahunt B, Srigley JR, Humphrey PA; Grading Committee. The 2014 International Society of Urological Pathology (ISUP) Consensus conference on Gleason grading of prostatic carcinoma: Definition of grading patterns and proposal for a new grading system. Am J Surg Pathol 2016;40:244-52.
6. Fleiss JL. Measuring nominal scale agreement among many raters. Psychol Bull 1971;76:378-82.
7. Rodriguez-Urrego PA, Cronin AM, Al-Ahmadie HA, Gopalan A, Tickoo SK, Reuter VE, et al. Interobserver and intraobserver reproducibility in digital and routine microscopic assessment of prostate needle biopsies. Hum Pathol 2011;42:68-74.
8. Egevad L, Ahmad AS, Algaba F, Berney DM, Boccon-Gibod L, Compérat E, et al. Standardization of Gleason grading among 337 European pathologists. Histopathology 2013;62:247-56.
9. Rubin MA, Dunn R, Kambham PA, Misick CP, O'Toole KM. Should a Gleason score be assigned to a minute focus of carcinoma on prostate biopsy? Am J Surg Pathol 2000;24:1634-40.
10. Steinberg DM, Sauvageot J, Piantadosi S, Epstein JI. Correlation of prostate needle biopsy and radical prostatectomy Gleason grade in academic and community settings. Am J Surg Pathol 1997;21:566-76.
11. Abdollahi A, Sheikhbahaei S, Meysamie A, Bakhshandeh M, Hosseinzadeh H. Inter-observer reproducibility before and after web-based education in the Gleason grading of the prostate adenocarcinoma among the Iranian pathologists. Acta Med Iran 2014;52:370-4.
12. Griffiths DF, Melia J, McWilliam LJ, Ball RY, Grigor K, Harnden P, et al. A study of Gleason score interpretation in different groups of UK pathologists; techniques for improving reproducibility. Histopathology 2006;48:655-62.
13. Ozkan TA, Eruyar AT, Cebeci OO, Memik O, Ozcan L, Kuskonmaz I. Interobserver variability in Gleason histological grading of prostate cancer. Scand J Urol 2016;50:420-4.
14. Muezzinoglu B, Yorukoglu K. Current practice in handling and reporting prostate needle biopsies: Results of a Turkish survey. Pathol Res Pract 2015;211:374-80.
15. Ozdamar SO, Sarikaya S, Yildiz L, Atilla MK, Kandemir B, Yildiz S. Intraobserver and interobserver reproducibility of WHO and Gleason histologic grading systems in prostatic adenocarcinomas. Int Urol Nephrol 1996;28:73-7.
16. Lessels AM, Burnett RA, Howatson SR, Lang S, Lee FD, McLaren KM, et al. Observer variability in the histopathological reporting of needle biopsy specimens of the prostate. Hum Pathol 1997;28:646-9.
17. Melia J, Moseley R, Ball RY, Griffiths DFR, Grigor K, Harnden P, et al. A UK-based investigation of inter- and intra-observer reproducibility of Gleason grading of prostatic biopsies. Histopathology 2006;48:644-54.
18. Oyama T, Allsbrook WC Jr, Kurokawa K, Matsuda H, Segawa A, Sano T, et al. A comparison of interobserver reproducibility of Gleason grading of prostatic carcinoma in Japan and the United States. Arch Pathol Lab Med 2005;129:1004-10.
19. Veloso SG, Lima MF, Salles PG, Berenstein CK, Scalon JD, Bambirra EA. Interobserver agreement of Gleason score and modified Gleason score in needle biopsy and in surgical specimen of prostate cancer. Int Braz J Urol 2007;33:639-51.
20. Bori R, Salamon F, Móczár C, Cserni G. Interobserver reproducibility of Gleason grading in prostate biopsy samples. Orv Hetil 2013;154:1219-25.

Correspondence Address:
Yelda Dere
Department of Pathology, Mugla Sitki Kocman University, Faculty of Medicine, Mugla


DOI: 10.4103/IJPM.IJPM_288_18





