ISSN 2394-5125
 

Review Article 


AN EXTRACTIVE BASED MULTI-DOCUMENT SUMMARIZATION USING WEIGHTED TF-IDF AND CENTROID BASED K-MEANS CLUSTERING (TF-IDF: CBC) FOR LARGE TEXT DATA

JEBAMALAI ROBINSON*, V. SARAVANAN

Abstract
In research problems involving text mining and classification, many factors have to be considered when deciding the basis on which the classification is done. These factor variables are termed features. The difficulty of visualizing the training data grows directly with the number of features. Often the features are highly correlated and redundant. Dimensionality reduction reduces the number of features for the task by obtaining a set of principal variables. In previous work, an automated feature extraction technique using weighted TF-IDF was proposed. Although that method performed well, it had a drawback: some of the generated features were correlated with each other, which led to high dimensionality and, in turn, to greater time complexity and memory usage. This paper proposes an automatic text summarization method that uses the weighted TF-IDF model and K-means clustering to reduce the dimensionality of the extracted features. Various similarity measures are used to identify the similarity between the sentences of a document, and the sentences are then grouped into clusters on the basis of the term frequency and inverse document frequency (TF-IDF) values of their words. The experiments were carried out on student text data from the US educational data hub, and the results were compared with other dimensionality reduction methods in terms of co-selection, content-based, weight-based and term-significance parameters. The proposed method was found to be efficient in terms of memory usage and time complexity.

Key words: Text Mining, Classification, Dimension Reduction, Text Summarization, Weighted TF-IDF, K-Means Clustering.
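
The pipeline described in the abstract (TF-IDF sentence vectors, K-means grouping, one representative sentence per cluster) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the weighting scheme, so plain smoothed TF-IDF is used as a stand-in, and the function names (tfidf_vectors, kmeans, summarize), the whitespace tokenizer, and the deterministic choice of the first k sentences as initial centroids are all assumptions made for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(sentences):
    """Return one L2-normalised TF-IDF vector (list of floats) per sentence."""
    tokenized = [s.lower().split() for s in sentences]  # naive whitespace tokenizer (assumption)
    vocab = sorted({w for toks in tokenized for w in toks})
    n = len(sentences)
    # Document frequency: number of sentences that contain each word.
    df = Counter(w for toks in tokenized for w in set(toks))
    # Smoothed IDF, a stand-in for the paper's unspecified weighting.
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1.0 for w in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = [tf[w] * idf[w] for w in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=20):
    """Plain K-means; the first k vectors seed the centroids (assumption)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: each vector joins its nearest centroid.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c]))
                  for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                new_centroids.append([sum(col) / len(members)
                                      for col in zip(*members)])
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


def summarize(sentences, k):
    """Pick the sentence nearest each cluster centroid, in original order."""
    vectors = tfidf_vectors(sentences)
    labels, centroids = kmeans(vectors, k)
    chosen = []
    for c in range(k):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if members:
            chosen.append(min(members,
                              key=lambda i: sq_dist(vectors[i], centroids[c])))
    return [sentences[i] for i in sorted(chosen)]
```

Calling summarize(sentences, k) on a list of sentence strings returns at most k sentences, one per non-empty cluster. Because the vectors are L2-normalised, ranking by squared Euclidean distance gives the same ordering as ranking by cosine similarity, which is one of the similarity measures commonly paired with TF-IDF.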


 


How to Cite this Article
Pubmed Style

ROBINSON J, SARAVANAN V. AN EXTRACTIVE BASED MULTI-DOCUMENT SUMMARIZATION USING WEIGHTED TF-IDF AND CENTROID BASED K-MEANS CLUSTERING (TF-IDF: CBC) FOR LARGE TEXT DATA. JCR. 2020; 7(1): 135-140. doi:10.22159/jcr.07.01.24


Web Style

ROBINSON J, SARAVANAN V. AN EXTRACTIVE BASED MULTI-DOCUMENT SUMMARIZATION USING WEIGHTED TF-IDF AND CENTROID BASED K-MEANS CLUSTERING (TF-IDF: CBC) FOR LARGE TEXT DATA. http://www.jcreview.com/?mno=302645209 [Access: February 21, 2020]. doi:10.22159/jcr.07.01.24


Turabian Style

ROBINSON, JEBAMALAI, and V. SARAVANAN. 2020. AN EXTRACTIVE BASED MULTI-DOCUMENT SUMMARIZATION USING WEIGHTED TF-IDF AND CENTROID BASED K-MEANS CLUSTERING (TF-IDF: CBC) FOR LARGE TEXT DATA. Journal of Critical Reviews, 7 (1), 135-140. doi:10.22159/jcr.07.01.24


