Thursday, August 2, 2007

Computing Semantic Relatedness

  • Overview
We can extract clusters from the nodes of a hierarchical clustering result. For conceptualization, we need to find clusters that consist of similar words and can therefore serve as a class in an ontology.
We select such clusters by computing their semantic relatedness. The semantic relatedness of a cluster is obtained by measuring the distance between its terms and their lowest common subsumer (LCS).
Any kind of taxonomy can be used for computing semantic relatedness.
In this case, we use the WordNet hierarchy and the Wikipedia category hierarchy.
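
To make the idea of "distance to the lowest common subsumer" concrete, here is a minimal Python sketch on a toy taxonomy. The `PARENT` table, the function names, and the scoring (mean number of edges from each term up to the LCS) are all assumptions made for illustration; this is not the formula used by the tool, and the toy hierarchy only stands in for WordNet or the Wikipedia category graph.

```python
# Toy child -> parent taxonomy (hypothetical; stands in for WordNet
# or the Wikipedia category hierarchy).
PARENT = {
    "football": "ball_game",
    "basketball": "ball_game",
    "cricket": "field_game",
    "ball_game": "game",
    "field_game": "game",
    "game": "activity",
    "xml": "markup_language",
    "markup_language": "language",
    "language": "communication",
}


def path_to_root(term):
    """Return [term, parent, grandparent, ...] up to the root."""
    path = [term]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def lowest_common_subsumer(terms):
    """Return the deepest node shared by every term's upward path, or None."""
    paths = [path_to_root(t) for t in terms]
    common = set(paths[0]).intersection(*paths[1:])
    # Walking upward from the first term, the first shared node is the deepest.
    for node in paths[0]:
        if node in common:
            return node
    return None


def mean_distance_to_lcs(terms):
    """Return (lcs, mean edge count from each term to the lcs).

    A smaller distance means a tighter cluster. Returns (None, None)
    when the terms share no ancestor in the taxonomy.
    """
    lcs = lowest_common_subsumer(terms)
    if lcs is None:
        return None, None
    distances = [path_to_root(t).index(lcs) for t in terms]
    return lcs, sum(distances) / len(distances)


if __name__ == "__main__":
    for cluster in (["football", "basketball"],
                    ["cricket", "football", "basketball"],
                    ["football", "xml"]):
        print(cluster, mean_distance_to_lcs(cluster))
```

In this sketch a cluster whose terms sit just below their LCS gets a small distance, while a cluster with no shared ancestor is reported as having no LCS at all.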
  • Computing Semantic Relatedness
  1. Platform: Independent
  2. Required: WordNet (>=2.1)
  3. Location: http://csace.kaist.ac.kr/~cwseo/WNSearch.zip
  4. Input: word vector
    • ex)
football basketball convolution cable_television coaxial_cable convolution cable_television ruby_programming_language php cricket football basketball xhtml xml tiff gif system operating_system cybertron galvatron ruby_programming_language php tcl perl java_#programming_languag
    • Output
Ball_games SYNSET{SID-2752393-n#:#Words[W-2752393-n-1-ball]} 2.772588722239781 3.4011973816621555 football basketball
Communication SYNSET{SID-1930-n#:#Words[W-1930-n-1-physical_entity]} 0.6931471805599453 0.8362480242006186 convolution cable_television
Communication SYNSET{SID-1930-n#:#Words[W-1930-n-1-physical_entity]} 0.7801585575495751 0.8109302162163287 coaxial_cable convolution cable_television
Culture NULL 0.8266785731844679 -1.0 ruby_programming_language php
Ball_games SYNSET{SID-462746-n#:#Words[W-462746-n-1-field_game]} 3.1780538303479458 0.3409265869705933 cricket football basketball
Human_communication NULL 0.8266785731844679 -1.0 xhtml xml
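
For post-processing the results, a record like the ones above could be parsed with a short Python sketch such as the following. It assumes one record per line in the order: a label, a `SYNSET{...}` entry or `NULL`, two numeric scores, then the cluster's words. The field names (`label`, `synset`, `score_a`, `score_b`, `words`) are hypothetical; the output format does not name its columns.

```python
import re

# Assumed record layout: label, SYNSET{...} or NULL, two numbers, then words.
RECORD = re.compile(
    r"(?P<label>\S+)\s+"
    r"(?P<synset>SYNSET\{[^}]*\}|NULL)\s+"
    r"(?P<score_a>-?\d+(?:\.\d+)?)\s+"
    r"(?P<score_b>-?\d+(?:\.\d+)?)\s+"
    r"(?P<words>.+)"
)


def parse_record(line):
    """Parse one output record into a dict; raise ValueError if it does not match."""
    match = RECORD.fullmatch(line.strip())
    if match is None:
        raise ValueError(f"unrecognized record: {line!r}")
    synset = match.group("synset")
    return {
        "label": match.group("label"),
        # NULL means no synset was reported for this cluster.
        "synset": None if synset == "NULL" else synset,
        "score_a": float(match.group("score_a")),
        "score_b": float(match.group("score_b")),
        "words": match.group("words").split(),
    }


if __name__ == "__main__":
    sample = "Culture NULL 0.8266785731844679 -1.0 ruby_programming_language php"
    print(parse_record(sample))
```

Records with `NULL` in the synset position carry `-1.0` as their second score in the sample output, so a caller may want to treat that value as "not available" rather than as a real score.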

  • Usage
Command: Computing.bat "input_file" "output_file"

ex)
Computing.bat wiki5000_cluster.txt wiki5000_res.txt