Monday, August 27, 2007

Software Downloads

  • Co-occurrence computation
We use TinyCC2, one of the Leipzig tools, to compute co-occurrences. It reports the log-likelihood ratio for significant neighbour collocations and sentence collocations.
  1. Platform: Linux (x86)
  2. Location: http://wortschatz.uni-leipzig.de/~cbiemann/software/TinyCC2.htm
  3. Input: plain text file or markup text (XML, HTML)
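For reference, a log-likelihood ratio for a word pair can be computed from a 2x2 contingency table of sentence counts. The Python sketch below implements Dunning's G² statistic. It is an illustration only, not TinyCC2's code, and TinyCC2 may use a different significance measure. The function names and the argument layout (k11 = sentences containing both words, and so on) are assumptions made for this example.

```python
import math


def _xlogx_sum(*counts: float) -> float:
    """Return the sum of k * ln(k) over the counts, treating 0 * ln(0) as 0."""
    return sum(k * math.log(k) for k in counts if k > 0)


def log_likelihood_ratio(k11: int, k12: int, k21: int, k22: int) -> float:
    """Dunning's G^2 statistic for a 2x2 contingency table (hypothetical helper).

    k11: sentences containing both word A and word B
    k12: sentences containing A but not B
    k21: sentences containing B but not A
    k22: sentences containing neither word
    """
    n = k11 + k12 + k21 + k22
    row_a, row_not_a = k11 + k12, k21 + k22   # marginal counts for word A
    col_b, col_not_b = k11 + k21, k12 + k22   # marginal counts for word B
    # G^2 = 2 * (sum k ln k - sum R ln R - sum C ln C + N ln N)
    return 2.0 * (
        _xlogx_sum(k11, k12, k21, k22)
        - _xlogx_sum(row_a, row_not_a)
        - _xlogx_sum(col_b, col_not_b)
        + _xlogx_sum(n)
    )
```

A value near 0 means the two words co-occur about as often as chance predicts; for example, two words that always appear together in 10 of 100 sentences give a G² of roughly 65.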
  • Clustering tool
We use an open-source clustering tool for hierarchical clustering. The tool supports hierarchical, k-means, and SOM-based clustering.
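To illustrate what hierarchical clustering does with word data, here is a naive average-linkage agglomerative clustering in Python. It is a hypothetical stand-in, not the clustering tool's implementation; the choice of average linkage and the distance dictionary keyed by word pairs are assumptions.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Cluster = FrozenSet[str]


def average_linkage(
    dist: Dict[FrozenSet[str], float], labels: List[str]
) -> List[Tuple[Cluster, Cluster, float]]:
    """Naive agglomerative clustering with average linkage (illustrative sketch).

    dist: maps frozenset({a, b}) to the distance between items a and b
    labels: the items to cluster
    Returns the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    clusters: List[Cluster] = [frozenset([x]) for x in labels]
    merges: List[Tuple[Cluster, Cluster, float]] = []

    def cluster_distance(c1: Cluster, c2: Cluster) -> float:
        # Average of all pairwise item distances between the two clusters
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Merge the closest pair of clusters
        c1, c2 = min(combinations(clusters, 2), key=lambda p: cluster_distance(*p))
        clusters.remove(c1)
        clusters.remove(c2)
        clusters.append(c1 | c2)
        merges.append((c1, c2, cluster_distance(c1, c2)))
    return merges
```

This version recomputes every pairwise cluster distance on each step, so it only suits small word lists; a dedicated clustering tool is the practical choice for a full vocabulary.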
We also wrote some format converters to connect the co-occurrence tool and the clustering tool:
  1. Platform: Linux (x86)
  2. Requires: PHP (http://www.php.net)
  3. Location:
    1. http://csace.kaist.ac.kr/~cwseo/gen_matrix.tar.gz
      • create a directory named "matrix" and extract the archive there
    2. http://csace.kaist.ac.kr/~cwseo/tinyCC2.tar.gz
      • extract the archive and copy its files into the tinyCC2 directory
  • Generating collocation vectors
    • extCoc_s.sh
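The post does not document the format that extCoc_s.sh reads or writes. As a hypothetical sketch, assume the co-occurrence result is a list of tab-separated `word1`, `word2`, `score` lines; collecting each word's significant partners then gives one sparse collocation vector per word.

```python
from collections import defaultdict
from typing import Dict, Iterable


def build_collocation_vectors(lines: Iterable[str]) -> Dict[str, Dict[str, float]]:
    """Build a sparse collocation vector for every word (hypothetical helper).

    Assumes each non-empty line is 'word1<TAB>word2<TAB>score', where score is
    a significance value such as a log-likelihood ratio. Co-occurrence is
    treated as symmetric, so each pair updates both words' vectors.
    """
    vectors: Dict[str, Dict[str, float]] = defaultdict(dict)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        w1, w2, score = line.split("\t")[:3]
        s = float(score)
        vectors[w1][w2] = s
        vectors[w2][w1] = s
    return dict(vectors)
```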
  • Generating a matrix from the co-occurrence results
    • gen_matrix.sh
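The matrix layout produced by gen_matrix.sh is not specified here either. Clustering programs commonly read a tab-delimited table with one labelled row per item, so the sketch below assumes that layout: a `WORD` header cell, sorted context words as columns, and 0.0 for missing entries. The real script's output may differ.

```python
from typing import Dict, Iterable, Optional


def vectors_to_matrix(
    vectors: Dict[str, Dict[str, float]], rows: Optional[Iterable[str]] = None
) -> str:
    """Turn sparse vectors into a dense, tab-delimited matrix (hypothetical layout).

    vectors: word -> {context_word: score}
    rows: optional words to include, in order; defaults to all words, sorted
    Returns a header line of column names followed by one line per row word.
    """
    row_words = list(rows) if rows is not None else sorted(vectors)
    # Columns are the union of context words used by the selected rows
    columns = sorted({c for w in row_words for c in vectors.get(w, {})})
    out = ["WORD\t" + "\t".join(columns)]
    for w in row_words:
        vec = vectors.get(w, {})
        out.append(w + "\t" + "\t".join(str(vec.get(c, 0.0)) for c in columns))
    return "\n".join(out)
```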
  • Cluster extraction
    • cnvFormatx.php
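The post does not describe what cnvFormatx.php does beyond its role in cluster extraction. One common way to extract flat clusters from a hierarchical result is to cut the dendrogram at a distance threshold. The sketch below does that with a union-find over a merge history; the merge-list format and the `threshold` parameter are assumptions.

```python
from typing import Dict, List, Sequence, Tuple

Merge = Tuple[Sequence[str], Sequence[str], float]


def cut_dendrogram(items: List[str], merges: List[Merge], threshold: float) -> List[List[str]]:
    """Extract flat clusters by applying only merges at or below `threshold`.

    items: leaf labels
    merges: (left_members, right_members, height) in merge order; assumes
            heights never decrease, as with single, complete or average linkage
    Returns the clusters as sorted lists of labels.
    """
    parent: Dict[str, str] = {x: x for x in items}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for left, right, height in merges:
        if height > threshold:
            continue  # this merge happens above the cut line
        a, b = find(left[0]), find(right[0])
        if a != b:
            parent[b] = a

    groups: Dict[str, List[str]] = {}
    for x in items:
        groups.setdefault(find(x), []).append(x)
    return sorted(sorted(g) for g in groups.values())
```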
  • Computing semantic relatedness
    • Input
football basketball
convolution cable_television
coaxial_cable convolution cable_television
ruby_programming_language php
cricket football basketball
xhtml xml
tiff gif
system operating_system
cybertron galvatron
ruby_programming_language php tcl perl java_#programming_languag
    • Output
Ball_games SYNSET{SID-2752393-n#:#Words[W-2752393-n-1-ball]} 2.772588722239781 3.4011973816621555 football basketball
Communication SYNSET{SID-1930-n#:#Words[W-1930-n-1-physical_entity]} 0.6931471805599453 0.8362480242006186 convolution cable_television
Communication SYNSET{SID-1930-n#:#Words[W-1930-n-1-physical_entity]} 0.7801585575495751 0.8109302162163287 coaxial_cable convolution cable_television
Culture NULL 0.8266785731844679 -1.0 ruby_programming_language php
Ball_games SYNSET{SID-462746-n#:#Words[W-462746-n-1-field_game]} 3.1780538303479458 0.3409265869705933 cricket football basketball
Human_communication NULL 0.8266785731844679 -1.0 xhtml xml
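Each output record follows a regular layout: a category, either a SYNSET{...} entry or NULL, two numeric scores, and the input word group. The post does not say what the two scores mean, so the Python parser below simply calls them `score1` and `score2`; the class and function names are hypothetical, and it assumes one record per line.

```python
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class RelatednessRecord:
    category: str
    synset: Optional[str]  # None when the tool printed NULL
    score1: float
    score2: float
    words: List[str]


# category, then SYNSET{...} or NULL, then two numbers, then the word group
RECORD = re.compile(
    r"(?P<category>\S+)\s+(?P<synset>SYNSET\{[^}]*\}|NULL)\s+"
    r"(?P<s1>-?\d+(?:\.\d+)?)\s+(?P<s2>-?\d+(?:\.\d+)?)\s+(?P<words>.+)"
)


def parse_line(line: str) -> RelatednessRecord:
    """Parse one output record into its fields (illustrative sketch)."""
    m = RECORD.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognised record: {line!r}")
    synset = None if m["synset"] == "NULL" else m["synset"]
    return RelatednessRecord(
        m["category"], synset, float(m["s1"]), float(m["s2"]), m["words"].split()
    )
```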