- Co-occurrence computation
We use TinyCC2, one of the Leipzig tools, to compute co-occurrences. The tool reports the log-likelihood ratio for significant neighbour and sentence collocations.
- Platform: Linux (x86)
- Location: http://wortschatz.uni-leipzig.de/~cbiemann/software/TinyCC2.htm
- Input: plain text file or marked-up text (XML, HTML)
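To make the log-likelihood ratio concrete, here is a minimal sketch of the standard statistic (Dunning's G²) computed from sentence counts. It is an illustration only: the function name and its parameters are hypothetical, and the exact formula or smoothing that TinyCC2 applies is not stated here and may differ.

```python
import math


def _xlogx(x):
    # x * ln(x), with the usual convention that 0 * ln(0) = 0.
    return x * math.log(x) if x > 0 else 0.0


def log_likelihood_ratio(n_ab, n_a, n_b, n):
    """Dunning's G^2 for a word pair (A, B).

    n_ab: sentences containing both A and B
    n_a:  sentences containing A
    n_b:  sentences containing B
    n:    total number of sentences
    (Hypothetical parameter names, not TinyCC2's.)
    """
    # 2x2 contingency table of sentence counts.
    k11 = n_ab                     # A and B
    k12 = n_a - n_ab               # A without B
    k21 = n_b - n_ab               # B without A
    k22 = n - n_a - n_b + n_ab     # neither

    cells = _xlogx(k11) + _xlogx(k12) + _xlogx(k21) + _xlogx(k22)
    rows = _xlogx(k11 + k12) + _xlogx(k21 + k22)
    cols = _xlogx(k11 + k21) + _xlogx(k12 + k22)
    return 2.0 * (cells + _xlogx(n) - rows - cols)
```

A score near zero means the two words co-occur about as often as chance predicts; larger scores indicate a stronger association.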
We use an open-source clustering tool for hierarchical clustering. The tool supports hierarchical, k-means and SOM-based clustering.
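As an illustration of what hierarchical clustering does with collocation vectors, the following is a small pure-Python sketch of agglomerative clustering with average linkage over cosine distance. It is not the clustering tool's implementation; the function names, the distance measure, the stopping threshold and the toy vectors are all assumptions made for the example.

```python
import math


def cosine_distance(u, v):
    # 1 - cosine similarity; all-zero vectors are treated as maximally distant.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def agglomerative(vectors, threshold):
    """Merge the two closest clusters until the closest pair is farther
    apart than `threshold`. `vectors` maps a word to its vector."""
    clusters = [[word] for word in vectors]

    def average_distance(c1, c2):
        total = sum(cosine_distance(vectors[a], vectors[b])
                    for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = average_distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        if d > threshold:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters


if __name__ == "__main__":
    # Invented toy vectors, for illustration only.
    toy = {
        "football": [5, 4, 0, 0],
        "basketball": [4, 5, 0, 0],
        "xhtml": [0, 0, 6, 5],
        "xml": [0, 0, 5, 6],
    }
    print(agglomerative(toy, threshold=0.5))
```

The threshold plays the role of cutting the dendrogram at a fixed height; a real tool would usually output the full merge tree and let the user choose the cut.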
We wrote some format converters for the co-occurrence tool and the clustering tool.
- Platform: Linux (x86)
- Required: PHP (http://www.php.net)
- Location:
- http://csace.kaist.ac.kr/~cwseo/gen_matrix.tar.gz
- make a directory "matrix" and extract the archive there
- http://csace.kaist.ac.kr/~cwseo/tinyCC2.tar.gz
- extract and copy to the tinyCC2 directory
- Generating collocation vectors
- Generating the matrix from the co-occurrence result
- Cluster extraction
- Computing semantic relatedness
- football basketball
- convolution cable_television
- coaxial_cable convolution cable_television
- ruby_programming_language php
- cricket football basketball
- xhtml xml
- tiff gif
- system operating_system
- cybertron galvatron
- ruby_programming_language php tcl perl java_#programming_languag

- Ball_games SYNSET{SID-2752393-n#:#Words[W-2752393-n-1-ball]} 2.772588722239781 3.4011973816621555 football basketball
- Communication SYNSET{SID-1930-n#:#Words[W-1930-n-1-physical_entity]} 0.6931471805599453 0.8362480242006186 convolution cable_television
- Communication SYNSET{SID-1930-n#:#Words[W-1930-n-1-physical_entity]} 0.7801585575495751 0.8109302162163287 coaxial_cable convolution cable_television
- Culture NULL 0.8266785731844679 -1.0 ruby_programming_language php
- Ball_games SYNSET{SID-462746-n#:#Words[W-462746-n-1-field_game]} 3.1780538303479458 0.3409265869705933 cricket football basketball
- Human_communication NULL 0.8266785731844679 -1.0 xhtml xml
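Going back to the matrix-generation step listed earlier, the conversion from a co-occurrence result to a matrix could look roughly like the sketch below. It assumes the co-occurrence result is available as tab-separated lines of the form `word1<TAB>word2<TAB>score`; that layout, the function name `build_matrix` and the symmetric treatment of pairs are assumptions for illustration, not the documented TinyCC2 output format or the behaviour of the PHP converters.

```python
def build_matrix(lines):
    """Turn 'word1<TAB>word2<TAB>score' lines into (words, matrix).

    The input layout is an assumption for this sketch. Row i of the
    returned matrix is the collocation vector of words[i].
    """
    scores = {}
    vocabulary = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        word1, word2, score = line.split("\t")
        value = float(score)
        # Treat co-occurrence as symmetric (an assumption).
        scores[(word1, word2)] = value
        scores[(word2, word1)] = value
        vocabulary.update((word1, word2))

    words = sorted(vocabulary)
    # Pairs that never co-occurred significantly get a score of 0.0.
    matrix = [[scores.get((row, col), 0.0) for col in words] for row in words]
    return words, matrix
```

Each row can then be passed to a clustering tool as the feature vector of the corresponding word.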
