<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-1865487031546459436</id><updated>2011-07-07T18:13:22.732-07:00</updated><category term='Downloads'/><category term='Co-occurrence'/><category term='Semantic Relatedness'/><category term='Tool'/><category term='Clustering'/><category term='Ontology Building'/><title type='text'>ChungwonSeo</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://chungwon.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1865487031546459436/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://chungwon.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Chungwon Seo</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>3</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-1865487031546459436.post-1339185778583679599</id><published>2007-08-27T22:54:00.001-07:00</published><updated>2007-08-27T22:59:31.027-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Downloads'/><category scheme='http://www.blogger.com/atom/ns#' term='Tool'/><title type='text'>Software Downloads</title><content type='html'>&lt;ul style="font-weight: 
bold;"&gt;&lt;li&gt;Co-occurrence computation&lt;/li&gt;&lt;/ul&gt;We use TinyCC2, one of the Leipzig tools, for computing co-occurrences. This tool gives the log-likelihood ratio for significant neighbour and sentence collocations.&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Platform: Linux (x86)&lt;/li&gt;&lt;li&gt;Location: &lt;a onclick="return top.js.OpenExtLink(window,event,this)" href="http://wortschatz.uni-leipzig.de/%7Ecbiemann/software/TinyCC2.html" target="_blank"&gt;http://wortschatz.uni-leipzig&lt;wbr&gt;.de/~cbiemann/software/&lt;span id="st" name="st" class="st"&gt;TinyCC2&lt;/span&gt;&lt;wbr&gt;.html&lt;/a&gt;&lt;/li&gt;&lt;li&gt;Input: text file or marked-up text (XML, HTML)&lt;/li&gt;&lt;/ol&gt;&lt;ul style="font-weight: bold;"&gt;&lt;li&gt;Clustering tool&lt;/li&gt;&lt;/ul&gt;We use an open source clustering tool for hierarchical clustering. This tool supports hierarchical, k-means and SOM-based clustering.&lt;br /&gt;&lt;ul&gt;&lt;ol&gt;&lt;li&gt;Platform: Windows/Linux/MacOS&lt;/li&gt;&lt;li&gt;Location: &lt;a href="http://bonsai.ims.u-tokyo.ac.jp/%7Emdehoon/software/cluster/"&gt;http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/cluster/&lt;/a&gt;&lt;/li&gt;&lt;li&gt;Input: Feature matrix&lt;/li&gt;&lt;/ol&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Utils&lt;/span&gt;&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;We made some format converters for the co-occurrence tool and the clustering tool.&lt;ol&gt;&lt;li&gt;Platform: Linux (x86)&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Required: PHP (&lt;a href="http://www.php.net/"&gt;http://www.php.net&lt;/a&gt;)&lt;/li&gt;&lt;li&gt;Location:&lt;br /&gt;&lt;/li&gt;&lt;ol&gt;&lt;li&gt;&lt;a href="http://csace.kaist.ac.kr/%7Ecwseo/gen_matrix.tar.gz"&gt;http://csace.kaist.ac.kr/~cwseo/gen_matrix.tar.gz&lt;/a&gt;&lt;/li&gt;&lt;ul&gt;&lt;li&gt;make a directory named "matrix" and extract the archive there&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;&lt;a 
href="http://csace.kaist.ac.kr/%7Ecwseo/tinyCC2.tar.gz"&gt;http://csace.kaist.ac.kr/~cwseo/tinyCC2.tar.gz&lt;/a&gt;&lt;/li&gt;&lt;ul&gt;&lt;li&gt;extract and copy into the tinyCC2 directory&lt;/li&gt;&lt;/ul&gt;&lt;/ol&gt;&lt;/ol&gt;&lt;ul style="margin-left: 40px;"&gt;&lt;li&gt;Generating collocation vectors&lt;/li&gt;&lt;ul&gt;&lt;li&gt;extCoc_s.sh&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;Generating the matrix from the co-occurrence result&lt;/li&gt;&lt;ul&gt;&lt;li&gt;gen_matrix.sh&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;Cluster extraction&lt;/li&gt;&lt;ul&gt;&lt;li&gt;cnvFormatx.php&lt;/li&gt;&lt;/ul&gt;&lt;/ul&gt;&lt;ul style="font-weight: bold;"&gt;&lt;li&gt;Computing Semantic Relatedness&lt;/li&gt;&lt;/ul&gt;&lt;ul style="margin-left: 40px;"&gt;&lt;ul&gt;&lt;li&gt;Platform: Independent&lt;/li&gt;&lt;li&gt;Required: &lt;a href="http://wordnet.princeton.edu/"&gt;WordNet (&gt;=2.1)&lt;/a&gt;&lt;/li&gt;&lt;li&gt;Location: &lt;a href="http://csace.kaist.ac.kr/%7Ecwseo/WNSearch.zip"&gt;http://csace.kaist.ac.kr/~cwseo/WNSearch.zip&lt;/a&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Input: word vector&lt;/li&gt;&lt;ul&gt;&lt;li&gt;ex)&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/ul&gt;&lt;/ul&gt;&lt;span style="font-style: italic;"&gt;football basketball&lt;/span&gt; &lt;span style="font-style: italic;"&gt;convolution cable_television&lt;/span&gt; &lt;span style="font-style: italic;"&gt;coaxial_cable convolution cable_television&lt;/span&gt; &lt;span style="font-style: italic;"&gt;ruby_programming_language php&lt;/span&gt; &lt;span style="font-style: italic;"&gt;cricket football basketball&lt;/span&gt; &lt;span style="font-style: italic;"&gt;xhtml xml&lt;/span&gt; &lt;span style="font-style: italic;"&gt;tiff gif&lt;/span&gt; &lt;span style="font-style: italic;"&gt;system operating_system&lt;/span&gt; &lt;span style="font-style: italic;"&gt;cybertron galvatron&lt;/span&gt; &lt;span style="font-style: 
italic;"&gt;ruby_&lt;/span&gt;&lt;span style="font-style: italic;"&gt;programming_language php tcl&lt;/span&gt; &lt;span style="font-style: italic;"&gt;perl java_#programming_languag&lt;/span&gt;&lt;br /&gt;&lt;ul&gt;&lt;ul&gt;&lt;li&gt;Output&lt;/li&gt;&lt;/ul&gt;&lt;/ul&gt;&lt;span style="font-weight: bold;"&gt;Ball_games    SYNSET{SID-2752393-n#:#&lt;/span&gt;&lt;span style="font-weight: bold;"&gt;Words[W-2752393-n-1-ball]}    2.772588722239781    3.4011973816621555&lt;/span&gt;&lt;span style="font-style: italic;"&gt;&lt;span style="font-weight: bold;"&gt; &lt;/span&gt;   football basketball&lt;/span&gt; &lt;span style="font-weight: bold;"&gt;Communication    SYNSET{SID-1930-n#:#Words[W-1930-n-1-physical_entity]}    0.6931471805599453    0.8362480242006186  &lt;/span&gt;&lt;span style="font-style: italic;"&gt;  convolution cable_television&lt;/span&gt; &lt;span style="font-weight: bold;"&gt;Communication    SYNSET{SID-1930-n#:#Words[W-1930-n-1-physical_entity]}    0.7801585575495751    0.8109302162163287&lt;/span&gt;&lt;span style="font-style: italic;"&gt;    coaxial_cable convolution cable_television&lt;/span&gt; &lt;span style="font-weight: bold;"&gt;Culture    NULL    0.8266785731844679    -1.0  &lt;/span&gt;&lt;span style="font-style: italic;"&gt;  ruby_programming_language php&lt;/span&gt; &lt;span style="font-weight: bold;"&gt;Ball_games    SYNSET{SID-462746-n#:#Words[W-462746-n-1-field_game]}    3.1780538303479458    0.3409265869705933&lt;/span&gt;&lt;span style="font-style: italic;"&gt;    cricket football basketball&lt;/span&gt; &lt;span style="font-style: italic; font-weight: bold;"&gt;H&lt;/span&gt;&lt;span style="font-weight: bold;"&gt;uman_communication    NULL    0.82&lt;/span&gt;&lt;span style="font-weight: bold;"&gt;66785731844679    -1.0&lt;/span&gt;&lt;span style="font-style: italic;"&gt;    xhtml xml&lt;/span&gt;
&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1865487031546459436-1339185778583679599?l=chungwon.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://chungwon.blogspot.com/feeds/1339185778583679599/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1865487031546459436&amp;postID=1339185778583679599' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1865487031546459436/posts/default/1339185778583679599'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1865487031546459436/posts/default/1339185778583679599'/><link rel='alternate' type='text/html' href='http://chungwon.blogspot.com/2007/08/software-downloads.html' title='Software Downloads'/><author><name>Chungwon Seo</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1865487031546459436.post-1576466814624642453</id><published>2007-08-02T05:17:00.000-07:00</published><updated>2008-12-09T19:21:25.133-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Semantic Relatedness'/><category scheme='http://www.blogger.com/atom/ns#' term='Ontology Building'/><title type='text'>Computing Semantic Relatedness</title><content type='html'>&lt;ul&gt;&lt;li&gt;Overview&lt;/li&gt;&lt;/ul&gt;We can extract clusters from the nodes of a hierarchical clustering result. 
For conceptualization, we need to find clusters that consist of similar words and can therefore serve as a class of the ontology.&lt;br /&gt;We extract such clusters by computing semantic relatedness. The semantic relatedness of a cluster is obtained by measuring the distance between its terms and their lowest common subsumer (LCS).&lt;br /&gt;Any kind of taxonomy can be used for computing semantic relatedness.&lt;br /&gt;In this case, we use the WordNet hierarchy and the Wikipedia category hierarchy.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Computing Semantic Relatedness&lt;/li&gt;&lt;/ul&gt;&lt;ol&gt;&lt;li&gt;Platform: Independent&lt;/li&gt;&lt;li&gt;Required: &lt;a href="http://wordnet.princeton.edu/"&gt;WordNet (&gt;=2.1)&lt;/a&gt;&lt;/li&gt;&lt;li&gt;Location: &lt;a href="http://csace.kaist.ac.kr/%7Ecwseo/WNSearch.zip"&gt;http://csace.kaist.ac.kr/~cwseo/WNSearch.zip&lt;/a&gt;&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Input: word vector&lt;/li&gt;&lt;/ol&gt;&lt;ul&gt;&lt;ul&gt;&lt;li&gt;ex)&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/ul&gt;&lt;span style="font-style: italic;"&gt;football basketball&lt;/span&gt; &lt;span style="font-style: italic;"&gt;convolution cable_television&lt;/span&gt; &lt;span style="font-style: italic;"&gt;coaxial_cable convolution cable_television&lt;/span&gt; &lt;span style="font-style: italic;"&gt;ruby_programming_language php&lt;/span&gt; &lt;span style="font-style: italic;"&gt;cricket football basketball&lt;/span&gt; &lt;span style="font-style: italic;"&gt;xhtml xml&lt;/span&gt; &lt;span style="font-style: italic;"&gt;tiff gif&lt;/span&gt; &lt;span style="font-style: italic;"&gt;system operating_system&lt;/span&gt; &lt;span style="font-style: italic;"&gt;cybertron galvatron&lt;/span&gt; &lt;span style="font-style: italic;"&gt;ruby_&lt;/span&gt;&lt;span style="font-style: italic;"&gt;programming_language php tcl&lt;/span&gt; &lt;span style="font-style: italic;"&gt;perl 
java_#programming_languag&lt;/span&gt;&lt;br /&gt;&lt;ul&gt;&lt;ul&gt;&lt;li&gt;Output&lt;/li&gt;&lt;/ul&gt;&lt;/ul&gt;&lt;span style="font-weight: bold;"&gt;Ball_games    SYNSET{SID-2752393-n#:#&lt;/span&gt;&lt;span style="font-weight: bold;"&gt;Words[W-2752393-n-1-ball]}    2.772588722239781    3.4011973816621555&lt;/span&gt;&lt;span style="font-style: italic;"&gt;&lt;span style="font-weight: bold;"&gt; &lt;/span&gt;   football basketball&lt;/span&gt; &lt;span style="font-weight: bold;"&gt;Communication    SYNSET{SID-1930-n#:#Words[W-1930-n-1-physical_entity]}    0.6931471805599453    0.8362480242006186  &lt;/span&gt;&lt;span style="font-style: italic;"&gt;  convolution cable_television&lt;/span&gt; &lt;span style="font-weight: bold;"&gt;Communication    SYNSET{SID-1930-n#:#Words[W-1930-n-1-physical_entity]}    0.7801585575495751    0.8109302162163287&lt;/span&gt;&lt;span style="font-style: italic;"&gt;    coaxial_cable convolution cable_television&lt;/span&gt; &lt;span style="font-weight: bold;"&gt;Culture    NULL    0.8266785731844679    -1.0  &lt;/span&gt;&lt;span style="font-style: italic;"&gt;  ruby_programming_language php&lt;/span&gt; &lt;span style="font-weight: bold;"&gt;Ball_games    SYNSET{SID-462746-n#:#Words[W-462746-n-1-field_game]}    3.1780538303479458    0.3409265869705933&lt;/span&gt;&lt;span style="font-style: italic;"&gt;    cricket football basketball&lt;/span&gt; &lt;span style="font-style: italic; font-weight: bold;"&gt;H&lt;/span&gt;&lt;span style="font-weight: bold;"&gt;uman_communication    NULL    0.82&lt;/span&gt;&lt;span style="font-weight: bold;"&gt;66785731844679    -1.0&lt;/span&gt;&lt;span style="font-style: italic;"&gt;    xhtml xml&lt;/span&gt;&lt;br /&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://3.bp.blogspot.com/_IvN9Z8ZSA2I/RrHPy7p0q9I/AAAAAAAAAAk/xpoyAUcsJCQ/s1600-h/res_tab.JPG"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" 
src="http://3.bp.blogspot.com/_IvN9Z8ZSA2I/RrHPy7p0q9I/AAAAAAAAAAk/xpoyAUcsJCQ/s320/res_tab.JPG" alt="" id="BLOGGER_PHOTO_ID_5094081127446260690" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Usage&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;span style="font-weight: bold;"&gt;Command&lt;/span&gt;: Computing.bat "input_file" "output_file"&lt;br /&gt;&lt;br /&gt;ex)&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Computing.bat wiki5000_cluster.txt wiki5000_res.txt&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1865487031546459436-1576466814624642453?l=chungwon.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://chungwon.blogspot.com/feeds/1576466814624642453/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1865487031546459436&amp;postID=1576466814624642453' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1865487031546459436/posts/default/1576466814624642453'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1865487031546459436/posts/default/1576466814624642453'/><link rel='alternate' type='text/html' href='http://chungwon.blogspot.com/2007/08/computing-semantic-relatedness.html' title='Computing Semantic Relatedness'/><author><name>Chungwon Seo</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_IvN9Z8ZSA2I/RrHPy7p0q9I/AAAAAAAAAAk/xpoyAUcsJCQ/s72-c/res_tab.JPG' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1865487031546459436.post-1147089063402694155</id><published>2007-08-02T04:25:00.000-07:00</published><updated>2008-12-09T19:21:25.369-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Clustering'/><category scheme='http://www.blogger.com/atom/ns#' term='Co-occurrence'/><category scheme='http://www.blogger.com/atom/ns#' term='Ontology Building'/><title type='text'>Term Clustering for Domain Ontology Building</title><content type='html'>&lt;ul&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Overview&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;To build an ontology from text, we need to find terms and conceptualize them as classes of the ontology. The first step of conceptualization is finding synonyms and grouping terms into clusters that have a similar meaning and can be defined by the same properties.&lt;br /&gt;For example, the set of terms {“hard_disk, floppy_disk, cd-rom, linux, unix, bsd,  unix-like operating_systems”} can be partitioned into two concepts. {“hard_disk, floppy_disk, cd-rom”} is classified as a disk device and {“linux, unix, bsd, unix-like operating_systems”} is classified as an operating system.&lt;br /&gt;We use paradigmatic relations to obtain synonym sets. Hierarchical clustering of the synonym sets gives candidate concepts. We use 1st order and 2nd order collocations to extract the paradigmatic relations. A cluster that consists of similar words can be a class of the ontology. We extract such clusters by computing semantic relatedness. 
The semantic relatedness of a cluster is obtained by measuring the distance between its terms and their lowest common subsumer (LCS).&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://1.bp.blogspot.com/_IvN9Z8ZSA2I/RrHaRbp0q-I/AAAAAAAAAAs/WBGji7GfCwg/s1600-h/SysteArchitecture.JPG"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer; width: 291px; height: 228px;" src="http://1.bp.blogspot.com/_IvN9Z8ZSA2I/RrHaRbp0q-I/AAAAAAAAAAs/WBGji7GfCwg/s320/SysteArchitecture.JPG" alt="" id="BLOGGER_PHOTO_ID_5094092646548548578" border="0" /&gt;&lt;/a&gt;&lt;ul style="font-weight: bold;"&gt;&lt;li&gt;Demo&lt;/li&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://cseight.kaist.ac.kr:8080/TermCluster"&gt;http://cseight.kaist.ac.kr:8080/TermCluster&lt;/a&gt;&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;Co-occurrence computation&lt;/li&gt;&lt;/ul&gt;We use TinyCC2, one of the Leipzig tools, for computing co-occurrences. This tool gives the log-likelihood ratio for significant neighbour and sentence collocations.&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Platform: Linux (x86)&lt;/li&gt;&lt;li&gt;Location: &lt;a onclick="return top.js.OpenExtLink(window,event,this)" href="http://wortschatz.uni-leipzig.de/%7Ecbiemann/software/TinyCC2.html" target="_blank"&gt;http://wortschatz.uni-leipzig&lt;wbr&gt;.de/~cbiemann/software/&lt;span id="st" name="st" class="st"&gt;TinyCC2&lt;/span&gt;&lt;wbr&gt;.html&lt;/a&gt;&lt;/li&gt;&lt;li&gt;Input: text file or marked-up text (XML, HTML)&lt;/li&gt;&lt;/ol&gt;&lt;ul style="font-weight: bold;"&gt;&lt;li&gt;Clustering tool&lt;/li&gt;&lt;/ul&gt;We use an open source clustering tool for hierarchical clustering. 
This tool supports hierarchical, k-means and SOM-based clustering.&lt;br /&gt;&lt;ul&gt;&lt;ol&gt;&lt;li&gt;Platform: Windows/Linux/MacOS&lt;/li&gt;&lt;li&gt;Location: &lt;a href="http://bonsai.ims.u-tokyo.ac.jp/%7Emdehoon/software/cluster/"&gt;http://bonsai.ims.u-tokyo.ac.jp/~mdehoon/software/cluster/&lt;/a&gt;&lt;/li&gt;&lt;li&gt;Input: Feature matrix&lt;/li&gt;&lt;/ol&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Utils&lt;/span&gt;&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;We made some format converters for the co-occurrence tool and the clustering tool.&lt;ol&gt;&lt;li&gt;Platform: Linux (x86)&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Required: PHP (&lt;a href="http://www.php.net/"&gt;http://www.php.net&lt;/a&gt;)&lt;/li&gt;&lt;li&gt;Location:&lt;br /&gt;&lt;/li&gt;&lt;ol&gt;&lt;li&gt;&lt;a href="http://csace.kaist.ac.kr/%7Ecwseo/gen_matrix.tar.gz"&gt;http://csace.kaist.ac.kr/~cwseo/gen_matrix.tar.gz&lt;/a&gt;&lt;/li&gt;&lt;ul&gt;&lt;li&gt;make a directory named "matrix" and extract the archive there&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;&lt;a href="http://csace.kaist.ac.kr/%7Ecwseo/tinyCC2.tar.gz"&gt;http://csace.kaist.ac.kr/~cwseo/tinyCC2.tar.gz&lt;/a&gt;&lt;/li&gt;&lt;ul&gt;&lt;li&gt;extract and copy into the tinyCC2 directory&lt;/li&gt;&lt;/ul&gt;&lt;/ol&gt;&lt;/ol&gt;&lt;ul style="margin-left: 40px;"&gt;&lt;li&gt;Generating collocation vectors&lt;/li&gt;&lt;ul&gt;&lt;li&gt;extCoc_s.sh&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;Generating the matrix from the co-occurrence result&lt;/li&gt;&lt;ul&gt;&lt;li&gt;gen_matrix.sh&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;Cluster extraction&lt;/li&gt;&lt;ul&gt;&lt;li&gt;cnvFormatx.php&lt;/li&gt;&lt;/ul&gt;&lt;/ul&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Usage&lt;/span&gt;&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;ul style="font-weight: bold;"&gt;&lt;ul&gt;&lt;li&gt;Computing co-occurrence&lt;/li&gt;&lt;/ul&gt;&lt;/ul&gt;&lt;span style="font-style: italic;"&gt;&amp;lt;directory&amp;gt;/tinyCC2/&lt;/span&gt;&lt;span style="font-style: 
italic;" id="st" name="st" class="st"&gt;tinyCC&lt;/span&gt;&lt;span style="font-style: italic;"&gt;.sh &lt;/span&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Command: &lt;/span&gt;sh &lt;span id="st" name="st" class="st"&gt;tinyCC&lt;/span&gt;.sh "prefix" "datadir" none&lt;br /&gt;&lt;br /&gt;Ex) Input files in ~cwseo/tinyCC2/wikiCS2/*.txt&lt;br /&gt;cd ~cwseo/tinyCC2&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;tinyCC.sh  wikiCS wikiCS2/ none&lt;/span&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;extCoc_s.sh wikiCS 50&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;After execution, the coc_"prefix"_"threshold" directory contains the context vector files. In the tinyCC2 directory, "prefix"_cos.src is generated (the result of extCoc_s.sh).&lt;br /&gt;&lt;ul&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;2nd order collocation&lt;/span&gt;&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/ul&gt;Execute tinyCC.sh again on the coc_"prefix"_"threshold" directory.&lt;br /&gt;&lt;br /&gt;ex)&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;&lt;span id="st" name="st" class="st"&gt;tinyCC&lt;/span&gt;.sh cocWikiCS coc_wikiCS2_50 none &lt;/span&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;extCoc_s.sh cocWikiCS 20&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Result: &lt;span style="font-style: italic;"&gt;cocWikiCS_cos.src&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;Hierarchical Clustering&lt;/span&gt;&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/ul&gt;&lt;span style="font-style: italic;"&gt;&amp;lt;directory&amp;gt;/matrix&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Make a new directory under "matrix", then copy "wikiCS_cos.src" to "freq_src.txt" and "cocWikiCS_cos.src" to "list.txt".&lt;br /&gt;ex)&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;cd ~cwseo/tinyCC2/matrix&lt;/span&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;mkdir 07WikiCS&lt;/span&gt;&lt;br /&gt;&lt;span style="font-weight: 
bold;"&gt; cp ../wikiCS_cos.src ./07WikiCS/freq_src.txt&lt;/span&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;cp ../cocWikiCS_cos.src ./07WikiCS/list.txt&lt;/span&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt; sh gen_matrix.sh 07WikiCS&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Result) result/07WikiCS.newick , result/07WikiCS.sif, result/07WikiCS.graphml&lt;br /&gt;*.newick (for TreeQVista)&lt;br /&gt;*.sif (for cytoscape)&lt;br /&gt;*.graphml (for yEd)&lt;br /&gt;&lt;br /&gt;TreeQVista: &lt;a href="http://genome.lbl.gov/vista/TreeQVista/" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)"&gt; http://genome.lbl.gov/vista&lt;wbr&gt;/TreeQVista/&lt;/a&gt;&lt;br /&gt;Cytoscape: &lt;a href="http://www.cytoscape.org/" target="_blank" onclick="return top.js.OpenExtLink(window,event,this)"&gt;http://www.cytoscape.org/&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1865487031546459436-1147089063402694155?l=chungwon.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://chungwon.blogspot.com/feeds/1147089063402694155/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1865487031546459436&amp;postID=1147089063402694155' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1865487031546459436/posts/default/1147089063402694155'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1865487031546459436/posts/default/1147089063402694155'/><link rel='alternate' type='text/html' href='http://chungwon.blogspot.com/2007/08/term-clustering-for-domain-ontology_02.html' title='Term Clustering for Domain Ontology Building'/><author><name>Chungwon Seo</name><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_IvN9Z8ZSA2I/RrHaRbp0q-I/AAAAAAAAAAs/WBGji7GfCwg/s72-c/SysteArchitecture.JPG' height='72' width='72'/><thr:total>0</thr:total></entry></feed>
