<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-26532628</id><updated>2011-07-14T17:39:06.947-07:00</updated><title type='text'>German FrameNet</title><subtitle type='html'>This blog chronicles the set-up of German FrameNet (GFN) at UT Austin. It gives an up-to-date overview of the progress in setting up GFN.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://gframenet.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/26532628/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://gframenet.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Hans C. 
Boas</name><uri>http://www.blogger.com/profile/14429941800714309921</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>14</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-26532628.post-3385267994527225045</id><published>2007-05-18T06:23:00.000-07:00</published><updated>2007-05-18T07:28:49.316-07:00</updated><title type='text'>File structure &amp; To-do</title><content type='html'>The file structure of the German FrameNet is currently organized as follows:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;current/remote_client/heartofgold/hog  &lt;/span&gt;Contains the files of the Heart of Gold engine.  This includes the chunker and tokenizer.  
It also includes some of the XSL transformation files such as toFrameNet.xsl&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;current/remote_client/german-client &lt;/span&gt;Includes client utils such as FnDesktop&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;current/german&lt;/span&gt;  German version of the FN database&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;current/english&lt;/span&gt;  English version of the FN database&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;current/fnSystem&lt;/span&gt; Complete FN database and JBoss that Jisup emailed to us&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;current/corpus&lt;/span&gt;  corpus files including:&lt;/li&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;bin&lt;/span&gt;  utils&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;cfg&lt;/span&gt;  some scripts and config files such as headers/footers &amp; DTD files&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;cqp&lt;/span&gt;  CQP engine version 3.0&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;doc&lt;/span&gt;  some documentation drafts&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;lib&lt;/span&gt;  includes OpenSP SGML to XML converter&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;other&lt;/span&gt;  tar files of the original corpora as they came on the CDs&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;raw&lt;/span&gt;  uncompressed corpora (AFP, APWS, DPA)&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;tagger&lt;/span&gt;  IMS tree-tagger&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;txt&lt;/span&gt;  plain-text version of corpora (i.e., without tags)&lt;br /&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;xml&lt;/span&gt; XML version of the 
corpora&lt;/li&gt;&lt;/ul&gt;&lt;li&gt;&lt;span style="font-weight: bold;"&gt;current/sandbox&lt;/span&gt;  misc files&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;span style="font-size:100%;"&gt;Major To-do items:&lt;/span&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Incorporate the H.O.G. engine&lt;/li&gt;&lt;li&gt;Solve the FarinaImport.sh error&lt;/li&gt;&lt;li&gt;Create a few more scripts to plug-in different pipeline components&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/26532628-3385267994527225045?l=gframenet.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gframenet.blogspot.com/feeds/3385267994527225045/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=26532628&amp;postID=3385267994527225045' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/26532628/posts/default/3385267994527225045'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/26532628/posts/default/3385267994527225045'/><link rel='alternate' type='text/html' href='http://gframenet.blogspot.com/2007/05/file-structure-to-do.html' title='File structure &amp; To-do'/><author><name>Mario Guajardo</name><uri>http://www.blogger.com/profile/06187053202503985989</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-26532628.post-116944986286366203</id><published>2007-01-21T23:05:00.000-08:00</published><updated>2007-05-18T07:11:57.275-07:00</updated><title type='text'>Pipeline Version 2</title><content type='html'>&lt;span style="font-size:100%;"&gt;&lt;a onblur="try {parent.deselectBloggerImageGracefully();} 
catch(e) {}" href="http://photos1.blogger.com/x/blogger/7504/723/1600/392961/pipelineV2.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://photos1.blogger.com/x/blogger/7504/723/400/786508/pipelineV2.png" alt="" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-size:100%;"&gt;This diagram shows a preliminary version of the new conversion pipeline.  Unlike the previous version, we no longer use Abney's parser, as it was not available for German; instead we are using several tools from the &lt;a href="http://heartofgold.dfki.de/"&gt;Heart of Gold &lt;/a&gt;NLP suite developed at the German Research Center for Artificial Intelligence.&lt;br /&gt;&lt;br /&gt;The new tools used include:&lt;br /&gt;&lt;/span&gt;&lt;ul&gt;&lt;li&gt;Jtok - Tokenizer&lt;/li&gt;&lt;li&gt;Chunkie - Chunker&lt;/li&gt;&lt;li&gt;xsltproc - XSL Transformer&lt;/li&gt;&lt;li&gt;HOG engine - Hosts Jtok &amp;amp; Chunkie processes' RPC&lt;br /&gt;&lt;/li&gt;&lt;/ul&gt;&lt;span style="font-size:100%;"&gt;As shown in the diagram, there have been significant changes to the pipeline.&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/26532628-116944986286366203?l=gframenet.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gframenet.blogspot.com/feeds/116944986286366203/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=26532628&amp;postID=116944986286366203' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/26532628/posts/default/116944986286366203'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/26532628/posts/default/116944986286366203'/><link rel='alternate' type='text/html' 
href='http://gframenet.blogspot.com/2007/01/pipeline-version-2.html' title='Pipeline Version 2'/><author><name>Mario Guajardo</name><uri>http://www.blogger.com/profile/06187053202503985989</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-26532628.post-116874663316971976</id><published>2007-01-13T19:35:00.000-08:00</published><updated>2007-01-24T08:08:51.246-08:00</updated><title type='text'>How-to's for Version 2</title><content type='html'>&lt;span style="font-size:180%;"&gt;Converting the entire corpus from SGML to XML&lt;/span&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;APWS Corpus&lt;/span&gt;&lt;br /&gt;XML Location: HOME/current/corpus/xml&lt;br /&gt;&lt;br /&gt;How to install the SGML conversion engine:&lt;br /&gt;&lt;ol&gt;   &lt;li&gt;Download OpenSP from http://sourceforge.net/projects/openjade to a local directory, say /tmp&lt;/li&gt;   &lt;li&gt;Decompress and install it with the following commands:&lt;/li&gt;   &lt;ul&gt;     &lt;li&gt;# tar zxvf OpenSP-1.5.2.tar.gz&lt;/li&gt;     &lt;li&gt;# cd OpenSP-1.5.2&lt;/li&gt;     &lt;li&gt;# ./configure --prefix /home/framenet/current/corpus/lib/opensp --disable-doc-build&lt;/li&gt;     &lt;li&gt;# make&lt;/li&gt;     &lt;li&gt;# make install&lt;/li&gt;   &lt;/ul&gt;   &lt;li&gt;Add the bin directory of the install prefix (/home/framenet/current/corpus/lib/opensp/bin) to PATH&lt;br /&gt;&lt;/li&gt; &lt;/ol&gt; How to convert the SGML corpus into XML:&lt;br /&gt;# cd /home/framenet/current/corpus/cfg&lt;br /&gt;# perl sgml2xml.pl ../raw/apws_ger ../xml/apws_ger&lt;br /&gt;&lt;br /&gt;How to trim the XML corpus:&lt;br /&gt;# cd /home/framenet/current/corpus/cfg&lt;br /&gt;# perl trim.pl ../xml/apws_ger ../xml/apws_ger_trimmed&lt;br /&gt;&lt;br /&gt;How to convert the XML corpus into plain text:&lt;br /&gt;# cd /home/framenet/current/corpus/cfg&lt;br /&gt;# perl ./xml2txt.pl 
../xml/apws_ger_trimmed ../txt/apws_ger ./extract-text.xsl ./remove-short-sent.pl&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:180%;"&gt;Install TreeTagger&lt;/span&gt;&lt;br /&gt;# cd /home/framenet/current/corpus/tagger&lt;br /&gt;Download the following files from http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/DecisionTreeTagger.html into this directory:&lt;br /&gt;&lt;ul&gt;   &lt;li&gt;tagger-scripts.tar.gz&lt;/li&gt;   &lt;li&gt;tree-tagger-linux-3.1.tar.gz&lt;/li&gt;   &lt;li&gt;german-chunker-par-linux-3.1.bin.gz&lt;/li&gt;   &lt;li&gt;german-par-linux-3.1.bin.gz&lt;/li&gt;   &lt;li&gt;install-tagger.sh&lt;/li&gt; &lt;/ul&gt; # chmod +x install-tagger.sh&lt;br /&gt;# ./install-tagger.sh&lt;br /&gt;&lt;br /&gt;You may have to modify the file cmd/filter-chunker-output.perl so that it contains the correct path to perl (obtained by executing the command "which perl").&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/26532628-116874663316971976?l=gframenet.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gframenet.blogspot.com/feeds/116874663316971976/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=26532628&amp;postID=116874663316971976' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/26532628/posts/default/116874663316971976'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/26532628/posts/default/116874663316971976'/><link rel='alternate' type='text/html' href='http://gframenet.blogspot.com/2007/01/how-tos-for-version-2.html' title='How-to&apos;s for Version 2'/><author><name>Mario Guajardo</name><uri>http://www.blogger.com/profile/06187053202503985989</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-26532628.post-115471667041682163</id><published>2006-08-04T11:37:00.000-07:00</published><updated>2006-08-04T11:40:26.863-07:00</updated><title type='text'>Current developments..</title><content type='html'>Of late, we have made considerable progress in identifying off-the-shelf tools for sentence boundary disambiguation (SBD) and chunking. We have also found answers to, or ways around, some of the issues listed in our earlier posts.&lt;br /&gt;&lt;br /&gt;For SBD, we could use either Satz or Uplug. Neither is usable directly, since we need to generate a training script and a cross-validation script for our corpus.&lt;br /&gt;&lt;br /&gt;For chunking, we could use the German chunker from the University of Stuttgart.&lt;br /&gt;&lt;br /&gt;Both of these parts of our pipeline currently face issues, which we list here:&lt;br /&gt;&lt;br /&gt;Satz:&lt;br /&gt;1) While creating the cross-validation and training scripts, we do not yet know how to handle "embedded sentences". For example: &lt;br /&gt;&lt;br /&gt;&lt;span style="font-style:italic;"&gt;&amp;lt;s&amp;gt;Zum Exil der Schrifstellerin Taslima Nasreen, die in ihrer Heimat Bangladesch vom Tode bedroht ist und am Mittwoch nach Schweden ausreiste, schreibt die Wirtschaftszeitung &amp;quot;Les Echos&amp;quot;&amp;#58;&lt;br /&gt;&amp;quot;&amp;lt;s&amp;gt;Ministerpräsidentin Khaleda Zia hat sich sicher für das geringere Übel entschieden, als sie die Ausreise von Taslima Nasreen erlaubte.&amp;lt;&amp;#47;s&amp;gt;&lt;br /&gt;....&lt;br /&gt;&amp;lt;s&amp;gt;Die Fundamentalisten, die vor weniger als zwei Wochen rund 200.000 Demonstranten auf die Straße brachten, werden ihr jetzt keine Ruhe mehr lassen.&amp;lt;&amp;#47;s&amp;gt;&amp;quot;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Here, we face multiple issues. 
&lt;br /&gt;  (i) Are we to consider the whole paragraph above as one single sentence [since the ":" is not really a sentence terminator], even though there are multiple sentences within the double quotes?&lt;br /&gt;  (ii) Or are we to consider the portion within the double quotes as one sentence [since without the opening and closing quotes, the sentence is not grammatically correct]? In that case the format should be &lt;br /&gt;                &amp;lt;s&amp;gt;&amp;quot; Ministerpräsidentin... lassen.&amp;quot;&amp;lt;&amp;#47;s&amp;gt;&lt;br /&gt;    &lt;br /&gt;   For now, we have taken the following approach:&lt;br /&gt;        &lt;span style="font-weight:bold;"&gt;&lt;span style="font-style:italic;"&gt;&amp;lt;s&amp;gt;Zum Exil der Schrifstellerin Taslima Nasreen, die in ihrer Heimat Bangladesch vom Tode bedroht ist und am Mittwoch nach Schweden ausreiste, schreibt die Wirtschaftszeitung &amp;quot;Les Echos&amp;quot;&amp;#58;&lt;br /&gt;&amp;quot;&amp;lt;s&amp;gt;Ministerpräsidentin Khaleda Zia hat sich sicher für das geringere Übel entschieden, als sie die Ausreise von Taslima Nasreen erlaubte.&amp;lt;&amp;#47;s&amp;gt; ... &lt;br /&gt;&amp;lt;s&amp;gt;Die Fundamentalisten, die vor weniger als zwei Wochen rund 200.000 Demonstranten auf die Straße brachten, werden ihr jetzt keine Ruhe mehr lassen.&amp;lt;&amp;#47;s&amp;gt;&amp;quot;&amp;lt;&amp;#47;s&amp;gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;  (iii) With regard to "paragraphs" such as &lt;br /&gt;          &amp;lt;p&amp;gt;&lt;br /&gt;         (folgt drei)mf&amp;#47;rom&lt;br /&gt;         &amp;lt;&amp;#47;p&amp;gt;&lt;br /&gt;         &amp;lt;p&amp;gt; &lt;br /&gt;         AFP&lt;br /&gt;         &amp;lt;&amp;#47;p&amp;gt;&lt;br /&gt;  we have decided to ignore them, since they are not parts of meaningful sentences.&lt;br /&gt;&lt;br /&gt;  Similarly, we do not consider columnar/tabular data to be sentences.&lt;br /&gt;&lt;br /&gt;  (iv) The training and cross-validation scripts are ready. 
However, there is an issue with running Satz: it appears to need additional executables that are not shipped in the same package, although the source code, as I noticed today, does not indicate any use of external libraries. &lt;br /&gt;  &lt;br /&gt;  (v) Besides, the only reason we need an SBD is to compute structural information in [B]. Jisup, from Berkeley, recently clarified that the &amp;lt;s&amp;gt; tags added by the SBD are removed during CQP processing, and the one-sentence-per-line input does not have them. The &amp;lt;s&amp;gt; tags are added again prior to the processing in send_to_schmid.pl, where they are given attributes such as aPos, docInfo, etc.&lt;br /&gt;For example &lt;br /&gt;&amp;lt;s aPos=&amp;quot;1351844&amp;quot; corpus=&amp;quot;BNCP&amp;quot; docInfo=&amp;quot;default_document&amp;quot; textNo=&amp;quot;41&amp;quot; paraNo=&amp;quot;178&amp;quot; sentNo=&amp;quot;2&amp;quot;&amp;gt;&lt;br /&gt;&lt;br /&gt;This leads us to the "sentence-count" script, or [B], which is the only part that has been completed successfully. Initially, the misunderstanding was that we needed to compute sentence attributes like aPos, docInfo, etc. here. But this was not possible, since aPos depends on the target word. Thanks to Jisup, we sorted this issue out. The script has been designed such that future additions of new corpora would need to be new directories in the parent directory of AFG. &lt;br /&gt;1094991:  bncp=33=246=15 These distinctive units were finally {withdrawn} in 1984.&lt;br /&gt;Here 1094991 represents the absolute position of the target word "withdrawn", and bncp=33=246=15 means that this sentence occurs in the BNCP corpus, Text# 33, Para# 246, Sentence# 15. The "bncp=33=246=15" part is what gets added in [B], and the 1094991 is prefixed by the CQP engine. 
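&lt;br /&gt;&lt;br /&gt;To make the line layout above concrete, here is a minimal Python sketch that splits such a line into its parts. It assumes the layout is exactly as in the single example shown (position, colon, corpus=text=para=sentence, then the sentence with the target word in curly braces); the function name, field names and regular expressions are hypothetical and are not taken from the FrameNet scripts.&lt;br /&gt;&lt;pre&gt;
```python
import re
from collections import namedtuple

# Field names are illustrative only; they do not come from the FrameNet code.
ConcordanceLine = namedtuple(
    "ConcordanceLine",
    "a_pos corpus text_no para_no sent_no target sentence",
)

# Patterns inferred from the one sample line in this post (an assumption).
LINE_RE = re.compile(r"^(\d+):\s+(\w+)=(\d+)=(\d+)=(\d+)\s+(.*)$")
TARGET_RE = re.compile(r"\{([^{}]+)\}")


def parse_concordance_line(line):
    """Split one sentence-per-line record into its fields."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError("unrecognized line format: %r" % line)
    a_pos, corpus, text_no, para_no, sent_no, sentence = match.groups()
    target = TARGET_RE.search(sentence)
    return ConcordanceLine(
        a_pos=int(a_pos),          # absolute position, prefixed by CQP
        corpus=corpus,             # corpus name, added in [B]
        text_no=int(text_no),
        para_no=int(para_no),
        sent_no=int(sent_no),
        target=target.group(1) if target else None,
        sentence=TARGET_RE.sub(r"\1", sentence),  # braces removed
    )


sample = "1094991:  bncp=33=246=15 These distinctive units were finally {withdrawn} in 1984."
record = parse_concordance_line(sample)
print(record)
```
&lt;/pre&gt;Keeping the corpus/text/paragraph/sentence numbers as separate fields makes it straightforward to emit them later as attributes of the opening s tag, if that is how the downstream step wants them.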
&lt;br /&gt;&lt;br /&gt;Moving on to the next module that we have worked on: the German Chunker&lt;br /&gt;(i) We do not have a comprehensive list of chunker tags that would cover all the cases of noun/adj/adv/verb/preposition chunks.&lt;br /&gt;(ii) We are able to obtain recursive chunks in the chunk-file, but they are not being mapped to the parts of speech. &lt;br /&gt;That is, the contents of the chunk-file look like:&lt;br /&gt;&amp;lt;chunk pos=&amp;quot;NP.Nom&amp;quot;&amp;gt;&lt;br /&gt;&amp;lt;chunk pos=&amp;quot;NP.Nom&amp;quot;&amp;gt;&lt;br /&gt;Platz&lt;br /&gt;&amp;lt;&amp;#47;chunk&amp;gt;&lt;br /&gt;für&lt;br /&gt;&amp;lt;chunk pos=&amp;quot;NP.Akk&amp;quot;&amp;gt;&lt;br /&gt;800&lt;br /&gt;Menschen&lt;br /&gt;&amp;lt;&amp;#47;chunk&amp;gt;&lt;br /&gt;&amp;lt;&amp;#47;chunk&amp;gt;&lt;br /&gt;&lt;br /&gt;while the output file looks like: &lt;br /&gt;&amp;lt;s&amp;gt; [ PPARTADJ.Dat Im Innern ] [ NP.Gen dieser Insel der wenigen Seligen ] - [ NC.Dir ihre Familien ] [ VVFIN  hätten ] [ NP.Akk die Kongreßmitglieder ] nicht [ VVINF  mitbringen ] [ VMINF  dürfen ] - [ VSFIN  war ] [ NP.Nom Platz ] für [ NP.Akk 800 Menschen ] . 
&amp;lt;&amp;#47;s&amp;gt;&lt;br /&gt;&lt;br /&gt;whereas we want the output file to look like the following [my lack of German knowledge prevents me from specifying the correct output in German; for English, however, it would be as below]: &lt;br /&gt;&amp;lt;s aPos=&amp;quot;1351633&amp;quot; corpus=&amp;quot;ELNC&amp;quot; docInfo=&amp;quot;default_document&amp;quot; textNo=&amp;quot;41&amp;quot;&lt;br /&gt; paraNo=&amp;quot;177&amp;quot; sentNo=&amp;quot;1&amp;quot;&amp;gt;&lt;br /&gt;   [dt the The]&lt;br /&gt;   [rx&lt;br /&gt;     [next next next]]&lt;br /&gt;   [vx&lt;br /&gt;   h=[bez be is]]&lt;br /&gt;   [to to to]&lt;br /&gt;   [vv calibrate &amp;lt;target&amp;gt;calibrate&amp;lt;&amp;#47;target&amp;gt;]&lt;br /&gt;   [nmess&lt;br /&gt;   h=[nx&lt;br /&gt;       [dt the the]&lt;br /&gt;     h=[nn gain gain]]]&lt;br /&gt;   [cma , ,]&lt;br /&gt;   [vvg use using]&lt;br /&gt;   [nmess&lt;br /&gt;   h=[nx&lt;br /&gt;       [dt-a a a]&lt;br /&gt;     h=[nn pair pair]]&lt;br /&gt;     [pp-of&lt;br /&gt;     f=[of of of]&lt;br /&gt;     h=[nx&lt;br /&gt;         [name&lt;br /&gt;           [nnp &amp;lt;unknown&amp;gt; Helmholtz]]&lt;br /&gt;       h=[nns coil coils]]]]&lt;br /&gt;   [sent . 
.]&lt;br /&gt;&amp;lt;&amp;#47;s&amp;gt; &lt;br /&gt;&lt;br /&gt;Discussions with Sabine have reached a dead end, since I had already tried the solutions she proposed in response to my questions.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/26532628-115471667041682163?l=gframenet.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gframenet.blogspot.com/feeds/115471667041682163/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=26532628&amp;postID=115471667041682163' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/26532628/posts/default/115471667041682163'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/26532628/posts/default/115471667041682163'/><link rel='alternate' type='text/html' href='http://gframenet.blogspot.com/2006/08/current-developments.html' title='Current developments..'/><author><name>sumeet</name><uri>http://www.blogger.com/profile/15109210681665633558</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-26532628.post-115161330900108559</id><published>2006-06-29T13:33:00.000-07:00</published><updated>2006-06-29T14:35:27.163-07:00</updated><title type='text'>A few issues..</title><content type='html'>(i) Here is one of the problems:&lt;br /&gt;The corpus files have paragraph tags in XML like &amp;lt;p&amp;gt;.&lt;br /&gt;For example:&lt;br /&gt;&amp;lt;p&amp;gt;&lt;br /&gt;bk&lt;br /&gt;&amp;lt;p&amp;gt;&lt;br /&gt;AFP&lt;br /&gt;&lt;br /&gt;Ideally, a group of sentences makes up a paragraph.&lt;br /&gt;So a processed file would appear as below with &amp;lt;s&amp;gt; tags 
acting as delimiters. Here, however, the sentences are not terminated by a full stop.&lt;br /&gt;&lt;br /&gt;&amp;lt;p&amp;gt;&lt;br /&gt;&amp;lt;s&amp;gt;bk&amp;lt;/s&amp;gt;&lt;br /&gt;&amp;lt;/p&amp;gt;&lt;br /&gt;&amp;lt;p&amp;gt;&amp;lt;s&amp;gt;AFP &amp;lt;/s&amp;gt;&lt;br /&gt;&amp;lt;/p&amp;gt;&lt;br /&gt;&lt;br /&gt;Since there are no recognizable sentence boundary markers in this "paragraph", I doubt that a finite-automaton-based detector will identify its boundaries. But first of all, should such items be marked as sentences at all? If they are not marked as sentences, is it OK for them to be marked as paragraphs?&lt;br /&gt;&lt;br /&gt;(ii)&lt;br /&gt;Here is another example -&lt;br /&gt;Each of the paragraphs below is in fact columnar data. Besides the problem above, another issue arises here: most sentence boundary detectors expect one sentence per line, so how should we handle columnar data like the following?&lt;br /&gt;&lt;br /&gt;&amp;lt;p&amp;gt;&lt;br /&gt;Bonn:           SPD zu Industriepolitik (Pk. 10.30 Uhr)&lt;br /&gt;              - Steuergewerkschaft zu Steuerverein-&lt;br /&gt;              fachung (11.00 Uhr)&lt;br /&gt;              - Rönsch zur Zeitverwendung der&lt;br /&gt;              Deutschen (Pk. 12.00 Uhr)&lt;br /&gt;              - SPD zur Arbeitsmarktpolitik für&lt;br /&gt;              Jugendliche (Pk. 12.00 Uhr)&lt;br /&gt;&amp;lt;/p&amp;gt;&lt;br /&gt;&amp;lt;p&amp;gt;&lt;br /&gt;Düsseldorf:     Solinger Mordprozeß (9.15 Uhr)&lt;br /&gt;              - HBV zu Sicherheitsmaßnahmen im Filial-&lt;br /&gt;              einzelhandel nach Mord bei Schlecker&lt;br /&gt;              (Pk. 11.00 Uhr)&lt;br /&gt;&amp;lt;/p&amp;gt;&lt;br /&gt;&amp;lt;p&amp;gt;&lt;br /&gt;Dresden:        Treffen der SPD-Fraktions- und Landesvor-&lt;br /&gt;              sitzenden in neuen Ländern (Pk. 
13.45 Uhr)&lt;br /&gt;&amp;lt;/p&amp;gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Issues with German Chunker (Version from 2000)&lt;br /&gt;It contains an example input file ( &lt;span style="font-style: italic;"&gt;FAZ-input&lt;/span&gt; ), which differs considerably from the input file that is expected in our pipeline. It demonstrates how the parser, called lopar, identifies the noun/verb/adjective chunks in this input file. The problem is that all these chunks cannot be written to the same output file.&lt;br /&gt;&lt;br /&gt;In other words, when we parse &lt;span style="font-style: italic;"&gt;FAZ-input &lt;/span&gt;with the noun_chunks argument, it produces the output file &lt;span style="font-style: italic;"&gt;FAZ-noun_chunks.&lt;br /&gt;&lt;/span&gt;When we parse  &lt;span style="font-style: italic;"&gt;FAZ-input &lt;/span&gt;with the verb_chunks argument, it produces the output file &lt;span style="font-style: italic;"&gt;FAZ-v-all_chunks&lt;/span&gt;&lt;br /&gt;and similarly with the adjective-adverb_chunks, etc.&lt;br /&gt;&lt;br /&gt;Now, if we try to use the output file of one chunk operation as the input file to another, the parser hangs.&lt;br /&gt;That is, if &lt;span style="font-style: italic;"&gt;FAZ-noun_chunks &lt;/span&gt;is the input to the chunker with the argument verb_chunks, the parser starts running but then hangs.&lt;br /&gt;&lt;br /&gt;I do not know if this is expected behaviour.&lt;br /&gt;&lt;br /&gt;Or is it because of the existence of some tags/chunks inside the &lt;span style="font-style: italic;"&gt;FAZ-noun_chunks &lt;/span&gt; file? 
If so, will it work on the output file from the IMS Tree-tagger in our pipeline?&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/26532628-115161330900108559?l=gframenet.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gframenet.blogspot.com/feeds/115161330900108559/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=26532628&amp;postID=115161330900108559' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/26532628/posts/default/115161330900108559'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/26532628/posts/default/115161330900108559'/><link rel='alternate' type='text/html' href='http://gframenet.blogspot.com/2006/06/few-issues.html' title='A few issues..'/><author><name>sumeet</name><uri>http://www.blogger.com/profile/15109210681665633558</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-26532628.post-115039936445887195</id><published>2006-06-15T12:17:00.000-07:00</published><updated>2006-06-15T12:22:45.590-07:00</updated><title type='text'>New German Chunker</title><content type='html'>Hello Mario&lt;br /&gt;&lt;br /&gt;Thanks for the input. I figured out that there are some differences, though. You mentioned here /home/framenet/may06/sandbox/framenet/collin/Adjusting.calibrate.v.v.chunked as the input for D. 
However, your pipeline flowchart shows it to be in fact the output file from Abney's chunker.&lt;br /&gt;&lt;br /&gt;Anyway, judging from the new German chunker that I am testing, such a file matches neither its input nor its output format.&lt;br /&gt;The input format here is one word per line. Each sentence has to be enclosed in &amp;lt;s&amp;gt; and &amp;lt;/s&amp;gt; tags and followed by an empty line, for example:&lt;br /&gt;&amp;lt;s&amp;gt;&lt;br /&gt;In&lt;br /&gt;den&lt;br /&gt;Großraumduschen&lt;br /&gt;lag&lt;br /&gt;die&lt;br /&gt;Seife&lt;br /&gt;schon&lt;br /&gt;bereit&lt;br /&gt;.&lt;br /&gt;&amp;lt;/s&amp;gt;&lt;br /&gt;&lt;br /&gt;which is pretty much what we have at the end of the pre-processing stage.&lt;br /&gt;So do we have to go through the IMS Tree Tagger and everything in between?&lt;br /&gt;&lt;br /&gt;Do let me know what you think.&lt;br /&gt;&lt;br /&gt;Thanks&lt;br /&gt;Sumeet&lt;br /&gt;&lt;br /&gt;____________________________&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Sumeet:&lt;br /&gt;&lt;br /&gt;Let us consider the following example extracted from Complaining.lament.v.v.9:&lt;br /&gt;&lt;br /&gt;&lt;span style=";font-family:courier new;font-size:85%;"  &gt;&lt;br /&gt;&amp;lt;s aPos="2784485" corpus="AP" docInfo="apwsE941123.0183" textNo="1" paraNo="7" sentNo="1"&amp;gt;&lt;br /&gt;Prince nnp Prince&lt;br /&gt;Philip person Philip&lt;br /&gt;&amp;lt;target&amp;gt;lamented&amp;lt;/target&amp;gt; vvd lament&lt;br /&gt;that comp that&lt;br /&gt;`` nil ``&lt;br /&gt;lots nns lot&lt;br /&gt;of of of&lt;br /&gt;resources nns resource&lt;br /&gt;are ber be&lt;br /&gt;going vvg go&lt;br /&gt;into in into&lt;br /&gt;economic jj economic&lt;br /&gt;development nn development&lt;br /&gt;and cc and&lt;br /&gt;very rb very&lt;br /&gt;little jj little&lt;br /&gt;into in into&lt;br /&gt;conservation nn conservation&lt;br /&gt;of of of&lt;br /&gt;Nature organization Nature&lt;br /&gt;. 
sent .&lt;br /&gt;'' nil ''&lt;br /&gt;&amp;lt;/s&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;You can use your NEW German tagger, but I think that its input (more precisely, its eventual output) will have to contain extra information such as:&lt;br /&gt;&lt;ul&gt;&lt;li&gt; a tagged target word such as &amp;lt;target&amp;gt;lamented&amp;lt;/target&amp;gt; (in the original pipeline, the target word is given by the CQP output.)&lt;/li&gt;&lt;li&gt;[optionally] the named entities. For example, if you compare the output of the intermediate stages, you will notice that "Nature" was tagged as "organization", and that it was tagged not by TreeTagger but by runIdentitiTagger.  &lt;/li&gt;&lt;li&gt;and the information in the opening "s" tag, such as aPos="2784485" corpus="AP" docInfo="apwsE941123.0183", which is eventually needed by FN in order to have some sequence number to "control" internal functions.&lt;/li&gt;&lt;/ul&gt;Thus, looking at the aforementioned example from v.9.9, one will notice that all of this information is present.  
If with your NEW tagger you are able to somehow incorporate all this information and, in addition, you are able to produce an output with the format that uses nested brackets then you will be able to call abney_to_done.pl and the rest of the pipeline.&lt;br /&gt;&lt;br /&gt;Thanks,&lt;br /&gt;Mario&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/26532628-115039936445887195?l=gframenet.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gframenet.blogspot.com/feeds/115039936445887195/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=26532628&amp;postID=115039936445887195' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/26532628/posts/default/115039936445887195'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/26532628/posts/default/115039936445887195'/><link rel='alternate' type='text/html' href='http://gframenet.blogspot.com/2006/06/new-german-chunker.html' title='New German Chunker'/><author><name>Mario Guajardo</name><uri>http://www.blogger.com/profile/06187053202503985989</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-26532628.post-114968872668945472</id><published>2006-06-07T06:55:00.000-07:00</published><updated>2006-06-07T06:58:47.236-07:00</updated><title type='text'></title><content type='html'>Mario has left Austin to take on an internship in India for the summer. 
Sumeet (who comes from the same city where Mario is doing his internship (!), Bangalore (India))  is continuing work on the GFN setup where Mario left off: finishing all steps of the pipeline, so that we can start with sample annotations.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/26532628-114968872668945472?l=gframenet.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gframenet.blogspot.com/feeds/114968872668945472/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=26532628&amp;postID=114968872668945472' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/26532628/posts/default/114968872668945472'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/26532628/posts/default/114968872668945472'/><link rel='alternate' type='text/html' href='http://gframenet.blogspot.com/2006/06/mario-has-left-austin-to-take-on.html' title=''/><author><name>Hans C. 
Boas</name><uri>http://www.blogger.com/profile/14429941800714309921</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-26532628.post-114832536639376458</id><published>2006-05-22T12:15:00.000-07:00</published><updated>2006-06-11T11:28:20.396-07:00</updated><title type='text'>How-to's</title><content type='html'>&lt;span style=";font-family:courier new;font-size:85%;"  &gt;&lt;span style="font-style: italic;"&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="font-weight: bold;"&gt;Configure client:&lt;/span&gt;&lt;br /&gt;&lt;span style=";font-family:courier new;font-size:85%;"  &gt;&lt;span style="font-style: italic;"&gt;client_home &lt;/span&gt;&lt;/span&gt;is the path to the directory containing the client files.&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Open the file &lt;span style=";font-family:courier new;font-size:85%;"  &gt;&lt;span style="font-style: italic;"&gt;client_home&lt;/span&gt;/bin/RunClass.sh&lt;/span&gt;&lt;/li&gt;&lt;li&gt;Set the variables at the top of the file to the IP/hostname of the FN server and the path to the Java binary (at the prompt, use the command &lt;span style="font-size:85%;"&gt;&lt;span style="font-family:courier new;"&gt;which java&lt;/span&gt;&lt;/span&gt; to determine this value).&lt;span style=";font-family:courier new;font-size:85%;"  &gt;&lt;br /&gt;&lt;/span&gt;&lt;/li&gt;&lt;/ol&gt;&lt;span style="font-weight: bold;font-family:courier new;font-size:85%;"  &gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-weight: bold;"&gt;Start client:&lt;/span&gt;&lt;br /&gt;&lt;span style=";font-family:courier new;font-size:85%;"  &gt;&lt;span style="font-style: italic;"&gt;client_home &lt;/span&gt;&lt;/span&gt;is the path to the directory containing the client files. 
At the prompt, execute:&lt;br /&gt;&lt;span style=";font-family:courier new;font-size:85%;"  &gt;# cd &lt;span style="font-style: italic;"&gt;client_home&lt;/span&gt;/bin&lt;br /&gt;# ./FNDesktop.sh&lt;/span&gt;&lt;span style=";font-family:courier new;font-size:85%;"  &gt;&lt;br /&gt;&lt;/span&gt;Note that the server must be running for the client to work.  Also, an active FN account is needed to use FNDesktop.sh.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Start server:&lt;/span&gt;&lt;br /&gt;There are two copies of the server files under &lt;span style=";font-family:courier new;font-size:85%;"  &gt;/home/framenet/current&lt;/span&gt;, one for the English and one for the German FrameNet; the two servers listen on TCP ports 1098 and 1099.  The two copies are identical, except that some tables that contain sample records in the English version are empty in the German counterpart.&lt;br /&gt;&lt;br /&gt;To start either server, follow the instructions given in the file &lt;span style=";font-family:courier new;font-size:85%;"  &gt;&lt;span style="font-style: italic;"&gt;server_home&lt;/span&gt;/bin/README&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Log into the MySQL database:&lt;/span&gt;&lt;br /&gt;At the prompt:&lt;br /&gt;&lt;span style="font-size:85%;"&gt;&lt;span style="font-family:courier new;"&gt;# su&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;# su framenet&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;# mysql&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;[mysql]# use gnframenet;     //there is also an English database&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;[mysql]# show tables;&lt;/span&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/26532628-114832536639376458?l=gframenet.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' 
type='application/atom+xml' href='http://gframenet.blogspot.com/feeds/114832536639376458/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=26532628&amp;postID=114832536639376458' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/26532628/posts/default/114832536639376458'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/26532628/posts/default/114832536639376458'/><link rel='alternate' type='text/html' href='http://gframenet.blogspot.com/2006/05/how-tos.html' title='How-to&apos;s'/><author><name>Mario Guajardo</name><uri>http://www.blogger.com/profile/06187053202503985989</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-26532628.post-114830976240969801</id><published>2006-05-22T07:55:00.000-07:00</published><updated>2006-05-22T12:02:52.216-07:00</updated><title type='text'>General Pipeline | Pending Parts</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://photos1.blogger.com/blogger/7504/723/1600/Slide1.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://photos1.blogger.com/blogger/7504/723/320/Slide1.png" alt="" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;B.&lt;br /&gt;To be imported properly into CQP, a corpus needs "p" XML tags surrounding every paragraph and "s" tags surrounding every sentence.  However, the German corpus contains only "p" tags.  Therefore, we have to find a program that will perform sentence-boundary detection for us so that we can add the "s" tags.  
The program mxterminator is used as the sentence-boundary detector for the English FN.&lt;br /&gt;&lt;br /&gt;A.&lt;br /&gt;Eventually, every sentence that is imported into FN will need to have an ID assigned to it.  However, these IDs are not part of the original corpus; rather, a (Perl) script was written for the English FN which takes an entire corpus and outputs the same corpus, but now with an ID prepended to each sentence.  A sample ID along with its sentence looks as follows: &lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;span style="font-size:85%;"&gt;&lt;span style="font-family:courier new;"&gt;apwsE941117.0373=1=7=1 A sample sentence.  &lt;/span&gt;&lt;/span&gt;After being transformed by intermediate scripts, the ID information is written to the final XML file given to FarinaImport.sh as follows: &lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;span style="font-size:85%;"&gt;&lt;span style="font-family:courier new;"&gt; &lt;/span&gt;&lt;/span&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;span style=";font-family:courier new;font-size:85%;"  &gt;docInfo="apwsE941123.0183" textNo="1" paraNo="7" sentNo="1" &lt;/span&gt; We can either ask Collin to send us this script so that we can reuse it, or write a similar script ourselves.&lt;br /&gt;&lt;br /&gt;C.&lt;br /&gt;A named-entity tagger is a program that takes a sentence as input and identifies (tags) the parts of the sentence that belong to an entity.  
Entities can be proper names, names of cities and places, names of companies, countries, etc.   Here the task is to search the Internet for either an open-source or a commercial named-entity tagger for German.    (In an initial phase, this part of the pipeline may be skipped, at the cost of annotators having to detect entities manually during the annotation process.)&lt;br /&gt;&lt;br /&gt;D.&lt;br /&gt;&lt;font&gt;&lt;span style="font-size:130%;"&gt;Chunker&lt;/span&gt;&lt;br /&gt;For the English FN, Abney's chunker produces output as follows:&lt;br /&gt;&lt;span style="font-size:85%;"&gt;&lt;span style="font-family:courier new;"&gt;  [nmess lemma=&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;  h=[nx lemma=&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;    h=[person&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;        [nnp lemma=Prince Prince]&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;        [person lemma=Philip Philip]]]]&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;  [vvd lemma=lament &amp;lt;target&amp;gt;lamented&amp;lt;/target&amp;gt;]&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;  [comp lemma=that that]&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;  [nil lemma=`` ``]&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;  [nmess lemma&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;...&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;Our IMS chunker for German, by contrast, produces output as follows:&lt;br /&gt;&lt;span style="font-size:85%;"&gt;&lt;span style="font-family:courier new;"&gt;&amp;lt;NC&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;Eine 
ART&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;weitere ADJA&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;Schwierigkeit NN&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&amp;lt;/NC&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&amp;lt;VC&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;besteht VVFIN&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&amp;lt;/VC&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;darin PAV&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;, $,&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;daß KOUS&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&amp;lt;NC&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;die ART&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;Kameras NN&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&amp;lt;/NC&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;nur ADV&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;dann ADV&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&amp;lt;NC&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;verwertbares ADJA&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;Bildmaterial NN&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&amp;lt;/NC&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&amp;lt;VC&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;liefern VVFIN&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&amp;lt;/VC&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;, $,&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;wenn 
KOUS&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&amp;lt;NC&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;die ART&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;See NN&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&amp;lt;/NC&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;einigermaßen ADV&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;ruhig ADJD&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&amp;lt;VC&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;ist VAFIN&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&amp;lt;/VC&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;. $.&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;Here, the task is to take the output of the IMS chunker for German and convert it to the format produced by Abney's chunker for English.  Another approach could be to find a chunker for German that &lt;span style="font-style: italic;"&gt;already&lt;/span&gt; produces its output in the same format as Abney's.&lt;br /&gt;&lt;br /&gt;As a last resort, this stage of the pipeline may be skipped initially, but the effort involved suggests that it is better not to skip it.  If it is skipped, we would have to modify the existing Java classes of FrameNet, remove all calls invoked from &lt;span style="font-family:courier new;"&gt;ProcessRules.sh&lt;/span&gt; that filter the sentences (given some rules), and keep only the functions that add the (now unfiltered) sentences to the FN database.  
Again, this seems so complex that it is not recommended.&lt;font&gt;&lt;span style="font-size:85%;"&gt;&lt;span style="font-family:courier new;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;E.&lt;br /&gt;Assume we have successfully obtained an XML file containing subcorpora that is ready to be added to FN's database.  For this task, we need to use the FarinaImport.sh script.&lt;br /&gt;&lt;br /&gt;However, using a sample XML file, we could not import the sample subcorpora into the German FN: FarinaImport.sh produced a Java exception, the cause of which we could not find (see the previous blog entry for details).&lt;br /&gt;&lt;br /&gt;Both Collin and Marc Ortega (from the Spanish FN) helped me debug this problem, but we had no success.  This is not to say that there is no solution; because of time constraints, we simply did not find it.&lt;br /&gt;&lt;br /&gt;Given the following (simplistic) sample XML subcorpus:&lt;br /&gt;&lt;span style="font-size:85%;"&gt;&lt;span style="font-family:courier new;"&gt;&amp;lt;?xml version="1.0" encoding="UTF-8"?&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&amp;lt;subcorpora frame="Adjusting" lexunit="calibrate.v" lemma="calibrate" pos="V"&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;  &amp;lt;annoset-conf classify-type="fn2.farina.classify.FNClassifierPenn"&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;    &amp;lt;annoset-model type="POS"&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;      &amp;lt;layer-model containsPOS="y"&amp;gt;PENN&amp;lt;/layer-model&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;    &amp;lt;/annoset-model&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;    &amp;lt;annoset-model type="standard"&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;      
&amp;lt;layer-model&amp;gt;Target&amp;lt;/layer-model&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;      &amp;lt;layer-model&amp;gt;FE&amp;lt;/layer-model&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;      &amp;lt;layer-model&amp;gt;GF&amp;lt;/layer-model&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;      &amp;lt;layer-model&amp;gt;PT&amp;lt;/layer-model&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;      &amp;lt;layer-model&amp;gt;Other&amp;lt;/layer-model&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;      &amp;lt;layer-model&amp;gt;Sent&amp;lt;/layer-model&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;    &amp;lt;/annoset-model&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;  &amp;lt;/annoset-conf&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;  &amp;lt;subcorpus scName="02-T-NP-PPto" maxSize="20"&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;    &amp;lt;s tStart="0" tEnd="11" aPos="7382282" corpus="BNC2" docInfo="bncp" textNo="372" paraNo="162" sentNo="9"&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&amp;lt;!--&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;      &amp;lt;text&amp;gt;calibrated .&amp;lt;/text&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;      &amp;lt;words&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;        &amp;lt;w pos="VVN" wf="calibrated" target="y" start="0" end="9"&amp;gt;calibrate&amp;lt;/w&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;        &amp;lt;w pos="SENT" wf="." 
start="11" end="11"&amp;gt;.&amp;lt;/w&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;      &amp;lt;/words&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;--&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;      &amp;lt;text&amp;gt;.&amp;lt;/text&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;      &amp;lt;words&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;        &amp;lt;w pos="SENT" wf="." start="0" end="0"&amp;gt;.&amp;lt;/w&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;      &amp;lt;/words&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;      &amp;lt;labels&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;      &amp;lt;/labels&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;     &amp;lt;/s&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;  &amp;lt;/subcorpus&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&amp;lt;/subcorpora&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;This is what we tried:&lt;br 
/&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;ul&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;li&gt;We noted that adding an empty subcorpus worked succesfully.  That is, FarinaImport.sh was able to properly add a record to the SubCorpus table.  This means that FarinaImport.sh succesfully communicates with the German FN database.&lt;/li&gt;&lt;li&gt;As soon as we included a corpus with at least one "w" tag, the given Java exception was thrown.  
This is the misterious part, the reason of which we would not figure out.&lt;/li&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;ul&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;li&gt;Though we rapidly verified the records in MiscLabel and LabelType tables, correspoding to the Penn stagset (yes, our sample file uses the Penn tagset and it is still pending adding the German STTS tagset into these tables.)  At a first glance, it seemed that these tables had correct information.  
However, going over the records of this tables and corroborating that they have correct values for the tags involved in the example, will be an starting point to debug.&lt;/li&gt;&lt;li&gt;We manually added records to the tables Corpus and Document as we were unsure whether FarinaImport.sh will add initial records to this tables when these tables are empty.  It did not make a difference as the Java exception was still thrown.&lt;/li&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/ul&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;li&gt;Marc sent me a new version of the Client and Server parts of the original English FN.  I tried the client part and the Java error still appeared.  
I tried to configure the server part with the new release but I could not configure it properly.  Elias might now how to do so.&lt;/li&gt;&lt;li&gt;I tried both adding the sample file to both our English and our German versions of FN and both threw the same Java exception.&lt;/li&gt;&lt;li&gt;From within FNDesktop, I tried assigning diffferent statuses for the given LU and it made no difference.&lt;br /&gt;&lt;/li&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/ul&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;Other parts&lt;br /&gt;&lt;/span&gt;Most of the scripts of the remaining part of the diagrams have not been tested and some of them may require some changes.  
It is important to note that it will be very advisable to review the script FN2Import.sh as it is the  "mother" script that calls all the part of the middle column of the diagram.&lt;br /&gt;&lt;br /&gt;Thanks a lot to Collin and Marc for all of their invaluable help and cooperation.&lt;br /&gt;&lt;span style="font-size:130%;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;font&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/s
pan&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/26532628-114830976240969801?l=gframenet.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' 
href='http://gframenet.blogspot.com/feeds/114830976240969801/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=26532628&amp;postID=114830976240969801' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/26532628/posts/default/114830976240969801'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/26532628/posts/default/114830976240969801'/><link rel='alternate' type='text/html' href='http://gframenet.blogspot.com/2006/05/general-pipeline-pending-parts.html' title='General Pipeline | Pending Parts'/><author><name>Mario Guajardo</name><uri>http://www.blogger.com/profile/06187053202503985989</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-26532628.post-114792711959917147</id><published>2006-05-17T21:38:00.000-07:00</published><updated>2006-05-18T16:19:15.146-07:00</updated><title type='text'>CQP, Chunker &amp; FarinaImport</title><content type='html'>&lt;span style="font-size:130%;"&gt;CQP&lt;/span&gt;&lt;br /&gt;I have been working on producing a CQP query result for the German corpora in a format similar to this sample for the English corpus (keyword=laments):&lt;br /&gt;&lt;span style="font-size:85%;"&gt;&lt;span style="font-family:courier new;"&gt;  1517620:  apwsE941117.0373=1=7=1 `` The most unpleasant thing is that we are attacked by those formerly high officials who insisted on us being a brainwashing center , '' {laments} Perfilov , sitting under the once-obligatory portrait of Lenin in his office .&lt;br /&gt;1528335:  apwsE941117.0393=1=7=1 `` The most unpleasant thing is that we are attacked by those formerly high officials who insisted on us being a brainwashing center , '' {laments} Perfilov , sitting under the 
once-obligatory portrait of Lenin in his office&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;versus the current German output (keyword=des):&lt;span style="font-size:85%;"&gt;&lt;span style="font-family:courier new;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:courier new;"&gt;260: überzeugenden Darlegungen  Chefs {des} Europäischen W&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;262: den Darlegungen {des} Chefs &lt;/span&gt;&lt;span style="font-family: courier new;"&gt; Europäischen Währungsins&lt;/span&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;However, there are a few things that I cannot output in the German CQP query result.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Sentence context:  While the English sample outputs an &lt;span style="font-style: italic;"&gt;entire &lt;/span&gt;sentence containing the matched keyword, the German counterpart can only output a fixed number of characters surrounding the given keyword, since it does not contain information about sentence boundaries. Because our original German corpus has XML tags only to delimit paragraphs and none to delimit sentences, the German corpus, once imported into CQP, does not have an "s" (sentence) s-attribute defined. Thus, entering &lt;span style="font-family:courier new;"&gt;&lt;span style="font-size:85%;"&gt;set context s;&lt;/span&gt; &lt;/span&gt;produces an error in CQP, so full sentences cannot be output.&lt;br /&gt;A possible workaround would be to write a script that inserts "s" XML tags delimiting each sentence, so that CQP has the sentence "s" attribute.  &lt;span style="font-weight: bold;"&gt;Do you know if CQP is able to accept a context the boundary of which is a string (in this case we could use the period "." 
as the end-of-sentence boundary)?&lt;/span&gt; I have already tried, for instance, &lt;span style=";font-family:courier new;font-size:85%;"  &gt;set context "."; &lt;/span&gt;and received an error.&lt;/li&gt;&lt;li&gt;Additional information: I noticed that the part of the CQP output consisting of &lt;span style="font-size:85%;"&gt;&lt;span style="font-family:courier new;"&gt;apwsE941117.0373=1=7=1 &lt;/span&gt;&lt;/span&gt;is eventually transformed into&lt;span style="font-size:85%;"&gt;&lt;span style="font-family:courier new;"&gt; &lt;/span&gt;&lt;/span&gt;&lt;span style=";font-family:courier new;font-size:85%;"  &gt;aPos="2784485" corpus="AP" docInfo="apwsE941123.0183" textNo="1" paraNo="7" sentNo="1" &lt;/span&gt;However, the CQP output produced from our German corpus does not include this information. &lt;span style="font-weight: bold;"&gt; What does this information represent, and is it necessary that we include it? If so, would taking a look at the pipeline you use to import the English corpus into CQP help us? 
(this would also help us see whether our pre-CQP pipeline is missing anything)&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;span 
style="font-size:130%;"&gt;Chunker&lt;/span&gt;&lt;br /&gt;For the English FN, Abney's chunker produces output as follows:&lt;br /&gt;&lt;span style="font-size:85%;"&gt;&lt;span style="font-family:courier new;"&gt;  [nmess lemma=&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;  h=[nx lemma=&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;    h=[person&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;        [nnp lemma=Prince Prince]&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;        [person lemma=Philip Philip]]]]&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;  [vvd lemma=lament &lt;target&gt;lamented&lt;/target&gt;]&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;  [comp lemma=that that]&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;  [nil lemma=`` ``]&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;  [nmess lemma&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;...&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;whereas our IMS chunker for German produces output as follows:&lt;br /&gt;&lt;span style="font-size:85%;"&gt;&lt;span style="font-family:courier new;"&gt;&amp;lt;NC&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;Eine    ART&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;weitere ADJA&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;Schwierigkeit   NN&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&amp;lt;/NC&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&amp;lt;VC&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;besteht VVFIN&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier 
new;"&gt;&amp;lt;/VC&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;darin   PAV&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;,       $,&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;daß     KOUS&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&amp;lt;NC&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;die     ART&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;Kameras NN&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&amp;lt;/NC&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;nur     ADV&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;dann    ADV&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&amp;lt;NC&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;verwertbares    ADJA&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;Bildmaterial    NN&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&amp;lt;/NC&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&amp;lt;VC&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;liefern VVFIN&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&amp;lt;/VC&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;,       $,&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;wenn    KOUS&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&amp;lt;NC&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;die     ART&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;See     NN&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&amp;lt;/NC&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;einigermaßen    ADV&lt;/span&gt;&lt;br 
/&gt;&lt;span style="font-family:courier new;"&gt;ruhig   ADJD&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&amp;lt;VC&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;ist     VAFIN&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;&amp;lt;/VC&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:courier new;"&gt;.       $.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-weight: bold;"&gt;Is it feasible to use Abney's chunker/parser to parse the German chunked data and produce a format with nested brackets similar to the English counterpart? Would it be better to modify the Java classes of FN to support the current format of the chunked German text? Or is another method more feasible?&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;FarinaImport.sh&lt;/span&gt;&lt;br /&gt;Assuming that the pipeline is complete, I used the file Adjusting.calibrate.v.v.processed to test whether FarinaImport.sh works correctly with our German FN.  
However, after executing this script I obtain the following error:&lt;br /&gt;&lt;span style=";font-family:courier new;font-size:85%;"  &gt;&lt;br /&gt;~/framenet/client/german-client/bin&gt; FarinaImport.sh ~/framenet/collin/Adjusting.calibrate.v.v.processed&lt;br /&gt;[FNProperties] ./..&lt;br /&gt;[FNProperties] loading from file ./../conf/fnclient.properties&lt;br /&gt;[FNProperties] loading from file /u/guajardo/.fnclient.properties&lt;br /&gt;log4j:WARN No appenders could be found for logger (fn2.farina.clients.FNProperties).&lt;br /&gt;log4j:WARN Please initialize the log4j system properly.&lt;br /&gt;[FNProperties] Using server [framenet...]&lt;br /&gt;username:[my user]&lt;br /&gt;password:[my pass]&lt;br /&gt;&lt;br /&gt;Importing /u/guajardo/framenet/collin/Adjusting.calibrate.v.v.processed...&lt;br /&gt;Processing on server...Exception in thread "main" fn2.farina.exception.ImportException: Import Exception: javax.ejb.TransactionRolledbackLocalException: Unexpected Error&lt;br /&gt;java.lang.NoClassDefFoundError&lt;br /&gt; at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)&lt;br /&gt; at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)&lt;br /&gt; at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)&lt;br /&gt; at java.lang.reflect.Constructor.newInstance(Constructor.java:274)&lt;br /&gt;...&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;Do you know what might be causing this exception?  
I went over the contents of the MySQL DB for the German FN and noted that the following relevant tables are empty:&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Corpus&lt;/li&gt;&lt;li&gt;Document&lt;/li&gt;&lt;li&gt;SubCorpus&lt;/li&gt;&lt;li&gt;Genre&lt;/li&gt;&lt;li&gt;Paragraph&lt;/li&gt;&lt;li&gt;Sentence&lt;/li&gt;&lt;li&gt;Annotation Set&lt;/li&gt;&lt;/ul&gt;Do you have sample initial values for these tables?  I tried to infer what some of the initial records for these tables might be, but my attempt was largely trial and error.  For instance, I added a new record to each of the Corpus, Document and Genre tables, and the aforementioned Java exception was still thrown.  Also, do we need to set a particular status in FNDesktop for Adjusting.calibrate.v? I suspect that a given status might need to be set in order for FarinaImport.sh to work on that LU.&lt;/spa
n&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/26532628-114792711959917147?l=gframenet.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gframenet.blogspot.com/feeds/114792711959917147/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=26532628&amp;postID=114792711959917147' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/26532628/posts/default/114792711959917147'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/26532628/posts/default/114792711959917147'/><link rel='alternate' type='text/html' href='http://gframenet.blogspot.com/2006/05/cqp-chunker-farinaimport.html' title='CQP, Chunker &amp; FarinaImport'/><author><name>Mario Guajardo</name><uri>http://www.blogger.com/profile/06187053202503985989</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-26532628.post-114664542611473514</id><published>2006-05-03T01:24:00.000-07:00</published><updated>2006-05-03T02:11:49.440-07:00</updated><title type='text'>Subcorporation Pipeline</title><content type='html'>&lt;a onblur="try {parent.deselectBloggerImageGracefully();} catch(e) {}" href="http://photos1.blogger.com/blogger/7504/723/1600/subcorporationQuestions.png"&gt;&lt;img style="margin: 0px auto 10px; display: block; text-align: center; cursor: pointer;" src="http://photos1.blogger.com/blogger/7504/723/400/subcorporationQuestions.png" alt="" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;The attached diagram pictures Collin's response in a graphical form.  
I also included the pipeline that we follow to pre-process our German corpus so that it can be imported into CQP. Although this pipeline is not yet fully implemented, I know how to implement this portion of the diagram.  For the rest of the diagram, however, I still have some questions about how the different pieces fit together.  This is just a first draft, and as things progress I will incorporate more detailed information into this diagram.&lt;span style="font-size:130%;"&gt;&lt;span style="font-size:100%;"&gt;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;Questions&lt;/span&gt;&lt;span style="font-size:130%;"&gt;&lt;span style="font-size:100%;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;The following questions correspond to specific parts of the diagram:&lt;span style="font-size:130%;"&gt;&lt;span style="font-size:100%;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="font-size:130%;"&gt;&lt;span style="font-size:100%;"&gt;&lt;br /&gt;&lt;/span&gt;A:&lt;/span&gt;&lt;br /&gt;Within FNDesktop, specifically in the Subcorpus Rule-Definition GUI, the following error appeared when I tried to save a sample rule: “You must select a corpus before you can save!”  I noticed that the pull-down menu for the &lt;span style="font-weight: bold;"&gt;Corpus&lt;/span&gt; field does not show any corpus value.   How can we add a corpus to FNDesktop so that we can save the rules?&lt;br /&gt;&lt;br /&gt;Also, in the file &lt;span style="font-family:courier new;"&gt;conf/fnclient.properties&lt;/span&gt;, where does the following variable point? &lt;span style="font-family:courier new;"&gt;rule_path=/n/jolt/da/aicorpus/fncorp/FErec&lt;/span&gt; Is it related to the aforementioned error?&lt;br /&gt;&lt;br /&gt;Is it also related that, within FNDesktop, enabling "Main/Tree Mode/Corpus Mode" makes all the frames in the left column disappear, so that FNDesktop lists no frames at all?  
Why does this happen, given that other Tree Modes (i.e., Corpus, Semantic Type, Inheritance and Using) list all of the frames?&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;B:&lt;/span&gt;&lt;br /&gt;How do we call this script, and what command-line arguments should we provide for it?&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;C:&lt;/span&gt;&lt;br /&gt;Does the shell script in B call the CQP engine? Is there any special directory where the CQP engine must reside, and/or any other special configuration for CQP? Also note that we already know how to perform the steps in the block &lt;span style="font-style: italic;"&gt;Pre-processing German Corpus&lt;/span&gt;.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;D:&lt;/span&gt;&lt;br /&gt;What format should the CQP output have?   
For example, we are able to produce KWIC-format output from CQP:&lt;br /&gt;&lt;span style="font-family:courier new,monospace;"&gt;&lt;br /&gt;260: überzeugenden Darlegungen &amp;lt;des&amp;gt; Chefs des Europäischen W&lt;br /&gt;262: den Darlegungen des Chefs &amp;lt;des&amp;gt; Europäischen Währungsins&lt;br /&gt;373: Partei ` Für Lettland ' ' &amp;lt;des&amp;gt; deutschen Rechtsradikale&lt;br /&gt;510: Partei ` Für Lettland ' ' &amp;lt;des&amp;gt; deutschen Rechtsradikale&lt;br /&gt;530: - Die Zahl der Todesopfer &amp;lt;des&amp;gt; Erdbebens in der westtür&lt;br /&gt;584: ürden in Zelten im Garten &amp;lt;des&amp;gt; Krankenhauses behandelt&lt;br /&gt;952: trierte sich nach Angaben &amp;lt;des&amp;gt; bosnischen Rundfunks auf&lt;br /&gt;968: und Sanski Most im Westen &amp;lt;des&amp;gt; Landes . In allen übrige &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;E:&lt;/span&gt;&lt;br /&gt;What does this output look like?   
We are able to chunk our sentences as follows:&lt;br /&gt;&lt;pre&gt;&amp;lt;s&amp;gt;&lt;br /&gt;&amp;lt;PC&amp;gt;&lt;br /&gt;Im    APPRART&lt;br /&gt;Innern    NN&lt;br /&gt;&amp;lt;NC&amp;gt;&lt;br /&gt;dieser    PDAT&lt;br /&gt;Insel    NN&lt;br /&gt;&amp;lt;/NC&amp;gt;&lt;br /&gt;&amp;lt;/PC&amp;gt;&lt;br /&gt;&amp;lt;NC&amp;gt;&lt;br /&gt;der    ART&lt;br /&gt;wenigen    PIS&lt;br /&gt;Seligen    NN&lt;br /&gt;&amp;lt;/NC&amp;gt;&lt;br /&gt;-    $(&lt;br /&gt;&amp;lt;NC&amp;gt;&lt;br /&gt;ihre    PPOSAT&lt;br /&gt;Familien    NN&lt;br /&gt;&amp;lt;/NC&amp;gt;&lt;br /&gt;&amp;lt;VC&amp;gt;&lt;br /&gt;hätten    VAFIN&lt;br /&gt;&amp;lt;/VC&amp;gt;&lt;br /&gt;&amp;lt;NC&amp;gt;&lt;br /&gt;die    ART&lt;br /&gt;Kongreßmitglieder    NN&lt;br /&gt;&amp;lt;/NC&amp;gt;&lt;br /&gt;nicht    PTKNEG&lt;br /&gt;&amp;lt;VC&amp;gt;&lt;br /&gt;mitbringen    VVINF&lt;br /&gt;dürfen    VMINF&lt;br /&gt;&amp;lt;/VC&amp;gt;&lt;br /&gt;-    $(&lt;br /&gt;&amp;lt;VC&amp;gt;&lt;br /&gt;war    VAFIN&lt;br /&gt;&amp;lt;/VC&amp;gt;&lt;br /&gt;&amp;lt;NC&amp;gt;&lt;br /&gt;Platz    NN&lt;br /&gt;&amp;lt;/NC&amp;gt;&lt;br /&gt;&amp;lt;PC&amp;gt;&lt;br /&gt;für    APPR&lt;br /&gt;800    CARD&lt;br /&gt;Menschen    NN&lt;br /&gt;.    
$.&lt;br /&gt;&amp;lt;/s&amp;gt;&lt;br /&gt;&lt;/pre&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;F:&lt;/span&gt;&lt;br /&gt;This part is “cloudy”: it is not very clear how the pipeline flows up to the point where the subcorpora can be imported into the FrameNet DB.&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;G:&lt;/span&gt;&lt;br /&gt;Does this refer to the &lt;span style="font-weight: bold;"&gt;FE Classifier&lt;/span&gt; mentioned in the Farina Book, section 6.8? How is this Java class invoked, and by whom? Where is the output of this classifier sent? 
To &lt;span style="font-family:courier new;"&gt;FarinaProcessRules.sh&lt;/span&gt;, &lt;span style="font-family:courier new;"&gt;FarinaImport.sh&lt;/span&gt;, or somewhere else?&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:130%;"&gt;H:&lt;/span&gt;&lt;br /&gt;&lt;p:colorscheme colors="#ffffff,#000000,#808080,#000000,#bbe0e3,#333399,#009999,#99cc00"&gt;  &lt;/p:colorscheme&gt;&lt;div shape="_x0000_s1026" class="O" style=""&gt;  &lt;div style="text-align: center;"&gt;&lt;span style="font-size:180%;"&gt;&lt;b&gt;&lt;/b&gt;&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;How is this script called (e.g., arguments and other required input) and what part of the pipeline does it go into?&lt;br /&gt;&lt;span style="text-decoration: underline;"&gt; &lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/26532628-114664542611473514?l=gframenet.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gframenet.blogspot.com/feeds/114664542611473514/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=26532628&amp;postID=114664542611473514' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/26532628/posts/default/114664542611473514'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/26532628/posts/default/114664542611473514'/><link rel='alternate' type='text/html' href='http://gframenet.blogspot.com/2006/05/subcorporation-pipeline.html' title='Subcorporation Pipeline'/><author><name>Mario Guajardo</name><uri>http://www.blogger.com/profile/06187053202503985989</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' 
src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-26532628.post-114653861390413023</id><published>2006-05-01T19:54:00.000-07:00</published><updated>2006-05-01T19:57:50.950-07:00</updated><title type='text'>Problems with subcorpora creation and import into FN Desktop</title><content type='html'>Met with Elias and Mario today to discuss further progress. Right now, we are trying to resolve two major issues:&lt;br /&gt;&lt;br /&gt;(1) integrating the FN Desktop with the CQP and other parts of "the pipeline" so that we can create subcorpora for import;&lt;br /&gt;&lt;br /&gt;(2) importing the subcorpora into the FN Desktop so that we can start annotating.&lt;br /&gt;&lt;br /&gt;Both points sound straightforward, but they are turning out to be much harder than we initially thought. Mario will be getting in touch with Collin to sort out these issues.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/26532628-114653861390413023?l=gframenet.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gframenet.blogspot.com/feeds/114653861390413023/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=26532628&amp;postID=114653861390413023' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/26532628/posts/default/114653861390413023'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/26532628/posts/default/114653861390413023'/><link rel='alternate' type='text/html' href='http://gframenet.blogspot.com/2006/05/problems-with-subcorpora-creation-and.html' title='Problems with subcorpora creation and import into FN Desktop'/><author><name>Hans C. 
Boas</name><uri>http://www.blogger.com/profile/14429941800714309921</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-26532628.post-114565190498744248</id><published>2006-04-21T12:50:00.000-07:00</published><updated>2006-04-21T13:41:25.273-07:00</updated><title type='text'>Procedure to import subcorpora into FrameNet</title><content type='html'>Collin explained to Elias the overall information flow that is required in order to import subcorpora into FrameNet. Because we are doing a lexicographic project, we will need to create a subcorpus for each lexical unit (LU) associated with a given frame. To import a subcorpus into FrameNet, we need to create an XML file containing it, in a format defined by FrameNet.  Collin sent us a sample XML file so that we can see the exact format that FrameNet expects.&lt;br /&gt;&lt;br /&gt;As a next step, we plan to import our German corpus into the CQP engine. I already know how to do that; we just haven't had a chance yet.   
The procedure I use to import our German corpus into CQP is as follows:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;The original German corpus is transformed from SGML to XML format.&lt;/li&gt;&lt;li&gt;The paragraph portions of the XML corpus files are combined into a single plain-text file.&lt;/li&gt;&lt;li&gt;The plain-text file, containing German sentences, is tagged using TreeTagger.&lt;/li&gt;&lt;li&gt;The tagged output is imported into CQP using the CQP import and compile tools.&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;There seem to be two different ways of extracting subcorpora using CQP as a query engine.&lt;br /&gt;&lt;br /&gt;On the one hand, according to the article "FrameNet in Action: The Case of Attaching", there seems to exist a GUI (called the Subcorpus Query Definition page) within FrameNet Desktop that allows the user to define CQP queries in order to produce subcorpora. Though we have not actually tried it out, it is my understanding that this GUI translates its input parameters into an actual CQP query that retrieves the desired subcorpus.&lt;br /&gt;&lt;br /&gt;On the other hand, Elias understood from Collin that there is a process called "farina-import" that forms a pipeline using CQP, a named-entity recognizer, the IMS TreeTagger, and Steve Abney's chunk parser to produce the desired subcorpora from the larger German corpus. These subcorpora can then be imported into the server using a feature called import-xml. Apparently farina-import is the Berkeley technique for this import process; other projects (Spanish FrameNet, Japanese FrameNet) have used other techniques.&lt;br /&gt;&lt;br /&gt;The one aspect of the farina-import system that Elias is not clear on is the creation of chunk rules. 
Collin said he'd share with us some of the source for the farina-import pipeline system and examples of chunk-rule creation.&lt;br /&gt;&lt;br /&gt;So, at this point, we have five questions:&lt;br /&gt;&lt;ol&gt;&lt;li&gt;Is there a relationship between the farina-import process and the SQD page GUI process? We believe Hans is more familiar with the latter process.&lt;br /&gt;&lt;/li&gt;&lt;li&gt;How are CQP queries formed by either process? Naturally, we want the right queries to be generated for our subcorpora.&lt;/li&gt;&lt;li&gt;Similarly, how are chunk rules created and applied, that is, how does that step fit into the process? We may have a conference call on that matter.&lt;/li&gt;&lt;li&gt;What tool or CQP parameter is used to transform the subcorpora from the KWIC format that CQP outputs to the XML format that import-xml seems to require?&lt;br /&gt;&lt;/li&gt;&lt;li&gt;Finally, how does import-xml work, that is, how does it get the fully specified subcorpora XML into the FN system?&lt;br /&gt;&lt;/li&gt;&lt;/ol&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/26532628-114565190498744248?l=gframenet.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gframenet.blogspot.com/feeds/114565190498744248/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=26532628&amp;postID=114565190498744248' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/26532628/posts/default/114565190498744248'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/26532628/posts/default/114565190498744248'/><link rel='alternate' type='text/html' href='http://gframenet.blogspot.com/2006/04/procedure-to-import-subcorpora-into.html' title='Procedure to import subcorpora into 
FrameNet'/><author><name>Mario Guajardo</name><uri>http://www.blogger.com/profile/06187053202503985989</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-26532628.post-114550100747435091</id><published>2006-04-19T19:27:00.000-07:00</published><updated>2006-04-19T19:55:38.383-07:00</updated><title type='text'>Creating a German FrameNet</title><content type='html'>FrameNet is one of the most amazing lexical resources for English (&lt;a href="http://framenet.icsi.berkeley.edu"&gt;http://framenet.icsi.berkeley.edu&lt;/a&gt;). To spread knowledge on how to set up FrameNets for other languages, we're documenting how the set-up of German FrameNet at UT Austin is taking place (see &lt;a href="http://gframenet.gmc.utexas.edu"&gt;http://gframenet.gmc.utexas.edu&lt;/a&gt;). This will hopefully help others who are setting up FrameNets for other languages. In addition, we're blogging other FrameNet-related information.&lt;br /&gt;&lt;br /&gt;The idea to build a German FrameNet grew out of my stay with the Berkeley FrameNet group from 1999 to 2001. Since then, I've thought about different ways of creating FrameNets for other languages (see, for example: Hans C. Boas. 2005. Semantic Frames as Interlingual Representations for Multilingual Lexical Databases. 
In: International Journal of Lexicography 18.4, 445-478).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/26532628-114550100747435091?l=gframenet.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://gframenet.blogspot.com/feeds/114550100747435091/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=26532628&amp;postID=114550100747435091' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/26532628/posts/default/114550100747435091'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/26532628/posts/default/114550100747435091'/><link rel='alternate' type='text/html' href='http://gframenet.blogspot.com/2006/04/creating-german-framenet.html' title='Creating a German FrameNet'/><author><name>Hans C. Boas</name><uri>http://www.blogger.com/profile/14429941800714309921</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
