German FrameNet

Wednesday, May 03, 2006

Subcorporation Pipeline


The attached diagram shows Collin's response in graphical form. I also included the pipeline that we follow to pre-process our German corpus so that it can be imported into CQP. Although this pipeline is not yet fully implemented, I know how to implement this portion of the diagram. For the rest of the diagram, however, I still have some questions about how the different pieces fit together. This is just a first draft; as things progress, I will incorporate more detailed information into the diagram.

Questions

The following questions correspond to specific parts of the diagram:

A:

Within FNDesktop, particularly within the Subcorpus Rule-Definition GUI, the following error appeared when I tried to save a sample rule: “You must select a corpus before you can save!” I noticed that the pull-down menu for the Corpus field does not show any corpus value. How can we add a corpus to FNDesktop so that we can save the rules?

Also, in the file conf/fnclient.properties, what does the following variable point to? rule_path=/n/jolt/da/aicorpus/fncorp/FErec Is it related to the aforementioned error?

Is it also related that, when "Main/Tree Mode/Corpus Mode" is enabled within FNDesktop, all the frames in the left column disappear and FNDesktop lists no frames at all? Why does this happen when the other Tree Modes (i.e., Corpus, Semantic Type, Inheritance and Using) list all of the frames?
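To check on a given machine whether rule_path actually resolves to a directory, something like the following could be used. This is only a minimal sketch: it assumes the properties file consists of plain key=value lines, and the helper names read_properties and check_rule_path are hypothetical, not part of FNDesktop.

```python
import os


def read_properties(path):
    """Read simple key=value lines, skipping blanks and # or ! comments."""
    props = {}
    with open(path, encoding="utf-8") as handle:
        for raw in handle:
            line = raw.strip()
            if not line or line.startswith(("#", "!")):
                continue
            key, sep, value = line.partition("=")
            if sep:
                props[key.strip()] = value.strip()
    return props


def check_rule_path(props):
    """Return (rule_path, True/False) depending on whether it is a directory."""
    rule_path = props.get("rule_path")
    return rule_path, bool(rule_path) and os.path.isdir(rule_path)
```

Calling check_rule_path(read_properties("conf/fnclient.properties")) from the FNDesktop directory would show whether the directory exists locally. It does not tell us whether the path is connected to the corpus error.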

B:
How do we call this script, and what command-line arguments should we provide?

C:
Does the shell script in B call the CQP engine? Is there a specific directory where the CQP engine must reside, or any other special configuration for CQP? Also note that we already know how to perform the steps in the block "Pre-processing German Corpus".

D:
What format should the CQP output have? For example, we are able to produce KWIC format from CQP:

260: überzeugenden Darlegungen Chefs des Europäischen W
262: den Darlegungen des Chefs Europäischen Währungsins
373: Partei ` Für Lettland ' ' deutschen Rechtsradikale
510: Partei ` Für Lettland ' ' deutschen Rechtsradikale
530: - Die Zahl der Todesopfer Erdbebens in der westtür
584: ürden in Zelten im Garten Krankenhauses behandelt
952: trierte sich nach Angaben bosnischen Rundfunks auf
968: und Sanski Most im Westen Landes . In allen übrige
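If the KWIC format turns out to be acceptable, lines like these could be read back into (corpus position, context) pairs. This is a minimal sketch that assumes every hit has the form "number: context", as in the sample above; the function name parse_kwic is hypothetical.

```python
import re

# One KWIC hit: a corpus position, a colon, then the context string.
KWIC_LINE = re.compile(r"^\s*(\d+):\s?(.*)$")


def parse_kwic(lines):
    """Return a list of (position, context) tuples, ignoring other lines."""
    hits = []
    for line in lines:
        match = KWIC_LINE.match(line)
        if match:
            hits.append((int(match.group(1)), match.group(2).strip()))
    return hits
```

Lines that do not start with a position number, such as blank lines, are skipped.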


E:
What does this output look like? We are able to chunk our sentences as follows:
<s>
<PC>
Im APPRART
Innern NN
<NC>
dieser PDAT
Insel NN
</NC>
</PC>
<NC>
der ART
wenigen PIS
Seligen NN
</NC>
- $(
<NC>
ihre PPOSAT
Familien NN
</NC>
<VC>
hätten VAFIN
</VC>
<NC>
die ART
Kongreßmitglieder NN
</NC>
nicht PTKNEG
<VC>
mitbringen VVINF
dürfen VMINF
</VC>
- $(
<VC>
war VAFIN
</VC>
<NC>
Platz NN
</NC>
<PC>
für APPR
800 CARD
Menschen NN
. $.
</s>
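To make the structure of this chunked output explicit, here is a minimal sketch of how it could be read. It assumes the layout shown above: one tag per line, or one "token POS" pair per line. The function name parse_chunks is hypothetical. Each token is returned with the chunks that enclose it, and a chunk left open at the end of a sentence is dropped at the sentence boundary.

```python
def parse_chunks(lines):
    """Return (token, pos, enclosing_chunks) tuples for chunked input."""
    stack = []   # chunk tags currently open, outermost first
    tokens = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("<") and line.endswith(">"):
            closing = line.startswith("</")
            tag = line[2:-1] if closing else line[1:-1]
            if tag == "s":
                # Sentence boundary: forget any chunk left open.
                stack.clear()
            elif not closing:
                stack.append(tag)
            elif tag in stack:
                # Close the innermost matching chunk and anything inside it.
                while stack.pop() != tag:
                    pass
            continue
        parts = line.rsplit(None, 1)
        if len(parts) == 2:
            tokens.append((parts[0], parts[1], tuple(stack)))
    return tokens
```

For the sentence above, "dieser PDAT" would come out as ("dieser", "PDAT", ("PC", "NC")), since the noun chunk is nested inside the prepositional chunk.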

F:
This part is “cloudy”: it is not very clear how the pipeline flows up to the point where the subcorpora can be imported into the FrameNet DB.

G:
Does this refer to the FE Classifier mentioned in the Farina Book, section 6.8? How is this Java class invoked, and by whom? Where is the output of this classifier sent: to FarinaProcessRules.sh, FarinaImport.sh, or somewhere else?

H:
How is this script called (e.g., arguments and other required input) and what part of the pipeline does it go into?
