New German Chunker
Hello Mario
Thanks for the inputs. I noticed some differences, though. You mentioned /home/framenet/may06/sandbox/framenet/collin/Adjusting.calibrate.v.v.chunked as the input for D. However, your pipeline flowchart shows it to be in fact the output file from Abney's chunker.
Anyway, from the new German chunker that I am testing, it appears that such a file matches neither the chunker's input format nor its output format.
The input here is in one-word-per-line format. Each sentence has to be preceded by an <s> tag and an empty line, for example:
<s>
In
den
Großraumduschen
lag
die
Seife
schon
bereit
.
</s>
which is pretty much what we have at the end of the pre-processing stage.
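A minimal sketch of producing that format from already tokenized sentences, in Python. The function name to_chunker_input is made up for illustration and is not part of the chunker or of our pipeline. It mirrors the example above exactly, so it does not emit the empty line mentioned in the description; that would need to be checked against the chunker's documentation.

```python
def to_chunker_input(sentences):
    """Render tokenized sentences as one token per line inside <s> ... </s>.

    `sentences` is a list of token lists. This layout copies the example
    in the message; whether the chunker also wants a blank line is an
    open question and is not handled here.
    """
    lines = []
    for tokens in sentences:
        lines.append("<s>")
        lines.extend(tokens)
        lines.append("</s>")
    return "\n".join(lines) + "\n"


example = [["In", "den", "Großraumduschen", "lag", "die",
            "Seife", "schon", "bereit", "."]]
chunker_text = to_chunker_input(example)
```

The result can be written to a UTF-8 file and handed to the chunker as is.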
So do we still have to go through the IMS TreeTagger and everything in between?
Do let me know what you think.
Thanks
Sumeet
____________________________
Sumeet:
Let us consider the following example extracted from Complaining.lament.v.v.9:
<s aPos="2784485" corpus="AP" docInfo="apwsE941123.0183" textNo="1" paraNo="7" sentNo="1">
Prince nnp Prince
Philip person Philip
<target>lamented</target> vvd lament
that comp that
`` nil ``
lots nns lot
of of of
resources nns resource
are ber be
going vvg go
into in into
economic jj economic
development nn development
and cc and
very rb very
little jj little
into in into
conservation nn conservation
of of of
Nature organization Nature
. sent .
'' nil ''
</s>
You can use your NEW German tagger, but I think its input (more precisely, its eventual output) will have to contain extra information such as:
- a tagged target word in the sentence, such as <target>lamented</target> (in the original pipeline, the target word is given by the CQP output).
- [optionally] the named entities. For example, if you compare the output of the intermediate stages, you will notice that "Nature" was tagged as "organization", and that it was tagged not by TreeTagger but by runIdentitiTagger.
- the information in the opening "s" tag, such as aPos="2784485" corpus="AP" docInfo="apwsE941123.0183", which FN eventually needs in order to have some sequence number to "control" internal functions.
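To make the three items concrete, here is a rough Python sketch that pulls them out of one sentence block in the format shown above, so they could be kept aside while the plain tokens go through the new chunker and merged back afterwards. The names parse_sentence and NE_TAGS are hypothetical, the set of named-entity labels is only what appears in the example, and the sketch assumes exactly three whitespace-separated columns per line and a single-token target.

```python
import re

S_ATTR = re.compile(r'(\w+)="([^"]*)"')
TARGET = re.compile(r"^<target>(.*)</target>$")
# Assumption: only the two entity labels visible in the example sentence.
NE_TAGS = {"person", "organization"}


def parse_sentence(block):
    """Return (attrs, tokens, target_index, entities) for one <s> block."""
    lines = [ln for ln in block.strip().splitlines() if ln.strip()]
    attrs = dict(S_ATTR.findall(lines[0]))      # aPos, corpus, docInfo, ...
    tokens, entities, target_index = [], {}, None
    for line in lines[1:-1]:                    # skip <s ...> and </s>
        word, tag, lemma = line.split()
        match = TARGET.match(word)
        if match:                               # strip the <target> wrapper
            word = match.group(1)
            target_index = len(tokens)
        if tag in NE_TAGS:                      # remember named entities
            entities[len(tokens)] = tag
        tokens.append((word, tag, lemma))
    return attrs, tokens, target_index, entities


sample = """<s aPos="2784485" corpus="AP" docInfo="apwsE941123.0183" textNo="1" paraNo="7" sentNo="1">
Prince nnp Prince
Philip person Philip
<target>lamented</target> vvd lament
of of of
Nature organization Nature
. sent .
</s>"""

attrs, tokens, target_index, entities = parse_sentence(sample)
```

The plain words (the first element of each token tuple) are what the chunker would see; the attributes, target position, and entity labels would have to be reattached to its output by token index.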
Thanks,
Mario