-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
As you may know, I have been looking into automating the analysis of
bodies of source code, with one of my goals being the automatic
generation of interfaces, e.g. Poplog + GSL and DISP interfaces for
Poplog.
Having developed a generic parser, I was faced with compiling a set of
grammars to keep it well-fed. I decided to divert a little time into
investigating how much of the information expressed in those grammars I
could derive, with little or no a priori knowledge, from lower-level
analyses of various source files.
Along the way, I have stalled at the point of identifying tokens from a
statistical analysis of the characters in a file. I have started to
look in the Poplog Teach and Ref files to see whether they have anything
to say on this subject, but I am hampered by not knowing the name
usually applied to this field of enquiry.
Is there a name for this field? Is there any coverage within the Poplog
documentation? Does anyone have pointers to places where I can confirm
the "rules" I seem to be uncovering, e.g. "Across a diverse set of
linguistic data, the token-separator appears to be the most common
character"?
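To make the quoted "rule" concrete, here is a minimal sketch of that heuristic, assuming nothing more than a raw character-frequency count over the input. The function names (`guess_separator`, `tokenize`) and the sample text are hypothetical illustrations, not part of Poplog or of any existing tool.

```python
from collections import Counter


def guess_separator(text: str) -> str:
    """Return the most frequent character as the candidate token separator.

    Hypothetical helper: this only applies the "most common character"
    heuristic; ties are broken by first occurrence (Counter ordering).
    """
    counts = Counter(text)
    if not counts:
        raise ValueError("cannot infer a separator from empty input")
    return counts.most_common(1)[0][0]


def tokenize(text: str, sep: str) -> list[str]:
    """Split on the candidate separator, dropping empty fragments."""
    return [tok for tok in text.split(sep) if tok]


# Hypothetical sample: a fragment of source code.
sample = "x = a + b ; y = c * d"
sep = guess_separator(sample)   # sep == " " (10 spaces vs. at most 2 of anything else)
tokens = tokenize(sample, sep)
```

On short or dense inputs the heuristic can fail (a letter such as "e" can tie with or outnumber the spaces), and it only finds a single separator character, so punctuation glued to identifiers, as in `foo(x);`, is not split apart.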
Regards,
- --
Jeff Best
-----BEGIN PGP SIGNATURE-----
Version: PGPsdk version 1.7.1
iQA/AwUBPVDKCvHj+enJbeYqEQJ9UgCgwzhnsz+xfhspPOx4LKvL7DxOWlQAoIYz
XiWSRAP/fC8IE1jbyNkzDyUr
=v+pd
-----END PGP SIGNATURE-----