Date: Mon Aug 7 07:21:23 2002
Subject: A question
From: jeffb
Volume-ID: 1020807.01

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

As you may know, I have been looking into automating the analysis of 
bodies of source code, with one of my goals being the automatic 
generation of interfaces, e.g. Poplog + GSL and DISP interfaces for 
Poplog.

Having developed a generic parser, I was faced with compiling a set of 
grammars to keep it well-fed. I decided to divert a little time into 
investigating how much of the information expressed in those grammars I 
could derive, with little or no a priori knowledge, from lower-level 
analyses of various source files.

Along this path, I am halted at the point of identifying tokens from a 
statistical analysis of the characters in the file. I have started to 
look in the Poplog Teach and Ref files to see if they have anything to 
say on this subject, but I am hampered by not being certain of the name 
usually applied to this field of enquiry.

Is there a name for this field? Is there any coverage within the Poplog 
documentation? Does anyone have any pointers for places I can look for 
confirming the "rules" I seem to be uncovering, e.g. "Across a diverse 
set of linguistic data, the token-separator appears to be the most 
common character"?
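
To make the candidate rule concrete, here is a minimal sketch in Python 
(not Poplog) of the kind of check involved. The function names and the 
sample string are hypothetical, chosen only for illustration, and the 
rule itself is still an unconfirmed observation rather than an 
established result.

```python
from collections import Counter


def most_common_char(text):
    """Return (character, relative frequency) of the most frequent character."""
    if not text:
        raise ValueError("cannot analyse an empty text")
    char, count = Counter(text).most_common(1)[0]
    return char, count / len(text)


def split_on_candidate(text):
    """Split text on its most frequent character, treated as a candidate
    token-separator (an assumption, not a verified rule)."""
    separator, _ = most_common_char(text)
    # Drop empty strings produced by runs of consecutive separators.
    return [token for token in text.split(separator) if token]


if __name__ == "__main__":
    sample = "define double(x); x * 2 enddefine;"  # illustrative input only
    print(most_common_char(sample))
    print(split_on_candidate(sample))
```

On a single short file the most frequent character may well be a 
letter rather than a separator, so a check like this is only 
meaningful over a larger and more varied body of text.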

Regards,
- -- 
Jeff Best

-----BEGIN PGP SIGNATURE-----
Version: PGPsdk version 1.7.1

iQA/AwUBPVDKCvHj+enJbeYqEQJ9UgCgwzhnsz+xfhspPOx4LKvL7DxOWlQAoIYz
XiWSRAP/fC8IE1jbyNkzDyUr
=v+pd
-----END PGP SIGNATURE-----