Organization name identification
Setting up dedicated dictionaries, such as lists of organizations or other named entities, is a time-consuming process. Even when such material is available, keeping it up to date can be very tedious. Studies published in the specialized literature [1, 2] have shown that such lists are not needed to achieve named entity recognition with good accuracy.
Mikheev et al. [1] report that their system, which achieves a P & R score of 91.5% for organization name identification when using dedicated gazetteers, still obtains an 85.5% P & R score without the specialized lists.

Evaluation measures for Information Extraction Systems [3]
Following these studies, we report a company name identification system that combines rule-based grammars with Bayesian networks. This system, which does not use any company name list, is part of a resume analyzer serving one of the world’s foremost professional recruitment firms.
The goal of this system is to extract the company names that appear in the career section of the resume.
Process description
Words that can be part of a company name are identified and tagged by an automatically learned Bayesian network. These words are then used as seeds for specialized grammar-based rules that find the full company names.
Bayesian network construction
Each word is characterized by 21 variables that describe:
- The word structure (uppercase, mixed case, lowercase),
- Whether the word appears in a common-word dictionary,
- Part-of-speech information about the word,
- Information about the preceding and following words,
- Statistical data about the line of text where the word appears.
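As an illustration of what such variables might look like, here is a minimal sketch that computes a handful of them for one word in a line of text. It is not the actual list of 21 variables: the feature names, the tiny `COMMON_WORDS` set and the chosen line statistics are all assumptions made for the example, and part-of-speech information is left out because it would require a tagger.

```python
# Hypothetical stand-in for a common-word dictionary; a real system would load a full lexicon.
COMMON_WORDS = {"the", "at", "as", "for", "and", "worked", "engineer", "manager"}


def case_shape(word: str) -> str:
    """Classify the letter-case structure of a word."""
    if not any(ch.isalpha() for ch in word):
        return "other"  # digits, punctuation, dates, ...
    if word.isupper():
        return "upper"
    if word.islower():
        return "lower"
    return "mixed"


def word_features(tokens: list[str], i: int) -> dict:
    """Build a small feature dictionary for tokens[i] within one line of text."""
    word = tokens[i]
    prev_word = tokens[i - 1] if i > 0 else None
    next_word = tokens[i + 1] if i + 1 < len(tokens) else None
    n_capitalized = sum(1 for t in tokens if t[:1].isupper())
    return {
        "shape": case_shape(word),
        "in_dictionary": word.lower() in COMMON_WORDS,
        # Context: case shape of the neighbouring words ("none" at line boundaries).
        "prev_shape": case_shape(prev_word) if prev_word else "none",
        "next_shape": case_shape(next_word) if next_word else "none",
        # Line-level statistics (illustrative choices).
        "line_length": len(tokens),
        "line_capitalized_ratio": n_capitalized / len(tokens),
    }


tokens = ["Engineer", "at", "Acme", "Systems"]
features = word_features(tokens, 2)  # features for "Acme"
```

Keeping every variable discrete or easily discretized makes the resulting rows straightforward to feed to a Bayesian network learner.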
A corpus of 8000 resumes was used to build the learning and test sets, from which BayesiaLab automatically learned the Bayesian network with the “Sons&Spouses” supervised learning algorithm.
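To make the supervised learning step more concrete, the sketch below trains a naive Bayes classifier, the simplest Bayesian network for classification, in which the target node is the only parent of every variable. It is deliberately a stand-in and not the algorithm used here: it learns no structure beyond that fixed one and does not reproduce BayesiaLab's behavior. The class name, the toy training rows and the labels are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Naive Bayes over discrete features: the class node is the sole parent of each feature."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha  # Laplace smoothing constant
        self.class_counts = Counter()
        self.value_counts = defaultdict(Counter)  # (feature, label) -> value counts
        self.values = defaultdict(set)            # feature -> values seen in training

    def fit(self, rows, labels):
        for row, label in zip(rows, labels):
            self.class_counts[label] += 1
            for feature, value in row.items():
                self.value_counts[(feature, label)][value] += 1
                self.values[feature].add(value)
        return self

    def log_scores(self, row):
        """Return log P(label) + sum of log P(value | label) for each label."""
        total = sum(self.class_counts.values())
        scores = {}
        for label, n_label in self.class_counts.items():
            score = math.log(n_label / total)
            for feature, value in row.items():
                count = self.value_counts[(feature, label)][value]
                n_values = len(self.values[feature]) or 1
                score += math.log((count + self.alpha) / (n_label + self.alpha * n_values))
            scores[label] = score
        return scores

    def predict(self, row):
        scores = self.log_scores(row)
        return max(scores, key=scores.get)


# Toy training data (invented for illustration only).
rows = [
    {"shape": "mixed", "in_dictionary": False},
    {"shape": "upper", "in_dictionary": False},
    {"shape": "lower", "in_dictionary": True},
    {"shape": "mixed", "in_dictionary": True},
    {"shape": "lower", "in_dictionary": True},
]
labels = ["company", "company", "other", "other", "other"]

model = NaiveBayes().fit(rows, labels)
prediction = model.predict({"shape": "upper", "in_dictionary": False})
```

A structure-learning algorithm would additionally search for dependencies between the variables themselves, which this fixed structure ignores.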
The performance evaluation on the test set returns the following results:
- Recall = 84%
- Precision = 72%
- P & R = 77.5%
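Assuming that P & R denotes the balanced F-measure used in MUC-style evaluations [3], that is, the harmonic mean of precision and recall, the combined score can be recomputed from the two other figures. The function name and the `beta` parameter below are our own choices for this sketch.

```python
def f_measure(precision: float, recall: float, beta: float = 1.0) -> float:
    """Weighted harmonic mean of precision and recall; beta = 1 weights them equally."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Word-tagging figures reported above: precision 72%, recall 84%.
print(f"{f_measure(0.72, 0.84):.1%}")  # prints 77.5%
```

Because the published precision and recall are themselves rounded, a score recomputed this way can differ from a reported one in the last digit.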
Rule-based analysis
Specialized grammar-based rules identify the full company name, using the words tagged by the Bayesian network as seeds.
Current system performance is given below:
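The seed-expansion idea can be sketched as follows. The rules shown (absorb adjacent capitalized words, legal-form suffixes, and connectors flanked by name-like words) are hypothetical simplifications chosen for illustration and are not the actual grammar; `LEGAL_SUFFIXES`, `CONNECTORS` and the function names are assumptions.

```python
# Hypothetical legal-form suffixes; a real grammar would cover many more patterns.
LEGAL_SUFFIXES = {"inc", "inc.", "ltd", "ltd.", "corp", "corp.", "llc", "gmbh", "sa"}
# Hypothetical connectors allowed inside a name when flanked by name-like words.
CONNECTORS = {"&", "and", "of"}


def is_name_like(token: str) -> bool:
    """True for capitalized tokens and known legal-form suffixes."""
    return token[:1].isupper() or token.lower() in LEGAL_SUFFIXES


def expand_seed(tokens: list[str], seed: int) -> tuple[int, int]:
    """Grow a [start, end) span around tokens[seed] using simple adjacency rules."""
    start, end = seed, seed + 1
    # Extend left; cross a connector only if a name-like token precedes it.
    while start > 0:
        if is_name_like(tokens[start - 1]):
            start -= 1
        elif (start > 1 and tokens[start - 1].lower() in CONNECTORS
              and is_name_like(tokens[start - 2])):
            start -= 2
        else:
            break
    # Extend right; cross a connector only if a name-like token follows it.
    while end < len(tokens):
        if is_name_like(tokens[end]):
            end += 1
        elif (end + 1 < len(tokens) and tokens[end].lower() in CONNECTORS
              and is_name_like(tokens[end + 1])):
            end += 2
        else:
            break
    return start, end


def company_names(tokens: list[str], seeds: list[int]) -> list[str]:
    """Expand every seed and return the distinct resulting names in text order."""
    spans = sorted({expand_seed(tokens, s) for s in seeds})
    return [" ".join(tokens[a:b]) for a, b in spans]


line = "2001-2004 software engineer at Smith & Wesson Systems Inc. in Boston"
tokens = line.split()
names = company_names(tokens, seeds=[6, 7])  # seeds: "Wesson" and "Systems"
print(names)  # prints ['Smith & Wesson Systems Inc.']
```

Several seeds inside the same name expand to the same span, so collecting spans in a set keeps each name only once.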
- Recall = 75%
- Precision = 80%
- P & R = 77.3%
For more details regarding this work, please contact the authors.
[1] Andrei Mikheev, Marc Moens, and Claire Grover. 1999. "Named Entity Recognition without Gazetteers". In Proceedings of EACL'99.
[2] GuoDong Zhou and Jian Su. 2002. "Named Entity Recognition using an HMM-based Chunk Tagger". In Proceedings of the 40th Annual Meeting of the ACL.
[3] Ralph Grishman and Beth Sundheim. 1996. "Message understanding conference - 6: A brief history". In Proceedings of the 16th International Conference on Computational Linguistics, Copenhagen.



