Patrice Mellot
Consultant
Setting up dedicated dictionaries, such as lists of organizations or other
named entities, is a time-consuming process. Even when such material is
available, keeping it up to date can be a very tedious task. Studies published
in the specialized literature [1, 2] have shown that such lists are not needed
to achieve named entity recognition with good accuracy.
Mikheev et al. [1] report that their system, which achieves a P&R score of
91.5% for organization name identification when using dedicated gazetteers,
still obtains a P&R score of 85.5% without the specialized lists.

Evaluation measures for Information Extraction Systems [3]
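The P&R score quoted throughout appears to be the combined precision-recall F-measure used in the MUC evaluations [3]. With a weighting parameter β it combines precision P and recall R, and with β = 1 it reduces to their harmonic mean. As a worked check, using the Bayesian network results reported in this document (P = 72%, R = 84%):

```latex
\[
\text{P\&R} = \frac{(\beta^2 + 1)\,P\,R}{\beta^2 P + R},
\qquad \beta = 1 \;\Rightarrow\; \text{P\&R} = \frac{2PR}{P + R}
\]
\[
\text{P\&R} = \frac{2 \times 0.72 \times 0.84}{0.72 + 0.84}
            = \frac{1.2096}{1.56} \approx 0.775
\]
```

The same formula applied to the final system figures (P = 80%, R = 75%) gives 1.2/1.55 ≈ 77.4%; the reported 77.3% presumably comes from unrounded precision and recall values.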
Building on these findings, we report a company name identification system
that combines rule-based grammars with Bayesian networks. This system, which
does not use any list of company names, is part of a resume analyzer serving
one of the world’s foremost professional recruitment firms.
The goal of this system is to extract the names of the companies listed in
the career section of a resume.
Process description
The words that can be part of a company name are identified and tagged by an
automatically learned Bayesian network. These words are then used as seeds for
specialized grammar-based rules that find the full company names.
Bayesian Network construction
Each word is characterized by 21 variables that describe:
A corpus of 8,000 resumes was used to build the learning and test sets from
which BayesiaLab automatically learned the Bayesian network illustrated below,
using its “Sons&Spouses” supervised learning algorithm.
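BayesiaLab's Sons&Spouses algorithm is not reproduced here. As a much simpler stand-in, the sketch below trains a naive Bayes classifier, the most basic Bayesian network for supervised classification (the class variable is the only parent of every feature), to decide whether a word may belong to a company name. The binary features (`capitalized`, `after_at`, `legal_suffix_next`) and the toy training data are hypothetical illustrations, not the 21 variables of the actual system.

```python
import math
from collections import defaultdict


class NaiveBayesTagger:
    """Naive Bayes over binary word features (hypothetical stand-in for the
    learned Bayesian network; NOT the Sons&Spouses algorithm)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha  # Laplace smoothing constant
        self.class_counts = defaultdict(int)
        # feature_true[c][f] = number of class-c examples where feature f is True
        self.feature_true = defaultdict(lambda: defaultdict(int))
        self.features = set()

    def fit(self, examples):
        """examples: list of (features: dict[str, bool], is_company_word: bool)."""
        for feats, label in examples:
            self.class_counts[label] += 1
            for name, value in feats.items():
                self.features.add(name)
                if value:
                    self.feature_true[label][name] += 1
        return self

    def log_posterior(self, feats, label):
        """Unnormalized log P(label | feats) under the independence assumption."""
        total = sum(self.class_counts.values())
        n_c = self.class_counts[label]
        score = math.log((n_c + self.alpha) / (total + 2 * self.alpha))
        for name in self.features:
            p_true = (self.feature_true[label][name] + self.alpha) / (n_c + 2 * self.alpha)
            score += math.log(p_true if feats.get(name, False) else 1.0 - p_true)
        return score

    def predict(self, feats):
        """True if the word is more likely part of a company name than not."""
        return self.log_posterior(feats, True) > self.log_posterior(feats, False)


# Toy training data (hypothetical features and labels).
train = [
    ({"capitalized": True, "after_at": True, "legal_suffix_next": True}, True),
    ({"capitalized": True, "after_at": True, "legal_suffix_next": False}, True),
    ({"capitalized": True, "after_at": False, "legal_suffix_next": True}, True),
    ({"capitalized": True, "after_at": False, "legal_suffix_next": False}, False),
    ({"capitalized": False, "after_at": False, "legal_suffix_next": False}, False),
    ({"capitalized": False, "after_at": True, "legal_suffix_next": False}, False),
]
tagger = NaiveBayesTagger().fit(train)
print(tagger.predict({"capitalized": True, "after_at": True, "legal_suffix_next": True}))    # True
print(tagger.predict({"capitalized": False, "after_at": False, "legal_suffix_next": False}))  # False
```

A richer structure such as the one learned by BayesiaLab relaxes the independence assumption by adding dependencies between feature nodes; the words predicted `True` are what the next stage uses as seeds.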

The performance evaluation on the test set gives the following results:
Recall = 84%
Precision = 72%
P&R = 77.5%
Rule-based analysis
Specialized grammar-based rules are used to identify the full company name,
using as seeds the words tagged by the Bayesian network.
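The seed-expansion step can be sketched as follows. The rules shown (grow a seed through adjacent capitalized words, through connectors such as “&” or “of” when a capitalized word follows, and through legal forms such as “Ltd”) are hypothetical examples of grammar-based rules, not the grammars actually used by the system; the `seeds` flags stand in for the Bayesian network's output.

```python
# Hypothetical rule vocabulary for growing seeds into full company names.
CONNECTORS = {"&", "of", "and", "de"}
LEGAL_FORMS = {"Inc.", "Inc", "Ltd", "Ltd.", "LLC", "S.A.", "GmbH", "plc"}


def is_name_word(token):
    """A token that may continue a name: a legal form or a capitalized word."""
    return token in LEGAL_FORMS or token[:1].isupper()


def expand_seeds(tokens, seeds):
    """tokens: list[str]; seeds: list[bool] (True = tagged by the classifier).
    Returns non-overlapping (start, end) spans, end exclusive."""
    spans = []
    i, n = 0, len(tokens)
    while i < n:
        if not seeds[i]:
            i += 1
            continue
        start, end = i, i + 1
        # Rule 1: extend left over capitalized words.
        while start > 0 and is_name_word(tokens[start - 1]):
            start -= 1
        # Rule 2: extend right over name words, or over a connector
        # only when it is followed by another name word.
        while end < n:
            if is_name_word(tokens[end]):
                end += 1
            elif tokens[end] in CONNECTORS and end + 1 < n and is_name_word(tokens[end + 1]):
                end += 2
            else:
                break
        spans.append((start, end))
        i = end  # seeds inside this span are already covered
    return spans


def extract_names(tokens, seeds):
    return [" ".join(tokens[s:e]) for s, e in expand_seeds(tokens, seeds)]


tokens = ["Engineer", "at", "Acme", "Widgets", "&", "Sons", "Ltd", "from", "2001"]
seeds = [False, False, True, False, False, False, False, False, False]
print(extract_names(tokens, seeds))  # ['Acme Widgets & Sons Ltd']
```

Because rules only fire around seeds, capitalized words that are not company names (here, the job title “Engineer”) are not extracted unless the classifier tags them.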
The current system performance is as follows:
Recall = 75%
Precision = 80%
P&R = 77.3%
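Since the second stage outputs full names rather than tagged words, its scores are naturally computed over name spans. A minimal sketch, assuming exact-match scoring of `(resume_id, start, end)` spans (the matching criterion actually used is not stated here); the gold and predicted spans are invented toy data, not results of the system.

```python
def span_scores(gold, predicted):
    """Exact-match precision, recall and P&R (F-measure with beta = 1)."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)  # spans found with exactly the right boundaries
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Toy data: (resume_id, start_token, end_token) for each company name.
gold = {(1, 2, 7), (1, 15, 17), (2, 4, 6), (2, 20, 23)}
predicted = {(1, 2, 7), (1, 15, 16), (2, 4, 6)}
p, r, f = span_scores(gold, predicted)
print(f"Recall = {r:.0%}  Precision = {p:.0%}  P&R = {f:.1%}")
# Recall = 50%  Precision = 67%  P&R = 57.1%
```

Under exact matching, the truncated prediction `(1, 15, 16)` counts both as a false positive and as a false negative; a more lenient partial-match criterion would score it differently.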
References
[1] Andrei Mikheev, Marc Moens, and Claire Grover. 1999. "Named Entity
Recognition without Gazetteers". In Proceedings of EACL'99.
[2] GuoDong Zhou and Jian Su. 2002. "Named Entity Recognition using an
HMM-based Chunk Tagger". In Proceedings of the 40th Annual Meeting of the ACL.
[3] Ralph Grishman and Beth Sundheim. 1996. "Message Understanding
Conference-6: A Brief History". In Proceedings of the 16th International
Conference on Computational Linguistics (COLING-96), Copenhagen.
For more details regarding this work, please contact
Patrice Mellot