Bayesian Networks and Data Mining Tools


Bayesian Networks for Organization Name Identification

Patrice Mellot
Consultant

Setting up dedicated dictionaries, such as lists of organizations or other named entities, is a time-consuming process. Even when such material is available, keeping it up to date can be very tedious. Studies published in the specialized literature [1, 2] have shown that such lists are not needed to achieve named entity recognition with good accuracy.

Mikheev et al. [1] report that their system achieves a P&R score of 91.5% for organization name identification when it uses dedicated gazetteers, and still obtains a P&R score of 85.5% without the specialized lists.

Evaluation measures for Information Extraction Systems [3]

In line with this study, we report on a company name identification system that combines rule-based grammars with Bayesian networks. This system, which does not use any company name list, is part of a resume analyzer serving one of the world’s foremost professional recruitment firms.

The goal of this system is to extract the company names that appear in the career section of the resume.

Process description

Words that can be part of a company name are identified and tagged by an automatically learned Bayesian network. These words are then used as seeds for specialized grammar-based rules that find the full company names.

Bayesian Network construction

Each word is characterized by 21 variables that describe:

  • The word structure (uppercase, mixed case, lowercase)
  • Whether the word appears in a common-word dictionary
  • Part-of-speech information about the word
  • Information about the preceding and following words
  • Statistical data about the line of text where the word appears
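
To make the kind of variables listed above more concrete, here is a minimal Python sketch that computes a handful of them for one word. It is an illustration only, not the 21 variables of the actual system: the function names, the tiny `COMMON_WORDS` set and the chosen line statistics are all assumptions, and part-of-speech information is left out because it would require a tagger.

```python
# Hypothetical stand-in for a real common-word dictionary.
COMMON_WORDS = {"the", "of", "and", "at", "for", "in", "engineer", "manager"}


def case_shape(word):
    """Classify the word structure as uppercase, lowercase, mixed or other."""
    if not any(c.isalpha() for c in word):
        return "other"  # numbers, punctuation, etc.
    if word.isupper():
        return "uppercase"
    if word.islower():
        return "lowercase"
    return "mixed"


def word_features(tokens, i):
    """Return a few illustrative variables for tokens[i] within one line."""
    word = tokens[i]
    prev_word = tokens[i - 1] if i > 0 else None
    next_word = tokens[i + 1] if i + 1 < len(tokens) else None
    n_capitalized = sum(1 for t in tokens if t[:1].isupper())
    return {
        "shape": case_shape(word),
        "in_dictionary": word.lower() in COMMON_WORDS,
        "prev_shape": case_shape(prev_word) if prev_word else "none",
        "next_shape": case_shape(next_word) if next_word else "none",
        # Assumed examples of statistics about the line of text.
        "line_length": len(tokens),
        "line_capitalized_ratio": n_capitalized / len(tokens),
    }


line = "Software engineer at Acme Systems".split()
features = word_features(line, 3)  # variables describing "Acme"
print(features)
```

Each dictionary produced this way would correspond to one row of the learning or test set, with the target variable (word belongs to a company name or not) added from the annotated corpus.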

A corpus of 8,000 resumes was used to build the learning and test sets. From these sets, BayesiaLab automatically learned the Bayesian network illustrated below with the “Sons&Spouses” supervised learning algorithm.

The performance evaluation on the test set returns the following results:
Recall = 84%
Precision = 72%
P&R = 77.5%
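
The P&R figure above is consistent with the balanced F-measure used in the MUC evaluations [3], that is, the harmonic mean of precision and recall. Assuming this is how the score was computed, the following Python sketch reproduces it from the reported precision and recall. The function name and the `beta` parameter are illustrative choices.

```python
def f_measure(precision, recall, beta=1.0):
    """F-measure; beta=1 weights precision and recall equally (P&R)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# Word-level results reported for the Bayesian network on the test set.
score = f_measure(precision=0.72, recall=0.84)
print(f"P&R = {score:.1%}")  # prints "P&R = 77.5%"
```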

Rule based analysis

Specialized grammar-based rules identify the full company name, using the words tagged by the Bayesian network as seeds.
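
The actual grammar rules are not described here, so the following Python sketch only illustrates the general idea of growing a full name around a seed word. The rule it uses (extend over adjacent capitalized tokens, optionally bridging a connector such as "&") is a deliberately simple assumption, and all names in the code are hypothetical.

```python
# Hypothetical connectors allowed inside a company name.
CONNECTORS = {"&", "and", "of"}


def is_name_like(token):
    """Assumed rule: a token starting with an uppercase letter may extend a name."""
    return token[:1].isupper()


def expand_seed(tokens, seed):
    """Grow a (start, end) span around tokens[seed] over adjacent name-like tokens."""
    start = end = seed
    while start > 0 and is_name_like(tokens[start - 1]):
        start -= 1
    while end + 1 < len(tokens):
        nxt = tokens[end + 1]
        if is_name_like(nxt):
            end += 1
        elif (nxt in CONNECTORS and end + 2 < len(tokens)
              and is_name_like(tokens[end + 2])):
            end += 2  # bridge the connector, e.g. "Smith & Sons"
        else:
            break
    return start, end


def company_names(tokens, seeds):
    """Expand every seed index and merge seeds that yield the same span."""
    spans = sorted({expand_seed(tokens, s) for s in seeds})
    return [" ".join(tokens[a:b + 1]) for a, b in spans]


tokens = "2001-2004 : project manager at Acme Data Systems , Paris".split()
names = company_names(tokens, seeds=[6])  # seed: the word "Data"
print(names)  # ['Acme Data Systems']
```

In this sketch a single tagged word is enough to recover the whole name, which is why seed recall matters more than tagging every word of the name.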

The current performance of the full system is given below:
Recall = 75%
Precision = 80%
P&R = 77.3%

References

[1] Andrei Mikheev, Marc Moens, and Claire Grover. 1999. "Named Entity Recognition without Gazetteers". In Proceedings of EACL'99.
[2] GuoDong Zhou and Jian Su. 2002. "Named Entity Recognition using an HMM-based Chunk Tagger". In Proceedings of the 40th Annual Meeting of the ACL.
[3] Ralph Grishman and Beth Sundheim. 1996. "Message Understanding Conference - 6: A Brief History". In Proceedings of the 16th International Conference on Computational Linguistics, Copenhagen.

 

For more details regarding this work, please contact Patrice Mellot

© 2001-2008 Bayesia SA.
All rights reserved.