Custom Named Entity Recognition (Part 2)

By Ashwin Venugopal - June 23, 2026

Azure AI Language Project Life Cycle

Creating an entity extraction model typically follows a similar path to most Azure AI Language service features:

Define entities: Understand the data and the entities you want to identify, and define them as clearly as possible. For example, decide exactly which parts of a bank statement you want to extract.

Tag data: Label, or tag, your existing data to indicate which text in your dataset corresponds to which entity. Do this step precisely and thoroughly, because any incorrect or missed labels will reduce the effectiveness of the trained model. A good variety of input documents is also useful. For example, label the bank name, customer name, customer address, specific loan or account terms, loan or account amount, and account number.

Train model: Train your model once your entities are labeled. Training teaches your model how to recognize the entities you label.

View model: After your model is trained, view its results. This page includes a score from 0 to 1 (the F1 score), which combines the precision and recall measured on the test data. Here, you can see which entities worked well (such as customer name) and which need improvement (such as account number).

Improve model: Improve your model by reviewing which entities were not identified and which were extracted incorrectly, then work out what data you need to add to the training set to improve performance. This page shows you how entities failed, and which entities (such as account number) need to be differentiated from other similar entities (such as loan amount).

Deploy model: Once your model performs as desired, deploy it to make it available via the API. For example, once the model is deployed, you can send requests to it to extract bank statement entities.

Extract entities: Use your model for extracting entities.
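For the Tag data step, labels are usually stored as character spans over each document, and small offset mistakes are easy to miss. The sketch below uses a hypothetical, simplified label structure (category, offset, length, plus the text the labeler meant to tag) rather than the exact file format Language Studio exports, and checks for two common tagging mistakes: spans that do not cover the intended text, and overlapping spans.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Label:
    # Hypothetical, simplified label: entity category plus a character span.
    category: str
    offset: int
    length: int
    expected_text: str  # the text the labeler intended to tag


def validate_labels(text: str, labels: list[Label]) -> list[str]:
    """Return human-readable problems found in one document's labels."""
    problems = []
    spans = sorted(labels, key=lambda label: label.offset)
    for label in spans:
        end = label.offset + label.length
        if label.offset < 0 or end > len(text):
            problems.append(f"{label.category}: span {label.offset}-{end} is out of range")
            continue
        actual = text[label.offset:end]
        if actual != label.expected_text:
            problems.append(
                f"{label.category}: expected {label.expected_text!r}, span covers {actual!r}"
            )
    # Overlapping spans usually point to a double-tagged or ambiguous region.
    for prev, cur in zip(spans, spans[1:]):
        if cur.offset < prev.offset + prev.length:
            problems.append(f"{prev.category} overlaps {cur.category}")
    return problems


text = "Contoso Bank\nCustomer: Jane Doe\nAccount number: 12345678"
labels = [
    Label("BankName", 0, 12, "Contoso Bank"),
    Label("CustomerName", 23, 8, "Jane Doe"),
    Label("AccountNumber", 48, 7, "12345678"),  # length is off by one
]
for problem in validate_labels(text, labels):
    print(problem)
```

Running a check like this over every document before training catches labels that would otherwise quietly teach the model the wrong boundaries.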
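For the View model step, it helps to know how a single 0 to 1 score can summarize precision and recall. The standard way to combine them is the F1 score, their harmonic mean. This minimal sketch assumes you have entity-level counts for one entity type: true positives (correctly extracted), false positives (extracted but wrong), and false negatives (missed). The counts in the usage lines are made up for illustration.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 from entity-level counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Harmonic mean: low if either precision or recall is low.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical test-set counts for two entity types.
print("customer name:", precision_recall_f1(tp=45, fp=3, fn=2))
print("account number:", precision_recall_f1(tp=20, fp=10, fn=15))
```

Because F1 is a harmonic mean, an entity that is extracted often but wrongly (low precision) or rarely (low recall) cannot hide behind a good score on the other measure.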
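For the Improve model step, a quick way to see which entity types get confused with each other (such as account number versus loan amount) is to compare your own labels with the model's predictions. This sketch keys entities by hypothetical (offset, length) tuples and only counts exact span matches, which is a simplifying assumption; the spans and categories in the usage lines are invented.

```python
from collections import Counter

Span = tuple[int, int]  # hypothetical representation: (offset, length)


def entity_errors(gold: dict[Span, str], predicted: dict[Span, str]) -> Counter:
    """Tally error types by comparing gold and predicted entity spans.

    Keys of the returned Counter:
      ("missed", category)            gold entity with no prediction at that span
      ("spurious", category)          prediction with no gold entity at that span
      ("confused", gold, predicted)   same span, different category
    """
    errors = Counter()
    for span, gold_cat in gold.items():
        pred_cat = predicted.get(span)
        if pred_cat is None:
            errors[("missed", gold_cat)] += 1
        elif pred_cat != gold_cat:
            errors[("confused", gold_cat, pred_cat)] += 1
    for span, pred_cat in predicted.items():
        if span not in gold:
            errors[("spurious", pred_cat)] += 1
    return errors


gold = {(10, 8): "AccountNumber", (40, 6): "LoanAmount", (60, 8): "CustomerName"}
predicted = {(10, 8): "LoanAmount", (40, 6): "LoanAmount", (75, 4): "BankName"}
for error, count in entity_errors(gold, predicted).most_common():
    print(error, count)
```

Frequent ("confused", ...) pairs suggest the two entities need more contrasting examples, while frequent "missed" entries suggest the entity simply needs more labeled data.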
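For the Deploy model and Extract entities steps, the following sketch builds (but does not send) a request to a deployed custom NER model using only the Python standard library. The endpoint path, `api-version` value, and JSON field names reflect our understanding of the Azure AI Language analyze-text jobs REST API and should be checked against the current documentation; the project name, deployment name, task name, and the placeholders in angle brackets are hypothetical.

```python
import json
import urllib.request

API_VERSION = "2022-05-01"  # assumption: confirm the current version in the docs


def build_custom_ner_request(endpoint: str, key: str, project: str,
                             deployment: str, texts: list[str]) -> urllib.request.Request:
    """Build (but do not send) an analyze-text job request for a custom NER deployment."""
    body = {
        "displayName": "Extract bank statement entities",
        "analysisInput": {
            "documents": [
                {"id": str(i), "language": "en", "text": t}
                for i, t in enumerate(texts, start=1)
            ]
        },
        "tasks": [
            {
                "kind": "CustomEntityRecognition",
                "taskName": "bank-statement-ner",  # hypothetical task name
                "parameters": {"projectName": project, "deploymentName": deployment},
            }
        ],
    }
    url = f"{endpoint.rstrip('/')}/language/analyze-text/jobs?api-version={API_VERSION}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Ocp-Apim-Subscription-Key": key, "Content-Type": "application/json"},
        method="POST",
    )


req = build_custom_ner_request(
    "https://<your-resource>.cognitiveservices.azure.com",  # placeholder endpoint
    "<your-key>",                                           # placeholder key
    "bank-statements",                                      # hypothetical project
    "production",                                           # hypothetical deployment
    ["Contoso Bank statement. Account number: 12345678"],
)
print(req.get_method(), req.full_url)
# Passing req to urllib.request.urlopen() would submit the job. The service runs
# it asynchronously, so the results are fetched afterwards from the job status URL.
```

Keeping request construction separate from sending makes it easy to inspect the payload before pointing it at a real resource.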

Considerations for Data Selection and Refining Entities

For optimal performance, you need both high-quality training data and properly defined entity types.

High-quality data lets you spend less time tweaking and get better outcomes from your model.

Distribution - use the appropriate distribution of document types. A more diverse training dataset helps your model avoid learning incorrect relationships in the data.

Accuracy - use data that is as close to real-world data as possible. Fake data works to start the training process, but it will likely differ from real data in ways that can cause your model to extract incorrectly.
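One way to act on the distribution advice is to audit the labeled dataset before training. This minimal sketch assumes each document is summarized as a hypothetical dict with a document type and a list of labeled entity categories; the 10% share and 10-instance thresholds are illustrative assumptions, not Azure requirements.

```python
from collections import Counter


def distribution_report(docs: list[dict], min_share: float = 0.1,
                        min_entity_count: int = 10) -> list[str]:
    """Flag under-represented document types and entity types.

    Each doc is a hypothetical dict: {"doc_type": str, "entities": [category, ...]}.
    """
    if not docs:
        return ["dataset is empty"]
    warnings = []
    total = len(docs)
    type_counts = Counter(d["doc_type"] for d in docs)
    for doc_type, n in sorted(type_counts.items()):
        if n / total < min_share:
            warnings.append(f"doc type {doc_type!r}: {n}/{total} documents")
    entity_counts = Counter(e for d in docs for e in d["entities"])
    for entity, n in sorted(entity_counts.items()):
        if n < min_entity_count:
            warnings.append(f"entity {entity!r}: only {n} labeled instance(s)")
    return warnings


# Invented example: 19 checking statements and a single loan statement.
docs = [{"doc_type": "checking", "entities": ["BankName", "CustomerName", "AccountNumber"]}] * 19
docs.append({"doc_type": "loan", "entities": ["BankName", "CustomerName", "LoanAmount"]})
for warning in distribution_report(docs):
    print(warning)
```

A report like this makes it obvious when one document type dominates the training set, which is exactly the situation in which a model can learn incorrect relationships.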

Additionally, entities must be well thought out and as clearly specified as possible. Avoid ambiguous entities (such as two names next to each other on a bank statement), as they make it difficult for the model to tell the entities apart. If some ambiguous entities are necessary, give your model more examples to learn from so it can distinguish between them.

Keeping your entities distinct also goes a long way toward improving your model's performance. For example, extracting something as broad as "Contact info", which could be a phone number, social media handle, or email address, would require many examples to teach your model accurately. Instead, break it down into more specific entities such as "Phone", "Email", and "Social media", and let the model classify whichever sort of contact information it finds.
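If you have already labeled data with a broad "Contact info" entity, the hypothetical helper below sketches one way to pre-sort those spans into the narrower "Phone", "Email", and "Social media" entities before re-tagging. The regular expressions are rough heuristics chosen for illustration, not anything the service provides, and anything they cannot classify is flagged for a human to review.

```python
import re

# Hypothetical heuristics for re-tagging spans previously labeled "Contact info".
# Patterns are deliberately simple; checked in order, first match wins.
PATTERNS = [
    ("Email", re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$")),
    ("Phone", re.compile(r"^\+?[\d\s().-]{7,}$")),
    ("Social media", re.compile(r"^@\w{1,30}$")),
]


def split_contact_info(span_text: str) -> str:
    """Return a specific entity name for a 'Contact info' span, or flag it for review."""
    candidate = span_text.strip()
    for entity, pattern in PATTERNS:
        if pattern.match(candidate):
            return entity
    return "NeedsReview"  # leave unclear spans for a human labeler


for span in ["jane.doe@contoso.com", "+1 (425) 555-0100", "@janedoe", "see website"]:
    print(f"{span!r} -> {split_contact_info(span)}")
```

The output is only a starting point for relabeling: the point of the narrower entities is that the model, not a fixed rule, learns to tell them apart from the labeled examples.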

Conclusion

In this post, we covered the Azure AI Language project life cycle, as well as considerations for selecting data and refining entities.
