
AIKIDO-2026-10401

pythainlp is vulnerable to Deserialization of Untrusted Data

Deserialization of Untrusted Data (Pre-CVE)
Found by Aikido Intel before public disclosure or CVE publication.
Published Mar 19, 2026

Severity score: 85 (High Risk)

This Affects:

Package: pythainlp (Python)
Affected versions: 0.0.1 - 5.3.0
Fixed in: 5.3.1

TL;DR

The library loads model and vocabulary data from pickle files in thai2fit and w2p, as well as in corpus-loading paths. Unpickling can execute arbitrary code when the file content is untrusted or attacker-influenced (for example, a malicious corpus or model file). Before the fix, an attacker who could supply or modify one of these pickle files could run code in the process that loads it. The fix removes pickle from these code paths: thai2fit now uses JSON for its vocabulary, w2p uses NumPy .npz files for its models, and corpus loading validates fields before processing them.
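Why pickle is the problem here: a pickle stream names the callables that Python should invoke while loading it, so whoever controls the file controls what runs. The sketch below is a generic illustration, not pythainlp's code; `Payload`, `save_vocab_json`, `load_vocab_json`, and the file names are hypothetical. It shows a harmless payload whose `__reduce__` makes `pickle.loads` call `eval`, followed by a JSON vocabulary loader in the spirit of the thai2fit fix, with a shape check before the data is used.

```python
import json
import os
import pickle
import tempfile


class Payload:
    """Hypothetical stand-in for attacker-crafted data."""

    def __reduce__(self):
        # __reduce__ tells pickle which callable to invoke, with which
        # arguments, when the bytes are loaded. A real attacker could name
        # os.system or any other importable callable; eval on a harmless
        # expression is enough to show that code runs during loading.
        return (eval, ("6 * 7",))


malicious_bytes = pickle.dumps(Payload())
# pickle.loads calls eval("6 * 7") and returns its result,
# not a Payload instance.
loaded = pickle.loads(malicious_bytes)


def save_vocab_json(vocab, path):
    """Write a vocabulary (a list of token strings) as plain JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(vocab, f, ensure_ascii=False)


def load_vocab_json(path):
    """Read a JSON vocabulary and validate its shape before returning it.

    json.load only builds dicts, lists, strings, numbers, booleans and None,
    so parsing the file cannot invoke arbitrary callables the way pickle can.
    """
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    if not isinstance(data, list) or not all(isinstance(t, str) for t in data):
        raise ValueError("vocabulary file must be a JSON list of strings")
    return data


# Usage: round-trip a small vocabulary, then confirm a malformed file is rejected.
with tempfile.TemporaryDirectory() as tmp:
    vocab_path = os.path.join(tmp, "vocab.json")
    save_vocab_json(["สวัสดี", "ครับ"], vocab_path)
    vocab = load_vocab_json(vocab_path)

    bad_path = os.path.join(tmp, "bad.json")
    with open(bad_path, "w", encoding="utf-8") as f:
        json.dump({"not": "a list"}, f)
    try:
        load_vocab_json(bad_path)
        rejected = False
    except ValueError:
        rejected = True
```

The shape check matters because JSON removes the code-execution risk but not the risk of unexpected data; validating fields before use mirrors what the advisory describes for corpus loading.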

Who does this affect?

You are affected if you use pythainlp 0.0.1 through 5.3.0. Exposure is greatest where thai2fit, w2p, or corpus data could be supplied or modified by an attacker.
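To check an environment against the vulnerable range in this advisory (0.0.1 through 5.3.0, fixed in 5.3.1), a small standard-library sketch like the following may help. The helper names are hypothetical, and the version parsing is deliberately simplified: it keeps only the leading numeric release, so a pre-release such as 5.3.1rc1 is treated as 5.3.1. For exact PEP 440 ordering, the `packaging` library's `Version` class is the better tool.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# First fixed release according to this advisory; anything older is affected.
FIRST_FIXED = (5, 3, 1)


def release_tuple(ver):
    """Parse the leading numeric release segment, e.g. '5.3.0' -> (5, 3, 0).

    Simplified: suffixes such as 'rc1', '.dev0' or '.post1' are dropped.
    """
    m = re.match(r"\d+(?:\.\d+)*", ver)
    if m is None:
        raise ValueError(f"unrecognised version string: {ver!r}")
    nums = [int(p) for p in m.group(0).split(".")]
    # Pad so that '5.3' compares like '5.3.0'.
    nums += [0] * (3 - len(nums))
    return tuple(nums)


def is_affected(ver):
    """True if the version string falls before the first fixed release."""
    return release_tuple(ver) < FIRST_FIXED


def installed_pythainlp_status():
    """Report whether the installed pythainlp (if any) is in the affected range."""
    try:
        ver = version("pythainlp")
    except PackageNotFoundError:
        return "pythainlp is not installed"
    if is_affected(ver):
        return f"pythainlp {ver}: affected, upgrade to 5.3.1 or later"
    return f"pythainlp {ver}: not affected by this issue"


if __name__ == "__main__":
    print(installed_pythainlp_status())
```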

Background info

pythainlp versions 0.0.1 - 5.3.0 deserialize pickle data in the thai2fit, w2p, and corpus-loading code paths. This is classified as Deserialization of Untrusted Data (CWE-502).

How to fix this

Upgrade pythainlp to version 5.3.1 or later, for example with: pip install --upgrade "pythainlp>=5.3.1"