Lately I’ve been noticing spam messages that put the ad for whatever they’re trying to sell at the top. The rest of the message is then a copy and paste of some legit content, sometimes very technical. I have an idea that may help in coping with these.
Most emails out there take one of two forms:
1. They cover a lot of different themes but at a very high level.
2. They cover a specific theme and use some low-level terms.
Basically, you wouldn’t expect to find the terms ‘colonoscopy’ and ‘TCP’ in the same email, and if you do, then something is probably up.
What if you built a dictionary of words found in emails and assigned each word both a theme and a level? Themes could be things like medical and computers. The level would be 1 for high-level terms such as ‘doctor’ or ‘computers’ and 2 for lower-level terms such as ‘colonoscopy’ or ‘TCP’. If you find level 1 words from different themes in an email, that’s OK. However, if you find level 2 words from different themes in the same email, then something is probably up.
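Here is a rough Python sketch of how that check might look. Everything in it is an assumption for illustration: the tiny hand-built `WORD_INFO` dictionary, the function names `low_level_themes` and `looks_suspicious`, and the rule that two or more themes among level 2 words is enough to flag a message. A real dictionary would need far more words and probably a softer threshold.

```python
import re
from typing import Dict, Set, Tuple

# Hypothetical dictionary: word -> (theme, level).
# Level 1 = high-level term, level 2 = low-level (specialist) term.
WORD_INFO: Dict[str, Tuple[str, int]] = {
    "doctor": ("medical", 1),
    "computers": ("computers", 1),
    "colonoscopy": ("medical", 2),
    "tcp": ("computers", 2),
}


def low_level_themes(text: str) -> Set[str]:
    """Return the themes of all level 2 words found in the text."""
    words = re.findall(r"[a-z]+", text.lower())
    themes = set()
    for word in words:
        info = WORD_INFO.get(word)
        if info is not None and info[1] == 2:
            themes.add(info[0])
    return themes


def looks_suspicious(text: str) -> bool:
    """Flag the text if its level 2 words span more than one theme."""
    return len(low_level_themes(text)) > 1
```

With this toy dictionary, `looks_suspicious("Ask your doctor about computers")` is false because only level 1 words appear, while `looks_suspicious("Colonoscopy results sent over TCP")` is true because two level 2 words from different themes show up together.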