Fighting Spam

Lately I’ve been noticing spam messages that put the ad for whatever they’re trying to sell at the top, then fill the rest of the message with copied-and-pasted legitimate content, sometimes very technical. I have an idea that may help in coping with these.
Most emails out there take one of two forms:
1. They cover a lot of different themes but at a very high level.
2. They cover a specific theme and use some low-level terms.

Basically, you wouldn’t expect to find the terms ‘colonoscopy’ and ‘TCP’ in the same email, and if you do, something is probably up.

What if you built a dictionary of words found in emails and assigned each word both a level and a theme? The level would be 1 for high-level terms such as ‘doctor’ or ‘computers’ and 2 for lower-level terms. Themes could include medical and computers. If you find level 1 words from different themes in an email, that’s OK. However, if you find level 2 words from different themes in the same email, then something is probably up.
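
Here is a rough sketch of how that check might look in Python. Everything in it is an assumption for illustration: the tiny hand-written LEXICON, the function names low_level_themes and looks_suspicious, and the simple rule of flagging a message as soon as level 2 words from more than one theme show up. A real dictionary would need far more words and probably a smarter tokenizer.

```python
import re

# Hypothetical hand-built dictionary: word -> (theme, level).
# Level 1 = high-level term, level 2 = low-level (specialist) term.
LEXICON = {
    "doctor": ("medical", 1),
    "computers": ("computers", 1),
    "colonoscopy": ("medical", 2),
    "tcp": ("computers", 2),
}


def low_level_themes(text):
    """Return the set of themes that have at least one level 2 word in text."""
    words = re.findall(r"[a-z]+", text.lower())
    themes = set()
    for word in words:
        entry = LEXICON.get(word)
        if entry is not None and entry[1] == 2:
            themes.add(entry[0])
    return themes


def looks_suspicious(text):
    """Flag text that mixes level 2 words from more than one theme."""
    return len(low_level_themes(text)) > 1
```

With this sketch, a message that mentions both ‘colonoscopy’ and ‘TCP’ gets flagged, while one that only mentions ‘doctor’ and ‘computers’ does not, because level 1 words are ignored by the check.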

