<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Sarel, a Pragmatic Programmer &#187; Ideas</title>
	<atom:link href="http://blog.botha.us/sarel/category/ideas/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.botha.us/sarel</link>
	<description></description>
	<lastBuildDate>Thu, 14 Jan 2021 12:46:10 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>https://wordpress.org/?v=4.2.38</generator>
	<item>
		<title>Fighting Spam</title>
		<link>http://blog.botha.us/sarel/fighting-spam/</link>
		<comments>http://blog.botha.us/sarel/fighting-spam/#comments</comments>
		<pubDate>Sat, 16 Jun 2007 17:28:51 +0000</pubDate>
		<dc:creator><![CDATA[sarel]]></dc:creator>
				<category><![CDATA[Ideas]]></category>

		<guid isPermaLink="false">http://blog.botha.us/sarel/?p=13</guid>
		<description><![CDATA[I&#8217;m noticing some spam messages these days that contain at the top the ad of what they&#8217;re trying to sell to you. Then the rest of the message is a copy and paste of some legit content, sometimes very technical. I have an idea that may help in coping with these. Most emails out there [&#8230;]]]></description>
				<content:encoded><![CDATA[<p>I&#8217;ve been noticing spam messages lately that open with an ad for whatever they&#8217;re trying to sell you. The rest of the message is then a copy and paste of some legit content, sometimes very technical. I have an idea that may help in coping with these.<br />
Most emails out there are of one of two forms:<br />
1. They cover a lot of different themes but at a very high level.<br />
2. They cover a specific theme and use some low-level terms.</p>
<p>Basically, you wouldn&#8217;t find the terms colonoscopy and TCP in the same email and if you do then something is probably up.</p>
<p>What if you built a dictionary of words found in e-mails and assigned each word both a level and a theme? The level would be 1 for high-level terms such as &#8216;doctor&#8217; or &#8216;computers&#8217; and 2 for lower-level terms. Themes could include medical and computers. If you find level 1 words from different themes in an email, that&#8217;s ok. However, if you find level 2 words from different themes in the same email, then something is probably up.</p>
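<p>Here is a rough sketch of how that check might look in Python. The tiny word list, the names LEXICON, low_level_themes and looks_suspicious, and the rule of flagging a message once two themes show up at level 2 are all assumptions made up for illustration. A real dictionary would have to be built from actual mail.</p>

```python
import re

# Hypothetical dictionary: word -> (theme, level).
# Level 1 is a high-level term, level 2 is a low-level (specialised) term.
LEXICON = {
    "doctor": ("medical", 1),
    "colonoscopy": ("medical", 2),
    "computers": ("computers", 1),
    "tcp": ("computers", 2),
}


def low_level_themes(text):
    """Return the set of themes that have at least one level 2 word in text."""
    themes = set()
    for word in re.findall(r"[a-z]+", text.lower()):
        entry = LEXICON.get(word)
        if entry is not None and entry[1] == 2:
            themes.add(entry[0])
    return themes


def looks_suspicious(text):
    """Flag text that mixes level 2 words from more than one theme."""
    return len(low_level_themes(text)) > 1
```

<p>With this word list, looks_suspicious("Cheap colonoscopy today! TCP retransmission timers explained") comes back True, while a message that only mentions doctor and computers does not get flagged.</p>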
]]></content:encoded>
			<wfw:commentRss>http://blog.botha.us/sarel/fighting-spam/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
