Understanding Persuasion in Fraud Transcripts

A central question about advance-fee fraud, and about other fraud in which the victim has to trust a relative stranger, is how exactly the con artist persuades the victim to do so. The victim is usually offered some material gain, but this is not a sufficient explanation -- people know that scams exist, and may even have been scammed before, yet are still talked into handing over money. Understanding the linguistic and psychological mechanisms behind this matters for building detection systems and designing intervention strategies.

Literature Review

The review is restricted to advance-fee fraud emails, and to those aspects of their content, victims and perpetrators that bear on the persuasion of the victim. The lack of standard terminology complicates the search -- authors refer to 'spam' or 'phishing' and sometimes include advance-fee fraud under those labels.

Annotated Scambaiter Corpus

Scambaiters are people who deliberately waste the time of advance-fee scammers. Typically, they respond to a mass-marketed solicitation, pretending to be a duped victim, and then lead the scammer through a series of absurd hurdles and delays, often trying to extract some token gesture such as a ridiculous photograph of the scammer. Though they occasionally forward information about scammers' bank details, hosting providers, etc. to the authorities, they are for the most part vigilantes whose aims are low-level disruption of advance-fee operations and their own amusement.

The amusement motive is quite important, because it means that scambaiters publish full transcripts of their conversations with scammers. While various researchers have collected examples of scam solicitation messages, relatively little attention has been given to the rest of the exchange between scammer and victim, in part because this data is difficult to obtain. Scambaiters are not victims -- in many ways they more closely resemble the scammers themselves -- but their transcripts nonetheless give some insight into how these conversations proceed.

In addition to working with a limited amount of victim data (which cannot be released), I have been collecting scambaiting exchanges from various online sources and processing them into a common format. The main hurdle is the varied, sometimes unclear formatting that scambaiters use, which makes it hard to automate the processing or to do any analysis that is not manual. So far I have worked on data from:

Other data: