## A Taxonomic Literature Review of Fraud Detection & Prevention

Within each paper found and included, I'm looking particularly for descriptions of techniques applied to either detect or disrupt fraud operations, and evaluations of their effectiveness. I have a secondary interest in locating any data sources that might be available. Studies that only characterise crime or estimate damages are out of scope.

Beals et al. (2015), of the Financial Fraud Research Center at Stanford, provide a systematic categorisation system for dealing with fraud, including a standardised coding scheme. Their definition of 'fraud against an individual' for the most part overlaps neatly with the one I'm working with for 'mass-marketing fraud'. This taxonomy provides an ideal basis for systematically categorising the literature, and gives some structure for further research. I will be omitting the leading '1' from the codes here, and will only use the additional tags where they are relevant. Studies are grouped below according to their Level 2 categorisation.

1. Consumer Investment Fraud - where someone knowingly misleads an investor using false information about securities, commodities etc.

2. Consumer Products and Services Fraud - fraud related to the purchase of tangible goods and services, with no intention of delivery.

• 2.1.1 Worthless products -- Leontiadis et al., 2011: Measuring and Analyzing Search-Redirection Attacks in the Illicit Online Prescription Drug Trade.

Detection/Prevention - As part of a characterisation study, the authors check three blacklisting services (Google Safebrowsing, SpamHaus and McAfee SiteAdvisor) against their own list of infected sites maliciously redirecting traffic to drug spam. They find that 95% of the infected sites do not appear on any blacklist, though about half of the redirection targets, and two-thirds of pharmacies in particular, appear on at least one. The authors note that infections on certain TLDs, such as .edu, are more persistent and should be targets for remediation, and further note that particular domain registrars and hosts are responsible for much of the traffic.
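The coverage check described above boils down to a set intersection. A minimal sketch of that calculation follows; the function name, blacklist labels and domains are all hypothetical placeholders of mine, not data or code from the paper.

```python
def blacklist_coverage(sites, blacklists):
    """Return the fraction of `sites` found on at least one blacklist.

    `sites` is an iterable of domain names; `blacklists` maps a
    (hypothetical) blacklist name to the set of domains it lists.
    """
    sites = set(sites)
    if not sites:
        return 0.0
    # Union of every domain listed anywhere, then intersect with our sites.
    listed_anywhere = set().union(*blacklists.values())
    return len(sites & listed_anywhere) / len(sites)


# Toy data only: four made-up infected sites, one of which is blacklisted.
infected = {"a.example.edu", "b.example.org", "c.example.com", "d.example.net"}
blacklists = {
    "list_a": {"c.example.com"},
    "list_b": {"c.example.com", "z.example.io"},
}

coverage = blacklist_coverage(infected, blacklists)
print(f"{coverage:.0%} of infected sites appear on at least one blacklist")
```

Running the same function separately over infected sites, redirection targets and pharmacies would give the three figures the authors compare.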

• 2.1.1 Worthless products -- Levchenko et al., 2011: Click Trajectories: End-to-End Analysis of the Spam Value Chain.

Detection/Prevention - Following a characterisation of the infrastructure behind online pharmaceuticals, software and replica sales campaigns represented by roughly 15 million URLs gathered by honeypots and followed up with purchases, the authors compare the efficacy of interventions at the domain registrar, hosting provider and bank. They note that ~10% of observed domains and hosts could be eliminated by one provider, but 60% of services could be hit by interventions at a single bank, with only 13 distinct payment processors making up a bottleneck which is ripe for enforcement. They also point out that switching costs are much higher for banks than for domains or hosts.
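The bottleneck comparison can be illustrated by asking, for each layer of the value chain, what share of services depends on the single most common provider. The sketch below assumes a simple record-per-service layout; the field names, provider names and counts are invented for illustration and are not the authors' data.

```python
from collections import Counter


def top_provider_share(records, layer):
    """Return (provider, share) for the most common provider at `layer`.

    `records` is a list of dicts, one per observed service, each naming
    the provider used at every layer (hypothetical schema).
    """
    counts = Counter(record[layer] for record in records)
    provider, n = counts.most_common(1)[0]
    return provider, n / len(records)


# Toy data only: five made-up services and the providers they rely on.
records = [
    {"registrar": "reg1", "host": "host1", "bank": "bankA"},
    {"registrar": "reg2", "host": "host2", "bank": "bankA"},
    {"registrar": "reg3", "host": "host3", "bank": "bankA"},
    {"registrar": "reg4", "host": "host1", "bank": "bankB"},
    {"registrar": "reg5", "host": "host4", "bank": "bankB"},
]

for layer in ("registrar", "host", "bank"):
    provider, share = top_provider_share(records, layer)
    print(f"{layer}: removing {provider} affects {share:.0%} of services")
```

In this toy data the bank layer is the most concentrated, which mirrors the shape of the authors' finding, though the real analysis also has to weigh how cheaply the operators can switch providers at each layer.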

• 2.1.1 Worthless products -- McCoy et al., 2012: Priceless: The Role of Payments in Abuse-advertised Goods.

Detection/Prevention - Building on Levchenko et al., the authors detail the payment infrastructure supporting spam, and describe interventions aimed at cutting off the flow of money to fraudsters. They continue the process from Levchenko et al. of buying goods, placing a total of 676 purchase orders (429 of them successful) against pharmaceutical and software sales merchants, enabling them to trace financial enablers. Their payments were processed through 30 banks, 25 of which appeared to be actively participating in the scam affiliate programs. Alongside this, they report on the impact of measures taken by industry and government to enable targeted complaints via the International Anti-Counterfeiting Coalition, which then identifies merchants and manages processes through the card networks. They find evidence that complaints enforced in this manner are effective: banks targeted by complaints are less likely to continue processing transactions for spam affiliates, and affiliates themselves confirm that these efforts are disrupting business.

• 2.2.9 Phishing -- Grier et al., 2010: @spam: The Underground on 140 Characters or Less.

Detection/Prevention - Multiple studies on a large dataset of Twitter accounts and shared URLs:

• The authors evaluate whether URL blacklists are an effective means of blocking Twitter spam. They measure the time between a malicious URL appearing on Twitter and it appearing on a blacklist (Google Safebrowsing, URIBL or Joewein). They find that the blacklists lag behind Twitter by 4-20 days, while 90% of click-throughs occur during the first two days after a link is posted, and so conclude that URL blacklists would be ineffective. They also describe evasion techniques scammers might use, pointing out that 98% of the observed malicious tweets (39% of distinct malicious URLs) use shortened URLs, which are not caught by Google's Safebrowsing system.
• The authors characterise the accounts behind the spam. They distinguish between 'career' spam accounts and compromised accounts repurposed for spam, based on the mix of benign and malicious content. They attempt to classify the former group based on a combination of a chi-squared test on the tweet timestamps and the entropy of tweet text and links (i.e. looking for repetitive posts at predictable times). Evaluating on 43,000 accounts which posted at least one spam link, they find 16% to be career spammers, but in manual evaluation find that a further 5-12% (95% confidence interval) of bad accounts are clearly career spammer accounts that the method missed. Taking the lower bound, this indicates at best a recall of 16 / (16 + 5) ≈ 0.76 for this method; at the upper bound it falls to 16 / (16 + 12) ≈ 0.57.
• The authors deploy a clustering method to aggregate malicious accounts into campaigns. Accounts that share a link are merged into one campaign, and the merging repeats until no further shared links remain. By this measure, 10-20% of campaigns used more than one account.
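
The campaign clustering in the last bullet amounts to finding connected components over accounts linked by shared URLs. Below is a minimal sketch using a union-find structure; this is my own choice of technique for illustration, the paper's exact procedure may differ, and the account names and URLs are made up.

```python
from collections import defaultdict


def cluster_campaigns(posts):
    """Group accounts into campaigns from (account, url) pairs.

    Two accounts land in the same campaign if they posted the same URL,
    directly or through a chain of other accounts (transitive merging).
    """
    parent = {}

    def find(x):
        # Register unseen accounts as their own root, then walk to the
        # root with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_account_for_url = {}
    for account, url in posts:
        find(account)  # make sure single-post accounts are registered
        if url in first_account_for_url:
            union(first_account_for_url[url], account)
        else:
            first_account_for_url[url] = account

    campaigns = defaultdict(set)
    for account in list(parent):
        campaigns[find(account)].add(account)
    return list(campaigns.values())


# Toy data only: alice, bob and carol are chained together via u1 and u2.
posts = [
    ("alice", "u1"),
    ("bob", "u1"),
    ("bob", "u2"),
    ("carol", "u2"),
    ("dave", "u3"),
]

campaigns = cluster_campaigns(posts)
multi_account = sum(1 for c in campaigns if len(c) > 1) / len(campaigns)
print(sorted(sorted(c) for c in campaigns))
print(f"{multi_account:.0%} of campaigns use more than one account")
```

One design consequence worth noting is that a single widely shared link is enough to fuse otherwise unrelated accounts into one campaign, so the multi-account share from this kind of clustering is sensitive to how URLs are normalised before comparison.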
