This research develops a dataset of Craigslist postings collected from mid-November to late December 2010. The purpose of the dataset is to support subsequent analysis for evidence of "scams," that is, fraudulent postings, and to provide a basis for testing algorithms for automatic scam identification. Such scams are expected to be most common during the high-volume period targeted for collection.

Project Report

Online auction and shopping websites, such as Craigslist, are used by millions of people worldwide to buy and sell a variety of goods and services. Internet fraudsters looking for potential victims have taken note of this phenomenon. Specifically, fraud and spam are prevalent across virtually every section of Craigslist, including automobiles, tickets, housing, jobs, and services. To our knowledge, the academic research community has done very little work on the prevalence of fraudulent activity on Craigslist. Consequently, hardly any approaches to defend against these ills have been proposed. We used this seed funding to conduct research addressing both of these gaps.

We looked at automobile fraud in the "for sale by owner" section. The fraudulent posts targeted many makes and models, but certain makes and models were more popular than others. Fraudsters posted ads for the same vehicle in many different cities within a short span of time. They rotated images, morphed the title and text to make their posts look different, and almost always offered vehicles at prices far below market value. In addition, the body of a post sometimes consisted solely of an image containing information that Craigslist might advise its users to look for, such as phone numbers and addresses. Overall, we felt that fraudsters were taking care to evade Craigslist's warnings.

After seeking IRB approval, we corresponded with several posters of fraudulent ads. Their stated reasons for selling the vehicle at a discounted price included military service, divorce, and sick children. Seeing the vehicle in person was never an option, because their responses said that the vehicle was parked far away. However, they claimed to have transportation credit from eBay or a similarly credible-sounding source, which would let them ship the vehicle at no extra charge to us. We were even told that we could return the vehicle if we were dissatisfied.

We found that only a few of the fraudulent posts were flagged for deletion by Craigslist. Using the features of fraudulent posts that we had identified, we trained a support vector machine (SVM) classifier, which was quite successful at finding the remaining fraudulent posts with high accuracy. This project opened multiple avenues for future investigation of fraud, spam, and automated activity on Craigslist, each of which diminishes its utility as an electronic marketplace. Additionally, it funded two graduate students and led to one MS dissertation.
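The fraud indicators described above (the same vehicle advertised in several cities, a price far below market value, and a body consisting only of an image) could be turned into numeric features before training a classifier. The following is a minimal sketch under stated assumptions, not the feature set or code used in this project: the `Post` record, its field names, the `market_price` estimate, and the title normalization are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Post:
    # Hypothetical record layout; the report does not describe the dataset schema.
    city: str
    title: str
    body_text: str       # text of the post body, excluding images
    image_count: int
    price: float
    market_price: float  # assumed external estimate of the vehicle's market value


def normalize_title(title: str) -> str:
    """Lowercase the title, keep only alphanumeric tokens, and sort them,
    so that lightly reworded or reordered titles map to the same key."""
    return " ".join(sorted(re.findall(r"[a-z0-9]+", title.lower())))


def extract_features(post: Post, all_posts: list[Post]) -> dict:
    """Compute three illustrative fraud indicators for a single post."""
    key = normalize_title(post.title)
    # Distinct cities in which a post with the same normalized title appears.
    cities = {p.city for p in all_posts if normalize_title(p.title) == key}
    # Asking price relative to the assumed market value (1.0 if unknown).
    ratio = post.price / post.market_price if post.market_price > 0 else 1.0
    return {
        "cities_with_same_title": len(cities),
        "price_ratio": ratio,
        "image_only_body": post.image_count > 0 and not post.body_text.strip(),
    }
```

Feature dictionaries of this kind could then be converted to vectors and passed to an SVM implementation. A real pipeline would likely need a more robust duplicate detector, since the posters described above also rotated images and rewrote their text.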

National Science Foundation (NSF)
Division of Computer and Network Systems (CNS)
Standard Grant (Standard)
Program Officer: Samuel M. Weber
Indiana University
United States