Florida Man Headlines Generated With Machine Learning


One of the headlines below is real; the rest were generated by a text generation model. Try to guess the real headline!


florida man tries to scam $350 lottery winner with fake cheque, offers to pay the winner only $5,000.00 instead.

florida man writes to get off scottish road, "i'm going to hell because of you two, but i'm not going to hell because of you two".

florida man finds pipe bomb in fishing gear while fishing off the florida shore. tells police it was "a good smoke".

florida man calls 911 and lies about a woman being shot, because he wanted the cops to help him find his cellphone


Training The Model


The training data was scraped from the FloridaMan subreddit. The dataset is small, about 2000 headlines in total, and for lack of time I did no cleaning or preprocessing. It very likely contains duplicates, possibly including headlines that appear in both the train and test splits. I wanted to get something working quickly and worry about retraining a proper model later.
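For reference, the duplicate problem mentioned above could be handled along these lines before the next retraining. This is only a sketch, not what the current model used: the function names, the 10% test fraction, and the "lowercase and collapse whitespace" notion of a duplicate are all assumptions.

```python
import random


def normalize(headline: str) -> str:
    # Lowercase and collapse whitespace so trivially different copies compare equal.
    return " ".join(headline.lower().split())


def dedupe_and_split(headlines, test_fraction=0.1, seed=0):
    """Drop duplicate headlines, then split into (train, test) lists.

    Deduplicating before the split guarantees that no headline can land
    in both sets. `test_fraction` and `seed` are illustrative defaults.
    """
    seen = set()
    unique = []
    for h in headlines:
        key = normalize(h)
        if key and key not in seen:  # skip empty lines and repeats
            seen.add(key)
            unique.append(h.strip())

    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(unique)

    # Hold out at least one headline when there is more than one to split.
    n_test = max(1, int(len(unique) * test_fraction)) if len(unique) > 1 else 0
    return unique[n_test:], unique[:n_test]
```

Exact-match deduplication like this would not catch reposts with reworded titles; that would need some fuzzy matching on top.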

The model is GPT-2, as provided by Hugging Face. There are numerous tutorials on how to get started with training and inference, and I followed one from the Hugging Face docs.
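To give an idea of what inference looks like, here is a minimal sketch using the `pipeline` helper from the `transformers` library. It loads the stock `gpt2` checkpoint, which is an assumption made for illustration; the site itself uses a fine-tuned model, and the prompt, token limit, and `first_line` helper are likewise hypothetical.

```python
def first_line(text: str) -> str:
    """Keep only the first non-empty line of a generated sample."""
    for line in text.splitlines():
        line = line.strip()
        if line:
            return line
    return ""


def generate_headlines(prompt="florida man", n=3, max_new_tokens=30):
    """Sample `n` candidate headlines that start with `prompt`."""
    # Imported lazily so first_line() is usable without transformers installed.
    from transformers import pipeline

    # "gpt2" is the base checkpoint; a fine-tuned model directory would go here.
    generator = pipeline("text-generation", model="gpt2")
    samples = generator(
        prompt,
        max_new_tokens=max_new_tokens,
        num_return_sequences=n,
        do_sample=True,  # sampling is needed to get several different outputs
    )
    # Each sample is a dict whose "generated_text" includes the prompt.
    return [first_line(s["generated_text"]) for s in samples]
```

Calling `generate_headlines()` downloads the model on first use, so it needs network access and a few hundred megabytes of disk space.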

Interface


The design and functionality of this website were adapted from a free Bootstrap template. I am not a web developer and this is my first site, so I claim no credit for its design, look and feel, or functionality. Full credit goes to the Bootstrap developer(s) and to a tutorial that showed me how to get started.

The logic behind selecting the headlines and the 'game' aspect of trying to guess the real headline is my own creation. I'm using Flask, MongoDB, and some very rudimentary HTML + CSS.
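The game logic can be sketched roughly as follows. This is a simplified, framework-free illustration rather than the site's actual code: `build_round`, `record_guess`, and the in-memory `stats` dictionary are hypothetical names, and the real site keeps its data in MongoDB and serves rounds through Flask.

```python
import random


def build_round(real_headlines, generated_headlines, n_fake=3, rng=None):
    """Pick one real headline and `n_fake` generated ones, in random order."""
    rng = rng or random.Random()
    real = rng.choice(real_headlines)
    fakes = rng.sample(generated_headlines, n_fake)  # no repeats among fakes
    options = fakes + [real]
    rng.shuffle(options)
    # Store the position of the real headline so a guess can be checked later.
    return {"options": options, "answer": options.index(real)}


def record_guess(stats, round_, guess_index):
    """Update the running counters and report whether the guess was right."""
    correct = guess_index == round_["answer"]
    stats["correct" if correct else "incorrect"] += 1
    return correct
```

With four options per round, someone guessing at random would be right about a quarter of the time, so the share of correct responses is best read against that baseline.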

Number of correct responses


An incorrect response means the model has fooled someone with a generated headline.

Correct Responses:

Incorrect Responses:

Future Work


Things I'd like to integrate:

1. Newer text generation models, or the ability to pick which one to use

2. Scrape more data (or find new sources of headlines for training)

3. Re-train the model and focus on data cleansing

4. Log additional statistics and present the stats in a better way

Any Feedback Or Suggestions?


This was a fun little personal project for me in which I learned a lot. I'm always looking to improve. Any suggestions are welcome.

floridaman@floridamanengine.com