Wednesday, September 23, 2015

Adventures in Data Science: Labeling Data

Now that my environment is set up, I can finally start doing some coding.

My current project is creating a spam filter for OkCupid messages. I have a dataset of about 700 messages as well as the profile of the woman that the messages were sent to.

Setup: The profile is of a straight cis-woman. The messages are from straight men writing first messages to her.

As I mentioned in my last post, I initially had a CSV of these messages that I loaded into a MySQL table.

I have also started thinking about what features I should use to analyze this data and what the end goal is.

Some of the initial features I thought of were:
- length of messages (number of characters or number of words)
- match percentage
- enemy percentage
- keywords
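
As a rough sketch of how these features could be computed for one message (the function name, the keyword set, and the example numbers are made up for illustration, not taken from my dataset):

```python
import re

def extract_features(message, match_pct, enemy_pct, keywords):
    """Return a dict of simple features for one message.

    `keywords` is a hypothetical set of lowercase words to look for.
    """
    # Split into lowercase word-like tokens (letters and apostrophes only).
    words = re.findall(r"[a-z']+", message.lower())
    return {
        "n_chars": len(message),
        "n_words": len(words),
        "match_pct": match_pct,
        "enemy_pct": enemy_pct,
        "keyword_hits": sum(1 for w in words if w in keywords),
    }

# Illustrative values only.
example = extract_features("Hey, how's your weekend?", 72, 10, {"hiking", "weekend"})
print(example)
```

Counting words with a regular expression is just one option; splitting on whitespace would give slightly different numbers for messages with lots of punctuation.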

I also started thinking about what type of learning algorithm to use. I decided that I should do some form of supervised learning.

I was initially planning on just labeling the messages as spam or not. However, I quickly realized that I needed another category of messages. This new category I am calling "terrible".

Here are the definitions I am using so far.
Spam: A message that makes no reference to the profile of the woman being messaged. This type of message is often really short, focuses on the woman's looks, or asks lame arbitrary questions ("How are you", "how's your weekend").

Terrible: This message does reference the woman's profile in some way, but typically in a shallow way. It may focus on the woman's looks, ask no questions, or be otherwise terrible in some way.

I realize this is all arbitrary labeling, but I have to start somewhere.
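
To give a feel for what hand-labeling might look like in code, here is a minimal sketch. It is not the code from my repository: the one-letter shortcuts, the helper names, and the name "ok" for messages that are neither spam nor terrible are all assumptions made for this example.

```python
# "spam" and "terrible" come from the definitions above;
# "ok" is a hypothetical name for every other message.
LABELS = {"s": "spam", "t": "terrible", "o": "ok"}

def parse_label(raw):
    """Map a typed answer such as 's' or 'Spam' to a full label name."""
    key = raw.strip().lower()[:1]
    if key not in LABELS:
        raise ValueError("expected one of: " + ", ".join(sorted(LABELS.values())))
    return LABELS[key]

def tally(labels):
    """Count how many messages received each label."""
    counts = dict.fromkeys(LABELS.values(), 0)
    for label in labels:
        counts[label] += 1
    return counts

# Illustrative answers, as a person might type them while labeling.
answers = ["s", "Terrible", "o", "spam"]
labeled = [parse_label(a) for a in answers]
counts = tally(labeled)
print(counts)
```

Keeping a running tally is handy for spotting whether one category is swallowing most of the messages, which would be a hint that the definitions need tightening.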

You can check out the code for the labeling on my GitHub.

Wednesday, September 16, 2015

Adventures in Data Science: MySQL on a Chromebook

After a thousand million years, I was finally able to put data into a MySQL table.

I first tried to follow this tutorial to learn how to use MySQL with Python.

However, I quickly got stuck at the step of:
$ mysql -u root -p
This brought me the first error message:
- "ERROR 2002 (HY000): Can't connect to local MySQL server through socket '/var/run/mysqld/mysqld.sock' (2)"

I followed this down a rabbit hole of other commands and error messages, some of which are listed below.

$ /etc/init.d/mysql
- "Rather than invoking init scripts through /etc/init.d, use the service(8)
utility, e.g. service mysql start
/etc/init.d/mysql: 54: /etc/init.d/mysql: initctl: not found

Since the script you are attempting to invoke has been converted to an
Upstart job, you may also use the start(8) utility, e.g. start mysql
/etc/init.d/mysql: 82: /etc/init.d/mysql: start: not found
/usr/bin/service: 123: exec: start: not found"

$ service mysql start
- "start : Unknown job: mysql"

Many hours of research brought me to this page after I searched for "initctl not found crouton": https://github.com/dnschneid/crouton/wiki/Running-servers-in-crouton

The problem apparently arose because I was running Crouton on a Chromebook. It turns out the MySQL server doesn't start automatically when the chroot boots up. Never would have guessed that.

Once I applied the solution listed on that GitHub page, the MySQL server started running whenever I start my chroot. Fingers crossed that it will stay like that!

I went back and finished the Python tutorial mentioned above. I found it really useful for getting started with MySQLdb in Python.

I finally got to read in my CSV of data and put it into a MySQL table.
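
The CSV-to-table step looks roughly like the sketch below. To keep it runnable without a database server, it uses sqlite3 from the Python standard library as a stand-in for MySQL, and it is written for Python 3. The table name, the column names, and the sample rows are all made up for illustration. With MySQLdb the shape is the same, but the query placeholders are %s instead of ?.

```python
import csv
import io
import sqlite3

# Hypothetical two-column CSV standing in for the real file of messages.
csv_text = "id,body\n1,Hey there\n2,I liked your note about hiking\n"

# In-memory SQLite database used here in place of a MySQL server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE messages (id INTEGER PRIMARY KEY, body TEXT)")

# Read each CSV row into a tuple that matches the table's columns.
reader = csv.DictReader(io.StringIO(csv_text))
rows = [(int(r["id"]), r["body"]) for r in reader]

# Parameterized inserts keep quotes inside messages from breaking the SQL.
conn.executemany("INSERT INTO messages (id, body) VALUES (?, ?)", rows)
conn.commit()

count = conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
print(count)
```

Letting the csv module do the parsing matters for message text, since real messages are full of commas and quotes that a naive split would mangle.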

I'm very happy that I don't have to deal directly with the CSV anymore. I'm hoping the MySQL database will be easier to handle.

Next time: labeling data.

Wednesday, September 9, 2015

Adventures in Data Science: Setting up a Chromebook

After a year and a half of no posts, I'm back with a new blog series about doing data science on a Chromebook.

Recently, I decided to up my coding skills by working on some programming side projects. Unfortunately, my personal computer is a Windows laptop, which is terrible for programming.

During Amazon Prime Days, I purchased a Chromebook with the intention of putting Linux on it.

The model I purchased was the Acer C720-3871 (Amazon). This one has 2 GB of RAM, an Intel processor, and 32 GB of solid state storage.

Stickers so you know I'm a real programmer.

The open Chromebook.

For loading Linux, I followed this tutorial from LifeHacker and used Crouton. I chose to load xfce4 on top of Ubuntu.

This version is pretty lightweight, so it might not work for everyone. I'm only planning on using the Chromebook for programming, so I didn't care about having a pretty interface or graphics.

Once I installed Linux, I set up my environment with:
- git
- ipython
- sublime text
- and more!

Since I'm a scientist, I'm sticking with Python 2.7 right now.

So far, the Chromebook has been good enough for me. There are times when the laptop feels slow to respond, but I'm not sure yet whether that's a result of only having 2 GB of RAM or me not remembering how to use Linux.

More on installation issues next time.

Sunday, February 9, 2014

Camera, Camera on the wall, who's the hottest of them all?

If you have ever stood next to another human being, then you probably felt some body heat radiating from her. What you might not realize is that everything emits radiation: you, your dog, your chair. Everything! (see this article: http://discovermagazine.com/2007/jun/life-is-rad).

Explainers play in the infrared camera with ice and hot water. Photo courtesy of Sylvia Algire*

Wednesday, November 20, 2013

Unnecessary Superpowers: Seeing Polarized Light

“Once you see it, you can’t unsee it”*

Earlier this fall, I conversed with my coworker, Julie, about what superpower we would choose. She mentioned that she and another coworker, Rob, were interested in learning how to see different types of light. I immediately grew excited because this seemed like an attainable superpower. I decided to focus on seeing polarized light.

During my search, I even found a website that listed seeing polarized light as a useless superpower. My search ended with Haidinger’s Brush, the manifestation of how humans see polarized light.

Tuesday, January 29, 2013

Science Theories in Practice:

As the evolution/creationism debate continues in the US, one argument that creationists repeatedly use is that evolution is just a theory, not something that is actually proven (Creationist arguments: http://www.talkorigins.org/faqs/faq-misconceptions.html).

Unfortunately, people who make this argument don’t understand evolution (topic for another post) or scientific theory (http://en.wikipedia.org/wiki/Theory#Scientific_theories).

First, what is a scientific theory?

A theory is a set of statements that explains a group of observations; usually the statements have been widely tested and accepted. (In physics, either the theory or observations can come first.)

Wednesday, January 2, 2013

Best in Show!

Scientists often discuss (or bicker) about the order of authorship on papers. “I got a first-author paper published!” or “I was only second author on the paper.”

What do such phrases mean?

The first author on a paper is supposed to be the person who worked the most on the paper. Usually, this is the researcher who did the research behind the paper, not necessarily the person who did the most writing (see Lead Author, http://en.wikipedia.org/wiki/Lead_author). Often, this is the graduate student who worked on the project.