Something has been bothering me about Data Science Central

So, what I’m about to write about actually happened a few months ago, but I’m reminded of it every day when I receive an email from Data Science Central or see someone tweet an article from its blog network (which includes AnalyticBridge, Big Data News, etc.). Since it’s still bothering me, I figured it’s worth writing about.

In April, I saw a post by Vincent Granville, owner of and primary author at Data Science Central, which said something like:

One way we attract women and minorities to Data Science Central is to create accounts that post articles with female profiles and photos, which are not actually written by women. Can you use data science to find these 5 faux bloggers? The winner will receive $500.

I have to try to remember the original post and paraphrase here (I’m sure this is not close to the original text, but I hope I am capturing the message), because the post has now been modified to appear as if the contest was only to find the one “example” fake account with a camel avatar (current version here).

However, you can tell that the original contest was different based on the submission by “Alton” on the site, who was nice enough to hold back the names of the accounts he found in case they weren’t decoys, but was clearly trying to find more accounts than just the “camel” decoy. Below is a screenshot in case it gets modified on the site.

[Screenshot: Alton’s comment on the original contest post]

My initial response to the original post about the fake female bloggers was, “How many sites do this? Is this sexist? It sure is off-putting for this guy Vincent Granville to post articles under fake accounts, pretending to be a woman or underrepresented minority. Do we actually fall for this kind of thing? Is it a widely accepted practice?” I scrolled down and saw that Cory Teshera had posted a comment questioning the practice and responding incredulously to the approach as well. I posted about it on Twitter, expressing my surprise and asking questions about the approach (I’m including her name and tweets here with her permission):

No one had responded, so I checked back on the post and saw that Cory’s comments had been deleted! I couldn’t believe that I was seeing a post talking about attracting women to the site, yet the first woman to comment on the approach was being silenced!

I found Cory on Twitter and asked whether she was the one who had posted, and whether she or the site had deleted the comment. She responded:

Then I let her know I might blog about it, and we chatted a bit via tweets and DMs. At this point, the blog post had been modified to remove all reference to the practice. Neither of us had received a response from the site. The only response was the “silent” deletion of comments and editing of the contest post.

I was curious at this point and started browsing Data Science Central to see if I could find any of these fake female accounts. I didn’t need any data science methods; I found one right away. I just looked at the top featured posts, clicked on one with a female avatar, and found this article:
Good and Not So Good Companies for Data Scientists
Here is Amy’s profile: http://www.datasciencecentral.com/profile/Amy
I see “she” has been blogging heavily since I last looked. She lists no last name, so I can’t look her up anywhere else that way, but I did an image search on Google and found: “Amy” Image Search Results
Amy is spending a lot of time posting on all of the Data Science Central network sites. On the Hadoop360 site, she is listed as “Amy Cordan”. At this point, I was still holding out a glimmer of hope that Amy could be a real woman, so I looked her up on LinkedIn. I would have been happy to find out she was actually working for Data Science Central as a writer.

Oh look! There is an Amy Cordan on LinkedIn who is listed as a “Data Scientist” with a PhD in Computer Science from Stanford! Her photo looks a little different, though… she sure has a lot of endorsements… but she only has one experience listed… “Co-Founder for Data Science Foundation”… let’s check out their site… DataShaping.com. Uh… well, this is clearly a dummy site, and the email address is apparently Vincent Granville’s. I did find this “staff” page, which strangely doesn’t list “co-founder” Amy. It appears the whole profile is totally fake, including the sparse LinkedIn profile page with the Stanford PhD but no experience other than working on a data science blog.

Anyway, you get the point. Amy does not appear to be a real woman. What really got me is that there is an apparently off-topic response to “Amy”’s post (linked above) by Vincent Granville about how “Amazon should hire people to improve security on AWS and deal with fake reviews.” Excuse me, mister… you are replying to a post on your own site, attributed to a fake author, that was probably written by you! How hypocritical.

At this point I was totally turned off from Data Science Central, so if the intent of these fake profiles was to attract women to the site, it definitely backfired for me.

Here are my questions now. How many of the female profiles on the site belong to real women? Are many women and minorities joining the site, and are they influenced by these fake accounts falsely making it appear that more women are participating than actually are? Is this a common practice among technology networking websites? Does it work? Should we accept it as necessary? Has Vincent Granville made any real effort to ask women to write for Data Science Central?

Do the people endorsing “Amy”’s LinkedIn profile know it is fake? Or are they all fake profiles that Vincent Granville created and had endorse one another?

I have so many questions, and am not coming up with many satisfactory answers myself, other than feeling sad and put off by it all. Please let me know what you think!

(P.S. If you’re reading this, Mr. Granville, posts like this that say things like “The first to prove or disprove our conjecture will win $500 and will have his name associated with the theorem in question” aren’t helping you attract any female readers.)

As for me, it has left a bad taste in my mouth, and I’m currently not retweeting anything that I recognize as being from the Data Science Central network, because I just can’t trust anything it produces at this point.

If anyone wants to do any analysis on the site posts, I’m sure there are algorithms out there that can determine whether they’re likely to have all been written by the same author (same typos, style, etc.). I know there is also this analysis tool which is supposed to be able to tell whether a male or female likely wrote a clip of text: Text Gender Classifier (h/t Paul Marks)
The software is of course not perfect, but the text from “Amy”’s short article above came out to 68% likely to be written by a male, while this article you’re reading right now was classified as 65% likely female.
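As a rough illustration of the “same author” idea above, one common stylometric approach compares how often texts use very common function words (the, and, of, …), which individual writers tend to use at fairly stable rates. The Python sketch below assumes that approach; the word list, the function names, and the use of cosine similarity are illustrative choices, not the method behind the Text Gender Classifier or any analysis actually run on the site.

```python
import math
import re
from collections import Counter

# Illustrative (assumed) list of common English function words; real
# stylometry studies typically use a few hundred such words.
FUNCTION_WORDS = [
    "the", "a", "an", "and", "or", "but", "of", "to", "in", "on",
    "for", "with", "is", "are", "was", "it", "that", "this", "as", "by",
]

def style_profile(text):
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]

def cosine_similarity(u, v):
    """Cosine of the angle between two profiles (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # a text with no function words has no usable profile
    return dot / (norm_u * norm_v)

def most_similar_author(unknown_text, known_texts):
    """Return the known author whose profile is closest, plus all scores.

    known_texts is a hypothetical dict mapping author name -> sample text.
    """
    target = style_profile(unknown_text)
    scores = {
        author: cosine_similarity(target, style_profile(sample))
        for author, sample in known_texts.items()
    }
    return max(scores, key=scores.get), scores
```

To use it, you might pass the text of a disputed post as `unknown_text` and a dictionary of candidate authors’ known writing as `known_texts`. Real authorship attribution uses much larger word lists, longer samples, and more careful statistics, so a score from a few short posts is a hint rather than proof.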

I guess I’m surprised at how little “sleuthing” I needed to do to see right through all of this. I didn’t spend hours poring over the site; I clicked on the first article I saw with a female author photo and researched that author’s profile using Google. It’s practically out there in the open, and since Mr. Granville posted the contest – which has now been edited – to identify these faux bloggers, it appears he wasn’t trying to hide the practice.

And by the way, though Vincent Granville apparently has trouble finding women in data science to write for his blog, they do exist and aren’t hard to find on Twitter or LinkedIn. I’ve started following the data science women I find on Twitter using a Twitter list (please suggest more in the comments!):
Women in Data Science Twitter List

Also check out Meta Brown’s “Binder fulla Women in Analytics” posts on LinkedIn!

4 Comments

  1. Alton
    Jul 7, 2014

    I’m glad I’m not the only one who has had this on their mind. Normally I’m a live-and-let-live kind of guy, but it has bothered me too. Ultimately, I think that Vincent has good intentions and that his efforts are helping many, but sometimes the ends don’t justify the means.

    I thought it was a little arrogant to test the community on such a sensitive topic, but as a good data scientist, when I have the time, I’m usually up for a good challenge. From the comments you’ll notice that Vincent agreed to award me the prize. Because I was deemed the winner, I took screenshots to capture the agreement and hold him accountable. I didn’t think they would come in handy, but since he hasn’t shown any willingness to live up to his side of the deal, I don’t feel guilty for exposing his strange practices regarding fake profiles, both on his own network and on several other networks, including LinkedIn.

    The screenshots are located at https://drive.google.com/file/d/0B4JAreDAupYgb0ZEcFNMdHM0YU0/edit?usp=sharing
    https://drive.google.com/file/d/0B4JAreDAupYgeG1iYlZzQWxQb28/edit?usp=sharing

    Also, if you are curious: to solve this problem, I used a classification method on a feature set of all users of his websites. The feature set included things like time since joining, number of posts, number of likes, favorite website, and number of comments, along with the frequency and nature of each of those, including common text analytics like word count.

    Note that I also used this feature set for another post, which was deleted but seems to be cached at Google: http://webcache.googleusercontent.com/search?q=cache:ju9MLUdF3JoJ:www.datasciencecentral.com/xn/detail/6448529:Comment:162518%3Fxg_source%3Dactivity+&cd=1&hl=en&ct=clnk&gl=us

    Lastly, I can’t find the complete data set that I used, but I do have a portion of it saved for those interested in exploring these users: https://dl.dropboxusercontent.com/u/96237511/recent_DSC_members.csv

    I only spent a couple of hours on this (the bulk of it spent collecting and cleaning the data) so naturally I expected my results to be wrong. Still, here is the list of individuals that I originally found having high probability of being fake according to the training metric and feature set I had.

    http://www.datasciencecentral.com/profile/Amy

    http://www.datasciencecentral.com/profile/Alesia

    http://www.analyticbridge.com/profile/DorothyHewittSanchez

    The one with the camel icon:
    http://www.analyticbridge.com/profile/Titus

    Thanks again for your great journalism and for using best practices while paving the way for many aspiring data scientists.

    Sincerely,
    Alton Alexander
    @10altoids

  2. ERose
    Jul 11, 2014

    I’ve noticed a very similar practice on social media sites, although it appears to be more of a ploy to reach younger demographics than necessarily women or people of color.
    It seems to be commonly used as a tool in politics, e.g., one side or the other in a particular election will create fake profiles to engage in political discussions on Facebook.
    Luckily, enough goes into an authentic online presence that aping one realistically becomes pretty difficult after about 2-3 weeks, especially if anyone does even pretty basic Google work to check you out, as you prove here.
    Anything along these lines definitely turns me off, whoever does it.

    On a feminist note here, I wonder how many people have tried to contact “Amy” or “Alesia” as speakers for an event and took their inevitable refusal as evidence that it’s hard to find female speakers? I wonder how many people have seen their LinkedIn profiles and took their lack of experience as evidence that women don’t bother to get involved in the field? I am even less cool with the legitimate potential harm to real diversity efforts than I am with someone attempting to manipulate me with a false effort.

    • Ellie Kesselman
      Sep 20, 2014

      I have been active on Analytic Bridge since 2007. I thought his Ning websites might be useful in my career and job searches. I have education in operations research and math and work in quantitative risk analysis. I didn’t care for Vincent Granville’s anti-vaccination attitude, nor his dislike of the US Census Bureau, both of which he stated on Analytic Bridge. These are my profiles on his websites, and yes, I am real, not a Granville figment!

      http://www.datasciencecentral.com/profile/LKW
      http://www.analyticbridge.com/profile/lek

      I was quite angry about the Titus fake profile, as I had interacted with Titus (him?) without realizing it. I thought he was real. Now I feel like a fool.

      As for Amy, I became suspicious in March 2013 when I tried to contact her and realized she didn’t seem to exist. I came to the same conclusion about Granville’s Amy Cordan today. This post was the third search result returned for her name by Google! I am in agreement with author Renee and ERose. This dishonest behavior by Granville is unprofessional, regardless of whether or not it pertains to women. Titus was an elderly man, I thought. The admin of a professional group shouldn’t deliberately deceive members, then arrogantly play guessing games with identity. Talk about losing trust! That he would do this with women for his absurd reasons, including creating fraudulent profiles on LinkedIn, is contemptible and violates LinkedIn Terms of Service, at a minimum.

  3. Renee
    Sep 21, 2014

    Thanks for your comments, everyone. I appreciate your additions to this conversation!

    Ellie, your comment about trust is an important one. I think a key factor that will help bring more women into tech is knowing they can trust the people they’re working with to have their best interests in mind. When a major website in the field is faking female profiles in order to appear more diverse, it breaks that trust and drives women away from what could otherwise be a valuable networking resource.

    I’ve been seriously thinking about starting my own site that can serve as an alternative to Data Science Central after I finish grad school (in the spring).
