Tuesday, February 28, 2012

Go Strata! Go DATA!

Today I finally walked into the Strata Conference for data (and thank God I live in California now). I was quite excited about this, because there is a ton going on at this conference. And people won't think you are a nerd when you express your passion for ... DATA. Well, in my mind, the entire universe is one big dynamic information system. And what's floating inside the system? Of course, the data! Knowing more about data essentially helps people understand the system better, the universe better! It's so important that it will become a bigger and bigger part of your life. And maybe someday people will think of data as being as vital as water and air :)

Anyway, today was the training day of Strata. I chose the 'Deep Data' track. The speakers were all fantastic! It was a great opportunity to see what others actually do with data and how they do it, unlike the tutorial sessions where people just talk about the data. The talks I enjoyed the most were Claudia Perlich's 'From knowing what to understanding why' (she really held nothing back on the practical data mining tips. I like the fact that she baked a lot of statistical knowledge into problem solving, which in my mind is missing in some data scientists. And I really like her assertive attitude when she said 'I will even look at the data, if somebody else pulled it'.), Ben Gimpert's 'The importance of importance: introduction to feature selection' (well, I always like this type of high-level summary talk.), and Matt Biddulph's 'Social network analysis isn't just for people' (the example that impressed me most was that he used the fact that developers often listen to music while they write code, so there is a connection between music and programming languages. Two things that seem totally unrelated got thrown into the wok and cooked together. Besides, he had some cool visualizations made with Gephi.)

At the end of the day, there was an hour-long debate between leading data scientists in the field (most of them came or come from LinkedIn). The topic was 'Does domain expertise matter more than machine learning expertise?', meaning: when you are trying to assemble a team and make a hire, do you pick the machine learning person or the domain expert? I personally vote against the statement; I think machine learning expertise matters more for the first hire. Think about it this way: when you have such an opening, you, the company, should at least have an idea of what you are trying to solve (unless you are starting a machine learning consulting company, in which case the first hire had better be a machine learning person). So by that time, you already have some business domain experts inside your company. Bringing in data miners will then help you solve the problems that a domain expert couldn't solve. For example, your in-house domain expert might complain that the data is not very accessible, or that there are too many predictors and they don't know how to look at them or which ones to look at. A machine learning person could hopefully provide advice on data storage, data processing, and modeling to help you sort the data into some workable format, and systematically tell you that you are spending too much time on features that do not make any difference while some other features should get more of your attention. To me, it's always an interactive feedback system between your data person and your domain expert. And what matters the most is a way of thinking about business problems systematically, in an approachable and organized fashion, not necessarily how many models or techniques a machine learning candidate knows.

Overall, Strata is a well-organized conference that I want to attend every year!
