So, you graduated from college, spent half a decade in a quantitative PhD program, maybe took a postdoctoral fellowship... and now what? You may want to stay the course and pursue a tenure-track faculty position at a university, or you may not. While the pressure to remain within the academy after a PhD can at times feel overwhelming, if you’re in the latter group and are considering making the switch from academia to industry, know that you’re not alone.
In fact, nearly 50% of the Data Science team at Wayfair hold PhDs and once made this same transition (as of 10/31/18).
Our team came to Wayfair with various academic backgrounds (from cognitive science to economics to nuclear engineering) and had different reasons for pursuing a career outside the academy (a desire for job security, better funding, and a faster tempo of development being among the top reasons). But once they made the decision to leave academia, all faced the same questions: How can I jump-start my new career in Data Science? What skills or experience do I need to succeed?
Wayfair Data Science recently hosted a PhD Networking Event to aid current PhD students or postdocs in the same position. Over 100 local PhD students registered for the event, which featured lightning talks by Zephy McKanna (Data Scientist, Wayfair) and guest speaker Amy Winecoff (Data Scientist, True Fit Corp), a panel of PhDs from the Wayfair Data Science team who shared advice on making the transition, and of course food, drinks, and networking.
After the event we spoke to our panel (Stephanie Sorenson, Hussain Karimi, Robert Yi, and Rudi Natarajan–moderated by Senior Data Science Manager and fellow PhD Licurgo Almeida) and consolidated their feedback into their Top 10 Tips for breaking into Data Science as a PhD.
Check them out below!
Our Top 10 Tips for Transitioning from Academia to Data Science
Part One: How to Develop the Right Skills
1) Learn/practice Python
“The most important thing you can do to prepare for the transition into data science is to switch over to Python if you haven’t already. To get started with machine learning, I recommend Python Machine Learning (2nd edition) by Sebastian Raschka.”
2) Learn/practice SQL
“Many PhDs/postdocs haven’t used SQL, but it’s a critical skill for a data scientist. If you’ve used pandas in Python, or dplyr/tidyr in R, a lot of the concepts are similar; putting in just a few hours a week doing SQL tutorials/practice problems can help you prepare.”
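To make the overlap between SQL and data-frame tools concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table, column names, and values are hypothetical, invented purely for illustration, and the pandas line in the comment is only a rough equivalent, assuming a DataFrame with the same columns.

```python
import sqlite3

# Hypothetical toy data, invented purely for illustration.
rows = [
    ("sofa", "furniture", 500.0),
    ("chair", "furniture", 100.0),
    ("lamp", "lighting", 40.0),
    ("bulb", "lighting", 10.0),
]

# An in-memory database keeps the example self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT, category TEXT, price REAL)")
conn.executemany("INSERT INTO products VALUES (?, ?, ?)", rows)

# Filter, group, aggregate, sort: the same verbs as pandas or dplyr.
# A rough pandas equivalent, assuming a DataFrame `df` with these columns:
#   df[df.price > 20].groupby("category")["price"].mean().sort_values(ascending=False)
query = """
    SELECT category, AVG(price) AS avg_price
    FROM products
    WHERE price > 20
    GROUP BY category
    ORDER BY avg_price DESC
"""
result = conn.execute(query).fetchall()
conn.close()

print(result)
```

If you already think in terms of filtering, grouping, and aggregating, most of the work in learning SQL is mapping those ideas onto WHERE, GROUP BY, and aggregate functions.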
3) Take/audit classes on Statistics, Computer Science, and Math
“If you have time to attend classes at your university, this is a great way to pick up new skills (or refresh old ones). There are several online courses too (e.g., Coursera, Khan Academy). Statistics (e.g., basic stats, probability, data mining), Computer Science (e.g., intro to CS, machine learning), and Math (e.g., linear algebra) are a few useful areas.”
4) Be careful which resources you use
“I used to give (and get) the advice: ‘Try to read through as much of The Elements of Statistical Learning as you can,’ but in retrospect that’s generally bad advice.
To be completely honest, machine learning algorithms are not conceptually difficult, but the rigor of many textbooks makes the learning curve feel very steep. If you spend 2 hours reading about neural networks, for example, and do not understand that they are simply functions of linear combinations of functions of linear combinations (and so on) of some variables, whatever you’re reading is not doing a good job. There’s no need to know about ‘neurons’ and ‘activation functions’; you just need to know first what the core algorithm is, then, later, the details of its implementation and how it is adapted to solve problems in different domains.”
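The "functions of linear combinations of functions of linear combinations" description can be written out directly. The following is a minimal sketch in plain Python, not a trained or practical model: the weights are arbitrary hypothetical numbers, the helper names are made up for this example, and tanh is just one possible choice of function to apply between the linear steps.

```python
import math


def linear(weights, bias, inputs):
    """One linear combination of the inputs: w . x + b."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def layer(weight_rows, biases, inputs, fn):
    """Apply `fn` to several linear combinations of the same inputs."""
    return [fn(linear(w, b, inputs)) for w, b in zip(weight_rows, biases)]


def tiny_network(x):
    # Weights below are arbitrary hypothetical values, not trained ones.
    # Inner step: functions (tanh) of linear combinations of the inputs.
    hidden = layer([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0], x, math.tanh)
    # Outer step: a linear combination of those function values.
    return linear([2.0, -1.0], 0.5, hidden)


output = tiny_network([1.0, 1.0])
print(output)
```

Stacking more calls to `layer` gives a deeper network, and training amounts to adjusting the weights and biases so the output fits data.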
5) Explain aloud how things work
“Get a list of algorithms/tools/concepts in machine learning that you need to know, then try to explain out loud how each thing works. This will bring up a number of questions you don’t know the answer to. Google the questions (even YouTube videos can often be quite good!), try to work out the math yourself, and/or refer to a textbook like ESL (or my personal favorite, Pattern Recognition and Machine Learning by Christopher Bishop). Then repeat (this is known as the Feynman technique). This will not only give you a deep understanding fairly quickly, but it will also prepare you for interviews by forcing you to sequentially ask and answer difficult questions.”
6) Sharpen your skills at a bootcamp/training fellowship
“Programs like Insight Data Science and other incubators/bootcamps can be good resources for making the transition from academia to data science. You can also create your own study group with other students/postdocs interested in data science careers, and practice whiteboarding and giving each other mock interviews.”
Part Two: How to Practice Those Skills
7) Incorporate ML methods into your research and/or work on side projects
Why is this important?
“Once you have some experience writing code, getting hands-on experience applying machine learning, statistics, and/or causal inference to real-world data sets is the best way to understand what data science is like.”
“Generating your own idea, and then getting/cleaning the data, doing some EDA, modeling, and thinking through the results/impact of your model is a super valuable experience. Doing a project is also a great opportunity to practice using common data science tools (e.g., pandas, sklearn, GitHub).”
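The workflow described above (get and clean the data, do some EDA, fit a model, think through the results) can be sketched end to end in a few lines. This is only an illustration under stated assumptions: the data is a hypothetical toy table invented for the example, and it uses the Python standard library so it runs anywhere. In a real project you would more likely reach for pandas and sklearn, as the panel suggests.

```python
import csv
import io
from statistics import mean

# Hypothetical toy data standing in for a file you collected yourself.
raw = """sqft,price
1000,200
1500,300
,250
2000,400
"""

# 1) Get and clean the data: drop rows with missing values.
rows = [r for r in csv.DictReader(io.StringIO(raw)) if r["sqft"] and r["price"]]
x = [float(r["sqft"]) for r in rows]
y = [float(r["price"]) for r in rows]

# 2) EDA: look at simple summaries before modeling.
print(f"n={len(x)}, mean sqft={mean(x):.0f}, mean price={mean(y):.0f}")

# 3) Model: ordinary least squares fit of price on sqft.
x_bar, y_bar = mean(x), mean(y)
slope = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sum(
    (a - x_bar) ** 2 for a in x
)
intercept = y_bar - slope * x_bar

# 4) Interpret: what does the model predict for a new case?
predicted = intercept + slope * 1800
print(f"slope={slope:.3f}, intercept={intercept:.3f}, predicted={predicted:.1f}")
```

Even on a toy example, each step raises the questions a real project would: why rows are missing, whether a straight line is a reasonable model, and what the prediction would be used for.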
How do you find/choose data?
“Kaggle is a good place to find data sets, along with shared notebooks that have quality ratings. The top rated notebooks, in my experience, usually have clever approaches to data cleaning and modeling.”
“There are several common data sets available (MNIST, cats vs. dogs, IMDB reviews, etc.), but instead of just picking one, try to form a project around a question that you care about, then work out how applications of ML algorithms would help answer it. Not only will it be more interesting for you, the project will carry more weight in discussions with future employers.”
“Consider whether there are any interesting questions you can ask with publicly available data (e.g., https://data.boston.gov/). Are there any ‘data science for good’ volunteer projects you can get involved with?”
Part Three: How to Prepare for the Job Market
8) Find out if industry is right for you, with an internship/immersion program
“If you’re unsure you want to make the jump into data science, getting hands-on experience is a great way to find out if it’s a good fit. Many companies offer internships, where you can spend a couple of months actually doing data science. Wayfair also has an immersion program, where PhDs/postdocs can spend a week learning more about the day-to-day life of a data scientist.”
9) Determine what sets you apart
“If you are close to interviewing for data science positions, it also helps to understand what skills you are bringing and how they might set you apart from the competition. One book I recommend for anyone making a career transition (even though it’s geared towards people applying for MBA programs) is Avi Gordon’s MBA Admissions Strategy. In particular, I’ve found that adapting his profile-building approach really helps to identify what skills you bring to the table as a future data scientist. Once you have these outlined, it’s a matter of demonstrating them in the interview.”
10) Use your network
“Talk to friends, colleagues, and friends of friends who are data scientists. Having informal ‘informational interviews’ is an excellent way to learn more about what types of problems data scientists work on, what tools they use, culture at different workplaces, etc. This can help you figure out what types of data science roles might be a good fit for your background and/or interests. Meetup groups can also be a great way to get involved in local data science communities (e.g., https://www.meetup.com/socialdatascience).”
Additional Recommended Resources
- fast.ai is an invaluable resource for gaining an understanding of how to use ML methods. Their courses on ML for coders and deep learning for coders are fantastic.
- The online probability and stats textbook www.probabilitycourse.com
- Andrew Ng’s Machine Learning Coursera course
- An Introduction to Statistical Learning
- Mode Analytics SQL Tutorial
- SQLZoo tutorials
- Interview Cake programming questions
- The O’Reilly data science books (with sketches of animals on the covers) are a great overview of various DS topics, from types of algorithms/problems data scientists work on, to doing data science in Python or R.
Not all tips will be necessary for all people–these are just a few ideas to get you started. Good luck exploring the Data Science world! Stay tuned for future events from Wayfair Data Science, or check out our jobs page for open positions!