[Translation] Becoming a Data Scientist via Metromap


RoadToDataScientist1.png

Becoming a Data Scientist – Curriculum via Metromap

July 8, 2013 By Swami Chandrasekaran
Data Science, Machine Learning, Big Data Analytics, Cognitive Computing... well, all of us have been buried under an avalanche of articles, skills-demand infographics, and points of view on these topics (yawn!). One thing is for sure: you cannot become a data scientist overnight. It's a journey, and certainly a challenging one. But how do you go about becoming one? Where do you start? When do you start seeing light at the end of the tunnel? What is the learning roadmap? What tools and techniques do you need to know? How will you know when you have achieved your goal?

Given how critical visualization is to data science, it is ironic that I could not find (with a few exceptions) a pragmatic yet visual representation of what it takes to become a data scientist. So here is my modest attempt at creating a curriculum, a learning plan that you can use on your journey to becoming a data scientist. I took inspiration from metro maps and used one to depict the learning path. I organized the overall plan progressively into the following areas / domains:

  1. Fundamentals
  2. Statistics
  3. Programming
  4. Machine Learning
  5. Text Mining / Natural Language Processing
  6. Data Visualization
  7. Big Data
  8. Data Ingestion
  9. Data Munging
  10. Toolbox

Each area / domain is represented as a “metro line”, with the stations depicting the topics you must learn, master, or understand in a progressive fashion. The idea is that you pick a line, catch a train, and go through all the stations (topics) until you reach the final destination or switch to the next line. I have progressively marked each station (line) 1 through 10 to indicate the order in which you travel. You can use this as an individual learning plan to identify the areas you most want to develop and the skills you want to acquire. This is by no means the end, but it is a solid start. Feel free to leave your comments and constructive feedback.
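For readers who want to track their progress in code, the "pick a line, ride through its stations, then switch to the next line" idea can be sketched as a small Python structure. This is only an illustrative sketch: the ten line names come from the list above, but `itinerary`, `example_stations`, and the station labels are hypothetical placeholders, since the actual stations appear only on the map image.

```python
from __future__ import annotations

from typing import Iterator

# The ten "metro lines", in the travel order given in the list above.
LINES = [
    "Fundamentals",
    "Statistics",
    "Programming",
    "Machine Learning",
    "Text Mining / Natural Language Processing",
    "Data Visualization",
    "Big Data",
    "Data Ingestion",
    "Data Munging",
    "Toolbox",
]


def itinerary(
    stations: dict[str, list[str]], start: int = 1
) -> Iterator[tuple[int, str, str]]:
    """Yield (line number, line name, station) in travel order.

    `start` lets a traveller board at a later line and skip the earlier ones.
    Lines with no entry in `stations` are passed through without stops.
    """
    for number, line in enumerate(LINES, start=1):
        if number < start:
            continue
        for station in stations.get(line, []):
            yield number, line, station


# Hypothetical station labels, for illustration only (not taken from the map).
example_stations = {
    "Fundamentals": ["station A", "station B"],
    "Statistics": ["station C"],
}

for number, line, station in itinerary(example_stations):
    print(f"Line {number} ({line}): {station}")
```

Replacing the placeholder labels with the topics you read off the map turns this into a simple personal checklist, and passing `start=2`, for example, would begin the trip at the Statistics line.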

PS: I did not want to impose the use of any commercial tools in this plan. For the most part, I have based it on tools and libraries available as open source. If you have access to commercial software such as IBM SPSS or SAS Enterprise Miner, by all means go for it. The plan still holds.

PS: I originally wanted to create an interactive visualization using D3.js or InfoVis, but I wanted to get this out quickly. Maybe I will do an interactive map in the next iteration.

https://github.com/MrMimic/data-scientist-roadmap