[Opinion] Understanding the Sexiest Job of the 21st Century in One Article--Data Science and Related Degree Programs

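Author @留德华叫兽 holds an M.S. in Operations Research from Clemson University (USA), where he was a Ph.D. candidate under integer-programming expert W. Adams, before moving to an EU Marie Curie doctoral program. During that program he spent six months as an intern with IBM Cplex in Italy and a season as an academic visitor at École Polytechnique in Paris. He is currently a researcher at the Interdisciplinary Center for Scientific Computing and the Combinatorial Optimization Lab at Heidelberg University, Germany, advised by combinatorial-optimization expert G. Reinelt, focusing on supply chain management, computer vision, and (medical) image processing.
Reposting with a link to the original is welcome; for republication, please visit @留德华叫兽's profile for details. Unauthorized copies will be pursued.
Please follow and share this column and the WeChat public account of the same name. We will invite well-known scholars from around the world to publish in-depth content on operations research, optimization theory in artificial intelligence, and related topics, along with Zhihu Live sessions and industry news:
『运筹OR帷幄』Operations Research in the Era of Big Data and Artificial Intelligence--Zhihu column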

This article was first published on the WeChat public account @运筹OR帷幄.

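Understanding the Sexiest Job of the 21st Century in One Article--Big Data Scientist|Analyst (Part 1)

Understanding the Sexiest Job of the 21st Century in One Article--Big Data Scientist|Analyst (Part 2)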


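Disclosure: my background is in applied mathematics, operations research (optimization), and machine learning. I currently work mainly on applications in image processing and supply-chain data analysis, which are closely tied to today's hot field of Data Science (foundational disciplines such as optimization, statistics, and numerical computation supply the underlying models and algorithms of data analysis and data science). I hope this is useful to anyone who wants to get started in data analysis/science.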


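Outline:

  1. Origins of DA/DS programs
  2. MS in Data/Business Analysis
  3. MS in Data Science
  4. How operations research relates to DA/DS
  5. Takeaways from academia
  6. Analysis of several DA/DS job postings
  7. Hands-on experience from industry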


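Preface:

When the market needs specialists, universities respond by launching matching programs

Data analysis (DA), and especially data science (DS), are new disciplines that only emerged in 2007 (the latter in 2013).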

Other closely related programs:

  • Business Intelligence/Analytics/Informatics
  • Decision Analytics/Science
  • Information/Quantitative Management
  • Information/Management Science

as well as more specialized programs:

  • Strategic Analytics
  • Risk Management
  • Health Data Science (DS in medicine and healthcare)
  • Data Science for Public Policy


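Large companies have fairly clear divisions of responsibility

while at small companies, one person often wears several hats

If you ask a data scientist at a small company what data science actually is

he may not be able to tell you

to him, it is just a title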


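The Harvard Business Review called “data scientist” the sexiest job of the 21st century

“Sexy” suggests an allure that is hard to put into words, and also that the public has no idea what the job actually involves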


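So I prefer to start from the source: I will explain the similarities and differences between the two fields through the curricula of their master's programs.

In addition, we will use three job postings as case studies and go through, one by one, the core skills that employers expect a data scientist to have.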


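1. Origins of DA/DS programs

Let the chart speak first:

Growth in the number of DA/DS programs in the US

The chart above comes from the following link:

Master of Science in Analytics

which lists almost every US school that offers a DA/DS program

The first university to offer a DA program was North Carolina State University, with its M.S. in Analytics in 2007

From 2011 onward, the number of programs exploded

Now to the main topic: using the program background, learning outcomes, and curricula of the DA/DS programs at two schools, we will analyze how the two fields differ and overlap.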


2. MS in Data/Business Analysis

Program overview
The Master of Science in Business Analytics (MSBA) degree is a professional degree for students wishing to pursue careers in management with a strong quantitative and data analysis training. Modern management professionals and business data analysts increasingly need significant mathematical, statistical and computational knowledge to understand and manage data available to business and government enterprises, and to utilize that understanding in making optimal quantitative decisions using mathematical models. The MSBA program is structured to provide and build not only mathematical and statistical skills such as quantitative modeling, operations management, data mining and simulation and relevant computational skills such as big data, network and infrastructure management, but also business and managerial skills and domain knowledge to apply the technical skills in a business environment.
  • Provides training in quantitative and data analysis
  • Modern management and business data analysis require substantial knowledge of mathematics, statistics, and computation
  • Uses mathematical models to provide decision-makers with optimal quantitative decisions
  • Techniques used include quantitative modeling, operations research, data mining, and simulation


Learning Outcomes
  • Students will demonstrate a clear understanding of the fundamental concepts of Statistics, Data Analysis, Quantitative Modeling, Simulation, and Optimization.
  • Students will demonstrate proficiency in the practical tools and techniques of modern Business Analytics


Core courses
  • Data, Models, and Decisions: empirical (“data models”) and logical (“decisions”) methods
Introduces students to analytical techniques that establish the optimality of managerial decisions via empirical (“data models”) and logical (“decisions”) means. The course may be viewed as consisting of two integrated parts. In the first part, various methods of analyzing data, including regression analysis are studied. The second part covers models for making optimal decisions in situations characterized by either an absence of uncertainty or where the uncertainty arises from non-competitive sources.
  • Decision Analytics: the core techniques are optimization (linear, nonlinear, and integer) and simulation (for uncertainty)
This course explores basic analytical principles that can guide a manager in making complex decisions. It focuses on two advanced analytics techniques: optimization, dealing with design and operating decisions for complex systems, and simulation, dealing with the analysis of operating decisions of complex systems in an uncertain environment. The course provides students with a collection of optimization and simulation modeling and solution tools that can be useful in a variety of industries and functions. The main topics covered are linear, integer, and nonlinear optimization applications in a wide variety of industry segments, and Monte-Carlo Simulation and risk assessment. Application-oriented cases are used for developing modeling and analytical skills, and to simulate decision-making in a real-world environment.
  • Data Analytics: data mining, which in practice means some of the older machine learning methods
This course gives an overview of the data-mining process, from data collection, through data modeling and analytical algorithms, to data-driven decision making. The focus is on introducing data-mining algorithms such as logistic regression, classification trees and clustering, and their application to real-world data, as well as introducing some of the more recent developments in the field such as ensemble methods.
  • Database Management Systems: teaches you how to store and retrieve enterprise big data
Provides fundamental concepts and skills necessary for designing, building, and managing business applications which incorporate database management systems as their foundation. Topics covered include the fundamentals of database management (DBMS) technology, alternative methods for modeling organizational data, the application of delivering data through Web-based and other graphical interfaces.
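The Decision Analytics course above pairs optimization with Monte Carlo simulation. As a rough illustration of how the two fit together (not the course's actual material), here is a minimal sketch of simulation-based optimization on a classic newsvendor problem: pick an integer order quantity before demand is known. The prices, the normal demand model, and the names `simulate_profit` and `best_order_quantity` are hypothetical assumptions.

```python
import random
from statistics import mean

# Hypothetical newsvendor parameters (assumptions for illustration only)
UNIT_COST = 4.0    # cost to stock one unit
UNIT_PRICE = 10.0  # revenue per unit sold
SALVAGE = 1.0      # value recovered per unsold unit


def simulate_profit(order_qty, n_trials=5000, seed=0):
    """Monte Carlo estimate of expected profit and probability of a loss
    for one integer order quantity under uncertain demand."""
    rng = random.Random(seed)  # same seed for every candidate (common random numbers)
    profits = []
    for _ in range(n_trials):
        # Assumed demand model: normal(mean=100, sd=20), truncated at zero
        demand = max(0.0, rng.gauss(100, 20))
        sold = min(order_qty, demand)
        unsold = order_qty - sold
        profits.append(UNIT_PRICE * sold + SALVAGE * unsold - UNIT_COST * order_qty)
    loss_prob = sum(p < 0 for p in profits) / n_trials
    return mean(profits), loss_prob


def best_order_quantity(candidates):
    """Integer decision by exhaustive search over candidate quantities."""
    results = {q: simulate_profit(q) for q in candidates}
    best_q = max(results, key=lambda q: results[q][0])
    return best_q, results[best_q]


if __name__ == "__main__":
    q, (exp_profit, p_loss) = best_order_quantity(range(60, 161, 5))
    print(f"order {q} units: expected profit ~{exp_profit:.1f}, P(loss) ~{p_loss:.3f}")
```

Reusing one random seed for every candidate keeps the comparison between order quantities fair. A real model would more likely hand the search to an integer-programming or stochastic-optimization solver instead of enumerating candidates.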


Electives
Price Optimization and Revenue Management
Healthcare Analytics
Operations Analytics

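I won't go into detail here; these are basically classic application areas of operations research. For detailed course descriptions, follow the link below.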


The above is based on the M.S. in Business Analytics program at the University of Maryland business school:

MS Business Analytics Overview--University of Maryland


3. MS in Data Science

Program overview
Data Science lies at the intersection of statistical methodology, computational science, and a wide range of application domains. The program will offer strong preparation in statistical modeling, machine learning, optimization, management and analysis of massive data sets, and data acquisition. The program will also focus on topics such as reproducible data analysis, collaborative problem solving, visualization and communication, and security and ethical issues that arise in data science.

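The program provides training in statistical modeling, machine learning, optimization, management and analysis of big data, and data acquisition.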


LEARNING OUTCOMES
  • Build statistical models and understand their power and limitations
  • Design an experiment
  • Use machine learning and optimization to make decisions
  • Acquire, clean, and manage data
  • Visualize data for exploration, analysis, and communication
  • Deliver reproducible data analysis
  • Manage and analyze massive data sets
  • Assemble computational pipelines to support data science from widely available tools
  • Conduct data science activities aware of and according to policy, privacy, security and ethical considerations
  • Apply problem-solving strategies to open-ended questions

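Design experiments (mathematical models)

Use big data, machine learning, and optimization models to help decision-makers make decisions

Acquire, clean, and manage big data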


Core courses
Data Science 1: Introduction to Data Science
The course will focus on the analysis of messy, real life data to perform predictions using statistical and machine learning methods. Material covered will integrate the five key facets of an investigation using data: (1) data collection - data wrangling, cleaning, and sampling to get a suitable data set; (2) data management - accessing data quickly and reliably; (3) exploratory data analysis – generating hypotheses and building intuition; (4) prediction or statistical learning; and (5) communication – summarizing results through visualization, stories, and interpretable summaries.

Data collection -- data management -- data analysis -- prediction or learning -- communication.
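The five facets in the course description can be traced end to end in a toy example. This is a minimal sketch under loud assumptions: the records are invented, data management is reduced to an in-memory list, and the prediction step is a simple least-squares line rather than anything the course necessarily uses.

```python
from statistics import mean

# Hypothetical raw records (e.g. ad spend vs. sales), including messy entries
RAW = [
    {"spend": "10", "sales": "25"},
    {"spend": "20", "sales": "45"},
    {"spend": "", "sales": "30"},     # missing value
    {"spend": "30", "sales": "65"},
    {"spend": "abc", "sales": "50"},  # malformed value
    {"spend": "40", "sales": "85"},
]


def clean(records):
    """(1) Collection/cleaning: keep only rows whose fields parse as numbers."""
    rows = []
    for r in records:
        try:
            rows.append((float(r["spend"]), float(r["sales"])))
        except ValueError:
            continue  # drop rows that cannot be parsed
    return rows


def fit_line(pairs):
    """(4) Prediction: ordinary least squares for sales = a + b * spend."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar, y_bar = mean(xs), mean(ys)
    b = sum((x - x_bar) * (y - y_bar) for x, y in pairs) / sum((x - x_bar) ** 2 for x in xs)
    a = y_bar - b * x_bar
    return a, b


data = clean(RAW)  # (2) management: here just an in-memory list
# (3) exploration: a quick summary statistic
print(f"kept {len(data)} of {len(RAW)} rows; mean sales = {mean(y for _, y in data):.1f}")
a, b = fit_line(data)
# (5) communication: a plain-language summary of the fitted model
print(f"each extra unit of spend goes with about {b:.1f} more sales; "
      f"predicted sales at spend 50: {a + b * 50:.1f}")
```

Two of the six rows are dropped during cleaning, and the remaining four lie exactly on the line sales = 5 + 2 × spend, so the fit recovers that line.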

Data Science 2: Advanced Topics in Data Science
Building upon the material in Data Science 1, the course introduces advanced methods for data wrangling, data visualization, and statistical modeling and prediction. Topics include big data and database management, interactive visualizations, nonlinear statistical models, and deep learning.

Building on Data Science 1: big data, databases, visualization, nonlinear statistical models, and deep learning.
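For the database part of this course, Python's standard-library `sqlite3` module is enough for a small illustration of storing records and retrieving an aggregate with SQL. The `orders` table and its rows are hypothetical. Production systems typically use a server-based DBMS, and this sketch does not try to model one.

```python
import sqlite3

# In-memory database; the "orders" table and its rows are hypothetical examples
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT NOT NULL, amount REAL NOT NULL)"
)
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",  # parameterized, not string-formatted
    [("North", 120.0), ("South", 80.0), ("North", 200.0), ("East", 50.0)],
)
conn.commit()

# Retrieval: revenue and order count per region, largest revenue first
rows = conn.execute(
    "SELECT region, SUM(amount) AS revenue, COUNT(*) AS n_orders "
    "FROM orders GROUP BY region ORDER BY revenue DESC"
).fetchall()

for region, revenue, n_orders in rows:
    print(f"{region}: {revenue:.2f} from {n_orders} orders")
```

The `?` placeholders let the driver handle quoting, which is the usual defense against SQL injection when values come from outside the program.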

Advanced Scientific Computing: Stochastic Methods for Data Analysis, Inference and Optimization
Develops skills for computational research with focus on stochastic approaches, emphasizing implementation and examples. Stochastic methods make it feasible to tackle very diverse problems when the solution space is too large to explore systematically, or when microscopic rules are known, but not the macroscopic behavior of a complex system. Methods will be illustrated with examples from a wide variety of fields, like biology, finance, and physics.

Stochastic models, applied to biology, finance, and physics.
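The course description says stochastic methods help when the solution space is too large to explore systematically. One widely used method of that kind is simulated annealing. The sketch below applies it to a toy 0/1 knapsack problem. The choice of simulated annealing, the item data, the overweight penalty, and the cooling parameters are illustrative assumptions, not taken from the course.

```python
import math
import random

# Hypothetical 0/1 knapsack instance: (weight, value) per item
ITEMS = [(12, 4), (2, 2), (1, 1), (4, 10), (1, 2), (3, 6), (7, 9), (5, 5), (6, 8), (2, 3)]
CAPACITY = 20


def total(selection):
    """Total weight and value of the selected items."""
    w = sum(ITEMS[i][0] for i, on in enumerate(selection) if on)
    v = sum(ITEMS[i][1] for i, on in enumerate(selection) if on)
    return w, v


def score(selection):
    """Objective: value if feasible, heavily penalized if over capacity."""
    w, v = total(selection)
    return v if w <= CAPACITY else v - 100 * (w - CAPACITY)


def simulated_annealing(n_steps=20000, t_start=10.0, t_end=0.01, seed=42):
    rng = random.Random(seed)
    current = [False] * len(ITEMS)  # start from the empty (feasible) knapsack
    best = current[:]
    for step in range(n_steps):
        # Geometric cooling schedule from t_start down to t_end
        t = t_start * (t_end / t_start) ** (step / (n_steps - 1))
        candidate = current[:]
        i = rng.randrange(len(ITEMS))
        candidate[i] = not candidate[i]  # neighbor: flip one item in or out
        delta = score(candidate) - score(current)
        # Always accept improvements; accept worse moves with probability exp(delta / t)
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current = candidate
            if score(current) > score(best):
                best = current[:]
    return best


if __name__ == "__main__":
    chosen = simulated_annealing()
    print(total(chosen), [i for i, on in enumerate(chosen) if on])
```

With ten items the instance is small enough to brute-force, which makes the result easy to check. The point is that the same loop still works when the 2^n subsets are far too many to enumerate. Because every infeasible selection scores below zero, the tracked `best` always stays within capacity.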

Systems Development for Computational Science
This is a project-based course emphasizing designing, building, testing, maintaining and modifying software for scientific computing. Students will work in groups on a number of projects, ranging from small data-transformation utilities to large-scale systems. Students will learn to use a variety of tools and languages, as well as various techniques for organizing teams. Most important, students will learn to fit tools and approaches to the problem being solved.

Design -- build -- test -- maintain -- modify scientific computing software.

Critical Thinking in Data Science
This course examines the wide-ranging impact data science has on the world and how to think critically about issues of fairness, privacy, ethics, and bias while building algorithms and predictive models that get deployed in the form of products, policy and scientific research. Topics will include algorithmic accountability and discriminatory algorithms, black box algorithms, data privacy and security, ethical frameworks; and experimental and product design. We will work through case studies in a variety of contexts including media, tech and sharing economy platforms; medicine and public health; data science for social good, and politics. We will look at the underlying machine learning algorithms, statistical models, code and data. Threads of history, philosophy, business models and strategy; and regulatory and policy issues will be woven throughout the course.

Algorithms: fairness, privacy, ethics, and bias in algorithm design

Black-box algorithms, data privacy and security, and more.
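To make “fairness” in this course description a little more concrete, one simple check is demographic parity: compare the rate of positive predictions across groups. It is only one of several fairness criteria, and those criteria can conflict with each other. The choice of metric, the function names, and the toy data below are illustrative assumptions.

```python
from collections import defaultdict


def selection_rates(predictions, groups):
    """Share of positive predictions (e.g. 'approve') within each group."""
    counts = defaultdict(lambda: [0, 0])  # group -> [positives, total]
    for pred, g in zip(predictions, groups):
        counts[g][0] += int(pred)
        counts[g][1] += 1
    return {g: pos / n for g, (pos, n) in counts.items()}


def demographic_parity_gap(predictions, groups):
    """Largest difference in selection rate between any two groups (0 means parity)."""
    rates = selection_rates(predictions, groups)
    return max(rates.values()) - min(rates.values())


# Hypothetical model outputs for eight applicants in two groups
preds = [1, 1, 0, 1, 0, 0, 1, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(selection_rates(preds, groups))         # {'A': 0.75, 'B': 0.25}
print(demographic_parity_gap(preds, groups))  # 0.5
```

A large gap does not by itself prove discrimination, but it is a cheap signal that the model and its training data deserve a closer look.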


Electives are divided into:
Approved CSE Applied Math electives
1. Quantitative Finance
2. Statistical Inference
3. Bayesian Data Analysis
4. Sequential Decision Making
Approved Computer Science electives
1. Data Structures and Algorithms
2. Machine Learning
3. Artificial Intelligence
4. Introduction to Distributed Computing
5. Big Data Systems
6. Machine Learning for Natural Language

Very “AI”-flavored; I won't go into detail. For detailed course descriptions, follow the link below.


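In addition, as listed at the bottom of the program page, Harvard also offers two other data-related master's programs.

The above is based on the three-semester data science program offered by Harvard's School of Engineering and Applied Sciences: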

Master of Science in Data Science--Harvard University


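4. How operations research relates to DA/DS

To show how closely operations research is tied to both (as their underlying models and algorithms), I went to great lengths to find the figure below

MIT's Business Analytics program is strongly supported by the MIT Operations Research Center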
  • The program is tailored for current students or recent college graduates who plan to pursue a career in the data science industry.

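Second, note that although this program is called Business Analytics, it states explicitly that it is tailored to help students find jobs in the data science industry.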


The above is based on the Master of Business Analytics program at the MIT Sloan School of Management:

Master of Business Analytics--MIT


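5. Takeaways from academia

Because my path from undergraduate to Ph.D. has been interdisciplinary throughout, I have some familiarity with the curricula for data science, artificial intelligence, and smart supply chains, as well as related programs in mathematics departments, business schools, computer science departments, and industrial engineering departments.

Some personal reflections:

Which courses do I need to switch into data science (machine learning)?

  • None of the programs listed today stand apart; there are no clear boundaries between them
  • Roughly speaking, data analysis is the “traditional” field: operations research and statistics are its main mathematical methods, and its focus is analyzing data and making decisions
  • Data science is the “emerging” field: it adds a great deal of machine learning and artificial intelligence, along with collecting, cleaning, and managing data
  • Both rely on databases as the medium for storing and retrieving data
  • ERP (enterprise resource planning) software is their “product”, corresponding to the academic program in (enterprise) management information systems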

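For more program overviews, please follow:

『运筹OR帷幄』Operations Research in the Era of Big Data and Artificial Intelligence

Fans of operations research / control theory / stochastic optimization are welcome to join QQ group: 686387574

Artificial intelligence enthusiasts are welcome to join QQ group: 685839321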


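6. Analysis of several DA/DS job postings

Boston Consulting Group hiring a Data Scientist/Analyst

As you can see, this BCG position puts DA and DS together.

Preferred majors: computer science, mathematics, statistics, machine learning.

The skill set explicitly emphasizes machine learning, optimization, and deep learning.

That's right: consulting firms are also hiring data scientists in large numbers.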


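Germany's No. 1 supermarket chain hiring a Senior Data Scientist

Again, DA and DS are recruited together.

Preferred majors: essentially the same as above

The skill set explicitly emphasizes machine learning, optimization, data mining, and databases.


This company is quite “clever”: it puts the desired skill (machine learning) right in the job title.

It also links consulting with the data scientist role.


DHL hiring data scientists

Similarly, DHL breaks its data scientist roles down by required skills: operations research, machine learning, and statistics.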


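Summary:

A look at DS positions on the job market shows that DA/DS/BI degrees are, surprisingly, absent from the listed majors.

This suggests that these emerging programs are not yet widespread. To widen the pool of candidates, employers “have to” include traditional majors such as mathematics, optimization, statistics, and computer science.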


Links to the positions above:

Analytics Associate/Data Scientist--The Boston Consulting Group (BCG)

Senior Data Scientist (m/w) Advanced Analytics--Lidl in Germany

Data Science Senior Consultant Data Science and Machine Learning

Search Results for data scientist--DHL


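7. Hands-on experience from industry

Most of the above is analysis from an academic perspective. For hands-on industry experience:

We are fortunate to have invited

  • a Ph.D. in Decision Science and Risk Analysis from Stanford University
  • a Senior Manager of Global Business Operations/Analytics at Google
  • the co-founder and Chief Product Officer of an operations research/AI startup

Dr. Xi Wang (王曦), to join everyone for a discussion on Zhihu Live

He will draw on first-hand experience from IT giants and an operations research startup to dig into what lies hidden in big-data decision-making, and to compare data science/analytics, decision science, and business intelligence.

Please share and spread the word, and join the Zhihu Live to talk with Dr. Wang (and me) in real time.

[March 3, 20:00, Beijing time]

From Data and Decision Science to Intelligent Business Decisions (www.zhihu.com)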


Related programs for reference:


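If you are a current or former master's or Ph.D. student in operations research or artificial intelligence, send the message “加微信群” (“join WeChat group”) to the @运筹OR帷幄 public account. The system will automatically recognize the keyword and reply with further requirements and steps for joining, inviting you into the global OR or AI scholars group (home to many leading figures from academia and industry).

We also run thousand-member QQ groups for [Operations Research | Optimization Enthusiasts], [Supply Chain | Logistics], [Artificial Intelligence], and [Data Science | Analytics]. To join, follow the public account and tap the “加入社区” (“Join Community”) button to get the invitation link.

Academic and industry job postings, calls for papers, and similar announcements are published free of charge.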

Edited on 2019-07-27
